Spatialsnake for multi-sample integration
Before reading this section, make sure you have completed at least one tutorial in Select your data platform so that you already know how to prepare sample.txt and run Spatialsnake.
This tutorial is intended for studies with multiple spatial transcriptomics datasets from different experimental conditions, such as different tumor types or different normal tissue types.
The integrated object produced here can be used for the same downstream analyses as a single-sample object, although some result files and interpretations differ.
Multi-sample Integration Workflow
This page provides a basic multi-sample integration tutorial. For detailed core_analysis guidance, please refer to the corresponding tutorial pages. This section applies to data from all supported platforms. Please ensure you have already read the ingestion tutorial for your target platform and have a basic understanding of the ingestion step; this tutorial will not repeat that content.
- Step 1: Data download and storage paths
The Spatialsnake multi-sample ingestion step supports integrating samples from two different experimental conditions. The data path and sample_id for each sample must match the corresponding entries in sample.txt. Multiple biological replicates are supported within each group. Regardless of the platform, the correct directory layout is:
# Sample IDs can be customized and are independent of group assignment
project_root/
├── data/
│ ├── Normal_1/
│ ├── Normal_2/
│ ├── Normal_3/
│ ├── cancer_1/
│ ├── cancer_2/
│ └── cancer_3/
├── sample.txt
└── results/
- Step 2: Configure sample.txt
As in the single-sample workflow, the sample table records the sample ID and input directory of each sample; for multi-sample analysis it also records the group.
sample_id input_path group
Normal_1 data/Normal_1 Group1
Normal_2 data/Normal_2 Group1
Normal_3 data/Normal_3 Group1
cancer_1 data/cancer_1 Group2
cancer_2 data/cancer_2 Group2
cancer_3 data/cancer_3 Group2
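Because every input_path must point at an existing sample folder, it can be worth checking the table before running anything. The following is a minimal sketch, not part of Spatialsnake: the function name check_sample_table is hypothetical, and it assumes a header row followed by three whitespace-separated columns.

```shell
# Hypothetical helper (not part of Spatialsnake): sanity-check a sample table.
# Assumes a header row and three whitespace-separated columns:
#   sample_id  input_path  group
check_sample_table() {
  local table="$1" status=0 line_no=0
  local sample_id input_path group extra
  local seen=" "
  while read -r sample_id input_path group extra || [ -n "${sample_id:-}" ]; do
    line_no=$((line_no + 1))
    # Skip the header row and blank lines.
    if [ "$line_no" -eq 1 ] || [ -z "$sample_id" ]; then continue; fi
    if [ -z "$input_path" ] || [ -z "$group" ] || [ -n "$extra" ]; then
      echo "line ${line_no}: expected exactly 3 columns" >&2; status=1; continue
    fi
    # Report sample IDs that appear more than once.
    case "$seen" in
      *" ${sample_id} "*) echo "line ${line_no}: duplicate sample_id ${sample_id}" >&2; status=1 ;;
    esac
    seen="${seen}${sample_id} "
    # Report input paths that do not exist as directories.
    if [ ! -d "$input_path" ]; then
      echo "line ${line_no}: missing directory ${input_path}" >&2; status=1
    fi
  done < "$table"
  return "$status"
}
```

Run it from the project root, for example `check_sample_table sample.txt && echo "sample.txt looks consistent"`. Relative paths are resolved against the current directory, which is why the project root is assumed here.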
- Step 3: Run the command
Unlike single_analysis, multi-sample analysis requires multiple sample directories and a sample.txt with corresponding entries, and some command parameters differ.
For example, the compare_analysis integration command for Visium:
spatialsnake compare_analysis sample.txt visium --option=integrate
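Before launching the command, a quick tally of samples per group can confirm that the table reflects the intended design. This is a small sketch under the assumption that sample.txt has a header row and whitespace-separated columns with the group in the third column; count_samples_per_group is a hypothetical name, not a Spatialsnake command.

```shell
# Hypothetical helper: count samples per group in a sample table.
# Assumes a header row and the group label in column 3.
count_samples_per_group() {
  awk 'NR > 1 && NF >= 3 { n[$3]++ } END { for (g in n) print g, n[g] }' "$1" | sort
}
```

Calling `count_samples_per_group sample.txt` prints one line per group with its sample count, which makes a mislabeled or missing replicate easy to spot.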
Summary:
This concludes the basic overview of multi-sample integration in Spatialsnake. For specific analysis steps, compare_analysis supports different parameter configurations, such as batch_method, to accommodate various experimental analysis needs.
The most important takeaway is: once sample.txt is correctly configured, run the compare_analysis command with the appropriate platform type, and you can proceed with multi-sample integration just as smoothly as with the single-sample workflow.
Below, we provide a detailed demonstration using Visium platform data. If you have already completed this step with your own data, you may skip this demo and proceed directly to the core_analysis and preprocess steps.
Important
The demo dataset used in this section is reused in the subsequent analysis tutorials, so keep the downloaded files and results if you plan to follow them.
Demo Walkthrough
Multi-sample analysis typically combines several datasets. For a concise demonstration, we use multiple Visium mouse brain sections from a public dataset (accession E-MTAB-11114, hosted on the EBI BioStudies FTP server).
We use the demo data for a minimal core_analysis walkthrough that covers data ingestion, preprocessing, and annotation. This tutorial helps you intuitively understand the key differences between multi-sample and single-sample analyses. For detailed module functionality, parameter settings, and implementation of analysis features, please read the introductory tutorial at the beginning of each module. You can also use our Visium HD example dataset for a more comprehensive hands-on learning experience.
This tutorial uses five public samples:
- Group 1: ST8059048, ST8059049, ST8059050
- Group 2: ST8059051, ST8059052
If you are using your own dataset rather than the example shown here, the main fields you need to replace are:
- the sample IDs in the download script and in sample.txt
- the input_path values in sample.txt
- the biological group labels in the group column
- the platform type in the command if your data are not visium
Step 1: create a download script in the project root
First make sure the current working directory contains a data/ folder, then create a script named download.sh.
This script downloads one expression matrix and one spatial archive for each sample and expands them directly into sample-specific directories under data/.
If you are working with another public dataset, keep the same folder logic but replace the sample IDs and download URLs.
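#!/usr/bin/env bash
# Download the expression matrix and spatial archive for each demo sample.
set -euo pipefail
ids=(ST8059048 ST8059049 ST8059050 ST8059051 ST8059052)
for id in "${ids[@]}"; do
  mkdir -p "data/${id}"
  cd "data/${id}"
  # -c resumes a partially downloaded file if the script is re-run
  wget -c "https://ftp.ebi.ac.uk/biostudies/fire/E-MTAB-/114/E-MTAB-11114/Files/${id}_filtered_feature_bc_matrix.h5"
  wget -c "https://ftp.ebi.ac.uk/biostudies/fire/E-MTAB-/114/E-MTAB-11114/Files/${id}_spatial.tar.gz"
  # Unpack the spatial archive inside the sample folder
  tar -xvzf "${id}_spatial.tar.gz"
  # Return to the project root before the next sample
  cd - >/dev/null
done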
Step 2: make the script executable and run it
chmod +x download.sh
./download.sh
After the script finishes, the input directory should follow the structure below. This is the folder layout that Spatialsnake expects when reading multi-sample Visium data.
Example input directory layout
project_root/
├── data/
│ ├── ST8059048/
│ │ ├── ST8059048_filtered_feature_bc_matrix.h5
│ │ └── spatial/
│ │ ├── tissue_positions_list.csv
│ │ ├── scalefactors_json.json
│ │ ├── tissue_lowres_image.png
│ │ └── tissue_hires_image.png
│ ├── ST8059049/
│ │ ├── ST8059049_filtered_feature_bc_matrix.h5
│ │ └── spatial/
│ ├── ST8059050/
│ │ ├── ST8059050_filtered_feature_bc_matrix.h5
│ │ └── spatial/
│ ├── ST8059051/
│ │ ├── ST8059051_filtered_feature_bc_matrix.h5
│ │ └── spatial/
│ └── ST8059052/
│ ├── ST8059052_filtered_feature_bc_matrix.h5
│ └── spatial/
├── sample.txt
└── results/
If you are using your own data, keep the same two-level organization:
- one folder per sample under data/
- one expression matrix and one spatial/ directory inside each sample folder
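The two-level organization described above can be checked with a short script before running Spatialsnake. The sketch below is not part of Spatialsnake: check_visium_layout is a hypothetical name, and the file-name pattern `<sample>_filtered_feature_bc_matrix.h5` is taken from the demo layout, so adjust it if your matrices are named differently.

```shell
# Hypothetical helper (not part of Spatialsnake): verify the demo layout, i.e.
#   data/<sample>/<sample>_filtered_feature_bc_matrix.h5
#   data/<sample>/spatial/
check_visium_layout() {
  local data_dir="${1:-data}" status=0 sample_dir sample
  for sample_dir in "$data_dir"/*/; do
    # If the glob matched nothing, the pattern is left unexpanded; report and stop.
    if [ ! -d "$sample_dir" ]; then
      echo "no sample folders found under ${data_dir}" >&2
      return 1
    fi
    sample="$(basename "$sample_dir")"
    if [ ! -f "${sample_dir}${sample}_filtered_feature_bc_matrix.h5" ]; then
      echo "${sample}: expression matrix not found" >&2; status=1
    fi
    if [ ! -d "${sample_dir}spatial" ]; then
      echo "${sample}: spatial/ directory not found" >&2; status=1
    fi
  done
  return "$status"
}
```

For example, `check_visium_layout data` returns a non-zero status and names each sample whose matrix or spatial/ folder is missing.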
Step 3: prepare sample.txt with group information
For multi-sample integration, sample.txt is similar to the single-sample version but must also include a group column.
The group field defines the biological condition used later in comparison-oriented analyses, so it should reflect your real experimental design rather than arbitrary batch labels.
If you are using another dataset, replace both the sample IDs and the paths, and rename Group1 and Group2 to meaningful condition names such as Control and Disease.
sample_id input_path group
ST8059048 data/ST8059048 Group1
ST8059049 data/ST8059049 Group1
ST8059050 data/ST8059050 Group1
ST8059051 data/ST8059051 Group2
ST8059052 data/ST8059052 Group2
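With more than a handful of samples, typing the table by hand becomes error-prone, so it can be generated instead. The sketch below reproduces the demo table; the helper names are hypothetical, and tab separation is an assumption (the tutorial does not state which delimiter Spatialsnake expects), so change the printf format if your installation requires something else.

```shell
# Hypothetical helpers: print the demo sample table, one row per sample.
# Tab-separated output is an assumption about the expected delimiter.

# Usage: emit_group_rows <group> <sample_id>...
emit_group_rows() {
  local group="$1" id
  shift
  for id in "$@"; do
    printf '%s\tdata/%s\t%s\n' "$id" "$id" "$group"
  done
}

write_sample_table() {
  printf 'sample_id\tinput_path\tgroup\n'
  emit_group_rows Group1 ST8059048 ST8059049 ST8059050
  emit_group_rows Group2 ST8059051 ST8059052
}
```

Redirect the output to create the file: `write_sample_table > sample.txt`. For your own study, replace the group names and sample IDs in write_sample_table.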
Step 4: run integration and merge the samples
spatialsnake compare_analysis sample.txt visium --option=integrate
At this stage, Spatialsnake reads all sample folders listed in sample.txt, standardizes them into one common object representation, and writes the merged result for downstream preprocessing.
If your data come from another platform, replace visium with the corresponding run_type.
The result structure is similar to the single-sample workflow:
results/
└── merge_data/
    └── integrate/
        └── concatenated_sdata
...........
Step 5: preprocess the integrated object
Single-sample analyses often do not require explicit batch correction, but for multi-sample analysis it is usually worth considering. The command below selects Harmony through --batch_method:
spatialsnake compare_analysis sample.txt visium --option=preprocess --min_genes=100 --min_cells=100 --batch_method=harmony
Step 6: continue with clustering and annotation
After preprocessing, the remaining steps follow the standard compare_analysis workflow.
The pipeline will generate joint visualizations and result tables across samples.
Because the data are integrated, manual annotation is usually performed on the shared cluster identities across all samples.
spatialsnake compare_analysis sample.txt visium --option=clustering --resolution=0.8 --pcs=20
spatialsnake compare_analysis sample.txt visium --option=annotation_help --species=mouse
For this mouse brain example, we provide only a coarse broad-region annotation. The labels are for tutorial demonstration only and are not intended as a formal reproduction of the original study annotation.
sample 0 1 2 3 4 5.........please input anno by order of cluster
thalamus,cortex,cortex,amygdala,hypothalamus,hypothalamus,striatum,cortex,cortex,white matter,hypothalamus,thalamus,hippocampus,hippocampus,hippocampus,piriform_cortex,cortex,cortex,cortex,cortex,cortex,cortex,cortex,amygdala,thalamus,thalamus
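Because labels are entered in cluster order, a label list that is one entry short or long would shift every later assignment. A quick count of the comma-separated labels can be compared against the number of clusters reported by the clustering step. This is a minimal sketch; count_labels is a hypothetical helper, and it assumes the labels sit on a single comma-separated line as in the example above.

```shell
# Hypothetical helper: print the number of comma-separated labels on one line.
count_labels() {
  printf '%s\n' "$1" | awk -F',' '{ print NF }'
}
```

Pass the label line as a single quoted argument, for example `count_labels "thalamus,cortex,white matter"`, and compare the printed number with your cluster count.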
spatialsnake compare_analysis sample.txt visium --option=annotation --anno_algorithm=manual --annotation-file=annotation.txt
For your own data, replace these example labels with the cell types or tissue regions inferred from your own clustering and marker results.
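The non-interactive commands from Steps 4 to 6 can be chained so that a failure in one step stops the run. The wrapper below is a sketch rather than an official entry point: run_step and run_demo_steps are hypothetical names, the flags are copied from the commands in this walkthrough, and it assumes that spatialsnake returns a non-zero exit status on failure. Manual annotation is left out because it needs a hand-edited annotation file.

```shell
# Hypothetical wrapper: run the non-interactive demo steps in order.
# Set DRY_RUN=1 to print the commands instead of executing them.
run_step() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: run_demo_steps [sample_table] [platform]
run_demo_steps() {
  local table="${1:-sample.txt}" platform="${2:-visium}"
  # Each step runs only if the previous one succeeded.
  run_step spatialsnake compare_analysis "$table" "$platform" --option=integrate &&
  run_step spatialsnake compare_analysis "$table" "$platform" --option=preprocess \
    --min_genes=100 --min_cells=100 --batch_method=harmony &&
  run_step spatialsnake compare_analysis "$table" "$platform" --option=clustering \
    --resolution=0.8 --pcs=20 &&
  run_step spatialsnake compare_analysis "$table" "$platform" --option=annotation_help \
    --species=mouse
}
```

Previewing with `DRY_RUN=1 run_demo_steps sample.txt visium` prints the four commands without running them, which is a convenient way to review parameters before a long job.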
You have now generated an integrated spatial transcriptomics object. The remaining analysis steps are similar to the single-sample workflow, with a few multi-sample-specific details explained on each step page.
Continue to Core Analysis Workflow.
Example visualizations are discussed in the downstream tutorial pages.