Annotation Support
Based on the clustering results, annotation_help performs marker gene statistics, enrichment analysis, and spatial cluster visualization to provide interpretable biological evidence for the downstream annotation step.
In a single-sample setting, this stage helps identify candidate cell types for each cluster. In an integrated multi-sample setting, it also helps assess whether marker and pathway results are influenced by sample composition.
This stage combines both Python and R tools. If the R environment has not yet been configured, use conda to install relevant R packages:
Workflow overview
Read the output object from
clusteringtogether with theclusterslabels.Compute differential marker genes by cluster and export both a combined table and per-cluster tables.
Generate marker dotplots, sample-by-cluster proportion plots, and spatial overlay figures.
Run KEGG enrichment analysis based on the marker genes and export pathway results.
Write all annotation support outputs into the
clusteringdirectory so that they can be used directly byannotation.
Step 1: Configure sample.txt
You can directly reuse the same sample.txt configuration file from the integrate step; no modifications are needed.
sample_id input_path
sample_id data/sample_id
Step 2: Parameter Selection and Configuration
This step includes several important parameters. Please adjust them according to your needs. Below are some key parameters and their functions:
Parameter |
Example |
Description |
|---|---|---|
|
|
Marker statistics method; |
|
|
Species background used for enrichment analysis, typically |
These parameters are passed directly to annotation_help and the enrichment workflow. If you want to switch strategies quickly, append them to the command, for example --markers_algorithm=t-test --species=mouse.
- Configuration recommendations:
This step performs differential expression analysis between clusters and gene set enrichment analysis to characterize the expression signatures of each cluster and assist in the interpretation of cluster identity and cell-type assignment.
The default parameters are generally sufficient. If you wish to customize the configuration, consider the
markers_algorithmandspeciesparameters to specify the differential analysis method and the species background. Both human and mouse are supported.If your study involves other species or requires additional parameters for more detailed differential gene analysis, please contact us.
Parameter configuration methods:
1.The parameters listed above are commonly used settings that can be passed directly on the command line.
If you are comfortable tuning spatial transcriptomics workflows, you can append them to the command as needed, for example --markers_algorithm=t-test --species=mouse.
2. Optional parameters through a configuration file. See annotation_help.yaml Reference for the full parameter reference. Generate a YAML template with:
The YAML file contains inline comments describing each parameter. You can adjust the settings according to your analysis needs, or consult the corresponding configuration reference page for detailed explanations.
After editing the configuration file, provide it on the command line with --configfile.
spatialsnake produce-file --option=annotation_help
Step 3: Run the Command
Based on the command-line introductions in previous tutorials, you should now be familiar with the logic for setting key parameters in Spatialsnake. Here we only demonstrate running the annotation_help command. If you are working with multi-sample integration data or another platform, simply modify the relevant parameters accordingly.
Remember to replace the example values with your chosen parameter settings or append them to the end of the command.
For the example dataset, we use --markers_algorithm=t-test --species=human for single_analysis or visium_HD.
spatialsnake single_analysis sample.txt visium_HD --option=annotation_help --markers_algorithm=t-test --species=mouse
Run with a YAML file. Remember to save the edited YAML file before execution. No additional command-line arguments are required; if you do provide them, they will override the YAML values.
spatialsnake single_analysis sample.txt visium_HD --option=annotation_help --configfile=annotation_help.yaml
Demo for annotation_help with visium_HD
We use the Colon_Cancer_P2_008um data from the previous step for this annotation_help demonstration.
The same sample.txt can be reused from the earlier analysis steps to maintain a consistent core analysis on the same sample.
sample_id input_path bin
Colon_Cancer_P2 data/Colon_Cancer_P2 8
spatialsnake single_analysis sample.txt visium_HD --option=annotation_help
If you prefer YAML-based configuration for more detailed parameter control:
# Generate and edit the YAML file
spatialsnake produce-file --option=annotation_help
spatialsnake single_analysis sample.txt visium_HD --option=annotation_help --configfile=annotation_help.yaml
Result file structure
This example shows single-sample annotation support for visium_HD. Before moving to manual annotation, first confirm that marker_genes_pval.csv and kegg_data.csv have been generated.
The workflow creates one subdirectory for each cluster, containing its differential marker table, KEGG enrichment results, spatial visualization outputs, and related files, making it easier to interpret the characteristic features of each candidate cell type.
results/
└── Colon_Cancer_P2_008um/
└── clustering/
├── marker_genes_pval.csv
├── kegg_data.csv
├── Colon_Cancer_P2rank_genes_groups_dotplot.png
├── Clusters_proportion.png
├── Colon_Cancer_P2_hires_image_cluster.png
├── [cluster_id]/
│ └── cluster_[cluster_id].csv
└── clusters.csv
Note
The main core analysis workflow is now complete. The annotation support results are stored in the clustering directory. Next, continue to Annotation Modules for manual annotation or algorithm-based annotation.
Key parameter recommendations
Parameter category |
Recommendation for single-sample analysis |
Recommendation for multi-sample or cross-condition analysis |
|---|---|---|
|
|
Use the same method across all samples to reduce comparison bias |
|
Match the species of the dataset, usually |
Keep this consistent across all samples or enrichment results will not be directly comparable |
|
Usually keep the default settings and complete the global analysis first |
For integrated objects, use a consistent layer type to avoid interpretation shifts caused by different visualization baselines |
|
Usually keep it disabled to inspect the full structure first |
Enable it only when focusing on a target region, and record the crop coordinates for reproducibility |
Input and output structure
After clustering is complete, you can usually reuse the same sample.txt file for annotation_help.
Analysis mode |
Input |
Output |
|---|---|---|
single_analysis (standard |
|
|
single_analysis ( |
|
|
compare_analysis |
|
|
How to inspect the results
This step produces a spatial cluster map showing the tissue distribution of each cluster. The figure panels typically include the original H&E-stained image together with the mapped cluster assignments, allowing you to interpret the spatial organization of the clusters in histological context.
The marker dotplot summarizes the strength and specificity of representative genes across clusters. You can combine it with the exported CSV tables to determine the defining markers for each cluster before annotation.
Each cluster-specific directory also contains differential marker tables and GO/KEGG enrichment results, helping you interpret the most enriched pathways for that cluster.
Other outputs
marker_genes_pval.csv(combined marker table)This is the main evidence table for annotation and summarizes the most representative genes for each cluster.
Prioritize gene combinations that appear consistently and agree with known cell-type characteristics.
In most cases, this table forms the primary basis for assigning cell-type names.
<cluster_id>/cluster_<cluster_id>.csv(per-cluster marker table)This file isolates each cluster into its own table for detailed inspection.
Cluster-specific review makes it easier to detect mixed signatures.
If one cluster contains multiple incompatible signals, revisit the upstream clustering settings and consider whether the clustering is too coarse.
kegg_data.csv(pathway enrichment)This table provides additional evidence about the biological processes associated with each cluster.
Pathway results should be interpreted together with marker genes rather than used alone for annotation.
If pathway and marker evidence clearly conflict, re-evaluate clustering quality before assigning labels.
{sample}rank_genes_groups_dotplot.png(global expression pattern overview)This plot helps you quickly see which genes are strongest in which clusters.
If some genes are clearly enriched in only one target cluster, annotation confidence is usually higher.
If several clusters share similar expression patterns, finer subclustering may still be useful.
Clusters_proportion.pngand[image]_Clusters.png(composition and spatial distribution)The cluster proportion plot shows compositional differences across samples or regions.
The spatial overlay figure helps verify whether cluster locations are coherent and consistent with tissue structure.
When composition patterns and spatial localization support the same interpretation, the data are usually ready for manual annotation.
Continue to Annotation Modules.
Note
The main core analysis workflow is now complete, and the annotation support results are stored in the clustering directory.
At this point, it is common to inspect the marker genes and enrichment results for each cluster and infer likely cell identities with the help of resources such as CellMarker or PanglaoDB.
Then continue to Annotation Modules for manual annotation or algorithm-based annotation.