Visium Segment Input Tutorial

Required files

Filename / pattern	Required	Format	Description
`segmented_outputs/spatial/tissue_hires_image.png`	Yes	PNG	High-resolution image corresponding to the segmentation coordinates
`segmented_outputs/spatial/scalefactors_json.json`	Yes	JSON	Image scale factors
`segmented_outputs/cell_segmentations.geojson`	Yes	GeoJSON	Cell segmentation polygons
`segmented_outputs/filtered_feature_bc_matrix.h5` or `segmented_outputs/raw_feature_bc_matrix.h5`	Yes	H5	Main expression matrix
`segmented_outputs/cell_feature_matrix.h5` / `segmented_outputs/filtered_feature_cell_matrix.h5` / `segmented_outputs/raw_feature_cell_matrix.h5`	No	H5	Alternative compatible matrix filenames

Where these files come from

Official download: segmented_outputs generated by the 10x Visium segmentation workflow
Experimental output: files produced by the image segmentation pipeline
Placeholder usage: you can start with a placeholder path such as data/S1 and replace it later with the actual sample directory

Space Ranger v4 from 10x Genomics includes a cell segmentation algorithm. Spatialsnake provides a dedicated ingestion path built around the structure of that segmentation output, so the data can be passed on to downstream analysis. The data layout must conform to the structure shown below.

project_root/
├── data/ (stores your raw data)
│   └── {sample_id}/
├── sample.txt (key sample description file)
├── results/ (stores analysis outputs)
└── <analysis_option>.yaml (optional configuration file)

data/
└── {sample_id}/
    └── segmented_outputs/
        ├── filtered_feature_bc_matrix.h5
        ├── cell_segmentations.geojson
        └── spatial/
            ├── tissue_hires_image.png
            └── scalefactors_json.json
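
Before running the workflow, it can help to confirm that a sample directory actually contains the files listed in the table. The following is a minimal sketch using only the Python standard library. The function name `missing_inputs` is hypothetical and not part of Spatialsnake, and treating the alternative matrix filenames as acceptable substitutes for the main matrix is an assumption based on the table's wording.

```python
from pathlib import Path

# Files the "Required files" table marks as required, relative to segmented_outputs/.
REQUIRED = (
    "spatial/tissue_hires_image.png",
    "spatial/scalefactors_json.json",
    "cell_segmentations.geojson",
)

# Matrix filenames named in the table. Assumption: any one of them is enough.
MATRIX_CANDIDATES = (
    "filtered_feature_bc_matrix.h5",
    "raw_feature_bc_matrix.h5",
    "cell_feature_matrix.h5",
    "filtered_feature_cell_matrix.h5",
    "raw_feature_cell_matrix.h5",
)


def missing_inputs(sample_dir):
    """Return a list of problems; an empty list means the layout looks complete."""
    seg = Path(sample_dir) / "segmented_outputs"
    problems = [f"missing {rel}" for rel in REQUIRED if not (seg / rel).is_file()]
    if not any((seg / name).is_file() for name in MATRIX_CANDIDATES):
        problems.append("missing expression matrix (.h5)")
    return problems
```

Calling `missing_inputs("data/S1")` from project_root returns the list of problems for that sample. The check only looks at file presence, not at file contents.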

Demo Dataset Walkthrough

run_type: visium_segment

In this tutorial, we use the cell segmentation output from the public CRC P2 dataset provided by 10x Genomics. These files are generated automatically by Space Ranger v4.

Dataset link: Visium Segmentation Demo (multisample_raw_data.tar.gz)

Before running Spatialsnake, create the project directory, download the archive, and extract the segmentation output into a folder named Colon_Cancer_P2 so that the final directory contains segmented_outputs directly under the sample folder.

Example setup:

mkdir -p project_root/data/Colon_Cancer_P2/segmented_outputs
cd project_root/data/Colon_Cancer_P2/segmented_outputs

curl -L -o multisample_raw_data.tar.gz https://cf.10xgenomics.com/supp/spatial-exp/analysis-workshop/multisample_raw_data.tar.gz
tar -xf multisample_raw_data.tar.gz
mkdir -p spatial
mv tissue_hires_image.png spatial/tissue_hires_image.png
mv scalefactors_json.json spatial/scalefactors_json.json
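
The mv commands above assume that the image and scale-factor files land directly in the current directory after extraction. If the archive is organized differently, the paths need adjusting. As a hedged sketch, the archive contents can be listed with Python's standard `tarfile` module before moving anything. The helper name `locate_members` is hypothetical, and nothing here asserts how the 10x archive is actually laid out.

```python
import tarfile
from pathlib import PurePosixPath

# Basenames this tutorial expects to end up under segmented_outputs/.
WANTED = (
    "tissue_hires_image.png",
    "scalefactors_json.json",
    "cell_segmentations.geojson",
)


def locate_members(archive_path, wanted=WANTED):
    """Map each wanted basename to the archive member paths that carry it."""
    found = {name: [] for name in wanted}
    # The default read mode detects gzip compression automatically.
    with tarfile.open(archive_path) as tar:
        for member in tar.getmembers():
            base = PurePosixPath(member.name).name
            if member.isfile() and base in found:
                found[base].append(member.name)
    return found
```

`locate_members("multisample_raw_data.tar.gz")` returns a dictionary of member paths per filename. An empty list for a name means the archive holds no file with that basename.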

After extraction, move the spatial files into the data/Colon_Cancer_P2/segmented_outputs/spatial folder, as the final commands of the example setup do. The sample directory should then match the layout shown below.

Example directory layout

project_root/
├── data/ (stores your raw data)
│   └── Colon_Cancer_P2/
├── sample.txt (key sample description file)
├── results/ (stores analysis outputs)
└── <analysis_option>.yaml (optional configuration file)

data/
└── Colon_Cancer_P2/
    └── segmented_outputs/
        ├── filtered_feature_bc_matrix.h5
        ├── cell_segmentations.geojson
        └── spatial/
            ├── tissue_hires_image.png
            └── scalefactors_json.json
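
Once the layout is in place, a quick look inside the two JSON-based files can catch truncated or mismatched downloads early. The sketch below uses only the standard `json` module. The function name `summarize_segmentation` is hypothetical, and it assumes that cell_segmentations.geojson is a GeoJSON FeatureCollection with a top-level `features` array, which should be verified against your own file.

```python
import json
from pathlib import Path


def summarize_segmentation(sample_dir):
    """Summarize the scale factors and segmentation polygons of one sample."""
    seg = Path(sample_dir) / "segmented_outputs"

    with open(seg / "spatial" / "scalefactors_json.json") as fh:
        scalefactors = json.load(fh)

    with open(seg / "cell_segmentations.geojson") as fh:
        geo = json.load(fh)

    # Assumption: a FeatureCollection; a different top-level shape yields 0 polygons.
    features = geo.get("features", [])
    geometry_types = sorted(
        {f["geometry"]["type"] for f in features if f.get("geometry")}
    )
    return {
        "scalefactor_keys": sorted(scalefactors),
        "n_polygons": len(features),
        "geometry_types": geometry_types,
    }
```

For the demo sample this would be called as `summarize_segmentation("data/Colon_Cancer_P2")` from project_root. A polygon count of zero suggests the GeoJSON file does not have the assumed shape.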

single_analysis: create sample.txt with the following contents:

sample_id input_path
Colon_Cancer_P2 data/Colon_Cancer_P2

Make sure sample.txt is located in your current working directory.
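
A small pre-flight check on sample.txt can catch path typos before the workflow starts. This sketch is illustrative only: the helper names are hypothetical, and it assumes the two columns are separated by whitespace (spaces or tabs), which also means paths containing spaces are not supported here. Relative paths are resolved against the current working directory.

```python
from pathlib import Path


def read_sample_sheet(path="sample.txt"):
    """Parse sample.txt into a list of (sample_id, input_path) pairs."""
    lines = [ln.split() for ln in Path(path).read_text().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("sample sheet is empty")
    header, rows = lines[0], lines[1:]
    if header != ["sample_id", "input_path"]:
        raise ValueError(f"unexpected header: {header}")
    for row in rows:
        if len(row) != 2:
            raise ValueError(f"expected 2 columns, got {len(row)}: {row}")
    return [(sample_id, input_path) for sample_id, input_path in rows]


def check_sample_sheet(path="sample.txt"):
    """Return one message per sample whose input_path lacks segmented_outputs/."""
    problems = []
    for sample_id, input_path in read_sample_sheet(path):
        if not (Path(input_path) / "segmented_outputs").is_dir():
            problems.append(f"{sample_id}: no segmented_outputs under {input_path}")
    return problems
```

An empty list from `check_sample_sheet()` means every listed sample points at a directory that contains segmented_outputs.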

spatialsnake single_analysis sample.txt visium_segment --option=integrate

Result file structure

results/
└── Colon_Cancer_P2/
    └── integrate/
        ├── Colon_Cancer_P2.zarr # zarr-formatted data
        ├── total.png # histogram of total expression
        ├── total_umi_by_sample.png # histogram of total UMI counts by sample
        ├── total_genes_by_sample.png # histogram of detected genes by sample
        ├── genes_by_sample.png # histogram of mitochondrial signal by sample
        └── scatter.png # scatter plot of total expression versus gene counts

Main output: results/<sample>/integrate/<sample>.zarr
Additional output for comparison analysis: results/merge_data/integrate/concatenated_sdata
Additional QC plots: the ingestion script writes five QC figures into the integrate directory. These files are generated during execution even though they are not explicitly listed one by one in the Snakemake output declaration.
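
Because the QC figures are not individually declared as Snakemake outputs, a missing figure may go unnoticed after a run. A short post-run check, sketched below with the standard library, compares the integrate directory against the file names listed in the result structure above. The function name `missing_outputs` is hypothetical and not part of Spatialsnake.

```python
from pathlib import Path

# QC figure names taken from the result file structure in this tutorial.
QC_FIGURES = (
    "total.png",
    "total_umi_by_sample.png",
    "total_genes_by_sample.png",
    "genes_by_sample.png",
    "scatter.png",
)


def missing_outputs(sample_id, results_dir="results"):
    """Return the expected integrate outputs that do not exist yet."""
    out = Path(results_dir) / sample_id / "integrate"
    expected = [out / f"{sample_id}.zarr"] + [out / name for name in QC_FIGURES]
    # exists() is used because the .zarr store may be a directory rather than a file.
    return [str(p) for p in expected if not p.exists()]
```

After the demo run, `missing_outputs("Colon_Cancer_P2")` should return an empty list when called from project_root.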

Your data is now ingested into a zarr object. For the subsequent core analysis, see Core Analysis Workflow. We recommend starting with the example dataset to gain hands-on experience with the basic core-analysis workflow.

If you prefer to proceed directly with your own data, each step page begins with a concise summary of the essential parameters. Update the sample name and the platform-specific parameters as described there, then continue with the next step: Preprocessing. To run a multi-sample integration analysis, continue to Spatialsnake for multi-sample integration.