Splitting Tool (`splitting`)

splitting is used to divide one spatial transcriptomics object into multiple smaller and easier-to-analyze subobjects.

Split by cell type or cluster, often for downstream subcluster analysis
Split by sample or experimental group, often for multi-sample organization before comparison
Split by ROI tables, often after lasso selection in Loupe or Xenium Explorer
Crop by image coordinates, often for focused local-region analysis

This page focuses on splitting zarr objects. For the configuration reference, see splitting.yaml Reference.

Typical use cases

You have completed core_analysis and want to isolate one major cell class, such as Tumor, for reclustering.
You have an integrated multi-sample object and want to separate it by sample or group.
You selected ROIs in external software and want to import the CSV files for direct batch splitting.
You want to focus on a local tissue region, or one slide contains multiple samples that should be separated by coordinates.

Before you start

The input object should be a .zarr path, ideally from integrate, preprocess, clustering, or annotation.
You know which field should be used for splitting, for example celltype, clusters, sample, or group.
The directory containing sample.txt matches your current working directory, or you use absolute paths in the command.

General command template

spatialsnake useful_tool --option=splitting <INPUT_ZARR_PATH> --split_by=<mode> --output_dir=results/useful_results

Scenario 1: split by cell type or cluster

Use this mode when you want to isolate one or more cell classes for further analysis, for example to refine the Tumor compartment in the example dataset.

spatialsnake useful_tool --option=splitting results/Colon_Cancer_P2_008um/annotation/Colon_Cancer_P2.zarr --split_by=celltype --barcodes=Tumor

You can also select multiple labels at once using commas, for example --barcodes=Tumor,B_cell.

spatialsnake useful_tool --option=splitting results/Colon_Cancer_P2_008um/annotation/Colon_Cancer_P2.zarr --split_by=celltype --barcodes=Tumor,Fibroblast --output_dir=results/useful_results

If --barcodes is not provided, the tool exports one object for every category in the selected field.

Output naming rules:

Without --barcodes: cluster_<label>.zarr
With --barcodes: celltype_selected_<label1_label2>.zarr or clusters_selected_<id>.zarr

Scenario 2: split by sample or group

This mode is used to split an integrated object by sample, region, or group.

spatialsnake useful_tool --option=splitting results/merge_data/integrate/concatenated_sdata --split_by=sample --output_dir=results/useful_results

Split by experimental group:

spatialsnake useful_tool --option=splitting results/merge_data/integrate/concatenated_sdata --split_by=group --output_dir=results/useful_results

Output naming rules:

split_by=sample or region: export <sample_name>.zarr by coordinate system
split_by=group: export group_<group_name>.zarr

Scenario 3: split by ROI CSV from Loupe or Xenium Explorer

How to select ROIs in Loupe

Import the Loupe file generated from Space Ranger into Loupe Browser.
Use the lasso tool to select a region.

Confirm the region command

Export the CSV file

How to select ROIs in Xenium Explorer

Import the folder generated by Xenium Ranger into Xenium Explorer.

Use the lasso tool to select a region and export it.

Import csv file for splitting.

spatialsnake useful_tool --option=splitting results/merge_data/integrate/concatenated_sdata --split_by=ROI --roi_csv= [path_to_csv]

roi_csv can be a single CSV file or a directory. If a directory is provided, multiple CSV files inside it are merged automatically.

The CSV file must contain at least a cell_id column. The ROI name column can be roi, region, sample, or group. If no ROI name column is found, the tool uses the CSV filename as the ROI name.

Output naming rule:

ROI_<ROI_name>.zarr

Scenario 4: crop by image coordinates

This mode crops a tissue region using a coordinate box. The figures generated in annotation_help include coordinate information that can be used as a reference. This approach is suitable only for rectangular regions. For more irregular or detailed ROI shapes, use Loupe or Xenium Explorer instead.

spatialsnake useful_tool --option=splitting results/Colon_Cancer_P2_008um/annotation/Colon_Cancer_P2.zarr --split_by=image --shape_elements=Colon_Cancer_P2 --min_x=0 --max_x=2000 --min_y=0 --max_y=2000 --output_dir=results/useful_results

Outputs:

spatial<min_x>_<max_x>_<min_y>_<max_y>.zarr (cropped subobject)
<coordinate_id>_shape.png (image plus shape visualization for that region)

Key parameters in practice

Parameter	Typical values	Description
`--split_by`	`celltype` / `clusters` / `sample` / `group` / `ROI` / `image`	Selects the splitting dimension and is the most important parameter
`--barcodes`	`Tumor` or `0,1,2`	Export only selected categories; if omitted, all categories in the field are exported
`--roi_csv`	`roi_tables` or `roi1.csv`	CSV file or directory used for ROI-based splitting
`--shape_elements`	`Colon_Cancer_P2`	Target layer or sample coordinate system for image-based splitting
`--min_x --max_x --min_y --max_y`	`0 2000 0 2000`	Coordinate boundaries used for image cropping
`--output_dir`	`results/useful_results`	Output directory for split results

Common errors and how to fix them

values not found in <field_name>
- Cause: the values provided in --barcodes do not exist in the chosen field.
- Fix: check spelling and letter case, then confirm whether the correct field is celltype or clusters.
group column not found in table obs
- Cause: the object does not contain a group column.
- Fix: use sample or region instead, or add group information in the upstream workflow.
cell_id column not found
- Cause: the ROI CSV does not contain a recognizable cell ID column.
- Fix: rename the column to cell_id or another accepted form such as Cell ID or barcode.

Suggested next steps

If you split out a cell population of interest, continue to Secondary Clustering (reclustering) for subcluster refinement.
If you split a multi-sample object, proceed to the downstream modules to compare the biological differences among the resulting subobjects.

Splitting Tool (splitting)