Merge Tool (merge)
merge is used either to combine multiple zarr objects into one object or to write external annotation results back into a base object.
For users without a programming background, it can be understood as a tool for “multi-object concatenation plus annotation-field write-back”.
For the configuration reference, see merge.yaml Reference.
Typical use cases
You want to merge multiple sample-level analysis results into one combined object for comparative analysis.
You want to merge multiple subset objects back into one object for downstream processing.
You have completed subcluster annotation and want to write celltype_annotations.csv back into the original parent object.
Before you start
Make sure that:
All input objects are valid .zarr paths.
For reannotation write-back, the CSV contains at least a cell ID column and an annotation label column.
The command is executed from the correct working directory, or absolute paths are used.
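The checklist above can be scripted before launching the tool. The following is a minimal sketch, not part of spatialsnake: the helper name check_inputs is hypothetical, it assumes directory-style zarr stores, and it only inspects the CSV header.

```python
import csv
from pathlib import Path


def check_inputs(zarr_paths, annotation_csv=None, cell_col=None, label_col=None):
    """Return a list of readable problems; an empty list means every check passed."""
    problems = []
    for p in map(Path, zarr_paths):
        # Assumption: each zarr object is a directory whose name ends in .zarr.
        if p.suffix != ".zarr":
            problems.append(f"{p}: does not end in .zarr")
        elif not p.is_dir():
            problems.append(f"{p}: not found (working directory is {Path.cwd()})")
    if annotation_csv is not None:
        csv_path = Path(annotation_csv)
        if not csv_path.is_file():
            problems.append(f"{csv_path}: CSV file not found")
        else:
            # Read only the header row to confirm both required columns exist.
            with csv_path.open(newline="") as handle:
                header = next(csv.reader(handle), [])
            for col in (cell_col, label_col):
                if col not in header:
                    problems.append(f"{csv_path}: missing column {col!r}")
    return problems
```

Calling check_inputs(["results/S1/annotation/S1.zarr"]) from the intended working directory returns an empty list when the path resolves, which also confirms that relative paths are interpreted as expected.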
Command template
spatialsnake useful_tool --option=merge <INPUT1> <INPUT2> ... --merge_by=<mode> --output_dir=results/useful_results
Scenario 1: merge by sample
Use this mode to combine multiple sample objects into one concatenated_sdata.zarr object.
spatialsnake useful_tool --option=merge results/S1/annotation/S1.zarr results/S2/annotation/S2.zarr --merge_by=sample --re_sample=True --output_dir=results/useful_results
Notes:
With --re_sample=True, if the object lacks a sample column, the tool automatically reconstructs it from the filename.
The output file is always written as results/useful_results/concatenated_sdata.zarr.
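To illustrate what reconstructing the sample column from the filename could look like, here is a small sketch. The exact rule the tool applies is not documented here. The function sample_from_path and the choice of the file name without its .zarr suffix are assumptions made for illustration only.

```python
from pathlib import Path


def sample_from_path(zarr_path):
    """Hypothetical rule: the file name without its .zarr suffix is the sample name."""
    return Path(zarr_path).stem


inputs = ["results/S1/annotation/S1.zarr", "results/S2/annotation/S2.zarr"]
print([sample_from_path(p) for p in inputs])  # ['S1', 'S2']
```

If this assumption holds, giving each input object a distinct, meaningful file name keeps the reconstructed sample values unambiguous.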
Scenario 2: merge by cluster labels and reorder them
Use this mode to merge multiple objects by cluster labels and remap cluster IDs into a continuous sequence so that label conflicts are avoided.
spatialsnake useful_tool --option=merge results/S1/annotation/S1.zarr results/S2/annotation/S2.zarr --merge_by=clusters --cluster_key=clusters --reordering=True --output_dir=results/useful_results
If you want to reorder another column, such as celltype, change --cluster_key accordingly:
spatialsnake useful_tool --option=merge results/subset1.zarr results/subset2.zarr --merge_by=clusters --cluster_key=celltype --reordering=True --output_dir=results/useful_results
Notes:
With --reordering=True, the script remaps labels from each object into a new continuous series following the input order.
With --reordering=False, original labels are preserved, which is suitable when all objects already use a unified label system.
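The remapping behind --reordering=True can be pictured with a short sketch. This is not the tool's code: the function remap_labels is hypothetical, and it assumes that new IDs start at 0 and are assigned in order of first appearance within each object.

```python
def remap_labels(objects):
    """Give every (object, label) pair a new ID from one continuous series.

    `objects` is a list of per-object label lists, given in input order.
    """
    mapping = {}
    remapped = []
    for obj_index, labels in enumerate(objects):
        new_labels = []
        for label in labels:
            # The same label in two different objects gets two different IDs.
            key = (obj_index, label)
            if key not in mapping:
                mapping[key] = len(mapping)  # next unused ID in the series
            new_labels.append(mapping[key])
        remapped.append(new_labels)
    return remapped, mapping


s1 = ["0", "1", "1", "0"]
s2 = ["0", "0", "1"]
remapped, mapping = remap_labels([s1, s2])
print(remapped)  # [[0, 1, 1, 0], [2, 2, 3]]
```

Cluster "0" in the first object and cluster "0" in the second end up as different IDs, which is how the conflict is avoided. When both objects already share one label system, that separation is unwanted, hence --reordering=False.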
Scenario 3: write external reannotation results back into the base object
Use this mode to write annotation results from an external CSV file back into the original zarr object, which is especially useful after subcluster annotation.
spatialsnake useful_tool --option=merge results/Colon_Cancer_P2_008um/annotation/Colon_Cancer_P2.zarr --merge_by=reannotation --annotation_csv=results/reclustering/celltype_annotations.csv --csv_cell_col=Barcode --csv_label_col=Grouped_Annotation --input_cell_col=cell_id --target_col=sub_celltype --original_celltype_col=celltype --output_dir=results/useful_results
Notes:
--annotation_csv can be a single CSV file, a directory, or multiple CSV paths separated by commas.
The tool matches cells by ID and updates target_col accordingly.
If no label is found for a cell in the CSV file, the existing value in target_col is preserved; if that field does not exist, the tool falls back to original_celltype_col.
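The precedence described in these notes (CSV label first, then the existing target_col value, then original_celltype_col) can be sketched as follows. The sketch uses plain dictionaries instead of a zarr table, and the function write_back is a hypothetical stand-in for the tool's internal logic.

```python
def write_back(cells, csv_labels, input_cell_col="cell_id",
               target_col="sub_celltype", original_col="celltype"):
    """Update target_col for each cell, following the documented fallback order."""
    for cell in cells:
        cell_id = cell[input_cell_col]
        if cell_id in csv_labels:
            # 1. A label from the CSV always wins.
            cell[target_col] = csv_labels[cell_id]
        elif target_col not in cell:
            # 3. No CSV label and no existing value: fall back to the original column.
            cell[target_col] = cell[original_col]
        # 2. Otherwise the existing target_col value is left untouched.
    return cells


cells = [
    {"cell_id": "c1", "celltype": "T cell", "sub_celltype": "CD4 T"},
    {"cell_id": "c2", "celltype": "T cell", "sub_celltype": "CD8 T"},
    {"cell_id": "c3", "celltype": "B cell"},
]
csv_labels = {"c1": "Treg"}  # only c1 was reannotated
print([c["sub_celltype"] for c in write_back(cells, csv_labels)])
# ['Treg', 'CD8 T', 'B cell']
```

The practical consequence is that a partial CSV, covering only the reclustered subset, does not erase labels for the remaining cells.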
Key parameters in practice
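| Parameter | Typical values | Description |
|---|---|---|
| --merge_by | sample, clusters, reannotation | Selects the merge mode |
| --re_sample | True | When True, reconstructs a missing sample column from the filename |
| --reordering | True, False | When True, remaps labels from each object into a new continuous series following the input order; when False, original labels are preserved |
| --cluster_key | clusters, celltype | Selects the column used for cluster-label merging or reordering |
| --annotation_csv | a CSV file, a directory, or comma-separated CSV paths | Source of annotation data in reannotation mode |
| --csv_cell_col | Barcode | Column name in the CSV file used to match cell IDs |
| --csv_label_col | Grouped_Annotation | Column name in the CSV file containing annotation labels |
| --input_cell_col | cell_id | Column name in the base zarr object used to match cell IDs |
| --target_col | sub_celltype | Target column used for writing the annotation back |
| --original_celltype_col | celltype | Fallback reference column if target_col does not exist |
| --output_dir | results/useful_results | Output directory |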
How to validate the results
Check whether concatenated_sdata.zarr or the expected output object has been generated.
Try loading the object in the downstream workflow to confirm that it can be used normally.
In reannotation mode, verify that the expected new labels appear in target_col.
Common errors and how to fix them
annotation csv not found
Cause: the path given in --annotation_csv is incorrect.
Fix: use an absolute path, or confirm that the directory actually contains CSV files.
required columns not found in <csv>
Cause: the CSV file is missing either the cell ID column or the label column.
Fix: check that --csv_cell_col and --csv_label_col match the actual column names.
no tables found in base zarr
Cause: the input object is invalid or the path is not a valid zarr object.
Fix: first confirm that the object can be loaded correctly in the upstream workflow.
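When debugging the first error, it can help to expand the --annotation_csv value by hand and see which CSV files it actually resolves to. The sketch below mirrors the three accepted forms (single file, directory, comma-separated list). The function expand_annotation_csv is hypothetical, and the non-recursive *.csv search inside a directory is an assumption about how the tool scans it.

```python
from pathlib import Path


def expand_annotation_csv(value):
    """Turn a file, a directory, or a comma-separated list into a list of CSV paths."""
    found = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue  # ignore empty entries such as a trailing comma
        path = Path(part)
        if path.is_dir():
            # Assumption: only CSV files directly inside the directory are used.
            found.extend(sorted(path.glob("*.csv")))
        elif path.is_file():
            found.append(path)
        else:
            raise FileNotFoundError(f"annotation csv not found: {path}")
    if not found:
        raise FileNotFoundError(f"no CSV files found for: {value}")
    return found
```

Running this from the same working directory as the spatialsnake command shows whether a relative path is the culprit. Switching to an absolute path is the quickest fix if it is.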
Suggested next steps
After merging, continue to downstream comparison or visualization modules for cross-sample analysis.
After reannotation write-back, continue with annotation export and result review.