MERFISH Input Tutorial
Required files
| Filename / pattern | Required | Format | Description |
|---|---|---|---|
| | Yes | CSV | Counts file. |
| | Yes | CSV | Transcript file. |
| | Yes | Parquet | Cell polygon boundaries. |
| | Yes | Parquet | Per-cell metadata file. |
| | No | TIFF/OME.TIFF/OME-XML | Includes TIFF image information. The exact file set may vary across MERFISH versions; please refer to the MERFISH documentation. If no image information is available, Spatialsnake will skip these files by default. Please ensure your data are from MERFISH version 2 or later. |
Following the output directory structure of the Vizgen MERFISH platform, organize the downloaded files under the data/ directory. Replace region_0 with your own sample name.
For example:
project_root/
├── data/ (stores your raw data)
│   └── region_0/
├── sample.txt (key sample description file)
└── results/ (stores analysis outputs)

data/
└── region_0/
    ├── cell_by_gene.csv
    ├── detected_transcripts.csv.gz
    ├── ......
    └── images/
        ├── morphology_mip.ome.tif
        ├── ......
        └── manifest.json
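A quick check of the sample directory can catch misplaced downloads before any analysis starts. The sketch below is only an illustration: `check_merfish_dir` is a hypothetical helper, not part of Spatialsnake, and it tests only for the two filenames visible in the example layout above. The `images/` directory is treated as optional, in line with the table of required files. Extend the file list with the remaining required files from your own MERFISH export.

```shell
# Hypothetical helper, not part of Spatialsnake: report which of the files
# shown in the example layout are missing from a sample directory.
check_merfish_dir() {
    sample_dir="$1"
    missing=0
    # Only the two filenames visible in the example tree are checked here;
    # add the other required files from your own export to this list.
    for name in cell_by_gene.csv detected_transcripts.csv.gz; do
        if [ ! -f "$sample_dir/$name" ]; then
            echo "missing: $sample_dir/$name"
            missing=1
        fi
    done
    # Images are optional, so their absence is reported as a note only.
    if [ ! -d "$sample_dir/images" ]; then
        echo "note: no images/ directory in $sample_dir (optional)"
    fi
    # Exit status 0 means both files were found, 1 means something is missing.
    return "$missing"
}
```

From project_root, call it as `check_merfish_dir data/region_0`; a non-zero exit status means at least one of the checked files is missing.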
Because 3D multi-slice technology and its analysis pipeline are still under development, please select a single region for analysis at this stage. We will update this tutorial as these methods mature.
Where these files come from
Official download: standard Vizgen MERSCOPE/MERFISH output directory
Experimental output: cell and transcript files exported by a laboratory MERFISH pipeline
Placeholder usage: you can first write /path/to/merfish_sample and replace it later with the real directory
Demo Walkthrough
run_type: Merfish. In this tutorial, we assume a standard MERFISH output directory prepared from a public or in-house Vizgen-style dataset.
One public example release is available from the Vizgen Breast Cancer Tissue Microarray Region_R1:
Vizgen MERFISH Breast Cancer Dataset
For this demo, we use the Region_R1 Breast Cancer Tissue Microarray data from the download link above. Because the dataset is large and contains multiple files stored in a public repository layout, please download and organize the files following the expected directory hierarchy. Make sure you have already created the basic working directory as described in the earlier tutorial.
Example setup:
mkdir -p project_root/data/Breast_Cancer
cd project_root/data/Breast_Cancer
After download, the sample directory should match the layout shown below.
Example directory layout
project_root/
├── data/ (stores your raw data)
│   └── Breast_Cancer/
├── sample.txt (key sample description file)
└── results/ (stores analysis outputs)

data/
└── Breast_Cancer/
    ├── cell_by_gene.csv
    ├── detected_transcripts.csv.gz
    ├── ......
    └── images/
        ├── morphology_mip.ome.tif
        ├── ......
        └── manifest.json
single_analysis:
sample_id input_path
Breast_Cancer data/Breast_Cancer
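Because each input_path in sample.txt has to point at a real sample directory, it can be worth verifying the file before launching the pipeline. The following is a minimal sketch under stated assumptions: `check_sample_sheet` is a hypothetical helper (not a Spatialsnake command), the file is assumed to have the header row shown above followed by rows separated by spaces or tabs, and paths are assumed to contain no whitespace. Relative paths are resolved against the current directory, so run it from project_root.

```shell
# Hypothetical helper, not part of Spatialsnake: verify that every input_path
# listed in a sample sheet exists as a directory.
check_sample_sheet() {
    sheet="$1"
    status=0
    if [ ! -f "$sheet" ]; then
        echo "sample sheet not found: $sheet"
        return 1
    fi
    {
        # Discard the header row (sample_id, input_path).
        read -r _header || true
        # The "|| [ -n ... ]" part keeps a final row that lacks a newline.
        while read -r sample_id input_path _rest || [ -n "$sample_id" ]; do
            # Skip blank lines.
            [ -n "$sample_id" ] || continue
            if [ ! -d "$input_path" ]; then
                echo "sample $sample_id: directory not found: $input_path"
                status=1
            fi
        done
    } < "$sheet"
    return "$status"
}
```

Running `check_sample_sheet sample.txt` prints one line per missing directory and returns a non-zero status if any row fails the check.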
Note
If you intend to perform multi-sample analysis, please first complete the content on this page, then refer to Spatialsnake for multi-sample integration for configuration and command instructions. Compared with single-sample analysis, multi-sample analysis requires multiple sample data directories and a sample.txt with additional rows. As with single-sample mode, the data path and sample_id for each sample must match the entries in sample.txt.
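When several sample directories sit under data/, the additional rows can be generated rather than typed by hand. This is a sketch only: `make_sample_sheet` is a hypothetical helper, and the tab separator is an assumption about the sample.txt format, so adjust it if your file uses a different delimiter. It uses each directory name as the sample_id.

```shell
# Hypothetical helper, not part of Spatialsnake: print a sample sheet with one
# row per sample directory found directly under the data directory.
make_sample_sheet() {
    data_dir="${1:-data}"
    # Header row; the tab separator is an assumption about the file format.
    printf 'sample_id\tinput_path\n'
    for dir in "$data_dir"/*/; do
        # If the glob matched nothing it stays literal; skip it.
        [ -d "$dir" ] || continue
        dir="${dir%/}"                          # drop the trailing slash
        printf '%s\t%s\n' "${dir##*/}" "$dir"   # directory name, then its path
    done
}
```

For example, `make_sample_sheet data > sample.txt` run from project_root writes one row for each directory under data/; review the result before using it.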
Below is the minimal command to run the data ingestion module. For additional parameters, see the integrate.yaml Reference.
spatialsnake single_analysis sample.txt Merfish --option=integrate
Output structure after ingestion
project_root/
├── data/ (stores your raw data)
│   └── Breast_Cancer/
├── sample.txt (key sample description file)
├── log/
└── results/ (stores analysis outputs)

results/
└── Breast_Cancer/
    └── integrate/
        ├── Breast_Cancer.zarr
        ├── total.png
        ├── total_umi_by_sample.png
        ├── total_genes_by_sample.png
        ├── genes_by_sample.png
        └── scatter.png
Main output: results/<sample>/integrate/<sample>.zarr
Additional output for comparison analysis: results/merge_data/integrate/concatenated_sdata.zarr
Additional QC plots: single-sample ingestion writes five QC figures into the integrate directory. These files are generated in practice even though they are not explicitly declared in the Snakemake output section.
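Since the QC figures are not declared as Snakemake outputs, a manual check is one way to confirm that ingestion finished completely. The sketch below assumes the single-sample layout and the five figure names shown in the example tree above; `check_integrate_outputs` is a hypothetical helper, not part of Spatialsnake, and the figure names may differ in other versions.

```shell
# Hypothetical helper, not part of Spatialsnake: confirm that the ingestion
# outputs listed in the example tree exist for one sample.
check_integrate_outputs() {
    sample="$1"
    out_dir="results/$sample/integrate"
    status=0
    # -e accepts either a file or a directory; a .zarr store is usually
    # a directory on disk.
    if [ ! -e "$out_dir/$sample.zarr" ]; then
        echo "missing store: $out_dir/$sample.zarr"
        status=1
    fi
    # Figure names are copied from the example tree for this demo.
    for png in total.png total_umi_by_sample.png total_genes_by_sample.png \
               genes_by_sample.png scatter.png; do
        if [ ! -f "$out_dir/$png" ]; then
            echo "missing figure: $out_dir/$png"
            status=1
        fi
    done
    return "$status"
}
```

Run it from project_root with the sample_id as the argument, for example `check_integrate_outputs Breast_Cancer`.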
You have now ingested your data into a zarr object. For the subsequent core analysis, please refer to Core Analysis Workflow. We recommend starting with the example dataset to gain hands-on experience with the basic core-analysis workflow. If you prefer to proceed directly with your own data, each step page begins with a concise summary of the essential parameters.
Simply follow the tutorial to update the sample name and platform-specific parameters, then continue with the next step: Preprocessing.
If you want to run multi-sample integration analysis, continue to Spatialsnake for multi-sample integration.