Raw sequencing data were processed and aligned to the Mus musculus genome assembly (mm10) using Cell Ranger software (v3, 10× Genomics). Subsequent quality control and secondary analysis were carried out using Seurat. Cells with high mitochondrial content (>10% of total reads) were removed, and cells with very high RNA or gene content (likely doublets) were also excluded from downstream analysis. Technical sources of variation, such as sequencing depth, proportion of mitochondrial transcripts, and differences in cell cycle state (dividing versus nondividing), were regressed out during data normalization and scaling. For each sample, cells with similar transcriptomic profiles were grouped into clusters by a shared nearest neighbor (SNN) modularity optimization-based clustering algorithm. We assigned cell type identities to clusters of interest based on canonical markers.
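The cell-level filtering described above can be illustrated with a minimal Python sketch. It is not the Seurat code used in the analysis. The `CellQC` record and the function names are hypothetical, and the upper bounds on total counts and detected genes (`max_counts`, `max_genes`) are placeholder values chosen for illustration only, since the text does not report those cutoffs. Only the 10% mitochondrial threshold comes from the text.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Per-cell QC summary (hypothetical structure, for illustration)."""
    barcode: str
    total_counts: int  # total RNA counts in the cell
    n_genes: int       # number of genes detected in the cell
    mito_counts: int   # counts assigned to mitochondrial genes


def passes_qc(cell, max_mito_frac=0.10, max_counts=25000, max_genes=5000):
    """Return True if a cell survives the QC filters.

    max_mito_frac follows the 10% threshold stated in the text.
    max_counts and max_genes are ASSUMED placeholder cutoffs for
    putative doublets; the text does not give the actual values.
    """
    if cell.total_counts == 0:
        return False  # empty barcode: nothing to analyze
    mito_frac = cell.mito_counts / cell.total_counts
    return (
        mito_frac <= max_mito_frac
        and cell.total_counts <= max_counts
        and cell.n_genes <= max_genes
    )


def filter_cells(cells, **thresholds):
    """Keep only the cells that pass every QC filter."""
    return [c for c in cells if passes_qc(c, **thresholds)]


if __name__ == "__main__":
    cells = [
        CellQC("A", total_counts=1000, n_genes=500, mito_counts=50),
        CellQC("B", total_counts=1000, n_genes=500, mito_counts=200),
        CellQC("C", total_counts=60000, n_genes=7000, mito_counts=100),
    ]
    print([c.barcode for c in filter_cells(cells)])
```

In this toy input, cell B is dropped for its mitochondrial fraction and cell C for exceeding the assumed count and gene ceilings. In practice such ceilings are usually chosen per sample after inspecting the count and gene distributions.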

Samples from all 3 captures (1 from BM, 2 from lungs) were integrated to investigate shared cell states across multiple data sets. Differentially expressed genes for identity classes were identified using the Wilcoxon rank sum test (the Seurat FindMarkers default). Markers specific to each identity were then submitted to enrichR (34, 35) for gene ontology analysis.
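For readers unfamiliar with the test, the following is a minimal pure-Python sketch of a two-sided Wilcoxon rank sum (Mann-Whitney) test for one gene, comparing expression values in one group of cells against another. It uses the normal approximation with a tie correction and no continuity correction, so it is not expected to reproduce Seurat's p-values exactly. The function names are hypothetical and the input values in the usage example are made up.

```python
import math


def rank_with_ties(values):
    """Assign 1-based ranks, giving tied values their average rank.

    Returns the ranks (in input order) and the size of each tie group.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def wilcoxon_rank_sum(x, y):
    """Two-sided rank sum test; returns (U statistic for x, p-value)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must contain at least one value")
    n = n1 + n2
    ranks, tie_sizes = rank_with_ties(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in tie_sizes)
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u_x, 1.0  # all values identical: no evidence of a shift
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u_x, p_value


if __name__ == "__main__":
    # Made-up expression values for one gene in two groups of cells.
    in_cluster = [2.1, 3.4, 2.8, 3.9, 3.1]
    other_cells = [0.0, 0.4, 1.2, 0.0, 0.9]
    u, p = wilcoxon_rank_sum(in_cluster, other_cells)
    print(f"U = {u}, p = {p:.4f}")
```

In a marker search this test is run once per gene, so the resulting p-values are normally adjusted for multiple testing before markers are selected.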