Aug 2019


Using RNA Sequencing and Spike-in RNAs to Measure Intracellular Abundance of lncRNAs and mRNAs


Abstract

Long noncoding RNAs (lncRNAs) play essential roles in normal physiology and in disease but their mechanisms of action can be challenging to identify. For mechanistic studies, it is often useful to know a lncRNA’s intracellular abundance, i.e., approximately how many molecules of the lncRNA are present in a typical cell of a cell-type of interest. At least two approaches have been used to approximate lncRNA intracellular abundance: single-molecule sensitivity RNA fluorescence in situ hybridization (smFISH) and single-gene, calibrated reverse-transcription followed by quantitative PCR (RT-qPCR). However, like all experimental approaches, these methods have their limitations. smFISH, when analyzed using diffraction-limited microscopy, can underestimate intracellular abundance, especially for lncRNAs that accumulate in focused subcellular regions. Calibrated RT-qPCR may return inaccurate estimates of abundance because individual PCR amplicons spaced across the length of a transcript can vary in their efficiency of reverse transcription. Here, we describe a sequencing-based approach that is straightforward, orthogonal to smFISH and RT-qPCR, and can be used to approximate the intracellular abundance for most expressed long RNAs (lncRNAs and mRNAs) in a cell type of interest. Firstly, the average weight of total RNA per cell for the cell type of interest is estimated by replicate rounds of RNA purification from a known number of cells. Secondly, an rRNA-depletion RNA-Seq protocol is performed after adding spike-in control RNAs to a known quantity of total cellular RNA. Lastly, by comparing read counts per transcript to a standard curve derived from the spiked-in RNAs, the intracellular abundance for each transcript is estimated. The sequencing-based approach provides a powerful complement to existing methods, particularly in situations where it is desirable to quantify the abundance of multiple lncRNAs and/or mRNAs simultaneously.

Keywords: RNA-Seq, Ribosomal RNA depletion, lncRNA, Xist, ERCC Spike-In RNAs, Transcriptome, RNA FISH, smFISH

Background

Long noncoding RNAs (lncRNAs) play essential roles in biology but their mechanisms of action can be difficult to determine (Kopp and Mendell, 2018; Gil and Ulitsky, 2020). Relative to protein-coding genes, lncRNAs can evolve rapidly and are not constrained by codon usage (Cabili et al., 2011; Kutter et al., 2012; Necsulea et al., 2014; Schuler et al., 2014; Washietl et al., 2014; Hezroni et al., 2015; Chen et al., 2016; Ulitsky, 2016). Accordingly, they often lack easily identifiable domains that might otherwise provide insight into their molecular actions. Thus, for a given lncRNA, initial footholds into its molecular mechanism are often gained by examining its sub-cellular localization and its intracellular abundance (i.e., on average, how many molecules of the lncRNA are present in a single cell of a cell type of interest).

To these ends, single-molecule sensitivity RNA fluorescence in situ hybridization (smFISH) has been a boon, providing a convenient and cost-effective way to simultaneously investigate lncRNA sub-cellular localization and intracellular abundance (Cabili et al., 2015; Tsanov et al., 2016; Raj and Rinn, 2019). Nevertheless, while smFISH offers unparalleled benefits with regard to visualizing lncRNA sub-cellular distribution, it does have limitations with regard to estimating lncRNA intracellular abundance. In the simplest form of smFISH, the number of FISH puncta per cell is used as a proxy for a lncRNA’s intracellular abundance. However, especially when smFISH is performed using diffraction-limited microscopy, separate lncRNA molecules that are located in spatial proximity may not be individually resolved; instead, such lncRNAs may appear as single puncta. Thus, particularly for those lncRNAs that accumulate to high concentrations in specific subcellular regions (Chujo and Hirose, 2017; Ninomiya and Hirose, 2020), smFISH may underestimate intracellular abundance. This potential limitation can be overcome by performing smFISH using super-resolution microscopy and carefully quantifying signal intensity within individual puncta (Cerase et al., 2014; Smeets et al., 2014; Sunwoo et al., 2015). However, this latter approach requires high-end equipment and expertise that may not be readily accessible.

Here, we describe a sequencing-based approach that is orthogonal to smFISH and can provide estimates for the intracellular abundance of lncRNAs as well as mRNAs in cultured cells (Figure 1; Schertzer et al., 2019). The approach relies on RNA-Seq and requires minimal expertise beyond the ability to follow standard protocols in molecular biology and bioinformatics. The approach allows for the simultaneous quantitation of the intracellular abundance of all long RNAs (lncRNAs and mRNAs) that are expressed in a cell type of interest.

The sequencing-based approach is also likely to yield estimates of intracellular abundance that are more robust than those produced by single-gene approaches that rely on calibrated reverse transcription followed by gene-specific quantitative PCR (RT-qPCR [Schwaber et al., 2019]). A primary reason for this is that different qPCR amplicons spaced across the length of a transcript may vary dramatically in their efficiency of reverse transcription. In contrast, in RNA-Seq, local, intra-transcript variations in reverse transcription efficiency are inherently averaged, owing to the chemical fragmentation of RNA that occurs just prior to reverse transcription of the RNA into cDNA (Hrdlickova et al., 2017). Moreover, in the sequencing-based approach, the use of ERCC Spike-In RNAs, which harbor diverse GC-contents, lengths, and abundances, obviates the need to explicitly estimate reverse transcription efficiency and instead allows transcript abundance to be estimated by comparison to a standard curve (Jiang et al., 2011).

Potential sources of errors in the sequencing-based approach include (1) errors associated with read alignment, which are most relevant if the lncRNA of interest contains sequence that is highly repetitive relative to other positions in the genome, (2) errors associated with transcript isoform uncertainty, which may arise if the predominant isoform of the lncRNA of interest is misannotated in the cell type of interest, and (3) errors associated with inaccurate or variable estimates of the total weight of RNA in the cell type of interest. Nevertheless, recently, the sequencing-based approach and smFISH performed using super-resolution microscopy have been shown to arrive at similar estimates of abundance for the lncRNA Xist (Smeets et al., 2014; Sunwoo et al., 2015; Schertzer et al., 2019), lending confidence that both approaches provide reasonable estimates of RNA abundance. The sequencing-based approach is additionally useful because in a single experiment, it can be used to estimate the abundance of all expressed mRNAs and lncRNAs simultaneously.


Figure 1. Overview of sequencing-based approach to quantify lncRNA and mRNA intracellular abundance

Materials and Reagents

  1. For Cell Counting
    1. Disposable Borosilicate Glass Pasteur Pipets (Fisher Scientific, catalog number: 13-678-20D ), sterilize before use
    2. 5 ml/10 ml sterile serological pipets (e.g., Genesee Scientific, catalog numbers: 12-102 , 12-104 )
    3. 15 ml/50 ml conical bottom centrifuge tubes (e.g., Corning, catalog numbers: 05-538-59A , 05-526B )
    4. Glass coverslip (e.g., Fisher Scientific, Hausser Hemacytometer Cover Glass, catalog number: 02-671-53 )
    5. Kimwipes (Fisher Scientific, catalog number: 06-666 )
    6. 6 cm tissue culture dishes (e.g., Genesee Scientific, catalog number: 25-260 )
    7. Mammalian cell type of interest (e.g., mouse trophoblast stem cells [Calabrese et al., 2012])
    8. Cell Culture Media supplemented with 10% serum (e.g., DMEM supplemented with 10% FBS; DMEM, Thermo Fisher Scientific, catalog number: 11995065 ; FBS, VWR, catalog number: 97068-085 )
    9. Sterile 1x PBS (e.g., Corning, catalog number: 21-040-CM )
    10. 0.25% Trypsin-EDTA (e.g., GIBCO, catalog number: 25200-072 )
    11. Trypan Blue Solution, 0.4% (Thermo Fisher Scientific, catalog number: 15-250-06 )
    12. 70% ethanol, stored at room temperature

  2. For RNA Purification
    1. RNase Zap (Thermo Fisher Scientific, catalog number: AM9780 )
    2. P20/200/1000 Barrier pipette tips (e.g., Olympus brand tips, Genesee Scientific, catalog numbers: 23-404 , 24-412 , 24-430 )
    3. 1.7 ml microcentrifuge tubes (e.g., Genesee Scientific, catalog number: 22-282 )
    4. TRIzol Reagent (Thermo Fisher Scientific, catalog number: 15596018 )
    5. Chloroform (Fisher Scientific, catalog number: BP1145-1 )
    6. Isopropanol (Fisher Scientific, catalog number: BP2618-1 )
    7. Linear Acrylamide (Thermo Fisher Scientific, catalog number: AM9520 )
    8. RNase-free water (e.g., we use deionized 18.2 MΩ water produced from a Synergy Water Purification System, Millipore, catalog number: SYNS0HFWW )
    9. 80% ethanol made with RNase-free water, stored at -20 °C
    10. LE Agarose (e.g., Genesee Scientific, catalog number: 20-102QD )
    11. Ethidium Bromide 1% Solution (Fisher Scientific, catalog number: BP1302-10 )
    12. 50x TAE Buffer (e.g., Thermo Fisher Scientific, catalog number: B49 )
    13. Agarose gel loading buffer and dye (e.g., NEB, catalog number: B7024S )
    14. 1 Kb Plus DNA Ladder (Thermo Fisher Scientific, catalog number: 10787026 )

  3. For RNA-Seq
    1. KAPA RNA HyperPrep Kits with RiboErase (Roche/KAPA Biosystems, catalog number: KK8560 )
    2. SeqCap Adapter Kit (see Note 1; Roche, catalog number: 714153000 )
    3. ERCC RNA Spike-In Mix 1 (Thermo Fisher Scientific, catalog number: 4456740 )
    4. Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, catalog number: Q32854 )
    5. ERCC annotation files (ERCC92.fa, ERCC92.gtf, and ERCC_Controls_Analysis.txt found in the ERCC RNA Spike-In product page, Thermo Fisher Scientific, catalog number: 4456740 )
    6. Transcriptome annotation gtf file (e.g., downloaded from Illumina’s iGenomes site: https://support.illumina.com/sequencing/sequencing_software/igenome.html)
    7. Appropriate genome sequence (e.g., downloaded from Illumina’s iGenomes site: https://support.illumina.com/sequencing/sequencing_software/igenome.html)

Equipment

  1. For Cell Counting
    1. Tissue Culture Incubator (e.g., Forma Series II 3110 Water-Jacketed CO2 Incubator, Thermo Fisher Scientific, catalog number: 3110)
    2. Pipet Aid (e.g., Drummond, catalog number: DP-101)
    3. P2/P20/200/1000 micropipettes (e.g., Research Plus 4-pack, Eppendorf, catalog number: EPPR4330)  
    4. Centrifuge for 15 ml/50 ml conical tubes (e.g., Eppendorf, model: 5810, catalog number: 022628157)
    5. Phase Hemocytometer (e.g., Fisher Scientific, Hausser Bright-Line, catalog number: 02-671-6)
    6. Hand Tally Counter (e.g., VWR, catalog number: 23609-102)
    7. Inverted microscope with 10x objective (e.g., Zeiss Primovert, catalog number: 491206-0004-000)

  2. For RNA Purification
    1. Safety goggles (e.g., Thermo Fisher Scientific, catalog number: 19-053-950)
    2. Microcentrifuge, stored at 4 °C (e.g., 5424 Microcentrifuge, Eppendorf, catalog number: 5424)
    3. Mini Centrifuge with 1.7 ml tube rotor and PCR strip tube rotors (e.g., Genesee Scientific MyFuge Mini, catalog number: 31-500)
    4. Nanodrop spectrophotometer (e.g., Thermo Fisher Scientific, catalog number: ND-LITE)
    5. Mini Gel Electrophoresis System (e.g., Thermo Fisher Scientific, catalog number: B1A)
    6. Agarose Gel Imaging System (e.g., Bio-Rad, Chemidoc MP, catalog number: 12003154)

  3. For RNA-Seq
    1. Thermal Cycler (e.g., Bio-Rad, model: C1000 Touch, catalog number: 1851148)
    2. Magnetic bead stand (e.g., DynaMag-2 Magnet, Thermo Fisher Scientific, catalog number: 12321D)
    3. Qubit Fluorometer (Thermo Fisher Scientific, catalog number: Q33238)
    4. Mini Gel Electrophoresis System (e.g., Thermo Fisher Scientific, catalog number: B1A)
    5. Agarose Gel Imaging System (e.g., Bio-Rad, Chemidoc MP, catalog number: 12003154)
    6. Access to an Illumina sequencing instrument (e.g., Illumina, model: NextSeq500, catalog number: SY-415-1001)

Software

  1. (Optional) SRA toolkit (Leinonen et al., 2011); https://ncbi.github.io/sra-tools/
  2. STAR aligner (Dobin et al., 2013); https://github.com/alexdobin/STAR 
  3. featureCounts from Subread package (Liao et al., 2014); http://subread.sourceforge.net
  4. Samtools (Li et al., 2009); https://samtools.github.io
  5. Microsoft Excel or equivalent, or RStudio (RStudio_Team, 2015); https://rstudio.com

Procedure

  1. Calculate the average amount of RNA per cell (see Note 2)
    1. Prior to initiating this portion of the protocol, ensure that you have bench space, pipettes, and pipette tips that are clean and suitable for working with RNA (see Note 3).
    2. Culture cells in a 6 cm dish until they are 60% to 80% confluent.
    3. Preheat cell culture media and 0.25% trypsin-EDTA to 37 °C; once warmed, clean the bottles containing media, trypsin-EDTA, and 1x PBS with 70% ethanol and place them in a biological-safety cabinet.
    4. Create a 0.125% trypsin solution by diluting the 0.25% trypsin-EDTA solution with an equal volume of 1x PBS.
    5. Remove cultured cells from the incubator and place them in a biological-safety cabinet.
    6. Aspirate the media from the cells using a disposable glass Pasteur pipet (or equivalent).
    7. Wash cells on the plate by gently adding 4 ml of 1x PBS and then aspirating it.
    8. Add 2 ml of 0.125% Trypsin solution to the cells and let the cell plate stand in the biological-safety cabinet at room temperature for 3 min.
      Note: Dissociation protocols may differ for your cell type.
    9. Using a clean 5 ml serological pipet attached to a pipet aid, pipet the trypsinized cell solution up and down against the plate to obtain a single-cell suspension.
    10. Transfer the trypsinized cell solution to a 15 ml conical tube that contains 8 ml of culture media with 10% serum.
    11. Invert the 15 ml conical tube 10-15 times to obtain a homogenous solution.
    12. Using a P20 micropipette, remove 12 μl of cell suspension and carefully pipette it into the bottom of a 1.7 ml microcentrifuge tube.
    13. Repeat Steps A11 and A12 one more time to obtain two replicates for cell counting.
    14. Add 12 μl of Trypan Blue solution to each of the 12 μl cell suspensions and mix by pipetting, taking care to keep the Trypan Blue/cell solution at the bottom of the tube.
    15. Separately, spin down the remainder of the trypsinized cell solution (~10 ml; in the 15 ml conical) in a centrifuge for 5 min at 1,000 rpm (~200 x g).
    16. During the spin, clean the hemocytometer and glass coverslip with a Kimwipe sprayed with 70% ethanol, to remove any particulates.
    17. Add 12 μl of the Trypan Blue cell suspension to the hemocytometer, ensuring that the solution distributes evenly underneath the coverslip.
    18. Under a 10x objective, count the non-blue cells within each of the four hemocytometer quadrants, keeping track of the counts in each quadrant with a hand tally counter.
    19. Calculate the number of cells per ml using the equation below, then average the cell-count-per-ml between replicates:
      1. Replicate 1, # of cells per ml = [(sum of counts in all 4 quadrants)/4] x 2 x 10^4
      2. Replicate 2, # of cells per ml = [(sum of counts in all 4 quadrants)/4] x 2 x 10^4
      3. Average # of cells per ml = (Replicate 1 counts + Replicate 2 counts)/2
    20. After the spin from Step A15 has completed, remove the trypsin-containing media, taking care not to disturb the cell pellet, add 10 ml of 1x PBS, mix by pipetting, and spin down the PBS/cell solution in a centrifuge for 5 min at 1,250 rpm (~300 x g).
    21. Remove the PBS, replace it with another 10 ml of 1x PBS, mix by pipetting, and spin down the cell solution in a centrifuge for 5 min at 1,250 rpm (~300 x g).
    22. Remove all traces of PBS and add 1 ml of TRIzol to the cell pellet (see Note 4).
    23. Using a P1000, pipette up and down ~15x to lyse the cells and maximize the efficiency of RNA extraction.
    24. Place the cell/TRIzol mixture into a -80 °C freezer.
    25. Repeat Steps A1-A24 at least three times, ideally on separate days, to obtain biological replicates.
    26. Remove the TRIzol suspensions from the freezer and let them thaw at room temperature.
    27. Once thawed, let the TRIzol suspensions sit for 5 min at room temperature to help ribonucleoprotein complexes dissociate (see Note 5).
    28. Add 0.2 ml of chloroform to the TRIzol suspension, vortex vigorously for ~20 s, and let the sample stand at room temperature for another 2 min.
    29. Centrifuge the sample for 15 min at 12,000 x g at 4 °C.
    30. Transfer the aqueous phase containing the RNA to a new 1.7 ml tube (~0.5 ml).
    31. Add 10 μl of linear acrylamide to the extracted aqueous phase and vortex vigorously.
    32. Add 0.5 ml of isopropanol to the aqueous phase (amount roughly equal to the volume of the aqueous phase), and vortex vigorously.
    33. Incubate for ≥1 h at -20 °C (this step differs from the manufacturer’s instructions for purification of RNA from TRIzol).
    34. Centrifuge the sample at top speed for 30 min in a microcentrifuge at 4 °C (> 12,000 x g; see Note 6).
    35. Using a P1000, remove the water/isopropanol solution, being mindful not to remove the RNA pellet, which should be located below the hinge of the microcentrifuge tube.
    36. To the precipitated RNA pellet, gently add 1 ml of an ice-cold mixture of 80% ethanol and 20% RNase-free water.
    37. Using a P1000, remove the 80% ethanol, being mindful not to remove the RNA pellet.
    38. Pulse-spin the 1.7 ml tube in a mini-centrifuge to bring the residual ethanol from the sides of the tube down to the bottom.
    39. Using a P200 pipette and tip, remove the remaining 80% ethanol.
    40. Repeat Steps A38-A39 until no 80% ethanol remains.
    41. Re-suspend the pellet in 30 μl of RNase-free water.
    42. Let the RNA-containing solution stand for 1 h at room temperature with intermittent mixing (every 15 min) by flicking or vortexing and then pulse-spinning the tube, or by pipetting the solution up and down (see Note 7).
    43. After the RNA has dissolved, quantify the concentration of RNA using a Nanodrop spectrophotometer.
      1. An ideal ratio of absorbance at 260 nm and 280 nm for RNA is between ~1.8 and ~2.
      2. If the RNA is contaminated with residual ethanol, phenol, or guanidine, or if the RNA is not completely dissolved, the 260/280 ratio will be lower, usually < 1.6.
      3. See Thermo Fisher's technical notes on NanoDrop Spectrophotometers for more information.
    44. Next, determine whether the purified RNA is intact. To do this, set up a gel electrophoresis apparatus and prepare a 1% agarose/0.0001% ethidium bromide gel with 1x TAE buffer. Submerge the agarose gel in 1x TAE buffer.
    45. In a 1.7 ml tube, mix ~250-500 ng of RNA with an appropriate amount of glycerol-based agarose gel loading buffer.
    46. In separate lanes of the agarose gel, load the RNA/gel loading mixture as well as 0.5-1 μg of DNA ladder (the latter sample provides a size reference).
    47. Run the samples in 1x TAE ~8 cm through the agarose gel, at a voltage of 5 V/cm of distance between electrodes.
    48. Take a picture of the gel on an appropriate gel imaging system. Intact RNA purified from a typical mammalian cell should yield two distinct bands running at apparent sizes of ~1,400 and ~750 nucleotides relative to the DNA ladder, which correspond to the 28S and 18S rRNA species, respectively (Figure 2; see Note 8).


      Figure 2. RNA run on a 1% agarose gel pre-stained with ethidium bromide. Relative to the DNA size ladder, the 28S and 18S rRNA species migrate at approximate sizes of ~1,400 and ~750 nucleotides, respectively.

    49. If the RNA is intact, proceed to calculate the weight of RNA per cell:
      1. RNA-per-cell = [concentration of RNA in g/L] x [volume in L of water used to resuspend RNA (e.g., 30 x 10^-6 L)]/[the number of cells lysed in TRIzol]
      2. Calculate the average of the RNA-per-cell numbers obtained from biological replicate RNA preparations (see Note 9).
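
The arithmetic in Steps A19 and A49 can be sketched in Python as follows. This is a minimal illustration rather than part of the protocol itself: the function names, the quadrant counts, and the NanoDrop reading are hypothetical, and the number of cells lysed is assumed to be the cell density multiplied by the ~10 ml of suspension pelleted in Step A15.

```python
def cells_per_ml(quadrant_counts, dilution_factor=2):
    """Step A19: mean count per quadrant x Trypan Blue dilution factor x 10^4."""
    if len(quadrant_counts) != 4:
        raise ValueError("expected counts from four hemocytometer quadrants")
    return sum(quadrant_counts) / 4 * dilution_factor * 1e4


def rna_per_cell_g(conc_ng_per_ul, resuspension_ul, cells_lysed):
    """Step A49: total RNA yield in grams divided by the number of cells lysed."""
    total_rna_g = conc_ng_per_ul * resuspension_ul * 1e-9  # ng -> g
    return total_rna_g / cells_lysed


# Hypothetical example values (not taken from the protocol)
rep1 = cells_per_ml([50, 55, 45, 50])
rep2 = cells_per_ml([60, 50, 55, 55])
mean_cells_per_ml = (rep1 + rep2) / 2

# Assumption: ~10 ml of cell suspension was pelleted and lysed in TRIzol
cells_lysed = mean_cells_per_ml * 10

# Assumption: NanoDrop reads 700 ng/ul for RNA resuspended in 30 ul of water
per_cell = rna_per_cell_g(700, 30, cells_lysed)
print(f"{per_cell * 1e12:.1f} pg RNA per cell")
```

The concentration is entered in ng/μl because that is the unit a NanoDrop typically reports; multiplying by 10^-9 gives the same result as the g/L x L form of the equation in Step A49.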

  2. Prepare cDNA libraries for RNA-Seq (see Notes 10, 11, 12, and 13)
    1. Ensure that cDNA libraries are prepared from two or more biological replicate RNA preparations (see Note 14).
    2. For each sample to be sequenced, aliquot 1 μg of total cellular RNA in a total volume of 8 μl of RNase-free water.
    3. To each sample add 2 μl of a fresh 1:100 dilution of ERCC RNA Spike-In Mix #1 (see Note 15).
    4. Starting with the mixture of ERCC Spike-in RNA and 1 μg of total cellular RNA, follow the manufacturer’s instructions for cDNA library preparation. We follow the instructions essentially as they are written in the user technical datasheet from KAPA (Note 16). The following are inputs required from the user:
      1. From Section 6 of the technical datasheet (“RNA Elution, Fragmentation and Priming”), we select the fragmentation conditions of 6 min at 94 °C, which will generate 200-300 nucleotide-long RNA fragments.
      2. From Section 12 of the technical datasheet (“Library Amplification”), we perform PCR amplification of our final cDNA library using only half of our purified cDNA library (i.e., 10 μl of cDNA library in a 50 μl PCR reaction), rather than all 20 μl of library in the 50 μl reaction (see Note 17).
      3. From cDNA libraries prepared using 1 μg of total RNA, using half of the purified cDNA library, we typically perform 11 cycles of PCR for the final amplification step.
    5. Using a Qubit fluorometer, quantify the concentration of DNA in the PCR-amplified, purified cDNA library (see Note 18).
    6. Prepare a 1% agarose/0.0001% ethidium bromide gel with 1x TAE buffer. Submerge the agarose gel in 1x TAE buffer.
    7. Run 2 μl (1/10th) of the amplified cDNA library ~6 cm into the agarose gel, alongside 250 ng of 1 Kb Plus DNA Ladder.
    8. Image the agarose gel and estimate the average size in base pairs of each prepared cDNA library (see Note 19).
    9. Calculate the molarity of the purified cDNA library:
      1. The average molecular weight of a DNA base-pair is 650 Daltons, or 650 g/mole.
      2. Using the average length determined in Step B8 above, calculate the molar weight of the cDNA library:
        [cDNA library g/mole] = [average_length] x 650 g/mole
      3. Now, using the DNA concentration determined in Step B5 above, calculate the molarity of the cDNA library (see Note 20):
        [cDNA library moles/Liter] = ([cDNA library concentration in ng/μl]) x ([1 x 10^-9 g]/[1 x 10^-6 L]) x (1/[cDNA library g/mole])
    10. Pool together the cDNA libraries to be sequenced in a way that will ensure an equimolar amount of each library is present in the pool and that each library will be sequenced to a depth of > 20 million reads (see Notes 21 and 22).
    11. Sequence the pooled cDNA libraries on an Illumina platform (see Note 23).
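
The molarity calculation in Step B9 and the equimolar pooling in Step B10 can be sketched as below. This is a hedged illustration only: the function names, the Qubit readings, the 400 bp mean library size, and the 100 fmol target per library are hypothetical, and the actual pooling target depends on the sequencing instrument and facility.

```python
DALTONS_PER_BP = 650  # average g/mole per DNA base pair (Step B9)


def library_molarity_nM(conc_ng_per_ul, mean_length_bp):
    """Convert a Qubit reading (ng/ul) and mean library size (bp) to nM."""
    g_per_liter = conc_ng_per_ul * 1e-9 / 1e-6   # ng/ul -> g/L
    g_per_mole = mean_length_bp * DALTONS_PER_BP
    return g_per_liter / g_per_mole * 1e9        # mol/L -> nM


def equimolar_volumes_ul(molarities_nM, fmol_per_library):
    """Volume of each library (ul) that contributes the same number of fmol."""
    # 1 nM equals 1 fmol/ul, so volume (ul) = fmol / nM
    return [fmol_per_library / m for m in molarities_nM]


# Hypothetical libraries: (Qubit ng/ul, mean size in bp)
libraries = [(6.5, 400), (13.0, 400)]
molarities = [library_molarity_nM(c, bp) for c, bp in libraries]
volumes = equimolar_volumes_ul(molarities, fmol_per_library=100)
for m, v in zip(molarities, volumes):
    print(f"{m:.1f} nM -> pool {v:.2f} ul")
```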

Data analysis

Note: See Note 24.

  1. Align data and obtain read counts (see Note 25)
    1. Within a UNIX command terminal, create a master directory to perform the sequence alignment, filtering, and read-counting (e.g., ./ercc_mpc_analysis).

      mkdir ./ercc_mpc_analysis

    2. Obtain an RNA-Seq fastq file from the Illumina sequencing run (e.g. rnaseq_file.fastq), and move this file to the master directory (see Note 26).

      mv rnaseq_file.fastq ./ercc_mpc_analysis

    3. Download the appropriate genome fasta file (e.g., genome.fa) and gene-gtf file (e.g., genes.gtf) for your cell type and place them in the master directory (./ercc_mpc_analysis) (see Note 27).
    4. Download the ERCC Spike-In RNA annotation files from the ERCC RNA Spike-In product page on Thermo Fisher’s website and place them in the master directory -- ./ercc_mpc_analysis (see Note 28). File names are:
      1. ERCC Controls Analysis: ERCC RNA Spike-In Control Mixes (e.g., ERCC92_conc.txt)
      2. ERCC92.fa & ERCC92.gtf sequence and annotation files (.zip)
    5. Within the master directory (./ercc_mpc_analysis), create a new directory to store the genome index that will be built by STAR

      mkdir ./GenomeDir/

    6. Build a STAR genome-index that includes both the genome and ERCC reference sequences; the index will be created and stored in ./GenomeDir/.

      STAR --runThreadN 8 \
        --runMode genomeGenerate \
        --genomeDir ./GenomeDir \
        --genomeFastaFiles genome.fa ERCC92.fa \
        --sjdbGTFfile genes.gtf ERCC92.gtf

    7. Use STAR to align an RNA-Seq fastq file (e.g., rnaseq_file.fastq) to the genome index. The alignments will be saved to a file with the suffix “Aligned.out.sam” (e.g., rnaseq_file_out_Aligned.out.sam).

      STAR --runThreadN 12 \
        --genomeDir ./GenomeDir \
        --readFilesIn rnaseq_file.fastq \
        --outFileNamePrefix rnaseq_file_out_

    8. Use samtools to filter the “Aligned.out.sam” file for mapping quality of > 30 (this step selects for uniquely mapped reads). In this example, the filtered file is named “rnaseq_file_out_q30.sam”.

      samtools view -Shq 30 rnaseq_file_out_Aligned.out.sam > rnaseq_file_out_q30.sam

    9. Use featureCounts in the Subread package to count the number of reads that align to each ERCC Spike-In transcript (see Note 29). In this example, the file containing ERCC counts is named “ercc_featureCounts_output.txt”.

      featureCounts \
        -s 2 \
        -a ERCC92.gtf \
        -o ercc_featureCounts_output.txt \
        rnaseq_file_out_q30.sam

    10. Use featureCounts in the Subread package to count the number of reads that align to each genic transcript in the genes.gtf file. In this example, the file containing gene counts is named “mm9_genes_featureCounts_output.txt”.

      featureCounts \
        -s 2 \
        -a genes.gtf \
        -o mm9_genes_featureCounts_output.txt \
        rnaseq_file_out_q30.sam
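
For readers who prefer scripting to spreadsheets, the featureCounts tables produced in Steps A9 and A10 can be loaded with a few lines of Python. The sketch below assumes the standard featureCounts layout (a leading comment line beginning with "#", then a tab-delimited header of Geneid, Chr, Start, End, Strand, Length, and one count column). The function name and the two-row example table are hypothetical.

```python
import csv
import io


def read_feature_counts(handle):
    """Return {feature_id: (length_nt, read_count)} from featureCounts output."""
    # Skip the leading comment line that records the program and command
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.reader(rows, delimiter="\t")
    header = next(reader)  # Geneid, Chr, Start, End, Strand, Length, <sample>
    length_col = header.index("Length")
    counts = {}
    for fields in reader:
        # The read count for a single-sample run is the last column
        counts[fields[0]] = (int(fields[length_col]), int(fields[-1]))
    return counts


# Toy table for illustration; the read counts are invented
example = io.StringIO(
    "# Program:featureCounts\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\trnaseq_file_out_q30.sam\n"
    "ERCC-00002\tERCC-00002\t1\t1061\t+\t1061\t5000\n"
    "ERCC-00003\tERCC-00003\t1\t1023\t+\t1023\t400\n"
)
ercc_counts = read_feature_counts(example)
print(ercc_counts)
```

With a real file, the same function could be called as `read_feature_counts(open("ercc_featureCounts_output.txt"))`.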

  2. Create a standard curve that relates ERCC Spike-In RNA-Seq read counts to the absolute amount of each ERCC transcript added to the RNA just prior to preparing the cDNA library for RNA-Seq. We recommend using Excel or RStudio for these calculations. Templates and examples can be found here. Figure 3 below shows a standard curve derived from a single RNA-Seq replicate (Schertzer et al., 2019).


    Figure 3. Representative standard curve relating RNA-Seq read counts (y-axis) to molecular abundance of the ERCC Spike-In RNAs (x-axis)

    1. Copy and paste the contents of the ERCC92_conc.txt file (downloaded in step A4) into a new Excel spreadsheet (call this the “ERCC Mix In” spreadsheet; see Note 30). A picture of the ERCC92_conc.txt file is below (Figure 4).


      Figure 4. Screenshot of the ERCC92_conc.txt file

    2. Delete the final three columns of the table in the Excel spreadsheet ("concentration in Mix 2 (attomoles/ul)", "expected fold-change ratio", "log2(Mix 1/Mix 2)").
    3. Create a new column in the table–column E, “Attomoles added”–that uses the values in column D "concentration in Mix 1 (attomoles/ul)" to calculate the number of attomoles of ERCC Spike-In RNAs added to the total RNA prior to RNA-Seq library preparation (see Note 31; Figure 5).


      Figure 5. Screenshot of an example calculation of the number of attomoles of ERCC Spike-Ins added to the RNA prior to library preparation

    4. Create a new column–column F, “Moles”–that divides the number in column E “Attomoles added” by 1E18.
    5. Create a new column–column G, “Molecules”–that multiplies the values in column F “Moles” by 6.022E23 (molecules per mole; Avogadro’s number).
    6. Create a new column–column H, “log2(molecules)” – that calculates the log-base-2 of the values in column G “Molecules”.
    7. Sort the table such that the data in column B “ERCC_ID” appear in ascending order; this will be important later (Figure 6).


      Figure 6. Screenshot of sorted ERCC calculation table

    8. In a separate Excel spreadsheet, copy and paste the contents of the ercc_featureCounts_output.txt file generated in step A9–call this the “fCounts data” spreadsheet (see Note 32).
    9. Within this new spreadsheet, calculate the number of aligned reads per kilobase per million aligned reads (RPKM) for each ERCC transcript.
      1. First, in your dataset of interest, find the total number of reads that aligned to the genome with a mapping quality of > 30. This can be done from the UNIX command line, using samtools view:

        samtools view -c rnaseq_file_out_q30.sam > rnaseq_file_counts.txt

      2. Next, create a new column in the Excel spreadsheet–column H, “RPM”–in which the read counts in column G are divided by aligned read count from “rnaseq_file_counts.txt” then multiplied by 1 million. This gives reads per million (RPM).
      3. Then, create a new column in the Excel spreadsheet–column I, “RPKM”–in which the RPM value in column H is divided by the value in column F “length” (the length in nucleotides of each ERCC transcript), then multiplied by 1,000. This converts RPM into RPKM, or read counts per kilobase of transcript per million aligned reads.
      4. Finally, create a new column in the Excel spreadsheet–column J, “log2(RPKM)”–that calculates the log-base-2 of the values in column I “RPKM” (Figure 7).


      Figure 7. Screenshot of ERCC RPKM calculation

    10. Ensure that the Excel spreadsheets created in step B1 and in step B8 are sorted by ERCC ID in ascending order (see Note 33).
    11. Paste the log2(RPKM) values from the “fCounts data” spreadsheet (created in step B8) into a new column in the “ERCC Mix In” spreadsheet (created in step B1).
    12. Remove ERCC transcripts that had zero aligned reads.
    13. Generate a scatter plot where log2(molecules) is on the x-axis and log2(RPKM) is on the y-axis. See Figure 3 and Figure 8 below.


      Figure 8. Screenshot of scatter plot generation in Excel

    14. Fit a straight line to the data
      1. In Excel, select the points on the graph, right click, and ‘Add Trendline’. In the window, select ‘Linear’ and ‘Display Equation on chart’ (Figure 9).


        Figure 9. Screenshot of trendline-adding in Excel

      2. Expect the R^2 value to be greater than 0.90 (typically, it is above 0.95).
      3. Use the y = mx + b equation in the next section.
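
The spreadsheet steps in this section (attomoles to molecules, read counts to RPKM, and the linear fit of log2(RPKM) against log2(molecules)) can also be written as a short Python sketch. It is an illustration under stated assumptions, not the authors' template. The function names are hypothetical and the spike-in concentrations, lengths, read counts, and total read number are invented. The attomoles added are inferred from Step B3 of the Procedure (2 μl of a 1:100 dilution of Mix 1).

```python
import math

AVOGADRO = 6.022e23  # molecules per mole


def ercc_molecules(conc_attomoles_per_ul, dilution=100, volume_ul=2):
    """Molecules of one ERCC transcript added to the RNA sample."""
    attomoles_added = conc_attomoles_per_ul / dilution * volume_ul
    return attomoles_added / 1e18 * AVOGADRO


def rpkm(read_count, length_nt, total_aligned_reads):
    """Reads per kilobase of transcript per million aligned reads."""
    rpm = read_count / total_aligned_reads * 1e6
    return rpm / length_nt * 1000


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns slope m, intercept b, and R^2."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b, sxy ** 2 / (sxx * syy)


# Invented spike-ins: (Mix 1 attomoles/ul, length in nt, aligned read count)
total_reads = 20_000_000
spikes = [(15000.0, 1061, 90000), (937.5, 1023, 5200),
          (58.6, 523, 170), (0.9, 1130, 0)]
kept = [s for s in spikes if s[2] > 0]  # drop spike-ins with zero aligned reads
xs = [math.log2(ercc_molecules(c)) for c, _, _ in kept]
ys = [math.log2(rpkm(n, length, total_reads)) for _, length, n in kept]
m, b, r2 = fit_line(xs, ys)
print(f"y = {m:.3f}x + {b:.3f}, R^2 = {r2:.3f}")
```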

  3. Calculate molecules per cell for the lncRNA and mRNA genes of interest.
    1. Copy the contents of the mm9_genes_featureCounts_output.txt file (file generated in step A10) and paste them into a new Excel spreadsheet – call this the “MPC” spreadsheet (see Note 34).
    2. Using the same procedure outlined in step B9 above, convert the read counts for each gene (column G of the MPC spreadsheet) into RPM as a new column H, then into RPKM as a new column I, and then into log2(RPKM) as a new column J.
    3. For each gene of interest, calculate log2(molecules) using the y = mx + b equation from step B14.
      1. x = log2(molecules)
      2. y = log2(RPKM) values calculated in column J
      3. b = y-intercept from equation in step B14
      4. m = slope from equation in step B14
      5. Create a new column in the MPC Excel spreadsheet–column K “log2(molecules)”–which performs the following calculation:
        x= (y-b)/m
    4. Create a new column in the MPC Excel spreadsheet–column L “molecules”–in which the value in column K is converted to molecules using the exponential 2^x. In Excel notation, this is performed by setting the equation in column L to “=2^column_K” (see Note 35).
    5. Create a final column in the MPC Excel spreadsheet–column M “MPC”–in which the value in column L is converted to molecules-per-cell:
      1. Divide 1 μg, the amount of total RNA used to prepare the RNA-seq library, by the weight of RNA-per-cell calculated in Step A49 of the “Procedure” section of this protocol. This value represents the approximate total number of cells used to prepare the RNA-seq library (see Note 36).
      2. For each transcript of interest, divide the number of molecules in column L by the total number of cells used for RNA-Seq to estimate molecules of transcript per cell.
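
For readers who prefer a script to a spreadsheet, the calculation in steps B13 to C5 can be sketched in Python as below. This is only a minimal illustration, not the authors' template: the function names and the example ERCC values are hypothetical, `rpkm` assumes the standard definition (reads per kilobase of transcript per million mapped reads), and the ordinary least-squares fit is assumed to match Excel's linear trendline.

```python
import math


def rpkm(read_count, total_mapped_reads, length_bp):
    """Reads per kilobase of transcript per million mapped reads (assumed standard definition)."""
    rpm = read_count / total_mapped_reads * 1e6
    return rpm / (length_bp / 1000.0)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + b. Returns (m, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b, (sxy * sxy) / (sxx * syy)


def molecules_per_cell(gene_rpkm, m, b, rna_input_pg=1e6, rna_per_cell_pg=30.0):
    """Invert the standard curve (x = (y - b)/m), then divide by the number of cells."""
    log2_molecules = (math.log2(gene_rpkm) - b) / m
    molecules = 2.0 ** log2_molecules
    cells = rna_input_pg / rna_per_cell_pg  # 1 ug input = 1e6 pg
    return molecules / cells


# Hypothetical ERCC points as (molecules added, RPKM); zero-read spike-ins
# must be removed first because log2(0) is undefined.
ercc = [(2.0 ** 10, 2.0 ** -1), (2.0 ** 15, 2.0 ** 3.5),
        (2.0 ** 20, 2.0 ** 8), (2.0 ** 25, 2.0 ** 12.5)]
xs = [math.log2(molecules) for molecules, _ in ercc]
ys = [math.log2(value) for _, value in ercc]
m, b, r2 = fit_line(xs, ys)
print(m, b, r2)
print(molecules_per_cell(256.0, m, b))
```

The defaults (1 μg of input RNA and 30 pg of RNA per cell) follow the worked example in Notes 35 and 36; substitute the RNA-per-cell value measured for your own cell type.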

Notes

  1. Roche recently purchased KAPA Biosystems and discontinued their small reaction-number sequence adapter kits. The SeqCap Adapter Kit is their replacement product, but note that this kit only comes in a 96-reaction format. For less than the list-price of the SeqCap kit, users can purchase their own Illumina-compatible adapters in bulk from a commercial oligonucleotide provider. These adapters then need to be resuspended and annealed by the user. However, the advantage of purchasing adapters in bulk is that the cost per reaction is dramatically reduced. Thus, users that plan to perform many RNA-seq assays (or, for that matter, any other *–seq assay) will find that purchasing and annealing their own adapters is far more cost effective than purchasing a pre-aliquoted set of adapters. For those interested in purchasing and annealing their own adapters, we have provided instructions here. Users that plan to carry out only a small number of RNA-seq assays may find it more cost-effective to purchase their RNA-seq kits and adapters from a manufacturer that sells reagents in small-sized packages, such as NEB.
  2. This portion of the protocol describes how to calculate the average amount of RNA per cell in a cell type of interest. The protocol is designed for cultured adherent cells but could easily be adapted to cultured suspension cells. With additional optimization, it could also be adapted to a tissue of interest. In this latter case, the user would need a method to approximate the total number of cells per mass of tissue of interest (for example, how many cells are present in one milligram of tissue?). With an accurate estimate of cell-number-per-mass-of-tissue, the user could then perform replicate rounds of RNA purification from a known mass of tissue. The yield in weight of RNA would then be divided by the number of cells that were used to obtain the RNA to derive an estimate of the amount of RNA per average cell in the tissue of interest.
  3. To clean a bench, spray the area with a light coat of RNase Zap and wipe the solution clean with paper towels. Lightly spray the pipettes to be used for the RNA prep with RNase Zap and then wipe them clean. Use boxes of pipette tips that have not been exposed to any source of RNase. Common sources of RNase are from plasmid DNA preparation kits and human skin/saliva. On a given workday, minimize the chance of RNase contamination by performing RNA work prior to performing any plasmid DNA preparations (or any other protocol that involves an RNase) and wear gloves at all times. Moreover, although it may sound draconian, once an RNA preparation begins in earnest (Step A26, Procedure section) avoid talking, coughing, chewing gum etc. in the vicinity of open microcentrifuge tubes. We also use barrier tips on our pipettes. By taking these simple precautions, you will help to ensure the success of the protocol.
  4. TRIzol contains phenol, which is corrosive to the eyes, skin, and respiratory tract. When working with TRIzol, users should wear safety goggles, closed-toed shoes, and a lab coat at all times and take care not to splash TRIzol on any part of their body. If users are sensitive to fumes from TRIzol, work should be performed in a fume hood. TRIzol needs to be disposed of by following safety guidelines that are appropriate to the institution.
  5. Recently, Chujo and colleagues found that certain nuclear-retained lncRNAs are recovered at a higher efficiency when TRIzol suspensions are incubated for 10 min at 55 °C (Chujo and Hirose, 2017). In our prior study of lncRNA intracellular abundance (Schertzer et al., 2019), we did not perform this 55 °C incubation step. However, we see no downside to the 55 °C incubation, and may perform it in the future.
  6. Prior to starting the spin cycle, align the spines of each microcentrifuge tube so that the hinges are all facing outward. Aligning the spines will ensure that the RNA/acrylamide pellet in each tube is located directly below the tube hinge. Knowing where in the tube to expect the pellet helps to take some of the guesswork out of the protocol, especially when you are working with small amounts of RNA.
  7. If mixing by pipetting, please note that the RNA pellet can sometimes stick to the inside of the pipette tip. If this scenario occurs, continue pipetting the water up and down until the pellet visibly dissolves.
  8. RNA that has experienced varying levels of degradation will appear as a smear on the gel.
  9. Using this procedure, our RNA-per-cell estimates for mouse embryonic stem and trophoblast stem cells arrived at 20 picograms and 30 picograms, respectively (Calabrese et al., 2007; Schertzer et al., 2019).
  10. To estimate intracellular abundance, we recommend using RNA-Seq library preparation protocols that purify genic transcripts away from rRNA by rRNA-depletion rather than by polyA-selection. In a pilot study, we estimated intracellular abundance using rRNA-depletion and polyA-selection for three lncRNAs of interest (Xist, Airn, and Kcnq1ot1) and found that the two protocols arrived at dramatically different estimates (not shown). Our interpretation of these pilot data is that during the polyA-selection protocol, a number of factors likely cause the efficiency of capture to vary for different polyadenylated RNAs. Variations in the extent of polyadenylation, the length of the polyA tail, the extent to which RNA base-pairing interferes with polyA capture, and the amount of internal A-rich sequence may cause certain polyadenylated transcripts to be captured with greater efficiency than others. These variations would skew estimates of intracellular abundance in ways that are difficult to predict. In contrast, variations in polyA-capture efficiency are not relevant for rRNA-depletion protocols, which deplete rRNA from total RNA preparations using oligonucleotides that are complementary to the major rRNA species. Thus, intracellular abundance may be measured more accurately by rRNA-depletion RNA-Seq than by polyA-selection RNA-seq. That being said, transcripts harboring internal homology to the rRNAs would be selectively depleted by rRNA-depletion and would require special consideration under this protocol.
  11. To prepare samples for RNA-Seq, our lab routinely uses the RNA HyperPrep Kit with RiboErase from KAPA Biosystems/Roche. The instructions from the KAPA RNA HyperPrep Kit with RiboErase are clear and walk users in-depth through each step of the protocol, which begins with the degradation of rRNA, followed by DNase treatment, RNA fragmentation and priming, cDNA synthesis, second-strand cDNA synthesis and A-tailing, adapter ligation, and finally, library amplification. Most of these steps are followed by a purification and buffer exchange using polystyrene–magnetite beads that are provided as part of the kit. In our lab, this kit has been robust to multiple users collecting different datasets over timeframes that span multiple years. However, many other companies sell high-quality kits to prepare rRNA-depleted RNA-seq libraries, including Illumina and NEB. Generally speaking, these kits should perform equivalently. For users that plan to perform only a few RNA-seq experiments, a company such as NEB may be preferable, because they sell kits in smaller sizes than KAPA.
  12. The KAPA RNA HyperPrep protocol is a modified version of the dUTP second-strand protocol described in Parkhomchuk et al. (2009) and Levin et al. (2010), in which stranded-ness of the RNA-seq library is maintained by performing second-strand cDNA synthesis in the presence of dUTP, followed by cDNA library amplification using a DNA polymerase that has been engineered to preferentially amplify DNA that contains deoxythymidine and not deoxyuridine. When using this kit or any other kit that employs a dUTP-based method, researchers should be aware that the stranded-ness of the amplified library is not perfect. At a low frequency, the engineered DNA polymerases will still amplify the deoxyuridine-containing second strand. The result of this low-frequency second-strand amplification is that, for highly expressed genes, a “shadow” of RNA-seq signal is often visible on the strand that is opposite (i.e., antisense) to the correct strand of the gene. In practice, this shadow signal has never affected our downstream analyses, but users should be aware that it exists.
  13. It is not uncommon to make mistakes during the first run-through of this protocol. We recommend that first-time users start by going through the entire protocol below using only one or two samples from which biological material is non-limiting. This way, users can work out the logistics of library preparation without the stress of needing the protocol to work the very first time.
  14. For a robust biological replicate, we recommend using RNA prepared on different days or from different animals etc.
  15. To minimize pipetting error, we recommend pipetting volumes of 2 μl or more. For example, instead of pipetting 1 μl of ERCC Spike-In solution to 99 μl RNase-free water, we would pipette 2 μl of Spike-In solution to 198 μl of RNase-free water.
  16. KAPA RNA HyperPrep Kit with RiboErase; catalog number: KK8560 ; technical datasheet version KR1351–v2.17; this same datasheet is also included in our github page.
  17. The reason for using only half of the library is that it preserves cDNA material in case the user needs to repeat the final PCR reaction owing to an error in PCR setup, or over- or under-amplification of the cDNA library.
  18. After PCR amplification, clean-up, and elution in 20 μl of buffer as specified in the technical datasheet, the amount of DNA per sample should be in the range of 7-150 ng/μl. Concentrations of DNA outside of this range may be acceptable, but in the rare instances in which the concentrations of our own cDNA libraries have fallen outside of this range, we have elected to repeat the final PCR amplification using an adjusted number of PCR cycles, rather than submit the originally amplified library for RNA-seq. The reason for this is that the final PCR should remain within the linear range of amplification. Amplification to a concentration of 150 ng/μl may be close to the top of the linear range of the KAPA kit. Similarly, final library concentrations < 7 ng/μl may also be acceptable, but under this scenario, we have elected to repeat the final PCR using more cycles rather than submit the low-concentration libraries for sequencing. Most frequently, the final concentration of our amplified libraries is between 10 and 50 ng/μl; this is our optimal target range.
  19. Using the conditions for cDNA library preparation specified above, users should expect the average size of the amplified DNA fragments in each library to be between 300 and 400 nucleotides.
  20. Here is an example with numbers: The concentration calculated in Step B5 (Procedure section) is 25 ng/μl. The average library size estimated in Step B8 (Procedure section) is 300 base pairs. The molarity of the library is 25 x (1 x 10⁻⁹ g)/(1 x 10⁻⁶ L)/(300 x 650 g/mol) = 1.28 x 10⁻⁷ M = 128 nM.
  21. The high-throughput sequencing facility at UNC asks that users submit their pooled cDNA libraries to them at a final concentration of 15 nM. Thus, as an example, if we were hoping to include 12 separate cDNA libraries in a single pool, that would mean each library would need to be present in the pool at a final concentration of 1.25 nM, or [15 nM/12]. One easy way to create a pool of cDNA libraries at the appropriate concentration is to first create separate aliquots of each library at the final concentration of the pool–in this example, that concentration would be 15 nM. Then, equal volumes of each library can be combined to create the 15 nM pool.
  22. In order to determine the maximum number of libraries that can be included in a pool such that each library is still sequenced to an appropriate depth, first determine the number of sequencing reads that you expect to be returned from your run on the Illumina sequencing instrument. For example, at UNC, the average 75-cycle high-output run on a NextSeq500 Instrument will return 500 million reads. In order to obtain at least 20 million reads per cDNA library in a pool, the maximum number of libraries we should include in that pool is 500/20, or 25 libraries. In practice, we often include fewer libraries than this maximum number, which results in read-depths per library of > 20 million reads. High read-depth per sample is never a problem. Moreover, fewer than 20 million reads from a single library may also be tolerable; just note that as the number of reads decreases, so does your ability to confidently quantify the abundance of lowly expressed transcripts.
  23. We have performed our data analyses after obtaining 75 base, single-end reads from an Illumina NextSeq500 instrument. Shorter, longer, or even paired-end reads would also be suitable.
  24. Please see the github page associated with this protocol for example files and analysis templates in Excel and in R (https://github.com/mschertzer/ercc_analysis).
  25. For RNA-Seq alignments we generally use STAR, but note that other aligners that support gapped-alignments should perform equivalently (Baruzzo et al., 2017; Bushnell, 2010; Dobin et al., 2013).
  26. If following along with the example provided on the github page, you may use the SRA toolkit to download the fastq file associated with record SRR7685881 in the NCBI Sequence Read Archive (Leinonen et al., 2011).
  27. In the example in this protocol, we use the mm9 genome.fa and genes.gtf files compiled by the UCSC Genome Browser (Haeussler et al., 2019) and downloaded from Illumina’s iGenomes site: https://support.illumina.com/sequencing/sequencing_software/igenome.html.
  28. Users can also find these files on the github page associated with this protocol (https://github.com/mschertzer/ercc_analysis).
  29. The “-s 2” option of featureCounts is specific to libraries that are prepared using methods that generate “reverse-stranded” data, such as the KAPA RNA HyperPrep Kit with RiboErase described in this protocol.
  30. Steps B2 through B7 (Data analysis) below have already been performed in the “ERCC Mix In” spreadsheet that is included in the “ERCC_analysis_template.xlsx” template on the github page associated with this protocol (https://github.com/mschertzer/ercc_analysis).
  31. In Schertzer et al. (2019), we added 2 μl of a 1:100 dilution of ERCC Mix 1 Spike-In RNA to 1 μg total RNA. Thus, to follow our example, divide the ERCC Spike-In Mix concentration in column D by 100 and then multiply by 2 to calculate the number of attomoles added as column E.
  32. In the template “ERCC_analysis_template.xlsx” provided on the github page, this second spreadsheet is called “fCounts data”.
  33. After sorting, the order in which each ERCC transcript appears in each spreadsheet should be identical.
  34. For an example of MPC calculations performed over the entire transcriptome using an RNA-seq dataset from Schertzer et al. (2019), see the MPC spreadsheet in the “ERCC vprtta analysis example.xlsx” file on the github page associated with this protocol (https://github.com/mschertzer/ercc_analysis).
  35. The result of this calculation is an estimation of the number of RNA molecules that were present in the pool of total RNA that was used to prepare the RNA-Seq cDNA library (which in our case was 1 μg total RNA).
  36. In Schertzer et al. (2019), we estimated that our trophoblast stem cell line harbors 30 picograms of RNA per cell. Thus, 1 μg of RNA corresponds to approximately 33,333 cells (1 x 10⁶ pg/30 pg per cell).
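
The arithmetic in Notes 22, 31, and 36 can be checked with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, the ERCC stock concentration is assumed to be given in attomoles/μl (as in column D of the spreadsheet), and the conversion from attomoles to molecules uses Avogadro's number.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def ercc_attomoles_added(stock_attomoles_per_ul, dilution_factor=100.0, volume_ul=2.0):
    """Attomoles of one ERCC transcript added to the sample (Note 31)."""
    return stock_attomoles_per_ul / dilution_factor * volume_ul


def attomoles_to_molecules(attomoles):
    """1 attomole = 1e-18 mol, i.e. roughly 602,214 molecules."""
    return attomoles * 1e-18 * AVOGADRO


def cells_in_rna_input(rna_input_ug=1.0, rna_per_cell_pg=30.0):
    """Approximate number of cells represented by the input RNA (Note 36)."""
    return rna_input_ug * 1e6 / rna_per_cell_pg


def max_libraries_per_run(reads_per_run, min_reads_per_library):
    """Largest pool size that still gives each library the minimum depth (Note 22)."""
    return reads_per_run // min_reads_per_library


# Illustrative values: a spike-in at 30,000 attomoles/ul in the undiluted mix.
added = ercc_attomoles_added(30000.0)
print(added, attomoles_to_molecules(added))
print(round(cells_in_rna_input()))
print(max_libraries_per_run(500_000_000, 20_000_000))
```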

Acknowledgments

We thank Keean Braceros for proofreading this protocol and Jackson Trotman for the gel image used in Figure 2. This work was supported by the National Institutes of Health (NIH) Grant GM121806. Schertzer et al. (2019) is the original paper from which this protocol was derived.

Competing interests

The authors have no competing interests to declare.

References

  1. Baruzzo, G., Hayer, K. E., Kim, E. J., Di Camillo, B., FitzGerald, G. A. and Grant, G. R. (2017). Simulation-based comprehensive benchmarking of RNA-seq aligners. Nat Methods 14(2): 135-139. 
  2. Bushnell, B. (2010). BBMap (sourceforge.net/projects/bbmap/).
  3. Cabili, M. N., Dunagin, M. C., McClanahan, P. D., Biaesch, A., Padovan-Merhar, O., Regev, A., Rinn, J. L. and Raj, A. (2015). Localization and abundance analysis of human lncRNAs at single-cell and single-molecule resolution. Genome Biol 16: 20. 
  4. Cabili, M. N., Trapnell, C., Goff, L., Koziol, M., Tazon-Vega, B., Regev, A. and Rinn, J. L. (2011). Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev 25(18): 1915-1927. 
  5. Calabrese, J. M., Seila, A. C., Yeo, G. W. and Sharp, P. A. (2007). RNA sequence analysis defines Dicer's role in mouse embryonic stem cells. Proc Natl Acad Sci U S A 104(46): 18097-18102. 
  6. Calabrese, J. M., Sun, W., Song, L., Mugford, J. W., Williams, L., Yee, D., Starmer, J., Mieczkowski, P., Crawford, G. E. and Magnuson, T. (2012). Site-specific silencing of regulatory elements as a mechanism of X inactivation. Cell 151(5): 951-963.
  7. Cerase, A., Smeets, D., Tang, Y. A., Gdula, M., Kraus, F., Spivakov, M., Moindrot, B., Leleu, M., Tattermusch, A., Demmerle, J., Nesterova, T. B., Green, C., Otte, A. P., Schermelleh, L. and Brockdorff, N. (2014). Spatial separation of Xist RNA and polycomb proteins revealed by superresolution microscopy. Proc Natl Acad Sci U S A 111(6): 2235-2240.
  8. Chen, J., Shishkin, A. A., Zhu, X., Kadri, S., Maza, I., Guttman, M., Hanna, J. H., Regev, A. and Garber, M. (2016). Evolutionary analysis across mammals reveals distinct classes of long non-coding RNAs. Genome Biol 17: 19. 
  9. Chujo, T. and Hirose, T. (2017). Nuclear bodies built on architectural long noncoding RNAs: unifying principles of their construction and function. Mol Cells 40(12): 889-896.
  10. Dobin, A., Davis, C. A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M. and Gingeras, T. R. (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29(1): 15-21. 
  11. Gil, N. and Ulitsky, I. (2020). Regulation of gene expression by cis-acting long non-coding RNAs. Nat Rev Genet 21(2): 102-117. 
  12. Haeussler, M., Zweig, A. S., Tyner, C., Speir, M. L., Rosenbloom, K. R., Raney, B. J., Lee, C. M., Lee, B. T., Hinrichs, A. S., Gonzalez, J. N., Gibson, D., Diekhans, M., Clawson, H., Casper, J., Barber, G. P., Haussler, D., Kuhn, R. M. and Kent, W. J. (2019). The UCSC Genome Browser database: 2019 update. Nucleic Acids Res 47(D1): D853-D858. 
  13. Hezroni, H., Koppstein, D., Schwartz, M. G., Avrutin, A., Bartel, D. P. and Ulitsky, I. (2015). Principles of long noncoding RNA evolution derived from direct comparison of transcriptomes in 17 species. Cell Rep 11(7): 1110-1122.
  14. Hrdlickova, R., Toloue, M. and Tian, B. (2017). RNA-Seq methods for transcriptome analysis. Wiley Interdiscip Rev RNA 8(1).
  15. Jiang, L., Schlesinger, F., Davis, C. A., Zhang, Y., Li, R., Salit, M., Gingeras, T. R. and Oliver, B. (2011). Synthetic spike-in standards for RNA-seq experiments. Genome Res 21(9): 1543-1551.
  16. Kopp, F. and Mendell, J. T. (2018). Functional Classification and Experimental Dissection of Long Noncoding RNAs. Cell 172(3): 393-407. 
  17. Kutter, C., Watt, S., Stefflova, K., Wilson, M. D., Goncalves, A., Ponting, C. P., Odom, D. T. and Marques, A. C. (2012). Rapid turnover of long noncoding RNAs and the evolution of gene expression. PLoS Genet 8(7): e1002841. 
  18. Leinonen, R., Sugawara, H., Shumway, M. and International Nucleotide Sequence Database Collaboration. (2011). The sequence read archive. Nucleic Acids Res 39(Database issue): D19-21.
  19. Levin, J. Z., Yassour, M., Adiconis, X., Nusbaum, C., Thompson, D. A., Friedman, N., Gnirke, A. and Regev, A. (2010). Comprehensive comparative analysis of strand-specific RNA sequencing methods. Nat Methods 7(9): 709-715.
  20. Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R. and 1000 Genome Project Data Processing Subgroup. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25(16): 2078-2079.
  21. Liao, Y., Smyth, G. K. and Shi, W. (2014). featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30(7): 923-930. 
  22. Necsulea, A., Soumillon, M., Warnefors, M., Liechti, A., Daish, T., Zeller, U., Baker, J. C., Grutzner, F. and Kaessmann, H. (2014). The evolution of lncRNA repertoires and expression patterns in tetrapods. Nature 505(7485): 635-640.
  23. Ninomiya, K. and Hirose, T. (2020). Short tandem repeat-enriched architectural RNAs in nuclear bodies: functions and associated diseases. Noncoding RNA 6(1).
  24. Parkhomchuk, D., Borodina, T., Amstislavskiy, V., Banaru, M., Hallen, L., Krobitsch, S., Lehrach, H. and Soldatov, A. (2009). Transcriptome analysis by strand-specific sequencing of complementary DNA. Nucleic Acids Res 37(18): e123.
  25. Raj, A. and Rinn, J. L. (2019). Illuminating Genomic Dark Matter with RNA Imaging. Cold Spring Harb Perspect Biol 11(5).
  26. RStudio_Team. (2015). RStudio: Integrated Development for R.
  27. Schertzer, M. D., Braceros, K. C. A., Starmer, J., Cherney, R. E., Lee, D. M., Salazar, G., Justice, M., Bischoff, S. R., Cowley, D. O., Ariel, P., Zylka, M. J., Dowen, J. M., Magnuson, T. and Calabrese, J. M. (2019). lncRNA-Induced Spread of Polycomb Controlled by Genome Architecture, RNA Abundance, and CpG Island DNA. Mol Cell 75(3): 523-537 e510. 
  28. Schuler, A., Ghanbarian, A. T. and Hurst, L. D. (2014). Purifying selection on splice-related motifs, not expression level nor RNA folding, explains nearly all constraint on human lincRNAs. Mol Biol Evol 31(12): 3164-3183. 
  29. Schwaber, J., Andersen, S. and Nielsen, L. (2019). Shedding light: The importance of reverse transcription efficiency standards in data interpretation. Biomol Detect Quantif 17: 100077.
  30. Smeets, D., Markaki, Y., Schmid, V. J., Kraus, F., Tattermusch, A., Cerase, A., Sterr, M., Fiedler, S., Demmerle, J., Popken, J., Leonhardt, H., Brockdorff, N., Cremer, T., Schermelleh, L. and Cremer, M. (2014). Three-dimensional super-resolution microscopy of the inactive X chromosome territory reveals a collapse of its active nuclear compartment harboring distinct Xist RNA foci. Epigenetics Chromatin 7: 8.
  31. Sunwoo, H., Wu, J. Y. and Lee, J. T. (2015). The Xist RNA-PRC2 complex at 20-nm resolution reveals a low Xist stoichiometry and suggests a hit-and-run mechanism in mouse cells. Proc Natl Acad Sci U S A 112(31): E4216-4225. 
  32. Tsanov, N., Samacoits, A., Chouaib, R., Traboulsi, A. M., Gostan, T., Weber, C., Zimmer, C., Zibara, K., Walter, T., Peter, M., Bertrand, E. and Mueller, F. (2016). smiFISH and FISH-quant - a flexible single RNA detection approach with super-resolution capability. Nucleic Acids Res 44(22): e165.
  33. Ulitsky, I. (2016). Evolution to the rescue: using comparative genomics to understand long non-coding RNAs. Nat Rev Genet 17(10): 601-614. 
  34. Washietl, S., Kellis, M. and Garber, M. (2014). Evolutionary dynamics and tissue specificity of human long noncoding RNAs in six mammals. Genome Res 24(4): 616-628.

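Introduction

[Abstract] Long noncoding RNAs (lncRNAs) play essential roles in normal physiology and in disease but their mechanisms of action can be challenging to identify. For mechanistic studies, it is often useful to know a lncRNA’s intracellular abundance, i.e., approximately how many molecules of the lncRNA are present in a typical cell of a cell-type of interest. At least two approaches have been used to approximate lncRNA intracellular abundance: single-molecule sensitivity RNA fluorescence in situ hybridization (smFISH) and single-gene, calibrated reverse-transcription followed by quantitative PCR (RT-qPCR). However, like all experimental approaches, these methods have their limitations. smFISH, when analyzed using diffraction-limited microscopy, can underestimate intracellular abundance, especially for lncRNAs that accumulate in focused subcellular regions. Calibrated RT-qPCR may return inaccurate estimates of abundance because individual PCR amplicons spaced across the length of a transcript can vary in their efficiency of reverse transcription. Here, we describe a sequencing-based approach that is straightforward, orthogonal to smFISH and RT-qPCR, and can be used to approximate the intracellular abundance for most expressed long RNAs (lncRNAs and mRNAs) in a cell type of interest. Firstly, the average weight of total RNA per cell for the cell type of interest is estimated by replicate rounds of RNA purification from a known number of cells. Secondly, an rRNA-depletion RNA-Seq protocol is performed after adding spike-in control RNAs to a known quantity of total cellular RNA. Lastly, by comparing read counts per transcript to a standard curve derived from the spiked-in RNAs, the intracellular abundance for each transcript is estimated. The sequencing-based approach provides a powerful complement to existing methods, particularly in situations where it is desirable to quantify the abundance of multiple lncRNAs and/or mRNAs simultaneously.

[Background] Long noncoding RNAs (lncRNAs) play essential roles in biology, but their mechanisms of action can be challenging to identify (Kopp and Mendell, 2018; Gil and Ulitsky, 2020). Relative to protein-coding genes, lncRNAs evolve rapidly and are not constrained by codon usage (Cabili et al., 2011; Kutter et al., 2012; Necsulea et al., 2014; Schuler et al., 2014; Washietl et al., 2014; Hezroni et al., 2015; Chen et al., 2016; Ulitsky, 2016). As a result, they often lack easily recognizable domains that could otherwise provide insight into their molecular roles. Thus, for a given lncRNA, an initial foothold into its molecular mechanism can often be gained by examining its subcellular localization and its intracellular abundance (i.e., on average, how many molecules of the lncRNA are present in a single cell of a cell type of interest).

To this end, single-molecule sensitivity RNA fluorescence in situ hybridization (smFISH) has been a boon, providing a convenient and cost-effective way to study lncRNA subcellular localization and intracellular abundance simultaneously (Cabili et al., 2015; Tsanov et al., 2016; Raj and Rinn, 2019). However, while smFISH offers unparalleled benefits for visualizing lncRNA subcellular distribution, it does have limitations in estimating lncRNA intracellular abundance. In the simplest form of smFISH, the number of FISH spots per cell is used as a proxy for lncRNA intracellular abundance. However, especially when smFISH is performed using diffraction-limited microscopy, separate lncRNA molecules that are located in close spatial proximity cannot be resolved individually; instead, such lncRNAs may appear as a single spot. Thus, especially for lncRNAs that accumulate at high concentrations in specific subcellular regions (Chujo and Hirose, 2017; Ninomiya and Hirose, 2020), smFISH may underestimate intracellular abundance. This potential limitation can be overcome by performing smFISH with super-resolution microscopy and carefully quantifying signal intensity within individual spots (Cerase et al., 2014; Smeets et al., 2014; Sunwoo et al., 2015). However, this latter approach requires high-end equipment and expertise that may not be readily available.

Here, we describe a sequencing-based approach that is orthogonal to smFISH and can provide estimates of intracellular abundance for lncRNAs as well as mRNAs in cultured cells (Figure 1; Schertzer et al., 2019). The approach relies on RNA-Seq and requires minimal expertise beyond the ability to follow standard protocols in molecular biology and bioinformatics. The approach allows simultaneous quantification of the intracellular abundance of all long RNAs (lncRNAs and mRNAs) expressed in a given cell type.

The sequencing-based approach may also yield estimates of intracellular abundance that are more robust than those produced by single-gene approaches that rely on calibrated reverse transcription followed by gene-specific quantitative PCR (RT-qPCR [Schwaber et al., 2019]). The main reason for this is that the efficiency of reverse transcription can vary dramatically for different qPCR amplicons distributed across the length of a transcript. In contrast, in RNA-Seq, local intra-transcript variations in reverse-transcription efficiency are inherently averaged out, because chemical fragmentation of the RNA occurs just before the RNA is reverse-transcribed into cDNA (Hrdlickova et al., 2017). Moreover, in the sequencing-based approach, the use of ERCC spike-in RNAs, which harbor diverse GC contents, lengths, and abundances, obviates the need to estimate reverse-transcription efficiency explicitly and instead allows transcript abundance to be estimated by comparison to a standard curve (Jiang et al., 2011).

Potential sources of error in the sequencing-based approach include (1) errors associated with read alignment, which would be most significant if the lncRNA of interest contains sequence that is highly repetitive relative to other locations in the genome, (2) errors associated with transcript isoform uncertainty, which could arise if the predominant isoform of the lncRNA of interest is mis-annotated in the cell type of interest, and (3) errors associated with inaccurate or variable estimates of the total weight of RNA in the cell type of interest. Nevertheless, the sequencing-based approach and smFISH performed using super-resolution microscopy have recently been shown to arrive at similar estimates of abundance for the lncRNA Xist (Smeets et al., 2014; Sunwoo et al., 2015; Schertzer et al., 2019), lending confidence that both approaches provide reasonable estimates of RNA abundance. The sequencing-based approach is also useful because, in a single experiment, it can be used to estimate the abundance of all expressed mRNAs and lncRNAs simultaneously.


Figure 1. Overview of the sequencing-based approach to quantify lncRNA and mRNA intracellular abundance

Keywords: RNA-Seq, Ribosomal RNA depletion, Long noncoding RNA, Xist, ERCC Spike-In RNAs, Transcriptome, RNA FISH, smFISH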

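Materials and Reagents

  1. For cell counting
    1. Disposable borosilicate glass Pasteur pipets (Fisher Scientific, catalog number: 13-678-20D), sterilized before use
    2. 5 ml/10 ml sterile serological pipettes (e.g., Genesee Scientific, catalog numbers: 12-102, 12-104)
    3. 15 ml/50 ml conical-bottom centrifuge tubes (e.g., Corning, catalog numbers: 05-538-59A, 05-526B)
    4. Glass coverslips (e.g., Fisher Scientific, Hausser hemocytometer cover glass, catalog number: 02-671-53)
    5. Kimwipes (Fisher Scientific, catalog number: 06-666)
    6. 6 cm tissue culture dishes (e.g., Genesee Scientific, catalog number: 25-260)
    7. Mammalian cell type of interest (e.g., mouse trophoblast stem cells [Calabrese et al., 2012])
    8. Cell culture medium supplemented with 10% serum (e.g., DMEM supplemented with 10% FBS; DMEM, Thermo Fisher Scientific, catalog number: 11995065; FBS, VWR, catalog number: 97068-085)
    9. Sterile 1x PBS (e.g., Corning, catalog number: 21-040-CM)
    10. 0.25% Trypsin-EDTA (e.g., GIBCO, catalog number: 25200-072)
    11. Trypan Blue solution, 0.4% (Thermo Fisher Scientific, catalog number: 15-250-06)
    12. 70% ethanol, stored at room temperature

  2. For RNA purification
    1. RNase Zap (Thermo Fisher Scientific, catalog number: AM9780)
    2. P20/200/1000 pipette tips (e.g., Olympus-brand tips, Genesee Scientific, catalog numbers: 23-404, 24-412, 24-430)
    3. 1.7 ml microcentrifuge tubes (e.g., Genesee Scientific, catalog number: 22-282)
    4. TRIzol reagent (Thermo Fisher Scientific, catalog number: 15596018)
    5. Chloroform (Fisher Scientific, catalog number: BP1145-1)
    6. Isopropanol (Fisher Scientific, catalog number: BP2618-1)
    7. Linear acrylamide (Thermo Fisher Scientific, catalog number: AM9520)
    8. RNase-free water (e.g., we use deionized 18.2 MΩ water produced by a Millipore Synergy water purification system, catalog number: SYNS0HFWW)
    9. 80% ethanol made with RNase-free water, stored at -20 °C
    10. LE agarose (e.g., Genesee Scientific, catalog number: 20-102QD)
    11. Ethidium bromide, 1% solution (Fisher Scientific, catalog number: BP1302-10)
    12. 50x TAE buffer (e.g., Thermo Fisher Scientific, catalog number: B49)
    13. Agarose gel loading buffer and dye (e.g., NEB, catalog number: B7024S)
    14. 1 Kb Plus DNA ladder (Thermo Fisher Scientific, catalog number: 10787026)

  3. For RNA-Seq
    1. KAPA RNA HyperPrep Kit with RiboErase (Roche/KAPA Biosystems, catalog number: KK8560)
    2. SeqCap Adapter Kit (see Note 1; Roche, catalog number: 714153000)
    3. ERCC RNA Spike-In Mix 1 (Thermo Fisher Scientific, catalog number: 4456740)
    4. Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, catalog number: Q32854)
    5. ERCC annotation files (ERCC92.fa, ERCC92.gtf, and ERCC_Controls_Analysis.txt, found on the Thermo Fisher Scientific product page for the ERCC RNA Spike-In Mix, catalog number: 4456740)
    6. Transcriptome annotation gtf file (e.g., downloaded from Illumina’s iGenomes site: https://support.illumina.com/sequencing/sequencing_software/igenome.html)
    7. Appropriate genome sequence (e.g., downloaded from Illumina’s iGenomes site: https://support.illumina.com/sequencing/sequencing_software/igenome.html)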

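Equipment

  1. For cell counting
    1. Tissue culture incubator (e.g., Forma Series II 3110 Water-Jacketed CO2 Incubator, Thermo Fisher Scientific, catalog number: 3110)
    2. Pipette controller (e.g., Drummond, catalog number: DP-101)
    3. P2/P20/P200/P1000 micropipettes (e.g., Research Plus 4-pack, Eppendorf, catalog number: EPPR4330)
    4. Centrifuge for 15 ml/50 ml conical tubes (e.g., Eppendorf, model: 5810, catalog number: 022628157)
    5. Phase hemocytometer (e.g., Fisher Scientific, Hausser Bright-Line, catalog number: 02-671-6)
    6. Hand tally counter (e.g., VWR, catalog number: 23609-102)
    7. Inverted microscope with a 10x objective (e.g., Zeiss Primovert, catalog number: 491206-0004-000)

  2. For RNA purification
    1. Safety goggles (e.g., Thermo Fisher Scientific, catalog number: 19-053-950)
    2. Microcentrifuge, stored at 4 °C (e.g., 5424 microcentrifuge, Eppendorf, catalog number: 5424)
    3. Mini centrifuge with rotors for 1.7 ml tubes and PCR strip tubes (e.g., Genesee Scientific MyFuge Mini, catalog number: 31-500)
    4. Nanodrop spectrophotometer (e.g., Thermo Fisher Scientific, catalog number: ND-LITE)
    5. Mini gel electrophoresis system (e.g., Thermo Fisher Scientific, catalog number: B1A)
    6. Agarose gel imaging system (e.g., Bio-Rad, ChemiDoc MP, catalog number: 12003154)

  3. For RNA-Seq
    1. Thermocycler (e.g., Bio-Rad, model: C1000 Touch, catalog number: 1851148)
    2. Magnetic bead stand (e.g., DynaMag-2 Magnet, Thermo Fisher Scientific, catalog number: 12321D)
    3. Qubit fluorometer (Thermo Fisher Scientific, catalog number: Q33238)
    4. Mini gel electrophoresis system (e.g., Thermo Fisher Scientific, catalog number: B1A)
    5. Agarose gel imaging system (e.g., Bio-Rad, ChemiDoc MP, catalog number: 12003154)
    6. Access to an Illumina sequencing instrument (e.g., Illumina, model: NextSeq500, catalog number: SY-415-1001)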

Software

  1. (Optional) SRA Toolkit (Leinonen et al., 2011); https://ncbi.github.io/sra-tools/
  2. STAR aligner (Dobin et al., 2013); https://github.com/alexdobin/STAR
  3. featureCounts from the Subread package (Liao et al., 2014); http://subread.sourceforge.net
  4. Samtools (Li et al., 2009); https://samtools.github.io
  5. Microsoft Excel or equivalent, or RStudio (RStudio_Team, 2015); https://rstudio.com

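Procedure

  1. Calculate the average amount of RNA per cell (see Note 2).
    1. Prior to initiating this portion of the protocol, ensure that the bench space, pipettes, and pipette tips are clean and suitable for working with RNA (see Note 3).
    2. Grow cells in a 6 cm dish until they reach 60% to 80% confluency.
    3. Pre-warm the cell culture medium and 0.25% trypsin-EDTA to 37 °C; after warming, clean the bottles containing the medium, trypsin-EDTA, and 1x PBS with 70% ethanol and place them in a biosafety cabinet.
    4. Create a 0.125% trypsin solution by diluting the 0.25% trypsin-EDTA solution with an equal volume of 1x PBS.
    5. Remove the cultured cells from the incubator and place them in the biosafety cabinet.
    6. Using a disposable glass Pasteur pipet (or similar), aspirate the medium from the cells.
    7. Wash the cells on the plate by gently adding 4 ml of 1x PBS and then aspirating it.
    8. Add 2 ml of the 0.125% trypsin solution to the cells and leave the plate of cells in the biosafety cabinet at room temperature for 3 min.
      Note: Dissociation protocols may vary depending on your cell type.
    9. Using a clean 5 ml serological pipette attached to a pipette controller, pipette the trypsinized cell solution up and down over the plate to obtain a single-cell suspension.
    10. Transfer the trypsinized cell solution to a 15 ml conical tube that contains 8 ml of medium with 10% serum.
    11. Invert the 15 ml conical tube 10-15 times to obtain a homogeneous solution.
    12. Using a P20 micropipette, remove 12 μl of the cell suspension and carefully transfer it to the bottom of a 1.7 ml microcentrifuge tube.
    13. Repeat Steps A11 and A12 one more time to obtain two replicates for cell counting.
    14. Add 12 μl of Trypan Blue solution to each 12 μl aliquot of cell suspension and mix by pipetting, taking care to keep the Trypan Blue/cell solution at the bottom of the tube.
    15. Centrifuge the remaining trypsinized cell solution (~10 ml; in the 15 ml conical tube) at 1,000 rpm (~200 x g) for 5 min.
    16. During the spin, clean the hemocytometer and glass coverslip with a Kimwipe sprayed with 70% ethanol to remove any particulates.
    17. Add 12 μl of the Trypan Blue/cell suspension to the hemocytometer, making sure that the solution spreads evenly under the coverslip.
    18. Under a 20x objective, count the non-blue cells within each of the four quadrants of the hemocytometer, keeping track of the count in each quadrant with a hand tally counter.
    19. Use the formulas below to calculate the number of cells per ml, then average the cells per ml between the two replicates:
      Replicate 1, cells per ml = [(sum of counts from all 4 quadrants)/4] * 2 * 10⁴
      Replicate 2, cells per ml = [(sum of counts from all 4 quadrants)/4] * 2 * 10⁴
      Average cells per ml = (Replicate 1 cells per ml + Replicate 2 cells per ml)/2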
20.在完成步骤A 15的旋转后,小心除去含胰蛋白酶的培养基,以免干扰细胞沉淀,加入1 0毫升1x PBS,通过移液混合,并在离心机中离心PBS /细胞溶液在1,250 rpm(〜300 xg )下持续5分钟。   

21.取出PBS,用另外10 ml的1x PBS代替,通过移液混合,并在离心机中以1,250 rpm(〜300 xg )离心5分钟以旋转细胞溶液。   

22.移除PBS的所有痕迹和加入1ml TRIzol试剂到细胞象素设(小号EE注4)。   

23.使用P1000,上下吸液管约15倍,以裂解细胞并使RNA提取效率最大化。   

24.将样品池/ TRIzol混合物放入-80 °C的冰箱中。   

25.重复步骤A1-A24至少三次,最好在单独的几天中,以获得生物重复。   

26.从冰箱中取出TRIzol悬浮液,并使其在室温下解冻。   

27.一旦解冻,让TRIzol试剂悬浮液在室温下静置5分钟,以帮助核糖核蛋白复合dissocia TE(小号EE注5)。   

28.在TRIzol悬浮液中加入0.2 ml氯仿,剧烈涡旋约20 s,然后将样品在室温下静置2分钟。   

29.在4°C下以12,000 xg离心样品15分钟。   

30.将含有RNA的水相转移到新的1.7 ml试管(〜0.5 ml)中。   

31.向萃取的水相中加入10μl线性丙烯酰胺,并剧烈涡旋。   

32.向水相中加入0.5毫升异丙醇(量约等于水相的体积),并剧烈涡旋。   

33.在-20 °C下孵育≥1小时(此步骤与制造商从TRIzol纯化RNA的说明不同)。   

34.离心机在4℃以顶级速度将样品在微量离心30分钟℃,(> 12000 ×g下;小号EE注6)。   

35.使用P1000除去水/异丙醇溶液,切记不要除去RNA沉淀,RNA沉淀应位于微量离心管铰链的下方。   

36.向沉淀的RNA沉淀中,轻轻加入1 ml 80%乙醇和20%无RNase的冰冷混合物。   

37.使用P1000除去80%的乙醇,注意不要除去RNA沉淀。   

38.在微型离心机中脉冲旋转1.7 ml试管,以使残留的乙醇从试管侧面降至底部。   

39.用P200移液器和吸头除去剩余的80%乙醇。   

40.重复步骤A 38- A 3 9,直到没有80%的乙醇残留。   

41.将沉淀重悬于30μl不含RNase的水中。   

42.让通过溶液支架在室温下间歇搅拌1个小时(每隔15分钟)含有RNA的轻弹或涡旋,然后脉冲旋转管,或通过吸取溶液上下(小号EE注7)。   

43. RNA溶解后,使用Nanodrop分光光度计对RNA的浓度进行定量。   

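The ideal ratio of absorbance at 260 nm and 280 nm for RNA is between ~1.8 and ~2.
If the RNA is contaminated with residual ethanol, phenol, or guanidine, or if the RNA is not fully dissolved, the 260/280 ratio will be lower, often < 1.6.
For more information, see the Thermo Fisher technical note on NanoDrop spectrophotometers.
44. Next, determine whether the purified RNA is intact. To do so, set up a gel electrophoresis apparatus and prepare a 1% agarose/0.0001% ethidium bromide gel in 1x TAE buffer. Submerge the agarose gel in 1x TAE buffer.

45. In a 1.7 ml tube, mix ~250-500 ng of RNA with an appropriate amount of glycerol-based agarose gel loading buffer.

46. In separate lanes of the agarose gel, load the RNA/gel loading mixture and 0.5-1 μg of DNA ladder (the latter provides a size reference).

47. Electrophorese the samples through ~8 cm of agarose gel in 1x TAE at 5 V per cm of distance between the electrodes.

48. Photograph the gel on an appropriate gel imaging system. Intact RNA purified from typical mammalian cells should yield two distinct bands with apparent sizes, relative to the DNA ladder, of ~1,500 and ~750 nucleotides, corresponding to the 28S and 18S rRNAs, respectively (Figure 2; see Note 8).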

 



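Figure 2. RNA electrophoresed on a 1% agarose gel pre-stained with ethidium bromide. Relative to the DNA size ladder, the 28S and 18S rRNA species migrate at approximate sizes of ~1,400 and ~750 nucleotides, respectively.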

 

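49. If the RNA is intact, proceed to calculate the weight of RNA per cell:

RNA per cell = [RNA concentration in g/L] * [volume in L of water used to resuspend the RNA (e.g., 30 x 10^-6 L)] / [number of cells lysed in TRIzol]
Calculate the average RNA per cell obtained across the biological replicate RNA preparations (see Note 9).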
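The arithmetic in Steps A19 and A49 can be sketched in Python as follows. This is a minimal illustration and not part of the original protocol: the function names, the example counts, the example Nanodrop reading, and the assumption that roughly 10 ml of counted suspension was pelleted and lysed are all hypothetical.

```python
from statistics import mean


def cells_per_ml(quadrant_counts, dilution_factor=2, chamber_factor=1e4):
    """Step A19: cells per ml from the four hemocytometer quadrant counts.

    dilution_factor=2 reflects the 1:1 mix with trypan blue;
    chamber_factor=1e4 converts the per-quadrant average to cells per ml.
    """
    if len(quadrant_counts) != 4:
        raise ValueError("expected counts from exactly four quadrants")
    return sum(quadrant_counts) / 4 * dilution_factor * chamber_factor


def rna_per_cell_pg(conc_ng_per_ul, resuspension_ul, cells_lysed):
    """Step A49: RNA per cell in picograms (1 ng = 1,000 pg)."""
    total_ng = conc_ng_per_ul * resuspension_ul
    return total_ng * 1000.0 / cells_lysed


# Hypothetical counts for the two counting replicates.
replicate_1 = cells_per_ml([52, 48, 50, 50])
replicate_2 = cells_per_ml([47, 53, 49, 51])
average_cells_per_ml = mean([replicate_1, replicate_2])

# ASSUMPTION: about 10 ml of the counted suspension was pelleted and lysed.
cells_lysed = average_cells_per_ml * 10

# Hypothetical Nanodrop reading of 6,000 ng/ul in 30 ul of water.
print(rna_per_cell_pg(6000, 30, cells_lysed), "pg RNA per cell")
```

Averaging the per-cell value across biological replicates, as Step A49 describes, would simply apply `mean` to the output of `rna_per_cell_pg` for each preparation.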
 

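B. Preparation of cDNA libraries for RNA-Seq (see Notes 10, 11, 12, and 13)
1. Make sure that cDNA libraries are prepared from RNA preparations of two or more biological replicates (see Note 14).

2. For each sample to be sequenced, aliquot 1 μg of total cellular RNA in a total volume of 8 μl of RNase-free water.

3. To each sample, add 2 μl of a freshly made 1:100 dilution of ERCC RNA Spike-In Mix #1 (see Note 15).

4. Starting with the mixture of ERCC Spike-In RNA and 1 μg of total cellular RNA, perform cDNA library preparation according to the manufacturer's instructions. We essentially follow the instructions in the KAPA user technical data sheet (see Note 16). The inputs requested from the user are listed below:

From Section 6 of the technical data sheet ("RNA Elution, Fragmentation, and Priming"), we select fragmentation conditions of 6 min at 94 °C, which yields RNA fragments 200-300 nucleotides long.
From Section 12 of the technical data sheet ("Library Amplification"), we perform the final PCR amplification of our cDNA libraries using only half of the purified cDNA library (i.e., 10 μl of cDNA library in a 50 μl PCR reaction), rather than the full 20 μl of library in a 50 μl reaction (see Note 17).
For cDNA libraries prepared from 1 μg of total RNA, using half of the purified cDNA library, we typically perform 11 PCR cycles for the final amplification step.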
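5. Using a Qubit fluorometer, quantify the concentration of DNA in the PCR-amplified, purified cDNA libraries (see Note 18).

6. Prepare a 1% agarose/0.0001% ethidium bromide gel in 1x TAE buffer. Submerge the agarose gel in 1x TAE buffer.

7. Run 2 μl (1/10th) of the amplified cDNA library through ~6 cm of agarose gel, alongside 250 ng of 1 Kb Plus DNA Ladder.

8. Image the agarose gel and estimate the average size, in base pairs, of each prepared cDNA library (see Note 19).

9. Calculate the molarity of the purified cDNA libraries:

The average molecular weight of a DNA base pair is 650 Daltons, i.e., 650 g/mol.
Using the average length determined in Step B8 above, calculate the molar mass of the cDNA library:
[cDNA library g/mol] = [average length] * 650 g/mol

Now, using the DNA concentration determined in Step B5 above, calculate the molarity of the cDNA library (see Note 20):
[cDNA library mol/L] = ([cDNA library concentration in ng/μl]) * ([1 x 10^-9 g] / [1 x 10^-6 L]) * (1 / [cDNA library g/mol])

10. Pool the cDNA libraries to be sequenced together in a way that ensures that an equimolar amount of each library is present in the pool and that each library will be sequenced to a depth of > 20 million reads (see Notes 21 and 22).

11. Sequence the pooled cDNA libraries on an Illumina platform (see Note 23).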
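The molarity calculation in Step B9 can be written as a short Python function. This is an illustrative sketch rather than part of the protocol; the function name and parameter names are hypothetical, and the example values are those of the worked example in Note 20 (25 ng/μl, 300 bp average size).

```python
def library_molarity_nM(conc_ng_per_ul, mean_length_bp, bp_mass_g_per_mol=650.0):
    """Molarity of a cDNA library in nM (Step B9).

    conc_ng_per_ul: Qubit concentration from Step B5.
    mean_length_bp: average library size estimated from the gel in Step B8.
    """
    grams_per_litre = conc_ng_per_ul * 1e-9 / 1e-6   # ng/ul -> g/L
    molar_mass = mean_length_bp * bp_mass_g_per_mol  # g/mol
    return grams_per_litre / molar_mass * 1e9        # mol/L -> nM


# Values from the worked example in Note 20.
print(round(library_molarity_nM(25, 300)), "nM")
```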

 

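Data analysis

Note: See Note 24.

A. Align data and obtain read counts (see Note 25)
1. In a UNIX command terminal, create a main directory for sequence alignment, filtering, and read counting (e.g., ./ercc_mpc_analysis).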
 

mkdir ./ercc_mpc_analysis

 

2. Obtain the RNA-Seq fastq file from the Illumina sequencing run (e.g., rnaseq_file.fastq) and move the file into the main directory (see Note 26).

mv rnaseq_file.fastq ./ercc_mpc_analysis

 

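3. Download the appropriate genome FASTA file (e.g., genome.fa) and gene GTF file (e.g., genes.gtf) for your cell type, and place them in the main directory (./ercc_mpc_analysis) (see Note 27).
4. Download the ERCC Spike-In RNA annotation files from the ERCC RNA Spike-In product page on the Thermo Fisher website and place them in the main directory, ./ercc_mpc_analysis (see Note 28). The file names are:
ERCC Controls Analysis: ERCC RNA Spike-In Control Mixes (e.g., ERCC92_conc.txt)
ERCC92.fa and ERCC92.gtf Sequence and Annotation files (.zip)
5. In the main directory (./ercc_mpc_analysis), create a new directory to store the genome index that STAR will build.

mkdir ./GenomeDir/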

 

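6. Build a STAR genome index that includes both the genome and the ERCC reference sequences; the index will be created and stored in ./GenomeDir/.

STAR --runThreadN 8 \
--runMode genomeGenerate \
--genomeDir ./GenomeDir \
--genomeFastaFiles genome.fa ERCC92.fa \
--sjdbGTFfile genes.gtf ERCC92.gtf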

 

7. Use STAR to align the RNA-Seq fastq file (e.g., rnaseq_file.fastq) to the genome index. The alignments will be saved to a file with the suffix "Aligned.out.sam" (e.g., rnaseq_file_out_Aligned.out.sam).

STAR --runThreadN 12 \
--genomeDir ./GenomeDir \
--readFilesIn rnaseq_file.fastq \
--outFileNamePrefix rnaseq_file_out_

 

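8. Use samtools to filter the "Aligned.out.sam" file for alignments with a mapping quality of 30 or higher (this step selects uniquely mapped reads). In this example, the filtered file is named "rnaseq_file_out_q30.sam".

samtools view -Shq 30 rnaseq_file_out_Aligned.out.sam > rnaseq_file_out_q30.sam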

 

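9. Use featureCounts from the Subread package to count the number of reads that align to each ERCC Spike-In transcript (see Note 29). In this example, the file containing the ERCC counts is named "ercc_featureCounts_output.txt".

featureCounts -s 2 \
-a ERCC92.gtf \
-o ercc_featureCounts_output.txt \
rnaseq_file_out_q30.sam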

 

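10. Use featureCounts from the Subread package to count the number of reads that align to each gene transcript in the genes.gtf file. In this example, the file containing the gene counts is named "mm9_genes_featureCounts_output.txt".

featureCounts -s 2 \
-a genes.gtf \
-o mm9_genes_featureCounts_output.txt \
rnaseq_file_out_q30.sam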
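For readers who prefer scripting to a spreadsheet, the featureCounts output files can be read programmatically. The sketch below is a hypothetical helper, not part of the protocol. It assumes the usual featureCounts layout: a leading comment line starting with "#", a header row, and tab-separated columns in which column 6 is Length and column 7 is the read count for the single input file. The feature IDs and numbers in the in-memory example are made up.

```python
import csv
import io


def read_feature_counts(handle):
    """Return {feature_id: (length_nt, read_count)} from featureCounts output."""
    # Skip the leading "# Program:..." comment line(s).
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.reader(rows, delimiter="\t")
    next(reader)  # header row: Geneid, Chr, Start, End, Strand, Length, <file>
    table = {}
    for fields in reader:
        if not fields:
            continue  # tolerate blank lines
        table[fields[0]] = (int(fields[5]), int(fields[6]))
    return table


# Made-up content standing in for ercc_featureCounts_output.txt.
example = io.StringIO(
    "# Program:featureCounts; Command:(omitted)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\trnaseq_file_out_q30.sam\n"
    "ERCC-X1\tERCC-X1\t1\t1061\t+\t1061\t5000\n"
    "ERCC-X2\tERCC-X2\t1\t523\t+\t523\t0\n"
)
counts = read_feature_counts(example)
print(counts)
```

With a real file, the same function could be called as `read_feature_counts(open("ercc_featureCounts_output.txt"))`.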

 

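B. Create a standard curve that relates ERCC Spike-In RNA-Seq read counts to the absolute amount of each ERCC transcript that was added to the RNA just before cDNA libraries were prepared for RNA-Seq. We recommend using Excel or RStudio for these calculations. Templates and examples can be found on the github page associated with this protocol (https://github.com/mschertzer/ercc_analysis). Figure 3 shows a standard curve derived from a single RNA-Seq replicate (Schertzer et al., 2019).

Figure 3. Representative standard curve relating RNA-Seq read counts (y-axis) to the molecular abundance of ERCC Spike-In RNAs (x-axis)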

 

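1. Copy and paste the contents of the ERCC92_conc.txt file (downloaded in Step A4) into a new Excel spreadsheet; call this the "ERCC Mix In" spreadsheet (see Note 30). A picture of the ERCC92_conc.txt file is shown below (Figure 4).

Figure 4. Screenshot of the ERCC92_conc.txt file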

 

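2. Delete the last three columns of the table in the Excel spreadsheet ("concentration in Mix 2 (attomoles/ul)", "expected fold-change ratio", "log2(Mix 1/Mix 2)").

3. Create a new column in the table, Column E "attomoles added", that uses the values in Column D "concentration in Mix 1 (attomoles/ul)" to calculate the number of attomoles of each ERCC Spike-In RNA that were added to the total RNA prior to RNA-Seq library preparation (see Note 31; Figure 5).

Figure 5. Screenshot of an example calculation of the attomoles of ERCC Spike-In added to the RNA prior to library preparation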

 

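4. Create a new column, Column F "Moles", that divides the numbers in Column E "attomoles added" by 1E18.

5. Create a new column, Column G "Molecules", that multiplies the values in Column F "Moles" by 6.022E23 (molecules per mole; Avogadro's number).

6. Create a new column, Column H "log2(Molecules)", that calculates the log base 2 of the values in Column G "Molecules".

7. Sort the table so that the data in Column B "ERCC_ID" appear in ascending order; this will become important later (Figure 6).

Figure 6. Screenshot of the sorted ERCC calculation table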

 

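8. In a separate Excel spreadsheet, copy and paste the contents of the ercc_featureCounts_output.txt file generated in Step A9; call this the "fCounts data" spreadsheet (see Note 32).

9. In this new spreadsheet, calculate the reads per kilobase per million aligned reads (RPKM) for each ERCC transcript.

First, find the total number of reads in the dataset of interest that aligned to the genome with a mapping quality of 30 or higher. This can be done on the UNIX command line using samtools view:

samtools view -c rnaseq_file_out_q30.sam > rnaseq_file_counts.txt

Next, create a new column in the Excel spreadsheet, Column H "RPM", in which the read counts in Column G are divided by the aligned read count in "rnaseq_file_counts.txt" and then multiplied by 1 million. This gives reads per million (RPM).
Then, create a new column in the Excel spreadsheet, Column I "RPKM", in which the RPM values in Column H are divided by the values in Column F "Length" (the length in nucleotides of each ERCC transcript) and then multiplied by 1,000 to convert RPM to RPKM, or reads per kilobase of transcript per million aligned reads.
Finally, create a new column in the Excel spreadsheet, Column J "log2(RPKM)", that calculates the log base 2 of the values in Column I "RPKM" (Figure 7).

Figure 7. Screenshot of the ERCC RPKM calculations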

 

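10. Make sure that the Excel spreadsheets created in Step B1 and in Step B8 are both sorted in ascending order by ERCC ID (see Note 33).

11. Paste the log2(RPKM) values from the "fCounts data" spreadsheet (created in Step B8) into a new column in the "ERCC Mix In" spreadsheet (created in Step B1).

12. Remove the ERCC transcripts that have zero aligned reads.

13. Generate a scatter plot with log2(Molecules) on the x-axis and log2(RPKM) on the y-axis. See Figure 3 and Figure 8.

Figure 8. Screenshot of generating a scatter plot in Excel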

 

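14. Fit a line to the data.

In Excel, select the points on the graph, right-click, and select "Add Trendline". In the window, select "Linear" and "Display Equation on chart" (Figure 9).

Figure 9. Screenshot of adding a trendline in Excel

R^2 values are expected to be greater than 0.90 (and are typically greater than 0.95).
Use the y = mx + b equation in the next section.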
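The spreadsheet calculations in Steps B3 through B14 can also be sketched in Python. The snippet below is an illustration under stated assumptions, not the authors' template: the function names are hypothetical, the ERCC rows and the total aligned read count are made-up values, and the ordinary least-squares fit stands in for Excel's linear trendline. The dilution (1:100) and volume (2 μl) defaults follow Note 31.

```python
import math

AVOGADRO = 6.022e23  # molecules per mole, as used in Step B5


def attomoles_added(mix1_amol_per_ul, dilution=100, volume_ul=2):
    """Step B3: attomoles of a spike-in added to the total RNA."""
    return mix1_amol_per_ul / dilution * volume_ul


def molecules(attomoles):
    """Steps B4-B5: attomoles -> moles -> molecules."""
    return attomoles / 1e18 * AVOGADRO


def rpkm(count, length_nt, total_aligned_reads):
    """Step B9: reads per kilobase per million aligned reads."""
    rpm = count / total_aligned_reads * 1e6
    return rpm / length_nt * 1000


def fit_line(xs, ys):
    """Step B14: least-squares slope, intercept, and R^2."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy ** 2 / (sxx * syy)


# Made-up rows: (ID, Mix 1 attomoles/ul, length in nt, read count).
rows = [
    ("ERCC-X1", 30000.0, 1000, 200000),
    ("ERCC-X2", 1875.0, 1000, 12000),
    ("ERCC-X3", 117.1875, 1000, 800),
    ("ERCC-X4", 0.9155, 1000, 0),  # zero reads: removed, as in Step B12
]
total_aligned_reads = 25_000_000  # made-up output of `samtools view -c`

points = [
    (math.log2(molecules(attomoles_added(conc))),
     math.log2(rpkm(count, length, total_aligned_reads)))
    for _, conc, length, count in rows
    if count > 0
]
slope, intercept, r_squared = fit_line([p[0] for p in points],
                                       [p[1] for p in points])
print(f"y = {slope:.3f}x + {intercept:.3f}, R^2 = {r_squared:.3f}")
```

Here x is log2(Molecules) and y is log2(RPKM), matching the axes described in Step B13.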
 

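C. Calculate molecules per cell for the lncRNA and mRNA genes of interest.
1. Copy the contents of the mm9_genes_featureCounts_output.txt file (generated in Step A10) and paste them into a new Excel spreadsheet; call this the "MPC" spreadsheet (see Note 34).
2. Using the same steps outlined in Step B9 above, convert the read counts for each gene (Column G of the MPC spreadsheet) to RPM as new Column H, then to RPKM as new Column I, and then to log2(RPKM) as new Column J.
3. For each gene of interest, use the y = mx + b equation from Step B14 to calculate log2(Molecules).
x = log2(Molecules)
y = the log2(RPKM) value calculated in Column J
b = the y-intercept of the equation from Step B14
m = the slope of the equation from Step B14
4. Create a new column in the MPC Excel spreadsheet, Column K "log2(Molecules)", that performs the following calculation:
x = (y - b) / m

5. Create a new column in the MPC Excel spreadsheet, Column L "Molecules", in which the values in Column K are converted to molecules using the exponent 2^x. In Excel notation, this is done by setting the equation in Column L to "=2^column_K" (see Note 35).
6. Create a final column in the MPC Excel spreadsheet, Column M "MPC", in which the values in Column L are converted to molecules per cell:
Divide the amount of total RNA used to prepare the RNA-Seq library, 1 μg, by the weight of RNA per cell calculated in Step A49 of the "Procedure" section of this protocol. This value represents the approximate total number of cells used to prepare the RNA-Seq library (see Note 36).
For each transcript of interest, divide the number of molecules in Column L by the total number of cells used for RNA-Seq to estimate the molecules of transcript per cell.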
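The conversion from log2(RPKM) to molecules per cell described above can be sketched as one Python function. This is an illustrative sketch only: the function and parameter names are hypothetical, and the slope and intercept in the usage example are made-up values rather than a fitted standard curve. The defaults of 1 μg of input RNA and 30 pg of RNA per cell follow Notes 35 and 36.

```python
def molecules_per_cell(log2_rpkm, slope, intercept,
                       input_rna_ug=1.0, rna_per_cell_pg=30.0):
    """Invert the standard curve y = m*x + b and scale to a single cell.

    log2_rpkm: y, the value in Column J of the MPC spreadsheet.
    slope, intercept: m and b from the trendline in Step B14.
    """
    log2_molecules = (log2_rpkm - intercept) / slope    # Column K: x = (y - b) / m
    total_molecules = 2 ** log2_molecules               # Column L
    cells_in_library = input_rna_ug * 1e6 / rna_per_cell_pg  # 1 ug = 1e6 pg
    return total_molecules / cells_in_library           # Column M


# Made-up curve parameters, for illustration only.
print(molecules_per_cell(log2_rpkm=0.0, slope=1.0, intercept=-20.0))
```

With the default inputs, `cells_in_library` evaluates to about 33,333 cells, the figure quoted in Note 36.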
 

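Notes

1. Roche recently purchased KAPA Biosystems and discontinued its small-reaction-number sequencing adapter kits. The SeqCap Adapter Kit is their replacement product, but note that this kit comes in a 96-reaction format. Users can purchase their own Illumina-compatible adapters in bulk from commercial oligonucleotide vendors for less than the list price of the SeqCap kit. These adapters then need to be resuspended and annealed by the user. However, the advantage of purchasing adapters in bulk is a greatly reduced cost per reaction. Thus, users who plan to perform many RNA-seq assays (or, for that matter, any other *-seq assay) will find it more cost-effective to purchase and anneal their own adapters than to purchase pre-aliquoted adapters. For those interested in purchasing and annealing their own adapters, we have provided instructions here. Users who plan to perform only a small number of RNA-seq assays may find it more cost-effective to purchase their RNA-seq kits and adapters from a manufacturer that sells reagents in small package sizes (e.g., NEB).
2. This part of the protocol describes how to calculate the average amount of RNA per cell in a cell type of interest. The protocol is designed for cultured adherent cells but can easily be adapted to cultured suspension cells. With additional optimization, it can also be adapted to tissues of interest. In the latter case, the user will need a way to estimate the total number of cells per unit of the tissue of interest (e.g., how many cells are present per milligram of tissue?). With an accurate estimate of the number of cells per unit of tissue, the user can perform replicate RNA purifications from known masses of tissue. The yield of RNA by weight is then divided by the number of cells used to obtain the RNA to derive an estimate of the amount of RNA per average cell in the tissue of interest.
3. To clean the bench, spray the area with a light layer of RNase Zap and then wipe the solution clean with a paper towel. Lightly spray the pipettes to be used for RNA preparation with RNase Zap and then wipe them clean. Use pipette tip boxes that have not been exposed to any source of RNase. Common sources of RNase are plasmid DNA preparation kits and human skin/saliva. On a given work day, minimize the chance of RNase contamination by always performing RNA work before any plasmid DNA preparation (or any other protocol that involves RNase) and by always wearing gloves. In addition, although it may sound draconian, once RNA preparation begins in earnest (Step A26, Procedure section), refrain from talking, coughing, chewing gum, etc., near open microcentrifuge tubes. We also use barrier tips on our pipettes. These simple precautions will help ensure the success of the protocol.
4. TRIzol contains phenol and is corrosive to the eyes, skin, and respiratory tract. When working with TRIzol, users should wear safety goggles, closed-toe shoes, and a lab coat at all times, and should take care not to splash TRIzol on any part of their body. Users who are sensitive to TRIzol fumes should work in a fume hood. TRIzol needs to be disposed of according to the safety guidelines that apply at the user's institution.
5. Recently, Chujo and colleagues found that certain nuclear-retained lncRNAs can be recovered at higher efficiency when the TRIzol suspension is incubated at 55 °C for 10 min (Chujo and Hirose, 2017). In our previous study of lncRNA intracellular abundance (Schertzer et al., 2019), we did not perform this 55 °C incubation step. However, we see no disadvantage to the 55 °C incubation and may perform it in the future.
6. Before starting the spin, align the hinges of each microcentrifuge tube so that they all face outward. Aligning the hinges will ensure that the RNA/acrylamide pellet in each tube is located directly below the tube hinge. Knowing where to expect the pellet in the tube removes some of the guesswork from the protocol, especially when working with small amounts of RNA.
7. If mixing by pipetting, be aware that the RNA pellet can sometimes stick to the inside of the pipette tip. If this happens, continue to pipette the water up and down until the pellet has visibly dissolved.
8. RNA that has undergone varying degrees of degradation will appear as a smear on the gel.
9. Using this procedure, our estimates of RNA per cell for mouse embryonic stem cells and trophoblast stem cells are 20 picograms and 30 picograms, respectively (Calabrese et al., 2007; Schertzer et al., 2019).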
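10. To estimate intracellular abundance, we recommend using an RNA-Seq library preparation protocol that purifies gene transcripts away from rRNA via rRNA depletion rather than polyA selection. In a pilot study, we estimated the intracellular abundance of three lncRNAs of interest (Xist, Airn, and Kcnq1ot1) using both rRNA depletion and polyA selection, and found that the two protocols yielded very different estimates (not shown). Our interpretation of these pilot data is that, in polyA selection protocols, many factors may cause capture efficiency to vary between different polyadenylated RNAs. Variation in the extent of polyadenylation, the length of the polyA tail, the extent to which RNA base-pairing interferes with polyA capture, and the number of internal A-rich sequences may cause certain polyadenylated transcripts to be captured more efficiently than others. This variation would make estimates of intracellular abundance unpredictable. In contrast, variation in polyA capture efficiency is irrelevant to rRNA depletion protocols, which use oligonucleotides complementary to the major rRNA species to deplete rRNA from total RNA preparations. Therefore, intracellular abundance can be measured more accurately by rRNA-depletion RNA-Seq than by polyA-selection RNA-Seq. That being said, transcripts that harbor internal homology to rRNA would be selectively removed by rRNA depletion and would require special consideration under this protocol.
11. To prepare samples for RNA-Seq, our lab typically uses the RNA HyperPrep Kit with RiboErase from KAPA Biosystems/Roche. The instructions in the KAPA RNA HyperPrep Kit with RiboErase are clear and walk the user in depth through each step of the protocol, which begins with rRNA depletion, followed by DNase treatment, RNA fragmentation and priming, cDNA synthesis, second-strand cDNA synthesis and A-tailing, adapter ligation, and finally, library amplification. Each of these steps is followed by purification and buffer exchange using the polystyrene-magnetite beads supplied with the kit. In our lab, this kit has been robust across multiple users collecting different datasets over multiple years. However, many other companies, including Illumina and NEB, also sell high-quality kits for preparing ribo-depleted RNA-seq libraries. In general, these kits should perform equivalently. For users who plan to perform only a small number of RNA-seq experiments, companies such as NEB may be preferable because they sell kits in smaller sizes than KAPA.
12. The KAPA RNA HyperPrep protocol is a modified version of the dUTP second-strand protocols described by Parkhomchuk et al. (2009) and Levin et al. (2010), in which the strandedness of RNA-seq libraries is maintained by performing second-strand cDNA synthesis in the presence of dUTP and then amplifying the cDNA library using a DNA polymerase that has been engineered to preferentially amplify deoxythymidine-containing DNA rather than deoxyuridine-containing DNA. When using this kit or any other kit that relies on a dUTP-based method, researchers should be aware that the strandedness of the amplified libraries is not perfect. The engineered DNA polymerase still amplifies the deoxyuridine-containing second strand at a low frequency. As a result of this low-frequency second-strand amplification, for highly expressed genes, a "shadow" of RNA-seq signal is often visible on the strand opposite (i.e., antisense) to the correct strand of the gene. In practice, this shadow signal has never affected our downstream analyses, but users should be aware of its existence.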
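13. It is common to make mistakes during the first run-through of this protocol. We recommend that first-time users first work through the entire protocol using one or two samples for which biological material is not limiting. In this way, users can work out the logistics of library preparation without needing the protocol to succeed the first time it is used.
14. To obtain robust biological replicates, we recommend using RNA prepared on different days, from different animals, etc.
15. To minimize pipetting error, we recommend pipetting volumes of 2 μl or greater. For example, instead of pipetting 1 μl of ERCC Spike-In solution into 99 μl of RNase-free water, we would pipette 2 μl of Spike-In solution into 198 μl of RNase-free water.
16. KAPA RNA HyperPrep Kit with RiboErase; catalog number: KK8560; technical data sheet version KR1351-v2.17. The same data sheet is also included on our github page.
17. The reason for using only half of the library is that it preserves cDNA material in the event that the user needs to repeat the final PCR reaction because of an error in PCR setup or because the cDNA library was over- or under-amplified.
18. After PCR amplification, clean-up, and elution in 20 μl of buffer as described in the technical data sheet, the amount of DNA in each sample should be in the range of ~7-150 ng/μl. DNA concentrations outside this range are acceptable, but in the rare cases in which our own cDNA libraries have exceeded this range, we have chosen to repeat the final PCR amplification with an adjusted number of PCR cycles rather than submit the originally amplified RNA-seq library. The reason is that the user is trying to ensure that the final PCR stays within the linear range of amplification. Amplification to a concentration of 150 ng/μl is likely nearing the top of the linear range for the KAPA kit. Likewise, final library concentrations of < 7 ng/μl are acceptable, but in this case we choose to repeat the final PCR with a greater number of cycles rather than submit a low-concentration library for sequencing. Most commonly, the final concentration of our amplified libraries is between 10 and 50 ng/μl; this is our optimal target range.
19. Using the cDNA library preparation conditions specified above, users should expect the average size of the amplified DNA fragments in each library to be between 300 and 400 nucleotides.
20. Below is a numerical example: the concentration calculated in Step B5 (Procedure section) is 25 ng/μl. The average library size estimated in Step B8 (Procedure section) is 300 base pairs. The molarity of the library is 25 * (1 x 10^-9) / (1 x 10^-6) / (300 * 650) = 128 nM.
21. UNC's High-Throughput Sequencing Facility requires users to submit their pooled cDNA libraries at a final concentration of 15 nM. Thus, as an example, if we wish to include 12 independent cDNA libraries in a pool, each library should be at a final concentration in the pool of 1.25 nM, or [15 nM / 12]. An easy way to create a pool of the appropriate concentration is to first create an independent aliquot of each library at the final concentration of the pool, which in this example is 15 nM. Equal volumes of each library can then be combined to create the 15 nM pool.
22. To determine the maximum number of libraries that can be included in a pool such that each library is still sequenced to an appropriate depth, first determine the number of sequencing reads expected from a run on the Illumina sequencing instrument. For example, at UNC, an average 75-cycle high-output run on a NextSeq500 instrument returns 500 million reads. To obtain at least 20 million reads per cDNA library in a pool, the maximum number of libraries that we should include in the pool is 500/20, or 25 libraries. In practice, we typically include fewer libraries than this maximum, which results in read depths greater than 20 million reads per library. High read depth per sample is never a problem. Moreover, fewer than 20 million reads from a single library may also be tolerable; just be aware that as the number of reads decreases, so does your ability to confidently quantify the abundance of lowly expressed transcripts.
23. We performed our data analysis after obtaining 75-base single-end reads from an Illumina NextSeq500 instrument. Shorter, longer, or even paired-end reads would also be suitable.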
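24. See the github page associated with this protocol for example files and analysis templates in Excel and R (https://github.com/mschertzer/ercc_analysis).
25. For RNA-Seq alignment, we generally use STAR, but note that other aligners that support gapped alignment should perform equivalently (Baruzzo et al., 2017; Bushnell, 2010; Dobin et al., 2013).
26. If following the example provided on the github page, the fastq file associated with record SRR7685881 in the NCBI Sequence Read Archive (Leinonen et al., 2011) can be downloaded using the SRA Toolkit.
27. In the example for this protocol, we use the mm9 genome.fa and genes.gtf files compiled by the UCSC Genome Browser (Haeussler et al., 2019) and downloaded from Illumina's iGenomes website: https://support.illumina.com/sequencing/sequencing_software/igenome.html.
28. Users can also find these files on the github page associated with this protocol (https://github.com/mschertzer/ercc_analysis).
29. The "-s 2" option of featureCounts is specific to libraries prepared using methods that generate "reverse-stranded" data (e.g., the KAPA RNA HyperPrep Kit with RiboErase described in this protocol).
30. Steps B2 through B7 (Data analysis) have already been performed in the "ERCC Mix In" spreadsheet of the "ERCC_analysis_template.xlsx" template on the github page associated with this protocol (https://github.com/mschertzer/ercc_analysis).
31. In Schertzer et al. (2019), we added 2 μl of a 1:100 dilution of ERCC Mix 1 Spike-In RNA to 1 μg of total RNA. Thus, following our example, divide the ERCC Spike-In Mix concentrations in Column D by 100 and then multiply by 2 to calculate the number of attomoles added, as Column E.
32. In the template "ERCC_analysis_template.xlsx" provided on the github page, the second spreadsheet is called "fCounts data".
33. After sorting, the order in which each ERCC transcript appears should be the same in each spreadsheet.
34. For an example of MPC calculations across the entire transcriptome using the RNA-seq dataset from Schertzer et al. (2019), see the MPC spreadsheet in the "ERCC vprtta analysis example.xlsx" file on the github page associated with this protocol (https://github.com/mschertzer/ercc_analysis).
35. The result of this calculation is an estimate of the number of RNA molecules present in the pool of total RNA used to prepare the RNA-Seq cDNA library (in our case, 1 μg of total RNA).
36. In Schertzer et al. (2019), we estimated that our trophoblast stem cell line harbors 30 picograms of RNA per cell. Therefore, 1 μg of RNA corresponds to 33,333 cells.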
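The pooling arithmetic described in Notes 21 and 22 can be sketched in Python. This is a hypothetical helper set, not part of the protocol: the function names and the 20 μl aliquot volume are assumptions, while the 15 nM pool concentration, the 12-library pool, the 500 million reads per run, the 20 million reads per library, and the 128 nM example library come from Notes 20 to 22.

```python
def max_libraries(reads_per_run, min_reads_per_library=20_000_000):
    """Note 22: largest pool that still gives each library the minimum depth."""
    return reads_per_run // min_reads_per_library


def per_library_nM(pool_nM, n_libraries):
    """Note 21: concentration of each library within an equimolar pool."""
    return pool_nM / n_libraries


def dilution_volumes(stock_nM, target_nM, final_ul):
    """Volumes of library stock and diluent needed for an aliquot at target_nM."""
    if target_nM > stock_nM:
        raise ValueError("stock is more dilute than the target concentration")
    stock_ul = target_nM / stock_nM * final_ul
    return stock_ul, final_ul - stock_ul


print(max_libraries(500_000_000))          # libraries per NextSeq500 run
print(per_library_nM(15, 12))              # nM of each library in the pool
# ASSUMPTION: a 20 ul aliquot of the 128 nM library from Note 20.
print(dilution_volumes(128.0, 15.0, 20.0))
```

Combining equal volumes of the 15 nM aliquots then yields a 15 nM pool in which every library is equally represented.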


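Acknowledgments

We thank Keean Braceros for proofreading this protocol and Jackson Trotman for the gel image used in Figure 2. This work was supported by National Institutes of Health (NIH) Grant GM121806. Schertzer et al. (2019) is the original research paper from which this protocol was derived.

Competing interests

The authors have no competing interests to declare.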

 

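References

Baruzzo, G., Hayer, K. E., Kim, E. J., Di Camillo, B., FitzGerald, G. A. and Grant, G. R. (2017). Simulation-based comprehensive benchmarking of RNA-seq aligners. Nat Methods 14(2): 135-139.
Bushnell, B. (2010). BBMap (sourceforge.net/projects/bbmap/).
Cabili, M. N., Dunagin, M. C., McClanahan, P. D., Biaesch, A., Padovan-Merhar, O., Regev, A., Rinn, J. L. and Raj, A. (2015). Localization and abundance analysis of human lncRNAs at single-cell and single-molecule resolution. Genome Biol 16: 20.
Cabili, M. N., Trapnell, C., Goff, L., Koziol, M., Tazon-Vega, B., Regev, A. and Rinn, J. L. (2011). Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev 25(18): 1915-1927.
Calabrese, J. M., Seila, A. C., Yeo, G. W. and Sharp, P. A. (2007). RNA sequence analysis defines Dicer's role in mouse embryonic stem cells. Proc Natl Acad Sci U S A 104(46): 18097-18102.
Calabrese, J. M., Sun, W., Song, L., Mugford, J. W., Williams, L., Yee, D., Starmer, J., Mieczkowski, P., Crawford, G. E. and Magnuson, T. (2012). Site-specific silencing of regulatory elements as a mechanism of X inactivation. Cell 151(5): 951-963.
Cerase, A., Smeets, D., Tang, Y. A., Gdula, M., Kraus, F., Spivakov, M., Moindrot, B., Leleu, M., Tattermusch, A., Demmerle, J., Nesterova, T. B., Green, C., Otte, A. P., Schermelleh, L. and Brockdorff, N. (2014). Spatial separation of Xist RNA and polycomb proteins revealed by superresolution microscopy. Proc Natl Acad Sci U S A 111(6): 2235-2240.
Chen, J., Shishkin, A. A., Zhu, X., Kadri, S., Maza, I., Guttman, M., Hanna, J. H., Regev, A. and Garber, M. (2016). Evolutionary analysis across mammals reveals distinct classes of long non-coding RNAs. Genome Biol 17: 19.
Chujo, T. and Hirose, T. (2017). Nuclear bodies built on architectural long noncoding RNAs: unifying principles of their construction and function. Mol Cells 40(12): 889-896.
Dobin, A., Davis, C. A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M. and Gingeras, T. R. (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29(1): 15-21.
Gil, N. and Ulitsky, I. (2020). Regulation of gene expression by cis-acting long non-coding RNAs. Nat Rev Genet 21(2): 102-117.
Haeussler, M., Zweig, A. S., Tyner, C., Speir, M. L., Rosenbloom, K. R., Raney, B. J., Lee, C. M., Lee, B. T., Hinrichs, A. S., Gonzalez, J. N., Gibson, D., Diekhans, M., Clawson, H., Casper, J., Barber, G. P., Haussler, D., Kuhn, R. M. and Kent, W. J. (2019). The UCSC Genome Browser database: 2019 update. Nucleic Acids Res 47(D1): D853-D858.
Hezroni, H., Koppstein, D., Schwartz, M. G., Avrutin, A., Bartel, D. P. and Ulitsky, I. (2015). Principles of long noncoding RNA evolution derived from direct comparison of transcriptomes in 17 species. Cell Rep 11(7): 1110-1122.
Hrdlickova, R., Toloue, M. and Tian, B. (2017). RNA-Seq methods for transcriptome analysis. Wiley Interdiscip Rev RNA 8(1).
Jiang, L., Schlesinger, F., Davis, C. A., Zhang, Y., Li, R., Salit, M., Gingeras, T. R. and Oliver, B. (2011). Synthetic spike-in standards for RNA-seq experiments. Genome Res 21(9): 1543-1551.
Kopp, F. and Mendell, J. T. (2018). Functional classification and experimental dissection of long noncoding RNAs. Cell 172(3): 393-407.
Kutter, C., Watt, S., Stefflova, K., Wilson, M. D., Goncalves, A., Ponting, C. P., Odom, D. T. and Marques, A. C. (2012). Rapid turnover of long noncoding RNAs and the evolution of gene expression. PLoS Genet 8(7): e1002841.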
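Leinonen, R., Sugawara, H., Shumway, M. and International Nucleotide Sequence Database Collaboration. (2011). The sequence read archive. Nucleic Acids Res 39(Database issue): D19-21.
Levin, J. Z., Yassour, M., Adiconis, X., Nusbaum, C., Thompson, D. A., Friedman, N., Gnirke, A. and Regev, A. (2010). Comprehensive comparative analysis of strand-specific RNA sequencing methods. Nat Methods 7(9): 709-715.
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R. and 1000 Genome Project Data Processing Subgroup. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25(16): 2078-2079.
Liao, Y., Smyth, G. K. and Shi, W. (2014). featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30(7): 923-930.
Necsulea, A., Soumillon, M., Warnefors, M., Liechti, A., Daish, T., Zeller, U., Baker, J. C., Grutzner, F. and Kaessmann, H. (2014). The evolution of lncRNA repertoires and expression patterns in tetrapods. Nature 505(7485): 635-640.
Ninomiya, K. and Hirose, T. (2020). Short tandem repeat-enriched architectural RNAs in nuclear bodies: functions and associated diseases. Noncoding RNA 6(1).
Parkhomchuk, D., Borodina, T., Amstislavskiy, V., Banaru, M., Hallen, L., Krobitsch, S., Lehrach, H. and Soldatov, A. (2009). Transcriptome analysis by strand-specific sequencing of complementary DNA. Nucleic Acids Res 37(18): e123.
Raj, A. and Rinn, J. L. (2019). Illuminating genomic dark matter with RNA imaging. Cold Spring Harb Perspect Biol 11(5).
RStudio_Team. (2015). RStudio: Integrated Development for R.
Schertzer, M. D., Braceros, K. C. A., Starmer, J., Cherney, R. E., Lee, D. M., Salazar, G., Justice, M., Bischoff, S. R., Cowley, D. O., Ariel, P., Zylka, M. J., Dowen, J. M., Magnuson, T. and Calabrese, J. M. (2019). lncRNA-induced spread of Polycomb controlled by genome architecture, RNA abundance, and CpG island DNA. Mol Cell 75(3): 523-537 e510.
Schuler, A., Ghanbarian, A. T. and Hurst, L. D. (2014). Purifying selection on splice-related motifs, not expression level nor RNA folding, explains nearly all constraint on human lincRNAs. Mol Biol Evol 31(12): 3164-3183.
Schwaber, J., Andersen, S. and Nielsen, L. (2019). Shedding light: The importance of reverse transcription efficiency standards in data interpretation. Biomol Detect Quantif 17: 100077.
Smeets, D., Markaki, Y., Schmid, V. J., Kraus, F., Tattermusch, A., Cerase, A., Sterr, M., Fiedler, S., Demmerle, J., Popken, J., Leonhardt, H., Brockdorff, N., Cremer, T., Schermelleh, L. and Cremer, M. (2014). Three-dimensional super-resolution microscopy of the inactive X chromosome territory reveals a collapse of its active nuclear compartment harboring distinct Xist RNA foci. Epigenetics Chromatin 7: 8.
Sunwoo, H., Wu, J. J. and Lee, J. T. (2015). The Xist RNA-PRC2 complex at 20-nm resolution reveals a low Xist stoichiometry and suggests a hit-and-run mechanism in mouse cells. Proc Natl Acad Sci U S A 112(31): E4216-4225.
Tsanov, N., Samacoits, A., Chouaib, R., Traboulsi, A. M., Gostan, T., Weber, C., Zimmer, C., Zibara, K., Walter, T., Peter, M., Bertrand, E. and Mueller, F. (2016). smiFISH and FISH-quant - a flexible single RNA detection approach with super-resolution capability. Nucleic Acids Res 44(22): e165.
Ulitsky, I. (2016). Evolution to the rescue: using comparative genomics to understand long non-coding RNAs. Nat Rev Genet 17(10): 601-614.
Washietl, S., Kellis, M. and Garber, M. (2014). Evolutionary dynamics and tissue specificity of human long noncoding RNAs in six mammals. Genome Res 24(4): 616-628.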
Copyright: © 2020 The Authors; exclusive licensee Bio-protocol LLC.
Citation: Schertzer, M. D., Murvin, M. M. and Calabrese, J. M. (2020). Using RNA Sequencing and Spike-in RNAs to Measure Intracellular Abundance of lncRNAs and mRNAs. Bio-protocol 10(19): e3772. DOI: 10.21769/BioProtoc.3772.