Jun 2020


EmPC-seq: Accurate RNA-sequencing and Bioinformatics Platform to Map RNA Polymerases and Remove Background Error


Abstract

Transcription errors can substantially affect metabolic processes in organisms by altering the epigenome and causing misincorporations in mRNA, which is translated into aberrant mutant proteins. Moreover, eukaryotic genomes contain specific Transcription Error-Enriched genomic Loci (TEELs), which are transcribed by RNA polymerases with significantly higher error rates and are hypothesized to have implications in cancer, aging, and diseases such as Down syndrome and Alzheimer’s disease. Therefore, research into transcription errors is of growing importance within the field of genetics. Nevertheless, methodological barriers limit progress in accurately identifying transcription errors. PRO-seq and NET-seq can purify nascent RNA and map RNA polymerases along the genome but cannot be used to identify transcriptional mutations. Here we present background Error Model-coupled Precision nuclear run-on Circular-sequencing (EmPC-seq), a method combining a nuclear run-on assay and circular sequencing with a background error model to precisely detect nascent transcription errors and effectively discern TEELs within the genome.

Keywords: Transcriptional mutagenesis, RNA polymerase, Nascent RNA, Deep RNA sequencing, Accurate RNA sequencing

Background

Transcriptional errors due to ribonucleotide misincorporation are ubiquitous in all living organisms (Carey, 2015). Given that each messenger RNA (mRNA) can be translated 2,000-4,000 times (Schwanhausser et al., 2011) and that many RNAs are expressed only once per cell at a given time (Islam et al., 2011; Pelechano et al., 2010), even a single transcription error at a critical residue can make a large difference to a specific protein’s expression. In addition, transcriptional errors can accelerate protein aggregation, leading to age-related diseases in humans (van Leeuwen et al., 1998). While transcription errors are conventionally held to be randomly distributed across the genome, there is evidence that they could be enriched at certain structural motifs and specific genomic regions (Imashimizu et al., 2013; van Leeuwen et al., 1998). These Transcription Error-Enriched genomic Loci (TEELs) have notable biological significance in various diseases such as Down syndrome and Alzheimer’s disease and are gaining attention in genetics research (Burns et al., 2010; Saxowsky et al., 2008). Unfortunately, studying transcriptional errors made by RNA polymerase, unconfounded by RNA-editing processes such as post-transcriptional modifications, poses major challenges. It requires purification of nascent RNA coupled with a highly accurate RNA sequencing method that can identify TEELs and elucidate the transcriptional regulation and dysregulation that contribute to transcriptional errors with implications for disease.


Several complications impede the accurate detection of de novo transcription errors. The first challenge is eliminating the noise from post-transcriptional modifications, which requires the purification of nascent RNA freshly made by RNA polymerases. However, current RNA sequencing (RNA-seq) studies on transcriptional errors often overlook this requirement and therefore overestimate transcription error rates. The second challenge is rectifying the systematic noise from Next Generation Sequencing (NGS). NGS on average misreads approximately one base in every 1,000 (Minoche et al., 2011), and this is compounded by reverse transcriptase (required for generating cDNA for NGS), which misincorporates approximately one base in every 10,000 (Ji and Loeb, 1992). The third challenge is computationally discerning TEELs from background noise: even with accurate sequencing data, it is difficult to identify TEELs amongst the background errors that RNA polymerases introduce stochastically (de Mercoyrol et al., 1992). Here, we present our background Error Model-coupled Precision nuclear run-on Circular-sequencing (EmPC-seq) method (Figure 1) to overcome these three challenges. EmPC-seq consists of three core components: (1) a nuclear run-on assay to capture nascent RNA before post-transcriptional modifications (Mahat et al., 2016); (2) a circular-resequencing step in which rolling-circle reverse transcription of circularized nascent RNA molecules generates tandem cDNA repeats, so that each RNA molecule is sequenced multiple times and sequencing accuracy improves (Acevedo and Andino, 2014); and (3) a background error model that removes stochastic background noise by simulating de novo sequencing data and the subsequent errors, which serve as a control group carrying background alterations from sequencing noise, non-uniform sequencing depth, and alignment artifacts (Cheung et al., 2020). EmPC-seq aims to detect nascent transcriptional errors and elucidate their origins, which may have implications for disease.
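The benefit of sequencing the same RNA molecule several times can be illustrated with a toy majority-vote consensus. The sketch below (written for Python 3) assumes the repeat length is already known and that the tandem copies are perfectly in register; the function name `consensus_from_repeats` and the example sequences are hypothetical, and this is not the EmPC-seq implementation, which also has to find the repeat period and cope with insertions and deletions.

```python
from collections import Counter


def consensus_from_repeats(read, repeat_len, min_copies=3):
    """Majority-vote consensus over tandem copies of one RNA fragment.

    Assumes the repeat length is known and copies are in register
    (no insertions or deletions); both are simplifying assumptions.
    Returns None when fewer than `min_copies` full copies are present.
    """
    n_copies = len(read) // repeat_len
    if n_copies < min_copies:
        return None
    copies = [read[i * repeat_len:(i + 1) * repeat_len] for i in range(n_copies)]
    consensus = []
    for column in zip(*copies):
        base, count = Counter(column).most_common(1)[0]
        # Keep a base only if a strict majority of copies agree; else mask it.
        consensus.append(base if count * 2 > n_copies else "N")
    return "".join(consensus)


if __name__ == "__main__":
    unit = "ACGTTGCA"
    # The second copy carries a simulated sequencing error (T -> G at index 3).
    read = unit + "ACGGTGCA" + unit
    print(consensus_from_repeats(read, len(unit)))
```

Under this simplified picture, a genuine transcription error is present in the RNA template and therefore appears in every copy and survives the vote, whereas an error introduced during reverse transcription or sequencing typically affects a single copy and is outvoted.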



Figure 1. Schematic of EmPC-seq. Real transcription errors are represented using orange dots. Dots in other colors represent systematic noise, including enzymatic errors and sequencing errors. (Step 1) Yeast cell is permeabilized. (Step 2) In vivo transcription is halted by adding all 4 kinds of biotinylated NTPs during the Nuclear Run-on assay. (Step 3) Yeast total RNA is extracted and purified via ethanol precipitation. (Step 4) RNA is fragmented with base hydrolysis into short (60-100nt) RNAs. (Step 5) Biotin-labeled nascent RNA is enriched through Streptavidin bead purification. (Step 6) Re-purified nascent RNAs are circularized by RNA ligase and processed into tandem copy cDNAs through rolling circle reverse transcription. (Step 7) Library DNA is prepared with a kit and then submitted for Next Generation Sequencing. (Step 8) Transcription errors are accurately detected by combining consensus sequence results with our background error model. This schematic is adapted from Cheung et al. (2020).


Materials and Reagents

  1. 1.5 ml tubes

  2. Pipette tips

  3. Cuvettes

  4. 0.22 μm filter

  5. W303 yeast cells (GenBank Number: JRIU00000000)

  6. 1 M Dithiothreitol (DTT, ThermoFisher, catalog number: P2325)

  7. Diethyl pyrocarbonate (Sigma, catalog number: 40718)

  8. UltraPure™ DNase/RNase-Free Distilled Water (ThermoFisher, catalog number: 10977015)

  9. Yeast Extract (Sigma, catalog number: Y1625)

  10. Peptone (Sigma, catalog number: P0556)

  11. D-(+)-Glucose (Sigma, catalog number: G8270)

  12. Adenine (Sigma, catalog number: A8626)

  13. N-Lauroylsarcosine sodium salt, Sarkosyl (Sigma, catalog number: L9150)

  14. Trizma® hydrochloride (Sigma, catalog number: T5941)

  15. Potassium chloride (Sigma, catalog number: P9333)

  16. Magnesium chloride (Sigma, catalog number: M8266)

  17. Biotinylated Nucleotides (Jena Bioscience, catalog number: NU series)

  18. RNase Inhibitor, Murine (NEB, catalog number: M0314L)

  19. Diethyl pyrocarbonate, DEPC (Sigma, catalog number: 40718)

  20. Liquified Phenol (Sigma, catalog number: P9346)

  21. Sodium acetate (Sigma, catalog number: S2889)

  22. Ethylenediaminetetraacetic acid, EDTA (Sigma, catalog number: EDS)

  23. Sodium dodecyl sulfate (Sigma, catalog number: L3771)

  24. Chloroform (Sigma, catalog number: C2432)

  25. GlycoBlue™ Coprecipitant (ThermoFisher, catalog number: AM9515)

  26. Ethyl alcohol, Pure (Sigma, catalog number: E7023)

  27. Sodium hydroxide (Sigma, catalog number: S8045)

  28. Triton™ X-100 (Sigma, catalog number: X100)

  29. Monarch® RNA Cleanup Kit (NEB, catalog number: T2030L)

  30. Dynabeads™ M-280 Streptavidin (ThermoFisher, catalog number: 60210)

  31. Sodium chloride (Sigma, catalog number: S7653)

  32. TRIzol™ Reagent (ThermoFisher, catalog number: 15596018)

  33. Ambion™ T4 RNA Ligase (ThermoFisher, catalog number: AM2141)

  34. T4 Polynucleotide Kinase (NEB, catalog number: M0201S)

  35. Polyethylene glycol 8000 (Sigma, catalog number: 1546605)

  36. Adenosine 5'-Triphosphate, ATP (NEB, catalog number: P0756S)

  37. dNTP Mix (ThermoFisher, catalog number: 18427088)

  38. Random Hexamer Primer (ThermoFisher, catalog number: SO142)

  39. SuperScript™ III First-Strand Synthesis System (ThermoFisher, catalog number: 18080051)

  40. NEBNext® Ultra™ II Directional RNA Second Strand Synthesis Module (NEB, catalog number: E7550L)

  41. MinElute PCR Purification Kit (QIAGEN, catalog number: 28004)

  42. NEBNext® Ultra™ II DNA Library Prep Kit for Illumina® (NEB, catalog number: E7645L)

  43. Qubit™ dsDNA HS Assay Kit (ThermoFisher, catalog number: Q32851)

  44. YEPD medium (see Recipes)

  45. 2.5× Transcription buffer (see Recipes)

  46. AES Buffer (see Recipes)

  47. Beads washing buffer (see Recipes)

  48. Binding washing buffer (see Recipes)

  49. Low Salt washing buffer (see Recipes)

  50. High Salt washing buffer (see Recipes)

  51. DEPC-H2O (see Recipes)

  52. 1 M sodium acetate solution (see Recipes)

Equipment

  1. Eppendorf® Research® Plus Pipettes (Eppendorf, catalog number: EP series)

  2. MaxQ™ 6000 Incubated/Refrigerated Stackable Shakers (ThermoFisher, catalog number: SHKE6000)

  3. Eppendorf BioPhotometer® (Eppendorf, model: D30)

  4. Megafuge® (Heraeus, model: 1.0R)

  5. 5 Liter General Purpose Water Bath (PolyScience, catalog Number: WBE05A11B)

  6. NEBNext® Magnetic Separation Rack (NEB, catalog number: S1515S)

  7. Roto-Shake Genie® (Zymo Research, catalog number: S5008)

  8. ProFlex PCR System (ThermoFisher, catalog number: 4484075)

  9. 5200 Fragment Analyzer System (Agilent, catalog number: M5310AA)

Software

  1. ProSize (Agilent, https://explore.agilent.com/Software-Download-Fragment-Analyzer-Prosize)

  2. Python (version 2.7.12, https://www.python.org/)

  3. Cython (version 0.23.4, https://cython.org/)

  4. NumPy (version 1.11.0, https://numpy.org/)

  5. SciPy (version 0.17.0, https://www.scipy.org/)

  6. Burrows-Wheeler Aligner (version 0.7.17-r1188, http://bio-bwa.sourceforge.net/)

  7. samtools (version 1.9, http://www.htslib.org/)

  8. pysam (version 0.15.0, https://pysam.readthedocs.io/en/latest/installation.html)

  9. matplotlib (version 2.2.2, https://matplotlib.org/)

Procedure

  1. Prepare yeast cells for nuclear run-on assay

    1. Inoculate a single yeast colony in 5 ml YEPD medium.

    2. Incubate overnight in a 30 °C incubator with shaking at 200 rpm.

    3. Measure the optical density at 600 nm (OD600) of each culture.

    4. Dilute the culture into 10 ml of fresh YEPD medium to an OD600 of 0.2.

    5. Incubate the yeast cells in a 30 °C incubator until they reach an OD600 of 0.4-0.6.


  2. Yeast nuclear run-on assay

    1. Centrifuge the yeast cells at 2,000 × g for 5 min at 4 °C and draw off the supernatant.

    2. Wash the cell pellet with 5 ml of ice-cold DEPC-H2O by pipetting up and down.

    3. Re-centrifuge the yeast cells at 2,000 × g for 5 min at 4 °C and draw off the supernatant.

    4. Resuspend the cell pellet with 4.75 ml of ice-cold DEPC-H2O.

    5. Add 250 μl of 10% Sarkosyl solution and gently mix the solution.

    6. Hold on ice for 20 min.

    7. Centrifuge the yeast cells at 400 × g for 5 min at 4 °C and draw off the supernatant.

    8. Resuspend the cell pellet with 120 μl of 2.5× Transcription Buffer, 6 μl of 0.1 M DTT, 3.5 μl of each of the four 1 mM biotin-NTPs, and 3 μl of RNase Inhibitor, for a total volume of 143 μl.

    9. Add 142 μl DEPC-H2O and 15 μl of 10% Sarkosyl and gently mix the solution.

    10. Incubate the mixture at 30 °C for 5 min, gently pipetting up and down once at the halfway point (2.5 min).
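The volumes in the run-on steps above can be checked with a few lines of arithmetic. The sketch below adds up the listed volumes and applies the usual dilution formula to estimate final concentrations; it assumes the volume of the cell pellet is negligible, and the helper name `final_conc` is hypothetical.

```python
# Volumes (in microliters) as listed in the run-on steps above.
premix = {
    "2.5x transcription buffer": 120.0,
    "0.1 M DTT": 6.0,
    "biotin-NTPs (4 x 3.5 ul of 1 mM each)": 4 * 3.5,
    "RNase inhibitor": 3.0,
}
additions = {"DEPC-H2O": 142.0, "10% Sarkosyl": 15.0}

premix_volume = sum(premix.values())
total_volume = premix_volume + sum(additions.values())


def final_conc(stock, volume_ul, total_ul):
    """Dilution formula: C_final = C_stock * V_added / V_total."""
    return stock * volume_ul / total_ul


buffer_x = final_conc(2.5, 120.0, total_volume)          # buffer strength, in "x"
dtt_mM = final_conc(100.0, 6.0, total_volume)            # 0.1 M stock = 100 mM
each_ntp_uM = final_conc(1000.0, 3.5, total_volume)      # 1 mM stock = 1000 uM
sarkosyl_pct = final_conc(10.0, 15.0, total_volume)      # percent

print(premix_volume, total_volume)
print(buffer_x, dtt_mM, round(each_ntp_uM, 2), sarkosyl_pct)
```

With these numbers the premix comes to 143 μl and the full reaction to 300 μl, which brings the 2.5× Transcription Buffer to 1× and the Sarkosyl to 0.5%, neglecting the pellet volume.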


  3. Nascent RNA extraction

    1. Centrifuge the reaction mixture at 400 × g for 5 min at 4 °C and draw off the supernatant.

    2. Quickly resuspend the pellet in 500 μl of Liquified phenol.

    3. Add 500 μl of AES Buffer and pipette up and down to rupture cells.

    4. Incubate the mixture at 65 °C for 5 min, vortexing once every minute.

    5. Incubate the mixture on ice for 5 min.

    6. Add 200 μl of chloroform to it and vortex for 30 s.

    7. Incubate the mixture at room temperature for 2 min.

    8. Centrifuge it at 14,000 × g for 5 min at 4 °C.

    9. Draw off the aqueous layer and transfer it to a new tube (be careful to avoid the interface).

    10. Add 1 M sodium acetate solution to the aqueous layer to a final concentration of 200 mM.

    11. Split the aqueous RNA solution mixed with sodium acetate into two 1.5 ml tubes.

    12. Add 1 μl of GlycoBlue™ Coprecipitant and 3× volume of 100% ethanol to each of the two 1.5 ml tubes from the previous step (this mixture can be stored overnight).

    13. Centrifuge the mixture at 14,000 × g for 30 min at 4 °C and draw off the supernatant (be careful to avoid disturbing the blue pellet).

    14. Wash the pellet with freshly prepared RNase-free 75% ethanol.

    15. Centrifuge it at 14,000 × g for 5 min at 4 °C and draw off the supernatant.

    16. Let the pellet dry for 5 min and resuspend it in 20 μl of DEPC-H2O.
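The sodium acetate and ethanol volumes in the precipitation steps above depend on how much aqueous phase is recovered. The sketch below solves the dilution equation for the stock volume to add; the 400 μl aqueous volume is a hypothetical example rather than a value from the protocol, and `stock_volume_to_add` is an illustrative helper name.

```python
def stock_volume_to_add(sample_ul, stock_mM, final_mM):
    """Volume of stock to add so that the mixture reaches `final_mM`.

    Solves stock_mM * v = final_mM * (sample_ul + v) for v.
    """
    if final_mM >= stock_mM:
        raise ValueError("final concentration must be below the stock concentration")
    return sample_ul * final_mM / (stock_mM - final_mM)


if __name__ == "__main__":
    aqueous_ul = 400.0  # hypothetical recovered aqueous phase, not from the protocol
    naoac_ul = stock_volume_to_add(aqueous_ul, stock_mM=1000.0, final_mM=200.0)
    per_tube_ul = (aqueous_ul + naoac_ul) / 2  # split into two 1.5 ml tubes
    ethanol_ul = 3 * per_tube_ul               # 3x volume of 100% ethanol
    print(naoac_ul, per_tube_ul, ethanol_ul, per_tube_ul + ethanol_ul)
```

For a 1 M stock and a 200 mM target, the stock volume is one quarter of the sample volume. In this hypothetical case each tube ends up holding about 1 ml after ethanol is added, which is consistent with splitting the sample across two 1.5 ml tubes.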


  4. RNA fragmentation by Base Hydrolysis

    1. Heat denature the RNA solution at 65 °C for 40 s and place it on ice.

    2. Add 5 μl of ice-cold 1 N NaOH and incubate it on ice for 10 min.

    3. Add 25 μl of 1 M Tris-HCl (pH 6.8).

    4. Purify the RNA with Monarch® RNA Cleanup Kit (10 μg) with elution volume of 20 μl.

    5. Add 1 μl RNase inhibitor.


  5. Biotin-labeled nascent RNA pull-out

    1. Wash 30 μl of Streptavidin M280 beads with 500 μl of beads washing buffer by adding and mixing the washing buffer with the beads, setting on a magnet for 1 min and drawing off the supernatant.

    2. Wash the beads twice with 500 μl of 100 mM NaCl solution, using the same procedure as in the previous step.

    3. Resuspend the beads in 50 μl of binding washing buffer.

    4. Bring the purified RNA from the RNA fragmentation section to 50 μl with binding washing buffer.

    5. Mix the RNA solution with the resuspended beads.

    6. Incubate the mixture at room temperature on a rotator for 20 min.

    7. Place the mixture on a magnet for 1 min and draw off the supernatant.

    8. Resuspend the beads in 500 μl of high salt washing buffer.

    9. Place the beads on a magnet for 1 min and draw off the supernatant.

    10. Repeat the high salt wash (the two preceding steps).

    11. Wash the beads twice with 500 μl of binding washing buffer.

    12. Wash the beads twice with 500 μl of low salt washing buffer.

    13. Resuspend the beads in 300 μl of TRIzol™ solution.

    14. Incubate the mixture on ice for 3 min.

    15. Add 60 μl of chloroform to the mixture and vortex it thoroughly for at least 20 s.

    16. Centrifuge the mixture at 20,000 × g for 5 min at 4 °C.

    17. Purify the RNA in the aqueous layer (be careful to avoid the interface) with Monarch® RNA Cleanup Kit (10 μg) with an elution volume of 20 μl.


  6. Circularization of RNA sample

    1. Heat denature the purified RNA at 65 °C for 1 min and place it on ice.

    2. Add the following reagents to 19 μl of the RNA solution: 4 μl of 10× T4 ligase I reaction buffer, 2 μl of T4 Ligase I enzyme, 2 μl of PNK enzyme, 1 μl of RNase inhibitor, 8 μl of 50% PEG8000 and 4 μl of 10 mM ATP solution.

    3. Incubate the mixture at 25 °C for 2 h or at 16 °C overnight.

    4. Purify the RNA in the mixture with Monarch® RNA Cleanup Kit (10 μg) with elution volume of 20 μl.


  7. Rolling-circle reverse transcription

    1. Add the following reagents to 9 μl of the RNA solution: 4 μl of 10 mM dNTPs solution, 4 μl of 50 ng/μl Random Hexamers and 3 μl of RNase-free water.

    2. Heat denature the reaction mix at 65 °C for 1 min and place it on ice for more than 2 min.

    3. Add the following materials to the reaction mix: 8 μl of 5× First-strand synthesis buffer, 4 μl of 0.1 M DTT solution, 4 μl of SuperScript III enzyme and 8 μl of water.

      Note: The buffers and enzymes used here are from the SuperScript™ III First-Strand Synthesis System Kit.

    4. Incubate the mixture at 25 °C for 10 min and then incubate at 42 °C for 20 min.

    5. Purify the RNA in the mixture with Monarch® RNA Cleanup Kit (10 μg) with an elution volume of 20 μl.

    6. Add 20 μl of RNase-free water to the RNA solution.


  8. Second strand synthesis

    1. Combine 38 μl of the RNA solution with the following materials: 8 μl of 10× Second strand synthesis buffer, 4 μl of Second strand synthesis enzyme mix and 30 μl of RNase-free water.

      Note: The buffers and enzymes used here are from the NEBNext® Ultra™ II Directional RNA Second Strand Synthesis Module Kit.

    2. Incubate the reaction mix at 16 °C for 2 h with the thermocycler lid set at 50 °C.

    3. Purify the DNA in the mixture with the MinElute PCR Purification Kit.


  9. Library preparation and submitting for NGS

    1. Measure the concentration of the cDNA purified at the end of second strand synthesis using the Qubit™ dsDNA HS Assay Kit.

    2. Prepare the sequencing library with the NEBNext® Ultra™ II DNA Library Prep Kit for Illumina® according to the manufacturer’s instructions (end preparation, adaptor ligation, size selection and PCR enrichment).

    3. Submit the cDNA library for Next Generation Sequencing on the Illumina MiSeq platform with single-end 300 bp reads.
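The relationship between fragment size and read length determines how many tandem copies of a fragment a single read can contain. The sketch below is a rough upper bound only: it ignores adaptor sequence, quality trimming, and whether the rolling-circle product actually contains that many copies. The helper name `full_repeats` is hypothetical.

```python
def full_repeats(read_len_nt, fragment_len_nt):
    """Number of complete tandem copies of a fragment that fit in one read."""
    if fragment_len_nt <= 0:
        raise ValueError("fragment length must be positive")
    return read_len_nt // fragment_len_nt


if __name__ == "__main__":
    read_len = 300  # single-end read length used in this protocol
    for fragment_len in (60, 80, 100, 150, 200):
        print(fragment_len, full_repeats(read_len, fragment_len))
```

Under these assumptions, fragments of 60-100 nt allow at least three complete copies per 300-nt read, while fragments of 150-200 nt allow at most two, which is why the fragment size matters for consensus building.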

Data analysis

  1. Use Ubuntu 16.04 LTS (Xenial Xerus) for running the script.

  2. Download the script from GitHub (https://github.com/ustsam/Em-PC_seq).

  3. Unzip the file.

  4. Make sure all software listed in the Software section is installed.

  5. Open a terminal and enter the script directory.

  6. Compile the code by typing “python setup_newreloc.py build_ext --inplace” in a terminal.

  7. Call the function using the command in the terminal:


    “./run_noQsfilter.sh {PATH to the output directory} {PATH to the reference file} {PATH to the script directory} DUMMY 2 {twice the maximum read length} {PATH to the gzipped data file}”


  8. The file “data.sam.gz” can be found in the output directory. It contains all the transcripts after the consensus generation step, in compressed SAM format. To decompress the file, type the following command in a Linux terminal:

    gunzip data.sam.gz


  9. Run the command in the output directory to perform data analysis:


    “bash data_analysis.sh {PATH to the output directory} {PATH to the reference file} {PATH to the script directory} 1 {maximum depth per site} {minimum base quality} {minimum mapping quality} {number of simulation fastq files generated}”


    Ambiguity is defined as the number of ways to map a transcript to the reference genome (Cheung et al., 2020). We used the strictest threshold, which is ambiguity=1, minimum mapping quality=30, and minimum base quality=30.

  10. Run the following command to reproduce the analysis (assuming the script and output directory, as well as the reference fasta file are in the current directory):


    “bash data_analysis.sh ./ ./rDNA1.fasta ./ 1 500000 30 30 100”


  11. Run the following command to preprocess the output from “data_analysis.sh” script for plotting:


    “bash plot.sh {PATH to the output directory} {PATH to the reference file} {PATH to the script directory} {ambiguity threshold} {maximum depth per site} {minimum base quality} {minimum mapping quality} 1”


  12. The plot.sh script outputs the following figures (.png format):

    1. “Distribution_NumberOfWaysToMap.png”: the distribution of the number of ways the transcripts can be mapped to the reference genome (ambiguity). The y-axis is the number of transcripts and the x-axis is ambiguity. An example is shown below (Figure 2).



      Figure 2. An example of a figure of the percentage of transcripts with a particular ambiguity. The ambiguity is the number of ways a transcript can be mapped to the reference genome. Ambiguity arises from the lack of information about the start of transcription due to the circularization step during the experiment.


    2. “MutationTypeSpectrum.png”: The mutational frequency for each type of mutation in the RNA transcript. The mutational frequency is the number of errors divided by the coverage of the corresponding reference base. For example: the number of A → C errors divided by the coverage of base A. An example is shown below (Figure 3).



      Figure 3. An example of a figure of the mutational spectrum. The mutational frequency of each type of substitution on the RNA chain is shown.


    3. “Muta_Frequency_inChrom_***.png”: The mutational frequency along the sites in the chromosome for the experimental and simulation data. “***” is the name of the chromosome. The transcription error-enriched genomic loci (TEELs) are shown as red dots. An example is shown below (Figure 4).



      Figure 4. An example of a figure of the mutational frequency across the genomic sites. The mutational frequency for each site in the chromosome is shown as the red lines. The background error obtained by simulation is shown as the gray line. The sites that are identified as TEEL are shown as red dots.


    4. “ErrorRate_per_PositionInTranscripts.png”: The error rate at each position in the transcript. Position 0 corresponds to the 3’ end of the transcript. An example is shown below (Figure 5).



      Figure 5. An example of a figure of the error rate at each positional site in the transcript. The error rate for each position from the experimental data is shown as red dots. The error rate for each position from the simulation data is shown as blue dots. The x-axis is the position in the transcript, where position 0 corresponds to the 3’ end of the transcript. The increased error rate could be correlated with pausing of the RNA Polymerase, yet the order of events could not be determined in this experiment. Two scenarios could fit the observed data: either RNA Polymerase pauses to fix the error, or RNA Polymerase pauses due to the misincorporation.


  13. If one would prefer to use other software tools, the following output text files can be used to plot the figures:

    1. “Distribution_of_Ambiguity.txt” can be used to plot the figure described in step 12, item 1.

    2. “MutationTypeSpectrum.txt” can be used to plot the figure described in step 12, item 2.

    3. “MutationalFrequency_Exp_chrom_***.txt” and “MutationalFrequency_Sim_chrom_***.txt” can be used to plot the figure described in step 12, item 3. They contain the mutational frequencies per site in a chromosome for the experimental and simulation data, respectively. “MutationalFrequency_TEEL_chrom_***.txt” contains the mutational frequency of the sites considered as TEELs. In these files, the first column is the position in the chromosome and the second column is the mutational frequency.

    4. “MutationalFrequency_PerPositionInTranscript.txt” can be used to plot the figure described in step 12, item 4. The first column is the position in the transcript. The second and third columns are the average and standard deviation of the simulation data. The fourth and fifth columns are the average and standard deviation of the experimental data. The average and standard deviation of the experimental data are calculated from the binomial distribution of the error rate estimated by maximum likelihood.
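For readers who want to load the per-site text files into their own plotting tool, a minimal parsing sketch is given below (written for Python 3, whereas the pipeline itself runs on Python 2.7). It assumes the files are whitespace-delimited with integer positions in the first column, which should be verified against an actual output file; the function names and the numbers in the example are made up for illustration.

```python
import io


def read_two_columns(handle):
    """Parse 'position frequency' rows into a {position: frequency} dict.

    Assumes whitespace-delimited columns; blank lines and lines starting
    with '#' are skipped. The delimiter is an assumption, so check a file first.
    """
    table = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        position, frequency = line.split()[:2]
        table[int(position)] = float(frequency)
    return table


def experiment_vs_background(exp, sim):
    """Pair experimental and simulated frequencies at positions present in both."""
    return [(pos, exp[pos], sim[pos]) for pos in sorted(exp) if pos in sim]


if __name__ == "__main__":
    # Made-up values standing in for the Exp and Sim files of one chromosome.
    exp_file = io.StringIO("101 2.0e-4\n102 5.0e-6\n103 1.0e-5\n")
    sim_file = io.StringIO("101 1.0e-5\n102 4.0e-6\n")
    rows = experiment_vs_background(read_two_columns(exp_file), read_two_columns(sim_file))
    for pos, exp_freq, sim_freq in rows:
        print(pos, exp_freq, sim_freq)
```

The paired rows can then be passed to any plotting library to overlay the experimental frequencies on the simulated background.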

Notes

  1. RNA digestion: There are several alternative methods of RNA fragmentation, because the type of RNA and the expected final size affect which sequencing technology can be used. For instance, ultra-sonication is widely used to fragment DNA for NGS sequencing libraries and can be optimized for RNA fragmentation as well. However, the size of the product is around 100-200 nt, so careful consideration is needed when trying to obtain more than 2 tandem repeats during rolling-circle reverse transcription. Alternatively, enzymatic digestion with RNase III can fragment RNA molecules into smaller sizes (60-120 nt), but the recovery rate is relatively low (around 10%).

  2. RNA electrophoresis: We suggest collecting RNA samples at each step for troubleshooting. RNA electrophoresis can assay the size and amount of RNA. For small amounts of RNA larger than 200 nt but smaller than 6,000 nt, a Fragment Analyzer (Agilent) can be used, since it needs no more than 2 ng of RNA and can generate a comprehensive size/amount spectrum of the RNA samples.

  3. RNA purification: The Monarch® RNA Cleanup Kit is used in place of ethanol precipitation because it is quicker and easier to operate with multiple samples. Ethanol precipitation can still be used if an RNA cleanup kit is not available, but improper separation of the aqueous layer may result in contamination with TRIzol™.

  4. Library Preparation: In this protocol we use the NEBNext® Ultra™ II DNA Library Prep Kit for Illumina® to prepare the cDNA library for sequencing. Alternative kits for library preparation include the Ovation® Ultralow V2 DNA-Seq Library Preparation Kit and the KAPA HTP/LTP Library Preparation Kits. While the steps of preparing cDNA libraries are similar across kits, two points should be noted. Firstly, the adaptor dilution and the number of PCR enrichment cycles differ among kits and depend on the mass of cDNA used for library preparation. For example, the NEBNext® Ultra™ II DNA Library Prep Kit for Illumina® needs a 25-fold dilution of the adaptor and 10 PCR cycles if the sample mass is around 1 ng. Secondly, different library preparation kits have designated index primers, and each kit might have its own requirements for index combinations based on the number of cDNA samples. For instance, the NEBNext® Ultra™ II DNA Library Prep Kit for Illumina® suggests several unique combinations when the number of cDNA samples is less than 7.

  5. DEPC Operation: Undiluted DEPC is toxic and harmful. Wear eye protection, a face shield, a full-face respirator and gloves, and work in a chemical hood when performing any operation with undiluted DEPC.

  6. Data analysis: Due to circularization, the transcriptional direction and starting point of the transcripts are unknown. The scripts provided are written specifically for rRNA transcripts and assume that all transcripts are transcribed in the negative direction, i.e., the starting position is the 3’ end of the transcript. The consensus generation step (Data analysis steps 7 and 8) is general for all raw data and does not depend on the transcription direction. However, from Data analysis step 9 onwards we assume the negative direction of transcription, so changes are required to generalize the scripts to other transcripts. The following files will need to be modified: “data_analysis.sh”, “pysam_make_pileup.py”, “simulation.py”, “binomial_distribution.py”, “plotting.py” and “plotting_preprocess.py”. In addition, ambiguity occurs due to re-localization during data treatment. Ambiguity is defined as the number of ways a transcript can be mapped to the reference genome (Cheung et al., 2020). The script applies the strictest requirement: only transcripts with ambiguity equal to one are considered in the analysis.
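The notion of ambiguity can be made concrete with a toy example. Because the start of a circularized fragment is arbitrary, every rotation of the consensus sequence is a candidate linear transcript, and each may match the reference at one or more positions. The sketch below counts distinct exact matches on one strand; it is only one plausible reading of the definition, uses hypothetical names and sequences, and is not the re-localization procedure in the authors' scripts.

```python
def count_mappings(consensus, reference):
    """Count distinct exact placements of any rotation of `consensus`.

    The consensus of a circularized fragment has an arbitrary start, so each
    rotation is a candidate linear transcript. Exact matching on a single
    strand is a toy simplification of real alignment.
    """
    hits = set()
    for shift in range(len(consensus)):
        rotated = consensus[shift:] + consensus[:shift]
        start = reference.find(rotated)
        while start != -1:
            # Keying on the rotated string avoids double counting when two
            # different shifts of a periodic consensus give the same sequence.
            hits.add((rotated, start))
            start = reference.find(rotated, start + 1)
    return len(hits)


if __name__ == "__main__":
    print(count_mappings("TTACG", "GGGACGTTTTCCC"))  # unique placement
    print(count_mappings("GTAC", "ACGTACGTAA"))      # repetitive reference
```

In this toy setting, only the first consensus would pass an ambiguity threshold of 1; the second maps in several ways because the reference is repetitive.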

Recipes

  1. 1 L YEPD medium

    1. Add 10 g of Yeast Extract

    2. Add 20 g of peptone

    3. Add 850 ml dH2O

    4. Dissolve 20 g glucose in 100 ml dH2O in a new bottle

    5. Dissolve 40 mg adenine in 50 ml dH2O in a new bottle

    6. Autoclave above reagents at 120 °C for 15 min

    7. Add glucose and adenine solutions into the pre-medium after it cools down

  2. 2.5× Transcription buffer

    1. Add Tris-HCl (pH 7.7) to a final concentration of 50 mM

    2. Add KCl to a final concentration of 500 mM

    3. Add MgCl2 to a final concentration of 12.5 mM

    4. Add DEPC-H2O to a total volume of 50 ml

  3. AES Buffer

    1. Add sodium acetate (pH 5.3) to a final concentration of 50 mM

    2. Add EDTA to a final concentration of 10 mM

    3. Add SDS to a final concentration of 1% (w/v)

    4. Add DEPC-H2O to a total volume of 50 ml

  4. Beads washing buffer

    1. Add NaOH to a final concentration of 0.1 N

    2. Add NaCl to a final concentration of 50 mM

    3. Add DEPC-H2O to a total volume of 50 ml

  5. Binding washing buffer

    1. Add Tris-HCl (pH 7.4) to a final concentration of 10 mM

    2. Add NaCl to a final concentration of 300 mM

    3. Add Triton™ X-100 to a final concentration of 0.1% (v/v)

    4. Add DEPC-H2O to a total volume of 50 ml

  6. Low Salt washing buffer

    1. Add Tris-HCl (pH 7.4) to a final concentration of 5 mM

    2. Add Triton™ X-100 to a final concentration of 0.1% (v/v)

    3. Add DEPC-H2O to a total volume of 50 ml

  7. High Salt washing buffer

    1. Add Tris-HCl (pH 7.4) to a final concentration of 50 mM

    2. Add NaCl to a final concentration of 2 M

    3. Add Triton™ X-100 to a final concentration of 0.5% (v/v)

    4. Add DEPC-H2O to a total volume of 50 ml

  8. DEPC-H2O

    1. Add 0.1% (v/v) DEPC to RNase-free Water

    2. Mix overnight and then autoclave

    3. Filter-sterilize the solution with a 0.22 μm filter

    4. Store at room temperature for up to one year

  9. 1 M sodium acetate solution

    1. Add 4.1 g solid sodium acetate

    2. Add DEPC-H2O to a total volume of 50 ml
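The masses behind the molar recipes above follow from mass = molar mass × molarity × volume. The sketch below is a convenience calculation, assuming anhydrous salts with the molar masses given in the comments (hydrated forms need a different value, so check the bottle label); the helper name `grams_needed` is hypothetical.

```python
def grams_needed(molar_mass_g_per_mol, molarity_mol_per_l, volume_ml):
    """Mass of solute for a solution: m = M * c * V."""
    return molar_mass_g_per_mol * molarity_mol_per_l * volume_ml / 1000.0


if __name__ == "__main__":
    # Molar masses of the anhydrous salts (assumed forms; check the bottle label).
    sodium_acetate = 82.03
    sodium_chloride = 58.44
    print(round(grams_needed(sodium_acetate, 1.0, 50), 2))   # 1 M sodium acetate, 50 ml
    print(round(grams_needed(sodium_chloride, 2.0, 50), 2))  # NaCl in High Salt washing buffer
```

For anhydrous sodium acetate this gives about 4.1 g per 50 ml, in agreement with the 1 M sodium acetate recipe.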

Acknowledgments

This work was supported by the Shenzhen Science and Technology Innovation Committee (JCYJ20170413173837121), the Innovation and Technology Commission (ITCPD/17-9), and the Hong Kong Research Grants Council (HKUST C6009-15G, AoE/P-705/16, T13-605/18-W) to X.H.; GRF 16301319 to P.P.C.; National Institutes of Health GM25232 to J.L.; and NSFC (31922088), Hong Kong ITC (ITCPD/17-9, ITS/480/18FP) and Hong Kong RGC (26102719, N_HKUST606/17, C6002-17GF, C7065-18GF, R4017-18) to J.W. B.J. was supported by the Institute of Advanced Studies (IAS) postdoctoral fellowship.

This protocol was derived from Cheung et al. (2020).

Competing interests

The authors declare no conflicts of interest.

References

  1. Acevedo, A. and Andino, R. (2014). Library preparation for highly accurate population sequencing of RNA viruses. Nat Protoc 9(7): 1760-1769.
  2. Burns, J. A., Dreij, K., Cartularo, L. and Scicchitano, D. A. (2010). O6-methylguanine induces altered proteins at the level of transcription in human cells. Nucleic Acids Res 38(22): 8178-8187.
  3. Carey, L. B. (2015). RNA polymerase errors cause splicing defects and can be regulated by differential expression of RNA polymerase subunits. Elife 4: e09954.
  4. Cheung, P. P., Jiang, B., Booth, G. T., Chong, T. H., Unarta, I. C., Wang, Y., Suarez, G. D., Wang, J., Lis, J. T. and Huang, X. (2020). Identifying Transcription Error-Enriched Genomic Loci Using Nuclear Run-on Circular-Sequencing Coupled with Background Error Modeling. J Mol Biol 432(13): 3933-3949.
  5. de Mercoyrol, L., Corda, Y., Job, C. and Job, D. (1992). Accuracy of wheat-germ RNA polymerase II. General enzymatic properties and effect of template conformational transition from right-handed B-DNA to left-handed Z-DNA. Eur J Biochem 206(1): 49-58.
  6. Imashimizu, M., Oshima, T., Lubkowska, L. and Kashlev, M. (2013). Direct assessment of transcription fidelity by high-resolution RNA sequencing. Nucleic Acids Res 41(19): 9090-9104.
  7. Islam, S., Kjallquist, U., Moliner, A., Zajac, P., Fan, J. B., Lonnerberg, P. and Linnarsson, S. (2011). Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Res 21(7): 1160-1167.
  8. Ji, J. P. and Loeb, L. A. (1992). Fidelity of HIV-1 reverse transcriptase copying RNA in vitro. Biochemistry 31(4): 954-958.
  9. Mahat, D. B., Kwak, H., Booth, G. T., Jonkers, I. H., Danko, C. G., Patel, R. K., Waters, C. T., Munson, K., Core, L. J. and Lis, J. T. (2016). Base-pair-resolution genome-wide mapping of active RNA polymerases using precision nuclear run-on (PRO-seq). Nat Protoc 11: 1455-1476.
  10. Minoche, A. E., Dohm, J. C. and Himmelbauer, H. (2011). Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and genome analyzer systems. Genome Biol 12(11): R112.
  11. Pelechano, V., Chavez, S. and Perez-Ortin, J. E. (2010). A complete set of nascent transcription rates for yeast genes. PLoS One 5(11): e15442.
  12. Saxowsky, T. T., Meadows, K. L., Klungland, A. and Doetsch, P. W. (2008). 8-Oxoguanine-mediated transcriptional mutagenesis causes Ras activation in mammalian cells. Proc Natl Acad Sci U S A 105(48): 18877-18882.
  13. Schwanhausser, B., Busse, D., Li, N., Dittmar, G., Schuchhardt, J., Wolf, J., Chen, W. and Selbach, M. (2011). Global quantification of mammalian gene expression control. Nature 473(7347): 337-342.
  14. van Leeuwen, F. W., de Kleijn, D. P., van den Hurk, H. H., Neubauer, A., Sonnemans, M. A., Sluijs, J. A., Koycu, S., Ramdjielal, R. D., Salehi, A., Martens, G. J., Grosveld, F. G., Peter, J., Burbach, H. and Hol, E. M. (1998). Frameshift mutants of beta amyloid precursor protein and ubiquitin-B in Alzheimer's and Down patients. Science 279(5348): 242-247.

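Summary

[Abstract] Transcription errors can substantially affect metabolic processes in organisms by altering the epigenome and causing misincorporations in mRNA, which is translated into aberrant mutant proteins. Moreover, within eukaryotic genomes there are specific Transcription Error-Enriched genomic Loci (TEELs) which are transcribed by RNA polymerases with significantly higher error rates and hypothesized to have implications in cancer, aging, and diseases such as Down syndrome and Alzheimer's. Therefore, research into transcription errors is of growing importance within the field of genetics. Nevertheless, methodological barriers limit the progress in accurately identifying transcription errors. Pro-Seq and NET-Seq can purify nascent RNA and map RNA polymerases along the genome but cannot be used to identify transcriptional mutations. Here we present background Error Model-coupled Precision nuclear run-on Circular-sequencing (EmPC-seq), a method combining a nuclear run-on assay and circular sequencing with a background error model to precisely detect nascent transcription errors and effectively discern TEELs within the genome.

[Background] Transcriptional errors due to ribonucleotide misincorporation are ubiquitous to all living organisms (Carey, 2015). Given that each messenger RNA (mRNA) can be translated 2-4 thousand times (Schwanhausser et al., 2011) and many special RNAs are expressed only once per cell at a given time (Islam et al., 2011; Pelechano et al., 2010), even a single transcription error at a critical residue can make large differences in a specific protein's expression. In addition, transcriptional errors can accelerate protein aggregation leading to age-related diseases in humans (van Leeuwen et al., 1998). While transcription errors are conventionally held to have a random distribution across the genome, there is evidence indicating that transcription errors could be enriched at certain structural motifs and specific genomic regions (Imashimizu et al., 2013; van Leeuwen et al., 1998). These Transcription Error-Enriched genomic Loci (TEELs) have notable biological significance in various diseases such as Down syndrome and Alzheimer's and are gaining attention in genetics research (Burns et al., 2010; Saxowsky et al., 2008). Unfortunately, there are major challenges that must be circumvented for the study of transcriptional error due to RNA polymerase unconfounded by RNA-editing processes such as those from post-transcriptional modifications. This requires purification of nascent RNA coupled with a highly accurate RNA sequencing method that can identify TEELs and elucidate transcriptional regulation and dysregulation contributing to transcriptional errors with implications to diseases.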

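Several complications hinder the accurate detection of de novo transcription errors. The first challenge is to eliminate the noise arising from post-transcriptional modifications, which requires purification of nascent RNA freshly made by RNA polymerases. Current RNA sequencing (RNA-seq) studies of transcription errors generally overlook this requirement and therefore overestimate transcription error rates. The second challenge is to correct the systematic noise from next-generation sequencing (NGS). NGS misreads about one base per thousand on average (Minoche et al., 2011), and this is further complicated by the fact that reverse transcriptase (needed to generate cDNA for NGS) misincorporates about one base in every 10,000 (Ji and Loeb, 1992). The third challenge is to computationally distinguish TEELs from background noise. Even with accurate sequencing data, it remains difficult to computationally identify TEELs against the background errors randomly introduced by RNA polymerases (de Mercoyrol et al., 1992). Here we present the background Error Model-coupled Precision nuclear run-on Circular-sequencing (EmPC-seq) method (Figure 1) to overcome these three major challenges. EmPC-seq consists of three core components: (1) a nuclear run-on assay that captures nascent RNA before post-transcriptional modification (Mahat et al., 2016); (2) a circular resequencing step, in which circularized nascent RNA molecules are reverse transcribed by rolling circle (Acevedo and Andino, 2014) to produce tandem cDNA repeats of the same circular RNA molecule, so that each RNA molecule is sequenced multiple times and sequencing accuracy improves; and (3) a background error model algorithm that removes random background noise by simulating sequencing data and the associated errors de novo as a control group, which carries the background variation arising from sequencing noise, uneven sequencing depth, and alignment artifacts (Cheung et al., 2020). EmPC-seq is designed to detect nascent transcription errors and to help elucidate their origins, which may be relevant to disease.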
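The accuracy gain from circular resequencing can be illustrated with a minimal sketch, assuming the tandem cDNA copies within a read have already been split and aligned to equal length. The function name `consensus_from_repeats` and the `min_agree` threshold are hypothetical choices for illustration and are not taken from the EmPC-seq scripts; the sketch only shows that an error present in one copy is outvoted by the other copies of the same RNA molecule.

```python
from collections import Counter


def consensus_from_repeats(repeats, min_agree=2):
    """Majority-vote consensus over equal-length tandem repeat copies.

    A position is reported as 'N' when the most common base is seen
    fewer than `min_agree` times (an illustrative rule, not the
    rule used by the published scripts).
    """
    if not repeats:
        raise ValueError("need at least one repeat copy")
    length = len(repeats[0])
    if any(len(copy) != length for copy in repeats):
        raise ValueError("repeat copies must be pre-aligned to equal length")

    consensus = []
    for column in zip(*repeats):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count >= min_agree else "N")
    return "".join(consensus)


# Three copies of one molecule; the third copy carries a single error.
copies = ["ACGTAC", "ACGTAC", "ACCTAC"]
print(consensus_from_repeats(copies))
```

A mismatch that appears in every copy cannot be removed this way, which is why such positions remain candidates for genuine transcription errors.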





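Figure 1. Schematic of EmPC-seq. Genuine transcription errors are shown as orange dots. Dots of other colors represent systematic noise, including enzymatic errors and sequencing errors. (Step 1) Permeabilize the yeast cells. (Step 2) Halt in vivo transcription by adding all four biotinylated NTPs in the nuclear run-on assay. (Step 3) Extract total yeast RNA and purify it by ethanol precipitation. (Step 4) Fragment the RNA into short (60-100 nt) RNA by base hydrolysis. (Step 5) Enrich the biotin-labeled nascent RNA by streptavidin bead purification. (Step 6) Circularize the re-purified nascent RNA with RNA ligase and process it into tandem-repeat cDNA by rolling circle reverse transcription. (Step 7) Prepare library DNA with a kit and submit it for next-generation sequencing. (Step 8) Detect transcription errors accurately by combining the consensus sequence results with our background error model. This schematic is adapted from Cheung et al. (2020).

Keywords: Transcriptional mutagenesis, RNA polymerase, Nascent RNA, Deep RNA sequencing, Accurate RNA sequencing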

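Materials and Reagents
1. 1.5 ml tubes
2. Pipette tips
3. 蛋糕
4. 0.22 µm filter
5. W303 yeast cells (GenBank number: JIUU00000000)
6. 1 M dithiothreitol (DTT, ThermoFisher, catalog number: P2325)
7. Diethyl pyrocarbonate (Sigma, catalog number: 40718)
8. UltraPure™ DNase/RNase-free distilled water (ThermoFisher, catalog number: 10977015)
9. Yeast extract (Sigma, catalog number: Y1625)
10. Peptone (Sigma, catalog number: P0556)
11. D-(+)-Glucose (Sigma, catalog number: G8270)
12. Adenine (Sigma, catalog number: A8626)
13. N-Lauroylsarcosine sodium salt, Sarkosyl (Sigma, catalog number: L9150)
14. Trizma® hydrochloride (Sigma, catalog number: T5941)
15. Potassium chloride (Sigma, catalog number: P9333)
16. Magnesium chloride (Sigma, catalog number: M8266)
17. Biotinylated nucleotides (Jena Bioscience, catalog number: NU series)
18. RNase inhibitor, murine (NEB, catalog number: M0314L)
19. Diethyl pyrocarbonate, DEPC (Sigma, catalog number: 40718)
20. Liquified phenol (Sigma, catalog number: P9346)
21. Sodium acetate (Sigma, catalog number: S2889)
22. Ethylenediaminetetraacetic acid, EDTA (Sigma, catalog number: EDS)
23. Sodium dodecyl sulfate (Sigma, catalog number: L3771)
24. Chloroform (Sigma, catalog number: C2432)
25. GlycoBlue™ Coprecipitant (ThermoFisher, catalog number: AM9515)
26. Pure ethanol (Sigma, catalog number: E7023)
27. Sodium hydroxide (Sigma, catalog number: S8045)
28. Triton™ X-100 (Sigma, catalog number: X100)
29. Monarch® RNA Cleanup Kit (NEB, catalog number: T2030L)
30. Dynabeads™ M-280 Streptavidin (ThermoFisher, catalog number: 60210)
31. Sodium chloride (Sigma, catalog number: S7653)
32. TRIzol™ Reagent (ThermoFisher, catalog number: 15596018)
33. Ambion™ T4 RNA Ligase (ThermoFisher, catalog number: AM2141)
34. T4 Polynucleotide Kinase (NEB, catalog number: M0201S)
35. Polyethylene glycol 8000 (Sigma, catalog number: 1546605)
36. Adenosine 5'-triphosphate, ATP (NEB, catalog number: P0756S)
37. dNTP Mix (ThermoFisher, catalog number: 18427088)
38. Random hexamer primer (ThermoFisher, catalog number: SO142)
39. SuperScript™ III First-Strand Synthesis System (ThermoFisher, catalog number: 18080051)
40. NEBNext® Ultra™ II Directional RNA Second Strand Synthesis Module (NEB, catalog number: E7550L)
41. MinElute PCR Purification Kit (QIAGEN, catalog number: 28004)
42. NEBNext® Ultra™ II DNA Library Prep Kit for Illumina® (NEB, catalog number: E7645L)
43. Qubit™ dsDNA HS Assay Kit (ThermoFisher, catalog number: Q32851)
44. YEPD medium (see Recipes)
45. 2.5× transcription buffer (see Recipes)
46. AES buffer (see Recipes)
47. Beads washing buffer (see Recipes)
48. Binding washing buffer (see Recipes)
49. Low salt washing buffer (see Recipes)
50. High salt washing buffer (see Recipes)
51. DEPC-H2O (see Recipes)
52. 1 M sodium acetate solution (see Recipes)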

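Equipment

1. Eppendorf® Research® plus pipettes (Eppendorf, catalog number: EP series)
2. MaxQ™ 6000 Incubated/Refrigerated Stackable Shaker (ThermoFisher, catalog number: SHKE6000)
3. Eppendorf BioPhotometer® (Eppendorf, model: D30)
4. Megafuge® (Heraeus, model: 1.0R)
5. 5 L general-purpose water bath (PolyScience, catalog number: WBE05A11B)
6. NEBNext® Magnetic Separation Rack (NEB, catalog number: S1515S)
7. Roto-Shake Genie® (Zymo Research, catalog number: S5008)
8. ProFlex PCR System (ThermoFisher, catalog number: 4484075)
9. 5200 Fragment Analyzer System (Agilent, catalog number: M5310AA)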

Software
1. ProSize (Agilent, https://explore.agilent.com/Software-Download-Fragment-Analyzer-Prosize)
2. Python (version 2.7.12, https://www.python.org/)
3. Cython (version 0.23.4, https://cython.org/)
4. NumPy (version 1.11.0, https://numpy.org/)
5. SciPy (version 0.17.0, https://www.scipy.org/)
6. Burrows-Wheeler Aligner (version 0.7.17-r1188, http://bio-bwa.sourceforge.net/)
7. samtools (version 1.9, http://www.htslib.org/)
8. pysam (version 0.15.0, https://pysam.readthedocs.io/en/latest/installation.html)
9. matplotlib (version 2.2.2, https://matplotlib.org/)

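Procedure

A. Preparation of yeast cells for the nuclear run-on assay
Inoculate a single yeast colony into 5 ml of YEPD medium.
Incubate overnight in a 30 °C incubator with shaking at 200 rpm.
Measure the optical density (OD600) of each cell culture.
Dilute and re-inoculate the yeast into 10 ml of YEPD medium at an OD600 of 0.2.
Incubate the yeast cells in a 30 °C incubator until the OD600 reaches 0.4-0.6.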
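The re-inoculation to an OD600 of 0.2 follows from C1V1 = C2V2. The following is a small sketch, assuming that OD600 scales linearly with cell density over the measured range (dense overnight cultures are often diluted before reading for this reason). The function name and the example overnight OD value are illustrative only.

```python
def dilution_volumes(od_overnight, od_target=0.2, final_volume_ml=10.0):
    """Return (culture_ml, medium_ml) needed to reach od_target.

    Uses C1 * V1 = C2 * V2, with OD600 standing in for cell density.
    """
    if od_overnight <= od_target:
        raise ValueError("overnight culture is not denser than the target")
    culture_ml = od_target * final_volume_ml / od_overnight
    return culture_ml, final_volume_ml - culture_ml


# Example: an overnight culture read at OD600 = 4.0 (hypothetical value).
culture_ml, medium_ml = dilution_volumes(4.0)
print(f"{culture_ml:.2f} ml culture + {medium_ml:.2f} ml YEPD")
```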

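B. Yeast nuclear run-on assay
1. Centrifuge the yeast cells at 2,000 × g for 5 min at 4 °C and remove the supernatant.
2. Wash the cell pellet with 5 ml of ice-cold DEPC-H2O by pipetting up and down.
3. Centrifuge the yeast cells at 2,000 × g for 5 min at 4 °C and draw out the supernatant.
4. Resuspend the cell pellet in 4.75 ml of ice-cold DEPC-H2O.
5. Add 250 µl of 10% sarkosyl solution and mix gently.
6. Leave on ice for 20 min.
7. Centrifuge the yeast cells at 400 × g for 5 min at 4 °C and remove the supernatant.
8. Resuspend the cell pellet in 120 µl of 2.5× transcription buffer, 6 µl of 0.1 M DTT, 3.5 µl of each 1 mM biotin-NTP, and 3 µl of RNase inhibitor, for a total volume of 143 µl.
9. Add 142 µl of DEPC-H2O and 15 µl of 10% sarkosyl solution, and mix gently.
10. Incubate the mixture at 30 °C for 5 min, gently pipetting up and down at the halfway point (2.5 min).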
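As a quick check of the run-on reaction set up in Steps B8 and B9, the sketch below adds up the listed volumes and derives the final concentrations. It assumes the volume of the cell pellet is negligible and reads the biotin-NTP stocks as 1 mM each; the helper name `final_conc` is illustrative.

```python
def final_conc(stock_conc, volume_ul, total_ul):
    """Final concentration after dilution, in the same unit as stock_conc."""
    return stock_conc * volume_ul / total_ul


# Step B8: volumes in microliters (four biotin-NTPs at 3.5 ul each).
step_b8_ul = {
    "2.5x transcription buffer": 120.0,
    "0.1 M DTT": 6.0,
    "biotin-NTPs (4 x 3.5 ul)": 4 * 3.5,
    "RNase inhibitor": 3.0,
}
# Step B9: water and sarkosyl.
step_b9_ul = {"DEPC-H2O": 142.0, "10% sarkosyl": 15.0}

step_b8_total = sum(step_b8_ul.values())
total_ul = step_b8_total + sum(step_b9_ul.values())

buffer_x = final_conc(2.5, 120.0, total_ul)          # fold concentration
dtt_mM = final_conc(100.0, 6.0, total_ul)            # 0.1 M = 100 mM stock
ntp_uM = final_conc(1000.0, 3.5, total_ul)           # 1 mM = 1000 uM stock
sarkosyl_pct = final_conc(10.0, 15.0, total_ul)      # percent

print(step_b8_total, total_ul)
print(buffer_x, dtt_mM, round(ntp_uM, 2), sarkosyl_pct)
```

Under these assumptions the reaction totals 300 µl, with the transcription buffer at 1×, about 2 mM DTT, roughly 11.7 µM of each biotin-NTP, and 0.5% sarkosyl.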

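C. Nascent RNA extraction
1. Centrifuge the reaction mixture at 400 × g for 5 min at 4 °C and remove the supernatant.
2. Quickly resuspend the pellet in 500 µl of liquified phenol.
3. Add 500 µl of AES buffer and pipette up and down to break the cells.
4. Incubate the mixture at 65 °C for 5 min, vortexing every minute.
5. Incubate the mixture on ice for 5 min.
6. Add 200 µl of chloroform and vortex for 30 s.
7. Incubate the mixture at room temperature for 2 min.
8. Centrifuge at 14,000 × g for 5 min at 4 °C.
9. Draw out the aqueous layer and transfer it to a new tube (take care to avoid the interface).
10. Add 1 M sodium acetate solution to the aqueous layer to a final concentration of 200 mM.
11. Split the aqueous RNA solution mixed with sodium acetate into two 1.5 ml tubes.
12. Add 1 µl of GlycoBlue™ Coprecipitant and 3× volume of 100% ethanol to the aqueous layer in each of the two 1.5 ml tubes from Step C11 (this mixture can be stored overnight).
13. Centrifuge the mixture at 14,000 × g for 30 min at 4 °C and remove the supernatant (take care not to disturb the blue pellet).
14. Wash the pellet with freshly prepared RNase-free 5% ethanol.
15. Centrifuge at 14,000 × g for 5 min at 4 °C and remove the supernatant.
16. Let the pellet dry for 5 min and resuspend it in 20 µl of DEPC-H2O.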
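The volumes needed in Steps C10 to C12 depend on how much aqueous layer is recovered, which the protocol does not state. The sketch below works through the arithmetic for an assumed recovery of 500 µl (a hypothetical value); the function names are illustrative. The sodium acetate volume follows from solving final = stock × v / (V + v) for v.

```python
def sodium_acetate_volume_ul(aqueous_ul, stock_mM=1000.0, final_mM=200.0):
    """Volume of stock to add so that the mixture reaches final_mM."""
    return aqueous_ul * final_mM / (stock_mM - final_mM)


def ethanol_volume_ul(sample_ul, ratio=3.0):
    """Volume of 100% ethanol for a ratio-fold precipitation."""
    return sample_ul * ratio


aqueous_ul = 500.0  # assumed recovery of the aqueous layer
acetate_ul = sodium_acetate_volume_ul(aqueous_ul)
per_tube_ul = (aqueous_ul + acetate_ul) / 2      # Step C11: two tubes
ethanol_ul = ethanol_volume_ul(per_tube_ul)      # Step C12, per tube
tube_total_ul = per_tube_ul + 1.0 + ethanol_ul   # plus 1 ul GlycoBlue

print(acetate_ul, per_tube_ul, ethanol_ul, tube_total_ul)
```

With this assumed recovery, each tube ends up holding about 1.25 ml, which is consistent with splitting the sample across two 1.5 ml tubes before adding ethanol.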

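D. RNA fragmentation by base hydrolysis
1. Heat-denature the RNA solution at 65 °C for 40 s, then place it on ice.
2. Add 5 µl of ice-cold 1 N NaOH and incubate on ice for 10 min.
3. Add 25 µl of 1 M Tris-HCl (pH 6.8).
4. Purify the RNA with the Monarch® RNA Cleanup Kit (10 µg), using an elution volume of 20 µl.
5. Add 1 µl of RNase inhibitor.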

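E. Biotin-labeled nascent RNA extraction
Wash 30 µl of Streptavidin M-280 beads with 500 µl of beads washing buffer: add the buffer, mix, place the tube on the magnet for 1 min, and draw out the supernatant.
Wash the beads twice with 500 µl of 100 mM NaCl solution, following the same procedure as in Step E1.
Resuspend the beads in 50 µl of binding washing buffer.
Bring the purified RNA from Section D to 50 µl with binding washing buffer.
Mix the RNA solution with the resuspended beads.
Incubate the mixture on a rotator at room temperature for 20 min.
Place the mixture on the magnet for 1 min and draw out the supernatant.
Resuspend the beads in 500 µl of high salt washing buffer.
Place the beads on the magnet for 1 min and draw out the supernatant.
Repeat Steps E8-E10.
Wash the beads twice with 500 µl of binding washing buffer.
Wash the beads twice with 500 µl of low salt washing buffer.
Resuspend the beads in 300 µl of TRIzol™ Reagent.
Incubate the mixture on ice for 3 min.
Add 60 µl of chloroform to the mixture and vortex thoroughly for at least 20 s.
Centrifuge the mixture at 20,000 × g for 5 min at 4 °C.
Purify the aqueous layer (take care to avoid the interface) with the Monarch® RNA Cleanup Kit (10 µg), using an elution volume of 20 µl.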

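F. Circularization of RNA samples
Heat-denature the purified RNA at 65 °C for 1 min, then place it on ice.
Add the following reagents to 19 µl of RNA solution: 4 µl of 10× T4 RNA ligase I reaction buffer, 2 µl of T4 RNA ligase I enzyme, 2 µl of PNK enzyme, 1 µl of RNase inhibitor, 8 µl of 50% PEG 8000, and 4 µl of 10 mM ATP solution.
Incubate the mixture at 25 °C for 2 h or at 16 °C overnight.
Purify the mixture with the Monarch® RNA Cleanup Kit (10 µg), using an elution volume of 20 µl.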

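G. Rolling circle reverse transcription
1. Add the following reagents to 9 µl of RNA solution: 4 µl of 10 mM dNTP solution, 4 µl of 50 ng/µl random hexamers, and 3 µl of RNase-free water.
2. Heat-denature the reaction mixture at 65 °C for 1 min, then place it on ice for more than 2 min.
3. Add the following to the reaction mixture: 8 µl of 5× First-Strand Synthesis Buffer, 4 µl of 0.1 M DTT solution, 4 µl of SuperScript III enzyme, and 8 µl of water.
Note: The buffer and enzyme used here are from the SuperScript™ III First-Strand Synthesis System kit.

4. Incubate the mixture at 25 °C for 10 min, then at 42 °C for 20 min.
5. Purify the mixture with the Monarch® RNA Cleanup Kit (10 µg), using an elution volume of 20 µl.
6. Add 20 µl of RNase-free water to the RNA solution.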

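H. Second strand synthesis
1. Add the following to 38 µl of RNA solution: 8 µl of 10× Second Strand Synthesis Buffer, 4 µl of Second Strand Synthesis Enzyme Mix, and 30 µl of RNase-free water.
Note: The buffer and enzyme used here are from the NEBNext® Ultra™ II Directional RNA Second Strand Synthesis Module kit.

2. Incubate the reaction mixture at 16 °C for 2 h with the thermocycler lid set at 50 °C.
3. Purify the DNA in the mixture with the MinElute PCR Purification Kit.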

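I. Library preparation and submission for NGS
1. Measure the concentration of the cDNA from Step H3 with the Qubit™ dsDNA HS Assay Kit.
2. Prepare the sequencing library with the NEBNext® Ultra™ II DNA Library Prep Kit for Illumina®, following the manufacturer's instructions for end prep, adaptor ligation, size selection, and PCR enrichment.
3. Submit the cDNA library to the Illumina MiSeq platform for next-generation sequencing with single-end reads of 300 bp.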
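The choice of 300 bp single-end reads together with 60-100 nt fragments determines how many tandem copies of a circularized RNA a single read can span. The following is a rough sketch, assuming the read consists only of tandem copies (adaptor and primer sequence are ignored) and that the cDNA is at least as long as the read; the function name is illustrative.

```python
def full_repeats_per_read(read_length_nt, fragment_length_nt):
    """Number of complete tandem copies that fit in one read."""
    if fragment_length_nt <= 0:
        raise ValueError("fragment length must be positive")
    return read_length_nt // fragment_length_nt


# 60-100 nt fragments from base hydrolysis, plus a 200 nt fragment for contrast.
for fragment_nt in (60, 100, 200):
    copies = full_repeats_per_read(300, fragment_nt)
    print(f"{fragment_nt} nt fragment: {copies} full copies in a 300 nt read")
```

Under these assumptions, fragments of 60-100 nt give 3 to 5 complete copies per read, whereas a 200 nt fragment gives only one, which leaves nothing to build a consensus from.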

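Data analysis

1. Use Ubuntu 16.04 LTS (Xenial Xerus) to run the scripts.
2. Download the scripts from GitHub (https://github.com/ustsam/Em-PC_seq).
3. Unzip the files.
4. Make sure that all software listed in the "Software" section is installed.
5. Open a terminal and enter the script directory.
6. Compile the code by typing "python setup_newreloc.py build_ext --inplace" in the terminal.
7. Invoke the function with the following command in the terminal:

"./run_noQsfilter.sh {path to output directory} {path to reference file} {path to script directory} DUMMY 2 ${twice the maximum read length} ${path to data file in gzip-compressed form}"

8. "data.sam.gz" can be found in the output directory. It contains all transcripts after the consensus generation step and is in compressed sam format. To decompress the file, type the following command in a Linux terminal:
gunzip data.sam.gz

9. Run the following command in the output directory to perform the data analysis:

"bash data_analysis.sh {path to output directory} {path to reference file} {path to script directory} 1 {maximum depth per site} {minimum base quality} {minimum mapping quality} {number of simulated fastq files to generate}"

Ambiguity is defined as the number of ways a transcript can be mapped to the reference genome (Cheung et al., 2020). We used the most stringent thresholds: ambiguity = 1, minimum mapping quality = 30, and minimum base quality = 30.

10. Run the following command to reproduce the analysis (assuming that the scripts, the output directory, and the reference fasta file are in the current directory):

"bash data_analysis.sh ./ ./rDNA1.fasta ./ 1 500000 30 30 100"

11. Run the following command to preprocess the output of the "data_analysis.sh" script for plotting:

"bash plot.sh {path to output directory} {path to reference file} {path to script directory} {ambiguity threshold} {maximum depth per site} {minimum base quality} {minimum mapping quality} 1"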



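12. The outputs of plot.sh (figures in PNG format) are as follows:
a. "Distribution_NumberOfWaysToMap.png": distribution of the number of ways the transcripts can be mapped to the reference genome (ambiguity). The y-axis is the number of transcripts and the x-axis is the ambiguity. An example is shown below (Figure 2).

Figure 2. Example plot of the percentage of transcripts with a given ambiguity. Ambiguity is the number of ways a transcript can be mapped to the reference genome. The ambiguity arises from the lack of information about the transcription start, which is lost in the circularization step of the experimental procedure.

b. "MutationTypeSpectrum.png": mutation frequency of each mutation type in the RNA transcripts. The mutation frequency is the number of errors divided by the coverage of the corresponding reference base; for example, the number of A→C errors divided by the coverage of base A. An example is shown below (Figure 3).

Figure 3. Example of a mutation spectrum plot. The mutation frequency of each substitution type on the RNA strand is shown.

c. "Muta_Frequency_inChrom_***.png": mutation frequency along the chromosome sites for the experimental and simulated data. "***" is the name of the chromosome. Transcription Error-Enriched genomic Loci (TEELs) are shown as red dots. An example is shown below (Figure 4).

Figure 4. Example plot of mutation frequency across genomic sites. The mutation frequency at each site in the chromosome is shown as a red line. The background error obtained by simulation is shown as a gray line. Positions identified as TEELs are shown as red dots.

d. "ErrorRate_per_PositionInTranscripts.png": error rate at each position in the transcripts. Position 0 corresponds to the 3' end of the transcript. An example is shown below (Figure 5).

Figure 5. Example plot of the error rate at each position in the transcripts. The error rate at each position in the experimental data is shown as red dots, and in the simulated data as blue dots. The x-axis is the position in the transcript, where position 0 corresponds to the 3' end of the transcript. Pausing of RNA polymerase can increase the error rate, but the order of events cannot be determined in this experiment. Two scenarios fit the observed data: either RNA polymerase pauses to correct an error, or RNA polymerase pauses because of a misincorporation.

13. If you wish to use other software tools for plotting, the output text files are:
a. "Distribution_of_Ambiguity.txt" can be used to plot the graph described in Step 12a.
b. "MutationTypeSpectrum.txt" can be used to plot the graph described in Step 12b.
c. "MutationalFrequency_Exp_chrom_***.txt" and "MutationalFrequency_Sim_chrom_***.txt" can be used to plot the graph described in Step 12c; they contain the mutational frequency at each site in the chromosome for the experiment and the simulation, respectively. "MutationalFrequency_TEEL_chrom_***.txt" contains the mutation frequency of the sites regarded as TEELs. The first column is the position in the chromosome and the second column is the mutation frequency.
d. "MutationalFrequency_PerPositionInTranscript.txt" can be used to plot the graph described in Step 12d. The first column is the position in the transcript. The second and third columns are the mean and standard deviation of the simulated data. The fourth and fifth columns are the mean and standard deviation of the experimental data. The mean and standard deviation of the experimental data are calculated from a binomial distribution with the maximum-likelihood estimate of the error rate.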
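The two quantities described in Steps 12b and 13d can be sketched as follows. The counts are made-up numbers for illustration, the function names are hypothetical, and the standard deviation shown is one common reading of "binomial distribution with the maximum-likelihood error rate" (the standard deviation of the estimated rate), which may differ from what the published scripts compute.

```python
import math


def mutation_frequency(error_count, coverage):
    """Errors of one substitution type divided by reference-base coverage."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return error_count / coverage


def binomial_mean_std(error_count, coverage):
    """MLE error rate p = k / n and the standard deviation of that rate,
    sqrt(p * (1 - p) / n), under a binomial model."""
    p = mutation_frequency(error_count, coverage)
    return p, math.sqrt(p * (1.0 - p) / coverage)


# Hypothetical counts: coverage per reference base, errors per substitution.
coverage = {"A": 200000, "C": 150000}
errors = {("A", "C"): 4, ("A", "G"): 10, ("C", "U"): 3}

spectrum = {
    f"{ref}>{alt}": mutation_frequency(count, coverage[ref])
    for (ref, alt), count in errors.items()
}
for substitution, freq in spectrum.items():
    print(substitution, f"{freq:.2e}")

mean, std = binomial_mean_std(4, 200000)
print(f"mean={mean:.2e} std={std:.2e}")
```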

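Notes

1. RNA digestion: Because the type of RNA and the expected final size affect the sequencing technique to be used, several alternative RNA fragmentation methods are available. For example, sonication is widely used for DNA fragmentation in NGS library preparation and can also be optimized for RNA fragmentation; however, the product size is about 100-200 nt, which needs careful consideration when more than 2 tandem repeats are to be obtained during rolling circle reverse transcription. Alternatively, enzymatic digestion with RNase III can fragment RNA molecules into smaller sizes (60-120 nt), but the recovery rate is relatively low (around 10%).
2. RNA electrophoresis: We recommend collecting RNA samples at each step for troubleshooting. RNA electrophoresis can determine the size and quantity of RNA. For small amounts of RNA larger than 200 bases but smaller than 6,000 bases, the Fragment Analyzer (Agilent) can be used, because it requires no more than 2 ng of RNA and produces a comprehensive size/quantity profile of the RNA sample.
3. RNA purification: The Monarch® RNA Cleanup Kit is used in place of ethanol precipitation because it is faster and easier to handle with multiple samples. Ethanol precipitation can still be used if the RNA cleanup kit is not available, but improper phase separation of the aqueous layer can lead to contamination with TRIzol™.
4. Library preparation: In this protocol, we use the NEBNext® Ultra™ II DNA Library Prep Kit for Illumina® to prepare the cDNA library for sequencing. Alternative kits for library preparation are also available, such as the Ovation® Ultralow V2 DNA-Seq Library Preparation Kit and the KAPA HTP/LTP Library Preparation Kit. Although the steps for preparing cDNA libraries with different kits are similar, a few points need to be mentioned. First, the adaptor dilution and the number of PCR enrichment cycles differ between kits according to the amount of cDNA used for library preparation. For example, the NEBNext® Ultra™ II DNA Library Prep Kit for Illumina® requires a 25-fold dilution of the adaptor and 10 PCR cycles if the sample mass is about 1 ng. Second, different library preparation kits come with designated index primers, and each kit may have its own requirements for index combinations depending on the number of cDNA samples. For example, the NEBNext® Ultra™ II DNA Library Prep Kit for Illumina® suggests several unique combinations when the number of cDNA samples is fewer than 7.
5. DEPC handling: DEPC is toxic and hazardous. Appropriate eye shields, a face shield, a full-face respirator, gloves, and a chemical fume hood should therefore be used for any operation involving DEPC.
6. Data analysis: Because of circularization, the direction of transcription and the start position of each transcript are unknown. The scripts provided are written specifically for rRNA transcripts and assume that all transcripts are transcribed in the negative direction, i.e., the start position is the 3' end of the transcript. The consensus generation steps (Steps 7 and 8) apply to all raw data and do not depend on the direction of transcription. From Step 9 onwards, however, we assume that the direction of transcription is negative, so changes are needed to generalize the scripts to other transcripts. The following files need to be modified: "data_analysis.sh", "pysam_make_pileup.py", "simulation.py", "binomial_distribution.py", "plotting.py", and "plotting_preprocess.py". In addition, ambiguity arises from the relocation performed during data processing. Ambiguity is defined as the number of ways a transcript can be mapped to the reference genome (Cheung et al., 2020). The scripts assume the most stringent requirement, in which only transcripts with an ambiguity equal to 1 are considered in the analysis.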
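One simplified way to picture ambiguity is to count the placements of a circular consensus on the reference when its start point is unknown. The sketch below is a toy model only: it uses exact string matching on a single strand, whereas the published scripts rely on BWA alignment and their own relocation step, so it should not be read as their algorithm. The function name `count_mappings` is hypothetical.

```python
def count_mappings(consensus, reference):
    """Count distinct (rotated sequence, start) placements of a circular
    consensus on a linear reference, using exact matching only."""
    placements = set()
    for shift in range(len(consensus)):
        # Every rotation is a possible linear form of the circular molecule.
        rotated = consensus[shift:] + consensus[:shift]
        start = reference.find(rotated)
        while start != -1:
            placements.add((rotated, start))
            start = reference.find(rotated, start + 1)
    return len(placements)


# Unique locus: only one rotation matches, at one position.
print(count_mappings("GTAC", "GGACGTTTCC"))
# Repetitive locus: several rotations and positions match.
print(count_mappings("ACGT", "ACGTACGT"))
```

In this toy model, a transcript from a unique locus has an ambiguity of 1 and would pass the stringent filter, while a transcript from a repetitive region would be excluded.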

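Recipes

1. 1 L of YEPD medium
a. Add 10 g of yeast extract
b. Add 20 g of peptone
c. Add 850 ml of dH2O
d. Dissolve 20 g of glucose in 100 ml of dH2O in a new bottle
e. Dissolve 40 mg of adenine in 50 ml of dH2O in a new bottle
f. Autoclave the above reagents at 120 °C for 15 min
g. After cooling, add the glucose and adenine solutions to the pre-medium
2. 2.5× transcription buffer
a. Add Tris-HCl (pH 7.7) to a final concentration of 50 mM
b. Add KCl to a final concentration of 500 mM
c. Add MgCl2 to a final concentration of 12.5 mM
d. Add DEPC-H2O to a total volume of 50 ml
3. AES buffer
a. Add sodium acetate (pH 5.3) to a final concentration of 50 mM
b. Add EDTA to a final concentration of 10 mM
c. Add SDS to a final concentration of 1% (w/v)
d. Add DEPC-H2O to a total volume of 50 ml
4. Beads washing buffer
a. Add NaOH to a final concentration of 0.1 N
b. Add NaCl to a final concentration of 50 mM
c. Add DEPC-H2O to a total volume of 50 ml
5. Binding washing buffer
a. Add Tris-HCl (pH 7.4) to a final concentration of 10 mM
b. Add NaCl to a final concentration of 300 mM
c. Add Triton™ X-100 to a final concentration of 0.1% (v/v)
d. Add DEPC-H2O to a total volume of 50 ml
6. Low salt washing buffer
a. Add Tris-HCl (pH 7.4) to a final concentration of 5 mM
b. Add Triton™ X-100 to a final concentration of 0.1% (v/v)
c. Add DEPC-H2O to a total volume of 50 ml
7. High salt washing buffer
a. Add Tris-HCl (pH 7.4) to a final concentration of 50 mM
b. Add NaCl to a final concentration of 2 M
c. Add Triton™ X-100 to a final concentration of 0.5% (v/v)
d. Add DEPC-H2O to a total volume of 50 ml
8. DEPC-H2O
a. Add 0.1% (v/v) DEPC to RNase-free water
b. Mix overnight, then autoclave
c. Filter-sterilize the solution with a 0.22 µm filter
d. Store at room temperature for up to one year
9. 1 M sodium acetate solution
a. Add 4.1 g of solid sodium acetate
b. Add DEPC-H2O to a total volume of 50 ml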
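The recipes above give final concentrations only. The sketch below shows the two calculations typically needed to make them up: stock volume from C1V1 = C2V2 and solute mass from molarity. The stock concentrations (1 M Tris-HCl, 5 M NaCl, 10% Triton X-100) are assumptions chosen for illustration, and the molecular weight of 82.03 g/mol is that of anhydrous sodium acetate.

```python
def stock_volume_ml(final_conc, stock_conc, final_volume_ml):
    """Stock volume needed; final_conc and stock_conc share one unit."""
    return final_conc * final_volume_ml / stock_conc


def solute_mass_g(molarity, volume_ml, molecular_weight):
    """Mass of solute for a solution of the given molarity and volume."""
    return molarity * (volume_ml / 1000.0) * molecular_weight


# Binding washing buffer, 50 ml, from assumed stock solutions.
tris_ml = stock_volume_ml(10.0, 1000.0, 50.0)     # 10 mM from 1 M stock
nacl_ml = stock_volume_ml(300.0, 5000.0, 50.0)    # 300 mM from 5 M stock
triton_ml = stock_volume_ml(0.1, 10.0, 50.0)      # 0.1% from 10% stock
water_ml = 50.0 - (tris_ml + nacl_ml + triton_ml)
print(tris_ml, nacl_ml, triton_ml, water_ml)

# 1 M sodium acetate, 50 ml (anhydrous salt, 82.03 g/mol).
print(round(solute_mass_g(1.0, 50.0, 82.03), 2))
```

The sodium acetate result of about 4.1 g agrees with Recipe 9.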

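Acknowledgments

This work was supported by the Shenzhen Science and Technology Innovation Committee (JCYJ20170413173837121), the Innovation and Technology Commission (ITCPD/17-9), and the Hong Kong Research Grant Council (HKUST C6009-15G, AoE/P-705/16, T13-605/18-W) to X.H.; GRF 16301319 to P.P.C.; National Institutes of Health GM25232 to J.L.; and NSFC (31922088), the Hong Kong ITC (ITCPD/17-9, ITS/480/18FP), and the Hong Kong RGC (26102719, N_HKUST606/17, C6002-17GF, C7065-18GF, R4017-18) to J.W. B.J. was supported by the Institute for Advanced Study (IAS) Postdoctoral Fellowship.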


Copyright: © 2021 The Authors; exclusive licensee Bio-protocol LLC.
Citation: Wang, Y., Chong, T., Unarta, I. C., Xu, X., Suarez, G. D., Wang, J., Lis, J. T., Huang, X. and Cheung, P. (2021). EmPC-seq: Accurate RNA-sequencing and Bioinformatics Platform to Map RNA Polymerases and Remove Background Error. Bio-protocol 11(4): e3921. DOI: 10.21769/BioProtoc.3921.