Evaluation of the Sequence Variability within the PCR Primer/Probe Target Regions of the SARS-CoV-2 Genome

Abstract

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2; initially named 2019-nCoV) is responsible for the recent coronavirus disease (COVID-19) pandemic, and polymerase chain reaction (PCR) is the current standard method for diagnosis from patient samples. As PCR assays are prone to sequence mismatches due to mutations in the viral genome, it is important to verify the genomic variability at primer/probe binding regions periodically. This step-by-step protocol describes a bioinformatics approach for an extensive evaluation of the sequence variability within the primer/probe target regions of the SARS-CoV-2 genome. The protocol can be applied to any molecular diagnostic assay of choice using freely available software programs and the ready-to-use multiple sequence alignment (MSA) file provided.

Graphic abstract:


Overview of the sequence tracing protocol. The figure was created using the Library of Science and Medical Illustrations from somersault18:24 licensed under a CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/).

Video abstract: https://youtu.be/M1lV1liWE9k


Keywords: Coronavirus SARS-CoV-2, COVID-19, Diagnosis, Genomic variability, Polymerase chain reaction (PCR), Mutations

Background

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2; initially named 2019-nCoV) is the cause of the novel coronavirus disease termed COVID-19. The virus originated in Wuhan, China and rapidly spread around the world, causing a global pandemic (Worldometers.info, 2020). Sequencing of the virus showed that its single-stranded RNA genome is ~30 kb in size (Chan et al., 2020; Lu et al., 2020; Wu et al., 2020; Zhou et al., 2020). The availability of the viral sequence early in the outbreak helped the development of several polymerase chain reaction (PCR) detection protocols that have been instrumental in diagnosing the disease from patient samples (WHO, 2020). However, genetic variability arising in the viral genome during natural evolution poses a risk of mismatches between the diagnostic assays and the template, which can produce false-negative results (Whiley and Sloots, 2005; Chow et al., 2011). Sequences of SARS-CoV-2 viruses isolated around the world are being deposited in sequence databases, and mutations have been identified in the genomes of the circulating viruses (Ugurel et al., 2020).

We performed an extensive evaluation of published diagnostic PCR assays, including those recommended by the World Health Organization (WHO), by assessing sequence variation in the primer/probe binding regions across more than 17,000 publicly available viral sequences (Khan and Cheung, 2020). A concurrent publication reported mutations in primer/probe binding regions using 1,825 sequences, but did not provide a detailed sequence tracing protocol (Osorio and Correia-Neves, 2020). This step-by-step protocol outlines a bioinformatics pipeline that uses freely available open-source software programs. The pipeline can be run on a regular desktop computer without special hardware and does not require extensive computational skills. The ready-to-use Multiple Sequence Alignment (MSA) file provided through the Open Science Framework (OSF) makes the task even more straightforward. Inclusivity analysis through verification of in silico nucleotide identity match is one of the regulatory requirements for approval of COVID-19 diagnostic assays (Commission-Services, 2020; FDA, 2020; Health-Canada, 2020). The protocol can also be applied to other molecular diagnostic assays for SARS-CoV-2, including the point-of-care CRISPR-based diagnostic assays under development (Tsang and LaManna, 2020).

Equipment

  1. A regular Windows or Mac OS X laptop or desktop.

    Note: There is no specific processor or RAM requirement, but memory issues can be avoided by opening only a limited number of files at the same time. The protocol outlined here was performed on a Windows 10 laptop with an Intel Core i5-8265U processor (1.60 GHz) and 8 GB of RAM.

Software

  1. MAFFT version 7 online service (Katoh et al., 2002; Katoh et al., 2019) (available from https://mafft.cbrc.jp/alignment/software/closelyrelatedviralgenomes.html)

  2. AliView version 1.26 (Larsson, 2014) (available from https://ormbunkar.se/aliview/)

  3. Sequence Manipulation Suite version 2 (Stothard, 2000) (available from https://www.bioinformatics.org/sms2/rev_comp.html)

  4. SequenceTracer (Nagy et al., 2019) (available from http://www1.szu.cz:8080/EntropyCalcWeb/sequences)

  5. ElimDupes (https://www.hiv.lanl.gov/content/sequence/elimdupesv2/elimdupes.html)

  6. PNNS calculator (available from http://entropy.szu.cz:8080/EntropyCalcWeb/pnns)

  7. A web browser (for example Google Chrome or Mozilla Firefox)

  8. A text editor (for example Microsoft Notepad)

Procedure

You can skip Procedures A and B by downloading the latest version of a ready-to-use SARS-CoV-2 Multiple Sequence Alignment (MSA) file from our project page on OSF (Procedure C) and continuing with Procedure D.


  1. Viral sequence dataset

    1. Download the viral sequences from the repository of your choice.

      Note: Check the terms and conditions of each repository with attention to the data sharing policy. Registration may be required.

      1. GISAID's EpiCoV database (https://www.gisaid.org/) (Shu and McCauley, 2017).

      2. NCBI virus (https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/) (Hatcher et al., 2017).

      3. The Chinese National Genomics Data Center (NGDC) database (https://bigd.big.ac.cn/ncov) (NGDC, 2020).

      4. EMBL-EBI's COVID-19 Data Portal (https://www.covid19dataportal.org/).

      5. COVID-19 Genomics UK (COG-UK) Consortium (https://www.cogconsortium.uk/data/).

    2. Download the complete genome of Wuhan-Hu-1 (NCBI Reference Sequence: NC_045512.2; https://www.ncbi.nlm.nih.gov/nucleotide/).
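
Before alignment it can help to confirm that the downloads are complete. The following is a minimal sketch and not part of the original protocol: a hypothetical shell helper, `fasta_lengths`, lists every record with its length so that truncated genomes can be spotted. The file names in the comments are placeholders. The Wuhan-Hu-1 reference (NC_045512.2) is 29,903 nt long, so that is the value to expect for it.

```shell
# fasta_lengths: print "<record id><TAB><sequence length>" for every record
# of a FASTA stream read from standard input (multi-line records supported).
fasta_lengths() {
  awk '
    { sub(/\r$/, "") }                               # tolerate Windows line endings
    /^>/ { if (seen) print id "\t" len               # flush the previous record
           seen = 1; id = substr($1, 2); len = 0; next }
    { gsub(/[ \t]/, ""); len += length($0) }         # add this line to the length
    END { if (seen) print id "\t" len }
  '
}

# Example (placeholder file names):
#   fasta_lengths < reference.fasta
#   fasta_lengths < downloaded_sequences.fasta | awk -F'\t' '$2 < 29000'
```

The second example lists records shorter than 29,000 nt; that cut-off is only an illustration and should be chosen to suit the dataset.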


  2. Multiple Sequence Alignment (MSA) using MAFFT online service dedicated to MSA of closely-related viral genomes (https://mafft.cbrc.jp/alignment/software/closelyrelatedviralgenomes.html).

    1. Input (Figure 1A):

      1. Paste the complete genome of Wuhan-Hu-1 (NC_045512.2) into the “Existing alignment” box.

      2. Paste the other sequences into the “Fragmentary sequence(s)” box.

        Note: MAFFT online service supports up to 20,000 sequences of ~30 kb in length. The task should be performed in batches if more sequences are being aligned and results should be combined after sequence stratification.

    2. Parameters (Figure 1B):

      1. UPPERCASE/lowercase, select "Same as input".

      2. Direction of nucleotide sequences, select “Adjust direction according to the first sequence”.

      3. Output order, select "aligned".

    3. Advanced Settings (Figure 1C):

      1. Keep alignment length, select "Yes".

      2. Strategy, select "auto".

    4. Once the job is complete, download the aligned sequences in FASTA format.



      Figure 1. Multiple Sequence Alignment (MSA) using MAFFT online service


      Video for Procedure A-B: https://youtu.be/hbnsXnikRak
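
When a dataset exceeds the 20,000-sequence limit of the online service, the input has to be divided before submission. The sketch below shows one way to do this. The function name `split_fasta_batches` and the output naming scheme are assumptions made for illustration. Because every batch is aligned against the same reference with "Keep alignment length" set to "Yes", the column coordinates of the resulting alignments should be comparable across batches.

```shell
# split_fasta_batches FILE MAX PREFIX
# Write consecutive batches of at most MAX records from FILE to
# PREFIX_001.fasta, PREFIX_002.fasta, ...
split_fasta_batches() {
  awk -v max="$2" -v prefix="$3" '
    /^>/ {
      if (n % max == 0) {                  # time to start a new batch file
        if (out != "") close(out)
        out = sprintf("%s_%03d.fasta", prefix, n / max + 1)
      }
      n++                                  # count records seen so far
    }
    out != "" { print > out }              # skip any text before the first header
  ' "$1"
}

# Example (placeholder file name):
#   split_fasta_batches downloaded_sequences.fasta 20000 batch
```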


  3. Alternatively, download a ready-to-use MSA file from our OSF page (https://doi.org/10.17605/OSF.IO/NPCS6).

    Note: The data in our original publication (Khan and Cheung, 2020) were downloaded from GISAID, which does not permit the public release of MSA files. The MSA file provided on our OSF page was generated using sequences downloaded from NCBI Virus. The file will be updated periodically during the pandemic (file 1 aligns 19,863 SARS-CoV-2 sequences).


  4. Save Region of Interest (ROI) for each primer/probe as a separate FASTA file

    1. Open the MSA file from Procedure B or Procedure C in the AliView program.

      Note: The AliView program (available from https://ormbunkar.se/aliview/) must be downloaded and installed on the computer in advance.

    2. Find the primer binding site using the “find” function (Figure 2A) or using “add and align sequences from clipboard” function (Figure 2B).

    3. Reverse-complement the primer/probe sequence as necessary using Sequence Manipulation Suite (https://www.bioinformatics.org/sms2/rev_comp.html).

      >CN-CDC-N_F

      GGGGAACTTCTCCTGCTAGAAT



      Figure 2. Finding the Region of Interest (ROI) in MSA


    4. Select the ROI and copy selection as FASTA format (Figure 3A).

    5. Open a new file, choose "Paste (fasta-sequences)" and then "Save" (Figure 3B). Alternatively, paste the sequence into a text editor and save it as a FASTA file.



      Figure 3. Saving Region of Interest (ROI) for each primer/probe as a separate FASTA file


      Video for Procedure D: https://youtu.be/H9UxkgAsMdE
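
For readers who prefer the command line, the reverse-complement and "find" steps of this procedure can be sketched as below. This is an illustration under stated assumptions, not the authors' method: `revcomp` and `find_roi` are hypothetical helper names, only exact matches are reported, and only the first record of the file is searched. If the reference sequence contains no gaps in the alignment (which "Keep alignment length: Yes" is meant to ensure), the reported start position should also be the alignment column at which the ROI begins.

```shell
# revcomp SEQ: reverse-complement a DNA sequence (IUPAC codes included).
revcomp() {
  printf '%s\n' "$1" |
    awk '{ r = ""; for (i = length($0); i >= 1; i--) r = r substr($0, i, 1); print r }' |
    tr 'ACGTURYKMBVDHNacgturykmbvdhn' 'TGCAAYRMKVBHDNtgcaayrmkvbhdn'
}

# find_roi REFERENCE_FASTA PRIMER
# Print "<strand><TAB><start><TAB><length>" (1-based start) for the first exact
# match of PRIMER ("+") or of its reverse complement ("-") in the first record.
# Returns a non-zero status when neither orientation is found.
find_roi() {
  awk -v fwd="$(printf '%s' "$2" | tr 'a-z' 'A-Z')" \
      -v rev="$(revcomp "$2" | tr 'a-z' 'A-Z')" '
    /^>/ { if (++rec > 1) exit; next }           # stop at the second header
    { gsub(/[ \t\r]/, ""); seq = seq toupper($0) }
    END {
      if ((p = index(seq, fwd)) > 0)      print "+\t" p "\t" length(fwd)
      else if ((p = index(seq, rev)) > 0) print "-\t" p "\t" length(rev)
      else exit 1
    }
  ' "$1"
}

# Example (placeholder file name):
#   find_roi reference.fasta GGGGAACTTCTCCTGCTAGAAT
```

A "-" strand means the oligonucleotide anneals to the plus strand, so the ROI extracted from the alignment is the reverse complement of the primer or probe as published.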


  5. Sequence stratification: Option 1 – SequenceTracer

    1. Upload the individual FASTA file to SequenceTracer (http://www1.szu.cz:8080/EntropyCalcWeb/sequences) and hit "Submit" (Figure 4A). SequenceTracer segregates the data into discrete groups of identical sequence variants and presents a detailed view of the nucleotide variation in each ROI, along with the frequency of each variant. In addition, sequences containing ambiguous nucleotides are grouped as “outgroup1”, short sequences are grouped as “outgroup2” and missing sequences are grouped as “excluded”.

    2. Download the stratified data showing a list of sequence variants and/or a chart (Figure 4B).



      Figure 4. Sequence stratification using SequenceTracer


      Video for Steps E1-E2: https://youtu.be/ysT_KBXkpvw


    3. The "stratify" file can be opened using Microsoft Excel while the "compressed" file can be opened using the AliView program or a text editor (Figure 5).

    4. The data of any sequence variant group can be downloaded (Figure 6).

      1. First, select the group.

      2. Then, “Add all to Notes”.

      3. Finally, "export".



        Figure 5. Expected results from SequenceTracer



        Figure 6. Downloading data of a specific variant group


      Video for Steps E3-E4: https://www.youtube.com/watch?v=4S0T9TW5ax4
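
The grouping into informative sequences, “outgroup1”, “outgroup2” and “excluded” can be approximated offline for a quick cross-check. The sketch below is not SequenceTracer's algorithm: `classify_roi` is a hypothetical helper, and the class rules written in its comments are assumptions that may differ from the rules the web tool applies.

```shell
# classify_roi: read an ROI FASTA from standard input and print one line per
# class, "<class><TAB><number of sequences>". Assumed class rules:
#   excluded    - no nucleotides at all (empty record or gaps only)
#   outgroup2   - contains gap characters, i.e. the ROI is only partly covered
#   outgroup1   - full length but with characters other than A, C, G, T
#   informative - full length and unambiguous
classify_roi() {
  awk '
    function tally(   s, bare) {
      if (!seen) return
      s = toupper(seq); bare = s; gsub(/-/, "", bare)
      if (bare == "")          n["excluded"]++
      else if (bare != s)      n["outgroup2"]++
      else if (s ~ /[^ACGT]/)  n["outgroup1"]++
      else                     n["informative"]++
    }
    { sub(/\r$/, "") }
    /^>/ { tally(); seen = 1; seq = ""; next }
    { gsub(/[ \t]/, ""); seq = seq $0 }
    END {
      tally()
      split("informative outgroup1 outgroup2 excluded", cls, " ")
      for (i = 1; i <= 4; i++) print cls[i] "\t" (n[cls[i]] + 0)
    }
  '
}

# Example (placeholder file name):
#   classify_roi < ROI.fasta
```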


  6. Sequence stratification: Option 2 – ElimDupes

    1. Upload the individual FASTA file to ElimDupes (https://www.hiv.lanl.gov/content/sequence/elimdupesv2/elimdupes.html).

    2. Select parameters as shown and hit "Submit" (Figure 7A).

    3. View or download "Unique sequences with rank and count appended (_count)" (Figure 7B).

    4. The file lists the sequence variants, with rank and count appended to each sequence name. It can be opened using the AliView program or a text editor (Figure 7C).

      Note: Unlike SequenceTracer, ElimDupes ranks variants with ambiguous or missing sequences together with the other variants, so they need to be separated manually.



      Figure 7. Sequence stratification using ElimDupes
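
The idea of collapsing identical sequences and appending a rank and a count can also be sketched locally. The helper below, `unique_variants`, is hypothetical; its header format (`name_rank_count`) is only loosely modelled on the "_count" output described above and is an assumption, not the exact ElimDupes format.

```shell
# unique_variants: read a FASTA from standard input and print one record per
# distinct sequence, most frequent first. Each header is the name of the first
# sequence carrying that variant followed by "_<rank>_<count>".
unique_variants() {
  awk '
    function store() {
      if (!seen) return
      key = toupper(seq)
      if (!(key in count)) first[key] = id     # remember the first carrier
      count[key]++
    }
    { sub(/\r$/, "") }
    /^>/ { store(); seen = 1; id = substr($1, 2); seq = ""; next }
    { gsub(/[ \t]/, ""); seq = seq $0 }
    END {
      store()
      for (key in count) print count[key] "\t" first[key] "\t" key
    }
  ' | LC_ALL=C sort -t "$(printf '\t')" -k1,1nr -k3,3 |
  awk -F'\t' '{ print ">" $2 "_" NR "_" $1; print $3 }'
}

# Example (placeholder file names):
#   unique_variants < ROI.fasta > ROI_unique.fasta
```

As with ElimDupes, variants containing ambiguous characters or gaps are ranked together with the others and still have to be separated before counting informative sequences.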


  7. Position Nucleotide Numerical Summary (PNNS)

    1. As the sequence variation was moderate, the base composition of each nucleotide position was not analyzed in the original publication. This can be performed for highly variable regions using the Position Nucleotide Numerical Summary (PNNS) calculator (http://entropy.szu.cz:8080/EntropyCalcWeb/pnns).
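
A per-position base composition of the kind described here can be sketched in a few lines of awk. The helper name `position_summary` and its output layout are assumptions for illustration and do not reproduce the PNNS calculator's own report.

```shell
# position_summary: read an aligned ROI FASTA from standard input and print,
# for every column, the counts of A, C, G, T, gaps and any other character
# as a tab-separated table with a header line.
position_summary() {
  awk '
    function tally(   i, c) {
      if (!seen) return
      if (length(seq) > width) width = length(seq)
      for (i = 1; i <= length(seq); i++) {
        c = toupper(substr(seq, i, 1))
        if (c !~ /^[ACGT-]$/) c = "other"      # ambiguity codes and the like
        n[i, c]++
      }
    }
    { sub(/\r$/, "") }
    /^>/ { tally(); seen = 1; seq = ""; next }
    { gsub(/[ \t]/, ""); seq = seq $0 }
    END {
      tally()
      print "pos\tA\tC\tG\tT\tgap\tother"
      for (i = 1; i <= width; i++)
        print i "\t" (n[i,"A"]+0) "\t" (n[i,"C"]+0) "\t" (n[i,"G"]+0) "\t" \
              (n[i,"T"]+0) "\t" (n[i,"-"]+0) "\t" (n[i,"other"]+0)
    }
  '
}

# Example (placeholder file name):
#   position_summary < ROI.fasta
```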

Data analysis

First, the sequences with ambiguous nucleotides (outgroup1), short sequences (outgroup2) and missing sequences (excluded) are removed, and the number of “informative” sequences is calculated by subtracting these three groups from the total number of sequences. SequenceTracer performs this calculation automatically, whereas it must be performed manually when using ElimDupes. The informative group is then divided, for each primer and probe, into hits with a perfect match and hits with mismatches. To minimize the effect of low-prevalence variants and sequencing errors on the analysis, we defined a threshold of 0.5% in our original publication (Khan and Cheung, 2020), so that only sequence variants with ≥ 0.5% incidence were considered further. As more high-quality viral sequences become available, a lower, more stringent threshold (for instance 0.1%) may be defined. Another way of defining a threshold is to include every mutation that occurs more than once in different sequencing experiments (Osorio and Correia-Neves, 2020). The number and frequency of the sequences with a perfect match and with mismatches are then calculated, for each primer and probe, from the sequences above the defined threshold. As an example, the analysis of the CN-CDC-N forward primer 5′-GGGGAACTTCTCCTGCTAGAAT-3′ (WHO, 2020) is shown in Table 1. A summary of the analysis for 27 previously published PCR assays is presented in Table 2 of our previous publication (Khan and Cheung, 2020).
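
The threshold step can be expressed as a short calculation. In the sketch below, `apply_threshold` is a hypothetical helper, and it assumes that a variant's incidence is measured against the number of informative sequences; the text does not state the denominator explicitly, so treat that as an assumption. For orientation, the counts reported for the CN-CDC-N forward primer are 17,027 − 170 = 16,857 informative sequences, of which 16,662 remain at the 0.5% threshold, giving 13,533 / 16,662 = 81.22% perfect matches.

```shell
# apply_threshold TARGET PERCENT
# Read "<count><TAB><sequence>" lines (informative variants only) from standard
# input. Keep variants whose share of all informative sequences is >= PERCENT,
# then report perfect matches to TARGET versus mismatches among those kept.
apply_threshold() {
  awk -F'\t' -v target="$(printf '%s' "$1" | tr 'a-z' 'A-Z')" -v pct="$2" '
    { cnt[NR] = $1; seq[NR] = toupper($2); total += $1 }
    END {
      for (i = 1; i <= NR; i++) {
        if (cnt[i] * 100 < pct * total) continue    # variant below the threshold
        kept += cnt[i]
        if (seq[i] == target) perfect += cnt[i]
      }
      printf "informative\t%d\n", total
      printf "above_threshold\t%d\n", kept
      if (kept > 0) {
        printf "perfect_match\t%d\t%.2f%%\n", perfect, 100 * perfect / kept
        printf "mismatch\t%d\t%.2f%%\n", kept - perfect, 100 * (kept - perfect) / kept
      }
    }
  '
}

# Example (placeholder file name; TARGET is the ROI as it reads in the alignment):
#   apply_threshold GGGGAACTTCTCCTGCTAGAAT 0.5 < variant_counts.tsv
```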

Mismatches can also be divided into those at the 3′ end (last 5 nucleotides) and those at the 5′ end. PCR amplification is known to be more sensitive to mismatches at the 3′ end of the primer (Whiley and Sloots, 2005; Stadhouders et al., 2010; Lefever et al., 2013). Moreover, mismatches in the probe can have a deleterious effect on PCR amplification. Even a single mismatch may reduce the sensitivity of the assay and lead to false-negative results by preventing probe binding and the subsequent fluorescence (Chow et al., 2011; Brault et al., 2012).
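
Splitting mismatches into 3′-end and 5′-end positions is easy to automate once the variant is written in the same orientation as the primer. The following sketch uses a hypothetical helper, `mismatch_profile`; the default window of 5 bases follows the definition above, and the equal-length requirement is a simplification made for this example.

```shell
# mismatch_profile PRIMER VARIANT [WINDOW]
# Compare two equal-length sequences written 5'->3' in primer orientation and
# print the 1-based mismatch positions, followed by how many of them fall within
# the last WINDOW bases (default 5) at the 3' end and how many lie 5' of that.
mismatch_profile() {
  awk -v p="$(printf '%s' "$1" | tr 'a-z' 'A-Z')" \
      -v v="$(printf '%s' "$2" | tr 'a-z' 'A-Z')" \
      -v w="${3:-5}" 'BEGIN {
    if (length(p) != length(v)) { print "length mismatch" > "/dev/stderr"; exit 2 }
    n = length(p)
    for (i = 1; i <= n; i++)
      if (substr(p, i, 1) != substr(v, i, 1)) {
        pos = pos (pos == "" ? "" : ",") i      # collect mismatch positions
        total++
        if (i > n - w) three++                  # inside the 3-prime window
      }
    printf "positions\t%s\n", (pos == "" ? "none" : pos)
    printf "total\t%d\n3prime\t%d\n5prime\t%d\n", total, three, total - three
  }'
}

# Example with a made-up variant that differs at the last base:
#   mismatch_profile GGGGAACTTCTCCTGCTAGAAT GGGGAACTTCTCCTGCTAGAAC
```

For reverse primers, the ROI taken from the plus-strand alignment has to be reverse-complemented first, otherwise the 3′ window is counted from the wrong end.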


Table 1. Analysis of CN-CDC-N forward primer

Acknowledgments

This protocol is the detailed version of the method used in our previous publication (Khan and Cheung, 2020). We gratefully acknowledge the authors and the originating and submitting laboratories of the sequences from GISAID’s EpiCoV™ Database, on which our research is based. The list is included in the electronic supplementary material, file 1, of our original publication. Funding for this study was provided by a Canadian Institutes of Health Research operating grant (number RN227427 – 324983) awarded to PC.

Competing interests

The authors have no competing interests.

References

  1. Brault, A. C., Fang, Y., Dannen, M., Anishchenko, M. and Reisen, W. K. (2012). A naturally occurring mutation within the probe-binding region compromises a molecular-based West Nile virus surveillance assay for mosquito pools (Diptera: Culicidae). J Med Entomol 49(4): 939-941.
  2. Chan, J. F., Kok, K. H., Zhu, Z., Chu, H., To, K. K., Yuan, S. and Yuen, K. Y. (2020). Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan. Emerg Microbes Infect 9(1): 221-236.
  3. Chow, C. K., Qin, K., Lau, L. T. and Cheung-Hoi Yu, A. (2011). Significance of a single-nucleotide primer mismatch in hepatitis B virus real-time PCR diagnostic assays. J Clin Microbiol 49(12): 4418-4419; author reply 4420.
  4. Commission-Services (2020). Current performance of COVID-19 test methods and devices and proposed performance criteria - Working document of Commission services. Retrieved May 6, 2020, from https://ec.europa.eu/docsroom/documents/40805.
  5. FDA (2020). Policy for Coronavirus Disease-2019 Tests During the Public Health Emergency (Revised). Retrieved May 6, 2020, from https://www.fda.gov/media/135659/download.
  6. Hatcher, E. L., Zhdanov, S. A., Bao, Y., Blinkova, O., Nawrocki, E. P., Ostapchuck, Y., Schaffer, A. A. and Brister, J. R. (2017). Virus Variation Resource - improved response to emergent viral outbreaks. Nucleic Acids Res 45(D1): D482-D490.
  7. Health-Canada (2020). Applications for medical devices under the Interim Order for use in relation to COVID-19: Guidance document. Retrieved July 2, 2020, from https://www.canada.ca/en/health-canada/services/drugs-health-products/drug-products/announcements/interim-order-importation-sale-medical-devices-covid-19.html.
  8. Katoh, K., Misawa, K., Kuma, K. and Miyata, T. (2002). MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 30(14): 3059-3066.
  9. Katoh, K., Rozewicki, J. and Yamada, K. D. (2019). MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization. Brief Bioinform 20(4): 1160-1166.
  10. Khan, K. A. and Cheung, P. (2020). Presence of mismatches between diagnostic PCR assays and coronavirus SARS-CoV-2 genome. Royal Society Open Science 7(6): 200636.
  11. Larsson, A. (2014). AliView: a fast and lightweight alignment viewer and editor for large datasets. Bioinformatics 30(22): 3276-3278.
  12. Lefever, S., Pattyn, F., Hellemans, J. and Vandesompele, J. (2013). Single-nucleotide polymorphisms and other mismatches reduce performance of quantitative PCR assays. Clin Chem 59(10): 1470-1480.
  13. Lu, R., Zhao, X., Li, J., Niu, P., Yang, B., Wu, H., Wang, W., Song, H., Huang, B., Zhu, N., Bi, Y., Ma, X., Zhan, F., Wang, L., Hu, T., Zhou, H., Hu, Z., Zhou, W., Zhao, L., Chen, J., Meng, Y., Wang, J., Lin, Y., Yuan, J., Xie, Z., Ma, J., Liu, W. J., Wang, D., Xu, W., Holmes, E. C., Gao, G. F., Wu, G., Chen, W., Shi, W. and Tan, W. (2020). Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. Lancet 395(10224): 565-574.
  14. Nagy, A., Jirinec, T., Jirincova, H., Cernikova, L. and Havlickova, M. (2019). In silico re-assessment of a diagnostic RT-qPCR assay for universal detection of Influenza A viruses. Sci Rep 9(1): 1630.
  15. Osorio, N. S. and Correia-Neves, M. (2020). Implication of SARS-CoV-2 evolution in the sensitivity of RT-qPCR diagnostic assays. Lancet Infect Dis. doi: 10.1016/S1473-3099(20)30435-7.
  16. Shu, Y. and McCauley, J. (2017). GISAID: Global initiative on sharing all influenza data - from vision to reality. Euro Surveill 22(13): 30494.
  17. Stadhouders, R., Pas, S. D., Anber, J., Voermans, J., Mes, T. H. and Schutten, M. (2010). The effect of primer-template mismatches on the detection and quantification of nucleic acids using the 5' nuclease assay. J Mol Diagn 12(1): 109-117.
  18. Stothard, P. (2000). The sequence manipulation suite: JavaScript programs for analyzing and formatting protein and DNA sequences. Biotechniques 28(6): 1102-1104.
  19. Tsang, J. and LaManna, C. M. (2020). Open Sharing During COVID-19: CRISPR-Based Detection Tools. The CRISPR Journal 3(3): 142-145.
  20. Ugurel, O. M., Ata, O. and Turgut-Balik, D. (2020). An updated analysis of variations in SARS-CoV-2 genome. Turk J Biol 44(3): 157-167.
  21. Whiley, D. M. and Sloots, T. P. (2005). Sequence variation in primer targets affects the accuracy of viral quantitative PCR. J Clin Virol 34(2): 104-107.
  22. WHO. (2020). In-house developed molecular assays. Retrieved April 16, 2020, from https://www.who.int/docs/default-source/coronaviruse/whoinhouseassays.pdf?sfvrsn=de3a76aa_2.
  23. Worldometers.info (2020). COVID-19 Coronavirus Pandemic. Retrieved April 16, 2020, from https://www.worldometers.info/coronavirus/.
  24. Wu, F., Zhao, S., Yu, B., Chen, Y. M., Wang, W., Song, Z. G., Hu, Y., Tao, Z. W., Tian, J. H., Pei, Y. Y., Yuan, M. L., Zhang, Y. L., Dai, F. H., Liu, Y., Wang, Q. M., Zheng, J. J., Xu, L., Holmes, E. C. and Zhang, Y. Z. (2020). A new coronavirus associated with human respiratory disease in China. Nature 579(7798): 265-269.
  25. Zhou, P., Yang, X. L., Wang, X. G., Hu, B., Zhang, L., Zhang, W., Si, H. R., Zhu, Y., Li, B., Huang, C. L., Chen, H. D., Chen, J., Luo, Y., Guo, H., Jiang, R. D., Liu, M. Q., Chen, Y., Shen, X. R., Wang, X., Zheng, X. S., Zhao, K., Chen, Q. J., Deng, F., Liu, L. L., Yan, B., Zhan, F. X., Wang, Y. Y., Xiao, G. F. and Shi, Z. L. (2020). A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 579(7798): 270-273.
  26. NGDC (2020). Database Resources of the National Genomics Data Center in 2020. Nucleic Acids Res 48(D1): D24-D33. doi: 10.1093/nar/gkz913.



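Table 1. Analysis of CN-CDC-N forward primer

                                                        Number of sequences (frequency)
Total number of sequences                               17027
Sequences removed (outgroup1 + outgroup2 + excluded)    170
Informative sequences                                   16857
Sequences with perfect match                            13533
Sequences with mismatches                               3324
Sequences above threshold (0.5%)                        16662
  Perfect match                                         13533 (81.22%)
  Mismatch                                              3129 (18.78%)
Sequences above threshold (0.1%)                        16817
  Perfect match                                         13533 (80.47%)
  Mismatch                                              3284 (19.53%)
Sequences with occurrence more than 1                   16852
  Perfect match                                         13533 (80.31%)
  Mismatch                                              3319 (19.69%)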
 
Copyright: © 2020 The Authors; exclusive licensee Bio-protocol LLC.
Citation: Khan, K. A. and Cheung, P. (2020). Evaluation of the Sequence Variability within the PCR Primer/Probe Target Regions of the SARS-CoV-2 Genome. Bio-protocol 10(24): e3871. DOI: 10.21769/BioProtoc.3871.
Questions and replies

Kashif Khan
York University (author)
The current protocol is especially useful for people with limited computational skills. However, some procedures may become laborious or exceed the capacity of the software when handling a large number of sequences. The following bash commands can be used to perform the indicated procedures more easily in the macOS Terminal or in an Ubuntu installation under the Windows Subsystem for Linux.

# Procedure B: Multiple Sequence Alignment using MAFFT
mafft --auto --keeplength --maxambiguous 0.01 --addfragments othersequences.fasta referencesequence.fasta > MSA.fasta

# Procedure D: Selection of ROI (first set the shell variables START and LENGTH to the first alignment column and the length of the ROI)
awk -v START="$START" -v LENGTH="$LENGTH" 'BEGIN{RS=">";FS="\n"}NR>1{seq="";for (i=2;i<=NF;i++) seq=seq""$i; print ">"$1"\n"substr(seq,START,LENGTH)}' MSA.fasta > ROI.fasta

2021/8/24 15:32:56