Genomics - Systems Biology - Bio-protocol

Classification of a Massive Number of Viral Genomes and Estimation of Time of Most Recent Common Ancestor (tMRCA) of SARS-CoV-2 Using Phylodynamic Analysis

Xiaowen Hu, Siqin Guan, Yiliang He, Guohui Yi, Lei Yao*, Jiaming Zhang*

0 Q&A 1072 Views Mar 20, 2024

Estimating the time of most recent common ancestor (tMRCA) is important for tracing the origin of pathogenic viruses. This analysis is based on the genetic diversity accumulated over a given time period. Thousands of mutant sites have arisen in SARS-CoV-2 genomes since the COVID-19 pandemic started; six highly linked mutation sites arose early, before the start of the pandemic, and can be used to classify the genomes into three main haplotypes. Tracing the origin of those three haplotypes may help to understand the origin of SARS-CoV-2. In this article, we present a complete protocol for classifying SARS-CoV-2 genomes and calculating tMRCA using a Bayesian phylodynamic method. This protocol may also be used in the analysis of other viral genomes.

Key features

• Filtering and alignment of a massive number of viral genomes using custom scripts and ViralMSA.

• Classification of genomes based on highly linked sites using custom scripts.

• Phylodynamic analysis of viral genomes using Bayesian evolutionary analysis sampling trees (BEAST).

• Visualization of posterior distribution of tMRCA using Tracer.v1.7.2.

• Optimized for SARS-CoV-2.
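
The classification step listed in the key features can be illustrated with a minimal Python sketch. It assumes the genomes have already been aligned to a common reference so that alignment columns are comparable. The site positions and haplotype signatures below are hypothetical placeholders, not the six linked sites used in the protocol, and `classify_genome` and `count_haplotypes` are illustrative names rather than the authors' custom scripts.

```python
from collections import Counter

# Hypothetical 0-based alignment columns; NOT the protocol's actual linked sites.
LINKED_SITES = (2, 5, 9)

# Hypothetical signatures: the bases expected at LINKED_SITES for each haplotype.
HAPLOTYPES = {
    "H1": "ACG",
    "H2": "TCG",
    "H3": "TTA",
}


def classify_genome(aligned_seq, sites=LINKED_SITES, haplotypes=HAPLOTYPES):
    """Return the haplotype whose signature matches the bases at the linked
    sites, or 'unclassified' if no signature matches."""
    signature = "".join(aligned_seq[i].upper() for i in sites)
    for name, expected in haplotypes.items():
        if signature == expected:
            return name
    return "unclassified"


def count_haplotypes(aligned_seqs):
    """Tally how many aligned genomes fall into each haplotype."""
    return Counter(classify_genome(seq) for seq in aligned_seqs)
```

Genomes that carry an ambiguous base or an unexpected combination at the linked sites fall into the 'unclassified' bin, which makes it easy to inspect or filter them before any downstream analysis.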

Graphical overview

Graphical workflow of time of most recent common ancestor (tMRCA) estimation process

Phylogenetic Inference of Homologous/Orthologous Genes among Distantly Related Plants

Zilong Xu, Wenyan Sun, Ziqiang Zhu, Bojian Zhong, Zhenhua Zhang*

0 Q&A 578 Views Dec 5, 2023

The recent surge in plant genomic and transcriptomic data has laid a foundation for reconstructing evolutionary scenarios and inferring potential functions of key genes related to plants’ development and stress responses. The classical scheme for identifying homologous genes is sequence similarity–based searching, under the crucial assumption that homologous sequences are more similar to each other than they are to any non-homologous sequences. Advances in plant phylogenomics and computational algorithms have enabled us to systematically identify homologs/orthologs and reconstruct their evolutionary histories among distantly related lineages. Here, we present a comprehensive pipeline for homologous sequence identification, phylogenetic relationship inference, and potential functional profiling of genes in plants.

Key features

• Identification of orthologs using large-scale genomic and transcriptomic data.

• This protocol is generalized for analyzing the evolution of plant genes.

Workflow for High-throughput Screening of Enzyme Mutant Libraries Using Matrix-assisted Laser Desorption/Ionization Mass Spectrometry Analysis of Escherichia coli Colonies

Kisurb Choe, Jonathan V. Sweedler*

0 Q&A 490 Views Nov 5, 2023

High-throughput molecular screening of microbial colonies and DNA libraries are critical procedures that enable applications such as directed evolution, functional genomics, microbial identification, and creation of engineered microbial strains to produce high-value molecules. A promising chemical screening approach is the measurement of products directly from microbial colonies via optically guided matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS). Measuring the compounds from microbial colonies bypasses liquid culture with a screen that takes approximately 5 s per sample. We describe a protocol combining a dedicated informatics pipeline and sample preparation method that can prepare up to 3,000 colonies in under 3 h. The screening protocol starts from colonies grown on Petri dishes and then transferred onto MALDI plates via imprinting. The target plate with the colonies is imaged by a flatbed scanner and the colonies are located via custom software. The target plate is coated with MALDI matrix, MALDI-MS analyzes the colony locations, and data analysis enables the determination of colonies with the desired biochemical properties. This workflow screens thousands of colonies per day without requiring additional automation. The wide chemical coverage and the high sensitivity of MALDI-MS enable diverse screening projects such as modifying enzymes and functional genomics surveys of gene activation/inhibition libraries.

Key features

• Mass spectrometry analyzes a range of compounds from E. coli colonies as a proxy for liquid culture testing enzyme mutant libraries.

• Colonies are transferred to a MALDI target plate by a simple imprinting method.

• The screen compares the ratio among several products or searches for the qualitative presence of specific compounds.

• The protocol requires a MALDI mass spectrometer.

Graphical overview

Overview of the MALDI-MS analysis of microbial colonies for screening mutant libraries. Microbial cells containing a mutant library for enzymes/metabolic pathways are first grown in agar. The colonies are then imprinted onto a MALDI target plate using a filter paper intermediate. An optical image of the MALDI target plate is analyzed by custom software to find the locations of individual colonies and direct subsequent MALDI-MS analyses to the selected colonies. After applying MALDI matrix onto the target plate, MALDI-MS analysis of the colonies is performed. Colonies showing the desired product profiles are found by data analysis via the software, and the colonies are picked for downstream analysis.

Testing for Allele-specific Expression from Human Brain Samples

Maria E. Diaz-Ortiz, Nimansha Jain, Michael D. Gallagher, Marijan Posavi, Travis L. Unger, Alice S. Chen-Plotkin*

0 Q&A 405 Views Oct 5, 2023

Many single nucleotide polymorphisms (SNPs) identified by genome-wide association studies exert their effects on disease risk as expression quantitative trait loci (eQTL) via allele-specific expression (ASE). While databases for probing eQTLs in tissues from normal individuals exist, one may wish to ascertain eQTLs or ASE in specific tissues or disease states not characterized in these databases. Here, we present a protocol to assess ASE of two possible target genes (GPNMB and KLHL7) of a known genome-wide association study (GWAS) Parkinson’s disease (PD) risk locus in postmortem human brain tissue from PD and neurologically normal individuals. This was done using a sequence of RNA isolation, cDNA library generation, enrichment for transcripts of interest using customizable cDNA capture probes, paired-end RNA sequencing, and subsequent analysis. This method provides increased sensitivity relative to traditional bulk RNAseq-based approaches and a blueprint that can be extended to the study of other genes, tissues, and disease states.

Key features

• Analysis of GPNMB allele-specific expression (ASE) in brain lysates from cognitively normal controls (NC) and Parkinson’s disease (PD) individuals.

• Builds on the ASE protocol of Mayba et al. (2014) and extends application from cells to human tissue.

• Increased sensitivity by enrichment for desired transcript via RNA CaptureSeq (Mercer et al., 2014).

• Optimized for human brain lysates from cingulate gyrus, caudate nucleus, and cerebellum.
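
To make the idea of allele-specific expression concrete, the following Python sketch tests whether reads covering a heterozygous SNP depart from the balanced 1:1 ratio expected when both alleles are expressed equally. This is a generic exact binomial test shown under that assumption; the abstract does not specify the protocol's statistical analysis, and the function names and read counts here are hypothetical.

```python
from math import exp, lgamma, log


def allelic_ratio(ref_reads, alt_reads):
    """Fraction of reads at a heterozygous SNP that carry the reference allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return ref_reads / total


def binomial_two_sided_p(ref_reads, alt_reads, p_ref=0.5):
    """Exact two-sided binomial p-value for the observed allele counts.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one under the null reference-allele fraction p_ref (0 < p_ref < 1).
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads cover this site")

    def pmf(k):
        # Work in log space so large read counts do not overflow.
        log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        return exp(log_choose + k * log(p_ref) + (n - k) * log(1.0 - p_ref))

    observed = pmf(ref_reads)
    tolerance = 1.0 + 1e-7  # absorbs floating-point noise between tied outcomes
    total = sum(pmf(k) for k in range(n + 1) if pmf(k) <= observed * tolerance)
    return min(1.0, total)
```

A small p-value suggests allelic imbalance at that site. In practice, a null fraction other than 0.5 may be more appropriate if reads align preferentially to the reference allele, which is why `p_ref` is left as a parameter.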

Protocols to Induce, Prevent, and Treat Post-traumatic Stress Disorder-like Memory in Mice: Optogenetics and Behavioral Approaches

Aline S. Al Abed, Azza Sellami, Eva-Gunnel Ducourneau, Chloé Bouarab, Aline Marighetto, Aline Desmedt*

0 Q&A 2211 Views Oct 5, 2021

One of the cardinal features of post-traumatic stress disorder (PTSD) is a paradoxical memory alteration including both emotional hypermnesia for salient trauma-related cues and amnesia for the surrounding traumatic context. Interestingly, some clinical studies have suggested that contextual amnesia would causally contribute to the PTSD-related hypermnesia insofar as decontextualized, traumatic memory is prone to be reactivated in contexts that can be very different from the original traumatic context. However, most current animal models of PTSD-related memory focus exclusively on the emotional hypermnesia, i.e., the persistence of a strong fear memory, and do not distinguish normal (adaptive) from pathological (PTSD-like) fear memory, leaving unexplored the hypothetical critical role of contextual amnesia in PTSD-related memory formation, and thus challenging the development of innovative treatments. Having developed the first animal model that precisely recapitulates the two memory components of PTSD in mice (emotional hypermnesia and contextual amnesia), we recently demonstrated that contextual amnesia, induced by optogenetic inhibition of the hippocampus (dorsal CA1), is a causal cognitive process of PTSD-like hypermnesia formation. Moreover, the hippocampus-dependent contextualization of traumatic memory, by optogenetic activation of dCA1 in traumatic condition, prevents PTSD-like hypermnesia formation. Finally, once PTSD-like memory has been formed, the re-contextualization of traumatic memory by its reactivation in the original traumatic context normalizes this pathological fear memory. Revealing the key role of contextual amnesia in PTSD-like memory, this procedure opens a therapeutic perspective based on trauma contextualization and the underlying hippocampal mechanisms.

Protocol for RNA-seq Expression Analysis in Yeast

Stefan Bohn*

0 Q&A 3008 Views Sep 20, 2021

Genome-wide sequencing of RNA (RNA-seq) has become an inexpensive tool to gain key insights into cellular and disease mechanisms. Sample preparation and sequencing are streamlined and allow the acquisition of hundreds of gene expression profiles in a few days; however, data processing, curation, and analysis in particular involve numerous steps that can be overwhelming to non-experts. Here, the sample preparation, sequencing, and data processing workflow for RNA-seq expression analysis in yeast is described. While this protocol covers only a small portion of the RNA-seq landscape, it describes the principal workflow common to such experiments, allowing the reader to adapt the protocol where necessary.

Graphic abstract:

Basic workflow of RNA-seq expression analysis.

Detecting Differentially Methylated Promoters in Genes Related to Disease Phenotypes Using R

Jordi Martorell Marugán, Pedro Carmona-Sáez*

0 Q&A 2808 Views Jun 5, 2021

DNA methylation in gene promoters plays a major role in gene expression regulation, and alterations in methylation patterns have been associated with several diseases. In this context, different software suites and statistical methods have been proposed to analyze differentially methylated positions and regions. Among them, the novel statistical method implemented in the mCSEA R package proposed a new framework to detect subtle, but consistent, methylation differences. Here, we provide an easy-to-use pipeline covering all the necessary steps to detect differentially methylated promoters with mCSEA from Illumina 450K and EPIC methylation BeadChips data. This protocol covers the download of data from public repositories, quality control, data filtering and normalization, estimation of cell type proportions, and statistical analysis. In addition, we show the procedure to compare disease vs. normal phenotypes, obtaining differentially methylated regions including promoters or CpG Islands. The entire protocol is based on R programming language, which can be used in any operating system and does not require advanced programming skills.

Identification of R-loop-forming Sequences in Drosophila melanogaster Embryos and Tissue Culture Cells Using DRIP-seq

Célia Alecki, Nicole J. Francis*

0 Q&A 4953 Views May 5, 2021

R-loops are non-canonical nucleic acid structures composed of an RNA–DNA hybrid and a displaced ssDNA. Originally identified as a source of genomic instability, R-loops have been shown over the last decade to be involved in the targeting of proteins and to be associated with different histone modifications, suggesting a regulatory function. In addition, R-loops have been demonstrated to form differentially during the development of different tissues in plants and to be associated with diseases in mammals. Here, we provide a single-strand DRIP-seq protocol to identify R-loop-forming sequences in Drosophila melanogaster embryos and tissue culture cells. This protocol differs from earlier DRIP protocols in the fragmentation step. Sonication, unlike restriction enzymes, generates a homogeneous and highly reproducible nucleic acid fragment pool. In addition, it allows the use of this protocol in any organism with minimal optimization. This protocol integrates several steps from published protocols to identify R-loop-forming sequences with high stringency, suitable for de novo characterization.

Graphic abstract:

Figure 1. Overview of the strand-specific DRIP-seq protocol

Computational Analysis and Phylogenetic Clustering of SARS-CoV-2 Genomes

Bani Jolly, Vinod Scaria*

1 Q&A 5075 Views Apr 20, 2021

COVID-19, the disease caused by the novel SARS-CoV-2 coronavirus, originated as an isolated outbreak in the Hubei province of China but soon created a global pandemic and is now a major threat to healthcare systems worldwide. Following the rapid human-to-human transmission of the infection, institutes around the world have made efforts to generate genome sequence data for the virus. With thousands of genome sequences for SARS-CoV-2 now available in the public domain, it is possible to analyze the sequences and gain a deeper understanding of the disease, its origin, and its epidemiology. Phylogenetic analysis is a potentially powerful tool for tracking the transmission pattern of the virus with a view to aiding identification of potential interventions. Toward this goal, we have created a comprehensive protocol for the analysis and phylogenetic clustering of SARS-CoV-2 genomes using Nextstrain, a powerful open-source tool for the real-time interactive visualization of genome sequencing data. Approaches to focus the phylogenetic clustering analysis on a particular region of interest are detailed in this protocol.

Reference-free Association Mapping from Sequencing Reads Using k-mers

Zakaria Mehrab, Jaiaid Mobin, Ibrahim Asadullah Tahmid, Lior Pachter*, Atif Rahman*

0 Q&A 4296 Views Nov 5, 2020

Association mapping is the process of linking phenotypes with genotypes. In genome-wide association studies (GWAS), individuals are first genotyped using microarrays or by aligning sequenced reads to reference genomes. However, both of these approaches rely on reference genomes, which limits their application to organisms with no or incomplete reference genomes. To address this, reference-free association mapping methods have been developed. Here we present the protocol of an alignment-free method for association studies, which is based on counting k-mers in sequenced reads, testing for associations between k-mers and the phenotype of interest, and local assembly of the statistically significant k-mers. The method can map associations of categorical phenotypes to sequence and structural variations without requiring prior sequencing of reference genomes.
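
The first stage of such an approach, counting k-mers in reads and tabulating them against a categorical phenotype, can be sketched in Python as below. This is a simplified illustration, not the protocol's implementation: the function names, sample identifiers, and phenotype labels are hypothetical, reverse complements are not merged, and the statistical test and local assembly steps are left out.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def count_kmers(reads, k):
    """Count every k-mer across the reads, skipping k-mers with ambiguous bases.

    Simplification: a k-mer and its reverse complement are counted separately.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= VALID_BASES:
                counts[kmer] += 1
    return counts


def presence_by_group(sample_reads, phenotypes, k):
    """Return {kmer: Counter({phenotype: number of samples containing the kmer})}.

    sample_reads maps a sample ID to its list of reads; phenotypes maps the
    same sample IDs to a categorical label (hypothetical, e.g. 'case').
    """
    table = {}
    for sample, reads in sample_reads.items():
        group = phenotypes[sample]
        # Iterating over the Counter yields each distinct k-mer once per sample.
        for kmer in count_kmers(reads, k):
            table.setdefault(kmer, Counter())[group] += 1
    return table
```

K-mers that appear in most samples of one phenotype group but few of the other are the natural candidates to pass on to a formal association test.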