Systems Biology


0 Q&A 264 Views Mar 5, 2025

Mitochondrial genomes (mitogenomes) display relatively rapid mutation rates, low sequence recombination, high copy numbers, and maternal inheritance patterns, rendering them valuable blueprints for mapping lineages, uncovering historical migration patterns, understanding intraspecific population dynamics, and investigating how environmental pressures shape traits underpinned by genetic variation. Here, we present the bioinformatic pipeline and code used to assemble and annotate the complete mitogenomes of five houndsharks (Chondrichthyes: Triakidae) and compare them to the mitogenomes of other closely related species. We demonstrate the value of a combined assembly approach for detecting deviations in mitogenome structure and describe how to select an assembly approach that best suits the sequencing data. The datasets required to run our analyses are available on the GitHub and Dryad repositories.

0 Q&A 225 Views Mar 5, 2025

The limited standards for the rigorous and objective use of mitochondrial genomes (mitogenomes) can lead to uncertainties regarding the phylogenetic relationships of taxa under varying evolutionary constraints. The mitogenome exhibits heterogeneity in base composition, and evolutionary rates may vary across different regions, which can cause empirical data to violate the assumptions of the applied evolutionary models. Consequently, the unique evolutionary signatures of the dataset must be carefully evaluated before selecting an appropriate approach for phylogenomic inference. Here, we present the bioinformatic pipeline and code used to expand the mitogenome phylogeny of the order Carcharhiniformes (groundsharks), with a focus on houndsharks (Chondrichthyes: Triakidae). We present a rigorous approach for addressing difficult-to-resolve phylogenies, incorporating multispecies coalescent modelling (MSCM) to address gene/species tree discordance. The protocol describes carefully designed approaches for preparing alignments, partitioning datasets, assigning models of evolution, inferring phylogenies based on traditional site-homogeneous concatenation approaches as well as under multispecies coalescent and site-heterogeneous models, and generating statistical data for comparing different topological outcomes. The datasets required to run our analyses are available in the GitHub and Dryad repositories.
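As a rough illustration of what evaluating base-composition heterogeneity can involve, the sketch below computes GC content and the standard AT and GC skews for individual sequences. It is not part of the published pipeline; the function name `base_composition` and the example sequences are hypothetical, and gaps or ambiguous bases are simply ignored.

```python
from collections import Counter
from typing import Dict


def base_composition(seq: str) -> Dict[str, float]:
    """Return GC content, AT skew, and GC skew for one nucleotide sequence.

    Only unambiguous A, C, G, and T are counted; gaps and ambiguity
    codes are ignored (an assumption made for this sketch).
    """
    counts = Counter(seq.upper())
    a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
    total = a + c + g + t
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {
        "gc_content": (g + c) / total,
        # AT skew = (A - T) / (A + T); GC skew = (G - C) / (G + C)
        "at_skew": (a - t) / (a + t) if (a + t) else 0.0,
        "gc_skew": (g - c) / (g + c) if (g + c) else 0.0,
    }


if __name__ == "__main__":
    # Made-up toy sequences, not real mitogenome data.
    toy_alignment = {"taxon_1": "AACGTT--NN", "taxon_2": "AAATGGGC"}
    for name, sequence in toy_alignment.items():
        print(name, base_composition(sequence))
```

Comparing these values across taxa or across partitions gives a first, informal impression of compositional heterogeneity before any formal test or model selection.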

0 Q&A 1177 Views Jul 5, 2024

In recent years, the increase in genome sequencing across diverse plant species has provided a significant advantage for phylogenomics studies, allowing the analysis of one of the most diverse gene families in plants: nucleotide-binding leucine-rich repeat receptors (NLRs). However, due to the sequence diversity of the NLR gene family, identifying key molecular features and functionally conserved sequence patterns is challenging through multiple sequence alignment. Here, we present a step-by-step protocol for a computational pipeline designed to identify evolutionarily conserved motifs in plant NLR proteins. In this protocol, we use a large-scale NLR dataset, including 1,862 NLR genes annotated from monocot and dicot species, to predict conserved sequence motifs, such as the MADA and EDVID motifs, within the coiled-coil (CC)-NLR subfamily. Our pipeline can be applied to identify molecular signatures that have remained conserved in the gene family over evolutionary time across plant species.

0 Q&A 1231 Views Mar 20, 2024

Estimating the time to the most recent common ancestor (tMRCA) is important for tracing the origin of pathogenic viruses. This analysis is based on the genetic diversity accumulated over a given time period. Thousands of mutated sites have arisen in SARS-CoV-2 genomes since the COVID-19 pandemic started; six highly linked mutation sites arose early, before the start of the pandemic, and can be used to classify the genomes into three main haplotypes. Tracing the origin of those three haplotypes may help to understand the origin of SARS-CoV-2. In this article, we present a complete protocol for classifying SARS-CoV-2 genomes and calculating tMRCA using a Bayesian phylodynamic method. This protocol may also be used in the analysis of other viral genomes.


Key features

• Filtering and alignment of a massive number of viral genomes using custom scripts and ViralMSA.

• Classification of genomes based on highly linked sites using custom scripts.

• Phylodynamic analysis of viral genomes using Bayesian evolutionary analysis sampling trees (BEAST).

• Visualization of the posterior distribution of tMRCA using Tracer v1.7.2.

• Optimized for SARS-CoV-2.
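The classification step listed above can be pictured with a minimal sketch: read the bases at a fixed set of linked sites in an aligned genome and match the resulting pattern against known haplotypes. This is not the authors' custom script. The marker positions, allele patterns, and haplotype names below are placeholders, because the listing does not give the six actual sites.

```python
from typing import Dict, Sequence

# Hypothetical 0-based alignment columns for six linked sites (placeholders).
MARKER_POSITIONS: Sequence[int] = (2, 5, 8, 11, 14, 17)

# Hypothetical allele patterns defining three haplotypes (placeholders).
HAPLOTYPE_PATTERNS: Dict[str, str] = {
    "hap_A": "CCTGTA",
    "hap_B": "TTCACG",
    "hap_C": "CCTACG",
}


def alleles_at_markers(aligned_seq: str,
                       positions: Sequence[int] = MARKER_POSITIONS) -> str:
    """Concatenate the bases found at the marker columns."""
    return "".join(aligned_seq[p].upper() for p in positions)


def classify_genome(aligned_seq: str,
                    patterns: Dict[str, str] = HAPLOTYPE_PATTERNS,
                    positions: Sequence[int] = MARKER_POSITIONS) -> str:
    """Return the matching haplotype name, or 'unclassified' if none matches.

    Genomes with gaps, ambiguous bases, or recombinant-looking patterns
    at the marker sites fall through to 'unclassified'.
    """
    observed = alleles_at_markers(aligned_seq, positions)
    for name, pattern in patterns.items():
        if observed == pattern:
            return name
    return "unclassified"


def build_example(pattern: str, length: int = 20) -> str:
    """Build a toy aligned sequence carrying `pattern` at the marker columns."""
    seq = ["A"] * length
    for pos, base in zip(MARKER_POSITIONS, pattern):
        seq[pos] = base
    return "".join(seq)


if __name__ == "__main__":
    print(classify_genome(build_example("TTCACG")))  # matches hap_B
    print(classify_genome("A" * 20))                 # no pattern matches
```

Requiring an exact match at every marker keeps the classification strict; a real workflow would also need a rule for genomes with missing data at one or more sites.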


Graphical overview



Graphical workflow of time of most recent common ancestor (tMRCA) estimation process

0 Q&A 899 Views Dec 5, 2023

The recent surge in plant genomic and transcriptomic data has laid a foundation for reconstructing evolutionary scenarios and inferring potential functions of key genes related to plants’ development and stress responses. The classical scheme for identifying homologous genes is sequence similarity-based searching, under the crucial assumption that homologous sequences are more similar to each other than they are to any non-homologous sequences. Advances in plant phylogenomics and computational algorithms have enabled us to systematically identify homologs/orthologs and reconstruct their evolutionary histories among distantly related lineages. Here, we present a comprehensive pipeline for homologous sequence identification, phylogenetic relationship inference, and potential functional profiling of genes in plants.


Key features

• Identification of orthologs using large-scale genomic and transcriptomic data.

• This protocol is generalized for analyzing the evolution of plant genes.

0 Q&A 682 Views Nov 5, 2023

High-throughput molecular screening of microbial colonies and DNA libraries is a critical procedure that enables applications such as directed evolution, functional genomics, microbial identification, and creation of engineered microbial strains to produce high-value molecules. A promising chemical screening approach is the measurement of products directly from microbial colonies via optically guided matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS). Measuring the compounds directly from microbial colonies bypasses liquid culture, and the screen takes approximately 5 s per sample. We describe a protocol combining a dedicated informatics pipeline and a sample preparation method that can prepare up to 3,000 colonies in under 3 h. The screening protocol starts from colonies grown on Petri dishes, which are then transferred onto MALDI plates via imprinting. The target plate with the colonies is imaged by a flatbed scanner, and the colonies are located via custom software. The target plate is coated with MALDI matrix, MALDI-MS analyzes the colony locations, and data analysis identifies colonies with the desired biochemical properties. This workflow screens thousands of colonies per day without requiring additional automation. The wide chemical coverage and high sensitivity of MALDI-MS enable diverse screening projects, such as modifying enzymes and functional genomics surveys of gene activation/inhibition libraries.


Key features

• Mass spectrometry analyzes a range of compounds from E. coli colonies as a proxy for liquid culture testing enzyme mutant libraries.

• Colonies are transferred to a MALDI target plate by a simple imprinting method.

• The screen compares the ratio among several products or searches for the qualitative presence of specific compounds.

• The protocol requires a MALDI mass spectrometer.


Graphical overview



Overview of the MALDI-MS analysis of microbial colonies for screening mutant libraries. Microbial cells containing a mutant library for enzymes/metabolic pathways are first grown on agar. The colonies are then imprinted onto a MALDI target plate using a filter paper intermediate. An optical image of the MALDI target plate is analyzed by custom software to find the locations of individual colonies and direct subsequent MALDI-MS analyses to the selected colonies. After applying MALDI matrix onto the target plate, MALDI-MS analysis of the colonies is performed. Colonies showing the desired product profiles are found by data analysis via the software, and the colonies are picked for downstream analysis.

0 Q&A 547 Views Oct 5, 2023

Many single nucleotide polymorphisms (SNPs) identified by genome-wide association studies (GWAS) exert their effects on disease risk as expression quantitative trait loci (eQTLs) via allele-specific expression (ASE). While databases for probing eQTLs in tissues from normal individuals exist, one may wish to ascertain eQTLs or ASE in specific tissues or disease states not characterized in these databases. Here, we present a protocol to assess ASE of two possible target genes (GPNMB and KLHL7) of a known GWAS Parkinson’s disease (PD) risk locus in postmortem human brain tissue from PD and neurologically normal individuals. This was done using a sequence of RNA isolation, cDNA library generation, enrichment for transcripts of interest using customizable cDNA capture probes, paired-end RNA sequencing, and subsequent analysis. This method provides increased sensitivity relative to traditional bulk RNA-seq-based approaches and offers a blueprint that can be extended to the study of other genes, tissues, and disease states.


Key features

• Analysis of GPNMB allele-specific expression (ASE) in brain lysates from cognitively normal controls (NC) and Parkinson’s disease (PD) individuals.

• Builds on the ASE protocol of Mayba et al. (2014) and extends application from cells to human tissue.

• Increased sensitivity by enrichment for desired transcript via RNA CaptureSeq (Mercer et al., 2014).

• Optimized for human brain lysates from cingulate gyrus, caudate nucleus, and cerebellum.
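To make the idea of allele-specific expression concrete, the sketch below tests whether reads covering a heterozygous SNP deviate from the 50:50 allelic ratio expected without ASE, using an exact two-sided binomial test. The listing does not state which statistical test the protocol uses, so this choice is an assumption; the function names and read counts are hypothetical.

```python
from math import comb


def reference_allele_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of allele-informative reads carrying the reference allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no allele-informative reads")
    return ref_reads / total


def binomial_two_sided_p(ref_reads: int, alt_reads: int,
                         p_ref: float = 0.5) -> float:
    """Exact two-sided binomial p-value for allelic imbalance.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one. Intended for modest read depths only: at very deep
    coverage the float arithmetic here can overflow or underflow, and a
    library routine such as scipy.stats.binomtest is a better choice.
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no allele-informative reads")
    probs = [comb(n, k) * p_ref ** k * (1 - p_ref) ** (n - k)
             for k in range(n + 1)]
    observed = probs[ref_reads]
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


if __name__ == "__main__":
    # Made-up read counts for one heterozygous SNP in one sample.
    ref, alt = 70, 30
    print("reference fraction:", reference_allele_fraction(ref, alt))
    print("two-sided p-value:", binomial_two_sided_p(ref, alt))
```

In practice, the null proportion `p_ref` may need adjusting for reference mapping bias, and p-values from multiple SNPs or samples would need multiple-testing correction.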


0 Q&A 2366 Views Oct 5, 2021

One of the cardinal features of post-traumatic stress disorder (PTSD) is a paradoxical memory alteration including both emotional hypermnesia for salient trauma-related cues and amnesia for the surrounding traumatic context. Interestingly, some clinical studies have suggested that contextual amnesia would causally contribute to the PTSD-related hypermnesia insofar as decontextualized, traumatic memory is prone to be reactivated in contexts that can be very different from the original traumatic context. However, most current animal models of PTSD-related memory focus exclusively on the emotional hypermnesia, i.e., the persistence of a strong fear memory, and do not distinguish normal (adaptive) from pathological (PTSD-like) fear memory, leaving unexplored the hypothetical critical role of contextual amnesia in PTSD-related memory formation, and thus challenging the development of innovative treatments. Having developed the first animal model that precisely recapitulates the two memory components of PTSD in mice (emotional hypermnesia and contextual amnesia), we recently demonstrated that contextual amnesia, induced by optogenetic inhibition of the hippocampus (dorsal CA1), is a causal cognitive process of PTSD-like hypermnesia formation. Moreover, the hippocampus-dependent contextualization of traumatic memory, by optogenetic activation of dCA1 in traumatic condition, prevents PTSD-like hypermnesia formation. Finally, once PTSD-like memory has been formed, the re-contextualization of traumatic memory by its reactivation in the original traumatic context normalizes this pathological fear memory. Revealing the key role of contextual amnesia in PTSD-like memory, this procedure opens a therapeutic perspective based on trauma contextualization and the underlying hippocampal mechanisms.

0 Q&A 3202 Views Sep 20, 2021

Genome-wide sequencing of RNA (RNA-seq) has become an inexpensive tool to gain key insights into cellular and disease mechanisms. Sample preparation and sequencing are streamlined and allow the acquisition of hundreds of gene expression profiles in a few days; however, data processing, curation, and analysis involve numerous steps that can be overwhelming to non-experts. Here, we describe the sample preparation, sequencing, and data processing workflow for RNA-seq expression analysis in yeast. While this protocol covers only a small portion of the RNA-seq landscape, it describes the principal workflow common to such experiments, allowing the reader to adapt the protocol where necessary.


Graphic abstract:



Basic workflow of RNA-seq expression analysis.


0 Q&A 2947 Views Jun 5, 2021

DNA methylation in gene promoters plays a major role in the regulation of gene expression, and alterations in methylation patterns have been associated with several diseases. In this context, different software suites and statistical methods have been proposed to analyze differentially methylated positions and regions. Among them, the statistical method implemented in the mCSEA R package provides a framework to detect subtle but consistent methylation differences. Here, we provide an easy-to-use pipeline covering all the steps necessary to detect differentially methylated promoters with mCSEA from Illumina 450K and EPIC methylation BeadChip data. This protocol covers the download of data from public repositories, quality control, data filtering and normalization, estimation of cell type proportions, and statistical analysis. In addition, we show how to compare disease vs. normal phenotypes to obtain differentially methylated regions, including promoters or CpG islands. The entire protocol is based on the R programming language, which can be used on any operating system and does not require advanced programming skills.