Systems Biology


0 Q&A 1072 Views Mar 20, 2024

Estimating the time of most recent common ancestor (tMRCA) is important for tracing the origin of pathogenic viruses. This analysis is based on the genetic diversity accumulated over a given time period. Thousands of mutation sites have arisen in SARS-CoV-2 genomes since the COVID-19 pandemic started; six highly linked mutation sites arose early, before the start of the pandemic, and can be used to classify the genomes into three main haplotypes. Tracing the origin of those three haplotypes may help to understand the origin of SARS-CoV-2. In this article, we present a complete protocol for classifying SARS-CoV-2 genomes and calculating tMRCA using a Bayesian phylodynamic method. This protocol may also be used in the analysis of other viral genomes.


Key features

• Filtering and alignment of a massive number of viral genomes using custom scripts and ViralMSA.

• Classification of genomes based on highly linked sites using custom scripts.

• Phylodynamic analysis of viral genomes using Bayesian evolutionary analysis sampling trees (BEAST).

• Visualization of posterior distribution of tMRCA using Tracer.v1.7.2.

• Optimized for SARS-CoV-2.
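
The haplotype classification step can be illustrated with a minimal sketch. It is not the authors' custom script: the alignment columns, allele patterns, and haplotype names below are hypothetical placeholders (three sites instead of six, for brevity), and the real linked sites must be taken from the protocol itself. The sketch assumes genomes have already been aligned to a common reference so that column indices are comparable.

```python
# Hypothetical 0-based alignment columns and allele patterns (placeholders only;
# the protocol's six linked sites are not reproduced here).
LINKED_SITES = [2, 5, 8]
HAPLOTYPE_PATTERNS = {
    "H1": "CTA",
    "H2": "TCG",
    "H3": "CCA",
}


def classify_genome(aligned_seq, sites=LINKED_SITES, patterns=HAPLOTYPE_PATTERNS):
    """Return the haplotype whose alleles match the sequence at every linked site."""
    observed = "".join(aligned_seq[i].upper() for i in sites)
    for name, pattern in patterns.items():
        if observed == pattern:
            return name
    # Ambiguous bases, gaps, or allele combinations outside the defined patterns.
    return "unclassified"


def classify_alignment(records):
    """Map each genome ID to a haplotype label; records is {id: aligned sequence}."""
    return {seq_id: classify_genome(seq) for seq_id, seq in records.items()}


if __name__ == "__main__":
    toy_alignment = {
        "genome_1": "AACAATAAAA",
        "genome_2": "AATAACAAGA",
        "genome_3": "AANAATAAAA",
    }
    for seq_id, label in classify_alignment(toy_alignment).items():
        print(seq_id, label)
```

Genomes with an ambiguous base at any linked site are left unclassified rather than guessed, which keeps the downstream per-haplotype datasets clean at the cost of discarding some sequences.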


Graphical overview



Graphical workflow of time of most recent common ancestor (tMRCA) estimation process

0 Q&A 577 Views Dec 5, 2023

The recent surge in plant genomic and transcriptomic data has laid a foundation for reconstructing evolutionary scenarios and inferring potential functions of key genes related to plants’ development and stress responses. The classical scheme for identifying homologous genes is sequence similarity–based searching, under the crucial assumption that homologous sequences are more similar to each other than they are to any non-homologous sequences. Advances in plant phylogenomics and computational algorithms have enabled us to systematically identify homologs/orthologs and reconstruct their evolutionary histories among distantly related lineages. Here, we present a comprehensive pipeline for homologous sequence identification, phylogenetic relationship inference, and potential functional profiling of genes in plants.


Key features

• Identification of orthologs using large-scale genomic and transcriptomic data.

• This protocol is generalized for analyzing the evolution of plant genes.

0 Q&A 489 Views Nov 5, 2023

High-throughput molecular screening of microbial colonies and DNA libraries are critical procedures that enable applications such as directed evolution, functional genomics, microbial identification, and creation of engineered microbial strains to produce high-value molecules. A promising chemical screening approach is the measurement of products directly from microbial colonies via optically guided matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS). Measuring the compounds from microbial colonies bypasses liquid culture with a screen that takes approximately 5 s per sample. We describe a protocol combining a dedicated informatics pipeline and sample preparation method that can prepare up to 3,000 colonies in under 3 h. The screening protocol starts from colonies grown on Petri dishes and then transferred onto MALDI plates via imprinting. The target plate with the colonies is imaged by a flatbed scanner and the colonies are located via custom software. The target plate is coated with MALDI matrix, MALDI-MS analyzes the colony locations, and data analysis enables the determination of colonies with the desired biochemical properties. This workflow screens thousands of colonies per day without requiring additional automation. The wide chemical coverage and the high sensitivity of MALDI-MS enable diverse screening projects such as modifying enzymes and functional genomics surveys of gene activation/inhibition libraries.


Key features

• Mass spectrometry analyzes a range of compounds directly from E. coli colonies as a proxy for liquid-culture testing of enzyme mutant libraries.

• Colonies are transferred to a MALDI target plate by a simple imprinting method.

• The screen compares the ratio among several products or searches for the qualitative presence of specific compounds.

• The protocol requires a MALDI mass spectrometer.
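
The ratio-based screen described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the protocol's informatics pipeline: the peak names, the dictionary layout, and the `min_ratio` cutoff are all hypothetical, and peak picking from raw spectra is assumed to have been done already.

```python
def select_colonies(colonies, target, reference, min_ratio):
    """Rank colonies by target/reference peak intensity ratio.

    colonies: {colony_id: {peak_name: intensity}} (hypothetical layout).
    Returns (colony_id, ratio) pairs with ratio >= min_ratio, highest first.
    """
    hits = []
    for colony_id, peaks in colonies.items():
        ref_intensity = peaks.get(reference, 0.0)
        target_intensity = peaks.get(target, 0.0)
        if ref_intensity <= 0:
            # No reference signal: the ratio is undefined, so skip the colony.
            continue
        ratio = target_intensity / ref_intensity
        if ratio >= min_ratio:
            hits.append((colony_id, ratio))
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    measured = {
        "colony_1": {"product_A": 200.0, "product_B": 100.0},
        "colony_2": {"product_A": 50.0, "product_B": 100.0},
        "colony_3": {"product_A": 10.0},  # reference peak missing
    }
    print(select_colonies(measured, "product_A", "product_B", min_ratio=1.5))
```

Using a ratio between two products measured in the same spectrum, rather than an absolute intensity, reduces sensitivity to colony size and matrix coverage, which vary from spot to spot.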


Graphical overview



Overview of the MALDI-MS analysis of microbial colonies for screening mutant libraries. Microbial cells containing a mutant library for enzymes/metabolic pathways are first grown on agar. The colonies are then imprinted onto a MALDI target plate using a filter paper intermediate. An optical image of the MALDI target plate is analyzed by custom software to find the locations of individual colonies and direct subsequent MALDI-MS analyses to the selected colonies. After applying MALDI matrix onto the target plate, MALDI-MS analysis of the colonies is performed. Colonies showing the desired product profiles are found by data analysis via the software, and the colonies are picked for downstream analysis.
0 Q&A 402 Views Oct 5, 2023

Many single nucleotide polymorphisms (SNPs) identified by genome-wide association studies exert their effects on disease risk as expression quantitative trait loci (eQTL) via allele-specific expression (ASE). While databases for probing eQTLs in tissues from normal individuals exist, one may wish to ascertain eQTLs or ASE in specific tissues or disease states not characterized in these databases. Here, we present a protocol to assess ASE of two possible target genes (GPNMB and KLHL7) of a known genome-wide association study (GWAS) Parkinson’s disease (PD) risk locus in postmortem human brain tissue from PD and neurologically normal individuals. This was done using a sequence of RNA isolation, cDNA library generation, enrichment for transcripts of interest using customizable cDNA capture probes, paired-end RNA sequencing, and subsequent analysis. This method provides increased sensitivity relative to traditional bulk RNA-seq-based approaches and a blueprint that can be extended to the study of other genes, tissues, and disease states.


Key features

• Analysis of GPNMB allele-specific expression (ASE) in brain lysates from cognitively normal controls (NC) and Parkinson’s disease (PD) individuals.

• Builds on the ASE protocol of Mayba et al. (2014) and extends application from cells to human tissue.

• Increased sensitivity by enrichment for desired transcript via RNA CaptureSeq (Mercer et al., 2014).

• Optimized for human brain lysates from cingulate gyrus, caudate nucleus, and cerebellum.
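
As a rough illustration of what an ASE readout looks like at a single heterozygous SNP, the sketch below computes the reference-allele fraction and an exact two-sided binomial p-value against the 50:50 expectation. The binomial test is a common baseline swapped in here for illustration; it is not necessarily the statistic used in this protocol or in Mayba et al. (2014), and the function names and read counts are hypothetical.

```python
from math import comb


def allelic_ratio(ref_reads, alt_reads):
    """Fraction of reads carrying the reference allele at a heterozygous site."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return ref_reads / total


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value for k successes out of n trials.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one. Intended for modest read depths; very large n may overflow.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so ties are counted despite rounding error.
    return min(1.0, sum(pr for pr in probs if pr <= observed * (1 + 1e-9)))


if __name__ == "__main__":
    ref, alt = 70, 30  # hypothetical read counts at one SNP
    print("reference fraction:", allelic_ratio(ref, alt))
    print("p-value vs. 50:50:", binomial_two_sided_p(ref, ref + alt))
```

Real ASE data are usually overdispersed relative to a binomial model, so a p-value from this sketch should be read as optimistic.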


0 Q&A 322 Views Sep 20, 2023

Information on RNA localisation is essential for understanding physiological and pathological processes, such as gene expression, cell reprogramming, host–pathogen interactions, and signalling pathways involving RNA transactions at the level of membrane-less or membrane-bounded organelles and extracellular vesicles. In many cases, it is important to assess the topology of RNA localisation, i.e., to distinguish the transcripts encapsulated within an organelle of interest from those merely attached to its surface. This allows establishing which RNAs can, in principle, engage in local molecular interactions and which are prevented from interacting by membranes or other physical barriers. The most widely used techniques interrogating RNA localisation topology are based on the treatment of isolated organelles with RNases with subsequent identification of the surviving transcripts by northern blotting, qRT-PCR, or RNA-seq. However, this approach produces incoherent results and many false positives. Here, we describe Controlled Level of Contamination coupled to deep sequencing (CoLoC-seq), a more refined subcellular transcriptomics approach that overcomes these pitfalls. CoLoC-seq starts by the purification of organelles of interest. They are then either left intact or lysed and subjected to a gradient of RNase concentrations to produce unique RNA degradation dynamics profiles, which can be monitored by northern blotting or RNA-seq. Through straightforward mathematical modelling, CoLoC-seq distinguishes true membrane-enveloped transcripts from degradable and non-degradable contaminants of any abundance. The method has been implemented in the mitochondria of HEK293 cells, where it outperformed alternative subcellular transcriptomics approaches. It is applicable to other membrane-bounded organelles, e.g., plastids, single-membrane organelles of the vesicular system, extracellular vesicles, or viral particles.


Key features

• Tested on human mitochondria; potentially applicable to cell cultures, non-model organisms, extracellular vesicles, enveloped viruses, tissues; does not require genetic manipulations or highly pure organelles.

• In the case of human cells, the required amount of starting material is ~2,500 cm² of 80% confluent cells (or ~3 × 10⁸ HEK293 cells).

• CoLoC-seq implements a special RNA-seq strategy to selectively capture intact transcripts, which requires RNases generating 5′-hydroxyl and 2′/3′-phosphate termini (e.g., RNase A, RNase I).

• Relies on nonlinear regression software with customisable exponential functions.
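
To make the nonlinear regression step concrete, here is a minimal sketch that fits an exponential decay with a plateau to the fraction of a transcript surviving each RNase concentration. The functional form, the parameter names (`plateau`, `k`), and the grid-search fitting are assumptions made for illustration; the exact CoLoC-seq model and the recommended regression software are described in the protocol, not here.

```python
from math import exp


def decay_model(conc, plateau, k):
    """Assumed model: a resistant fraction (plateau) plus an exponentially degraded pool."""
    return plateau + (1.0 - plateau) * exp(-k * conc)


def fit_decay(concs, fractions, k_grid=None, plateau_grid=None):
    """Least-squares fit of decay_model by exhaustive grid search (no SciPy needed)."""
    k_grid = k_grid or [0.01 * i for i in range(1, 501)]            # 0.01 to 5.00
    plateau_grid = plateau_grid or [0.01 * i for i in range(101)]   # 0.00 to 1.00
    best = None
    for plateau in plateau_grid:
        for k in k_grid:
            sse = sum(
                (decay_model(c, plateau, k) - y) ** 2
                for c, y in zip(concs, fractions)
            )
            if best is None or sse < best[0]:
                best = (sse, plateau, k)
    return {"plateau": best[1], "k": best[2], "sse": best[0]}


if __name__ == "__main__":
    # Synthetic, noise-free example data (arbitrary concentration units).
    rnase = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]
    surviving = [decay_model(c, plateau=0.30, k=1.20) for c in rnase]
    print(fit_decay(rnase, surviving))
```

One way to read such fits, under these assumptions, is to compare the plateau of the same transcript in intact and lysed organelles: a high plateau that collapses after lysis is the pattern expected for a membrane-protected RNA, whereas a plateau that persists after lysis suggests a non-degradable contaminant.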


0 Q&A 872 Views Sep 20, 2023

Dietary saturated fatty acids (SFAs) are elevated in the blood circulation following digestion. A variety of circulating lipid species have been implicated in metabolic and inflammatory diseases; however, due to the extreme variability in serum or plasma lipid concentrations found in human studies, established reference ranges, lipid specificity, and diagnostic biomarkers are still lacking. Mass spectrometry is widely used for identification of lipid species in the plasma, but sample extraction methods differ widely across the literature. We used ultra-high performance liquid chromatography (UPLC) coupled to high-resolution hybrid triple quadrupole-time-of-flight (QToF) mass spectrometry (MS) to compare the relative peak abundance of specific lipid species within the following lipid classes: free fatty acids (FFAs), triglycerides (TAGs), phosphatidylcholines (PCs), and sphingolipids (SGs), in the plasma of mice fed a standard chow (SC; low in SFAs) or ketogenic diet (KD; high in SFAs) for two weeks. In this protocol, we used principal component analysis (PCA) in R to visualize how individual mice clustered together according to their diet, and we found that KD-fed mice displayed unique blood profiles for many lipid species identified within each lipid class compared to SC-fed mice. We conclude that two weeks of KD feeding is sufficient to significantly alter circulating lipids, with PCs being the most altered lipid class, followed by SGs, TAGs, and FFAs, including palmitic acid (PA) and PA-saturated lipids. This protocol is needed to advance knowledge on the impact that SFA-enriched diets have on concentrations of specific lipids in the blood that are known to be associated with metabolic and inflammatory diseases.


Key features

• Analysis of relative plasma lipid concentrations from mice on different diets using R.

• Lipidomics data collected via ultra-high performance liquid chromatography (UPLC) coupled to high-resolution hybrid triple quadrupole-time-of-flight (QToF) mass spectrometry (MS).

• Allows for a comprehensive comparison of diet-dependent plasma lipid profiles, including a variety of specific lipid species within several different lipid classes.

• Accumulation of certain free fatty acids, phosphatidylcholines, triglycerides, and sphingolipids is associated with metabolic and inflammatory diseases, and plasma concentrations may be clinically useful.


0 Q&A 544 Views Jan 5, 2023

Accessible chromatin regions modulate gene expression by acting as cis-regulatory elements. Understanding the epigenetic landscape by mapping accessible regions of DNA is therefore imperative to decipher mechanisms of gene regulation under specific biological contexts of interest. The assay for transposase-accessible chromatin sequencing (ATAC-seq) has been widely used to detect accessible chromatin and the recent introduction of single-cell technology has increased resolution to the single-cell level. In a recent study, we used droplet-based, single-cell ATAC-seq technology (scATAC-seq) to reveal the epigenetic profile of the transit-amplifying subset of thymic epithelial cells (TECs), which was identified previously using single-cell RNA-sequencing technology (scRNA-seq). This protocol allows the preparation of nuclei from TECs in order to perform droplet-based scATAC-seq and its integrative analysis with scRNA-seq data obtained from the same cell population. Integrative analysis has the advantage of identifying cell types in scATAC-seq data based on cell cluster annotations in scRNA-seq analysis.

0 Q&A 415 Views Jan 5, 2023

Understanding how genes are differentially expressed across tissues is key to revealing the etiology of human diseases. Genes are never expressed in isolation, but rather co-expressed in a community; thus, they co-act through intricate but well-orchestrated networks. However, existing approaches cannot coalesce the full properties of gene–gene communication and interactions into networks. In particular, the unavailability of dynamic gene expression data might impair the application of existing network models to unravel the complexity of human diseases. To address this limitation, we developed a statistical pipeline named DRDNetPro to visualize and trace how genes dynamically interact with each other across diverse tissues, and to ascertain health risk from static expression data. This protocol contains detailed tutorials for learning a series of networks, illustrated with an example from the Genotype-Tissue Expression (GTEx) project. The proposed toolbox relies on the method developed in our published paper (Chen et al., 2022), coding all genes into bidirectional, signed, weighted, and feedback-looped networks, which will provide profound genomic information enabling medical doctors to design precision medicine.


Graphical abstract



Flowchart illustrating the use of DRDNetPro. The left panel contains the summarized pipeline of DRDNetPro and the right panel contains one pseudo-illustrative example. See the Equipment and Procedure sections for detailed explanations.

0 Q&A 1392 Views Nov 20, 2022

Genome-wide screens using yeast or phage displays are powerful tools for identifying protein–ligand interactions, including drug or vaccine targets, ligand receptors, or protein–protein interactions. However, assembling libraries for genome-wide screens can be challenging and often requires unbiased cloning of 10⁵–10⁷ DNA fragments for a complete representation of a eukaryote genome. A sub-optimal genomic library can miss key genomic sequences and thus result in biased screens. Here, we describe an efficient method to generate genome-wide libraries for yeast surface display using Gibson assembly. The protocol entails genome fragmentation, ligation of adapters, library cloning using Gibson assembly, library transformation, library DNA recovery, and a streamlined Oxford nanopore library sequencing procedure that covers the length of the cloned DNA fragments. We also describe a computational pipeline to analyze the library coverage of the genome and predict the proportion of expressed proteins. The method allows seamless library transfer among multiple vectors and can be easily adapted to any expression system.
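
The idea of library coverage can be illustrated with a short sketch that merges the genomic intervals of mapped library inserts and reports the fraction of the genome they span. This is a simplified stand-in for the protocol's computational pipeline: it assumes a single contig, 0-based half-open intervals, and insert coordinates already obtained from read mapping; the function names are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent [start, end) intervals into disjoint ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def coverage_fraction(intervals, genome_length):
    """Fraction of a single contig covered by at least one library insert."""
    covered = sum(end - start for start, end in merge_intervals(intervals))
    return covered / genome_length


if __name__ == "__main__":
    # Hypothetical insert coordinates on a 1,000 bp toy contig.
    inserts = [(0, 100), (50, 150), (300, 400)]
    print(merge_intervals(inserts))
    print(coverage_fraction(inserts, genome_length=1000))
```

Merging before summing matters because overlapping inserts would otherwise be counted twice and could push the apparent coverage above 100%.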

0 Q&A 1728 Views Nov 20, 2022

Chemical proteomics focuses on the drug–target–phenotype relationship for target deconvolution and elucidation of the mechanism of action, which are key steps and bottlenecks in drug development and repurposing. Largely because of the limits of using chemically modified ligands in affinity-based methods, new, unbiased, proteome-wide, MS-based chemical proteomics approaches have been developed to perform drug target deconvolution using full proteome profiling and no chemical modification of the studied ligand. Notable among them, thermal proteome profiling (TPP) aims to identify the target(s) by measuring the difference in melting temperature of each identified protein between drug-treated and vehicle-treated samples, with the thermodynamic interpretation of “protein melting” and curve fitting of all quantified proteins, at all temperatures, in each biological replicate. Chemical proteomics approaches, including TPP, often fail to provide target deconvolution with sufficient proteome depth, statistical power, throughput, and sustainability, and so can hardly fulfill the final purpose of drug development. The proteome integral solubility alteration (PISA) assay provides no thermodynamic interpretation but offers 10–100-fold higher throughput than other proteomics methods, high sustainability, much lower analysis time and sample amount requirements, high confidence in results, maximal proteome coverage (~10,000 protein IDs), and up to five drugs or test molecules in one assay, with at least biological triplicates of each treatment. Each drug-treated or vehicle-treated sample is split into many fractions and exposed to a gradient of heat as the solubility-perturbing agent before being recombined into one sample; each soluble fraction is isolated, and deep quantitative proteomics is then applied across all samples.
The proteins interacting with the tested molecules (targets and off-targets), the activated mechanistic factors, or proteins modified during the treatment show reproducible changes in their soluble amount compared to vehicle-treated controls. As of today, the maximal multiplexing capability is 18 biological samples per PISA assay, which enables statistical robustness and flexible experimental design accommodation for fuller target deconvolution, including integration of orthogonal chemical proteomics methods in one PISA assay. Either living cells (to study target engagement in vivo) or protein extracts (to identify ligand-interacting proteins in vitro) can be studied, and the minimal sample requirement unlocks target deconvolution in primary cells and their derived cultures.
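
The comparison of soluble amounts between drug-treated and vehicle-treated replicates can be sketched for a single protein as a log2 fold change with a Welch t statistic. This is an illustrative assumption about how such data might be summarized, not the statistical workflow of the PISA protocol; the function name and the abundance values are hypothetical.

```python
from math import log2, sqrt
from statistics import mean, variance


def solubility_shift(drug, vehicle):
    """Summarize one protein's soluble-amount change between two treatments.

    drug, vehicle: replicate soluble abundances (positive numbers, >= 2 each).
    Returns (mean log2 fold change, Welch t statistic); the statistic is None
    when both groups have zero variance and it cannot be computed.
    """
    log_drug = [log2(x) for x in drug]
    log_vehicle = [log2(x) for x in vehicle]
    diff = mean(log_drug) - mean(log_vehicle)
    std_err = sqrt(
        variance(log_drug) / len(log_drug)
        + variance(log_vehicle) / len(log_vehicle)
    )
    t_stat = diff / std_err if std_err > 0 else None
    return diff, t_stat


if __name__ == "__main__":
    # Hypothetical triplicate abundances for one protein.
    print(solubility_shift(drug=[4.0, 8.0, 16.0], vehicle=[2.0, 4.0, 8.0]))
```

Working on the log2 scale treats a doubling and a halving of soluble amount symmetrically, which is the usual convention for proteomics ratios.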


Graphical abstract: