Systems Biology


Category

Current Issue
0 Q&A 11 Views Sep 20, 2024

Phenotypic variation in most biological traits is largely driven by genomic variants. The single nucleotide polymorphism (SNP) is the most common form of genomic variant. Multiple algorithms have been developed for discovering genomic variants, including SNPs, from next-generation sequencing (NGS) data. Here, we present a widely used variant discovery pipeline based on the Genome Analysis Toolkit (GATK). The pipeline takes whole-genome sequencing (WGS) data as input and comprises read mapping, variant calling, and variant filtering. This pipeline has been successfully applied to many genomic projects and offers a solution for variant calling using NGS data.

Past Issues
0 Q&A 669 Views Aug 20, 2024

Bottom-up proteomics utilizes sample preparation techniques to enzymatically digest proteins, thereby generating identifiable and quantifiable peptides. Proteomics integrates with other omics methodologies, such as genomics and transcriptomics, to elucidate biomarkers associated with diseases and responses to drug or biologics treatment. The methodologies employed for preparing proteomic samples for mass spectrometry analysis exhibit variability across several factors, including the composition of lysis buffer detergents, homogenization techniques, protein extraction and precipitation methodologies, alkylation strategies, and the selection of digestion enzymes. The general workflow for bottom-up proteomics consists of sample preparation, mass spectrometric data acquisition (LC-MS/MS analysis), and subsequent downstream data analysis including protein quantification and differential expression analysis. Sample preparation poses a persistent challenge due to issues such as low reproducibility and inherent procedure complexities. Herein, we have developed a validated chloroform/methanol sample preparation protocol to obtain reproducible peptide mixtures from both rodent tissue and human cell line samples for bottom-up proteomics analysis. The protocol we established may facilitate the standardization of bottom-up proteomics workflows, thereby enhancing the acquisition of reliable biologically and/or clinically relevant proteomic data.

0 Q&A 914 Views Jul 5, 2024

In recent years, the increase in genome sequencing across diverse plant species has provided a significant advantage for phylogenomics studies, allowing the analysis of one of the most diverse gene families in plants: nucleotide-binding leucine-rich repeat receptors (NLRs). However, due to the sequence diversity of the NLR gene family, identifying key molecular features and functionally conserved sequence patterns is challenging through multiple sequence alignment. Here, we present a step-by-step protocol for a computational pipeline designed to identify evolutionarily conserved motifs in plant NLR proteins. In this protocol, we use a large-scale NLR dataset, including 1,862 NLR genes annotated from monocot and dicot species, to predict conserved sequence motifs, such as the MADA and EDVID motifs, within the coiled-coil (CC)-NLR subfamily. Our pipeline can be applied to identify molecular signatures that have remained conserved in the gene family over evolutionary time across plant species.

0 Q&A 580 Views Jul 5, 2024

Vascular cognitive impairment (VCI) is a syndrome defined as cognitive decline caused by vascular disease and is associated with various types of dementia. Chronic cerebral hypoperfusion (CCH) is one of the major contributors to VCI. Among the various rodent models used to study CCH-induced VCI, we have found the mouse bilateral common carotid artery stenosis (BCAS) model to be highly suitable. Here, we introduce the BCAS model of C57BL/6J mice generated using microcoils with an internal diameter of 0.18 mm. To produce the mouse BCAS model, the bilateral common carotid arteries are isolated from the adhering tissues and vagus nerves and twined around the microcoils. This model shows cognitive impairment and white matter lesions preceding neuronal dysfunction around postoperative day 28, which is similar to the human clinical picture. Overall, the mouse BCAS model will continue to be useful in studying CCH-induced VCI.

0 Q&A 640 Views May 5, 2024

Ribosomes are an archetypal ribonucleoprotein assembly. Due to ribosomal evolution and function, r-proteins share specific physicochemical similarities, making the riboproteome particularly suited for tailored proteome profiling methods. Moreover, the structural proteome of ribonucleoprotein assemblies reflects context-dependent functional features. Thus, characterizing the state of riboproteomes provides insights to uncover the context-dependent functionality of r-protein rearrangements, as they relate to what has been termed the ribosomal code, a concept that parallels that of the histone code, in which chromatin rearrangements influence gene expression. Compared to high-resolution ribosomal structures, omics methods lag when it comes to offering customized solutions to close the knowledge gap between structure and function that currently exists in riboproteomes. Purifying the riboproteome and subsequent shot-gun proteomics typically involves protein denaturation and digestion with proteases. The results are relative abundances of r-proteins at the ribosome population level. We have previously shown that, to gain insight into the stoichiometry of individual proteins, it is necessary to measure by proteomics bound r-proteins and normalize their intensities by the sum of r-protein abundances per ribosomal complex, i.e., 40S or 60S subunits. These calculations ensure that individual r-protein stoichiometries represent the fraction of each family/paralog relative to the complex, effectively revealing which r-proteins become substoichiometric in specific physiological scenarios. Here, we present an optimized method to profile the riboproteome of any organism as well as the synthesis rates of r-proteins determined by stable isotope-assisted mass spectrometry. Our method purifies the r-proteins in a reversibly denatured state, which offers the possibility for combined top-down and bottom-up proteomics. 
Our method offers a milder native denaturation of the r-proteome via a chaotropic GuHCl solution as compared with previous studies that use irreversible denaturation under highly acidic conditions to dissociate rRNA and r-proteins. As such, our method is better suited to conserve post-translational modifications (PTMs). Subsequently, our method carefully considers the amino acid composition of r-proteins to select an appropriate protease for digestion. We avoid non-specific protease cleavage by increasing the pH of our standardized r-proteome dilutions that enter the digestion pipeline and by using a digestion buffer that ensures an optimal pH for a reliable protease digestion process. Finally, we provide the R package ProtSynthesis to study the fractional synthesis rates of r-proteins. The package uses physiological parameters as input to determine peptide or protein fractional synthesis rates. Once the physiological parameters are measured, our equations allow a fair comparison between treatments that alter the biological equilibrium state of the system under study. Our equations correct peptide enrichment using enrichments in soluble amino acids, growth rates, and total protein accumulation. As a means of validation, our pipeline fails to find “false” enrichments in non-labeled samples while also filtering out proteins with multiple unique peptides that have different enrichment values, which are rare in our datasets. These two aspects reflect the accuracy of our tool. Our method offers the possibility of elucidating individual r-protein family/paralog abundances, PTM status, fractional synthesis rates, and dynamic assembly into ribosomal complexes if top-down and bottom-up proteomic approaches are used concomitantly, taking one step further into mapping the native and dynamic status of the r-proteome onto high-resolution ribosome structures. 
In addition, our method can be used to study the proteomes of all macromolecular assemblies that can be purified, although purification is the limiting step, and the efficacy and accuracy of the proteases may be limited depending on the digestion requirements.
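The per-complex normalization described in this abstract can be illustrated with a minimal sketch. It assumes each r-protein has a single measured intensity and a known complex assignment (40S or 60S); the function name, the data layout, and the example values are hypothetical and are not taken from the protocol or from the ProtSynthesis package.

```python
from collections import defaultdict


def normalize_by_complex(intensities, complex_of):
    """Express each r-protein intensity as a fraction of its complex total.

    intensities: dict mapping protein name -> measured intensity
    complex_of:  dict mapping protein name -> complex label (e.g. "40S")
    Both structures are assumptions made for this illustration.
    """
    # Sum the intensities of all r-proteins that belong to the same complex.
    totals = defaultdict(float)
    for protein, value in intensities.items():
        totals[complex_of[protein]] += value

    # Divide each intensity by the total of its own complex.
    return {
        protein: value / totals[complex_of[protein]]
        for protein, value in intensities.items()
        if totals[complex_of[protein]] > 0
    }


# Made-up intensities, for illustration only.
intensities = {"uS2": 200.0, "eS1": 600.0, "uL2": 300.0, "uL3": 100.0}
complex_of = {"uS2": "40S", "eS1": "40S", "uL2": "60S", "uL3": "60S"}
fractions = normalize_by_complex(intensities, complex_of)
```

With this layout, the fractions within each complex sum to one, so a protein whose fraction drops between conditions is a candidate for being substoichiometric in that condition.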

0 Q&A 1170 Views Mar 20, 2024

Estimating the time of the most recent common ancestor (tMRCA) is important for tracing the origin of pathogenic viruses. This analysis is based on the genetic diversity accumulated over a given time period. Thousands of mutant sites have arisen in SARS-CoV-2 genomes since the COVID-19 pandemic started; six highly linked mutation sites arose before the start of the pandemic and can be used to classify the genomes into three main haplotypes. Tracing the origin of those three haplotypes may help to understand the origin of SARS-CoV-2. In this article, we present a complete protocol for classifying SARS-CoV-2 genomes and calculating tMRCA using a Bayesian phylodynamic method. This protocol may also be used in the analysis of other viral genomes.


Key features

• Filtering and alignment of a massive number of viral genomes using custom scripts and ViralMSA.

• Classification of genomes based on highly linked sites using custom scripts.

• Phylodynamic analysis of viral genomes using Bayesian evolutionary analysis sampling trees (BEAST).

• Visualization of the posterior distribution of tMRCA using Tracer v1.7.2.

• Optimized for SARS-CoV-2.
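The classification step listed above (assigning aligned genomes to haplotypes by the alleles at a few linked sites) could look roughly like the sketch below. This is not the authors' custom script: the function name, the site positions, the alleles, and the haplotype labels are all hypothetical placeholders on a toy sequence, chosen only to show the matching logic.

```python
def classify_haplotype(genome, signatures):
    """Return the first haplotype whose defining alleles all match.

    genome:     aligned genome sequence as a string
    signatures: dict mapping haplotype name -> {1-based position: base}
    Returns "other" when no signature matches or a site is out of range.
    """
    for name, alleles in signatures.items():
        matches = all(
            0 < pos <= len(genome) and genome[pos - 1] == base
            for pos, base in alleles.items()
        )
        if matches:
            return name
    return "other"


# Hypothetical signatures on toy coordinates (NOT real SARS-CoV-2 sites).
signatures = {
    "hapA": {2: "C", 5: "T"},
    "hapB": {2: "T", 5: "C"},
}

toy_genomes = {
    "seq1": "ACGGTA",
    "seq2": "ATGGCA",
    "seq3": "AGGGGA",
}
labels = {name: classify_haplotype(seq, signatures) for name, seq in toy_genomes.items()}
```

In a real analysis the positions would refer to coordinates in the multiple sequence alignment, so the genomes need to be aligned to a common reference before this kind of lookup is meaningful.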


Graphical overview



Graphical workflow of time of most recent common ancestor (tMRCA) estimation process

0 Q&A 761 Views Dec 5, 2023

The recent surge in plant genomic and transcriptomic data has laid a foundation for reconstructing evolutionary scenarios and inferring potential functions of key genes related to plants’ development and stress responses. The classical scheme for identifying homologous genes is sequence similarity–based searching, under the crucial assumption that homologous sequences are more similar to each other than they are to any non-homologous sequence. Advances in plant phylogenomics and computational algorithms have enabled us to systematically identify homologs/orthologs and reconstruct their evolutionary histories among distantly related lineages. Here, we present a comprehensive pipeline for homologous sequence identification, phylogenetic relationship inference, and potential functional profiling of genes in plants.


Key features

• Identification of orthologs using large-scale genomic and transcriptomic data.

• This protocol is generalized for analyzing the evolution of plant genes.

0 Q&A 587 Views Nov 5, 2023

High-throughput molecular screening of microbial colonies and DNA libraries is a critical procedure that enables applications such as directed evolution, functional genomics, microbial identification, and creation of engineered microbial strains to produce high-value molecules. A promising chemical screening approach is the measurement of products directly from microbial colonies via optically guided matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS). Measuring the compounds from microbial colonies bypasses liquid culture with a screen that takes approximately 5 s per sample. We describe a protocol combining a dedicated informatics pipeline and sample preparation method that can prepare up to 3,000 colonies in under 3 h. The screening protocol starts from colonies that are grown on Petri dishes and then transferred onto MALDI plates via imprinting. The target plate with the colonies is imaged by a flatbed scanner and the colonies are located via custom software. The target plate is coated with MALDI matrix, MALDI-MS analyzes the colony locations, and data analysis identifies the colonies with the desired biochemical properties. This workflow screens thousands of colonies per day without requiring additional automation. The wide chemical coverage and the high sensitivity of MALDI-MS enable diverse screening projects such as modifying enzymes and functional genomics surveys of gene activation/inhibition libraries.


Key features

• Mass spectrometry analyzes a range of compounds from E. coli colonies as a proxy for liquid culture testing of enzyme mutant libraries.

• Colonies are transferred to a MALDI target plate by a simple imprinting method.

• The screen compares the ratio among several products or searches for the qualitative presence of specific compounds.

• The protocol requires a MALDI mass spectrometer.


Graphical overview



Overview of the MALDI-MS analysis of microbial colonies for screening mutant libraries. Microbial cells containing a mutant library for enzymes/metabolic pathways are first grown on agar. The colonies are then imprinted onto a MALDI target plate using a filter paper intermediate. An optical image of the MALDI target plate is analyzed by custom software to find the locations of individual colonies and direct subsequent MALDI-MS analyses to the selected colonies. After applying MALDI matrix onto the target plate, MALDI-MS analysis of the colonies is performed. Colonies showing the desired product profiles are found by data analysis via the software, and the colonies are picked for downstream analysis.
0 Q&A 488 Views Oct 5, 2023

Many single nucleotide polymorphisms (SNPs) identified by genome-wide association studies exert their effects on disease risk as expression quantitative trait loci (eQTL) via allele-specific expression (ASE). While databases for probing eQTLs in tissues from normal individuals exist, one may wish to ascertain eQTLs or ASE in specific tissues or disease states not characterized in these databases. Here, we present a protocol to assess ASE of two possible target genes (GPNMB and KLHL7) of a known genome-wide association study (GWAS) Parkinson’s disease (PD) risk locus in postmortem human brain tissue from PD and neurologically normal individuals. This was done using a sequence of RNA isolation, cDNA library generation, enrichment for transcripts of interest using customizable cDNA capture probes, paired-end RNA sequencing, and subsequent analysis. This method provides increased sensitivity relative to traditional bulk RNA-seq-based approaches and a blueprint that can be extended to the study of other genes, tissues, and disease states.


Key features

• Analysis of GPNMB allele-specific expression (ASE) in brain lysates from cognitively normal controls (NC) and Parkinson’s disease (PD) individuals.

• Builds on the ASE protocol of Mayba et al. (2014) and extends application from cells to human tissue.

• Increased sensitivity by enrichment for desired transcript via RNA CaptureSeq (Mercer et al., 2014).

• Optimized for human brain lysates from cingulate gyrus, caudate nucleus, and cerebellum.
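As a rough illustration of what an allele-specific expression readout can look like, the sketch below computes the reference-allele fraction at a heterozygous SNP and an exact two-sided binomial p-value against balanced expression. This is an assumed, generic analysis and not necessarily the statistical procedure used in this protocol or in Mayba et al. (2014); the function names, the null proportion of 0.5, and the read counts are hypothetical.

```python
from math import exp, lgamma, log


def allelic_ratio(ref_reads, alt_reads):
    """Fraction of reads at a heterozygous site that carry the reference allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return ref_reads / total


def binomial_two_sided_p(ref_reads, alt_reads, p_null=0.5):
    """Exact two-sided binomial p-value for allelic imbalance.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one under the null proportion p_null (assumed 0.5 here).
    """
    n = ref_reads + alt_reads

    def pmf(k):
        # Binomial probability computed in log space to avoid overflow.
        log_p = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                 + k * log(p_null) + (n - k) * log(1 - p_null))
        return exp(log_p)

    probabilities = [pmf(k) for k in range(n + 1)]
    observed = probabilities[ref_reads]
    # Small relative tolerance so that ties are not lost to rounding.
    tail = sum(p for p in probabilities if p <= observed * (1 + 1e-9))
    return min(1.0, tail)


# Made-up read counts, for illustration only.
ratio = allelic_ratio(30, 10)
p_balanced = binomial_two_sided_p(5, 5)
p_skewed = binomial_two_sided_p(10, 0)
```

A ratio near 0.5 with a large p-value is consistent with balanced expression of the two alleles, whereas a strongly skewed ratio with a small p-value suggests ASE. In practice, mapping bias toward the reference allele and multiple testing across sites would also need to be handled.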




0 Q&A 383 Views Sep 20, 2023

Information on RNA localisation is essential for understanding physiological and pathological processes, such as gene expression, cell reprogramming, host–pathogen interactions, and signalling pathways involving RNA transactions at the level of membrane-less or membrane-bounded organelles and extracellular vesicles. In many cases, it is important to assess the topology of RNA localisation, i.e., to distinguish the transcripts encapsulated within an organelle of interest from those merely attached to its surface. This establishes which RNAs can, in principle, engage in local molecular interactions and which are prevented from interacting by membranes or other physical barriers. The most widely used techniques for interrogating RNA localisation topology are based on the treatment of isolated organelles with RNases, followed by identification of the surviving transcripts by northern blotting, qRT-PCR, or RNA-seq. However, this approach produces inconsistent results and many false positives. Here, we describe Controlled Level of Contamination coupled to deep sequencing (CoLoC-seq), a more refined subcellular transcriptomics approach that overcomes these pitfalls. CoLoC-seq starts with the purification of organelles of interest. They are then either left intact or lysed and subjected to a gradient of RNase concentrations to produce unique RNA degradation dynamics profiles, which can be monitored by northern blotting or RNA-seq. Through straightforward mathematical modelling, CoLoC-seq distinguishes true membrane-enveloped transcripts from degradable and non-degradable contaminants of any abundance. The method has been implemented in the mitochondria of HEK293 cells, where it outperformed alternative subcellular transcriptomics approaches. It is applicable to other membrane-bounded organelles, e.g., plastids, single-membrane organelles of the vesicular system, extracellular vesicles, or viral particles.


Key features

• Tested on human mitochondria; potentially applicable to cell cultures, non-model organisms, extracellular vesicles, enveloped viruses, tissues; does not require genetic manipulations or highly pure organelles.

• In the case of human cells, the required amount of starting material is ~2,500 cm² of 80% confluent cells (or ~3 × 10⁸ HEK293 cells).

• CoLoC-seq implements a special RNA-seq strategy to selectively capture intact transcripts, which requires RNases generating 5′-hydroxyl and 2′/3′-phosphate termini (e.g., RNase A, RNase I).

• Relies on nonlinear regression software with customisable exponential functions.




0 Q&A 981 Views Sep 20, 2023

Dietary saturated fatty acids (SFAs) are elevated in the blood circulation following digestion. A variety of circulating lipid species have been implicated in metabolic and inflammatory diseases; however, due to the extreme variability in serum or plasma lipid concentrations found in human studies, established reference ranges are still lacking, as are lipid-specific diagnostic biomarkers. Mass spectrometry is widely used for identification of lipid species in the plasma, and sample extraction methods differ considerably across the literature. We used ultra-high performance liquid chromatography (UPLC) coupled to a high-resolution hybrid triple quadrupole-time-of-flight (QToF) mass spectrometer (MS) to compare relative peak abundance of specific lipid species within the following lipid classes: free fatty acids (FFAs), triglycerides (TAGs), phosphatidylcholines (PCs), and sphingolipids (SGs), in the plasma of mice fed a standard chow (SC; low in SFAs) or ketogenic diet (KD; high in SFAs) for two weeks. In this protocol, we used Principal Component Analysis (PCA) in R to visualize how individual mice clustered together according to their diet, and we found that KD-fed mice displayed unique blood profiles for many lipid species identified within each lipid class compared to SC-fed mice. We conclude that two weeks of KD feeding is sufficient to significantly alter circulating lipids, with PCs being the most altered lipid class, followed by SGs, TAGs, and FFAs, including palmitic acid (PA) and PA-saturated lipids. This protocol is needed to advance knowledge on the impact that SFA-enriched diets have on concentrations of specific lipids in the blood that are known to be associated with metabolic and inflammatory diseases.


Key features

• Analysis of relative plasma lipid concentrations from mice on different diets using R.

• Lipidomics data collected via ultra-high performance liquid chromatography (UPLC) coupled to a high-resolution hybrid triple quadrupole-time-of-flight (QToF) mass spectrometer (MS).

• Allows for a comprehensive comparison of diet-dependent plasma lipid profiles, including a variety of specific lipid species within several different lipid classes.

• Accumulation of certain free fatty acids, phosphatidylcholines, triglycerides, and sphingolipids is associated with metabolic and inflammatory diseases, and plasma concentrations may be clinically useful.

