Computational Biology and Bioinformatics


0 Q&A 1072 Views Mar 20, 2024

Estimating the time of the most recent common ancestor (tMRCA) is important for tracing the origin of pathogenic viruses. This analysis is based on the genetic diversity accumulated over a given time period. Thousands of mutation sites have arisen in SARS-CoV-2 genomes since the COVID-19 pandemic started; six highly linked mutation sites arose early, before the start of the pandemic, and can be used to classify the genomes into three main haplotypes. Tracing the origin of these three haplotypes may help to understand the origin of SARS-CoV-2. In this article, we present a complete protocol for classifying SARS-CoV-2 genomes and calculating tMRCA using a Bayesian phylodynamic method. This protocol may also be used in the analysis of other viral genomes.


Key features

• Filtering and alignment of a massive number of viral genomes using custom scripts and ViralMSA.

• Classification of genomes based on highly linked sites using custom scripts.

• Phylodynamic analysis of viral genomes using Bayesian evolutionary analysis sampling trees (BEAST).

• Visualization of the posterior distribution of tMRCA using Tracer v1.7.2.

• Optimized for SARS-CoV-2.
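
The classification step listed above (assigning genomes to haplotypes from a small set of highly linked sites) can be sketched as follows. This is a minimal illustration, not the authors' custom script: the site positions in `LINKED_SITES`, the allele patterns in `HAPLOTYPE_PATTERNS`, and the haplotype labels are hypothetical placeholders, and the genomes are assumed to be already aligned to a common reference.

```python
# Hypothetical 0-based alignment columns of the linked sites
# (placeholders, not the real SARS-CoV-2 positions).
LINKED_SITES = [2, 5, 7]

# Hypothetical allele pattern at those sites for each haplotype.
HAPLOTYPE_PATTERNS = {
    "hap_A": "CTG",
    "hap_B": "TCA",
    "hap_C": "TTG",
}


def classify_genome(aligned_seq, sites=LINKED_SITES, patterns=HAPLOTYPE_PATTERNS):
    """Return the haplotype whose pattern matches the alleles at the linked sites.

    Genomes with an ambiguous base, a gap, or an unlisted allele combination
    at any linked site are reported as "unclassified".
    """
    alleles = "".join(aligned_seq[i].upper() for i in sites)
    for name, pattern in patterns.items():
        if alleles == pattern:
            return name
    return "unclassified"


def classify_all(aligned_genomes):
    """Map each genome identifier to its haplotype label."""
    return {gid: classify_genome(seq) for gid, seq in aligned_genomes.items()}


# Toy aligned sequences (invented for illustration only).
genomes = {
    "g1": "AACAATAGAA",  # alleles C, T, G at the linked sites
    "g2": "AATAACAAAA",  # alleles T, C, A at the linked sites
    "g3": "AACAANAGAA",  # ambiguous base N at one linked site
}
labels = classify_all(genomes)
```

Reporting non-matching genomes as "unclassified" rather than forcing them into the nearest haplotype keeps ambiguous or recombinant-looking sequences out of the downstream tMRCA estimate; whether the original scripts do the same is an assumption.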


Graphical overview



Graphical workflow of time of most recent common ancestor (tMRCA) estimation process

0 Q&A 891 Views Feb 20, 2024

Coiled-coil domains (CCDs) are structural motifs observed in proteins in all organisms that perform several crucial functions. The computational identification of CCD segments over a protein sequence is of great importance for its functional characterization. This task can essentially be divided into three separate steps: the detection of segment boundaries, the annotation of the heptad repeat pattern along the segment, and the classification of its oligomerization state. Several methods have been proposed over the years addressing one or more of these predictive steps. In this protocol, we illustrate how to make use of CoCoNat, a novel approach based on protein language models, to characterize CCDs. CoCoNat is, at its release (August 2023), the state of the art for CCD detection. The web server allows users to submit input protein sequences and visualize the predicted domains after a few minutes. Optionally, precomputed segments can be provided to the model, which will predict the oligomerization state for each of them. CoCoNat can be easily integrated into biological pipelines by downloading the standalone version, which provides a single executable script to produce the output.


Key features

• Web server for the prediction of coiled-coil segments from a protein sequence.

• Three different predictions from a single tool (segment position, heptad repeat annotation, oligomerization state).

• Possibility to visualize the results online or to download the predictions in different formats for further processing.

• Easy integration in automated pipelines with the local version of the tool.




0 Q&A 579 Views Dec 5, 2023

The recent surge in plant genomic and transcriptomic data has laid a foundation for reconstructing evolutionary scenarios and inferring potential functions of key genes related to plants’ development and stress responses. The classical scheme for identifying homologous genes is sequence similarity–based searching, under the crucial assumption that homologous sequences are more similar to each other than they are to any non-homologous sequences. Advances in plant phylogenomics and computational algorithms have enabled us to systematically identify homologs/orthologs and reconstruct their evolutionary histories among distantly related lineages. Here, we present a comprehensive pipeline for homologous sequence identification, phylogenetic relationship inference, and potential functional profiling of genes in plants.


Key features

• Identification of orthologs using large-scale genomic and transcriptomic data.

• This protocol is generalized for analyzing the evolution of plant genes.

0 Q&A 396 Views Oct 5, 2023

Different regions of the gastrointestinal tract have specific functions and thus distinct motility patterns. Motility is primarily regulated by the enteric nervous system (ENS), an intrinsic network of neurons located within the gut wall. Under physiological conditions, the ENS is influenced by the central nervous system (CNS). However, by using ex vivo organ bath experiments, ENS regulation of gut motility can also be studied in the absence of CNS influences. The current technique enables the characterisation of small intestinal, caecal, and colonic motility patterns using an ex vivo organ bath and video imaging protocol. This approach is combined with the novel edge detection script GutMap, available in MATLAB, that functions across Windows and Mac platforms. Dissected intestinal segments are cannulated in an organ bath containing physiological saline with a camera mounted overhead. Video recordings of gut contractions are then converted to spatiotemporal heatmaps and analysed using the GutMap software interface. Using data analysed from the heatmaps, parameters of contractile patterns (including contraction propagation frequency and velocity as well as gut diameter) at baseline and in the presence of drugs/treatments/genetic mutations can be compared. Here, we studied motility patterns of female mice at baseline and in the presence of a nitric oxide synthase inhibitor (Nω-Nitro-L-arginine; NOLA) (nitric oxide being the main inhibitory neurotransmitter of gut motility) to showcase the application of GutMap. This technique is suitable for application to a broad range of animal models of clinical disorders to understand underlying biological pathways contributing to gastrointestinal dysfunction.


Key features

• Enhanced video imaging analysis of gut contractility in rodents using a novel software interface.

• New edge detection algorithm to accurately contour curvatures of the gastrointestinal tract.

• Allows for output of high-resolution spatiotemporal heatmaps across Windows and Mac platforms.

• Edge detection and analysis method makes motility measurements accessible in different gut regions including the caecum and stomach.
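
The conversion of video frames into a spatiotemporal map described above can be illustrated with a minimal sketch. It is not GutMap's edge detection algorithm (which is a MATLAB tool): it assumes each frame has already been segmented into a binary mask (1 = gut tissue, 0 = background), that the gut lies horizontally so each image column is one position along its length, and that diameters are reported in pixels. The function names are hypothetical.

```python
def frame_diameters(mask):
    """Gut width (in pixels) at each column of one binary frame.

    `mask` is a list of rows; 1 marks gut tissue and 0 marks background.
    The width at a column is the distance between the top and bottom edges.
    """
    n_cols = len(mask[0])
    widths = []
    for col in range(n_cols):
        rows = [r for r, row in enumerate(mask) if row[col]]
        widths.append(rows[-1] - rows[0] + 1 if rows else 0)
    return widths


def spatiotemporal_map(frames):
    """Stack per-frame diameter profiles: rows are time, columns are position."""
    return [frame_diameters(mask) for mask in frames]


# Two toy frames: a relaxed segment, then a local contraction in the middle.
relaxed = [
    [0, 0, 0],
    [1, 1, 1],
    [1, 1, 1],
    [0, 0, 0],
]
contracted = [
    [0, 0, 0],
    [1, 1, 1],
    [1, 0, 1],
    [0, 0, 0],
]
dmap = spatiotemporal_map([relaxed, contracted])
```

In a map built this way, a propagating contraction appears as a diagonal band of reduced diameter, which is the kind of feature from which propagation frequency and velocity can then be measured.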




0 Q&A 873 Views Sep 20, 2023

Dietary saturated fatty acids (SFAs) increase in the blood circulation following digestion. A variety of circulating lipid species have been implicated in metabolic and inflammatory diseases; however, due to the extreme variability in serum or plasma lipid concentrations found in human studies, established reference ranges are still lacking, as are lipid specificity and diagnostic biomarkers. Mass spectrometry is widely used for identification of lipid species in the plasma, and sample extraction methods differ considerably within the literature. We used ultra-high performance liquid chromatography (UPLC) coupled to high-resolution hybrid triple quadrupole-time-of-flight (QToF) mass spectrometry (MS) to compare relative peak abundance of specific lipid species within the following lipid classes: free fatty acids (FFAs), triglycerides (TAGs), phosphatidylcholines (PCs), and sphingolipids (SGs), in the plasma of mice fed a standard chow (SC; low in SFAs) or ketogenic diet (KD; high in SFAs) for two weeks. In this protocol, we used Principal Component Analysis (PCA) in R to visualize how individual mice clustered together according to their diet, and we found that KD-fed mice displayed unique blood profiles for many lipid species identified within each lipid class compared to SC-fed mice. We conclude that two weeks of KD feeding is sufficient to significantly alter circulating lipids, with PCs being the most altered lipid class, followed by SGs, TAGs, and FFAs, including palmitic acid (PA) and PA-saturated lipids. This protocol is needed to advance knowledge on the impact that SFA-enriched diets have on concentrations of specific lipids in the blood that are known to be associated with metabolic and inflammatory diseases.


Key features

• Analysis of relative plasma lipid concentrations from mice on different diets using R.

• Lipidomics data collected via ultra-high performance liquid chromatography (UPLC) coupled to high-resolution hybrid triple quadrupole-time-of-flight (QToF) mass spectrometry (MS).

• Allows for a comprehensive comparison of diet-dependent plasma lipid profiles, including a variety of specific lipid species within several different lipid classes.

• Accumulation of certain free fatty acids, phosphatidylcholines, triglycerides, and sphingolipids is associated with metabolic and inflammatory diseases, and plasma concentrations may be clinically useful.




0 Q&A 362 Views Sep 5, 2023

When performing expression analysis either for coding RNA (e.g., mRNA) or non-coding RNA (e.g., miRNA), reverse transcription quantitative real-time polymerase chain reaction (RT-qPCR) is a widely used method. To normalize these data, one or more stable endogenous references must be identified. RefFinder is a web-based tool that uses four almost universally used algorithms for assessing candidate endogenous references: delta-Ct, BestKeeper, geNorm, and NormFinder. However, the online interface is presently cumbersome and time-consuming. We developed an R package, RefSeeker, which performs easy and straightforward RefFinder analysis by enabling raw data import and calculation of stability from each of the algorithms, and which provides data output tools to create graphs and tables. This protocol uses the RefSeeker R package for fast and simple RefFinder stability analysis.


Key features

• Perform stability analysis using five algorithms: NormFinder, geNorm, delta-Ct, BestKeeper, and RefFinder.

• Identification of endogenous references for normalization of RT-qPCR data.

• Create publication-ready graphs and tables.

• Step-by-step guide dialog window for novice R users.
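
To make the idea of a stability score concrete, the sketch below illustrates the comparative delta-Ct approach as it is commonly described: each candidate reference is scored by the mean standard deviation of its pairwise Ct differences with every other candidate, and a lower score indicates a more stable reference. This is written in Python for illustration only and is not the RefSeeker API; the function name, the gene names, and the Ct values are hypothetical.

```python
from statistics import mean, stdev


def delta_ct_stability(ct_values):
    """Score each candidate reference by its mean pairwise delta-Ct SD.

    `ct_values` maps a gene name to its Ct values, one per sample, with
    samples in the same order for every gene. Lower scores mean the gene
    varies less relative to the other candidates.
    """
    genes = list(ct_values)
    scores = {}
    for gene in genes:
        sds = []
        for other in genes:
            if other == gene:
                continue
            # Per-sample difference in Ct between the two genes.
            diffs = [a - b for a, b in zip(ct_values[gene], ct_values[other])]
            sds.append(stdev(diffs))
        scores[gene] = mean(sds)
    return scores


# Invented Ct values for three candidate references across three samples.
ct = {
    "geneA": [20.0, 21.0, 22.0],
    "geneB": [25.0, 26.0, 27.0],
    "geneC": [18.0, 21.0, 18.0],
}
scores = delta_ct_stability(ct)
ranking = sorted(scores, key=scores.get)  # most stable candidate first
```

In this toy data set, geneA and geneB track each other exactly across samples while geneC does not, so geneC receives the highest (least stable) score.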


Graphical overview



Simple workflow diagram. Two main workflow paths are presented. A) Using the RefSeeker wizard allows non-R programmers to easily load data and choose between selected output formats. B) Command line interface provides more options to control input and output formats and to automate analysis.

0 Q&A 546 Views Jan 5, 2023

Accessible chromatin regions modulate gene expression by acting as cis-regulatory elements. Understanding the epigenetic landscape by mapping accessible regions of DNA is therefore imperative to decipher mechanisms of gene regulation under specific biological contexts of interest. The assay for transposase-accessible chromatin sequencing (ATAC-seq) has been widely used to detect accessible chromatin and the recent introduction of single-cell technology has increased resolution to the single-cell level. In a recent study, we used droplet-based, single-cell ATAC-seq technology (scATAC-seq) to reveal the epigenetic profile of the transit-amplifying subset of thymic epithelial cells (TECs), which was identified previously using single-cell RNA-sequencing technology (scRNA-seq). This protocol allows the preparation of nuclei from TECs in order to perform droplet-based scATAC-seq and its integrative analysis with scRNA-seq data obtained from the same cell population. Integrative analysis has the advantage of identifying cell types in scATAC-seq data based on cell cluster annotations in scRNA-seq analysis.

0 Q&A 417 Views Jan 5, 2023

Understanding how genes are differentially expressed across tissues is key to revealing the etiology of human diseases. Genes are never expressed in isolation, but rather co-expressed in a community; thus, they co-act through intricate but well-orchestrated networks. However, existing approaches cannot coalesce the full properties of gene–gene communication and interactions into networks. In particular, the unavailability of dynamic gene expression data might impair the application of existing network models to unravel the complexity of human diseases. To address this limitation, we developed a statistical pipeline named DRDNetPro to visualize and trace how genes dynamically interact with each other across diverse tissues, and to ascertain health risk from static expression data. This protocol contains detailed tutorials designed to learn a series of networks, with an illustrative example from the Genotype-Tissue Expression (GTEx) project. The proposed toolbox relies on the method developed in our published paper (Chen et al., 2022), coding all genes into bidirectional, signed, weighted, and feedback-looped networks, which will provide profound genomic information enabling medical doctors to design precision medicine.


Graphical abstract



Flowchart illustrating the use of DRDNetPro. The left panel contains the summarized pipeline of DRDNetPro and the right panel contains one pseudo-illustrative example. See the Equipment and Procedure sections for detailed explanations.

0 Q&A 1451 Views Dec 20, 2022

CRISPR/Cas9 screening has revolutionized functional genomics in biomedical research and is a widely used approach for the identification of genetic dependencies in cancer cells. Here, we present an efficient and versatile protocol for the cloning of guide RNAs (gRNAs) into lentiviral vectors, the production of lentiviral supernatants, and the transduction of target cells in a 96-well format. To assess the effect of gene knockouts on cellular fitness, we describe a competition-based cell proliferation assay using flow cytometry, enabling the screening of many genes at the same time in a fast and reproducible manner. This readout can be extended to any parameter that is accessible to flow-based measurements, such as protein expression and stability, differentiation, cell death, and others. In summary, this protocol allows the functional assessment of the effect of a set of 50–300 gene knockouts on various cellular parameters within eight weeks.
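
A competition-based readout of this kind is often summarized by following the percentage of gRNA-expressing (marker-positive) cells over time. The sketch below is one plausible way to do that and is an assumption, not the normalization prescribed by the protocol: each series is divided by its first time point, and the knockout series is then divided by a non-targeting control series. The function names and the percentages are hypothetical.

```python
def normalized_fractions(percent_positive):
    """Divide every time point by the first one so the series starts at 1.0."""
    baseline = percent_positive[0]
    if baseline <= 0:
        raise ValueError("first time point must be greater than zero")
    return [p / baseline for p in percent_positive]


def relative_fitness(knockout, control):
    """Ratio of the normalized knockout series to the normalized control series.

    Values below 1.0 suggest that knockout cells are being outcompeted by
    untransduced cells faster than control-transduced cells are.
    """
    ko = normalized_fractions(knockout)
    ctrl = normalized_fractions(control)
    return [k / c for k, c in zip(ko, ctrl)]


# Invented percentages of marker-positive cells at three time points.
knockout_series = [50.0, 25.0, 12.5]
control_series = [40.0, 40.0, 40.0]
fitness = relative_fitness(knockout_series, control_series)
```

Normalizing to the first time point removes differences in transduction efficiency between wells, and dividing by the control removes any fitness cost of transduction itself.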




0 Q&A 1032 Views Sep 20, 2022

R-loops, or RNA:DNA hybrids, are structures that arise co-transcriptionally when a nascent RNA hybridizes back with the template ssDNA, leading to a displaced ssDNA. Because accumulation of R-loops can lead to genomic instability and loss of cellular homeostasis, it is important to determine the genome-wide distribution of R-loops in different physiological conditions. Current R-loop mapping strategies are based on R-loop enrichment (mediated by the S9.6 antibody, as in DRIP-seq, or by the ribonuclease RNase H1, as in MapR) or on the more recent R-loop CUT&Tag, which uses an artificial R-loop sensor derived from an RNase H1 sub-domain. Because some of these techniques require high input material or expensive reagents, we sought to apply MapR, which does not require expensive reagents and has been shown to be compatible with low input samples. Importantly, we demonstrate that incorporation of improved CUT&RUN steps into the MapR protocol yields R-loop-enriched DNA when using low input Drosophila nuclei.


Graphical abstract




Workflow for mapping tissue-specific, genome-wide R-loops in Drosophila.

Purify GST-tagged and catalytically inactive RNase H1 tethered MapR enzymes, GST-ΔRH-MNase, and GST-MNase, from transformed E. coli. Perform tissue-specific nuclei immuno-enrichment from UAS-EGFP.KASH-Msp300 Drosophila using magnetic bead–bound green fluorescent protein (GFP) antibody. Incubate isolated nuclei with MapR enzymes and activate MNase DNA cleavage with low salt/high calcium buffers. Purify released, R-loop-enriched DNA fragments and generate sequencing-ready libraries. Align MapR data to the reference genome and compare R-loop enrichment peaks in a genome browser.