Bioinformatics and Computational Biology


Category

Current Issue
0 Q&A 159 Views Nov 5, 2025

The rhizosphere, a 2–10 mm region surrounding the root surface, is colonized by numerous microorganisms, known as the rhizosphere microbiome. These microorganisms interact with each other, leading to emergent properties that affect plant fitness. Mapping these interactions is crucial to understanding microbial ecology in the rhizosphere and to predicting and manipulating plant health. However, current methods do not capture the chemistry of the rhizosphere environment, and common plant–microbe interaction study setups do not map bacterial interactions in this niche. Additionally, studying bacterial interactions may require the creation of transgenic bacterial lines carrying antibiotic-resistance markers or fluorescent probes, or even isotope labeling. Here, we describe a protocol for both in silico prediction and in vitro validation of bacterial interactions under conditions that closely recapitulate the major chemical constituents of the rhizosphere environment, using a widely used Murashige & Skoog (MS)-based gnotobiotic plant growth system. We use auto-fluorescent Pseudomonas, abundantly found in the rhizosphere, to estimate its interactions with other strains, thereby avoiding the need to create transgenic bacterial strains. By combining artificial root exudate medium, plant cultivation medium, and a synthetic bacterial community (SynCom), we first simulate these interactions using genome-scale metabolic models (GSMMs) and then validate them in vitro using growth assays. We show that the GSMM-predicted interaction scores correlate moderately, yet significantly, with their in vitro validation. Given the complexity of interactions among rhizosphere microbiome members, this reproducible and efficient protocol will allow confident mapping of interactions of fluorescent Pseudomonas with other bacterial strains within the rhizosphere microbiome.

0 Q&A 150 Views Nov 5, 2025

DNA methylation is a crucial epigenetic modification that influences gene expression and plays a role in various biological processes. High-throughput sequencing techniques, such as bisulfite sequencing (BS-seq) and enzymatic methyl sequencing (EM-seq), enable genome-wide profiling of DNA methylation patterns with single-base resolution. In this protocol, we present a bioinformatics pipeline for analyzing genome-wide DNA methylation. We outline the step-by-step process of the essential analyses, including quality control of BS-seq and EM-seq raw reads using FastQC, read alignment with commonly used aligners such as Bowtie2 and BS-Seeker2, DNA methylation calling to generate CGmap files, identification of differentially methylated regions (DMRs) using tools including MethylC-analyzer and HOME, data visualization, and post-alignment analyses. Compared to existing workflows, this pipeline integrates multiple steps into a single protocol, lowering the technical barrier, improving reproducibility, and offering flexibility for both plant and animal methylome studies. To illustrate the application of BS-seq and EM-seq, we present a case study analyzing an Arabidopsis thaliana mutant carrying a mutation in the met1 gene, which encodes a DNA methyltransferase; the mutation results in global CG hypomethylation and altered gene regulation. This example highlights the biological insights that can be gained through systematic methylome analysis. Our workflow is adaptable to any organism with a reference genome and provides a robust framework for uncovering methylation-associated regulatory mechanisms. All scripts and detailed instructions are provided in the GitHub repository: https://github.com/PaoyangLab/Methylation_Analysis.
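
The methylation-calling step can be made concrete with a small sketch. It is not taken from the protocol's repository; it assumes the eight-column, tab-separated CGmap layout written by BS-Seeker2 (chromosome, nucleotide, position, context, dinucleotide context, methylation level, methylated-read count, total-read count). The function name `pooled_methylation_by_context`, the `min_coverage` threshold, and the example records are illustrative assumptions.

```python
from collections import defaultdict


def pooled_methylation_by_context(cgmap_lines, min_coverage=4):
    """Return pooled methylation level per context (e.g. CG, CHG, CHH).

    The pooled level is sum(methylated reads) / sum(total reads) over all
    cytosines in a context whose coverage is at least `min_coverage`.
    Assumes 8 tab-separated CGmap columns (an assumption about the input).
    """
    methylated = defaultdict(int)
    covered = defaultdict(int)
    for line in cgmap_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        context = fields[3]
        mc_count = int(fields[6])   # reads supporting methylated C
        total = int(fields[7])      # all reads covering this cytosine
        if total < min_coverage:
            continue  # too few reads for a reliable estimate
        methylated[context] += mc_count
        covered[context] += total
    return {ctx: methylated[ctx] / covered[ctx] for ctx in covered}


# Made-up records for illustration only.
example_records = [
    "chr1\tC\t101\tCG\tCG\t0.80\t8\t10",
    "chr1\tG\t102\tCG\tCG\t0.50\t5\t10",
    "chr1\tC\t150\tCHH\tCA\t0.00\t0\t12",
    "chr1\tC\t160\tCHG\tCT\t0.50\t1\t2",   # dropped: coverage below 4
]
levels = pooled_methylation_by_context(example_records)
```

Comparing such per-context levels between a wild-type and a met1 sample is one simple way to see the global CG hypomethylation mentioned above; DMR calling itself is left to the dedicated tools named in the abstract.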

Past Issues
0 Q&A 983 Views Sep 20, 2025

Weighted gene co-expression network analysis (WGCNA) is widely used in transcriptomic studies to identify groups of highly correlated genes, aiding in the understanding of disease mechanisms. Although numerous protocols exist for constructing WGCNA networks from gene expression data, many focus on single datasets and do not address how to compare module stability across conditions. Here, we present a protocol for constructing and comparing WGCNA modules in paired tumor and normal datasets, enabling the identification of modules involved in both core biological processes and those specifically related to cancer pathogenesis. By incorporating module preservation analysis, this approach allows researchers to gain deeper insights into the molecular underpinnings of oral cancer, as well as other diseases. Overall, this protocol provides a framework for module preservation analysis in paired datasets, enabling researchers to identify which gene co-expression modules are conserved or disrupted between conditions, thereby advancing our understanding of disease-specific vs. universal biological processes.

0 Q&A 1126 Views Aug 5, 2025

Thousands of RNAs are localized to specific subcellular locations, and these localization patterns are often required for optimal cell function. However, the sequences within RNAs that direct their transport are unknown for almost all localized transcripts. Similarly, the RNA content of most subcellular locations remains unknown. To facilitate the study of subcellular transcriptomes, we developed the RNA proximity labeling method OINC-seq. OINC-seq utilizes photoactivatable, spatially restricted RNA oxidation to specifically label RNA in proximity to a subcellularly localized bait protein. After labeling, these oxidative RNA marks are then read out via high-throughput sequencing due to their ability to induce predictable misincorporation events by reverse transcriptase. These induced mutations are then quantitatively assessed for each gene using our software package PIGPEN. The observed mutation rate for a given RNA species is therefore related to its proximity to the localized bait protein. This protocol describes procedures for assaying RNA localization via OINC-seq experiments as well as computational procedures for analyzing the resulting data using PIGPEN.
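
The idea that a per-gene mutation rate reflects proximity to the bait can be sketched as below. This is not PIGPEN's interface or algorithm; it is a minimal illustration that assumes per-gene counts of mismatches and covered bases are already available for a labeled sample and an unlabeled control. All function names and numbers are hypothetical.

```python
def mutation_rate(mismatches, covered_bases):
    """Fraction of sequenced bases that carry a mismatch; None if no coverage."""
    if covered_bases == 0:
        return None
    return mismatches / covered_bases


def rate_difference(labeled, control):
    """Per-gene mutation rate in the labeled sample minus the control rate.

    `labeled` and `control` map gene name -> (mismatches, covered_bases).
    Genes missing from either sample, or without coverage, are skipped.
    """
    differences = {}
    for gene in labeled.keys() & control.keys():
        labeled_rate = mutation_rate(*labeled[gene])
        control_rate = mutation_rate(*control[gene])
        if labeled_rate is None or control_rate is None:
            continue
        differences[gene] = labeled_rate - control_rate
    return differences


# Hypothetical counts for illustration only.
labeled_counts = {"geneA": (30, 10000), "geneB": (5, 10000), "geneC": (2, 500)}
control_counts = {"geneA": (10, 10000), "geneB": (5, 10000), "geneC": (1, 0)}
diffs = rate_difference(labeled_counts, control_counts)
```

In this toy example, geneA shows an elevated mutation rate after labeling while geneB does not, which is the kind of contrast the abstract relates to proximity to the bait protein. A real analysis would also need replicate handling and statistical testing.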

0 Q&A 1271 Views Jul 20, 2025

The root meristem navigates the highly variable soil environment where water availability limits water absorption, slowing or halting growth. Traditional studies use uniform high osmotic potentials, poorly representing natural conditions where roots gradually encounter increasing osmotic potentials. Uniform high osmotic potentials reduce root growth by inhibiting cell division and shortening mature cell length. This protocol describes a simple and effective in vitro system using a gradient mixer that generates a vertical gradient in an agar gel based on the principle of communicating vessels, exploiting gravity to generate a continuous mannitol concentration gradient (from 0 to 400 mM mannitol) reaching osmotic potentials of -1.2 MPa. It enables long-term Arabidopsis root growth analysis under progressive water deficit, improving phenotyping and molecular studies in soil-like conditions.

0 Q&A 1602 Views Jul 20, 2025

Transcriptional pausing dynamically regulates spatiotemporal gene expression during cellular differentiation, development, and environmental adaptation. Precise measurement of pausing duration, a critical parameter in transcriptional control, has been challenging due to limitations in resolution and confounding factors. We introduce Fast TV-PRO-seq, an optimized protocol built on time-variant precision run-on sequencing (TV-PRO-seq), which enables genome-wide, single-base resolution mapping of RNA polymerase II pausing times. Unlike standard PRO-seq, Fast TV-PRO-seq employs sarkosyl-free biotin-NTP run-on with time gradients and integrates on-bead enzymatic reactions to streamline workflows. Key improvements include (1) reducing experimental time from 4 to 2 days, (2) reducing cell input requirements, and (3) improving process efficiency and simplifying command-line operations through the use of bash scripts.

0 Q&A 1378 Views Jul 5, 2025

Since the creation of the Global Polio Eradication Initiative (GPEI) in 1988, significant progress has been made toward attaining a poliovirus-free world. This has resulted in the eradication of wild poliovirus (WPV) serotypes two (WPV2) and three (WPV3) and limited transmission of serotype one (WPV1) in Pakistan and Afghanistan. However, the increased emergence of circulating vaccine-derived poliovirus (cVDPV) and the continued circulation of WPV1, although limited to two countries, pose a continuous threat of international spread of poliovirus. These challenges highlight the need to further strengthen surveillance and outbreak responses, particularly in the African Region (AFRO). Phylogeographic visualization tools may provide insights into changes in poliovirus epidemiology, which can in turn guide the implementation of more strategic and effective supplementary immunization activities and improved outbreak response and surveillance. We created a comprehensive protocol for the phylogeographic analysis of polioviruses using Nextstrain, a powerful open-source tool for real-time interactive visualization of virus sequencing data. It is expected that this protocol will support poliovirus elimination strategies in AFRO and contribute significantly to global eradication strategies. These tools have been utilized for other pathogens of public health importance, for example, SARS-CoV-2, human influenza, Ebola, and Mpox, among others, through real-time tracking of pathogen evolution (https://nextstrain.org), harnessing the scientific and public health potential of pathogen genome data.

0 Q&A 1688 Views Jul 5, 2025

The complexity of the human transcriptome poses significant challenges for complete annotation. Traditional RNA-seq, often limited by sensitivity and short read lengths, is frequently inadequate for identifying low-abundance transcripts and resolving complex populations of transcript isoforms. Direct long-read sequencing, while offering full-length information, suffers from throughput limitations, hindering the capture of low-abundance transcripts. To address these challenges, we introduce a targeted RNA enrichment strategy, rapid amplification of cDNA ends coupled with Nanopore sequencing (RACE-Nano-Seq). This method unravels the deep complexity of transcripts containing anchor sequences (specific regions of interest that might be exons of annotated genes, in silico predicted exons, or other sequences). RACE-Nano-Seq is based on inverse PCR with primers targeting these anchor regions to enrich the corresponding transcripts in both 5' and 3' directions. This method can be scaled for high-throughput transcriptome profiling by using multiplexing strategies. Through targeted RNA enrichment and full-length sequencing, RACE-Nano-Seq enables accurate and comprehensive profiling of low-abundance transcripts, often revealing complex transcript profiles at the targeted loci, both annotated and unannotated.

0 Q&A 2382 Views Apr 20, 2025

With reduced genotyping costs, genome-wide association studies (GWAS) face more challenges in diverse populations with complex structures to map genes of interest. The complex structure demands sophisticated statistical models, and increased marker density and population size require efficient computing tools. Many statistical models and computing tools have been developed with varied properties in statistical power, computing efficiency, and user-friendly accessibility. Some statistical models were developed with dedicated computing tools, such as efficient mixed model analysis (EMMA), multiple loci mixed model (MLMM), fixed and random model circulating probability unification (FarmCPU), and Bayesian-information and linkage-disequilibrium iteratively nested keyway (BLINK). However, there are computing tools (e.g., GAPIT) that implement multiple statistical models, retain a constant user interface, and maintain enhancement on input data and result interpretation. In this study, we developed a protocol utilizing a minimal set of software tools (BEAGLE, BLINK, and GAPIT) to perform a variety of analyses including file format conversion, missing genotype imputation, GWAS, and interpretation of input data and outcome results. We demonstrated the protocol by reanalyzing data from the Rice 3000 Genomes Project and highlighting advancements in GWAS model development.

0 Q&A 1290 Views Apr 20, 2025

Bayesian phylogenetic analysis is essential for elucidating evolutionary relationships among organisms. Traditional methods often rely on fixed models and manual parameter settings, which can limit accuracy and efficiency. This protocol presents an integrated workflow that leverages GUIDANCE2 for rigorous sequence alignment, ProtTest and MrModeltest for robust model selection, and MrBayes for phylogenetic tree estimation through Bayesian inference. By automating key steps and providing detailed command-line instructions, this protocol enhances the reliability and reproducibility of phylogenetic studies.

0 Q&A 1396 Views Mar 5, 2025

The limited standards for the rigorous and objective use of mitochondrial genomes (mitogenomes) can lead to uncertainties regarding the phylogenetic relationships of taxa under varying evolutionary constraints. The mitogenome exhibits heterogeneity in base composition, and evolutionary rates may vary across different regions, which can cause empirical data to violate assumptions of the applied evolutionary models. Consequently, the unique evolutionary signatures of the dataset must be carefully evaluated before selecting an appropriate approach for phylogenomic inference. Here, we present the bioinformatic pipeline and code used to expand the mitogenome phylogeny of the order Carcharhiniformes (groundsharks), with a focus on houndsharks (Chondrichthyes: Triakidae). We present a rigorous approach for addressing difficult-to-resolve phylogenies, incorporating multispecies coalescent modelling (MSCM) to address gene/species tree discordance. The protocol describes carefully designed approaches for preparing alignments, partitioning datasets, assigning models of evolution, inferring phylogenies based on traditional site-homogeneous concatenation approaches as well as under multispecies coalescent and site-heterogeneous models, and generating statistical data for comparison of different topological outcomes. The datasets required to run our analyses are available on the GitHub and Dryad repositories.

0 Q&A 1645 Views Mar 5, 2025

Mitochondrial genomes (mitogenomes) display relatively rapid mutation rates, low sequence recombination, high copy numbers, and maternal inheritance patterns, rendering them valuable blueprints for mapping lineages, uncovering historical migration patterns, understanding intraspecific population dynamics, and investigating how environmental pressures shape traits underpinned by genetic variation. Here, we present the bioinformatic pipeline and code used to assemble and annotate the complete mitogenomes of five houndsharks (Chondrichthyes: Triakidae) and compare them to the mitogenomes of other closely related species. We demonstrate the value of a combined assembly approach for detecting deviations in mitogenome structure and describe how to select an assembly approach that best suits the sequencing data. The datasets required to run our analyses are available on the GitHub and Dryad repositories.