Computational Biology and Bioinformatics

Current Issue
0 Q&A 261 Views Apr 20, 2025

Bayesian phylogenetic analysis is essential for elucidating evolutionary relationships among organisms. Traditional methods often rely on fixed models and manual parameter settings, which can limit accuracy and efficiency. This protocol presents an integrated workflow that leverages GUIDANCE2 for rigorous sequence alignment, ProtTest and MrModeltest for robust model selection, and MrBayes for phylogenetic tree estimation through Bayesian inference. By automating key steps and providing detailed command-line instructions, this protocol enhances the reliability and reproducibility of phylogenetic studies.

0 Q&A 351 Views Apr 20, 2025

With reduced genotyping costs, genome-wide association studies (GWAS) are increasingly used to map genes of interest in diverse populations with complex structures, which poses new challenges. Complex population structure demands sophisticated statistical models, and increased marker density and population size require efficient computing tools. Many statistical models and computing tools have been developed that differ in statistical power, computing efficiency, and ease of use. Some statistical models were developed with dedicated computing tools, such as efficient mixed model analysis (EMMA), multiple loci mixed model (MLMM), fixed and random model circulating probability unification (FarmCPU), and Bayesian-information and linkage-disequilibrium iteratively nested keyway (BLINK). Other computing tools (e.g., GAPIT) implement multiple statistical models, retain a constant user interface, and continue to improve their handling of input data and the interpretation of results. In this study, we developed a protocol utilizing a minimal set of software tools (BEAGLE, BLINK, and GAPIT) to perform a variety of analyses including file format conversion, missing genotype imputation, GWAS, and interpretation of input data and outcome results. We demonstrated the protocol by reanalyzing data from the Rice 3000 Genomes Project and highlighting advancements in GWAS model development.

Past Issues
0 Q&A 268 Views Mar 5, 2025

The limited standards for the rigorous and objective use of mitochondrial genomes (mitogenomes) can lead to uncertainties regarding the phylogenetic relationships of taxa under varying evolutionary constraints. The mitogenome exhibits heterogeneity in base composition, and evolutionary rates may vary across different regions, which can cause empirical data to violate assumptions of the applied evolutionary models. Consequently, the unique evolutionary signatures of the dataset must be carefully evaluated before selecting an appropriate approach for phylogenomic inference. Here, we present the bioinformatic pipeline and code used to expand the mitogenome phylogeny of the order Carcharhiniformes (groundsharks), with a focus on houndsharks (Chondrichthyes: Triakidae). We present a rigorous approach for addressing difficult-to-resolve phylogenies, incorporating multi-species coalescent modelling (MSCM) to address gene/species tree discordance. The protocol describes carefully designed approaches for preparing alignments, partitioning datasets, assigning models of evolution, inferring phylogenies based on traditional site-homogeneous concatenation approaches as well as under multi-species coalescent and site-heterogeneous models, and generating statistical data for comparison of different topological outcomes. The datasets required to run our analyses are available on the GitHub and Dryad repositories.

0 Q&A 308 Views Mar 5, 2025

Mitochondrial genomes (mitogenomes) display relatively rapid mutation rates, low sequence recombination, high copy numbers, and maternal inheritance patterns, rendering them valuable blueprints for mapping lineages, uncovering historical migration patterns, understanding intraspecific population dynamics, and investigating how environmental pressures shape traits underpinned by genetic variation. Here, we present the bioinformatic pipeline and code used to assemble and annotate the complete mitogenomes of five houndsharks (Chondrichthyes: Triakidae) and compare them to the mitogenomes of other closely related species. We demonstrate the value of a combined assembly approach for detecting deviations in mitogenome structure and describe how to select an assembly approach that best suits the sequencing data. The datasets required to run our analyses are available on the GitHub and Dryad repositories.

0 Q&A 225 Views Mar 5, 2025

Non-small cell lung cancer (NSCLC) is the most common type of lung cancer. According to 2020 reports, 2.2 million cases are reported globally every year, with as many as 1.8 million deaths. To study NSCLC, systems biology offers mathematical modeling as a tool to understand complex pathways and provide insights into the identification of biomarkers and potential therapeutic targets, which aids precision therapy. Mathematical modeling, specifically with ordinary differential equations (ODEs), is used to better understand the dynamics of cancer growth and immunological interactions in the tumor microenvironment. This study highlighted the dual role of the cyclic GMP-AMP synthase–stimulator of interferon genes (cGAS-STING) pathway. Its classical involvement in regulating type 1 interferon (IFN I) and pro-inflammatory responses promotes tumor regression through senescence and apoptosis, whereas alternative signaling induced by nuclear factor kappa B (NF-κB), mutated tumor protein p53 (p53), and programmed death-ligand 1 (PD-L1) leads to tumor growth. We identified key regulators in cancer progression by simulating the model and validating it with the following model estimation parameters: local sensitivity analysis, principal component analysis, rate of flow of metabolites, and model reduction. Integration of multiple signaling axes revealed that cGAS-STING, phosphoinositide 3-kinases (PI3K), and Ak strain transforming (AKT) may be potential targets that can be validated for cancer therapy.
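As a rough illustration of what local sensitivity analysis on an ODE model involves, the sketch below perturbs each parameter of a toy tumor-growth equation and measures the relative change in the final tumor size. The equation (logistic growth minus a linear kill term), the parameter names `growth_rate` and `kill_rate`, and all numeric values are assumptions made up for this example; they are not the cGAS-STING model described in the abstract.

```python
def simulate(growth_rate, kill_rate, t_end=10.0, dt=0.01,
             tumor0=1.0, capacity=100.0):
    """Forward-Euler integration of a toy model (an assumption, not the
    published model): dT/dt = r*T*(1 - T/K) - k*T. Returns T at t_end."""
    tumor = tumor0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        d_tumor = (growth_rate * tumor * (1.0 - tumor / capacity)
                   - kill_rate * tumor)
        tumor += dt * d_tumor
    return tumor


def local_sensitivity(params, name, rel_step=1e-3):
    """Normalized local sensitivity (dY/Y) / (dp/p) of the final tumor
    size to one parameter, estimated by central finite differences."""
    base = simulate(**params)
    up = dict(params)
    up[name] = params[name] * (1.0 + rel_step)
    down = dict(params)
    down[name] = params[name] * (1.0 - rel_step)
    return (simulate(**up) - simulate(**down)) / (2.0 * rel_step * base)


# Hypothetical parameter values chosen only for illustration.
params = {"growth_rate": 0.5, "kill_rate": 0.2}
sensitivities = {name: local_sensitivity(params, name) for name in params}

for name, value in sensitivities.items():
    print(f"{name}: {value:+.3f}")
```

A positive value means that increasing the parameter increases the final tumor size, and a negative value means the opposite. Ranking parameters by the magnitude of these values is one common way to shortlist candidate regulators for closer study.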

0 Q&A 317 Views Feb 5, 2025

Cellular communication relies on the intricate interplay of signaling molecules, which come together to form the cell–cell interaction (CCI) network that orchestrates tissue behavior. Researchers have shown that shallow neural networks can effectively reconstruct the CCI from the abundant molecular data captured in spatial transcriptomics (ST). However, in scenarios characterized by sparse connections and excessive noise within the CCI, shallow networks are often susceptible to inaccuracies, leading to suboptimal reconstruction outcomes. To achieve a more comprehensive and precise CCI reconstruction, we propose a novel method called triple-enhancement-based graph neural network (TENET). The TENET framework has been implemented and evaluated on both real and synthetic ST datasets. This protocol primarily introduces our network architecture and its implementation.

0 Q&A 471 Views Feb 5, 2025

Dual RNA-Seq technology has significantly advanced the study of biological interactions between two organisms by allowing parallel transcriptomic analysis. Existing analysis methods employ various combinations of open-source bioinformatics tools to process dual RNA-Seq data. Upon reviewing these methods, we set out to explore crucial criteria for selecting standard tools and methods, focusing especially on critical steps such as trimming and mapping reads to the reference genome. To validate the different combinatorial approaches, we performed benchmarking using top-ranking tools and a publicly available dual RNA-Seq Sequence Read Archive (SRA) dataset. An important observation from evaluating the mapping approach is that when the adapter-trimmed reads are first mapped to the pathogen genome, more reads align to the pathogen genome than when using the unmapped reads derived from the traditional host-first mapping approach. This mapping method prevents the misalignment of pathogen reads to the host genome due to their shorter length. In this way, the pathogen read information, which is present in low proportions in a complex eukaryotic dataset, is precisely obtained. This protocol presents a comprehensive comparison of these possible approaches, resulting in a robust unified standard methodology.

0 Q&A 1128 Views Jan 20, 2025

Stable-isotope resolved metabolomics (SIRM) is a powerful approach for characterizing metabolic states in cells and organisms. By incorporating isotopes, such as 13C, into substrates, researchers can trace reaction rates across specific metabolic pathways. Integrating metabolomics data with gene expression profiles further enriches the analysis, as we demonstrated in our prior study on glioblastoma metabolic symbiosis. However, the bioinformatics tools for analyzing tracer metabolomics data have been limited. In this protocol, we encourage researchers to use SIRM and transcriptomics data and to perform the downstream analysis using our software tool DIMet. DIMet is the first comprehensive tool designed for the differential analysis of tracer metabolomics data, alongside its integration with transcriptomics data. DIMet facilitates the analysis of stable-isotope labeling and metabolic abundances, offering a streamlined approach to infer metabolic changes without requiring complex flux analysis. Its pathway-based "metabologram" visualizations effectively integrate metabolomics and transcriptomics data, offering a versatile platform capable of analyzing corrected tracer datasets across diverse systems, organisms, and isotopes. We provide detailed steps for sample preparation and data analysis using DIMet through its intuitive, web-based Galaxy interface. To showcase DIMet's capabilities, we analyzed LDHA/B knockout glioblastoma cell lines compared to controls. Accessible to all researchers through Galaxy, DIMet is free, user-friendly, and open source, making it a valuable resource for advancing metabolic research.

0 Q&A 496 Views Jan 5, 2025

Cell-generated forces play a critical role in driving and regulating complex biological processes, such as cell migration and division and cell and tissue morphogenesis in development and disease. Traction force microscopy (TFM) is an established technique developed in the field of mechanobiology to quantify cellular forces exerted on soft substrates and internal mechanical tissue stresses. TFM measures cell-generated traction forces in 2D or 3D environments with varying mechanical and biochemical properties. This technique involves embedding fiducial markers in the substrate, imaging substrate deformations caused by the cells, and using mathematical models to infer forces. This protocol compiles procedures from various previously published studies and software packages and describes how to perform TFM on 2D micropatterned substrates. Although not the focus of this protocol, the methods and software packages shown here can also be used to perform monolayer stress microscopy (MSM), a method to calculate internal mechanical stress within the cells by modeling them as a thin plate with linear and homogeneous material properties. TFM and MSM are non-invasive methods capable of yielding spatially and temporally resolved force and stress maps with high throughput. As such, they enable the generation of rich datasets, which can provide valuable insights into the roles of cell-generated forces in various physiological and pathological processes.

0 Q&A 226 Views Jan 5, 2025

Magnetic resonance imaging (MRI) is an invaluable method for anatomical and functional in vivo imaging of the brain. Still, accurate delineation of brain structures remains a crucial task in MR image evaluation. This study presents a novel analytical algorithm developed in MATLAB for the automatic segmentation of cerebrospinal fluid (CSF) spaces in preclinical non-contrast MR images of the mouse brain. The algorithm employs adaptive thresholding and region growing to accurately and repeatably delineate CSF space regions in 3D constructive interference steady-state (3D-CISS) images acquired using a 9.4 Tesla MR system and a cryogenically cooled transmit/receive resonator. Key steps include computing a bounding box enclosing the brain parenchyma in three dimensions, applying an adaptive intensity threshold, and refining CSF regions independently in sagittal, axial, and coronal planes. In its original application, the algorithm provided objective and repeatable delineation of CSF regions in 3D-CISS images of sub-optimal signal-to-noise ratio, acquired with (33 μm)³ isometric voxel dimensions. It revealed subtle differences in CSF volumes between aquaporin-4-null and wild-type littermate mice, demonstrating robustness and reliability. Despite the increasing use of artificial neural networks in image analysis, this analytical approach remains robust, especially when the dataset is too small or limited for training a network. By adjusting its parameters, the algorithm can be applied to segmenting other types of anatomical structures or other types of 3D images. This automated method significantly reduces time and effort compared to manual segmentation and offers higher repeatability, making it a valuable tool for preclinical and potentially clinical MRI applications.
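The combination of an intensity threshold derived from the image itself and region growing from a seed voxel can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' MATLAB algorithm: the threshold rule (mean plus `k` standard deviations), the 6-connected neighborhood, the assumption that the target region is brighter than its surroundings, and the toy 3×3×3 volume are all hypothetical choices.

```python
from collections import deque
from statistics import mean, pstdev


def adaptive_threshold(volume, k=1.0):
    """Threshold computed from the data: mean + k * standard deviation
    of all voxel intensities (a hypothetical rule for this sketch)."""
    values = [v for plane in volume for row in plane for v in row]
    return mean(values) + k * pstdev(values)


def region_grow(volume, seed, threshold):
    """Collect all voxels at or above the threshold that are
    6-connected to the seed voxel (breadth-first search)."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    if volume[seed[0]][seed[1]][seed[2]] < threshold:
        return set()
    region = {seed}
    queue = deque([seed])
    offsets = ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
               (0, -1, 0), (0, 0, 1), (0, 0, -1))
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in offsets:
            n = (z + dz, y + dy, x + dx)
            inside = 0 <= n[0] < nz and 0 <= n[1] < ny and 0 <= n[2] < nx
            if not inside or n in region:
                continue
            if volume[n[0]][n[1]][n[2]] >= threshold:
                region.add(n)
                queue.append(n)
    return region


# Toy volume: dark background, one bright connected run of three voxels,
# and one isolated bright voxel that is not connected to the seed.
volume = [[[10] * 3 for _ in range(3)] for _ in range(3)]
for z, y, x in [(1, 1, 0), (1, 1, 1), (1, 1, 2)]:
    volume[z][y][x] = 100
volume[0][0][0] = 100

threshold = adaptive_threshold(volume)
region = region_grow(volume, (1, 1, 1), threshold)
print(sorted(region))
```

The isolated bright voxel at `(0, 0, 0)` passes the threshold but is excluded because it is not connected to the seed, which is the property that lets region growing reject bright noise outside the structure of interest.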

0 Q&A 453 Views Nov 5, 2024

Genome-wide gene expression analysis is a commonly used method to quantitatively examine the transcriptional signature of any tissue or cell state. Standard bulk cell RNA sequencing (RNA-seq) quantifies RNAs in the cells of the tissue type of interest through massive parallel sequencing of cDNA synthesized from the cellular RNA. The subsequent analysis of global RNA expression and normalization of RNA expression levels between two or more samples generally assumes that cells from all samples produce equivalent amounts of RNA per cell. This assumption may be invalid in cells where MYC or MYCN expression levels are markedly different and thus, overall mRNA expression per cell is altered. Here, we describe an approach for RNA-seq analysis of MYCN-amplified neuroblastoma cells during treatment with retinoic acid, which causes dramatic downregulation of MYCN expression and induces growth arrest and differentiation of the cells. Our procedure employs spiked-in RNA standards added in ratio to the number of cells in each sample prior to RNA extraction. In the analysis of differential gene expression, the expression level of each gene is standardized to the spiked-in RNA standard to accurately assess gene expression levels per cell in conditions of high and low MYCN expression. Our protocol thus provides a step-by-step experimental approach for normalizing RNA-seq expression data on a per-cell-number basis, allowing accurate assessment of differential gene expression in cells expressing markedly different levels of MYC or MYCN.
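The idea of standardizing each gene to the spiked-in RNA can be illustrated with a short Python sketch. It assumes, as the abstract states, that spike-in RNA is added in proportion to cell number, so that dividing a gene's read count by the sample's total spike-in count gives a value proportional to expression per cell. The function name `spike_in_normalize`, the gene and spike identifiers, and the read counts are hypothetical and chosen only for illustration.

```python
def spike_in_normalize(counts, spike_ids):
    """Scale each gene's read count by the sample's total spike-in count.

    counts:    {sample_name: {feature_id: read_count}}
    spike_ids: set of feature IDs that are spike-in standards
    Returns    {sample_name: {gene_id: count / spike_total}}
    """
    normalized = {}
    for sample, feature_counts in counts.items():
        spike_total = sum(feature_counts.get(s, 0) for s in spike_ids)
        if spike_total == 0:
            raise ValueError(f"no spike-in reads found in sample {sample!r}")
        normalized[sample] = {
            gene: count / spike_total
            for gene, count in feature_counts.items()
            if gene not in spike_ids
        }
    return normalized


# Hypothetical counts: both samples received the same spike-in amount per
# cell, but the high-MYCN sample yields twice as many reads for GENE_A.
counts = {
    "high_mycn": {"GENE_A": 2000, "SPIKE_1": 60, "SPIKE_2": 40},
    "low_mycn": {"GENE_A": 1000, "SPIKE_1": 60, "SPIKE_2": 40},
}
normalized = spike_in_normalize(counts, {"SPIKE_1", "SPIKE_2"})
print(normalized)
```

Because the scaling factor comes from the spike-in reads rather than from the total library size, a global shift in mRNA output per cell is preserved in the normalized values instead of being normalized away.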

0 Q&A 438 Views Oct 20, 2024

Single-cell transcriptomic analyses have emerged as very powerful tools to query the gene expression changes at the single-cell level in physiological and pathological conditions. The quality of the analysis is heavily dependent on tissue digestion protocols, with the goal of preserving thousands of single live cells to submit to the subsequent processing steps and analysis. Multiple digestion protocols that use different enzymes to digest the tissues have been described. Harsh digestion can damage certain cell types, but this might be required to digest especially fibrotic tissue as in our experimental condition. In this paper, we summarize a collagenase type I digestion protocol for preparing the single-cell suspension from fibrovascular tissues surgically removed from patients with proliferative diabetic retinopathy (PDR) for single-cell RNA sequencing (scRNA-Seq) analyses. We also provide a detailed description of the data analysis that we implemented in a previously published study.