Sequence data acquisition and alignment
This method is extracted from the research article:
Amino acid exchangeabilities vary across the tree of life
Sci Adv, Dec 4, 2019; DOI: 10.1126/sciadv.aax3124

The sequence data were retrieved from the sources listed in table S1. Coding sequence alignments for the four mammalian clades, fruit flies, and yeasts were retrieved directly from the respective databases. For each of the other eukaryotic clades, we queried Ensembl (https://useast.ensembl.org/index.html) for a list of all one-to-one orthologous genes between the pair of species and downloaded their coding sequences. The coding sequences were translated into protein sequences using Multiple Alignment of Coding Sequences (MACSE) v1.02 (40). Local pairwise protein sequence alignment was then performed for each pair of orthologs with Multiple Alignment using Fast Fourier Transform (MAFFT) v7.294b (41) using the L-INS-i algorithm. The corresponding coding sequence alignment was derived from each protein alignment using a custom Python script. All prokaryotic clades were sampled from the strains available in the Alignable Tight Genomic Clusters (ATGC) database (42). All alignments were filtered so that they contain no gaps, missing data, or ambiguous codons. The alignments and relevant Python scripts have been deposited on GitHub (https://github.com/ztzou/REvariation).
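
The two scripted steps in this paragraph, deriving a coding sequence alignment from a protein alignment and removing codons with gaps or ambiguous bases, can be sketched as follows. This is a minimal illustration and not the authors' script (see the GitHub repository for that). The function names `thread_codons` and `filter_codon_columns` are hypothetical, and the sketch assumes that each coding sequence is in frame, carries no terminal stop codon, and has exactly one codon per residue of its aligned protein.

```python
from typing import List

VALID_BASES = set("ACGT")  # assumption: anything else counts as ambiguous or missing


def thread_codons(aligned_protein: str, cds: str) -> str:
    """Map an ungapped coding sequence onto one gapped protein alignment row."""
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) % 3 != 0 or len(cds) // 3 != n_residues:
        raise ValueError("coding sequence does not match the aligned protein")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    out = []
    pos = 0
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")  # a protein gap becomes a three-base gap
        else:
            out.append(codons[pos])
            pos += 1
    return "".join(out)


def filter_codon_columns(rows: List[str]) -> List[str]:
    """Keep only codon columns in which every row has three unambiguous bases."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("alignment rows differ in length")
    keep = []
    for i in range(0, len(rows[0]), 3):
        column = [r[i:i + 3].upper() for r in rows]
        if all(len(c) == 3 and set(c) <= VALID_BASES for c in column):
            keep.append(i)
    return ["".join(r[i:i + 3] for i in keep) for r in rows]
```

For example, threading `ATGAAACTG` onto the aligned protein `MK-L` gives `ATGAAA---CTG`. Filtering that row together with `ATGAANCGTCTT` drops the second codon column (ambiguous base) and the third (gap), leaving `ATGCTG` and `ATGCTT`.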

For the analyses of orthologous versus nonorthologous genes between the rodent clade and the avian clade, we downloaded all coding sequences of mouse, rat, chicken, and turkey from Ensembl 84. In each species, only the longest transcript of each gene was retained for subsequent analysis. We then obtained three lists from Ensembl: one-to-one orthologs between mouse and rat (first list), one-to-one orthologs between chicken and turkey (second list), and one-to-one orthologs between mouse and chicken (third list). We compared REs estimated separately from four groups of genes: RO, AO, RN, and AN. RO refers to the genes that appear on both the first and third lists. AO refers to the genes that appear on both the second and third lists. RN refers to the genes that appear on the first list but not on the third list. AN refers to the genes that appear on the second list but not on the third list.
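
The transcript selection and the four gene groups described above amount to a maximum-by-length lookup followed by set operations, sketched below. This is an illustrative sketch, not the authors' code. The function names and input layouts are hypothetical, and it assumes that rodent genes are matched across lists by their mouse gene identifier and avian genes by their chicken gene identifier. When two transcripts of a gene are equally long, the sketch keeps the first one encountered; the source does not say how ties were resolved.

```python
from typing import Dict, Iterable, Set, Tuple


def longest_transcript_per_gene(
    transcripts: Iterable[Tuple[str, str, str]]
) -> Dict[str, Tuple[str, str]]:
    """Input: (gene_id, transcript_id, sequence) records.
    Output: gene_id -> (transcript_id, sequence) of its longest transcript."""
    best: Dict[str, Tuple[str, str]] = {}
    for gene_id, transcript_id, seq in transcripts:
        if gene_id not in best or len(seq) > len(best[gene_id][1]):
            best[gene_id] = (transcript_id, seq)
    return best


def partition_gene_groups(
    mouse_rat: Iterable[Tuple[str, str]],       # first list: (mouse, rat)
    chicken_turkey: Iterable[Tuple[str, str]],  # second list: (chicken, turkey)
    mouse_chicken: Iterable[Tuple[str, str]],   # third list: (mouse, chicken)
) -> Dict[str, Set[str]]:
    """Split rodent and avian ortholog sets by presence on the third list."""
    mouse_chicken = list(mouse_chicken)
    rodent = {mouse for mouse, _ in mouse_rat}
    avian = {chicken for chicken, _ in chicken_turkey}
    mouse_on_third = {mouse for mouse, _ in mouse_chicken}
    chicken_on_third = {chicken for _, chicken in mouse_chicken}
    return {
        "RO": rodent & mouse_on_third,    # first and third lists
        "AO": avian & chicken_on_third,   # second and third lists
        "RN": rodent - mouse_on_third,    # first list only
        "AN": avian - chicken_on_third,   # second list only
    }
```

By construction RO and RN are disjoint and together cover the first list, and AO and AN do the same for the second list.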
