


In principle, it should be possible to estimate the theoretical allele frequency in Sample A of each variant identified in the individual cell lines, starting from the allele frequencies and/or read counts determined independently at that locus for each cell line. Agreement between predicted and observed values would provide evidence that the mixtures were performed properly and that each component is present at the expected level in the final pooled reference. We attempted to build models that used the read counts at known positives in each cell line to predict the per-locus VAF in Sample A. VAFs calculated from the deeply sequenced Sample A were used to estimate the prediction error.

To simplify the analysis, and given the abundance of variants available, we identified the known positives private to each cell line and fitted a linear regression on them, so that the VAF of each positive in Sample A could be expressed as a linear combination of the cell line VAFs (positive or 0). The model multipliers were obtained by solving the linear system whose matrix is computed as the convex combination of the depth and the alternate allele counts on a chosen subset of all possible genomic positions. While some models appeared more reasonable than others, the β estimates of the cell line mix ratios varied widely across models and subsets, resulting in fundamentally unreliable VAF predictions. This instability was possibly due to the large number of rearrangements in the cancer cell lines, which create inconsistent depth at a given locus from cell line to cell line. A simple linear regression of the VAF of each cell line onto the Sample A VAF provided reasonable, if oversimplified, results that indicated the cell lines were properly mixed (given the approximate 1:2 mixture of TLY into BLY described elsewhere).
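The simple regression described above can be illustrated with a minimal sketch. It is not the authors' code: the data are simulated, the four mix fractions in `true_beta` are hypothetical, and the function names are invented for illustration. Each simulated variant is private to one cell line, so each row of the design matrix has a single nonzero VAF, and the Sample A VAF is modelled as a no-intercept linear combination of the cell line VAFs. Refitting on random subsets of positions gives a rough view of how stable the β estimates are.

```python
import numpy as np


def simulate_private_vafs(true_beta, n_per_line, rng, noise_sd=0.005):
    """Simulate private variants (hypothetical data, not from the article).

    Returns X (n_variants x n_lines), holding the cell line VAF of each
    variant (nonzero only in the line that owns it), and y, the noisy
    VAF of the same variants in the pooled sample.
    """
    true_beta = np.asarray(true_beta, dtype=float)
    n_lines = true_beta.size
    n_variants = n_lines * n_per_line
    owner = np.repeat(np.arange(n_lines), n_per_line)
    X = np.zeros((n_variants, n_lines))
    # Assume each private variant is heterozygous (0.5) or homozygous (1.0).
    X[np.arange(n_variants), owner] = rng.choice([0.5, 1.0], size=n_variants)
    y = X @ true_beta + rng.normal(0.0, noise_sd, size=n_variants)
    return X, y


def fit_mix_fractions(X, y):
    """No-intercept least-squares fit of y ~ X @ beta."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def subset_spread(X, y, n_subsets, subset_size, rng):
    """Standard deviation of the beta estimates across random subsets."""
    estimates = []
    for _ in range(n_subsets):
        idx = rng.choice(len(y), size=subset_size, replace=False)
        estimates.append(fit_mix_fractions(X[idx], y[idx]))
    return np.std(estimates, axis=0)


rng = np.random.default_rng(0)
true_beta = np.array([0.10, 0.20, 0.30, 0.40])  # hypothetical mix fractions
X, y = simulate_private_vafs(true_beta, n_per_line=200, rng=rng)

beta_hat = fit_mix_fractions(X, y)
spread = subset_spread(X, y, n_subsets=50, subset_size=100, rng=rng)

print("estimated mix fractions:", np.round(beta_hat, 3))
print("spread across subsets:  ", np.round(spread, 4))
```

On clean simulated data the estimates are stable across subsets. With real cell lines, locus-to-locus depth differences of the kind attributed above to rearrangements would violate the assumption that a single multiplier per cell line applies at every position, which is one way the subset-to-subset spread could become large.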

Note: The content above was automatically extracted from a research article and may not display correctly.


