Microarray and mRNA expression analysis. All microarray experiments and data analyses were performed as described in our earlier publication (Spurrier et al., 2018), using only heads from five biological replicates (each consisting of 50 male flies) for both growth conditions and each time point. Because only heads were used, we do not expect to see body-specific effects, such as expression changes selective for the midgut. A detailed experimental description follows. Mated male flies were raised together until they reached the required age, then collected, flash-frozen in liquid nitrogen, and stored at -80°C. Once flies for all aging conditions had been collected, total RNA was extracted from head samples using TRIzol. RNA was purified using sodium acetate (0.3 M final concentration) and ethanol. Purified RNA was processed and labeled according to the manufacturer’s guidelines for use with the DroGene 1.0 ST GeneChip (Affymetrix, GeneChip Whole Transcript Sense Target Labeling). The Scanner 3000 (Affymetrix) was used with the GeneChip Operation Software to generate one .CEL file per hybridized cRNA. These .CEL files were imported into the Transcriptome Analysis Console, which was used to generate robust multi-array average (RMA) normalized expression values per gene probe per imported file.

These expression values were exported from the Console and imported into R for analysis. Data quality was assessed and confirmed via Tukey box plot, covariance-based PCA scatter plot, and correlation-based heat map. To remove noise-biased expression, the mean and coefficient of variation (CV) were calculated per gene probe per sample class. LOWESS was then used to model CV as a function of mean expression, producing one fit per sample class. These fits were over-plotted to identify the common low-end expression value at which the linear relationship between mean expression (signal) and CV (noise) was grossly lost (mean expression value = 4.0). Expression values below this value were floored to it, and gene probes without at least one sample above this value were discarded as non-informative. Annotations for the retained gene probes were obtained from NetAffx (Affymetrix) and FlyBase (www.flybase.org). Expression per gene probe was tested for linear correlation with age and, separately, for differential expression across sample classes.
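
The flooring and filtering rule described above can be illustrated with a minimal sketch. The original analysis was run in R; the Python below is only an illustration, and the probe IDs, toy values, and function names are hypothetical. The threshold of 4.0 is the value reported in the text.

```python
from statistics import mean, stdev

NOISE_FLOOR = 4.0  # low-end mean expression value reported in the text


def coefficient_of_variation(values):
    """CV = SD / mean for one gene probe within one sample class."""
    return stdev(values) / mean(values)


def floor_and_filter(expression, floor=NOISE_FLOOR):
    """Floor values below `floor` and drop probes that never exceed it.

    `expression` maps a probe ID to its list of normalized expression
    values, one value per sample.
    """
    kept = {}
    for probe, values in expression.items():
        if not any(v > floor for v in values):
            continue  # non-informative: no sample above the floor
        kept[probe] = [max(v, floor) for v in values]
    return kept


# Hypothetical probe IDs and expression values, for illustration only.
toy = {
    "probe_a": [3.2, 5.1, 6.0],  # one value is floored, probe is kept
    "probe_b": [2.9, 3.5, 3.9],  # never exceeds the floor, discarded
    "probe_c": [7.4, 7.9, 8.1],  # unchanged
}
filtered = floor_and_filter(toy)
```

In this toy input, `probe_b` is discarded and the first value of `probe_a` is raised to the floor. The LOWESS fit of CV against mean expression, which was used to choose the floor, is not reproduced here.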

To test for linear correlation with age, polyserial correlation was used under a leave-one-out condition (library=polycor). Specifically, with each sample dropped in turn, polyserial correlation was used to generate a rho per gene for observed expression versus age under both non-randomized and randomized conditions. Z-scores were then calculated for each rho generated under the non-randomized condition using the mean and SD of the rho estimates generated under the randomized condition. P-values corresponding to these Z-scores were corrected using the Benjamini–Hochberg procedure. Genes were considered age markers if they had a corrected P < 0.05 in 100% of leave-one-out rounds and were similarly significant when no sample was left out.
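
The randomization-based Z-score and the multiple-testing correction can be sketched as follows. This is a hedged illustration rather than the original R code: Pearson correlation is swapped in for the polyserial correlation used in the analysis, the two-sided normal P-value is an assumption, and the number of randomizations, the seed, and the toy data are hypothetical.

```python
import math
import random
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation; a stand-in for polyserial correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def randomization_z(expr, ages, n_perm=200, seed=0):
    """Z-score of the observed rho against rho values from shuffled ages."""
    rng = random.Random(seed)
    observed = pearson(expr, ages)
    shuffled = list(ages)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(pearson(expr, shuffled))
    return (observed - mean(null)) / stdev(null)


def two_sided_p(z):
    """Two-sided P-value under a standard normal (an assumption here)."""
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical data: 5 replicates at each of days 3, 10, 30, and 45.
ages = [3] * 5 + [10] * 5 + [30] * 5 + [45] * 5
expr = [5.0 + 0.05 * a + 0.1 * (i % 5) for i, a in enumerate(ages)]
z = randomization_z(expr, ages)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Under the leave-one-out condition, this calculation would be repeated with each sample removed in turn, and a gene retained only if its corrected P-value stays below 0.05 in every round.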

To test for differential expression across sample classes, one-way analysis of variance (ANOVA) was applied (Type III, library=car) using sample class as the factor, under the Benjamini–Hochberg false discovery rate multiple comparison correction (MCC) condition (library=multtest). Probes with a corrected P < 0.05 by this test were deemed differentially expressed across the sample classes and were further tested post hoc via Tukey’s Honest Significant Difference (HSD) to identify the sample class comparisons in which probe expression differed significantly (post hoc P < 0.05 and an absolute difference of means > 1.5-fold). To determine which of the differential probes can robustly classify age, leave-one-out (LOO) testing was employed using the same method described earlier (Spurrier et al., 2018). For each LOO round, gene probes with differential expression in at least one sample class comparison were used to construct a k-nearest neighbor (k-NN) model and predict the class of the left-out sample (Dudoit et al., 2002). Gene probes selected in 100% of LOO rounds were deemed robust classifiers of age. These probes were then used to construct a principal component-seeded, AIC-optimized linear model using expression for day 3, 10, 30, and 45 control samples only. This model was then used to predict the physiological age of each biological replicate. Statistical differences in the predictions produced by the model were again tested using one-way ANOVA (Type III) under the MCC condition, followed by Tukey’s HSD test.
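
The leave-one-out selection and k-NN prediction loop can be sketched as below. This is an illustration under stated assumptions, not the original R implementation: the probe selector is a simple fold-change filter standing in for the ANOVA and Tukey’s HSD tests, it assumes log2-scale expression so that 1.5-fold corresponds to a difference of log2(1.5), and k = 3, Euclidean distance, and the toy data are hypothetical choices.

```python
import math
from collections import Counter
from statistics import mean


def fold_change_selector(train_x, train_y, min_log2_fc=math.log2(1.5)):
    """Stand-in selector: keep probes whose class means differ by >1.5-fold.

    Assumes log2-scale values; the original analysis used ANOVA followed
    by Tukey's HSD rather than this simplified filter.
    """
    classes = sorted(set(train_y))
    selected = []
    for j in range(len(train_x[0])):
        means = [
            mean(row[j] for row, y in zip(train_x, train_y) if y == c)
            for c in classes
        ]
        if max(means) - min(means) > min_log2_fc:
            selected.append(j)
    return selected


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training samples nearest to `query`."""
    nearest = sorted(
        zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def leave_one_out(samples, labels, select_probes, k=3):
    """Return probes selected in every round and the per-sample predictions."""
    selected_per_round, predictions = [], []
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        probes = select_probes(train_x, train_y)
        selected_per_round.append(set(probes))
        reduced_train = [[row[j] for j in probes] for row in train_x]
        reduced_query = [samples[i][j] for j in probes]
        predictions.append(knn_predict(reduced_train, train_y, reduced_query, k))
    robust = set.intersection(*selected_per_round)
    return robust, predictions


# Hypothetical data: 6 samples x 3 probes; only probe 0 tracks the class.
samples = [
    [5.0, 6.0, 4.1], [5.2, 6.1, 4.0], [4.9, 5.9, 4.2],
    [8.0, 6.0, 4.1], [8.1, 6.1, 4.0], [7.9, 5.9, 4.3],
]
labels = ["day3"] * 3 + ["day45"] * 3
robust, predictions = leave_one_out(samples, labels, fold_change_selector)
```

Because probe selection is repeated inside each round, the left-out sample never influences which probes are used to classify it.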

The same k-NN/LOO method was used to select age-classifier genes from the axenic samples, using only day 10, 30, and 45 axenic samples. For comparison, the method was re-applied to control samples limited to days 10, 30, and 45, as specified in the text.