The statistical analysis is described in detail in the published protocol.24 GHQ-12 responses were scored using two methods. In the first method, item responses are assigned the values 0, 0, 1, 1 (from the most positive to the most negative sentiment) and summed to form an aggregate score from 0 (least distressed) to 12 (most distressed). Using this method, a score of >3 is indicative of case-level distress.30 The second method assigns the values 0, 1, 2, 3 (positive to negative sentiment), producing a score in the range 0–36, with 0 representing the most healthy response (no psychological distress) and 36 the most unhealthy (maximal psychological distress). Presenting both scoring methods allows us to report the prevalence of case-level distress across the sample (0-0-1-1 scoring method) and to detect changes within the sample more sensitively over the three phases of the pandemic (0-1-2-3 scoring method).
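
The two GHQ-12 scoring methods can be illustrated with a minimal sketch. It assumes each of the 12 item responses has already been coded 0 to 3 from the most positive to the most negative option; the function name `score_ghq12` and the returned keys are hypothetical and are not taken from the study's analysis code.

```python
from typing import Dict, Sequence, Union

N_GHQ_ITEMS = 12  # the GHQ-12 has 12 items


def score_ghq12(responses: Sequence[int]) -> Dict[str, Union[int, bool]]:
    """Score one participant's GHQ-12 responses with both methods.

    Assumes each response is coded 0..3, from most positive to most negative.
    """
    if len(responses) != N_GHQ_ITEMS:
        raise ValueError("expected 12 item responses")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("responses must be coded 0, 1, 2 or 3")

    # 0-0-1-1 method: the two most negative options score 1, the others 0.
    binary_score = sum(1 for r in responses if r >= 2)
    # 0-1-2-3 method: sum the raw codes, giving a range of 0 to 36.
    likert_score = sum(responses)

    return {
        "binary_score": binary_score,
        "likert_score": likert_score,
        # Case-level distress is a 0-0-1-1 score of >3.
        "case_level_distress": binary_score > 3,
    }
```

For example, a participant who chooses the second most negative option on four items and the most positive option on the rest would score 4 on the 0-0-1-1 method (case-level distress) and 8 on the 0-1-2-3 method.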

IES-R responses were scored by assigning the values 0, 1, 2, 3, 4 (positive to negative), producing a score in the range 0 (no trauma) to 88 (maximal trauma). A score of 24 or above indicates a clinically significant traumatic stress response, and a score above 33 is the best cut-off for a diagnosis of ‘probable PTSD’.33 34
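
The IES-R scoring and cut-offs can be sketched in the same way. This assumes 22 items (implied by the 0 to 88 range with a maximum of 4 per item), each already coded 0 to 4; the function name `score_iesr` and the returned keys are hypothetical.

```python
from typing import Dict, Sequence, Union

N_IESR_ITEMS = 22  # assumption: 22 items, consistent with a maximum score of 88


def score_iesr(responses: Sequence[int]) -> Dict[str, Union[int, bool]]:
    """Sum one participant's IES-R responses and apply the two cut-offs.

    Assumes each response is coded 0..4, from positive to negative.
    """
    if len(responses) != N_IESR_ITEMS:
        raise ValueError("expected 22 item responses")
    if any(r not in (0, 1, 2, 3, 4) for r in responses):
        raise ValueError("responses must be coded 0 to 4")

    total = sum(responses)
    return {
        "total": total,
        # 24 or above: clinically significant traumatic stress response.
        "clinically_significant": total >= 24,
        # Above 33: cut-off for 'probable PTSD'.
        "probable_ptsd": total > 33,
    }
```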

The change over time in the GHQ-12 (phases I, II and III) and IES-R scores (phases II and III) among participants who responded to all three surveys was examined with repeated measures linear mixed-effect models, with survey phase as the single fixed effect and a participant-level random effect. These models describe the association between pandemic phase and psychological distress (GHQ-12) and trauma (IES-R).
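
As an illustrative sketch rather than a specification taken from the protocol, the GHQ-12 model can be written as below. It assumes that the participant-level random effect is a random intercept and that phase I is the reference level; all symbols are introduced here for illustration. Here y_{it} is the score of participant i at phase t, u_i is the participant-level random effect and ε_{it} is the residual error.

```latex
y_{it} = \beta_0
       + \beta_{\mathrm{II}}\,\mathbb{1}[t = \mathrm{II}]
       + \beta_{\mathrm{III}}\,\mathbb{1}[t = \mathrm{III}]
       + u_i + \varepsilon_{it},
\qquad
u_i \sim \mathcal{N}(0, \sigma_u^2),
\quad
\varepsilon_{it} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
```

Under these assumptions, the coefficients for phases II and III are the mean changes in score relative to phase I. For the IES-R, which was measured only in phases II and III, the same form would reduce to a single phase coefficient with phase II as the reference.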

To identify potential modifiers of the change in GHQ-12 score or IES-R score over time, further models were constructed for each of the measured personal and professional variables. Each model included the single variable of interest, survey phase, their interaction (to allow for a change in the association between the outcome and the variable over time) and a participant-level random effect as before. Responses with a missing value for the variable were removed.35 Nakagawa’s marginal R² was used to measure the proportion of outcome variance accounted for by the model (excluding random effects, ie, when there is no a priori knowledge of the expected outcome for each participant). Values vary from 0 to 1, with 1 occurring when the model perfectly predicts the outcome, and 0 occurring when the model only returns the population average.
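
For a Gaussian mixed model with a single participant-level random intercept (an assumption made here for illustration), the marginal R² described above takes the following form. The symbols are introduced for this sketch: σ_f² is the variance of the fixed-effect predictions (the variable of interest, survey phase and their interaction), σ_u² is the variance of the participant-level random effect and σ_ε² is the residual variance.

```latex
R^2_{\mathrm{marginal}} =
  \frac{\sigma_f^2}{\sigma_f^2 + \sigma_u^2 + \sigma_\varepsilon^2}
```

Because the random-effect variance appears only in the denominator, variation explained by knowing which participant gave a response does not count towards the marginal value. If the fixed effects explain nothing, σ_f² is 0 and the ratio is 0.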

Finally, a comparison analysis was performed to compare distress and trauma outcomes in those who completed all three surveys against those who dropped out.