The primary source of reference data for the visual interpretation of sampled pixels was Landsat data in the form of cloud-free annual composites and 16-day observations. Methods of Landsat data processing, cloud filtering, and compositing are described in Potapov et al. (58). Sixteen-day observations were cloud screened to produce time-series graphs of spectral indices (normalized difference vegetation index and normalized difference water index) and of shortwave infrared band reflectance. Non–cloud-screened 16-day composites were used to identify the exact date of forest loss, because loss events are sometimes visible through haze and translucent clouds and such observations would have been removed by automated cloud screening. To provide landscape context for the visual interpretation of sampled pixels, annual composites cover a subset of 20 by 20 Landsat pixels (circa 36 ha) around the sampled pixel, and 16-day composites cover a subset of 40 by 40 Landsat pixels (circa 144 ha).
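The subset areas quoted above follow from the 30 m Landsat pixel size, and the spectral indices are simple band ratios. The sketch below reproduces the area arithmetic and shows the index formulas. It is illustrative only: the function names are hypothetical, and the NDWI form assumes the near-infrared/shortwave-infrared formulation. The exact band combination used for the interpretation graphs is not specified here.

```python
LANDSAT_PIXEL_SIZE_M = 30.0  # Landsat TM/ETM+/OLI reflective-band pixel size


def window_area_ha(pixels_per_side, pixel_size_m=LANDSAT_PIXEL_SIZE_M):
    """Area in hectares of a square window of pixels_per_side x pixels_per_side pixels."""
    side_m = pixels_per_side * pixel_size_m
    return side_m * side_m / 10_000.0  # 1 ha = 10,000 m^2


def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def ndwi(nir, swir):
    """Normalized difference water index, ASSUMING the NIR/SWIR formulation."""
    return (nir - swir) / (nir + swir)


print(window_area_ha(20))  # annual-composite subset  → 36.0
print(window_area_ha(40))  # 16-day composite subset  → 144.0
```

Doubling the window side from 20 to 40 pixels quadruples the covered area, which is why the 16-day subsets span four times the landscape of the annual ones.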

Regionally, only 7% of the 10,000 sampled pixels had, on average, fewer than one cloud-free Landsat observation per year (fig. S5A). These pixels were clustered in the cloudiest areas, along the coast and over the HTFs in the core of the Congo Basin, which introduces a spatial bias in data availability. Consistent with this regional pattern, the country mean of the per-pixel average number of cloud-free observations per year (fig. S5A) was 0.9 in EQG, 1.1 in GAB, 2.0 in RoC, 3.0 in CAM, 4.1 in DRC, and 4.7 in CAR. In all countries except EQG, most sampled pixels had, on average, one or more cloud-free observations per year, despite large within-country variability in data availability. Of all sampled pixels, 56% had no years with zero cloud-free Landsat observations (fig. S5B), 20% had one such gap year, 9% had two, 4% had three, 2% had four, and 9% had five or more. Among the countries, EQG and GAB had the largest percentages of sampled pixels with at least one gap year (99 and 98%, respectively), followed by RoC (82%), CAM (53%), DRC (40%), and CAR (10%). Consequently, errors in identifying the occurrence and date of forest loss that stem from reference data availability were probably highest in EQG and GAB and lowest in CAR.
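The availability statistics in the preceding paragraph are derived from per-pixel counts of cloud-free observations per year. The sketch below shows one way to compute them, assuming a hypothetical input in which each sampled pixel is a list of annual cloud-free observation counts. The function names and toy data are illustrative and are not the study's data or code.

```python
# Hypothetical input: for each sampled pixel, a list of cloud-free Landsat
# observation counts, one entry per year of the study period.

def pixel_summary(annual_counts):
    """Return (average cloud-free observations per year, number of gap years)."""
    avg = sum(annual_counts) / len(annual_counts)
    gap_years = sum(1 for c in annual_counts if c == 0)  # years with zero observations
    return avg, gap_years


def share_below_one_per_year(pixels):
    """Percentage of pixels averaging fewer than one cloud-free observation per year."""
    below = sum(1 for p in pixels if pixel_summary(p)[0] < 1)
    return 100.0 * below / len(pixels)


def mean_of_pixel_averages(pixels):
    """Mean of per-pixel yearly averages (the text reports this per country)."""
    return sum(pixel_summary(p)[0] for p in pixels) / len(pixels)


def gap_year_distribution(pixels, top_bin=5):
    """Percentage of pixels with 0, 1, ..., top_bin - 1, and top_bin or more gap years."""
    counts = [0] * (top_bin + 1)
    for p in pixels:
        counts[min(pixel_summary(p)[1], top_bin)] += 1  # clamp into the "or more" bin
    return [100.0 * c / len(pixels) for c in counts]


# Toy data for four pixels over four years (not the study's data).
toy = [
    [2, 3, 1, 4],  # average 2.5, no gap years
    [0, 1, 2, 1],  # average 1.0, one gap year
    [0, 0, 1, 0],  # average 0.25, three gap years
    [1, 1, 1, 1],  # average 1.0, no gap years
]
print(share_below_one_per_year(toy))  # → 25.0
print(gap_year_distribution(toy))     # → [50.0, 25.0, 0.0, 25.0, 0.0, 0.0]
print(mean_of_pixel_averages(toy))    # → 1.1875
```

Grouping the pixel lists by country before calling `mean_of_pixel_averages` would yield country-level values analogous to those reported for EQG through CAR.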

Landsat data availability also varied from year to year owing to the characteristics of the Landsat satellite program (fig. S5C). The year 1999 had the lowest data availability because Landsat 7 was launched in April 1999 and its predecessor, Landsat 5, did not have a global data acquisition strategy. Data from 1999 were used only as a pre-2000 benchmark to help identify year 2000 forest cover, so their low availability did not affect our results directly. Data availability was also lower in 2003 (fig. S5C) because of the malfunction of the Landsat 7 Scan Line Corrector, which likely caused some underestimation of year 2003 forest loss. The number of available cloud-free observations increased in 2013 and 2014 after the launch of Landsat 8 (fig. S5C). This may also have affected our interpretation results, possibly improving the detection of forest loss toward the end of the study period.

The secondary source of reference data, used primarily to help identify the initial forest cover and the forest disturbance driver, was very high resolution imagery from Google Earth. A link that opens Google Earth for each sampled pixel is available from the interpretation interface. Of all sampled pixels, 74% had at least one very high resolution (<1 m) image in Google Earth, 7% had an image from the SPOT satellite (2.5 m resolution), and 19% had only Landsat data.
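Because the three percentages sum to 100%, each pixel appears to be counted once under the finest-resolution imagery available to it. The sketch below illustrates that tallying, assuming this mutually exclusive, finest-source-first assignment. The flags, labels, and function names are hypothetical.

```python
from collections import Counter

# Assumption: each pixel is counted once, under the finest-resolution
# reference imagery available to it (VHR first, then SPOT, then Landsat).

def best_reference_source(has_vhr, has_spot):
    """Return the finest reference imagery class available for one pixel."""
    if has_vhr:
        return "VHR (<1 m)"
    if has_spot:
        return "SPOT (2.5 m)"
    return "Landsat only"


def source_shares(pixels):
    """Percentage of pixels in each mutually exclusive reference-source class."""
    counts = Counter(best_reference_source(vhr, spot) for vhr, spot in pixels)
    return {source: 100.0 * n / len(pixels) for source, n in counts.items()}


# Toy flags (has_vhr, has_spot) for four hypothetical pixels.
toy = [(True, False), (True, True), (False, True), (False, False)]
print(source_shares(toy))
# → {'VHR (<1 m)': 50.0, 'SPOT (2.5 m)': 25.0, 'Landsat only': 25.0}
```

A pixel with both VHR and SPOT imagery, such as the second toy pixel, falls only in the VHR class. This ordering keeps the shares summing to 100%.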