This post was written by Derick Cornwall — data scientist and co-founder of Ubilytics
What is the current COVID-19 situation here in Trinidad and Tobago? This is the question on everyone’s mind as the government continues to optimize the delicate balance between the implementation of measures needed to combat the spread of the virus, and its mandate to ensure the economic well-being of citizens.
Knowing what’s going on, however, has been no easy task. Daily reporting of confirmed cases by the Ministry of Health frequently offered very little useful information. Positive test results were, in many instances, significantly delayed and reported in batches spanning various lengths of time. These positive test results then had to be processed by Ministry of Health officials and disaggregated into the actual dates on which COVID-19 positive individuals were swabbed. Once that was completed, the data was periodically compiled and presented to the public in the form of an epidemiological update showing the time series of positive cases.
Last lap vs general elections
Ministry of Health epidemiological update released on September 23rd 2020
There has been a lot of contention in the public domain over which activities are responsible for driving infections upwards. Ministry of Health experts, at a press conference held on September 23rd, presented the epidemiological update shown above and stated that the anomalous peak on September 2nd was due to “last lap” activities immediately prior to the implementation of restrictions on beaches, bars and restaurants. It was also their considered view that this peak, on September 2nd, was significant due to the sharp decrease in positive cases recorded in the days prior. Interestingly, no mention was made of the general elections held on August 10th or of the sharp increase in positive cases in the days that followed.
Anomalous peak or sign of data quality issues?
Having a fair bit of perspicacity when it comes to understanding data, we here at Ubilytics immediately saw another explanation for the September 2nd peak. The peak itself was not in fact significant; it only appeared so because data was missing from the entire right-hand side of the epidemiological update. There was, in effect, some sort of systematic underreporting of cases in the epidemiological data, which made the September 2nd peak look anomalous.
We waited to see if this missing data would be added to the following epidemiological update. It wasn’t.
Epidemiological data on the outbreak has never been released in a machine-readable format by the Ministry of Health. Therefore, in order to analyze the epidemiological updates we had to use pixel arithmetic to convert the images of the bar charts into corresponding values of date and number of positive cases which could then be analyzed.
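As a rough illustration of what such pixel arithmetic can look like, the sketch below converts the pixel row of each bar's top edge into a case count by linear interpolation against the y-axis. The function name, the calibration rows and the bar positions are all hypothetical placeholders chosen for illustration; they are not the values or the code used in our analysis.

```python
def pixel_to_value(pixel_y, baseline_y, ref_y, ref_value):
    """Convert the pixel row of a bar's top edge into a case count.

    baseline_y: pixel row of the y-axis zero line (assumed known).
    ref_y:      pixel row of a labelled gridline (assumed known).
    ref_value:  the case count printed beside that gridline.
    """
    # Image rows grow downward, so taller bars have smaller row numbers.
    pixels_per_case = (baseline_y - ref_y) / ref_value
    return round((baseline_y - pixel_y) / pixels_per_case)


# Hypothetical calibration: zero line at row 400, "100 cases" gridline at row 200.
bar_tops = [380, 300, 250]  # placeholder pixel rows, not real chart data
counts = [pixel_to_value(y, baseline_y=400, ref_y=200, ref_value=100)
          for y in bar_tops]
print(counts)
```

The same interpolation applied along the x-axis would map each bar's horizontal position to a date.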
Given the frequent daily releases of batched positive results, the epidemiological updates have always been held up as the authoritative view of the daily positive case counts. Assuming this is true, suppose we take any one of the epidemiological updates with daily case counts up to some date. It stands to reason that, for phase 2 of the outbreak, the sum of all the positive cases reported on a daily basis (batched or otherwise) up to that particular date should correspond closely to the total number of cases in the epidemiological update.
This is the simplest possible test for completeness of the epidemiological data. Sadly, none of the five epidemiological updates given in September passed this test.
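The completeness test described above amounts to comparing two totals. A minimal sketch follows, assuming the daily announcements and the epidemiological update are each available as a list of counts up to the same cut-off date. The 5% tolerance and all of the numbers are illustrative assumptions, not figures from the Ministry of Health.

```python
def missing_cases(daily_reported, epi_counts):
    """Cases announced in daily reports but absent from the epidemiological update."""
    return sum(daily_reported) - sum(epi_counts)


def passes_completeness(daily_reported, epi_counts, tolerance=0.05):
    """True if the two totals agree to within `tolerance` (an assumed threshold)."""
    reported_total = sum(daily_reported)
    gap = missing_cases(daily_reported, epi_counts)
    return abs(gap) <= tolerance * reported_total


# Placeholder numbers for illustration only.
daily_reported = [40, 60, 100]  # cases announced day by day
epi_counts = [35, 50, 60]       # cases shown in the epidemiological update

gap = missing_cases(daily_reported, epi_counts)
ok = passes_completeness(daily_reported, epi_counts)
print(gap, ok)
```

A positive gap means cases that were announced have not yet appeared in the update.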
We next deconstructed the data from each epidemiological update into what was new to that particular update and what was carried over from the prior update. The first three epidemiological updates in September had the largest numbers of new cases added to them, while the last two updates had the fewest.
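One way to carry out this kind of decomposition is to subtract the previous update's count from the current update's count, date by date. The sketch below assumes each update is stored as a dictionary mapping a swab date to a case count. The function name and the sample values are hypothetical.

```python
def new_cases_by_date(previous, current):
    """Return the cases that appear in `current` but not in `previous`, per date."""
    added = {}
    for day, count in current.items():
        diff = count - previous.get(day, 0)  # dates absent from the prior update count as 0
        if diff != 0:
            added[day] = diff
    return added


# Placeholder data for illustration only.
previous = {"2020-09-01": 30, "2020-09-02": 80}
current = {"2020-09-01": 45, "2020-09-02": 80, "2020-09-03": 20}

added = new_cases_by_date(previous, current)
total_new = sum(added.values())
print(added, total_new)
```

Dates whose counts did not change are omitted, so what remains shows where in the time series the new cases were back-filled.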
We suspect that there is some sort of digitization effort taking place to add new data to the epidemiological updates. It appears that, in the time period between successive updates, information on positive result forms is being processed and entered by hand into a spreadsheet to create the data needed to generate the next epidemiological update.
There appears to be a nominal maximum number of positive cases that can be processed each day, in the region of 50 cases per day. This rate doubled for the creation of the second and third epidemiological updates in September, while the last two updates were created by processing fewer than 50 cases per day, resulting in a considerable increase in the number of missing cases.
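The processing rate referred to here can be estimated by dividing the number of new cases in an update by the number of days since the previous update. The sketch below shows that arithmetic. The dates and the case figure are placeholders, not the actual release dates or counts.

```python
from datetime import date


def processing_rate(cases_added, previous_update, current_update):
    """Average number of cases entered per day between two successive updates."""
    days = (current_update - previous_update).days
    if days <= 0:
        raise ValueError("updates must be in chronological order")
    return cases_added / days


# Placeholder values for illustration only.
rate = processing_rate(280, date(2020, 9, 9), date(2020, 9, 16))
print(rate)
```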
Our analysis has revealed that none of the epidemiological updates presented by the Ministry of Health in the month of September accurately reflects the progression of COVID-19 in Trinidad and Tobago. There is a growing number of previously reported positive cases that are missing from these updates. Furthermore, apparent efforts to deal with this backlog of missing cases have seemingly been reversed.
Given the extreme importance of getting the timing right for the introduction and removal of measures to combat the spread of COVID-19, it is imperative that policy decisions be based on the most timely and accurate data available. The worrying decrease in epidemiological data quality observed necessitates an immediate intervention to remove the backlog of missing cases and put systems in place to automate the processing of test results.