A new paper in PLoS ONE: A statistical model for early estimation of the prevalence of a disease
Abstract: Epidemics and pandemics require an early estimate of the cumulative infection prevalence, sometimes referred to as the infection "Iceberg," whose tip is the set of known cases. Accurate early estimates support better disease monitoring, more accurate estimation of the infection fatality rate, and an assessment of the risks from asymptomatic individuals. We find the Pivot group, the population sub-group with the highest probability of being detected and confirmed as infected. We differentiate infection susceptibility, assumed to be almost uniform across all population sub-groups at this early stage, from the probability of being confirmed positive. The latter is often related to the likelihood of developing symptoms and complications, which differs between sub-groups (e.g., by age, in the case of the COVID-19 pandemic). A key assumption in our method is the almost-random sub-group infection assumption: the risk of initial infection is either almost uniform across all population sub-groups or not higher in the Pivot sub-group. We then present an algorithm that, using the lift value of the Pivot sub-group, finds a lower bound for the cumulative infection prevalence in the population, that is, a lower bound on the size of the entire infection "Iceberg." We demonstrate our method by applying it to the case of the COVID-19 pandemic. We use UK and Spain serological surveys of COVID-19 in its first year to demonstrate that the data are consistent with our key assumption, at least for the chosen Pivot sub-group. Overall, we applied our methods to nine countries or large regions whose data, mainly during the early COVID-19 pandemic phase, were available: Spain, the UK at two different time points, New York State, New York City, Italy, Norway, Sweden, Belgium, and Israel. We established an estimate of the lower bound of the cumulative infection prevalence for each of them.
We have also computed the corresponding upper bounds on the infection fatality rates in each country or region. Using our methodology, we have demonstrated that estimating a lower bound for an epidemic's infection prevalence at its early phase is feasible and that the assumptions underlying that estimate are valid. Our methodology is especially helpful for gaining an initial assessment of the scale of prevalence when serological data are not yet available, and more so for pandemics with asymptomatic transmission, as is the case with COVID-19.
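
To make the idea in the abstract concrete, here is a minimal Python sketch of one plausible reading of the lift-based bound. It is not the paper's algorithm, whose details the abstract does not give. It assumes that the lift of a sub-group is its confirmed-case rate divided by the overall confirmed-case rate, that the Pivot group is the sub-group with the highest lift, and that infection risk in the Pivot group is no higher than in the general population. Under those assumptions, the confirmed-case rate in the Pivot group (lift times the overall confirmed rate) cannot exceed the population prevalence, which gives a lower bound. The function names and all numbers in the usage note are hypothetical and purely illustrative.

```python
def pivot_lift(cases_by_group, pop_by_group):
    """Return (pivot group, its lift, overall confirmed rate).

    Assumed definition (not taken from the paper): the lift of a sub-group is
    its confirmed-case rate divided by the overall confirmed-case rate.
    """
    total_cases = sum(cases_by_group.values())
    total_pop = sum(pop_by_group.values())
    overall_rate = total_cases / total_pop
    lifts = {
        group: (cases_by_group[group] / pop_by_group[group]) / overall_rate
        for group in cases_by_group
    }
    # The pivot group is taken to be the sub-group with the highest lift.
    pivot = max(lifts, key=lifts.get)
    return pivot, lifts[pivot], overall_rate


def prevalence_lower_bound(cases_by_group, pop_by_group):
    """Lower bound on cumulative prevalence: lift * overall confirmed rate.

    Relies on the assumption that infection risk in the pivot group is not
    higher than in the population as a whole.
    """
    _, lift, overall_rate = pivot_lift(cases_by_group, pop_by_group)
    return lift * overall_rate


def ifr_upper_bound(deaths, prevalence_bound, total_pop):
    """Upper bound on the infection fatality rate implied by the lower bound
    on the number of infections."""
    return deaths / (prevalence_bound * total_pop)
```

With made-up inputs such as `cases = {"0-39": 500, "40-69": 1200, "70+": 1300}` and `pop = {"0-39": 500_000, "40-69": 400_000, "70+": 100_000}`, the sketch selects "70+" as the pivot group and returns a prevalence lower bound of about 1.3%, compared with an overall confirmed rate of 0.3%. These figures are invented for illustration and do not come from the paper.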