[version 1; peer review: 2 approved with reservations]

No competing interests were disclosed.

What policy makers and analysts are interested in during an outbreak is the time series
_{t}

During the outbreak, this is typically not what is available in real time. Instead, we shall have a closer look at two commonly used time series. The first time series is what has been reported by countries and regions to the WHO^{1}, the US CDC^{2}, the European CDC^{3} and eventually ends up in dashboards like those of Johns Hopkins University^{4,5} or Our World in Data^{6}. This is the number of new deaths
_{t}

Note that this number is only on very rare occasions updated retrospectively. Normally, a new number is added to the time series every day. Also note that we know nothing about when the deaths happened (the event date); we only know the number of deaths reported on each day.

The second time series is a time series that some countries, e.g. Sweden^{7–9}, Belgium^{10} and the UK^{11,12}, make available on the internet. This data is normally a de-identified extract from the national surveillance and reporting system. This series has information about when the deaths actually happened. We will call this data set the event-based series
^{T}

This data set is a time series reported on day

The two time-series as reported by Sweden on 2020-05-01 have been plotted in
^{T}
^{T}

The R-series and the
^{T}
^{T}

The cumulative R-series and the cumulative
^{T}
^{T}-series. The cumulative number of deaths coincides for both series on the reporting day.
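Why the cumulative totals coincide on the reporting day can be illustrated with a small reporting triangle. The sketch below is only an illustration: the array `n`, its made-up counts, and the helper names `reporting_series` and `event_series` are assumptions introduced here, not taken from the paper or its R code. Each entry `n[e, r]` is taken to be the number of deaths that occurred on event day `e` and were reported on day `r`.

```python
import numpy as np

# Hypothetical reporting triangle (made-up numbers):
# n[e, r] = deaths that occurred on event day e and were reported on day r.
# Entries with r < e are zero, since a death cannot be reported before it occurs.
n = np.array([
    [3, 2, 1, 0],
    [0, 4, 3, 1],
    [0, 0, 5, 2],
    [0, 0, 0, 6],
])

def reporting_series(n, T):
    """Deaths reported on each day 0..T, regardless of when they occurred."""
    return n[: T + 1, : T + 1].sum(axis=0)

def event_series(n, T):
    """Deaths per event day 0..T, as known on reporting day T."""
    return n[: T + 1, : T + 1].sum(axis=1)

T = 3
R = reporting_series(n, T)      # column sums of the triangle
D_T = event_series(n, T)        # row sums of the triangle

print("reporting series:  ", R)
print("event-based series:", D_T)
print("cumulative totals: ", R.sum(), D_T.sum())
```

Both series are sums over the same set of counts, once by column and once by row, so their totals on the reporting day must agree even though the daily values differ.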

In addition to comparing the
^{T}

It turns out that there is a rather elegant mathematical framework for the two series in terms of matrix algebra. For the interested reader we have put all of that in the Appendix (see ^{13}). One of the key findings is equation (A16), which is the mathematical relationship between the
^{T}
_{t}

where

The expression (4) makes intuitive sense. It says that the number of deaths reported on day

Now, typically the reporting process does not change that much over time, in which case one may want to find the average cumulative distribution function and the average probability mass function. See expression (A15) in the Appendix (see ^{13}) for how this is done. Also see equation (A20) in the Appendix (see ^{13}) for the inverse of (4), i.e.
_{t}

In mathematical terms, the expression (5) is a so-called convolution of the true signal and the time-dependent filter. Filters are used in many areas including epidemiology, although their main application is in signal processing. Interestingly, studies of time-dependent filters relevant to this paper can be found in geology, e.g. in ^{14,15}. The main insight here is that the practice of reporting daily deaths, the

As can be seen in
^{T}
^{T}
^{16–22}. In the Appendix (see ^{13}) we derive a very simple expression for the
_{e}

Intuitively this expression makes sense. It says that one could get the final number of deaths on an event day

The computations in this paper have been based on the fundamental data set
^{T}
^{13}

We will now apply the methods described above, and in the Appendix (see ^{13}), to analyze the relationship between the reporting series and the event-based series as reported by Sweden.

As we have argued, the reporting process is characterized by the two time-dependent distribution functions

Plot of all cdfs

In

Plot of the mean probability mass function corresponding to the mean cdf across all event days between 2020-04-02 and 2020-06-01.

Furthermore, from inspection of the pmf in
^{24}.

In order to better understand the weekly periodic pattern in the
^{T}

Plot of the average cdfs by weekday, for all event days between 2020-04-02 and 2020-06-01.

We also plot the pmfs in

Plot of the average pmfs by weekday, for all event days between 2020-04-02 and 2020-06-01.

Next, in order to isolate the effect of the reporting process, we compute the effect of the Swedish reporting process on a hypothetical smooth bell-shaped death curve, much like we have done for two simple examples in Figure A11 and Figure A12 in the Appendix (see ^{13}). The result can be seen in

Plot of the Swedish reporting process applied to a hypothetical smooth bell-shaped actual deaths curve. The

Again, the daily, time-dependent pmfs produce a highly variable output just like the observed reporting series. Contrasting this output with the output one gets using the mean time-independent pmf, we conclude that the periodic pattern in the reporting series has its origin in characteristics of the reporting process rather than in the characteristics of the actual deaths curve. Although this is perhaps not very surprising, the amplitude of the periodic pattern is surprising. Relatively small differences in the reporting processes from one weekday to the next result in these wild swings in the reported daily deaths. Is there an opportunity to give guidance to the countries on how to reduce these resulting swings? We also note that the Swedish reporting process modifies the shape of the hypothetical death curve in three ways, just like the hypothetical cases in the Appendix (see ^{13}). First, there is a clear time shift. Second, the peak is lower. Third, the slope is flatter, both as deaths increase and decrease.
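The qualitative behaviour described above can be reproduced with a toy reporting process. The sketch below does not use the Swedish data: the bell curve, the geometric delay weights in `base`, and the weekday factors in `weekday_activity` are all invented for illustration, and the helper names `delay_pmf` and `apply_reporting` are hypothetical. It spreads each event day's deaths over later reporting days, once with event-day-specific pmfs and once with their mean.

```python
import numpy as np

days = np.arange(120)
# Hypothetical smooth bell-shaped curve of actual deaths per event day.
actual = 100.0 * np.exp(-0.5 * ((days - 50) / 12.0) ** 2)

max_lag = 14
lags = np.arange(max_lag + 1)
# Assumed baseline reporting-delay weights (geometric decay, made up).
base = 0.75 ** lags
# Assumed relative reporting activity by weekday of the REPORT day
# (index 0 = Monday ... 6 = Sunday); weekends are quieter.
weekday_activity = np.array([1.0, 1.2, 1.1, 1.0, 0.9, 0.3, 0.2])

def delay_pmf(event_day):
    """pmf over reporting lags for deaths occurring on event_day."""
    w = base * weekday_activity[(event_day + lags) % 7]
    return w / w.sum()

pmfs = np.array([delay_pmf(e) for e in days])   # time-dependent filter
mean_pmf = pmfs.mean(axis=0)                    # time-independent filter

def apply_reporting(actual, pmfs):
    """Reported deaths: each event day's deaths spread over later days."""
    reported = np.zeros(len(actual) + pmfs.shape[1] - 1)
    for e, d in enumerate(actual):
        reported[e : e + pmfs.shape[1]] += d * pmfs[e]
    return reported

reported_td = apply_reporting(actual, pmfs)       # weekday-dependent pmfs
reported_mean = np.convolve(actual, mean_pmf)     # ordinary convolution

print("actual peak:       day", actual.argmax(), "height", actual.max())
print("mean-pmf peak:     day", reported_mean.argmax(), "height", reported_mean.max())
print("day-to-day change, time-dependent:", np.abs(np.diff(reported_td)).sum())
print("day-to-day change, mean pmf:      ", np.abs(np.diff(reported_mean)).sum())
```

Under these assumptions the mean-pmf output is a smooth curve whose peak is later and lower than the actual one, while the time-dependent output carries a strong weekly pattern, even though both preserve the total number of deaths.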

As we have seen, the Swedish

Additionally, if initial conditions need to be specified for the magnitude as well as the slope of the death curve, some kind of smoothing of the
^{T}
^{T}
^{T}

One model that relies on the shape of the death curve is the model developed by the Institute for Health Metrics and Evaluation (IHME) at the University of Washington^{25}. In
^{26}. We have also plotted the transformed
^{T}
^{T}

Plot showing the IHME smoothing of the R-series on 2020-06-03 in comparison with the Transformed
^{T}
^{T}
^{T}

In this particular instance, the IHME modelers were unfortunate to produce a smoothing that drastically changed the outcome of the model, and we can see that it deviates significantly from the transformed
^{T}
^{T}

Please note that by only using the smoothed

Turning our attention to the
^{T}
^{17}. We wanted to use both a time-independent and a time-dependent approximation and used the following two naive approximations

where
^{T}

Graph showing the
^{T}
^{T}
^{T}

Graph showing the zoomed-in
^{T}
^{T}
^{T}

Generally, the Harvard nowcast algorithm performs better, but since the Harvard nowcast also fluctuates, it would be interesting to see whether improvements can be made by taking better account of the weekly pattern. It should be noted that we have used default settings for the Harvard nowcast. Furthermore, one can likely improve upon the approximations (7) and (8), but nowcasting is not the focus of this paper. The takeaway message here is that there are good nowcast methods available to analysts if they have access to the fundamental data set
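To make the idea of a naive nowcast concrete, the sketch below scales each partially reported event-day count by the fraction of deaths expected to have been reported after that many days. This is a generic illustration under stated assumptions and is not claimed to be the form of approximations (7) and (8) or of the Harvard algorithm: the function name `naive_nowcast`, the delay pmf, and the flat "true" curve are all made up here.

```python
import numpy as np

def naive_nowcast(partial_counts, cdf):
    """Scale up partially reported counts by the expected reported fraction.

    partial_counts[e] : deaths with event day e as known on the last day T
    cdf[k]            : assumed fraction of deaths reported within k days
                        (must satisfy cdf[0] > 0 to avoid division by zero)
    """
    partial_counts = np.asarray(partial_counts, dtype=float)
    T = len(partial_counts) - 1
    lags = T - np.arange(T + 1)                  # days elapsed since each event day
    frac = np.asarray(cdf)[np.minimum(lags, len(cdf) - 1)]
    return partial_counts / frac

# Made-up delay pmf and a flat "true" curve of 10 deaths per day.
pmf = np.array([0.2, 0.4, 0.2, 0.1, 0.1])
cdf = np.cumsum(pmf)
true = np.full(8, 10.0)

# What would be visible on the last day if reporting followed the pmf exactly.
lags = (len(true) - 1) - np.arange(len(true))
observed = true * cdf[np.minimum(lags, len(cdf) - 1)]

print("observed:", observed)
print("nowcast: ", naive_nowcast(observed, cdf))
```

A time-dependent variant would replace the single `cdf` with one that depends on the event day, for example one cdf per weekday. With real data the reported fraction is itself uncertain, so the most recent event days, where the fraction is small, are the least reliable.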

By viewing the
^{T}
^{T}
^{T}

To remedy this for the Swedish
^{T}
^{T}
^{T}
^{T}

So, what is the best method? In cases where there is a significant difference between the
^{T}
^{T}
^{T}
^{T}
^{T}

Coming back to the Swedish situation, back in May it was not a trivial task to interpret the death curve and see the trend if you only had access to the
^{T}
^{T}

The primary data set analyzed in this paper was constructed by downloading the
^{T}
^{7} between 2020-04-02 and 2020-07-09. The data is publicly available and is considered public domain data.

Zenodo: On the use of real-time mortality data in modelling and analysis during an epidemic outbreak – underlying data

^{23}.

This project contains the following underlying data:

Data are available under the terms of the

In the article we also discuss and plot data generated by
^{26} in the 2020-06-05 update of their model. The terms and conditions can be found on their website and state for non-commercial users:

The Appendix to this article as well as the R-code to generate the graphs and the nowcasts are available as extended data.

Zenodo: On the use of real-time mortality data in modelling and analysis during an epidemic outbreak – extended data.

^{13}.

This project contains the following extended data:

Data are available under the terms of the

We would like to acknowledge Dr. Sarah McGough for assistance with the Harvard nowcasting R package.

This paper presents an interesting examination of how aggregate death reporting can misrepresent the true mortality curve during the COVID-19 pandemic. Using case data, which captures the true dates of when deaths happened, not just when they were reported, the author developed a mathematical framework for comparing the sources of deaths. Through this framework, the author goes on to compare nowcasting under their framework with that of IHME and Harvard.

While the methods are sound, the structure of this manuscript was unconventional, making it somewhat challenging to follow. The author provided only limited background information, and motivation for this work was not overtly clear to the reader. I would like to see more background provided about why we care about these differences in deaths reporting.

The methods and results were similarly lacking in structure. I was not completely clear what the authors did in terms of methods, particularly with respect to the nowcasting, and much of the results are presented as a conversational commentary, rather than scientific findings. It would have been good to have seen a quantified impact of using the R-series versus the D-series.

Finally, while I believe this work provides novel findings, I believe these findings could be more clearly presented and emphasized, particularly in terms of how the public should be interpreting model results, and how researchers should be accounting for these. Additionally, the author did not discuss limitations of this work or reliability of the data being used. While the "D-series" is ideal, it is often only a subset of individuals across a country.

Is the work clearly and accurately presented and does it cite the current literature?

Partly

If applicable, is the statistical analysis and its interpretation appropriate?

Yes

Are all the source data underlying the results available to ensure full reproducibility?

Yes

Is the study design appropriate and is the work technically sound?

Partly

Are the conclusions drawn adequately supported by the results?

Yes

Are sufficient details of methods and analysis provided to allow replication by others?

Yes

Reviewer Expertise:

infectious disease epidemiology and modeling, statistics, COVID-19, vaccination

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

The COVID-19 pandemic has already spread throughout the world, and people are aware of the disease and are taking precautions against it. Still, COVID-19 is spreading very quickly. Some countries, such as Spain, Australia, Serbia and China, have entered a second wave of COVID-19. To stop the spread of the disease, a vaccine is needed. In the absence of a vaccine, people must maintain social distancing. In order to maintain social distancing, one must obey the modeling rule.

The introduction needs to be improved by incorporating some recent references on the COVID-19 pandemic. To do so, I suggest some modeling work to be included in the references, such as Sakar^{1}, Khajanchi & Sakar (2020)^{2}, and Samui^{3}.

In this context, an important factor must be included in this study, that is, the effect of media. How have the COVID-19 dynamics changed due to the incorporation of media-related awareness, such as the use of face masks, non-pharmaceutical interventions, hand sanitization, etc.? The author must include the manuscript Khajanchi^{4} to study the effect of media.

Is there any experimental data to validate the mathematical model? The authors should at least describe the basic reproduction number R_{0} and its impact on the COVID-19 pandemic in India. The basic reproduction number R_{0} is one of the most crucial quantities in infectious diseases, as R_{0} measures how contagious a disease is. For R_{0} < 1, the disease is expected to stop spreading, but for R_{0} = 1 an infected individual infects on average 1 person, that is, the spread of the disease is stable. The disease can spread and become epidemic if R_{0} is greater than 1^{5}.

Some references contain errors and inconsistent formatting. It is difficult to give credit to research if even elementary aspects of the work are not error free. This should be corrected with care and attention to detail.

The manuscript is comprehensive, and I have enjoyed learning about the presented results. I find that the manuscript is written in very poor language and the presentation is not good. I am in principle in favor of indexing, although the following comments should nevertheless be accommodated in one major revision.

Is the work clearly and accurately presented and does it cite the current literature?

Partly

If applicable, is the statistical analysis and its interpretation appropriate?

Partly

Are all the source data underlying the results available to ensure full reproducibility?

Partly

Is the study design appropriate and is the work technically sound?

Partly

Are the conclusions drawn adequately supported by the results?

Partly

Are sufficient details of methods and analysis provided to allow replication by others?

Yes

Reviewer Expertise:

Infectious diseases, Ecological systems, Tumor-immune interactions.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.