migueldelafuente1/diseaseSIRModel

Table of contents

  1. SIR Model Introduction
  2. Requirements
  3. Module Usage

SIR Model Introduction

An infection model based on the SIR model, given by the following ODE system:

ds/dt = -CONT_RATE*s*i/N_population
di/dt = (CONT_RATE*s*i/N_population) - (RECO_RATE*i) - (MORTALITY*i)
dr/dt = RECO_RATE*i
dd/dt = MORTALITY*i

Where s stands for susceptible, i for infected, r for recovered and d for dead. CONT_RATE, the contagious rate, is the average number of contacts per person per day times the probability of infection per contact (it needs to be normalized to the total population). The recovery rate (RECO_RATE) is the inverse of the average time to overcome the disease and stop infecting others (in other words, the factor giving the number of recoveries per day). Note: in the code, CONT_RATE is stored as CONT_RATE/N_population for the calculations; the dimensionless transmissibility CONT_RATE is the attribute CONT_FACTOR in the program.

There are two important epidemiological parameters: R0, the basic reproduction number (the expected number of cases directly generated by one case in a population where all individuals are susceptible to infection), and R_eff, the effective reproduction number (used when not all of the population is susceptible).

R0 = CONT_RATE/(RECO_RATE + MORTALITY)
R_eff = s(t=0)*R0/N_population

We can predict the maximum of the infected population in the model by integrating the first two equations of the ODE system and imposing di/dt(t_max)=0, which leads to s(t_max) = (RECO_RATE + MORTALITY)*N_population/CONT_RATE = N_population/R0. The maximum, given the initial values, is then:

i(t_max) = s(0) + i(0) - (N_population/R0)*(1 + log(s(0)*R0/N_population))

t_max itself must be obtained numerically.
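For illustration, here is a minimal, self-contained sketch (not the repository's disease.py, and with purely illustrative parameter values) that integrates the ODE system with a forward Euler scheme and compares the numerical peak with the analytical maximum:

import math

# Illustrative parameters (assumptions, not values taken from the repository)
N_POP, CONT_RATE, RECO_RATE, MORTALITY = 2e5, 1.25, 1/6.1, 0.04
T_STEP, DAYS = 0.01, 200

# Initial values: 10 infected in an otherwise susceptible population
s, i, r, d = N_POP - 10, 10.0, 0.0, 0.0
s0, i0 = s, i
t, i_max, t_max = 0.0, i, 0.0

for _ in range(int(DAYS / T_STEP)):
    ds = -CONT_RATE * s * i / N_POP               # ds/dt
    di = -ds - (RECO_RATE + MORTALITY) * i        # di/dt
    dr = RECO_RATE * i                            # dr/dt
    dd = MORTALITY * i                            # dd/dt
    s, i, r, d = s + T_STEP*ds, i + T_STEP*di, r + T_STEP*dr, d + T_STEP*dd
    t += T_STEP
    if i > i_max:                                 # track the peak: t_max is found numerically
        i_max, t_max = i, t

R0 = CONT_RATE / (RECO_RATE + MORTALITY)
i_max_analytic = s0 + i0 - (N_POP / R0) * (1 + math.log(s0 * R0 / N_POP))
print(f"R0={R0:.2f}  peak at day {t_max:.1f}  "
      f"numerical i_max={i_max:.0f}  analytical={i_max_analytic:.0f}")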

The model could also be extended to take into account different recovery and contagion rates, or to insert mortality or other relations.

As we will see, the model has many flaws due to its simplicity, but it has been used to understand many actions against the spread and evolution of a contagious disease. Also, as a disclaimer, many of the statistics on the data are simple averages and quick fits, so the conclusions are merely heuristic; more advanced analysis is required to get reliable values for the actual situation.

Requirements:

python 3.6 or newer, matplotlib and numpy

pip3 install matplotlib
pip3 install numpy

There is an optional module for data downloading that requires the libraries BeautifulSoup and requests. If you don't want to use this feature, skip these complementary installations.

pip3 install beautifulsoup4
pip3 install requests

Deprecated:

Selenium and the web driver no longer need to be downloaded, since they have been replaced by requests (more efficient). The old approach required downloading selenium and the chromedriver, and also changing the path of the driver in the class attribute DataWebLoader.CHROMIUM_PATH to its location on your computer.

Parts:

Data Fitter for exponential trends

The fitData.py script contains a time series compiled from several online sources of coronavirus infections and deaths for Italy and Spain. The module has a linear fitting function for the base-10 logarithmic increase. The resulting constants are used to plot the outlook for a given day (counting from 2020-02-15) and to plot this linear trend for both countries.
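The base-10 log-linear fit itself can be sketched with numpy (a minimal illustration with made-up numbers, not the module's actual code):

import numpy as np

# Made-up cumulative counts on consecutive days (illustration only)
days     = np.array([0, 1, 2, 3, 4, 5])
infected = np.array([120, 160, 230, 310, 430, 600])

# Linear fit of log10(infected):  log10(i) = rate*day + absc  ->  i(day) = 10**(rate*day + absc)
rate, absc = np.polyfit(days, np.log10(infected), 1)

day_prediction = 7
prediction = 10 ** (rate * day_prediction + absc)
print(f"rate={rate:.4f}  absc={absc:.4f}  predicted infected on day {day_prediction}: {prediction:.0f}")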

Arguments:

from datetime import datetime

## INPUTS
date_min = datetime(2020, 3, 5)         # lower bound of the data range
date_max = datetime(2020, 3, 11)        # upper bound of the data range
date_prediction = datetime(2020, 3, 12) # date to predict

The output depends on the date range:

SPAIN   data from [2020-03-05] to [2020-03-11]
-----------------------------------------
Infect Rate:   0.1582   abscI:   0.2600
Death  Rate:   0.2208   abscD:   0.0002
Mortality:    0.0201 +/- 0.0052
    2020-03-12 : infected= 3374.0
    2020-03-12 : death= 110.0

ITALY   data from [2020-03-05] to [2020-03-11]
-----------------------------------------
Infect Rate:   0.0882   abscI:  81.8895
Death  Rate:   0.1273   abscD:   0.5487
Mortality:    0.0471 +/- 0.0082
    2020-03-12 : infected= 16085.0
    2020-03-12 : death= 1120.0

The prediction depends on the selected date range; note that my data for February and the beginning of March is not daily. On the other hand, governmental actions affect the global trend of the progression, which is neither a simple exponential nor a geometric progression with a constant factor. These approximations are reliable for short-term data collections (e.g. a week or 5 days) and for a couple of days into the future. The approximation is no longer valid when the active infections are near the maximum.

The fit is obtained from a power of 10, and a slope is interpreted by comparing it with another slope over a time interval; a reduction or increase can then be read off from the different spread velocities. Let there be an infection rate A and the same rate divided by a factor r. Then, for an interval of delta_days (the curve can be shifted to any time origin if the number of cases at that moment is known), the relation between the first rate i and the reduced one i_r is given by:

i_r(delta_days) = i(delta_days)*10^(delta_days*A*(1-r)/r)
i(t; A) = y(t_0)*10^(A*t)

For example, in Spain, the average over a window of 2 days gives a local rate of 0.135 for 2020-03-11 and of 0.068 for 2020-03-21. On March 11th, the number of cases was 2277, and r = 0.135/0.068 = 1.985, so, had the rate continued, the infected would have been i(21st; A=0.135) = 50975 (it was actually 25496). For the same period of time, let us drop to the second constant, so we expect:

 i_r(21st; A=0.068) = 50975 * 10**(10*0.1351*(1-1.985)/1.985) = 50975*0.2136 = 10888 

(11075 if we do not round). As you can see, we cannot naively extrapolate if these rates vary in time (a cumulative sum might be needed), but the rates give an idea of their effect on the spread of the disease and of the relation between two of them. The exact count actually lies between the first rate and the second one (roughly midway), which coincides with the activation of the government's precautionary measures.
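The arithmetic of this example can be reproduced in a few lines (the rates and case count are taken from the paragraph above):

A, A_reduced = 0.135, 0.068        # local rates for 2020-03-11 and 2020-03-21
cases_11th   = 2277                # detected cases on 2020-03-11
delta_days   = 10
r = A / A_reduced                  # ~1.985

i_const_rate = cases_11th * 10 ** (A * delta_days)                    # ~50975 if the rate had held
i_reduced    = i_const_rate * 10 ** (delta_days * A * (1 - r) / r)    # ~1.1e4 with the reduced rate
print(f"r={r:.3f}  i(A)={i_const_rate:.0f}  i_r={i_reduced:.0f}")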

The fits can also be run over all the data by making a local fit over a narrow window of days. In that case we get a first approximation of the evolution of the rates, but it is very sensitive to the window range and the statistical deviations are not given:

Loading data with the worldometers scraper

Refreshing the data manually is tedious, so there is a scraper bot (dataWebLoader) that automatically loads the web page for any country available on the site. The values can also be saved and reloaded if you have already downloaded the data for that day.

The module requires BeautifulSoup and requests (selenium and the chromedriver are no longer needed, see the deprecation note above). If you don't want to use this feature, skip these complementary installations.

The usage is shown below, and then the prediction is made as before:

from dataWebLoader import DataWebLoader

countries = ['spain', 'italy', 'us']
data = DataWebLoader.getData(countries, save=True)  # scrape the tables for each country and save them locally

for country, tables in data.items():
    processWebData(country, tables)

## Prediction:
getRatesForDateRanges(date_min, date_max, date_prediction, graph=True)

By default, the scraper doesn't download content if there is a saved download from the last 4 hours and no new countries are requested. Local loading has to be done with the loadJsonData() method.
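For example, a local reload could look roughly like this (the method name comes from the text above; its exact signature is an assumption):

from dataWebLoader import DataWebLoader

# Reuse the tables saved on disk instead of scraping again (signature assumed)
data = DataWebLoader.loadJsonData()
for country, tables in data.items():
    processWebData(country, tables)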

Heuristically, we can see the current state of the evolution of the epidemic by plotting the total detected COVID-19 cases against the currently active cases. A recession of the active cases shows up when the trend drops away from the common growth line. This idea was taken from the minutephysics video and its web project.

SIR Model Program

The disease.py module contains the disease models. The models have the following parameters:

| Parameter | Default | Units | Description |
|---|---|---|---|
| t_step | 0.01 | days | Step for the numeric method; optimize it with the optimizers.stepOptimizer tool. |
| days | 200 | days | |
| N_population | 2e+5 | persons | |
| contagious_rate | 0.0 | 1/(day · persons) | Average number of contacts per person per day times the probability of infection per contact. |
| recovery_rate | 0.0 | 1/day | Inverse of the average time to overcome the disease and stop infecting others. |
| mortality | 0.0 | | Linear factor proportional to the infected. |

These are some examples for a population of 2e+5 people, for a disease with a contagious rate of 1.25, a 4% mortality, and recovery times (the inverse of the rates) of 2.1, 6.1 and 18.1 days. The images also show when the peak occurs and how many people get infected.

from disease import DiseaseSimulation  # the models live in the disease.py module

params_c_const = [(1.25, 1./(2.1 + 4.*c)) for c in range(0, 6)]
DiseaseSimulation.setInitializers(infected_0=10000, dead_0=2, recovered_0=80)
for param in params_c_const:
    ds = DiseaseSimulation(contagious_rate=param[0], 
                           recovery_rate=param[1],
                           t_step=0.01,
                           mortality_rate=0.04
                           )
    max_infected_analytical = ds.analyticalValueOfMaximumInfected()
    ds()
    ds.graph(details=False)
    ds.getDetails()
    print(f"{ds.CONT_RATE}\t{ds.RECO_RATE}\t{ds.max_infected}\t({max_infected_analytical})")

With these results:

We can check the values of the maximum of the infected population with the method <obj>.analyticalValueOfMaximumInfected(), and there is very good agreement between these values (in fact, this can be used as a benchmark to choose the time step, instead of selecting it according to numerical convergence).

| Image | Numerical value (day, infected_max) | Analytical maximum |
|---|---|---|
| 1st | (4.25, 48480) | 48554.59 |
| 2nd | (4.51, 109556) | 109815.68 |
| 3rd | (4.85, 145835) | 146233.08 |

Parameter optimizers

optimizers.py has some functions to find the best parameters for the calculations.

stepOptimizer

Finds the step necessary for the Euler-method iteration to stay within a relative tolerance range when the time step is halved.

This function iterates to find a value of the step for which the SIR simulation converges under a certain tolerance. The step is divided by 2 in each iteration (at most 6 times, i.e. down to h_max/2^6).

| Argument | Type | Description |
|---|---|---|
| h_max | double | First step value, in days. |
| tolerance | double | Tolerance ratio on the step (=0.2 by default; 1.0 means 100%). |
| diseaseKwargs | dictionary | The parameters for the DiseaseSimulation. |
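The step-halving idea can be sketched generically as follows (an illustration of the technique, not the optimizers.py implementation; run_simulation stands for any callable that runs the model with a given step and returns a scalar result, e.g. the infection peak):

def step_optimizer_sketch(h_max, run_simulation, tolerance=0.2, max_halvings=6):
    """Halve the time step until the result changes by less than `tolerance` (as a ratio)."""
    h = h_max
    previous = run_simulation(h)
    for _ in range(max_halvings):
        h /= 2
        current = run_simulation(h)
        if abs(current - previous) / abs(previous) < tolerance:
            return h                      # converged within the tolerance ratio
        previous = current
    return h                              # stopped at h_max / 2**max_halvings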

modelOptimizerFromData

From a set of data, find the best parameters (the time step is optimized in each iteration).

| Argument | Type / Default | Description |
|---|---|---|
| h_max | double, default = 0.01 | Step, in days. |
| parameters0 | | A first estimation of CONTAGIOUS_RATE, RECOVERY_RATE and MORTALITY. |
| data | | [(day, infected, recovered, dead), ...] |
| h_tolerance | default = 0.1 | Tolerance ratio for the time-step fitting. |
| data_tolerance | default = 0.1 | Tolerance ratio for the model against the data. |

Returns: the most refined parameter set and time step achieved.

The starting point in time is set to the first day of the data to simplify some of the adjustments. The process consists of:

  1. Set up the first elements for t, infect, ... with the first data row.
  2. Adapt the time step (stepOptimizer).
  3. Calculate the difference between the data and the model (with the optimized time step) at the data times. From the equations (a rough sketch is given after this list):
    1. Death → M
    2. Recovered → RR
    3. f(RR, M, N, Infected) → CR
  4. Calculate the step for each parameter as the data difference normalized by the minimum difference, and take the new value from the median over all data. See the function optimizers.paramStep().
  5. Check against an arbitrary tolerance value; return the parameters and t_step if it is exceeded.
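The rate extraction in step 3 can be sketched as follows, a rough illustration that estimates the rates from finite differences of consecutive data rows using the discretized ODE system (this is not the repository's exact paramStep logic):

def estimate_rates(row_a, row_b, n_population):
    """Estimate CONT_RATE, RECO_RATE and MORTALITY from two rows (day, infected, recovered, dead)."""
    (t0, i0, r0, d0), (t1, i1, r1, d1) = row_a, row_b
    dt = t1 - t0
    i_mean = 0.5 * (i0 + i1)                          # average infected over the interval

    mortality = (d1 - d0) / (dt * i_mean)             # from dd/dt = MORTALITY*i
    reco_rate = (r1 - r0) / (dt * i_mean)             # from dr/dt = RECO_RATE*i
    # from di/dt = CONT_RATE*s*i/N_population - (RECO_RATE + MORTALITY)*i, with s = N - i - r - d
    s_mean = n_population - i_mean - 0.5 * (r0 + r1) - 0.5 * (d0 + d1)
    cont_rate = ((i1 - i0) / (dt * i_mean) + reco_rate + mortality) * n_population / s_mean
    return cont_rate, reco_rate, mortality

# Made-up rows, for illustration only
print(estimate_rates((0, 1000, 50, 20), (3, 1600, 120, 45), 2e5))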

After that, the evolution of the results is printed, as well as a run with a graph for the best approximation. Example: given these values for Madrid, let's run the program:

| days (starting from 17 = 13-3-2020) | infected | recovered | dead |
|---|---|---|---|
| 17 | 1990 | 1 (??) | 81 |
| 20 | 4165 | 474 | 255 |
| 23 | 6777 | 498 | 941 |
| 26 | 9702 | 1899 | 1022 |
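A call with these rows could look roughly like this (argument names come from the table above; the import path, the initial guess and the return unpacking are assumptions):

from optimizers import modelOptimizerFromData   # assumed import path

madrid_data = [(17, 1990,    1,   81),
               (20, 4165,  474,  255),
               (23, 6777,  498,  941),
               (26, 9702, 1899, 1022)]

params, t_step = modelOptimizerFromData(h_max=0.01,
                                        parameters0=(0.3, 0.05, 0.02),  # CONT_RATE, RECO_RATE, MORTALITY guess
                                        data=madrid_data,
                                        h_tolerance=0.1,
                                        data_tolerance=0.1)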

After 60 iterations, the resulting parameters are:

  =================================================================
    ***         DISEASE SIMULATION, SIR MODEL: INPUTS       ***
    -----------------------------------------------------------
       time step:       0.005    [days]
       days:            200      [days]
       N_population:    6550000  [persons]
       
	   contagious rate: 0.2760 [1/ persons day]
	   recovery rate:   0.0337 [1/ days]
	   
	   Considering People Die: True 
	       Mortality: 2.2538 %
	   Considering [1] groups of recovery
	   
	   R0: 4.9112        R_effective: 4.9096
	   Analytical Max Infections: 3094055
   
  =================================================================

The mortality came out underestimated, at roughly half the value estimated for Spain (see below the fitData.py results for the same period), while the contagious rate is quite overestimated (3.6 times the empirical value). The recovery rate (its inverse) fits quite well with the average recovery time: often 2 weeks, 3-6 weeks with complications (1/0.0337 = 29.7 days ≈ 4.2 weeks).

SPAIN   data from [2020-03-17] to [2020-03-26]
-----------------------------------------
Infect Rate:   0.0759   abscI:  54.9212
Death  Rate:   0.1075   abscD:   0.2393
Mortality:    0.0568 +/- 0.0109
        2020-03-28 : infected= 84633.0
        2020-03-28 : death= 7833.0

The resulting model gives a maximum of 3092553 infected on the 61st day:

As we can see, with 4 static (and independent) parameters the prediction is quite apocalyptic and not reliable. In reality these parameters depend on the system and vary with time, especially the contagious rate, which depends on the number of contacts (drastically reduced under the generalized quarantine).
