Python code for analyzing Canada COVID-19 data

The data can be downloaded manually from the data download link, but my program fetches it in real time each time it runs.
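If you want both options at once, here is a minimal sketch that reads a local copy when one exists and downloads it otherwise. The helper name load_covid and the cache file name are my own assumptions for illustration; the URL is the one used by the program in this post.

```python
import os
import pandas as pd

# URL taken from the program in this post
URL = 'https://health-infobase.canada.ca/src/data/covidLive/covid19.csv'

def load_covid(cache_path='covid19.csv', url=URL):
    """Hypothetical helper: use the local file if present, else download and save it."""
    if os.path.exists(cache_path):
        return pd.read_csv(cache_path)
    df = pd.read_csv(url)               # pandas can read a CSV directly from a URL
    df.to_csv(cache_path, index=False)  # keep a copy for the next run
    return df
```

Delete the cached file whenever you want fresh data.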

1: Download the data, or import a local file

2: Format the data: parse the date column into a datetime column, date1, and convert the deaths columns to int

3: Calculate the cutoff date, daybefore20 (despite the name, the code sets it to 15 days before today)

4: Select the rows of the last few days into df1

5: Group by province and sort by total cases

6: Loop over the top provinces and display their data

7: The program also builds a list, tpl, which can be displayed if needed or used for other purposes
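Steps 2 to 5 can be tried offline first. The sketch below wraps them in a function and runs it on a few made-up rows. The function name top_regions, its parameters, and the sample numbers are assumptions for illustration only; the column names are the ones in the real CSV.

```python
import pandas as pd
from datetime import datetime, timedelta

def top_regions(df, now, days=15, top=7):
    """Steps 2-5: format, cut off by date, aggregate per region, sort."""
    df = df.copy()
    df['date1'] = pd.to_datetime(df['date'], format='%d-%m-%Y')
    df['numdeaths'] = df['numdeaths'].round().astype('Int64')
    cutoff = now - timedelta(days=days)
    recent = df[df['date1'] > cutoff]
    summary = recent.groupby('pruid').agg(
        prname=('prname', 'last'),
        sumcase=('numtotal', 'max'),
        sumdeath=('numdeaths', 'max'))
    return summary.sort_values(by='sumcase', ascending=False).head(top)

# Made-up rows, not real figures
sample = pd.DataFrame({
    'pruid': [35, 35, 24, 24, 35],
    'prname': ['Ontario', 'Ontario', 'Quebec', 'Quebec', 'Ontario'],
    'date': ['17-04-2020', '18-04-2020', '17-04-2020', '18-04-2020', '01-01-2020'],
    'numtotal': [100, 120, 200, 230, 1],
    'numdeaths': [5.0, 6.0, None, 12.0, 0.0],
})
result = top_regions(sample, now=datetime(2020, 4, 19))
print(result)
```

Passing now as a parameter, instead of calling datetime.now() inside, keeps the result reproducible.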

The code is as follows:

# -*- coding: utf-8 -*-
"""
Created on Sat Apr 18 18:06:42 2020

@author: liwenz
"""

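import io
import pandas as pd
from datetime import datetime, timedelta
import requests
#import numpy as np

url='https://health-infobase.canada.ca/src/data/covidLive/covid19.csv'
# Download once and parse the response body, instead of fetching the URL twice
r = requests.get(url, timeout=30)
r.raise_for_status()
df = pd.read_csv(io.StringIO(r.text))
#df.info()

#df = pd.read_csv('covid19.csv')  # If you downloaded the file first, use this line and comment out the download above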
# Parse the dd-mm-YYYY strings into a datetime column
df['date1']=pd.to_datetime(df['date'], format="%d-%m-%Y")
df['numdeaths']=df['numdeaths'].round().astype('Int64')
df['numdeathstoday']=df['numdeathstoday'].round().astype('Int64')

today=datetime.now()
print(today.day,today.month,today.year)
# Cutoff date: despite the name, daybefore20 is 15 days before today
daybefore20=today-timedelta(days=15)
print(daybefore20)

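# Keep only the rows after the cutoff date
df1=df[df['date1']>daybefore20]
# Per region: latest name, highest cumulative case count and death count
df2=df1.groupby('pruid').agg(
        prname=('prname','last'),
        sumcase=('numtotal','max'),
        sumdeath=('numdeaths','max'))
df2.sort_values(by=['sumcase'], inplace=True, ascending=False)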
# Top 7 regions by total cases
a=df2.head(7)
b=a.index.values.tolist()
i=0
tpl=[]
for x in b:
    i=i+1
    print(i, a.loc[x,'prname'],a.loc[x,'sumcase'],a.loc[x,'sumdeath'])

    # sort_values returns a new frame, which avoids sorting a slice of df1 in place
    tmp=df1[df1['pruid']==x].sort_values(by=['date1'], ascending=False)
    stmp=tmp[['date','numtoday','numdeathstoday']].values.tolist()
    print(tmp.loc[:,['date','numtoday','numdeathstoday']].to_string(index=False))
    tp=(i, a.loc[x,'prname'],a.loc[x,'sumcase'],a.loc[x,'sumdeath'],stmp)
    tpl.append(tp)
#print(tpl)
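As one possible "other purpose" for tpl, the sketch below flattens it into a single long DataFrame, one row per region per day. The function name tpl_to_frame and the demo values are made up for illustration; the tuple layout (rank, prname, sumcase, sumdeath, daily rows) is the one built in the loop above.

```python
import pandas as pd

def tpl_to_frame(tpl):
    """Flatten (rank, prname, sumcase, sumdeath, rows) tuples into one DataFrame."""
    records = []
    for rank, prname, sumcase, sumdeath, rows in tpl:
        for date, numtoday, numdeathstoday in rows:
            records.append({'rank': rank, 'prname': prname, 'date': date,
                            'numtoday': numtoday, 'numdeathstoday': numdeathstoday})
    return pd.DataFrame(records)

# Made-up stand-in for the tpl produced by the program
demo_tpl = [
    (1, 'Quebec', 230, 12, [['18-04-2020', 30, 2], ['17-04-2020', 25, 1]]),
    (2, 'Ontario', 120, 6, [['18-04-2020', 20, 1]]),
]
flat = tpl_to_frame(demo_tpl)
print(flat.groupby('prname')['numtoday'].sum())
```

A flat table like this is easier to plot or export than the nested list.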

To inspect the dataset, call df.info(), which prints:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3342 entries, 0 to 3341
Data columns (total 32 columns):
pruid 3342 non-null int64
prname 3342 non-null object
prnameFR 3342 non-null object
date 3342 non-null object
numconf 3342 non-null int64
numprob 3342 non-null int64
numdeaths 3223 non-null Int64
numtotal 3342 non-null int64
numtested 3285 non-null float64
numrecover 2818 non-null float64
percentrecover 2635 non-null float64
ratetested 3054 non-null float64
numtoday 3342 non-null int64
percentoday 3342 non-null float64
ratetotal 3123 non-null float64
ratedeaths 3123 non-null float64
numdeathstoday 3223 non-null Int64
percentdeath 2966 non-null float64
numtestedtoday 3285 non-null float64
numrecoveredtoday 2818 non-null float64
percentactive 2966 non-null float64
numactive 2966 non-null float64
rateactive 3123 non-null float64
numtotal_last14 3090 non-null float64
ratetotal_last14 2884 non-null float64
numdeaths_last14 3090 non-null float64
ratedeaths_last14 2884 non-null float64
avgtotal_last7 3090 non-null float64
avgincidence_last7 2884 non-null float64
avgdeaths_last7 3090 non-null float64
avgratedeaths_last7 2884 non-null float64
date1 3342 non-null datetime64[ns]
dtypes: Int64(2), datetime64[ns](1), float64(21), int64(5), object(3)
memory usage: 842.2+ KB
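In the output above, numdeaths and numdeathstoday show as Int64 (capital I) with only 3223 non-null values out of 3342. That is the pandas nullable integer type, which the program requests with astype('Int64'). A small sketch on made-up values shows the difference from a float column with gaps:

```python
import pandas as pd

raw = pd.Series([3.0, None, 7.0])      # float64, the gap is stored as NaN
as_int = raw.round().astype('Int64')   # nullable integer, the gap becomes <NA>

print(raw.dtype, as_int.dtype)
print(as_int.count())                  # counts non-null values only
```

A plain astype('int64') would fail on this series, because the ordinary integer type cannot hold a missing value.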