The data can be downloaded from the data download link, but my program fetches it in real time each time it runs.
1: Download the data, or import a previously downloaded file
2: Format the data: convert the date column to a datetime column, date1, and convert the deaths columns to int
3: Calculate the comparison date, daybefore20 (despite the name, the code subtracts 15 days from today)
4: Get the data df1 for the days after that date
5: Aggregate by province and sort by total cases
6: Loop over the top provinces and display the data
7: The program also builds the list tpl, which can be displayed if necessary or used for other purposes
The code is as follows:
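As a rough illustration of step 7, each entry of tpl is a tuple of the form (rank, province name, total cases, total deaths, daily rows), where the daily rows are lists of [date, numtoday, numdeathstoday]. The sketch below shows one way such a list could be reused. The single entry is hand-written and its numbers are invented, not taken from the dataset.

```python
import pandas as pd

# One hand-written entry in the same shape as the program's tuples.
# The numbers are invented for illustration only.
tpl = [
    (1, "Quebec", 300, 9, [["10-04-2020", 40, 2], ["09-04-2020", 35, 1]]),
]

for rank, name, total_cases, total_deaths, rows in tpl:
    # Rebuild a small table from the nested list of daily rows.
    daily = pd.DataFrame(rows, columns=["date", "numtoday", "numdeathstoday"])
    print(rank, name, total_cases, total_deaths)
    print(daily.to_string(index=False))
```

Unpacking the tuple in the for statement keeps the field order explicit, which helps if the list is later passed to a template or a plotting routine.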
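# -*- coding: utf-8 -*-
"""
Created on Sat Apr 18 18:06:42 2020

@author: liwenz
"""
import pandas as pd
from datetime import datetime, timedelta

url = 'https://health-infobase.canada.ca/src/data/covidLive/covid19.csv'

# Step 1: read the CSV straight from the URL (pandas does the download itself,
# so a separate requests call is not needed).
df = pd.read_csv(url)
#df.info()
# If you downloaded the file first, use this line and comment out the one above.
#df = pd.read_csv('covid19.csv')

# Step 2: parse the day-month-year strings and convert the deaths columns to
# the nullable integer type, which can hold missing values.
df['date1'] = df['date'].apply(lambda x: datetime.strptime(x, "%d-%m-%Y"))
df['numdeaths'] = df['numdeaths'].round().astype('Int64')
df['numdeathstoday'] = df['numdeathstoday'].round().astype('Int64')

# Step 3: the comparison date (15 days before today, despite the variable name).
today = datetime.now()
print(today.day, today.month, today.year)
daybefore20 = today - timedelta(days=15)
day1 = daybefore20.day
month1 = daybefore20.month
year1 = daybefore20.year
print(daybefore20)

# Step 4: keep only the rows after the comparison date.
df1 = df[df['date1'] > daybefore20]

# Step 5: one row per province with its latest totals, sorted by total cases.
df2 = df1.groupby('pruid').agg(
    prname=('prname', 'last'),
    sumcase=('numtotal', 'max'),
    sumdeath=('numdeaths', 'max'))
df2.sort_values(by=['sumcase'], inplace=True, ascending=False)
a = df2.head(7)
b = a.index.values.tolist()

# Steps 6 and 7: print each of the top rows and collect the results in tpl.
i = 0
tpl = []
for x in b:
    i = i + 1
    print(i, a.loc[x, 'prname'], a.loc[x, 'sumcase'], a.loc[x, 'sumdeath'])
    # Sort into a new frame instead of sorting a filtered slice in place,
    # which avoids the SettingWithCopyWarning.
    tmp = df1[df1['pruid'] == x].sort_values(by=['date1'], ascending=False)
    stmp = tmp[['date', 'numtoday', 'numdeathstoday']].values.tolist()
    print(tmp.loc[:, ['date', 'numtoday', 'numdeathstoday']].to_string(index=False))
    tp = (i, a.loc[x, 'prname'], a.loc[x, 'sumcase'], a.loc[x, 'sumdeath'], stmp)
    tpl.append(tp)
#print(tpl)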
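Because the live CSV changes from day to day, it can be easier to follow steps 2 to 5 on a tiny table. The following is a minimal sketch under stated assumptions: the rows in sample are made up (not real figures), the cutoff is computed from a fixed date rather than datetime.now() so the result is reproducible, and pd.to_datetime is used as a vectorized alternative to the strptime call in the program.

```python
import pandas as pd
from datetime import datetime, timedelta

# Made-up sample rows (not real figures), using the CSV's column names.
sample = pd.DataFrame({
    "pruid": [35, 35, 24, 24],
    "prname": ["Ontario", "Ontario", "Quebec", "Quebec"],
    "date": ["01-04-2020", "10-04-2020", "01-04-2020", "10-04-2020"],
    "numtotal": [100, 250, 120, 300],
    "numdeaths": [2.0, None, 3.0, 9.0],
})

# Step 2: parse day-month-year strings; missing deaths become <NA>.
sample["date1"] = pd.to_datetime(sample["date"], format="%d-%m-%Y")
sample["numdeaths"] = sample["numdeaths"].round().astype("Int64")

# Steps 3-4: keep rows newer than a cutoff 15 days before a fixed "today".
cutoff = datetime(2020, 4, 12) - timedelta(days=15)
recent = sample[sample["date1"] > cutoff]

# Step 5: one row per province, sorted by total cases (largest first).
summary = (
    recent.groupby("pruid")
    .agg(prname=("prname", "last"),
         sumcase=("numtotal", "max"),
         sumdeath=("numdeaths", "max"))
    .sort_values(by="sumcase", ascending=False)
)
print(summary)
```

Taking the maximum of numtotal and numdeaths works here because both are cumulative counts, so the largest value in the window is also the most recent one.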
To get information about the dataset, call df.info():
df.info()
RangeIndex: 3342 entries, 0 to 3341
Data columns (total 32 columns):
pruid 3342 non-null int64
prname 3342 non-null object
prnameFR 3342 non-null object
date 3342 non-null object
numconf 3342 non-null int64
numprob 3342 non-null int64
numdeaths 3223 non-null Int64
numtotal 3342 non-null int64
numtested 3285 non-null float64
numrecover 2818 non-null float64
percentrecover 2635 non-null float64
ratetested 3054 non-null float64
numtoday 3342 non-null int64
percentoday 3342 non-null float64
ratetotal 3123 non-null float64
ratedeaths 3123 non-null float64
numdeathstoday 3223 non-null Int64
percentdeath 2966 non-null float64
numtestedtoday 3285 non-null float64
numrecoveredtoday 2818 non-null float64
percentactive 2966 non-null float64
numactive 2966 non-null float64
rateactive 3123 non-null float64
numtotal_last14 3090 non-null float64
ratetotal_last14 2884 non-null float64
numdeaths_last14 3090 non-null float64
ratedeaths_last14 2884 non-null float64
avgtotal_last7 3090 non-null float64
avgincidence_last7 2884 non-null float64
avgdeaths_last7 3090 non-null float64
avgratedeaths_last7 2884 non-null float64
date1 3342 non-null datetime64[ns]
dtypes: Int64(2), datetime64[ns](1), float64(21), int64(5), object(3)
memory usage: 842.2+ KB
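The output above shows why the deaths columns are converted to Int64 rather than int64: numdeaths has 3223 non-null values out of 3342 rows, so 119 entries are missing, and a plain NumPy integer column cannot hold missing values. The short sketch below illustrates the difference on a made-up series (the values are for illustration only).

```python
import pandas as pd

# Made-up values with one gap, for illustration only.
s = pd.Series([1.0, None, 3.0])

# A missing value forces pandas to store the column as float64.
print(s.dtype)

# The nullable Int64 type keeps whole numbers and marks the gap as <NA>.
as_int = s.round().astype("Int64")
print(as_int.dtype)

# Count the missing entries, the complement of the "non-null" column in df.info().
missing = int(as_int.isna().sum())
print(missing)
```

Running df.isna().sum() on the full DataFrame gives the same kind of count for every column at once.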