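Below is Python code that analyzes Canada's downloadable COVID-19 CSV data.
You can download the data yourself from the data download link, but this program fetches it live: the CSV is downloaded each time the program runs.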
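The choice between downloading live and reading a saved copy can be made explicit with a small helper. This is a minimal sketch, not part of the original program: `load_covid_csv` is a hypothetical name, and it assumes the CSV fits comfortably in memory. It downloads once with `requests` (with a timeout and an HTTP status check) and hands the text to pandas, or reads a local file if a path is given.

```python
import io

import pandas as pd

URL = "https://health-infobase.canada.ca/src/data/covidLive/covid19.csv"


def load_covid_csv(local_path=None, url=URL, timeout=30):
    """Hypothetical helper: read a local CSV if a path is given, otherwise download it once."""
    if local_path is not None:
        return pd.read_csv(local_path)
    import requests  # imported only on the download path

    resp = requests.get(url, timeout=timeout)
    resp.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    return pd.read_csv(io.StringIO(resp.text))
```

For example, `df = load_covid_csv()` fetches the live file, while `df = load_covid_csv('covid19.csv')` reads a previously saved copy.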
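1: Download the data, or load a local file
2: Format the data: convert the date string into a datetime column, date1, and convert the deaths columns to integers
3: Compute the cutoff date, daybefore20
4: Keep only the most recent days of data, df1
5: Sort
6: Print the data in a loop
7: Along the way, build the list tpl, which you can print or use for other purposes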
```python
# -*- coding: utf-8 -*-
"""
Created on Sat Apr 18 18:06:42 2020

@author: liwenz
"""
from datetime import datetime, timedelta

import pandas as pd

url = 'https://health-infobase.canada.ca/src/data/covidLive/covid19.csv'
# pd.read_csv downloads the CSV itself, so a separate requests call is not needed
df = pd.read_csv(url)
# df = pd.read_csv('covid19.csv')  # to analyze an already downloaded file, use this line and comment out the one above
# df.info()

# Step 2: parse the dd-mm-yyyy date strings; make the death counts nullable integers
df['date1'] = pd.to_datetime(df['date'], format='%d-%m-%Y')
df['numdeaths'] = df['numdeaths'].round().astype('Int64')
df['numdeathstoday'] = df['numdeathstoday'].round().astype('Int64')

# Step 3: cutoff date (despite its name, daybefore20 is 15 days before today)
today = datetime.now()
print(today.day, today.month, today.year)
daybefore20 = today - timedelta(days=15)
print(daybefore20)

# Step 4: keep only the rows after the cutoff
df1 = df[df['date1'] > daybefore20]

# Step 5: per region, the maximum cumulative totals in the window, sorted by cases
df2 = df1.groupby('pruid').agg(
    prname=('prname', 'last'),
    sumcase=('numtotal', 'max'),
    sumdeath=('numdeaths', 'max'))
df2.sort_values(by=['sumcase'], inplace=True, ascending=False)
a = df2.head(7)

# Steps 6-7: print the top 7 regions and their daily rows, collecting them in tpl
tpl = []
for i, x in enumerate(a.index, start=1):
    print(i, a.loc[x, 'prname'], a.loc[x, 'sumcase'], a.loc[x, 'sumdeath'])
    # sort_values returns a new frame; sorting the filtered slice in place triggers SettingWithCopyWarning
    tmp = df1[df1['pruid'] == x].sort_values(by=['date1'], ascending=False)
    cols = tmp[['date', 'numtoday', 'numdeathstoday']]
    stmp = cols.values.tolist()
    print(cols.to_string(index=False))
    tp = (i, a.loc[x, 'prname'], a.loc[x, 'sumcase'], a.loc[x, 'sumdeath'], stmp)
    tpl.append(tp)
# print(tpl)
```
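Because the script depends on the live file and on today's date, its output changes from run to run. To check the logic of steps 2 to 7 offline, the same pipeline can be wrapped in a function and run on a few synthetic rows. This is a sketch under stated assumptions: `summarize` and its parameters are hypothetical names, `today` is passed in instead of read from the clock, and the toy rows are made-up values in the CSV's column layout, not real data.

```python
from datetime import datetime, timedelta

import pandas as pd


def summarize(df, today, days=15, top=7):
    """Hypothetical wrapper for steps 2-7; returns tpl as (rank, prname, sumcase, sumdeath, rows)."""
    df = df.copy()
    df["date1"] = pd.to_datetime(df["date"], format="%d-%m-%Y")
    df["numdeaths"] = df["numdeaths"].round().astype("Int64")
    df["numdeathstoday"] = df["numdeathstoday"].round().astype("Int64")
    df1 = df[df["date1"] > today - timedelta(days=days)]  # keep only rows after the cutoff
    df2 = (df1.groupby("pruid")
              .agg(prname=("prname", "last"),
                   sumcase=("numtotal", "max"),
                   sumdeath=("numdeaths", "max"))
              .sort_values("sumcase", ascending=False)
              .head(top))
    tpl = []
    for rank, (pruid, row) in enumerate(df2.iterrows(), start=1):
        rows = (df1[df1["pruid"] == pruid]
                .sort_values("date1", ascending=False)[["date", "numtoday", "numdeathstoday"]]
                .values.tolist())
        tpl.append((rank, row["prname"], row["sumcase"], row["sumdeath"], rows))
    return tpl


# Made-up toy rows (not real figures); the 01-04-2020 row falls before the cutoff
toy = pd.DataFrame({
    "pruid": [1, 1, 1, 2, 2],
    "prname": ["Region A", "Region A", "Region A", "Region B", "Region B"],
    "date": ["01-04-2020", "10-04-2020", "11-04-2020", "10-04-2020", "11-04-2020"],
    "numtotal": [5, 20, 30, 40, 50],
    "numdeaths": [0.0, 1.0, 2.0, float("nan"), 3.0],
    "numtoday": [5, 4, 10, 8, 10],
    "numdeathstoday": [0.0, 1.0, 1.0, float("nan"), 3.0],
})
tpl = summarize(toy, today=datetime(2020, 4, 18))
for rank, name, cases, deaths, rows in tpl:
    print(rank, name, cases, deaths)
    for r in rows:
        print("   ", r)
```

Passing `today` as a parameter is the main design choice here: it makes the date filter reproducible, so the same toy rows always give the same ranking.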
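To inspect the dataset's columns and types, call df.info(). The output below was produced after the conversions in step 2, which is why date1 and the two Int64 columns appear: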
```python
df.info()
```

```
RangeIndex: 3342 entries, 0 to 3341
Data columns (total 32 columns):
pruid 3342 non-null int64
prname 3342 non-null object
prnameFR 3342 non-null object
date 3342 non-null object
numconf 3342 non-null int64
numprob 3342 non-null int64
numdeaths 3223 non-null Int64
numtotal 3342 non-null int64
numtested 3285 non-null float64
numrecover 2818 non-null float64
percentrecover 2635 non-null float64
ratetested 3054 non-null float64
numtoday 3342 non-null int64
percentoday 3342 non-null float64
ratetotal 3123 non-null float64
ratedeaths 3123 non-null float64
numdeathstoday 3223 non-null Int64
percentdeath 2966 non-null float64
numtestedtoday 3285 non-null float64
numrecoveredtoday 2818 non-null float64
percentactive 2966 non-null float64
numactive 2966 non-null float64
rateactive 3123 non-null float64
numtotal_last14 3090 non-null float64
ratetotal_last14 2884 non-null float64
numdeaths_last14 3090 non-null float64
ratedeaths_last14 2884 non-null float64
avgtotal_last7 3090 non-null float64
avgincidence_last7 2884 non-null float64
avgdeaths_last7 3090 non-null float64
avgratedeaths_last7 2884 non-null float64
date1 3342 non-null datetime64[ns]
dtypes: Int64(2), datetime64[ns](1), float64(21), int64(5), object(3)
memory usage: 842.2+ KB
```
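In the output above, numdeaths and numdeathstoday have 3223 non-null values out of 3342 entries, so some rows are missing. That is why step 2 casts them to 'Int64' (capital I), pandas' nullable integer dtype, rather than NumPy's int64, which has no way to represent a missing value. A minimal sketch of the difference, on a made-up three-value series standing in for a death-count column:

```python
import numpy as np
import pandas as pd

# A float column with a gap, like a death-count column with missing rows
s = pd.Series([1.0, np.nan, 3.0])

nullable = s.round().astype("Int64")  # nullable integer: the gap becomes <NA>
print(nullable.dtype)                 # Int64
print(int(nullable.isna().sum()))     # 1

try:
    s.astype("int64")                 # plain NumPy int64 cannot hold NaN
    int64_failed = False
except ValueError:
    int64_failed = True
print("int64 cast failed:", int64_failed)  # int64 cast failed: True
```

On the real frame, df.isna().sum() lists the number of missing values per column directly.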