The global COVID-19 data come from the European Centre for Disease Prevention and Control (ECDC) and can be downloaded manually from https://www.ecdc.europa.eu . The program, however, downloads the data each time it runs.
The code is structured as follows:
1: Download the data, or import it from a local file
2: Format the data, that is, convert the cases and deaths columns to int
3: Accumulate cases and deaths per country
4: Sort by total cases, with the largest first
5: Take the 20 top-ranked countries
6: Analyze the data of the last 20 days for each of them
The code is as follows:
"""
@author: liwenz
"""
import json
from datetime import datetime, timedelta

import pandas as pd
import requests

# To work from a local copy instead of downloading:
# with open('corid2019.json', 'r') as f:
#     data = json.load(f)

# 1: Download the data
url = 'https://opendata.ecdc.europa.eu/covid19/casedistribution/json/'
r = requests.get(url)
r.raise_for_status()  # stop early if the download failed
data = r.json()
df = pd.DataFrame.from_records(data['records'])

# 2: Convert the numeric columns to int and parse the report date
for col in ['day', 'month', 'year', 'cases', 'deaths']:
    df[col] = df[col].astype('int')
df['date'] = pd.to_datetime(df['dateRep'], format='%d/%m/%Y')

# Keep only the rows of the last 20 days (used in step 6)
today = datetime.now()
print(today.day, today.month, today.year)
daybefore20 = today - timedelta(days=20)
print(daybefore20)
df1 = df[df['date'] > daybefore20]

# 3: Accumulate cases and deaths per country
df2 = df.groupby('countryterritoryCode').agg(
    country=('countriesAndTerritories', 'last'),
    sumcase=('cases', 'sum'),
    sumdeath=('deaths', 'sum'),
    popu=('popData2019', 'last'))

# 4: Sort by total cases, with the largest first
df2 = df2.sort_values(by=['sumcase'], ascending=False)

# 5: Take the 20 top-ranked countries
a = df2.head(20)

# 6: Print the totals and the last 20 days for each country
for i, x in enumerate(a.index, start=1):
    print(i, a.loc[x, 'country'], a.loc[x, 'sumcase'],
          a.loc[x, 'sumdeath'], a.loc[x, 'popu'])
    tmp = df1[df1['countryterritoryCode'] == x]
    # sort a copy; sorting the slice in place triggers a pandas warning
    tmp = tmp.sort_values(by=['date'], ascending=False)
    print(tmp.loc[:, ['dateRep', 'cases', 'deaths']].to_string(index=False))
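To try steps 2 to 6 without a network connection, the same pipeline can be run on a few hand-made records. This is only a minimal sketch: the three countries (codes AAA, BBB, CCC) and all of their numbers are made up for illustration, and only the field names are taken from the program above. It keeps the top 2 countries and a 2-day window instead of 20 so that the toy data stays small.

```python
import pandas as pd

# Hypothetical records using the ECDC field names; the values are invented.
records = [
    {"dateRep": "01/05/2020", "cases": "10", "deaths": "1",
     "countryterritoryCode": "AAA", "countriesAndTerritories": "Country_A"},
    {"dateRep": "02/05/2020", "cases": "20", "deaths": "2",
     "countryterritoryCode": "AAA", "countriesAndTerritories": "Country_A"},
    {"dateRep": "03/05/2020", "cases": "30", "deaths": "3",
     "countryterritoryCode": "AAA", "countriesAndTerritories": "Country_A"},
    {"dateRep": "01/05/2020", "cases": "100", "deaths": "5",
     "countryterritoryCode": "BBB", "countriesAndTerritories": "Country_B"},
    {"dateRep": "02/05/2020", "cases": "50", "deaths": "0",
     "countryterritoryCode": "BBB", "countriesAndTerritories": "Country_B"},
    {"dateRep": "03/05/2020", "cases": "25", "deaths": "1",
     "countryterritoryCode": "BBB", "countriesAndTerritories": "Country_B"},
    {"dateRep": "01/05/2020", "cases": "1", "deaths": "0",
     "countryterritoryCode": "CCC", "countriesAndTerritories": "Country_C"},
    {"dateRep": "02/05/2020", "cases": "2", "deaths": "0",
     "countryterritoryCode": "CCC", "countriesAndTerritories": "Country_C"},
    {"dateRep": "03/05/2020", "cases": "3", "deaths": "0",
     "countryterritoryCode": "CCC", "countriesAndTerritories": "Country_C"},
]

toy = pd.DataFrame.from_records(records)

# Step 2: convert the counts to int and parse the report date
toy["cases"] = toy["cases"].astype("int")
toy["deaths"] = toy["deaths"].astype("int")
toy["date"] = pd.to_datetime(toy["dateRep"], format="%d/%m/%Y")

# Steps 3 and 4: accumulate per country, then sort with the largest first
totals = (
    toy.groupby("countryterritoryCode")
    .agg(country=("countriesAndTerritories", "last"),
         sumcase=("cases", "sum"),
         sumdeath=("deaths", "sum"))
    .sort_values(by="sumcase", ascending=False)
)

# Step 5: keep the top 2 countries (the real program keeps 20)
top = totals.head(2)

# Step 6: keep the most recent 2 days, measured from the newest date in the data
cutoff = toy["date"].max() - pd.Timedelta(days=2)
recent = toy[toy["date"] > cutoff]

print(top)
print(recent[["dateRep", "countryterritoryCode", "cases", "deaths"]].to_string(index=False))
```

Measuring the window from the newest date in the data, rather than from datetime.now(), is a choice made here so that the sketch gives the same result whenever it is run.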
The dataset information can be obtained with df.info(), as follows:
df.info()
RangeIndex: 25726 entries, 0 to 25725
Data columns (total 12 columns):
dateRep 25726 non-null object
day 25726 non-null int32
month 25726 non-null int32
year 25726 non-null int32
cases 25726 non-null int32
deaths 25726 non-null int32
countriesAndTerritories 25726 non-null object
geoId 25726 non-null object
countryterritoryCode 25662 non-null object
popData2019 25564 non-null float64
continentExp 25726 non-null object
date 25726 non-null datetime64[ns]
dtypes: datetime64[ns](1), float64(1), int32(5), object(5)
memory usage: 1.9+ MB
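The non-null counts in this listing show that two columns have gaps: countryterritoryCode has 25662 values out of 25726 rows (64 missing) and popData2019 has 25564 (162 missing). Rows with a missing countryterritoryCode are left out by groupby, so they do not appear in the ranking. A small helper can list such gaps directly. The sketch below is an illustration only: the function name missing_per_column and the toy frame are hypothetical and not part of the original program.

```python
import pandas as pd


def missing_per_column(frame: pd.DataFrame) -> pd.Series:
    """Return the number of missing values for each column that has any."""
    counts = frame.isna().sum()
    return counts[counts > 0]


# Hypothetical toy frame with the same kind of gaps as the real data
toy_gaps = pd.DataFrame({
    "countryterritoryCode": ["AAA", None, "CCC"],
    "popData2019": [1000.0, None, None],
    "cases": [1, 2, 3],
})

gaps = missing_per_column(toy_gaps)
print(gaps)
```

On the real data the call would be missing_per_column(df), run after the DataFrame has been built.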