pandas: counting frequencies by group

Suppose we have a DataFrame like the one below (the column name 标题 means "title"):

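country_name	date	标题
China	20030101	Today is the first day of 2003, and I am in China
Antigua and Barbuda	20030101	Today is the first day of 2003, and I am in Antigua and Barbuda
China	20030102	Today is the second day of 2003, and I am in China
Mongolia	20030102	Today is the second day of 2003, and I am in Mongolia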
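To try the code without the original all_country.csv file, the four sample rows can be rebuilt in memory. This is only a minimal sketch: the country names and titles are written in English, and the column name 标题 is kept unchanged because the code renames that exact column later.

```python
import pandas as pd

# In-memory copy of the four sample rows (the real data is read from all_country.csv).
# The column name "标题" is kept as-is because the counting code renames this column.
df = pd.DataFrame({
    'country_name': ['China', 'Antigua and Barbuda', 'China', 'Mongolia'],
    'date': [20030101, 20030101, 20030102, 20030102],
    '标题': [
        'Today is the first day of 2003, and I am in China',
        'Today is the first day of 2003, and I am in Antigua and Barbuda',
        'Today is the second day of 2003, and I am in China',
        'Today is the second day of 2003, and I am in Mongolia',
    ],
})

print(df)
```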

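Now we want to compute, for each day, how often each country appears on that day, i.e. the number of rows for that country on that date divided by the total number of rows on that date. The code is as follows: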

# -*- coding: utf-8 -*-
# author:           inspurer(月小水长)
# create_time:      2022/5/28 20:10
# environment:      Python 3.6+
# github            https://github.com/inspurer
# website           https://buyixiao.github.io/
# WeChat account:   月小水长

import pandas as pd

input_file = 'all_country.csv'

df = pd.read_csv(input_file)

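# Count rows per (date, country_name); count() counts non-null values in every remaining column
res_df = df.groupby(['date', 'country_name']).count().reset_index()

# Keep date, country_name and the first count column (assumes the CSV has only the three columns shown above);
# .copy() makes res_df an independent frame before new columns are added to it
res_df = res_df[res_df.columns[:3]].copy()

# The count column still carries the original column name 标题, so rename it
res_df.rename(columns={'标题': 'daily_cnt'}, inplace=True)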

print(res_df, res_df.columns)

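# daily_frq = this country's row count on the date / total number of rows on that date.
# Initialise as float: recent pandas versions warn when floats are written into an int column via .loc
res_df['daily_frq'] = 0.0
for index, row in res_df.iterrows():
    day_total = df[df['date'] == row['date']].shape[0]
    res_df.loc[index, 'daily_frq'] = round(row['daily_cnt'] / day_total, 3)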

res_df.to_csv("res_" + input_file, index=False, encoding='utf-8')
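The loop above filters the whole DataFrame once per (date, country) pair, which gets slow on large files. A vectorized sketch of the same computation is shown below; it is an alternative, not the author's code, and it swaps in groupby().size() and transform('sum'). One assumption to note: size() counts rows, while the original count() only counts non-null 标题 values, so the two agree only when no title is missing. With size(), the per-date sum of daily_cnt equals the number of rows on that date, which is exactly the denominator the loop uses.

```python
import pandas as pd

# In-memory copy of the sample data; in practice this would be pd.read_csv('all_country.csv')
df = pd.DataFrame({
    'country_name': ['China', 'Antigua and Barbuda', 'China', 'Mongolia'],
    'date': [20030101, 20030101, 20030102, 20030102],
    '标题': ['title 1', 'title 2', 'title 3', 'title 4'],
})

# Rows per (date, country_name); size() counts rows regardless of missing values
res_df = (
    df.groupby(['date', 'country_name'])
      .size()
      .reset_index(name='daily_cnt')
)

# Total rows per date, broadcast back onto every row belonging to that date
day_total = res_df.groupby('date')['daily_cnt'].transform('sum')

# Each country's share of that day's rows, rounded to 3 decimals like the loop version
res_df['daily_frq'] = (res_df['daily_cnt'] / day_total).round(3)

print(res_df)
```

On the sample data every (date, country) pair appears once and each date has two rows, so every daily_frq is 0.5. If a wide date × country table is preferred instead, pd.crosstab(df['date'], df['country_name'], normalize='index') gives the same shares, with each row summing to 1.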