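Today's post is the eighth installment of the 【月小水长】pandas 三十六计 ("Thirty-Six Stratagems of pandas") series: a small tool that converts a JSON file into a CSV file.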

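The file format is the surface and the file content is the substance. As long as the substance stays the same, the surface can be swapped as freely as a change of clothes, much as MySQL lets you import and export SQL, CSV, JSON and other file types.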

Suppose we have a JSON file like this:

{
  "4893424946515214": {
    "mid": "4893424946515214",
    "weibo_link": "https://weibo.com/2803301701/MDcporkU6",
    "text": "据悉,全城月季花已逐渐进入盛花期。",
    "publish_time": "2023-04-22 20:34:45",
    "user_link": "https://weibo.com/u/2803301701",
    "user_name": "人民日报",
    "reposts_count": 55,
    "comments_count": 92,
    "attitudes_count": 298
  },
  "4893416880346795": {
    "mid": "4893416880346795",
    "weibo_link": "https://weibo.com/2803301701/MDcco1sdt",
    "text": "4月22日,陕西西安。游客发视频... ",
    "publish_time": "2023-04-22 20:02:42",
    "user_link": "https://weibo.com/u/2803301701",
    "user_name": "人民日报",
    "reposts_count": 119,
    "comments_count": 249,
    "attitudes_count": 785
  },
  "4893410513127118": {
    "mid": "4893410513127118",
    "weibo_link": "https://weibo.com/2803301701/MDc27d7vo",
    "text": "第54个世界地球日,江豚回家路还有多远...",
    "publish_time": "2023-04-22 19:37:24",
    "user_link": "https://weibo.com/u/2803301701",
    "user_name": "人民日报",
    "reposts_count": 119,
    "comments_count": 145,
    "attitudes_count": 463
  }
}
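The conversion relies on one structural assumption: the top-level object maps each mid to a flat record, and every record has the same keys in the same order. A minimal sketch for checking that before converting, run on a trimmed copy of the sample above (two records, three fields each, for brevity); the helper name `check_records` is hypothetical, not part of the original tool.

```python
import json

# Trimmed copy of the sample: two records, three fields each
sample = '''
{
  "4893424946515214": {"mid": "4893424946515214", "user_name": "人民日报", "reposts_count": 55},
  "4893416880346795": {"mid": "4893416880346795", "user_name": "人民日报", "reposts_count": 119}
}
'''


def check_records(data):
    """Hypothetical helper: return the shared key list, or raise if any
    record has different keys or a different key order."""
    records = list(data.values())
    if not records:
        raise ValueError("no records found")
    first_keys = list(records[0].keys())
    for mid, record in data.items():
        if list(record.keys()) != first_keys:
            raise ValueError(f"record {mid} has different keys or key order")
    return first_keys


data = json.loads(sample)
print(check_records(data))  # ['mid', 'user_name', 'reposts_count']
```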

We want to turn it into the following CSV:

mid,weibo_link,text,publish_time,user_link,user_name,reposts_count,comments_count,attitudes_count
4893424946515214,https://weibo.com/2803301701/MDcporkU6,据悉,全城月季花已逐渐进入盛花期。,2023-04-22 20:34:45,https://weibo.com/u/2803301701,人民日报,55,92,298
4893416880346795,https://weibo.com/2803301701/MDcco1sdt,4月22日,陕西西安。游客发视频... ,2023-04-22 20:02:42,https://weibo.com/u/2803301701,人民日报,119,249,785
4893410513127118,https://weibo.com/2803301701/MDc27d7vo,第54个世界地球日,江豚回家路还有多远...,2023-04-22 19:37:24,https://weibo.com/u/2803301701,人民日报,119,145,463
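The mapping is simple: the header row comes from the record keys, each record becomes one row, and the top-level key itself is dropped (here it duplicates `mid`). For comparison only, and not the author's method, the same mapping can be sketched with the standard library `csv` module on a trimmed sample of two fields per record:

```python
import csv
import io
import json

# Trimmed sample: two records, two fields each
sample = ('{"4893424946515214": {"mid": "4893424946515214", "reposts_count": 55}, '
          '"4893416880346795": {"mid": "4893416880346795", "reposts_count": 119}}')

# Drop the top-level keys and keep only the records
records = list(json.loads(sample).values())

buf = io.StringIO()
# Header comes from the first record's keys; each record becomes one row
writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()))
writer.writeheader()
writer.writerows(records)

print(buf.getvalue())
```

To write a real file instead of a string buffer, open it with `newline=''` (as the `csv` module documentation recommends) and, if the file will be opened in Excel, `encoding='utf-8-sig'`.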

All it takes is running the following code:

# -*- coding: utf-8 -*-
# author: inspurer(月小水长)
# create_time: 2023/4/22 20:49
# Runtime: Python 3.6+
# github https://github.com/inspurer
# website https://buyixiao.github.io/
# WeChat official account: 月小水长

import json
import pandas as pd


def convert_json_to_csv(input_json_path, output_csv_path):
    # utf-8-sig also reads files that start with a UTF-8 BOM
    with open(input_json_path, mode='r', encoding='utf-8-sig') as f:
        input_json = json.loads(f.read())

    data_list = []

    # Column names are taken from the keys of the first record
    data_cols = list(input_json[list(input_json.keys())[0]].keys())
    # Each top-level value is one weibo; its values become one CSV row
    for a_weibo in input_json.values():
        data_list.append(list(a_weibo.values()))
    df = pd.DataFrame(data_list, columns=data_cols)

    # utf-8-sig writes a BOM so that Excel displays Chinese text correctly
    df.to_csv(output_csv_path, index=False, encoding='utf-8-sig')


convert_json_to_csv('./data/2803301701.json', './data/2803301701.csv')
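pandas can also build the same table in a single call: `pd.DataFrame.from_dict(..., orient='index')` treats each top-level key as a row label and each inner key as a column. A minimal sketch under the same assumptions as the function above; the name `convert_json_to_csv_v2` is hypothetical. Unlike the value-list approach, it matches values to columns by key name rather than by position.

```python
import json

import pandas as pd


def convert_json_to_csv_v2(input_json_path, output_csv_path):
    """Hypothetical variant of convert_json_to_csv using DataFrame.from_dict."""
    with open(input_json_path, mode='r', encoding='utf-8-sig') as f:
        input_json = json.load(f)

    # Outer keys (the mids) become the index; inner keys become columns
    df = pd.DataFrame.from_dict(input_json, orient='index')

    # index=False drops the outer keys, which duplicate the mid column here
    df.to_csv(output_csv_path, index=False, encoding='utf-8-sig')
    return df
```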

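The code does not hard-code any CSV column names; it reads them from the JSON file itself (the keys of the first record), so it should also work for other JSON files of the same shape: a top-level object whose values are flat records sharing the same fields.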
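That generality has a limit: the column names come from the first record only, and each row is built from `a_weibo.values()`, so every record must list its keys in the same order. If that might not hold, one hedged alternative is to hand pandas the records as dicts, which aligns values by key name and fills a missing field with NaN. A sketch on made-up records (hypothetical data, not from the source):

```python
import pandas as pd

# Hypothetical records: the second has its keys in a different order
# and lacks "comments_count"
data = {
    "1": {"mid": "1", "reposts_count": 5, "comments_count": 2},
    "2": {"reposts_count": 7, "mid": "2"},
}

# Positional values would misalign: list(data["2"].values()) is [7, "2"]
# A list of dicts is aligned by key name instead
df = pd.DataFrame(list(data.values()))
print(df)
```

Here the second row gets `mid` "2", `reposts_count` 7 and NaN for `comments_count`, rather than having its values shifted into the wrong columns.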