In this exercise we bring a sample time series dataset into Pandas, make sure it carries a proper time index, and create some initial plots to understand its basic structure. The work covers loading the data, setting the index, and generating plots that help identify patterns such as trend or seasonality.

First, make sure the required libraries are imported. We mainly use Pandas for data handling and Matplotlib (or Pandas' plotting interface, which uses Matplotlib under the hood) for visualization.

```python
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Configure plots for readability
# ('seaborn-v0_8-whitegrid' is the style name in Matplotlib 3.6+;
# older Matplotlib versions call it 'seaborn-whitegrid')
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)
```

Generating Sample Data

Instead of loading an external file, we generate a monthly time series dataset. This keeps the focus on data handling and plotting. The data represent monthly widget sales over several years and contain both a trend and seasonality.

```python
# Generate a date range covering 5 years of monthly data
dates = pd.date_range(start='2019-01-01', periods=60, freq='MS')

# Generate data with trend and seasonality
np.random.seed(42)  # for reproducibility
trend = np.linspace(50, 150, 60)  # linear upward trend
seasonality = 15 * np.sin(np.arange(60) * (2 * np.pi / 12))  # yearly cycle with a 12-month period
noise = np.random.normal(0, 10, 60)  # random noise

# Combine the components
sales = trend + seasonality + noise
sales = np.maximum(sales, 10)  # floor values at 10 so sales stay positive

# Create the DataFrame
widget_sales = pd.DataFrame({'Sales': sales}, index=dates)

print("Sample widget sales data:")
print(widget_sales.head())
print("\nData info:")
widget_sales.info()
```

Running this code creates a Pandas DataFrame named widget_sales. The pd.date_range function generates monthly timestamps ('MS' stands for month start), which we use directly as the index. The .head() method shows the first few rows and confirms the structure: a single 'Sales' column and a DatetimeIndex. The .info() method confirms the index type (DatetimeIndex) and the data type of the 'Sales' column (float64).

Visualizing the Time Series

With the data loaded and correctly indexed, the next step is to plot it. A simple line plot is usually the best starting point for time series data, because it shows the observations in their order over time.

```python
# Plot the time series
widget_sales['Sales'].plot()
plt.title('Monthly Widget Sales, 2019-2023')
plt.xlabel('Date')
plt.ylabel('Units Sold')
plt.show()
```
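The index checks described above (.head() and .info() showing a DatetimeIndex) can also be done programmatically. The sketch below rebuilds the same synthetic series and tests a few properties that later time-based operations rely on: the index type, ordering, uniqueness, and a regular frequency. The variable names (is_datetime_index, inferred_freq, sales_2020) are illustrative only and are not part of the original example.

```python
import numpy as np
import pandas as pd

# Rebuild the same synthetic monthly series used above
dates = pd.date_range(start="2019-01-01", periods=60, freq="MS")
np.random.seed(42)
trend = np.linspace(50, 150, 60)
seasonality = 15 * np.sin(np.arange(60) * (2 * np.pi / 12))
noise = np.random.normal(0, 10, 60)
widget_sales = pd.DataFrame(
    {"Sales": np.maximum(trend + seasonality + noise, 10)}, index=dates
)

# 1. The index should be a DatetimeIndex, not plain strings or integers
is_datetime_index = isinstance(widget_sales.index, pd.DatetimeIndex)

# 2. Timestamps should be sorted, unique, and evenly spaced
is_sorted = widget_sales.index.is_monotonic_increasing
is_unique = widget_sales.index.is_unique
inferred_freq = pd.infer_freq(widget_sales.index)  # 'MS' means month start

# 3. A DatetimeIndex enables partial-string selection, e.g. one calendar year
sales_2020 = widget_sales.loc["2020"]

print(is_datetime_index, is_sorted, is_unique, inferred_freq)
print(len(sales_2020), sales_2020.index[0].date(), sales_2020.index[-1].date())
```

If pd.infer_freq returns None on real data, that usually points to missing or duplicated timestamps worth investigating before any plotting or modeling.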
"2021-08-01", "2021-09-01", "2021-10-01", "2021-11-01", "2021-12-01", "2022-01-01", "2022-02-01", "2022-03-01", "2022-04-01", "2022-05-01", "2022-06-01", "2022-07-01", "2022-08-01", "2022-09-01", "2022-10-01", "2022-11-01", "2022-12-01", "2023-01-01", "2023-02-01", "2023-03-01", "2023-04-01", "2023-05-01", "2023-06-01", "2023-07-01", "2023-08-01", "2023-09-01", "2023-10-01", "2023-11-01", "2023-12-01"], "y": [54.9671, 56.7312, 80.9866, 75.1806, 81.0489, 82.3777, 66.6301, 70.4772, 67.3713, 70.5981, 86.5538, 79.0455, 84.7815, 86.6612, 94.1866, 89.8849, 95.9271, 106.1813, 85.2776, 95.9033, 100.0023, 107.9988, 106.3256, 106.6187, 108.3061, 117.6518, 124.4437, 126.8213, 129.6098, 133.2018, 122.0838, 118.3505, 123.1796, 133.1196, 126.8668, 130.6508, 135.5431, 145.6266, 154.3077, 153.0834, 146.4148, 158.5657, 146.1619, 138.0775, 148.4764, 151.5687, 144.3686, 145.4806, 151.9475, 163.5696, 171.6049, 174.3829, 166.1798, 180.0183, 166.3367, 159.9318, 164.3564, 166.9873, 159.3706, 160.2809], "line": {"color": "#228be6"}}]}2019年1月至2023年12月的模拟月度小部件销售数据。从这张初始图表中,我们可以看到:上升趋势:销售额在五年期间普遍增加。季节性模式:每年似乎都有一个重复出现的模式,峰值和谷值出现在相似的时间。噪声/不规则性:线条并非完全平滑,表明在潜在的趋势和季节性模式周围存在随机波动。进一步查看滚动统计数据虽然正式的分解技术将在下一章介绍,但我们可以使用滚动窗口计算(前面已介绍)来直观地平滑数据并突显趋势。计算滚动平均值有助于抑制季节性和噪声。让我们计算并绘制一个12个月的滚动平均值,并与原始数据一起呈现。# 计算12个月滚动平均值 widget_sales['Rolling Mean (12M)'] = widget_sales['Sales'].rolling(window=12).mean() # 绘制原始数据和滚动平均值 widget_sales['Sales'].plot(label='原始销售额', legend=True) widget_sales['Rolling Mean (12M)'].plot(label='12个月滚动平均值', legend=True, color='orange') plt.title('带有12个月滚动平均值的小部件销售额') plt.xlabel('日期') plt.ylabel('销售单位') plt.show()这是组合图表的显示效果:{"layout": {"title": "带有12个月滚动平均值的小部件销售额", "xaxis": {"title": "日期"}, "yaxis": {"title": "销售单位"}, "template": "seaborn", "legend": {"title": {"text": "系列"}}, "width": 600, "height": 400}, "data": [{"type": "scatter", "mode": "lines", "name": "原始销售额", "x": ["2019-01-01", "2019-02-01", "2019-03-01", "2019-04-01", "2019-05-01", "2019-06-01", "2019-07-01", "2019-08-01", 
"2019-09-01", "2019-10-01", "2019-11-01", "2019-12-01", "2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01", "2020-05-01", "2020-06-01", "2020-07-01", "2020-08-01", "2020-09-01", "2020-10-01", "2020-11-01", "2020-12-01", "2021-01-01", "2021-02-01", "2021-03-01", "2021-04-01", "2021-05-01", "2021-06-01", "2021-07-01", "2021-08-01", "2021-09-01", "2021-10-01", "2021-11-01", "2021-12-01", "2022-01-01", "2022-02-01", "2022-03-01", "2022-04-01", "2022-05-01", "2022-06-01", "2022-07-01", "2022-08-01", "2022-09-01", "2022-10-01", "2022-11-01", "2022-12-01", "2023-01-01", "2023-02-01", "2023-03-01", "2023-04-01", "2023-05-01", "2023-06-01", "2023-07-01", "2023-08-01", "2023-09-01", "2023-10-01", "2023-11-01", "2023-12-01"], "y": [54.9671, 56.7312, 80.9866, 75.1806, 81.0489, 82.3777, 66.6301, 70.4772, 67.3713, 70.5981, 86.5538, 79.0455, 84.7815, 86.6612, 94.1866, 89.8849, 95.9271, 106.1813, 85.2776, 95.9033, 100.0023, 107.9988, 106.3256, 106.6187, 108.3061, 117.6518, 124.4437, 126.8213, 129.6098, 133.2018, 122.0838, 118.3505, 123.1796, 133.1196, 126.8668, 130.6508, 135.5431, 145.6266, 154.3077, 153.0834, 146.4148, 158.5657, 146.1619, 138.0775, 148.4764, 151.5687, 144.3686, 145.4806, 151.9475, 163.5696, 171.6049, 174.3829, 166.1798, 180.0183, 166.3367, 159.9318, 164.3564, 166.9873, 159.3706, 160.2809], "line": {"color": "#228be6"}}, {"type": "scatter", "mode": "lines", "name": "12个月滚动平均值", "x": ["2019-01-01", "2019-02-01", "2019-03-01", "2019-04-01", "2019-05-01", "2019-06-01", "2019-07-01", "2019-08-01", "2019-09-01", "2019-10-01", "2019-11-01", "2019-12-01", "2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01", "2020-05-01", "2020-06-01", "2020-07-01", "2020-08-01", "2020-09-01", "2020-10-01", "2020-11-01", "2020-12-01", "2021-01-01", "2021-02-01", "2021-03-01", "2021-04-01", "2021-05-01", "2021-06-01", "2021-07-01", "2021-08-01", "2021-09-01", "2021-10-01", "2021-11-01", "2021-12-01", "2022-01-01", "2022-02-01", "2022-03-01", "2022-04-01", "2022-05-01", "2022-06-01", 
"2022-07-01", "2022-08-01", "2022-09-01", "2022-10-01", "2022-11-01", "2022-12-01", "2023-01-01", "2023-02-01", "2023-03-01", "2023-04-01", "2023-05-01", "2023-06-01", "2023-07-01", "2023-08-01", "2023-09-01", "2023-10-01", "2023-11-01", "2023-12-01"], "y": [null, null, null, null, null, null, null, null, null, null, null, 72.664, 75.147, 77.641, 78.741, 79.936, 81.179, 83.163, 84.717, 86.838, 89.091, 92.208, 93.723, 96.047, 98.007, 100.619, 103.139, 106.267, 109.088, 111.414, 114.070, 115.783, 118.101, 120.941, 122.323, 124.624, 126.891, 129.554, 132.000, 133.567, 134.783, 136.884, 138.960, 139.500, 140.807, 142.153, 143.169, 144.772, 146.436, 148.485, 150.338, 152.003, 152.779, 154.543, 155.781, 156.707, 157.788, 159.189, 159.912, 160.313], "line": {"color": "#fd7e14"}}]}原始小部件销售数据及其12个月滚动平均值的图表。代表滚动平均值的橙色线条清晰地显示了上升趋势,平滑了蓝色线条(原始数据)中可见的季节性峰值和谷值。请注意,滚动平均值在最初的11个数据点之后才开始显示,因为它需要完整的12个观察值的窗口才能计算出第一个值。本次实践练习展示了将时间序列数据加载到 Pandas 中、确保索引设置正确以及进行初步可视化。这些步骤是任何时间序列分析项目的基本要求,提供了对数据行为的初步了解,并为分解和建模等后续分析步骤提供了依据。