Time-Series Forecasting of Words in a Novel with an ARIMA Model - S-Linguistics

Posted by: Sho on 11/06/2024 09/06/2024 | BLOG | Japanese

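Time-series forecasting of word occurrences in Alice in Wonderland

For the past few days I have been analyzing Alice in Wonderland.

This time I used ARIMA, a time-series forecasting model, to build a model that predicts occurrences of the word "the".

The code I ran and its results are shown below. (It assumes you are continuing from the code in the previous post.)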
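The previous post's code is not reproduced here, so the following is only a minimal sketch of how the per-sentence counts behind sentence_df might be produced. The function name count_word_per_sentence, the regex-based sentence split, and the sample text are all assumptions for illustration, not the original code; only the column names 'Index' and 'Word_Count' come from the code below.

```python
import re

def count_word_per_sentence(text, word="the"):
    """Return one row per sentence with the number of times `word` occurs in it."""
    # Naive sentence split on ., ! or ? followed by whitespace (an assumption;
    # the previous post may have used a different tokenizer).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Whole-word, case-insensitive match so "The" counts but "then" does not.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [
        {"Index": i, "Word_Count": len(pattern.findall(sentence))}
        for i, sentence in enumerate(sentences)
    ]

# Hypothetical sample text, not taken from the novel.
sample = "The cat saw the dog. Then it ran! Was the dog there?"
rows = count_word_per_sentence(sample)
print(rows)
# With pandas available, pd.DataFrame(rows) gives a frame with the
# 'Index' and 'Word_Count' columns that fit_predict_arima expects.
```

A list of plain dictionaries is used so that the sketch runs without any third-party packages; converting it to a DataFrame is a single call.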

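# numpy, pandas and matplotlib are used below; they may already be imported by the previous post's code
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA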

# Fit an ARIMA model, then plot the fitted values and the forecast
def fit_predict_arima(df, order, forecast_steps):
    model = ARIMA(df['Word_Count'], order=order)
    results = model.fit()

    # Plot the fitted values against the original series
    plt.figure(figsize=(12, 6))
    plt.plot(df['Index'], df['Word_Count'], label='Original')
    plt.plot(df['Index'], results.fittedvalues, color='red', label='Fitted')
    plt.title('ARIMA Model Fit')
    plt.xlabel('Sentence Index')
    plt.ylabel('Word Count')
    plt.legend()
    plt.show()

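    # Forecast the next `forecast_steps` points and plot them after the original series
    forecast = results.get_forecast(steps=forecast_steps)
    forecast_index = np.arange(len(df), len(df) + forecast_steps)
    # Use the raw values so pandas does not try to align the forecast's own index
    forecast_series = pd.Series(np.asarray(forecast.predicted_mean), index=forecast_index)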

    plt.figure(figsize=(12, 6))
    plt.plot(df['Index'], df['Word_Count'], label='Original')
    plt.plot(forecast_series.index, forecast_series, color='green', label='Forecast')
    plt.title('ARIMA Model Forecast')
    plt.xlabel('Sentence Index')
    plt.ylabel('Word Count')
    plt.legend()
    plt.show()

# Fit an ARIMA(1, 1, 1) model and forecast the next 10 sentences
fit_predict_arima(sentence_df, order=(1, 1, 1), forecast_steps=10)
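
For reference, the order (1, 1, 1) passed above means one autoregressive term, one round of differencing, and one moving-average term. Writing $y_t$ for the count of "the" in sentence $t$ and $\varepsilon_t$ for white noise (symbols introduced here for illustration, not taken from the original post), the model can be sketched as:

```latex
\begin{aligned}
y'_t &= y_t - y_{t-1}, \\
y'_t &= \phi_1\, y'_{t-1} + \varepsilon_t + \theta_1\, \varepsilon_{t-1}.
\end{aligned}
```

As far as I know, statsmodels adds no constant or trend term by default when the differencing order is 1, so a multi-step forecast from a model like this tends to level off toward a constant value quickly.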

References

https://www.gutenberg.org/files/11/11-h/11-h.html
