Analysis Code Roundup: Part-of-Speech Ratios - S-Linguistics

Posted by: Sho on 03/07/2024 24/06/2024 BLOG Japanese

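I am collecting my analysis code in one place. This is the part-of-speech ratio edition.

Continuing from the previous post, here is another piece of code.

This time the analysis focuses on the ratio of parts of speech within a text.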

import nltk
import matplotlib.pyplot as plt
from collections import Counter

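# Download the required NLTK data (only needed on the first run).
# Note: newer NLTK releases (3.9+) may ask for 'punkt_tab' and
# 'averaged_perceptron_tagger_eng' instead of the names below.
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')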

def plot_pos_ratios(text):
    """
    Plot the ratio of part-of-speech (POS) tags in a text as a pie chart.
    """
    # Tokenize the text.
    tokens = nltk.word_tokenize(text)

    # Tag each token with its part of speech.
    pos_tags = nltk.pos_tag(tokens)

    # Count how often each POS tag occurs.
    pos_counts = Counter(tag for _, tag in pos_tags)

    # Convert the counts to ratios (each tag's share of all tokens).
    total_count = sum(pos_counts.values())
    pos_ratios = {tag: count / total_count for tag, count in pos_counts.items()}

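    # Plot the pie chart.
    # Convert the dict views to lists so that pie() receives plain sequences.
    labels = list(pos_ratios.keys())
    sizes = list(pos_ratios.values())
    # Paired has only 12 colors, so cycle through them when there are more tags.
    colors = plt.cm.Paired([i % plt.cm.Paired.N for i in range(len(labels))])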

    plt.figure(figsize=(12, 8))
    plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%', startangle=140)
    plt.axis('equal')
    plt.title('POS Tag Ratios')
    plt.show()

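# Plot the POS ratios.
# processed_text is not defined in this post; it is assumed to be the
# preprocessed text string prepared in the previous post.
plot_pos_ratios(processed_text)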
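
With the full Penn Treebank tagset, the pie chart can end up with many thin slices that are hard to read. Here is a minimal sketch of one way to deal with that: compute the ratios as plain numbers and merge rare tags into a single slice. The helper name `pos_ratios`, the `min_share` parameter, and the `OTHER` label are my own assumptions for illustration and are not part of NLTK. The sample is hand-tagged so the snippet runs without any NLTK downloads; in practice you would pass in the output of `nltk.pos_tag(tokens)`.

```python
from collections import Counter


def pos_ratios(tagged_tokens, min_share=0.0):
    """Return {tag: share} for (word, tag) pairs such as nltk.pos_tag output.

    Tags whose share is below min_share are merged into one 'OTHER' entry.
    (Hypothetical helper for illustration, not an NLTK function.)
    """
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    if total == 0:
        return {}  # empty input: nothing to report or plot

    ratios = {}
    other_count = 0
    # most_common() yields tags from most to least frequent
    for tag, n in counts.most_common():
        if n / total < min_share:
            other_count += n  # too rare: fold into OTHER
        else:
            ratios[tag] = n / total
    if other_count:
        ratios["OTHER"] = other_count / total
    return ratios


# Hand-tagged sample standing in for nltk.pos_tag output (an assumption)
sample = [
    ("The", "DT"), ("cat", "NN"), ("sat", "VBD"), ("on", "IN"),
    ("the", "DT"), ("mat", "NN"), (".", "."),
]

ratios = pos_ratios(sample, min_share=0.2)
for tag, share in ratios.items():
    print(f"{tag:<6}{share:.1%}")
```

The resulting dictionary can be handed to `plt.pie` in the same way as in `plot_pos_ratios`, using `list(ratios.values())` for the sizes and `list(ratios.keys())` for the labels. The threshold of 0.2 is only there to make the tiny sample interesting; for a real text a much smaller value such as 0.02 is probably a better starting point.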
