Analysis Code Roundup: Frequency and Position of Deictic Words - S-Linguistics


Posted by: Sho on 04/07/2024 25/06/2024 BLOG Japanese

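Collecting my analysis code: the deictic word frequency and position edition.

Continuing from the previous post, I am collecting my analysis code here.

This time the analysis focuses on how often deictic words occur in a text and where in the text they appear.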

import matplotlib.pyplot as plt
import nltk

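# Tokenizer data needed by nltk.word_tokenize.
# Newer NLTK releases look for 'punkt_tab' instead of 'punkt', so both are requested.
nltk.download('punkt')
nltk.download('punkt_tab')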

def plot_deictic_word_positions(text, deictic_words):
    """
    Plot the frequency and positions of deictic words in a text.
    """
    tokens = nltk.word_tokenize(text)
    positions = {word: [] for word in deictic_words}

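    for i, token in enumerate(tokens):
        # Lowercase each token so sentence-initial forms such as "This" are matched too
        token = token.lower()
        if token in deictic_words:
            positions[token].append(i)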

    plt.figure(figsize=(12, 6))
    for word, pos in positions.items():
        plt.plot(pos, [word]*len(pos), '|', markersize=10, label=word)

    plt.xlabel('Word Index')
    plt.ylabel('Deictic Words')
    plt.title('Deictic Word Positions in Text')
    plt.legend()
    plt.show()

# List of deictic words
deictic_words = ['this', 'that', 'these', 'those', 'here', 'there']

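# Plot the frequency and positions of the deictic words
# (processed_text is not defined in this snippet; it is assumed to already hold the text as a string)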
plot_deictic_word_positions(processed_text, deictic_words)
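The plot shows where each deictic word occurs, but it does not report the counts themselves. As a rough sketch, the frequencies could be tabulated alongside the positions like this. The function name `deictic_word_stats`, the sample sentence, and the simple regex tokenizer are assumptions made for this sketch. The tokenizer is a stand-in for `nltk.word_tokenize` so that the snippet runs without any downloads. None of these are part of the original analysis.

```python
import re


def deictic_word_stats(text, deictic_words):
    """Return raw counts, token positions, and per-1,000-token rates.

    Hypothetical helper for illustration; uses a simplified tokenizer.
    """
    # Simplified tokenization (assumption): lowercase runs of letters/apostrophes
    tokens = re.findall(r"[a-z']+", text.lower())
    positions = {word: [] for word in deictic_words}

    for i, token in enumerate(tokens):
        if token in positions:
            positions[token].append(i)

    # Raw frequency of each deictic word
    counts = {word: len(pos) for word, pos in positions.items()}

    # Frequency normalized per 1,000 tokens, so texts of different lengths compare
    n_tokens = len(tokens)
    per_thousand = {
        word: (1000 * count / n_tokens if n_tokens else 0.0)
        for word, count in counts.items()
    }
    return counts, positions, per_thousand


if __name__ == "__main__":
    sample = "This is here. That was there, and those are not these. This again."
    words = ['this', 'that', 'these', 'those', 'here', 'there']
    counts, positions, per_thousand = deictic_word_stats(sample, words)
    for word in words:
        print(word, counts[word], positions[word], round(per_thousand[word], 1))
```

Normalizing per 1,000 tokens is only one possible choice. Raw counts may be enough when all texts are of similar length.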
