Trying out stanza, part 2
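In a previous post I summarized my first attempt at using stanza.
This time, as a follow-up, I want to use stanza for part-of-speech tagging and for extracting per-word information.
The code is below.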
import stanza

stanza.download('en')  # fetch the English models (only needed once)
nlp = stanza.Pipeline('en')  # the default pipeline includes tokenize, mwt, pos, lemma and depparse, among others

text = "Artificial intelligence is transforming the world."
doc = nlp(text)

for sentence in doc.sentences:
    for word in sentence.words:
        print(f"Word: {word.text}")
        print(f" Lemma: {word.lemma}")  # base form
        print(f" POS: {word.upos}")  # Universal POS tag (noun, verb, etc.)
        print(f" XPOS: {word.xpos}")  # language-specific, fine-grained POS tag
Example output
Word: Artificial
 Lemma: artificial
 POS: ADJ
 XPOS: JJ
Word: intelligence
 Lemma: intelligence
 POS: NOUN
 XPOS: NN
Word: is
 Lemma: be
 POS: AUX
 XPOS: VBZ
Word: transforming
 Lemma: transform
 POS: VERB
 XPOS: VBG
Word: the
 Lemma: the
 POS: DET
 XPOS: DT
Word: world
 Lemma: world
 POS: NOUN
 XPOS: NN
Word: .
 Lemma: .
 POS: PUNCT
 XPOS: .
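
Printing every field is fine for a quick look, but for later processing it can be handier to collect the word information into a data structure. The following is only a rough sketch of one way to do that: the helper name `group_lemmas_by_upos` is my own and not part of stanza, and it assumes nothing more than the `doc.sentences`, `sentence.words`, `word.lemma` and `word.upos` attributes already used in the code above.

```python
from collections import defaultdict


def group_lemmas_by_upos(doc):
    """Map each UPOS tag to the lemmas carrying it, in order of appearance.

    `doc` only needs `.sentences`, each with `.words` exposing `.lemma`
    and `.upos`, i.e. the same attributes read in the loop above.
    (Hypothetical helper for illustration, not a stanza function.)
    """
    groups = defaultdict(list)
    for sentence in doc.sentences:
        for word in sentence.words:
            groups[word.upos].append(word.lemma)
    return dict(groups)


# Usage with the pipeline built earlier:
#   doc = nlp(text)
#   for upos, lemmas in group_lemmas_by_upos(doc).items():
#       print(upos, lemmas)
```

For the example sentence, and given the tags in the output shown above, the NOUN entry would hold `intelligence` and `world`, and the AUX entry would hold `be`.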