How do I remove stop words when segmenting text with Python jieba?

Reply 1: 你還好嗎

import jieba


# Build the stop word list (one stop word per line in the file)
def stopwordslist(filepath):
    with open(filepath, 'r', encoding='utf-8') as f:
        stopwords = [line.strip() for line in f]
    return stopwords


# Segment a sentence and drop the stop words
def seg_sentence(sentence):
    sentence_seged = jieba.cut(sentence.strip())
    stopwords = stopwordslist('./test/stopwords.txt')  # path of the stop word file
    outstr = ''
    for word in sentence_seged:
        if word not in stopwords:
            if word != '\t':
                outstr += word
                outstr += " "
    return outstr


inputs = open('./test/input.txt', 'r', encoding='utf-8')
# Write the output as UTF-8 too, so Chinese text does not depend on the platform default encoding
outputs = open('./test/output.txt', 'w', encoding='utf-8')
for line in inputs:
    line_seg = seg_sentence(line)  # the return value is a string
    outputs.write(line_seg + '\n')
outputs.close()
inputs.close()
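The code in this reply re-reads the stop word file for every input line and looks words up in a list. As a rough sketch of a variation (not the poster's code), the stop words could be loaded once into a set and the filtering kept in a small helper. The names `load_stopwords`, `remove_stopwords` and `segment` are made up for this illustration; `jieba.cut` is the only jieba call used, and it is imported inside `segment` so the two helpers can be tried without jieba installed.

```python
def load_stopwords(lines):
    """Build a stop word set from an iterable of lines (for example an open file).

    Hypothetical helper: blank lines are skipped, surrounding whitespace is stripped.
    """
    return {line.strip() for line in lines if line.strip()}


def remove_stopwords(tokens, stopwords):
    """Keep tokens that are neither stop words nor pure whitespace (tabs, spaces)."""
    return [tok for tok in tokens if tok.strip() and tok not in stopwords]


def segment(sentence, stopwords):
    """Segment one sentence with jieba and return the kept tokens joined by spaces."""
    import jieba  # imported here so the helpers above do not require jieba

    return " ".join(remove_stopwords(jieba.cut(sentence.strip()), stopwords))
```

With this layout the stop word file is opened once, for example `with open(path, encoding='utf-8') as f: stopwords = load_stopwords(f)`, and the resulting set is passed to `segment` for every line. Set membership tests take roughly constant time, which matters when the stop word list has thousands of entries.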

Reply 2: Anonymous user

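# -*- coding: utf-8 -*-
# Note: this is Python 2 code; the built-in reload() and sys.setdefaultencoding() are not available in Python 3.
import jieba
import jieba.analyse
import sys
import codecs

reload(sys)
sys.setdefaultencoding('utf-8')

# Read the stop word list using another encoding
#stoplist = codecs.open('../../file/stopword.txt','r',encoding='utf8').readlines()
#stoplist = set(w.strip() for w in stoplist)

# The stop word file is UTF-8 encoded
stoplist = {}.fromkeys([ line.strip() for line in open("../../file/stopword.txt") ])

# The tokens returned by segmentation should be unicode; convert them to UTF-8 first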
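The reply breaks off after its last comment, so the remaining steps are not shown. As a hedged Python 3 sketch of the same idea (not a reconstruction of the poster's code): in Python 3, `jieba.cut` yields `str` tokens, so if the stop word file is decoded while reading, tokens and stop words can be compared directly and no conversion step is needed. The helper names, the temporary file and the hand-written token list below are assumptions for illustration; the token list merely stands in for the output of `jieba.cut`.

```python
import os
import tempfile


def load_stoplist(path, encoding="utf-8"):
    """Read one stop word per line, decoding the file explicitly (hypothetical helper)."""
    with open(path, "r", encoding=encoding) as f:
        return set(line.strip() for line in f if line.strip())


def filter_tokens(tokens, stoplist):
    """Drop stop words and whitespace-only tokens; tokens are plain str in Python 3."""
    return [tok for tok in tokens if tok.strip() and tok not in stoplist]


def demo():
    """Write a tiny stop word file, load it, and filter a sample token list."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write("的\n了\n\n")
        stoplist = load_stoplist(path)
    finally:
        os.remove(path)
    # Stand-in for list(jieba.cut(...)); the real segmentation may differ.
    tokens = ["我", "買", "了", "新", "的", "書"]
    return filter_tokens(tokens, stoplist)
```

If the stop word file is stored in an encoding other than UTF-8, pass that encoding to `load_stoplist` instead of changing the interpreter's default encoding.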
