🐛 Bayes_Project2/Bayes.py: error in bag-of-words splitting #6

Open
Grifcc opened this issue Oct 15, 2020 · 0 comments

Grifcc commented Oct 15, 2020

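🐛 The regular expression does not match the correct words.
♐ The version below works, though I am not sure whether there is a better way to solve this.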

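def textParse(bigString):
    # Split on whitespace, then strip every non-letter character from each piece.
    # This replaces splitting on the regex \W* (zero or more characters that are
    # not letters, digits or underscores, i.e. outside [a-zA-Z0-9_]).
    listOfTokens = []
    for word in bigString.split():
        listOfTokens.append("".join(filter(str.isalpha, word)))
    # Lowercase every token and drop tokens of two characters or fewer
    # (this also drops single letters such as a capital "I").
    return [tok.lower() for tok in listOfTokens if len(tok) > 2]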
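
On the question of a better approach, here is a minimal sketch of a regex-based alternative, offered as a suggestion rather than a confirmed fix. It assumes the failing code called `re.split` with the pattern `\W*`, which the issue does not show. On Python 3.7 and later, `re.split` also splits at empty matches, so a pattern that can match the empty string cuts between every character; `\W+` avoids that because it needs at least one separator character. The names `text_parse_regex`, `text_parse_filter` and `sample` are hypothetical and not part of Bayes.py.

```python
import re


def text_parse_regex(big_string):
    """Hypothetical regex variant: split on runs of one or more non-word characters."""
    # \W+ cannot match the empty string, so letters inside a word stay together.
    tokens = re.split(r"\W+", big_string)
    return [tok.lower() for tok in tokens if len(tok) > 2]


def text_parse_filter(big_string):
    """The whitespace split plus str.isalpha approach from this issue, condensed."""
    tokens = ("".join(filter(str.isalpha, word)) for word in big_string.split())
    return [tok.lower() for tok in tokens if len(tok) > 2]


sample = "Hi Peter, this is a test-mail: call 555-1234 NOW!"

# Pattern that can match the empty string: on Python 3.7+ this splits
# between individual characters instead of between words.
print(re.split(r"\W*", "ab cd"))

print(text_parse_regex(sample))
print(text_parse_filter(sample))
```

The two variants are not equivalent. On `sample`, the regex version keeps digit tokens and splits `test-mail` into `test` and `mail`, while the `str.isalpha` version drops digits and joins the hyphenated word into `testmail`. Which one is preferable depends on whether numbers and hyphenated words should count as features for the classifier.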
Grifcc changed the title from "Bayes_Project2/Bayes.py: error in bag-of-words splitting" to ":BUG: Bayes_Project2/Bayes.py: error in bag-of-words splitting" on Oct 15, 2020
Grifcc changed the title from ":BUG: Bayes_Project2/Bayes.py: error in bag-of-words splitting" to "🐛 Bayes_Project2/Bayes.py: error in bag-of-words splitting" on Oct 15, 2020
Grifcc changed the title from "🐛 Bayes_Project2/Bayes.py: error in bag-of-words splitting" to "🐛 Bayes_Project2/Bayes.py: error in bag-of-words splitting" on Oct 15, 2020