TextBlob: A Pythonic Text Processing Tool

  • License: MIT
  • Operating systems: Windows, Linux, OS X
  • Programming language: Python
  • Project owner: sloria
  • Date listed: 2017-07-26
Editor rating: 3

Project details

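TextBlob is a Pythonic library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, and translation.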

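Features:

  • Noun phrase extraction

  • Part-of-speech tagging

  • Sentiment analysis

  • Classification

  • Translation and language detection powered by Google Translate

  • Tokenization (splitting text into words and sentences)

  • Word and phrase frequencies

  • Parsing

  • n-grams

  • Word inflection (pluralization and singularization) and lemmatization

  • Spelling correction

  • New models or languages added through extensions

  • WordNet integration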
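
To make the tokenization, frequency, and n-gram items in the list above concrete, here is a minimal plain-Python sketch of what those features compute. It is only an illustration under simplifying assumptions (lowercasing, a regex tokenizer) and is not how TextBlob implements them. The helper names `tokenize`, `word_counts`, and `ngrams` are hypothetical and chosen for this sketch. In TextBlob itself the corresponding entry points are `blob.words`, `blob.word_counts`, and `blob.ngrams(n=...)`.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_counts(tokens):
    """Map each token to the number of times it occurs."""
    return Counter(tokens)


def ngrams(tokens, n=3):
    """Return every run of n consecutive tokens, in order of appearance."""
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]


tokens = tokenize("Now is better than never. Never is a long time.")
print(tokens)                        # the ten word tokens, lowercased
print(word_counts(tokens)["never"])  # 2
print(ngrams(tokens, n=2)[:3])       # the first three bigrams
```

A real tokenizer also has to handle punctuation, contractions, and sentence boundaries, which is the work a library takes off your hands.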

Example:

from textblob import TextBlob

text = '''
The titular threat of The Blob has always struck me as the ultimate movie
monster: an insatiably hungry, amoeba-like mass able to penetrate
virtually any safeguard, capable of--as a doomed doctor chillingly
describes it--"assimilating flesh on contact.
Snide comparisons to gelatin be damned, it's a concept with the most
devastating of potential consequences, not unlike the grey goo scenario
proposed by technological theorists fearful of
artificial intelligence run rampant.
'''

blob = TextBlob(text)
blob.tags           # [('The', 'DT'), ('titular', 'JJ'),
                    #  ('threat', 'NN'), ('of', 'IN'), ...]

blob.noun_phrases   # WordList(['titular threat', 'blob',
                    #            'ultimate movie monster',
                    #            'amoeba-like mass', ...])

for sentence in blob.sentences:
    print(sentence.sentiment.polarity)
# 0.060
# -0.341

# Calls Google Translate over the network; this method was deprecated
# and later removed in newer TextBlob releases.
blob.translate(to="es")  # 'La amenaza titular de The Blob...'

Tags: textblob, text processing
