Sentiment analysis in Python with snownlp, using jieba for word segmentation

Updated: 2023-01-30 09:19:13  Author: Sir 老王

This article shows how to combine jieba word segmentation with snownlp to perform sentiment analysis on Chinese text.

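Sentiment analysis is a computer science and technology term that was officially published in 2018.

It determines from the content of a text whether the meaning it expresses is positive or negative, and it can also be used to tell whether a text is commendatory or derogatory in tone.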

A typical application is analyzing large volumes of e-commerce reviews, for example to compute the share of positive or negative reviews.

The sentiment analysis module used here is snownlp. To improve accuracy, the text is first segmented into words with the jieba module.

Neither module is part of the Python standard library, so install both with pip:

pip install jieba

pip install snownlp

jieba is a powerful Chinese word segmentation library that covers most segmentation needs; here it supports snownlp's sentiment analysis.

# Import the jieba module under the alias ja.
import jieba as ja

# Import the SnowNLP class from the snownlp package.
from snownlp import SnowNLP

To help you avoid version conflicts, here is the Python interpreter version used for this article.

Python interpreter version: 3.6.8

Next, create the source text to analyze. The goal is to determine whether it expresses a positive or a negative emotion.

# The sample product review to analyze ("This is really easy to use, I love it, I'll definitely buy it again!").
analysis_text = '这个实在是太好用了，我非常的喜欢，下次一定还会购买的！'

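With the source sentence defined, the next step is word segmentation. It is done separately because snownlp's own Chinese segmentation is not very accurate.

For example, if snownlp segments '不好看' ("not good-looking") directly, it will most likely split it into '不' ("not") and '好看' ("good-looking").

A word that clearly carries negative sentiment could then be classified as positive, which is why jieba is used to segment the text first.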
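The point about '不好看' can be illustrated with a toy dictionary-based segmenter. The sketch below uses greedy forward maximum matching with two hypothetical vocabularies; it is not jieba's actual algorithm (jieba builds a DAG of dictionary words, picks the most probable path, and uses an HMM for unknown words), and it makes no claim about what jieba or snownlp output for this phrase.

```python
# Toy illustration only: greedy forward maximum matching.
# fmm_segment and both vocabularies are hypothetical, not part of jieba or snownlp.

def fmm_segment(text, vocab, max_len=4):
    """At each position, take the longest vocabulary word; fall back to one character."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens

small_vocab = {"好看", "衣服"}
full_vocab = small_vocab | {"不好看"}

sentence = "衣服不好看"
print(fmm_segment(sentence, small_vocab))  # ['衣服', '不', '好看']
print(fmm_segment(sentence, full_vocab))   # ['衣服', '不好看']
```

When the dictionary lacks '不好看', the negation is split off and '好看' survives as a standalone positive word, which is the failure mode described above.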

# Using the jieba module to cut the analysis_text into a list of words.
analysis_list = list(ja.cut(analysis_text))

# Printing the list of words that were cut from the analysis_text.
print(analysis_list)

# ['这个', '实在', '是', '太', '好', '用', '了', '，', '我', '非常', '的', '喜欢', '，', '下次', '一定', '还会', '购买', '的', '！']

Judging by the output above, the segmentation is fairly fine-grained: no token is longer than two characters.
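The "at most two characters" observation can be checked directly. This small sketch reuses the token list printed above and counts tokens by length with the standard library's `collections.Counter`.

```python
from collections import Counter

# Token list copied from the ja.cut() output printed above.
tokens = ['这个', '实在', '是', '太', '好', '用', '了', '，', '我', '非常', '的',
          '喜欢', '，', '下次', '一定', '还会', '购买', '的', '！']

# Longest token, measured in characters.
longest = max(len(t) for t in tokens)
print(longest)  # 2

# How many tokens have each length, as (length, count) pairs.
length_counts = sorted(Counter(len(t) for t in tokens).items())
print(length_counts)  # [(1, 11), (2, 8)]
```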

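jieba's cut() function has now split the text into words. The next step is to extract the main keywords.

In sentiment analysis, adjectives are usually extracted as keywords, because adjectives best convey the emotion a text expresses.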

# Importing the `posseg` module from the `jieba` module and renaming it to `seg`.
import jieba.posseg as seg

# This is a list comprehension that is creating a list of tuples. Each tuple contains the word and the flag.
analysis_words = [(word.word, word.flag) for word in seg.cut(analysis_text)]

# Printing the list of tuples that were created in the list comprehension.
print(analysis_words)

# [('这个', 'r'), ('实在', 'v'), ('是', 'v'), ('太', 'd'), ('好用', 'v'), ('了', 'ul'), ('，', 'x'), ('我', 'r'), ('非常', 'd'), ('的', 'uj'), ('喜欢', 'v'), ('，', 'x'), ('下次', 't'), ('一定', 'd'), ('还', 'd'), ('会', 'v'), ('购买', 'v'), ('的', 'uj'), ('！', 'x')]

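The list comprehension above extracts each segmented word together with its part-of-speech (POS) tag.

jieba labels each word with a POS tag. For example, the tag a denotes an adjective, d an adverb and v a verb. The filter below keeps all three tags, not only adjectives.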
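A partial lookup table makes the tags in the output above easier to read. The descriptions below follow the ICTCLAS-compatible tag set that jieba documents, but only the tags appearing in this article (plus a and n) are listed; the `POS_TAGS` dict and `describe()` helper are hypothetical names for illustration.

```python
# Partial, hand-written table of jieba POS tags (not the complete tag set).
POS_TAGS = {
    'a': 'adjective',
    'd': 'adverb',
    'v': 'verb',
    'n': 'noun',
    'r': 'pronoun',
    't': 'time word',
    'uj': 'structural particle (的)',
    'ul': 'aspect particle (了)',
    'x': 'non-morpheme character (punctuation in the output above)',
}

def describe(tagged):
    """Attach a readable description to each (word, flag) pair."""
    return [(word, flag, POS_TAGS.get(flag, 'unknown')) for word, flag in tagged]

sample = [('这个', 'r'), ('好用', 'v'), ('了', 'ul'), ('！', 'x')]
for row in describe(sample):
    print(row)
```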

# Keep only the (word, flag) tuples tagged as adjective (a), adverb (d) or verb (v).
keywords = [x for x in analysis_words if x[1] in ['a', 'd', 'v']]

# Printing the list of tuples that were created in the list comprehension.
print(keywords)

# [('实在', 'v'), ('是', 'v'), ('太', 'd'), ('好用', 'v'), ('非常', 'd'), ('喜欢', 'v'), ('一定', 'd'), ('还', 'd'), ('会', 'v'), ('购买', 'v')]

After filtering by POS tag, the tags can be dropped so that only the words themselves remain.

# This is a list comprehension that is creating a list of words.
keywords = [x[0] for x in keywords]

# Printing the list of keywords that were created in the list comprehension.
print(keywords)

# ['实在', '是', '太', '好用', '非常', '喜欢', '一定', '还', '会', '购买']
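The two list comprehensions (filter by tag, then drop the tag) can be folded into one helper. This is a sketch using the tagged output printed earlier as input; `extract_keywords` is a hypothetical name. It also shows that an adjectives-only filter returns nothing for this sentence, since no word in the output above is tagged a.

```python
# Tagged output copied from the jieba.posseg example above.
analysis_words = [('这个', 'r'), ('实在', 'v'), ('是', 'v'), ('太', 'd'), ('好用', 'v'),
                  ('了', 'ul'), ('，', 'x'), ('我', 'r'), ('非常', 'd'), ('的', 'uj'),
                  ('喜欢', 'v'), ('，', 'x'), ('下次', 't'), ('一定', 'd'), ('还', 'd'),
                  ('会', 'v'), ('购买', 'v'), ('的', 'uj'), ('！', 'x')]

def extract_keywords(tagged, keep=('a', 'd', 'v')):
    """Keep (word, flag) pairs whose flag is in `keep` and return just the words."""
    return [word for word, flag in tagged if flag in keep]

print(extract_keywords(analysis_words))
# ['实在', '是', '太', '好用', '非常', '喜欢', '一定', '还', '会', '购买']

# Adjectives only: empty for this sentence.
print(extract_keywords(analysis_words, keep=('a',)))  # []
```

The exact-match test `flag in keep` ignores compound tags such as an or ad; a prefix check like `flag.startswith('a')` would include them, if that is what you want.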

At this point the segmentation work is done. Next, snownlp is used to compute the sentiment of each keyword.

# Creating a variable called `pos_num` and assigning it the value of 0.
pos_num = 0

# Creating a variable called `neg_num` and assigning it the value of 0.
neg_num = 0

# Loop over every keyword and score it with snownlp.
for word in keywords:
    # Wrap the single word in a SnowNLP object.
    sl = SnowNLP(word)
    # sentiments is the probability that the text is positive (0 to 1).
    if sl.sentiments > 0.5:
        # Count the word as positive.
        pos_num += 1
    else:
        # Count the word as negative (a score of exactly 0.5 also lands here).
        neg_num += 1
    # Print the word and its sentiment score.
    print(word, str(sl.sentiments))

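Below are the sentiment scores of the keywords extracted from the original text. Scores range from 0 to 1, and the closer a score is to 1, the more positive the sentiment.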

# 实在 0.3047790802524796
# 是 0.5262327818078083
# 太 0.34387502381406
# 好用 0.6558628208940429
# 非常 0.5262327818078083
# 喜欢 0.6994590939824207
# 一定 0.5262327818078083
# 还 0.5746682977321914
# 会 0.5539033457249072
# 购买 0.6502590673575129
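The positive/negative split depends entirely on the 0.5 threshold used in the loop. This sketch reapplies the same rule to the scores printed above; `split_by_threshold` is a hypothetical helper, and the 0.53 threshold is an arbitrary value chosen only to show sensitivity. Three words (是, 非常, 一定) share the same score just above 0.5, so a slightly higher cut-off changes the counts noticeably.

```python
# (word, score) pairs copied from the snownlp output printed above.
scored = [
    ('实在', 0.3047790802524796), ('是', 0.5262327818078083),
    ('太', 0.34387502381406), ('好用', 0.6558628208940429),
    ('非常', 0.5262327818078083), ('喜欢', 0.6994590939824207),
    ('一定', 0.5262327818078083), ('还', 0.5746682977321914),
    ('会', 0.5539033457249072), ('购买', 0.6502590673575129),
]

def split_by_threshold(pairs, threshold=0.5):
    """Same rule as the loop: strictly above threshold is positive, otherwise negative."""
    positive = [word for word, s in pairs if s > threshold]
    negative = [word for word, s in pairs if s <= threshold]
    return positive, negative

pos, neg = split_by_threshold(scored)
print(len(pos), len(neg))  # 8 2
print(neg)                 # ['实在', '太']

# Hypothetical stricter threshold: the three 0.5262... words become negative.
pos_strict, neg_strict = split_by_threshold(scored, threshold=0.53)
print(len(pos_strict), len(neg_strict))  # 5 5
```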

To make the result match intuition more closely, we can also count the positive and negative keywords and compute an overall ratio.

# This is a string that is using the `format` method to insert the value of `pos_num` into the string.
print('正面情绪关键词数量：{}'.format(pos_num))

# This is a string that is using the `format` method to insert the value of `neg_num` into the string.
print('负面情绪关键词数量：{}'.format(neg_num))

# This is a string that is using the `format` method to insert the value of `pos_num` divided by the value of `pos_num`
# plus the value of `neg_num` into the string.
print('正面情绪所占比例：{}'.format(pos_num/(pos_num + neg_num)))

# 正面情绪关键词数量：8
# 负面情绪关键词数量：2
# 正面情绪所占比例：0.8
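The whole procedure can be wrapped in a single function. This is a minimal sketch, assuming the tagger and the scorer are passed in as callables so the logic can run without jieba or snownlp installed; `keyword_sentiment_ratio`, `fake_tag` and `fake_score` are hypothetical names, and the demo scores are made up. It also returns None when no keyword survives the POS filter, a case in which the original `pos_num/(pos_num + neg_num)` line would raise ZeroDivisionError.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# A tagger returns (word, POS flag) pairs, like the jieba.posseg comprehension above.
Tagger = Callable[[str], List[Tuple[str, str]]]
# A scorer returns a positive-sentiment probability in [0, 1] for one word.
Scorer = Callable[[str], float]


def keyword_sentiment_ratio(text: str, tag: Tagger, score: Scorer,
                            keep: Iterable[str] = ('a', 'd', 'v'),
                            threshold: float = 0.5) -> Optional[float]:
    """Share of kept keywords scored above `threshold`; None if no keyword is kept."""
    keep_set = set(keep)
    keywords = [word for word, flag in tag(text) if flag in keep_set]
    if not keywords:
        # Avoid dividing by zero when the POS filter removes every word.
        return None
    positive = sum(1 for word in keywords if score(word) > threshold)
    return positive / len(keywords)


# Offline demo with hypothetical stand-ins (no third-party library needed).
def fake_tag(text):
    return [('喜欢', 'v'), ('的', 'uj'), ('太', 'd')]

def fake_score(word):
    return {'喜欢': 0.7, '太': 0.3}[word]

print(keyword_sentiment_ratio('示例文本', fake_tag, fake_score))                # 0.5
print(keyword_sentiment_ratio('示例文本', fake_tag, fake_score, keep=('a',)))  # None

# With jieba and snownlp installed, the wiring used in this article would be:
#   import jieba.posseg as seg
#   from snownlp import SnowNLP
#   keyword_sentiment_ratio(
#       analysis_text,
#       tag=lambda t: [(w.word, w.flag) for w in seg.cut(t)],
#       score=lambda w: SnowNLP(w).sentiments,
#   )
```

Passing the tagger and scorer in keeps the counting logic independent of the libraries, which makes it easy to swap in a different segmenter or sentiment model.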
