Python word cloud: words with a LOWER frequency should be drawn LARGER. This is my teacher's requirement. Could anyone help? Thanks, it's urgent!!!
import jieba
from collections import Counter

with open('article_title', 'r', encoding='utf-8') as file:
    duanzi = file.read()

# Punctuation to strip before segmentation (the backslash is escaped)
sep = '''-/.。""'',!?;:~`·[] \\ ,:;“”?!-、}{【】‘’'''
# Stop words and stray characters to leave out of the cloud
exclude = {' ', '\ue412', '\x01', '我', '了', '的', '你', '来', '我们', '被', '……', '…'}
for char in sep:
    duanzi = duanzi.replace(char, '')

duanziList = list(jieba.cut(duanzi))  # word segmentation
# Count the segmented tokens; str.count() on the raw text would also
# match a word inside longer words and inflate its count
duanziDict = Counter(w for w in duanziList if w.strip() and w not in exclude)
# Ascending by frequency (the order does not affect the cloud)
dictList = sorted(duanziDict.items(), key=lambda x: x[1])

duanziciyun = {}
with open('count.txt', 'w', encoding='utf-8') as f:
    for word, count in dictList:
        print(word, count)
        f.write(word + ':' + str(count) + '\n')
        duanziciyun[word] = count

# Generate the word cloud
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
from wordcloud import WordCloud, ImageColorGenerator

font = r'zhongwen.ttf'  # a font file that contains Chinese glyphs
graph = np.array(Image.open('3.jpg'))  # mask: pure white areas stay empty
wc = WordCloud(font_path=font, background_color='white', max_words=5000, mask=graph)
# Font size follows the weight, so here the MOST frequent words come out largest
wc.generate_from_frequencies(duanziciyun)
# Built but not applied; wc.recolor(color_func=image_color) would use it
image_color = ImageColorGenerator(graph)
plt.imshow(wc)
plt.axis("off")
plt.show()
wc.to_file(r'new.png')
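The code above passes raw counts to `generate_from_frequencies()`, so WordCloud draws the most frequent words largest. One way to meet the requirement (a minimal sketch, not necessarily what the teacher has in mind) is to invert the counts before handing them over. The helper `invert_frequencies` is hypothetical; it relies only on WordCloud treating the dict values as relative weights and giving the heaviest word the biggest font.

```python
def invert_frequencies(counts):
    """Hypothetical helper: return {word: 1/count} so rare words weigh the most.

    WordCloud.generate_from_frequencies() normalizes every weight by the
    largest one and gives the heaviest word the biggest font, so words that
    occur once end up at weight 1.0 and frequent words shrink.
    """
    # Skip zero counts to avoid division by zero
    return {word: 1.0 / c for word, c in counts.items() if c > 0}


if __name__ == "__main__":
    counts = {"python": 8, "词云": 4, "老师": 1}
    # Print heaviest (largest in the cloud) first
    for word, weight in sorted(invert_frequencies(counts).items(),
                               key=lambda kv: kv[1], reverse=True):
        print(word, weight)
    # 老师 1.0
    # 词云 0.25
    # python 0.125
```

In the question's script this would mean calling `wc.generate_from_frequencies(invert_frequencies(duanziciyun))` instead of `wc.generate_from_frequencies(duanziciyun)`. `1/count` shrinks frequent words sharply; a gentler linear flip such as `max_count - count + 1` also reverses the order. In a typical Chinese text many words occur only once and will all tie for the largest size; WordCloud's `relative_scaling` (0 to 1) controls how closely font size tracks the weight ratio, and `max_words` limits how many words compete for space.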
4 answers
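First you need a reference point; otherwise you have nothing to compare against when deciding how big to draw a word. Say a word has frequency 0.5 and you give it font size 10. Then compare every other word's frequency with it and use the resulting ratio as the weight for making its font larger or smaller. Tune this weight carefully, or the big words will come out huge and the small ones tiny. Another approach: first split the words into 10 groups by frequency, with a large jump in font size between groups, and within each group assign font sizes according to frequency.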
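The two approaches in this answer can be sketched as follows. Both functions (`ratio_sizes`, `bucket_sizes`) and all their parameters (`base_freq`, `base_size`, `damping`, `max_size`, `groups`, `min_size`, `group_step`, `invert`) are hypothetical. `invert=True` flips the mapping so rarer words get bigger, as the question asks; `invert=False` follows the answer literally. `damping` is one possible form of the "weight" the answer warns about: raising the ratio to a power below 1 compresses the extremes.

```python
def ratio_sizes(freqs, base_freq=0.5, base_size=10, damping=0.5,
                max_size=100, invert=True):
    """Method 1 (hypothetical): size each word relative to a reference frequency.

    A word whose frequency equals base_freq gets base_size. Every other word is
    scaled by its ratio to base_freq (inverted when invert=True, so rarer words
    grow). Raising the ratio to damping < 1 compresses the extremes.
    """
    sizes = {}
    for word, f in freqs.items():
        if f <= 0:  # no meaningful ratio for a zero frequency
            continue
        ratio = base_freq / f if invert else f / base_freq
        size = round(base_size * ratio ** damping)
        sizes[word] = min(max_size, max(1, size))  # clamp to a sane range
    return sizes


def bucket_sizes(freqs, groups=10, min_size=10, group_step=8, invert=True):
    """Method 2 (hypothetical): split words into `groups` bands by frequency.

    Consecutive bands start group_step sizes apart; inside a band the size is
    interpolated from the word's frequency between the band's lowest and
    highest frequency.
    """
    # Order words from "should be smallest" to "should be largest".
    # invert=True puts the most frequent words first, so the rarest words
    # end up in the top band.
    ordered = sorted(freqs, key=freqs.get, reverse=invert)
    n = len(ordered)
    sizes = {}
    for g in range(groups):
        band = ordered[g * n // groups:(g + 1) * n // groups]
        if not band:  # happens when there are fewer words than groups
            continue
        lo = min(freqs[w] for w in band)
        hi = max(freqs[w] for w in band)
        for w in band:
            # 0.0 .. 1.0: where this word's frequency sits inside the band
            t = 0.0 if hi == lo else (freqs[w] - lo) / (hi - lo)
            if invert:
                t = 1.0 - t  # rarer word -> larger, inside the band too
            sizes[w] = min_size + g * group_step + round(t * (group_step - 1))
    return sizes


if __name__ == "__main__":
    rel = {"python": 2.0, "词云": 0.5, "老师": 0.125}
    print(ratio_sizes(rel))  # {'python': 5, '词云': 10, '老师': 20}
    counts = {"w%d" % i: i + 1 for i in range(20)}
    print(bucket_sizes(counts))
```

WordCloud does not take explicit point sizes, but either dict can be passed to `generate_from_frequencies()` as weights. The library still normalizes them and shrinks fonts when the canvas fills up, so treat the numbers as relative sizes. In `bucket_sizes`, words with equal frequency that straddle a band boundary can end up with different sizes.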
2018-12-22
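By ordinary reasoning this should be doable, but Python is "smart" and doesn't always work the way common sense suggests. I think your focus is a bit off: the teacher isn't trying to make things hard for students but to help them improve. You could ask your teacher about the intended approach. Wishing you further progress.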
2018-12-22
I think the first reply makes a good point; the teacher's emphasis is a bit off.
You could ask your teacher how they would solve it.