From the day I started working on data visualization, I fell in love with it. I have always liked getting useful insights from data.
Before that, I only knew the basic charts, such as bar charts, scatter plots, and histograms. These basic charts are built into Tableau and Power BI for data visualization. Working on this every day, I came across many new charts, such as radial gauge charts and waffle charts.
So, out of curiosity, I recently searched for all the chart types used in data visualization, and the word cloud caught my attention. I found it very interesting. Until then, whenever I saw a word cloud image, I assumed it was just a random picture with the words arranged at random. But I was wrong, and that is where it all started. Afterwards, I tried building word clouds from a small amount of data in Tableau and Power BI. After those attempts succeeded, I wanted to try it in code, the same way I had written code for bar charts, pie charts, and other charts.
What is a word cloud?
Definition: A word cloud is a powerful visualization technique for text processing. It shows the most frequently used words in bigger, bolder letters and in different colors. The smaller a word appears, the less important it is.
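The idea in the definition, more frequent words drawn larger, can be sketched in a few lines of plain Python. This is only an illustration: the function name word_sizes, the size range, and the linear scaling are assumptions made for this example, and the wordcloud library's own sizing logic is more involved.

```python
from collections import Counter
import re

def word_sizes(text, min_size=10, max_size=60):
    """Map each word to a font size that grows with its frequency."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    # The most frequent word gets max_size; rarer words scale down linearly.
    return {
        word: round(min_size + (max_size - min_size) * count / top)
        for word, count in counts.items()
    }

sizes = word_sizes("data data data model model cloud")
print(sizes)  # {'data': 60, 'model': 43, 'cloud': 27}
```

The word that appears three times gets the largest size, and the word that appears once gets the smallest.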
Uses of the tag cloud
1) Trending tags on social media (Instagram, Twitter): All over the world, people follow the latest trends on social media, so we can extract the tags that people use most in their posts.
2) Hot topics in the media: By analyzing news reports, we can find the keywords in the headlines and extract the top n most in-demand topics, which gives us the result we need: the top n trending media topics.
3) Search terms in e-commerce: On e-commerce shopping sites, the site owner can create a word cloud of the most searched items. That way, the owner knows which goods are in high demand in a given period of time.
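All three uses above come down to counting terms and keeping the top n. A minimal sketch with the standard library, assuming the search queries are already available as a list of strings (the function name top_n_terms and the sample data are made up for illustration):

```python
from collections import Counter

def top_n_terms(queries, n=3):
    """Return the n most frequent terms as (term, count) pairs."""
    # Normalize case and whitespace so "Laptop" and "laptop " count together.
    counts = Counter(q.strip().lower() for q in queries if q.strip())
    return counts.most_common(n)

searches = ["Laptop", "phone", "laptop", "headphones", "phone", "laptop"]
print(top_n_terms(searches, n=2))  # [('laptop', 3), ('phone', 2)]
```

The same counts could then be fed to a word cloud so that the most searched items stand out visually.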
Let's implement a word cloud in Python
First, we need to install all the required libraries for Jupyter Notebook.
In Python, we will install the wordcloud library (it is a third-party package, not part of the standard library). At the Anaconda command prompt, enter the following command:
pip install wordcloud
If your Anaconda environment supports conda, enter:
conda install -c conda-forge wordcloud
Alternatively, this can be done directly in the notebook itself: just add "!" at the beginning of the command.
Like this:
!pip install wordcloud
Here, I'm going to generate a word cloud from the text of a Wikipedia page on any topic. For that, I need the wikipedia library to access the Wikipedia API. It can be installed from the Anaconda command prompt, as shown below:
pip install wikipedia
We also need some other libraries: numpy, matplotlib, and pandas.
At this point, all the libraries we need are installed.
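Before moving on, it can help to confirm that the installs actually landed in the environment the notebook is running in. The following is a small sketch using only the standard library; the helper name missing_packages is an assumption for this example.

```python
import importlib.util

def missing_packages(names):
    """Return the package names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

required = ["wordcloud", "wikipedia", "numpy", "matplotlib", "pandas"]
print("Missing packages:", missing_packages(required))
```

An empty list means every library can be imported; any name that is printed still needs to be installed.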
import wikipedia

# Fetch the page and keep its plain-text content
result = wikipedia.page("MachineLearning")
final_result = result.content
print(final_result)
Output for the Machine learning Wikipedia page:
The image above shows the output obtained by fetching the Machine learning page from Wikipedia. We can also see that the output can be scrolled, which means the entire page was retrieved.
We can also get just a summary of the page with the summary method, for example:
result = wikipedia.summary("MachineLearning", sentences=5)
print(result)
Here the sentences parameter lets us retrieve a specific number of sentences.
Output of 5 sentences:
Let's create the word cloud
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

def plot_cloud(wordcloud):
    # Draw the word cloud image without axes
    plt.figure(figsize=(10, 10))
    plt.imshow(wordcloud)
    plt.axis("off")

# Build the word cloud from the Wikipedia page text
wordcloud = WordCloud(width=500, height=500, background_color='pink', random_state=10).generate(final_result)
plot_cloud(wordcloud)
Stop words are words that carry little meaning on their own, for example 'is', 'are', 'an', 'I', etc.
The wordcloud library comes with a built-in stop word list and automatically removes stop words from the text.
Interestingly, we can add our own stop words in Python with the STOPWORDS.add() function.
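To make the effect of stop words concrete, here is a small standard-library sketch of the filtering idea. The tiny stop list and the function name count_without_stop_words are assumptions for illustration; this is not the wordcloud library's internal code. With wordcloud itself, the analogous step would be to add a word to the STOPWORDS set and pass the set through the stopwords parameter of WordCloud.

```python
import re
from collections import Counter

# A tiny hand-written stop list for illustration only;
# the list shipped with wordcloud is much larger.
stop_words = {"is", "are", "an", "i", "the", "of"}

def count_without_stop_words(text, stop_words):
    """Count words in text, skipping anything in stop_words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in stop_words)

text = "Machine learning is the study of algorithms. Learning is an iterative process."
print(count_without_stop_words(text, stop_words))

# Adding a custom stop word removes it from later counts.
stop_words.add("process")
print(count_without_stop_words(text, stop_words))
```

After the add() call, "process" no longer appears in the counts, which is how an unwanted word would be kept out of the cloud.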
The WordCloud constructor takes the width and height; I set both to 500 and set the background color to pink. If you don't set random_state, the word cloud will look different every time you run the code. It can be set to any int value.
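Why a fixed random state gives the same picture every time can be shown with a toy layout function. This sketch only illustrates seeding; place_words is a hypothetical helper and does not reproduce how wordcloud actually positions words.

```python
import random

def place_words(words, seed=None, canvas=500):
    """Pick a pseudo-random (x, y) position for each word on a square canvas."""
    rng = random.Random(seed)  # seed=None draws a fresh seed on every call
    return {w: (rng.randrange(canvas), rng.randrange(canvas)) for w in words}

words = ["machine", "learning", "data"]

# The same seed always reproduces the same layout.
print(place_words(words, seed=10) == place_words(words, seed=10))  # True
```

Calling place_words without a seed will usually give a different layout on each call, which mirrors the behavior described above when random_state is left unset.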
From the code above, we get this word cloud:
Looking at the image above, we can see that "machine learning" is the most frequently used term. Other frequently used words are model, task, training, and data. So we can conclude that machine learning is the task of training data models.
We can also change the background color here with the background_color parameter and change the font colors with the colormap parameter. The background color can also be given as a hex color code, whereas colormap only accepts specific built-in color map names.
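A hex color code is just the red, green, and blue components written as three two-digit hexadecimal numbers. A minimal sketch of the conversion, with hex_to_rgb as a hypothetical helper name used only for this example:

```python
def hex_to_rgb(code):
    """Convert a '#RRGGBB' hex color code to an (R, G, B) tuple of ints."""
    code = code.lstrip("#")
    if len(code) != 6:
        raise ValueError(f"expected 6 hex digits, got {code!r}")
    # Read two hex digits at a time: red, then green, then blue.
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))

print(hex_to_rgb("#40E0D0"))  # (64, 224, 208)
```

The result has a low red value and high green and blue values, which is why this code renders as a cyan-like shade.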
Let's change the background color to a cyan shade by using a hex code, and change the font colors to the blue-green tones of the "ocean" color map:
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

def plot_cloud(wordcloud):
    # Draw the word cloud image without axes
    plt.figure(figsize=(10, 10))
    plt.imshow(wordcloud)
    plt.axis("off")

# Same cloud, with a hex background color and the "ocean" color map
wordcloud = WordCloud(width=500, height=500, background_color='#40E0D0', colormap="ocean", random_state=10).generate(final_result)
plot_cloud(wordcloud)
Here, I specified ocean. If I pass an invalid color map name, Jupyter throws a ValueError and shows the available color map options, as shown below:
You can also use the PIL library to draw the word cloud in the shape of any image.
In this article, we discussed the word cloud: its definition, its areas of application, and a Python example in Jupyter Notebook.
Link to the original text: https://www.analyticsvidhya.com/blog/2020/10/word-cloud-or-tag-cloud-in-python/