Using Python to analyze 2000 condoms, we can draw these interesting conclusions

Pig brother 66 2020-11-13 07:32:32


So far, our Taobao tutorial series has reached its fourth part. The first three are:

  • Part 1: Python Simulated Login to Taobao, which explains in detail how to log in to the Taobao PC site with the requests library.
  • Part 2: Taobao Auto Login 2.0, which adds cookie serialization and shows how to save your cookies.
  • Part 3: Crawling Taobao Condoms with Python, which shows how to crawl product information from the Taobao PC site.

To view the detailed tutorials above and all of the source code, follow the WeChat official account 「 Naked pigs 」 and reply: TaoBao

Today, let's look at the fourth part of the Taobao series.

In the last article we crawled the Taobao data but did not analyze it. So today's article shows how to analyze that data and draw some useful conclusions!

What advantages does Python have over other languages? Brother Pig thinks its two big strengths are data analysis and artificial intelligence. Demand in both areas will keep growing, so if you want to learn Python but don't know what goal to study toward, consider developing in one of these two directions!

I. Analysis objectives

Before analyzing data, we need to know what we want to analyze, that is, to make our goal clear first. In a company, it might be the financial reports, the change in new users, product popularity, various other reports, and so on.

What are our goals today? Let's see:

  1. Analyze the high-frequency keywords in condom titles
  2. Analyze the relationship between high-frequency title keywords and the number of products
  3. Analyze the relationship between high-frequency title keywords and average sales
  4. Analyze the relationship between high-frequency title keywords and average price
  5. Analyze the distribution of condom prices
  6. Analyze the distribution of condom sales
  7. Analyze the relationship between price range and average sales
  8. Analyze the national distribution of condom merchants
  9. Analyze the average sales of condom merchants by province

Note: all of the analysis above is based on the 2500 Taobao products crawled last time (default sort order). It does not represent all condom products on Taobao!

II. Implementation

With a clear goal, we can start choosing our tools.

First, the libraries for data processing are well established: numpy and pandas are the two essentials, so make sure you have both installed.

Then comes the data visualization library. With so many to choose from, how do you pick one? If you don't know, Brother Pig recommends pyecharts, a visualization library developed in China. Whatever kind of chart you want, you can look it up in the documentation below.

Chinese documentation: https://pyecharts.org/#/zh-cn/intro
Source code: https://github.com/pyecharts/pyecharts

With the tools chosen, we can start the actual analysis code. (The headings below correspond one-to-one to the analysis objectives above.)

0. Data cleaning

Before analyzing the data, we need to clean it, because the data crawled from Taobao is not in a standard form. For example, the sales figure is crawled as a string like "2.5万人付款" (25,000 people paid), and we need to turn it into the integer 25000 so that we can work with it later!

Let's look at the raw data from Taobao and see what needs cleaning.
[Figure: raw data crawled from Taobao]
Based on his experience working with data, Brother Pig sees two columns that need cleaning: 1. convert the sales volume to an integer; 2. reduce the location to the province only. Straight to the cleaning code!
[Figure: data-cleaning code screenshot]
As you can see, at the end Brother Pig writes the result to a new Excel file. The goal is to avoid polluting the original data. Raw data is very important, so in any future data processing try to keep the original intact. Extra backups are never wasted!
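The two cleaning steps described above can be sketched as follows. This is a minimal illustration, not the author's original code: it assumes the raw sales strings look like `2.5万人付款` or `1000+人付款`, and that the location field holds a province followed by a city, separated by a space. The function names `parse_sales` and `parse_province` are hypothetical, and only the standard library is used.

```python
import re


def parse_sales(text):
    """Convert a raw sales string such as '2.5万人付款' to an integer."""
    match = re.search(r"(\d+(?:\.\d+)?)(万?)", text)
    if match is None:
        return 0  # assumption: treat unparseable values as zero sales
    number = float(match.group(1))
    if match.group(2):  # '万' means ten thousand
        number *= 10_000
    return int(round(number))


def parse_province(location):
    """Keep only the first whitespace-separated token of the location."""
    parts = location.split()
    return parts[0] if parts else ""


# Toy rows in the assumed raw format: (sales text, location text).
raw_rows = [("2.5万人付款", "广东 深圳"), ("1000+人付款", "上海")]
cleaned_rows = [(parse_sales(s), parse_province(loc)) for s, loc in raw_rows]
```

With pandas, the same functions could be applied column by column via `Series.apply`, and the cleaned table written to a separate file with `DataFrame.to_excel` so that the raw data stays untouched.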

1. Analyze the high-frequency keywords in condom titles

With the data cleaned, we can start the analysis.

High-frequency keyword analysis follows a standard routine: segment the titles with jieba, count the word frequencies, and finally generate a word cloud. Readers who follow Brother Pig's official account have probably seen this so often that they could write this little function with their eyes closed.
[Figure: word-frequency code screenshot]
It takes only a dozen or so lines of code. Let's look at the result:
[Figure: word cloud of title keywords]
Conclusions:

  1. Overall, merchants favor the two-character word 情趣 ("fun") in product names.
  2. In terms of material, hyaluronic acid appears most often.
  3. In terms of features, dotted and ribbed are the most common.

P.S. Don't ask Brother Pig which word he likes best. If you ask, the answer is "delay".
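The segment-and-count routine described above can be sketched roughly as follows. The helper name `keyword_frequencies` and the `min_length` filter are assumptions made for illustration. The tokenizer is passed in as a parameter so that the example runs without third-party packages; the toy titles are already separated by spaces.

```python
from collections import Counter


def keyword_frequencies(titles, tokenize, min_length=2):
    """Count how often each token appears across all titles."""
    counts = Counter()
    for title in titles:
        for word in tokenize(title):
            word = word.strip()
            # Skip very short tokens (single characters, stray punctuation).
            if len(word) >= min_length:
                counts[word] += 1
    return counts


# Made-up titles, pre-split by spaces so that str.split can stand in
# for a real Chinese word segmenter.
titles = [
    "ultra-thin ribbed condom",
    "dotted ribbed condom",
    "ultra-thin condom 10 pack",
]
frequencies = keyword_frequencies(titles, tokenize=str.split)
top_keywords = frequencies.most_common(20)  # the 20 most frequent tokens
```

With real Chinese titles one would typically pass `jieba.lcut` as the `tokenize` argument, then hand the resulting (word, count) pairs to a word-cloud chart.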

2. Analyze the relationship between high-frequency title keywords and the number of products

The word cloud only gives a rough idea of which features are popular. What if we want concrete numbers?

Let's count how many products contain each of these high-frequency keywords. The explanation of the code comes below the picture, here and in the sections that follow!
[Figure: keyword-count code screenshot]
We take the 20 most frequent keywords, then iterate over all product titles to check whether each one contains a keyword. If it does, that keyword's value is incremented by 1. Let's look at the resulting bar chart!
[Figure: bar chart of product counts per keyword]
Conclusions:

  1. 1150 products contain the word "fun", which is 46% of the total (2500 products).
  2. The top three are: fun, hyaluronic acid, dotted.

P.S. Brother Pig has a question for all the veterans: how is this "wash-free" kind supposed to be used?
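The counting step described above (check every title against each of the top keywords and add 1 on a match) might look like the sketch below. The function name and the toy titles are made up for illustration; in the article, the keywords would be the 20 most frequent words from the previous step.

```python
def count_products_per_keyword(titles, keywords):
    """For each keyword, count how many titles contain it."""
    counts = {keyword: 0 for keyword in keywords}
    for title in titles:
        for keyword in keywords:
            if keyword in title:  # plain substring match
                counts[keyword] += 1
    # Sort from most to least common, ready for a bar chart.
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


titles = ["ribbed dotted condom", "ultra-thin ribbed condom", "ultra-thin condom"]
ranking = count_products_per_keyword(titles, ["ribbed", "dotted", "ultra-thin"])

# Share of products containing a keyword, as in the 46% figure: 1150 of 2500.
share = 1150 / 2500
```

Because the match is a substring test, a title that contains several keywords is counted once for each of them, so the counts can add up to more than the number of products.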

3. Analyze the relationship between high-frequency title keywords and average sales

This analysis is interesting: it effectively tells us which features or materials users prefer.

[Figure: code screenshots]
The keyword-versus-average-sales analysis works like this: again iterate over the titles of all records, and if a title contains a keyword, append that record's sales volume to the keyword's value (a list). When the pass is finished, average each keyword's list, and finally sort by average sales. Here's the result!

[Figure: bar chart of average sales per keyword]
Conclusions:

  1. Ribbed has the highest average sales. It's everyone's favorite.
  2. The top three features by average sales are: ribbed, dotted, spiked.
  3. "Small size" even made the list, ha-ha.

P.S. Many readers asked: why not ultra-thin? Ultra-thin feels great for you, but what about your girlfriend?
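The collect-then-average procedure described above can be sketched as follows. The row layout (dictionaries with `title`, `sales` and `price` keys) and the function name `average_by_keyword` are assumptions for illustration, not the author's code.

```python
from statistics import mean


def average_by_keyword(rows, keywords, field):
    """Average `field` over the rows whose title contains each keyword."""
    buckets = {keyword: [] for keyword in keywords}
    for row in rows:
        for keyword in keywords:
            if keyword in row["title"]:
                buckets[keyword].append(row[field])
    # Skip keywords that matched nothing, then sort by the average.
    averages = {k: mean(v) for k, v in buckets.items() if v}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


rows = [
    {"title": "ribbed dotted condom", "sales": 900, "price": 35.0},
    {"title": "ribbed condom", "sales": 500, "price": 25.0},
    {"title": "ultra-thin condom", "sales": 300, "price": 60.0},
]
keywords = ["ribbed", "dotted", "ultra-thin"]
average_sales = average_by_keyword(rows, keywords, "sales")
average_price = average_by_keyword(rows, keywords, "price")
```

Passing the field name as a parameter means the same helper covers both the average-sales ranking and an average-price ranking.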


4. Analyze the relationship between high-frequency title keywords and average price

Having analyzed which features people like, let's look at what those features cost. Which kinds of condoms are more expensive?
[Figure: code screenshots]
The analysis of high-frequency keywords versus average price works on the same principle and uses the same method as above. It simply swaps the sales volume for the price. Here's the result!
[Figure: bar chart of average price per keyword]
Results:

  1. The top of the ranking is mostly about material.
  2. Gel, transparent, and wash-free have the highest average prices, over 100 yuan.

P.S. Gel, transparent, wash-free: has any veteran used these? What's the difference between them?

5. Analyze the distribution of condom prices

We've analyzed the titles and features about as far as they go, so let's analyze the price!
[Figure: code screenshots]
Brother Pig divided the prices manually into 9 ranges: '0-20', '21-40', '41-60', '61-80', '81-100', '101-120', '121-150', '151-200', and 'above 200'. The data is then cut into these ranges, counted, and sorted, and finally a bar chart and a pie chart are generated.
[Figure: bar chart of products per price range]
[Figure: pie chart of products per price range]
Conclusions:

  1. The 21-40 price range has the most products: 778, about 31%.
  2. The ratio of products priced at 100 or below to those above 100 is about 7:1.

P.S. I didn't expect so many to cost more than 100. I'd like to ask: what does a condom priced at 200 feel like?
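Assigning values to hand-picked ranges and counting them, as described above, can be sketched with the standard library. The price bounds and labels follow the nine ranges in the text. The treatment of boundary values (a value equal to an upper bound goes into the lower range), the sales bounds, and all names are assumptions for illustration.

```python
from bisect import bisect_left
from collections import Counter

PRICE_UPPER_BOUNDS = [20, 40, 60, 80, 100, 120, 150, 200]
PRICE_LABELS = ["0-20", "21-40", "41-60", "61-80", "81-100",
                "101-120", "121-150", "151-200", "above 200"]

# Assumed bounds for a sales-volume distribution built the same way.
SALES_UPPER_BOUNDS = [1_000, 5_000, 10_000, 50_000, 100_000]
SALES_LABELS = ["up to 1,000", "1,000-5,000", "5,000-10,000",
                "10,000-50,000", "50,000-100,000", "over 100,000"]


def assign_range(value, upper_bounds, labels):
    """Return the label of the first range whose upper bound is >= value."""
    return labels[bisect_left(upper_bounds, value)]


def range_distribution(values, upper_bounds, labels):
    """Return (label, count, share) for every range, in label order."""
    counts = Counter(assign_range(v, upper_bounds, labels) for v in values)
    total = len(values) or 1  # avoid dividing by zero on empty input
    return [(label, counts[label], counts[label] / total) for label in labels]


prices = [9.9, 20, 35.5, 38, 99, 150, 260]  # made-up prices
price_distribution = range_distribution(prices, PRICE_UPPER_BOUNDS, PRICE_LABELS)
```

On a DataFrame column, `pandas.cut` with explicit bin edges and labels does the same job, and the (label, count) pairs map directly onto a bar chart or a pie chart.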

6. Analyze the distribution of condom sales

After price, of course, comes the distribution of sales volume.

[Figure: code screenshots]
How the sales distribution analysis works: the sales volumes are divided manually into six ranges, roughly 'under 1,000', '1,000 to 5,000', '5,000 to 10,000', '10,000 to 50,000', '50,000 to 100,000', and 'over 100,000'. The same method is then used to count, sort, and finally visualize.
[Figure: charts of products per sales range]
Conclusions:

  1. Sales under 1,000 are the most common, at roughly 90%.
  2. Only 10 products have sales above 10,000, which shows that bestsellers are very rare.
  3. One condom has sold more than 100,000.

P.S. Want to know which condom has more than 100,000 sales? Follow Brother Pig's WeChat official account 「 Naked pigs 」 and reply: Pop up sets

7. Analyze the relationship between price range and average sales

Suppose you are a condom seller launching a new product. You would want to know what price leads to higher sales.

This is where we can analyze the relationship between price and sales volume and let actual data guide the pricing. That is one of the values of data analysis.

[Figure: code screenshot]
The analysis of price range versus average sales works like this: use pandas's automatic binning to divide the prices into 12 ranges, then group the sales data by range and take the average. Here's the visualization.
[Figure: chart of average sales per price range]
Conclusions:

  1. Products priced in the 31.9-39 range have the highest average sales, at 893.
  2. Products priced under 10 yuan have the second-highest average sales.

P.S. Anyone who has used condoms costing less than 10 a box, come out and say a few words.
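pandas offers both equal-width binning (`pandas.cut`) and equal-frequency binning (`pandas.qcut`), and the text does not say which one was used. The sketch below uses only the standard library and assumes equal-frequency bins: it sorts the products by price, splits them into groups of nearly equal size, and averages the sales in each group. The function name and the toy numbers are made up.

```python
from statistics import mean


def average_sales_by_price_group(prices, sales, bin_count=12):
    """Split products into price-ordered groups and average their sales."""
    pairs = sorted(zip(prices, sales))  # order products by price
    n = len(pairs)
    result = []
    for i in range(bin_count):
        chunk = pairs[i * n // bin_count:(i + 1) * n // bin_count]
        if not chunk:
            continue  # fewer products than bins
        low, high = chunk[0][0], chunk[-1][0]
        result.append(((low, high), mean([sold for _, sold in chunk])))
    return result


prices = [5, 8, 15, 25, 35, 38, 60, 120]
sales = [700, 650, 200, 300, 900, 880, 100, 20]
groups = average_sales_by_price_group(prices, sales, bin_count=4)
best_range, best_average = max(groups, key=lambda item: item[1])
```

Unlike `pandas.qcut`, this simple chunking can place products with identical prices in different groups, so it is only a rough stand-in.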

8. Analyze the national distribution of condom merchants

We've analyzed titles, prices, and sales. Finally, let's analyze the merchants' location data.

The goal is to count the number of condom merchants in each province and then draw a heat map and a bar chart.
[Figure: code screenshot]
Counting the merchants is easy: because data cleaning already reduced the location to the province, calling value_counts() directly gives us the data we want. Here's the result!
[Figure: heat map and bar charts of merchants per province]
Conclusions:

  1. The top three provinces by number of condom merchants: Guangdong, Shanghai, Zhejiang.

P.S. Every province around Guizhou has condom sellers, so why doesn't Guizhou have any? Does it have something to do with geography?
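Counting records per province, which the text does with pandas's `value_counts()`, can also be illustrated with the standard library's `Counter`. The function name and the toy province list are made up.

```python
from collections import Counter


def merchants_per_province(provinces):
    """Count records per province, most common first."""
    return Counter(provinces).most_common()


provinces = ["Guangdong", "Shanghai", "Guangdong",
             "Zhejiang", "Guangdong", "Shanghai"]
ranking = merchants_per_province(provinces)
top_three = [name for name, _ in ranking[:3]]
```

Like `Series.value_counts()`, `most_common()` returns the entries in descending order of count.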

9. Analyze the average sales of condom merchants by province

After the number of merchants, let's look at the average sales in each province.
[Figure: code screenshot]
How the average-sales-by-province analysis works: we build a pivot table that averages the sales, sort it, and finally generate the heat map and bar chart.
[Figure: heat map and bar chart of average sales per province]

Conclusions:

  1. Unexpectedly, Shanxi's 39 merchants rank first, with average sales of 1535.

P.S. Why does Shanxi have the highest average sales? I really can't figure out the reason.
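The group-average-sort step described above can be sketched without pandas as follows. The row layout of (province, sales) pairs and the function name are assumptions for illustration, and the numbers are invented.

```python
from collections import defaultdict
from statistics import mean


def average_sales_per_province(rows):
    """Return (province, record count, average sales), highest average first."""
    sales_by_province = defaultdict(list)
    for province, sales in rows:
        sales_by_province[province].append(sales)
    summary = [(province, len(values), mean(values))
               for province, values in sales_by_province.items()]
    return sorted(summary, key=lambda item: item[2], reverse=True)


rows = [("Guangdong", 400), ("Shanxi", 1500), ("Guangdong", 600),
        ("Shanghai", 800), ("Shanxi", 1600)]
summary = average_sales_per_province(rows)
```

Keeping the record count next to each mean makes it easy to see how many records an average is based on. In pandas, `DataFrame.pivot_table` with `aggfunc="mean"` produces the same averages.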

III. Summary

Through the analysis above, we reached some interesting conclusions:

  1. Users prefer features such as ribbed, dotted, and spiked.
  2. Gel, transparent, and wash-free have the highest average prices, over 100 yuan.
  3. The 21-40 price range has the most products: 778, about 31%.
  4. Sales under 1,000 are the most common, at roughly 90%.
  5. Products priced in the 31.9-39 range have the highest average sales, at 893.
  6. The top three provinces by number of merchants: Guangdong, Shanghai, Zhejiang.
  7. Shanxi Province has the highest average sales.

Based on these results, if Brother Pig were a condom merchant launching a product, he would put ribbed, dotted, and spiked in the title and set the price at 31.9-39 yuan. It might sell better.

Data analysis is a sharp tool: it lets you see what others can't, and used properly it can become an important lever for your business!

Finally, one more piece of advice from Brother Pig: if you want to learn data analysis, be sure to learn pandas!!!

To get the source code, follow the WeChat official account 「 Naked pigs 」 and reply: Analysis condom

Copyright notice
This article was written by [Pig brother 66]. Please include a link to the original when reposting. Thank you.
