2. Flexible pandas indexing

SoWhat1412 2020-12-05 01:15:37


Preface

Among people learning pandas, more than 60% still drift back into Excel's arms. Mostly it's because, when they first start processing data in Python, selecting the rows and columns they want is painful, with none of the point-and-click ease of Excel.

In the first part of this series, "Get to know pandas", space only allowed for the most basic column indexing, which clearly can't keep up with everyone's growing appetite for custom selection. To ease the pain and add some fun, this second part pulls indexing out on its own and introduces two common indexing methods in detail:

The first is position-based (integer) indexing. Its examples are short and straightforward, and a rough grasp is enough. It comes in handy in practice, but it isn't used as widely as the second.

The second is name-based (label) indexing. This is the one to practice carefully, because it will be an important foundation for data cleaning and analysis later on.

First, a brief introduction to the practice dataset:

| Source of flow | Source details | Number of visitors | Payment conversion rate | Customer unit price |
|----------------|----------------|--------------------|-------------------------|---------------------|
| Class A        | A              | 35188              | 9.98%                   | 54.3                |
| Class A        | B              | 28467              | 11.27%                  | 99.93               |
| Class A        | C              | 13747              | 2.54%                   | 0.08                |
| Class A        | D              | 5183               | 2.47%                   | 37.15               |
| Class A        | E              | 4361               | 4.31%                   | 91.73               |
| Class A        | F              | 4063               | 11.57%                  | 65.09               |
| Class A        | G              | 2122               | 10.27%                  | 86.45               |
| Class A        | H              | 2041               | 7.06%                   | 44.07               |
| Class A        | I              | 1991               | 16.52%                  | 104.57              |
| Class A        | J              | 1981               | 5.75%                   | 75.93               |
| Class A        | K              | 1958               | 14.71%                  | 85.03               |
| Class A        | L              | 1780               | 13.15%                  | 98.87               |
| Class A        | M              | 1447               | 1.04%                   | 80.07               |
| second level   | A              | 39048              | 11.60%                  | 91.91               |
| second level   | B              | 3316               | 7.09%                   | 66.28               |
| second level   | C              | 2043               | 5.04%                   | 41.91               |
| Level three    | A              | 23140              | 9.69%                   | 83.75               |
| Level three    | B              | 14813              | 20.14%                  | 82.97               |
| Level Four     | A              | 216                | 1.85%                   | 94.25               |
| Level Four     | B              | 31                 | 0.00%                   |                     |
| Level Four     | C              | 17                 | 0.00%                   |                     |
| Level Four     | D              | 3                  | 0.00%                   |                     |

Like the dataset in part 1, it records traffic sources: for each channel's source details, the number of visitors, payment conversion rate and customer unit price. The dataset is short (a more complex case dataset arrives at the end of the basics series), but it's representative enough, so let's start the indexing tour.
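To follow along, the table above can be rebuilt as a DataFrame. This is a minimal sketch, not the author's original loading code: it assumes the frame is named df2 (the name the later code uses), stores Payment conversion rate as a fraction rather than a percentage string, and stores the blank Customer unit price cells as NaN.

```python
import pandas as pd

nan = float("nan")
# Practice table; assumed name df2, conversion rate as a fraction, blank prices as NaN
df2 = pd.DataFrame({
    "Source of flow": ["Class A"] * 13 + ["second level"] * 3 + ["Level three"] * 2 + ["Level Four"] * 4,
    "Source details": list("ABCDEFGHIJKLM") + list("ABC") + list("AB") + list("ABCD"),
    "Number of visitors": [35188, 28467, 13747, 5183, 4361, 4063, 2122, 2041, 1991, 1981,
                           1958, 1780, 1447, 39048, 3316, 2043, 23140, 14813, 216, 31, 17, 3],
    "Payment conversion rate": [0.0998, 0.1127, 0.0254, 0.0247, 0.0431, 0.1157, 0.1027, 0.0706,
                                0.1652, 0.0575, 0.1471, 0.1315, 0.0104, 0.1160, 0.0709, 0.0504,
                                0.0969, 0.2014, 0.0185, 0.0, 0.0, 0.0],
    "Customer unit price": [54.3, 99.93, 0.08, 37.15, 91.73, 65.09, 86.45, 44.07, 104.57, 75.93,
                            85.03, 98.87, 80.07, 91.91, 66.28, 41.91, 83.75, 82.97, 94.25,
                            nan, nan, nan],
})

print(df2.shape)   # (rows, columns)
print(df2.head())  # first five rows
```

The default row index runs from 0 to 21, which is what the position-based examples rely on.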

1. Position-based (integer) indexing

Here is how it works:

df.iloc[row index, column index]
The first position is the row index: pass the positions of the rows we want.
The second position is the column index: pass the positions of the columns we want.

Fill in the row and column parameters to suit the situation at hand.
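A quick sketch of what each slot of iloc accepts, using a tiny hypothetical frame named toy (not part of the article's data): a single integer, a list of integers, or a slice, which like every Python slice excludes its end.

```python
import pandas as pd

# Hypothetical 3x3 toy frame, only to show what each iloc slot accepts
toy = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

cell = toy.iloc[1, 2]          # one row position, one column position -> scalar
row = toy.iloc[0, :]           # one row, every column -> Series
block = toy.iloc[0:2, [0, 2]]  # row slice (end excluded) + column list -> DataFrame
print(cell, row.tolist(), block.shape)
```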

Scenario 1 (selecting rows)

Goal: select all rows whose Source of flow equals Class A.

Approach: counting down the screen, the Class A channel occupies rows 1 to 13, i.e. row indexes 0 to 12. Python slicing includes the start but excludes the end, so to select index rows 0-12 we pass 0:13. Since we want all columns, we pass just a colon as the column parameter.
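Scenario 1 as code. This sketch rebuilds the practice table first, assuming the name df2, conversion rate stored as a fraction and blank prices stored as NaN.

```python
import pandas as pd

nan = float("nan")
# Practice table; assumed name df2, conversion rate as a fraction, blank prices as NaN
df2 = pd.DataFrame({
    "Source of flow": ["Class A"] * 13 + ["second level"] * 3 + ["Level three"] * 2 + ["Level Four"] * 4,
    "Source details": list("ABCDEFGHIJKLM") + list("ABC") + list("AB") + list("ABCD"),
    "Number of visitors": [35188, 28467, 13747, 5183, 4361, 4063, 2122, 2041, 1991, 1981,
                           1958, 1780, 1447, 39048, 3316, 2043, 23140, 14813, 216, 31, 17, 3],
    "Payment conversion rate": [0.0998, 0.1127, 0.0254, 0.0247, 0.0431, 0.1157, 0.1027, 0.0706,
                                0.1652, 0.0575, 0.1471, 0.1315, 0.0104, 0.1160, 0.0709, 0.0504,
                                0.0969, 0.2014, 0.0185, 0.0, 0.0, 0.0],
    "Customer unit price": [54.3, 99.93, 0.08, 37.15, 91.73, 65.09, 86.45, 44.07, 104.57, 75.93,
                            85.03, 98.87, 80.07, 91.91, 66.28, 41.91, 83.75, 82.97, 94.25,
                            nan, nan, nan],
})

# Rows at positions 0..12 (the end position 13 is excluded), all columns
level1 = df2.iloc[0:13, :]
print(level1)
```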

Scenario 2 (selecting columns)

Goal: view the Source of flow and Customer unit price columns for all channels.
Approach: all traffic channels means all rows, so we pass : as the row parameter. As for columns, Source of flow is column 1 and Customer unit price is column 5, so the column indexes are 0 and 4:
Note that to select non-adjacent columns, you must put the positions in a list, here [0, 4]. For a contiguous range, no list is needed: 0:5 (the columns at index 0 through 4) does the job.
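Both column forms side by side, in a sketch that rebuilds the practice table under the same stated assumptions (name df2, conversion rate as a fraction, blank prices as NaN):

```python
import pandas as pd

nan = float("nan")
# Practice table; assumed name df2, conversion rate as a fraction, blank prices as NaN
df2 = pd.DataFrame({
    "Source of flow": ["Class A"] * 13 + ["second level"] * 3 + ["Level three"] * 2 + ["Level Four"] * 4,
    "Source details": list("ABCDEFGHIJKLM") + list("ABC") + list("AB") + list("ABCD"),
    "Number of visitors": [35188, 28467, 13747, 5183, 4361, 4063, 2122, 2041, 1991, 1981,
                           1958, 1780, 1447, 39048, 3316, 2043, 23140, 14813, 216, 31, 17, 3],
    "Payment conversion rate": [0.0998, 0.1127, 0.0254, 0.0247, 0.0431, 0.1157, 0.1027, 0.0706,
                                0.1652, 0.0575, 0.1471, 0.1315, 0.0104, 0.1160, 0.0709, 0.0504,
                                0.0969, 0.2014, 0.0185, 0.0, 0.0, 0.0],
    "Customer unit price": [54.3, 99.93, 0.08, 37.15, 91.73, 65.09, 86.45, 44.07, 104.57, 75.93,
                            85.03, 98.87, 80.07, 91.91, 66.28, 41.91, 83.75, 82.97, 94.25,
                            nan, nan, nan],
})

# All rows; columns at positions 0 and 4 (non-adjacent, so a list is needed)
source_and_price = df2.iloc[:, [0, 4]]
# Contiguous columns 0..4: a slice is enough, no list required
first_five = df2.iloc[:, 0:5]
print(source_and_price.columns.tolist())
```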

Scenario 3 (selecting rows and columns together)

Goal: view the Source of flow, Source details, Number of visitors and Payment conversion rate for the second- and third-level channels.

Approach: first look at the table: the second- and third-level channels sit at row indexes 13 to 17. Following the same "start included, end excluded" rule, we pass 13:18 as the row parameter. We need Source of flow, Source details, Number of visitors and Payment conversion rate, which are the first 4 columns, so we pass 0:4.
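Scenario 3 as a sketch, again rebuilding the practice table (assumed name df2, conversion rate as a fraction, blank prices as NaN):

```python
import pandas as pd

nan = float("nan")
# Practice table; assumed name df2, conversion rate as a fraction, blank prices as NaN
df2 = pd.DataFrame({
    "Source of flow": ["Class A"] * 13 + ["second level"] * 3 + ["Level three"] * 2 + ["Level Four"] * 4,
    "Source details": list("ABCDEFGHIJKLM") + list("ABC") + list("AB") + list("ABCD"),
    "Number of visitors": [35188, 28467, 13747, 5183, 4361, 4063, 2122, 2041, 1991, 1981,
                           1958, 1780, 1447, 39048, 3316, 2043, 23140, 14813, 216, 31, 17, 3],
    "Payment conversion rate": [0.0998, 0.1127, 0.0254, 0.0247, 0.0431, 0.1157, 0.1027, 0.0706,
                                0.1652, 0.0575, 0.1471, 0.1315, 0.0104, 0.1160, 0.0709, 0.0504,
                                0.0969, 0.2014, 0.0185, 0.0, 0.0, 0.0],
    "Customer unit price": [54.3, 99.93, 0.08, 37.15, 91.73, 65.09, 86.45, 44.07, 104.57, 75.93,
                            85.03, 98.87, 80.07, 91.91, 66.28, 41.91, 83.75, 82.97, 94.25,
                            nan, nan, nan],
})

# Rows 13..17 (second and third level), columns 0..3
levels_2_3 = df2.iloc[13:18, 0:4]
print(levels_2_3)
```

The catch with positions is that this only works while the rows stay in this order; if the table is re-sorted, 13:18 points at different channels.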

2. Label-based (name) indexing

To make the side-by-side comparison clear, we reuse the same three scenarios.

Scenario 1: select all rows of the level-1 (Class A) channel.

Approach: this time there's no need to count positions. To pick all rows whose Source of flow is Class A, we just test which values in the Source of flow column equal Class A: df2['Source of flow'] == 'Class A'.
The result consists of True and False (Boolean) values: True where the value equals Class A, False where it doesn't. In the loc method we can pass this column of results as the row parameter. pandas returns the rows where the result is True (here, index 0 to 12) and drops the rows where it is False.
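The Boolean-mask version of Scenario 1, sketched on a rebuilt practice table (assumed name df2, conversion rate as a fraction, blank prices as NaN). The last line of the test confirms it matches the position-based iloc[0:13, :] result.

```python
import pandas as pd

nan = float("nan")
# Practice table; assumed name df2, conversion rate as a fraction, blank prices as NaN
df2 = pd.DataFrame({
    "Source of flow": ["Class A"] * 13 + ["second level"] * 3 + ["Level three"] * 2 + ["Level Four"] * 4,
    "Source details": list("ABCDEFGHIJKLM") + list("ABC") + list("AB") + list("ABCD"),
    "Number of visitors": [35188, 28467, 13747, 5183, 4361, 4063, 2122, 2041, 1991, 1981,
                           1958, 1780, 1447, 39048, 3316, 2043, 23140, 14813, 216, 31, 17, 3],
    "Payment conversion rate": [0.0998, 0.1127, 0.0254, 0.0247, 0.0431, 0.1157, 0.1027, 0.0706,
                                0.1652, 0.0575, 0.1471, 0.1315, 0.0104, 0.1160, 0.0709, 0.0504,
                                0.0969, 0.2014, 0.0185, 0.0, 0.0, 0.0],
    "Customer unit price": [54.3, 99.93, 0.08, 37.15, 91.73, 65.09, 86.45, 44.07, 104.57, 75.93,
                            85.03, 98.87, 80.07, 91.91, 66.28, 41.91, 83.75, 82.97, 94.25,
                            nan, nan, nan],
})

# Boolean Series: True where the traffic source is Class A
is_level1 = df2["Source of flow"] == "Class A"
# loc keeps the rows where the mask is True; ":" keeps every column
level1 = df2.loc[is_level1, :]
print(is_level1.sum(), level1.index.tolist())
```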

Scenario 2: view the Source of flow and Customer unit price columns for all channels.

Approach: all channels means all rows, so we pass : as the row parameter. To extract the Source of flow and Customer unit price columns, put their names directly in the column position. Because two columns are involved, wrap them in a list:
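Scenario 2 by name, sketched on a rebuilt practice table (assumed name df2, conversion rate as a fraction, blank prices as NaN), with a check that it matches the position-based iloc[:, [0, 4]]:

```python
import pandas as pd

nan = float("nan")
# Practice table; assumed name df2, conversion rate as a fraction, blank prices as NaN
df2 = pd.DataFrame({
    "Source of flow": ["Class A"] * 13 + ["second level"] * 3 + ["Level three"] * 2 + ["Level Four"] * 4,
    "Source details": list("ABCDEFGHIJKLM") + list("ABC") + list("AB") + list("ABCD"),
    "Number of visitors": [35188, 28467, 13747, 5183, 4361, 4063, 2122, 2041, 1991, 1981,
                           1958, 1780, 1447, 39048, 3316, 2043, 23140, 14813, 216, 31, 17, 3],
    "Payment conversion rate": [0.0998, 0.1127, 0.0254, 0.0247, 0.0431, 0.1157, 0.1027, 0.0706,
                                0.1652, 0.0575, 0.1471, 0.1315, 0.0104, 0.1160, 0.0709, 0.0504,
                                0.0969, 0.2014, 0.0185, 0.0, 0.0, 0.0],
    "Customer unit price": [54.3, 99.93, 0.08, 37.15, 91.73, 65.09, 86.45, 44.07, 104.57, 75.93,
                            85.03, 98.87, 80.07, 91.91, 66.28, 41.91, 83.75, 82.97, 94.25,
                            nan, nan, nan],
})

# All rows; two columns chosen by name, so they go in a list
source_and_price = df2.loc[:, ["Source of flow", "Customer unit price"]]
# Same result as the position-based selection iloc[:, [0, 4]]
print(source_and_price.equals(df2.iloc[:, [0, 4]]))
```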

Scenario 3: extract the Source of flow, Source details, Number of visitors and Payment conversion rate for the second- and third-level channels.

Approach: select rows with a condition, and select columns by passing their names.

df2.loc[df2['Source of flow'].isin(['second level', 'Level three']), ['Source of flow', 'Source details', 'Number of visitors', 'Payment conversion rate']]

A quick plug for the isin function: it checks whether each value in a column (Series) of the source data appears in a given list. In this case, df2['Source of flow'].isin(['second level', 'Level three']) checks whether each value in the Source of flow column equals "second level" or "Level three". If it equals either one, it returns True, otherwise False. Passing this Boolean result as the row parameter easily gives us the channels whose traffic source is second or third level.
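A sketch of Scenario 3 by name on a rebuilt practice table (assumed name df2, conversion rate as a fraction, blank prices as NaN). It also spells out isin as the equivalent chain of == comparisons joined with | (or), and checks that the result matches the position-based iloc[13:18, 0:4].

```python
import pandas as pd

nan = float("nan")
# Practice table; assumed name df2, conversion rate as a fraction, blank prices as NaN
df2 = pd.DataFrame({
    "Source of flow": ["Class A"] * 13 + ["second level"] * 3 + ["Level three"] * 2 + ["Level Four"] * 4,
    "Source details": list("ABCDEFGHIJKLM") + list("ABC") + list("AB") + list("ABCD"),
    "Number of visitors": [35188, 28467, 13747, 5183, 4361, 4063, 2122, 2041, 1991, 1981,
                           1958, 1780, 1447, 39048, 3316, 2043, 23140, 14813, 216, 31, 17, 3],
    "Payment conversion rate": [0.0998, 0.1127, 0.0254, 0.0247, 0.0431, 0.1157, 0.1027, 0.0706,
                                0.1652, 0.0575, 0.1471, 0.1315, 0.0104, 0.1160, 0.0709, 0.0504,
                                0.0969, 0.2014, 0.0185, 0.0, 0.0, 0.0],
    "Customer unit price": [54.3, 99.93, 0.08, 37.15, 91.73, 65.09, 86.45, 44.07, 104.57, 75.93,
                            85.03, 98.87, 80.07, 91.91, 66.28, 41.91, 83.75, 82.97, 94.25,
                            nan, nan, nan],
})

# isin: True where the value matches any entry in the list
is_level_2_or_3 = df2["Source of flow"].isin(["second level", "Level three"])
# Equivalent long-hand form using == and | (or)
long_hand = (df2["Source of flow"] == "second level") | (df2["Source of flow"] == "Level three")

cols = ["Source of flow", "Source details", "Number of visitors", "Payment conversion rate"]
levels_2_3 = df2.loc[is_level_2_or_3, cols]
print(levels_2_3)
```

Unlike the iloc version, this keeps working if the rows are re-sorted, because it selects by value rather than by position.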

Since loc covers more use cases, it deserves extra attention, so let's practice on a realistic scenario.
Before that, let's take 30 seconds to review computing statistics on a pandas column (Series), as follows:

df2['Number of visitors'].mean()
df2['Number of visitors'].std()
df2['Number of visitors'].median()
df2['Number of visitors'].max()
df2['Number of visitors'].min()

Just append the method to the column, and the mean, standard deviation and other statistics come out. With that covered, on to Scenario 4.
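The five statistics above, run on a rebuilt practice table (a sketch assuming the name df2, conversion rate as a fraction and blank prices as NaN). describe() is a standard pandas method that bundles several of them in one call.

```python
import pandas as pd

nan = float("nan")
# Practice table; assumed name df2, conversion rate as a fraction, blank prices as NaN
df2 = pd.DataFrame({
    "Source of flow": ["Class A"] * 13 + ["second level"] * 3 + ["Level three"] * 2 + ["Level Four"] * 4,
    "Source details": list("ABCDEFGHIJKLM") + list("ABC") + list("AB") + list("ABCD"),
    "Number of visitors": [35188, 28467, 13747, 5183, 4361, 4063, 2122, 2041, 1991, 1981,
                           1958, 1780, 1447, 39048, 3316, 2043, 23140, 14813, 216, 31, 17, 3],
    "Payment conversion rate": [0.0998, 0.1127, 0.0254, 0.0247, 0.0431, 0.1157, 0.1027, 0.0706,
                                0.1652, 0.0575, 0.1471, 0.1315, 0.0104, 0.1160, 0.0709, 0.0504,
                                0.0969, 0.2014, 0.0185, 0.0, 0.0, 0.0],
    "Customer unit price": [54.3, 99.93, 0.08, 37.15, 91.73, 65.09, 86.45, 44.07, 104.57, 75.93,
                            85.03, 98.87, 80.07, 91.91, 66.28, 41.91, 83.75, 82.97, 94.25,
                            nan, nan, nan],
})

visitors = df2["Number of visitors"]
print(visitors.mean(), visitors.std(), visitors.median(), visitors.max(), visitors.min())
# describe() bundles count, mean, std, min, quartiles and max in one call
print(visitors.describe())
```

By default std() is the sample standard deviation (ddof=1).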

Scenario 4: For traffic channel data, what we really care about are the high-quality channels. If we define a high-quality channel as one whose number of visitors, conversion rate and customer unit price are all above average, how do we find these channels?

Approach: a high-quality channel must beat the average on visitors, conversion rate and customer unit price at the same time. That is the key to the problem. Let's start by looking at the averages:
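The three averages in one call, sketched on a rebuilt practice table (assumed name df2, conversion rate as a fraction, blank prices as NaN):

```python
import pandas as pd

nan = float("nan")
# Practice table; assumed name df2, conversion rate as a fraction, blank prices as NaN
df2 = pd.DataFrame({
    "Source of flow": ["Class A"] * 13 + ["second level"] * 3 + ["Level three"] * 2 + ["Level Four"] * 4,
    "Source details": list("ABCDEFGHIJKLM") + list("ABC") + list("AB") + list("ABCD"),
    "Number of visitors": [35188, 28467, 13747, 5183, 4361, 4063, 2122, 2041, 1991, 1981,
                           1958, 1780, 1447, 39048, 3316, 2043, 23140, 14813, 216, 31, 17, 3],
    "Payment conversion rate": [0.0998, 0.1127, 0.0254, 0.0247, 0.0431, 0.1157, 0.1027, 0.0706,
                                0.1652, 0.0575, 0.1471, 0.1315, 0.0104, 0.1160, 0.0709, 0.0504,
                                0.0969, 0.2014, 0.0185, 0.0, 0.0, 0.0],
    "Customer unit price": [54.3, 99.93, 0.08, 37.15, 91.73, 65.09, 86.45, 44.07, 104.57, 75.93,
                            85.03, 98.87, 80.07, 91.91, 66.28, 41.91, 83.75, 82.97, 94.25,
                            nan, nan, nan],
})

metrics = ["Number of visitors", "Payment conversion rate", "Customer unit price"]
# Column-wise means; NaN cells are skipped by default (skipna=True)
means = df2[metrics].mean()
print(means)
```

Because NaN is skipped, the Customer unit price mean covers only the 19 channels that have a price. If the blank cells were meant as 0, the mean would come out lower.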
Then check whether each metric column is greater than its mean:

df2['Number of visitors'] > df2['Number of visitors'].mean()
df2['Payment conversion rate'] > df2['Payment conversion rate'].mean()
df2['Customer unit price'] > df2['Customer unit price'].mean()

All three conditions must hold at once, i.e. an "and" relationship. In pandas, conditions that must all hold are joined with &, and each condition must be wrapped in parentheses: & binds more tightly than comparison operators such as >, so without them the expression is parsed incorrectly. For an "or" relationship (any one condition is enough), join them with | instead:

(df2['Number of visitors'] > df2['Number of visitors'].mean()) & (df2['Payment conversion rate'] > df2['Payment conversion rate'].mean()) & (df2['Customer unit price'] > df2['Customer unit price'].mean())
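The & versus | difference, sketched on a rebuilt practice table (assumed name df2, conversion rate as a fraction, blank prices as NaN). Naming each mask first is just one readable way to write it, and it sidesteps the parenthesis issue.

```python
import pandas as pd

nan = float("nan")
# Practice table; assumed name df2, conversion rate as a fraction, blank prices as NaN
df2 = pd.DataFrame({
    "Source of flow": ["Class A"] * 13 + ["second level"] * 3 + ["Level three"] * 2 + ["Level Four"] * 4,
    "Source details": list("ABCDEFGHIJKLM") + list("ABC") + list("AB") + list("ABCD"),
    "Number of visitors": [35188, 28467, 13747, 5183, 4361, 4063, 2122, 2041, 1991, 1981,
                           1958, 1780, 1447, 39048, 3316, 2043, 23140, 14813, 216, 31, 17, 3],
    "Payment conversion rate": [0.0998, 0.1127, 0.0254, 0.0247, 0.0431, 0.1157, 0.1027, 0.0706,
                                0.1652, 0.0575, 0.1471, 0.1315, 0.0104, 0.1160, 0.0709, 0.0504,
                                0.0969, 0.2014, 0.0185, 0.0, 0.0, 0.0],
    "Customer unit price": [54.3, 99.93, 0.08, 37.15, 91.73, 65.09, 86.45, 44.07, 104.57, 75.93,
                            85.03, 98.87, 80.07, 91.91, 66.28, 41.91, 83.75, 82.97, 94.25,
                            nan, nan, nan],
})

high_visitors = df2["Number of visitors"] > df2["Number of visitors"].mean()
high_conversion = df2["Payment conversion rate"] > df2["Payment conversion rate"].mean()
high_price = df2["Customer unit price"] > df2["Customer unit price"].mean()

# & -> every condition holds; | -> at least one holds.
# Inline comparisons would need parentheses because & and | bind more
# tightly than >; naming each mask first avoids that pitfall.
all_three = high_visitors & high_conversion & high_price
any_one = high_visitors | high_conversion | high_price
print(all_three.sum(), any_one.sum())
```

A NaN compared with > yields False, so the three Level Four channels without a price never count as "high price".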

After combining, True means the channel's visitors, conversion rate and customer unit price are all above average. All that's left is to pass this Boolean result as the row parameter:

df2.loc[(df2['Number of visitors'] > df2['Number of visitors'].mean()) & (df2['Payment conversion rate'] > df2['Payment conversion rate'].mean()) & (df2['Customer unit price'] > df2['Customer unit price'].mean()), :]

At this point we've directly filtered out the 4 high-quality channels whose key metrics are all above average.
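The full filter, sketched on a rebuilt practice table (assumed name df2, conversion rate as a fraction, blank prices as NaN). Under those assumptions it returns the 4 channels Class A-B, second level-A, Level three-A and Level three-B.

```python
import pandas as pd

nan = float("nan")
# Practice table; assumed name df2, conversion rate as a fraction, blank prices as NaN
df2 = pd.DataFrame({
    "Source of flow": ["Class A"] * 13 + ["second level"] * 3 + ["Level three"] * 2 + ["Level Four"] * 4,
    "Source details": list("ABCDEFGHIJKLM") + list("ABC") + list("AB") + list("ABCD"),
    "Number of visitors": [35188, 28467, 13747, 5183, 4361, 4063, 2122, 2041, 1991, 1981,
                           1958, 1780, 1447, 39048, 3316, 2043, 23140, 14813, 216, 31, 17, 3],
    "Payment conversion rate": [0.0998, 0.1127, 0.0254, 0.0247, 0.0431, 0.1157, 0.1027, 0.0706,
                                0.1652, 0.0575, 0.1471, 0.1315, 0.0104, 0.1160, 0.0709, 0.0504,
                                0.0969, 0.2014, 0.0185, 0.0, 0.0, 0.0],
    "Customer unit price": [54.3, 99.93, 0.08, 37.15, 91.73, 65.09, 86.45, 44.07, 104.57, 75.93,
                            85.03, 98.87, 80.07, 91.91, 66.28, 41.91, 83.75, 82.97, 94.25,
                            nan, nan, nan],
})

# The combined condition from the text, split over lines for readability
is_quality = (
    (df2["Number of visitors"] > df2["Number of visitors"].mean())
    & (df2["Payment conversion rate"] > df2["Payment conversion rate"].mean())
    & (df2["Customer unit price"] > df2["Customer unit price"].mean())
)
quality = df2.loc[is_quality, :]
print(quality[["Source of flow", "Source details"]])
```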

3. Mixing positions and names

This used df.ix[row, column], but .ix was deprecated in pandas 0.20 and removed in pandas 1.0, so stick with method 1 or 2.
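Without .ix, one way to mix positions and names is to translate one side into the other's terms before indexing. This is a sketch on a rebuilt practice table (assumed name df2, conversion rate as a fraction, blank prices as NaN). df2.index[...] turns row positions into labels for loc, and df2.columns.get_indexer([...]) turns column names into positions for iloc.

```python
import pandas as pd

nan = float("nan")
# Practice table; assumed name df2, conversion rate as a fraction, blank prices as NaN
df2 = pd.DataFrame({
    "Source of flow": ["Class A"] * 13 + ["second level"] * 3 + ["Level three"] * 2 + ["Level Four"] * 4,
    "Source details": list("ABCDEFGHIJKLM") + list("ABC") + list("AB") + list("ABCD"),
    "Number of visitors": [35188, 28467, 13747, 5183, 4361, 4063, 2122, 2041, 1991, 1981,
                           1958, 1780, 1447, 39048, 3316, 2043, 23140, 14813, 216, 31, 17, 3],
    "Payment conversion rate": [0.0998, 0.1127, 0.0254, 0.0247, 0.0431, 0.1157, 0.1027, 0.0706,
                                0.1652, 0.0575, 0.1471, 0.1315, 0.0104, 0.1160, 0.0709, 0.0504,
                                0.0969, 0.2014, 0.0185, 0.0, 0.0, 0.0],
    "Customer unit price": [54.3, 99.93, 0.08, 37.15, 91.73, 65.09, 86.45, 44.07, 104.57, 75.93,
                            85.03, 98.87, 80.07, 91.91, 66.28, 41.91, 83.75, 82.97, 94.25,
                            nan, nan, nan],
})

cols = ["Source of flow", "Number of visitors"]
# Rows by position, columns by name: convert the positions into row labels for loc
by_loc = df2.loc[df2.index[13:18], cols]
# Or convert the column names into positions for iloc
by_iloc = df2.iloc[13:18, df2.columns.get_indexer(cols)]
print(by_loc.equals(by_iloc))
```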

Summary

Both methods, position-based (integer) indexing and label-based (name) indexing, come down to the same thing: picture the rows and columns you want, then map them onto the corresponding row and column parameters.

With a little practice, you'll be able to process and analyze data in pandas however you like. Once you reach that point, you'll find that compared with Excel, Python is a thing of beauty.

Copyright notice
This article was created by [SoWhat1412]. Please include a link to the original when reposting. Thanks.
https://pythonmana.com/2020/12/20201204225432158r.html
