Python Is Not for Workplace Programming

Bi worker · 2020-11-11 19:03:28
Tags: programming, python, workplace


Using Excel for data processing has become commonplace among professionals. But you have surely felt helpless with it at times: complex calculations, repetitive work, automation tasks; and then there is the unsaved file lost in a crash, which is enough to make anyone despair.

If you learn a programming language, these problems disappear. So, which one should you learn?

Countless training courses and online tutorials give the same answer: Python!

Python code looks simple: a few lines can handle Excel calculations that would otherwise be a lot of trouble. It looks great.

But is that really true? Can non-professionals actually learn Python well enough to help with their work?

Python DataFrame

Daily business work mostly deals with tabular data (structured data, in professional terms), like this:

(image)

Every row after the first is called a record; each corresponds to one thing, one person, one order, and so on. The first row is the header, which describes the attributes each record consists of. All records share the same attributes, and the whole table is a collection of such records.

 

Python mainly uses a structure called DataFrame to handle this kind of tabular data. Let's see how DataFrame does it.

Reading the table above into a DataFrame gives this:

(image)

It looks almost like Excel, except that the row numbers start from 0.

 

However, a DataFrame is essentially a matrix (remember linear algebra from college?). Python has no concept of a record, so all of its operations have to work through whatever a matrix can do.

Let's look at some simple operations .

Filter

Filtering is a simple and common operation: taking the subset that satisfies some condition. For example, from the table above:

(image)

 

Question 1: select the employees of the R&D department.

The Python code looks like this:

import pandas as pd                    # import pandas
data = pd.read_csv('Employees.csv')    # read the data
rd = data.loc[data['DEPT']=='R&D']     # filter the R&D department
print(rd)                              # view rd

Running results:

(image)

The code is simple, and the result is correct. However:

1. The function used is called loc, short for location; nothing about it says "filter". In fact, the filtering here works by locating the index of the rows that satisfy the condition. Inside the call, data['DEPT']=='R&D' evaluates to a Boolean Series:

(image)

Its index is the same as data's; rows that satisfy the condition are True, the rest False.

loc then takes the rows of data whose positions are True and builds a new DataFrame from them. Essentially this is the matrix operation of extracting specified rows, which is a roundabout way to express filtering.

2. loc is not the only way to filter a DataFrame; there are other methods such as query(...). But they all come down to locating the row and column indexes of the matrix and then indexing the data by them, essentially matrix.loc[row, col].
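Since Employees.csv is not included with this article, the two filtering routes just described can be compared on a small made-up frame (all names and values below are invented for illustration):

```python
import pandas as pd

# Small sample frame standing in for Employees.csv (made-up values)
data = pd.DataFrame({
    'NAME':   ['Rebecca', 'Ashley', 'Rachel', 'Emily'],
    'DEPT':   ['R&D', 'Finance', 'R&D', 'HR'],
    'SALARY': [7000, 11000, 9000, 7000],
})

# Boolean-mask filtering: the comparison yields a True/False Series
# aligned with data's index, and loc keeps the True rows
mask = data['DEPT'] == 'R&D'
rd_loc = data.loc[mask]

# query() expresses the same condition as a string and returns the same rows
rd_query = data.query("DEPT == 'R&D'")

print(rd_loc.equals(rd_query))  # prints True
```

Both calls produce the same subset; the difference is purely one of notation.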

 

In any case, basic filtering is simple enough to understand. Next, let's try two not-so-complicated operations on the filtered subset.

Modify the data in the subset

Question 2: raise the salaries of the R&D department's employees by 5%.

The natural way of thinking: filter out the R&D employees, then modify their salaries.

Writing the code according to that logic:

import pandas as pd                    # import pandas
data = pd.read_csv('Employees.csv')    # read the data
rd = data.loc[data['DEPT']=='R&D']     # filter the R&D department
rd['SALARY'] = rd['SALARY']*1.05       # modify SALARY
print(data)

Running results:

SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rd['SALARY']=rd['SALARY']*1.05

(image)

As you can see, this not only triggers a warning; the values are not actually modified.

This is because rd = data.loc[data['DEPT']=='R&D'] produces a filtered copy of the matrix. The statement rd['SALARY']=rd['SALARY']*1.05 then modifies SALARY on that copy: it changes the sub-matrix rd, not the original matrix data.

It takes this much explanation just to describe what went wrong; listening to it is tiring enough.

How do you write it correctly?

import pandas as pd                                        # import pandas
data = pd.read_csv('Employees.csv')                        # read the data
rd_salary = data.loc[data['DEPT']=='R&D','SALARY']         # fetch the current R&D salaries
data.loc[data['DEPT']=='R&D','SALARY'] = rd_salary*1.05    # locate the same cells in the original frame and assign
print(data)

 

Running results:

(image)

This time it works. You cannot take the subset first and then modify it; you have to go back to the original matrix, locate the members to be modified, and assign to them there, i.e. loc[row=data['DEPT']=='R&D', column='SALARY']. And to raise the salaries by 5% you first have to fetch the current values (the rd_salary=... line). This style requires filtering repeatedly; leaving aside the inefficiency, it is simply too convoluted.
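The repeated filtering can at least be shortened by computing the condition once and assigning through loc in a single statement. A minimal sketch on a made-up frame standing in for Employees.csv:

```python
import pandas as pd

# Small sample frame standing in for Employees.csv (made-up values)
data = pd.DataFrame({
    'NAME':   ['Rebecca', 'Ashley', 'Rachel'],
    'DEPT':   ['R&D', 'Finance', 'R&D'],
    'SALARY': [7000.0, 11000.0, 9000.0],
})

# Compute the condition once, then update the original frame in place
# with a single loc assignment; *= avoids repeating the filter
mask = data['DEPT'] == 'R&D'
data.loc[mask, 'SALARY'] *= 1.05

print(data)
```

Only the R&D rows change; the others keep their original salaries.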

 

Intersection of subsets

Question 3: find the employees who are both in New York State and in the R&D department.

This is an even simpler problem: compute two subsets and intersect them. Let's see how Python handles it:

import pandas as pd                           # import pandas
data = pd.read_csv('Employees.csv')           # read the data
rd = data[data['DEPT']=='R&D']                # R&D employees
ny = data[data['STATE']=='New York']          # New York State employees
isect_idx = rd.index.intersection(ny.index)   # intersect the two row indexes
rd_isect_ny = data.loc[isect_idx]             # fetch rows by the index intersection
print(rd_isect_ny)

 

Running results:

(image)

Set intersection is a very basic operation that many programming languages provide; in fact Python provides it too (sets have an intersection function). However, a DataFrame is essentially a matrix, and intersecting two matrices is meaningless, so Python has no matrix intersection operation. To intersect the sets that two DataFrames represent, you have to take the detour of intersecting the two matrices' indexes and then using that index intersection to locate and extract rows from the original data. It feels like taking the long way around for something close by; you are forced to play by the matrix's rules.
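For the record, pandas does offer a shortcut here: combining the two conditions with the & operator selects the intersection directly, without the index detour. A sketch on an invented sample frame:

```python
import pandas as pd

# Small sample frame standing in for Employees.csv (made-up values)
data = pd.DataFrame({
    'NAME':  ['Rebecca', 'Ashley', 'Rachel', 'Emily'],
    'DEPT':  ['R&D', 'R&D', 'Finance', 'R&D'],
    'STATE': ['New York', 'California', 'New York', 'New York'],
})

# The index-intersection detour described above
rd = data[data['DEPT'] == 'R&D']
ny = data[data['STATE'] == 'New York']
via_index = data.loc[rd.index.intersection(ny.index)]

# Combining both conditions with & selects the same rows in one step
direct = data[(data['DEPT'] == 'R&D') & (data['STATE'] == 'New York')]

print(direct.equals(via_index))  # prints True
```

This works when both conditions are stated over the same frame; intersecting two independently obtained subsets still requires the index route.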

If filtering, the most common operation at work, is already this puzzling, you can imagine what the more complex operations are like. Dizzying indeed.

Let's look at a slightly more complex operation: grouping.

Grouping

Grouping is among the most common operations in daily data processing. Python provides a wealth of grouping functions that can handle most grouping tasks, but they are not so easy to understand or use.

Understanding grouping

Grouping divides a large set into smaller sets according to some rule. The result is a set of sets, which is then computed on, as shown below:

(image)

Let's start with the most common grouping operation: grouped aggregation.

Question 4: count the number of employees in each department.

Python code:

import pandas as pd                    # import pandas
data = pd.read_csv('Employees.csv')    # read the data
group = data.groupby("DEPT")           # group by department
dept_num = group.count()               # count the members of each group
print(dept_num)

Running results:

(image)

The result is a little embarrassing. We only wanted the number of members in each group, which needs just one column, so why are there so many columns? It looks as if the same count was repeated for every column. Strange.

Don't worry, Python can still solve this; you just don't use the count function but the size function:

import pandas as pd                    # import pandas
data = pd.read_csv('Employees.csv')    # read the data
group = data.groupby("DEPT")           # group by department
dept_num = group.size()                # take each group's size
print(dept_num)

Running results:

(image)

This result looks much more normal, but it is still odd: it is no longer a two-dimensional DataFrame but a one-dimensional Series.

The result of count looks strange because count tallies every column, while size returns the size of each group. Yet our natural instinct is still to reach for count when counting; size is hard to think of on your own (you have to search the internet for it).
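The count/size difference can be seen side by side on a small invented frame; value_counts() on the key column, not mentioned above, is yet a third way to get the same tallies:

```python
import pandas as pd

# Small sample frame standing in for Employees.csv (made-up values)
data = pd.DataFrame({
    'NAME': ['Rebecca', 'Ashley', 'Rachel', 'Emily'],
    'DEPT': ['R&D', 'Finance', 'R&D', 'R&D'],
})

group = data.groupby('DEPT')

per_column = group.count()   # DataFrame: one count per column, per group
per_group = group.size()     # Series: the number of rows in each group

# value_counts on the key column gives the same tallies in one call
vc = data['DEPT'].value_counts()

print(per_group['R&D'], vc['R&D'])  # prints 3 3
```

Three different functions, three different result shapes, for what is conceptually one counting operation.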

 

As mentioned earlier, a grouping result should be a set of sets. Let's see what Python's DataFrame looks like after grouping, by printing the result of data.groupby("DEPT") from the code above.

import pandas as pd                    # import pandas
data = pd.read_csv('Employees.csv')    # read the data
group = data.groupby("DEPT")           # group by department
print(group)

Running results:

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001ADBC9CE0F0>

Wow, what on earth is this?

The first time you see it, you are completely at a loss. Shouldn't the result of grouping be a set of sets? Why this? For non-professional programmers it is a nightmare.

However, searching the internet reveals that it is a so-called iterable object. Iterating over it shows that each item consists of a group key plus a DataFrame, and there are ways to look inside; for example, list(group) shows the grouping result, as in the figure below:

(image)

Looking at the figure, you finally understand what this so-called "object" is. In essence it, too, is a collection of sets (if we regard each matrix as a set), but you cannot take a member directly the way you can with an ordinary set (say, group[0]). Users are thus forced to memorize the N rules for operating on this kind of "object", by rote if they cannot understand them.
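For readers who want to poke at the "object" themselves, here is a minimal sketch on an invented frame: iterating yields (key, sub-DataFrame) pairs, and get_group pulls out one group by its key:

```python
import pandas as pd

# Small sample frame standing in for Employees.csv (made-up values)
data = pd.DataFrame({
    'NAME': ['Rebecca', 'Ashley', 'Rachel'],
    'DEPT': ['R&D', 'Finance', 'R&D'],
})

group = data.groupby('DEPT')

# Iterating over the GroupBy object yields (key, sub-DataFrame) pairs;
# list(group) materializes them all at once
pairs = list(group)
for dept, sub in pairs:
    print(dept, len(sub))

# A single group can also be pulled out by its key
rd = group.get_group('R&D')
```

Note that even here, group[0] would fail; access goes through iteration or get_group, not ordinary set membership.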

By this point, many readers are probably dizzy, with no idea what the paragraph above was going on about. Well, exactly: that is the normal state of affairs for workplace users.

If a simple grouped aggregation is already this hard to understand, let's burn a few more brain cells and look at something slightly more complex: processing the grouped subsets.

Group subset processing

Although grouping is usually followed by aggregation, sometimes we care not about the aggregate values but about the grouped sets themselves, for example sorting each grouped set by some column.

Question 5: sort the employees of each department by hire date, from earliest to latest.

Problem analysis: group by department, then sort each subset by hire date.

Python code:

import pandas as pd
employee = pd.read_csv("Employees.csv")
employee['HIREDATE'] = pd.to_datetime(employee['HIREDATE'])    # convert HIREDATE to datetime
# group by DEPT, sort each group by HIREDATE, then reset the index
employee_new = employee.groupby('DEPT',as_index=False).apply(lambda x:x.sort_values('HIREDATE')).reset_index(drop=True)
print(employee_new)

 

Running results:

(image)

It works: the employees of each department are sorted by hire date. But look at the core statement, employee.groupby('DEPT',as_index=False).apply(lambda x:x.sort_values('HIREDATE')). Abstracted, it looks like this:

df.groupby(c).apply(lambda x:f(x))

df: the DataFrame
groupby: the grouping function
c: the column to group by

Those three parts are easy to understand, but apply combined with lambda is obscure, beyond what most non-professional programmers grasp; making sense of it requires some background in functional programming (search for it yourself; it is too long to explain here).

What if you avoid the apply+lambda pair? It can be done, but with far more trouble: you have to use a for loop, sort each grouped subset separately, and finally concatenate the results.

import pandas as pd
employee = pd.read_csv("Employees.csv")
employee['HIREDATE'] = pd.to_datetime(employee['HIREDATE'])    # convert HIREDATE to datetime
dept_g = employee.groupby('DEPT',as_index=False)               # group by DEPT
dept_list = []                                                 # initialize the result list
for index,group in dept_g:                                     # loop over the groups
    group = group.sort_values('HIREDATE')                      # sort each group by HIREDATE
    dept_list.append(group)                                    # collect the sorted group
employee_new = pd.concat(dept_list,ignore_index=True)          # concatenate the sorted groups
print(employee_new)

The result is the same, but the code is much more complicated and runs slower. Which version would you rather write?
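For this particular task there is, to be fair, a third route that neither version above uses: sorting on two keys at once. It keeps each department's rows together and ordered by hire date, shown here on an invented sample frame:

```python
import pandas as pd

# Small sample frame standing in for Employees.csv (made-up values)
employee = pd.DataFrame({
    'NAME':     ['Rebecca', 'Ashley', 'Rachel', 'Emily'],
    'DEPT':     ['R&D', 'Finance', 'R&D', 'Finance'],
    'HIREDATE': pd.to_datetime(['2010-03-01', '2008-05-10',
                                '2005-07-20', '2012-01-15']),
})

# Sorting on two keys keeps each department's rows together and
# orders them by hire date within the department
employee_new = (employee.sort_values(['DEPT', 'HIREDATE'])
                        .reset_index(drop=True))

print(list(employee_new['NAME']))  # prints ['Ashley', 'Emily', 'Rachel', 'Rebecca']
```

That this one-liner is rarely the first thing a beginner finds only underlines the article's point: knowing which of several overlapping mechanisms to reach for is itself the hard part.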

Python designs different data types for similar but not identical data, each with its own rules of operation. Knowledge of one type does not carry over to another similar one, which is dizzying.

After all this, the conclusion comes down to one sentence: Python is genuinely hard to understand; it is not something for non-professionals. Specifically, three points:

1. A DataFrame is essentially a matrix

All operations have to be expressed in matrix terms, which is always roundabout.

2. Many data types, each with its own rules

Python provides Series, DataFrame, grouped objects and more, and different types are operated on differently: a DataFrame can be filtered with the query function, but a Series cannot; a grouped object is entirely different in nature from Series and DataFrame, and its operations are even harder to fathom.

3. You know that it works, but not why

With so many data types and such varied rules, the memory burden on users grows unnoticed. More has to be learned by rote, flexible application becomes difficult, and a strange phenomenon appears: for a simple operation, searching the internet for the Python code may take longer than doing the calculation in Excel.

Python code looks simple, but attending a training class probably will not teach you to write it; you end up only copying examples.

So, is there really no suitable tool for everyday data processing in the workplace?

Yes, there is.

esProc SPL

esProc SPL is also a programming language, one that specializes in structured data computation. SPL provides a wealth of basic computing methods, and its concepts and logic match our habits of thinking.

1. A table sequence is a collection of records

SPL carries structured data in table sequences, which are close to the Excel tables we handle every day.

2. Few data types, consistent rules

For structured data processing SPL has only two data types, the collection and the record, and the methods involved are largely uniform.

3. You know both that it works and why

With only two data types to remember and the basic operations mastered, complex operations are just combinations of simple rules. You may write clumsy code while inexperienced, but you will rarely fail to produce it at all, and there is none of the Python phenomenon of spending ages searching for how to write something.

Let's use SPL to solve the problems above and see just how approachable it is:

Table sequence

The data structure esProc SPL uses to carry two-dimensional structured data is the table sequence. It presents itself just like an Excel table, as shown below:

(image)

Except for the first row (the header), each row represents a record, and a table sequence is a collection of such records, which is more intuitive than Python's DataFrame.

Filter

SPL does not filter by matrix positioning; it uses select to pick out the records that satisfy the condition.

Question 1: view the employee information of the R&D department.

  A B
1 =file("Employees.csv").import@tc() / Import data
2 =A1.select(DEPT=="R&D") / Filter

A2 result:

(image)

SPL's filtered result is easy to understand: it is simply a subset of the original data set.

Now look at SPL's subset modification and intersection operations.

1.     Modify the data in the subset

Question 2: raise the salaries of the R&D department's employees by 5%.

  A B
1 =file("Employees.csv").import@tc() / Import data
2 =A1.select(DEPT=="R&D") / Filter
3 =A2.run(SALARY=SALARY*1.05) / Revise salary
4 =A1 / View results

A4 result:

(image)

SPL computes the way we naturally think: filter out the records, then revise the salary on that result, without all the effort Python requires.

2.     Intersection of subsets

Question 3: find the employees who are both in New York State and in the R&D department.

  A B
1 =file("Employees.csv").import@tc() / Import data
2 =A1.select(DEPT=="R&D") /R&D Department staff
3 =A1.select(STATE=="New York") / New York State employees
4 =A2^A3 / intersection

A4 result:

(image)

Intersection in SPL is a true set operation: just use the intersection operator "^". It is easy to understand and easy to write, unlike Python, where the inability to intersect matrices forces a detour through index intersection and extraction from the original data. SPL's other set operations, such as union, difference, and symmetric difference, also have corresponding operators and are just as convenient.

Grouping

Understanding grouping

SPL's grouping also follows natural logic: the result of grouping is a set of sets, plain and obvious.

First, SPL's grouped aggregation.

Question 4: count the number of employees in each department.

  A B
1 =file("Employees.csv").import@tc()  
2 =A1.groups(DEPT;count(~):cnt) / grouping

A2 result:

(image)

The result of grouped aggregation is still a table sequence, so the table-sequence methods keep working on it, unlike Python, where the aggregation result collapses into a one-dimensional Series.

Now look at SPL's raw grouping result:

  A B
1 =file("Employees.csv").import@tc()  
2 =A1.group(DEPT) / grouping

A2 result:

(image)

The upper part of the figure is a collection of table sequences, each one holding the members of a single department; the lower part shows the first group, the table sequence of the Administration department's members.

This result matches our normal logic, makes the grouping result easy to inspect, and makes the next operation on it easy to carry out.

Group subset processing

The result of grouping is a set of sets; just process each subset.

Question 5: sort the employees of each department by hire date, from earliest to latest.

  A B
1 =file("Employees.csv").import@tc()  
2 =A1.group(DEPT) / grouping
3 =A2.conj(~.sort(HIREDATE)) / Sort subsets and merge

A3 result:

(image)

Because SPL's grouping result is still a set, it can be computed on with set methods. There is no need to memorize special post-grouping functions, nor to reach for the cryptic apply()+lambda combination; grouping + sorting + concatenating is completed naturally in three short lines that are easy to write, easy to understand, and efficient as well.

Summary

1. Python treats structured data processing as matrix operations, so even simple set operations must detour through the matrix; esProc SPL works on collections of records, so set operations are simple and direct.

2. Python's data types are complex and varied, their rules hard to predict; you often know that something works without knowing why, can rarely generalize from one case to the next, and writing code leans heavily on memorization; esProc SPL has few data types with fixed rules, so mastering the basics lets you reason your way through complex operations.

3. To study SPL, visit:

http://www.raqsoft.com.cn/wx/SPL-programming.html

Copyright notice
This article was created by [Bi worker]. Please include a link to the original when reposting. Thank you.
