Still working on more than one Excel I'm worried about statistics ?Python It's delicious !

writing | Leisure and joy

source :Python technology 「ID: pythonall」

 Still working on more than one Excel I'm worried about statistics ?Python It's delicious !

Why are more and more white-collar non programmers learning Python ? They may not want to learn Python Go to some websites and get a cool sense of accomplishment , It's a lot of data analysis and processing problems , use Python It can be solved simply and efficiently . This article will show you through a practical example Python How to solve complex problems efficiently in practical work .

background

Xiao Ming works for an outdoor sports company , Their company has many brands , And it involves a lot of subdivided industries . Xiao Ming works as a data analyst in this company , Usually through Excel To do data analysis . The boss gave him a task today : Select the top five brands and sales volume of the group company in the latest year before work .

about Excel For the boss , Isn't that a matter of minutes ? Xiao Ming didn't pay attention to it , Until colleagues in the marketing department sent him the original data file , He realized that it wasn't that simple :

 Still working on more than one Excel I'm worried about statistics ?Python It's delicious !
Form file ( Data from network )

It's not the top five in the imaginary order . There are 90 File , In the normal way of thinking , He either copies the contents of all the files into a table to sort them out , Or you can sort and summarize each form , Then the final results are classified and summarized .

Think about the workload , Think about the deadline , Xiao Ming scratched his head , I feel like I'm going bald .

Thought analysis

This kind of physical work , It's the easiest thing to write a program . Xiao Ming thought of his programmer friend Duan at this time , So he left the question to the short paragraph .

A little wisp of his little hair , say :so easy, Just find master pan .

Xiao Ming said : You're not sure ? We need to find someone else !

A little bitter smile said : No no no , Master pan is Python It's a data processing library , It's called Pandas , Be commonly called Master pan .

Xiao Ming said : I don't care what master, master , Just say how long it will take .

On the short paragraph : Give me a few minutes to program , Just a few more seconds !

Xiao Mingfa's face of worshiping the big man .

A little bit of thinking , Sort out the program ideas :

  • Calculate the sales of each row in each table , use “ Number of visitors Conversion rate Customer unit price ” Just go .
  • Put each table together by brand sales .
  • Sum up the results of all the tables into a general table
  • Summarize and sort sales by brand in the general table

code

Step zero , Read Excel :

import pandas as pd
df = pd.read_excel("./tables/" + name)

First step , Calculate the sales in each form :

df[' sales '] = df[' Number of visitors '] * df[' Conversion rate '] * df[' Customer unit price ']

The second step , Put each table together by brand sales :

df_sum = df.groupby(' brand ')[' sales '].sum().reset_index()

The third step , Sum up the results of all the tables into a general table :

result = pd.DataFrame()
result = pd.concat([result, df_sum])

Step four , Summarize and sort sales by brand in the general table :

final = result.groupby(' brand ')[' sales '].sum().reset_index().sort_values(' sales ', ascending=False)

Last , Let's take a look at the complete program :

import pandas as pd
import os
result = pd.DataFrame()
for name in os.listdir("./tables"):
try:
df = pd.read_excel("./tables/" + name)
df[' sales '] = df[' Number of visitors '] * df[' Conversion rate '] * df[' Customer unit price ']
df_sum = df.groupby(' brand ')[' sales '].sum().reset_index()
result = pd.concat([result, df_sum])
except:
print(name)
pass
final = result.groupby(' brand ')[' sales '].sum().reset_index().sort_values(' sales ', ascending=False)
pd.set_option('display.float_format', lambda x: '%.2f' % x)
print(final.head())

The end result is this :

 brand sales
15 brand -5 1078060923.62
8 brand -17 1064495314.96
4 brand -13 1038560274.21
3 brand -12 1026115153.00
13 brand -3 1006908609.07

You can see that the final top five have come out , The whole program runs very fast .

In a few minutes , Xiaoduan sent the result to Xiaoming , Xiaoming was moved to the heart , Call for dinner some other day , Apprenticeship !

summary

This article is to show you master pan through a practical case (Pandas) The charm of , Especially for this kind of table processing , It's very convenient . Those who have written programs may feel a little familiar with it , It's a little bit like SQL Query statement . Master pan can not only make our program more simple and efficient , It's also very friendly to non programmers who need to deal with tables frequently , It's easy to get started .