Compared with Excel, it is easy to learn Python report automation practice!

Python learning and data mining 2021-10-29 12:18:15
compared excel easy learn python

This article will take you through the process of report Automation , And teach you to use it Python Realize a report automation practice in the work , Length is longer than the , Suggest collecting first , Like to like and support , At the end of the article, a technical exchange group is provided , The specific contents of the article are :

1.Excel Basic composition of
2. A report automation process
3. Report automation practice
- The month on month comparison of various indicators on that day

  • Order quantity created in each province on that day
  • Create order volume trend in recent period
    4. Merge different results
  • Merge different results into the same Sheet in
  • Merge different results into different parts of the same workbook Sheet in

Excel Basic composition of

We usually make statements at the beginning , It's basically from Excel At the beginning , It's all about using Excel I'm making a report , So let's first understand Excel Basic composition of .

The picture below is Excel The composition of the various parts of , We deal with a lot every day in our work Excel file , One Excel A file is actually a workbook . You create one at a time Excel When you file , The file name will default to workbook x, among x Is the number of files you created . There can be more than one in a workbook Sheet, Different Sheet There is a separate table between . every last Sheet It consists of several cells . Each cell has several elements or attributes , We usually aim at Excel In fact, most of the file settings are set for the elements of the cell .

 picture

The main content of setting cell elements is shown in the menu bar below , Like fonts 、 Alignment mode 、 Conditional format, etc . This book is also based on Excel Write each module in the menu bar .

 picture

The process of an automated report

The following figure shows the process I need to go through to make an automated report , It is mainly divided into 5 A step :

 picture

The first step is to disassemble the report to be made , This step is not directly related to whether to use tools or what tools to use , For example, the first step in reporting is generally to collect data , This data may be recorded in paper notebooks by offline personnel , It may also be stored in Excel Inside the watch , It may also be stored in the database . Because of different data source types or storage methods , The corresponding data collection methods will be different , But the step of collecting data itself will not change , The purpose of this step is to collect data .

The second step is to think about the corresponding code implementation of each specific step involved in the first step , It's usually to find the code corresponding to each step , For example, what is the code for importing data , Another example is what the code of duplicate value deletion is like .

The third step is to combine the codes corresponding to each step in the second step , Combined into a complete code .

The fourth step is to verify the report results obtained from the complete code in step 3 , See if the result is correct .

The fifth step is to wait for the call , See when you need to make a report , Then execute the written code once .

In fact, report automation is essentially The process of making machines do things instead of people , We just need to translate every step we need to do manually into a language that the machine can understand , That's code , Then let the machine automatically execute , This is actually the realization of Automation .

Report automation practice

This section shows you how to combine... In practical work Pandas and openpyxl Automatically generate reports .

Suppose we now have the following data set :

 picture

Now we need to make daily reports based on this data set , It will mainly include three aspects :

  • The same month on month comparison of various indicators on that day ;

  • Order quantity created in each province on that day ;

  • Create order volume trend in recent period

Next, we will implement these three parts .

The same month on month comparison of various indicators on that day :

We use first Pandas Calculate and process the data , Get the same month on month comparison of each index , The specific implementation code is as follows :

# Import files 
import pandas as pd
df = pd.read_excel(r'D:\Data-Science\share\excel-python Report Automation \sale_data.xlsx')
# Construct functions that obtain different indicators at the same time 
def get_data(date):
create_cnt = df[df[' Date of creation '] == date]['order_id'].count()
pay_cnt = df[df[' Payment date '] == date]['order_id'].count()
receive_cnt = df[df[' Receiving date '] == date]['order_id'].count()
return_cnt = df[df[' Refund date '] == date]['order_id'].count()
return create_cnt,pay_cnt,receive_cnt,return_cnt
# Suppose the day is 2021-04-11
# Obtain the index values in different time periods 
df_view = pd.DataFrame([get_data('2021-04-11')
,get_data('2021-04-10')
,get_data('2021-04-04')]
,columns = [' Create order quantity ',' Payment order quantity ',' Receipt order quantity ',' Refund order quantity ']
,index = [' On the day ',' Yesterday, ',' The same time last week ']).T
df_view[' Chain ratio '] = df_view[' On the day '] / df_view[' Yesterday, '] - 1
df_view[' Year on year '] = df_view[' On the day '] / df_view[' The same time last week '] - 1
df_view

Running the above code will get the following results :

 picture

The above is only the absolute value of the same month on month ratio of each index , However, our general daily newspapers have to make some format adjustments before they are sent out , Such as adjusting the font . Format adjustment requires openpyxl library , We need to Pandas In the library DataFrame Convert data in format to applicable openpyxl The data format of the library , The specific implementation code is as follows :

from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows
# Create an empty workbook 
wb = Workbook()
ws = wb.active
# take DataFrame Format data into openpyxl Format 
for r in dataframe_to_rows(df_view,index = True,header = True):
ws.append(r)
wb.save(r'D:\Data-Science\share\excel-python Report Automation \ Core indicators _ original .xlsx')

Running the above code will get the following results , You can see that the original data file looks chaotic :

 picture

Next, we adjust the format of the above original data file , The specific adjustment codes are as follows :

from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows
from openpyxl.styles import colors
from openpyxl.styles import Font
from openpyxl.styles import PatternFill
from openpyxl.styles import Border, Side
from openpyxl.styles import Alignment
wb = Workbook()
ws = wb.active
for r in dataframe_to_rows(df_view,index = True,header = True):
ws.append(r)
# The second line is empty , Delete the second line 
ws.delete_rows(2)
# to A1 Assign values to cells 
ws['A1'] = ' indicators '
# Insert a line as the title line 
ws.insert_rows(1)
ws['A1'] = ' E-commerce business direction 2021/4/11 daily '
# Merge the cells of the header row 
ws.merge_cells('A1:F1') # merge cell 
# Right. 1 To the first 6 Format the cells in the row 
for row in ws[1:6]:
for c in row:
# Font settings 
c.font = Font(name = ' Microsoft YaHei ',size = 12)
# Alignment Settings 
c.alignment = Alignment(horizontal = "center")
# Border line settings 
c.border = Border(left = Side(border_style = "thin",color = "FF000000"),
right = Side(border_style = "thin",color = "FF000000"),
top = Side(border_style = "thin",color = "FF000000"),
bottom = Side(border_style = "thin",color = "FF000000"))
# Make special settings for the header row and header row 
for row in ws[1:2]:
for c in row:
c.font = Font(name = ' Microsoft YaHei ',size = 12,bold = True,color = "FFFFFFFF")
c.fill = PatternFill(fill_type = 'solid',start_color='FFFF6100')
# Set month on month and year-on-year to percentage format 
for col in ws["E":"F"]:
for r in col:
r.number_format = '0.00%'
# Adjust column width 
ws.column_dimensions['A'].width = 13
ws.column_dimensions['E'].width = 10
# Save the adjusted file 
wb.save(r'D:\Data-Science\share\excel-python Report Automation \ Core indicators .xlsx')

Running the above code will get the following results :

 picture

You can see that all items have been set successfully .

Order quantity created in each province on that day :

We also use Pandas The warehouse processing obtains the order quantity created by each province on the current day , The specific implementation code is as follows :

df_province = pd.DataFrame(df[df[' Date of creation '] == '2021-04-11'].groupby(' Province ')['order_id'].count())
df_province = df_province.reset_index()
df_province = df_province.sort_values(by = 'order_id',ascending = False)
df_province = df_province.rename(columns = {
'order_id':' Create order quantity '})
df_province

Running the above code will get the following results :

 picture

After obtaining the absolute value of the order quantity created in each province on that day , Format it as well , The specific setting code is as follows :

from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows
from openpyxl.styles import colors
from openpyxl.styles import Font
from openpyxl.styles import PatternFill
from openpyxl.styles import Border, Side
from openpyxl.styles import Alignment
from openpyxl.formatting.rule import DataBarRule
wb = Workbook()
ws = wb.active
for r in dataframe_to_rows(df_province,index = False,header = True):
ws.append(r)
# Right. 1 To the first 11 Set the cells in the row 
for row in ws[1:11]:
for c in row:
# Font settings 
c.font = Font(name = ' Microsoft YaHei ',size = 12)
# Alignment Settings 
c.alignment = Alignment(horizontal = "center")
# Border line settings 
c.border = Border(left = Side(border_style = "thin",color = "FF000000"),
right = Side(border_style = "thin",color = "FF000000"),
top = Side(border_style = "thin",color = "FF000000"),
bottom = Side(border_style = "thin",color = "FF000000"))
# Format progress bar conditions 
rule = DataBarRule(start_type = 'min',end_type = 'max',
color="FF638EC6", showValue=True, minLength=None, maxLength=None)
ws.conditional_formatting.add('B1:B11',rule)
# Right. 1 Set the row header row 
for c in ws[1]:
c.font = Font(name = ' Microsoft YaHei ',size = 12,bold = True,color = "FFFFFFFF")
c.fill = PatternFill(fill_type = 'solid',start_color='FFFF6100')
# Adjust column width 
ws.column_dimensions['A'].width = 17
ws.column_dimensions['B'].width = 13
# Save the adjusted file 
wb.save(r'D:\Data-Science\share\excel-python Report Automation \ Sales situation of each province .xlsx')

Running the above code will get the following results :

 picture

Create order volume trend in recent period :

The trend of an indicator is generally reflected in the form of a line chart , We also talked about , In practice, we usually use matplotlib Or other visual libraries for charting , And save it , And then reuse it openpyxl The library inserts the chart into Excel in .

First use of matplotlib Library for drawing , The specific implementation code is as follows :

%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams["font.sans-serif"]='SimHei'# Solve the Chinese garbled code 
# Set chart size 
plt.figure(figsize = (10,6))
df.groupby(' Date of creation ')['order_id'].count().plot()
plt.title('4.2 - 4.11 Create order volume daily trend ')
plt.xlabel(' date ')
plt.ylabel(' Order quantity ')
# Save the chart locally 
plt.savefig(r'D:\Data-Science\share\excel-python Report Automation \4.2 - 4.11 Create order volume daily trend .png')

Insert the chart saved locally into Excel in , The specific implementation code is as follows :

from openpyxl import Workbook
from openpyxl.drawing.image import Image
wb = Workbook()
ws = wb.active
img = Image(r'D:\Data-Science\share\excel-python Report Automation \4.2 - 4.11 Create order volume daily trend .png')
ws.add_image(img, 'A1')
wb.save(r'D:\Data-Science\share\excel-python Report Automation \4.2 - 4.11 Create order volume daily trend .xlsx')

Running the above code will get the following results , You can see that the chart has been successfully inserted into Excel in :

 picture

Merge different results

Above, we separate each part to realize , Finally, it is stored in different Excel In file . Yes, of course , Sometimes it's troublesome to put it in different files , We need to combine these results in the same Excel In the same Sheet Or different Sheet in .

Merge different results into the same Sheet in :

Merge different results into the same Sheet The difficulty in is that the results of different tables have different structures , And you need to leave a blank between different results .

First insert the core indicator table df_review, The insertion method is the same as the individual insertion , The specific code is as follows :

for r in dataframe_to_rows(df_view,index = True,header = True):
ws.append(r)

Next, we should insert the situation table of each province df_province, because append The default is to insert from the first line , And we already have... In the first few lines df_view The data in the table are , So it can't be used appen The way to insert , You can only insert by traversing each cell .

How do we know which cells to traverse ? The core needs to know the row and column at the beginning of traversal and the row and column at the end of traversal .

 Traverse the starting line = df_view Rows occupied by the table + The line left blank ( Generally, there are... Between tables 2 That's ok ) + 1
Traverse the end line = Traverse the starting line + df_province Rows occupied by the table
Traverse the starting column = 1
End of traversal column = df_province The columns occupied by the table

And because DataFrame The way to get column names in is different from the way to get specific values , So we need to insert... Separately , Insert column name first , The specific code is as follows :

for j in range(df_province.shape[1]):
ws.cell(row = df_view.shape[0] + 5,column = 1 + j).value = df_province.columns[r]

df_province.shape[1] Is to obtain df_province How many columns does the table have ,df_view.shape[0] Is to obtain df_view How many rows does the table have .

As I said before , The row at the beginning of traversal is the row occupied by the table plus the blank row plus 1, Generally, the lines left blank are 2, But why is it df_view.shape[0] + 5 Well ? This is because df_view.shape[0] It does not include column name rows , At the same time insert Excel It will add by default 1 Go ahead, go ahead , So we need to add... On the basis of leaving white lines 2 That's ok , namely 2 + 2 + 1 = 5.

because range() The default function is from 0 At the beginning , and Excel The columns in are from 1 At the beginning , therefore column Need to add 1.

The code above just puts df_province The column names of the table are inserted , Next, insert specific values , The method is the same as that of inserting column names , You just need to insert... On the next row of the column name , The specific code is as follows :

# Then insert the specific value into 
for i in range(df_province.shape[0]):
for j in range(df_province.shape[1]):
ws.cell(row = df_view.shape[0] + 6 + i,column = 1 + j).value = df_province.iloc[i,j]

It's time to insert the picture , Insert the picture in the same way as the previous separate insertion , The specific code is as follows :

# Insert a picture 
img = Image(r'D:\Data-Science\share\excel-python Report Automation \4.2 - 4.11 Create order volume daily trend .png')
ws.add_image(img, 'G1')

After inserting all the data, it's time to format the data , Because different tables have different structures , So we can't directly format all cells in batch , It can only be set by range , The format of different ranges may be the same , So let's preset some format variables , In this way, you can directly call these variables when you use them later , Reduce code redundancy , The specific code is as follows :

# Format preset 
# Header font settings 
title_Font_style = Font(name = ' Microsoft YaHei ',size = 12,bold = True,color = "FFFFFFFF")
# Normal content font settings 
plain_Font_style = Font(name = ' Microsoft YaHei ',size = 12)
Alignment_style = Alignment(horizontal = "center")
Border_style = Border(left = Side(border_style = "thin",color = "FF000000"),
right = Side(border_style = "thin",color = "FF000000"),
top = Side(border_style = "thin",color = "FF000000"),
bottom = Side(border_style = "thin",color = "FF000000"))
PatternFill_style = PatternFill(fill_type = 'solid',start_color='FFFF6100')

After the format is preset, you can set the format of each range separately , The specific code is as follows :

# Yes A1 to F6 Set the cells in the range 
for row in ws['A1':'F6']:
for c in row:
c.font = plain_Font_style
c.alignment = Alignment_style
c.border = Border_style
# Right. 1 Xing He 2 Set the cells in the row 
for row in ws[1:2]:
for c in row:
c.font = title_Font_style
c.fill = PatternFill_style
# Yes E Column sum F Column 
for col in ws["E":"F"]:
for r in col:
r.number_format = '0.00%'
# Yes A9 to B19 Set the cells in the range 
for row in ws['A9':'B19']:
for c in row:
c.font = plain_Font_style
c.alignment = Alignment_style
c.border = Border_style
# Yes A9 to B9 Set the cells in the range 
for row in ws['A9':'B9']:
for c in row:
c.font = title_Font_style
c.fill = PatternFill_style
# Set progress bar 
rule = DataBarRule(start_type = 'min',end_type = 'max',
color="FF638EC6", showValue=True, minLength=None, maxLength=None)
ws.conditional_formatting.add('B10:B19',rule)
# Adjust column width 
ws.column_dimensions['A'].width = 17
ws.column_dimensions['B'].width = 13
ws.column_dimensions['E'].width = 10

Finally, merge all the above code fragments together , Is to merge different result files into the same Sheet Complete code in , The results are as follows , You can see that the different result files are merged together , And their formats are well set .

 picture

Merge different results into different parts of the same workbook Sheet in :

Merge different results into different parts of the same workbook Sheet It is easier to realize , Just create a few Sheet, And then for different Sheet Insert data , The specific implementation code is as follows :

from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows
wb = Workbook()
ws = wb.active
ws1 = wb.create_sheet()
ws2 = wb.create_sheet()
# change sheet The name of 
ws.title = " Core indicators "
ws1.title = " Sales by province "
ws2.title = " Daily trend "
for r1 in dataframe_to_rows(df_view,index = True,header = True):
ws.append(r1)
for r2 in dataframe_to_rows(df_province,index = False,header = True):
ws1.append(r2)
img = Image(r'D:\Data-Science\share\excel-python Report Automation \4.2 - 4.11 Create order volume daily trend .png')
ws2.add_image(img, 'A1')
wb.save(r'D:\Data-Science\share\excel-python Report Automation \ Merge multiple results _ many Sheet.xlsx')

Run the above code , We will get the following results , You can see that you created 3 individual Sheet, And different contents are saved to different Sheet in :

 picture

Here we have completed the code of an automated report , Every time I need this report in the future , Execute the above code again , The results will come out soon , Of course, you can also set scheduled execution , When the time comes, the results will be automatically sent to your email .


Technical communication

Welcome to reprint 、 Collection 、 Gain some praise and support !

 Insert picture description here

At present, a technical exchange group has been opened , Group friends have exceeded 2000 people , The best way to add notes is : source + Interest direction , Easy to find like-minded friends

  • The way ①、 Send the following picture to wechat , Long press recognition , The background to reply : Add group ;
  • The way ②、 Add microsignals :dkl88191, remarks : come from CSDN
  • The way ③、 WeChat search official account :Python Learning and data mining , The background to reply : Add group

 Long press attention

版权声明
本文为[Python learning and data mining]所创,转载请带上原文链接,感谢
https://pythonmana.com/2021/10/20211013223555950u.html

  1. Python code reading (Chapter 14): List Union
  2. Lecture du Code Python (article 25): diviser les chaînes multilignes en listes
  3. Python self study notes -- operators
  4. Formation python - différences entre http et HTTPS
  5. Implementation of automatic timing comment function on Python CSDN platform
  6. python+tkinter+treeview子控件快捷键
  7. Raccourcis clavier pour les sous - contrôles Python + tkinter + treeview
  8. Analyse des données Python
  9. python+tkinter+treeview子控件快捷鍵
  10. Devine si je peux attraper Maotai avec la programmation python? Tout est ouvert à github
  11. À propos de pygame.display.set in Python Un petit problème avec mode ()
  12. Implementation of automatic timing comment function on Python CSDN platform
  13. python:dataframe进行iteritem遍历时如何将输出结果按照列分别输出为该列最后一行
  14. python:dataframe進行iteritem遍曆時如何將輸出結果按照列分別輸出為該列最後一行
  15. Python: comment le dataframe affiche les résultats de sortie par colonne à la dernière ligne de la colonne lors de la traversée de l'itemitem
  16. Écrivez un gadget de bureau pour votre fille préférée en python et elle dit que c'est génial!
  17. Introduction to closures in Python 3
  18. Global / nonlocal usage in Python 3
  19. Introduction to context manager in Python 3
  20. Python crawler selenium framework. You can start with these five questions | Python skill tree
  21. Common standard library random, python introductory tutorial 5 or 6 questions a day | Python skill tree
  22. It is said that Python is omnipotent. It's really good to see Liyang photography circle with Python this time
  23. 【Python 爬虫】 4、爬虫基本原理
  24. 【Python 爬蟲】 4、爬蟲基本原理
  25. 【 Python crawler】 4. Principes de base du crawler
  26. 这道python题到底应该要怎么做
  27. Que doit faire exactement ce problème Python
  28. Après l'importation des variables du module Python, les valeurs imprimées sont fixes.
  29. Nouveau singe Muzi Lee: 0 cours de formation Python de base types de hachage pour les opérations Python redis
  30. Looking at problems from a fresh perspective: analyzing selenium principle from the perspective of Python
  31. Insérez le format de date dans la base de données MySQL en python et ne l'exécutez pas.
  32. Try Python 3.10 with CONDA
  33. Répondez en python et demandez à quelqu'un de vous aider.
  34. Un simple problème de travail Python, qui ne fonctionne pas
  35. Problèmes d'écriture Python pour la boucle
  36. Comment Python exécute les commandes du programme à plusieurs reprises au lieu de quitter
  37. YYDS! Dexplot: one line of Python code to easily draw statistical charts!
  38. pandas生成的透视表如何和源数据一起保存
  39. pandas生成的透視錶如何和源數據一起保存
  40. Comment sauvegarder le tableau pivot généré par pandas avec les données sources
  41. 10 fois plus efficace avec cache dans le développement de Django
  42. 求Python *.svg文件操作方法
  43. 求Python *.svg文件操作方法
  44. Trouver la méthode de fonctionnement du fichier Python *.Svg
  45. 【 python】 Internal Guide for Unit Test Practice
  46. 用Python编程佩尔数列pell数列循环结构
  47. 【 python】 échafaudage fastapi: spécification du développement du projet d'interface arrière fastapi
  48. [Python] restful Specification Practice Based on fastapi
  49. Python代码阅读(第26篇):将列表映射成字典
  50. How to use Python to make a screen color extractor with Exe file
  51. Lecture du Code Python (article 26): cartographie des listes dans les dictionnaires
  52. Python代码阅读(第26篇):将列表映射成字典
  53. Python代碼閱讀(第26篇):將列錶映射成字典
  54. Lecture du Code Python (article 26): cartographie des listes dans les dictionnaires
  55. 使用 Python 进行数据可视化之Seaborn
  56. Real time access to stock data, free—— Python crawler Sina stock actual combat
  57. Seaborn pour la visualisation des données en python
  58. 浅识XPath(熟练掌握XPath的语法)【python爬虫入门进阶】(03)
  59. Python中if else语句进行操作的时候哪里除了错,搞不懂
  60. Python题,我刚学,还不会