How do you work with Excel in pandas? If you don't know yet, this article is enough

What's the name 2021-04-07 20:31:59


A summary of Excel operations in Python, including converting between Series and DataFrame, reading Excel sheets with pandas, reading multiple sheets, merging multiple worksheets, and writing Excel files.

pandas is a data analysis tool built on NumPy. It provides many methods that let us process data quickly.



Common data types

  • Series: a one-dimensional array, similar to a one-dimensional NumPy array and to Python's built-in list. The difference is that a Series holds a single data type (dtype), whereas a list can hold values of different types
  • Time-Series: a Series indexed by time
  • DataFrame: a two-dimensional tabular data structure, often used to process Excel table data. It is the focus of this lesson
  • Panel: a three-dimensional array (no longer supported as of version 0.25; use xarray instead)
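
To make the difference between these types concrete, here is a minimal sketch. The values and the names mixed_list, s, ts and df are illustrative only and are not taken from the article's test data.

```python
import pandas as pd

# A plain Python list may mix types freely.
mixed_list = [1, "two", 3.0]

# A Series built from integers gets one integer dtype for all its values.
s = pd.Series([1, 2, 3])
print(s.dtype)

# A time series is simply a Series whose index holds timestamps.
ts = pd.Series([10, 20], index=pd.to_datetime(["2021-04-06", "2021-04-07"]))
print(type(ts.index).__name__)

# A DataFrame is a table: each column is a Series with its own dtype.
df = pd.DataFrame({"full name": ["OLIVER", "HARRY"], "Wages": [7653, 8799]})
print(df.dtypes)
```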

Converting between Series and DataFrame

  • Use to_frame() to convert a Series to a DataFrame
  • Use squeeze() to convert a single-column DataFrame to a Series

    import pandas as pd

    s = pd.Series(["Beishan", "Focus on", "give the thumbs-up"])
    s

    0               Beishan
    1              Focus on
    2    give the thumbs-up
    dtype: object

    s = s.to_frame(name="Name")
    s


    s.squeeze()

    0               Beishan
    1              Focus on
    2    give the thumbs-up
    Name: Name, dtype: object
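
One detail worth checking is what squeeze() does when the DataFrame is not a plain single column. The sketch below uses made-up frames (single, wide and one_cell are hypothetical names): a frame with several columns comes back unchanged, and a 1x1 frame collapses all the way to a scalar unless the axis is given.

```python
import pandas as pd

single = pd.DataFrame({"Name": ["Beishan", "Focus on"]})
wide = pd.DataFrame({"Name": ["Beishan", "Focus on"], "Age": [1, 2]})
one_cell = pd.DataFrame({"Name": ["Beishan"]})

as_series = single.squeeze()                # one column -> Series
unchanged = wide.squeeze()                  # two columns -> still a DataFrame
as_scalar = one_cell.squeeze()              # 1x1 frame -> plain scalar
still_series = one_cell.squeeze("columns")  # squeeze only the column axis

print(type(as_series).__name__, type(unchanged).__name__, as_scalar)
```

If a Series is always wanted, passing "columns" to squeeze() (or selecting the column by name) avoids the scalar surprise.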

Reading Excel sheets with pandas

Reading Excel in pandas is simple: there is only one function, read_excel(), although it takes many parameters.

Let's first get to know the most commonly used ones:

  • io: usually the path to the Excel file. It can also be another Excel reader object such as ExcelFile or xlrd.Book
  • sheet_name: specifies the worksheet (sheet) by name. It can also be a number (worksheets are indexed from 0)
  • header: specifies which row to use as the column names. The default is 0, i.e. the first row. If the data contains no column names, set it to None
  • names: specifies a new list of column names. The number of elements in the list must equal the number of columns
  • index_col: specifies the column to use as the index. The default is None; a value of 0 uses the first column as the index
  • usecols: the columns to parse. It can be a list of int or str, or a comma-separated string of Excel column letters (new in pandas 0.24). For example, "A:F" means columns A through F, "A,C,F" means the three columns A, C and F, and the two forms can be combined as "A,C,F,K:Q"
  • dtype: the data type of each column, for example {'a': np.float64, 'b': np.int32}
  • converters: a dictionary of functions used to convert the data in each column, for example {'a': func_1, 'b': func_2}
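
Several of these parameters can be combined in one call. The following is a sketch only: it first writes a small throwaway workbook (sample.xlsx in a temporary directory, with invented rows) so that it can run on its own, and it assumes the openpyxl package is installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample rows modeled on the article's columns.
sample = pd.DataFrame({
    "full name": ["OLIVER", "HARRY", "GEORGE"],
    "Age": [23, 45, 34],
    "Wages": [7653, 8799, 9800],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.xlsx")
    sample.to_excel(path, sheet_name="Company A", index=False, engine="openpyxl")

    df = pd.read_excel(
        io=path,
        sheet_name="Company A",      # select the sheet by name
        header=0,                    # the first row holds the column names
        usecols="A,C",               # Excel letters: first and third columns
        dtype={"Wages": "float64"},  # read the wages as floats
    )

print(df)
```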

    import pandas as pd

    sheet = pd.read_excel(io="Test data.xlsx")
    sheet.head()


Let's first look at the data type of the retrieved data .

print(type(sheet))
<class 'pandas.core.frame.DataFrame'>

You can see that it is the DataFrame we mentioned earlier. You can access a column directly by its name. For example, to get all the wage information:

    print(sheet['Wages'])

    0      7653
    1      8799
    2      9800
    3     12880
    4      3600
    5      3800
    6      8976
    7     12000
    8      8900
    9      7688
    10     6712
    11     9655
    12     6854
    13     8122
    14     6788
    15     8830
    Name: Wages, dtype: int64

You can see that all the data in the column is listed, and that its data type is int64, i.e. a 64-bit integer.
Once we have this column of data, we can process it.

    for i in sheet['Wages']:
        print(i)

    7653
    8799
    9800
    12880
    3600
    3800
    8976
    12000
    8900
    7688
    6712
    9655
    6854
    8122
    6788
    8830

Or convert it to a list and then process it:

    salaries = list(sheet['Wages'])
    print(salaries)

    [7653, 8799, 9800, 12880, 3600, 3800, 8976, 12000, 8900, 7688, 6712, 9655, 6854, 8122, 6788, 8830]

Calculate the average wage:

    # "total" is used instead of "sum" to avoid shadowing the built-in sum()
    total = 0
    for i in salaries:
        total += i
    print(f"Total wage: {total}")
    ave = total / len(salaries)
    print(f"Average wage: {ave}")

    Total wage: 131057
    Average wage: 8191.0625

There is another way to compute the sum: a lambda expression (anonymous function) combined with the reduce() function. reduce() applies an operation cumulatively to the elements of a list, tuple or other iterable: it operates on the first and second elements, combines that result with the third element, and so on until the last element.

    import functools

    total = functools.reduce(lambda x, y: x + y, salaries)
    print(total)

    131057
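
For comparison, a pandas Series already carries sum() and mean() methods, so the loop and reduce() are not strictly needed. A minimal sketch, reusing the wage values printed above (the variable wages is just an illustrative name):

```python
import pandas as pd

salaries = [7653, 8799, 9800, 12880, 3600, 3800, 8976, 12000,
            8900, 7688, 6712, 9655, 6854, 8122, 6788, 8830]
wages = pd.Series(salaries, name="Wages")

total = wages.sum()     # same result as the loop and reduce() versions
average = wages.mean()  # total divided by the number of rows

print(f"Total wage: {total}")
print(f"Average wage: {average}")
```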

We can use the usecols parameter of read_excel to specify which columns to read. It accepts a string or a list of integers or strings, giving the names or indexes of the columns we want to retrieve.

    import pandas as pd

    sheet = pd.read_excel(io="Test data.xlsx", usecols=[2])
    sheet


Or:

    import pandas as pd

    sheet = pd.read_excel(io="Test data.xlsx", usecols=['Wages'])
    sheet



If you want to rename the original columns while reading the data, use the names parameter to specify the new column names:

    import pandas as pd

    sheet = pd.read_excel(io="Test data.xlsx", names=['name', 'age', 'salary'])
    sheet



Note that from this point on, any operation on this DataFrame must use the new column names. If we want the wage data in the form "¥12,345", we can simply specify a conversion function when reading the data:

    import pandas as pd

    def formatsalary(num):
        # add a currency sign and thousands separators, e.g. 12345 -> "¥12,345"
        return f"¥{format(num, ',')}"

    sheet = pd.read_excel(io="Test data.xlsx", usecols=['Wages'], converters={'Wages': formatsalary})
    sheet



The converters argument above specifies that the "Wages" column is processed by the formatsalary function, so the data we get back has already been formatted. Of course, we can also read the data first and format it afterwards.
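
Formatting after reading might look like the sketch below. It builds a small in-memory DataFrame in place of the Excel file, and the column name Formatted is an invented one. Keeping the numeric column alongside the formatted text means it can still be used for arithmetic.

```python
import pandas as pd

# Stand-in for the data read from Excel (first four wage values).
wages = pd.DataFrame({"Wages": [7653, 8799, 9800, 12880]})

# Keep the numbers intact and add a separate, display-only text column.
wages["Formatted"] = wages["Wages"].map(lambda n: f"¥{n:,}")

print(wages)
```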

You can experiment with the other parameters yourself. Next, suppose we want all wages greater than or equal to 8000. How do we do that? We can select DataFrame rows with a condition:

    import pandas as pd

    sheet = pd.read_excel(io="Test data.xlsx", usecols=['Wages'])
    high_salary = sheet[sheet['Wages'] >= 8000]
    high_salary


To get the rows where the wage is greater than or equal to 8000 and less than or equal to 10000:

    import pandas as pd

    sheet = pd.read_excel(io="Test data.xlsx")
    high_salary = sheet[(sheet['Wages'] >= 8000) & (sheet['Wages'] <= 10000)]
    high_salary


If you only want to display the name and wage of the matching rows, pass the columns to display as a list:

    import pandas as pd

    sheet = pd.read_excel(io="Test data.xlsx")
    high_salary = sheet[(sheet['Wages'] >= 8000) & (sheet['Wages'] <= 10000)][['full name', 'Wages']]
    high_salary
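
The same selection can also be written with Series.between() and .loc, which some readers find easier to follow than chained brackets. This is a sketch on a few in-memory rows modeled on the test data, not on the actual file; mask and mid_salary are illustrative names.

```python
import pandas as pd

sheet = pd.DataFrame({
    "full name": ["OLIVER", "HARRY", "GEORGE", "NOAH", "JACK"],
    "Age": [23, 45, 34, 54, 34],
    "Wages": [7653, 8799, 9800, 12880, 3600],
})

# between() includes both end points by default: 8000 <= Wages <= 10000
mask = sheet["Wages"].between(8000, 10000)

# .loc takes the row condition and the column list in a single step
mid_salary = sheet.loc[mask, ["full name", "Wages"]]
print(mid_salary)
```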


Reading multiple sheets

In the examples above, although the "Test data.xlsx" file contains two sheets, only the first one was read. What if we want to read the data in both? You can specify the sheet_name parameter, which accepts a string, a number, a list of strings or numbers, or None. If it is None, the data of all sheets is returned. The default is 0, which returns the data of the first sheet.

    import pandas as pd

    sheet = pd.read_excel(io="Test data.xlsx", sheet_name=[0, 1])
    sheet

    {0:     full name  Age  Wages
     0     OLIVER.   23   7653
     1      HARRY.   45   8799
     2     GEORGE.   34   9800
     3       NOAH.   54  12880
     4       JACK.   34   3600
     5      JACOB.   32   3800
     6   MUHAMMAD.   51   8976
     7        LEO.   46  12000
     8     Harper.   42   8900
     9     Evelyn.   38   7688
     10      Ella.   33   6712
     11     Avery.   26   9655
     12  Scarlett.   37   6854
     13   Madison.   41   8122
     14      Lily.   54   6788
     15   Eleanor.   28   8830,
     1:     full name  Age  Wages
     0   Zhang San   39  15000
     1       Li Si   43  16000
     2      Li lei   25   6800
     3  Han Meimei   28  23000}

You can see that we get the data of both sheets, returned as a dictionary. To work with the data of one sheet, we first select it with sheet[0] or sheet[1], and then process the resulting DataFrame, for example:

    sheet[1]


If you read the sheets by name, write it as sheet['Company A'].
If we want to combine the data from these two sheets, we can use the concat() function in pandas:

    import pandas as pd

    sheet = pd.read_excel(io="Test data.xlsx", sheet_name=[1, 0])
    st = pd.concat(sheet, ignore_index=True)
    st


Here ignore_index means that the original indexes of the individual sheets are ignored and a single new index is used.
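
The effect of ignore_index is easiest to see side by side. The sketch below uses a hand-built dictionary of two small DataFrames in place of the dictionary returned by read_excel; the level names "sheet" and "row" are arbitrary choices.

```python
import pandas as pd

# Stand-in for the {sheet key: DataFrame} dictionary returned by read_excel.
sheets = {
    0: pd.DataFrame({"full name": ["OLIVER", "HARRY"], "Wages": [7653, 8799]}),
    1: pd.DataFrame({"full name": ["Zhang San", "Li Si"], "Wages": [15000, 16000]}),
}

# Without ignore_index, the dictionary keys become an outer index level.
keyed = pd.concat(sheets)

# With ignore_index=True, the rows are simply renumbered 0..n-1.
flat = pd.concat(sheets, ignore_index=True)

# To remember where each row came from, keep the key as a regular column.
labeled = pd.concat(sheets, names=["sheet", "row"]).reset_index(level="sheet")

print(keyed.index.tolist())
print(flat.index.tolist())
print(labeled)
```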

Merging multiple worksheets

Merging multiple Excel files into a single sheet with Python. Note that the files in this example are CSV files, so it uses read_csv() and to_csv().

    # -*- coding:utf-8 -*-
    # @Address: https://beishan.blog.csdn.net/
    # @Author: Beishan
    import pandas as pd
    import os

    path = r"E:\Python\00 Data analysis\RichardFu123\Five provinces PM2.5\archive"
    dfs, index = [], 0
    for i in os.listdir(path):
        # read every file in the folder and collect the DataFrames
        dfs.append(pd.read_csv(os.path.join(path, i)))
        print(f"Merging worksheet {index + 1}")
        index += 1
    df = pd.concat(dfs)
    df.to_csv("Data summary.csv", index=False)

    Merging worksheet 1
    Merging worksheet 2
    Merging worksheet 3
    Merging worksheet 4
    Merging worksheet 5
    Merging worksheet 6
    Merging worksheet 7
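
One weakness of os.listdir() is that it returns every file in the folder, including ones that are not data files. A possible variant, sketched here with pathlib and a temporary folder of invented CSV files (part0.csv and so on are hypothetical names), filters by extension and sorts the files so that the merge order is predictable:

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)

    # Create three small CSV files plus one unrelated file for the demo.
    for n in range(3):
        part = pd.DataFrame({"city": [f"city{n}"], "pm25": [10 * n]})
        part.to_csv(folder / f"part{n}.csv", index=False)
    (folder / "notes.txt").write_text("not a data file")

    # Only *.csv files are picked up; sorting fixes the merge order.
    files = sorted(folder.glob("*.csv"))
    merged = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)

print(f"Merged {len(files)} files into {len(merged)} rows")
```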

Writing Excel files

DataFrame data can be written to a new Excel file. For example, we can write the data merged from the two sheets above to a new Excel file:

    df = pd.DataFrame(st)
    df.to_excel("Consolidated salary statement.xlsx")

Here we use the DataFrame's to_excel() method to write the data to an Excel file. Its prototype (in the pandas version used here) is: to_excel(self, excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True, encoding=None, inf_rep='inf', verbose=True, freeze_panes=None). The commonly used parameters are:
  • excel_writer: the file to write to. It can be a string or an ExcelWriter object
  • sheet_name: the name of the worksheet to write to, as a string. The default is 'Sheet1'
  • na_rep: the value to fill in where data is missing. The default is an empty string
  • float_format: the format for floating-point numbers. The default is None. It can be set like float_format="%.2f"
  • columns: a list specifying the columns to write and their order
  • header: whether to write the header row. The default is True. It can be a boolean or a list of strings
  • index: whether to write the row index. The default is True
  • index_label: the label for the index column. It can be a string or a list. The default is None
  • startrow: the starting row at which the data is inserted. The default is 0
  • startcol: the starting column at which the data is inserted. The default is 0
  • engine: the engine used to write the file, for example 'openpyxl' or 'xlsxwriter'

Of course, we are not limited to writing data from one Excel file to another. Data produced by our own program can also be organized into a DataFrame and then written to an Excel file.

    import pandas as pd

    df = pd.DataFrame({'full name': ['Li lei', 'Han Meimei', 'Xiao Ming', 'Zhang San', 'Li Si', 'Wang Wu'],
                       'Age': [31, 22, 30, 49, 38, 33]})
    df.to_excel("The employee table.xlsx", sheet_name="202002 induction")

Check whether it was written to the file:

    f = pd.read_excel("The employee table.xlsx")
    f


You can see that it was written.
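
A few of the to_excel() parameters described above can be tried out in one go. The sketch below writes to an in-memory buffer rather than a file and reads the result back; the data, the sheet name "Scores" and the placeholder text "missing" are invented for illustration, and openpyxl is assumed to be installed.

```python
import io

import pandas as pd

df = pd.DataFrame({
    "Names": ["Andrew", "Tomas", "Larry"],
    "Score": [1.23456, None, 3.5],
})

buffer = io.BytesIO()
df.to_excel(
    buffer,
    sheet_name="Scores",
    na_rep="missing",            # text written where a value is missing
    float_format="%.2f",         # floats are rounded to two decimals
    columns=["Score", "Names"],  # choose the columns and their order
    index=False,                 # do not write the row index
    engine="openpyxl",
)

buffer.seek(0)
back = pd.read_excel(buffer, sheet_name="Scores", engine="openpyxl")
print(back)
```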
What if you want to write several sets of data to multiple sheets of one Excel file? In that case you can use the following approach.

    df1 = pd.DataFrame({'full name': ['Li lei', 'Han Meimei', 'Xiao Ming', 'Zhang San', 'Li Si', 'Wang Wu'],
                        'Age': [31, 22, 30, 49, 38, 33]})
    df2 = pd.DataFrame({'Names': ['Andrew', 'Tomas', 'Larry', 'Sophie', 'Sally', 'Simone'],
                        'Age': [42, 37, 39, 35, 29, 27]})
    dfs = {'Domestic employees': df1, 'Foreign employees': df2}
    writer = pd.ExcelWriter('Employees.xlsx', engine='xlsxwriter')
    for sheet_name in dfs.keys():
        dfs[sheet_name].to_excel(writer, sheet_name=sheet_name, index=False)
    # close() writes the file to disk (older pandas also offered writer.save(), removed in 2.0)
    writer.close()

Check whether it was written to the file:

    sheet = pd.read_excel(io="Employees.xlsx", sheet_name=None)
    sheet

    {'Domestic employees':     full name  Age
     0      Li lei   31
     1  Han Meimei   22
     2   Xiao Ming   30
     3   Zhang San   49
     4       Li Si   38
     5     Wang Wu   33,
     'Foreign employees':     Names  Age
     0  Andrew   42
     1   Tomas   37
     2   Larry   39
     3  Sophie   35
     4   Sally   29
     5  Simone   27}

But if you look closely at the foreign employees sheet, you may find the Names and Age fields in reverse order. This happens on older versions (pandas before 0.23, or Python before 3.6), where a DataFrame built from a dictionary sorts its columns alphabetically. To avoid this, add the columns parameter to to_excel() to specify the order of the header fields:

    df1 = pd.DataFrame({'full name': ['Li lei', 'Han Meimei', 'Xiao Ming', 'Zhang San', 'Li Si', 'Wang Wu'],
                        'Age': [31, 22, 30, 49, 38, 33]})
    df2 = pd.DataFrame({'Names': ['Andrew', 'Tomas', 'Larry', 'Sophie', 'Sally', 'Simone'],
                        'Age': [42, 37, 39, 35, 29, 27]})
    dfs = {'Domestic employees': df1, 'Foreign employees': df2}
    # specify the column order for each sheet
    cols = {'Domestic employees': ['full name', 'Age'], 'Foreign employees': ['Names', 'Age']}
    writer = pd.ExcelWriter('Employees.xlsx', engine='xlsxwriter')
    for sheet_name in dfs.keys():
        dfs[sheet_name].to_excel(writer, sheet_name=sheet_name, index=False, columns=cols[sheet_name])
    # close() writes the file to disk (older pandas also offered writer.save(), removed in 2.0)
    writer.close()

Let's see whether it is right now:

    sheet = pd.read_excel(io="Employees.xlsx", sheet_name=None)
    sheet

    {'Domestic employees':     full name  Age
     0      Li lei   31
     1  Han Meimei   22
     2   Xiao Ming   30
     3   Zhang San   49
     4       Li Si   38
     5     Wang Wu   33,
     'Foreign employees':     Names  Age
     0  Andrew   42
     1   Tomas   37
     2   Larry   39
     3  Sophie   35
     4   Sally   29
     5  Simone   27}

Now there is no problem.
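
The column order can also be fixed on the DataFrame itself, before anything is written. A small sketch with two invented rows: the columns argument of the constructor pins the order at creation time, and indexing with a list of names produces a reordered copy.

```python
import pandas as pd

df2 = pd.DataFrame(
    {"Names": ["Andrew", "Tomas"], "Age": [42, 37]},
    columns=["Names", "Age"],  # pin the column order at construction time
)

# Selecting with a list of names returns the columns in exactly that order.
reordered = df2[["Age", "Names"]]

print(list(df2.columns))
print(list(reordered.columns))
```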
You can also use the with ... form that we used earlier when reading and writing files.
The approach above overwrites the original contents of the file. If you want to add a new sheet to the existing Excel file, you can do it as follows:

    from openpyxl import load_workbook

    book = load_workbook("Employees.xlsx")  # load the existing data into a Workbook
    df3 = pd.DataFrame({'Names': ['Judy'], 'Age': [27]})
    with pd.ExcelWriter('Employees.xlsx', engine='openpyxl') as writer:
        # let writer take over the original two sheets
        # (relies on older pandas; pandas 2.x has no setter for writer.book)
        writer.book = book
        df3.to_excel(writer, sheet_name='Alternate staff', index=False, columns=['Names', 'Age'])
    # leaving the with block saves the file

    import pandas as pd

    sheet = pd.read_excel(io="Employees.xlsx", sheet_name=None)
    sheet

    {'Domestic employees':     full name  Age
     0      Li lei   31
     1  Han Meimei   22
     2   Xiao Ming   30
     3   Zhang San   49
     4       Li Si   38
     5     Wang Wu   33,
     'Foreign employees':     Names  Age
     0  Andrew   42
     1   Tomas   37
     2   Larry   39
     3  Sophie   35
     4   Sally   29
     5  Simone   27,
     'Alternate staff':    Names  Age
     0  Judy   27}

You can see that the "Alternate staff" sheet has been added to the original Excel file. If you need to append data to an existing sheet, you can use the following:

    from openpyxl import load_workbook

    book = load_workbook("Employees.xlsx")  # load the existing data into a Workbook
    df4 = pd.DataFrame({'Names': ['Moore'], 'Age': [38]})
    with pd.ExcelWriter('Employees.xlsx', engine='openpyxl') as writer:
        # let writer take over the original 3 sheets
        # (relies on older pandas; pandas 2.x has no setter for writer.book)
        writer.book = book
        writer.sheets = {ws.title: ws for ws in book.worksheets}
        # the first empty row of the existing sheet
        start_row = writer.sheets['Alternate staff'].max_row
        df4.to_excel(writer, sheet_name='Alternate staff', index=False, columns=['Names', 'Age'],
                     startrow=start_row, header=False)
    # leaving the with block saves the file

The key points here: use startrow to specify the row at which the data is inserted; since we are inserting into an existing sheet, specify the correct sheet_name; and to avoid a duplicate header, set header to False.

    import pandas as pd

    sheet = pd.read_excel(io="Employees.xlsx", sheet_name=None)
    sheet

    {'Domestic employees':     full name  Age
     0      Li lei   31
     1  Han Meimei   22
     2   Xiao Ming   30
     3   Zhang San   49
     4       Li Si   38
     5     Wang Wu   33,
     'Foreign employees':     Names  Age
     0  Andrew   42
     1   Tomas   37
     2   Larry   39
     3  Sophie   35
     4   Sally   29
     5  Simone   27,
     'Alternate staff':    Names  Age
     0  Judy   27
     1  Moore   38}
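
On recent pandas releases the same two steps (adding a sheet, then appending rows to it) can be expressed with the mode and if_sheet_exists arguments of ExcelWriter, without touching writer.book. This is a sketch under stated assumptions: pandas 1.4 or later, openpyxl installed, and a throwaway Employees.xlsx created in a temporary directory with invented rows.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Employees.xlsx")

    # Start with a workbook that has one sheet.
    first = pd.DataFrame({"Names": ["Andrew", "Tomas"], "Age": [42, 37]})
    first.to_excel(path, sheet_name="Foreign employees", index=False, engine="openpyxl")

    # mode="a" opens the existing file and adds a new sheet to it.
    df3 = pd.DataFrame({"Names": ["Judy"], "Age": [27]})
    with pd.ExcelWriter(path, engine="openpyxl", mode="a") as writer:
        df3.to_excel(writer, sheet_name="Alternate staff", index=False)

    # if_sheet_exists="overlay" writes into the existing sheet instead of replacing it.
    df4 = pd.DataFrame({"Names": ["Moore"], "Age": [38]})
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        start_row = writer.sheets["Alternate staff"].max_row  # first empty row
        df4.to_excel(writer, sheet_name="Alternate staff", index=False,
                     header=False, startrow=start_row)

    result = pd.read_excel(path, sheet_name=None, engine="openpyxl")

print(result["Alternate staff"])
```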



Author: Beishan

Original article: https://beishan.blog.csdn.net/article/details/115290941
Copyright notice
This article was created by [What's the name]. Please include a link to the original when reposting. Thank you.
https://pythonmana.com/2021/04/20210407202831281S.html
