Start pandas, take you to play with data (3) -- save pandas data into file

Look at the future 2021-04-07 14:23:44
start pandas play data pandas


 Insert picture description here

Connected to a

A little embarrassed , It should have been written together , But I didn't expect that one of them wrote so much , I'm tired of writing , You can't be tired , Or write separately , Slow down my fatigue , It also reduces your visual fatigue .

Get started Pandas, Play with data (2)-- Use pandas Reading data from multiple files


Convert external data into DataFrame

Whatever you want to write , Whatever your destination is , First, transform your data into DataFrame Format , There's no doubt about that .
I've talked about this before , But it doesn't hurt to say it again , Because I can't remember myself .

Create a... From the list DataFrame:

data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
Name Age
0 Alex 10
1 Bob 12
2 Clarke 13

from ndarrays / Lists Of Dict Create a DataFrame:

data = {
'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
Age Name
0 28 Tom
1 34 Jack
2 29 Steve
3 42 Ricky
df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])
Age Name
rank1 28 Tom
rank2 34 Jack
rank3 29 Steve
rank4 42 Ricky

Create a... From the list of dictionaries DataFrame:

data = [{
'a': 1, 'b': 2},{
'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
a b c
0 1 2 NaN
1 5 10 20.0
df = pd.DataFrame(data, index=['first', 'second'])
a b c
first 1 2 NaN
second 5 10 20.0

If you don't want to bring unnecessary trouble , It's suggested that you bring the list with you , Then select not to write index column when writing .


DataFrame write file

csv

to_csv() yes DataFrame Class method , The definition of common parameters is as follows :

path_or_buf: File save location
sep: Separator , If you don't write , The default is ‘,’
columns: Specify the columns to save
header: Do you want to save the column name
index: Save index column
index_label: Specify the index column name

Code example :

import pandas as pd
df = pd.DataFrame([[' Wuhan Qingchuan University ', ' Marketing Management ', ' Physics or history ', ' Don't mention the requirement of re selecting subjects '],[' Qianjiang College of Hangzhou Normal University ', ' Marketing Management ', ' Physics or history ', ' Don't mention the requirement of re selecting subjects '],[' Business School of Yunnan Normal University ', ' International economy and Trade ', ' Physics or history ', ' Don't mention the requirement of re selecting subjects '],[' Nanchang Institute of Technology ', ' The product design ', ' Physics or history ', ' Don't mention the requirement of re selecting subjects '],[' Changchun University of Finance and Economics ', ' Finance ', ' Physics or history ', ' Don't mention the requirement of re selecting subjects '],[' Xingjian College of Arts and Sciences, Guangxi University ', ' Finance ', ' Physics or history ', ' Don't mention the requirement of re selecting subjects ']],
columns=[" School "," major "," Major requirements "," The minor requires "])
df.to_csv('test.csv',index=False)

If there are any other requirements , Please refer to the parameter list above ; If the above parameter list doesn't meet your needs , Please comment on , Then go to Baidu by yourself . After I see it, Baidu will make up for the lack in the first time , thank you 、


Append write

df.to_csv(file_name, mode='a')

Write... To the specified location

What's on your mind? ?? use Excel Go to


json

I think ah. , It's not likely to use this function in the future .
 Insert picture description here
 Insert picture description here

Is that so? , We don't have to remember everything , Aim for what you want .


Excel

Excel, I like to use . But today , I had a new thought : Saving data is not the ultimate goal , What we want to achieve is to choose the best storage method according to the actual scene .

And there are several ways I can think of right now :
1、 Not a lot of data , Pursue the speed of storage . After the event, the cached data can be transported in a wave .
2、 Large amount of data , It goes without saying , database .
3、 Data should be divided into tables . Actually, I'm a little puzzled ,csv It doesn't support sub table ? If I write more csv Well ? It seems that this is not good either , This kind of time is still Excel Come on .
After all Excel There are professional sub table tools , Just walk back and forth in memory .

Okay , Don't talk nonsense , Back to the subject :

def to_excel(excel_writer: Any,
sheet_name: str = "Sheet1",
columns: Any = None,
header: Any = True,
index: Any = True,
index_label: Any = None,
encoding: Any = None,
)

I don't need to explain these parameters any more , These two articles are all about these parameters .


Write without overlay

But what? , There is also a very serious problem , It's the problem of append writing , Is that so? , In the parameters above, we don't see it , Including the front csv, I didn't mention it .
I finished the test , Twice to the same Excel Page , It will cause coverage .

Here are two parameters :

startrow: Any = 0, # Which line should I start with 
startcol: Any = 0, # Which column to start with 

These two parameters are Excel Of .


Is this over ? If I give you one Excel, There is already content in it , But you don't know how much content there is , What to do with that ?

Append write

Me! , I've been looking up the information for a long time , Just find out the result :
1、 You read it first , And then if you want to add to the bottom , Even if it's down the line , To add to the right , Even if it's one of the following numbers . Then do it with the above parameters that do not override the write .
2、 Change the page , Why mix it with the original data before ?

For method one :
Be careful , Write without column number , Line number , namely header=False,index=False

import pandas as pd
from openpyxl import load_workbook
result2=[('a','2','ss'),('b','2','33'),('c','4','bbb')]# New data needs to be written 
df = pd.DataFrame(result2,columns=['xuhao','id','name'])# List data into data frame 
df1 = pd.DataFrame(pd.read_excel('123.xlsx',sheet_name='aa')) # Read the original data file and table 
writer = pd.ExcelWriter('123.xlsx',engine='openpyxl')
book=load_workbook('123.xlsx')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df_rows = df1.shape[0] # Get the number of rows of the original data 
df.to_excel(writer, sheet_name='aa',startrow=df_rows+1, index=False, header=False)# Write data to excel Medium aa surface , Start with the first blank line 
writer.save()# preservation 
writer.close()

If you want to be in the same sheet Change columns and add data , Just change it 2 Just one parameter .

df_cols = df1.shape[1]
staion.to_excel(writer, sheet_name=sheet_name, startcol=df_cols, index=False, header=False)

Affirming : I personally don't like this method , But I don't mind if some friends like this method .


Yes pandas operation Excel and csv The feeling of

It took all night , I decided to , Professional work should be done by professional tools ,openpyxl Is it not fragrant ?
Not really ? I have a tutorial !!!

Python And Excel Elegant manipulation selected


Write data to multiple tables

You need to use ExcelWriter

df = pd.read_csv(csv_file,parse_dates=['data'])
with pd.ExcelWriter('test4.xlsx') as f:
df.to_excel(f,sheet_name='a',columns=['a'],index=True,index_label='ID')
df.to_excel(f, sheet_name='b', columns=['b','data'], index=True, index_label='ID')

MySQL

This one still uses pandas Well , Although there are professional tools to use , But I haven't written a tutorial yet .

Save data to MySQL

def to_sql(
self,
name,
con,
schema=None,
if_exists="fail",
index=True,
index_label=None,
chunksize=None,
dtype=None,
method=None,
)

Parameter interpretation :( The parameters defined will not be repeated )

schema: Used to create database objects , Basically, the default values are used .

if_exists: What if the table exists ?

fail: Throw out ValueError abnormal
replace: Delete table before inserting data . Note that it's not just deleting data , Is to delete the original table , Rebuild the table .
append: Insert new data . If there is a primary key , To avoid primary key conflicts ; See the format of the table ,DataFrame Of columns With the watch columns Is the corresponding ;DF Of index The default is as a column of data , That is to say, it will be written to the database by default

index: Write the index to the database as a column , The default is True, That is to say, by default DF The index of is to write to the database ,index_label Is the column name

index_label: The column name when the index is written to the database , The default is index; If DF It's a multi-level index , be index_label Should be a sequence

chunksize: The batch , How many pieces of data are processed at a time . Default all , It's generally useless , Unless there's too much data , Obviously, it can be processed in batches when I feel stuck .

dtype: A dictionary , Specify the data type of the column . The key is the name of the column , The value is sqlalchemy types perhaps sqlite3 String form of . If it's a new table , You need to specify the type , Otherwise, the maximum storage type will be used as the default type . such as varchar The type will become text type , A lot of space resources are wasted . If it's adding data , Generally, it is not necessary to specify this parameter .

method: Which type of statement insert ?

None: Default single line insert
‘multi’: Multi line insertion
callable: Insert... As a callback function , Write the name of the function , Never used .

The final summary

The more I write, the more wrong I feel , Is putting the cart before the horse .
pandas It's used to store data ???

I always think it's strange , It's definitely putting the cart before the horse !!!
The next one is back on track .

 Insert picture description here

版权声明
本文为[Look at the future]所创,转载请带上原文链接,感谢
https://pythonmana.com/2021/04/20210407142249578d.html

  1. Python brush questions - letter graphics
  2. Python数据分析入门(七):Pandas层级索引
  3. Introduction to Python data analysis (7): Pandas hierarchical index
  4. Python 操作腾讯云短信(sms)详细教程
  5. Python operation Tencent cloud SMS (SMS) detailed tutorial
  6. Python数据可视化,完整版实操指南 !
  7. Python data visualization, full version of the practical guide!
  8. 上手Pandas,带你玩转数据(2)-- 使用pandas从多种文件中读取数据
  9. 上手Pandas,带你玩转数据(1)-- 实例详解pandas数据结构
  10. Using pandas to read data from various files
  11. Hands on pandas, take you to play with data (1) -- detailed explanation of pandas data structure with examples
  12. Pandas数据结构基础用法
  13. Basic usage of pandas data structure
  14. Python读取ini配置文件,保存到对象属性
  15. Python reads the INI configuration file and saves it to the object properties
  16. Foundation of Python: classes in Python
  17. python刷题-闰年判断
  18. python刷题-01字串
  19. How to judge leap year
  20. Python brush title-01 string
  21. 安装python
  22. 按尺寸切片pandas数据集DataFrame到多个文件
  23. Install Python
  24. Slice the pandas dataset dataframe to multiple files by size
  25. python 求最大值、最小值、平均值
  26. Finding maximum, minimum and average in Python
  27. 认识Python解释器和PyCharm编辑器
  28. Know Python interpreter and pycharm editor
  29. Python 小数据池和代码块缓存机制
  30. Python small data pool and code block caching mechanism
  31. python刷题-序列求和
  32. python刷题-圆的面积
  33. Sequence summation in Python
  34. The area of a circle
  35. Python functions, advanced syntax and usage
  36. Teach you to crawl novels in Python! Who can pay for novels these days!
  37. Python入门学习之:10分钟1500访问量
  38. Introduction to Python: 1500 visits in 10 minutes
  39. 数据分析之Pandas合并操作总结
  40. OpenCV-Python 雪花飘落特效
  41. Pandas merge operation summary of data analysis
  42. Opencv Python snowflake falling effect
  43. python logging模块“另一个程序正在使用此文件,进程无法访问。”问题解决办法
  44. Python logging module "this file is being used by another program and cannot be accessed by the process." Problem solving
  45. Mac 下python3 [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed 解决方法
  46. Python 3 [SSL: Certificate] on MAC_ VERIFY_ Failed] certificate verify failed solution
  47. Python学习之解决python下载第三方依赖速度慢的问题
  48. Python learning to solve the problem of slow download speed of third party dependence on Python
  49. python操作Excel文件报lrd.biffh.XLRDError
  50. How to operate excel file with Python lrd.biffh.XLRDError
  51. 2021的挑战与机遇,今年Python数据分析岗位会很香!
  52. The challenge and opportunity of 2021, python data analysis post will be very popular this year!
  53. 【C++简明教程】Python和C++指定元素排序比较
  54. Comparison of Python and C + + specified element sorting
  55. Python Flask使用Nginx做代理时如何获取真实IP
  56. How to get real IP address when Python flash uses nginx as proxy
  57. Python培训出来好找工作吗?好找工作的关键是什么?
  58. Is Python training easy to find a job? What is the key to finding a good job?
  59. 从零开始学python | 什么是Python JSON?
  60. Learn Python from scratch | what is Python JSON?