Using pandas to read data from various files

Look at the future 2021-04-06 17:04:48




pandas IO summary

pandas supports more IO formats than the ones covered below, but I haven't used the others, so I'll leave them out.

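In general, pandas provides a top-level reader function for each supported format (pd.read_csv, pd.read_json, pd.read_excel, pd.read_sql and many more), and most of them have a matching writer method on DataFrame (df.to_csv, df.to_json, and so on). The IO tools page of the pandas documentation lists them all.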


Read the file

Read CSV files

This part focuses on read_csv, but much of it applies to pandas' other reader functions as well.

First, the function signature:

def read_csv(filepath_or_buffer,  # file path, URL, or file-like object
             sep=",",             # field separator (a comma unless you override it)
             header="infer",
             # "infer" (the default): if no names are passed, this behaves like header=0 and the
             # first line becomes the column names; if names are passed, it behaves like header=None.
             # Pass header=0 together with names to replace the column names already in the file.
             names=None,          # list of column names to use; duplicates are not allowed
             index_col=None,
             # column(s) to use as the DataFrame's row index, given as names or positions;
             # passing a list of several columns produces a MultiIndex.
             # Note: index_col=False forces pandas NOT to use the first column as the index.
             usecols=None,        # read only a subset of the columns
             dtype=None,          # data type for the whole frame or per column,
                                  # e.g. {'a': np.float64, 'b': np.int32}
                                  # (very old pandas versions did not support this with engine='python')
             )
# These are only the common parameters; read_csv accepts many more.
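To see index_col in action without a file on disk, here is a minimal sketch that feeds read_csv an in-memory string through io.StringIO. The column names and values are made up for illustration; the last case mirrors the trailing-delimiter situation that index_col=False is meant for.

```python
from io import StringIO

import pandas as pd

# Hypothetical in-memory stand-in for a CSV file on disk.
csv_text = "id,a,b\nx,1,2\ny,3,4\n"

# index_col: use the "id" column as the row index instead of 0, 1, 2, ...
df = pd.read_csv(StringIO(csv_text), index_col="id")
print(df)

# A list for index_col builds a MultiIndex from several columns.
df_multi = pd.read_csv(StringIO(csv_text), index_col=["id", "a"])
print(df_multi.index.names)

# Each data row ends with a trailing comma, so it has one more field than the
# header. By default pandas would then use the first column as the index;
# index_col=False keeps a plain RangeIndex instead.
trailing = "a,b\n1,2,\n3,4,\n"
df_plain = pd.read_csv(StringIO(trailing), index_col=False)
print(df_plain)
```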

Here are a few examples. They read a file test.csv with these contents:

a,b
1,2
3,4
,5

import pandas as pd
df = pd.read_csv('test.csv')
print(df.head(7))  # head() shows 5 rows by default; pass a number to change that

Result:

 a b
0 1.0 2
1 3.0 4
2 NaN 5

import pandas as pd
df = pd.read_csv('test.csv', header=1)  # use line 1 (0-based, i.e. the second line) as the column names
print(df.head(7))  # head() shows 5 rows by default; pass a number to change that

Result:

 1 2
0 3.0 4
1 NaN 5

import pandas as pd
df = pd.read_csv('test.csv', names=[1, 2, 3, 4, 5])  # supply column names; the file's header line becomes data
print(df.head(7))
 1 2 3 4 5
0 a b NaN NaN NaN
1 1 2 NaN NaN NaN
2 3 4 NaN NaN NaN
3 NaN 5 NaN NaN NaN

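The names are matched to the columns from the right: if the list is shorter than the number of columns in the file, the leftover leading columns become the row index instead of regular columns, so pass a complete list. If the list is longer, as here, the extra columns are filled with NaN.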
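A small sketch of the short-list case, assuming the same contents as test.csv above but held in a StringIO so it runs on its own. It also shows the header=0 plus names combination for renaming an existing header row:

```python
from io import StringIO

import pandas as pd

# Same contents as test.csv above, held in memory.
csv_text = "a,b\n1,2\n3,4\n,5\n"

# Fewer names than fields: the name goes to the rightmost column and the
# leftover leading column becomes the row index (header=None is implied,
# so the original header line "a,b" is read as data).
short = pd.read_csv(StringIO(csv_text), names=["only"])
print(short)

# To replace an existing header row, pass header=0 with a complete list.
renamed = pd.read_csv(StringIO(csv_text), header=0, names=["x", "y"])
print(renamed)
```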


import pandas as pd
df = pd.read_csv('test.csv',header = 0,names = ['a','b'],usecols=['a']) # Returns the specified column 
print(df.head())
 a
0 1.0
1 3.0
2 NaN

import pandas as pd
df = pd.read_csv('test.csv')
print(df.dtypes) # Look at the column type 
a float64
b int64
dtype: object

Change a column's type:

import pandas as pd
df = pd.read_csv('test.csv', dtype={'b': object})  # read column b as object instead of int64
print(df.dtypes) # Look at the column type 
a float64
b object
dtype: object
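Column a above comes back as float64 only because of its missing value. One possible alternative, sketched here assuming a reasonably recent pandas (1.0 or later), is the nullable integer dtype "Int64", which keeps the integers and marks the gap as <NA>. The data is the same as test.csv, held in memory:

```python
from io import StringIO

import pandas as pd

# Same contents as test.csv above, held in memory.
csv_text = "a,b\n1,2\n3,4\n,5\n"

# "Int64" (capital I) is pandas' nullable integer dtype.
df = pd.read_csv(StringIO(csv_text), dtype={"a": "Int64"})
print(df.dtypes)  # a is Int64 (nullable integer), b is int64
print(df)         # the empty cell in column a shows as <NA> instead of NaN
```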

Reading from a StringIO object

from io import StringIO
import pandas as pd
data = "name|age|birth|sex~Tom|18.0|2000-02-10|~Bob|30.0|1988-10-17|male"
df = pd.read_csv(StringIO(data), sep="|", lineterminator="~")
print(df.head())
 name age birth sex
0 Tom 18.0 2000-02-10 NaN
1 Bob 30.0 1988-10-17 male

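Aside: if the script contains Chinese text and Python reports SyntaxError: Non-UTF-8 code starting with '\xe6' in file, declare the encoding at the top of the file (the declaration must match the encoding the file is actually saved in):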

#coding:utf-8

Read JSON files

def read_json(path_or_buf=None,  # file path or URL
              orient=None,       # expected JSON layout, e.g. 'records', 'split', 'index', 'columns', 'values'
              lines=False,       # read the file as JSON Lines (one JSON object per line)
              )

I won't say much about JSON: being able to read and write it is enough for me, and I don't expect to use it much.

A records-format JSON file stores the data as a list of row objects, such as [{"a": 1, "b": 2}, ...]; read it with pd.read_json(path, orient='records').
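A minimal sketch of reading records-oriented JSON and JSON Lines from in-memory strings; the names and ages are made up. The literal JSON is wrapped in StringIO because recent pandas versions deprecate passing a raw JSON string to read_json.

```python
from io import StringIO

import pandas as pd

# Hypothetical records-oriented JSON: a list with one object per row.
records_json = '[{"name": "Tom", "age": 18}, {"name": "Bob", "age": 30}]'
df = pd.read_json(StringIO(records_json), orient="records")
print(df)

# JSON Lines (lines=True): one JSON object per line, no enclosing list.
jsonl = '{"name": "Tom", "age": 18}\n{"name": "Bob", "age": 30}\n'
df_lines = pd.read_json(StringIO(jsonl), lines=True)
print(df_lines)
```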


Read Excel files

pd.read_excel(io, sheet_name=0, header=0, skiprows=None, skipfooter=0, index_col=None, usecols=None)

(Older pandas versions spelled sheet_name and skipfooter as sheetname and skip_footer.)

Parameters:

io: path to the Excel file (or a file-like object)
sheet_name: sheet name, or sheet number counting from 0
header: row (0-based) to use as the column names
skiprows: number of rows to skip at the start, or a list of row numbers to skip
skipfooter: number of rows to skip at the end
index_col: column to use as the row index
usecols: subset of columns to read
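These parameters can be tried without a real workbook. The sketch below assumes openpyxl is installed, writes a made-up four-row sheet to an in-memory buffer, and reads it back with skiprows, skipfooter and index_col:

```python
from io import BytesIO

import pandas as pd

# Hypothetical four-row sheet, written to memory instead of a file on disk.
source = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "year": [2017, 2018, 2019, 2020],
})
buf = BytesIO()
source.to_excel(buf, sheet_name="Sheet1", index=False, engine="openpyxl")
buf.seek(0)

# Row 0 of the sheet is the header, so skiprows=[1] drops the "A" row;
# skipfooter=1 drops the last row ("D"); index_col=0 makes "title" the index.
df = pd.read_excel(buf, sheet_name=0, skiprows=[1], skipfooter=1, index_col=0)
print(df)
```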

Enough talk; here is a demonstration.

Practice

import pandas as pd
xls_file = 'The box office of different kinds of films in previous years.xlsx'
df = pd.read_excel(xls_file).head(5)
print(df)

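Note: xlrd 2.0 and later only read legacy .xls files. To read .xlsx, install openpyxl (which recent pandas versions use by default for .xlsx), or, with an old pandas version, downgrade xlrd below 2.0.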

Output:

 Release date title Total box office
0 2019 Cambodian love 9.63W
1 2019 That bridge 6.05W
2 2019 Midway Island in the final battle 2.92Y
3 2019 Lane Manager 2433.22W
4 2019 Kung Fu town 16.97W

Read the specified worksheet

import pandas as pd
xls_file = 'The box office of different kinds of films in previous years.xlsx'
df = pd.read_excel(xls_file, sheet_name='action').head(5)  # read the "action" worksheet
print(df)
 Release date title Total box office
0 2019 Kung Fu town 16.97W
1 2019 Rebirth beyond the realm 6.55W
2 2019 The terminator : Dark destiny 3.51Y
3 2019 Yang Jingyu 127.81W
4 2019 Er Feng 61.22W

Specify the header row

import pandas as pd
xls_file = 'The box office of different kinds of films in previous years.xlsx'
df = pd.read_excel(xls_file, sheet_name='action', header=1).head(5)  # use row 1 (the second row) as the column names
print(df)
 2019 Kung Fu town 16.97W
0 2019 Rebirth beyond the realm 6.55W
1 2019 The terminator : Dark destiny 3.51Y
2 2019 Yang Jingyu 127.81W
3 2019 Er Feng 61.22W
4 2019 Gemini killer 2.32Y

Read the specified column

import pandas as pd
xls_file = 'The box office of different kinds of films in previous years.xlsx'
df = pd.read_excel(xls_file, sheet_name='action', usecols=['title', 'Total box office']).head(5)  # read only these columns
print(df)

If a requested column does not exist, pandas raises an error. Output:

 title Total box office
0 Kung Fu town 16.97W
1 Rebirth beyond the realm 6.55W
2 The terminator : Dark destiny 3.51Y
3 Yang Jingyu 127.81W
4 Er Feng 61.22W

ExcelFile class

To read several sheets from the same file more conveniently, wrap the file in an ExcelFile object and pass that to read_excel. Because the file is read into memory only once, this is faster when you need several sheets from one file.

import pandas as pd
xlsx = pd.ExcelFile('path_to_file.xls')
df = pd.read_excel(xlsx, 'Sheet1')
# or as a context manager, which closes the file automatically:
with pd.ExcelFile('path_to_file.xls') as xls:
    df1 = pd.read_excel(xls, 'Sheet1')
    df2 = pd.read_excel(xls, 'Sheet2')
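A self-contained version of the same idea, again assuming openpyxl is installed and using a made-up two-sheet workbook built in memory. It also shows sheet_name=None, which reads every sheet into a dict keyed by sheet name:

```python
from io import BytesIO

import pandas as pd

# Hypothetical two-sheet workbook, built in memory.
buf = BytesIO()
with pd.ExcelWriter(buf, engine="openpyxl") as writer:
    pd.DataFrame({"x": [1, 2]}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"y": [3, 4]}).to_excel(writer, sheet_name="Sheet2", index=False)
data = buf.getvalue()  # raw .xlsx bytes

# Open the workbook once and parse several sheets from the same ExcelFile.
with pd.ExcelFile(BytesIO(data)) as xls:
    names = xls.sheet_names
    df1 = pd.read_excel(xls, "Sheet1")
    df2 = pd.read_excel(xls, "Sheet2")
print(names)

# sheet_name=None reads every sheet into a dict of {sheet name: DataFrame}.
all_sheets = pd.read_excel(BytesIO(data), sheet_name=None)
print(list(all_sheets))
```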

Parsing dates

When reading an Excel file, date and time values are usually converted to the appropriate dtype automatically. But if a column holds strings that merely look like dates (rather than real Excel date cells), use the parse_dates parameter to parse them as dates:

pd.read_excel('path_to_file.xls', 'Sheet1', parse_dates=['date_strings'])
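A runnable sketch of the difference, assuming openpyxl is installed. The hypothetical "date_strings" column is written as plain text, so without parse_dates it stays text:

```python
from io import BytesIO

import pandas as pd

# Hypothetical sheet whose "date_strings" column holds text, not Excel dates.
buf = BytesIO()
pd.DataFrame({"date_strings": ["2000-02-10", "1988-10-17"]}).to_excel(
    buf, sheet_name="Sheet1", index=False, engine="openpyxl"
)
data = buf.getvalue()

raw = pd.read_excel(BytesIO(data), "Sheet1")
parsed = pd.read_excel(BytesIO(data), "Sheet1", parse_dates=["date_strings"])
print(raw["date_strings"].dtype)     # still text (object/string dtype)
print(parsed["date_strings"].dtype)  # a datetime64 dtype
```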

Cell conversion

The contents of Excel cells can be converted with the converters parameter. For example, to convert a column to booleans:

pd.read_excel('path_to_file.xls', 'Sheet1', converters={'MyBools': bool})

A column of integers with missing values cannot be converted to an integer dtype array, because NaN is strictly a floating-point value. You can mark the missing data manually to get an integer dtype back:

def cfun(x):
    return int(x) if x else -1  # missing cells become -1

pd.read_excel('path_to_file.xls', 'Sheet1', converters={'MyInts': cfun})
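Both converters can be checked end to end with an in-memory workbook. This is a sketch assuming openpyxl is installed; the sheet contents are made up. Note that bool is applied to the raw cell value, so it suits 0/1 numbers but would turn any non-empty string, even "False", into True.

```python
from io import BytesIO

import pandas as pd


def cfun(x):
    # Empty cells reach the converter as an empty value, which is falsy.
    return int(x) if x else -1


# Hypothetical sheet: MyBools holds 0/1 numbers, MyInts has a missing cell.
buf = BytesIO()
pd.DataFrame({"MyBools": [1, 0, 1], "MyInts": [1, None, 3]}).to_excel(
    buf, sheet_name="Sheet1", index=False, engine="openpyxl"
)
data = buf.getvalue()

plain = pd.read_excel(BytesIO(data), "Sheet1")
converted = pd.read_excel(
    BytesIO(data), "Sheet1", converters={"MyBools": bool, "MyInts": cfun}
)
print(plain["MyInts"].dtype)      # float64: the NaN forces floats
print(converted["MyInts"].dtype)  # an integer dtype: the gap became -1
print(converted["MyBools"].tolist())
```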

Read from a MySQL database

This part assumes some working knowledge of MySQL. If you don't have it yet, I suggest reading my earlier posts first: "MySQL What I saw - An introductory tour" and the Q&A on Head First SQL.

read_sql is actually a wrapper around two functions:

read_sql_query: reads data with an SQL query
read_sql_table: reads a whole table from the database
pandas.read_sql(sql, con, index_col=None,columns=None)

Parameters:

sql: a table name or an SQL query
con: the database connection (for example an SQLAlchemy engine)
index_col: column(s) to use as the row index
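For a runnable version of these parameters without a MySQL server, this sketch swaps in Python's built-in sqlite3 with an in-memory database (pandas accepts a sqlite3 connection as con). The presidents rows are made up for illustration:

```python
import sqlite3

import pandas as pd

# Swapped in for MySQL: an in-memory SQLite database from the standard library.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE presidents (last_name TEXT, first_name TEXT)")
con.executemany(
    "INSERT INTO presidents VALUES (?, ?)",
    [("Washington", "George"), ("Adams", "John")],
)
con.commit()

# sql is a query here; index_col turns a result column into the row index.
df = pd.read_sql("SELECT * FROM presidents", con=con, index_col="first_name")
print(df)

# With a plain sqlite3 connection, read_sql only accepts queries;
# reading by table name ("presidents") needs an SQLAlchemy connectable.
con.close()
```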

Test

Environment setup:
First you need a working MySQL installation. I won't go into it here; the MySQL introductory tour mentioned above covers it in detail.

Once that is done, set up a database to work in. Here it is named pandas and contains a table called presidents.

from sqlalchemy import create_engine, text
import pandas as pd
# mysql+pymysql://<user name>:<password>@<server IP>/<database name>
engine = create_engine('mysql+pymysql://pandas:<password>@<server IP>/pandas')
# A table has already been created in this database.
# Check that the engine works by listing the tables:
with engine.connect() as conn:
    print(conn.execute(text('show tables')).fetchall())

Besides the packages imported in the code, you also need to install pymysql and cryptography.

[('presidents',)]

from sqlalchemy import create_engine
import pandas as pd
engine = create_engine('mysql+pymysql://pandas:<password>@<server IP>/pandas')
print(pd.read_sql('presidents',con=engine))

The table is empty, so of course nothing is read:

Empty DataFrame
Columns: [last_name, first_name]
Index: []

from sqlalchemy import create_engine
import pandas as pd
engine = create_engine('mysql+pymysql://pandas:<password>@<server IP>/pandas')
# print(pd.read_sql('presidents', con=engine, index_col='first_name'))  # use first_name as the index
print(pd.read_sql('select * from presidents', con=engine))  # pass an SQL query directly

Since the table has no data, this also returns an empty DataFrame.


Copyright notice
This article was created by [Look at the future]. Please include a link to the original when reposting. Thanks.
https://pythonmana.com/2021/04/20210406170401235D.html
