Using pandas to read data from various files

Look at the future 2021-04-06 17:04:48


pandas IO summary

pandas IO covers more than what I have written about before, but I have not worked with the other formats, so I will not write about them here.

In general, the following formats are supported:
[Figure: table of the file formats supported by pandas IO]


Reading files

Reading CSV files

This section centers on the read_csv function, though it touches on more than read_csv alone.

Let's take a look at the function prototype first:

def read_csv(
    filepath_or_buffer: PathLike[str],  # File path
    sep: Any = lib.no_default,  # Custom separator between fields
    header: str = "infer",
    # With the default value, or with header=0, the first row is used as the column names.
    # If column names are passed explicitly through names, the default behaves like header=None.
    # Note: passing header=0 together with names replaces the column names found in the file.
    names: Any = None,  # List of column names to use. If the file has no header row, set header=None. Duplicate names are not allowed.
    index_col: Any = None,
    # Column(s) to use as the row index of the DataFrame, given as a column name or a column number.
    # If a sequence of strings or integers is passed, a MultiIndex is used.
    # Note: with index_col=False, pandas does not use the first column as the index.
    usecols: Any = None,  # Return only a subset of the columns.
    dtype: Any = None,  # Data type for individual columns or for the whole data, e.g. {'a': np.float64, 'b': np.int32} (marked as unsupported with engine='python' in older pandas documentation).
)
# These are only the commonly used parameters; there are more.
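
The prototype mentions sep and index_col, which the examples further down do not exercise. A minimal sketch of both follows, assuming a small semicolon-separated table that I made up for illustration (the names year, city and sales are not from any real file):

```python
from io import StringIO

import pandas as pd

# Hypothetical data, invented for this sketch; fields are separated by ";"
data = "year;city;sales\n2020;Paris;10\n2020;Rome;7\n2021;Paris;12\n"

# A single column name gives an ordinary index
single = pd.read_csv(StringIO(data), sep=";", index_col="year")
print(single.index.name)

# A list of column names gives a MultiIndex
multi = pd.read_csv(StringIO(data), sep=";", index_col=["year", "city"])
print(type(multi.index).__name__)
print(multi.loc[(2021, "Paris"), "sales"])
```

Reading from a StringIO keeps the sketch self-contained; a file path works the same way.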

Here are a few examples:
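
The examples read a small file named test.csv whose contents are not listed in the text. The sketch below creates one; the contents are an assumption, inferred from the results printed in the examples (columns a and b, three data rows, one missing value in a):

```python
from pathlib import Path

import pandas as pd

# Assumed contents of test.csv, inferred from the printed results
csv_text = "a,b\n1,2\n3,4\n,5\n"
Path("test.csv").write_text(csv_text, encoding="utf-8")

df = pd.read_csv("test.csv")
print(df)
print(df.dtypes)
```

The empty field in the last row is read as NaN, which is why column a ends up as a float column.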


import pandas as pd
df = pd.read_csv('test.csv')
print(df.head(7))  # head() shows 5 rows by default; you can pass the number you want

Result:

     a  b
0  1.0  2
1  3.0  4
2  NaN  5

import pandas as pd
df = pd.read_csv('test.csv', header=1)  # Use the second line of the CSV file (row index 1) as the column names
print(df.head(7))  # head() shows 5 rows by default; you can pass the number you want

Result:

     1  2
0  3.0  4
1  NaN  5

import pandas as pd
df = pd.read_csv('test.csv', names=[1, 2, 3, 4, 5])  # Specify the column names
print(df.head(7))

Result:

     1  2    3    4    5
0    a  b  NaN  NaN  NaN
1    1  2  NaN  NaN  NaN
2    3  4  NaN  NaN  NaN
3  NaN  5  NaN  NaN  NaN

The specified names are matched to the columns from right to left, so the list must be complete: if it has fewer names than the file has columns, the leftover leading columns are treated as the index and can no longer be selected by name.


import pandas as pd
df = pd.read_csv('test.csv', header=0, names=['a', 'b'], usecols=['a'])  # Return only the specified column
print(df.head())

Result:

     a
0  1.0
1  3.0
2  NaN

import pandas as pd
df = pd.read_csv('test.csv')
print(df.dtypes)  # Inspect the column types

Result:

a    float64
b      int64
dtype: object

Changing a column's type:

import pandas as pd
df = pd.read_csv('test.csv', dtype={'b': object})  # Read column b as object
print(df.dtypes)  # Inspect the column types

Result:

a    float64
b     object
dtype: object

Reading from a StringIO object

from io import StringIO
import pandas as pd
data = "name|age|birth|sex~Tom|18.0|2000-02-10|~Bob|30.0|1988-10-17|male"
df = pd.read_csv(StringIO(data), sep="|", lineterminator="~")
print(df.head())

Result:

  name   age       birth   sex
0  Tom  18.0  2000-02-10   NaN
1  Bob  30.0  1988-10-17  male

Aside: Chinese text in the source file causes the error SyntaxError: Non-UTF-8 code starting with '\xe6' in file

Declare the encoding at the top of the source file, and save the file in that encoding:

#coding:utf-8

Reading JSON files

def read_json(
    path_or_buf: Any = None,  # File path or URL
    orient: str = None,  # The expected layout of the JSON data
    lines: bool = False,  # Read the file as one JSON object per line
)

I won't say much about JSON. Being able to read and write it is enough for me, and I don't expect to use it much.

Reading a JSON file in records format:
[Figure: the records-format JSON file and the DataFrame read from it]
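
As a rough sketch of what the records format looks like: a list of objects, one per row, with the column names as keys. The data here is hypothetical (the names and ages are borrowed from the StringIO example above), and a StringIO stands in for the file:

```python
from io import StringIO

import pandas as pd

# Hypothetical records-format JSON: a list of {column: value} objects
records_json = '[{"name": "Tom", "age": 18}, {"name": "Bob", "age": 30}]'
df = pd.read_json(StringIO(records_json), orient="records")
print(df)

# With lines=True, each line of the input is one JSON object
lines_json = '{"name": "Tom", "age": 18}\n{"name": "Bob", "age": 30}\n'
df_lines = pd.read_json(StringIO(lines_json), lines=True)
print(df_lines)
```

Both calls should give the same two-row table; only the layout of the input differs.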


Reading Excel files

read_excel(io, sheet_name=0, header=0, skiprows=None, skipfooter=0, index_col=None, usecols=None)

Parameters:

io: path to the Excel file
sheet_name: name or number of the sheet to read; numbering starts at 0
skiprows: number of rows to skip at the start
skipfooter: number of rows (int) to skip at the end
index_col: column to use as the index

(Older pandas versions spelled these sheetname and skip_footer.)

Enough talk, here is a demonstration.

Practice

import pandas as pd
xls_file = 'The box office of different kinds of films in previous years.xlsx'
df = pd.read_excel(xls_file).head(5)
print(df)

Note: if your xlrd version is too new, downgrade it (xlrd 2.0 and later only read .xls files). Alternatively, install openpyxl, which pandas can use to read .xlsx files.

Running it gives:

   Release date  title                              Total box office
0  2019          Cambodian love                     9.63W
1  2019          That bridge                        6.05W
2  2019          Midway Island in the final battle  2.92Y
3  2019          Lane Manager                       2433.22W
4  2019          Kung Fu town                       16.97W

Reading a specific worksheet

import pandas as pd
xls_file = 'The box office of different kinds of films in previous years.xlsx'
df = pd.read_excel(xls_file, sheet_name='action').head(5)  # Read the named worksheet
print(df)

Result:

   Release date  title                         Total box office
0  2019          Kung Fu town                  16.97W
1  2019          Rebirth beyond the realm      6.55W
2  2019          The terminator: Dark destiny  3.51Y
3  2019          Yang Jingyu                   127.81W
4  2019          Er Feng                       61.22W

Specifying the header row

import pandas as pd
xls_file = 'The box office of different kinds of films in previous years.xlsx'
df = pd.read_excel(xls_file, sheet_name='action', header=1).head(5)  # Use the second row (index 1) as the column names
print(df)

Result:

   2019  Kung Fu town                  16.97W
0  2019  Rebirth beyond the realm      6.55W
1  2019  The terminator: Dark destiny  3.51Y
2  2019  Yang Jingyu                   127.81W
3  2019  Er Feng                       61.22W
4  2019  Gemini killer                 2.32Y

Reading specific columns

import pandas as pd
xls_file = 'The box office of different kinds of films in previous years.xlsx'
df = pd.read_excel(xls_file, sheet_name='action', usecols=['title', 'Total box office']).head(5)  # Read only the listed columns
print(df)

Requesting a column that does not exist raises an error.

Result:

   title                         Total box office
0  Kung Fu town                  16.97W
1  Rebirth beyond the realm      6.55W
2  The terminator: Dark destiny  3.51Y
3  Yang Jingyu                   127.81W
4  Er Feng                       61.22W

The ExcelFile class

To make it easier to read several sheets from the same file, the ExcelFile class can wrap the file and be passed to read_excel. Because the file only has to be read into memory once, reading multiple sheets this way has a performance advantage.

import pandas as pd

xlsx = pd.ExcelFile('path_to_file.xls')
df = pd.read_excel(xlsx, 'Sheet1')

# ExcelFile also works as a context manager
with pd.ExcelFile('path_to_file.xls') as xls:
    df1 = pd.read_excel(xls, 'Sheet1')
    df2 = pd.read_excel(xls, 'Sheet2')

Parsing dates

When reading an Excel file, date-like values are usually converted automatically to the appropriate dtype (data type). But if you have a column of strings that merely look like dates (not stored in Excel's own date format), you can use the parse_dates parameter to parse those strings as dates:

pd.read_excel('path_to_file.xls', 'Sheet1', parse_dates=['date_strings'])
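
read_csv accepts a parse_dates parameter as well, so the effect can be tried without an Excel file. A small sketch, assuming a made-up two-row table (only the column name date_strings comes from the call above; the rest is hypothetical):

```python
from io import StringIO

import pandas as pd

# Hypothetical data; the first column holds dates written as plain strings
data = "date_strings,value\n2021-04-06,1\n2021-04-07,2\n"

plain = pd.read_csv(StringIO(data))
parsed = pd.read_csv(StringIO(data), parse_dates=["date_strings"])

print(plain.dtypes)   # date_strings is left as text
print(parsed.dtypes)  # date_strings is parsed as a datetime column
```

Once the column is a datetime column, accessors such as parsed["date_strings"].dt.year become available.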

Cell converters

The contents of Excel cells can be converted through the converters parameter. For example, to convert a column to Boolean values:

pd.read_excel('path_to_file.xls', 'Sheet1', converters={'MyBools': bool})

A column of integers with missing values cannot be converted to an array with an integer dtype, because NaN is strictly a float. You can manually mask the missing data to recover an integer dtype:

def cfun(x):
    return int(x) if x else -1

pd.read_excel('path_to_file.xls', 'Sheet1', converters={'MyInts': cfun})
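
The masking idea can be checked on in-memory CSV data, since read_csv also takes converters. This is only a sketch under that substitution: the data is invented, and in read_csv the converter receives the raw cell text, with an empty string for a missing cell:

```python
from io import StringIO

import pandas as pd


def cfun(x):
    # An empty string means the cell was missing; mark it with -1
    return int(x) if x else -1


# Hypothetical data with one missing integer in the MyInts column
data = "MyInts,other\n1,a\n,b\n3,c\n"
df = pd.read_csv(StringIO(data), converters={"MyInts": cfun})

print(df["MyInts"].tolist())
print(df["MyInts"].dtype)
```

The choice of -1 as the marker only works if -1 can never be a real value in that column.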

Reading from a MySQL database

This section assumes some MySQL knowledge. If you have none, I suggest reading these first:

MySQL What I saw - An introductory tour
《Explain profound theories in simple language SQL》 Q & A

read_sql is actually a wrapper around two functions:

read_sql_query: reads data through an SQL statement
read_sql_table: reads a table in the database

pandas.read_sql(sql, con, index_col=None, columns=None)

Parameters:

sql: a table name or an SQL statement
con: the database connection
index_col: column(s) to use as the index

Test

Environment setup:
First you need a working MySQL installation. I won't go into it here; the MySQL introductory tour mentioned above covers it in detail.

Once that is done, create a database:
[Figure: creating the database in MySQL]

from sqlalchemy import create_engine, text
import pandas as pd
# mysql+pymysql://user:password@server-IP/database-name
# I created a new user here
engine = create_engine('mysql+pymysql://pandas:pandas@localhost/pandas')
# Make sure the engine is usable (Engine.execute was removed in SQLAlchemy 2.0, so go through a connection)
with engine.connect() as conn:
    print(conn.execute(text('show tables')).fetchall())

Besides the packages imported in the code, two more need to be installed: pymysql and cryptography.

The code prints:

[('presidents',)]

from sqlalchemy import create_engine
import pandas as pd
engine = create_engine('mysql+pymysql://pandas:pandas@localhost/pandas')
print(pd.read_sql('presidents',con=engine))

The table is empty, so naturally nothing is read.

Empty DataFrame
Columns: [last_name, first_name]
Index: []

from sqlalchemy import create_engine
import pandas as pd
engine = create_engine('mysql+pymysql://pandas:pandas@localhost/pandas')
# print(pd.read_sql('presidents', con=engine, index_col='first_name'))  # Use an index column
pd.read_sql('select * from presidents', con=engine)  # Use an SQL statement directly

Again, since there is no data in the table, nothing is read.
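
To see read_sql return actual rows without setting up MySQL, one option is an in-memory SQLite database from the Python standard library. To be clear, this swaps SQLite in for MySQL, and the two rows are invented; only the table and column names follow the empty presidents table above:

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for the MySQL server (an assumption
# made so the sketch runs anywhere); the rows are invented for illustration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE presidents (last_name TEXT, first_name TEXT)")
conn.executemany(
    "INSERT INTO presidents VALUES (?, ?)",
    [("Washington", "George"), ("Adams", "John")],
)
conn.commit()

# Run an SQL statement and use first_name as the index column
df = pd.read_sql("SELECT * FROM presidents", con=conn, index_col="first_name")
print(df)

conn.close()
```

With a plain sqlite3 connection only SQL statements work; reading by table name, as in pd.read_sql('presidents', con=engine), needs an SQLAlchemy engine.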


Copyright notice
This article was written by [Look at the future]. Please include a link to the original when reposting. Thank you.
https://pythonmana.com/2021/04/20210406170401235D.html
