Basic usage of pandas data structures

Fingertip programming 2021-04-06 17:27:08


Head and tail

head() and tail() give a quick preview of a Series or DataFrame. By default they display 5 rows, and you can pass a number to display a different amount.
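A minimal sketch of both calls. The variable names and the 1000-element Series are made up for illustration:

```python
import numpy as np
import pandas as pd

# A hypothetical Series that is too long to print in full.
long_series = pd.Series(np.arange(1000))

first_rows = long_series.head()   # default: first 5 rows
last_rows = long_series.tail(3)   # explicit count: last 3 rows

print(first_rows)
print(last_rows)
```

The same two methods work on a DataFrame, where the count refers to rows.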

Attributes and underlying data

pandas metadata can be accessed through several attributes:

shape: the axis dimensions of the object, consistent with ndarray.

Axis labels:

Series: index (the only axis)

DataFrame: index (rows) and columns
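As an illustration of these attributes, the sketch below builds a small DataFrame (labels and values are invented for the example), reads its shape and axis labels, and then assigns new column labels:

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 frame with explicit row and column labels.
df = pd.DataFrame(np.zeros((3, 2)), index=["a", "b", "c"], columns=["x", "y"])
s = df["x"]

print(df.shape)    # (rows, columns) tuple
print(s.shape)     # a Series has a single axis
print(df.index)    # row labels
print(df.columns)  # column labels

# Axis labels can be assigned to directly.
df.columns = [name.upper() for name in df.columns]
print(df.columns)
```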

pandas objects (Index, Series, DataFrame) can be thought of as containers for arrays, which store the data and perform the computation. For most types the underlying array is a numpy.ndarray. However, pandas and third-party libraries may extend NumPy's type system to add custom arrays.

The .array property extracts the data held in an Index or Series.

The result of .array is always an ExtensionArray.

To get a NumPy array, use to_numpy() or numpy.asarray().

When the Series or Index is backed by an ExtensionArray, to_numpy() may copy the data and coerce the values.
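The three access paths can be compared side by side. This is a small sketch with an invented Series, not a complete tour of the API:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])

arr = s.array              # ExtensionArray wrapping the Series data
index_arr = s.index.array  # the same property exists on an Index
np_arr = s.to_numpy()      # plain numpy.ndarray
np_arr_alt = np.asarray(s) # equivalent conversion through NumPy

print(type(arr).__name__, type(index_arr).__name__)
print(type(np_arr).__name__, np_arr.dtype)
```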

to_numpy() gives some control over the dtype of the resulting numpy.ndarray. Take datetimes with time zones as an example: NumPy has no dtype for timezone-aware datetimes, so there are two useful representations:

1. An object-dtype numpy.ndarray of Timestamp objects, each carrying the correct tz information.

2. A datetime64[ns] numpy.ndarray in which the values have been converted to UTC and the time zone information discarded.

Time zone information is preserved with dtype=object, or discarded with dtype='datetime64[ns]'.
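The two representations can be sketched as follows. The dates and the "CET" time zone are arbitrary choices for the example:

```python
import pandas as pd

# A hypothetical timezone-aware Series (midnight on two days, CET).
ser = pd.Series(pd.date_range("2000-01-01", periods=2, tz="CET"))

# Option 1: object array of Timestamp objects, time zone preserved.
as_objects = ser.to_numpy(dtype=object)

# Option 2: datetime64[ns] array, converted to UTC, time zone dropped.
as_utc = ser.to_numpy(dtype="datetime64[ns]")

print(as_objects)
print(as_utc)
```

Because CET is one hour ahead of UTC in January, the first value in the second array falls on the previous day at 23:00.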

Extracting the raw data from a DataFrame is a little more involved. When all columns of the DataFrame share a single data type, DataFrame.to_numpy() returns the underlying data.

When a DataFrame holds homogeneous data, the ndarray can be modified in place and the changes are reflected in the data structure. For heterogeneous data, that is, when the DataFrame's columns do not all share one data type, this is not the case. Unlike the axis labels, the values attribute itself cannot be assigned to.

Note that when working with heterogeneous data, the dtype of the resulting ndarray is chosen to accommodate all of the data involved. If the DataFrame contains strings, the result has dtype object. If it contains only floats and integers, the result has a floating point dtype.
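The dtype rules described above can be checked with three small frames. The column names and values are invented for the illustration:

```python
import pandas as pd

# All columns float: the result keeps that dtype.
homogeneous = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})

# Integers mixed with floats: the result is upcast to float.
mixed_numeric = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})

# Numbers mixed with strings: the result falls back to object.
with_strings = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

for name, frame in [("homogeneous", homogeneous),
                    ("mixed_numeric", mixed_numeric),
                    ("with_strings", with_strings)]:
    print(name, frame.to_numpy().dtype)
```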

In the past, pandas recommended Series.values or DataFrame.values for extracting the data from a Series or DataFrame.

This has since been improved. The recommendation now is to extract data with .array or to_numpy() and to avoid .values.

.values has the following 2 drawbacks:

1. When a Series contains an extension type, it is unclear whether Series.values returns a NumPy array or the extension array. Series.array always returns an ExtensionArray and never copies data. Series.to_numpy() always returns a NumPy array, potentially at the cost of copying and coercing the values.

2. When a DataFrame contains several data types, DataFrame.values may copy the data and coerce the values to a common dtype, which is a relatively expensive operation. DataFrame.to_numpy(), being a method, makes it clearer that the returned NumPy array may not be a view of the data in the DataFrame.
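The first drawback can be seen with a categorical Series, used here as one example of an extension type (the data is made up):

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray

plain = pd.Series([1, 2, 3])
categorical = pd.Series(["a", "b", "a"], dtype="category")

# .values: the return type depends on what the Series holds.
print(type(plain.values).__name__)        # a NumPy array
print(type(categorical.values).__name__)  # the extension array

# .array and .to_numpy(): the return type is predictable.
print(isinstance(categorical.array, ExtensionArray))
print(isinstance(categorical.to_numpy(), np.ndarray))
```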

Accelerated operations

With the help of the numexpr and bottleneck support libraries, pandas can accelerate certain types of binary numerical and Boolean operations.

These two libraries are especially useful when dealing with large data sets, where the speedup is substantial. numexpr uses smart chunking, caching, and multiple cores. bottleneck is a set of specialized cython routines that are especially fast on arrays containing NaN values.

For example, consider operations on DataFrames with 100 columns × 100,000 rows of data.
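One way to measure this yourself is sketched below. The helper name time_binary_ops and the choice of operations are assumptions made for the illustration, and the call uses 10,000 rows rather than 100,000 to keep memory use modest. Actual timings depend on the machine and on whether numexpr is installed, so no expected numbers are shown.

```python
import timeit

import numpy as np
import pandas as pd


def time_binary_ops(n_rows, n_cols, repeats=3):
    """Return the best wall-clock time in seconds for a few binary operations."""
    rng = np.random.default_rng(0)
    df1 = pd.DataFrame(rng.standard_normal((n_rows, n_cols)))
    df2 = pd.DataFrame(rng.standard_normal((n_rows, n_cols)))
    operations = {
        "df1 > df2": lambda: df1 > df2,
        "df1 * df2": lambda: df1 * df2,
        "df1 + df2": lambda: df1 + df2,
    }
    return {
        name: min(timeit.repeat(op, number=1, repeat=repeats))
        for name, op in operations.items()
    }


# Pass n_rows=100_000 to match the size mentioned in the text.
timings = time_binary_ops(n_rows=10_000, n_cols=100)
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.2f} ms")
```

Running the helper once with the acceleration options enabled and once with them disabled gives a rough before and after comparison.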

Both libraries are enabled by default, and this can be changed through pandas options.
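A short sketch of switching the two libraries off, using the compute.use_bottleneck and compute.use_numexpr options. The variable names are illustrative:

```python
import pandas as pd

# Disable both accelerators; pass True instead to turn them back on.
pd.set_option("compute.use_bottleneck", False)
pd.set_option("compute.use_numexpr", False)

bottleneck_enabled = pd.get_option("compute.use_bottleneck")
numexpr_enabled = pd.get_option("compute.use_numexpr")
print(bottleneck_enabled, numexpr_enabled)
```

Setting an option to True only has an effect when the corresponding library is installed.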


Copyright notice
This article was written by [Fingertip programming]. Please include a link to the original when reposting. Thank you.
https://pythonmana.com/2021/04/20210406172526968M.html
