Head and tail
head() and tail() give a quick preview of a Series or DataFrame. By default they display 5 rows, and you can also specify the number of rows to display.
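As a minimal sketch of the preview methods described above (the variable names and the sample data are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data: a Series holding the integers 0..999.
long_series = pd.Series(range(1000))

first_rows = long_series.head()   # no argument: the first 5 values
last_rows = long_series.tail(3)   # explicit count: the last 3 values

print(first_rows)
print(last_rows)
```

The same two methods are available on a DataFrame, where they return whole rows.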
Attributes and underlying data
pandas objects expose their metadata through several attributes:
shape: the axis dimensions of the object, consistent with ndarray
Axis labels:
Series: index (the only axis)
DataFrame: index (rows) and columns
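The attributes listed above can be inspected as in the following sketch. The DataFrame contents and labels are illustrative assumptions, not data from the original article:

```python
import pandas as pd

# Hypothetical example frame with three labelled rows and two columns.
df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4.0, 5.0, 6.0]},
    index=["x", "y", "z"],
)
ser = df["A"]

print(df.shape)     # (rows, columns) for a DataFrame
print(ser.shape)    # a one-element tuple for a Series
print(df.index)     # row labels
print(df.columns)   # column labels
print(ser.index)    # the only axis of a Series
```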
pandas objects (Index, Series, DataFrame) can be thought of as containers for arrays, which hold the data and perform the computation. For most types the underlying array is a numpy.ndarray. However, pandas and third-party libraries may extend NumPy's type system to add support for custom arrays.
The .array property is used to extract the data in an Index or Series.
.array always returns an ExtensionArray.
To get a NumPy array instead, use to_numpy() or numpy.asarray().
When the Series or Index is backed by an ExtensionArray, to_numpy() may copy the data and coerce the values.
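A short sketch of the three ways of getting at the data mentioned above, using a small integer Series invented for the example:

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 2, 3])  # hypothetical sample data

ext = ser.array               # an ExtensionArray wrapping the data
arr = ser.to_numpy()          # a plain numpy.ndarray
arr_again = np.asarray(ser)   # equivalent route through NumPy

print(type(ext))
print(type(arr), arr.dtype)
```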
to_numpy() gives some control over the dtype of the resulting numpy.ndarray. Take datetimes with time zones as an example: NumPy has no dtype for time-zone-aware datetimes, so pandas offers two possible representations:
1. An object-dtype numpy.ndarray of Timestamp objects, each carrying the correct tz.
2. A datetime64[ns] numpy.ndarray, in which the values have been converted to UTC and the time zone information discarded.
Time zone information can be preserved with dtype=object,
or discarded with dtype='datetime64[ns]'.
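The two representations can be sketched as follows. The dates and the "CET" time zone are arbitrary choices for the example:

```python
import numpy as np
import pandas as pd

# Hypothetical time-zone-aware Series: two days starting 2000-01-01 in CET.
ser = pd.Series(pd.date_range("2000-01-01", periods=2, tz="CET"))

# Representation 1: object array of Timestamp objects, tz kept.
as_objects = ser.to_numpy(dtype=object)

# Representation 2: datetime64[ns] array, values converted to UTC, tz dropped.
as_utc = ser.to_numpy(dtype="datetime64[ns]")

print(as_objects)
print(as_utc)
```

Midnight CET is one hour ahead of UTC in winter, so the first value of the second array falls on the previous day at 23:00.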
Getting the raw data out of a DataFrame is slightly more involved. When all columns of the DataFrame share a single data type, DataFrame.to_numpy() returns the underlying data.
If a DataFrame holds homogeneously typed data, the ndarray can be modified in place and the changes are reflected in the data structure. For heterogeneous data, that is, when the columns of the DataFrame do not all share one data type, this is not the case. Unlike the axis labels, the values attribute itself cannot be assigned to.
When working with heterogeneous data, the dtype of the resulting ndarray is chosen to accommodate all of the data involved. If the DataFrame contains strings, the result has object dtype. If it contains only floats and integers, the result has float dtype.
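The dtype rules above can be checked with a small sketch. The three frames are invented for illustration:

```python
import numpy as np
import pandas as pd

# Homogeneous: every column is float64.
floats_only = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})

# Heterogeneous numeric: one integer column, one float column.
ints_and_floats = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]})

# Heterogeneous with strings.
with_strings = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

print(floats_only.to_numpy().dtype)      # float dtype, matches the columns
print(ints_and_floats.to_numpy().dtype)  # integers are upcast to float
print(with_strings.to_numpy().dtype)     # falls back to object
```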
In the past, pandas recommended Series.values or DataFrame.values for extracting the data from a Series or DataFrame.
pandas has since improved on this: the recommendation now is to extract data with .array or to_numpy() and to avoid .values.
.values has the following two drawbacks:
1. When a Series contains an extension type, it is unclear whether Series.values returns a NumPy array or an extension array. Series.array always returns an ExtensionArray and never copies the data. Series.to_numpy() always returns a NumPy array, potentially at the cost of copying and coercing the values.
2. When a DataFrame contains a mix of data types, DataFrame.values may copy the data and coerce the values to a common dtype, which is a relatively expensive operation. DataFrame.to_numpy(), being a method, makes it clearer that the returned NumPy array may not be a view on the data in the DataFrame.
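The first drawback can be illustrated with a categorical Series, one of the extension types that ships with pandas. The data are made up for the example:

```python
import numpy as np
import pandas as pd

numeric = pd.Series([1, 2, 3])                            # NumPy-backed
categorical = pd.Series(["a", "b", "a"], dtype="category")  # extension type

# .values changes its return type depending on the dtype of the Series.
print(type(numeric.values))       # a numpy.ndarray
print(type(categorical.values))   # a pandas Categorical

# .array and to_numpy() each have one predictable return type.
print(type(categorical.array))       # always an ExtensionArray
print(type(categorical.to_numpy()))  # always a numpy.ndarray
```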
Accelerated operations
With the numexpr and bottleneck libraries, pandas can accelerate certain types of binary numerical and boolean operations.
These two libraries are especially useful when dealing with large data sets, where the speedup can be substantial. numexpr uses smart chunking, caching, and multiple cores. bottleneck is a set of specialized cython routines that are especially fast on arrays containing nans.
Consider, for example, a DataFrame with 100 columns × 100,000 rows.
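The original timing figures are not reproduced here. The sketch below only builds two frames of the stated size and times one binary operation and one comparison, so the numbers it prints depend on the machine and on whether numexpr is installed. The random data and variable names are assumptions made for the example:

```python
import time

import numpy as np
import pandas as pd

# Two hypothetical frames of random floats, 100,000 rows x 100 columns each.
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.standard_normal((100_000, 100)))
df2 = pd.DataFrame(rng.standard_normal((100_000, 100)))

start = time.perf_counter()
summed = df1 + df2          # binary numerical operation
add_seconds = time.perf_counter() - start

start = time.perf_counter()
greater = df1 > df2         # comparison producing booleans
cmp_seconds = time.perf_counter() - start

print(f"df1 + df2: {add_seconds:.4f} s")
print(f"df1 > df2: {cmp_seconds:.4f} s")
```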
Both libraries are enabled by default, and this can be controlled through pandas options.
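A minimal sketch of toggling the two libraries, assuming the option names compute.use_bottleneck and compute.use_numexpr from the pandas options system:

```python
import pandas as pd

# Turn both accelerators off, for example to compare timings.
pd.set_option("compute.use_bottleneck", False)
pd.set_option("compute.use_numexpr", False)
disabled_state = (
    pd.get_option("compute.use_bottleneck"),
    pd.get_option("compute.use_numexpr"),
)

# Turn them back on (the default state).
pd.set_option("compute.use_bottleneck", True)
pd.set_option("compute.use_numexpr", True)
enabled_state = (
    pd.get_option("compute.use_bottleneck"),
    pd.get_option("compute.use_numexpr"),
)

print(disabled_state, enabled_state)
```

Setting an option to True only has an effect when the corresponding library is actually installed.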