Python is a common tool for data processing. It can handle data ranging in size from kilobytes to terabytes, offers high development efficiency and maintainability, and is general-purpose and cross-platform. Python can be used for data analysis, but relying only on its built-in libraries is limiting, so third-party extension libraries need to be installed to strengthen its analysis and mining capabilities.
The third-party extension libraries commonly installed for data analysis in Python include NumPy, Pandas, SciPy, Matplotlib, Scikit-Learn, Keras, Gensim, and Scrapy. The following is a brief introduction to these libraries from a Python instructor at Qianfeng Wuhan:
1. Pandas
Pandas is a powerful, flexible data analysis and exploration tool for Python. It includes Series, DataFrame, and other high-level data structures and tools, and installing it makes data processing in Python fast and simple.
Pandas is a data analysis package for Python. It was originally developed as a financial data analysis tool, so it provides good support for time series analysis.
Pandas was created for data analysis tasks. It incorporates a large number of libraries and some standard data models, and provides the tools needed to work with large datasets efficiently, along with many functions and methods for processing data quickly and conveniently. It is built on top of NumPy and makes NumPy-based applications simpler.
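As a minimal sketch of the structures described above, the following builds a Series and a DataFrame, resamples a small time series, and exposes the underlying NumPy array. The prices, volumes, and dates are made up for illustration and are not from the original article.

```python
import pandas as pd

# Hypothetical daily closing prices, invented for this example.
dates = pd.date_range("2024-01-01", periods=6, freq="D")
prices = pd.Series([10.0, 10.5, 10.2, 10.8, 11.0, 10.9], index=dates, name="close")

# A DataFrame is a table of labeled columns that share one index.
frame = pd.DataFrame({"close": prices, "volume": [100, 120, 90, 150, 130, 110]})

# Time series support: group the daily rows into 3-day buckets and average them.
three_day_mean = frame["close"].resample("3D").mean()

# Pandas is built on NumPy: column values are available as a NumPy array.
close_array = frame["close"].to_numpy()

print(three_day_mean)
print(type(close_array).__name__)
```

With six daily rows, the resample step yields two 3-day buckets, each holding the mean closing price for its window.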
Data structures with labeled axes that support automatic or explicit data alignment. This prevents common errors caused by misaligned data and by working with differently indexed data from different sources.
Missing data is easier to handle with Pandas.
Merging and other relational operations found in popular databases (for example, SQL-based databases).
Pandas is an excellent tool for data cleaning and organizing.
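The features listed above can be sketched in a few lines. This is an illustrative example only: the Series values, the `orders` and `customers` tables, and the in-memory SQLite database are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import sqlite3

import pandas as pd

# Automatic alignment: values are matched by index label, not by position.
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "z"])
total = a + b  # "x" has no partner in b, so the result there is NaN

# Missing data: replace NaN after the fact, or supply a fill value up front.
filled = total.fillna(0.0)
explicit = a.add(b, fill_value=0.0)  # treats the missing "x" in b as 0.0

# Relational operations: load a table from a SQL database and join it.
conn = sqlite3.connect(":memory:")  # hypothetical throwaway database
orders = pd.DataFrame({"customer_id": [1, 2, 2], "amount": [50, 20, 30]})
orders.to_sql("orders", conn, index=False)
loaded = pd.read_sql_query("SELECT customer_id, amount FROM orders", conn)
conn.close()

customers = pd.DataFrame({"customer_id": [1, 2], "name": ["Ann", "Bob"]})
joined = loaded.merge(customers, on="customer_id", how="left")  # like a SQL LEFT JOIN

print(total)
print(joined)
```

The difference between `fillna` and `add(..., fill_value=...)` is worth noting: the first fills the gap in the result, while the second fills the gap in the inputs before adding.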
2. NumPy
Python does not provide