The second most popular language: a concise Python data science tutorial, from introduction to mastery

Little ant 2021-01-22 12:30:41

Python is a general-purpose programming language that has been widely adopted in data science over the past decade. In fact, Python is the second most popular programming language in data science, behind only R.

The main purpose of this article is to show you how easy it is to learn data science with Python. You may think you need to become an advanced Python programmer before you can carry out the complex tasks usually associated with data science, but that's not the case. Python comes with many useful libraries that provide powerful support behind the scenes. You don't even need to know what the program is running, and you don't have to care. The only thing you really need to know is that you need to perform certain tasks, and that Python makes those tasks fairly simple.

So, let's get started.

Setting up a Python environment for data science

Whether your computer is a Mac or runs Windows, I suggest you download a free Python distribution that gives you easy access to as many useful modules as possible.

I have tried several Python distributions, and here I recommend Anaconda, provided by Continuum Analytics. This Python distribution includes more than 200 libraries. To understand the difference between packages, modules, and libraries in Python, please refer to this article.

When you download Anaconda, you need to choose between the Python 2 and Python 3 versions. I strongly recommend that you use Python 2.7.12. As of the end of 2016, the vast majority of Python users outside computer science use this version. It does a good job for data science, it is easier to learn than Python 3, and sites like GitHub host millions of Python scripts and code snippets for your reference, which will make your life easier.

Anaconda also comes with the IPython programming environment, which we suggest you use. After installing Anaconda, just navigate to Jupyter Notebook and open the program to use IPython in your web browser. Jupyter Notebook automatically launches the application in the web browser.


You can refer to this article to learn how to change the path in an IPython notebook.

Learning the basics

Before you learn more about Python's data science libraries, you first need to learn some Python basics. Python is an object-oriented programming language. In Python, an object can be assigned to a variable, and it can also be passed as an argument to a function. The following are all objects in Python: numbers, strings, lists, tuples, sets, dictionaries, functions, and classes.

Functions in Python are basically the same as functions in ordinary mathematics: they receive input data, process it, and output a result. The output depends entirely on how the function is designed. Classes in Python, on the other hand, are prototypes of objects designed to output other objects.

If your goal is to write fast, reusable, easy-to-modify Python code, then you have to use functions and classes. Using functions and classes helps keep code efficient and clean.
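To make these ideas concrete, here is a minimal sketch of a function, a class, and a function being passed around as an ordinary object. The names mean, Student, and describe are made up for this illustration and do not come from any library.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / float(len(values))


class Student(object):
    """A small prototype: every Student object carries a name and scores."""

    def __init__(self, name, scores):
        self.name = name
        self.scores = list(scores)

    def average(self):
        # Reuse the function defined above instead of repeating the logic.
        return mean(self.scores)


def describe(student, statistic):
    """Apply any function (passed in as an object) to a student's scores."""
    return statistic(student.scores)


alice = Student("Alice", [90, 80, 70])  # an object assigned to a variable
alice_average = alice.average()         # uses the class method
alice_best = describe(alice, max)       # the built-in max passed as an argument
alice_mean = describe(alice, mean)      # our own function passed the same way
```

Because describe accepts any function, adding a new statistic later does not require touching the Student class at all.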

Now, let's see what data science libraries are available in Python.

Scientific computing: Numpy and Scipy

Numpy is a Python package mainly used to work with n-dimensional array objects, while Scipy provides many mathematical algorithms and implementations of complex functions that extend the functionality of the Numpy library. The Scipy library adds some specialized scientific functions to Python for specific tasks in data science.

To use Numpy (or any other Python library) in Python, you must first import the corresponding library.


np.array(scores) converts a list to an array.

When you use plain Python, that is, Python without any external extensions (such as libraries), you can only use one-dimensional lists to store data. However, if you extend Python with the Numpy library, you can use n-dimensional arrays directly. (In case you are wondering, an n-dimensional array is an array with one or more dimensions.)
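A minimal sketch of the import and the list-to-array conversion described above. The contents of scores and grid are made-up values chosen for illustration.

```python
import numpy as np

# A plain Python list (hypothetical exam scores).
scores = [72, 85, 90, 66]

# np.array converts the list to a one-dimensional array.
scores_array = np.array(scores)

# Arithmetic applies element by element, with no explicit loop.
curved = scores_array + 5

# A nested list becomes a two-dimensional array (2 rows, 3 columns).
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
column_sums = grid.sum(axis=0)  # sum down each column
```

Here scores_array.shape is (4,) and grid.shape is (2, 3), which is how Numpy reports the size along each dimension.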

We start with Numpy because it is essential when you do scientific computing in Python. An in-depth knowledge of Numpy will help you use libraries such as Pandas and Scipy efficiently.

Data wrangling: Pandas

Pandas is the most widely used tool for data wrangling. It includes high-level data structures and data manipulation tools designed to make data analysis faster and more convenient. Users who do statistical computing in R will certainly be familiar with the name DataFrame.

Pandas is one of the key factors that helped Python grow into a powerful and efficient data analysis platform.

Next, I'll show you how to use Pandas to work with a small data set.

[Image: Pandas example code for a small data set]

A DataFrame is a spreadsheet-like structure that contains an ordered collection of columns. Each column can have a different variable type. A DataFrame has both a row index and a column index.
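A minimal sketch of a small DataFrame, not necessarily the data set the author used. The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical data: each key becomes a column, each list its values.
data = {
    "name": ["Ann", "Bob", "Cid"],
    "age": [28, 35, 41],
    "height_m": [1.62, 1.80, 1.75],
}

# Passing columns= fixes the column order explicitly.
df = pd.DataFrame(data, columns=["name", "age", "height_m"])

row_labels = list(df.index)        # the row index
column_labels = list(df.columns)   # the column index
column_types = df.dtypes           # one type per column

# Filter rows with a boolean condition, then summarize a column.
older = df[df["age"] > 30]
mean_age = df["age"].mean()
```

The three columns hold strings, integers, and floats respectively, which illustrates that each column keeps its own type.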


Visualization: Matplotlib + Seaborn + Bokeh

Matplotlib is a Python module for data visualization. It makes it easy to draw line charts, pie charts, histograms, and other professional-grade charts.

You can use Matplotlib to customize every detail of a chart. When you use Matplotlib in IPython, it offers interactive features such as zooming and panning. Matplotlib supports different GUI back ends on all operating systems, and it can also export charts to several common image formats, such as PDF, SVG, JPG, PNG, BMP, and GIF.
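A minimal sketch of drawing a line chart and exporting it to a PNG file. The data, labels, and output file name are made up for illustration, and the non-interactive Agg back end is selected only so that the example also runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: render to files, no GUI window needed
import matplotlib.pyplot as plt

# Hypothetical data for a simple line chart.
x = [0, 1, 2, 3, 4]
y = [value ** 2 for value in x]

fig, ax = plt.subplots()
ax.plot(x, y, marker="o", label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A simple line chart")
ax.legend()

# Export the chart; the file extension selects the image format.
out_path = os.path.join(tempfile.mkdtemp(), "line_chart.png")
fig.savefig(out_path)
plt.close(fig)
```

Inside a notebook you would normally skip the matplotlib.use call and let the chart display inline.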


Seaborn is a data visualization library built on Matplotlib, used to create attractive and informative statistical charts in Python. Seaborn's main feature is that it can create complex chart types from Pandas data with relatively simple commands. I drew the following picture with Seaborn:

[Image: chart drawn with Seaborn]

Machine learning: Scikit-learn

The goal of machine learning is to teach a machine (software) to perform a task by providing it with some examples (of how to perform the task, or of what should not be done).

Python has many machine learning libraries, but Scikit-learn is one of the most popular. Scikit-learn is built on top of the Numpy, Scipy, and Matplotlib libraries. With Scikit-learn you can implement almost all machine learning algorithms, such as regression, clustering, and classification. So, if you plan to learn machine learning with Python, I suggest you start with Scikit-learn.

The K-nearest neighbors (KNN) algorithm can be used for classification or regression. The following code shows how to use a KNN model to make predictions on the iris data set.

[Images: KNN example code]
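A minimal sketch of a KNN prediction on the iris data set, not necessarily the author's original code. The train/test split, the choice of 5 neighbors, and the fixed random_state are assumptions made for this illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the iris data: 150 flowers, 4 measurements each, 3 species.
iris = load_iris()

# Hold out a quarter of the samples to check the predictions.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0,
    stratify=iris.target)

# Assumption: 5 neighbors; this value is a common starting point.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

predictions = knn.predict(X_test)    # predicted species for unseen flowers
accuracy = knn.score(X_test, y_test)  # fraction predicted correctly
```

Changing n_neighbors trades off sensitivity to noise against smoothing across species boundaries, so it is worth trying a few values.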

Other machine learning libraries include:

  • Theano

  • Pylearn2

  • Pyevolve

  • Caffe

  • Tensorflow

Statistics: Statsmodels and Scipy.stats

Statsmodels and Scipy.stats are two popular statistics modules in Python. Scipy.stats is mainly used for working with probability distributions. Statsmodels, on the other hand, provides an R-like formula framework for statistical models. Its extended functionality, including descriptive statistics, statistical tests, plotting functions, and result statistics, is available for different types of data and for each estimator.

The following code shows how to use the Scipy.stats module to work with the normal distribution.

[Images: Scipy.stats normal distribution example code]

A normal distribution is a continuous distribution, a function whose input can be any value on the real line. The normal distribution can be parameterized by two parameters: the mean μ and the variance σ².
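A minimal sketch of working with a normal distribution through scipy.stats.norm, not necessarily the author's original code. The mean and variance values are made up for illustration. Note that norm takes the standard deviation σ as its scale argument, not the variance σ², so the sketch converts between the two.

```python
import math

from scipy.stats import norm

# Hypothetical parameters: mean mu and variance sigma squared.
mu = 0.0
variance = 4.0
sigma = math.sqrt(variance)  # norm expects the standard deviation

# A "frozen" distribution with these parameters fixed.
dist = norm(loc=mu, scale=sigma)

pdf_at_mean = dist.pdf(mu)  # density at the peak: 1 / (sigma * sqrt(2 * pi))
cdf_at_mean = dist.cdf(mu)  # half of the probability lies below the mean

# Draw reproducible random samples from the distribution.
samples = dist.rvs(size=1000, random_state=0)
```

The same frozen object also offers methods such as ppf (quantiles) and interval, which all use the parameters fixed above.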

Web scraping: Requests, Scrapy, and BeautifulSoup

Web scraping is the process of fetching unstructured data from the web (usually in HTML format) and transforming it into a structured data format for analysis.

Popular libraries for web scraping include:

  • Scrapy

  • urllib

  • BeautifulSoup

  • Requests

To scrape data from a website, you need some basic knowledge of HTML.

Here is an example of web scraping with the BeautifulSoup library:

import urllib2
import bs4

[Image: remainder of the BeautifulSoup example code]

The code beautiful = urllib2.urlopen(url).read() goes to the page at url and fetches the entire HTML text of that page. I then store the text in the variable beautiful.

I used urllib2 to fetch the page at that url. You can also use Requests to do the same thing. Here is an article to help you understand the difference between urllib2 and Requests.
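A minimal sketch of the parsing step that follows the download, not the author's original code. To keep it runnable without a network connection, the variable beautiful holds a small hand-written HTML string instead of a fetched page. The page content and the example.com links are made up. On Python 3, the urlopen function from urllib2 is available as urllib.request.urlopen.

```python
from bs4 import BeautifulSoup

# Stand-in for the downloaded HTML text (hypothetical page content).
beautiful = """<html><head><title>Sample page</title></head>
<body>
<h1>Heading</h1>
<a href="https://example.com/a">First</a>
<a href="https://example.com/b">Second</a>
</body></html>"""

# Parse the text with Python's built-in HTML parser.
soup = BeautifulSoup(beautiful, "html.parser")

# Turn the unstructured markup into structured values.
page_title = soup.title.string
links = [(a.get_text(), a["href"]) for a in soup.find_all("a")]
```

The resulting list of (text, link) pairs is already structured data that could be loaded into a Pandas DataFrame for analysis.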

Scrapy is similar to BeautifulSoup. Back-end engineer Prasanna Venkadesh explained the difference between the two toolkits on Quora:

"Scrapy is a web spider, or rather a web crawling framework. You give Scrapy a root URL to start crawling from, and then you can specify some constraints, such as how many URLs to crawl. It is a complete framework for web scraping or crawling.

BeautifulSoup, on the other hand, is a parsing library that also does a very good job of fetching page content and lets you easily parse parts of a page. However, BeautifulSoup only fetches the content of the URL you give it. It does not crawl other pages unless you manually add their URLs to a loop in some way.

Simply put, you can use BeautifulSoup to build something similar to Scrapy. However, BeautifulSoup is a Python library, while Scrapy is a complete framework."


Now you know the basics of Python and what these libraries are for. It's time to use what you've learned to solve specific data analysis problems. You can start with structured data sets and then move on to more complex unstructured data analysis problems.

The above is a translation.

This article was recommended by the teacher @ Love coco - Love life (Beijing post) and organized by the Aliyunqi community.

Link to the original text :

This article was created by [Little ant]. Please include a link to the original when reposting. Thank you.
