NVIDIA's Python GPU algorithm ecosystem: RAPIDS 0.10

Understanding oneself 2020-11-13 10:09:24

With this new release, RAPIDS marks the first anniversary of its launch. Looking back on the year, the RAPIDS team offers its heartfelt thanks to the community for its care and support for the project. Recently, RAPIDS also won its first BOSSIE award. Thank you very much for your support! The RAPIDS team will keep advancing end-to-end data science and aim for new heights.


Related articles:

nvidia-rapids︱cuDF, a DataFrame library like pandas
NVIDIA's Python GPU algorithm ecosystem︱RAPIDS 0.10
nvidia-rapids︱cuML, a machine learning acceleration library
nvidia-rapids︱cuGraph (NetworkX-like) graph models


RAPIDS Definition

RAPIDS, short for Real-time Acceleration Platform for Integrated Data Science, is NVIDIA's open-source suite of GPU-accelerated libraries for data science and machine learning. Built on CUDA-X AI, it accelerates data preparation, model training, and graph analytics.

With the RAPIDS libraries, the whole end-to-end workflow, from data preparation through model training to prediction, can be GPU-accelerated. This greatly improves task execution efficiency and enables breakthroughs in model accuracy while reducing infrastructure TCO.

cuDNN has become the standard acceleration library for GPU-accelerated deep learning frameworks. RAPIDS provides cuDF, cuML, and cuGraph as the GPU-accelerated libraries for data preparation, machine learning algorithms, and graph analytics, respectively.


RAPIDS supports Dask, a lightweight big data framework, so that tasks can be accelerated across multiple GPUs and multiple nodes.

RAPIDS starts with data preparation and introduces a new GPU DataFrame (cuDF), which enables parallel data loading and data manipulation and makes full use of the large, high-bandwidth memory on NVIDIA GPUs. cuDF is an easy-to-use, Python-based toolset that can replace the familiar pandas toolset. Data scientists do not have to learn NVIDIA CUDA from scratch; with only a few changes to existing code, they can speed up data preparation so that it is no longer limited by the CPU or by I/O between the CPU and memory.
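As a minimal sketch of the "few changes to existing code" idea, the snippet below assumes the only change needed is the import line. It tries cuDF first and falls back to pandas when no usable GPU stack is present; that fallback is an assumption of this example, not something RAPIDS requires. The column names and values are made up for illustration.

```python
# Prefer the GPU DataFrame library; fall back to pandas if cuDF cannot be
# imported (missing package or no usable GPU). The fallback is only here so
# the sketch runs anywhere.
try:
    import cudf as xdf
    BACKEND = "cudf"
except Exception:
    import pandas as xdf
    BACKEND = "pandas"

# Hypothetical sample data.
df = xdf.DataFrame({
    "city": ["a", "b", "a", "b", "a"],
    "sales": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Typical data preparation: filter rows, then aggregate per group.
large = df[df["sales"] > 15.0]
totals = large.groupby("city")["sales"].sum()

# Copy the result back to host memory when it lives on the GPU.
if BACKEND == "cudf":
    totals = totals.to_pandas()
totals_by_city = {key: float(value) for key, value in totals.to_dict().items()}
print(BACKEND, totals_by_city)
```

The DataFrame code itself is identical for both backends; only the import and the final copy back to the host differ.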

RAPIDS also introduces a new and growing library of GPU-accelerated ML algorithms (cuML). It includes popular algorithms such as XGBoost, as well as Kalman filtering, K-means, KNN, DBSCAN, PCA, TSVD, OLS linear regression, and more. ML algorithms can generate a great deal of data transfer and have been hard to parallelize. With GPU-accelerated ML and the NVIDIA NVLink and NVSwitch architectures now deployed in server systems, model training can easily be distributed across multiple GPUs and multiple nodes (systems) with very little latency, avoiding the I/O bottleneck between CPU and memory.

RAPIDS background

While planning version 0.10, the RAPIDS team thought back to an earlier blog post by Wes McKinney, "Apache Arrow and the '10 Things I Hate About pandas'".

Here is a brief review of the history of data science. Ten years ago, many of the elements of today's big data landscape emerged one after another. In one corner of the data world, the Hadoop ecosystem was born, and Hadoop, Hive, Cassandra, Mahout, and others developed rapidly.


In another corner, what data scientists call the PyData stack was on the rise. NetworkX (2005), NumPy (2006), scikit-learn (2007), and pandas (2008) drove a wave of usability, while Hadoop, Hive, Cassandra, Flume, Pig, and Spark scaled data science to unprecedented levels. Together they added a great many new libraries, vendors, and almost countless ways to build data pipelines to the data science ecosystem.


Although the emergence of new tools and workflows was exciting, before Apache Arrow few people stopped to think about how these libraries and frameworks could work together efficiently. As a result, most data scientists and engineers spent most of their time serializing and deserializing data between libraries (lots of copies and conversions).

RAPIDS combines the many things people love about these libraries, communities, and frameworks with the pain and frustration people experienced when using these tools at scale. These positive and negative feelings led the RAPIDS ecosystem to tackle Wes's 10 things he hates about pandas (actually 11), among other issues.

The "10 Things I Hate About pandas" list

  • 1. Internals too far from "the metal";
  • 2. No support for memory-mapped datasets;
  • 3. Poor performance in database and file ingest / export;
  • 4. Warty missing data support;
  • 5. Lack of transparency into memory use and RAM management;
  • 6. Weak support for categorical data;
  • 7. Complex groupby operations are awkward and slow;
  • 8. Appending data to a DataFrame is tedious and costly;
  • 9. Limited, non-extensible type metadata;
  • 10. Eager evaluation model, no query planning;
  • 11. "Slow", with limited multicore algorithms for large datasets.


RAPIDS is not solving these problems alone; the emphasis is on the ecosystem. Without the accelerated data science ecosystem, there would be no RAPIDS. First, RAPIDS is built on Apache Arrow, a cross-language development platform for in-memory data. Without the Apache Arrow project and its contributors, RAPIDS would have been much harder to build. Next, don't forget Anaconda, Peter Wang, and Travis Oliphant (who brought us so many of the PyData libraries) and everything they have done to encourage and spotlight performance in the PyData ecosystem. Numba (2012) gives the Python ecosystem a JIT compiler that can also target GPUs, and it is used extensively in all RAPIDS libraries. Because functions can be extended arbitrarily and user-defined functions (UDFs) can be written in pure Python, the Python ecosystem has advantages that many other languages lack.

Then there is Dask (2014), a Python-native scheduler. It can be used throughout the Python ecosystem and integrates with almost every scheduler, including Slurm, Kubernetes, and YARN. GoAi (2017) brought together many pioneers in GPU analytics to build the basic RAPIDS prototypes and to develop standards for communication and interoperability between GPU libraries. Finally, on interoperability, many CUDA Python array and deep learning libraries (PyTorch, MXNet, Chainer, CuPy, and soon PaddlePaddle) have adopted DLPack and the CUDA Array Interface (__cuda_array_interface__), hopefully with more to come. Together, all of these linked libraries in the RAPIDS ecosystem make it possible to create new libraries quickly, for example cuSpatial, pyBlazing, cuXfilter, and GFD (introduced further below), and this trend will continue.

Personally, this is what I love most about RAPIDS: it democratizes GPUs for the Python ecosystem, enabling others to build high-performance, feature-rich libraries at an unprecedented pace. To round out the "top 10" lists, I also asked each RAPIDS library lead to say what they love about RAPIDS (you will notice they must have spent a lot of time colluding on their answers, because many of them gave the same ones).

The RAPIDS library leads' top 10

Keith Kraus:
---- Speed: core functionality is "close to the metal";
---- GPU ecosystem interoperability;
---- PyData ecosystem interoperability;
---- Powerful memory layout semantics;
---- Low-level access and control (users can get raw pointers to their data when they need them);
---- Open source;
---- Deep learning framework integration;
---- Follows known PyData application programming interfaces (APIs);
---- Structured Query Language (SQL) through BlazingSQL.

John Zedlewski:
---- I remember spending hours every day waiting for machine learning batch jobs to finish on large clusters, so I am thrilled every time I see a desktop machine finish a job that large in seconds!

Bartley Richardson:
---- For data scientists who specialize in a particular domain (such as network and information security), interoperability with other Python tools is critical. We not only benefit from faster data analysis (cybersecurity datasets are usually TB+ in size), but also stay interoperable with the domain-specific downstream Python packages and APIs that security analysts rely on, which is really great.

Mark Harris:
---- Our team is great. The RAPIDS team is a diverse, distributed group of passionate and exceptionally capable people. Although we are spread around the world and many of us work from home, the team builds new capabilities and solves problems at an amazing rate through open communication and collaboration. Everyone pitches in, and people often push themselves beyond their areas of expertise to learn new skills. We have a lot of fun doing it.

Brad Rees:
---- Seamless transitions between ETL, data engineering, machine learning, and graph analytics. RAPIDS lets data scientists think only about the analysis, not about how to move data between tools.

Matt Rocklin:
---- I like that RAPIDS conforms to standard Python APIs, which makes it easy to integrate with the existing Python ecosystem;
---- I like that RAPIDS contributes to many other Python packages, not just its own;
---- I like that RAPIDS makes it easy for users to try a variety of hardware quickly, without learning a new system;
---- I like that RAPIDS accelerates new areas of science, instead of just adding to deep learning.

RAPIDS core library updates


cuDF and RMM

cuDF has developed very quickly over the past year. Every release brings exciting new features, optimizations, and bug fixes, and 0.10 is no exception. New features in cuDF 0.10 include groupby.quantile(), Series.isin(), reading from remote and cloud file systems (for example hdfs, gcs, s3), Series and DataFrame isna(), grouping by a Series of arbitrary length in groupby, Series covariance and Pearson correlation, and returning CuPy arrays from the DataFrame / Series .values property. In addition, the apply UDF API has been optimized, and gather and scatter methods have been added through the .iloc accessor.
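A few of the features named above can be sketched as follows. The calls used here (groupby(...).quantile, Series.isin, Series.isna, Series.corr, Series.cov) exist under the same names in pandas, so this example falls back to pandas when cuDF cannot be imported; that fallback, the to_host helper, and the sample data are assumptions of this sketch.

```python
# Use cuDF when available; otherwise run the same calls on pandas.
try:
    import cudf as xdf
except Exception:
    import pandas as xdf


def to_host(obj):
    """Return a pandas object whether obj lives on the GPU or the CPU."""
    return obj.to_pandas() if hasattr(obj, "to_pandas") else obj


# Hypothetical sample data; column "c" contains missing values.
df = xdf.DataFrame({
    "key": ["x", "x", "x", "y", "y"],
    "a": [1.0, 2.0, 3.0, 10.0, 20.0],
    "b": [2.0, 4.0, 6.0, 20.0, 40.0],
    "c": [1.0, None, 3.0, None, 5.0],
})

# groupby.quantile(): per-group median of column "a".
median_by_key = to_host(df.groupby("key")["a"].quantile(0.5))

# Series.isin(): membership test against a list of values.
in_set = to_host(df["a"].isin([2.0, 10.0]))

# Series.isna(): mask of missing values.
missing_c = to_host(df["c"].isna())

# Series covariance and Pearson correlation between two columns.
pearson = float(df["a"].corr(df["b"]))
covariance = float(df["a"].cov(df["b"]))

print(median_by_key.to_dict(), pearson, covariance)
```

Because column "b" is exactly twice column "a", the Pearson correlation comes out as 1 up to floating-point rounding.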

Beyond all of these features, optimizations, and bug fixes, cuDF 0.10 also invests heavily in the future. This release merges the cuStrings repository into cuDF and prepares for merging the two code bases, so that string functionality can be integrated more tightly into cuDF, with better acceleration and more features. In addition, RAPIDS has added the cuStreamz metapackage, which simplifies GPU-accelerated stream processing with cuDF and the Streamz library. cuDF keeps improving pandas API compatibility and Dask DataFrame interoperability so that users can adopt cuDF as seamlessly as possible.

Behind the scenes, the internal architecture of libcudf is undergoing a major redesign. Version 0.10 introduces the new cudf::column and cudf::table classes, which greatly improve the robustness of memory ownership control and lay the groundwork for future support of variable-size data types (including string columns, arrays, and structs). Because support for the new classes is still being built out across the entire libcudf API, this work will continue over the next release cycle. In addition, libcudf 0.10 adds many new APIs and algorithms, including sort-based groupby with support for null data, groupby quantile and median, cudf::unique_count, cudf::repeat, cudf::scatter_to_tables, and more. As usual, this release also includes many other improvements and fixes.

RMM, the RAPIDS Memory Manager library, is also undergoing a restructuring. It includes a new architecture based on memory resources that is mostly compatible with C++17 std::pmr::memory_resource. This makes it easier for the library to add new types of memory allocators behind a common interface. Version 0.10 also replaces the CFFI Python bindings with Cython, so that C++ exceptions can propagate as Python exceptions and more actionable errors reach the application. The next release will continue to improve exception support in RMM.

Finally, you will notice significant speed improvements in cuDF in this release, including join (up to 11x), gather and scatter on tables (2-3x faster), and more, as shown in Figure 5.
Figure 5: cuDF vs pandas speedup on a single NVIDIA Tesla V100 GPU versus a dual-socket Intel Xeon E5-2698 v4 CPU (20 cores)

cuML and XGBoost

When the RAPIDS team began contributing to GPU-accelerated XGBoost (one of the most popular gradient-boosted decision tree libraries), it committed to moving all improvements upstream into the main repository rather than creating a long-running fork. The RAPIDS team is pleased to announce that 0.10 ships with an XGBoost conda package built entirely from the XGBoost master branch. This is a snapshot release that contains many features of the upcoming XGBoost 1.0.0 release. It supports transparently loading data from cuDF DataFrames into XGBoost and offers a new, cleaner Dask API option (see the XGBoost repository for details). The older Dask-XGBoost API is now deprecated, but it still works with RAPIDS 0.10. To simplify installation, the XGBoost conda package (rapids-xgboost) is now included in the main rapidsai conda channel, and it is installed automatically when you install the RAPIDS conda metapackage (see the Getting Started page for details).

Comparison: Intel Xeon E5-2698 v4 CPU (20 cores) vs NVIDIA V100

cuML, the RAPIDS machine learning library, has expanded to support a variety of popular machine learning algorithms. cuML now has a support vector machine classifier (SVC) model that is 300x faster than the equivalent CPU version. Building on CannyLabs' GPU acceleration work, it adds an accelerated TSNE model, which provides one of the most popular dimensionality reduction methods in a high-performance form and runs 1000x faster than the CPU-based model. The random forest model keeps improving with each release and now includes a layer-wise algorithm that trains 30x faster than scikit-learn's random forest.

cuML: from training to inference

Training is not the whole story: to truly scale data science on GPUs, end-to-end applications need to be accelerated too. cuML 0.9 brought the next evolution of GPU support for tree-based models, including the new Forest Inference Library (FIL). FIL is a lightweight GPU-accelerated engine that runs inference on tree-based models, including gradient-boosted decision trees and random forests. With a single V100 GPU and two lines of Python code, users can load a saved XGBoost or LightGBM model and run inference on new data 36x faster than on a dual 20-core CPU node. Built on the open-source Treelite package, the next version of FIL will also add support for scikit-learn and cuML random forest models.
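To make "inference on tree-based models" concrete, here is a conceptual CPU sketch in plain Python. It is not the FIL API or its data layout: the nested-dict tree format, the "go left when the feature is below the threshold" convention, and the tiny two-tree model are all assumptions made for illustration. It shows the work an inference engine has to do per row, namely walking every tree and combining the leaf values.

```python
import math


def predict_tree(node, row):
    """Walk one tree from the root to a leaf and return the leaf value.

    A node is either {"leaf": value} or a split of the form
    {"feature": i, "threshold": t, "left": node, "right": node}.
    """
    while "leaf" not in node:
        go_left = row[node["feature"]] < node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["leaf"]


def predict_forest(trees, rows):
    """Sum the per-tree leaf values (the margin) and apply a sigmoid."""
    probabilities = []
    for row in rows:
        margin = sum(predict_tree(tree, row) for tree in trees)
        probabilities.append(1.0 / (1.0 + math.exp(-margin)))
    return probabilities


# Hypothetical boosted model with two small trees over two features.
trees = [
    {"feature": 0, "threshold": 0.5,
     "left": {"leaf": -1.0},
     "right": {"feature": 1, "threshold": 2.0,
               "left": {"leaf": 0.5}, "right": {"leaf": 1.5}}},
    {"feature": 1, "threshold": 1.0,
     "left": {"leaf": -0.5}, "right": {"leaf": 0.5}},
]
rows = [[0.2, 0.0], [0.9, 3.0]]
probs = predict_forest(trees, rows)
print(probs)
```

Each row is scored independently of the others, which is why batch inference of this kind lends itself to parallel hardware.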

Figure 3: Inference speed comparison, XGBoost CPU vs Forest Inference Library (FIL) GPU

Figure 4: XGBoost CPU and FIL inference time as batch size increases (lower is better)

In the future, cuML will also support GPU inference for other algorithms.


Dask

Dask has standardized deployment on HPC and Kubernetes systems, including support for running the scheduler separately from the client, so that users can easily launch computations on a remote cluster from a local laptop. Dask has also added native AWS ECS support for organizations that use the cloud but cannot use Kubernetes.

Development of high-performance communication on UCX continues, covering GPUs within a single node over NVLink and multiple nodes in a cluster over InfiniBand. The RAPIDS team has rewritten the ucx-py bindings to make them simpler and has resolved several problems with sharing memory management across Python-GPU libraries (such as Numba, RAPIDS, and UCX).


cuGraph

cuGraph has taken a new step toward integrating leading graph frameworks behind one easy-to-use interface. A few months ago, RAPIDS received a copy of Hornet from Georgia Tech, then refactored and renamed it cuHornet. The name change indicates that the source code has diverged from the Georgia Tech baseline and reflects that the code's API and data structures now match RAPIDS cuGraph. The addition of cuHornet provides a frontier-based programming model, dynamic data structures, and a list of existing analytics. Besides the core number function, the first two cuHornet algorithms available are Katz centrality and K-Cores.
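For readers unfamiliar with core numbers and k-cores, the following plain-Python sketch shows the classic "peeling" definition on the CPU. It is not cuHornet's or cuGraph's implementation; the function names, the adjacency-dict input format, and the sample graph are assumptions for illustration.

```python
def core_numbers(adjacency):
    """Return the core number of every vertex of an undirected graph.

    adjacency maps each vertex to the set of its neighbours. Vertices are
    peeled off in order of smallest remaining degree.
    """
    degree = {v: len(neighbours) for v, neighbours in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Remove the vertex with the smallest degree among those left.
        v = min(remaining, key=lambda u: degree[u])
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adjacency[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def k_core(adjacency, k):
    """Vertices of the k-core: those whose core number is at least k."""
    return {v for v, c in core_numbers(adjacency).items() if c >= k}


# Hypothetical graph: a triangle (0, 1, 2), a pendant vertex 3 attached
# to vertex 2, and an isolated vertex 4.
graph = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2},
    4: set(),
}
cores = core_numbers(graph)
print(cores, k_core(graph, 2))
```

This version rescans the remaining vertices on every step, so it is quadratic in the number of vertices; it is meant to show the definition, not to be fast.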

cuGraph is the RAPIDS graph analytics library. For cuGraph we have launched a multi-GPU PageRank algorithm supported by two new primitives: a multi-GPU COO-to-CSR data converter and a function for computing vertex degrees. These primitives convert the source and destination edge columns of a Dask DataFrame into graph format and allow PageRank to scale across multiple GPUs.
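The two primitives and the solver they feed can be sketched on a single CPU in plain Python. This is only an illustration of the data flow (edge list in COO form, conversion to CSR, vertex degrees, PageRank power iteration), not cuGraph's multi-GPU code; the function names, the damping factor of 0.85, the fixed iteration count, and the tiny example graph are assumptions of this sketch.

```python
def coo_to_csr(num_vertices, sources, destinations):
    """Convert an edge list (COO) into CSR offsets and column indices."""
    # Vertex out-degrees: how often each vertex appears as a source.
    out_degree = [0] * num_vertices
    for s in sources:
        out_degree[s] += 1
    # A prefix sum over the degrees gives the CSR row offsets.
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + out_degree[v]
    # Scatter every destination into the next free slot of its source row.
    indices = [0] * len(sources)
    cursor = offsets[:-1].copy()
    for s, d in zip(sources, destinations):
        indices[cursor[s]] = d
        cursor[s] += 1
    return offsets, indices, out_degree


def pagerank(offsets, indices, out_degree, damping=0.85, iterations=50):
    """Power iteration over a CSR graph; returns one score per vertex."""
    n = len(out_degree)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # Rank held by vertices without out-edges is spread evenly.
        dangling = sum(rank[v] for v in range(n) if out_degree[v] == 0)
        new_rank = [(1.0 - damping) / n + damping * dangling / n] * n
        for v in range(n):
            if out_degree[v]:
                share = damping * rank[v] / out_degree[v]
                for e in range(offsets[v], offsets[v + 1]):
                    new_rank[indices[e]] += share
        rank = new_rank
    return rank


# Hypothetical graph with edges 0->1, 0->2, 1->2, 2->0.
sources = [0, 0, 1, 2]
destinations = [1, 2, 2, 0]
offsets, indices, out_degree = coo_to_csr(3, sources, destinations)
ranks = pagerank(offsets, indices, out_degree)
print(offsets, indices, out_degree, ranks)
```

In this example vertex 2 receives links from both other vertices and ends up with the highest score, followed by vertex 0 and then vertex 1.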

The figure below shows the performance of the new multi-GPU PageRank algorithm. Unlike previous PageRank benchmark runtimes, these runtimes do not measure only the performance of the PageRank solver. They include the Dask DataFrame to CSR conversion, the PageRank execution, and the conversion of the results from CSR back to a DataFrame. On average, the new multi-GPU PageRank analysis is more than 10x faster than a 100-node Spark cluster.

Figure 1: cuGraph PageRank compute time for different numbers of edges and NVIDIA Tesla V100 GPUs

The figure below looks only at the Bigdata dataset, with 50 million vertices and 1.98 billion edges, and runs the HiBench end-to-end test. The HiBench benchmark runtime includes reading the data, running PageRank, and then retrieving the scores of all vertices. HiBench was previously tested on Google GCP with 10, 20, 50, and 100 nodes.

Figure 2: End-to-end PageRank runtime on the 50-million Bigdata dataset, cuGraph PageRank vs Spark Graph (lower is better)

cuGraph 0.9 also included a new single-GPU strongly connected components function.


cuSpatial

RAPIDS 0.10 also includes the initial release of cuSpatial. cuSpatial is an efficient C++ library for GPU-accelerated geospatial analytics built on CUDA and cuDF. The library includes Python bindings for use by data scientists. cuSpatial is more than 50x faster than existing algorithm implementations and is still under development. The initial release includes GPU-accelerated algorithms for trajectory clustering, distance and speed, Hausdorff and haversine distances, spatial window projection, point-in-polygon tests, and window intersection. Future releases plan to add shapefile support and quadtree indexing.
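Two of the operations listed above, haversine distance and the point-in-polygon test, are easy to state in plain Python. The sketch below runs on the CPU for one pair of points at a time and is not the cuSpatial API; the function names, the mean Earth radius of 6371 km, and the sample coordinates are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, assumed for this sketch


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def point_in_polygon(x, y, polygon):
    """Ray casting: count how many edges a ray towards +x crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# A quarter of the equator, and a square with corners (0, 0) and (4, 4).
quarter_equator = haversine_km(0.0, 0.0, 90.0, 0.0)
square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(quarter_equator, point_in_polygon(2.0, 2.0, square))
```

Both functions do the same small amount of arithmetic per point and no point depends on another, which is the kind of workload that maps well to a GPU when applied to whole columns of coordinates.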



cuDataShader

Alongside this RAPIDS release, RAPIDS is also releasing cuDataShader, a GPU-accelerated, cuDF-backed port of the high-performance Datashader. With its speed, its large-scale data visualization capabilities, and its Python-centric design, Datashader is an excellent fit for GPU-powered viz. Our first version achieves a speedup of about 50x. Based on these results, GPU functionality will be added to Datashader itself in the next release, so stay tuned. If you want to try it, the easiest way is through our other viz library, cuXfilter.



cuXfilter

cuXfilter, which powers our mortgage visualization demo (the new link is here), has been completely refactored. Its cross-filter dashboards are now easier to install and create, and all of this can be done from a Python notebook! Because there are many excellent visualization libraries for the web, we generally do not build our own chart library. Instead, we enhance other chart libraries with faster acceleration, larger datasets, and a better development experience, removing the hassle of connecting multiple charts to a GPU backend so that you can visualize data faster.

RAPIDS Community

The ecosystem benefits most from user contributions. BlazingSQL just released V0.4.5, which runs faster on GPUs, along with a new benchmark. The benchmark queries 600M rows and compares TPC-H queries on GCP with data read from local NVMe and from GCS. Ritchie Ng of ensemblecap.ai released GFD, a GPU implementation of fractional differencing built on RAPIDS cuDF, which is more than 100x faster than the CPU implementation.
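For context, and assuming GFD refers to fractional differencing of time series, the underlying computation can be sketched in plain Python on the CPU. This is not Ritchie Ng's GFD code: the function names, the fixed-width window variant, and the sample series are assumptions for illustration. The weights come from the binomial expansion of (1 - B)^d, where B is the backshift operator.

```python
def frac_diff_weights(d, size):
    """First `size` weights of the expansion of (1 - B)^d."""
    weights = [1.0]
    for k in range(1, size):
        # Recurrence: w_k = -w_{k-1} * (d - k + 1) / k
        weights.append(-weights[-1] * (d - k + 1) / k)
    return weights


def frac_diff(series, d, window):
    """Apply a fixed-width window of fractional-differencing weights."""
    weights = frac_diff_weights(d, window)
    result = []
    for t in range(window - 1, len(series)):
        result.append(sum(weights[k] * series[t - k] for k in range(window)))
    return result


# With d = 1 the weights reduce to the ordinary first difference.
first_diff = frac_diff([1.0, 3.0, 6.0, 10.0], d=1, window=2)
half_weights = frac_diff_weights(0.5, 3)
print(first_diff, half_weights)
```

Every output value is an independent weighted sum over a window of inputs, so the work can be spread across many threads.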

Over the next few months, the RAPIDS engineering team will be presenting and giving tutorials at events, conferences, and hackathons around the world. Join us at GTC DC, PyData NYC, and PyData LA. The RAPIDS team wants to work with you to keep improving RAPIDS.

Alibaba Cloud GPU cloud servers now support the NVIDIA RAPIDS acceleration libraries

Supported instances

The Alibaba Cloud instance families that currently support RAPIDS are GN6i (Tesla T4), GN6v (Tesla V100), GN5 (Tesla P100), and GN5i (Tesla P4).

How to use the RAPIDS acceleration libraries on GPU instances

For how to use the RAPIDS acceleration libraries on Alibaba Cloud GPU instances in an NGC environment, see the documentation "Using RAPIDS to accelerate machine learning tasks on GPU instances".

Following that documentation, you can run a single-node demo of GPU-accelerated data preprocessing and XGBoost training, and compare GPU and CPU training times.

Users can also choose larger data volumes and more GPUs to verify multi-GPU support.

Alibaba Cloud will continue to provide more best practices for RAPIDS acceleration.



This article was written by [Understanding oneself]. Please include a link to the original when reposting. Thank you.
