With this new release, RAPIDS marks the first anniversary of its launch. Looking back on the year, the RAPIDS team sincerely thanks the community for its interest in and support of the project. Not long ago, RAPIDS also won its first BOSSIE award. Thank you for your support! The RAPIDS team will keep pushing end-to-end data science forward to new heights.
Related articles:
nvidia-rapids︱cuDF, a pandas-like DataFrame library
NVIDIA's Python GPU algorithm ecosystem ︱ RAPIDS 0.10
nvidia-rapids︱cuML, a machine learning acceleration library
nvidia-rapids︱cuGraph (NetworkX-like) graph models
RAPIDS, short for Real-time Acceleration Platform for Integrated Data Science, is NVIDIA's open-source suite of GPU-accelerated libraries for data science and machine learning. Built on CUDA-X AI, it accelerates data preparation, model training, and graph analytics.
With RAPIDS, the whole end-to-end pipeline, from data preparation through model training to prediction, runs with GPU acceleration. This greatly improves task execution efficiency and enables gains in model accuracy while lowering infrastructure TCO.
cuDNN has become the standard GPU acceleration library for deep learning frameworks. RAPIDS (see the figure below) provides cuDF, cuML, and cuGraph as the GPU acceleration libraries for data preparation, machine learning algorithms, and graph analytics.
RAPIDS supports Dask, a lightweight big-data framework, so that tasks can be accelerated across multiple GPUs and multiple nodes.
RAPIDS starts with data preparation. It introduces a new GPU DataFrame (cuDF) that parallelizes data loading and data manipulation and makes full use of the large, high-bandwidth memory on NVIDIA GPUs. cuDF is an easy-to-use, Python-based toolset that can stand in for the familiar pandas toolset. Data scientists do not need to learn NVIDIA CUDA from scratch; a few changes to existing code are enough to speed up data preparation, so that it is no longer bound by the CPU or by I/O between the CPU and memory.
RAPIDS also introduces a new and growing library of GPU-accelerated ML algorithms (cuML). It includes popular algorithms such as XGBoost, as well as Kalman filtering, K-means, KNN, DBSCAN, PCA, TSVD, OLS linear regression, and others. ML algorithms generate a lot of data movement and have been hard to parallelize. With GPU-accelerated ML and the NVIDIA NVLink and NVSwitch architectures now used in server systems, model training can easily be distributed across multiple GPUs and multiple nodes (systems) with very little latency, avoiding the I/O bottleneck between CPU and memory.
While discussing the 0.10 release, the RAPIDS team thought back to a blog post Wes McKinney wrote earlier, "Apache Arrow and the '10 Things I Hate About pandas'".
A brief look back at the history of data science: ten years ago, many elements of today's big data landscape were emerging one after another. In one corner of the data world, the Hadoop ecosystem was born, and Hadoop, Hive, Cassandra, Mahout, and others were developing rapidly.
Meanwhile, what data scientists call the PyData stack was on the rise. NetworkX (2005), NumPy (2006), scikit-learn (2007), and pandas (2008) set off a wave of usability; Hadoop, Hive, Cassandra, Flume, Pig, and Spark scaled data science to unprecedented levels. Together they added many new libraries, vendors, and almost countless ways of building data pipelines to the data science ecosystem for solving data science problems.
Exciting as the new tools and workflows were, before Apache Arrow few people stepped back to think about how these libraries and frameworks could work together efficiently. As a result, most data scientists and engineers spent much of their time serializing and deserializing data between libraries (a lot of copying and converting).
RAPIDS draws on the many strengths of the libraries, communities, and frameworks that people love, and on the hardships people have endured when using these tools at scale. These positive and negative feelings led the RAPIDS ecosystem to address the 10 things Wes hates about pandas (actually 11), and more.
The "10 Things I Hate About pandas" list
RAPIDS is not alone in solving these problems; the ecosystem matters a great deal. Without the accelerated data science ecosystem there would be no RAPIDS. First, RAPIDS is built on Apache Arrow, a cross-language development platform for in-memory data. Without the Apache Arrow project and its contributors, RAPIDS would have been much harder to build. Next, credit goes to Anaconda, Peter Wang, and Travis Oliphant (who brought us many of the PyData libraries) for everything they have done to encourage and showcase performance in the PyData ecosystem. Numba (2012) gives the Python ecosystem a JIT compiler. The compiler can also target GPUs, and it is used extensively in all RAPIDS libraries. Because functions can be extended arbitrarily and user-defined functions (UDFs) can be written in pure Python, the Python ecosystem has advantages that many other languages lack.
Then there is Dask (2014), the Python-native scheduler. It can be used throughout the Python ecosystem and connects to almost every scheduler, including Slurm, Kubernetes, and YARN. GoAi (2017) brought together many pioneers of GPU analytics to build the initial RAPIDS prototypes and to develop standards for communication and interoperability between GPU libraries. Finally, on interoperability, many CUDA Python array and deep learning libraries (PyTorch, MXNet, Chainer, CuPy, and soon PaddlePaddle) have adopted DLPack and CUDA_Array_Interface (hopefully more will follow). All of these connected libraries in the RAPIDS ecosystem make it possible to create new libraries quickly, such as cuSpatial, pyBlazing, cuXfilter, and GFD (described further below), and this trend will continue.
Personally, this is what I like most about RAPIDS: it democratizes the GPU for the Python ecosystem, letting others build high-performance libraries with a wide range of functionality faster than ever before. To round out a "top 10" list, I also asked the lead of each RAPIDS library what they love about RAPIDS (you will see that they must have spent a lot of time conspiring over their answers, because many of them are the same).
The top 10 things the RAPIDS library leads love about RAPIDS
---- Speed: core functionality runs "close to the metal";
---- GPU ecosystem interoperability;
---- PyData ecosystem interoperability;
---- Powerful memory layout semantics;
---- Low-level access and control (users can get raw pointers to their data when they need them);
---- Open source;
---- Deep learning framework integration;
---- Following well-known PyData application programming interfaces (APIs);
---- Structured query language (SQL) through BlazingSQL.
---- I remember the days of spending hours waiting for batch machine learning jobs to finish on large clusters, so every time I see a desktop machine do that much work in a few seconds, I am delighted!
---- For data scientists who specialize in a particular domain (such as network and information security), interoperability with other Python tools is critical. We benefit from faster data analysis (security datasets are often TB-scale or larger) while staying interoperable with the domain-specific downstream Python packages and APIs that security analysts rely on, which is really great.
---- Our team is great. The RAPIDS team is a diverse, distributed group of passionate and highly capable people. Although we are spread around the world and many of us work from home, we build new capabilities and solve problems at an amazing pace through open communication and collaboration. Everyone pitches in, often pushing themselves outside their area of expertise to learn new skills. We have a lot of fun doing it.
---- Seamless transitions between ETL, data engineering, machine learning, and graph analytics. RAPIDS lets data scientists think only about the analysis, not about how to move data between tools.
---- I like that RAPIDS follows standard Python APIs, which makes it easy to integrate with the existing Python ecosystem;
---- I like that RAPIDS contributes to many other Python packages, not just its own;
---- I like that RAPIDS lets users try out a variety of hardware quickly and easily, without having to learn a new system;
---- I like that RAPIDS accelerates new kinds of science, not just more deep learning.
cuDF has developed very quickly over the past year. Every release brings exciting new features, optimizations, and bug fixes, and 0.10 is no exception. New features in cuDF 0.10 include Series.isin(), reading from remote and cloud file systems (for example hdfs, gcs, s3), isna() for Series and DataFrame, grouping by a Series of arbitrary length in groupby, Series covariance and Pearson correlation, and returning CuPy arrays from the DataFrame / Series .values property. In addition, the apply UDF function API has been optimized, and gather and scatter methods have been added through the .iloc accessor.
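As a reference for what the Series covariance and Pearson correlation features compute, here is a minimal pure-Python sketch of the two statistics. It does not call cuDF; the function names `covariance` and `pearson` are hypothetical, and the sketch assumes the sample (n - 1) normalization that pandas uses by default.

```python
import math


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences (n - 1 denominator)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 9.0]
print(covariance(a, b), pearson(a, b))
```

On a GPU DataFrame the same reduction runs over columns held in device memory, which is where the speedup over a CPU loop like this one would come from.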
Beyond all of the features, optimizations, and bug fixes above, cuDF 0.10 also puts significant effort into the future. This release merges the cuStrings repository into cuDF and prepares to merge the two code bases, so that string functionality can be integrated more tightly into cuDF, delivering greater acceleration and more functionality. In addition, RAPIDS has added the cuStreamz metapackage, which simplifies GPU-accelerated stream processing with cuDF and the Streamz library. cuDF continues to improve pandas API compatibility and Dask DataFrame interoperability so that users can adopt cuDF as seamlessly as possible.
Behind the scenes, the internal architecture of libcudf is undergoing a major redesign. 0.10 introduces the new cudf::column and cudf::table classes, which greatly improve the robustness of memory ownership control and lay the groundwork for future support of variable-size data types (including string columns, arrays, and structs). This work will continue over the next release cycle as support for the new classes is built out across the libcudf API. libcudf 0.10 also adds many new APIs and algorithms, including sort-based groupby with support for null data, groupby quantile and median, cudf::unique_count, cudf::repeat, cudf::scatter_to_tables, and more. As usual, the release also includes many other improvements and fixes.
RMM, the RAPIDS Memory Manager library, is also going through a restructuring. It includes a new architecture based on memory resources that is largely compatible with C++17 std::pmr::memory_resource. This makes it easier for the library to add new types of memory allocators behind a common interface. 0.10 also replaces the CFFI Python bindings with Cython, so that C++ exceptions can propagate to Python exceptions and more actionable errors reach the application. The next release will continue to improve exception support in RMM.
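The memory-resource design can be pictured as allocators that share one interface and can be stacked on top of each other. The sketch below is a pure-Python illustration of that idea, loosely modeled on std::pmr::memory_resource. The classes `MemoryResource`, `HeapResource`, and `TrackingResource` are hypothetical names for illustration and are not RMM's API; host `bytearray` buffers stand in for device memory.

```python
from abc import ABC, abstractmethod


class MemoryResource(ABC):
    """Common interface: every allocator exposes allocate and deallocate."""

    @abstractmethod
    def allocate(self, nbytes: int) -> bytearray: ...

    @abstractmethod
    def deallocate(self, buf: bytearray) -> None: ...


class HeapResource(MemoryResource):
    """Simplest resource: hands out fresh host buffers."""

    def allocate(self, nbytes: int) -> bytearray:
        if nbytes < 0:
            raise ValueError("nbytes must be non-negative")
        return bytearray(nbytes)

    def deallocate(self, buf: bytearray) -> None:
        pass  # Python's garbage collector reclaims the buffer


class TrackingResource(MemoryResource):
    """Adaptor: wraps any upstream resource and counts outstanding bytes."""

    def __init__(self, upstream: MemoryResource):
        self.upstream = upstream
        self.bytes_in_use = 0

    def allocate(self, nbytes: int) -> bytearray:
        buf = self.upstream.allocate(nbytes)
        self.bytes_in_use += nbytes
        return buf

    def deallocate(self, buf: bytearray) -> None:
        self.bytes_in_use -= len(buf)
        self.upstream.deallocate(buf)


mr = TrackingResource(HeapResource())
buf = mr.allocate(1024)
print(mr.bytes_in_use)
mr.deallocate(buf)
print(mr.bytes_in_use)
```

Because callers only depend on the base interface, a new allocator (a pool, a logging wrapper, and so on) can be dropped in without changing the code that requests memory.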
Finally, you will notice that cuDF is significantly faster in this release, with major performance improvements for join (up to 11x) and for gather and scatter on tables (2-3x faster), among others shown in Figure 5.
chart 5: cuDF vs pandas speedup on a single NVIDIA Tesla V100 GPU versus a dual-socket Intel Xeon E5–2698 v4 CPU (20 cores)
From the start of its work on GPU-accelerated XGBoost (one of the most popular gradient boosted decision tree libraries), the RAPIDS team has been committed to moving all improvements upstream into the main repository rather than maintaining a long-running fork. The RAPIDS team is pleased to announce that 0.10 ships with an XGBoost conda package built entirely from the XGBoost master branch. This is a snapshot release that includes many features from the upcoming XGBoost 1.0.0 release. It supports loading data from cuDF DataFrames into XGBoost transparently and offers a cleaner, brand-new Dask API option (see the XGBoost repository for details). The older Dask-XGBoost API is now deprecated, but it still works with RAPIDS 0.10. To simplify downloads, the XGBoost conda package (rapids-xgboost) is now included in the main rapidsai conda channel and is installed automatically when you install the RAPIDS conda metapackage (see the getting started page for details).
Comparison: Intel Xeon E5–2698 v4 CPU (20 cores) vs NVIDIA V100
cuML, the RAPIDS machine learning library, has been extended to support a range of popular machine learning algorithms. cuML now has a support vector machine classifier (SVC) model that is 300x faster than the equivalent CPU version. Building on CannyLabs' GPU-accelerated work, it adds an accelerated TSNE model, which provides one of the most popular high-performance dimensionality reduction methods while running 1000x faster than a CPU-based model. Our random forest model keeps improving with every release and now includes a layer-wise algorithm that trains 30x faster than scikit-learn's random forest.
Training is not the whole story: to truly scale data science on GPUs, end-to-end applications need acceleration too. cuML 0.9 brought the next step in GPU support for tree-based models, including the new Forest Inference Library (FIL). FIL is a lightweight, GPU-accelerated engine that performs inference on tree-based models, including gradient boosted decision trees and random forests. With a single V100 GPU and two lines of Python code, users can load a saved XGBoost or LightGBM model and run inference on new data 36x faster than on a dual 20-core CPU node. Building on the open-source Treelite package, the next version of FIL will also add support for scikit-learn and cuML random forest models.
chart 3: Inference speed comparison, XGBoost CPU vs Forest Inference Library (FIL) GPU
chart 4: XGBoost CPU and FIL inference time scaling with batch size (lower is better)
In the future, cuML will also support GPU inference for other algorithms.
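To make concrete what inference on a tree-based model involves, here is a minimal pure-Python sketch: each tree is stored as flat node arrays, a row walks from the root to a leaf, and the forest sums the leaf values. The storage layout and the names `Tree`, `predict_tree`, and `predict_forest` are assumptions for illustration; this is not FIL's data format or API.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Tree:
    feature: List[int]      # feature index tested at each node (-1 marks a leaf)
    threshold: List[float]  # split threshold at each node
    left: List[int]         # child taken when x[feature] < threshold
    right: List[int]        # child taken otherwise
    value: List[float]      # leaf output (ignored on internal nodes)


def predict_tree(tree: Tree, x: Sequence[float]) -> float:
    """Walk one row from the root (node 0) down to a leaf."""
    node = 0
    while tree.feature[node] != -1:
        if x[tree.feature[node]] < tree.threshold[node]:
            node = tree.left[node]
        else:
            node = tree.right[node]
    return tree.value[node]


def predict_forest(trees, x, base_score=0.0):
    """Sum leaf outputs, as gradient boosting does; a random forest would average."""
    return base_score + sum(predict_tree(t, x) for t in trees)


# Two depth-1 trees (stumps) over a 2-feature input.
stump_a = Tree([0, -1, -1], [0.5, 0.0, 0.0], [1, -1, -1], [2, -1, -1], [0.0, -1.0, 1.0])
stump_b = Tree([1, -1, -1], [2.0, 0.0, 0.0], [1, -1, -1], [2, -1, -1], [0.0, 0.25, 0.75])

print(predict_forest([stump_a, stump_b], [0.7, 1.0]))
```

Every row and every tree can be evaluated independently of the others, which is the property that makes this workload a natural fit for GPU parallelism.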
Dask has standardized deployment on HPC and Kubernetes systems, including support for running the scheduler separately from the client, so that users can easily launch computations on a remote cluster from a local laptop. Dask has also added native AWS ECS support for organizations that use the cloud but cannot adopt Kubernetes.
Work on high-performance communication over UCX continues, covering GPUs within a single node using NVLink and multiple nodes in a cluster using InfiniBand. The RAPIDS team has rewritten the ucx-py bindings to make them simpler and has fixed several problems with sharing memory management across Python GPU libraries (such as Numba, RAPIDS, and UCX).
cuGraph has taken a new step in integrating leading graph frameworks behind one easy-to-use interface. A few months ago, RAPIDS received a copy of Hornet from the Georgia Institute of Technology, then refactored it and renamed it cuHornet. The name change indicates that the source code has diverged from the Georgia Tech baseline and reflects that the code's API and data structures now match RAPIDS cuGraph. Adding cuHornet brings a frontier-based programming model, dynamic data structures, and a list of existing analytics. Besides the core number function, the first two cuHornet algorithms available are Katz centrality and K-Cores.
cuGraph is the RAPIDS graph analytics library. For cuGraph, we are launching a multi-GPU PageRank algorithm supported by two new primitives: a multi-GPU COO-to-CSR data converter and a function for computing vertex degrees. These primitives are used to convert source and destination edge columns from a Dask DataFrame into graph format, enabling PageRank to scale across multiple GPUs.
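The two primitives and the solver they feed can be sketched in a few lines. The following is a single-process, pure-Python illustration of converting an edge list (COO) to CSR, computing out-degrees, and running PageRank by power iteration. It is not cuGraph's multi-GPU implementation; the function names are hypothetical, and the damping factor of 0.85 and the fixed iteration count are assumed defaults.

```python
def coo_to_csr(num_vertices, sources, destinations):
    """Convert an edge list (COO) into CSR offsets and column indices."""
    out_degree = [0] * num_vertices
    for s in sources:
        out_degree[s] += 1

    # offsets[v] .. offsets[v + 1] is the slice of `indices` owned by vertex v
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + out_degree[v]

    indices = [0] * len(sources)
    cursor = offsets[:-1].copy()  # next free slot for each vertex
    for s, d in zip(sources, destinations):
        indices[cursor[s]] = d
        cursor[s] += 1
    return offsets, indices, out_degree


def pagerank(num_vertices, offsets, indices, damping=0.85, iterations=50):
    """Power iteration over a CSR graph; dangling vertices spread rank uniformly."""
    rank = [1.0 / num_vertices] * num_vertices
    for _ in range(iterations):
        new_rank = [(1.0 - damping) / num_vertices] * num_vertices
        dangling = 0.0
        for v in range(num_vertices):
            degree = offsets[v + 1] - offsets[v]
            if degree == 0:
                dangling += rank[v]
                continue
            share = damping * rank[v] / degree
            for e in range(offsets[v], offsets[v + 1]):
                new_rank[indices[e]] += share
        bonus = damping * dangling / num_vertices
        rank = [r + bonus for r in new_rank]
    return rank


src = [0, 0, 1, 2, 2]
dst = [1, 2, 2, 0, 3]
offsets, indices, out_degree = coo_to_csr(4, src, dst)
print(offsets, indices, out_degree)
print(pagerank(4, offsets, indices))
```

CSR groups each vertex's outgoing edges contiguously, so the inner loop of the solver reads one slice per vertex instead of scanning the whole edge list.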
The figure below shows the performance of the new multi-GPU PageRank algorithm. Unlike earlier PageRank benchmark runtimes, which measured only the performance of the PageRank solver, these runtimes include the Dask DataFrame-to-CSR conversion, PageRank execution, and conversion of the results from CSR back to a DataFrame. On average, the new multi-GPU PageRank analysis is more than 10x faster than a 100-node Spark cluster.
chart 1: cuGraph PageRank compute time for different numbers of edges and NVIDIA Tesla V100 GPUs
The next figure looks only at the Bigdata dataset, with 50 million vertices and 1.98 billion edges, running the HiBench end-to-end test. The HiBench benchmark runtime includes reading the data, running PageRank, and then retrieving the scores for all vertices. Previously, HiBench was tested on Google GCP with 10, 20, 50, and 100 nodes.
chart 2: End-to-end PageRank runtime on the 50-million-vertex dataset, cuGraph PageRank vs Spark Graph (lower is better)
cuGraph 0.9 also includes a new single-GPU strongly connected components function.
RAPIDS 0.10 also includes the initial release of cuSpatial. cuSpatial is an efficient C++ library for GPU-accelerated geospatial analytics built on CUDA and cuDF, with Python bindings for data scientists. It is more than 50x faster than existing algorithms and is still under development. The initial release includes GPU-accelerated algorithms for trajectory clustering, distance and speed, Hausdorff and haversine distances, spatial window projection, point-in-polygon, and window intersection. Shapefile support and quadtree indexing are planned for future releases.
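For readers unfamiliar with two of the distances named above, here is a minimal pure-Python sketch of the haversine (great-circle) distance and of a discrete Hausdorff distance that uses it as the point metric. This is not cuSpatial's API; the function names are hypothetical, and the mean Earth radius of 6371 km is an assumed constant.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def hausdorff_km(points_a, points_b):
    """Symmetric discrete Hausdorff distance between two sets of (lat, lon) points."""
    def directed(ps, qs):
        # Farthest any point in ps is from its nearest neighbor in qs.
        return max(min(haversine_km(*p, *q) for q in qs) for p in ps)

    return max(directed(points_a, points_b), directed(points_b, points_a))


trajectory_1 = [(40.0, -74.0), (40.1, -74.1)]
trajectory_2 = [(40.0, -74.0), (40.3, -74.2)]
print(haversine_km(40.0, -74.0, 40.1, -74.1))
print(hausdorff_km(trajectory_1, trajectory_2))
```

The Hausdorff computation compares every point of one set with every point of the other, so its cost grows with the product of the set sizes; that all-pairs structure is the kind of work that parallelizes well.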
Alongside this RAPIDS release, RAPIDS also published cuDataShader, a GPU-accelerated, cuDF-backed port of the high-performance Datashader. With its speed, large-scale data visualization capabilities, and Python-centric design, Datashader is a great fit for GPU-powered viz. Our first version achieved roughly a 50x speedup. Based on these results, the GPU functionality will be added to Datashader itself in the next release, so stay tuned! If you want to try it, the easiest way is through our other viz library, cuXfilter.
cuXfilter, which powers our mortgage visualization demo (the new link is here), has been completely refactored. Its cross-filtering dashboards are now easier to install and create, and all of it can be done from a Python notebook! Because there are many excellent visualization libraries on the web, we generally do not build our own chart library. Instead we enhance other chart libraries with faster acceleration, larger datasets, and a better development experience. The aim is to remove the hassle of connecting multiple charts to a GPU backend so that you can visualize data faster.
Users contribute the most to the ecosystem. BlazingSQL just released V0.4.5, which runs even faster on GPUs and adds a new benchmark that queries 600M rows with TPC-H queries on GCP, comparing data pulled from local NVMe and from GCS. Ritchie Ng of ensemblecap.ai released a GPU implementation of fractional differencing (GFD) using RAPIDS cuDF that is more than 100x faster than a CPU implementation.
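Fractional differencing generalizes the ordinary first difference of a time series to a non-integer order d. As a rough illustration of the computation that GFD accelerates, here is a pure-Python sketch of the standard fixed-window formulation, with weights from the binomial expansion of (1 - B)^d. The function names are hypothetical, and this is not Ritchie Ng's implementation.

```python
def frac_diff_weights(d, window):
    """First `window` weights of (1 - B)^d: w_0 = 1, w_k = -w_{k-1} * (d - k + 1) / k."""
    weights = [1.0]
    for k in range(1, window):
        weights.append(-weights[-1] * (d - k + 1) / k)
    return weights


def frac_diff(series, d, window):
    """Fixed-window fractional difference.

    The first window - 1 positions lack enough history and are skipped,
    so the result has len(series) - window + 1 values.
    """
    w = frac_diff_weights(d, window)
    out = []
    for t in range(window - 1, len(series)):
        out.append(sum(w[k] * series[t - k] for k in range(window)))
    return out


prices = [100.0, 101.0, 103.0, 102.0, 105.0, 107.0]
print(frac_diff_weights(0.5, 4))
print(frac_diff(prices, 0.5, 4))
```

With d = 1 the weights reduce to 1, -1, 0, ... and the result is the usual first difference; fractional values of d keep a slowly decaying memory of older observations.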
Over the next few months, the RAPIDS engineering team will be presenting and giving tutorials at events, conferences, and hackathons around the world. Join us at GTC DC, PyData NYC, and PyData LA. The RAPIDS team wants to work with you to keep improving RAPIDS.
The Alibaba Cloud instance types that currently support RAPIDS are GN6i (Tesla T4), GN6v (Tesla V100), GN5 (Tesla P100), and GN5i (Tesla P4).
For how to use the RAPIDS acceleration libraries on Alibaba Cloud GPU instances based on the NGC environment, see the document "Using RAPIDS to accelerate machine learning tasks on GPU instances".
Following that document, you can run a single-machine demo of GPU-accelerated data preprocessing plus XGBoost training and compare GPU and CPU training times.
Users can also choose larger data volumes and more GPUs to validate multi-GPU support.
Alibaba Cloud will continue to publish more best practices for RAPIDS acceleration.
RAPIDS 0.10 now available! Thanks to a decade of data science and the tools everyone loves
Super Open Class, session 17 | How the open-source software platform RAPIDS accelerates data science
RAPIDS 0.9 now available: a host of new algorithms