The Orca project implements the pandas API on top of DolphinDB, enabling users to analyze and process massive data sets more efficiently.
If you are already familiar with pandas, the Orca package lets you take full advantage of DolphinDB's high performance and concurrency on massive data without an extra learning curve. If you already have pandas code, you can move it to Orca without making extensive changes.
Orca is currently still under development and iterating quickly. We welcome you to try Orca and to give us feedback through GitHub issues.
Orca's design concept
pandas, a third-party Python library, is a powerful tool for analyzing structured data. Its high performance, easy-to-use interface, and gentle learning curve have made it popular in data science and quantitative finance. However, when we start dealing with TB-scale data, pandas, which runs on a single core, falls short; its high memory consumption is another factor that limits performance. When more processor cores, or even multiple physical machines, are available, we naturally want to use concurrency to improve data-processing efficiency.
DolphinDB is a distributed data analysis engine. It can store TB-scale data across multiple physical machines and make full use of the CPU to perform high-performance analysis and computation on massive data. For the same computation, DolphinDB is one to two orders of magnitude faster than pandas, and its memory footprint is usually less than half that of pandas. However, deploying and developing with DolphinDB differs significantly from pandas, so users who want to migrate from pandas to DolphinDB would have to change a great deal of existing code. Fortunately, DolphinDB has started the Orca project: an implementation of the pandas DataFrame API based on the DolphinDB engine. It lets users keep the pandas programming style while benefiting from DolphinDB's performance to analyze massive data efficiently. Whereas pandas computes entirely in memory, Orca supports distributed storage and computation. For the same amount of data, Orca's memory footprint is generally less than half that of pandas.
Orca's architecture
At the top of Orca is the pandas API and at the bottom is the DolphinDB database; the Orca client communicates with the DolphinDB server through the DolphinDB Python API. Orca's basic working principle is to generate a DolphinDB script in Python on the client side and send it through the DolphinDB Python API to the DolphinDB server, which parses and executes it. An Orca DataFrame stores only the metadata of the corresponding DolphinDB table; the actual storage and computation take place on the server.
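The division of labor described above can be illustrated with a toy model. The sketch below is not Orca's implementation: `FakeServer`, `LazyTable`, the table and column names, and the generated script text are all hypothetical and exist only for illustration. It shows the general idea that the client-side object holds metadata (a table name and column names), builds a script string, and hands that string to a server for execution.

```python
from dataclasses import dataclass, field


class FakeServer:
    """Stand-in for a database server: it only records the scripts it receives."""

    def __init__(self):
        self.received = []

    def run(self, script: str) -> str:
        self.received.append(script)
        return f"<result of: {script}>"


@dataclass
class LazyTable:
    """Client-side handle that stores metadata only, never the data itself."""
    server: FakeServer
    table_name: str
    columns: list = field(default_factory=list)

    def select(self, *cols):
        # Validate the request against the metadata held on the client.
        unknown = [c for c in cols if c not in self.columns]
        if unknown:
            raise KeyError(f"unknown columns: {unknown}")
        return LazyTable(self.server, self.table_name, list(cols))

    def to_script(self) -> str:
        # Build the script text that would be shipped to the server.
        return f"select {', '.join(self.columns)} from {self.table_name}"

    def compute(self):
        # Nothing is computed locally; the server executes the script.
        return self.server.run(self.to_script())


server = FakeServer()
trades = LazyTable(server, "trades", ["sym", "price", "qty"])
script = trades.select("sym", "price").to_script()
result = trades.select("sym", "price").compute()
```

Selecting columns creates a new handle without touching any data; only `compute()` sends anything to the server, which mirrors the "metadata on the client, storage and computation on the server" split.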
As a result, Orca's interface has some restrictions:
- A column of an Orca DataFrame cannot hold mixed types, and column names must be legal DolphinDB variable names.
- If the DolphinDB table behind a DataFrame is a partitioned table, the data is not stored contiguously, so there is no concept of a RangeIndex, and an entire Series cannot be assigned to a DataFrame column.
- For DolphinDB partitioned tables, Orca does not yet support functions that have no distributed implementation, such as median.
- DolphinDB's null-value mechanism differs from that of pandas: pandas uses the float nan as the null value, whereas DolphinDB uses the minimum value of each type.
- DolphinDB is a columnar database, so some axis=columns parameters of the pandas interface are not yet supported.
- Orca currently cannot parse Python functions, so functions such as DataFrame.agg cannot accept a Python function as an argument.
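The null-value difference in the list above can be made concrete with a small stand-alone sketch. It does not call DolphinDB or pandas. It assumes, following the description above, that an integer column reserves the minimum value of its type as the null marker; the 32-bit width and the helper names `encode_int_column`, `decode_int_column`, and `nan_style_column` are hypothetical choices made for illustration.

```python
import math

INT32_NULL = -2**31  # assumed sentinel: the minimum 32-bit signed integer


def encode_int_column(values):
    """Replace missing entries (None) with the sentinel; the column stays integer."""
    return [INT32_NULL if v is None else int(v) for v in values]


def decode_int_column(stored):
    """Map the sentinel back to None when reading the column."""
    return [None if v == INT32_NULL else v for v in stored]


def nan_style_column(values):
    """Mimic the float-nan convention: a missing entry forces every value to float."""
    return [math.nan if v is None else float(v) for v in values]


raw = [1, None, 3]
sentinel_col = encode_int_column(raw)
nan_col = nan_style_column(raw)
```

With the sentinel convention the column keeps its integer type, while the nan convention turns the whole column into floats. This is one reason results involving missing values may look different after moving code between the two systems.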
For the detailed differences between Orca and pandas, and the resulting considerations when programming with Orca, please refer to the Orca tutorial.
Orca supports Linux and Windows and requires Python 3.6 or later and pandas 0.25.1 or later.
The Orca project has been integrated into the DolphinDB Python API. Installing the DolphinDB Python API with pip makes Orca available:
pip install dolphindb
Because Orca is built on the DolphinDB Python API, you need a running DolphinDB server and must connect to it with the connect function before using Orca:
import dolphindb.orca as orca
orca.connect(MY_HOST, MY_PORT, MY_USERNAME, MY_PASSWORD)
If you already have a pandas program, you can replace the pandas import with:
# import pandas as pd
import dolphindb.orca as pd
pd.connect(MY_HOST, MY_PORT, MY_USERNAME, MY_PASSWORD)
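When migrating an existing program, it can help to keep the connection details out of the code. The sketch below is one possible arrangement, not something the Orca project prescribes: the environment variable names (`ORCA_HOST`, `ORCA_PORT`, `ORCA_USER`, `ORCA_PASSWORD`), the placeholder defaults, and the helpers `orca_connection_args` and `connect_from_env` are all assumptions made for illustration. Only the `connect(host, port, username, password)` call comes from the text above.

```python
import os


def orca_connection_args(env=None):
    """Collect connection settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    host = env.get("ORCA_HOST", "localhost")   # placeholder default
    port = int(env.get("ORCA_PORT", "8848"))   # placeholder default
    user = env.get("ORCA_USER", "admin")       # placeholder default
    password = env.get("ORCA_PASSWORD", "")    # placeholder default
    return host, port, user, password


def connect_from_env(env=None):
    """Import Orca lazily so this module still loads where dolphindb is not installed."""
    import dolphindb.orca as pd  # requires `pip install dolphindb`
    pd.connect(*orca_connection_args(env))
    return pd


# Parsing the settings needs no server; calling connect_from_env() does.
args = orca_connection_args({"ORCA_HOST": "10.0.0.5", "ORCA_PORT": "8900"})
```

An existing script could then start with `pd = connect_from_env()` in place of its pandas import, leaving the rest of the code to use `pd` as before, subject to the restrictions listed earlier.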