Delta Lake 0.4.0 has been released. Its main additions are a Python API for DML, in-place conversion of Parquet tables to Delta Lake tables, and partial SQL support.


These features are described in detail below.

Partial SQL support

SQL support is a great convenience for users. If you have looked at the Databricks Delta Lake product, you will have seen that it supports SQL syntax. Before this release, however, open-source Delta Lake only let you create, delete from, and update Delta Lake tables through the Scala/Java API.

The good news is that, starting with 0.4.0, Delta Lake supports SQL syntax for some commands. Because Delta Lake is a separate project, supporting the full SQL grammar would require copying a large amount of code from Apache Spark into the Delta Lake project, which would be hard to maintain. This release therefore only supports SQL syntax for simple commands such as vacuum and history.
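A rough sketch of what this looks like from PySpark: the two helpers below only build the SQL text for the vacuum and history commands, and `run_utilities` passes it to `spark.sql`. It assumes the session was started with the Delta Lake 0.4.0 package and with `spark.sql.extensions` set to `io.delta.sql.DeltaSparkSessionExtension`; the helper names are hypothetical.

```python
# Sketch: running Delta Lake's SQL utility commands (vacuum, history) from PySpark.
# Assumption: pyspark was launched with the Delta package and SQL extension, e.g.
#   pyspark --packages io.delta:delta-core_2.11:0.4.0 \
#     --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension"
# The helper names are hypothetical; only the SQL text follows Delta's syntax.
from typing import Optional


def vacuum_sql(path: str, retain_hours: Optional[int] = None) -> str:
    """Build a VACUUM statement for the Delta table stored at `path`."""
    stmt = f"VACUUM delta.`{path}`"
    if retain_hours is not None:
        # Unreferenced files older than this retention period get deleted.
        stmt += f" RETAIN {retain_hours} HOURS"
    return stmt


def history_sql(path: str) -> str:
    """Build a DESCRIBE HISTORY statement for the Delta table at `path`."""
    return f"DESCRIBE HISTORY delta.`{path}`"


def run_utilities(spark, path: str) -> None:
    """Vacuum with the default 7-day (168-hour) retention, then show the history."""
    spark.sql(vacuum_sql(path, retain_hours=168))
    spark.sql(history_sql(path)).show(truncate=False)
```

For example, `run_utilities(spark, "/tmp/delta/events")` would vacuum the table at that (hypothetical) path and print its commit history. Building the statements in plain functions keeps the string formatting testable without a running Spark session.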

SQL support for the DML operations delete, update, and merge will probably have to wait for Spark 3.0. The community has added DELETE/UPDATE/MERGE support to the DataSource V2 API in Spark 3.0; for details, see https://issues.apache.org/jira/browse/SPARK-28303. SQL support for these basic statements should arrive gradually in future versions.

Python API for DML and utility operations

Before 0.4.0, Delta Lake only provided Scala and Java APIs. To make Delta Lake usable from Python, this version introduces a Python API (see https://github.com/delta-io/delta/issues/89), which you can use to run update, delete, merge, and similar operations on Delta Lake tables.
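A minimal sketch of the three DML operations with the Python API, assuming a Delta table at `path` with hypothetical columns `id`, `status` and `date`, and an `updates_df` DataFrame with the same schema. `DeltaTable.forPath`, `delete`, `update`, `merge`, `whenMatchedUpdateAll` and `whenNotMatchedInsertAll` are Delta Lake calls; `merge_condition` and `apply_dml` are illustrative helpers, not part of the library.

```python
# Sketch of the Delta Lake Python DML API (delete / update / merge).
# Assumptions: the Delta package is on the classpath, the table at `path` has
# hypothetical columns `id`, `status` and `date`, and `updates_df` matches it.


def merge_condition(keys, target="target", source="source"):
    """Build an equality join condition such as 'target.id = source.id'."""
    return " AND ".join(f"{target}.{k} = {source}.{k}" for k in keys)


def apply_dml(spark, path, updates_df):
    # Imported here so the helper above can be used without Spark installed.
    from delta.tables import DeltaTable
    from pyspark.sql.functions import col, lit

    table = DeltaTable.forPath(spark, path)

    # DELETE: remove rows matching a SQL predicate string.
    table.delete("date < '2019-01-01'")

    # UPDATE: condition and new values given as Column expressions.
    table.update(condition=col("status") == "pending", set={"status": lit("done")})

    # MERGE (upsert): update rows whose key matches, insert all others.
    (table.alias("target")
        .merge(updates_df.alias("source"), merge_condition(["id"]))
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())
```

Each call commits its own transaction to the table's log, so the three operations show up as separate versions in the table history.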

The Python API can also run utility operations such as vacuum and history, which brings it on par with the Scala/Java API. For more on using the Python API, see the official Delta Lake documentation.
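The utility operations follow the same pattern. The sketch below vacuums a table with the default retention and prints its commit history; `describe_history` and `maintain` are hypothetical helpers, and `vacuum()`/`history()` are the `DeltaTable` methods the text refers to.

```python
# Sketch of the Python utility API: vacuum() and history() on a DeltaTable.
# Assumption: the Delta Lake package is available; `describe_history` and
# `maintain` are hypothetical helpers, not library functions.


def describe_history(rows):
    """Format history rows (dicts with 'version' and 'operation') as 'vN: OP'."""
    ordered = sorted(rows, key=lambda r: r["version"])
    return [f"v{r['version']}: {r['operation']}" for r in ordered]


def maintain(spark, path):
    from delta.tables import DeltaTable

    table = DeltaTable.forPath(spark, path)

    # Delete files no longer referenced by the log and older than the
    # retention period; without an argument the default of 7 days applies.
    table.vacuum()

    # history() returns a DataFrame with one row per commit, newest first;
    # history(1) would return only the latest commit.
    rows = [row.asDict() for row in table.history().collect()]
    for line in describe_history(rows):
        print(line)
```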

Converting Parquet tables to Delta Lake tables

Suppose we have an ordinary Parquet table that we want to turn into a Delta Lake table. Before this release, we had to read the whole table and write it out again as a Delta Lake table, and if the Parquet table is very large, that conversion consumes a lot of resources.

This version provides a conversion command that turns a Parquet table into a Delta Lake table in place. "In place" means the data does not have to be moved from one location to another, and none of the data in the original directory has to be read and rewritten. The command lists all the files in the Parquet table, infers the table schema by reading the footers of all the Parquet files, and finally writes a transaction log that tracks those files. The conversion is also reversible: if you no longer need the Delta Lake table, you can turn it back into an ordinary Parquet table.
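A sketch of the conversion with the Python API, assuming the Delta Lake 0.4.0 package is available. `DeltaTable.convertToDelta` takes an identifier of the form parquet.`<path>`, plus the partition schema when the table is partitioned. The reverse direction is an assumption based on the steps in the Delta Lake documentation (vacuum with zero retention, then delete the `_delta_log` directory) and only handles local paths; all helper names are hypothetical.

```python
# Sketch of in-place Parquet -> Delta conversion with the Delta Lake Python API.
# Assumptions: the Delta package is on the classpath; helper names, paths and
# partition columns are hypothetical.
import os
import shutil


def parquet_identifier(path):
    """Identifier format expected by convertToDelta: parquet.`<path>`."""
    return f"parquet.`{path}`"


def convert_to_delta(spark, path, partition_schema=None):
    from delta.tables import DeltaTable

    # Lists the Parquet files, infers the schema from their footers and writes
    # a _delta_log that tracks them; the data files are not rewritten.
    if partition_schema is None:
        return DeltaTable.convertToDelta(spark, parquet_identifier(path))
    # Partitioned tables need the partition columns spelled out, e.g. "date STRING".
    return DeltaTable.convertToDelta(spark, parquet_identifier(path), partition_schema)


def convert_back_to_parquet(spark, path):
    """Reverse the conversion for a table on the local filesystem.

    Assumes vacuum(0) is permitted, i.e. the session sets
    spark.databricks.delta.retentionDurationCheck.enabled=false.
    """
    from delta.tables import DeltaTable

    # Remove data files that delete/update/merge dropped from the latest version.
    DeltaTable.forPath(spark, path).vacuum(0)
    # Without the transaction log the directory is a plain Parquet table again.
    shutil.rmtree(os.path.join(path, "_delta_log"))
```

For example, `convert_to_delta(spark, "/data/events", "date STRING")` would convert a (hypothetical) table partitioned by `date` without copying any of its data files.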