Introducing OCI Data Integration – New cloud native, serverless service for ETL/ELT and Data Pipelines

TL;DR: Oracle offers a new cloud native, serverless service on OCI for data processing and ETL/ELT, called Data Integration. It seems to be a new incarnation of Oracle Data Integrator, or even of Warehouse Builder. It provides data flows that can filter, convert, join and aggregate data. It currently supports only Object Storage (CSV, JSON and Parquet files) and Oracle Database (on premises and cloud) as sources and targets. Its pricing model has not yet been revealed.

EDIT 10th July: after asking around on Twitter, Jakub Ilner informed me of this pricing definition: “Workspace usage per hour (0.3$ PAYGO) and Gigabyte of Data Processed per Hour (0.075$ PAYGO), less for Monthly Flex.” I am not sure at this point what exactly counts as workspace usage (is that design time, or does running a task also count as workspace usage?).
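To get a feel for what these rates could mean, here is a small back-of-the-envelope calculator. It is a sketch under assumptions: it takes the two PAYGO rates from the quote at face value, treats workspace usage as billed per hour the workspace exists, and reads the second metric as a flat charge per gigabyte processed. Those are exactly the points that are still unclear, so treat the outcome as a rough indication only; the Monthly Flex discount is not modelled.

```python
# Hypothetical cost estimator based on the PAYGO rates quoted above.
# How the two metrics are actually metered is an ASSUMPTION.
from decimal import Decimal

WORKSPACE_RATE_PER_HOUR = Decimal("0.30")  # $ per workspace per hour (quoted rate)
DATA_RATE_PER_GB = Decimal("0.075")        # $ per GB processed (assumed reading)


def estimate_cost(workspace_hours: float, gb_processed: float) -> Decimal:
    """Return an estimated PAYGO charge in dollars.

    Assumes workspace usage is billed for every hour the workspace exists
    and that data processing is a flat charge per gigabyte processed.
    """
    hours = Decimal(str(workspace_hours))
    gigabytes = Decimal(str(gb_processed))
    return hours * WORKSPACE_RATE_PER_HOUR + gigabytes * DATA_RATE_PER_GB


if __name__ == "__main__":
    # One workspace kept for a 30-day month (720 hours), processing 500 GB.
    print(estimate_cost(720, 500))
```

Under these assumptions the workspace hours dominate for small volumes; if workspace usage turns out to be metered only while tasks run, the picture changes completely.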


The new OCI Data Integration service is fully integrated into the OCI framework: it is supported through REST APIs, the OCI CLI and the SDKs, governed by policies, and subject to monitoring (including the alarms and notifications that can be defined through the OCI monitoring framework).
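Because the service is exposed through the SDKs, it can be scripted. The sketch below assumes the OCI Python SDK (`oci` package) and its `DataIntegrationClient.list_workspaces` operation; the helper name `list_workspace_names` is my own. The client is passed in, so the helper can be tried without credentials, and only the first page of results is read.

```python
# Sketch: list Data Integration workspaces in a compartment.
# The client is injected so the helper can be exercised with a stand-in.
from typing import List


def list_workspace_names(client, compartment_id: str) -> List[str]:
    """Return display names of the Data Integration workspaces in a compartment."""
    response = client.list_workspaces(compartment_id=compartment_id)
    # response.data is expected to hold workspace summary objects;
    # only the first page is read (no pagination handling here).
    return [workspace.display_name for workspace in response.data]


# Real usage (needs the `oci` package and a configured ~/.oci/config):
#   import oci
#   config = oci.config.from_file()
#   client = oci.data_integration.DataIntegrationClient(config)
#   print(list_workspace_names(client, "<compartment OCID>"))
```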

In the OCI web console, it also provides a graphical designer for data flows. When run, a data flow executes steps that extract data from the specified sources (files on Object Storage or tables in an Oracle Database), process that data (through conversions and derivations, validations and filters, joins and aggregations) and finally load it into the target (again either a file on Object Storage or a table in an Oracle Database).
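To make the extract, transform and load steps concrete, here is a conceptual sketch in plain Python. It does not use the service's API at all: CSV strings stand in for files on Object Storage, and the sample data, attribute names and the function `run_data_flow` are made up for illustration. It follows the same sequence a data flow performs: extract, convert, filter, join, aggregate, load.

```python
# Conceptual stand-in for a data flow; NOT the OCI Data Integration API.
import csv
import io
from collections import defaultdict

ORDERS_CSV = """order_id,customer_id,amount
1,C1,100
2,C2,250
3,C1,50
4,C3,0
"""

CUSTOMERS_CSV = """customer_id,country
C1,NL
C2,NL
C3,BE
"""


def extract(text):
    """Extract: read CSV text (standing in for an Object Storage file) into dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def run_data_flow(orders_text, customers_text):
    orders = extract(orders_text)
    customers = {row["customer_id"]: row for row in extract(customers_text)}

    # Convert and filter: cast amounts to numbers, drop zero-value orders.
    converted = [dict(row, amount=float(row["amount"])) for row in orders]
    filtered = [row for row in converted if row["amount"] > 0]

    # Join: add the customer's country to each order (inner join).
    joined = [dict(row, country=customers[row["customer_id"]]["country"])
              for row in filtered if row["customer_id"] in customers]

    # Aggregate: total amount per country.
    totals = defaultdict(float)
    for row in joined:
        totals[row["country"]] += row["amount"]

    # Load: write the result as CSV (standing in for the target file or table).
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["country", "total_amount"])
    for country in sorted(totals):
        writer.writerow([country, totals[country]])
    return out.getvalue()


if __name__ == "__main__":
    print(run_data_flow(ORDERS_CSV, CUSTOMERS_CSV))
```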

Data flows are currently started manually (not yet on automated schedules or triggered by events, although this can be programmed using OCI Cloud Events and Functions). On each run, a data flow can be given a different set of parameter values, which dynamically configure the data assets, connections and entities it runs against, and even the conditions (for filters and joins) and expressions (for aggregations and derivations) it applies.
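As an illustration of the Events and Functions route, here is a minimal sketch of the logic such a Function could contain: it turns an Object Storage "object created" event into parameter values for a run. The event field names (`eventType`, `data.resourceName`, `data.additionalDetails.bucketName` and `namespace`) reflect my understanding of the OCI Events payload and should be checked against the documentation; the parameter names and the `start_task_run` callable are hypothetical placeholders for the actual Data Integration API call.

```python
# Sketch: map an Object Storage event to run-time parameters for a data flow.
# Parameter names and start_task_run are HYPOTHETICAL placeholders.
from typing import Callable, Dict, Optional

CREATE_OBJECT_EVENT = "com.oraclecloud.objectstorage.createobject"


def parameters_from_event(event: dict) -> Dict[str, str]:
    """Translate an object-created event into parameter values for one run."""
    details = event["data"]["additionalDetails"]
    return {
        "SRC_NAMESPACE": details["namespace"],
        "SRC_BUCKET": details["bucketName"],
        "SRC_OBJECT": event["data"]["resourceName"],
    }


def handle_event(event: dict,
                 start_task_run: Callable[[Dict[str, str]], str]) -> Optional[str]:
    """Start a run only for newly created CSV files; return the run id or None."""
    if event.get("eventType") != CREATE_OBJECT_EVENT:
        return None
    params = parameters_from_event(event)
    if not params["SRC_OBJECT"].lower().endswith(".csv"):
        return None
    # In a real Function, start_task_run would call the Data Integration
    # API (for example through the OCI SDK) with these parameter values.
    return start_task_run(params)
```

Keeping the event parsing separate from the API call makes the mapping easy to test; the same parameter map could also carry a filter condition if the data flow exposes one as a parameter.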

OCI Data Integration provides a Data Xplorer tool, through which you can explore a data sample, review profiling metadata, and apply transformations using expression operators. Data Xplorer updates the sample as you apply transformation operations, which helps you validate the impact of these transformations and debug and troubleshoot possible failures before a task is run. The transformations that can be defined for attributes are extract, change case, exclude, rename, coalesce (aka NVL), replace and extract (using regular expressions).
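The attribute transformations above can be mimicked on a small data sample in plain Python, which shows what each operation does. This is only an analogy: the function names, the sample data and the regular expression are my own, and Data Xplorer uses its own expression operators rather than anything like this code.

```python
# Analogy for Data Xplorer-style attribute transformations on a sample.
# Each step returns new row dicts, so the original sample is left untouched.
import re

SAMPLE = [
    {"NAME": "alice", "CITY": None, "PHONE": "tel: +31-6-1234"},
    {"NAME": "Bob", "CITY": "Utrecht", "PHONE": "tel: +32-4-5678"},
]


def change_case(rows, attr):
    # Upper-case the attribute value.
    return [dict(r, **{attr: r[attr].upper()}) for r in rows]


def rename(rows, old, new):
    return [{(new if k == old else k): v for k, v in r.items()} for r in rows]


def exclude(rows, attr):
    return [{k: v for k, v in r.items() if k != attr} for r in rows]


def coalesce(rows, attr, default):
    # Like NVL: use the default when the value is missing.
    return [dict(r, **{attr: default if r[attr] is None else r[attr]}) for r in rows]


def replace(rows, attr, old, new):
    return [dict(r, **{attr: r[attr].replace(old, new)}) for r in rows]


def regex_extract(rows, attr, pattern, new_attr):
    # Store the first capture group in a new attribute (None if no match).
    out = []
    for r in rows:
        match = re.search(pattern, r[attr])
        out.append(dict(r, **{new_attr: match.group(1) if match else None}))
    return out


def prepare(rows):
    rows = change_case(rows, "NAME")
    rows = coalesce(rows, "CITY", "UNKNOWN")
    rows = replace(rows, "PHONE", "tel: ", "")
    rows = regex_extract(rows, "PHONE", r"^\+(\d+)-", "COUNTRY_CODE")
    rows = exclude(rows, "PHONE")
    return rename(rows, "NAME", "CUSTOMER_NAME")


if __name__ == "__main__":
    for row in prepare(SAMPLE):
        print(row)
```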

Product Marketing

As Oracle puts it: “Oracle Cloud Infrastructure Data Integration is a managed service that provides extract, transform, and load (ETL) capabilities to target AI and analytics projects on Oracle Cloud. It helps users easily ingest and transform data from various data sources, starting with databases and data lakes and extending to applications.” This can be read as a promise to extend Data Integration with support for Data Lakes and Applications. Currently, beyond Object Storage and Oracle Database (Autonomous or on premises), the service supports no sources or targets, not even the ones one would expect, such as Oracle NoSQL Database Cloud Service, Oracle Big Data Cloud and Oracle MySQL Database Cloud Service.

You might think that the following description would be for the OCI Data Integration service (or ODI for short 😉): “Oracle offers unique, next-generation extract, load, and transform (ELT) technology that improves performance and reduces data integration costs—even across heterogeneous systems. Unlike conventional ELT tools, Oracle delivers the productivity of a declarative design approach and the benefits of an active integration platform for seamless batch and real-time integration.” In fact, this is what the Oracle website says about on-premises ODI, which is also available in the cloud as Oracle Data Integrator Marketplace.


The new service is positioned under Analytics and Big Data – not under Integration where you might also expect to find it – and where Integration (OIC) is to be found. Oracle seems determined to maintain this distinction – increasingly artificial – between [application] integration and data integration. One almost suspects that internal company politics has more to do with this than customer requirements.


Limitations, questions and unclarity

I am not sure about the roadmap of the Data Integration service. At present, there are some limitations and open questions I am struggling with. Hopefully someone can help me find answers to these questions, or prove me wrong about some of the limitations:

  • Limited sources and destinations: only Object Storage (Parquet, JSON, CSV) and Oracle Database
    • No OCI NoSQL Database, Kafka, OCI Streaming, REST API, OCI managed or on-premises MySQL Database, OCI Big Data (Hadoop/Hive, Spark), OData, Snowflake, Excel, S3/Dropbox/OneDrive, or JDBC/ODBC sources (PostgreSQL/SQL Server,…)
    • No adapters for SaaS applications – everything is handled only at the technical level
  • Unclear relation with data processing in Oracle Analytics Cloud
    • there is a strong similarity between Data Integration and the Data Preparation in Analytics Cloud, but the two seem unrelated, using different operators, expression languages, profiling tools etc.; a missed opportunity to join forces (and to let me reuse my skills)
    • compare the limited set of data sources in Data Integration with the far larger set of supported data sources in Oracle Analytics Cloud: https://docs.oracle.com/en/cloud/paas/analytics-cloud/acsds/supported-data-sources.html
  • Unclear relation with OIC – Oracle Integration Cloud
    • are the two joining forces? are they to stay apart?
  • No integration of Functions for complex transformation, validation or enrichment
    • it would be very useful (if not downright a requirement) to be able to write custom logic in a serverless function and invoke that function from data processing pipelines
  • Manual start of a task run (or through API); not yet triggerable by meaningful events (such as the arrival of a file)
  • Relation of OCI Data Integration with ODI Marketplace?
  • Unclear pricing – no information on pricing seems available
  • I wonder whether Data Integration connects with OCI Data Catalog. I hope that it does – but I have not seen indications of it
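To illustrate the kind of custom logic I have in mind for the missing Functions hook, here is a hypothetical validation and enrichment step that takes a batch of records as JSON and returns them annotated. As far as I can tell nothing like this can be plugged into Data Integration today; the record shape, the `enrich_batch` name and the VAT rates (illustrative values only) are invented. In an actual OCI Function, this logic would sit inside the function's handler.

```python
# HYPOTHETICAL custom step: validate and enrich a JSON batch of records.
import json

VAT_RATES = {"NL": 0.21, "BE": 0.21, "DE": 0.19}  # illustrative lookup table


def enrich_batch(payload: str) -> str:
    """Take a JSON array of records and return it with validation and enrichment applied."""
    records = json.loads(payload)
    result = []
    for rec in records:
        errors = []
        amount = rec.get("amount")
        if amount is None or amount < 0:
            errors.append("invalid amount")
        rate = VAT_RATES.get(rec.get("country"))
        if rate is None:
            errors.append("unknown country")
        enriched = dict(rec, errors=errors)
        # Only enrich records that passed validation.
        if not errors:
            enriched["amount_incl_vat"] = round(amount * (1 + rate), 2)
        result.append(enriched)
    return json.dumps(result)
```

Flagging invalid records instead of dropping them leaves the decision (reject, quarantine, load anyway) to the pipeline, which is where I would expect Data Integration to take over again.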

This is so ironic (taken from the announcement article on the Data Integration service):

[image: excerpt from the announcement article]

This is what I could find on pricing for this new service:

[image: pricing information found for the service]

and:

[image: further pricing information]

which amounts to… nothing at all. I do not know what you will have to pay.


Conclusion

It seems that OCI Data Integration is in its early stages, with a lot of evolution just around the corner. It is an important area: facilities to bring data to where it is needed, when it is needed, in the shape in which it is needed, are incredibly important. Oracle has been doing this on premises for a long time, and has not been extremely successful at bringing that functionality to the cloud in true cloud native fashion. Hopefully, with OCI DI, that trend has now come to an end. However, a lot is still unclear. Although some aspects of the new service are nicely integrated into the overall OCI way of working, several aspects are really not where I would like them to be, such as the glaring lack of integration with Data Catalog as well as with NoSQL Database, MySQL, Big Data and Streaming, the overlap with Analytics Cloud’s data preparation features and the weird absence of [clarity on] a price tag. I have not been able to find a roadmap for this new product, which could help to make things a lot clearer.

All in all, I am positive: Data Integration is important and I am looking forward to seeing this new service blossom into a wonderful, cloud native offering.


Resources

OCI CLI Reference – https://docs.cloud.oracle.com/en-us/iaas/tools/oci-cli/2.12.2/oci_cli_docs/cmdref/data-integration.html

Oracle Data Integration Products – https://www.oracle.com/middleware/data-integration/products.html

OCI Data Integration Documentation – https://docs.cloud.oracle.com/en-us/iaas/data-integration/using/index.htm

OCI Data Integration Tutorials