How to extract SAP BW/4HANA data to Microsoft Azure environment using SAP change data capture (CDC) capability - sapport.az

Cloud-based infrastructure is now at the center of digital transformation, enabling new customer experiences that respond to changing business and market dynamics. In my opinion, this transformation works best when organizations can seamlessly integrate their existing IT solutions and application architecture with new digital services such as Microsoft Azure.


Microsoft Azure and its integration with the SAP world are now widely discussed. In this context, I’d like to summarize my research on using SAP CDC and Azure Data Factory (ADF) to integrate real SAP BW/4HANA data (applicable since ABAP 7.40, SP10) with Microsoft Azure. It may be useful for architects, data engineers, and product owners working on such projects, and for anyone else interested in the topic.

This article therefore outlines how to use Azure Data Factory pipelines (the same applies to Azure Synapse Analytics) to copy data from SAP BW/4HANA. The same approach works for SAP ECC via classic extractors and for S/4HANA. SAP BW/4HANA is a next-generation data warehousing and reporting platform designed to deliver analytics solutions and help organizations harness the full value of their data.

The SAP CDC capability in ADF is based on the SAP Operational Data Provisioning (ODP) framework and replicates both full and delta loads from the SAP source dataset. SAP ODP is not new in the SAP world, but it is new to Azure: the SAP CDC connector was released at the end of Q2 2022. Azure SAP CDC is designed to be scalable, fault-tolerant, and secure, and it uses Azure Data Factory’s managed infrastructure to handle data transfer and data processing.

Before SAP CDC, there were already six dedicated SAP connectors, based on open standards such as ODBC/JDBC, SQL, and OData.

What is Azure Data Factory?

Azure Data Factory is Microsoft’s cloud-based, code-free data integration service. It lets you create, schedule, and orchestrate data pipelines that move and transform data between various data stores and services. Its visual interface lets you transform, filter, and enrich the replicated data before loading it into target stores. For more information about ADF, refer to the ADF documentation.

Azure ADF ETL

What is SAP ODP?

SAP Operational Data Provisioning provides a technical infrastructure that simplifies data extraction and replication in SAP. The framework makes complete data extraction available to the Azure environment and so enables tight integration with SAP. It identifies new and changed data in the source and makes it ready to be consumed by SAP and non-SAP products. Before this connector, achieving the same functionality required building complex custom logic.

SAP ODP Framework-Azure context
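To make the idea of "new and changed data, ready to be consumed" more tangible, here is a minimal conceptual sketch in Python. It is only a toy model and not how SAP implements ODP internally: the class `DeltaQueue`, its methods, and the subscriber names are all hypothetical. It shows one thing only: each subscriber keeps its own position, so its first request returns everything and later requests return just the changes.

```python
from dataclasses import dataclass, field


@dataclass
class DeltaQueue:
    """Toy change queue with one read position per subscriber (hypothetical)."""

    changes: list = field(default_factory=list)    # ordered change records
    positions: dict = field(default_factory=dict)  # subscriber -> next index

    def publish(self, record: dict) -> None:
        # The source appends every new or changed record to the queue.
        self.changes.append(record)

    def fetch(self, subscriber: str) -> list:
        # A subscriber's first request returns everything queued so far;
        # later requests return only what arrived since its previous request.
        start = self.positions.get(subscriber, 0)
        batch = self.changes[start:]
        self.positions[subscriber] = len(self.changes)
        return batch


queue = DeltaQueue()
queue.publish({"doc": "1000", "amount": 50})
queue.publish({"doc": "1001", "amount": 75})

first = queue.fetch("adf_pipeline")       # initial request: both records
queue.publish({"doc": "1000", "amount": 60})
second = queue.fetch("adf_pipeline")      # follow-up request: only the change
other = queue.fetch("another_consumer")   # independent subscriber: all three
```

Because positions are tracked per subscriber, several consumers can read the same source without interfering with each other.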

Self-Hosted Integration Runtime (SHIR)

The ADF cloud service orchestrates the extraction but cannot connect directly to the data source. It requires a self-hosted integration runtime (SHIR) to be installed and configured on your own server, ideally where the SAP system is running. The SHIR provides access to SAP systems on-premises or in any cloud. The SAP .NET Connector must also be installed for the connection to succeed.
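As a rough illustration of how the SHIR ties into the SAP connection, the following Python snippet assembles what a linked service definition for the SAP CDC connector might look like. Please treat it as a sketch under assumptions: the type name `SapOdp` and the property names reflect my understanding of the connector documentation and should be verified there, and the linked service name and all placeholder values in angle brackets are hypothetical.

```python
import json

# Sketch of an SAP CDC linked service definition. Property names are assumed
# from the connector documentation; every value is a placeholder.
linked_service = {
    "name": "SapBw4CdcLinkedService",  # hypothetical name
    "properties": {
        "type": "SapOdp",
        "typeProperties": {
            "server": "<sap-application-server-host>",
            "systemNumber": "<system-number>",
            "clientId": "<client>",
            "userName": "<sap-user>",
            "password": {"type": "SecureString", "value": "<password>"},
        },
        # The connection is routed through the self-hosted integration runtime.
        "connectVia": {
            "referenceName": "<self-hosted-ir-name>",
            "type": "IntegrationRuntimeReference",
        },
    },
}

print(json.dumps(linked_service, indent=2))
```

The point of interest is the `connectVia` block: it is the reference that tells the cloud service to reach the SAP system through the SHIR rather than directly.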

Use Case – Data Flow

The use case below was tested with the SAP CDC connector. Beforehand, the required prerequisites, such as the self-hosted integration runtime and the source system connections, were set up. The architecture is shown below:

Architecture diagram

For testing purposes, the following ADSO was created:

sample ADSO

Moreover, for data extraction from BW/4HANA, the following ABAP CDS view has been created:

CDS View

The first test data has been loaded:

BW/4HANA ADSO Table first load

Create ADF/Synapse Analytics Pipeline

To create a pipeline with SAP extractors step by step, you can refer to the more detailed blog post. The target can be any data store supported by Azure. However, to use the automatic delta process, the target store has to support upserts (update/insert) and deletes. For delta extraction and loading, the Delta dataset type was selected as the data format, which stores the data as Parquet files.
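Why the target must support upserts and deletes becomes clearer with a small sketch. The Python below applies a batch of change records to a keyed target. It is a conceptual illustration only, not what ADF executes: the function `apply_changes`, the field name `change_mode`, and its values `C`/`U`/`D` (create, update, delete) are assumptions made for this example.

```python
def apply_changes(target: dict, changes: list, key: str = "doc") -> dict:
    """Apply change records to a keyed target (hypothetical helper)."""
    for change in changes:
        row = dict(change)               # copy so the input stays untouched
        mode = row.pop("change_mode")    # assumed marker: C, U or D
        if mode == "D":
            target.pop(row[key], None)   # delete: remove the row if present
        elif mode in ("C", "U"):
            target[row[key]] = row       # upsert: insert or overwrite by key
        else:
            raise ValueError(f"unknown change mode: {mode!r}")
    return target


target = {}

# First run (full load): every record arrives as a new row.
apply_changes(target, [
    {"doc": "1000", "amount": 50, "change_mode": "C"},
    {"doc": "1001", "amount": 75, "change_mode": "C"},
])

# Second run (delta): one row is updated, one is deleted.
apply_changes(target, [
    {"doc": "1000", "amount": 60, "change_mode": "U"},
    {"doc": "1001", "change_mode": "D"},
])
```

An append-only target could not represent the second run correctly: it would keep the outdated amount for document 1000 and still contain document 1001.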

The first run has been performed:

Data load Monitoring

You can use transaction ODQMON in BW/4HANA to monitor the extraction process on the source side. It also displays detailed information about the processed data, including a data preview.

Next, a delta was loaded into the ADSO.

The second run was then performed via the ADF pipeline, and this time only the delta was loaded.

Finally, displaying all loaded data from the Parquet files confirmed that both the first load and the delta had arrived.

Challenges – SAP BW/4HANA Data Architecture Perspective

As you can see, an ABAP CDS view with a timestamp field was used for delta extraction. However, a timestamp is usually not present in the propagation layer of BW/4HANA, where business logic is implemented and the data is mainly consolidated and aggregated.

To overcome this challenge, there are two options: a new dataset with a timestamp may need to be derived from the propagation layer, or raw data can be extracted from the corporate layer, where a timestamp usually exists.
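The dependency on a timestamp can be illustrated with a small Python sketch of timestamp-based delta selection. This is a simplified model and not the actual mechanism inside the CDS view or ODP: the function `extract_delta` and the field name `changed_at` are hypothetical, and the sketch ignores details such as safety intervals and deletions.

```python
from datetime import datetime


def extract_delta(rows, watermark):
    """Return rows changed after `watermark` and the advanced watermark."""
    changed = [row for row in rows if row["changed_at"] > watermark]
    # Advance the watermark to the newest change seen; keep it if nothing changed.
    new_watermark = max((row["changed_at"] for row in changed), default=watermark)
    return changed, new_watermark


rows = [
    {"doc": "1000", "amount": 50, "changed_at": datetime(2023, 1, 10, 8, 0)},
    {"doc": "1001", "amount": 75, "changed_at": datetime(2023, 1, 10, 9, 0)},
]

# First run: a minimal watermark selects every row (full load).
full, watermark = extract_delta(rows, datetime.min)

# A record changes in the source and receives a newer timestamp.
rows[0] = {"doc": "1000", "amount": 60, "changed_at": datetime(2023, 1, 11, 7, 30)}

# Second run: only the row changed after the stored watermark is selected.
delta, watermark = extract_delta(rows, watermark)
```

Without a field like `changed_at`, the filter has nothing to compare against, which is exactly the situation with aggregated data in the propagation layer.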

Conclusion

Via ADF or Synapse Analytics, you can use SAP ODP sources such as:

  • InfoProviders/InfoObjects in SAP BW and BW/4HANA. In other words, all master data and transactional data in SAP BW.   
  • SAP classic extractors, originally built to extract data from SAP ECC and load it into SAP BW
  • ABAP CDS views, the new data extraction standard for SAP S/4HANA
  • Custom-built classic extractors, CDS views, and tables

Overall, Azure SAP CDC is a powerful and reliable way to replicate data from SAP BW/4HANA into Azure for further analysis and reporting. It lets you integrate Azure and SAP BW/4HANA and combine Azure’s data processing and analytics capabilities with SAP’s ODP framework. You can extract either BW/4HANA reporting data that has already been processed with business logic, giving you greater insight into your business, or raw data for further processing in Microsoft Azure.