Posts

The Functioning of Data Lake Built on Amazon S3

Amazon Simple Storage Service (S3) is a cloud-based storage service that holds data in its native format, whether unstructured, semi-structured, or structured. Objects (data files) and their metadata are stored in buckets. To upload a file and its metadata, the object is first moved to Amazon S3; permissions can then be set on the objects and metadata in the buckets (containers) so that only selected personnel can access them. These users also decide where logs and objects are stored in Amazon S3. Several capabilities can be combined to build a data lake on Amazon S3, including media data processing applications, Artificial Intelligence (AI), Machine Learning (ML), big data analytics, and high-performance computing (HPC). Used in tandem, they give organizations access to critical business intelligence and analytics as well as unstructured data sets from the S3 data lake. There are several benefits…
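The upload-then-grant-access workflow can be sketched with Python. The bucket name, prefix, and IAM role ARN below are hypothetical, and the actual AWS calls (shown commented out, using the standard boto3 client) require credentials; only the policy-building helper runs locally.

```python
import json

def read_policy(bucket: str, prefix: str = "raw/") -> dict:
    """Build a bucket policy granting read access to objects under a prefix.
    Hypothetical helper for illustration; role ARN is a placeholder."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowReadUnderPrefix",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:role/AnalyticsRole"},
            "Action": ["s3:GetObject"],
            "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
        }],
    }

policy = read_policy("my-data-lake")
print(json.dumps(policy, indent=2))

# With credentials configured, the move-then-permission steps would be:
# import boto3
# s3 = boto3.client("s3")
# s3.upload_file("sales.csv", "my-data-lake", "raw/sales.csv")
# s3.put_bucket_policy(Bucket="my-data-lake", Policy=json.dumps(policy))
```

Setting permissions at the bucket-policy level, rather than per object, keeps access rules in one place as the data lake grows.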

SAP Data Lake – Evolution and Architecture

The SAP HANA Data Lake was launched in April 2020 to further extend SAP's data storage capabilities, with the goal of giving customers storage options at very affordable rates. The package incorporates the SAP HANA Native Storage Extension as well as the SAP data lake. This cloud-based relational data lake of the SAP IQ ecosystem offers features on par with the leaders in this field, namely Microsoft Azure and Amazon Simple Storage Service (S3).

Architecture of the SAP Data Lake

The SAP data lake architecture resembles a pyramid, with the top, middle, and bottom segments providing distinct storage capabilities. The top of the structure holds data that is critical for organizations, so storage costs there are the highest in the SAP data lake; this data is frequently accessed and processed for operational requirements. The middle of the pyramid stores data that is accessed less frequently but is not insignificant enough…
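The pyramid can be thought of as routing data to a tier by access frequency. The sketch below is conceptual only: the tier labels and thresholds are illustrative assumptions, not SAP terminology or pricing rules.

```python
def storage_tier(accesses_per_month: int) -> str:
    """Map access frequency to a pyramid tier (hypothetical thresholds)."""
    if accesses_per_month >= 100:
        return "top: hot, frequently accessed, highest storage cost"
    if accesses_per_month >= 10:
        return "middle: warm, occasionally accessed"
    return "bottom: cold, rarely accessed, lowest storage cost"

print(storage_tier(500))  # operational data -> top of the pyramid
print(storage_tier(3))    # archival data -> bottom of the pyramid
```

The point of the tiering is economic: only the top slice pays in-memory prices, while the bulk of the data sits in cheaper lower tiers.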

Extracting Data From SAP Source Systems

Extractors are an integral and critical component of the data retrieval mechanisms in SAP source systems and are used to extract data from SAP. An Extractor fills the structure of a DataSource with data from the SAP source system datasets. Replication is the process widely used to make the DataSource and its relevant properties known in SAP Business Warehouse (BW). To extract data from SAP and transfer it to the input layer of SAP BW, the Persistent Staging Area (PSA), a load process with an InfoPackage must be defined in the scheduler. When the InfoPackage is executed, the data load is triggered by a request IDoc sent to the source system. Process chains should be used for these executions.

Process to extract data from SAP

There are several application-specific extractors, hard-coded for the DataSources delivered with the BI Content of the Business Warehouse. These extractors fill the precise struc…
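The load sequence above can be sketched as plain functions to make the order of steps concrete: replicate the DataSource, then execute an InfoPackage, which sends a request IDoc and lands the extracted rows in the PSA. This is a conceptual simulation only; the real flow runs inside SAP BW, and the DataSource name and rows below are illustrative.

```python
def replicate_datasource(name: str) -> dict:
    """Make the DataSource and its properties known in BW (simulated)."""
    return {"datasource": name, "replicated": True}

def execute_infopackage(datasource: dict, source_rows: list) -> tuple:
    """Simulate executing an InfoPackage: a request IDoc triggers the load
    from the source system into the Persistent Staging Area (PSA)."""
    request = {"request_idoc": f"REQ-{datasource['datasource']}"}
    psa = list(source_rows)  # rows land unchanged in the PSA input layer
    return request, psa

ds = replicate_datasource("2LIS_11_VAHDR")  # e.g. a BI Content DataSource
request, psa = execute_infopackage(ds, [{"order": "0001"}, {"order": "0002"}])
print(request["request_idoc"], "loaded", len(psa), "rows into the PSA")
```

The PSA keeps the data exactly as extracted, so transformations happen in later BW layers rather than during the load itself.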