Getting data projects done

Working around AWS Lambda constraints

15.11.2023

Lambda functions were designed for microservice architectures. Simple functions have few or no dependencies and never bump into the Lambda dependency quota: the total unzipped code, including all dependencies, must not be larger than 250 MB. If we want to use the technology for data engineering or (even worse) data science applications, we quickly accumulate large dependencies.
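
To make the quota concrete, here is a minimal sketch that measures the unzipped size of a build directory and compares it against the limit. The function names and the idea of pointing it at a local build folder are assumptions for illustration, and the limit is taken here as 250 * 1024 * 1024 bytes.

```python
import os

# Assumption: the 250 MB quota is interpreted as 250 * 1024 * 1024 bytes.
LAMBDA_UNZIPPED_LIMIT_BYTES = 250 * 1024 * 1024


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files below root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_lambda_quota(root: str, limit: int = LAMBDA_UNZIPPED_LIMIT_BYTES) -> bool:
    """Return True if the unzipped contents of root stay within the limit."""
    return directory_size_bytes(root) <= limit
```

Running `fits_lambda_quota` on the folder that holds the function code plus its installed dependencies gives an early warning before a deployment is rejected.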

Python package structure for complex serverless apps

15.11.2023

When building a serverless application on AWS involving multiple functions in a single repository, we have to build and package Lambda functions and Lambda layers. At the same time, we ideally want to install all dependencies and development tools into our development environment without any conflicts.

Anatomy of an extract load job

15.11.2023

In data engineering, an extract load (EL) job refers to the two-stage process of extracting data from sources and loading it into a data warehouse. In many situations, we need to run an EL job only a few times a day, so a serverless setup seems very appealing. In what follows, we discuss how best to structure and organize a data integration app with multiple sources.
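
The two stages can be sketched as one small function that walks over several sources. This is only an illustrative skeleton under stated assumptions: `run_el_job`, the `Extractor` and `Loader` type aliases, and the in-memory "warehouse" are hypothetical names, not the structure the post itself proposes.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
# Hypothetical interfaces: an extractor yields records from one source,
# a loader writes a batch of records into a named target table.
Extractor = Callable[[], Iterable[Record]]
Loader = Callable[[str, List[Record]], None]


def run_el_job(sources: Dict[str, Extractor], load: Loader) -> Dict[str, int]:
    """Extract from every source, load each batch, return row counts per source."""
    counts: Dict[str, int] = {}
    for name, extract in sources.items():
        rows = list(extract())   # stage 1: extract
        load(name, rows)         # stage 2: load
        counts[name] = len(rows)
    return counts


# Usage with an in-memory stand-in for the warehouse.
warehouse: Dict[str, List[Record]] = {}


def load_into_memory(table: str, rows: List[Record]) -> None:
    warehouse.setdefault(table, []).extend(rows)


row_counts = run_el_job(
    {
        "customers": lambda: [{"id": 1}, {"id": 2}],
        "orders": lambda: [{"id": 10, "customer_id": 1}],
    },
    load_into_memory,
)
```

Keeping each source behind the same callable interface means a new source only adds one entry to the mapping, which suits a setup where each run is short and infrequent.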

Cloud Data Warehouse Benchmarks

04.06.2022

Data warehousing solutions are not easy to compare, because assessing the performance they deliver for the money spent is very laborious. Existing studies usually rely on the TPC performance benchmark datasets. The TPC (Transaction Processing Performance Council) is a non-profit organization founded to define transaction processing and database benchmarks and to disseminate objective, verifiable performance data to the industry. This benchmark, which measures complete OLTP system configurations, is a yardstick widely accepted across the industry.