Every data movement follows the same principle: source data is transformed and made available in the target, and along the way some side-effect data is generated as well.

This applies to ETL flows, where a program is invoked at specific times, reads the change data from the source, applies transformations, and stores the result in the target.
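As a minimal sketch of that pattern with Python and DuckDB (the paths, column names, and watermark below are placeholders; in practice the watermark would come from a state store):

```python
import duckdb

# One scheduled ETL run: read the change data since the last run, apply a
# transformation, and persist the result physically in the target.
last_watermark = "2024-01-01 00:00:00"  # placeholder; normally read from a state store

con = duckdb.connect()  # in-memory engine used only for the transformation

con.execute(f"""
    COPY (
        SELECT order_id,
               customer_id,
               amount * exchange_rate AS amount_eur        -- the transformation
        FROM read_parquet('landing/orders/*.parquet')      -- the source (placeholder path)
        WHERE changed_at > TIMESTAMP '{last_watermark}'    -- only the change data
    ) TO 'target/orders_delta.parquet' (FORMAT PARQUET)    -- the target
""")
```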

But the same can be said about data federation (Fabric Shortcuts), where data is read from the target and the request is re-routed to the source dynamically.

Or realtime replication, where the source pushes changes to a consumer, the transformation is a pure 1:1 mapping, and the data is physically copied into the target.
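Such a consumer could be sketched like this, using kafka-python and DuckDB as the target store (topic name, broker address, message format, and table layout are all assumptions):

```python
import json
import duckdb
from kafka import KafkaConsumer

# Realtime replication sketch: the source pushes change records to a Kafka
# topic, the mapping is 1:1, and each record is copied into the target table.
con = duckdb.connect("target.duckdb")
con.execute("CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, payload VARCHAR)")

consumer = KafkaConsumer(
    "orders-changes",                      # assumed topic name
    bootstrap_servers="localhost:9092",    # assumed broker address
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    record = message.value
    # pure 1:1 mapping: whatever arrives is written into the target as-is
    con.execute(
        "INSERT INTO orders VALUES (?, ?)",
        [record["order_id"], json.dumps(record)],
    )
```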

Note that "source" does not necessarily mean a remote source system. It could also mean that two tables in a Fabric Lakehouse are joined and the result is stored as a fact table in a Fabric Warehouse.
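To illustrate that local case, here is a small sketch with DuckDB and dummy data standing in for the actual Lakehouse and Warehouse objects:

```python
import duckdb

# The "source" is simply two local tables; the target is a persisted fact table.
# Table contents are dummy data for illustration only.
con = duckdb.connect()
con.execute("CREATE TABLE orders AS SELECT 1 AS order_id, 42 AS customer_id, 99.90 AS amount")
con.execute("CREATE TABLE customers AS SELECT 42 AS customer_id, 'C-0042' AS customer_key")

con.execute("""
    CREATE TABLE fact_sales AS          -- the physically persisted target
    SELECT o.order_id,
           c.customer_key,
           o.amount
    FROM orders o
    JOIN customers c USING (customer_id)
""")
print(con.sql("SELECT * FROM fact_sales"))
```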

Having said all of that, one decision was that the target is always a physical persistence, i.e. a table, for query performance reasons.


When we now consider the different tools and methods, this leads to some obvious and less obvious requirements. Each method must be able to ....

In the end, it comes down to a cost comparison.


By using Python and DuckDB, we get the flexibility of Python combined with the processing power of an in-memory, columnar ANSI SQL database. Coupling that with Kafka as an event broker to trigger dependent processes checks every single requirement and enables a future-proof solution.
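A minimal sketch of how these pieces could fit together; the broker address, topic name, source path, and event payload are assumptions:

```python
import json
import duckdb
from kafka import KafkaProducer

# DuckDB performs the transformation and persists the target table; once the
# table is written, an event is published to Kafka so dependent processes can react.
con = duckdb.connect("target.duckdb")
con.execute("""
    CREATE OR REPLACE TABLE fact_sales AS
    SELECT order_id, customer_id, amount
    FROM read_parquet('landing/orders/*.parquet')   -- placeholder source path
""")

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",             -- assumed broker address
    value_serializer=lambda obj: json.dumps(obj).encode("utf-8"),
)
producer.send("table-updated", {"table": "fact_sales"})  # assumed topic and payload
producer.flush()
```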