What is Lakehouse architecture and its benefits?

How is a Lakehouse designed?

➡ A Lakehouse is an open architecture that combines the best elements of data lakes and data warehouses.

➡ A Lakehouse has the following key features:

🔹 Transaction support: Supports ACID transactions, which ensure consistency when multiple parties concurrently read or write data, typically using SQL.

🔹 Schema enforcement and governance: Supports schema enforcement and evolution, including data warehouse (DW) schema designs such as star and snowflake schemas. The system is able to reason about data integrity and has robust governance and auditing mechanisms.

🔹 BI support: A Lakehouse lets BI tools run directly on the source data. This reduces staleness and latency, and it avoids the cost of operationalizing two copies of the data, one in a data lake and one in a warehouse.

🔹 Storage is decoupled from compute: Storage and compute use separate clusters, so the system can scale to many more concurrent users and larger data sizes.

🔹 Openness: Storage formats are open and standardized, such as Parquet, and the Lakehouse provides an API so that a variety of tools and engines, including machine learning and Python/R libraries, can efficiently access the data directly.

🔹 Support for diverse data types ranging from unstructured to structured data: The Lakehouse can be used to store, refine, analyze, and access data types needed for many new data applications, including images, video, audio, semi-structured data, and text.

🔹 End-to-end streaming: Real-time reports are the norm in many enterprises. Support for streaming eliminates the need for separate systems dedicated to serving real-time data applications.
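
To make the transaction support and schema enforcement features more concrete, here is a minimal Python sketch of one way such guarantees can be layered on top of plain files. It is an illustration only, not how any particular Lakehouse product is implemented. The class `ToyLakehouseTable`, the `_log` directory, and the file naming are all hypothetical names invented for this example. It stores data as JSON for simplicity (a real system would typically use Parquet), and it assumes a local filesystem where creating a file in exclusive mode fails if the file already exists.

```python
import json
import os
import tempfile
import uuid


class ToyLakehouseTable:
    """Hypothetical toy table: data files plus an append-only commit log."""

    def __init__(self, root, schema):
        self.root = root
        self.schema = schema  # column name -> expected Python type
        self.log_dir = os.path.join(root, "_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def _validate(self, rows):
        # Schema enforcement: reject rows with wrong columns or wrong types.
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"columns {sorted(row)} do not match schema")
            for col, typ in self.schema.items():
                if not isinstance(row[col], typ):
                    raise TypeError(f"column {col!r} expects {typ.__name__}")

    def latest_version(self):
        versions = [int(name.split(".")[0]) for name in os.listdir(self.log_dir)]
        return max(versions, default=-1)

    def append(self, rows, read_version=None):
        self._validate(rows)
        if read_version is None:
            read_version = self.latest_version()
        # Step 1: write the data file. Readers ignore it until it is committed.
        data_name = f"part-{uuid.uuid4().hex}.json"
        with open(os.path.join(self.root, data_name), "w") as f:
            json.dump(rows, f)
        # Step 2: commit by creating the next log entry in exclusive mode "x",
        # which raises FileExistsError if another writer took that version.
        commit_path = os.path.join(self.log_dir, f"{read_version + 1:08d}.json")
        try:
            with open(commit_path, "x") as f:
                json.dump({"add": data_name}, f)
        except FileExistsError:
            raise RuntimeError("conflict: another writer committed first") from None
        return read_version + 1

    def read(self):
        # Readers only see data files referenced by committed log entries.
        rows = []
        for name in sorted(os.listdir(self.log_dir)):
            with open(os.path.join(self.log_dir, name)) as f:
                entry = json.load(f)
            with open(os.path.join(self.root, entry["add"])) as f:
                rows.extend(json.load(f))
        return rows


def demo():
    with tempfile.TemporaryDirectory() as root:
        table = ToyLakehouseTable(root, {"id": int, "amount": float})
        table.append([{"id": 1, "amount": 9.5}])   # commits version 0
        stale = table.latest_version()             # a slow writer sees version 0
        table.append([{"id": 2, "amount": 3.0}])   # commits version 1
        outcomes = []
        try:
            # The slow writer tries to commit version 1 again and is rejected.
            table.append([{"id": 3, "amount": 1.0}], read_version=stale)
        except RuntimeError:
            outcomes.append("conflict")
        try:
            # A string id violates the schema and is rejected before any write.
            table.append([{"id": "4", "amount": 1.0}])
        except TypeError:
            outcomes.append("schema")
        return table.read(), outcomes
```

Calling `demo()` returns the two committed rows together with the two rejected attempts. The design choice worth noting is that the data file is written first and only becomes visible once its log entry exists, so a failed or conflicting write never exposes partial data to readers.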