📕 The following is a guide for studying for the Databricks Data Engineer Associate certification. The exam has 45 questions and requires a score of 70% to pass. This guide covers the major topics of the exam but is not comprehensive. For the full list of topics that may appear on the exam, refer to the official Databricks Certification Guideline, which can be found here.
A data lakehouse is a modern data architecture built on a single, centralized platform that serves all of its users. The lakehouse replaces the current dependency on separate data lakes and data warehouses, each of which has its own limitations.
The lakehouse uses an open standard storage format (Parquet) and supports low-latency BI workloads. It is powerful because it supports schema enforcement, ACID transactions, business intelligence applications, open file formats such as CSV, and end-to-end streaming use cases.
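To make schema enforcement and ACID-style writes more concrete, here is a toy, pure-Python sketch. It is not Delta Lake's or Databricks' API: `ToyTable`, `append`, and `SchemaMismatchError` are hypothetical names, and the list-based "transaction log" only mimics the idea that a batch either commits completely or not at all. It models schema checks and atomicity only, not isolation or durability.

```python
from dataclasses import dataclass, field
from typing import Any


class SchemaMismatchError(Exception):
    """Raised when a write does not match the table's declared schema (hypothetical)."""


@dataclass
class ToyTable:
    # Declared schema: column name -> Python type (a stand-in for a real table schema).
    schema: dict[str, type]
    # Committed rows; readers only ever see fully committed batches.
    rows: list[dict[str, Any]] = field(default_factory=list)
    # Simplified transaction log: one entry (batch size) per successful commit.
    log: list[int] = field(default_factory=list)

    def _validate(self, row: dict[str, Any]) -> None:
        # Schema enforcement: reject extra or missing columns and wrong types.
        if set(row) != set(self.schema):
            raise SchemaMismatchError(f"columns {sorted(row)} != {sorted(self.schema)}")
        for col, expected in self.schema.items():
            if not isinstance(row[col], expected):
                raise SchemaMismatchError(f"{col!r} expected {expected.__name__}")

    def append(self, batch: list[dict[str, Any]]) -> None:
        # Validate the whole batch before touching committed state, so one bad
        # row anywhere in the batch means nothing is written (atomicity).
        for row in batch:
            self._validate(row)
        self.rows.extend(batch)
        self.log.append(len(batch))


orders = ToyTable(schema={"id": int, "amount": float})
orders.append([{"id": 1, "amount": 9.99}, {"id": 2, "amount": 5.0}])
try:
    # The second row has a string id, so the entire batch is rejected.
    orders.append([{"id": 3, "amount": 1.0}, {"id": "4", "amount": 2.0}])
except SchemaMismatchError as err:
    print("rejected:", err)  # rejected: 'id' expected int
print(len(orders.rows), len(orders.log))  # 2 1
```

On Databricks itself, this behavior comes from Delta Lake tables, which reject an append whose schema does not match the table unless schema evolution is explicitly enabled (for example with the `mergeSchema` write option).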
In the lakehouse, storage is decoupled from compute, so each can be scaled and paid for independently, which lets you manage storage and compute costs efficiently.
Databricks separates its workspace into three planes: the control plane, the data plane, and the cloud storage plane.
a. Control Plane
The control plane is the administrative and management layer of the Databricks platform, responsible for tasks like cluster management, job scheduling, and security.
b. Data Plane
The data plane is where the actual data processing and computation happen, consisting of Apache Spark clusters provisioned and managed by the control plane.
c. Cloud Storage
The cloud storage plane refers to the data storage layer used by Databricks, which integrates with cloud storage services such as Amazon S3 or Azure Blob Storage.
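The division of labor between the three planes, and the decoupling of storage from compute, can be sketched as a toy Python model. Everything here is hypothetical and for illustration only: `ControlPlane`, `Cluster`, `CloudStorage`, and the doubling "job" are not Databricks APIs. The point is that the control plane only provisions and schedules, clusters do the processing, and data lives in storage that outlasts any cluster.

```python
from dataclasses import dataclass, field


@dataclass
class CloudStorage:
    # Stand-in for an object store such as Amazon S3 or Azure Blob Storage.
    objects: dict[str, list[int]] = field(default_factory=dict)


@dataclass
class Cluster:
    # Data plane: performs the processing but keeps no durable data itself.
    name: str
    storage: CloudStorage
    running: bool = True

    def run(self, src: str, dst: str) -> None:
        if not self.running:
            raise RuntimeError(f"cluster {self.name} is terminated")
        data = self.storage.objects[src]
        # Hypothetical transformation: double every value.
        self.storage.objects[dst] = [x * 2 for x in data]


@dataclass
class ControlPlane:
    # Control plane (in this toy): manages clusters and jobs, never reads data itself.
    storage: CloudStorage
    clusters: dict[str, Cluster] = field(default_factory=dict)

    def create_cluster(self, name: str) -> Cluster:
        cluster = Cluster(name, self.storage)
        self.clusters[name] = cluster
        return cluster

    def submit_job(self, cluster_name: str, src: str, dst: str) -> None:
        self.clusters[cluster_name].run(src, dst)

    def terminate(self, cluster_name: str) -> None:
        # Terminating compute leaves everything in storage untouched.
        self.clusters[cluster_name].running = False


storage = CloudStorage({"raw/events": [1, 2, 3]})
control = ControlPlane(storage)
control.create_cluster("etl")
control.submit_job("etl", "raw/events", "clean/events")
control.terminate("etl")
print(storage.objects["clean/events"])  # [2, 4, 6]

# A different cluster can pick up the same data later.
control.create_cluster("bi")
control.submit_job("bi", "clean/events", "gold/events")
print(storage.objects["gold/events"])  # [4, 8, 12]
```

Because the output of the `etl` cluster survives its termination, compute can be shut down whenever it is idle while the data stays available, which is the cost benefit of decoupling storage from compute.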