By Adrian Kok

📕 This guide can be used to study for the Databricks Certified Data Engineer Associate exam. The exam has 45 questions and requires a score of 70% to pass. The guide covers the major exam topics but is not comprehensive. For a full list of the topics that may appear on the exam, refer to the official Databricks certification exam guide, which can be found here.

Databricks Lakehouse Platform

What is the Databricks Lakehouse Platform?

A data lakehouse is a modern data architecture that brings data and workloads together on a single, centralized platform that serves all of its users. The lakehouse removes the need to depend on separate data lakes and data warehouses, each of which has its own limitations.

Databricks Lakehouse File Storage Format and Support

The Lakehouse stores data in an open standard storage format (Delta Lake, which is built on Parquet files) and supports low-latency BI workloads. It is powerful because it supports schema enforcement, ACID transactions, business intelligence applications, open file formats such as CSV, and end-to-end streaming use cases.
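Schema enforcement and atomic writes are easier to picture with a toy model. The sketch below is a hypothetical, in-memory `ToyTable` class written in plain Python; it is not the Delta Lake API and only illustrates the idea: every row in a batch is checked against the table's declared schema before anything is written, so a batch containing even one bad row is rejected as a whole, similar to an all-or-nothing commit. In Delta Lake itself, a write whose schema does not match the table is rejected with an error unless schema evolution is explicitly enabled.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical in-memory table illustrating schema enforcement.

    NOT the Delta Lake API; a teaching sketch only.
    """
    schema: dict[str, type]
    rows: list[dict] = field(default_factory=list)

    def append(self, batch: list[dict]) -> None:
        # Validate every row before writing anything, so a bad batch is
        # rejected in full (all-or-nothing, like an atomic commit).
        for row in batch:
            if set(row) != set(self.schema):
                raise ValueError(
                    f"schema mismatch: columns {sorted(row)} != {sorted(self.schema)}"
                )
            for col, expected in self.schema.items():
                if not isinstance(row[col], expected):
                    raise TypeError(
                        f"column {col!r}: expected {expected.__name__}, "
                        f"got {type(row[col]).__name__}"
                    )
        # Only reached when every row passed validation.
        self.rows.extend(batch)


orders = ToyTable(schema={"id": int, "amount": float})
orders.append([{"id": 1, "amount": 9.99}])  # matches the schema: accepted
try:
    # Second row uses a misspelled column, so the whole batch is rejected.
    orders.append([{"id": 2, "amount": 5.0}, {"id": 3, "amt": 1.0}])
except ValueError as err:
    print(err)
print(len(orders.rows))  # 1
```

Because validation happens before `extend`, the valid first row of the rejected batch is never written either, which is the behaviour a reader should associate with a transactional write rather than a row-by-row insert.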

Databricks Lakehouse Storage and Compute

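In the Lakehouse, storage is decoupled from compute, so each can be scaled and paid for independently. This lets you manage storage and compute costs efficiently; for example, clusters can be shut down when idle while the data stays in cloud storage.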

Databricks Workspace and Services

Databricks separates its workspace architecture into three planes: the control plane, the data plane, and the cloud storage plane.

a. Control Plane

The control plane is the administrative and management layer of the Databricks platform, responsible for tasks like cluster management, job scheduling, and security.

b. Data Plane

The data plane is where the actual data processing and computation happen, consisting of Apache Spark clusters provisioned and managed by the control plane.

c. Cloud Storage

The cloud storage plane refers to the data storage layer used by Databricks, which integrates with cloud storage services such as Amazon S3 or Azure Blob Storage.

Databricks Notebooks