Microsoft Fabric Lakehouse setup
The profiles.yml file is for dbt Core users only
If you're using dbt Cloud, you don't need to create a profiles.yml file. This file is only for dbt Core users. To connect your data platform to dbt Cloud, refer to About data platforms.
Below is a guide for use with Fabric Data Engineering, a new product within Microsoft Fabric. This adapter currently supports connecting to a lakehouse endpoint.
To learn how to set up dbt using Fabric Warehouse, refer to Microsoft Fabric Data Warehouse.
- Maintained by: Microsoft
- Authors: Microsoft
- GitHub repo: microsoft/dbt-fabricspark
- PyPI package: dbt-fabricspark
- Slack channel: db-fabric-synapse
- Supported dbt Core version: v1.7 and newer
- dbt Cloud support: Not supported
- Minimum data platform version: n/a
Installing dbt-fabricspark
Use pip to install the adapter. Before 1.8, installing the adapter would automatically install dbt-core and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install dbt-core. This is because adapters and dbt Core versions have been decoupled from each other, so we no longer want to overwrite existing dbt-core installations.
Use the following command for installation:
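For example, to install the adapter together with dbt Core from PyPI:

```shell
# installs dbt Core and the Fabric Spark adapter
python -m pip install dbt-core dbt-fabricspark
```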
Configuring dbt-fabricspark
For Microsoft Fabric-specific configuration, please refer to Microsoft Fabric configs.
For further info, refer to the GitHub repository: microsoft/dbt-fabricspark
Connection methods
dbt-fabricspark can connect to the Fabric Spark runtime using the Fabric Livy API. The Fabric Livy API allows submitting jobs in two different modes:
- Session jobs: A Livy session job establishes a Spark session that remains active for the duration of the session. A single Spark session can run multiple jobs (each job is an action), sharing state and cached data between jobs.
- Batch jobs: A Livy batch job submits a Spark application for a single job execution. Unlike a session job, a batch job doesn't sustain an ongoing Spark session; each batch job initiates a new Spark session that ends when the job finishes.
To share session state among jobs and reduce the overhead of session management, the dbt-fabricspark adapter supports only the session-jobs mode.
session-jobs
session-jobs is the preferred method when connecting to Fabric Lakehouse.
```yaml
your_profile_name:
  target: dev
  outputs:
    dev:
      type: fabricspark
      method: livy
      authentication: CLI
      endpoint: https://api.fabric.microsoft.com/v1
      workspaceid: [Fabric Workspace GUID]
      lakehouseid: [Lakehouse GUID]
      lakehouse: [Lakehouse Name]
      schema: [Lakehouse Name]
      spark_config:
        name: [Application Name]
        # optional
        archives:
          - "example-archive.zip"
        conf:
          spark.executor.memory: "2g"
          spark.executor.cores: "2"
        tags:
          project: [Project Name]
          user: [User Email]
        driverMemory: "2g"
        driverCores: 2
        executorMemory: "4g"
        executorCores: 4
        numExecutors: 3
      # optional
      connect_retries: 0
      connect_timeout: 10
      retry_all: true
```
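Once the profile is populated, you can typically confirm the connection with dbt's built-in debug command; the profile and target names below are the placeholders used in the example above:

```shell
# validates profiles.yml and attempts to connect to the configured lakehouse endpoint
dbt debug --profile your_profile_name --target dev
```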
Optional configurations
Retries
Intermittent errors can crop up unexpectedly while running queries against Fabric Spark. If retry_all is enabled, dbt-fabricspark will naively retry any query that fails, based on the configuration supplied by connect_timeout and connect_retries. It does not attempt to determine whether the query failure was transient or likely to succeed on retry. This configuration is recommended in production environments, where queries ought to be succeeding. The default connect_retries configuration is 2.
For instance, this will instruct dbt to retry all failed queries up to 3 times, with a 5-second delay between each retry:

```yaml
retry_all: true
connect_timeout: 5
connect_retries: 3
```
Spark configuration
Spark can be customized using application properties. These properties can be used to tune execution, for example, by allocating more memory to the driver process. The Spark SQL runtime can also be configured through these properties, for example, to register Spark catalogs.
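As a sketch under the profile structure shown above, application properties are passed through the conf block of spark_config; the specific property names and values here are illustrative Spark settings rather than requirements of the adapter:

```yaml
spark_config:
  name: [Application Name]
  conf:
    spark.driver.memory: "4g"            # allocate more memory to the driver process
    spark.sql.shuffle.partitions: "64"   # tune the Spark SQL runtime
    spark.sql.catalog.spark_catalog: "org.apache.spark.sql.delta.catalog.DeltaCatalog"  # example catalog setting
```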
Supported functionality
Most dbt Core functionality is supported. Please refer to Delta Lake interoperability.
Delta-only features:
- Incremental model updates by unique_key instead of partition_by (see the merge strategy and the sketch after this list)
- Snapshots
- Persisting column-level descriptions as database comments
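A minimal sketch of the first item above, configuring a model to use the merge incremental strategy with a unique_key; the project and model names are hypothetical placeholders:

```yaml
# dbt_project.yml (excerpt); hypothetical project and model names
models:
  my_fabric_project:
    my_incremental_model:
      +materialized: incremental
      +incremental_strategy: merge   # Delta-only strategy on this adapter
      +unique_key: id                # matched rows are updated rather than appended
```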
Limitations
- Lakehouse schemas are not supported. Refer to limitations for details.
- Service principal authentication is not yet supported by the Livy API.
- Only the Delta, CSV, and Parquet table formats are supported by Fabric Lakehouse.