Introduction to DBT Projects
DBT (Data Build Tool) is a framework that helps transform data in your data warehouse by enabling analytics engineers to work more effectively with SQL. Here are comprehensive notes on setting up a DBT project based on the provided transcripts.
Project Structure
Every DBT project has a standardized structure:
- Standard folders with specific naming conventions
- Global configuration files
- Models for your data transformations
Two Setup Methods
Method A: Using DBT Cloud (BigQuery)
-
Project Initialization:
- DBT Cloud IDE guides you through setup process
- Connection to git repository (GitHub in this course) is required
- IDE creates necessary folder structure and configuration files
-
Configuration:
- Project name defined in project YAML file
- Connection to BigQuery configured under account settings
- Subdirectory can be defined if creating project in existing repository
-
Workflow in DBT Cloud:
- Branch management through the cloud interface
- Work from a development branch, not main
- Initial commit creates basic "Hello DBT" project structure
Method B: Using DBT Core Locally (Postgres)
-
Prerequisites:
- Empty repository to store your project
- Local database (Postgres in this example)
- Appropriate adapter installed (postgres-adapter or bigquery-adapter)
-
Profile Configuration:
- Create
profiles.yml
file at~/.dbt/profiles.yml
- Define database connections (user, password, database, schema)
- Can define multiple targets (dev, prod) and even multiple database types
- Create
-
Project Initialization:
- Run
dbt init
command in your repository folder - Select database type when prompted
- Name your project (e.g., "taxi_rides_new_york")
- Run
Key Configuration Files
-
dbt_project.yml:
- Project name definition
- Profile selection (must match profiles.yml)
- Global configurations
- Default materializations (table/view) for models
- Custom variables
-
profiles.yml (for local setup):
- Located outside project at
~/.dbt/profiles.yml
- Contains database connection details
- Allows defining multiple targets and databases
- Sets number of threads for parallel execution
- Located outside project at
Initial Project Structure
After initialization, your project contains:
-
models/
directory with example models -
analyses/
directory for non-materialized SQL -
tests/
directory for data quality tests -
macros/
directory for reusable SQL snippets -
seeds/
directory for CSV data - Configuration files
Validation and Testing
- Run
dbt debug
to verify your connection settings - Run
dbt deps
to install dependencies - Run
dbt build
to verify the entire project works
Development Workflow
- Create a development branch (don't work directly on main)
- Initialize project structure
- Configure project and connection settings
- Start building models
- Commit changes to repository
Notes on Model Materialization
The project.yml file allows you to set default materializations:
- Tables: Persistent and fully materialized
- Views: Virtual tables that run their query on access
These can be applied globally or to specific directories within your models.
Connecting to Data Warehouse
- For BigQuery: Configure through DBT Cloud account settings
- For Postgres: Define connection in profiles.yml with credentials
Common Commands
-
dbt init
: Start a new project -
dbt debug
: Test connections and project configuration -
dbt deps
: Install dependencies -
dbt build
: Run all models, tests, and snapshots
These notes cover the fundamentals of setting up a DBT project using either the cloud interface with BigQuery or locally with Postgres.