Introduction to DBT Projects

DBT (Data Build Tool) is a framework that helps transform data in your data warehouse by enabling analytics engineers to work more effectively with SQL. Here are comprehensive notes on setting up a DBT project based on the provided transcripts.

Project Structure

Every DBT project has a standardized structure:

Standard folders with specific naming conventions
Global configuration files
Models for your data transformations

Two Setup Methods

Method A: Using DBT Cloud (BigQuery)

Project Initialization:
- DBT Cloud IDE guides you through setup process
- Connection to git repository (GitHub in this course) is required
- IDE creates necessary folder structure and configuration files
Configuration:
- Project name defined in project YAML file
- Connection to BigQuery configured under account settings
- Subdirectory can be defined if creating project in existing repository
Workflow in DBT Cloud:
- Branch management through the cloud interface
- Work from a development branch, not main
- Initial commit creates basic "Hello DBT" project structure

Method B: Using DBT Core Locally (Postgres)

Prerequisites:
- Empty repository to store your project
- Local database (Postgres in this example)
- Appropriate adapter installed (postgres-adapter or bigquery-adapter)
Profile Configuration:
- Create profiles.yml file at ~/.dbt/profiles.yml
- Define database connections (user, password, database, schema)
- Can define multiple targets (dev, prod) and even multiple database types
Project Initialization:
- Run dbt init command in your repository folder
- Select database type when prompted
- Name your project (e.g., "taxi_rides_new_york")

Key Configuration Files

dbt_project.yml:
- Project name definition
- Profile selection (must match profiles.yml)
- Global configurations
- Default materializations (table/view) for models
- Custom variables
profiles.yml (for local setup):
- Located outside project at ~/.dbt/profiles.yml
- Contains database connection details
- Allows defining multiple targets and databases
- Sets number of threads for parallel execution

Initial Project Structure

After initialization, your project contains:

models/ directory with example models
analyses/ directory for non-materialized SQL
tests/ directory for data quality tests
macros/ directory for reusable SQL snippets
seeds/ directory for CSV data
Configuration files

Validation and Testing

Run dbt debug to verify your connection settings
Run dbt deps to install dependencies
Run dbt build to verify the entire project works

Development Workflow

Create a development branch (don't work directly on main)
Initialize project structure
Configure project and connection settings
Start building models
Commit changes to repository

Notes on Model Materialization

The project.yml file allows you to set default materializations:

Tables: Persistent and fully materialized
Views: Virtual tables that run their query on access

These can be applied globally or to specific directories within your models.

Connecting to Data Warehouse

For BigQuery: Configure through DBT Cloud account settings
For Postgres: Define connection in profiles.yml with credentials

Common Commands

dbt init: Start a new project
dbt debug: Test connections and project configuration
dbt deps: Install dependencies
dbt build: Run all models, tests, and snapshots

These notes cover the fundamentals of setting up a DBT project using either the cloud interface with BigQuery or locally with Postgres.

Study Notes 4.2.1 | 4.2.2: DBT Project Setup