Study Notes 4.2.1 | 4.2.2: DBT Project Setup

Pizofreude - Feb 25 - - Dev Community

Introduction to DBT Projects

DBT (Data Build Tool) is a framework that helps transform data in your data warehouse by enabling analytics engineers to work more effectively with SQL. Here are comprehensive notes on setting up a DBT project based on the provided transcripts.

Project Structure

Every DBT project has a standardized structure:

  • Standard folders with specific naming conventions
  • Global configuration files
  • Models for your data transformations

Two Setup Methods

Method A: Using DBT Cloud (BigQuery)

  1. Project Initialization:
    • DBT Cloud IDE guides you through setup process
    • Connection to git repository (GitHub in this course) is required
    • IDE creates necessary folder structure and configuration files
  2. Configuration:
    • Project name defined in project YAML file
    • Connection to BigQuery configured under account settings
    • Subdirectory can be defined if creating project in existing repository
  3. Workflow in DBT Cloud:
    • Branch management through the cloud interface
    • Work from a development branch, not main
    • Initial commit creates basic "Hello DBT" project structure

Method B: Using DBT Core Locally (Postgres)

  1. Prerequisites:
    • Empty repository to store your project
    • Local database (Postgres in this example)
    • Appropriate adapter installed (postgres-adapter or bigquery-adapter)
  2. Profile Configuration:
    • Create profiles.yml file at ~/.dbt/profiles.yml
    • Define database connections (user, password, database, schema)
    • Can define multiple targets (dev, prod) and even multiple database types
  3. Project Initialization:
    • Run dbt init command in your repository folder
    • Select database type when prompted
    • Name your project (e.g., "taxi_rides_new_york")

Key Configuration Files

  1. dbt_project.yml:
    • Project name definition
    • Profile selection (must match profiles.yml)
    • Global configurations
    • Default materializations (table/view) for models
    • Custom variables
  2. profiles.yml (for local setup):
    • Located outside project at ~/.dbt/profiles.yml
    • Contains database connection details
    • Allows defining multiple targets and databases
    • Sets number of threads for parallel execution

Initial Project Structure

After initialization, your project contains:

  • models/ directory with example models
  • analyses/ directory for non-materialized SQL
  • tests/ directory for data quality tests
  • macros/ directory for reusable SQL snippets
  • seeds/ directory for CSV data
  • Configuration files

Validation and Testing

  • Run dbt debug to verify your connection settings
  • Run dbt deps to install dependencies
  • Run dbt build to verify the entire project works

Development Workflow

  1. Create a development branch (don't work directly on main)
  2. Initialize project structure
  3. Configure project and connection settings
  4. Start building models
  5. Commit changes to repository

Notes on Model Materialization

The project.yml file allows you to set default materializations:

  • Tables: Persistent and fully materialized
  • Views: Virtual tables that run their query on access

These can be applied globally or to specific directories within your models.

Connecting to Data Warehouse

  • For BigQuery: Configure through DBT Cloud account settings
  • For Postgres: Define connection in profiles.yml with credentials

Common Commands

  • dbt init: Start a new project
  • dbt debug: Test connections and project configuration
  • dbt deps: Install dependencies
  • dbt build: Run all models, tests, and snapshots

These notes cover the fundamentals of setting up a DBT project using either the cloud interface with BigQuery or locally with Postgres.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .