
Running the Jaffle Shop dbt Project in Docker

Photo by Ryan Howerter on Unsplash

If you're new to data build tool (dbt), you have probably come across the so-called Jaffle Shop, a project used for testing purposes.

jaffle_shop is a fictional ecommerce store. This dbt project transforms raw data from an app database into a customers and orders model ready for analytics.

Jaffle Shop GitHub project

One fundamental issue I noticed with the Jaffle Shop project is that it expects users, who may be newcomers to dbt, to configure and host a local database for the dbt models to materialize in.

In this tutorial, I'll demonstrate how to create a containerized version of the project using Docker. This will allow us to deploy a Postgres instance and configure the dbt project to read from and write to that database. I'll also share a link to a GitHub project I've created that will help you get all of the services up and running in no time.
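For orientation, here is the directory layout this setup assumes: the standard Jaffle Shop folders plus the Docker-related files we'll create below. The profiles/ directory is an assumption that matches the --profiles-dir flag used later; the rest follows the paths declared in dbt_project.yml.

# Project layout (sketch)
.
├── docker-compose.yml
├── Dockerfile
├── dbt_project.yml
├── packages.yml
├── profiles/
│   └── profiles.yml
├── models/
├── seeds/
├── tests/
└── macros/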

Creating the Dockerfile and docker-compose.yml

Let’s start by defining the companies we need to run by way of Docker. First, we’ll create a docker-compose.yml file the place we’ll outline two companies. The primary service would be the Postgres database, and the second can be a customized service that we’ll create within the subsequent step utilizing a Dockerfile.

# docker-compose.yml

version: "3.9"

services:
  postgres:
    container_name: postgres
    image: postgres:15.2-alpine
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
    ports:
      - 5432
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
  dbt:
    container_name: dbt
    build: .
    image: dbt-jaffle-shop
    volumes:
      - ./:/usr/src/dbt
    depends_on:
      postgres:
        condition: service_healthy

The file specifies the version of Docker Compose being used (version 3.9). It defines two services, postgres and dbt, each with their own settings.

The postgres service is based on the official postgres Docker image, version 15.2-alpine. It sets the container name to postgres, exposes port 5432 (the default port for Postgres) to the host machine, and sets environment variables for the Postgres user and password. The healthcheck section specifies a command to test whether the container is healthy, along with an interval, timeout and number of retries for the check.

The dbt service builds its Docker image from the current directory (using a Dockerfile) and tags it dbt-jaffle-shop. It mounts the current directory as a volume inside the container and declares a dependency on the postgres service, so it will only start once the postgres service is healthy.

In order to containerize the Jaffle Shop project, we need to create a Dockerfile that installs the necessary dependencies for both Python and dbt, and ensures that the container remains active once the environment has been set up.

# Dockerfile

FROM --platform=linux/amd64 python:3.10-slim-buster

RUN apt-get update \
  && apt-get install -y --no-install-recommends

WORKDIR /usr/src/dbt

# Install the dbt Postgres adapter. This step will also install dbt-core
RUN pip install --upgrade pip
RUN pip install dbt-postgres==1.2.0
RUN pip install pytz

# Install dbt dependencies (as specified in the packages.yml file)
# Build seeds, models and snapshots (and run tests wherever applicable)
CMD dbt deps && dbt build --profiles-dir ./profiles && sleep infinity

Configuring Postgres with dbt

To interact with dbt, we'll use the dbt Command Line Interface (CLI). A directory containing a dbt_project.yml file is considered a dbt project by the dbt CLI.

We'll create one and specify some basic configurations, such as the dbt project name and the profile to use (which we'll create in the next step). Additionally, we'll specify the paths containing the various dbt entities and provide configuration about their materialization.

# dbt_project.yml

name: 'jaffle_shop'

config-version: 2
version: '0.1'

profile: 'jaffle_shop'

model-paths: ["models"]
seed-paths: ["seeds"]
test-paths: ["tests"]
analysis-paths: ["analysis"]
macro-paths: ["macros"]

target-path: "target"
clean-targets:
  - "target"
  - "dbt_modules"
  - "logs"

require-dbt-version: [">=1.0.0", "<2.0.0"]

models:
  jaffle_shop:
    materialized: table
    staging:
      materialized: view

The profiles.yml file is used to store dbt profiles. A profile consists of targets, each of which specifies the connection details and credentials for the database or data warehouse.

# profiles.yml

jaffle_shop:
  target: dev
  outputs:
    dev:
      type: postgres
      host: postgres
      user: postgres
      password: postgres
      port: 5432
      dbname: postgres
      schema: public
      threads: 1

This file defines a profile named jaffle_shop that specifies the connection details for a Postgres database running in a Docker container named postgres.

  • jaffle_shop: This is the name of the profile. It is an arbitrary name chosen by the user to identify the profile.
  • target: dev: This specifies the default target for the profile, which in this case is named dev.
  • outputs: This section lists the output configurations for the profile, with the default output configuration named dev.
  • dev: This specifies the connection details for the dev target, which uses a Postgres database.
  • type: postgres: This specifies the type of the output, which in this case is a Postgres database.
  • host: postgres: This specifies the hostname or IP address of the Postgres database server.
  • user: postgres: This specifies the username used to connect to the Postgres database.
  • password: postgres: This specifies the password used to authenticate with the Postgres database.
  • port: 5432: This specifies the port number on which the Postgres database is listening.
  • dbname: postgres: This specifies the name of the Postgres database to connect to.
  • schema: public: This specifies the schema name to use when executing queries against the database.
  • threads: 1: This specifies the number of threads to use when running dbt tasks.
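Once the containers are up, you can check that this profile actually connects by running dbt's built-in connection test from inside the dbt container. This is a minimal sketch, assuming profiles.yml lives in a profiles/ directory as implied by the Dockerfile's CMD:

# Verify database connectivity using the profile above
dbt debug --profiles-dir ./profiles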

Jaffle Shop dbt models and seeds

The source data for the Jaffle Shop project consists of CSV files for customers, payments and orders. In dbt, we can load this data into our database via seeds. We then use this source data to build dbt models on top of it.
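A seed is simply a CSV file placed in the seeds/ directory. The snippet below is an illustrative sketch of what the customers seed might look like; the sample rows are made up, and the column names are inferred from the model shown further down rather than copied from the actual file.

# seeds/raw_customers.csv (illustrative sample)
id,first_name,last_name
1,Michael,P.
2,Shawn,M.

Running dbt seed (or dbt build) loads each CSV into the database as a table with the same name, which is why raw_customers, raw_orders and raw_payments show up as tables later on.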

Here's an example model that generates some metrics for our customers:

with customers as (

    select * from {{ ref('stg_customers') }}

),

orders as (

    select * from {{ ref('stg_orders') }}

),

payments as (

    select * from {{ ref('stg_payments') }}

),

customer_orders as (

    select
        customer_id,

        min(order_date) as first_order,
        max(order_date) as most_recent_order,
        count(order_id) as number_of_orders
    from orders

    group by customer_id

),

customer_payments as (

    select
        orders.customer_id,
        sum(amount) as total_amount

    from payments

    left join orders on
        payments.order_id = orders.order_id

    group by orders.customer_id

),

final as (

    select
        customers.customer_id,
        customers.first_name,
        customers.last_name,
        customer_orders.first_order,
        customer_orders.most_recent_order,
        customer_orders.number_of_orders,
        customer_payments.total_amount as customer_lifetime_value

    from customers

    left join customer_orders
        on customers.customer_id = customer_orders.customer_id

    left join customer_payments
        on customers.customer_id = customer_payments.customer_id

)

select * from final

Running the services with Docker

Now let's build and spin up our Docker services. To do so, we simply need to run the following commands:

$ docker-compose build
$ docker-compose up

The commands above will run a Postgres instance and then build the dbt resources of the Jaffle Shop as specified in the repository. These containers will remain up and running so that you can:

  • Query the Postgres database and the tables created from dbt models
  • Run further dbt commands via the dbt CLI
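If you want to confirm that the initial dbt build has finished before moving on, you can follow the logs of the dbt service (optional, just a convenience):

# Follow the dbt container's logs until the build completes
$ docker-compose logs -f dbt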

Running dbt commands via the CLI

The dbt container has already built the required models. However, we can still access the container and run dbt commands via the dbt CLI, either for new or modified models. To do so, we first need to access the container.

The following command will list all active containers:

$ docker ps

Copy the ID of the dbt container and enter it in the following command:

$ docker exec -it <container-id> /bin/bash

The command above will essentially give you access to the container's bash shell, which means you can now run dbt commands.

# Install dbt deps (might not be required as long as you have no -or empty- `packages.yml` file)
dbt deps

# Build seeds
dbt seed --profiles-dir profiles

# Build data models
dbt run --profiles-dir profiles

# Build snapshots
dbt snapshot --profiles-dir profiles

# Run tests
dbt test --profiles-dir profiles

Note that since we have mounted the local directory onto the running container, any changes in the local directory are reflected in the container immediately. This means you can also create new models or modify existing ones, then go into the running container and build models, run tests, and so on.
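For example, if you add a new model file (the model name below is hypothetical), you can build and test just that model from inside the dbt container using dbt's node selection syntax:

# Build only the newly added model (hypothetical file: models/customer_orders_summary.sql)
dbt run --select customer_orders_summary --profiles-dir profiles

# Run any tests defined for it
dbt test --select customer_orders_summary --profiles-dir profiles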

Querying the dbt models in the Postgres database

You can also query the Postgres database and the dbt models or snapshots created in it. In the same way, we must enter the running postgres container in order to query the database directly.

# Get the container ID for the `postgres` service
$ docker ps

# Then copy the container ID into the following command to enter the
# running container
$ docker exec -it <container-id> /bin/bash

We'll then use psql, a terminal-based interface for PostgreSQL that allows us to query the database:

$ psql -U postgres

The two commands shared below can be used to list tables and views respectively:

postgres=# \dt
           List of relations
 Schema |     Name      | Type  |  Owner
--------+---------------+-------+----------
 public | customers     | table | postgres
 public | orders        | table | postgres
 public | raw_customers | table | postgres
 public | raw_orders    | table | postgres
 public | raw_payments  | table | postgres
(5 rows)

postgres=# \dv
           List of relations
 Schema |     Name      | Type |  Owner
--------+---------------+------+----------
 public | stg_customers | view | postgres
 public | stg_orders    | view | postgres
 public | stg_payments  | view | postgres
(3 rows)

You can now query the dbt models via a SELECT query:

SELECT * FROM <table_or_view_name>;
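For instance, using the customers model built earlier (its columns are shown in the SQL above), you could list the top customers by lifetime value:

SELECT customer_id,
       number_of_orders,
       customer_lifetime_value
FROM customers
ORDER BY customer_lifetime_value DESC NULLS LAST
LIMIT 10;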

Getting the full code

I've created a GitHub repository that you can clone onto your local machine to quickly run the containerised version of the Jaffle Shop dbt project. You can find the project, as well as the code shared in this tutorial, at the following link.

Final Thoughts

Data build tool (dbt) is one of the fastest-growing technologies in modern data stacks. If you're just starting to learn how to use dbt, I highly recommend experimenting with the Jaffle Shop project. It's a self-contained project created by dbt Labs for testing and experimentation purposes.

dbt is a tool commonly used by data analysts and analytics engineers (in addition to data engineers), and it requires a connection to a database or data warehouse. However, many analysts might not be comfortable configuring and initializing a local database.

In this article, we demonstrated how to get started with dbt and run all of the services required to materialize dbt models in a local Postgres database. I hope this tutorial helps you get your dbt project and database up and running as quickly as possible. If you experience any issues running the project, please let me know in the comments, and I'll do my best to help you debug your code and configuration.

👉 Become a member and read every story on Medium. Your membership fee directly supports me and the other writers you read. You'll also get full access to every story on Medium.
