Terraform for dbt Users

If you know dbt, this post uses the mental model you already have for dbt to help you learn the basic concepts of Terraform. You will see where the two are similar and how you can apply the same patterns used in dbt to Terraform!

Why? Because making connections between what you're learning and what you already know is an effective way to wrap your head around a new topic.

Some comparisons may sound abstract, but that is OK. Hopefully you will gain an understanding of the patterns and ecosystems around both tools, which will make learning either one much easier.

Introduction

Over the past five years, two tools have fundamentally changed the game in their respective realms: Terraform for cloud infrastructure and dbt for data transformation.

The purpose of this post is to give you a basic understanding of what Terraform does and how many of the patterns we love about dbt also carry over to Terraform.

My personal goal is to get dbt users who don't know Terraform excited about it. I want you (if this is you) to be curious about how it can help you in your work and day-to-day life.

I will have another blog post soon on where Terraform fits into the current state of the data industry, but this is not that post. Consider this the preamble. It is meant to excite you and make you want to know how Terraform will improve your life.

Terraform

Terraform is a tool created by the bright minds at HashiCorp to solve problems in the then-new space of cloud infrastructure. It popularized the concept of infrastructure as code.

I remember when I started using AWS, I looked at my team and asked them:

Wait. You just… click some buttons and you get a computer for about $50,000 a month? And everyone else can replace that computer too? And if you want to prevent other people from messing with your stuff, you need to figure out IAM policies and roles? And you're all okay with this?

After some frustration I ran. Fast. Away from cloud infrastructure, because it seemed too daunting. It felt like a Swiss army knife made from a flamethrower.

To be precise, cloud infrastructure is any server, database, or other tool that a cloud service provider (CSP) lets you create. Examples of CSPs are companies like Amazon Web Services (AWS), Google Cloud Platform (GCP), or Azure (sorry, no abbreviation here).


Source: A historical image of me running from CSPs

When Terraform came along, I dipped my toes into infrastructure again. Here was a way to manage infrastructure as plain text: the same sweet, version-controlled text files we already kept all our code in with Git.

Terraform uses plugins called providers that let people create resources. Resources are the items we used to create with multiple button clicks and forms, such as virtual machines or private networks. What used to take several clicks and forms now took a few lines of Terraform and a terraform apply command, and I had an AWS EC2 virtual machine up and running.

The big magic of it all, though? Terraform knew if anything had changed and would adjust accordingly. If someone on the team renamed something, or if I added a new IP to an allow list, the change was put under version control and applied by CI/CD after the pull request was merged. All the tension and worry went away. I could just focus on building the good stuff.

Terraform Providers for dbt Users

If you use dbt, you probably follow an ELT pattern, use a cloud data warehouse, and have a number of tools in your data stack. Traditionally, Terraform has been used to manage cloud resources from a CSP.

However, Terraform is not limited to cloud resources. It can also be used to programmatically configure and manage the tools you use every day. There are Terraform providers for a lot of services and tools.

You can configure Snowflake/BigQuery access, Fivetran connectors (and sync schedules!), GitHub repository access, and dbt Cloud jobs. All of this is executed and managed in a code-based, CI/CD-run, and auditable manner!

Git (version control)

Both tools are fully version controlled. For people using dbt who used to write SQL, save it as stored procedures on the company's database, and find out weeks later that someone had changed it without telling them, version control keeps people happier, less stressed, and less hostile towards their peers.

Just as dbt makes it easy to store SQL in Git, Terraform does the same with configuration for infrastructure.

Being able to push changes to Git, because they are just text files, opens up a whole new world of possibilities. In pull requests, dbt and its ecosystem make it easy to run Continuous Integration (CI) to test changes and their impact on your models. With CI, Terraform makes it easy to run checks to ensure that the proposed configurations make sense, are valid, and are cost-effective.

Declarative

This is less an aspect of dbt and more about SQL. For those unfamiliar, SQL is known as a declarative language. The tl;dr is that declarative languages are programming languages where you tell the program what output you want, and it figures out the best way to get that answer. When you run a select statement, the database finds the fastest way to get the data you asked for. Most end users don't need to care about what's in memory vs. storage, how the data is partitioned, etc. The user only gets the result of the query.
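To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and its rows are made up for illustration. The first query states the result we want and leaves the "how" to the database; the loop underneath spells out every step by hand to reach the same answer.

```python
import sqlite3

# An in-memory database with a small, made-up orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ada", 10), ("bob", 5), ("ada", 7)],
)

# Declarative: describe the result; the database decides how to compute it.
declarative = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

# Imperative: walk the rows and do the grouping and sorting ourselves.
totals = {}
for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
    totals[customer] = totals.get(customer, 0) + amount
imperative = sorted(totals.items())

print(declarative)
print(imperative)
```

Both approaches print the same totals per customer. The difference is who had to think about the steps.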

Terraform is also declarative. You tell Terraform what you want to build, and it builds it for you. It figures out which API endpoints to hit, which configurations to set, where things are deployed, etc. As the end user, you don't have to figure out how it's done (most of the time). You press enter and reap the benefits.

Statefulness

If you've ever wandered into the /target directory of your dbt project, you may have seen a file named manifest.json. This file is a point-in-time snapshot of the metadata that makes up your dbt project. When you use Slim CI, --defer, or the state:modified selection method, dbt uses an existing manifest.json to find the difference between what you have changed and what was in the past manifest.json. The manifest.json most often used is the one from your most recent production build. By doing this, you can run or test only the models that have changed, for example when you have added a column or a test.
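The idea of comparing two manifests can be sketched in a few lines of Python. This is a toy model and not dbt's actual implementation: the manifests below are heavily simplified stand-ins (a real manifest.json holds far more metadata), and the model names and the modified_nodes helper are hypothetical.

```python
import json

# Simplified stand-ins for manifest.json: the last production build and the current branch.
production_manifest = json.loads("""
{"nodes": {
  "model.shop.customers": {"checksum": "aaa"},
  "model.shop.orders": {"checksum": "bbb"}
}}
""")
current_manifest = json.loads("""
{"nodes": {
  "model.shop.customers": {"checksum": "aaa"},
  "model.shop.orders": {"checksum": "ccc"},
  "model.shop.refunds": {"checksum": "ddd"}
}}
""")


def modified_nodes(current: dict, previous: dict) -> list[str]:
    """Return nodes that are new or whose checksum differs from the previous manifest."""
    old = previous["nodes"]
    return sorted(
        name
        for name, node in current["nodes"].items()
        if name not in old or old[name]["checksum"] != node["checksum"]
    )


to_run = modified_nodes(current_manifest, production_manifest)
print(to_run)
```

Here only the changed orders model and the brand-new refunds model are selected. The untouched customers model is skipped, which is the whole point of a state-aware run.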

DBT Cloud can defer to your production environment
Source: https://docs.getdbt.com/docs/deploy/cloud-ci-job

Terraform has a similar file, the state file, which this post calls tfstate.json (by default it is a JSON file named terraform.tfstate). This file is a complete representation of everything Terraform has created for you based on your Terraform files.

The real magic of Terraform is tfstate.json. When Terraform runs, it compares your current Terraform code to what the tfstate.json file says exists. It then executes actions (create, update, delete) for only the resources that were changed or did not exist. For example, if we change a property (e.g. the name), Terraform will update the resource in place rather than replacing it, if possible. If we don't change anything, Terraform will do nothing, because there is nothing to change. If we delete code, Terraform will destroy the resources that were created from that code. It will also tell you if any other resources depend on the resource you are deleting (similar to dbt's ref macro!).
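As a rough illustration of that comparison, here is a toy Python model of a plan step. It is a sketch of the concept only and not how Terraform is implemented: resources are plain dictionaries, and the plan and dependents_of helpers, the resource names, and the depends_on list are all hypothetical.

```python
def plan(desired: dict[str, dict], state: dict[str, dict]) -> dict[str, str]:
    """Compare the desired config with the recorded state and pick an action per resource."""
    actions = {}
    for name in sorted(desired.keys() | state.keys()):
        if name not in state:
            actions[name] = "create"   # in code, not yet built
        elif name not in desired:
            actions[name] = "delete"   # built earlier, code since removed
        elif desired[name] != state[name]:
            actions[name] = "update"   # built, but a property changed
        else:
            actions[name] = "no-op"    # nothing to do
    return actions


def dependents_of(name: str, state: dict[str, dict]) -> list[str]:
    """List resources in state that declare a dependency on `name`."""
    return sorted(r for r, cfg in state.items() if name in cfg.get("depends_on", []))


# What the state file says exists.
state = {
    "network": {"cidr": "10.0.0.0/16"},
    "vm": {"name": "old-name", "depends_on": ["network"]},
    "bucket": {"name": "logs"},
}
# What the code currently describes: vm renamed, network removed, database added.
desired = {
    "vm": {"name": "new-name", "depends_on": ["network"]},
    "bucket": {"name": "logs"},
    "database": {"engine": "postgres"},
}

actions = plan(desired, state)
print(actions)
print(dependents_of("network", state))
```

The unchanged bucket is left alone, the renamed vm is updated, the new database is created, and the removed network is marked for deletion. The second call flags that the vm still depends on that network, which is the kind of warning described above.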

Diff from terraform plan
Source: https://stackoverflow.com/questions/43532785/can-terraform-plan-show-me-a-json-diff-for-a-changed-resource

This magic is called statefulness. Statefulness is the property of a program knowing what state it is in (or used to be in) and being able to act on that knowledge. In the case of dbt, that means running only what has changed. In the case of Terraform, it means creating only the resources that did not already exist or, in this example, removing resources that no longer exist in the Terraform code.

Quick Bit: Similar Commands

dbt compile creates a manifest.json without running your dbt project.

terraform plan creates a plan (called tfplan.state here, and written to disk when you pass the -out option), which lists the changes Terraform proposes to make for your Terraform project.

dbt run creates the manifest.json file and builds your dbt models.

terraform apply creates the tfplan.state plan, tells you what changes it will make, and then executes them.

Separate Environments

One of the software engineering conventions that dbt made accessible to people working with SQL was the concept of a development environment. A development environment is a self-contained space where individuals can iterate, tinker, and do their work. This means people can work without potentially breaking the production environment used by their stakeholders. Everyone has their own environment and can work on their tasks in parallel without stepping on each other's toes.

Terraform can also use development environments to provision resources before building them in production. Development and staging environments are important in a Terraform workflow because they offer a low-risk opportunity to spin up resources and test whether they work as intended.

Managed Services + Cloud Offerings

In today’s day and age, everything has a managed service or a cloud-hosted option. Terraform and DBT are no exception.

dbt users, are you feeling excited about Terraform yet? How about you, Terraform users? A powerful part of both tools is that they can be taken into production really quickly and easily through their cloud offerings. Terraform Cloud and dbt Cloud are services that teams can sign up for to make the base versions of the products even more feature-rich and easier to use.

In DBT Cloud, you can run DBT on a schedule and quickly set up Continuous Integration (CI) with the press of a few buttons.

DBT Cloud CI Integration
Source: https://docs.getdbt.com/docs/deploy/cloud-ci-job

In Terraform Cloud, you get an easy way to manage the tfstate.json file, format and test your Terraform changes in CI, and run Terraform as needed.

Removing Terraform Cloud Resources

The best part? The most basic tiers of both services are free. You can sign up for either one today and start adding value to your organization right away.

Conclusion / Call to Action

Excitement. I wrote about this feeling at the top of this post, and I hope you feel it now. If you previously feared how overwhelming Terraform could be, I hope this post allayed that fear and replaced it with excitement. I want this to be the start of getting as comfortable with Terraform as you are with dbt.

If you're looking to get started with Terraform today and aren't familiar with a CSP, a great starting point is using Terraform to manage SaaS and cloud-hosted services. In the "Terraform Providers for dbt Users" section, I mentioned that many of the tools you may already use have Terraform providers.

I believe the best way to learn Terraform is in the context of something you already know. That was the inspiration for this post: to understand the concepts of Terraform using the same patterns and concepts that you know from dbt.

The next step is to apply those concepts.

And keep an eye out for my next post to see why applying these skills is important to advancing the data industry.
