If you thought the title of this article was just a bunch of mental word vomit, well, it kind of is. But the words all have meaning, and I wanted to quickly talk about a key issue that I have continued to come across in my journey with Terraform: namely, getting things to apply consistently and successfully, particularly when introducing things that make end use of your custom modules easier and your code more succinct and versatile, like conditional expressions and meta-arguments such as "count" and "for_each".

Thinking about Idempotency… Reproducibility
While the word “idempotent” sounds like a theological mouthful, for IaC and Terraform I am using it with the idea that I should be able to run my Terraform code the same way, more than once, and get the same results. I realize this isn’t the purest definition of the word… STOP

Yeah, that's just the wrong definition of the word. Something I realized during my third attempt at editing this article before posting… What I actually care about is "reproducibility." Anyhow, we all have blind spots, hah. Apparently this is a common one; this article was particularly helpful if you want an easy-to-read explanation of the differences… Continuing on…

I want consistent, expected results from everything that I create: every time and in every situation that my module, function, or piece of code is called, the same expected results occur. Whether that's a subsequent "Plan/Apply" on the same workspace with no changes (this is probably closer to idempotency, so maybe we are still flirting with the term) or a brand-new deployment using the same parameters (this, however, is firmly within the realm of 'reproducibility').

This all sounds straightforward, but at times it can actually be very hard to nail down. It's a fairly common goal in most programming, and I would go so far as to say it is bedrock, foundational, critical, cornerstone (and whatever other weighty term you can think of) when it comes to IaC. When you run a Terraform Apply or an ARM deployment, you need to get EXACTLY what you are expecting, how you are expecting it, every single time, because we are dealing with systems and networking and storage. IaC is doubly fun in this regard because once you get outside of the lab you find your code being used to do deployments in living, breathing environments where things frequently change.

This all becomes 100x more critical if you are modifying existing infrastructure and/or working in an established production environment. If there is something worth losing a bit of sleep over (at least for the conscientious nerds among us), it’s that worry that my code is going to be used and do something totally unexpected and tear things apart. (AND ALSO worthy of tossing and turning… the difference between ‘idempotency’ and ‘reproducibility.’)

::Dismount Soapbox::

Flexible Code
Keeping things simple is often the easiest way to get consistent results. Within Terraform module development, though, there is the idea of creating "swiss army knife" modules which can account for multiple use cases, and that is often at odds with the goal of keeping things simple. Granted, as a module's complexity goes up, the amount of code elsewhere can often go down. A basic example is a module that creates an Azure Storage Account: one use case may require that HTTPS be enforced and the redundancy level set to GRS, whereas another use case may allow the use of HTTP and only require LRS. If my module can account for both use cases (vs. being purpose-built for only one), then my overall code base gets smaller and easier to maintain.

Relatively straightforward use cases such as the above can typically be handled by the addition of a single input and some variable substitution. Additionally, setting defaults for the most "common" use case ensures that employing the module is still an undemanding exercise for the end user.
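To make that concrete, here is a rough sketch of what that kind of module input might look like. The variable names and defaults are just mine for this post (not from any particular real module), and "enable_https_traffic_only" is the argument name from the 3.x azurerm provider; newer provider versions have renamed it, so check the docs for your version.

variable "enforce_https" {
  type    = bool
  default = true   # the most common use case gets the default
}

variable "replication_type" {
  type    = string
  default = "GRS"  # geo-redundant unless the caller says otherwise
}

resource "azurerm_storage_account" "example" {
  name                      = "stexampleblogdemo"
  resource_group_name       = "rg_example"
  location                  = "East US"
  account_tier              = "Standard"
  account_replication_type  = var.replication_type
  enable_https_traffic_only = var.enforce_https
}

The caller with the relaxed requirements passes enforce_https = false and replication_type = "LRS"; everyone else simply omits both inputs and gets the defaults for free.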

However, as you go "further up and further in" with Terraform, you will quickly run into scenarios that call for more complex and creative solutions. One method of achieving enhanced flexibility is to combine "conditional" expressions with the "count" or "for_each" meta-arguments.

A Simple, Unrealistic Example

variable "deploy_us" {
  type = bool
}

locals {
  geo_where_deployed = var.deploy_us ? "USA" : "EUROPE"
}

resource "azurerm_resource_group" "example_us" {
  count     = var.deploy_us ? 1 : 0
  name      = "rg_example_${local.geo_where_deployed}"
  location  = "East US"
}

resource "azurerm_resource_group" "example_eu" {
  count     = var.deploy_us ? 0 : 1
  name      = "rg_example_${local.geo_where_deployed}"
  location  = "West Europe"
}

output "geo_where_deployed" {
  value = local.geo_where_deployed
}

The above construct, when wrapped in a module, allows for very simple operation: one resource group or the other gets deployed based on the true/false value of var.deploy_us.


A module call for the above would look something like:

module "resource_group_us" {
  source = "../azure_resource_group/"
  deploy_us = true
}

Again, I will reiterate that the above makes no real-world sense. I am using an Azure resource group because it is very basic and easy to discuss in a blog post. It's a simple demonstration of a design method that uses "count" to provide versatility in a module (and not something you would ever actually write outside of an article). However, my experience has been that whenever you use meta-arguments like "count" with conditional expressions, you are introducing a requirement into your module that isn't immediately apparent: "for_each" and "count" require that their values be known at "Plan" time. When the plan phase runs, Terraform must be able to determine then and there whether (in our case) count is equal to a 1 or a 0. If Terraform can't make that determination, the plan fails. This DOESN'T mean that the determining values ALWAYS have to be literals; you can use typical HCL interpolation and be just fine.


Here is our module call again (this would work).

locals {
  deploy_us = true
  flip_deploy_us = local.deploy_us ? false : true
}

module "resource_group_us" {
  source = "../azure_resource_group/"
  deploy_us = local.deploy_us
}

module "resourge_group_eu" {
  source = "../azure_resource_group/"
  deploy_us = local.flip_deploy_us
}

Terraform is gonna have to do a couple of logical bunny hops, but at the end of the day it can determine during the plan phase what is going to happen -> so all is well…


Here is a twist on the above example where you get into hot water. (this would fail)

locals {
  deploy_us = true
  flip_deploy_us = local.deploy_us ? false : true
}

module "resource_group_us" {
  source = "../azure_resource_group/"
  deploy_us = local.deploy_us
}

module "resourge_group_eu" {
  source = "../azure_resource_group/"
  deploy_us = module.resource_group_us.geo_where_deployed == "EUROPE" ? 0 : 1
}

The interpolation and the use of a conditional are fine. The problem with the above is that we are using the resultant OUTPUT of the first module call as an input to the second module call… and that input is making its way over to the COUNT meta-argument, so Terraform is gonna give you a very loud peal on a very sad trombone. You may protest that any developer could clearly look at what is going on, easily "plan" it out in their head, and determine exactly what is going to happen. That really doesn't matter, because Terraform can't and/or won't, and HashiCorp has really good reasons for playing it extremely safe with these meta-arguments: "count" and "for_each" decide how many resource instances exist in the first place, so Terraform has to be able to resolve them before it can finish building the plan.


So What?
This kind of thing is obvious and easy to spot in simple projects (or nonsense article examples), but it starts getting tricky when you begin layering both your Terraform code (i.e. modules calling modules) and your deployments (i.e. deployments referencing other deployments, either directly via shared state or indirectly via data sources).

Case in point… a slightly more complex example is refactoring some Terraform IaC that is currently split across two deployment layers. In a split scenario (i.e. two separate deployments), referencing an output from the first deployment as an input to a module in the second works just fine. The reason is readily apparent when you are reading an article on the topic: obviously "deployment 1" is already laid down before "deployment 2" (with your module call) ever starts its plan. Then a month goes by and you decide to refactor and combine those deployments into a single deployment. You recode your module call to take the input directly from another module instead of from a deployment's remote state output or a data source lookup. Oops, your module can no longer be used, because the referenced resources no longer exist during your plan phase, and you now have some extra work on your hands.
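To illustrate the shape of that refactor, here is a rough sketch. Everything here is hypothetical: the "../network_layer/" and "../monitoring/" modules, the "firewall_enabled" output, and the "deploy_watchers" input are all made up for this post.

# Before the refactor: two deployments. The value is read from layer 1's
# already-applied state, so it is known during layer 2's plan phase.
data "terraform_remote_state" "network" {
  backend = "local"   # could just as easily be azurerm, remote, etc.
  config = {
    path = "../network_layer/terraform.tfstate"
  }
}

module "monitoring" {
  source          = "../monitoring/"
  deploy_watchers = data.terraform_remote_state.network.outputs.firewall_enabled
}

# After the refactor: one combined deployment. The same value now comes
# from a sibling module. If "firewall_enabled" is derived from a resource
# attribute, it is unknown during a fresh plan, and any count/for_each
# keyed off "deploy_watchers" inside the module will fail.
module "network" {
  source = "../network_layer/"
}

module "monitoring" {
  source          = "../monitoring/"
  deploy_watchers = module.network.firewall_enabled
}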

Another example… you have a single deployment that you are adding new resources to, and you are doing iterative plan/applies. You add your module into the deployment, run a plan and an apply, and all goes well. You then copy that entire deployment (which now includes your module) into a new workspace to lay it down again, run a plan, and the plan fails. It's a very similar situation to the example above: when you were making changes iteratively, everything checked out because the infrastructure you were referencing was already laid down and available during the plan phase. But when you go to a new, clean environment to lay the whole thing down from start to finish, it fails because the referenced resources haven't been created yet.
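A sketch of that second trap (again, the "../diagnostics/" module and its "enable_diagnostics" input are made up for illustration): a module input derived from an attribute of a resource in the same deployment. On an iterative plan the resource group's id is already sitting in state, so the expression resolves; in a brand new workspace it is "known after apply" and any count inside the module that depends on it cannot be resolved.

resource "azurerm_resource_group" "shared" {
  name     = "rg_shared_services"
  location = "East US"
}

module "diagnostics" {
  source = "../diagnostics/"

  # Works when the resource group already exists in state; fails on a
  # clean deployment because the id is not known until apply, and the
  # module uses this input to drive a count.
  enable_diagnostics = azurerm_resource_group.shared.id != "" ? true : false
}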


Conclusion
To sum up, if you are mixing the "count" or "for_each" meta-arguments with conditional expressions, you should be wary of the scenarios above and document your modules and deployments well. Long-lived deployments will, more than likely, grow and change over time, and you could end up with a code base that cannot easily be re-deployed from scratch and/or refactored later on.

One additional bonus note: while "count" and "for_each" rely on values that can be determined during the plan phase, that does not exclude using logic or other string interpolation to arrive at their values. You can even use data source lookups, because those lookups are actually done during the plan phase. You still have to work within the confines of what Terraform can determine ahead of time, but it does open up a whole creative world for making your modules and deployments infinitely more flexible and user friendly. As you do so, though, you must document well, or you risk obscuring the logic, which lets issues like the above get buried and become very hard to spot.
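As a quick illustration of that bonus note (the "production_tenant_id" variable is just something I made up here), a data source's attributes can drive "count" because the lookup happens during the plan:

variable "production_tenant_id" {
  type = string
}

data "azurerm_client_config" "current" {}

resource "azurerm_resource_group" "prod_only" {
  # The data source is read during the plan, so Terraform can resolve
  # this comparison and settle on a 1 or a 0 before anything is applied.
  count    = data.azurerm_client_config.current.tenant_id == var.production_tenant_id ? 1 : 0
  name     = "rg_prod_only"
  location = "East US"
}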
