Terragrunt Notes

aka "how to develop the terraform 12+ stuff"

Local cache of providers

NOTE: this doesn't work well with provider locking in TF14+. I recommend you disable this if you've enabled it.

(Struck out, no longer recommended.) Helpful tip: speed up provider downloads by adding the following to your ~/.bashrc:

export TF_PLUGIN_CACHE_DIR=~/.terraform.d/plugin-cache
[[ -d "$TF_PLUGIN_CACHE_DIR" ]] || mkdir -p "$TF_PLUGIN_CACHE_DIR"
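Since the note above recommends turning the cache off when provider locking is in use, a quick check like the following could confirm whether the environment variable is active in the current shell. This is a minimal sketch; the function name plugin_cache_status is made up for illustration, and it only looks at the environment variable, not at any CLI config file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether TF_PLUGIN_CACHE_DIR is set in this shell.
plugin_cache_status() {
  if [ -n "${TF_PLUGIN_CACHE_DIR:-}" ]; then
    echo "enabled: ${TF_PLUGIN_CACHE_DIR}"
  else
    echo "disabled"
  fi
}

plugin_cache_status

# To disable it for the current shell only (also remove the export
# from ~/.bashrc to make that permanent):
#   unset TF_PLUGIN_CACHE_DIR
```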

Renaming Directories/Resources

General process:

  1. Make sure everything's up to date.
  2. Move the remote state.
  3. Update the configuration.
  4. Rename the directory.
  5. Make sure terragrunt applies cleanly. (The apply updates all the tags, so expect lots of changes to review.)

For this example, I was renaming 010-standard-vpc to 010-vpc-splunk in test/aws-us-gov/mdr-test-modelclient.

cd 010-standard-vpc/
# clear out cache to make our lives easier
rm -rf .terragrunt-cache

# validate that we're on latest code
terragrunt-local apply

# Get the `bucket` and `key` values
cat `find . -name 'backend.tf'`

# In this example:
#   bucket         = "afsxdr-terraform-state"
#   key            = "aws/test/aws-us-gov/mdr-test-modelclient/010-standard-vpc/terraform.tfstate"
aws --profile mdr-common-services-gov \
  s3 mv \
   s3://afsxdr-terraform-state/aws/test/aws-us-gov/mdr-test-modelclient/010-standard-vpc/terraform.tfstate \
   s3://afsxdr-terraform-state/aws/test/aws-us-gov/mdr-test-modelclient/010-vpc-splunk/terraform.tfstate

# move and rename
cd ..
git mv 010-standard-vpc 010-vpc-splunk
cd 010-vpc-splunk

# Apply again. NOTE: the only changes should be to the tags.
# Do not accept any other changes, or you will end up with extra resources.
rm -rf .terragrunt-cache
terragrunt-local apply

If you get:

Error refreshing state: state data in S3 does not have the expected content.

You forgot to rename the directory you're working in.
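The old and new state paths differ only in the directory name, so the move command can be generated instead of hand-edited. The following is a rough sketch of that idea; state_url and print_state_mv are hypothetical helper names, and the command is only printed so it can be reviewed before running.

```shell
#!/usr/bin/env bash
# Sketch: build the S3 URLs for a state move when renaming a directory.
# state_url and print_state_mv are hypothetical helper names.

# Usage: state_url <bucket> <key-prefix> <directory>
state_url() {
  printf 's3://%s/%s/%s/terraform.tfstate\n' "$1" "$2" "$3"
}

# Usage: print_state_mv <profile> <bucket> <key-prefix> <old-dir> <new-dir>
# Prints the command rather than running it.
print_state_mv() {
  local profile=$1 bucket=$2 prefix=$3 old=$4 new=$5
  printf 'aws --profile %s s3 mv %s %s\n' \
    "$profile" \
    "$(state_url "$bucket" "$prefix" "$old")" \
    "$(state_url "$bucket" "$prefix" "$new")"
}

# Values taken from the example above.
print_state_mv mdr-common-services-gov afsxdr-terraform-state \
  aws/test/aws-us-gov/mdr-test-modelclient 010-standard-vpc 010-vpc-splunk
```

Printing first keeps the destructive step (the actual s3 mv) a deliberate copy and paste.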

GitFlow Notes

These notes will walk you through the Terragrunt git flow for making changes.

  • Branch from master to your own branch
  • update your local xdr-terraform-live repo with the expected new tag (so you don't forget to do it when you are done)
  • make changes to xdr-terraform-modules
  • make changes to xdr-terraform-live
  • increment the ref=v0.x.x in your terragrunt.hcl
  • use terragrunt-local to try the changes
  • ( did you run the saml command to login?)
  • use tgswitch to change versions
  • rm -rf .terragrunt-cache to resolve "strange" errors
  • commit the changes to your branch
  • push the new branch to GitHub
  • get the PR approved and merged in
  • tag master with the latest tag that is set in terragrunt.hcl
  • verify it is working in TEST without terragrunt-local
  • deploy to PROD
  • delete github branch and close jira ticket
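For the tagging step, it may help to read the tag straight out of terragrunt.hcl so the git tag cannot drift from the ref. A minimal sketch follows; current_ref is a hypothetical helper, the remote name origin is an assumption, and the tag presumably belongs on the repository that ref= points at.

```shell
#!/usr/bin/env bash
# Sketch: read the module tag from terragrunt.hcl so the git tag matches it.
# current_ref is a hypothetical helper name.

# Usage: current_ref <path-to-terragrunt.hcl>
# Prints the first ref=vX.Y.Z value found, without the "ref=" prefix.
current_ref() {
  grep -o 'ref=v[0-9][0-9.]*' "$1" | head -n 1 | cut -d= -f2
}

# Example (after the PR is merged, on an up-to-date master):
#   tag=$(current_ref terragrunt.hcl)
#   git tag "$tag" && git push origin "$tag"
```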

Destroy instances

TF_VAR_instance_termination_protection=false terragrunt apply
TF_VAR_instance_termination_protection=false terragrunt destroy

tfswitch.toml

colby-williams taught me: cp -ar to copy symlinks correctly.

ln -s ../../../../.tfswitch.toml .

ls -larth
# .tfswitch.toml -> ../../../../.tfswitch.toml
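The cp -a behaviour can be checked in a scratch directory before trusting it on a real module. The sketch below uses throwaway paths created with mktemp; nothing in it touches the repo. Note that -a already implies recursion, so -ar and -a behave the same.

```shell
#!/usr/bin/env bash
# Demonstrates in a scratch directory that cp -a keeps a symlink a symlink.
scratch=$(mktemp -d)
mkdir -p "$scratch/src" "$scratch/dst"
touch "$scratch/.tfswitch.toml"
ln -s ../.tfswitch.toml "$scratch/src/.tfswitch.toml"

# Copy the contents of src (including dotfiles) into dst.
cp -a "$scratch/src/." "$scratch/dst/"

# Prints the link target, showing the copy is still a relative symlink.
readlink "$scratch/dst/.tfswitch.toml"
```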

2021-04-29: State Issues

When running terragrunt apply, got the following:

Initializing the backend...
Error refreshing state: state data in S3 does not have the expected content.

This may be caused by unusually long delays in S3 processing a previous state
update.  Please wait for a minute or two and try again. If this problem
persists, and neither S3 nor DynamoDB are experiencing an outage, you may need
to manually verify the remote state and update the Digest value stored in the
DynamoDB table to the following value: ec9c9183a070f5ad59b9abd524810c06

The remote state looks uncorrupted:

cd ~/xdr-terraform-live/prod/aws-us-gov/mdr-prod-c2/160-splunk-indexer-cluster
find .terragrunt-cache -name 'backend.tf'
# Use the filename found and view the contents
cat .terragrunt-cache/tC_aGEvkrKzsZjSw0YQum-A6YL8/Ipji28Trjy_fymLhd4EZgtAe8xg/base/splunk_servers/indexer_cluster/backend.tf
# Use the bucket and key to form the S3 path, then copy the state locally:
aws --profile mdr-common-services-gov s3 cp s3://afsxdr-terraform-state/aws/prod/aws-us-gov/mdr-prod-c2/160-splunk-indexer-cluster/terraform.tfstate .
less -iS terraform.tfstate

To fix:

  1. Log into the AWS console for the mdr-common-services-gov account and open the DynamoDB service.
  2. Go to Tables -> Items.
  3. Change the dropdown to 'Query'.
  4. For LockID, enter: afsxdr-terraform-state/aws/prod/aws-us-gov/mdr-prod-c2/160-splunk-indexer-cluster/terraform.tfstate-md5 (the bucket and key from above, with -md5 appended)
  5. Record the old digest: 9cb9cbfdda
  6. Insert the digest from the error message: ec9c9183a0
  7. Run terragrunt refresh
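The same console steps could probably be done from the AWS CLI. The sketch below is untested against this account: LOCK_TABLE is a placeholder because the lock table name is not recorded in these notes, and LockID and Digest are the attribute names Terraform's S3 backend uses for its lock table. A region flag may also be needed.

```shell
#!/usr/bin/env bash
# Sketch: CLI equivalent of the console steps above.
# LOCK_TABLE is a placeholder; the real table name is not in these notes.
LOCK_TABLE="${LOCK_TABLE:-your-lock-table}"
PROFILE="mdr-common-services-gov"

# Usage: lock_id <bucket> <key>   -> "<bucket>/<key>-md5"
lock_id() {
  printf '%s/%s-md5\n' "$1" "$2"
}

# Usage: show_digest <bucket> <key>   (read-only; record the old digest)
show_digest() {
  aws --profile "$PROFILE" dynamodb get-item \
    --table-name "$LOCK_TABLE" \
    --key "{\"LockID\": {\"S\": \"$(lock_id "$1" "$2")\"}}"
}

# Usage: set_digest <bucket> <key> <new-digest>
set_digest() {
  aws --profile "$PROFILE" dynamodb update-item \
    --table-name "$LOCK_TABLE" \
    --key "{\"LockID\": {\"S\": \"$(lock_id "$1" "$2")\"}}" \
    --update-expression 'SET #d = :d' \
    --expression-attribute-names '{"#d": "Digest"}' \
    --expression-attribute-values "{\":d\": {\"S\": \"$3\"}}"
}
```

Run show_digest first and save its output, so the old value can be restored if the new digest turns out to be wrong.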

TF 0.14 / The Provider Lock File

With TF 0.14, Terraform creates a provider dependency lock file (.terraform.lock.hcl) to prevent inadvertent drift of provider versions. This requires some additional management.

  • On first run of a module, create the provider lock file for multiple platforms by running terragrunt-providers (which is just a bash script that runs some cleanup and then runs terragrunt providers lock -platform=darwin_amd64 -platform=linux_amd64 -platform=windows_amd64 -platform=linux_arm64).
  • If you need an extra provider, override the generation of required_providers.tf in your terragrunt.hcl file for the module. The override must include the providers from the root terragrunt.hcl that are used within your module. For an example, see xdr-terraform-live/common/aws-us-gov/afs-mdr-common-services-gov/085-codebuild-ecr-customer-portal/terragrunt.hcl
  • To regenerate or upgrade providers, deleting the lock file and rerunning terragrunt-providers should work (I have not verified this); terragrunt init --upgrade also updates the locked versions.
  • There are possible compatibility issues with TF_PLUGIN_CACHE_DIR. Try disabling it if you have trouble getting hashes.
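The real terragrunt-providers script is not reproduced in these notes, so the following is only a guess at its shape: some cleanup, then the providers lock call with the four platforms listed above. The cleanup targets and the function names platform_flags and lock_providers are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction of a terragrunt-providers style wrapper.
# The real script is not shown in these notes; the cleanup targets are guesses.

PLATFORMS="darwin_amd64 linux_amd64 windows_amd64 linux_arm64"

# Prints one -platform=... flag per platform, space separated.
platform_flags() {
  local flags="" p
  for p in $PLATFORMS; do
    flags="$flags -platform=$p"
  done
  echo "${flags# }"
}

lock_providers() {
  # Assumed cleanup: drop the old lock file and the terragrunt cache.
  rm -rf .terraform.lock.hcl .terragrunt-cache
  # Word splitting of the flags is intentional here.
  # shellcheck disable=SC2046
  terragrunt providers lock $(platform_flags)
}

# Usage (from the module directory):
#   lock_providers
```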

Could not load plugin

If you get:

Substituting 'git@github.xdr.accenturefederalcyber.com:mdr-engineering/xdr-terraform-modules.git//base/sensu-configuration' with '../../../../../xdr-terraform-modules//base/sensu-configuration'
Acquiring state lock. This may take a few moments...
Releasing state lock. This may take a few moments...
╷
│ Error: Could not load plugin
│
│
│ Plugin reinitialization required. Please run "terraform init".
│
│ Plugins are external binaries that Terraform uses to access and manipulate
│ resources. The configuration provided requires plugins which can't be
│ located,
│ don't satisfy the version constraints, or are otherwise incompatible.

It means that you've run the module before with an earlier version of the plugin. To fix, run:

terragrunt init --upgrade

(Or terragrunt-local if appropriate)

Static Code Analysis

We can do some good static code analysis with some standard tools. On OS X run:

brew install tflint tfsec checkov

These can be enabled/enforced during terragrunt via xdr-terraform-live/terragrunt.hcl in the section labeled Apply Static Code Analysis, which should be somewhat self-explanatory.

Ignoring Findings

You can easily ignore findings from tflint, tfsec, or checkov by adding comments to the code.

Terragrunt formatting

terragrunt hclfmt

This command will make changes to your files!

tflint

Run these commands in the module folders.

terraform fmt

This command will make changes to your files!

To ignore a finding from tflint, add a comment like the following (this one from xdr-terraform-modules/base/splunk_servers/indexer_cluster/elb-with-acks.tf):

  # tflint-ignore: aws_elb_invalid_subnet - Incorrectly errors out that these are invalid

tfsec

tfsec .

Run this command in the module folders.

For tfsec, look for the finding id, and add a comment:

# tfsec:ignore:<id>

This can be added before the line, at the end of the line, or before the module.

To ignore globally, edit xdr-terraform-live/terragrunt.hcl and add the id to the ignored_tfsec local variable.

checkov

Run these commands in the module folders.

checkov --framework terraform --quiet -d .
checkov --framework terraform --quiet -f filename1.tf -f filename2.tf

For checkov, look for the check ID and add a comment:

#checkov:skip=CKV_AWS_<ID_HERE>[:optional comment]

For clarity, these should be added closest to the affected resource or element.
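To run all three scanners from this section against one module in a single pass, a small wrapper along these lines might be convenient. It is a sketch, not part of the repo: run_static_checks and have are hypothetical names, and tools that are not installed are skipped rather than treated as failures.

```shell
#!/usr/bin/env bash
# Sketch: run whichever of the three scanners are installed against a module.
# run_static_checks and have are hypothetical helper names.

have() { command -v "$1" >/dev/null 2>&1; }

# Usage: run_static_checks <module-dir>
# Returns non-zero if any scanner that ran reported a failure.
run_static_checks() {
  local dir=${1:-.} rc=0 tool
  for tool in tflint tfsec checkov; do
    if ! have "$tool"; then
      echo "skip: $tool not installed"
      continue
    fi
    echo "run: $tool"
    case $tool in
      tflint)  (cd "$dir" && tflint) || rc=1 ;;
      tfsec)   (cd "$dir" && tfsec .) || rc=1 ;;
      checkov) (cd "$dir" && checkov --framework terraform --quiet -d .) || rc=1 ;;
    esac
  done
  return "$rc"
}

# Usage:
#   run_static_checks ../xdr-terraform-modules/base/some_module
```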