My own AWS Landing Zone

You probably know how this works: you start learning AWS, get a free tier account and work with it, or rather play with it (yes, really, AWS is toys for adults). You start digging into AWS Organizations and AWS Control Tower and add some extra accounts. You write blogs about what you find, and before you know it you have tools in your accounts that look like production.

And that’s not an issue, but sooner or later you start thinking about your own Landing Zone. What is important? What should be the same as in the Landing Zones you work with every day, and what should be different? In this blog I will do some digging. I also wrote a small program to help me maintain my own Landing Zone [1].

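What is a Landing Zone?

A Landing Zone is a multi-account AWS environment with a shared baseline for infrastructure, security and tooling, into which the applications are deployed. For my own Landing Zone, these are the topics that matter.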

Monitoring

From our perspective, the Landing Zone takes care of the infrastructure and the generic tools for the applications that run on it. In a company you can think of the company website, internal (and sometimes even partly internet-facing) HR and ICT tools, and storage for all kinds of documents. All this data and all these tools should be monitored with monitoring tools like Datadog or Dynatrace: one tool for the whole company, so no one has to reinvent the wheel and everyone can simply plug into existing tools and dashboards.

Security

In my case, I was looking for a way to know whether my security is up to date: I wouldn’t be too happy if someone other than me could use my environment without my knowing about it. There are several tools for companies: AWS Security Hub, Amazon GuardDuty and Amazon Inspector, and they’re great. But they come at a price. My environment is still small, I don’t have people who can watch dashboards 24/7 and I don’t have time to respond to alerts immediately. I do, however, want to know when people are snooping around, and I want some security measures in place to prevent bad things from happening.

Infrastructure as Code

What is really important for me is Infrastructure as Code. Much of what I have built so far, I wrote in CloudFormation: sometimes as code snippets and sometimes as working applications. Two applications I presented in earlier blog posts, my personal link shortener frlink.nl (with an S3 bucket to store and distribute presentations and files) and Bitwarden, are both written in CloudFormation. In a few months I want to have my own website in three variants (dev, staging and production), all written in CloudFormation with a little bit of Python glue. I know what I want; now I need an infrastructure to deploy my application into different accounts: the main account for the website and some supporting programs and tools, and a separate account for monitoring this application. I want to deploy this with pipelines, more or less in the way that AWS advises; see for example this schema for a basic organization [2].

In the Landing Zone, Infrastructure as Code makes it possible to create a new account and then automatically deploy all infrastructure to it. Think of the configuration of AWS Config (via Control Tower), but also of IAM roles that give CI/CD pipelines access, or of Resource Explorer to find out which resources are currently in use.

Be warned about what went wrong

Sometimes deployments fail. Sometimes you fix this immediately; at other times something else turns up and you simply forget. You want the Landing Zone software to warn you that you still have work to do.

Remove Landing Zone resources

I once worked in an AWS environment with Landing Zone resources that were hard to remove: the resources could be deployed via CI/CD, but once they were deployed, you had to go to each individual account in the AWS Console to remove them. That was a great lesson for me: in my new Landing Zone environment, it shouldn’t be hard to remove resources in an automated way.

Speed

Not all Landing Zone deployments are fast: Control Tower needs about an hour to deploy and configure its resources in three or four accounts. Not too long ago we looked at the Landing Zone Accelerator: after everything was deployed, we changed one tiny item in a configuration file and the redeployment took an hour. I thought of a way to speed this up.

Dynamic deployments

Some environments are hard to test: after you change a configuration file, you just have to push the code to a Git repository, watch the deployment pipelines and hope for the best. I found a way to improve this by adding parameters to the deployment script, so you can use just one account, or just one region (or just one account in just one region ;-)).

Costs

This is just my home project. I use the resources to support some demos and blog posts, but I don’t earn anything with it. I don’t have $400 to $450 for the AWS Landing Zone Accelerator, or for monitoring tools like OpenSearch or Grafana. It’s not that I don’t like them, but in the way I like them (scalable, serverless) they are very costly. There was just one solution to this issue: I had to write something myself.

My own Landing Zone software

My own Landing Zone software consists of just a few configuration files, which you can find in the config directory:

accounts.yaml

This file contains all the information about the accounts in the Landing Zone: both the functional names (like “master”, “development”, “audit”, etc.) and the profile names. The profile names may be more cryptic and contain the name of the company or the environment, like fra-lz-test-master, fra-lz-test-dev, fra-lz-test-audit, etc. for my test Landing Zone environment. Example:

---
Accounts:
- Name: "development"
  ProfileName: "fra-lz-test-dev"
  Environment: "dev"
  AccountId: "111111111111"
- Name: "log-archive"
  ProfileName: "fra-lz-test-log-archive"
  Environment: "prod"
  AccountId: "222222222222"
- Name: "audit" 
  ProfileName: "fra-lz-test-audit"
  Environment: "prod"
  AccountId: "333333333333"
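Because both names identify the same account, a program only needs one lookup table for them. The sketch below is a minimal illustration, not the code from the repository [1]: it assumes accounts.yaml has already been parsed into a list of dicts (for example with a YAML library), and `build_account_index` is a hypothetical helper name.

```python
from typing import Dict, List


def build_account_index(accounts: List[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Map both the functional Name and the ProfileName to the same account record."""
    index: Dict[str, Dict[str, str]] = {}
    for account in accounts:
        account_id = account["AccountId"]
        # AWS account IDs are always 12 digits
        if len(account_id) != 12 or not account_id.isdigit():
            raise ValueError(f"Invalid AccountId for {account['Name']}: {account_id}")
        for key in (account["Name"], account["ProfileName"]):
            if key in index:
                raise ValueError(f"Duplicate account name or profile: {key}")
            index[key] = account
    return index


# The accounts.yaml example above, already parsed
accounts = [
    {"Name": "development", "ProfileName": "fra-lz-test-dev", "Environment": "dev", "AccountId": "111111111111"},
    {"Name": "log-archive", "ProfileName": "fra-lz-test-log-archive", "Environment": "prod", "AccountId": "222222222222"},
    {"Name": "audit", "ProfileName": "fra-lz-test-audit", "Environment": "prod", "AccountId": "333333333333"},
]

index = build_account_index(accounts)
print(index["fra-lz-test-dev"]["AccountId"])  # 111111111111
print(index["development"]["ProfileName"])    # fra-lz-test-dev
```

Refusing duplicates early means a functional name can never accidentally shadow a profile name of another account.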

groups.yaml

You can group both accounts and regions, and groups can be nested. A nice trick is to use one group name for all the accounts in the environment and use the Except keyword to leave out the one or two accounts that should not contain resources. Or make a list with all the regions, use one region as your main environment, and create a group with the Except keyword for all the regions that are not your main region.

---
Groups:
- Name: AllRegions
  List:
    - "eu-west-1"
    - "eu-central-1"
    - "us-east-1"

- Name: AllAccounts
  List:
    - "development"
    - "log-archive"
    - "audit"
    - "master"

# Groups derived from the main groups above

- Name: AllRegionsExceptEuCentral1
  List:
    - "AllRegions"
  Except:
    - "eu-central-1"
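Nesting and Except can be resolved with a small recursive function. This is a minimal sketch, not the tool's implementation: it assumes groups.yaml has been parsed into a dict keyed on Name, treats every name that is not a group as a plain account or region name, and uses a hypothetical `expand` function that keeps the original order and refuses groups that (indirectly) include themselves.

```python
from typing import Dict, List, Optional


def expand(name: str, groups: Dict[str, dict], seen: Optional[frozenset] = None) -> List[str]:
    """Expand a group name into a flat, ordered list of accounts or regions."""
    seen = seen or frozenset()
    if name not in groups:
        return [name]  # not a group: a plain account or region name
    if name in seen:
        raise ValueError(f"Group {name} includes itself")
    seen = seen | {name}
    group = groups[name]
    # Everything listed under Except (groups allowed) is removed from the result
    excluded = set()
    for item in group.get("Except", []):
        excluded.update(expand(item, groups, seen))
    result: List[str] = []
    for item in group["List"]:
        for member in expand(item, groups, seen):
            if member not in excluded and member not in result:
                result.append(member)
    return result


# The groups.yaml example above, keyed on Name
groups = {
    "AllRegions": {"List": ["eu-west-1", "eu-central-1", "us-east-1"]},
    "AllAccounts": {"List": ["development", "log-archive", "audit", "master"]},
    "AllRegionsExceptEuCentral1": {"List": ["AllRegions"], "Except": ["eu-central-1"]},
}
print(expand("AllRegionsExceptEuCentral1", groups))  # ['eu-west-1', 'us-east-1']
```

The `seen` set is passed down per path, so two groups that both include a third group are fine; only a real cycle raises an error.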

landingzone-config.yaml

In the Landing Zone configuration file you can specify how fast you want deployments to be: how many accounts can be changed at once (MaxConcurrentAccounts) and how many stacks in one account can be changed at the same time (MaxConcurrentStacksPerAccount). WaitTimeInSec sets the waiting period between two checks of whether the stacks have finished their creation, update or deletion.

There are two groups of tags. The Tags keyword lists the tags that mark a stack as one of the program’s own resources: when configuration files are added, changed or removed, the stacks with this tag (or these tags) are changed to reflect those changes in the AWS accounts. The AddTags are not used to decide which resources to deploy; they are simply added to the stacks, for example to get insight into costs.

---
LandingZoneConfig:
  MaxConcurrentAccounts: 2
  MaxConcurrentStacksPerAccount: 5
  WaitTimeInSec: 5

  Tags:
  - Key: "LandingZoneResource"
    Value: "True"
  AddTags:
  - Key: "Department"
    Value: "ICT"

  GroupNameAllAccounts: "AllAccounts"
  GroupNameAllRegions: "AllRegions"

  # Logging: can be "Debug", "Info"
  Logging: "Info"
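The difference between Tags and AddTags can be expressed in two small functions. In this hypothetical sketch, a stack only counts as a Landing Zone resource when it carries every key/value pair from Tags, while AddTags are merged in only when a stack is deployed. The tag format (a list of Key/Value dicts) matches what CloudFormation returns for stack tags; the function names are made up for illustration.

```python
from typing import Dict, List

Tag = Dict[str, str]


def tag_dict(tags: List[Tag]) -> Dict[str, str]:
    """Convert a list of Key/Value dicts to a plain dict."""
    return {tag["Key"]: tag["Value"] for tag in tags}


def is_landing_zone_stack(stack_tags: List[Tag], owner_tags: List[Tag]) -> bool:
    """True only when the stack carries every ownership tag with the same value."""
    present = tag_dict(stack_tags)
    return all(present.get(key) == value for key, value in tag_dict(owner_tags).items())


def tags_for_deployment(owner_tags: List[Tag], add_tags: List[Tag]) -> List[Tag]:
    """Ownership tags plus AddTags; ownership tags win when a key appears in both."""
    merged = {**tag_dict(add_tags), **tag_dict(owner_tags)}
    return [{"Key": key, "Value": value} for key, value in merged.items()]


owner = [{"Key": "LandingZoneResource", "Value": "True"}]
extra = [{"Key": "Department", "Value": "ICT"}]

# A stack with only the cost tag is not treated as a Landing Zone resource
print(is_landing_zone_stack(extra, owner))  # False
print(tags_for_deployment(owner, extra))
# [{'Key': 'Department', 'Value': 'ICT'}, {'Key': 'LandingZoneResource', 'Value': 'True'}]
```

Keeping ownership strictly tied to Tags means that stacks somebody else deployed with the same cost tags are never touched.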

GroupNameAllAccounts and GroupNameAllRegions are used as defaults when you leave out the accounts or regions in the configuration file of a CloudFormation template. You will not use this a lot.

The Logging keyword can help you debug. To be honest: I added a lot of print statements during development, and most of them have already been removed. In some cases Debug gives you more insight into what happens; in others you are better off adding your own print statements.

templates directory

The CloudFormation templates go in the templates directory. For every CloudFormation template (with the extension .cfn.yaml) there must be one configuration file with the same name and the extension .config.yaml. Some examples of configuration files:
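A missing half of such a pair is easy to overlook, so it can be worth checking up front. The following sketch (an illustration, not the repository's code) walks a templates directory with pathlib and reports templates without a configuration file and vice versa; the directory layout in the usage part is an assumed example.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple

TEMPLATE_SUFFIX = ".cfn.yaml"
CONFIG_SUFFIX = ".config.yaml"


def find_unpaired(templates_dir: Path) -> Tuple[List[Path], List[Path]]:
    """Return (configuration files that are missing, templates that are missing)."""
    # Strip the suffix so both file types reduce to the same base path
    templates = {str(p)[: -len(TEMPLATE_SUFFIX)] for p in templates_dir.rglob("*" + TEMPLATE_SUFFIX)}
    configs = {str(p)[: -len(CONFIG_SUFFIX)] for p in templates_dir.rglob("*" + CONFIG_SUFFIX)}
    missing_configs = sorted(Path(base + CONFIG_SUFFIX) for base in templates - configs)
    missing_templates = sorted(Path(base + TEMPLATE_SUFFIX) for base in configs - templates)
    return missing_configs, missing_templates


with tempfile.TemporaryDirectory() as tmp:
    generic = Path(tmp) / "Generic"
    generic.mkdir()
    (generic / "ResourceExplorerLocal.cfn.yaml").write_text("Resources: {}\n")
    (generic / "ResourceExplorerLocal.config.yaml").write_text("TemplateConfig: {}\n")
    (generic / "ResourceExplorerLocalRole.cfn.yaml").write_text("Resources: {}\n")
    missing_configs, missing_templates = find_unpaired(Path(tmp))

print([p.name for p in missing_configs])  # ['ResourceExplorerLocalRole.config.yaml']
print(missing_templates)  # []
```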

---
TemplateConfig:
  DependsOn:
  - "ResourceExplorerAggregatorRole"
  Role: "ResourceExplorerAggregatorRole"
  DeployTo:
    Regions:
    - "eu-central-1"
    Accounts:
    - "AllAccounts"

In my Landing Zone I want to give every CloudFormation stack its own IAM role. The role is in the same account as the CloudFormation stack that uses it, and the Role keyword passes the name of that role. This makes the CloudFormation stack that deploys the resources dependent on the role that is used to deploy them: first the role has to be deployed, and only when that succeeds can the CloudFormation stack with the resources be deployed.
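From a configuration like the first example above, a program has to derive two things: the accounts and regions to deploy to, and the stacks that must exist first. The sketch below is a hypothetical illustration: it assumes the content under TemplateConfig has been parsed into a dict, uses flat groups instead of nested ones to stay short, and falls back to the GroupNameAllAccounts and GroupNameAllRegions groups when DeployTo leaves something out.

```python
from itertools import product
from typing import Callable, Dict, List, Set, Tuple


def targets(config: dict, expand: Callable[[str], List[str]],
            default_accounts: str, default_regions: str) -> List[Tuple[str, str]]:
    """Return every (account, region) pair a template should be deployed to."""
    deploy_to = config.get("DeployTo", {})
    # Fall back to GroupNameAllAccounts / GroupNameAllRegions when a key is missing
    accounts = [a for name in deploy_to.get("Accounts", [default_accounts]) for a in expand(name)]
    regions = [r for name in deploy_to.get("Regions", [default_regions]) for r in expand(name)]
    return list(product(accounts, regions))


def dependencies(config: dict) -> Set[str]:
    """Explicit DependsOn entries plus the role the stack is deployed with."""
    deps = set(config.get("DependsOn", []))
    if "Role" in config:
        deps.add(config["Role"])
    return deps


# Flat (non-nested) groups, to keep the example short
GROUPS: Dict[str, List[str]] = {
    "AllAccounts": ["development", "log-archive", "audit", "master"],
    "AllRegions": ["eu-west-1", "eu-central-1", "us-east-1"],
}


def expand_flat(name: str) -> List[str]:
    return GROUPS.get(name, [name])


# The first TemplateConfig example above
aggregator = {
    "DependsOn": ["ResourceExplorerAggregatorRole"],
    "Role": "ResourceExplorerAggregatorRole",
    "DeployTo": {"Regions": ["eu-central-1"], "Accounts": ["AllAccounts"]},
}
print(len(targets(aggregator, expand_flat, "AllAccounts", "AllRegions")))  # 4
print(dependencies(aggregator))  # {'ResourceExplorerAggregatorRole'}
```

Adding the Role to the dependencies makes the rule "role first, stack second" automatic, even when DependsOn is forgotten.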

The template that creates the role itself needs the CAPABILITY_NAMED_IAM capability, which you add in its configuration file:

---
TemplateConfig:
  Capabilities:
  - "CAPABILITY_NAMED_IAM"
  DeployTo:
    Regions:
    - "MainRegion"
    Accounts:
    - "AllAccounts"
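To show where Role and Capabilities end up, here is a sketch that builds the keyword arguments for boto3's CloudFormation `create_stack` call (StackName, TemplateBody, Capabilities, RoleARN and Tags are real parameters of that call). The function name, the assumption that the role has no IAM path, and the assumption that the Role value is the IAM role name are mine; the call itself is left out so the sketch runs without AWS credentials.

```python
from typing import Dict, List


def create_stack_arguments(stack_name: str, template_body: str, account_id: str,
                           config: dict, tags: List[Dict[str, str]],
                           partition: str = "aws") -> dict:
    """Build keyword arguments for CloudFormation's CreateStack call."""
    arguments = {"StackName": stack_name, "TemplateBody": template_body, "Tags": tags}
    if config.get("Capabilities"):
        # Required when the template creates IAM resources with custom names
        arguments["Capabilities"] = list(config["Capabilities"])
    if config.get("Role"):
        # Assumption: the role has no IAM path, so its ARN is .../role/<name>
        arguments["RoleARN"] = f"arn:{partition}:iam::{account_id}:role/{config['Role']}"
    return arguments


body = "Resources: {}"  # placeholder template body
role_config = {"Capabilities": ["CAPABILITY_NAMED_IAM"],
               "DeployTo": {"Regions": ["MainRegion"], "Accounts": ["AllAccounts"]}}
stack_config = {"Role": "ResourceExplorerAggregatorRole"}

role_args = create_stack_arguments("ResourceExplorerAggregatorRole", body, "111111111111", role_config, [])
stack_args = create_stack_arguments("ResourceExplorerAggregator", body, "111111111111", stack_config, [])
print(role_args["Capabilities"], "RoleARN" in role_args)  # ['CAPABILITY_NAMED_IAM'] False
print(stack_args["RoleARN"])  # arn:aws:iam::111111111111:role/ResourceExplorerAggregatorRole
```

With boto3 installed, the result could be passed to `create_stack(**arguments)` on a CloudFormation client created from a boto3 session for the account's profile and region.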

Placeholders

Sometimes you need the account IDs of accounts other than the one you are working in. You can simply put the account name between curly braces: based on the accounts list above, {development} in your code will be replaced by 111111111111. You can also use the profile name: {fra-lz-test-dev} will also be replaced by 111111111111.

This is useful for bucket policies, for example [3]. When I want to add a new bucket to the log-archive account and give developers in the development account read access to that bucket, I can simply use the following CloudFormation code:

Resources:
  LoggingBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: frederique20231007
  
  LoggingBucketPolicy:
    Type: AWS::S3::BucketPolicy
    Properties:
      Bucket: !Ref LoggingBucket
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
        - Sid: AllowDevelopersToHaveReadAccess
          Effect: "Allow"
          Action:
            - s3:GetObject
            - s3:ListBucket
          Resource:
            - !Sub "arn:${AWS::Partition}:s3:::${LoggingBucket}"
            - !Sub "arn:${AWS::Partition}:s3:::${LoggingBucket}/*"
          Principal:
            "AWS": !Sub "arn:${AWS::Partition}:iam::{development}:root"

This code also contains the standard CloudFormation substitution of !Sub (with the dollar sign and curly braces). CloudFormation performs that substitution after my Landing Zone program has replaced {development} with the account ID of the development account.
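The account placeholders and CloudFormation's own ${...} syntax must not get in each other's way. Here is a minimal sketch of the replacement, assuming account names consist of letters, digits, hyphens and underscores: a regular expression with a negative lookbehind skips every curly brace that directly follows a dollar sign, and unknown names are left untouched. This is an illustration, not the program's actual code.

```python
import re
from typing import Dict

# {name} not directly preceded by "$"; CloudFormation's ${...} stays untouched
PLACEHOLDER = re.compile(r"(?<!\$)\{([A-Za-z0-9_-]+)\}")


def replace_placeholders(template: str, account_ids: Dict[str, str]) -> str:
    """Replace {account name} and {profile name} with the account ID."""

    def substitute(match: "re.Match[str]") -> str:
        # Leave names that are not accounts or profiles as they are
        return account_ids.get(match.group(1), match.group(0))

    return PLACEHOLDER.sub(substitute, template)


account_ids = {
    "development": "111111111111",
    "fra-lz-test-dev": "111111111111",
    "log-archive": "222222222222",
}
line = '"AWS": !Sub "arn:${AWS::Partition}:iam::{development}:root"'
print(replace_placeholders(line, account_ids))
# "AWS": !Sub "arn:${AWS::Partition}:iam::111111111111:root"
```

Leaving unknown names alone is a design choice; failing hard on them would catch typos earlier, at the cost of rejecting templates that legitimately contain braces.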

When the developers also get the following permissions in the development account, they can read this bucket:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::frederique20231007",
                "arn:aws:s3:::frederique20231007/*"
            ]
        }
    ]
}

The program

The program takes into account all the dependencies that you configured in the different files. When the only dependencies are those of stacks on their roles, all roles are deployed first and the stacks that depend on these roles follow. You can also add dependencies on stacks in different regions or even in different accounts. The fewer dependencies you have, however, the more stacks can be deployed in parallel.
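Grouping stacks into numbered steps is a topological sort done level by level (Kahn's algorithm). The sketch below assumes each stack's dependencies (DependsOn plus its Role) have already been collected in a dict; a real implementation would probably key stacks on account and region as well. The dependency of ResourceExplorerLocal on ResourceExplorerLocalRole is assumed by analogy with the aggregator example.

```python
from typing import Dict, List, Set


def deployment_steps(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Group stacks into steps; a stack only depends on stacks in earlier steps."""
    remaining = {stack: set(deps) for stack, deps in depends_on.items()}
    unknown = {dep for deps in remaining.values() for dep in deps} - set(remaining)
    if unknown:
        raise ValueError(f"Unknown dependencies: {sorted(unknown)}")
    steps: List[List[str]] = []
    done: Set[str] = set()
    while remaining:
        # Everything whose dependencies are all deployed can go in this step
        ready = sorted(stack for stack, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError(f"Circular dependency between: {sorted(remaining)}")
        steps.append(ready)
        done.update(ready)
        for stack in ready:
            del remaining[stack]
    return steps


stacks = {
    "ResourceExplorerAggregatorRole": set(),
    "ResourceExplorerLocalRole": set(),
    "ResourceExplorerAggregator": {"ResourceExplorerAggregatorRole"},
    "ResourceExplorerLocal": {"ResourceExplorerLocalRole"},
}
for number, step in enumerate(deployment_steps(stacks), start=1):
    print(number, step)
# 1 ['ResourceExplorerAggregatorRole', 'ResourceExplorerLocalRole']
# 2 ['ResourceExplorerAggregator', 'ResourceExplorerLocal']
```

Stacks within one step do not depend on each other, so they can run in parallel, within the limits of MaxConcurrentAccounts and MaxConcurrentStacksPerAccount.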

When stacks are deleted, the program uses the step numbers of the deployment in reverse order. This way the stacks are deleted first and the roles that these stacks use afterwards. If you tried it the other way around, resources would not be deleted, because the role that gives CloudFormation its permissions would be gone before the stack had finished its deletion.
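Deletion then only needs the step number each stack had during deployment. This sketch assumes those numbers are available as a dict (how the program records them is not covered here; the numbers below are made up) and returns the groups of stacks to delete, highest step first. The stack names come from the example templates.

```python
from collections import defaultdict
from typing import Dict, List


def deletion_steps(step_of: Dict[str, int]) -> List[List[str]]:
    """Return groups of stacks to delete, from the highest deployment step down."""
    by_step: Dict[int, List[str]] = defaultdict(list)
    for stack, step in step_of.items():
        by_step[step].append(stack)
    return [sorted(by_step[step]) for step in sorted(by_step, reverse=True)]


step_of = {
    "SomeCentralLoggingBucketRole": 1,
    "ConfigRuleCloudFormationDriftRoles": 1,
    "SomeCentralLoggingBucket": 2,
    "ConfigRuleCloudFormationDrift": 2,
}
for group in deletion_steps(step_of):
    print(group)
# ['ConfigRuleCloudFormationDrift', 'SomeCentralLoggingBucket']
# ['ConfigRuleCloudFormationDriftRoles', 'SomeCentralLoggingBucketRole']
```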

By default the program starts in dry-run mode. You can override this with the --no-dry-run parameter. By default the program also stops when a stack isn’t deployed successfully; you can then fix the stack and try again later. You can override this behaviour with the --no-abort-when-stack-fails parameter.

The -e (--environment), -g (--group) and -p (--profile) parameters all limit the deployment to just the accounts you mention on the command line. The same is true for regions: by default the program deploys to all regions that are mentioned in the configuration files, but when you specify a region on the command line, it deploys to just that region. This makes it possible to test the tool from the command line against just one development environment instead of being forced to use multiple accounts.
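The command-line options above map naturally onto Python's argparse. This is a hypothetical sketch of how they could be declared, not the tool's own parser: it only includes the flags named in this post (the region option is not named, so it is left out) and assumes each account filter may be given more than once.

```python
import argparse
from typing import List, Optional


def parse_arguments(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Sketch of the Landing Zone deploy options")
    # Dry-run is the default; --no-dry-run switches it off
    parser.add_argument("--no-dry-run", dest="dry_run", action="store_false")
    # Aborting on a failed stack is the default as well
    parser.add_argument("--no-abort-when-stack-fails", dest="abort_when_stack_fails",
                        action="store_false")
    # Account filters (assumption: each flag may be repeated)
    parser.add_argument("-e", "--environment", action="append")
    parser.add_argument("-g", "--group", action="append")
    parser.add_argument("-p", "--profile", action="append")
    return parser.parse_args(argv)


args = parse_arguments(["--no-dry-run", "-p", "fra-lz-test-dev"])
print(args.dry_run, args.abort_when_stack_fails, args.profile)  # False True ['fra-lz-test-dev']
```

Using `store_false` with a positive `dest` keeps the safe behaviour (dry run, abort on failure) as the default and makes the risky one explicit.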

When the configuration files are compared to the current environment, the program will create new stacks (when the CloudFormation templates are present on disk but not in AWS), delete stacks (when stacks with the correct tag(s) are present in AWS but no CloudFormation templates exist for them) and update stacks. For updates, the program calculates the hash of the files on disk and compares it with the hash stored in a tag on the stack. When these hashes differ, a new version is probably available. Note that changed parameter values in the .config.yaml files are not detected as a change, but changed default parameter values within the .cfn.yaml files are. Because only hashes are compared, the program doesn’t have to retrieve the templates of all CloudFormation stacks from AWS and compare them, which saves a lot of time.
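The comparison boils down to three set operations on stack names. In this sketch, SHA-256 and the tag key LandingZoneHash are assumptions (the post does not say which hash or tag name the tool uses), the hash values in the example are placeholders, and stacks are keyed on name only; a real implementation would also include account and region in the key.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple

HASH_TAG_KEY = "LandingZoneHash"  # hypothetical tag key


def template_hash(path: Path) -> str:
    """Hash of the template file on disk (SHA-256 is an assumption)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def hash_from_tags(tags: List[Dict[str, str]]) -> str:
    """Read the stored hash from a stack's tags; empty string if missing."""
    return next((tag["Value"] for tag in tags if tag["Key"] == HASH_TAG_KEY), "")


def plan(desired: Dict[str, str], deployed: Dict[str, str]) -> Tuple[List[str], List[str], List[str]]:
    """desired: stack -> hash on disk; deployed: Landing Zone stack -> hash from its tag."""
    creates = sorted(desired.keys() - deployed.keys())
    deletes = sorted(deployed.keys() - desired.keys())
    updates = sorted(name for name in desired.keys() & deployed.keys()
                     if desired[name] != deployed[name])
    return creates, updates, deletes


desired = {"ResourceExplorerLocal": "aaa", "SomeCentralLoggingBucket": "bbb"}
deployed = {"ResourceExplorerLocal": "aaa", "ConfigRuleCloudFormationDrift": "ccc"}
print(plan(desired, deployed))
# (['SomeCentralLoggingBucket'], [], ['ConfigRuleCloudFormationDrift'])
```

Wiping an account, described below, follows from the same comparison: once an account no longer appears in any configuration, its Landing Zone stacks only show up on the deployed side and end up in the list of deletes.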

Warnings

The program will warn you when a previous deployment went wrong, both in dry-run and no-dry-run mode:

> python landingzone-deploy.py --no-dry-run
17:17:06 Describe stacks in account development - region eu-west-1
17:17:07 Describe stacks in account development - region eu-central-1
17:17:07 Describe stacks in account development - region us-east-1
17:17:08 Describe stacks in account log-archive - region eu-west-1
17:17:09 Describe stacks in account log-archive - region eu-central-1
17:17:09 Describe stacks in account log-archive - region us-east-1
17:17:10 Describe stacks in account audit - region eu-west-1
17:17:11 Describe stacks in account audit - region eu-central-1
17:17:11 Describe stacks in account audit - region us-east-1
17:17:12 Describe stacks in account master - region eu-west-1
17:17:12 Describe stacks in account master - region eu-central-1
17:17:13 Describe stacks in account master - region us-east-1
17:17:14 Found: ../templates\Generic\ResourceExplorerAggregator.config.yaml
17:17:14 Found: ../templates\Generic\ResourceExplorerAggregatorRole.config.yaml
17:17:14 Found: ../templates\Generic\ResourceExplorerLocal.config.yaml
17:17:14 Found: ../templates\Generic\ResourceExplorerLocalRole.config.yaml
17:17:14 Found: ../templates\Monitoring\ConfigRuleCloudFormationDrift.config.yaml
17:17:14 Found: ../templates\Monitoring\ConfigRuleCloudFormationDriftRoles.config.yaml
17:17:14 Found: ../templates\Monitoring\SomeCentralLoggingBucket.config.yaml
17:17:14 Found: ../templates\Monitoring\SomeCentralLoggingBucketRole.config.yaml
17:17:14 Warning: ResourceExplorerAggregator in accountdevelopment in region eu-central-1 has status ROLLBACK_COMPLETE
17:17:14 No changes needed
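A check like this only needs the StackStatus values that DescribeStacks already returns. The set of statuses below is an assumption about what counts as "went wrong", the function name is hypothetical, and the message format is illustrative; the tool's own output is shown above.

```python
from typing import Iterable, List, Tuple

# Assumption: the statuses that mean a previous deployment went wrong
PROBLEM_STATUSES = {
    "CREATE_FAILED", "ROLLBACK_FAILED", "ROLLBACK_COMPLETE", "DELETE_FAILED",
    "UPDATE_FAILED", "UPDATE_ROLLBACK_FAILED", "UPDATE_ROLLBACK_COMPLETE",
}


def collect_warnings(stacks: Iterable[Tuple[str, str, str, str]]) -> List[str]:
    """stacks: (stack name, account, region, StackStatus) of Landing Zone stacks."""
    return [
        f"Warning: {name} in account {account} in region {region} has status {status}"
        for name, account, region, status in stacks
        if status in PROBLEM_STATUSES
    ]


found = collect_warnings([
    ("ResourceExplorerAggregator", "development", "eu-central-1", "ROLLBACK_COMPLETE"),
    ("ResourceExplorerLocal", "development", "eu-west-1", "CREATE_COMPLETE"),
])
for message in found:
    print(message)
# Warning: ResourceExplorerAggregator in account development in region eu-central-1 has status ROLLBACK_COMPLETE
```

ROLLBACK_COMPLETE in particular deserves a warning: CloudFormation cannot update a stack in that state, so it has to be deleted and created again.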

Last but not least: how to wipe an account?

I promised to tell you how to effectively wipe the Landing Zone resources from an account. It’s not as hard as it seems: when you remove the account name from all the configuration files (the groups file, the CloudFormation configuration files, etc.) except for the accounts file, the program sees that no Landing Zone resources should remain in that account.

When you want to get rid of all Landing Zone resources in all accounts, just move the CloudFormation templates and configuration files to another directory. The program will try to delete the stacks in the right order (based on the order in which the CloudFormation templates were deployed earlier).

Conclusion

In the GitHub repository you will find the code and some examples. I think this code can help you speed up your deployments. I added some example templates for enabling Resource Explorer and for deploying an AWS Config rule that detects CloudFormation stack drift to all accounts. This adds to your security, even though not all drift is detected (see this article [4]).

I hope you find this helpful, please let me know what you think.

Links

[1] GitHub repository for this tool: https://github.com/FrederiqueRetsema/LandingZoneDeploy

[2] AWS example of a basic organization: https://docs.aws.amazon.com/whitepapers/latest/organizing-your-aws-environment/basic-organization.html

[3] Example of cross-account access to an S3 bucket: https://repost.aws/knowledge-center/cross-account-access-s3

[4] Limitations of drift detection in AWS Config: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-stack-drift.html#drift-considerations
