Windows Failover Cluster Migration to AWS part 2: installation

Frederique Retsema

Introduction

In the previous blog [1] I showed the different solutions for migrating an on-premises Windows Failover Cluster environment to AWS. I also showed how fast (or how slow) the failover of a node is. I assume you might want to see how this works for yourself. In this blog, I will explain what to do to get the examples working on your own personal computer or in your own AWS environment. You will see that installing this in your on-premises environment takes much more manual work than doing the same in AWS.

Current situation: Windows Failover Cluster on Hyper-V

I simulated the on-premises situation on Hyper-V. You can do the same: go to the Failover-HyperV directory of the GitHub repository [2] and start the create-vms.ps1 PowerShell script with the correct parameters. -BaseDir is the directory where the Virtual Machines (VMs) and their virtual disks will be created. -ISOFile is the ISO file with Windows Server 2019 [3]. -VLanNumberPublic and -VlanNumberPrivate are the VLAN numbers that will be used in your environment; you can pick any numbers that are not in use yet. I used 3 for the public VLAN (used by the domain controller, the Demo VM and the cluster nodes) and 4 for the private VLAN (used only for the cluster network). If you already have VMs with names like DC, ClusterNode and Demo, you can use -Prefix to add a prefix to the names of the VMs. With AMISBlog- as prefix, the VMs will be called AMISBlog-DC, AMISBlog-ClusterNode1, etc. The full command looks like:

. .\create-vms.ps1 -BaseDir d:\VMs -ISOFile D:\Install\MSDN\Win2019\en_windows_server_2019_updated_sep_2020_x64_dvd_2d6f25f2.iso -VLanNumberPublic 3 -VlanNumberPrivate 4 -Prefix AMISBlog-

The script will create five VMs: a domain controller (DC), three cluster nodes and a demo node. Each VM uses two CPUs and starts with 2048 MB of RAM. Each VM can take up to 127 GB of storage. Both the memory and the disk usage are dynamic: space that isn’t used is not allocated. I used a PC with 16 GB of RAM, an Intel i7 processor with 4 cores (8 logical processors) and an SSD, which was fast enough for my tests. This example uses about 70 GB of disk space.
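As a quick sanity check before you start, the startup memory can be added up from the figures above. This is only a back-of-the-envelope sketch in Python: the function name is my own, and because Hyper-V dynamic memory can grow or shrink the real usage, treat the result as a rough indication.

```python
# Rough sizing sketch: startup RAM claimed by the lab VMs versus host RAM.
# The figures come from the text above; dynamic memory changes real usage.

def startup_memory_mb(vm_count: int, mb_per_vm: int) -> int:
    """Total RAM (in MB) claimed when all VMs start at the same time."""
    return vm_count * mb_per_vm


VM_COUNT = 5              # DC, three cluster nodes and the demo node
MB_PER_VM = 2048          # startup memory per VM
HOST_RAM_MB = 16 * 1024   # the 16 GB PC mentioned above

needed = startup_memory_mb(VM_COUNT, MB_PER_VM)
headroom = HOST_RAM_MB - needed
print(f"Startup memory: {needed} MB, left for the host: {headroom} MB")
```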

After you have installed Windows Server 2019 on each VM, copy the installation files to each of the nodes. To do so, you might temporarily connect one of the network cards to the external switch instead of the internal switch and disable the VLAN; revert these changes after copying the files.

DC

Copy the scripts from the Failover-HyperV\DC directory in the GitHub repository to the C:\Install directory. If necessary, change the configuration of the network card back to the internal network with the correct VLAN, then reboot. Change the passwords in the uidspwds.ps1 file to the passwords you want to use. Open an elevated PowerShell window and run part1.ps1:

cd C:\Install\
. .\part1.ps1

After a few seconds, the VM will reboot, and you can log on again with the password that is configured in the file uidspwds.ps1. The installation will automatically continue with part2.ps1 after the reboot, as you can see in the file C:\Install\install_log.txt. You can follow the installation process with a tool like baretail [4]. If you don’t see progress in this log file, open the Task Scheduler, right-click on part2.ps1 and click on “Run”. When this PowerShell script ends, the VM will be rebooted. After the reboot the domain controller is ready to use.
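If you prefer not to install a separate tool, following the log file can also be sketched in a few lines of Python. This is a minimal sketch, not part of the repository: the function names are my own, and I assume the log is UTF-8 text (a log written by Windows PowerShell may be UTF-16, in which case pass a different encoding).

```python
# Minimal log follower: return whatever was appended since the last call.
import time
from pathlib import Path
from typing import Tuple


def read_new_text(path: Path, offset: int, encoding: str = "utf-8") -> Tuple[str, int]:
    """Return the text appended after byte `offset`, plus the new offset."""
    if not path.exists():          # the log may not exist before part1.ps1 starts
        return "", offset
    with path.open("rb") as handle:
        handle.seek(offset)
        data = handle.read()
    return data.decode(encoding, errors="replace"), offset + len(data)


def follow(path: Path, poll_seconds: float = 1.0) -> None:
    """Print new log text until interrupted with Ctrl+C."""
    offset = 0
    while True:
        text, offset = read_new_text(path, offset)
        if text:
            print(text, end="")
        time.sleep(poll_seconds)


# Example: follow(Path(r"C:\Install\install_log.txt"))
```

Because the offset is kept between calls, the sketch picks up where it left off after each poll. It does not survive the reboots themselves; start it again after logging on.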

ClusterNode1, 2, 3

The process for ClusterNode 1, 2 and 3 is more or less the same. Copy the installation files from Failover-HyperV\ClusterNode, change the passwords in uidspwds.ps1 (or copy the uidspwds.ps1 file from the DC node), change the network card back to the internal network with the correct VLAN ID if necessary, then reboot the VM. The same scripts are used for all three cluster nodes; they apply different settings for each node, based on the name of the node that is given as a parameter to the first script. Wait for the Domain Controller to be ready, and then use the following commands on each node (shown here for ClusterNode1; use ClusterNode2 and ClusterNode3 on the other nodes):

cd C:\Install\
. .\part1.ps1 -ComputerName ClusterNode1

The VM will restart several times. When part4.ps1 has finished, the node is configured. The scripts should start automatically after each reboot; if this isn’t the case, look in the Task Scheduler and run the script manually. When the end of the fourth script is reached, this is mentioned in the log file C:\Install\install_log.txt.

When all three nodes are ready, start the cluster: go to ClusterNode1, start PowerShell as administrator and type:

cd C:\Install
. .\Configure-Cluster.ps1

Demo

Copy the installation files from Failover-HyperV\Demo, change the passwords in uidspwds.ps1 (or copy the uidspwds.ps1 file from the DC node), change back the network card to the internal network and correct VLAN ID if necessary, then reboot the VM. The part1.ps1 script for Demo doesn’t have a parameter. Wait for the Domain Controller to be ready before you type:

cd C:\Install\
. .\part1.ps1

The VM will restart several times. When the VM is ready, start PowerShell and type:

cd C:\
. .\curl1sec.ps1 -Address http://myclusteriis

When the cluster is started correctly, you will see both the current clusternode and the time:

<p>CLUSTERNODE3 – 20:41:05</p>

<p>CLUSTERNODE3 – 20:41:06</p>
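To make clear what such a once-per-second request loop does, here is a minimal sketch in Python. It is not the curl1sec.ps1 script from the repository: the function names, the one-second timeout and the error text are my own assumptions. The `get` and `sleep` parameters are only there so the loop can be tried without a cluster.

```python
# Sketch of a "request the page once per second" loop (not the repository script).
import time
import urllib.request
from typing import Callable, List


def fetch(url: str, timeout: float = 1.0) -> str:
    """Return the response body, or a short error text when the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace").strip()
    except OSError as error:  # URLError and socket timeouts derive from OSError
        return f"ERROR: {error}"


def poll(url: str, count: int, interval: float = 1.0,
         get: Callable[[str], str] = fetch,
         sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Request `url` `count` times, pausing `interval` seconds after each request."""
    results = []
    for _ in range(count):
        body = get(url)
        print(body)
        results.append(body)
        sleep(interval)
    return results


# Example (needs a reachable cluster): poll("http://myclusteriis", count=10)
```

Note that the pause is added on top of the time the request itself takes, so the loop runs slightly slower than once per second. During a failover you would expect error lines instead of the node name.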

You can use the search icon on one of the cluster nodes to search for Failover Cluster Manager. Under MyCluster.ONP-1234.org > Roles you will see the role that you can (via the context menu) move to another node. You can also switch off a node via Hyper-V and see how long it takes to fail over to another node.
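If you save the output of the curl script to a file, the failover time can be derived from it: it is the time between the last answer of the old node and the first answer of the new node. The sketch below shows one way to do this in Python. It assumes the output lines have the format shown above (node name, a dash, then the time); the function names are my own.

```python
# Sketch: derive failover times from saved output lines such as
# "<p>CLUSTERNODE3 – 20:41:05</p>". Lines that do not match are skipped.
import re
from datetime import datetime
from typing import List, Optional, Tuple

# Accept both a plain hyphen and an en dash (\u2013) between node and time.
LINE = re.compile(r"<p>\s*(\w+)\s*[-\u2013]\s*(\d{2}:\d{2}:\d{2})\s*</p>")


def parse(line: str) -> Optional[Tuple[str, datetime]]:
    """Return (node, time) for a matching line, otherwise None."""
    match = LINE.search(line)
    if match is None:
        return None
    return match.group(1), datetime.strptime(match.group(2), "%H:%M:%S")


def failovers(lines: List[str]) -> List[Tuple[str, str, float]]:
    """Return (old_node, new_node, gap_in_seconds) for every node change."""
    events = []
    previous = None
    for line in lines:
        current = parse(line)
        if current is None:
            continue  # for example an error line during the failover
        if previous is not None and current[0] != previous[0]:
            gap = (current[1] - previous[1]).total_seconds()
            if gap < 0:            # the test ran past midnight
                gap += 24 * 60 * 60
            events.append((previous[0], current[0], gap))
        previous = current
    return events
```

The gap includes the polling interval, so with one request per second the result is accurate to about a second.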

Situation 1: Windows Failover Cluster in AWS

I wrote a CloudFormation template (see [2], directory FailoverCluster) that deploys an Active Directory server, three cluster nodes and a demo node which can be used to see the effect of failing over to another node.

Before you deploy the template, you need to have a key pair in EC2. If necessary, go to the EC2 service, click on Key Pairs in the left menu and create a new key pair.

If you didn’t do so before, you need to install two tools and a programming language on your computer, which are used within the scripts:

  • install 7Zip [5]
  • install the Command Line Interface tool of AWS [6]. After installing this tool, configure it via the command aws configure with an access key and a secret access key that have permissions to create files in the S3 bucket of your AWS environment [7].
  • install Python [8]

Next, you need to have the zip files of the Lambda functions in an S3 bucket in the region where you deploy the template. When you are using the template in region Ireland, you can use my bucket. This bucket is called fra-euwest1 (and this is the default parameter for the S3 bucket in the CloudFormation template). When you create a new bucket in your own region, be sure to clear the checkbox in front of “Block all public access” and select the checkbox in front of “I acknowledge that the current settings may result in this bucket and the objects within become public”. Then click on “Next” and “Create bucket”.

Go to the directory you cloned from the GitHub repository, and change the file CreateZips-InS3Bucket.ps1 so the names of the directories and the name of the S3 bucket match the directories and S3 bucket in your environment. Then open a PowerShell window and run this script:

cd D:\Clone\AWSMigrationWindowsFailoverCluster\FailoverCluster
. .\CreateZips-InS3Bucket.ps1
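The script in the repository uses 7-Zip and the AWS CLI. To illustrate the zipping step only, here is a sketch that swaps in Python's standard zipfile module. The function name is my own and the upload to S3 is deliberately left out. The one thing that matters for Lambda is that the files end up at the root of the zip file, not inside a subdirectory named after the function.

```python
# Sketch of the zipping step with Python's zipfile module
# (the repository script uses 7-Zip; the upload to S3 is not shown).
import zipfile
from pathlib import Path
from typing import List


def zip_directory(source_dir: Path, zip_path: Path) -> List[str]:
    """Zip all files below source_dir, with names relative to source_dir."""
    names = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file in sorted(source_dir.rglob("*")):
            if file.is_file():
                name = file.relative_to(source_dir).as_posix()
                archive.write(file, arcname=name)
                names.append(name)
    return names
```

Write the zip file to a location outside the source directory, otherwise the archive would try to include itself.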

After these preparations, go to CloudFormation and deploy the template. I used instance type t3.medium for development and m5a.xlarge to do the tests. The deployment of this template will take about 30-45 minutes. In a later blog, I will explain more about this template.

After the deployment, go back to EC2 and click on the Demo node: you will see the IP address of this node. Use Remote Desktop to connect to this node, use “Administrator” as user-id and use the password you entered when deploying the CloudFormation template. Start PowerShell and then use the following command to start a curl to the cluster every second:

cd C:\
. .\curl1sec.ps1 -Address http://myclusteriis

You will see on which node the cluster is running, together with the current time (in GMT), which changes every second.

When you want to know what the failover cluster does, you can use CloudWatch and follow the log group cluster_log.txt. You can start the Failover Cluster Manager on one of the ClusterNodes and move the cluster to another node or stop the active node and see how another node takes over the cluster.

Logging

There are many places you can check if something goes wrong:

– When a Virtual Machine starts, the logging will be sent to the directory C:\cfn\log.

– Within AWS, you can use CloudWatch: all virtual machines will send the logging of the CloudWatch agent, the logging of the installation and the logging of the cluster to CloudWatch. The Lambda logging is also sent to CloudWatch.

– The installation will use Systems Manager Run Command to send commands to the virtual machines. You can find the (first part of the) logging of these commands back in AWS Systems Manager.

– Logging for the Windows Failover Cluster can be retrieved by running the PowerShell command

Get-ClusterLog -Destination . 

on the virtual machine. This will produce three log files (one for each node), with very detailed logging.

Costs

The costs I mentioned in the previous blog are costs per month, based on reserved instances. When you try these CloudFormation templates for the first time, you will in general not have reserved instances. At the time of writing, running this environment in region Ireland costs about $2.80 per hour when you use instance type m5a.xlarge.

Solutions 2, 3 and 4: Auto Scaling Groups with one node

If you followed along, you will have seen the directory ASG in my GitHub repository [2]. In this directory, there is a CloudFormation template to deploy the solution.

As with the previous solution, you need to have a key in EC2. You also need 7Zip and the AWS Command Line Interface. You can use the same S3 bucket as before. Go to the directory you cloned from the GitHub repository, and change the file CreateZips-InS3Bucket.ps1 so the names of the directories and the name of the S3 bucket match the directories and S3 bucket in your environment. Then open a PowerShell window and run this script. You might see that some of the Lambda functions have the same names as in solution 1. Functions with the same name have the same content.

After these preparations, go to CloudFormation and deploy the template. I used instance type t3.medium for development and m5a.xlarge for the tests. The deployment of this template will take about 15 minutes for the Auto Scaling Group without a Custom Image and about 30 minutes for the Auto Scaling Group with Custom Image.

In the Windows Failover Cluster template, there was a Demo node where you could start the curl1sec.ps1 script. This was needed because solution 1 uses the DNS of the Domain Controller. Solutions 2, 3 and 4 use the DNS of AWS, so you can run the curl1sec.ps1 script from the OutsideAWS directory on your own PC. The URL of the Load Balancer can be found in the outputs of the main CloudFormation template. Start a PowerShell window on your own PC, then start this script:

cd D:\Clone\AWSMigrationWindowsFailoverCluster\ASG\OutsideAWS
. .\curl1sec.ps1 -Address http://asgnodeloadbalancer-784056269.eu-west-1.elb.amazonaws.com/

The difference between solution 2 and solution 3 is in the parameters: you either take the defaults (which are the same as when you would create the Load Balancer and the Target Group via the AWS GUI) or use the lowest values that are mentioned.

The difference between solutions 3 and 4 is that a custom image is built before the Auto Scaling Group is started.

When you try this for yourself, you may notice a strange effect when you stop a node: even though the target group sees that the node is down, it doesn’t inform the Auto Scaling Group about this. It is the EC2 health check that informs the Auto Scaling Group that the node is no longer available.
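The effect of the health check parameters in solutions 2 and 3 can be estimated with simple arithmetic: a target is marked unhealthy after a number of consecutive failed checks, so the detection time is roughly the check interval multiplied by the unhealthy threshold. The sketch below shows the calculation. The function name and the example values in the comment are my own assumptions for illustration; they are not taken from the template, so fill in the parameter values you actually use.

```python
# Rough estimate of how long a target group needs to notice a failed node.
# This ignores the check timeout and where in the interval the node fails.

def detection_seconds(interval_seconds: int, unhealthy_threshold: int) -> int:
    """Approximate time before a target is marked unhealthy."""
    return interval_seconds * unhealthy_threshold


# Hypothetical values, for illustration only:
# detection_seconds(30, 2) for a 30 second interval and 2 failed checks,
# detection_seconds(5, 2) for a 5 second interval and 2 failed checks.
```

The same multiplication, with the healthy threshold instead, gives an estimate of how long it takes before a new node starts receiving traffic.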

When you want to log on to a node in the Auto Scaling Group with Custom Image solution, you need to use your EC2 key to request the administrator password, because the EC2Launch program removes the current administrator password. The password you gave in the CloudFormation template can be used to log on to the instance that is used to create the image (as long as it is not terminated). Another option is to use the GUI and use the Connect button in EC2. When you use the Session Manager, you don’t need to give a password to log on to the node. You will get a PowerShell session. Stopping the website is possible with the following PowerShell command:

Get-IISSite | Stop-IISSite -Confirm:$False  

In the template for AutoScaling Group with Custom Image, there is an option to use a container to speed up the use of the ELB Health Check in the Auto Scaling Group. In a future blog I will tell more about this, for now choose “False” and don’t install it.

Costs

The costs to play with this environment are about $1.80 per hour for solutions 2 and 3, based on instance type m5a.xlarge. Solution 4 will cost you about $2.20 per hour.
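To get a feeling for what a test session costs, multiply the hourly rate by the number of hours you keep the environment running. The sketch below does this for the rates mentioned in this blog (at the time of writing, region Ireland, instance type m5a.xlarge). The function name and the three-hour session in the example are my own assumptions.

```python
# Cost of a test session, based on the approximate hourly rates in this blog.
HOURLY_USD = {
    "solution 1 (failover cluster)": 2.80,
    "solutions 2 and 3 (Auto Scaling Group)": 1.80,
    "solution 4 (Auto Scaling Group with custom image)": 2.20,
}


def session_cost(hourly_usd: float, hours: float) -> float:
    """Cost in USD, rounded to cents, of running an environment for `hours`."""
    return round(hourly_usd * hours, 2)


# Example: a hypothetical three-hour session for each solution.
for name, rate in HOURLY_USD.items():
    print(f"{name}: ${session_cost(rate, 3):.2f}")
```

Remember to delete the stack when you are done, because the costs continue for as long as the environment exists.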

Comparing your results with mine

All the effects that I found are in the spreadsheet “Results” in the GitHub repository [2].

Deleting the environment

You can delete the on-premises environment by deleting the virtual machines in Hyper-V. You can then delete the directories with the virtual machines and the virtual disks.

In AWS, you only need to delete the main CloudFormation stack: this will delete all resources in AWS, and all nested stacks will be deleted automatically as well. The CloudWatch log groups of the logs that were created on the virtual machines are deleted automatically. You can delete the CloudWatch log groups of the Lambda functions manually. You can still see the history in Systems Manager Run Command (this will be deleted by AWS automatically).

If you added an EC2 key for this blog only, you may want to delete it after destroying the environment.

Next blog…

In the next blog, I will tell more about Windows Failover Cluster management. I will explain where and why the cloud implementation differs from the on-premise implementation.

Links

[1] Previous blog: https://technology.amis.nl/2020/11/07/aws-migration-part-1-how-to-migrate-windows-failover-clustering-servers-to-aws/

[2] GitHub Repository with example code: https://github.com/FrederiqueRetsema/AWSMigrationWindowsFailoverCluster

[3] Windows Server 2019 can be downloaded via www.visualstudio.com if you have an MSDN license. If you haven’t, you can download a 180-day trial version via https://www.microsoft.com/nl-nl/windows-server/trial .

[4] You can download baretail via this link: https://baremetalsoft.com/baretail/

[5] You can download 7-Zip via this link: https://www.7-zip.org/

[6] You can download the command line interface of AWS via this link: https://aws.amazon.com/cli/

[7] If you never created an access key and a secret access key before, you can follow the description I wrote here earlier, under “Creating access key and secret access keys”: https://github.com/FrederiqueRetsema/AMIS-Blog-AWS/tree/master/shop-3/vagrant

[8] Python for Windows can be found here: https://www.python.org/downloads/windows/
