Customizing EC2 Systems Manager Run Commands - TriNimbus

Customizing EC2 Systems Manager Run Commands

Previously I wrote a blog post, Using Run Command for Adhoc Operations, to showcase how to manage remote systems efficiently at scale. While the approach in that post is extremely flexible, it also has some challenges. The scripts had no restrictions on what could be executed, and left plenty of room for incorrect syntax or typos in commands.

EC2 Systems Manager has some great features to help with repeatable tasks. In this post I will talk in detail about creating custom Run Commands to make remote operation tasks safer and targeted to specific tasks. If you are new to using Run Command, I recommend reading the above post first.

Run Commands are defined using Systems Manager Documents. These documents can have different schemas, support version control, sharing and more.

Let’s look at iteratively building a Systems Manager document. In my scenario, several servers share the same local username and password for some services, and I want an efficient way to change those passwords without requiring a centralized authentication system like Active Directory. While Run Command can be executed on Windows and Linux systems, both inside EC2 and external to AWS, I used a Windows instance inside EC2 for my example. The goal is to have a command that securely rotates the password used by the Apache HTTPD service.

First I logged in to the AWS Console, went to the EC2 service and selected the Systems Manager Shared Resources > Documents link.

I then clicked the Create Document button.

I named my command ExampleChangePassword, set its document type to Command, and pasted in the document contents before clicking the Create Document button.

Let’s take a closer look at the contents. There are 4 main elements:

  • schemaVersion  – schema versions 1.2 and 2.0 are currently supported for Run Command
  • description – information for your reference
  • parameters – section to specify values to pass when executing the Run Command
  • mainSteps – define details of commands to run

I defined 2 parameters to be entered when we run the command: one for the local Username and one for the new Password.

I used the aws:runPowerShellScript plugin for my execution logic, referencing the names of the parameters dynamically. To change the password for the user, I used the net user command.
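The document listing itself is not reproduced in this text, so the following is only a minimal sketch of what a schema 2.0 document with these four elements might look like. It is built as a Python dictionary and serialized to JSON for pasting into the console. The descriptions, the step name `changePassword`, and the parameter name `Username` are assumptions for illustration; the author's actual document is in the GitHub repository mentioned at the end of the post.

```python
import json

# Hypothetical reconstruction of a schema 2.0 Command document.
# Descriptions and the step name are illustrative assumptions.
document = {
    "schemaVersion": "2.0",
    "description": "Change the password of a local Windows user.",
    "parameters": {
        "Username": {
            "type": "String",
            "description": "Local account whose password will change.",
        },
        "NewPassword": {
            "type": "String",
            "description": "New password for the account.",
            # Written as a string to mirror the constraint quoted in the post.
            "minChars": "8",
        },
    },
    "mainSteps": [
        {
            "action": "aws:runPowerShellScript",
            "name": "changePassword",
            "inputs": {
                # {{ ... }} placeholders are substituted by Systems Manager
                # with the parameter values supplied at run time.
                "runCommand": ["net user {{ Username }} {{ NewPassword }}"],
            },
        }
    ],
}

document_json = json.dumps(document, indent=2)
print(document_json)
```

Generating the JSON from a dictionary is just one way to avoid quoting mistakes; writing the JSON by hand works equally well.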

I confirmed that my document was created successfully.

With my document created I was ready to test out the command on an instance.

I selected the command I wanted to run.

I selected the instance I wanted to target.

I entered my parameters and clicked the Run button.

The execution request was successful.

When I looked at the output, I found a couple of problems. My password parameter did not enforce any complexity criteria, and my Run Command result showed as successful even though the password change had failed.

For the NewPassword parameter I had a constraint of "minChars": "8" in use. I could change it to a RegEx pattern using an allowedPattern constraint, but for now I just used an external password generator to retest with a complex password. I re-ran my command and it worked this time.
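As a rough illustration of what such a pattern could enforce, here is a small Python sketch that checks candidate passwords against a hypothetical complexity rule. The pattern itself is my assumption, not something from the original document, and it relies on lookaheads; it would be worth verifying that the Systems Manager pattern engine accepts the same syntax before putting it into allowedPattern.

```python
import re

# Hypothetical rule: at least 8 characters, containing a lowercase letter,
# an uppercase letter, a digit, and a non-alphanumeric symbol.
ALLOWED_PATTERN = r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^A-Za-z0-9]).{8,}$"


def is_complex(password: str) -> bool:
    """Return True when the password satisfies ALLOWED_PATTERN."""
    return re.fullmatch(ALLOWED_PATTERN, password) is not None


for candidate in ["password", "Passw0rd", "C0rrect-Horse-Battery"]:
    print(candidate, is_complex(candidate))
```

Only the last candidate passes: the first has no uppercase letter, digit, or symbol, and the second has no symbol.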

My next problem was that I needed to change the Apache httpd service to use the new credential. I wanted to minimize the time between changing the password and updating the Windows service configuration, because I didn’t want the service to fail any automatic restarts. Let’s move on to a more targeted Run Command. I need to adjust the mainSteps to run a few PowerShell commands:

This time the aws:runPowerShellScript plugin will execute 7 commands:

  1. net user ApacheHTTPDsvc {{ NewPassword }}
    • This changes the password. I am hardcoding the username to pin the command to this one purpose.
  2. sc.exe config Apache2.4 obj= \".\\ApacheHTTPDsvc\" password= \"{{ NewPassword }}\"
    • PowerShell can also run executables, so I called sc.exe to change the login information for my Apache2.4 Windows service. Note that the .exe extension is required because sc is a PowerShell alias for the Set-Content cmdlet.
  3. stop-Service Apache2.4
    • Stop the service. It would have been possible to use the restart-service cmdlet, but it gives me less control over wait times and I ran into issues with the responsiveness of the service.
  4. for ($i=1; $i -le 5; $i++) { if ($(Get-Service Apache2.4).Status -eq 'Stopped') {$i+=10} else {Start-Sleep -s (3*$i)}  }
    • This is a PowerShell loop to wait for the Apache2.4 service to become stopped. It makes up to 5 checks, each one adding an extra 3 second delay before the next check. This is only a simple delay; I’m not handling failures to stop.
  5. start-Service Apache2.4
    • Start up the service, which should now be using the new password.
  6. for ($i=1; $i -le 5; $i++) { if ($(Get-Service Apache2.4).Status -eq 'Running') {$i+=10} else {Start-Sleep -s (3*$i)}  }
    • Wait for the service to start, using the same basic logic as the delay for stopping.
  7. Write-Host HTTP Response Code $(invoke-webrequest -uri http://localhost -UseBasicParsing).StatusCode
    • Add a test to see if the default page from Apache will load, passing the code back to the Run Command output.
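To make the timing of the two wait loops concrete, here is a small Python model of the same control flow. It does not talk to Windows at all; the function name and its arguments are made up for illustration, and it only lists the sleeps the PowerShell loop would perform.

```python
def wait_schedule(ready_on_check=None, max_checks=5, step_seconds=3):
    """Return the sleep durations (in seconds) the wait loop performs.

    ready_on_check is the 1-based check on which the service first reports
    the desired status, or None if it never does within max_checks.
    """
    sleeps = []
    for i in range(1, max_checks + 1):
        if ready_on_check is not None and i >= ready_on_check:
            # Equivalent to `$i += 10`, which pushes $i past the loop limit.
            break
        # Equivalent to `Start-Sleep -s (3*$i)`.
        sleeps.append(step_seconds * i)
    return sleeps


print(wait_schedule(ready_on_check=1))  # ready immediately: no sleeps
print(wait_schedule(ready_on_check=3))  # ready on the third check
print(sum(wait_schedule()))             # never ready: total seconds slept
```

The model shows two properties of the loop: the worst case waits 3 + 6 + 9 + 12 + 15 = 45 seconds, and after the fifth sleep the script carries on to the next command without checking the status again.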

Here is the full code for the 2nd iteration.

I created my new document since I am moving from a generic user password reset to a specific use case.

I ran the ExampleChangeHTTPDPassword Run Command against the same instance. Here is the output:

Great, my service restarted and I got back an HTTP 200 response, so I know it is serving some traffic. I still have 2 problems though. First, when I didn’t meet the password criteria last time, the Run Command result was still successful.

The bigger problem with this approach is that my password has been saved in the Run Command history!

Using parameters like this works fine for disabling and enabling accounts, or adjusting local group membership, but secrets should not be included in parameters!

Let’s move to a more complicated document to allow us to reset our service account securely, and to get a better return code.

First I needed a way to save my password somewhere outside of the command content or parameters.  Luckily Systems Manager has a great new feature named Systems Manager Parameter Store that I can use for my secret management.  I can create a secure parameter that is encrypted using the Key Management Service (KMS).

I went to the Parameter Store link and clicked the Create Parameter button.

I created a new Secure String Parameter named ApacheHTTPDsvcPassword.

With my new password saved securely, I can retrieve it using the PowerShell Get-SSMParameterValue cmdlet. Here is what the new code for my ExampleChangeHTTPDPassword document looks like:

As you can see I removed the Parameters entirely and have a few additional commands:

  • $NewHTTPDsvcPassword  = (Get-SSMParameterValue -Name ApacheHTTPDsvcPassword -WithDecryption $True).Parameters[0].Value
    • Using the Get-SSMParameterValue cmdlet to save the password as a variable. The instance’s IAM role is used to authenticate the request.
  • $WebTestResult =(invoke-webrequest -uri http://localhost -UseBasicParsing)
    • Saving the result of the test to a variable rather than just echoing it.
  • if ($WebTestResult.StatusCode -ne 200) {exit 1}
    • If the StatusCode is not 200 (i.e. 404, unable to connect) exit with error code 1.
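For readers who script against AWS from Python rather than PowerShell, the same two ideas (read the secret from Parameter Store, then turn the web test into an exit code) can be sketched as below. This is not part of the original document. The function names are hypothetical, and the client is passed in so the logic can be exercised without AWS access; in real use it would be `boto3.client("ssm")`, whose `get_parameter` call is the counterpart of Get-SSMParameterValue for a single parameter.

```python
def fetch_secret(ssm_client, name="ApacheHTTPDsvcPassword"):
    """Return the decrypted value of a SecureString parameter.

    ssm_client is expected to behave like boto3.client("ssm").
    """
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]


def exit_code_for(status_code):
    """Mirror `if ($WebTestResult.StatusCode -ne 200) {exit 1}`."""
    return 0 if status_code == 200 else 1


# Example wiring (requires boto3 and AWS credentials, so left as a comment):
#   import boto3, sys
#   password = fetch_secret(boto3.client("ssm"))
#   sys.exit(exit_code_for(200))
```

As in the PowerShell version, the caller's IAM identity needs permission to read the parameter and to decrypt with the KMS key that protects it.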

Rather than creating a new Document, I just created a new version of the document and added the code.

I then updated the default version of the SSM document, which is the version Run Command uses when executing.
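I did these two steps in the console, but they can also be scripted. The following is a minimal sketch, assuming a client that behaves like `boto3.client("ssm")`; the helper name `publish_new_default` is hypothetical, while `update_document` and `update_document_default_version` are the boto3 operations for creating a new version and changing the default.

```python
def publish_new_default(ssm_client, name, content_json):
    """Create a new version of an SSM document and make it the default.

    Returns the version string reported for the newly created version.
    """
    updated = ssm_client.update_document(
        Name=name,
        Content=content_json,
        DocumentVersion="$LATEST",  # base the new version on the latest one
    )
    new_version = updated["DocumentDescription"]["DocumentVersion"]
    ssm_client.update_document_default_version(
        Name=name,
        DocumentVersion=new_version,
    )
    return new_version


# Example wiring (requires boto3 and AWS credentials, so left as a comment):
#   import boto3
#   with open("ExampleChangeHTTPDPassword.json") as handle:
#       publish_new_default(boto3.client("ssm"),
#                           "ExampleChangeHTTPDPassword", handle.read())
```

Keeping the default-version switch as a separate call mirrors the console flow: a new version can be created and reviewed before Run Command starts using it.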

When I executed the Run Command this time, I had no prompt for the password. This should reduce impacts from human error.

Everything worked successfully using the parameter store.

I then tested against a system without the user account or Apache installed to see if it showed the failed execution. This time though, I executed it from the AWS CLI, rather than the console. Using the CLI I can specify a single tag value to execute against, rather than using an Instance Id. This lets me hit multiple targets with a single execution.

aws ssm send-command --profile trinimbus --region us-east-1  --document-name "ExampleChangeHTTPDPassword" --targets "Key=tag:Role,Values=WebServer" --max-concurrency 20% --max-errors 2

The reply included the Command ID value. I then used the list-command-invocations command to monitor the progress.

aws ssm list-command-invocations --profile trinimbus --region us-east-1  --command-id <Command ID from prior command>  --output table --query 'CommandInvocations[*].{InstanceId:InstanceId, Status:Status}'

Run Command now showed that 1 instance was successful and 1 failed. To get the details of the output for the failed instance, I ran the following.

aws ssm list-command-invocations --profile trinimbus --region us-east-1  --command-id <Command ID from prior command> --instance-id <Instance ID from prior command> --output text --query 'CommandInvocations[*].CommandPlugins[*][Output]' --details

The output looked like this:

Everything seems to be working OK. The last check is to review the CloudTrail events to ensure the password does not end up in an audit log.

There isn’t any unintentional leak of the password now. Everything was good and my objective was met. I can now roll out password rotations for the Apache httpd service to a large set of instances securely and easily.
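If the rollout is driven from a script instead of the shell, the CLI calls above map fairly directly onto boto3. The sketch below is an assumption-laden illustration rather than the author's tooling: the helper names are hypothetical, the client is passed in (in real use `boto3.client("ssm")`), and pagination of the invocation list is ignored for brevity.

```python
def run_on_role(ssm_client, document_name, role):
    """Send the document to every instance tagged Role=<role>.

    Returns the command ID needed to look up the invocations later.
    """
    response = ssm_client.send_command(
        DocumentName=document_name,
        Targets=[{"Key": "tag:Role", "Values": [role]}],
        MaxConcurrency="20%",  # same throttling as the CLI example
        MaxErrors="2",
    )
    return response["Command"]["CommandId"]


def failed_outputs(ssm_client, command_id):
    """Map each failed instance ID to the output of its command plugins."""
    response = ssm_client.list_command_invocations(
        CommandId=command_id,
        Details=True,  # include CommandPlugins, like --details in the CLI
    )
    failures = {}
    for invocation in response["CommandInvocations"]:
        if invocation["Status"] == "Failed":
            failures[invocation["InstanceId"]] = [
                plugin["Output"] for plugin in invocation["CommandPlugins"]
            ]
    return failures
```

A caller would typically invoke `run_on_role`, wait or poll until the invocations leave the in-progress states, and then call `failed_outputs` to see which instances need attention.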

Note: This example is not intended for production use in this state. It does not include things like error handling, targeting different groups of servers to prevent downtime, removing instances from load balancers before updating, or adding any notification/audit logging beyond CloudTrail. If you need assistance with creating production-grade automation toolkits like this, please contact TriNimbus.

This is only one example of using Run Command. I encourage you to test out a few scenarios to see if it can help reduce any pain points you have. The code from this blog is available on GitHub.

In upcoming posts I will share more details on additional benefits of using this service.