
Nightmare on Cloud Street: Halloween Stories…from the Darkside

With great power

It happens so often it’s almost become cliché. It’s a dark and stormy night. You’re home alone. The phone rings. It’s… Amazon Web Services’ automated identity verification system, calling you back to confirm your account. You enter the PIN provided by Amazon’s website into your phone and complete the sign-up process. Moments later you’re logged in to the Management Console, laughing maniacally over the amount of cloud computing power that now lies at your fingertips. That was easy… a little too easy.

Unbeknownst to you, a chain of events has just been triggered that will have grave consequences for you and everyone involved. Swept up in the excitement of it all, you fail to read the warnings about best practices during account setup (perhaps the warning was obscured by an inconveniently overgrown bush at just that moment), particularly the one about a service named Identity and Access Management (IAM). You log in with the root account credentials, and everything just works! You’re in the cloud, launching EC2s, creating S3 buckets, deploying databases on RDS.

You even figure out how to create a set of access keys, and throughout the weeks and months that follow you hand them out like candy to trick-or-treaters: third-party services, coworkers, and even a source code repository or two, where you haphazardly commit them.
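In a kinder timeline, none of those keys would have belonged to the root account at all. Here is a minimal sketch of the alternative using boto3 (the user name and the AWS-managed ReadOnlyAccess policy are placeholders standing in for whatever narrowly scoped permissions each party actually needs):

    import boto3

    iam = boto3.client("iam")

    # Give each third-party service or coworker their own IAM user
    # instead of handing out the root account's keys.
    iam.create_user(UserName="reporting-service")  # placeholder name

    # Attach only the permissions that user actually needs; the
    # AWS-managed ReadOnlyAccess policy is just an example.
    iam.attach_user_policy(
        UserName="reporting-service",
        PolicyArn="arn:aws:iam::aws:policy/ReadOnlyAccess",
    )

    # Issue keys tied to that user, so they can be rotated or revoked
    # later without touching anything else in the account.
    key = iam.create_access_key(UserName="reporting-service")
    print(key["AccessKey"]["AccessKeyId"])

Keys scoped this way can be revoked one trick-or-treater at a time, instead of taking the whole account down with them.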

And then it finally happens. Your keys are leaked. An evil spirit gains access to your account, and it’s of the worst kind. Most just want to join your instances to their zombie botnet so they can eat more brains and perform larger DDoS attacks, others want to use your account to send spam, and still others just want to quietly mine Bitcoin. But the entity that has compromised your account wants to cause havoc, so they start deleting resources. And because they have your root credentials, they can delete everything.

By the time you even notice something is wrong, it’s too late.  Everything is lost, and because all your backup strategies centered on that one account, there is no hope of recovery.  The ghosts of your data are doomed to roam the tubes of the Internet for all of eternity.
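The haunting part is how little it would have taken to keep a copy out of reach. One hedged sketch of that idea, assuming an unencrypted EBS volume and a hypothetical second “vault” account that the leaked credentials cannot touch:

    import boto3

    ec2 = boto3.client("ec2")

    # Snapshot a data volume, then share the snapshot with a separate
    # "vault" account. The volume ID and account ID are placeholders.
    snapshot = ec2.create_snapshot(
        VolumeId="vol-0123456789abcdef0",
        Description="nightly backup",
    )
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot["SnapshotId"]])

    # Grant the backup account permission to copy the snapshot into
    # its own, independently controlled environment.
    ec2.modify_snapshot_attribute(
        SnapshotId=snapshot["SnapshotId"],
        Attribute="createVolumePermission",
        OperationType="add",
        UserIds=["111122223333"],
    )

The vault account still has to copy the shared snapshot on its own side, but at least the ghosts of your data would have had somewhere to come back from.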

The devil is in the details

If there’s one uncompromising truth about cloud computing, it’s that figuring out the pricing can be scary. But for the most part, we all know and look out for the three biggest monsters when it comes to calculating the cost of cloud computing: time spent on compute, the storage space used to save our data, and the bandwidth needed to move that data hither and yon. And for the most part, if you focus on keeping those three beasts under control, you’ll have a fairly predictable monthly bill and will be well on your way to living happily ever after.

But then maybe you have a new idea for a product that has different usage patterns for an AWS service you thought you were already comfortable with. You’re used to serving webpage assets from your storage buckets, which is a typical usage pattern that results in a typical bill. But your new idea only uses very tiny amounts of temporary data—way smaller than the average web asset—so its cost should be even less! But you forget about per-request pricing, which was a negligible cost before, so you understandably didn’t have to pay attention to it. Now you’re not serving content triggered by human requests, but by a complex automated system that’s reading and writing thousands of requests per second, which completely changes the dynamics of the service’s cost. Your per-request charges pile up fast, and by the end of the month you receive an AWS invoice that is a horror story all on its own.
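The arithmetic behind that invoice is simple enough to sketch in a few lines of Python. The rates and request volumes below are made-up placeholders, not real S3 pricing, but they show how the request line item takes over once an automated system replaces human visitors:

    # Made-up rates for illustration only; check the current pricing page.
    PER_GB_MONTH = 0.023            # storage cost per GB-month
    PER_THOUSAND_REQUESTS = 0.005   # cost per 1,000 requests

    def monthly_cost(requests, storage_gb):
        return storage_gb * PER_GB_MONTH + (requests / 1000) * PER_THOUSAND_REQUESTS

    # Old workload: millions of human page views against modest assets.
    print(monthly_cost(5_000_000, 50))                  # ~ $26: both line items are small

    # New workload: tiny objects, but ~2,000 automated requests per second.
    print(monthly_cost(2_000 * 60 * 60 * 24 * 30, 1))   # ~ $25,900: requests dominate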

Just when you’ve recovered from that shock, AWS releases a new managed service that you could take advantage of, offloading some of the work you previously had running on a set of customized instances that took a lot of man-hours to maintain. But an aspect of this service’s pricing strategy slips by you. You know the amount of bandwidth you have to process hasn’t changed, but before, you were just paying per hour of compute needed to process it. The new service charges a per-GB fee as well, and this doesn’t align well with your usage patterns. Before you know it, another surprise is lurking for you at the end of the month as a line item on your AWS invoice.
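The same kind of back-of-the-envelope check would have exposed this one too. Again, the rates and volumes below are illustrative placeholders rather than any real service’s price list:

    # Made-up rates for illustration only.
    SELF_MANAGED_HOURLY = 0.10   # old cost: one customized instance, per hour
    MANAGED_PER_GB = 0.02        # new cost: the managed service's per-GB fee

    hours_per_month = 730
    gb_processed_per_month = 40_000   # the bandwidth itself hasn't changed

    print(SELF_MANAGED_HOURLY * hours_per_month)     # ~ $73 a month before
    print(MANAGED_PER_GB * gb_processed_per_month)   # ~ $800 a month after

Whether the saved man-hours are worth that difference is a business decision, but it is one you want to make before the invoice arrives, not after.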

Not happy about the bills over the past couple of months, you aim to cut costs to make up for past transgressions. You convert some of your always-on EC2 instances to run as part of an Auto Scaling group, so you don’t have to pay for instances when they’re sitting idle. But you don’t get the settings for the scaling alarms or the termination policy quite right for your needs. The instances are stuck somewhere in the void between too many launched and not quite enough, and the ASG adds and removes instances at an erratic rate. What once could be handled by a single instance running for an hour is now handled by twelve different instances running for five minutes at a time, increasing that service’s compute bill by an order of magnitude as you pay for a full hour each time an instance is launched.
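Under the per-hour billing described above, the damage is easy to put numbers on. A tiny sketch, with an illustrative hourly rate:

    # Illustrative hourly rate, billed per instance and rounded up per launch.
    hourly_rate = 0.10

    steady_state = 1 * hourly_rate    # one instance running for one hour
    flapping = 12 * hourly_rate       # twelve short-lived instances, each
                                      # billed for a full hour at launch

    print(steady_state, flapping)     # 0.10 vs 1.20 for the same hour of work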

Billing alerts, service monitoring, and a careful review of the pricing details of each service you use as your usage patterns change would have saved your sanity, but it is too late for you. Only the padded walls of your room at the asylum can comfort you now.
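Before they carted you off, a single billing alarm might have been enough to break the spell. A minimal sketch with boto3, assuming billing alerts have already been enabled in the account preferences and an SNS topic exists to notify (the topic ARN and the $500 threshold are placeholders):

    import boto3

    # Billing metrics are published only to us-east-1, and only after
    # "Receive Billing Alerts" has been enabled in the account settings.
    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    cloudwatch.put_metric_alarm(
        AlarmName="estimated-charges-over-500-usd",
        Namespace="AWS/Billing",
        MetricName="EstimatedCharges",
        Dimensions=[{"Name": "Currency", "Value": "USD"}],
        Statistic="Maximum",
        Period=21600,                # evaluate the running total every six hours
        EvaluationPeriods=1,
        Threshold=500.0,             # placeholder threshold in USD
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=["arn:aws:sns:us-east-1:111122223333:billing-alerts"],  # placeholder topic
    )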

An ancient evil awaits

Many years ago, in times long forgot, an ancient evil was accidentally released upon your cloud environment. It has lain there, dormant, biding its time. This evil is known by many names, but in this story we’ll simply call it the bug in the open source software project.

But you’re in the cloud. You were told it’s safe. Secure. Immune to curses and hexes. You may have even heard of the shared responsibility model, but haven’t had time to read it yet. If you had, you’d know that AWS is responsible for the security of the cloud—the underlying physical infrastructure up to and including the hypervisor—but you’re responsible for security in the cloud. And that includes any software you have unleashed upon your innocent, unsuspecting EC2s.

Since that dreadful day, your company has seen developers come and developers go, and by the time this particular exploit becomes known in the wild, no one is left who even remembers that the susceptible software is in use, let alone monitors its release history for the critical patch that now needs to be applied.

And so it gets left unchecked, and as typically happens in a story like this, it’s the bad guys who discover its existence first. And once they’ve found it, they wield its power for evil, like any good bad guy would do. First, they send a payload through the exploit to install a rootkit on the system. Then they join it to a botnet. Once the instance is possessed, they bend it to do their evil deeds, as instructed by a command and control service hosted in some far-off country ruled by a corrupt dictatorship. You have to tackle this problem at its source, so you call the Ghostbusters (a.k.a. TriNimbus) to help regain control.

Together, armed with malware scanners and web application firewalls, you track down the ancient evil. You contain it. Patch it. And when you’re done with it, that open source project is as docile as a Labrador retriever puppy. You’ve won this battle, this time, but there are other evils out there in the world. This isn’t the last time you’ll come across one in your travels, but when the next one comes, you will be ready.