AMIS DataSafe, the answer to Black Swan scenarios

Jeroen Gouma

How do you prepare for a black swan scenario?

In light of World Backup Day, I want to discuss the AMIS DataSafe concept we recently introduced.

Again, this blog post is about philosophy rather than scripts: it tries to explain the thinking behind “AMIS DataSafe”, a concept we developed to better protect environments against black swan scenarios.

According to Wikipedia, the black swan theory is a metaphor that describes an event that comes as a surprise, has a major effect, and is often inappropriately rationalized after the fact with the benefit of hindsight.

Put in an IT perspective, this goes far beyond making a regular backup. We all make backups (don’t we?), and hopefully we execute a restore test at least once a year. So far, so good.

But wait, what if the backup itself is compromised? Or what if the data was modified so long ago that every backup within the retention period already contains the change? In these cases the normal backup procedure will not get you back in business.

What if you’re under attack by ransomware and all your backups are held hostage as well? Or what if a (former) employee modified every 438th record in each table in your database? And so on… I could bring up more examples, but you get the picture.

What all the scenarios above have in common is that you need to be prepared in order to have at least a chance of surviving.

In my opinion, the only way to have any chance of surviving in these cases is to save, on a regular basis, all the data you need to restore an environment, and to keep it in a safe place for a longer period of time. Of course this place needs to be as safe as possible and comply with several requirements:

  • Access requires at least 2 different persons
    Before an operator can gain access, the keykeeper first needs to activate access via the web interface. The role of keykeeper is given to somebody who is not involved in the day-to-day business of the environment and who has no access to the specific IP address from which the DataSafe can be reached.
  • It needs to be inaccessible without the proper keys
    The DataSafe is only accessible from 1 specific IP address with 1 specific SSH key (after the keykeeper has done his/her trick).
  • It needs to be able to run autonomously
    The system must run unattended. After every job the result is sent by email to the DevOps team involved.
  • It only retrieves what is specified
    Only specified data is collected, using a minimum of wildcards. This minimizes the risk of collecting executable code (which would not be executed anyway).
  • The data stored must be immutable
    All data in the Safe is collected via a pull mechanism and stored without any further processing. So even if executable code were collected, it would never be able to execute.
  • It contains everything needed to rise from the ruins
    One of the most important aspects to think about is what exactly is required to rebuild a production environment from the ashes. Imagine that you lost everything and have to rebuild from scratch. This is not only your database; also think of code repositories, Active Directory or LDAP, Infrastructure as Code, etc.
  • It requires a lifecycle mechanism
    Unless your requirement is to keep the data “forever”, you will need to decide on a lifecycle policy. For which period of time can the data be useful for a restore?
  • It is documented
    To be able to rebuild an environment, it is crucial that all the knowledge is documented. Not only the deployment strategy; also think about network connections, internal and external certificates, etc. And last but not least, describe the restore procedure itself.
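
This post deliberately stays at the concept level, but the lifecycle requirement is easy to illustrate. The sketch below is not how AMIS DataSafe is implemented; the `Snapshot` structure, the `expired_snapshots` function and the 180-day retention period are hypothetical names and values chosen only for this example. It shows one possible way to decide which stored data sets have fallen outside a chosen retention period.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Snapshot:
    """One pulled data set in the safe (hypothetical structure)."""
    name: str
    pulled_on: date


def expired_snapshots(snapshots, today, retention_days):
    """Return the snapshots pulled before the retention cutoff date."""
    cutoff = today - timedelta(days=retention_days)
    return [s for s in snapshots if s.pulled_on < cutoff]


# Example data: three pulls of the same source on different dates.
snapshots = [
    Snapshot("db-2023-01-01", date(2023, 1, 1)),
    Snapshot("db-2023-06-01", date(2023, 6, 1)),
    Snapshot("db-2023-12-01", date(2023, 12, 1)),
]

# Assumed policy for illustration only: keep 180 days of history.
to_remove = expired_snapshots(snapshots, today=date(2024, 1, 1), retention_days=180)
print([s.name for s in to_remove])
```

The function only reports candidates for removal. Given the immutability requirement above, the actual deletion would probably be a separate, deliberate step.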

Once you have all this in place, the fun part begins. Try to restore your production environment completely isolated from your real production environment, without taking a peek at the “real world”. What information is missing and needs to be added to the DataSafe to be able to rebuild?

Once the gaps are identified, incorporate the changes and invite another colleague to recreate the environment based on the new procedure.
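
Purely as an illustration of this gap analysis (the item names and the `missing_items` helper are made up for this example, not part of AMIS DataSafe), the comparison between what a rebuild requires and what the safe actually contains could be sketched like this:

```python
def missing_items(required, present):
    """Return the required items that are absent from the safe, sorted for a stable report."""
    return sorted(set(required) - set(present))


# Hypothetical inventory of what a full rebuild needs.
required = {
    "database dump",
    "code repositories",
    "ldap export",
    "infrastructure as code",
    "certificates",
    "restore procedure",
}

# Hypothetical list of what the restore test actually found in the safe.
present = {"database dump", "code repositories", "infrastructure as code"}

gaps = missing_items(required, present)
for item in gaps:
    print(f"missing from the safe: {item}")
```

In practice, the list of required items is exactly what the isolated restore test is meant to discover; the code only makes the bookkeeping explicit.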

Some final thoughts and quick notes

  • Depending on the required level of completeness: is all your documentation stored in an online service like Atlassian? What if this service has a disruption just when you need to rebuild your environment?
  • AMIS DataSafe is a concept which can be adopted in (almost) every environment: on premises, cloud or hybrid cloud.
  • Implementing AMIS DataSafe is not a cheap solution, but imagine the costs if you lose your complete production environment.

Curious to learn more about AMIS DataSafe or how we can help you implement it? Contact me: jeroen.gouma@amis.nl
