Of course we take all reasonable measures to ensure only people we know gain access to our systems and our data (confidentiality through authentication) and that they do so according to the rules and roles we have specified (authorization). We implement validations to prevent invalid data from entering our enterprise (integrity) and we take precautions to prevent a loss of availability. However, watertight security is an illusion. Chasing that illusion with ever more complex and expensive security measures is not a great idea. We have to accept the fact that at some point, we will probably come under a DDoS attack. Or someone will get their dirty little hands on our data, ready to ransom us or leak the data outright.
It can be a relief of sorts to accept that a security breach is inevitable. Despite the most rigorous safety measures, no one can rule out a fire. No matter how hard a chef works on food safety, there is always the possibility of a contaminated ingredient. And even though we build our fences high and strong and enforce strict access controls, eventually and inevitably, someone will get past the defenses. It is not okay, it should not happen. But it will. It is not the end of the world – provided we have prepared for that situation.
In addition to runtime authentication and authorization based on identity and access management, protection against peak loads of traffic coming in to our systems, and appropriate encryption of data inside our organization and of messages traveling to and from our enterprise – all with the aim of preventing bad stuff from happening – we also have to get ourselves the IT equivalent of the smoke detector, the fire alarm and the motion detector.
Stuff that should not happen but probably still will happen should come to our attention ASAP. We need to be able to spot a security breach by detecting any wrongdoing as quickly as possible.
And that of course is just the first step. Learning about the problem as quickly as possible. The additional elements we need to have in place:
- [learn about any security violation as soon as possible]
- stop any further violation of our security policies
- establish the damage done and repair where possible; repairing may include PR moves and communication strategies and perhaps some form of financial compensation for customers or business partners
- analyze how the breach occurred and which security measures may have failed or fallen short, and determine how to prevent this violation from happening again and/or how to improve our detection approach or the response. Note: we may realize that protection against attacks of this type is beyond our means; the business case for prevention vs. detection & rapid reaction may not pan out and we will settle for the latter.
Detect, act, correct, protect
If the objective is to quickly spot events and situations that should not occur, then we first have to determine what these events and situations are and secondly how to spot them. Examples of what should not happen are:
- someone saw information [of a specific confidentiality classification] they were not authorized for
- an employee approved his own purchase [instead of a different employee approving it]
- two bank accounts were created for a client [which is against the policy]
- response time for the public website is much longer than it should be (typically more than 15 seconds), and many requests do not return at all and end with a timeout
- two hundred order records were deleted
- someone did not change their password within the mandatory period
- an inappropriate email was sent out to 2000 customers
- after 5000 incorrect password attempts a session was successfully logged on
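Some of the examples above can be expressed as simple checks over business records. The sketch below is illustrative only: the record layout (`purchase_id`, `requested_by`, `approved_by`), the list of account owner ids and the function names are all hypothetical, not taken from any particular system.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Purchase:
    # Hypothetical record layout, assumed for illustration only.
    purchase_id: str
    requested_by: str
    approved_by: str


def self_approved(purchases):
    """Return the purchases where the requester also acted as approver."""
    return [p for p in purchases if p.requested_by == p.approved_by]


def clients_with_multiple_accounts(account_owner_ids):
    """Return the client ids that own more than one account, sorted."""
    counts = Counter(account_owner_ids)
    return sorted(client for client, n in counts.items() if n > 1)


# Made-up sample data.
purchases = [
    Purchase("p1", requested_by="alice", approved_by="bob"),
    Purchase("p2", requested_by="carol", approved_by="carol"),
]
violations = self_approved(purchases)
duplicates = clients_with_multiple_accounts(["c1", "c2", "c1", "c3"])
```

Each check returns the offending records rather than a yes/no answer, so that whoever is notified also gets the details of what happened.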
It is not feasible to create an exhaustive list of all the things that could go wrong and should be prevented or at least detected and remedied. This list is dynamic – constantly evolving, as the business changes, systems evolve and both new regulations and new threats arise.
For the situations on the list that should not occur, we first may decide that they are so serious that we should put protective measures in place to prevent them from happening. Then, we need to determine how we can detect that these occur or have occurred. What measures can we put in place to learn about breaches in real time? Examples are: a monitoring robot that periodically verifies the response times on our public website and agents that are scheduled to analyze log files and transaction records to look for undesirable events. When security breaches are identified, a form of reporting is required – possibly through alerts and notifications. Typically a human operator is informed of the finding – along with details about the when and where and maybe the who and why as well.
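As a minimal sketch of such a log-analyzing agent, the function below scans login events for the pattern mentioned earlier: a successful logon after a long run of incorrect password attempts. The event format, a time-ordered sequence of `(account, outcome)` tuples with outcome `"fail"` or `"ok"`, is an assumption made for this example, as is the function name; a real agent would parse whatever the actual log files contain.

```python
from collections import defaultdict


def suspicious_logins(events, threshold=5000):
    """Return (account, failed_attempts) pairs for every successful
    login that directly follows at least `threshold` consecutive
    failed attempts on the same account.

    `events` is assumed to be in time order; the tuple format is
    hypothetical.
    """
    failures = defaultdict(int)  # consecutive failures per account
    flagged = []
    for account, outcome in events:
        if outcome == "fail":
            failures[account] += 1
        elif outcome == "ok":
            if failures[account] >= threshold:
                flagged.append((account, failures[account]))
            failures[account] = 0  # a success resets the streak
    return flagged


# Made-up events, with a low threshold to keep the example short.
events = [("dave", "fail")] * 3 + [("erin", "ok"), ("dave", "ok")]
alerts = suspicious_logins(events, threshold=3)
```

The returned pairs could then feed the alert or notification that informs the human operator.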
For ongoing problems, actions should be available to stop them from continuing, such as killing sessions, rebooting servers, failing over to an uninfected site or sending security staff to a specific desktop computer or server location.
For breaches that already have happened we should determine why the breach took place when it did. If it could and did happen because of a recent event – for example a virus has infiltrated, a password was recently stolen, a security weakness was exploited, new (faulty) software has been deployed – and a direct link or probable cause can be found, then perhaps a change can be rolled back or action can be taken: a fresh install, reset of passwords, an update of certificates, or the application of a patch.
If the breach seems to be an isolated incident, a (sudden) exploit of something that was already there, then we could:
- apply a close watch to detect additional breaches even earlier [now we know what to look for]
- try to apply a patch to stop the leak as soon as possible
- and/or (temporarily) add security measures such as enforcing multi-factor authentication, applying more stringent login policies, adding approval steps in workflows, running frequent virus scans, or strengthening the firewall / load balancer
Once the immediate threat is averted or at least closely observed, we need to repair the damage. Ideally, what has happened – and should not have happened – is undone without any lasting harm. Knowing the exact results of the security breach potentially allows us to recover any lost data, correct any invalid data, and arrange for approvals on transactions with the wrong level of approval (or roll back these transactions). Alternatively or additionally, we may need to make a formal record [and report] of breaches that cannot be undone: someone saw data without proper authorization.
Hopefully we can take something away from the incident. Learn and improve. So we should analyze the breach and, based on the analysis, reevaluate our security measures. We need to find out in as much detail as possible: what happened and how could that happen? What went wrong? And most importantly – based on what happened leading up to the security breach – how can it be prevented next time?
Several options may be available to prevent a similar issue in the future. We need to find the balance between the cost of possible prevention measures and the risk-weighted cost of the security breach. We should decide whether to apply additional measures – or accept the risk of similar breaches. Note: a measure can also be an earlier warning system rather than an actual prevention attempt.
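One simple way to frame this trade-off is to compare the yearly cost of a measure with the reduction in expected loss it buys. The sketch below assumes that the risk-weighted cost is modelled as probability times impact, which is a common simplification rather than a prescribed method; the function names and all figures are made up for illustration.

```python
def expected_annual_loss(probability_per_year, impact_cost):
    """Risk-weighted cost: chance of a breach in a year times its cost."""
    return probability_per_year * impact_cost


def measure_is_worthwhile(measure_cost, p_before, p_after, impact_cost):
    """True if the measure costs less per year than the expected
    loss it removes."""
    saved = (expected_annual_loss(p_before, impact_cost)
             - expected_annual_loss(p_after, impact_cost))
    return measure_cost < saved


# Made-up figures: a breach costing 500,000 whose yearly probability
# drops from 10% to 2% when the measure is in place.
worth_it = measure_is_worthwhile(
    measure_cost=20_000, p_before=0.10, p_after=0.02, impact_cost=500_000)
too_expensive = measure_is_worthwhile(
    measure_cost=60_000, p_before=0.10, p_after=0.02, impact_cost=500_000)
```

With these numbers the measure removes an expected loss of about 40,000 per year, so it pays off at a cost of 20,000 but not at 60,000. In the second case, accepting the risk or investing in earlier detection may be the better choice.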
Conclusion
Despite our best security efforts, it is wise to assume scenarios involving a breach. A watertight perimeter is not realistically achievable – but that can be okay. We need to think through the security violations we could be facing – and come up with ways to detect these violations, raise our guard to ward off further invasions, determine the damage done, and control and ideally repair this damage. With this pragmatic approach, we could perhaps find a sensible balance between prevention measures on the one hand and detection & resolution procedures on the other.