Over the last few years, as part of my consultancy work, I have had the chance to work on several assignments tackling IT performance problems. In this post I share my experience on that topic, drawing on performance problems I have faced across several deployment models (on-premise, cloud, hybrid) and technology stacks (Oracle, Microsoft Azure and open source). More specifically, I analyse the root causes (the Whys) and present my best practices (the Hows) for preventing performance problems in an IT-solution.
The Whys.. (root causes of performance problems)
Performance problems in the IT sector are a very wide topic, so you might think that the root causes are specific to each individual case, and that it is therefore difficult to isolate and generalize them. That might seem correct, but if you examine more closely different IT-solutions (of either “small” or “large” scale) that suffer from performance problems, you can identify some characteristics common to all of them.
But before talking about the root causes, let’s begin by analyzing the meaning of the following sentence: “The current IT-solution suffers from performance problems”.
The first part of the sentence (The current IT-solution) refers to the existing hardware and software pieces that have been used or combined to solve specific business needs. The rest of the sentence (suffers from performance problems) indicates that these hardware and software pieces do not meet the performance requirements of those business needs.
Let’s now explain what these performance requirements are and why they are so important. Performance requirements define how well the provided IT-solution performs certain functionality under specific conditions (for example: speed of response, throughput, execution time, storage capacity). They are usually defined as quantitative and qualitative Key Performance Indicators (KPIs), measurable values that demonstrate how effectively a solution is achieving the key business objectives. Performance requirements belong to a broader group of solution attributes, the non-Functional requirements, which are key elements when designing and testing a new IT-solution. While Functional requirements define WHAT the solution does (or must not do), non-Functional requirements specify HOW the solution should do it.
Non-Functional requirements define a solution’s attributes such as security, reliability, performance, maintainability, scalability, and usability. Additionally, they serve as constraints and restrictions during the solution’s design and implementation phases (a very important factor, which is usually ignored or considered of minor importance). Non-Functional requirements are also known as quality attributes and are just as critical as the Functional requirements. They ensure the usability and effectiveness of the entire solution. Failing to meet any of them can result in a solution that fails to meet business, user or market needs, a situation that translates into technical debt. In some cases, in addition to technical debt, non-compliance with the non-Functional requirements can cause significant legal issues as well (privacy, security, safety, to name a few).
In my experience (the IT-solutions with performance problems that I have faced), all cases had the following characteristics in common:
- During the design and implementation phases, the non-Functional requirements had not been treated as equal to the Functional requirements. In some cases non-Functional requirements did not even exist, and the reason was always “not enough time”.
- In almost all cases, the solution had never been evaluated against the non-Functional requirements. In only a few cases had some of them (but not all) been evaluated, in a rush, just before the solution’s go-live.
As a consequence, the provided solution did not comply with the non-Functional requirements (performance requirements are part of them) and ended up with technical debt. This was realized either in production or, in the best case, when the solution was close to production.
To make the above assertions clearer, let’s explore the potential consequences when the non-Functional requirements, and more specifically the performance requirements (like Response-Time, Throughput, Scalability, Capacity, Availability, Reliability, to name a few), are not considered during the solution’s realization:
- The solution ends up with improper infra sizing (like CPU, memory, storage, networking).
- Applications and systems used as part of the solution (like DB-systems, application-servers) do not have the right configuration and setup to cope with the production load and/or data volume.
- Low-quality code implementation, with poorly written algorithms or code issues related to scale and optimization (non-Functional requirements act as “constraints” on development that limit some degree of design freedom).
The three types of technical debt above are the root cause of the performance issues I have seen, and they occur because the solution does not comply with the non-Functional requirements.
To summarize, the main reason for performance problems is that the provided solution does not comply with the non-Functional requirements (quality attributes). In other words, the “quality” of the provided solution does not meet the business expectations.
The Hows.. (prevent solution from performance problems)
Having explored the root cause of performance problems, let’s now see how to protect a new solution from these three types of technical debt and ensure optimal performance in the long run.
Below I list my best practices for preventing performance problems in IT-solutions.
1. Collect, define and get consensus over the non-Functional requirements
Given the crucial role of non-Functional requirements, ensure before you start designing a new solution that they have been clearly defined, documented, presented and communicated to all the relevant stakeholders to get their consensus.
Defining non-Functional requirements is a difficult and challenging process. If you over-specify them, the solution may be too costly; if you under-specify them, the solution will be inadequate for its intended use. An adaptive and incremental approach should be used for exploring and defining them, together with a clearly defined scope of the solution.
2. Consider non-Functional requirements during design and implementation phases
Ensure that you have clearly defined the non-Functional requirements, and more specifically those related directly or indirectly to performance (like Response-Time, Throughput, Scalability, Capacity, Availability, Reliability, to name a few). Estimate or project the expected production load and data volumes that the solution needs to cope with. Furthermore, analyse the KPI and SLA requirements (KPIs and SLAs define the thresholds for metrics you don’t want to cross).
Treat all these parameters as constraints or restrictions on the design and implementation decisions for the solution.
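To make the idea of a threshold "you don’t want to cross" more concrete, here is a minimal Python sketch of how a response-time requirement could be written down as a measurable check. The class name, the field names, the 500 ms limit and the sample values are all illustrative assumptions, not taken from a real project, and the nearest-rank percentile is just one of several common definitions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponseTimeRequirement:
    """Hypothetical requirement: the given percentile must stay at or below max_ms."""
    name: str
    percentile: float  # e.g. 95 for p95
    max_ms: float


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def is_met(requirement, samples_ms):
    """True when the measured percentile does not cross the threshold."""
    return percentile(samples_ms, requirement.percentile) <= requirement.max_ms


# Assumed example: measured response times (ms) for one endpoint.
measured = [100, 120, 130, 150, 900]
p95_requirement = ResponseTimeRequirement("checkout p95", percentile=95, max_ms=500)
print(is_met(p95_requirement, measured))
```

Writing requirements in this form makes them easy to evaluate repeatedly, at design time against estimates and later against real measurements.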
3. Evaluate design assumptions against non-Functional requirements
During the design phase it is common to have to make several assumptions (e.g. about infra sizing, the technology, the applications, etc.).
Always try to evaluate whether these design assumptions comply with the non-Functional requirements, or at least with the performance-related ones if evaluating all of them is not possible. Non-compliance ends up as technical debt, which in some cases can have a huge impact and put the success of the solution at risk (e.g. a wrong assumption about the technology or product capabilities).
In most cases, the architects’ experience and the cooperation between them (solution, integration and technical architects) will cover these compliance evaluations. But in case of doubt or lack of prior experience, especially nowadays with the plethora of new Cloud options and technologies, a Proof-of-Concept (POC) is a highly recommended approach for evaluating these design assumptions.
4. Define a performance test strategy
Define the “Performance Test Strategy” of the solution, a high-level document that you need to create during the design phase. This document describes the directives and the approach that will be followed to achieve the performance testing goals.
A typical Performance Test Strategy document should have the following contents:
- Overview: The overall goal of performance testing
- Scope: The scope is very important and needs to be very specific, because it describes exactly what will be tested.
- Test Approach: How it will be tested, i.e. the approach that will be followed during the performance tests (e.g. how to create the baseline, which will then be used as a reference for benchmarking).
- Test Types: What types of tests will be carried out (e.g. Load Test, Stress Test, Endurance Test, Volume Test).
- Test Deliverables: What the deliverables of the performance tests will be (e.g. test run report, summary report).
- Environment: Which environment will be used for the performance tests, including details such as whether the environment will be a replica of production and whether it will be sized (data-wise) the same as, larger than or smaller than production.
- Tools: Which tools will be used as part of the performance tests, such as Defect Tracking tools (e.g. Jira), Management tools (e.g. Confluence), Performance Testing tools (e.g. JMeter, SoapUI, Selenium), and Monitoring tools (on the Infra and Application layers).
- Entry & Exit Criteria: A clear description of the entry criteria (e.g. application-X must be functionally stable before performance tests start) together with the exit criteria (e.g. most of the SLAs need to be met).
- Risk and Mitigation: Any risk that can affect the performance tests, together with the corresponding mitigation plan. This helps to ensure that the performance tests are completed on time without affecting the deliverables.
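The baseline mentioned under Test Approach can be used in a very simple way: compare each new test run with it and flag the metrics that got worse. The sketch below assumes that all metrics are "lower is better" and that a 10% tolerance is acceptable; the function name, metric names and values are hypothetical.

```python
def find_regressions(baseline, current, tolerance=0.10):
    """Return the metrics whose current value exceeds the baseline by more
    than `tolerance` (assumes lower values are better for every metric)."""
    regressions = {}
    for metric, base_value in baseline.items():
        if metric not in current:
            continue  # metric not measured in this run
        limit = base_value * (1 + tolerance)
        if current[metric] > limit:
            regressions[metric] = (base_value, current[metric])
    return regressions


# Assumed numbers from a baseline run and a later run.
baseline_run = {"p95_ms": 200.0, "error_rate": 0.01}
latest_run = {"p95_ms": 250.0, "error_rate": 0.01}
print(find_regressions(baseline_run, latest_run))
```

A check like this can feed the test run report directly, so that the Exit Criteria are evaluated against numbers instead of impressions.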
5. Follow a shift-left approach and start performance testing in parallel with SDLC
The purpose of performance testing is not to find bugs (functional issues) but to eliminate performance bottlenecks. Performance testing measures the quality attributes (non-Functional requirements) of the solution.
Thus, it is highly recommended to adopt a shift-left approach to identify and address performance issues as early as possible in the development process. But first, let’s explain the shift-left approach and why it is important to adopt it as part of performance testing.
Shift left is a practice intended to find and prevent defects early in the software delivery process. The idea is to improve quality by moving tasks to the left, as early in the lifecycle as possible. The shift-left approach emphasizes the need for developers to concentrate on quality from the earliest stage of a software build, rather than waiting for errors and bugs to be found late in the SDLC. Discovering potential performance issues early in the development cycle minimizes the impact and the cost of fixing them, which can increase by orders of magnitude as the solution realization moves from the design phase to production. Additionally, introducing performance testing during development helps keep performance criteria at the forefront.
Now that we’ve seen the importance of shift-left performance testing in parallel with the SDLC, let’s explain how it can be achieved. Shift-left performance testing can be done by introducing the following principles as part of the SDLC mindset and way of working:
- Measure early and often. Track key metrics that provide insight into the performance of what you’re building. This helps to keep tabs on performance and areas for improvement.
- Make performance a prominent part of the coding process. Include simple performance unit tests (in addition to functional ones) and apply performance code reviews, not only for poorly written code but also for things that affect scalability (in addition to bug-free code, developers should ensure that the code they’re committing performs as well as possible in terms of scale and optimization).
- Always keep real-world usage in mind. Testing without scenarios based on real-world usage patterns may bring surprises when the solution moves to production. To that end, team leads should explore issues that pop up in production and keep an eye on issues that popped up at the beginning of the SDLC, to ensure that they get fixed and are not shoved aside for other, more urgent matters.
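As an illustration of the "simple performance unit test" mentioned above, here is a minimal Python sketch that uses only the standard library. The function under test, the data size and the one-second budget are assumptions chosen for the example; in a real project the budget would come from the performance requirements and the test would run on a stable environment.

```python
import time


def best_runtime_seconds(func, *args, repeats=5):
    """Run func(*args) `repeats` times and return the fastest wall-clock time."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def find_duplicates(items):
    """Hypothetical code under test: values that occur more than once."""
    seen, duplicates = set(), set()
    for item in items:
        if item in seen:
            duplicates.add(item)
        else:
            seen.add(item)
    return duplicates


def test_find_duplicates_within_budget():
    data = list(range(50_000)) + [1, 2, 3]
    budget_seconds = 1.0  # assumed, deliberately generous budget
    # Functional check and performance check live side by side.
    assert find_duplicates(data) == {1, 2, 3}
    assert best_runtime_seconds(find_duplicates, data) < budget_seconds
```

Taking the fastest of several runs reduces noise from the machine, which keeps such a test from failing randomly while still catching an accidental switch to a much slower algorithm.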
6. Add throttling options
It is always a best practice for the provided solution to include throttling capabilities, in other words, mechanisms for controlling the load (both inbound and outbound). These throttling capabilities can be used “if” and “whenever” necessary.
Every solution should have such throttling options, not only for controlling the load in case of performance-related issues, but also as an extra “safety valve” that protects the solution from the potential functional impact of an extreme peak load (e.g. due to an unexpected business situation or malicious intent).
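One common way to implement such a throttling mechanism is a token bucket. The sketch below is a minimal, single-threaded Python version given only as an illustration; the class name, the rate and the capacity are assumptions, and in practice throttling is often provided by an API gateway or a message broker rather than written by hand.

```python
import time


class TokenBucket:
    """Allows short bursts up to `capacity` and a sustained `rate_per_second`."""

    def __init__(self, rate_per_second, capacity, clock=time.monotonic):
        self.rate = rate_per_second
        self.capacity = capacity
        self.clock = clock  # injectable so the behaviour can be tested
        self.tokens = float(capacity)
        self.last_refill = clock()

    def allow(self, cost=1.0):
        """Return True if the request may proceed, False if it is throttled."""
        now = self.clock()
        elapsed = now - self.last_refill
        self.last_refill = now
        # Refill according to the elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Assumed limits: 2 requests per second, bursts of at most 2.
bucket = TokenBucket(rate_per_second=2, capacity=2)
print([bucket.allow() for _ in range(3)])
```

Rejected requests can then be queued, delayed or answered with an explicit "try again later" response, depending on what the business accepts.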
In this post I have tried to explain the importance of non-Functional requirements (also known as quality attributes) and to show why they are what hides behind performance problems. Non-compliance with them ends up as technical debt, because the solution does not meet the business expectations (its main goal). These types of technical debt are the root cause of performance problems.
Thus, it is very important to always consider the non-Functional requirements carefully as part of the solution realization. Even if that requires some additional effort (to collect and define them), the impact and the cost of performance problems will be many times greater.
Keep in mind: providing a solution with optimal performance in the long run, one that does not suffer from technical debt, makes a difference 😉