Service Line Classification Can Reduce Your Data Center Cost and Increase Systems Reliability
Reliability and Infrastructure Design
Service Line Classification (SLC) represents the bringing together of several strategies and concepts that have been around for a while in one form or another. The value proposition of the Service Line Classification is to introduce the business as a key player in the development of Data Center services and improve the investment and operating costs of a Data Center by way of strategic alignment to the business requirements.
Traditional Data Center investment has been focused on the development of facilities to meet the most critical needs of the business from the perspective of resilience and redundancy. This has invariably led to the installation of less critical systems under the same roof, utilising the same critical infrastructure, which has been sized according to the total load and capacity requirements of that business.
Service Line Classification – Categorization
In SLC we aim to group applications and hardware into discrete units, or Service Lines, that can stand alone in delivering specific services to a business.
By grouping the service components like this, we have the opportunity to do two fundamentally different things: improve the Operational Continuity of the Service Line and reduce the Total Cost of Ownership (TCO) to the business.
How do we do this?
At a high level we are now able to improve the reliability of a Service Line by bringing together its key components into an environment that has improved reliability modelling. Take the example below.
Each function within the Service Line represents a complex combination of applications, which may reside on multiple server platforms and depend on one or more storage devices, as shown here:
In the Service Line above, which we use to represent a basic procurement and delivery service, we can see that each process has a dependency on its predecessor. If any process fails then the whole Service Line fails.
When you look at the complete Service Line with its attached storage hardware you will have something similar to this sitting across your Data Center.
In this instance the Service Line has multiple points of failure, one for each server and storage device distributed across the Data Center facility.
This is a common strategy for deploying Data Centers today. In many cases where there is a mix of applications running on different operating system platforms there is often a requirement to group similar platforms together – Windows servers in one row and Unix servers in another etc. Traditionally there have been good reasons for designing the Data Center this way. Operational controls can be enhanced by limiting service and maintenance vendors to cabinets that only hold their equipment. However, this distribution is not efficient from a reliability point of view as the increase in the number of component units effectively decreases the reliability of the model.
The most reliable model for this Service Line would be to install all of these applications, and the storage they require, within one hardware box. This reduces the single points of failure from 5 hardware components to 1. On the face of it this may seem counter-intuitive. Surely moving from 5 boxes to one box increases the single point of failure risk? Consider this: if each box in the system of 5 is a critical service with dependencies across the whole system, such that failure of any one box brings the whole system down, then each box is a single point of failure in its own right. For boxes of equal reliability, the probability that the system fails is therefore roughly 5 times that of a single box. Reducing the number of boxes in the critical system reduces the probability of failure.
Mathematically, we can demonstrate this as follows:
Rsl = Rapp1*Rapp2*Rapp3*Rapp4*Rstore
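Where Rsl is the reliability of the Service Line, Rapp1 to Rapp4 are the reliabilities of the four application platforms, and Rstore is the reliability of the storage hardware.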
If each of the components has a stated reliability of 99.99% then the product of the five components is approximately 99.95%. In terms of the Mean Time To Failure (MTTF), the more components that make up a Service Line, the less reliable that Service Line is.
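As a rough check of this arithmetic, the series calculation can be sketched in Python. The function names are illustrative, and reading the 99.99% figure as an annual availability (so that it can be converted into hours of downtime) is an assumption made for this example rather than part of the SLC method itself.

```python
from math import prod

HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def series_reliability(component_reliabilities):
    """Reliability of a chain in which every component must work."""
    return prod(component_reliabilities)


def expected_downtime_hours(availability, hours=HOURS_PER_YEAR):
    """Expected unavailable hours per year (assumes the figure is an availability)."""
    return (1.0 - availability) * hours


# Four application platforms plus one storage device, each at 99.99%.
five_boxes = series_reliability([0.9999] * 5)
# Everything consolidated onto a single 99.99% hardware box.
one_box = series_reliability([0.9999])

print(f"five components: {five_boxes:.4%}, "
      f"{expected_downtime_hours(five_boxes):.2f} h/year")
print(f"one component:   {one_box:.4%}, "
      f"{expected_downtime_hours(one_box):.2f} h/year")
```

Under these assumptions the five-component chain comes out at about 99.95%, against 99.99% for the single consolidated component, and the expected downtime scales by roughly the same factor of five.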
That covers the first opportunity of SLC – improving reliability of a Service Line.
Service Line Architecture
The second opportunity that comes from SLC is this: by “bundling” the Service Lines into critical groups we can now look at designing a Data Center space that provides a variable tiering system of critical services.
By doing this we have the opportunity to scale down the overall critical services required by the Service Lines because, as previously discussed, not all Service Lines require the same level of reliability and availability. The opportunity to be gained here is the design of a variable tiered Data Center that provisions very resilient infrastructure at the most critical end of the Service Lines and basic non-redundant services for the non-critical Service Lines at the other end of the Data Center.
In this example we see a representation of such a Data Center layout.
We can see that provisioning various levels of redundancy and resilience from the mechanical and electrical (M&E) services will be more efficient than provisioning the whole Data Center to the highest level of service redundancy.
In the financial industry it is estimated that between 7% and 15% of all business applications require critical services to the level of a fully redundant and resilient Data Center, such as Uptime Institute Tier III or Tier IV. The cost saving in capital investment and operational maintenance can similarly be estimated to be better than 20%, depending on the final design and size of such a facility.
Today there are no documented real-life examples of this strategy being put to use. The cost efficiencies can therefore only be based on estimates and intelligent assumptions drawn from years of experience and inside knowledge of the financial industry.
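Because the saving can only be estimated, a back-of-envelope comparison between a uniformly provisioned facility and a variable tiered one may help to make the reasoning concrete. The sketch below is a minimal model, not a costing method. The tier names, the load split and the relative cost-per-kW figures are hypothetical placeholders chosen for illustration only, and the result depends entirely on the values supplied.

```python
def tiered_cost(load_kw_by_tier, cost_per_kw_by_tier):
    """Total cost when each tier's load is built to that tier's own cost per kW."""
    return sum(load_kw_by_tier[tier] * cost_per_kw_by_tier[tier]
               for tier in load_kw_by_tier)


def uniform_cost(load_kw_by_tier, cost_per_kw_by_tier, tier):
    """Total cost when the whole load is built to one tier (the traditional approach)."""
    return sum(load_kw_by_tier.values()) * cost_per_kw_by_tier[tier]


def saving_fraction(load_kw_by_tier, cost_per_kw_by_tier, top_tier):
    """Fraction saved by variable tiering relative to building everything at top_tier."""
    uniform = uniform_cost(load_kw_by_tier, cost_per_kw_by_tier, top_tier)
    tiered = tiered_cost(load_kw_by_tier, cost_per_kw_by_tier)
    return (uniform - tiered) / uniform


# HYPOTHETICAL inputs: 10% of the load is critical (inside the 7% to 15% range
# quoted above); the relative costs per kW are invented placeholders.
load_kw = {"critical": 100, "standard": 400, "basic": 500}
relative_cost_per_kw = {"critical": 1.0, "standard": 0.8, "basic": 0.6}

saving = saving_fraction(load_kw, relative_cost_per_kw, top_tier="critical")
print(f"estimated saving with these placeholder inputs: {saving:.0%}")
```

Replacing the placeholder figures with a real load profile and real build costs per tier is what would turn this from an illustration into an estimate.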
There are a number of challenges that the Data Center industry has to overcome to deliver on this strategy. These are not limited to the Data Center industry as a whole; they also apply to business owners who are moving towards better energy and cost efficiency models in how their businesses are managed and grown.
- From a business perspective there is a long track record of Business Continuity Planning work that would be the best foundation of data to start building Service Line analysis from. Indeed, most of the Service Line Classification work will already be done under a stringent BCP program.
- Translation of the SL Classification into a Solutions Architecture that can then group the applications and their hardware components into “SL Bundles” will be the next logical step. By Bundling Service Lines we are opening the door to an improved reliability model for critical systems and applications.
Note: there is an off-shoot advantage of Bundling Service Lines this way. In today’s ever-changing business landscape many organisations are seeking to change their Data Center footprint. This change often means migrating the Data Center facility to a new location, be it in-house, co-lo, or purpose built and owned by the organisation. One of the most complex and time-consuming activities that a project manager needs to deal with in a Data Center migration project is mapping application dependencies and producing application and system “Bundles” in order to plan migrations and deliver them without breaking the operational business systems in the process. The process of defining Service Line Bundles will effectively do this work in advance and allow for faster and less risky Data Center migrations.
- A Data Center facility that naturally lends itself to the concept of Service Line Architecture, provisioning variable tiering and scalability while retaining a competitive commercial model, has yet to be defined. Defining it will encompass M&E engineering design strategies as well as any architectural implications that could facilitate and support the SL concept.
- Assuring that the following results from this strategy are recognised:
- Commercial advantage – lower capital investment, lower operational costs
- Improved Reliability of critical Service Lines – fewer components in a Service Line
- Operational Continuity – being able to build and manage a Service Line Environment
- Scalability – growth of a business Data Center environment while retaining all the benefits of the Service Line Architecture.
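The bundling step mentioned in the note above, deriving “Bundles” from application dependencies, can be sketched as finding the connected groups in a dependency map. This is a minimal illustration under stated assumptions, not a prescribed SLC procedure. The application and storage names are hypothetical, and the dependency map is assumed to be already known, for example from existing BCP documentation.

```python
from collections import defaultdict


def build_bundles(dependencies):
    """Group systems into bundles: the connected components of the dependency map.

    `dependencies` maps each system to the systems it depends on. Two systems
    land in the same bundle if a chain of dependencies links them in either
    direction.
    """
    neighbours = defaultdict(set)
    for system, depends_on in dependencies.items():
        neighbours[system]  # make sure systems with no dependencies still appear
        for other in depends_on:
            neighbours[system].add(other)
            neighbours[other].add(system)

    seen = set()
    bundles = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        bundle, stack = set(), [start]
        while stack:  # depth-first walk over everything linked to `start`
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            bundle.add(node)
            stack.extend(neighbours[node] - seen)
        bundles.append(sorted(bundle))
    return bundles


# HYPOTHETICAL dependency map for illustration only.
dependencies = {
    "ordering": ["catalogue", "san-01"],
    "invoicing": ["ordering"],
    "delivery": ["invoicing"],
    "hr-portal": ["nas-02"],
    "wiki": [],
}

for bundle in build_bundles(dependencies):
    print(bundle)
```

One design point worth noting: any shared component, such as a storage array used by several otherwise unrelated applications, will merge their bundles into one, which is exactly the kind of hidden coupling a migration plan needs to surface.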
The concept and theory of Service Line Classification can demonstrate, on paper at least, that there are two potential benefits that can come out of this as a strategy:
- Increased reliability of systems
- Decreased capital and operational expenditure of the Data Center facility and services to the business.