Virtualizing business-critical applications brings many benefits for organizations. This blog explains the technical challenges and offers solutions.
When companies deploy virtual infrastructure environments, they achieve immediate savings in data center footprint by consolidating server workloads onto fewer hardware components. Tuning those components achieves higher levels of availability for the applications running on them. But after virtualizing much of their environment, organizations often fail to progress.
Getting all applications migrated to a virtual infrastructure platform requires new skills and new ways of managing capacity. The shift to Software Defined Databases requires a fundamental shift in how applications are developed and deployed. Licensing issues require special attention, as vendors also realize that compute workloads are no longer directly tied to physical hardware components.
The most common problem
Problems arise as soon as a business-critical application requires a higher level of availability than the available virtual infrastructure can provide.
Business-critical applications are understood here as applications such as Microsoft SQL Server, Exchange and SharePoint; SAP; custom Java on Linux; Oracle and Oracle RAC (the most common examples); as well as DB2, Cassandra, Hadoop/HBase, WebLogic and WebSphere; Tibco, RabbitMQ, MQ Services and other message queue systems; and finally in-house custom-built and maintained “home-grown” applications.
When the application runs slowly or even becomes unstable, it is temporarily moved back to the original physical infrastructure and the virtual environment is blamed. The cause is not the virtual environment itself, but how the virtual environment was configured and deployed on the physical infrastructure. In addition, some quite basic mistakes are often made.
Understand the key issues
Business-critical applications share a number of technological characteristics: high compute loads (with heavy math or thread processing), high RAM utilization, specialized I/O (particularly storage), availability configurations (requiring OS or application clustering) and complex networking configurations (public and private networks to support clustering).
Each critical application requires disproportionate amounts of CPU, RAM, disk (including disk space and I/O) and network (including number of connections and bandwidth), as well as higher levels of redundancy, availability and recoverability. Each application’s requirements are unique, but predictable. It is important to translate the resource requirements measured on native hardware into the virtual environment.
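
As an illustration of that translation step, the following is a minimal Python sketch, not a sizing method from this post. It assumes you have measured peak usage on the physical server. The class names, field names and the 25% headroom factor are all hypothetical placeholders to adjust for your own environment.

```python
import math
from dataclasses import dataclass


@dataclass
class MeasuredPeak:
    """Peak usage observed on the physical server (hypothetical field names)."""
    busy_cores: float    # CPU cores kept busy at peak
    ram_gb: float        # memory in use at peak
    disk_iops: int       # storage operations per second at peak
    network_mbps: float  # network throughput at peak


@dataclass
class VmRequest:
    vcpus: int
    ram_gb: int
    disk_iops: int
    network_mbps: int


def to_vm_request(peak: MeasuredPeak, headroom: float = 0.25) -> VmRequest:
    """Scale every measured peak by the same headroom factor (an assumption)."""
    scale = 1.0 + headroom
    return VmRequest(
        vcpus=max(1, math.ceil(peak.busy_cores * scale)),
        ram_gb=math.ceil(peak.ram_gb * scale),
        disk_iops=math.ceil(peak.disk_iops * scale),
        network_mbps=math.ceil(peak.network_mbps * scale),
    )


# Made-up sample measurements, for illustration only.
request = to_vm_request(
    MeasuredPeak(busy_cores=6.0, ram_gb=48.0, disk_iops=8000, network_mbps=1200.0)
)
print(request)
```

With these sample numbers the request comes out at 8 vCPUs, 60 GB of RAM, 10,000 IOPS and 1,500 Mbps.
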
Although every application has something unique, it is not necessary to define individual best practices for each application to thrive in a virtual infrastructure environment. Because of the abstraction layer of the virtual environment, a set of common practices can apply to all critical applications. Each application can then be further tuned, as on any physical infrastructure.
Solutions
Critical applications are already complex, so keep the design and solution simple.
- Avoid adding disks and spreading them across multiple data stores. Keep the number of disks and data stores to a minimum. Avoid splitting out base files that are part of a virtual machine’s core components (vswap and others).
- Avoid duplicating high-availability or redundancy features through external or homegrown solutions; these features are often already present in the base systems or architecture.
- Avoid assigning more CPU cores than necessary, as it may slow performance: the hypervisor may seek to schedule CPU cores that will do nothing. Heavily threaded applications use more cores, while number crunchers use fewer cores and more cycles.
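
To make the CPU point concrete, here is a rough Python sketch that flags oversized VMs. It assumes you can obtain the peak VM-wide CPU utilization from your monitoring tool. The function name and the 70% target utilization are illustrative assumptions, not vendor guidance.

```python
import math


def suggested_vcpus(assigned_vcpus: int, peak_cpu_percent: float,
                    target_percent: float = 70.0) -> int:
    """Suggest a vCPU count that would run near target_percent at the observed peak.

    peak_cpu_percent is the VM-wide CPU utilization (0-100) at its busiest.
    The result never exceeds assigned_vcpus: this sketch only detects
    oversizing, it does not recommend adding cores.
    """
    busy_vcpus = assigned_vcpus * peak_cpu_percent / 100.0
    needed = math.ceil(busy_vcpus / (target_percent / 100.0))
    return max(1, min(assigned_vcpus, needed))


# A 16-vCPU VM that peaks at 20% keeps about 3.2 vCPUs busy.
print(suggested_vcpus(16, 20.0))
# A 4-vCPU VM that peaks at 90% is not oversized, so the count is unchanged.
print(suggested_vcpus(4, 90.0))
```

In the first case the sketch suggests 5 vCPUs instead of 16; in the second it leaves the VM at 4.
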
Instead, architect hardware from a total performance perspective.
The virtual environment always depends on the hardware. Therefore, size hardware components appropriately to handle the anticipated loads.
Optimize CPU, RAM, disk and network.
- RAM is almost always the first resource to be exhausted in virtual infrastructure environments.
- Spread I/O appropriately across the storage area network (SAN); use solid state drives (SSDs) and cache capabilities to boost performance. Enable jumbo frames as the norm for IP SAN technologies (iSCSI and NFS).
- Use 10GbE connections for all network connectivity.
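
The observation that RAM tends to run out first can be checked with simple arithmetic. The Python sketch below assumes identical VMs and no overcommitment, and every number in it is a made-up example rather than a measurement.

```python
from typing import Dict, Tuple


def vms_per_host(host: Dict[str, int], vm: Dict[str, int]) -> Tuple[int, str]:
    """Return how many identical VMs fit on one host and which resource runs out first."""
    fits = {name: host[name] // vm[name] for name in vm}
    limiting = min(fits, key=fits.get)
    return fits[limiting], limiting


# Hypothetical host capacity and per-VM requirements.
host = {"cores": 32, "ram_gb": 512, "iops": 100_000, "network_mbps": 20_000}
vm = {"cores": 2, "ram_gb": 48, "iops": 5_000, "network_mbps": 1_000}

count, limiting = vms_per_host(host, vm)
print(f"{count} VMs fit; first resource exhausted: {limiting}")
```

With these sample figures the host has CPU capacity for 16 such VMs but RAM for only 10, so RAM is the limiting resource.
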
Storage is perhaps the most complex resource to manage, because it is almost always abstracted in multiple layers and varies with the make and model of the storage system used. It is where application performance problems tend to arise first and most frequently.
- Storage capabilities should be pushed as low as practical in the hardware stack.
- Storage should appear as simple, local disks, and networks should appear as simple connections.
- Make sure that individual components are not easily overwhelmed, as you would when architecting shared storage for high-capacity I/O systems and applications.
- Use raw device mappings (RDMs) as a last resort only; they add no performance advantage over a virtual disk located in a properly configured data store. Instead, and where feasible, use OS-level storage systems like ASM on Oracle.
Keep networks simple.
- Avoid virtual network interface controller (vNIC) teaming and bonding inside a VM, as it is already handled by the hypervisor. Use one NIC for each distinct network the VM connects to.
- Keep virtual machines simple and transparent. Do not install unnecessary services and features, and turn off those that are already present.
- Follow best practices to harden the OS (to the applications, it should feel like any other optimized environment).
A typical business-critical application optimization stack
A typical business-critical application optimization stack could look as follows (read from bottom to top):
- Application oriented optimization
  - 5b) Java Application: Resource Allocation, App Tunables
  - 5a) Java Virtual Machine: Heap Size, Threads, etc.
  - 5) Application: Cache, SGA, RM Commitment, App Specific Tunables
  - 4) Operating System: Para-virtual Drivers, Kernel Parameter Tuning (Linux)
- Virtual infrastructure oriented optimization
  - 3) Virtual Machine Hardware: Optimize vCPU, RAM, Storage, Resource Limits & Reservations
  - 2) Hypervisor: Resource Pools, HA, DRS, Data Stores, Parameter Tuning
  - 1) Physical Hardware: Server, storage, network
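
For teams that want to track tuning work layer by layer, the stack above can be written down as plain data. This is only one possible representation: the variable and function names are hypothetical, while the layer names and tunables are copied from the list.

```python
# Layers ordered from bottom (1) to top (5b), matching the stack above.
# Each entry: (number, layer name, optimization group, tunables).
OPTIMIZATION_STACK = [
    ("1", "Physical Hardware", "virtual infrastructure",
     ["Server", "Storage", "Network"]),
    ("2", "Hypervisor", "virtual infrastructure",
     ["Resource Pools", "HA", "DRS", "Data Stores", "Parameter Tuning"]),
    ("3", "Virtual Machine Hardware", "virtual infrastructure",
     ["vCPU", "RAM", "Storage", "Resource Limits & Reservations"]),
    ("4", "Operating System", "application",
     ["Para-virtual Drivers", "Kernel Parameter Tuning (Linux)"]),
    ("5", "Application", "application",
     ["Cache", "SGA", "RM Commitment", "App Specific Tunables"]),
    ("5a", "Java Virtual Machine", "application",
     ["Heap Size", "Threads"]),
    ("5b", "Java Application", "application",
     ["Resource Allocation", "App Tunables"]),
]


def tuning_order(stack):
    """Return the layer labels in bottom-to-top order."""
    return [f"{number}) {name}" for number, name, _group, _tunables in stack]


for label in tuning_order(OPTIMIZATION_STACK):
    print(label)
```

Walking the list in this order mirrors the bottom-to-top reading of the stack: hardware first, application tunables last.
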
Clustering/final optimizations
Understand when to cluster and when not to.
With a well-engineered virtual infrastructure platform, certain high-availability configurations that system clustering provides for physical infrastructure deployments can often be eliminated. However, clustering still plays an important role for active-active clustered systems: it supports rolling upgrades and regular maintenance, minimizes downtime during patches, etc.
When clustering on top of virtual infrastructure, the high-availability features of each layer should be optimized to complement one another. Avoid clustering techniques that may interfere with the infrastructure layers above and below.
To use shared disks between individual nodes (voting and quorum drives) for operating system clusters on VMs, use one of the four available methods. The iSCSI/NFS Gateway VM is gaining traction, as it resolves almost all of the limitations of the other available solutions (RDM, multi-write virtual disk, iSCSI or NFS on SAN/NAS). However, it is also more complex to set up and maintain.
Use anti-affinity policies between the various cluster nodes to prevent two nodes from running on the same physical host at the same time, which would defeat one of the high-availability purposes of clustering.
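
A placement audit along these lines can be sketched in a few lines of Python. In practice the hypervisor's own anti-affinity rules should enforce the separation; this sketch only checks a placement you have exported yourself, and the VM and host names are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def anti_affinity_violations(placement: Dict[str, str],
                             cluster_nodes: List[str]) -> Dict[str, List[str]]:
    """Return {host: [vm, ...]} for every host running more than one node of the cluster.

    placement maps each VM name to the physical host it currently runs on.
    """
    by_host = defaultdict(list)
    for vm in cluster_nodes:
        by_host[placement[vm]].append(vm)
    return {host: sorted(vms) for host, vms in by_host.items() if len(vms) > 1}


# Hypothetical placement: two nodes of the same cluster share host-01.
placement = {
    "sql-node-1": "host-01",
    "sql-node-2": "host-01",
    "sql-node-3": "host-02",
    "web-01": "host-01",
}
violations = anti_affinity_violations(
    placement, ["sql-node-1", "sql-node-2", "sql-node-3"]
)
print(violations)
```

Here the audit reports host-01, because a failure of that single host would take down two of the three cluster nodes at once.
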
Use a multi-write virtual disk to keep all data in virtual disk files on a data store. All cluster nodes can then access the folder that holds those files.
Credits & special thanks: This blog incorporates thought leadership and published content from Chris William, director of Cognizant Virtual Solutions.
+++
To share your own thoughts or other best practices about this topic, please email me directly at alexwsteinberg (@) gmail.com.
Alternatively, you may also connect with me and become part of my professional network of Business, Digital, Technology & Sustainability experts at
https://www.linkedin.com/in/alexwsteinberg or
Xing at https://www.xing.com/profile/Alex_Steinberg or
Google+ at https://plus.google.com/u/0/+AlexWSteinberg/posts
+++