Best Practices for VMware Environments Performance Optimization

Optimizing performance for VMware environments is important so that the workloads run efficiently. Virtualization enables multiple virtual machines (VMs) to use host hardware resources such as CPU, memory, storage, and networking. Without optimization, high contention could negatively affect performance.

This article outlines best practices for optimizing key areas in VMware environments. Following these recommendations can improve workload throughput, reduce response times, and avoid bottlenecks. The suggestions are aimed at vSphere administrators, infrastructure architects, and application owners who use VMware to host business workloads.

CPU Optimization

CPU is often the first resource to bottleneck when expanding workloads in VMware environments. Adding more CPU capacity or optimizing CPU use is key for performance, especially considering the new VMware licensing model and its impact on resource allocation.

Size VMs Appropriately

One of the most common mistakes is overprovisioning vCPUs for a VM. Because of scheduling overhead, too many vCPUs can hurt rather than help performance. As a rule of thumb:

  • For light workloads (e.g., web servers, small databases), use 1-2 vCPUs
  • For medium workloads (e.g., application servers, medium databases), use 2-4 vCPUs
  • For heavy workloads (e.g., large databases, batch/analytics jobs), use up to 8 vCPUs

Consider application requirements and monitor usage before deciding on vCPU numbers.
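
The rule of thumb above can be combined with monitoring data in a small helper. This is a minimal sketch, not a VMware tool: the tier names, the `suggest_vcpus` function, and the 70% target utilization are assumptions made for illustration.

```python
import math

# Upper vCPU bounds per workload tier, taken from the rule of thumb above.
VCPU_CAP = {"light": 2, "medium": 4, "heavy": 8}


def suggest_vcpus(tier: str, current_vcpus: int, peak_util_pct: float,
                  target_util_pct: float = 70.0) -> int:
    """Suggest a vCPU count from observed peak utilization, capped by tier.

    target_util_pct is an assumed headroom goal, not a VMware default.
    """
    if tier not in VCPU_CAP:
        raise ValueError(f"unknown tier: {tier!r}")
    # vCPUs needed so the observed peak load would run at the target utilization.
    needed = math.ceil(current_vcpus * peak_util_pct / target_util_pct)
    # Never go below 1 vCPU or above the tier's rule-of-thumb cap.
    return max(1, min(needed, VCPU_CAP[tier]))
```

For example, a "light" VM configured with 8 vCPUs that peaks at 10% utilization would be suggested 2 vCPUs under these assumptions.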

Use Resource Pools for Priority

Resource pools let you segment CPU capacity and guarantee resources for key workloads. Set shares and reservations on pools based on workload priority:

  • High-priority pools get a higher share value or CPU reservation
  • Lower priority pools get a lower share or remain unreserved

This prevents less critical VMs from taking resources from production applications.
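
To see how shares and reservations interact, the sketch below uses a deliberately simplified model: each pool first receives its reservation, and the remaining capacity is split in proportion to shares. This is not the ESXi scheduler's actual entitlement algorithm (it ignores limits and real demand), and the pool names and values are illustrative.

```python
from typing import Dict, Tuple


def split_cpu_mhz(capacity_mhz: float,
                  pools: Dict[str, Tuple[int, float]]) -> Dict[str, float]:
    """Divide host CPU among pools under full contention (simplified model).

    pools maps a pool name to (shares, reservation_mhz).
    """
    reserved = sum(reservation for _, reservation in pools.values())
    if reserved > capacity_mhz:
        raise ValueError("reservations exceed host capacity")
    total_shares = sum(shares for shares, _ in pools.values())
    if total_shares <= 0:
        raise ValueError("total shares must be positive")
    spare = capacity_mhz - reserved
    # Reservation is guaranteed; the spare capacity is split by share ratio.
    return {
        name: reservation + spare * shares / total_shares
        for name, (shares, reservation) in pools.items()
    }
```

With 20,000 MHz of capacity, a production pool with 8,000 shares and a 4,000 MHz reservation, and a development pool with 2,000 shares and no reservation, this model gives production 16,800 MHz and development 3,200 MHz when both are fully busy.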

Keep Host CPU Utilization Under 80%

When aggregate host CPU usage exceeds 80% for sustained periods, most workloads experience degraded performance due to queueing delays.

Monitor usage with vCenter and vRealize Operations. Add hosts to increase capacity before utilization stays above the threshold for long.
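
A "sustained" breach is different from a momentary spike. The following sketch shows one way to tell them apart from a series of utilization samples exported from a monitoring tool. The function name and the `min_consecutive` default are hypothetical; choose a window that matches your sampling interval.

```python
from typing import Iterable


def sustained_breach(samples: Iterable[float], threshold_pct: float = 80.0,
                     min_consecutive: int = 6) -> bool:
    """Return True if utilization stays above the threshold for
    at least min_consecutive samples in a row."""
    run = 0
    for value in samples:
        # Extend the current run on a breach, otherwise reset it.
        run = run + 1 if value > threshold_pct else 0
        if run >= min_consecutive:
            return True
    return False
```

A short spike that drops back under 80% resets the counter, so only prolonged pressure triggers a capacity review.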

Favor Newer CPU Hardware

Newer CPUs deliver performance gains through more cores, higher clock speeds, new architectures, or bigger caches. When possible, upgrade older hosts, especially if they are CPU-bound.

Intel tests show that the E5-2600 v3 series offers 2X more throughput for vCPU-intensive workloads than prior generations. This difference is substantial for large production VMware environments.

Memory Optimization

Memory capacity and performance tend to be the next bottleneck after the CPU. Expanding and tuning memory resources helps avoid paging and queuing when the workloads grow.

Set Memory Reservations

By default, memory allocation is dynamic based on VM activity. This can let some VMs consume excess resources during peak times.

Set a memory reservation equal to the VM’s active working set size to guarantee availability without paging. Monitor VMkernel swapping and ballooning metrics to right-size the reservation.

Use Resource Pools for Fairness

Create memory resource pools for production environments, and set reservations, limits, and shares according to workload priority. This ensures that less important VMs do not affect others when host memory is scarce.

Provision Memory in Line with Scaling Expectations

Provision memory capacity in line with the data center’s expected compute growth over the next 12-18 months. Expanding VMware cluster resources like CPU and memory without disruption can be complex, so plan ahead to avoid emergency upgrades.
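
A simple compound-growth projection can indicate whether installed memory will last through the planning horizon. This is a rough sketch: the function names are hypothetical, and the monthly growth rate is an input you would estimate from your own trend data.

```python
import math


def projected_demand_gb(current_gb: float, monthly_growth_pct: float,
                        months: int) -> float:
    """Project memory demand assuming steady compound monthly growth."""
    return current_gb * (1 + monthly_growth_pct / 100) ** months


def months_until_full(current_gb: float, capacity_gb: float,
                      monthly_growth_pct: float) -> int:
    """Whole months until projected demand reaches installed capacity."""
    if monthly_growth_pct <= 0:
        raise ValueError("growth rate must be positive")
    if current_gb >= capacity_gb:
        return 0
    ratio = capacity_gb / current_gb
    return math.ceil(math.log(ratio) / math.log(1 + monthly_growth_pct / 100))
```

If the result is shorter than the 12-18 month horizon, that suggests scheduling an expansion now rather than waiting for contention.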

Size Memory to Application Needs

Default VM templates often overprovision memory without regard to actual usage. Right-size VM memory allocation based on data from monitoring tools or by checking balloon driver activity. Avoid committing 100% of host memory, which can cause swapping under peak loads. Target 90% commitment for better performance.
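
The commitment target can be checked with simple arithmetic. The sketch below treats "commitment" as the sum of configured VM memory divided by host memory, which is an assumption about how the 90% figure is measured. The function names are illustrative.

```python
from typing import Sequence


def memory_commitment_pct(host_gb: float, vm_gb: Sequence[float]) -> float:
    """Configured VM memory as a percentage of physical host memory."""
    return 100 * sum(vm_gb) / host_gb


def headroom_gb(host_gb: float, vm_gb: Sequence[float],
                target_pct: float = 90.0) -> float:
    """Memory that can still be allocated before passing the target.

    A negative result means the host is already over the target.
    """
    return host_gb * target_pct / 100 - sum(vm_gb)
```

For a 500 GB host running VMs configured with 100, 100, and 200 GB, commitment is 80% and 50 GB remains before the 90% target is reached.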

Use Newer DDR Memory Technology

Upgrade older hosts to leverage newer memory technologies like DDR4 with higher clock speeds, especially for memory-intensive applications. Tests show that DDR4-2666 offers 25% better performance compared to DDR3-1333 when paging and swapping occur.

Storage Optimization

Storage performance affects every VM and application in a vSphere environment. Slow storage causes queueing that impacts upstream workloads.

Use All-Flash Storage for Production

For the highest performance with the lowest latency, use all-flash storage arrays rather than traditional spinning disks for production workloads. Hybrid flash options strike a balance when cost is a concern.

VMware tests comparing all-flash vs hybrid arrays show:

  • 33% greater VM density
  • 97% lower read latency
  • 63% higher IOPS

All-flash storage delivers the speed needed to meet response time SLAs for mission-critical databases or real-time workloads.

Use RAID 10, not RAID 5, for Transactional Workloads

RAID 10 offers better performance for transactional use cases than RAID 5 or 6. Its mirroring allows for higher throughput, whereas the parity calculation overhead of RAID 5 slows writes.

Use RAID 5 for Read-Heavy Workloads

For workloads like data warehouses, which mostly involve read operations, RAID 5 uses capacity more efficiently than RAID 10, which gives up 50% of raw capacity to mirroring.

Use RAID 6, Not RAID 5, for Larger Disks

With higher-density multi-TB disks, RAID 6 offers advantages over RAID 5 with the second parity disk:

  • Improved performance during rebuilds
  • Ability to withstand a second disk failure

For disks larger than 2TB, the rebuild time and risk window are high enough to warrant RAID 6.
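
The capacity and resilience trade-offs between the RAID levels discussed above follow from standard formulas. The sketch below assumes equal-sized disks and ignores hot spares and vendor-specific layouts. The function names are illustrative.

```python
def raid_usable_tb(level: str, disks: int, disk_tb: float) -> float:
    """Usable capacity for a single RAID group of equal-sized disks."""
    if level == "10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        return disks / 2 * disk_tb      # half the disks hold mirror copies
    if level == "5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_tb    # one disk's worth of parity
    if level == "6":
        if disks < 4:
            raise ValueError("RAID 6 needs at least 4 disks")
        return (disks - 2) * disk_tb    # two disks' worth of parity
    raise ValueError(f"unsupported RAID level: {level!r}")


def guaranteed_disk_failures(level: str) -> int:
    """Disk failures a group is guaranteed to survive.

    RAID 10 can survive more if the failures hit different mirror pairs.
    """
    return {"10": 1, "5": 1, "6": 2}[level]
```

With eight 4 TB disks, RAID 10 yields 16 TB, RAID 5 yields 28 TB, and RAID 6 yields 24 TB, while only RAID 6 is guaranteed to survive two simultaneous failures.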

Spread Workloads Across Multiple Datastores

Rather than oversubscribing a single NFS or VMFS datastore, present storage from multiple LUNs and volumes to balance activity. This prevents hot spots and aligns performance needs with storage capabilities.

Size LUNs Based on Performance Requirements

Right-size Storage Area Network (SAN) LUNs to meet the workload’s IOPS needs, not just its capacity requirements. Thin provisioning allows space to be over-allocated as needed.
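
Sizing for IOPS usually means converting front-end workload IOPS into back-end disk IOPS using the RAID write penalty. The sketch below uses the conventional penalties (2 for RAID 10, 4 for RAID 5, 6 for RAID 6). The per-disk IOPS figure is an input you would take from your array vendor, and caching effects are ignored.

```python
import math

# Conventional back-end I/Os generated per front-end write.
WRITE_PENALTY = {"10": 2, "5": 4, "6": 6}


def backend_iops(frontend_iops: float, write_pct: float, level: str) -> float:
    """Back-end IOPS the disks must deliver for a given workload mix."""
    writes = frontend_iops * write_pct / 100
    reads = frontend_iops - writes
    return reads + writes * WRITE_PENALTY[level]


def disks_for_iops(frontend_iops: float, write_pct: float, level: str,
                   iops_per_disk: float) -> int:
    """Minimum disk count needed to satisfy the back-end IOPS."""
    return math.ceil(backend_iops(frontend_iops, write_pct, level) / iops_per_disk)
```

A workload of 5,000 IOPS with 30% writes needs 9,500 back-end IOPS on RAID 5 but only 6,500 on RAID 10, which illustrates why transactional workloads favor mirroring.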

Use Storage DRS and Storage I/O Control

Enable Storage DRS with datastore clusters to automatically load balance activity across LUNs. Storage I/O Control (SIOC) allocates I/O resources among VMs according to their shares when a datastore is congested, which limits resource contention.

Network Optimization

Network transport is critical to performance in vSphere environments. Optimizing network usage and throughput removes bottlenecks impacting applications.

Use 10GbE Networking for Hosts

Use 10GbE to interconnect infrastructure for clusters with 10 or more hosts. This provides the I/O capacity necessary to prevent contention as east-west VM traffic grows.

Use NIC Teaming for Availability and Throughput

Configure NIC teams to associate VMkernel adapters and virtual switches to multiple physical NIC ports. This increases bandwidth and upholds connectivity during a NIC failure.

LACP provides dynamic link detection and better load balancing across links than active/standby failover teaming policies.

Separate VM Traffic Types

Carve out distinct networks and VLANs for specific traffic types rather than using a single congested VLAN:

  1. vMotion – Live migration traffic
  2. vSAN – Storage cluster traffic
  3. VM Network – Client-server traffic
  4. Management – Administrative traffic

This prevents key infrastructure traffic from impacting production workloads and vice versa.

Use Jumbo Frames

Enable jumbo frames (MTU = 9000) on physical switches, VMkernel adapters, and VM NICs to reduce packet fragmentation and CPU overhead. Testing shows 30-50% greater throughput for large file transfers and backups.
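
The effect of a larger MTU can be estimated from frame arithmetic. This sketch assumes TCP over IPv4 without header options (40 bytes of headers inside each packet) and 38 bytes of per-frame Ethernet overhead on the wire (preamble, header, FCS, and inter-frame gap). VLAN tags are ignored.

```python
import math

IP_TCP_HEADERS = 40  # IPv4 (20 bytes) + TCP (20 bytes), no options
ETH_OVERHEAD = 38    # preamble/SFD 8 + header 14 + FCS 4 + inter-frame gap 12


def frames_needed(payload_bytes: int, mtu: int) -> int:
    """Frames required to carry a payload at the given MTU."""
    return math.ceil(payload_bytes / (mtu - IP_TCP_HEADERS))


def wire_efficiency(mtu: int) -> float:
    """Fraction of on-the-wire bytes that is application payload."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETH_OVERHEAD)
```

Under these assumptions, a 1,000,000-byte transfer takes 685 frames at MTU 1500 but only 112 at MTU 9000. The reduction in per-frame processing is where most of the CPU savings come from.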

vSphere Configuration Optimization

Tuning and configuring vSphere for best performance practices is as critical as the underlying infrastructure. Well-configured vSphere removes issues that impact VMs.

Upgrade to the Latest vSphere Version

Upgrade vSphere to run the latest hypervisor code and performance optimizations. For example, vSphere 6.5 added Predictive DRS to load balance VMs proactively before hot spots occur.

Use VMware-Supported Hardware

Only run vSphere on certified hardware listed in the VMware Compatibility Guide. Using unsupported hardware often leads to unexpected crashes, failures, and performance issues.

Install VMware Tools

Install (and keep upgrading) VMware Tools on all provisioned VMs. VMware Tools adds drivers necessary for maximum performance and enables important CPU, disk, and network metrics in vCenter.

Disable Unused Hardware and Features

Disable any unused physical hardware in the BIOS, like serial/COM ports, sound cards, and USB controllers. Also, disable vSphere features that are not utilized, like fault tolerance or vGPU, to reduce resource consumption on hosts.

Virtualize Business Critical Apps First

When possible, virtualize mission-critical workloads first to validate the environment under load. Issues are then identified earlier and can be addressed during the transition phases rather than post-deployment.

Monitoring and Alerting

Proactive monitoring helps identify performance issues before they have a major impact. Alerting brings attention to problems as they emerge so they can be resolved quickly.

Monitor Host Resource Usage

Enable performance graphs in vCenter and set alerts for CPU and memory at 75-80% sustained utilization. This will allow you to add capacity before performance degrades.

Monitor VM Resource Usage

Similarly, enable VM resource monitoring for CPU, memory, disk, and network. Set thresholds based on application profiles – for example, 80% CPU may be normal for batch workloads but impact real-time apps.

Monitor Storage Latency

Storage latency is key for workload performance. Enable graphs for read/write latency on SAN/NAS devices. Set alerts to trigger an investigation when latency rises 20-50% above baseline.
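
A baseline-relative check like the one described can be expressed in a few lines. This is a sketch only: the function name and the two-tier "warning"/"critical" split at 20% and 50% are assumptions layered on the range given above.

```python
def latency_status(baseline_ms: float, current_ms: float,
                   warn_pct: float = 20.0, crit_pct: float = 50.0) -> str:
    """Classify current latency by its percentage increase over baseline."""
    if baseline_ms <= 0:
        raise ValueError("baseline must be positive")
    increase_pct = (current_ms - baseline_ms) / baseline_ms * 100
    if increase_pct >= crit_pct:
        return "critical"
    if increase_pct >= warn_pct:
        return "warning"
    return "ok"
```

Comparing against a per-datastore baseline rather than a fixed number keeps the alert meaningful on both fast flash arrays and slower tiers.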

Monitor Network Usage

Lastly, enable monitoring of the physical switch ports connected to the ESXi hosts. Alert at 75-80% sustained utilization so you can add capacity before contention sets in.

Profiling and Troubleshooting Tools

When performance issues do occur, vSphere offers advanced tools to profile workloads and pinpoint problem areas.

Use vscsiStats to Profile Storage

vscsiStats logs SCSI commands between VMs and the host to detail storage latency. This helps diagnose if high VM latency originates from a struggling SAN array versus the hypervisor.

Use esxtop to Profile CPU and Memory

esxtop gives real-time visibility into host and VM resource usage for CPU, memory, storage and network activity. Drill into VM statistics to quantify abnormal values.
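
esxtop can also run in batch mode (`esxtop -b`), which writes its samples as CSV for offline analysis. The sketch below averages one counter from such a file. The sample data and the exact column name are made up for illustration, so check the header of your own capture before relying on a particular counter name.

```python
import csv
import io


def column_average(csv_text: str, name_fragment: str) -> float:
    """Average the first column whose header contains name_fragment."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    matches = [i for i, name in enumerate(header) if name_fragment in name]
    if not matches:
        raise KeyError(f"no column containing {name_fragment!r}")
    col = matches[0]
    values = [float(row[col]) for row in reader if row]
    if not values:
        raise ValueError("no data rows")
    return sum(values) / len(values)


# Illustrative sample only; real captures contain many more columns.
SAMPLE_CSV = "\n".join([
    r'"(PDH-CSV 4.0) (UTC)(0)","\\esx01\Physical Cpu(_Total)\% Util Time"',
    '"01/01/2024 00:00:00","70.0"',
    '"01/01/2024 00:00:05","90.0"',
])
```

Calling `column_average(SAMPLE_CSV, "% Util Time")` on the sample returns the mean of the two readings. Matching on a substring avoids hard-coding the host name embedded in each column header.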

Use vRealize Operations Manager

vROps features advanced analytics, models, and dashboards that span apps and infrastructure. Out-of-the-box policies flag performance anomalies across physical, virtual, and cloud environments before they cause an outage.

Integrate vROps with vCenter for automatic relationship mapping. Customize policies and alerts based on application profiles and priorities.

Conclusion

Performance tuning for VMware environments encompasses a wide range of areas, from storage and networking to vSphere configuration and monitoring tools. Apply these best practices to ensure your infrastructure keeps pace with expanding workloads.

Modernizing to all-flash storage, 10GbE networking, and the latest vSphere release creates a foundation for scaling. Right-sizing VMs and using reservations and resource pools prevent noisy neighbor issues. Proactive monitoring and profiling provide the insight needed to preempt problems before they impact users.

With strong fundamentals, VMware environments sustain reliable performance even as new applications and user bases grow exponentially. Efficiency improvements also allow for the future adoption of cloud and container workloads.

Image Credit: Photo by ThisisEngineering; Unsplash

Noah Nguyen is a multi-talented developer who brings a unique perspective to his craft. Initially a creative writing professor, he turned to Dev work for the ability to work remotely. He now lives in Seattle, spending time hiking and drinking craft beer with his fiancee.
