THE PROS AND CONS OF SERVER CLUSTERING ON LINUX IN THE DATACENTER


Introduction

As enterprises increasingly leverage the Internet to drive revenue, improve efficiencies and achieve competitive advantage, and as the infrastructure needed to operate mission-critical Web sites grows ever more complex, IT organizations are confronted with a host of new challenges. The need to rapidly scale data-processing, storage and transaction capabilities to meet the demands of Web-based applications has created a new market segment: the outsourced data center. Application Service Providers (ASPs), for example, offer applications and management through Internet connectivity, alleviating the complexity of maintaining these systems within an individual company. Hosting and collocation services provide data-center space and, potentially, the servers to run applications. Even Internet Service Providers (ISPs), traditionally focused on providing connectivity, are expanding their horizons to include offerings such as email and financial services. These outsourcers share similar pain points: managing growth, ensuring application availability, and attracting and retaining systems administrators. While server clustering can alleviate these obstacles and enhance the ASP business model, current solutions fail to provide the reliability, manageability and cost-effectiveness needed in a large-scale, mission-critical environment. Instead, a radically new approach to clustering is required, one that integrates hardware, software and networking into a single architecture.


Challenges


The Hardware Bottleneck

Application service providers offer multiple benefits to enterprises. Using an ASP means not having to maintain the servers needed to run applications, eliminating the challenge of managing far-flung, mixed systems and the associated expenses for people and space. In fact, the ASP model provides an excellent way to centralize systems management and enables a business to focus expensive IT people on developing value-added applications rather than performing basic administration.


But there's always a catch: in this case, the same catch faced by insourced enterprises that operate their own data centers. ASPs must grow rapidly to meet the demands of customers, both large and small, looking to offload their corporate application farms. As a result, ASPs must acquire huge amounts of server hardware in much the same way that Storage Service Providers (SSPs) acquire Storage Area Networks (SANs). This explosive growth is occurring while ASP customers are demanding increasingly robust solutions and rapid deployment. Unfortunately, it can take weeks or months to provision, much less stabilize, a new server. This lead time turns the acquisition of hardware into a major step function, whereas demand is actually closer to a smooth curve.


Moreover, ASPs must meet stringent demands for availability: if an application is unavailable, the ASP value proposition fails. Downtime costs money, and while customers may tolerate limited scheduled downtime, random or extensive outages are not acceptable. To that end, service providers often cite how many nines of uptime they deliver. Five nines means 99.999 percent uptime, or approximately five minutes of unplanned downtime per year in a stable environment. These claims can be somewhat irrelevant, however, since customers measure environmental stability not in years, but in days. Planned downtime is even more extensive, and just as costly. Examples include software and hardware upgrades, power or cooling upgrades, and adding or reconfiguring resources. Moreover, business patterns such as follow-the-sun can require configuration changes several times daily. Attempting to mitigate planned downtime through such techniques as dynamic domain partitioning often fails, since this approach is sensitive to factors as simple as a processor-speed upgrade.
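

The arithmetic behind the nines is simple. A short Python sketch, for illustration only, converts an availability percentage into its annual unplanned-downtime budget:

    # Convert an availability figure of N nines into minutes of allowable
    # unplanned downtime per year.
    MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

    def downtime_budget(nines: int) -> float:
        availability = 1 - 10 ** -nines       # e.g. 5 nines -> 0.99999
        return MINUTES_PER_YEAR * (1 - availability)

    for n in (3, 4, 5):
        print(f"{n} nines: {downtime_budget(n):7.2f} minutes/year")
    # 3 nines:  525.60 minutes/year
    # 4 nines:   52.56 minutes/year
    # 5 nines:    5.26 minutes/year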


The ideal scenario is no downtime, with no reduction in application responsiveness. Indeed, to avoid the woodpecker effect (hammering the Return key because nothing is happening), system responsiveness is crucial. Traditionally, ensuring responsiveness meant sizing systems to meet peak demand, thus leaving expensive processing cycles idle most of the time. Sizing to peak also means sizing to burst: since Internet traffic is bursty in nature, demand for application access is not a smooth average, yet it is still somewhat predictable. Unfortunately, the limited flexibility of current server technology for allocating resources means that sizing to peak/burst means over-provisioning the hardware, increasing costs for equipment, space and power.
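

The cost of sizing to peak is easy to quantify. Assuming, purely for illustration, that peak demand runs four times the average, a farm sized for the peak sits three-quarters idle in steady state:

    # Average utilization of a farm provisioned for peak load, given an
    # assumed (illustrative) peak-to-average traffic ratio.
    peak_to_average = 4.0
    utilization = 1 / peak_to_average
    print(f"average utilization: {utilization:.0%}")   # 25%, i.e. 75% idle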



Solutions


Server Partitioning and Consolidation

One solution is server partitioning, whereby servers are designated to run services based on load rather than assigning one or more servers to a single service. An example is the traditional tiered software model comprising client, middleware and database services. It is not unusual to find the middleware service combined with one side or the other, an approach that reduces flexibility while increasing management complexity. A second approach to partitioning is sharing a large system among multiple business units or companies. However, the mere concept of sharing inspires IT professionals to reach for an aspirin. In a word, the issue is security. Today, securing partitioned applications on a single server is not well understood and, without investing in a secure OS, largely non-existent.


A more common approach is server consolidation. The server farm, with its racks of disparate systems, begins to resolve the sizing issue but significantly raises the level of pain inflicted on systems administrators. In fact, managing the server farm may be the single greatest challenge faced by ASPs. Since a primary value-add of the ASP approach lies in offloading the management of customer applications, anything that makes system administration more difficult negatively impacts the ASP business model. Moreover, with costs for data-center real estate increasing rapidly, cramming more capacity into less space is thought to be crucial. Yet density alone is not enough. Instead, server consolidation must result in usable density, which not only addresses the space issue but also considers management and application requirements.


The need to manage disparate systems while resolving such issues as application availability, resource requirements and security has led to a plethora of third-party tools. Unfortunately, most tools developed for server farms fail to improve resource management, being focused instead on reporting and crisis-event notification.


Clustering

Clustering holds great promise for addressing the ASP pain points described above. Essentially, a cluster is a group of systems organized to share resources. While the shared resource in a cluster is typically processing capacity, cluster solutions dating back to the Digital VAXCluster were designed to share all resources equally and distribute loads as needed. Clusters should permit the easy addition or removal of a resource, provide a degree of high-speed communication among members, and understand some level of distributed resource ownership.


Clusters fall into three categories: high-performance computing (HPC), high availability (HA) and load balancing. Each has specific strengths to address a specific problem, though they may overlap in functionality. For example, an HPC cluster might offer some level of checkpoint/restart, a high-availability-like function. Likewise, an HPC cluster will generally offer tools to direct problem sets to the cluster members best equipped to resolve them, a specialized load-balancing feature.


In the Linux world, HPC clusters are often described as Beowulf clusters, wherein individual systems (not necessarily of the same make and model) are each allocated part of a computational problem that has been broken down into parallel problem sets. Once the individual segments of the computation are completed, a master program assembles them into the final solution. While HPC clusters can be a powerful tool for scientific applications, they do not generally apply to the ASP space.
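

A minimal sketch of this scatter/gather pattern follows, using Python's multiprocessing module to stand in for cluster nodes (a real Beowulf cluster would distribute the segments over a message-passing layer such as MPI or PVM rather than local processes):

    # Beowulf-style decomposition: split a computation into parallel
    # problem sets, farm them out, and let the master assemble the result.
    from multiprocessing import Pool

    def partial_sum(bounds):
        lo, hi = bounds                      # one node's share of the work
        return sum(i * i for i in range(lo, hi))

    if __name__ == "__main__":
        n, nodes = 1_000_000, 4
        step = n // nodes
        segments = [(i * step, (i + 1) * step) for i in range(nodes)]

        with Pool(nodes) as pool:            # stand-ins for cluster members
            partials = pool.map(partial_sum, segments)

        print("master assembles:", sum(partials))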


Of greater relevance are high-availability and load-balancing clusters. While their similarities make it customary to treat these varieties as a single class, fundamental differences in philosophy and uptime requirements warrant examination.


High-availability Clusters

With high-availability clusters, the emphasis is on complete avoidance of unplanned downtime. Thus, HA systems are designed around rigorous standards for redundancy with no single point of failure. The decision to implement HA clusters is generally business-driven; for example, if a brokerage firm isn't processing stock transactions, it isn't making money.
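

Underneath nearly every HA cluster is a heartbeat: each member periodically signals that it is alive, and a peer that stays silent too long is declared dead so its services can be taken over. A minimal sketch of the detection logic (the intervals and the takeover hook are illustrative, not drawn from any particular product):

    # Skeleton of heartbeat-based failure detection for a two-node HA pair.
    import time

    HEARTBEAT_INTERVAL = 2.0   # seconds between "I'm alive" messages
    DEADTIME = 10.0            # silence longer than this marks the peer dead

    last_seen = time.monotonic()

    def on_heartbeat_received():
        """Invoked whenever a heartbeat packet arrives from the peer."""
        global last_seen
        last_seen = time.monotonic()

    def take_over_services():
        # Illustrative failover action: claim the shared IP, restart apps.
        print("peer declared dead: taking over its address and services")

    def monitor_peer():
        while True:
            if time.monotonic() - last_seen > DEADTIME:
                take_over_services()
                return
            time.sleep(HEARTBEAT_INTERVAL)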


High-availability clusters can be stateful or stateless. In a stateful model, a transaction in progress during a failure is not lost, though it may be rolled back and reissued. In a stateless environment, the transaction may be lost and the user required to reissue the request. For instance, in an e-commerce transaction, a failure while an order is being submitted may simply require the user to resubmit it. However, the exchange operation (e.g. credit-card processing) can be neither lost nor issued more than once. This combination most often compels ASPs to implement the more rigorous stateful approach, which is best achieved by combining a stateless hardware architecture with application software designed for statefulness.
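

In practice, application software often achieves the required exactly-once behavior with an idempotency key: the client attaches a unique ID to each charge, and the server refuses to execute the same ID twice, so a resubmitted order cannot double-bill. A hedged sketch; in a real deployment the key store would be durable, replicated state rather than an in-memory dictionary:

    # Exactly-once charge processing via idempotency keys: replaying a
    # request with the same key returns the original result instead of
    # charging the card a second time.
    processed = {}   # stand-in for durable, replicated state

    def charge_card(idempotency_key, amount_cents):
        if idempotency_key in processed:
            return processed[idempotency_key]     # replay: no second charge
        receipt = f"charged {amount_cents} cents" # the real card operation
        processed[idempotency_key] = receipt
        return receipt

    print(charge_card("order-1138", 4999))  # charges once
    print(charge_card("order-1138", 4999))  # same receipt, no double charge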


Load-balancing Clusters

Load balancing, the process of sharing a workload among multiple distributed systems, is often found in environments where availability is crucial yet a change in the makeup of the system handling a request can be tolerated. Returning to our transaction example, the first step is connecting to the ASP's Web server. If the server is unavailable, or too busy to handle requests as they arrive, the site is essentially down. Had the load been distributed across multiple systems, however, the loss of a single system would not impact access, though it might affect performance.


In most approaches to load balancing, a centralized server distributes independent services across a group of systems assigned to a function. This centralized server may itself be a process shared among the systems. While this sounds simple, the hardware infrastructure and the necessary load-balancing algorithms make it a complex puzzle. Algorithms may be round-robin (each server in turn), capability-based (more powerful servers receive more requests), capacity-based (the least-busy server receives the next request) or specifics-based (the best-suited server receives the request). In practice, a load balancer may use several of these techniques in tandem, or be totally rules-driven.
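

Three of those policies are easy to sketch as selection functions over a hypothetical server list (a production balancer would also fold in health checks and connection tracking):

    # Round-robin, capability-based and capacity-based selection over a
    # list of (name, weight, active_connections) servers -- all values
    # here are illustrative.
    import itertools, random

    servers = [("web1", 1, 12), ("web2", 2, 3), ("web3", 1, 7)]
    rotation = itertools.cycle(servers)

    def round_robin():                     # each server in turn
        return next(rotation)[0]

    def capability_based():                # heavier-weight servers get more
        return random.choices(servers, weights=[w for _, w, _ in servers])[0][0]

    def capacity_based():                  # least-busy server goes next
        return min(servers, key=lambda s: s[2])[0]

    print(round_robin(), round_robin(), round_robin())  # web1 web2 web3
    print(capacity_based())                             # web2 (3 connections)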


Clustering Tools

As clustering models and requirements have grown more complex, tools have been introduced to mitigate the complexity. With the accelerating acceptance of Linux and open source software by the ASP community, service providers should evaluate the various cluster-capable software and hardware systems against their specific needs.


Open source cluster products are built (or derived from earlier proprietary products) to let users receive and modify the source code. These products run the gamut from HPC to high availability. The list that follows is not meant to be comprehensive; rather, it surveys popular and successful packages.


One of the best-known clustering packages is MOSIX [i], which allows a group of Linux systems to operate as if they were a single system. Note that this is not a true SSI (single system image): MOSIX dynamically migrates sequential and parallelized jobs to free nodes. While MOSIX is often discussed in HPC contexts, it is an excellent fit for scalable Web farms as well.


Mission Critical Linux's [ii] Kimberlite is a high-availability offering targeted at a small cluster of redundant services. In version 1.2, Convolo (its commercial counterpart) connects two systems and drives the components needed to recognize the status of each system, allowing failover and restart of applications. Not truly fault-tolerant, Convolo requires that applications restart on, or reconnect to, the surviving cluster member.


SGI (Silicon Graphics) [iii] has released its IRIS FailSafe product into open source. Delivered in partnership with SuSE as a shared-everything cluster for Linux, FailSafe offers 16-node scaling, shared data storage and a decent GUI. However, FailSafe does not offer load-balancing capabilities, and while it has acceptance in the SGI community, it is unclear whether it can make the transition to Linux.


An interesting project, well worth watching for those technically inclined, is the Linux Virtual Server Project [iv]. Using a Linux-based load balancer, LVSP is targeted at making multiple Linux systems, possibly widely distributed across networks, appear as a single load-balanced, highly available system. LVSP is a work in progress, but the site itself offers insights into the clustered-server problem.


Loosely related is the Linux-HA [v] project. Targeted at developing the tools necessary for high-availability solutions, Linux-HA is best known for Heartbeat, which provides serial, UDP, and PPP/UDP heartbeats together with IP-address-takeover resource groups. Linux-HA also offers perhaps the best overview of the file-system structures necessary to provide high availability.


Of course, as the demand for Linux systems in the enterprise has continued to grow, commercial clustering packages have also appeared. In some cases the commercial offering is derived from portions of open source code; in others the code is developed from scratch or ported from proprietary code. The tradeoffs between commercial closed code and open source are the usual ones: support, viability, responsiveness and platform availability. Even these lines blur heavily, in that most open source programs have commercially available support while retaining their open source roots.


PolyServe [vi] offers two products: UnderStudy [vii] and LocalCluster [viii]. UnderStudy offers a relatively simple, IP-based Web failover model: if a node fails to respond, UnderStudy removes it from the cluster and redirects requests to the next server. While it offers load balancing, it is a simple DNS round-robin model. UnderStudy will reintegrate a recovering node back into the cluster.


PolyServe's more upscale LocalCluster offers 2 to 10 nodes per cluster, providing higher-end services for IP or Web failover. LocalCluster also provides a data-replication service, useful for keeping all servers synchronized to the same data. For load balancing, however, LocalCluster still provides only DNS round-robin.


SteelEye's LifeKeeper for Linux [ix] is targeted at failover applications. With its history and offerings on Unix and Windows, LifeKeeper is stable, has excellent failover capabilities and supports active failover. Unfortunately, LifeKeeper itself doesn't offer load balancing, and on Linux it is limited to four nodes.



Product        Node Limits   Failover Capable   Load Balancing   Management Software   Shared Storage
MOSIX          N/A           No                 Yes              Transparent           No/NFS
Kimberlite     2             Yes                Yes              Yes                   Yes/SCSI/SAN
FailSafe       16            Yes                No               Yes                   Yes/SCSI/SAN
LVSP           22/100        No                 Yes              Yes                   No/NFS
LocalCluster*  10            Yes                Yes              Yes                   No/NFS
UnderStudy*    N/A           Yes/IP only        Yes              Yes                   No/NFS
LifeKeeper*    4             Yes                No               Yes                   Yes/SCSI/SAN


Table 1. Linux Clustering Products (* indicates commercial product)



Choosing a Cluster Solution

In choosing a clustering solution, some basic questions should be asked. First, how large a cluster is required? What is the setup process? Is load balancing or high availability required for a specific application? How will the cluster be monitored and tuned? What impact will clustering have on the business model?


Without a doubt, clustering offers several advantages to service providers. By providing failover, clusters enable ASPs to deliver virtually uninterrupted service while managing the cluster as a single entity. Moreover, clustering addresses the issues surrounding infrastructure growth and flexibility by enabling ASPs to add and remove nodes (within the cluster's limits) in response to bursts and other fluctuations in activity level, although this remains a manual process.


On the other hand, while clustering packages address some pain points, these products currently ignore a host of physical and management issues.


Physical constraints are quickly becoming the bane of the data center, whether on-premises or outsourced. As the number of systems steadily increases, IT operations are constrained by the sheer quantity of computers being wedged into increasingly limited space. Moreover, the physical cabling of these systems is becoming a seemingly insurmountable problem. A rack of 42 1U servers with redundant power, dual Ethernets and serial connections for management entails more than 120 cables out the back of a single cabinet; true redundancy would require separate storage connections as well. Now imagine a single cable failure. Today's clustering solutions assume that each system is a separate box, whether they utilize all the functionality of that box or not.
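

The arithmetic behind that cable count is straightforward, assuming two power feeds, two Ethernet links and one serial console per server:

    # Back-of-the-envelope cabling for a rack of 42 1U servers.
    servers = 42
    cables_each = 2 + 2 + 1   # redundant power + dual Ethernet + serial
    print(servers * cables_each)   # 210 cables, before any storage links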


The issues surrounding physical provisioning are not resolved merely by establishing a cluster. While it may be easier to add a node to a cluster than to provision an entirely new, larger server to accommodate an increasing workload, the ASP still has the daunting task of ordering the incremental machine; scheduling power, networking and system-management personnel to set it up; and stabilizing the environment. While clustering allows data centers to more readily apply computing power where it's needed, today's clustering products do not migrate power to meet geographical concerns or application-specific needs, nor do they support secure virtual partitioning into dedicated machines. It's still a one-type-fits-all model.


These limitations manifest themselves in two ways. First, while clusters scale upward as servers are added, control over clustered resources may not scale as easily. Sharing disks, networks and even printers is an exercise best left unconsidered. In fact, most clusters have an upper node limit, particularly high-availability systems wherein each machine must be aware of all others. Second, while clusters may scale as a whole, they lack the management flexibility to be partitioned into virtual servers with secure resources. This stumbling block is linked to a lack of trust, inasmuch as business units prefer not to share resources.


Finally, today's clusters do not offer adequate management flexibility to meet the needs of current or predicted usage. Consider a global, ASP-based e-commerce application. Each geographic region has unique needs related to data and will experience peak activity that requires sizing to burst levels, generally while other regions are not utilizing their compute resources. A cluster solution should automatically reconfigure to meet the burst needs of each region using either a time schedule or a rules-based configuration.
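

In skeleton form, such a rules-driven scheduler is little more than a table mapping each region's peak window to a node allocation. The regions, windows and node counts below are purely illustrative:

    # Time-scheduled reallocation: shift cluster nodes toward whichever
    # regions are inside their business-hours burst window.
    TOTAL_NODES = 12
    PEAK_WINDOWS = {              # region -> assumed peak hours, in UTC
        "americas": range(13, 21),
        "emea":     range(7, 15),
        "apac":     range(0, 8),
    }

    def allocation(utc_hour):
        peaking = [r for r, hrs in PEAK_WINDOWS.items() if utc_hour in hrs]
        quiet = [r for r in PEAK_WINDOWS if r not in peaking]
        alloc = {r: 1 for r in quiet}            # baseline node per region
        spare = TOTAL_NODES - len(quiet)
        for r in peaking:                        # split the remainder evenly
            alloc[r] = spare // len(peaking)     # (division may leave a spare)
        return alloc

    print(allocation(14))   # {'apac': 1, 'americas': 5, 'emea': 5}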


The Ideal Cluster Solution

As noted, today's solutions lack the features to enable simple, seamless cluster implementation. Instead, what's needed is an integrated system of hardware, software and networking that combines simplified deployment and provisioning, rapid reconfiguration, N+1 cluster failover, cable consolidation and storage accessibility.


First, an ideal cluster system would provide hot-swappable processing components, in a tight form factor, with easy removal and replacement. The components would be as simple as possible to reduce the number of failure points and hence unplanned downtime. A processor failure or replacement would not affect connectivity or cause an operating failure in any other component. Further, a failure in any single component would not change the operating status of the processors.


To simplify connectivity, this system of independent processors would communicate over a redundant, active-active, internal network at very high speeds. Again, a single failure could not bring down the system. Exterior connections would be dramatically simplified, with consolidated master cables for each major subsystem including redundant power, storage and Ethernet.


The management system would allow for secure, logical partitions to protect individual ASP customers. In this way, the owner of any one partition would be completely unaware of other partitions on the system. At the same time, the resources of a given partition could be dynamically allocated and reallocated among the clusters on that partition as needed, either by operator intervention or rules-based activation.


Storage resources would be separated from computing activities for redundancy as well as failover. Storage would be allocated dynamically as in a SAN, with similar security among partitions.


Essentially, solving the cluster problem in an ASP environment requires the processing equivalent of a Storage Area Network. The ideal solution would segment computing functionality to its base set, then reintegrate it to form a dynamic, N+1 failover cluster. This Processing Area Network, or PAN, consolidates, centralizes and simplifies the management of processing capacity just as a SAN simplifies storage. Once in place, a PAN allows incremental processing capability to be added without disrupting the activity underway.


At the service-provider level, combining physical resource consolidation with logical resource separation enables an ASP to dynamically add and subtract processing capacity in response to changing market conditions. Likewise, by separating processing, storage and connectivity, a PAN enables any component to be failed over to a similar component, lowering ASP costs for failover systems. Finally, since the PAN retains a centralized view of resources, it allows service providers to reduce both systems-administration and maintenance workloads.


Summary

Today's clustering software offers advantages for ASPs facing rapid growth, availability requirements and system-administration overload. A clustering environment scaled to meet burst conditions can mitigate concerns about reliability and performance. Solutions run the gamut from minimal functionality to advanced capabilities across both commercial and open source offerings. However, while today's clustering alternatives reduce some ASP pain points, they do not completely resolve service-provider issues since they lack the features needed to provide true flexibility and scalability. A new approach to clustering, the Processing Area Network, is needed to resolve all problem areas.


Copyright © 2001 Dave McAllister. All rights reserved. Egenera and Egenera stylized logos are trademarks of Egenera, Inc. All other company and product names are trademarks or registered trademarks of their respective holders. The information in this document is subject to change without notice.


End notes: URLs for mentioned products and projects

[i] MOSIX http://www.mosix.org

[ii] Convolo http://www.missioncriticallinux.com/products/convolo

[iii] FailSafe http://oss.sgi.com/projects/failsafe/

[iv] Linux Virtual Server Project http://www.lvsp.org/

[v] Linux High Availability Project http://www.linux-ha.org

[vi] PolyServe http://www.polyserve.com

[vii] PolyServe UnderStudy http://www.polyserve.com/prod_overview_us.html

[viii] PolyServe LocalCluster http://www.polyserve.com/lcenterprise/index.html

[ix] SteelEye LifeKeeper http://www.steeleye.com
