
Clustering has been a popular concept in IT design for some time now. The commonly known database and machine clusters typically provide high availability and scalability for two major components of an information system: the data and the machine. If you want to achieve full scalability and high availability within a distributed computing environment, however, you also need to consider a third component for clustering: the application. This is especially crucial in building enterprise-class, distributed, mission-critical systems. (See Figure 1.)
Application clustering is a key architectural design concept that can provide such major capabilities as network connectivity among similar or different application components, location transparency, message switching using queues, very high performance, application management, dynamic load balancing, and high availability. The important difference between application clusters and other types of clusters is that they are easily reconfigurable and relatively inexpensive. You should consider application clustering if you hope to catch the new waves of distributed technology, network computing, and ubiquitous computing, and be successful.
Several technical architects are realizing the paramount importance of these features for their large-scale distributed systems and are in pursuit of products that can come to the rescue. Fortunately, a number of products and technologies are emerging that attempt to offer these features at various levels. Very few products, however, can come close to achieving the desired results. It is more critical than ever that you recognize these required features up front and choose the most comprehensive solution.
In this article, I'll explore the essential architectural design considerations for application clustering to build large-scale client/server applications. I'll also identify some of the major established products and describe a sample scenario of an application cluster using currently available products.
One common approach to supporting large applications has been to buy the largest machine possible with the most CPU capacity. Today, even Unix machines can support mainframe-class applications. This approach may be justified if you want to start simple and centralize the corporate data. Adding processors, memory, and disk storage to support new requirements can increase performance. However, it can also cause contention for shared resources, which increases the chance of diminishing returns on your investment. A single server machine is also a single point of failure.
Clustering is a configuration of a group of autonomous server machines that work together to behave as a single system. With the help of explicit hardware connections, operating systems, and middleware, the system projects a single image to the client program. The two most common types of clusters are machine clusters and database clusters.
Database clusters are built using a combination of software and hardware. The two popular types of database cluster configurations are shared nothing and shared disk. Shared-disk configurations are more popular and are easier to set up. An example of a shared-disk cluster is Oracle's Parallel Server option used in Sun Parallel Database clusters.
Machine clustering involves deploying multiple machines connected on a private network and configured to share critical hardware or system resources. Machine clusters provide near 100 percent reliability and are used to build extremely large applications such as high-volume OLTP or high-end data warehouses. IBM supports some of the largest-known OLTP applications on its mainframe clusters. Pharmacy chain Rite Aid Corp. expects to handle peak volumes in the range of 360 transactions per second. The load is handled by four IBM S/390 mainframes running IBM's DB2 database, CICS TP monitor, and Parallel Sysplex clustering technology. Tandem is a major player supplying NonStop clusters that provide near 100 percent reliability using fault-tolerant components. Microsoft has released Microsoft Cluster Server (also called WolfPack), a software package that vendors such as Compaq are using in Windows NT clusters.
An application cluster is a configuration of homogeneous or heterogeneous systems in which the application components are accessed with the help of a special manager middleware. These "soft clusters" are relatively inexpensive, can be easily reconfigured, and let you make incremental investments to scale up to enterprise-level systems. In essence, an application cluster appears as a single virtual application server.
Application clustering is an essential architectural design criterion that can provide several major capabilities to an information system. First, it provides network connectivity between application components. Network connectivity should support different protocols and work on multiple platforms. Networking details such as establishing connections, packetizing the data, controlling the data flow, and marshalling the data should all be handled in a way that is transparent to the application programmer. You can accomplish this by linking special code libraries with the client program and installing gateway processes to transfer messages across machines.
Application clustering also permits location transparency, which gives you flexibility in distributing the application and the data across different machines. This makes data objects and application functions accessible from any machine. Neither the client program nor the programmer has to know the location of a server object. At run time, you can obtain current information about server objects across all engines by using a form of directory services provided with the help of a manager or broker. The client program consults the directory services to determine a destination server object (process level) based on such criteria as the availability of the server machine and the process, load balancing across multiple instances of server processes, or any other routing criteria.
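To make the directory lookup concrete, here is a minimal Python sketch of the idea. It is not any vendor's API: the class names, host names, and the "shortest queue wins" rule are all assumptions chosen for illustration. The client asks for a service by name only, and the directory picks an available instance.

```python
from dataclasses import dataclass


@dataclass
class ServerInstance:
    service: str     # logical service name the client asks for
    machine: str     # hypothetical host name; the client never sees it
    available: bool  # are the machine and the server process up?
    queued: int      # requests currently waiting for this instance


class Directory:
    """Toy stand-in for the directory services of a manager or broker."""

    def __init__(self):
        self._instances = []

    def advertise(self, instance):
        # A server instance registers itself when it starts.
        self._instances.append(instance)

    def resolve(self, service):
        # Routing rule assumed here: among available instances of the
        # service, pick the one with the fewest queued requests.
        candidates = [
            i for i in self._instances
            if i.service == service and i.available
        ]
        if not candidates:
            raise LookupError(f"no available instance of {service!r}")
        return min(candidates, key=lambda i: i.queued)


directory = Directory()
directory.advertise(ServerInstance("billing", "host-a", True, 5))
directory.advertise(ServerInstance("billing", "host-b", True, 2))
directory.advertise(ServerInstance("billing", "host-c", False, 0))

chosen = directory.resolve("billing")
print(chosen.machine)
```

The client code names only the service ("billing"). Moving an instance to another machine changes what is advertised, not what the client asks for.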
In a distributed database application, the location of a data repository (or database) can also be made transparent to the client program. The manager middleware can look at the contents of a client request and route it to the server object connected to the right database instance.
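A small Python sketch of this kind of content-based routing follows. The customer-ID ranges, the group names, and the shape of the request are hypothetical; real middleware would read such rules from its own configuration.

```python
import bisect

# Hypothetical partitioning rule: each entry is (exclusive upper bound of
# the customer id range, name of the server group attached to that database).
ROUTES = [
    (100_000, "custdb_east"),
    (200_000, "custdb_west"),
]
_BOUNDS = [bound for bound, _ in ROUTES]


def route(request):
    """Pick a server group by inspecting a field inside the request."""
    idx = bisect.bisect_right(_BOUNDS, request["customer_id"])
    if idx == len(ROUTES):
        raise LookupError("no database instance covers this customer id")
    return ROUTES[idx][1]


east = route({"customer_id": 42, "op": "get_balance"})
west = route({"customer_id": 150_000, "op": "get_balance"})
print(east, west)
```

The client builds the same request either way. Only the routing table knows which database instance holds which customers.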
Using application clusters, you can also implement message switching using queues. This infrastructure lets different application components send or receive messages using an intermediate queue in a time-independent and connectionless fashion. Queuing helps facilitate a variety of advanced communication methods among components. The queue mechanism lets the client program send a message to a busy server object by placing it in the server queue without having to wait for the server to be available. This forms the foundation for asynchronous or event-based communication, and it can greatly improve application performance. It also helps minimize point-to-point connections between client and server components, which often reduce application scalability considerably.
There are two fundamentally different types of queues: transient and permanent. Transient queues are provided with operating system facilities such as Unix IPC message queues, which offer high performance. The downside to transient queues, however, is that they are not entirely reliable. If you need reliability, you should use permanent queues where the messages are stored on the hard disk, usually under transaction control. The disadvantage in this case is the penalty of transaction coordination and writing to the hard disk. Large applications often require the use of both these types. You can choose to use transient queues for default communication between components for higher performance and use permanent queues in cases where you have exceptions.
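The trade-off between the two queue types can be sketched in a few lines of Python. This is an illustration only: an in-memory deque stands in for an operating system message queue, and a SQLite file stands in for a disk-based queue under transaction control. The class names and the table layout are my own assumptions.

```python
import os
import sqlite3
import tempfile
from collections import deque


class TransientQueue:
    """In-memory queue: fast, but its contents are lost if the process dies."""

    def __init__(self):
        self._items = deque()

    def put(self, msg):
        self._items.append(msg)

    def get(self):
        return self._items.popleft()


class PermanentQueue:
    """Disk-backed queue: every put and get is a committed transaction."""

    def __init__(self, path):
        self._db = sqlite3.connect(path)
        with self._db:
            self._db.execute(
                "CREATE TABLE IF NOT EXISTS q "
                "(id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)"
            )

    def put(self, msg):
        with self._db:  # commits on success, rolls back on an exception
            self._db.execute("INSERT INTO q (body) VALUES (?)", (msg,))

    def get(self):
        with self._db:
            row = self._db.execute(
                "SELECT id, body FROM q ORDER BY id LIMIT 1"
            ).fetchone()
            if row is None:
                raise IndexError("queue is empty")
            self._db.execute("DELETE FROM q WHERE id = ?", (row[0],))
            return row[1]

    def close(self):
        self._db.close()


# Simulate a restart: write a message, close the queue, reopen the file.
path = os.path.join(tempfile.mkdtemp(), "queue.db")
q = PermanentQueue(path)
q.put("order-1")
q.close()

reopened = PermanentQueue(path)
recovered = reopened.get()
reopened.close()
print(recovered)
```

The permanent queue pays for a disk write and a commit on every message, which is exactly the penalty described above. The transient queue avoids that cost and gives up recovery in exchange.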
Application clustering provides high performance and throughput because it makes optimal use of system and network resources and distributes application and data processing efficiently. When a database application is subject to peak loads, the database engine takes up most of the server cycles, which means little room on the server machine for application processes. Application clusters let you distribute application processing to other server machines. Distribution is further helped by the message-switching, load-balancing, and parallelism features of application clusters.
With application clustering, you can administer and manage the distributed application as a whole. It helps provide the directory services that dynamically find the location of different available server objects. It also provides application configuration, administration, and management. You can integrate the management tasks into the enterprise system management framework, which helps you monitor and manage different application components and related resources on your system.
With their support for heterogeneous systems, application clusters enable application integration. Large corporations typically have a number of applications developed at different times using dissimilar technologies. For example, a large corporation may choose PeopleSoft for HR functions, SAP for financials, and a custom application for customer service. Eventually, all the different implementations must interoperate with one another and with legacy systems. Application clusters can help this multitude of applications interoperate using network connectivity, location transparency, and support for heterogeneous environments.
It is a common experience that management asks the system architects to develop a "one time only" or "throw away" system to solve an immediate problem. It is also a common experience that if the system is well built, its life will extend beyond initial expectations. This means that architects will not have to design throw-away systems or applications if they build them right from the start. Architects can also reuse critical components across applications. Also, as I already mentioned, large corporations often have different systems designed or implemented in a vacuum without any knowledge of other similar efforts brewing (or already built) within the company. Finally, information technology and packages are changing at unprecedented speeds. No matter how good a design looks right now, it can become obsolete very quickly. This makes it difficult for architects to finalize products and technologies for use across all different applications.
Another required feature for application integration that application clusters offer is support for distributed transactions. As part of application integration, transactions are conducted across different databases requiring explicit two-phase commit coordination. This is a critical need that most people don't recognize up front when they are collecting business requirements. In addition, only a few technologies support distributed transactions. Some middleware products (TP monitors) support standard protocols, such as XA from X/Open, which helps coordinate two-phase transactions across XA-compliant distributed databases from different vendors. Examples of XA-compliant databases are Oracle, Informix, and Sybase.
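For readers who have not worked with two-phase commit, the following Python sketch shows the bare coordination logic. It is a teaching model, not the XA interface: the Participant class and its methods are hypothetical, and a real coordinator also has to log its decision and recover from crashes between the two phases.

```python
class Participant:
    """Toy resource manager, standing in for one database in the transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote yes only if the work can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"


def two_phase_commit(participants):
    """Commit everywhere or nowhere, depending on the phase 1 votes."""
    votes = [p.prepare() for p in participants]  # phase 1: collect votes
    if all(votes):
        for p in participants:                   # phase 2: commit
            p.commit()
        return True
    for p in participants:                       # phase 2: roll back
        p.rollback()
    return False


orders = Participant("orders_db")
billing = Participant("billing_db")
ok = two_phase_commit([orders, billing])

inventory = Participant("inventory_db")
ledger = Participant("ledger_db", can_commit=False)
failed = two_phase_commit([inventory, ledger])
print(ok, failed)
```

The point to notice is that a single "no" vote in the first phase undoes the work on every database, including those that voted "yes".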
You can instantiate multiple instances of a server object in an application cluster. The number of instances is often determined by the "importance" of the server, that is, whether it is critical or has a lower processing load. This way you can use machine resources in a way that is proportionate to the importance of different server objects. For example, because queues receive messages, less important servers may be configured to let more messages wait in the queue. This provides a significant advantage over stored procedures, some of which can lock a database server's system resources. If there are any outstanding messages on these queues, the application cluster can balance the load with the help of information from directory services.
Flexible application partitioning allows easy reconfiguration of application components based on administrative runtime load and business requirements. In general, server objects can be packaged based on their common functionality and deployed closer to their resources (databases or communication gateways). Location transparency and network connectivity features, along with a messaging infrastructure, help provide this flexibility.
As a result of these features, you can also obtain relatively inexpensive high availability in an application cluster. A simple goal you can have is to avoid having a single point of failure in the system, both under normal and peak loads. You can start redundant instances of server objects to handle any process failures. If you start some of the instances on backup machines, the services will be continued beyond machine or network failures.
You can configure the directory services to monitor the system for any failures and correct problems automatically. If the problem cannot be corrected, the monitoring system generates a system event that will be sent to the management console (which is part of the management framework).
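The monitor-and-repair loop can be sketched as follows in Python. Everything here is hypothetical: the server names, the restart callback, the retry limit, and the list that stands in for the management console.

```python
def monitor(servers, restart, console, max_restarts=2):
    """Scan the server table once and repair what can be repaired.

    servers maps a server name to True (alive) or False (dead).
    restart is a callback that returns True if the restart worked.
    console is a list standing in for the management console.
    """
    for name, alive in list(servers.items()):
        if alive:
            continue
        for _ in range(max_restarts):
            if restart(name):
                servers[name] = True
                break
        else:
            # No restart attempt succeeded: raise a system event.
            console.append(f"ALERT: {name} could not be restarted")


servers = {"billing_1": True, "billing_2": False, "gateway_1": False}
console = []

# Assumed for the demo: billing servers restart cleanly, the gateway does not.
monitor(servers, lambda name: name.startswith("billing"), console)
print(servers)
print(console)
```

In this run the dead billing instance is restarted silently, while the gateway failure reaches the console because it could not be corrected automatically.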
Several middleware products have ventured into supporting enterprise-class application development and deployment. Past issues of DBMS have covered them in great detail. For your reference, however, a brief overview of the major middleware technologies follows.
TP monitors are the oldest and most proven technology used in all large OLTP applications. TP monitors are used in more than 95 percent of TPC-C benchmark configurations by all hardware and database vendors (according to the Transaction Processing Performance Council, www.tpc.org) to achieve the best results on open systems. Examples are CICS from IBM, BEA Systems Inc.'s Tuxedo (used in 80 percent of the TPC-C benchmarks), IBM Transaction Server (used in the best TPC-C benchmark of 57,032 tpmC, by IBM), and BEA TopEnd (recently acquired from NCR). Later I will discuss an enterprise-class application cluster that was built using BEA Tuxedo.
Object request brokers (ORBs) are an evolving technology that is rapidly capturing people's attention. Some of the popular products using the object broker concept are Orbix from Iona and VisiBroker from Visigenic. Oracle uses these same standards in its new Oracle Application Server.
Object transaction monitors/managers (or object transaction servers) are an emerging technology using the popular, easy-to-program, object-oriented paradigm built on top of proven mission-critical deployment frameworks such as TP monitors. TP monitor and ORB vendors are improving their core technologies and converging in this space. Examples include M3 from BEA (code named Project Iceberg), OrbixOTM from Iona, and the Jaguar OTS from Sybase.
Message-oriented middleware (MOM) solutions include IBM MQSeries and BEA MessageQ.
Let's try a sample case scenario and see how current products would fit into the design. (See Figure 2.) Say you need to build a distributed application for a typical large corporation. This new application must help support thousands of users in a reliable and responsive fashion, and it must be integrated to support major corporate divisions of sales, marketing, billing, and customer service. Furthermore, the application's end-user interfaces will be determined by their respective departments. The application must be able to interface with other applications running in legacy environments and new packaged implementations (such as SAP and PeopleSoft). Finally, it must be flexible, scalable, and manageable.
The architecture should be based on a managed multitier client/server model, where a TP monitor (such as BEA Tuxedo) is the manager middleware. The database is centralized and maintained on a database cluster configuration, say, a Sun Parallel Database cluster. The application is coded as Tuxedo services, which are distributed across a number of application server machines to form an application cluster. You can view the services as remote functions (or procedures) that clients can call from any machine in the application domain. Normalization techniques can create modular service objects that improve process utilization and encourage reuse.
This design is primarily service-oriented rather than data-oriented (or object-based). BEA released an object transaction manager called M3 this summer that helps move to CORBA-based development and IIOP-based interoperability with other distributed object systems.
The end-user programs can join the system either as a native client (from one of the application server machines), a workstation client (from Wintel or Unix boxes using the popular GUI packages such as PowerBuilder, Visual Basic, or Oracle Developer 2000), or a Web client (using HTML and Java). This setup gives you several choices for different division users involved so they can use their existing system resources, services, and talent pool instead of being forced to reinvest in new hardware and training. This is especially important when dealing with large numbers of users.
Once the clients join the application domain, they can send requests to services that can be processed by any service object instance on any of the machines in the domain. Information about all transactions, services, servers, and machines is maintained in shared memory as a bulletin board. The bulletin board gives clients and servers the functionality of directory services.
The application interfaces with the legacy systems using gateways coded to run on the Systems Network Architecture (SNA) over LU6.2. Duplicate SNA connections and gateways are configured as a hot backup and also share excess load. If you are running packaged applications, there are also special gateways written to interact with SAP and PeopleSoft.
BEA Tuxedo lets you describe this entire distributed application in a configuration file, which allows easy control and administration. The entire application or parts of it can be booted or shut down using simple commands. Runtime statistics are provided to monitor performance.
The application cluster configuration can benefit our hypothetical company in several ways. First, it can support large numbers of users (in the tens of thousands) and transactions on a network of machines using a shared-nothing cluster design and relatively inexpensive hardware. You can add server machines incrementally as the load increases because the server objects and database can reside on different machines.
The configuration also provides the flexibility to repartition the application dynamically to allow maintenance shutdowns and software and hardware upgrades, and to improve load balancing. Because multiple instances of services are booted on different machines and the clients have transparent access to all machines, you can shut down individual machines for maintenance without interrupting user productivity or causing any downtime. You also have complete flexibility to mix service objects, gateways (legacy), and data sources. Each working application domain is described in a configuration file with extensive detail in terms of services offered, service priority and load factors, different timers, and so on. With Tuxedo in particular, you get interoperability across different application domains as well. Entirely separate domains can be configured to handle additional users or different types of users.
Finally, the sample scenario lets you integrate with other business applications easily via gateways and crisp interfaces and it lets you introduce new client technologies such as Web clients without having to reprogram service logic. You also get improved service object reusability based on efficient partitioning of service logic.
Because application clusters can interoperate across various technologies and platforms, they let large organizations build application blocks incrementally. An application block is an aggregate of different objects or components combined to fulfill a specific business function. Application blocks communicate with each other in the most efficient manner, using crisp program interfaces. A department can develop (or buy) several application blocks that can communicate with each other and also with other department blocks.
With incremental application block construction, organizations can build a single "virtual" enterprise IT system that spans different departments and extends to other partner companies as well. Furthermore, the heterogeneous capabilities of application clustering let you choose your computer platforms and systems based on what they do best. If it makes sense to buy a database cluster to support a centralized corporate database, go for it. If an NT server is most economical to host a special package, use NT for that package. Application clustering helps you integrate these multiple platforms and disparate environments.
Application clustering can play a crucial role in the architecture of enterprise-scale distributed applications. The concept of application clustering is also appearing in system design discussions under the title of application servers. The difference between the two is that, whereas application servers offer support for heterogeneous environments to help with application integration, application clusters go well beyond this and into the areas of high throughput and application management.
Before choosing a product to design your application cluster, keep in mind all the features I have described in this article. The key is to use proven products in your design. Be aware that this technology is not new. If a new product has not offered certain important features for more than a year before you plan on using it, look for another product with a longer history.
Success breeds success. All successful applications grow in terms of data, number of users, and transaction load. It pays to apply high-end design principles even to departmental applications. Use a framework or infrastructure that helps your application be flexible, scalable, and manageable. You should also apply prudent judgment to keep the design simple and costs under reasonable control. These are the key ways that application clustering can help you apply high-end design principles and technology to any scale application at a reasonable cost.

Figure 1. The various enterprise components brought together with application clustering.
