DBMS

Packaged Apps on Scalable Systems

By Kevin Jernigan

The increasing demands of data warehousing, data mining, e-commerce, and Web access are making packaged application scalability a critical requirement.
DBMS, September 1998

The computer software industry is in the midst of a transition from providing database software and tools, which require end users to build their own applications, to providing prebuilt packaged applications that solve a class of problems, such as managing financial accounting, human resources, or manufacturing. Several software companies have gained prominence in the past few years by capitalizing on this market, including SAP, PeopleSoft, Baan, and Oracle. These companies, with the exception of Oracle, are solely focused on building application software that can provide a better solution to their customers' needs than do the in-house applications they currently use.

The packaged applications that these vendors build include enterprise resource planning applications, customer interaction applications, human resource management systems, and supply chain management applications. Packaged applications typically depend on database software to support their data storage and access requirements. They also need it to provide a reliable and high-performance transaction system to guarantee the integrity and durability of the transactions that form the core of most packaged applications.

Scalability

You can generally define scalability as the ability of a system to grow in one or more dimensions. In the packaged application arena, these dimensions include the amount of data accessible to the application, the number of concurrent users supported, the number of transactions that can be processed in a given unit of time, and the breadth of functionality that the application encompasses.

In the early days of packaged applications, scalability wasn't a high priority because the systems were generally being implemented in smaller environments. In the background, the packaged application vendors worked on beefing up their applications to handle larger and more complex environments. Most of the scalability demands were handled by the database software that the packaged applications were built on top of, such as Oracle, Informix, Sybase, or IBM DB2. The business environment has changed, however, and several factors are increasing the demands on both the database software and the packaged applications themselves.

Data Warehouses

The growth in popularity of packaged applications has roughly coincided with the growth in data warehouse implementations, especially in larger companies. As packaged applications become more mature and are implemented in larger-scale environments, the need to integrate them with existing or planned data warehouses is growing. Data warehouses take data feeds from operational systems, and most packaged applications are operational systems, so they must be integrated to provide the right data at the right time to the data warehouse.

Feeding a data warehouse means that packaged application systems must support more transactions than ever before. This increase in transaction volume can sometimes be scheduled during off-peak hours, but doing so usually conflicts with preexisting off-peak batch processing. Therefore, the entire system supporting the packaged applications in question must have built-in scalability right from the start to guarantee that data warehouse integration won't cripple its performance.

The typical usage pattern of a new data warehouse also requires that scalability be built into the packaged applications that are feeding data to the warehouse. Most data warehouses grow faster than planned, with more data and more users being added to the system sooner than expected. This increases demand on the operational systems on which the data warehouse depends. If scalability is built into the operational systems from the start, you'll be able to manage these unexpected increases in demand.

Data Mining

A more recent trend than data warehousing is the implementation of data mining applications on top of data warehouses. Data mining applications often show the data warehouse users that they need more data in their warehouse. These new requirements usually mean that more data must be extracted from the operational systems that feed the data warehouse, resulting in increased workloads on the operational systems. As is the case with data warehousing, the packaged applications that make up the operational systems in question must be built with scalability in mind so they can accommodate unexpected increases in demand quickly and flexibly.

Internet Access

Another business driver increasing the scalability requirements for packaged applications is Internet access. More and more companies want to open up their internal applications to more of their employees and even to their customers. They are using the Internet to do this by providing Web browser interfaces to parts of their packaged applications.

Opening up access to packaged applications by providing a Web interface, even if it's confined to a company's intranet, usually increases both the number of concurrent users and the number of transactions per second that the applications must support. In addition, the new user population usually makes requests for new functionality that may not have been planned for in the original implementation. All of these factors drive home the need for scalability to be included in all parts of the packaged application.

A corollary to the increase in Internet access is the push to provide e-commerce access to corporate systems. The belief is that e-commerce will reduce costs and increase responsiveness, improving the overall efficiency of all those participating in the e-commerce implementation. Some of the major packaged application vendors have already started adding e-commerce and Internet-related hooks to their products, including PeopleSoft with its Interprise and Universal Application initiatives and Oracle with its Network Computing Architecture (NCA).

One direct result of e-commerce implementations is that the packaged application at the core of each company's system must be able to support faster transaction response times. The total number of transactions may not increase, but outside parties - whether they are partners or customers - are now directly connected to the core packaged applications and expect a timely response. Without e-commerce, the transaction is more likely to be performed with invoices through the mail or via facsimile, and response times measured in seconds aren't nearly as important.

Of course, the goal of e-commerce implementations and Internet access in general is to increase the number of transactions a company can process and, thus, the amount of business it can do. Therefore, both e-commerce and Internet initiatives generally have a major impact on the packaged application's system resources, making scalability extremely important.

Changing Business Environments

The pace of change is accelerating both in terms of technology and business environments. The economic pressures in many industries are forcing smaller companies to merge to achieve enough critical mass to be effective on a global scale. Larger companies are trying to solidify or increase their foothold on the world stage by acquiring (relatively) smaller companies. These changes are creating enormous pressures on the resulting companies to implement applications that can grow quickly when new user communities are added, when new data is brought in, or when new transaction loads are enabled. Packaged application vendors are starting to address these needs, but there are a number of specific technical and cultural challenges they must overcome, which I will discuss next.

Implementing Scalability

True scalability cannot be an afterthought. It is impossible to add a "scalability module" to an application that is inherently nonscalable. All components of the application must be tightly integrated, from the computer hardware and operating system to the database software, application layer, and interfaces with outside systems - including data warehouse feeds. From both a vendor and an end-user standpoint, scalability must be planned from the start.

Scaling packaged applications is a challenge due in part to packaged applications' implementation history. Most were originally developed without scalability as the primary design goal. The database system was seen as a black box, and people assumed that any access to it would be handled "appropriately" - meaning that if there was a performance problem, it was the database software's fault.

As the scope and popularity of packaged applications have grown, this assumption has not proven to be valid. The need for much tighter integration among packaged applications and the DBMSs that support them has become more apparent as data volumes, transaction volumes, and user communities have grown. For example, packaged application vendors have found it necessary to put extra effort into tuning particularly egregious SQL statements and denormalizing certain parts of the data model.

A more general but equally major challenge for packaged application vendors is that of achieving and maintaining tight integration with the underlying database software. Each of the major DBMSs that most people use to support their packaged applications (including Oracle, Informix, Sybase, and DB2) has open APIs to let application vendors implement portable applications. Unfortunately, each DBMS also has additional features that are not portable but that must be used if you want to achieve the highest possible performance and scalability.

Packaged application vendors are working with the DBMS vendors to ensure that their applications can exploit the special features available in each of the systems. (Nobody wants to create artificial bottlenecks by conforming too tightly to open standards.) Meanwhile, the DBMS vendors are constantly improving and upgrading their database software, adding new features and disabling obsolete ones as they go - making the packaged application vendors' task that much more challenging.

Scalability Bottlenecks

A number of relatively straightforward scalability bottlenecks are worth learning to detect and eliminate. A good example is the way the application generates unique ID numbers, such as customer IDs. The portable, nonscalable way to generate a unique ID number is to use a row in a table to store the next unique value. When a transaction needs the next unique ID, it simply selects the value from the appropriate row and updates the value so that the next transaction will see the next-highest value in the sequence. To ensure that other similar transactions don't get the same values, the table is locked while the updating transaction is performing its work. Unfortunately, this implementation doesn't scale well. When many transactions are being executed concurrently, they serialize behind the lock on the unique ID table. This limits system throughput to the number of updates that can be performed serially.
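The counter-row approach can be sketched in a few lines. This is only an illustration, using Python's built-in sqlite3 module rather than any of the DBMSs named in this article; the id_counter table, its columns, and the next_id function are hypothetical names. SQLite locks the whole database for writes, which stands in here for the table lock described above.

```python
import sqlite3

def make_db():
    # In-memory database holding one counter row per ID type (hypothetical schema).
    # isolation_level=None puts the connection in autocommit mode so that
    # transactions are controlled explicitly with BEGIN / COMMIT.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE id_counter (name TEXT PRIMARY KEY, next_value INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO id_counter VALUES ('customer_id', 1)")
    return conn

def next_id(conn, name):
    # BEGIN IMMEDIATE takes the write lock up front. Any other writer has to
    # wait until this transaction commits, which is the serialization point.
    conn.execute("BEGIN IMMEDIATE")
    try:
        (value,) = conn.execute(
            "SELECT next_value FROM id_counter WHERE name = ?", (name,)
        ).fetchone()
        conn.execute(
            "UPDATE id_counter SET next_value = ? WHERE name = ?", (value + 1, name)
        )
        conn.execute("COMMIT")
        return value
    except Exception:
        conn.execute("ROLLBACK")
        raise

conn = make_db()
ids = [next_id(conn, "customer_id") for _ in range(3)]
```

In a real application the lock would typically be held until the surrounding business transaction finishes, so every transaction that needs an ID waits for the one ahead of it.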

Each of the major DBMS vendors has a nonportable solution to this problem. One example is Oracle, which uses a special data object called a SEQUENCE to get around the bottleneck. The Oracle SEQUENCE object allows transactions to get unique ID numbers without holding a lock on a table for the duration of the transaction. Many more transactions can run concurrently using this mechanism.

Meanwhile, Informix uses a special data type called SERIAL. If a column in a table is defined as SERIAL, then a unique ID number is generated every time a row is inserted into the table. As with Oracle's SEQUENCE object, this avoids the serialization bottleneck inherent in the typical implementation of unique ID numbers. Other DBMSs have similar mechanisms, but each of them is slightly different.
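The idea these mechanisms share can be illustrated with an in-process analogy: hand out the next number under a lock that is held only for the increment, not for the life of the transaction. The sketch below is not how Oracle or Informix implement their features; IdSequence and run_transaction are hypothetical names, and the thread pool merely stands in for concurrent transactions.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class IdSequence:
    """Hands out unique, increasing integers; the lock covers only the increment."""

    def __init__(self, start=1):
        self._next = start
        self._lock = threading.Lock()

    def next_value(self):
        with self._lock:
            value = self._next
            self._next += 1
        return value  # the lock is already released here

seq = IdSequence()

def run_transaction(_):
    new_id = seq.next_value()
    # The rest of the transaction's work would happen here, with no lock
    # held on the ID generator, so other transactions are not blocked.
    return new_id

with ThreadPoolExecutor(max_workers=8) as pool:
    issued = list(pool.map(run_transaction, range(1000)))
```

One consequence of this design in the sketch is that a transaction that fails after taking an ID leaves a gap in the numbering, which is the price of not holding the lock until commit.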

Many mistakes are based on an application developer's lack of understanding of the underlying processing models that the DBMSs use. For example, Oracle's DBMS has historically been a process-based implementation, with a separate operating system process invoked on the server to support each client connection. On the other hand, Informix's DBMS has historically been a threads-based implementation, with a separate thread invoked inside a preexisting server process on the server to support each client connection. Each model has advantages and disadvantages, and both Oracle and Informix have added functionality to let them support one another's models. However, both the process and threads models incur an extremely high overhead for user communities of more than 200 people - especially in terms of memory and CPU usage - and alternatives must be found.

With a deeper understanding of the underlying DBMS, you can also avoid common serialization points such as table locks, concurrent inserts into the same part of the database, and inappropriate isolation levels for transactions that are reading data vs. transactions that are modifying data. In addition, the application itself may have unnecessary serialization points within it, such as using a single database table to handle the coordination of batch jobs.

Other issues affected by scalability concerns include load balancing and backup and recovery. For distributed systems, it's especially important that the transaction load is evenly distributed across all the processing resources available to keep any one resource from becoming a bottleneck on the overall throughput of the application. For backup and recovery, it is vital that the operation of backing up the database and other data on the system has a minimal impact on the availability and throughput of the system and that the recovery time in the event of failure is also minimized.
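As a rough illustration of even load distribution, the sketch below always sends the next transaction to the resource with the least work assigned so far. The LeastLoadedDispatcher class and the server names are hypothetical, and the model is deliberately simplified: it tracks cumulative assigned cost and never accounts for work completing.

```python
import heapq
from collections import Counter

class LeastLoadedDispatcher:
    """Assigns each unit of work to the server with the lowest assigned load."""

    def __init__(self, servers):
        # Heap of (assigned_load, server_name); ties are broken by name.
        self._heap = [(0, name) for name in servers]
        heapq.heapify(self._heap)

    def dispatch(self, cost=1):
        load, name = heapq.heappop(self._heap)
        heapq.heappush(self._heap, (load + cost, name))
        return name

dispatcher = LeastLoadedDispatcher(["server-a", "server-b", "server-c"])
assignments = Counter(dispatcher.dispatch() for _ in range(300))
```

With equal-cost transactions this behaves like round-robin; passing a larger cost for heavy transactions steers subsequent work away from the server that received them.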

Scalability Solutions

Luckily, there are solutions to most of the scalability challenges and issues I've just described. Application and database vendors must implement some of them, but you can do much of the work as well. As I already mentioned, one obvious area for application vendors to work on is tighter integration with each of the database vendors' systems. They must continue to take advantage of the nonstandard features that each of the database vendors has seen fit to implement and work with the database vendors to influence the enhancements they implement in the future.

A more specific area for vendors to work on is integration with transaction processing (TP) monitors and with middleware in general, which helps in load balancing and recovery. TP monitors help reduce recovery time by balancing transaction loads across multiple distributed resources and maintaining transaction integrity across partial failures. Other middleware that is becoming available has features that allow for abstraction of transactions at the business level using object-oriented approaches, which promises a more scalable application modification timeline. The middleware vendors leading this area include BEA Systems Inc. and CrossWorlds Software Inc.

Rather than assuming that each transaction has full access to the entire database and all system resources, packaged application developers need to learn how to share pools of resources creatively. For example, rather than locking an entire table to update a single row in the table, simply lock the row itself. A more aggressive example would be to use shared memory for temporary tables. In general, application developers can't afford to treat DBMSs like "black boxes" and simply assume that all SQL statements will be processed with subsecond response times. They must have more intimate knowledge of how the DBMSs really work if they want to maximize the performance and scalability of the applications they are implementing.
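The difference between locking a table and locking a row can be shown with an in-memory stand-in for a table. This is a minimal sketch, not DBMS code; BalanceTable, its methods, and the row keys are hypothetical, and the set of rows is assumed to be fixed so that only row contents need protection.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class BalanceTable:
    """In-memory table with one lock per row instead of one lock for the table."""

    def __init__(self, rows):
        self._rows = dict(rows)
        # One lock per row: updates to different rows never wait on each other.
        self._row_locks = {key: threading.Lock() for key in self._rows}

    def add(self, key, amount):
        with self._row_locks[key]:
            self._rows[key] += amount

    def get(self, key):
        with self._row_locks[key]:
            return self._rows[key]

table = BalanceTable({"A": 0, "B": 0})

def credit(key):
    table.add(key, 1)

with ThreadPoolExecutor(max_workers=8) as pool:
    # 500 updates to each row, interleaved across threads.
    list(pool.map(credit, ["A", "B"] * 500))
```

Replacing the per-row locks with a single table-wide lock would still give correct totals, but every update to row A would then wait behind updates to row B.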

End Users' Work

The way packaged application workloads are partitioned can cause major performance bottlenecks. For example, if transactions that must share access to a common resource are scheduled on separate systems in a distributed environment, the overhead of passing the shared resource between the separate systems will quickly kill the application's performance. The obvious solution to this problem is to group all transactions that share access to a common set of resources on the same system or set of systems.

But what if the system that is supporting the shared resources and the transactions accessing those resources has reached maximum capacity? It is usually possible to partition the workload among separate systems based on a partitioning key, such as geographical region. In this scenario, all transactions that refer to the Northeast, for example, would be handled by the Northeast system, while the transactions for the Southeast would be handled by the Southeast system. In general, it is possible to partition work either by partitioning the data that the transactions will work on using a partitioning key or by copying the data to multiple systems and periodically resynchronizing the copies to keep them up to date.
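Routing by partitioning key might look something like the sketch below. The region names come from the example above; the system names, the shape of a transaction record, and the route and partition functions are assumptions made for illustration.

```python
# Hypothetical mapping from partitioning key (region) to the system that owns it.
SYSTEM_BY_REGION = {
    "Northeast": "northeast-system",
    "Southeast": "southeast-system",
}

def route(transaction):
    """Return the system responsible for this transaction's region."""
    region = transaction["region"]
    try:
        return SYSTEM_BY_REGION[region]
    except KeyError:
        raise ValueError(f"no system configured for region {region!r}") from None

def partition(transactions):
    """Group transactions by target system so each system sees only its own data."""
    batches = {}
    for txn in transactions:
        batches.setdefault(route(txn), []).append(txn)
    return batches

workload = [
    {"id": 1, "region": "Northeast"},
    {"id": 2, "region": "Southeast"},
    {"id": 3, "region": "Northeast"},
]
batches = partition(workload)
```

Rejecting an unknown region, rather than falling back to a default system, keeps a misrouted transaction from quietly touching data that another system owns.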

Workload Scheduling

The better you schedule predictable workloads, the better your overall performance will be. For example, many reports don't need to be processed immediately and could be better suited to be run as part of overnight batch processing. Providing software facilities, training, and help-desk support to encourage users to schedule their work properly will improve the system's daytime performance by offloading less-critical processing to off-peak hours. Most packaged applications provide facilities to support some form of batch processing; it's generally a matter of providing the internal, "cultural" support for the users.
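A scheduling policy of this kind can be sketched as a small function that runs urgent work immediately and defers everything else to the batch window. The window boundaries (10 p.m. to 6 a.m.) and the names in_batch_window and schedule are assumptions chosen for the example, not features of any particular packaged application.

```python
from datetime import datetime, time

# Assumed off-peak batch window: 22:00 until 06:00 the next morning.
BATCH_START = time(22, 0)
BATCH_END = time(6, 0)

def in_batch_window(now):
    """True if 'now' falls inside the overnight batch window."""
    t = now.time()
    return t >= BATCH_START or t < BATCH_END

def schedule(job_is_urgent, now):
    """Return when the job should run: now, or at the start of tonight's window."""
    if job_is_urgent or in_batch_window(now):
        return now
    # Daytime and not urgent: defer to the start of the same day's window.
    return datetime.combine(now.date(), BATCH_START)

afternoon = datetime(1998, 9, 1, 14, 30)
report_run_time = schedule(job_is_urgent=False, now=afternoon)
```

A report requested at 2:30 p.m. is held until 10 p.m. the same day, while the same request made at 3 a.m. runs at once because the window is already open.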

Focus and Discipline

One of the most important factors in implementing a successful and scalable packaged application is creating a focused, disciplined architecture and implementation team. Ideally, this team consists of people from the user community, the IT group, the support group, the help desk, and the sponsors of the application. Bringing the team together by first focusing on the business problems that need to be solved is critical. Once there is a common understanding of the focus, you can begin the architecture planning and subsequent implementation. If the business problems are well defined, scalability will be a natural part of the architecture. Following through with a process that takes into account the current and future needs of the business is essential.

Scalability Is the Key

The advent of packaged applications based on open databases is accelerating the move toward open systems, with their emphasis on "open" database platforms. The driving forces behind this move - such as data warehousing, data mining, the Internet, and e-commerce - are making scalability a requirement for packaged applications. Pay close attention to all aspects of a packaged application implementation to guarantee that scalability is built in from the start, both in the software licensed from vendors and in the architecture and implementation that you create. Focus on scalability and your large packaged application implementation will be a success.


Kevin Jernigan is the president of Emergent Corp., a professional services firm that helps people design and deploy high-impact information systems that deliver high levels of scalability and adaptability. You can e-mail Kevin at kerniga@emergent.com.



DBMS (http://www.dbmsmag.com)
Copyright © 1998 Miller Freeman, Inc. ALL RIGHTS RESERVED
Redistribution without permission is prohibited.
Please send questions or comments to dbms@mfi.com
Updated August 7, 1998