Taming exponential application growth by building scalable-enabled solutions.
When it comes to information technology, mother doesn't always know best. It's unlikely that she, or anyone else for that matter, would have predicted that it would be commonplace for applications to grow continually at exponential rates. Looking at the current state of the industry, however, it's clear that strategic applications are rapidly scaling up the amount of data they manage, the number of users they support, and the types of functionality they include. And this trend doesn't appear to be slowing down. For those who are unprepared, rapid application growth represents a wild beast that will eventually wreak havoc on internal systems. However, those who truly understand the nature of the beast can manage this growth and use it strategically.
Handling such growth successfully requires expanding the way you think about classifying applications. Traditional classifications were limited to describing the general functionality of the application (OLTP, Data Warehousing, OLAP, and so on). However, a new dimension of classification is taking the IT industry by storm. This dimension is based not on the specific functionality of the particular application, but rather on whether or not this application's functionality can grow and adapt as an organization's needs increase and change. If an application (regardless of its traditional classification) can achieve these goals, it is called "scalable-enabled" or simply "scalable." If it can't, it's deemed "unscalable." You can have scalable or unscalable OLTP applications, scalable or unscalable data warehousing applications, and so on.
Why are corporations so interested in building scalable applications? As the need to collect and access more information increases, applications must sustain unprecedented levels of growth and change. For example, a typical data warehouse will more than double the amount of data it manages within its first year of use. If you want to address these new growth requirements (and reap the benefits of doing so), you must build scalable-enabled applications. Traditional approaches to application design and implementation haven't been sufficient for building scalable applications. Although traditional technologies combined with traditional application design techniques can yield robust applications, neither was designed to deliver scalability. Building truly scalable applications requires a combination of two critical components: scalable underlying technologies and scalable application design techniques. Organizations must be savvy in both areas and know how to integrate them.
One type of parallel hardware is the symmetric multiprocessor (SMP), which incorporates a few CPUs into a single computer. For some applications, this is still not enough power, so vendors have created ways to tie multiple SMP machines together into a "cluster" architecture. Yet even clusters aren't able to handle certain classes of applications, so hardware vendors have developed massively parallel processors (MPPs), which allow hundreds of processors to be used together in a single computer.
Software vendors have also recognized the advantages of scalable solutions, and all major database server vendors have delivered parallel versions of their database software to take advantage of scalable hardware architectures. With this software running on such hardware, users have all of the technological components they need to build scalable applications. If developed correctly, solutions that leverage these technologies can address the needs of high growth and rapid adaptability.
Fundamentally, the philosophy behind creating scalable solutions is simple: If you want to build a solution that can grow incrementally, you must think incrementally, design incrementally, and implement incrementally.
But what happens when an application's (database) transaction and data volumes grow by orders of magnitude in a very short time? For example, a highly publicized new state lottery could sell millions of tickets the first week, or sales could double for the first several weeks until they plateau. The real answer to this question lies in whether or not you expect this initial hypergrowth. If you do, you must build your initial system to handle the workload that is expected after the initial hypergrowth stage is over. For example, you'd build the first iteration to be able to handle a few million hits per day. Then, subsequent iterations are more incremental as the hypergrowth settles down. This is the only way to handle the problem. If you didn't expect the hypergrowth but it happens anyway, you'll be in some serious trouble because you can't add and test additional computing resources very quickly.
In contrast, conventional application design focuses on building large, monolithic applications. Given the monolithic nature of many of the uniprocessor hardware platforms, a monolithic development approach is often optimally suited to those platforms. However, applications designed in this fashion are poorly suited to the scalable systems environment. In the scalable environment, the platform is composed of multiple smaller resources, all pooled together. A monolithic application cannot be divided easily into smaller components in order to be spread across these multiple resources, and therefore it cannot take advantage of the scalable platform. In concrete terms, if a system can have no more than a single CPU, then it is often optimal to write the majority of the application as one large piece of code, so that the hardware will not have to waste resources switching among various components of the application. However, on a scalable platform with multiple CPUs, this single monolithic application can only run on one processor at a time. To take advantage of the multiple processors, the application must instead be written as a collection of smaller components, representing individual functions or different partitions of data, that can be added incrementally and that can then be spread across the multiple processors. (Note: Parallel database vendors follow this philosophy in their approach to executing SQL statements. For example, rather than executing an entire query on one processor, the database server divides the query processing into a number of subtasks and runs these subtasks on different processors.)
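The divide-and-combine idea described in the note above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `orders` data, the function names, and the four-way split are all hypothetical, and the thread pool is used purely for portability. In CPython a process pool, or the parallel database server itself, would be what actually spreads the subtasks across processors.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_parts):
    """Split rows into n_parts slices, one per worker."""
    return [rows[i::n_parts] for i in range(n_parts)]


def partial_sum(rows):
    """Subtask: aggregate one partition independently of the others."""
    return sum(amount for _, amount in rows)


def parallel_total(rows, n_workers=4):
    """Run one subtask per partition, then combine the partial results."""
    parts = partition(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(partial_sum, parts))


# Hypothetical data: (order_id, amount) pairs.
orders = [(order_id, order_id * 10) for order_id in range(1, 101)]
total = parallel_total(orders)
```

Because each subtask touches only its own partition, adding workers changes nothing but the `n_workers` argument, which is the property the monolithic version lacks.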
This information is gathered mostly via a process of interviews with all of the major constituents of the application (including management, IT staff, and end users), as well as through an audit of the existing environment. During this phase, the traditional approach is to focus on the current needs of the organization and to ask such questions as "what are your biggest challenges" and "how many users must be supported." To design for scalability, however, you must expand the scope to include brainstorming questions that focus on the future. For example, additional questions must identify how this application might be used in two or three years, as well as what types of functional enhancements or changes might need to be made to support these future uses. Also, examine what new types of data might be useful to add to the application in the future to augment the data that will initially be included. One of the major deliverables from this phase is therefore not just the traditional detailed description of the functionality of the application in its Version 1 release, but also an outline of how the application may adapt over time. Explicitly documenting potential changes will help enforce a flexible design that will scale up the functionality of the application.
However crucial the functional specification might be, Scalable Program Management puts equal weight on the performance specification. Various performance criteria must be identified, and these will become the basis for the Performance Assurance process -- the process that is critical throughout the life of a scalable solution. Performance Assurance is a set of application-specific metrics and tests that ensure that any incremental change to the application will only be released to the end users once it has been determined that the application still meets the performance metric requirements. Some sample metrics that need to be specified are listed in Table 1.
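One way to picture the release gate that Performance Assurance implies is a check that compares measured values against the specified limits. The following is a minimal sketch, not a prescribed implementation; the metric names and limits are hypothetical stand-ins and are not taken from Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    limit: float
    higher_is_better: bool = False  # e.g. throughput rather than latency


def violations(metrics, measured):
    """Return the names of metrics that are missing or out of specification."""
    failed = []
    for metric in metrics:
        value = measured.get(metric.name)
        if value is None:
            failed.append(metric.name)  # an unmeasured metric blocks release
        elif metric.higher_is_better and value < metric.limit:
            failed.append(metric.name)
        elif not metric.higher_is_better and value > metric.limit:
            failed.append(metric.name)
    return failed


def can_release(metrics, measured):
    """An increment is released only if every metric is within its limit."""
    return not violations(metrics, measured)


# Hypothetical specification, for illustration only.
spec = [
    Metric("avg_response_seconds", 2.0),
    Metric("transactions_per_second", 100.0, higher_is_better=True),
]
ok = can_release(spec, {"avg_response_seconds": 1.4,
                        "transactions_per_second": 120.0})
```

The same `spec` list would be rerun unchanged against every later increment, which is what makes the check a process rather than a one-time test.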
There is one catch here, however. If the application is new, there is often very little realistic data on which to base the performance requirements. Can you really predict what a query or transaction workload will look like for an application that has never existed? How can you really know what usage patterns will be? You can't. Therefore, you are often left in the unenviable position of having to define requirements based on nothing more than educated guesses. Because of this guesswork, it is critical to gather feedback from users of the first iteration of the application. This feedback is used to refine the performance specifications that will be used in subsequent incremental development iterations.
The question of how all this information should be documented also comes into play here. What structure should be used to ensure that all the information is complete and understandable? From experience, I must say that as long as some documentation structure is agreed upon, then it doesn't much matter what that particular structure is. Organizations should be encouraged to use whatever documentation guidelines they have used historically. If they have never had guidelines, then they should choose whichever of the numerous existing guidelines best suits their work style.
Once the metrics are specified and documented, the next step is to build the Performance Assurance Test Environment. This environment consists of two components. First, a benchmark suite must be written to test the various metrics that were just specified. In many cases, if you have existing benchmarks from previous application-implementation projects, this suite does not need to be written from scratch -- the existing tests can be modified to suit the needs of the Performance Assurance Test Environment. Regardless, the benchmarks must be written to run against any scale of the application, because they will be used and reused throughout the application's life cycle. Second, system statistics must be collected for CPU utilization, CPU load balance, I/O utilization, I/O load balance, and system bus utilization. The test environment will be used not only to ensure that the application performs within specification, but also to identify actual or potential bottlenecks in the system design.
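The requirement that benchmarks run against any scale of the application suggests making scale an explicit parameter of every test. A rough sketch of that idea follows; `sample_workload` is a hypothetical stand-in for a real query or transaction mix, and collecting the system statistics (CPU, I/O, bus) is left out entirely.

```python
import time


def run_benchmark(workload, scale, repeats=3):
    """Time a workload at a given scale and keep the best of several runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload(scale)
        timings.append(time.perf_counter() - start)
    return {"scale": scale, "best_seconds": min(timings)}


def sample_workload(scale):
    """Hypothetical workload whose cost grows with the scale parameter."""
    return sum(i * i for i in range(scale))


# The same benchmark is reused unchanged as the application grows.
results = [run_benchmark(sample_workload, scale)
           for scale in (1_000, 10_000, 100_000)]
```

Comparing `best_seconds` across scales gives an early hint of where growth stops being proportional, which is one way a bottleneck first shows up.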
The Performance Assurance metrics and associated test environment form another critical deliverable of the Business Discovery Analysis phase. Scalable applications will be continually growing and organically changing, so it is crucial to have a Performance Assurance process to ensure that any performance problems will be caught before the next incremental iteration of an application is released to the end users.
The sole criterion on which to base the selection of the hardware and database software layers is the "fit for purpose" test: Can the combination of hardware and software meet the requirements of the application? It no longer makes sense (and it's not clear that it ever did) to choose the hardware and software first and then design the application around those decisions. With scalable solutions, the application's needs must drive the selection process. Determining the appropriate hardware and database software components is greatly simplified by the Performance Assurance process, because it ensures that you already have two critical tools at your disposal. First, you've already determined what the requirements are, and second, you've devised tests to see if those requirements are being met. With these tools in hand, you must then carefully design your testing database, ensuring that it takes full advantage of the scalable capabilities of both the hardware and database software. This is no trivial task, but once the design is done, running the performance assurance tests on various hardware and software platforms can quickly identify which combinations are fit for which purpose.
Designing the application layer is more complicated. With the hardware and database software layers, the vendors have (ideally) taken care of making those layers scalable-enabled. However, with the application layer, scalability is your responsibility. The application designer should be primarily concerned with two issues. First, for reasons mentioned earlier, if an application is to be scalable, it must be designed as a set of functional components rather than as a single monolithic application. These functional components will then usually exist as their own processes on the system. It may help to think of designing a scalable application as a set of "applets" or object-oriented "objects" that interact by routing requests and results between themselves. For example, in a retail environment, components may include an order entry component, an inventory management component, and an order shipment tracking component. Once you've designed the application as a collection of interacting components, it's very easy to add new components to scale up the system's functionality. The ability to add new components easily is what makes truly scalable applications so adaptable.
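The component style described above can be sketched with a small in-process router. This is a simplified illustration: in practice the components would usually be separate processes, and the `Router` class, the component names, and the request fields are all assumptions made for the example.

```python
class Router:
    """Routes requests to named components; new components just register."""

    def __init__(self):
        self._components = {}

    def register(self, name, handler):
        self._components[name] = handler

    def send(self, name, request):
        if name not in self._components:
            raise KeyError(f"no component registered for {name!r}")
        return self._components[name](request)


router = Router()
stock = {"widget": 5}  # hypothetical inventory data


def inventory_management(request):
    """Reserve stock if enough is on hand."""
    sku, qty = request["sku"], request["qty"]
    if stock.get(sku, 0) < qty:
        return {"reserved": False}
    stock[sku] -= qty
    return {"reserved": True}


def order_entry(request):
    """Accept an order only if the inventory component reserves the stock."""
    result = router.send("inventory", request)
    return {"accepted": result["reserved"], "sku": request["sku"]}


router.register("inventory", inventory_management)
router.register("orders", order_entry)

first = router.send("orders", {"sku": "widget", "qty": 3})
second = router.send("orders", {"sku": "widget", "qty": 3})  # only 2 left
```

Adding an order shipment tracking component later would be one more `register` call, with no change to the existing components.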
But be careful here: As with most things, excess is dangerous. Keeping an application monolithic limits scalability, but so will dividing an application into too many components -- if the components are too narrowly focused to do anything useful without constantly requiring communication with other components, then the excessive communications overhead becomes a bottleneck. A balance must be found.
As a second issue, the application designer must focus on designing systems that will not have any inherent bottlenecks (and must know how to exorcise bottlenecks that have crept into the system). The key element to look for is shared resources. Whether it's a shared data element, a set of disk drives, or a routine that hands out tokens as a coordination mechanism, if the shared resource has a fixed bandwidth, it can become a bottleneck. The trick is to redefine the resource in such a way that its bandwidth can scale up. There are three common ways of doing this.
First, the resource can be "replicated." Many applications require daily downloads of a significant amount of data from a centralized server. If the number of downloads continually grows, the server's bandwidth will eventually be exceeded. At that point, the straightforward solution is to replicate the data and put it on a second server, thereby cutting each server's workload in half. However, if the applications were written assuming that only one server would exist, this scheme wouldn't work without modifying the applications. To be scalable, the applications must be written from the start to assume that the servers might be replicated, and to check how many currently exist before choosing which one to use.
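A client written with replication in mind might look roughly like the sketch below. The `ServerDirectory` class and the server names are hypothetical, and the modulo rule is just one simple way to spread clients. The point is only that the client asks how many servers currently exist each time, instead of assuming there is exactly one.

```python
class ServerDirectory:
    """Hypothetical registry of the servers that currently hold the data."""

    def __init__(self, servers):
        self._servers = list(servers)

    def add_replica(self, server):
        self._servers.append(server)

    def current_servers(self):
        return tuple(self._servers)


def choose_server(directory, client_number):
    """Check which servers exist right now, then pick one for this client."""
    servers = directory.current_servers()
    if not servers:
        raise RuntimeError("no servers available")
    return servers[client_number % len(servers)]


directory = ServerDirectory(["server-a"])
before = [choose_server(directory, n) for n in range(10)]

directory.add_replica("server-b")  # replicate the data onto a second server
after = [choose_server(directory, n) for n in range(10)]
```

With one server every client lands on `server-a`. Once the replica is added, the same unmodified client code splits the ten clients evenly between the two servers.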
Second, the resource can be "partitioned." If you write a coordination routine that uses tokens to coordinate access to various data objects, you cannot simply replicate the routine if it runs out of bandwidth. If you did, before fulfilling a request for a token, each routine would then have to poll its replicated counterparts to determine if any of them had already handed out that token, and such a communications overhead would cripple scalability. However, if you divide the set of tokens into smaller discrete partitions, and then have each coordination routine only manage a single partition, then you don't need any communications overhead. To interface with these coordination routines, you would have to write a scalable application to include a mechanism whereby it could be told how the tokens are partitioned, so it would know which coordination routine to go to whenever it needs a particular token.
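The partitioning scheme can be illustrated as follows. This is a sketch under assumptions: tokens are identified by integers, the partitioning rule is a simple modulo, and the class names are invented for the example. Each coordinator manages only its own partition and never contacts its peers, while the partition map is the mechanism that tells the application where to go.

```python
class TokenCoordinator:
    """Manages a single partition of tokens, with no peer communication."""

    def __init__(self, token_ids):
        self._owned = set(token_ids)
        self._held = {}  # token_id -> current holder

    def acquire(self, token_id, holder):
        if token_id not in self._owned:
            raise KeyError(f"token {token_id} is not in this partition")
        if token_id in self._held:
            return False  # already handed out
        self._held[token_id] = holder
        return True

    def release(self, token_id, holder):
        if self._held.get(token_id) == holder:
            del self._held[token_id]


class PartitionMap:
    """Tells the application which coordinator owns a given token."""

    def __init__(self, n_partitions, n_tokens):
        self._n = n_partitions
        self.coordinators = [
            TokenCoordinator(t for t in range(n_tokens) if t % n_partitions == p)
            for p in range(n_partitions)
        ]

    def coordinator_for(self, token_id):
        return self.coordinators[token_id % self._n]


tokens = PartitionMap(n_partitions=4, n_tokens=100)
granted = tokens.coordinator_for(42).acquire(42, "app-1")
denied = tokens.coordinator_for(42).acquire(42, "app-2")
```

Scaling up means adding partitions and updating the map; the coordinators themselves stay independent of one another.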
Third, "chunking" can be used in some situations. This clever technique often solves scalability problems without requiring you to duplicate the shared resource. For example, many applications have a purchase order number generator. Every time you call the routine, the next PO number is returned, and the PO number generator increments the value by one in preparation for the next request. As the number of requests increases, bandwidth will eventually be exceeded, and a bottleneck will result. But what if you wrote the generator to return a range ("chunk") of, say, 10 PO numbers at a time, and to increment the PO number by the chunk size, instead of by one? The application would also be written to understand that rather than returning a single PO number, the generator returns a range of numbers (for example, the PO numbers 13,751 through 13,760). The application would only need to send a request to the generator when it has used up the full range. Ultimately, rather than increasing available bandwidth, chunking reduces bandwidth requirements.
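The chunking technique lends itself to a short sketch. The class names below are hypothetical, and the lock merely stands in for whatever serialization the real generator would use. The generator hands out a range of PO numbers per request, and the application works through that range locally before asking again.

```python
import threading


class PONumberGenerator:
    """Shared resource: hands out PO numbers a chunk at a time."""

    def __init__(self, start=1, chunk_size=10):
        self._next = start
        self._chunk_size = chunk_size
        self._lock = threading.Lock()
        self.requests_served = 0

    def next_chunk(self):
        with self._lock:
            first = self._next
            self._next += self._chunk_size  # increment by chunk size, not one
            self.requests_served += 1
        return range(first, first + self._chunk_size)


class POClient:
    """Application side: uses up its chunk before contacting the generator."""

    def __init__(self, generator):
        self._generator = generator
        self._available = iter(())

    def next_po_number(self):
        for number in self._available:
            return number
        # Local range exhausted: request a fresh chunk.
        self._available = iter(self._generator.next_chunk())
        return next(self._available)


generator = PONumberGenerator(start=13751, chunk_size=10)
client = POClient(generator)
numbers = [client.next_po_number() for _ in range(25)]
```

Here 25 PO numbers (13751 through 13775) cost only three requests to the generator instead of 25. One trade-off worth noting is that numbers left unused in a chunk are skipped if the application stops early, so the sequence may contain gaps.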
Why do scalable applications require incremental development cycles? First, there are simply practical concerns: These applications are usually too big to build all at once. Second, scalable applications will continually and organically grow and adapt as users' needs expand and change. Given these two issues, it isn't possible to build the "whole" application, because what is defined as the "whole" application will be in constant flux as the scope and scale of the application continually increase. An incremental implementation approach is optimally suited to address these constantly growing needs.
When starting a particular incremental development cycle, define which functionality will be built in that cycle, keeping the goals small enough so that the work can be completed in three to five months. Next, rather than starting with the logical first step and progressing through to the end in a linear fashion, scalable applications require that the critical pieces of the cycle be prototyped first. Because the system is dynamic, you must prototype those pieces that are pushing capacity, performance, time, or technology constraints, and then test them with the Performance Assurance test environment. This process ensures that none of the performance criteria have been violated, and it surfaces unforeseen bottlenecks.
Next, the remaining pieces are implemented, and the resulting application is put through final Performance Assurance acceptance testing. After testing, the new functionality is released to the end users, and the entire process repeats as the development staff begins defining the functionality desired for the next incremental cycle.
To put it simply, the Internet means more access by more people to more data coming from more sources. By creating applications that can be accessed via the Internet (for example, accessing your data warehouse on the Web), you may have a user population growing at the same rate as the Web itself. Essentially, this growth rate means that the need for scalability for all applications will be intensified once they are connected to the Internet. Otherwise, these applications will collapse under its crush. The same story can be told for Intranets, though to a lesser degree. Ultimately, scalable systems will be the enabling backbone behind the rush to put applications on the Net.

TABLE 1. Performance Assurance Metrics

| OLTP | DSS / Data Warehouse |
|---|---|