

A critical task for developers building a transaction processing (TP) application is to determine if the hardware and software environment will support the application's performance requirements as the system scales in terms of number of users, data volumes, transaction rates, and so forth. Benchmarking is an important technique system architects can use to predict performance and determine the hardware and software needed to achieve performance and scalability goals.
Over the past several years, one of your primary focuses as an IT professional has probably been on building a data warehouse or data marts for your company. During this exciting process of creating the corporate data warehouse, you may have ignored the performance of your backbone application, the OLTP system. It never had a performance problem before, but now that the corporate data warehouse and departmental data marts are in production, new demands from these systems, as well as continued company growth, have strained the online environment. The time has come to replace the aging OLTP system. You now have the task of choosing the next-generation OLTP server for your company. In this article, I will focus on key elements of OLTP benchmarking, industry-standard benchmarks that are applicable to the OLTP environment, and the process of creating a custom benchmark.
There are several industry-standard benchmarks that you can use to evaluate the OLTP performance of a particular system. Almost all hardware vendors publish results of these benchmarks for each of their platforms. These results can be instrumental in helping you determine which machine will provide the necessary scalability levels to support current functionality and ensure that future growth requirements will be met as well.
Should you base your decision to purchase a particular platform on these results? Probably not. The major drawback of industry-standard benchmarks is that they do not accurately simulate a real-life workload. This deficiency makes their results inconclusive for judging how a particular platform will perform with customized applications. Nevertheless, hardware vendors rely on these results to market and sell their machines, because the results let them pit their wares against the competition.
The real strength of these benchmarks lies in their ability to demonstrate how different platforms scale with respect to database size, user load, and transaction throughput. This information will help you gauge a machine's ability to sustain your company's growth. These benchmarks can also show you where your current system falls within the range of published results, which helps you determine how much more iron you must purchase to meet your new performance goals. The sidebar provides a brief description of the benchmarks with the most visibility in the OLTP arena.
As I noted previously, the results of these benchmarks cannot give you adequate assurance that a particular application will perform as expected. Even where they reflect online performance levels well, they cannot account for other factors present in a unique business environment, such as the performance required to run special batch jobs or an individual company's requirements for feeding its data warehouse and departmental data marts.
A more precise and meaningful way to predict the performance levels of your company's complete application environment is to create a custom benchmark that mirrors your business functions as accurately as possible. In essence, a custom benchmark means deploying your existing application on each candidate vendor's hardware. This is the best way to predict whether a particular platform will meet your current and long-term performance goals.
How do you go about defining and creating a custom benchmark? The first step is to ask yourself whether a custom benchmark is absolutely necessary, or whether you can safely relate your company's transaction profiles to published industry-standard benchmark results. Can you safely assume that one tpmC is equal to one of your online transactions? If not, can you assume that two tpmC equal one of your transactions? (tpmC, the TPC-C throughput metric, measures the rate per minute at which new orders are entered into a system while other transaction types execute simultaneously.)
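Answering that question is largely arithmetic. The following is a minimal sketch, assuming you settle on an equivalence ratio between your transactions and tpmC (the `tpmc_per_txn` parameter is a hypothetical assumption of yours, not something TPC defines). It converts a peak rate of your own transactions into the tpmC figure you would look for in published results; because tpmC is a per-minute metric, the per-second rate is scaled by 60.

```python
def required_tpmc(peak_txn_per_second: float, tpmc_per_txn: float) -> float:
    """Translate a peak business transaction rate into an equivalent tpmC target.

    tpmc_per_txn is an assumed equivalence ratio: 1.0 means one of your
    transactions costs about as much as one TPC-C new-order transaction,
    2.0 means it costs about as much as two.
    """
    if peak_txn_per_second < 0 or tpmc_per_txn <= 0:
        raise ValueError("rate must be non-negative and the ratio positive")
    per_minute = peak_txn_per_second * 60  # tpmC counts transactions per minute
    return per_minute * tpmc_per_txn


# The two assumptions posed in the text, applied to a hypothetical 1,000 tps peak.
for ratio in (1.0, 2.0):
    print(f"ratio {ratio}: {required_tpmc(1_000, ratio):,.0f} tpmC")
```

If no single ratio holds up across your transaction types, that is a strong sign that published results alone will not answer the question.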
If you are able to compare your transaction profile performance requirements with published benchmark results, then do so, and don't go through the gut-wrenching process of hosting a custom benchmark. I cannot stress this point strongly enough. Having been involved with benchmarking for the better part of my career, I can say that if you have never been involved in a benchmark, you are not missing anything exciting. They are not fun. They require substantial resources from your company to develop, test, package, support, and audit. In addition, each hardware vendor executing the benchmark will most likely require one or two people from your company to be on-site during the process.
Depending on the complexity of your application, benchmarking can be a time-consuming and costly endeavor. However, with that important caveat in mind, a custom benchmark is your safest method for ensuring that the performance levels and scalability you demand will be satisfied by a particular hardware and software system.
You must understand the business requirements you are attempting to measure. If benchmarking the entire business IT environment is too overwhelming, then identify the key functions your company needs. For example, the most important function might be the rate at which orders are entered into the system. If this is the case, then you should use new-order throughput as the primary metric for the benchmark, much as tpmC is in the TPC-C benchmark. Obviously, you'll still want the other, background tasks to run simultaneously with the "key" transactions you have identified.
One of the most obvious goals in a benchmark is to ensure that raw performance levels meet your requirements. For example, if your company currently processes 1,000 transactions per second and is growing at a rate of 100 percent per year, you probably need a system that can accommodate 16,000 transactions per second to sustain the growth for the next three years (assuming that the load will have already grown by 100 percent before the benchmark is completed, the new system installed, and the database converted).
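The 16,000 figure is compound growth: the load doubles once before the new system goes live and then three more times while it is in service, so 1,000 doubled four times is 16,000. A small helper makes it easy to rerun the projection with your own numbers. This is a sketch; `lead_years` is an assumed parameter standing for the growth that occurs before the new system is in production.

```python
def required_capacity(current_tps: float, annual_growth: float,
                      years_in_service: int, lead_years: float = 0.0) -> float:
    """Project the transaction rate a new system must sustain.

    annual_growth is a fraction (1.0 = 100 percent per year).
    lead_years covers growth that happens before the system goes live
    (benchmarking, installation, database conversion).
    """
    return current_tps * (1 + annual_growth) ** (lead_years + years_in_service)


# The example from the text: 1,000 tps, 100 percent annual growth,
# one year of growth before go-live, three years of service.
print(required_capacity(1_000, 1.0, 3, lead_years=1))  # 16000.0
```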
Transaction processing throughput is not the only performance metric you need to measure. Although your initial goal might be to satisfy the OLTP requirements of the business, transaction growth will also have significant ripple effects on other functions within the system, such as nightly batch processing and weekly unloads to the data warehouse.
If you perform batch processing nightly, you will also need to measure this function in addition to the OLTP portion to ensure that the system can handle these processes within the nightly batch window time frame. It is not sufficient to assume that a system configuration capable of achieving the desired goal of 16,000 transactions per second will also be able to complete large custom batch processes running during off-peak times. For example, you can achieve high transaction rates by using large memory configurations to cache user data. However, nightly batch processing may not use this cache as efficiently as the OLTP functions will, and it may need the data to be spread out differently, requiring substantially more disk drives to achieve adequate performance levels.
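One way to frame this check is to project the nightly batch volume forward and compare the elapsed time implied by the batch throughput you measured during the benchmark against the window. The function names, the 20 percent safety margin, and the figures in the example are all hypothetical; the point of the sketch is that batch capacity has to be checked against projected volume rather than inferred from the OLTP result.

```python
def batch_elapsed_hours(rows_today: int, annual_growth: float, years: float,
                        measured_rows_per_second: float) -> float:
    """Estimate how long the nightly batch will take once data volume has grown.

    measured_rows_per_second is the batch throughput observed during the
    benchmark on the candidate configuration.
    """
    projected_rows = rows_today * (1 + annual_growth) ** years
    return projected_rows / measured_rows_per_second / 3600


def fits_batch_window(elapsed_hours: float, window_hours: float,
                      safety_margin: float = 0.2) -> bool:
    """True if the batch finishes with the requested slack left in the window."""
    return elapsed_hours <= window_hours * (1 - safety_margin)


# Hypothetical figures: 9 million rows per night today, doubling yearly,
# 5,000 rows per second measured in the benchmark, a six-hour window.
hours = batch_elapsed_hours(9_000_000, 1.0, 3, 5_000)
print(f"{hours:.1f} h, fits: {fits_batch_window(hours, 6.0)}")
```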
If your OLTP system will also be used to feed a data warehouse and/or data marts, consider benchmarking this process too, especially if the aggregation for the warehouse is performed on the transaction database. In other words, make sure you cover all the bases involved in the application process. The more thorough the benchmark, the more meaningful the overall results will be. Unfortunately, the more thorough the benchmark, the more time-consuming and support-intensive it will be as well.
Once you've established the basic foundation for what the benchmark will measure, it's time to specify details such as transaction mix, transaction response times, user loads, and overall transaction throughput.
Transaction Mix. It is important to understand the profile, or transaction mix, of your business so that the benchmark can reproduce the same mix. This means understanding the ratio of inserts to updates to deletes. If this information is currently unknown, then you must perform several days of analysis, preferably at peak usage times, to determine the quantity and ratio of each transaction type against each database object.
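Much of that analysis comes down to counting. The sketch below assumes you can extract (database object, operation) pairs from a log captured during a peak period, however your DBMS or monitoring tool exposes them; the record format and object names are hypothetical. It turns the counts into the percentage of each operation against each object.

```python
from collections import Counter, defaultdict


def transaction_mix(events):
    """Return {object: {operation: percent of that object's activity}}.

    events is an iterable of (database_object, operation) pairs, e.g.
    ("orders", "insert"), captured during a peak period.
    """
    counts = Counter(events)
    totals = Counter()
    for (obj, _op), n in counts.items():
        totals[obj] += n
    mix = defaultdict(dict)
    for (obj, op), n in counts.items():
        mix[obj][op] = 100.0 * n / totals[obj]
    return dict(mix)


# Hypothetical capture: 600 order inserts, 300 order updates,
# 100 order deletes, and 50 customer updates.
sample = ([("orders", "insert")] * 600 + [("orders", "update")] * 300
          + [("orders", "delete")] * 100 + [("customer", "update")] * 50)
print(transaction_mix(sample))
```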
Transaction Response Time. This metric measures the response time of each transaction type. For example, the benchmark will probably contain transactions that insert new data and transactions that delete and modify existing data. You must specify the maximum allowable response time for each transaction type against each database object. It is important to specify this metric by transaction type because deleting records from different database objects can take very different amounts of time. For example, it is faster to delete a record that has no child records than to delete a record that triggers a cascading delete of all its child records. Determining transaction response times usually boils down to the business requirements that the user community dictates. For example, you may require that a certain number of new-order transactions complete within a specific time frame, and that each of these transactions complete within five seconds.
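Once the limits are written down per transaction type, checking a run against them is mechanical. This sketch assumes response times have been collected per transaction type (for example, by the remote terminal emulator) and compares a chosen percentile against each limit. Setting `pct=100` checks the maximum, as described above; a lower value such as 90 mirrors the percentile-based limits found in specifications such as TPC-C. The transaction names and limits are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of response times; pct=100 gives the maximum."""
    if not samples:
        raise ValueError("no response times recorded")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_response_times(samples_by_type, limits_by_type, pct=100):
    """Map each transaction type to (observed value, limit, passed)."""
    report = {}
    for txn_type, limit in limits_by_type.items():
        observed = percentile(samples_by_type[txn_type], pct)
        report[txn_type] = (observed, limit, observed <= limit)
    return report


# Hypothetical limits (seconds) and measured response times.
limits = {"new_order": 5.0, "delete_customer_cascade": 8.0}
samples = {
    "new_order": [0.8, 1.2, 2.5, 4.1, 6.3],
    "delete_customer_cascade": [3.0, 4.5, 7.9],
}
print(check_response_times(samples, limits))          # strict maximum
print(check_response_times(samples, limits, pct=80))  # 80th percentile
```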
Transaction Throughput. This is the number of transactions completed in a given time frame during the benchmark test. It bears directly on the required response time and on the number of users tested.
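The link among throughput, response time, and user count can be made concrete with the interactive response time law from queueing theory (a standard steady-state result, not a method specific to this article): with N users, average response time R, and average think time Z between transactions, throughput X = N / (R + Z). The sketch solves it in both directions; the numbers are hypothetical.

```python
def throughput(users: int, response_s: float, think_s: float) -> float:
    """Interactive response time law: X = N / (R + Z), in transactions per second."""
    return users / (response_s + think_s)


def users_needed(target_tps: float, response_s: float, think_s: float) -> float:
    """Rearranged law, N = X * (R + Z): user load needed to drive a target rate."""
    return target_tps * (response_s + think_s)


# Hypothetical: 1-second responses and 9 seconds of think/keying time per transaction.
print(throughput(1_000, 1.0, 9.0))      # 100.0 transactions per second
print(users_needed(16_000, 1.0, 9.0))   # 160000.0 emulated users
```

Read the other way, the law shows why a throughput number means little without the user load and think time behind it: fewer users with shorter think times can post the same throughput as many realistic users.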
Scalability. As I noted earlier, it is just as important to measure the scalability of a machine as it is to measure performance. After all, what good is a machine that meets your current performance levels if it can't scale to meet future demands?
Two essential components of scalability are database size and user growth. Benchmarking an application with a database that is too small can skew transaction throughput results, because most of the data can be cached in memory, letting a small number of users obtain high transaction throughput rates. To measure the true scalability of an application, it is important to measure both user scalability and data scalability.
A straightforward approach to measuring scalability with respect to data size is to perform a series of tests with varying database sizes. You can do this by running a quarter-database test, a half-database test, and a full-database test. The quarter database should contain roughly the same amount of data currently contained in the production system. The half database should contain enough data to represent growth for approximately three years, and the full database should contain enough data to represent the system in five to six years. By benchmarking the system with this method, you will be able to see how your application scales on the system with both data size and user growth. You might initially purchase only enough hardware to sustain growth through the first year or two, but you want to make sure that the system you purchase now can easily be expanded to handle future requirements.
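The quarter, half, and full database tests combine naturally with a series of user loads to form a test matrix. The sketch below simply enumerates the runs; the 400 GB full-database size and the user loads are hypothetical placeholders for your own projections.

```python
from itertools import product

# Fractions of the full (five-to-six-year) database, as described in the text.
DB_TIERS = {"quarter": 0.25, "half": 0.5, "full": 1.0}


def test_matrix(full_db_gb: float, user_loads):
    """Yield (tier, size_gb, users) for every database size / user load pair."""
    for (tier, fraction), users in product(DB_TIERS.items(), user_loads):
        yield tier, full_db_gb * fraction, users


# Hypothetical: a 400 GB full database tested at three user loads.
for tier, size_gb, users in test_matrix(400, [500, 1_000, 2_000]):
    print(f"{tier:>7} database ({size_gb:.0f} GB) with {users} users")
```

Running every size at every load separates the two kinds of scalability: throughput that holds up at the quarter size but falls off at the full size points to caching effects rather than true capacity.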
By analyzing up front what needs to be measured during the benchmark, you will avoid last-minute functionality changes that can cause bugs to creep into the benchmark, which inevitably require more resources to resolve and ultimately delay the procurement process.
Modeling the workload is probably the most important step in developing a benchmark. Model it as close to real life as possible. The more realistic the benchmark, the more accurately you will be able to predict the performance levels of each platform.
How do you accurately model current real-life transactions and batch processes? A number of tools on the market today let you capture live user working sessions that can be replayed to simulate actual work being performed. Keep in mind that it is not sufficient simply to replay the captured transactions back-to-back, because real-life data entry operators pause between field entries to verify that the information entered is correct. Data entry operators also rest between data entry screens and between transactions, possibly while waiting for the next customer to call. Fortunately, these "session capturing" tools let you specify user think time, user keying time, and user sleep time, as well as the time between transactions, to enhance the realism of the benchmark. Most tools even model the performance impact of opening and closing user data entry screens, because these actions also affect system performance.
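Each session-capture or terminal-emulation tool has its own way of specifying pacing, so the following is only a tool-neutral sketch of what those settings do. For one emulated user it lays out when each transaction's screen opens and when it is submitted, given keying time per field, think time before submission, and sleep time between transactions. The `PacingProfile` class and its parameters are hypothetical; drawing each delay from an exponential distribution around its mean is one common modeling choice, and passing no random generator makes the schedule deterministic. Server response time is not modeled.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PacingProfile:
    keying_s: float   # mean time to key one field
    think_s: float    # mean pause to verify the screen before submitting
    sleep_s: float    # mean idle time between transactions


def _delay(mean: float, rng: Optional[random.Random]) -> float:
    """Return the mean itself, or an exponential draw around it if rng is given."""
    if rng is None or mean <= 0:
        return mean
    return rng.expovariate(1.0 / mean)


def session_schedule(profile: PacingProfile, fields_per_txn: int, n_txns: int,
                     rng: Optional[random.Random] = None) -> List[Tuple[float, float]]:
    """(screen_opened_at, submitted_at) offsets in seconds for one emulated user."""
    schedule, clock = [], 0.0
    for _ in range(n_txns):
        opened = clock
        for _ in range(fields_per_txn):
            clock += _delay(profile.keying_s, rng)
        clock += _delay(profile.think_s, rng)
        schedule.append((opened, clock))
        clock += _delay(profile.sleep_s, rng)
    return schedule


profile = PacingProfile(keying_s=1.0, think_s=3.0, sleep_s=5.0)
print(session_schedule(profile, fields_per_txn=4, n_txns=2))
print(session_schedule(profile, 4, 2, rng=random.Random(42)))
```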
The next phase of the benchmark process is testing and preparing the final distribution package. The package delivered to the vendor should include everything needed to stage, create, load, and execute the benchmark. This package includes all database creation, load, and benchmark execution scripts and programs, along with detailed information on other software requirements such as a remote terminal emulator. The documentation delivered with the benchmark must detail all requirements needed to satisfy the benchmark goals, including detailed explanations of transaction profiles, table schema listings, table sizing, and so on. Another important piece of information is the number of I/Os per transaction type per database object type. Providing this information will help expedite the machine configuration process.
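I/O counts help because they let a vendor translate the transaction mix into an I/O rate for each database object and for the disk subsystem as a whole. A rough sketch of that translation follows; the transaction names, per-object I/O counts, and the 100 I/Os per second per drive are hypothetical, and a vendor's actual sizing would refine this considerably (for example, for cache hits).

```python
import math
from collections import defaultdict


def iops_by_object(txn_rates_tps, ios_per_txn_object):
    """Physical I/Os per second each database object must sustain.

    txn_rates_tps: {transaction type: peak transactions per second}
    ios_per_txn_object: {transaction type: {database object: I/Os per transaction}}
    """
    load = defaultdict(float)
    for txn, rate in txn_rates_tps.items():
        for obj, ios in ios_per_txn_object[txn].items():
            load[obj] += rate * ios
    return dict(load)


def disks_needed(iops, iops_per_disk):
    """Minimum drive count to absorb an I/O rate, ignoring cache hits."""
    return math.ceil(iops / iops_per_disk)


# Hypothetical figures of the kind the benchmark documentation would supply.
rates = {"new_order": 100, "payment": 50}
ios = {"new_order": {"orders": 5, "stock": 15},
       "payment": {"customer": 6, "orders": 4}}
per_object = iops_by_object(rates, ios)
total = sum(per_object.values())
print(per_object, total, disks_needed(total, iops_per_disk=100))
```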
Once everything is packaged and ready to go, test the installation of the package and all the procedures. Doing this one last time will expose any deficiencies in the final deliverable and help the benchmark process run smoothly. Because the benchmark will almost always be executed at the vendor's benchmark center, supplying patches and fixes for it can be quite tedious. The effort you put into this phase will pay high dividends in later stages of the benchmark.
The vendors will configure their machines with enough hardware to satisfy the benchmark requirements for both initial performance levels and scalability demands. The vendors will draw upon past benchmarks they have executed and design a hardware solution.
The next step is for the benchmark team to review the benchmark and accompanying documentation. During this review, the team lays out the database and, if needed, begins generating data. Then the benchmark commences.
Conducting a thorough benchmark is more than just evaluating transaction response time and throughput. Every phase of the "hands on" portion of the benchmark is a candidate for measurement. Areas to focus on other than the transaction rates include:
OLTP benchmarks by their very nature are the most stressful tests that you can place on a system. This is not to say that DSS benchmarks do not stress systems, but at very high user loads OLTP environments stress every component of the operating environment, including the network, the I/O subsystem, the operating system, and the stability of the database software. To control transaction rates and lessen the load on the system, hardware vendors use TP monitors. TP monitors reduce the number of user connections by "multiplexing" static connections, allowing multiple transactions to use the same connection repeatedly. In other words, instead of dynamically connecting and disconnecting users from the RDBMS, which places additional load on the system and network, a fixed number of connections is established when the database is started, and these connections are shared among a large pool of users. Another valuable feature of TP monitors is the ability to route transactions to specific database instances, giving you control over where specific types of transactions execute. This function greatly enhances the ability to partition applications in a parallel server environment and helps ensure consistent performance levels by optimizing access to the database server.
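Commercial TP monitors implement this far more elaborately, but the two ideas in the paragraph above (a fixed set of connections shared by many users, and routing by transaction type) can be illustrated in a few lines. In this sketch `Connection` is a hypothetical stand-in for a real database connection, and the instance names and routing table are also hypothetical; it shows the mechanism, not any product's API.

```python
import queue
import threading
from contextlib import contextmanager


class Connection:
    """Hypothetical stand-in for a real database connection."""

    def __init__(self, instance: str, conn_id: int):
        self.instance = instance
        self.conn_id = conn_id

    def execute(self, txn_type: str) -> str:
        # A real connection would run the transaction's SQL here.
        return f"{txn_type} on {self.instance}#{self.conn_id}"


class ConnectionPool:
    """A fixed set of connections opened once, at startup, and shared by many users."""

    def __init__(self, instance: str, size: int):
        self._idle = queue.Queue()
        for conn_id in range(size):
            self._idle.put(Connection(instance, conn_id))

    @contextmanager
    def connection(self):
        conn = self._idle.get()      # wait until some connection is free
        try:
            yield conn
        finally:
            self._idle.put(conn)     # hand it back for the next transaction


class TransactionRouter:
    """Sends each transaction type to the connection pool of its assigned instance."""

    def __init__(self, routes: dict, pools: dict):
        self._routes = routes        # transaction type -> instance name
        self._pools = pools          # instance name -> ConnectionPool

    def submit(self, txn_type: str) -> str:
        pool = self._pools[self._routes[txn_type]]
        with pool.connection() as conn:
            return conn.execute(txn_type)


def simulate_users(router: TransactionRouter, txn_types) -> list:
    """Run one thread per emulated user and collect every result."""
    results, lock = [], threading.Lock()

    def user(txn_type):
        outcome = router.submit(txn_type)
        with lock:
            results.append(outcome)

    threads = [threading.Thread(target=user, args=(t,)) for t in txn_types]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Hypothetical routing: order entry on one instance, payments on another,
# with two shared connections per instance serving forty emulated users.
router = TransactionRouter(
    routes={"new_order": "inst_a", "payment": "inst_b"},
    pools={"inst_a": ConnectionPool("inst_a", 2), "inst_b": ConnectionPool("inst_b", 2)},
)
done = simulate_users(router, ["new_order", "payment"] * 20)
print(len(done), sorted(set(done)))
```

Forty users complete their work over four database connections, and every new-order transaction lands on the instance the routing table assigns it to.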
The overall purpose of a TP monitor is to enhance the performance of OLTP application systems; however, in doing so, it makes the application environment more complex. Will you use a TP monitor in your final production environment? If so, include it in the benchmark specifications. Don't use a TP monitor in the benchmark if you don't plan to use it in the production environment, because the final results will not reflect how your application will actually perform.
In today's IT centers, open systems play key roles in delivering mission-critical solutions to businesses of all types and sizes. (Open systems are no longer used as standalone environments and may in fact be the only platforms within a given IT center. They are tightly integrated into the total service offerings provided by most IS departments.) Because they are now interwoven into the IS structure, the platform decision is more complicated than ever before. The next step in the benchmark process is to decide which system to purchase. If the benchmark modeled the entire business application accurately, then you don't need to run another performance test after the system has been shipped and installed. However, if the benchmark tested only a portion of the application, you should perform a more thorough test once the entire application and all the data have been loaded. This last test should occur sometime during the functional parallel test, and it will help keep serious performance issues from surfacing when the system goes live.
By this time, you should have compiled enough information from sources such as industry-standard benchmark results, industry reviews, vendor reference accounts, and the results of the custom benchmark itself to make an educated purchasing decision. The actual benchmark results will not be the only criterion in the final selection (although they will be a major factor). Your system of choice must integrate seamlessly into the company's current IS structure. For example, a particular open system might need to provide ESCON connectivity, run the system monitoring tool of choice, and connect to existing storage devices.
In this article, I have attempted to shed some light on the benchmarking process and highlight several important aspects to consider should a custom benchmark become necessary. The last thing any company needs is to go through a system migration exercise only to discover that the chosen platform does not have enough horsepower to support its demands. A custom benchmark will help you avoid this costly mistake and choose a platform that can support a successful migration of the application.
| Benchmarks Commonly Used for OLTP |
|---|
| TPC-A. One of the first benchmarks to measure the performance of a machine for use in an OLTP environment was the TPC-A benchmark. This benchmark measures OLTP transactions per second by using four test files and simple transactions that access and update records to simulate a simplistic banking transaction. Unlike TPC-B, the TPC-A benchmark takes into account terminals attached via a network. |
| TPC-B |
| TPC-C |

-- Ken Fried
Ken Fried is a principal consultant with Emergent Corp., a consultancy dedicated to helping clients deploy high-impact business information solutions that deliver unprecedented levels of scalability and adaptability. You can email Ken at kfried@emergent.com.