by Robin Schumacher and Billy Bosworth
Prentice-Hall, Inc., a Simon & Schuster Company, 1997
ISBN: 0-13-271156-7 (Paperback)
To order, visit www.cbooks.com for more information.
(**** indicates areas where text has not been included)
The technical scenario involved an ORACLE7 database on a Unix server with OS/2 and Windows clients that connected using the TCP/IP protocol. The response time to receive a query result could top an hour or better for some reports. Clearly, something had to change.
First we ran a complete set of database diagnostics (which included using the Oracle Track monitor included on the CD ROM that accompanies this book) as well as Unix diagnostics. We found a number of specific problems both in terms of placement of database files and heavy database fragmentation, which no doubt contributed to the overall situation. Still, something else had to be the main culprit for such a pronounced lag in response time.
We asked to see one of the typical reports that users requested. The report we received included a fairly complex query that contained a join of a number of database views. On the surface, nothing appeared wildly out of place. Only upon closer examination of the underlying views did the light began to dawn. The first view we examined nearly knocked us out of our chairs. It involved a selection of 43 columns that joined 33 tables, and had a join predicate that contained not less than 28 outer joins. And remember, this was just one view within the report. We looked at the client and proclaimed "Ta-da!"
In this example, our point is that even if you adhere to every high-performance guideline for building your PowerBuilder application, you will still fail miserably if your database is in poor condition. Obviously, the database described here suffered from a red-hot case of the database designers taking normalization to an extreme degree. As a PowerBuilder professional, you need to be aware of all aspects of database development--both inside and outside of your software application.
This chapter presents proven methods for handling hard-core database development with PowerBuilder. It is written for the developer who finds him or herself acting as the DBA for a project. With client/server, the roles of systems development professionals are sometimes melded together and one team member will often have to perform a variety of functions. Follow the guidelines outlined in this chapter and you will be able to get a much better handle on maximizing PowerBuilder's capabilities using the powerful functionality of today's database management systems.
The material in this chapter will help you build better datawindows and applications that efficiently access and update the RDBMS that you have chosen for your project. But, in addition to PowerBuilder tips and techniques, we also want you to have a clear understanding of how your database of choice can be used to its full potential and therefore contribute significantly to your overall success.
In this chapter, we will explore the issues with regard to packing your PowerBuilder application with more database strength. In addition to the topics we listed at the opening of this chapter, we will also quickly examine some questions common to many developers, such as whether or not native database drivers are faster than ODBC, and whether or not datawindows should always be used instead of embedded SQL, and more.
We've seen this attempted and, without fail, it either hasn't worked or has produced mediocre systems, at best. Unless you are developing software for market that has to have the capability to interact with your customer's choice of databases, then there's really no reason to do this anymore. At one time, the capability of some databases to scale and meet the growing needs of an enterprise were lacking, but this really is no longer the case. Most popular RDBMS products now can begin at a small departmental server level and later scale to massive parallel Unix platforms.
For example, many databases enable you to use stored procedures for database processing. Instead of performing complex SQL logic at the client, you can place the code inside the database where the SQL is already parsed and most likely will run more quickly.
Partitioning your client application and database server wisely will pay off in performance improvements, when done correctly.
As a PowerBuilder developer, it is your responsibility to see that your users get their data back as soon as possible. This is difficult if large volumes of data are involved. Because there are PowerBuilder applications being written to interact with data warehouses containing hundreds of gigabytes of information, you will want to find the optimum solutions for data speed and presentation.
One of the most powerful features of PowerBuilder's datawindows is the 'Retrieve Rows As Needed' (RRAN) option***. This feature, when used correctly, can give you the performance boost you and your users need. But, as we will see, it must be implemented carefully.
RRAN works like this: Size your datawindow control on your window to display, for example, 20 rows at a time. Your datawindow has the RRAN option set. When the datawindow retrieves information, only the first 20 rows are obtained and fill the datawindow control. Then, as the user scrolls down inside the datawindow, PowerBuilder issues requests for more information until, if necessary, all the data requested is present in the datawindow control. Clearly, this can increase response time for a user requesting a large volume of detailed information.
****
Being new to PowerBuilder, we were not at all familiar with the datawindow. To produce our desired sort, we were using the Sort. . . selection from the Rows menu option inside the datawindow painter. ****
We were performing the entire sort of the table at the client, rather than at the database level (done by coding a standard ORDER BY clause in the SQL as we had done with the other tools). When we changed the sort to use the SORT tab option inside the datawindow SQL painter (which causes a SQL ORDER BY to be done at the database server), PowerBuilder became much more competitive in our test, and in fact beat the other tools by a hair.
****
As a PowerBuilder developer, you need to hone your ability to understand both how PowerBuilder handles sorts and how your database performs sorts so you can achieve maximum performance from both.
Most RDBMSs handle sorts either in memory, at the disk level, or a combination of both. When a sort request comes into the database, Oracle first attempts to use a designated area in memory set aside for each user to perform the sort. If it can, Oracle will perform the entire sort in memory. If it cannot handle the sort in memory, Oracle begins writing out chunks of the sorts to a tablespace that is normally set aside and assigned for sorts. When all the sort batches are finished, Oracle merges them into one result set for the user.
Microsoft SQL Server also has an area set aside for processing sorts, referred to as its TEMPDB database. Each user is limited to a maximum number of pages in the TEMPDB database for their sort. The TEMPDB area, which normally resides on disk, can also be placed into memory to speed up the sorting process.
Oracle users can use the Oracle Track monitor supplied on the CD accompanying this book to monitor sort activity on their database. The Memory QuickCheck Sort option enables you to see the number of sorts that have occurred in memory versus sorts that have gone to disk.
Keep in mind that every sort performed causes extra overhead either at the database or client level. So when painting your datawindows, ask yourself whether or not a sort is necessary.
****
To begin, ask what your environment is like. Is your PowerBuilder system being developed for a heavy online transaction processing environment (OLTP) or a query-intensive decision support system (DSS)? Why does this matter? Because in an OLTP system, too many indexes can actually slow the system down when a lot of data is being driven into it. As your users key information into your database, the tables are updated with the data, but so are your indexes. If your tables have numerous indexes created on them, this additional overhead of maintaining them can definitely put a strain on the system.
Another question to ask when creating indexes through the Database painter is "How large will the tables grow?" Indexing small tables is normally a waste because they end up being cached in memory anyway. So what's a small table? A good rule of thumb is any table with less than 200 rows is a small table. This is something you can test yourself.
If you decide your table is a good candidate for indexing, which columns should you index? A few good places to begin include:
For Oracle, this means you need to create separate tablespaces on separate drives of the server machine. For Microsoft SQL server, this can be done by creating segments on different drives. Your tables are then created in one tablespace or segment, and your indexes in the other.
If RAID is installed on your server machine (and therefore all your data is spread over all the drives), you don't have to do this because RAID normally offers better performance than splitting out the tables and indexes on their own drives.
Stored procedures enable you to application-partition your system and put business logic at the server level, hopefully to increase performance ( for more information on this, see Chapter 4, "A System Is Only as Good as Its Network"). The question for us as PowerBuilder developers is "Being that PowerBuilder supports the use of stored procedures, can they cause our application to run better?" As we'll see, it depends on the situation.
Step two is also performed quicker for both Oracle and SQL Server because only the user's permission to execute the stored procedure itself is checked instead of the actual underlying tables. These features on the surface seem to indicate that stored procedures will help speed up a datawindow's execution.
****
Happy that we no longer had to write code to build a table's primary key, we quickly created several tables with IDENTITY columns, coded-up our PowerBuilder 4.0 datawindows, placed the datawindows into our inherited windows that perform database updates, and ran them. We inserted new rows and committed them to the database with ease. What a nice addition the IDENTITY column was!
However, when we changed some data while the window was still up and attempted an update, we got the unfortunate message: "Primary key not found."
What happened? The INSERT into the database went fine because our PowerBuilder application didn't need a primary key for that function. However, an UPDATE performed through a datawindow must have a primary key to work properly and since we used the IDENTITY column to set the primary key, our PowerBuilder datawindow had NULL values in its primary key columns. So much for using IDENTITY columns easily!
The handling of primary keys can get sticky in some PowerBuilder applications. Most applications have table structures that contain numeric primary keys. Because of this, PowerBuilder developers most often have to write code to handle the setting of these keys. But does this always have to be the case? There are instances when the answer is "No."
****
Instead, we discovered that database lock contention was running rampant throughout the system during the peak hours of the day. The reason no one spotted this was because the application was written to automatically retry any update operation that failed due to the inability to obtain the necessary lock. To the end user, it appeared that the system just hung without reason until their request obtained its lock and went through. Only when a robust database monitor that tracked lock contention was used did the problem come to light.
When writing PowerBuilder applications, you need to design your windows that perform database operations in such a way that the chance of lock contention is minimized. The one golden rule you can apply is the same one that solved the majority of the locking problems at the DB2 site mentioned previously.
When the application mentioned earlier was changed to include more frequent COMMITs, the locking problems all but disappeared.
Other databases, such as Sybase and Microsoft SQL Server, only go down to the page level (although the release of Microsoft SQL Server 6.5 offers the potential for row-level locking on insert operations) and therefore have a greater chance of producing lock contention. Some databases also have built-in functionality referred to as lock escalation, where the database will, under certain circumstances, escalate a lock from the page level to the (gasp!) table level. This usually occurs when the database has determined that keeping track of all the page locks is causing too much overhead and therefore locks the whole table. You can either check your database manuals or ask your DBA if such a thing can occur in your database.
There's a direct relationship between the granularity offered by a DBMS and the chance of lock contention. The possibility of your users locking the exact same row at the exact same time are remote, thus giving the row-level locking databases an edge in this area. Page locks can definitely increase the likelihood of lock contention depending on how many rows take up a page. Nothing more really needs to be said about table locks.
At the DB2 client site mentioned previously, the source of lock contention for the most part was not the tables, but instead was concentrated mainly in their indexes.
One last note concerning locking: Depending on your database, PowerBuilder offers a number of database DBPARM settings for lock parameters. Check PowerBuilder's online help to see if any parameters could apply to the database with which you are working.
****
PowerBuilder provides native drivers for you to access most of the databases that you will use for your project. You also have the choice of using ODBC drivers from a variety of vendors. The choice of which to use has been discussed in a number of articles and user group meetings.
WHY YOU SHOULD CARE:
The obvious reason is better performance.
OUR CONCLUSIONS:
ODBC critics point out that ODBC adds another layer between the client and the database. However, in many of the benchmarks we have done, the native drivers and ODBC run virtually neck and neck. We prefer the native drivers because they are easier to work with, can be deployed with the PowerBuilder runtime library, and require less configuration on the client machine than ODBC. Naturally, ODBC must be used when native drivers are unavailable.
****
A number of database gurus have stated that no high-performance production database will run efficiently in true third normal form. We have seen cases in which data administrators took database normalization too far and produced logical data models that make no physical sense.
WHY YOU SHOULD CARE:
An over-normalized database makes development more difficult and leads to the use of an abundance of joins even for simple queries. The end result is usually performance degradation.
OUR CONCLUSIONS:
Creating datawindows that involve an extreme number of joins should clue you in to the fact that maybe your database needs to be de-normalized a bit. This process usually involves combining groups of two or more tables into one table. It goes without saying that you should attempt to preserve as much of a normalized model as you can to guard against data redundancy and the like, but you may be surprised at what a performance boost de-normalizing some of your data model will give you.
*****
Fragmentation in a database usually occurs when objects are created and dropped continuously or when tables and indexes that have not been sized correctly start to grow beyond their initial space allocation. This can lead to a table or index being spread out over various places in a tablespace or segment. In addition, tables and indexes that have data continually inserted and deleted from them can contain gaps of unused space between the actual table or index data.
WHY YOU SHOULD CARE:
After an object grows beyond its initial space allocation, it will usually take a database longer to scan it because it must look in many places for the object instead of only one. This results in a longer response time for your datawindow retrieves.
Further, many databases have the capability to read ahead for data. Once a request for data is made, the database begins to read multiple chunks of data into memory for faster access. For many databases, a fragmented object causes this performance feature to be turned off and therefore your response time will suffer.
Recall that at the beginning of this chapter, we mentioned a client site that suffered from a number of database performance problems. One of the first things we did was to reorganize the database for faster performance. After this was accomplished, we received very positive feedback regarding their response time. Query time for some reports were literally cut in half!
All databases provide an area in memory where data is read in and out. Depending on the machine's available memory, this area might be large or small. The size of this area has a direct impact on an application's performance.
WHY YOU SHOULD CARE:
The more memory that is available to a database's buffer cache will equate to more data being held in memory for your application.
OUR CONCLUSIONS:
It doesn't take a rocket scientist to figure out that data read from memory will be infinitely faster than data read from disk. Therefore, your project team should strive to keep the database's buffer cache as large as possible in order to increase the query performance of your application.
Note that there does come a point of diminishing returns from continually increasing your database's buffer cache. Guard against increasing the size of your buffer cache so much that the physical memory available to the server machine is less than adequate. The end result can be a machine that begins to page, swap, or thrash--definitely not something you want. Use the tips below to size your buffer cache correctly.
Users of Microsoft SQL Server can use the performance monitor supplied with Windows NT to examine the Cache Hit Ratio for the SQL Server object. Again, a good measure is normally 80 percent or better.
Oracle now offers a 'CACHE' parameter applied using DDL that can cache a table in memory the first time it is accessed. For small look-up type tables, this parameter is ideal for helping you get more reads from memory.
A well-normalized database has some form of referential integrity (RI) enforcement. This is usually performed either through a declarative form (meaning it's coded in the DDL for the table objects) or through database triggers.
WHY YOU SHOULD CARE:
No matter how it's done, RI management is essential to a well-functioning database application.
OUR CONCLUSIONS:
How will you enforce RI in your PowerBuilder application? Depending on the database, manual RI enforcement through an application can lead to tremendous overhead with all the checks that may be performed before a DELETE or UPDATE operation is attempted.
The datawindow user objects and database windows supplied in the class library on the CD ROM that accompanies this book show you how to handle these events so that RI enforcement is performed correctly. Special RI handling is done for DELETE operations that prove to be more of a problem than they appear to be on the surface.
*****
Keeping up with all the improvements that both Powersoft and the database vendors keep adding to their products takes a lot of effort, as we're sure you are well aware. Hopefully, with each new release, we will find new features to apply that will improve our application's performance and capabilities.
Let's be honest: If your system has queries in it that perform a SQL JOIN of 33 tables, you're in for some trouble no matter what. But hopefully, by using the strategies in this chapter that have been proven to be successful in numerous applications, you can help better both your PowerBuilder application and its underlying database.