DBMS

Reporting Against Large Databases

By Peter Brooks and Christa Wollenweber
DBMS Data Warehouse Supplement, August 1996

Server-Based Reporting Engines Apply Multi-Tier Architectures to Very Large Databases


We all know databases are getting larger, terabyte by terabyte. But how are end users supposed to create reports from all that data?

PC-based report writers used to be the answer. A user would format a report, structure the query, execute the query, and then wait several seconds (or minutes) for the response from the database. The bad news: This doesn't work against today's large databases - users would stare at the Windows hourglass for hours waiting for the query to complete. The good news: Both existing reporting tool vendors and newly formed startups are developing advanced reporting tools to access large databases.

Modern reporting tools have been designed to accommodate:

  • accessing large numbers of database rows
  • transferring large result sets from the database to the end user's workstation
  • complex SQL statements including multi-table joins and subqueries
  • complex report logic such as summarization routines that involve large numbers of rows

    Large reports are sometimes also needed as well. Intelsat, which owns the world's most extensive global communications satellite system, creates a set of customer invoices consisting of over a million rows of output on an HP/9000 using MITI's SQR3 software. Reports consisting of very large result sets may also require sophisticated formatting that extends the time required to completely create and process the report.

    Many report tools are marketed as "end-user tools" with the theory that users can create reports with no involvement from MIS. But the reality is that when large databases are involved, database professionals are needed to optimize the database for performance, allocate data and processing among the platforms, and create and test complicated processing logic.

    Advanced reporting tools consist of several components. Developers (and sometimes end users) build reports using client-side development tools. End users run reports from client workstations. Reports are stored and run on a second tier, and database servers are the third tier. This architecture is intended to allow reports to run efficiently against large databases and also return large amounts of data.

    Architecture Overview

    The heart of an advanced reporting system is a report server that offloads processing from end-user workstations. For example, the Actuate Reporting System's architecture includes separate end user, developer, and report server engine components. (See Figure 1.) A report server can run on its own Windows NT or Unix platform, or it can coexist on the database server platform. Coexistence maximizes the speed of data transfer between the report and database servers, but it could degrade DBMS performance because both servers compete for resources. Use of separate platforms gives each server its own dedicated resources, but it requires additional hardware and operating system software. Using separate servers also lengthens the data transfer time between the two servers, so it's a good idea to keep the two servers on adjacent network nodes.

    All reporting tools have associated proprietary viewers. Some viewers can be integrated with other vendors' products. Crystal Reports has more than 90 OEM bundling agreements for products such as Visual Basic, and Actuate is partnering with organizations such as Informix and Forté. Most reporting tool vendors have announced plans for viewers that let users access reports within Web browsers.

    Client Components

    Client components of reporting tools let users develop, execute, and view reports. In general, the report design and development tool is a separate product from the end user's report viewer. A viewer can improve performance by:

  • initiating execution of reports that run on report server (or on the client platform as an option)
  • scheduling reports to be run immediately or at a later time (some vendors such as Information Advantage provide agents or alerts that run when data changes or as certain conditions occur)
  • beginning report viewing when the first page is complete rather than waiting for the entire report to be complete (most reporting tools transfer data on a page-by-page basis and save previously viewed pages for backward scrolling; Platinum's InfoReports has a buffering capability that allows specification of the number of rows to be retrieved on each transfer so that more than one page can be transferred at one time)
  • restricting the amount of data retrieved by specifying a maximum number of rows (some query governors also limit execution time based on other estimates of a query's cost or complexity)
  • canceling long-running queries

    While end users generally have little control over the performance of the queries they create, developers using a reporting tool's programming language have several options:

  • allocate input buffer sizes synchronized with the database output buffer size (if a database is set to buffer 10 output records before sending the data, the reporting tool should be able to retrieve 10 records at one time rather than require 10 separate reads; SQR3 supports this functionality)
  • alter the SQL generated by the tool and create custom queries (reporting tools let developers optimize SQL to take advantage of database indices or partitioning
  • select the type of interprogram communication used
  • support both static and dynamic SQL
  • compile the report code rather than run it through an interpreter at runtime

    Windows 3.1 and Macintosh are the most prevalent client operating environments, while Windows 95, Windows NT, Unix, and Internet browsers are beginning to be used.

    Report Servers

    The most interesting advance in reporting against large databases has been the development of report servers that offload data access and report generation from end-user workstations. Windows NT and Unix server platforms are the platforms of choice to run 32-bit advanced reporting tool software. Performance and storage of extracted data and finished reports on the report server is generally limited by memory and disk space, which can be upgraded as needed.

    Older report writers restrict the data in a report to the amount that can be selected in one SQL statement. (This may be a very complicated statement!) Unfortunately, the generated SQL may also result in long database processing time. Advanced reporting tools and data warehouse servers reduce query processing time by creating alternatives to single SQL statements. For example, summarizing all sales data for each company in a large table can lead to long processing times while creating a very large result set. It is often more efficient to build the report by selecting all of the company names and calculating each company's sales on a row-by-row basis. Most reporting tools can now perform this type of query optimization behind the scenes and without users' knowledge or intervention. Similarly, queries for summary information can be directed to aggregation tables on the database or report server instead of resorting to large table scans.

    These features require a metadata or semantic layer that resides on the report server. This metadata lets users build queries using data defined in business terminology rather than in cryptic or programmer-oriented database table and column names. The metadata layer translates the business terminology into database access statements that may involve multiple SQL statements or the use of an aggregate table rather than selecting against detail data. One disadvantage of this approach is the time and effort required to build the metadata layer. Although end users don't need to know physical database details, you need DBA skills to build and maintain the metadata. Database structure changes must be monitored to understand the effect on the metadata layer, which may also require changes.

    One critical yet often overlooked aspect of reporting against large databases is the performance impact of report calculations. Calculations that consist of complicated logic, totaling, or OLAP-style "slice and dice" operations require a significant amount of time if large volumes of data are involved. All reporting tools allow calculations to be performed on the report server, eliminating the performance impact on end users. It is also common to find reports that are heavily laden with many complex formulas, often involving if-then-else logic or even CASE statements that choose from several alternatives. These CPU-intensive formulas may not be noticeable in a report that prints several thousand rows, but when the result set is several hundred thousand rows, the impact is magnified. Reports and data can be cached on the report server so that similar end-user requests can access the stored information rather than run a long query.

    Some reporting products use native or custom drivers to access the source databases while others use ODBC drivers. Although the first ODBC drivers were slow, it is now unclear whether there is a significant difference between the newer ODBC drivers and native drivers. ODBC's SQL Passthrough capability allows virtually any SQL command to be passed to a data source. The bottom line: Both native and newer ODBC drivers provide acceptable performance, but native drivers may still be slightly better.

    Most tools can access numerous relational databases by executing the SQL for each report from the server, not the client. The information from each database can then be consolidated on the report server. Be wary of restrictions on the maximum number of database access statements or databases that can be accessed in one query. Concurrent heterogeneous database access - data selection or joining of rows from disparate databases at the same time - is generally not supported.

    Report servers have the ability to cache reports or store reports that have been retrieved from the database so that they need not be re-created for each request. One simple way that report servers minimize the impact of building large reports is by running reports during nonpeak hours. Reports can usually be initiated from either an end-user client machine or a report server scheduler. Capabilities range from simple time-of-day to more sophisticated "third Thursday of each month" types of scheduling. Other administrative capabilities of report servers include batch processing of multiple reports, query management such as the ability to prioritize queries, error handling, and implementation of security.

    Database Server Tier

    A common theme in reporting tools is to let the database server undertake as much processing as possible, the theory being that databases have sophisticated query optimization routines, indexing, and partitioning to take advantage of high-performance parallel-processing SMP and MPP hardware. Although this is true to an extent, advanced reporting really requires a coordinated report and database server architecture approach:

  • Distribution of tables must be carefully planned.
  • Query design and database indexing are mutually dependent.
  • The impact of placing the report server on the database platform must be determined.

    In fact, report server capabilities such as stacking multiple SQL queries and caching of reports are designed to overcome database performance problems.

    Choosing a Reporting Architecture

    Reporting tools are becoming more sophisticated as the need to access very large databases becomes more prevalent. The choice of a reporting tool is not as easy as it used be, when performance was not much of an issue and the decision could be based primarily on the user interface. Now, you must have an understanding of the user access requirements, database structure, and LAN/WAN communications capabilities so that you can select the most effective reporting solution, as opposed to a single software tool. When accessing large databases, you must define "large" in order to choose the best category of reporting tool:

  • End-user tools that allow the database to handle query performance optimization are best for selecting a small number of rows from one or two large tables based on simple logic.
  • Customizable report servers are required for selecting a moderate to large number of rows based on complex logic, so that developers can build queries that most efficiently access the database.
  • A server-based production reporting tool with programming, buffering, caching, and query optimization capabilities is required for selecting a large number of rows from several different tables based on complex logic and is the requirement for complicated output formatting.

    You should prototype your reporting tool against a production-size database so that all of the data access and performance tuning capabilities can be exercised.

    While server-based reporting engines offload processing from both end-user client machines and database servers, database tuning and an appropriate application design are necessary to take optimal advantage of the architecture. Combined with client viewers, these multi-tier systems make reporting against large data sets more practical than traditional client-based report writers.


    Peter L. Brooks and Christa Wollenweber are management consultants with Coopers & Lybrand's Technology Advisory Services. Peter is based in Boston and can be reached at 617-478-3754 or 74477.3043@compuserve.com. Christa is based in San Francisco and can be reached at 415-957-3315 or cwollenw@colybrand.com.

    Figure 1


    Actuate Reporting Systemęs architecture includes an integrated development environment, database and Internet support, and end-user and viewing components. These end-user components let users design, customize, and execute reports and queries The free Report Viewer supports viewing and printing.


    Table of Contents - August 1996 | Home Page
    Copyright © 1996 Miller Freeman, Inc. ALL RIGHTS RESERVED
    Redistribution without permission is prohibited.
    Please send questions or comments to mfrank@mfi.com
    Updated Monday, August 12, 1996