DBMS, August 1997
DBMS Online: Server Side By Martin Rennhackkamp

Sybase Warehousing

An Interactive Warehousing Architecture Can Extend the Scale of Your Information.

In keeping with this month's theme of data warehousing, I would like to take a look at Sybase Inc.'s (Emeryville, Calif.) warehousing solution. In the company's data warehousing white paper ("Sybase Interactive Warehouse," www.sybase.com/products/dataware/iwwpaper.html), Sybase defines data warehousing as the process of integrating information from a variety of sources, at regular intervals, to construct an integrated view of the business, combined with query-optimized technologies for interactive analysis.

Sybase has a whole range of products as part of its data warehouse offering. They include:

The components of this architecture are integrated to support Sybase's so-called "interactive warehouse" strategy. The architecture consists of processes, products, and data from Sybase and other databases. These are integrated into an end-to-end framework to collect data from numerous sources (legacy and mainframe databases, applications, and external data sources). They also apply transformations and conversion logic to present the data in useful business terms. The architecture includes tools to automate and manage the distribution of the data to the data warehouses and data marts, and tools to store the data in preparation for end users accessing the data from any of the standard query and reporting tools and applications.

Sybase IQ

From a data warehousing point of view, Sybase IQ (Interactive Query Accelerator) is the central component. Sybase IQ is a DBMS server that is designed specifically for decision-support databases and complex data analysis queries. The latest release, Sybase IQ 11.1, is now available on Sun Solaris 2.5.1, IBM AIX 4.1, HP HPUX 10.2, and Digital Unix 4.0a. Sybase IQ 11.2, which contains additional performance improvements, will be released by the time you read this.

Sybase IQ was designed for interactive data warehousing. Architected around Sybase's patented Bit-Wise query-processing technology, it uses advanced performance-optimized algorithms for complex ad hoc queries. Sybase claims that this technology offers five to 500 times faster ad hoc query performance than traditional relational database management systems (RDBMSs), even with changing and ad hoc user queries. Sybase also claims that Sybase IQ requires much less resource-intensive tuning to obtain excellent performance than a traditional RDBMS requires.

Sybase IQ fits in the Sybase Open Architecture. Sybase IQ supports a collection of heterogeneous data sources (including accessing data directly from Sybase SQL Server or, through a gateway, from any of more than 20 supported data sources such as Oracle, DB2, Informix, and flat files). It can be accessed from a wide range of popular, off-the-shelf query tools (such as Business Objects and Cognos Impromptu). Sybase IQ also supports a standard relational framework. It processes standard SQL commands and supports Sybase's Open Client interfaces, DB-Library, Client-Library, and ODBC client calls. Its integrated catalog can serve both Sybase SQL Server and Sybase IQ, allowing DBAs to manage security and other administrative services consistently throughout the Sybase environment. It uses the standard SQL Server data definition language, with extensions for the Sybase IQ access methods.

This architecture can even be used to allow transactions and queries designed for execution by an operational data store to be passed through and executed transparently by an attached database. This lets you build high-performance environments that support both interactive analysis and realtime updates.

Features

Sybase IQ incorporates a number of innovative features, including advanced indexing and access methods, preoptimized and ad hoc join strategies, data reduction, and partitioning techniques.

Bit-Wise Indexing

Sybase IQ stores data very differently from conventional RDBMSs. A copy of the data is stored in a column-oriented format (instead of a row-oriented format, as in a conventional RDBMS). The data values are indexed in arrays of bits for each column, called Bit-Wise indexes. The index either contains the distinct values if there are a few (up to 1,000), or it contains slices of values for larger numbers of distinct values. In both cases, these structures can be manipulated through simple binary operations. These Bit-Wise indexes can be applied to all types of data, for data elements with large numbers of repeating values, as well as for data elements with large numbers of distinct values. They can aggregate and group data quickly; they also count, identify, and present detailed level records back to the user.

Sybase IQ's Bit-Wise indexing resolves queries by efficiently combining and manipulating only the relevant data, without scanning large amounts of raw data; often it does not access any raw data at all. Because Sybase IQ's Bit-Wise indexing accesses only the minimum amount of data needed to resolve each query, Sybase IQ can rapidly process almost any query, whether preplanned or ad hoc. Accessing the data stored in a column-oriented format eliminates accessing and processing irrelevant data. Because most decision-support system (DSS) queries access a relatively small number of columns across a large number of rows, this technique eliminates a great deal of unnecessary I/O. In a worst-case scenario, Sybase IQ would scan an entire column rather than do a full table scan.

The Bit-Wise indexing technique reduces the index data structures to as little as 20 percent of the size required for the equivalent B-tree index used by a traditional RDBMS. Its data reduction and compression techniques enable the entire database to be fully indexed on each column and made available for ad hoc query in less space than it normally takes to store the raw data alone. It also does not require the ongoing maintenance tasks, as do most other relational indexing techniques.

The Bit-Wise indexing is flexible. It lets you add new data values to the data structures without needing to rebuild or reorganize these structures. This ensures fast access to current information. With the Bit-Wise indexing technology, Sybase IQ can resolve queries without tuning. The Bit-Wise technology can handle ad hoc queries with little optimization. Sybase IQ resolves queries using only information that is directly relevant to each query. Administrators need not define all of the specific queries and operations in advance to provide high performance. The access methods are data-dependent, not query-dependent, so data warehouse users can pose an unlimited number of complex ad hoc queries without needing query-specific tuning by the DBA.

Compression

Because columns of data are easier to compress than rows (rows contain mixed datatypes and columns of varying widths), Sybase IQ can store raw data on disk in as little as 40 to 50 percent of the size of the original raw data. This reduces the I/O required to read that same data back from disk at query time.

Large Block I/O

Using the Bit-Wise indexes, Sybase IQ performs I/O (prior to compression) in 512 bytes to 64KB blocks. With this approach, your system can read as much information as possible in a single I/O. This means that information from as many as half a million records can be read in a single I/O. With intelligent column access, this can result in an I/O reduction of up to 98 percent. This reduced I/O eliminates the major bottleneck found in most query-processing decision-support systems.

Database Access Methods

Sybase IQ builds compact, efficient secondary structures on each column as the data is being loaded. These structures further speed access to, and filtering of, information in the column. These structures include variations on low-cardinality bitmaps, high-cardinality B-trees, and the Bit-Wise indexes described previously. These data structures are optimized for the data characteristics and query operations that are typical of decision-support applications.

The query optimizer automatically picks the best available access methods and algorithms for executing each individual query. By intelligently combining all of the capabilities I described earlier, Sybase IQ performs efficiently and consistently across the vast array of complex queries typical of decision-support analysis, without manual intervention.

Parallelism

Sybase IQ optimizes data manipulation operations to maximize query throughput on fast processors, large memory caches, and parallel-processing and symmetric-multiprocessing architectures. It breaks low-level operations, such as sorts, bitmap manipulations, and I/O, into nonblocking operations that the operating system can schedule independently. This ensures scalability and consistent performance as data or users are added.

The nonblocking parallel-processing architecture of Sybase IQ 11.1 can take full advantage of all of the available processors in a symmetric multiprocessing (SMP) environment when performing resource-intense activities, such as data loading. The parallel multithreaded loader can read multiple source files to bypass the 2GB file-size limit still found in many operating systems. This allows huge amounts of data to be loaded in a single run.

In addition, the Sybase IQ 11.1 architecture can build all of the data structures that are required to support ad hoc queries at 2GB to 4GB per hour or more on midrange SMP machines. More important, this architecture enables multiple processors to share query tasks and permits the system to execute many simultaneous queries with little or no performance degradation.

Data Loading

Sybase IQ 11.1 will load data from a wide array of data sources, including several BCP (Bulk Copy Program) formats, fixed-column width EBCDIC, ASCII or binary data, variable-column width EBCDIC, or ASCII data using delimiters. The fixed-format binary data can be loaded from a different machine type and automatically converted to the alignment of the target database (for example, Big Endian format data from Sun, HP, IBM, or most other RISC systems can be loaded into Little Endian format databases such as Intel or DEC, or vice versa).

Prejoin Indexes

Sybase IQ lets you take advantage of the known join relationships between your tables by allowing you to define the join indexes between them in advance. These join indexes reduce the processing required to process join queries at run time. The join indexes are supplemented by a variety of join algorithms, such as the sort-merge and nested loop techniques.

Relational and Multidimensional Data

Sybase IQ can accommodate relational data models or multidimensional data models, without the need to preaggregate the detailed data. It can perform data summary functions dynamically, therefore allowing you to use standard query tools to drill down to the detailed data. In addition, you can build and maintain compact aggregate tables to support repetitive reporting and analysis requirements.

Smaller Memory Footprint

In Sybase IQ 11.1, the amount of memory required for each user has been reduced. This increases the number of concurrent users who can access the interactive data warehouse. This memory saving can be attributed to several factors:

Join Between Private and Shared Index Spaces

Sybase IQ 11.1 will support joins between private and main (shared) index spaces. This extends its capability to use private index spaces for holding temporary result sets, which can now be joined back against the main index space. This will allow Sybase IQ 11.1 to support OLAP tools, such as MicroStrategy Inc.'s (Vienna, Va.) DSS/Agent, that use this feature heavily.

The objects in the private index space are referenced using standard SQL Server db.owner.table.column(index) syntax for compatibility with a wide range of tools. Disk allocation for the private index space is preallocated and will provide excellent performance and will support private index spaces larger than the 2GB limit imposed by many operating systems.

Performance Enhancements

Sybase IQ 11.1 and 11.2 have five main areas of performance improvement over previous releases:
  1. Group by: High-performance grouping is essential for many queries found in DSSs. Sybase IQ 11.1 uses new algorithms -- based on the number of groups, number of distinct elements, and other factors -- to choose among its execution strategies. This results in improved performance on a wide range of grouping queries.
  2. IN list: OLAP tools make extensive use of the IN (...) SQL construct, frequently with a very large number of entries in the list. Sybase IQ 11.1 has taken a new approach to this problem by combining column-oriented processing with hashing techniques to improve the performance of IN lists containing thousands of values.
  3. Subquery: Subqueries or queries containing other queries often cause performance problems in some relational databases. Sybase IQ 11.1 introduced several new approaches to subquery execution that draw strongly on its column-oriented processing and its ability to use the cache to emphasize reuse of the consistent portions of each query expression.
  4. Improved costing model: Sybase IQ 11.1 has significantly improved the costing portion of the query optimizer. These changes provide the optimizer with better information on how to choose an optimal execution plan for each query. Choosing the best execution plan is critical to the performance of ad hoc join queries.
  5. New join algorithm: One of the most popular data models for the data warehouse today is called the star schema. A star schema consists of a central table (called the fact table) surrounded by a series of smaller tables (called dimension tables). In a star schema, there are millions of fact records that consist of only a few numeric columns and a relatively small number of dimension table records that provide meaning to the facts. Some relational join algorithms do a poor job of joining the relatively small dimension tables with the very large fact table, causing very slow query performance. Sybase IQ 11.1 has a new hash join algorithm that provides a significant improvement in join queries over the traditional relational join methods when used on star schema queries.

Using Sybase IQ

Sybase IQ has a single install utility that sets up and starts up Sybase IQ and the Sybase server components that must be installed. If a database already exists, you can run a template command to generate the scripts required to create the Sybase IQ indexes. (The warehouse index space and warehouse indexes can also be created directly from PowerDesigner's Warehouse Architect.) Once the index space has been created, the data can be loaded through the IQ_LOAD utility, which simultaneously indexes the data. The data can be loaded from flat files, from Sybase SQL Server databases, or from other relational databases, through SQL select statements. Server administration currently is performed through a command-line interface, but Platinum Technology Inc. (Oakbrook Terrace, Ill.) has developed a Desktop DBA module for Sybase IQ.

Once the data is loaded, users can query any aspects through most of the popular query and reporting tools, including Microsoft Access (Microsoft Inc., Redmond, Wash.), Impromptu (Cognos Inc., Ottawa, Ontario, Canada), BusinessObjects (Business Objects Inc., Cupertino, Calif.), Forest & Trees (Platinum Technology Inc., Oakbrook Terrace, Ill.), and SAS Assist (SAS Institute Inc., Cary, N.C.). Access through Sybase's CT-Lib and db-Lib is also supported, along with ODBC.

After the data has been loaded and set up, there is very little administration and tuning to be done. Because the Sybase IQ warehouse is pretuned according to the nature of the data and not according to the nature of the queries, the system does not have to be retuned as the nature of the queries evolves over time.

A New Playing Field

To play on the changing field of data warehousing, Sybase has redefined the traditional data warehouse and replaced it with the "interactive warehouse" to help users create, integrate, and communicate information throughout the organization. Sybase has exploited the name "interactive" because, the company claims, it extends the scale of data and information, the types of users, and the range of questions that can be asked. What is definitely true is that Sybase's warehousing flagship product, Sybase IQ, makes it possible to store, index, and query large volumes of data with less storage-space utilization and less processing and tuning than conventional RDBMSs.

However, no matter how a data warehouse is described, it is still a process, not an off-the-shelf product. A successful data warehouse, be it a conventional data warehouse or a Sybase IQ interactive data warehouse, requires an architecture and a plan for an end-to-end solution. This architecture is the foundation for delivering useful, reliable, accurate, and timely information to business users. However, the functionality provided by the warehouse server and the speed by which it provides information to its users can significantly improve its usefulness.


Martin Rennhackkamp is the owner and principal consultant of The Data Base Approach, a corporation specializing in relational and distributed databases, based in Cape Town, South Africa. You can email Martin at mr@dba.co.za or visit his Web site at www.dba.co.za.
* Sybase Inc., Emeryville, CA; 510-922-3500 or fax 510-658-9441; www.sybase.com.
What did you think of this article? Send a letter to the editor.


Subscribe to DBMS and Internet Systems -- It's free for qualified readers in the United States
August 1997 Table of Contents | Other Contents | Article Index | Search | Site Index | Home

DBMS and Internet Systems (http://www.dbmsmag.com)
Copyright © 1997 Miller Freeman, Inc. ALL RIGHTS RESERVED
Redistribution without permission is prohibited.
Please send questions or comments to dbms@mfi.com
Updated Thursday, July 10, 1997