In its press releases, IBM describes business intelligence as the gathering, management, and analysis of data for the purpose of turning it into useful information distributed throughout an enterprise and used to improve strategic decision making.
When we look at IBM's business intelligence initiative in more detail, we see that it consists of a wide portfolio of elements integrated from many of IBM's diverse divisions to offer, in the company's words, a truly complete, end-to-end solution to meet complex business challenges. As a result, the initiative spans software, services, partnerships, and hardware. On the application side, it includes solutions and applications that address specific industries or related industries, such as banking or insurance, and data analysis products to mine and interpret structured, textual, and Web data. IBM's business intelligence specialists provide consulting, services, and sales covering integration, implementation, and support for a customized business intelligence solution. Specialized partnership, marketing, and development agreements with ETI, Cognos, Vality, Arbor, Per-Se, and Business Objects supply tools and applications to access, extract, cleanse, and analyze data. Underlying all of this are data management software for the deployment and management of small to very large data warehouses; advanced mathematical algorithms developed by IBM Research for data analysis; database products, such as DB2 Universal Database (UDB), for creating, accessing, and managing traditional and multimedia data types; and hardware platforms, such as S/390, RS/6000, AS/400, and Netfinity, that offer scalable performance for the computation needs of advanced business intelligence.
Of this wide range of elements, I will focus on the server components - the data mining, data analysis, and data warehousing products - in the context of the wider business intelligence initiative and related to IBM's DB2 UDB database management systems.
IBM's business intelligence initiative includes a number of products in the data mining field, including the well-known Intelligent Miner family and more obscure products such as Fast Lookup Algorithm for Structural Homology (FLASH) and Teiresias. The Intelligent Miner family of products is aimed at extracting previously unknown, comprehensible information from any data source. It consists of the Intelligent Miner for Text and version 2 of the Intelligent Miner for Data. IBM also markets a set of applications, called the IBM Discovery Series, that sits on top of the IBM Intelligent Miner for Data, solving specific business problems through data mining.
The Intelligent Miner for Text has three major components: a search engine called TextMiner, Web access tools including a Web search engine called NetQuestion and a Web crawler, and text analysis tools. Figure 1 shows Intelligent Miner for Text's architecture. You can use this combination of tools to analyze documents such as word processing documents, online news articles, and email messages to group and prioritize information contained in the text data. The tools can identify the language in which a document is written, and they can extract names, multiword terms, abbreviations, and other vocabulary such as dates, figures, and amounts. They also extract patterns, organize documents by subject, find predominant themes, and search for relevant documents.
Many of these tools are information extractors that enrich documents with information concerning their contents because the first step in text mining is to extract key features from texts to act as "handles" in further processing. The information retrieval component uses hash indexes built offline to perform Boolean or relevance ranking queries to select text documents. The TextMiner search engine provides full-text search and indexing of documents written in 16 languages, including double-byte languages such as Japanese, Chinese, and Korean, stored in many different file formats, using natural language, free-text, Boolean, fuzzy, phonetic and hybrid search conditions. The patented hybrid queries, for example, combine free-text and Boolean queries to overcome the problems of pure free-text queries. A hybrid query is a free-text query that restricts the result set to the documents that also match the Boolean part of the query. This allows for negative specifications in free-text queries that are not supported by pure free-text systems. The NetQuestion Web search engine, on the other hand, although it uses the same techniques as TextMiner, is streamlined for the types of information typically found on Web pages. You can use it for Boolean queries and phrase and proximity searches, as well as for front, middle, and end masking using wild cards.
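To make the hybrid query idea concrete, here is a minimal Python sketch. It is not TextMiner's patented implementation: the function names, the term-count scoring, and the sample documents are all assumptions made for illustration. It shows only the principle described above, a free-text ranking whose result set is restricted by a Boolean part, which is what makes a negative specification possible.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def hybrid_query(docs, free_text, must_have=(), must_not_have=()):
    """Rank documents by free-text term overlap, keeping only those that
    also satisfy the Boolean part (hypothetical helper, not an IBM API)."""
    query_terms = tokenize(free_text)
    ranked = []
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        # Boolean part: every required term present, no excluded term present.
        if any(counts[t.lower()] == 0 for t in must_have):
            continue
        if any(counts[t.lower()] > 0 for t in must_not_have):
            continue
        # Free-text part: a crude relevance score based on term frequency.
        score = sum(counts[t] for t in query_terms)
        if score > 0:
            ranked.append((score, doc_id))
    # Highest score first; ties broken by document id for stable output.
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in ranked]

# Invented sample documents.
DOCS = {
    "d1": "Jaguar unveils a new car engine for its sports car line",
    "d2": "The jaguar is a large cat native to the Americas",
    "d3": "Car sales rose as engine prices fell",
}

pure_free_text = hybrid_query(DOCS, "jaguar engine")
without_cats = hybrid_query(DOCS, "jaguar engine", must_not_have=["cat"])
```

In this sketch the pure free-text query returns all three documents, while the hybrid form drops the document about the animal because the Boolean part excludes the term "cat".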
Intelligent Miner for Text provides extensive functionality. Although computers do not easily process unstructured text, it is becoming the predominant datatype stored online. This is evident from the ever-growing number of news groups and email messages, not to mention Web pages and word processing documents. There is an amazing amount of useful information contained in such unstructured data.
The Intelligent Miner for Data searches for hidden information, associations, or patterns. It clusters data records based on similar values, using a voting technique called Condorcet. It segments data using neural clustering - a technique that employs a type of neural network called the Kohonen feature map, which clusters together similar data records and defines the typical attributes of an item that falls in a given cluster or segment. It discovers associations, sequential patterns, and similar time sequences and creates predictive or classification models of the data. It performs deviation detection by relying heavily on statistical analysis and visualization. The visualization techniques are useful for detecting deviations that hold for a rather small subset of the data, while the statistics measure their significance.
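As a rough illustration of neural clustering, the following Python sketch trains a tiny one-dimensional Kohonen feature map. It is a textbook-style toy, not Intelligent Miner's implementation: the function names, the learning-rate and neighborhood schedules, and the sample records are all assumptions. Each unit's weight vector ends up describing the "typical attributes" of the records assigned to it.

```python
import math

def nearest_unit(units, record):
    """Return the index of the unit whose weights are closest to the record."""
    distances = [sum((w - x) ** 2 for w, x in zip(unit, record)) for unit in units]
    return distances.index(min(distances))

def train_feature_map(records, n_units=2, epochs=20, lr0=0.5, sigma0=1.0):
    """Train a 1-D Kohonen map (illustrative defaults, chosen arbitrarily)."""
    last = len(records) - 1
    # Deterministic start: seed the units with records spread across the input.
    units = [list(records[round(i * last / max(n_units - 1, 1))])
             for i in range(n_units)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay                 # learning rate shrinks over time
        sigma = sigma0 * decay + 0.01    # neighborhood radius shrinks too
        for record in records:
            winner = nearest_unit(units, record)
            for i, unit in enumerate(units):
                # Units near the winner on the map are pulled along with it.
                influence = math.exp(-((i - winner) ** 2) / (2 * sigma ** 2))
                for d, x in enumerate(record):
                    unit[d] += lr * influence * (x - unit[d])
    return units

# Invented records with two numeric attributes, forming two loose groups.
RECORDS = [
    (0.0, 0.1), (0.1, 0.0), (0.05, 0.05),
    (1.0, 0.9), (0.9, 1.0), (0.95, 0.95),
]

units = train_feature_map(RECORDS)
segments = [nearest_unit(units, r) for r in RECORDS]
```

After training, `segments` assigns each record to a unit, and each entry of `units` can be read as the profile of a typical member of that segment.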
You can use the Intelligent Miner for Data to analyze data stored in traditional files, relational databases, data warehouses, and data marts. Version 2.1 has a new Java-based graphical user interface with hover help, pop-up context menus, SmartGuides, optional hiding of advanced features, user preference settings, a progress indicator, graphical representation of the mining base and mining objects, and a graphical construction mechanism for composite objects. It uses new and enhanced statistical functions, algorithms, and optimized mining techniques, such as factor analysis, linear regression, principal component analysis, univariate curve fitting, univariate statistics, bivariate statistics, and logistic regression. It contains a new neural net implementation of the value prediction method, and its mining techniques have been optimized to handle outliers, missing values, and lift.
Version 2.1 runs on more platforms, exploits the DB2 UDB functionality, and is more scalable than the previous release. It can mine data in flat files or other databases accessible through DataJoiner, such as Sybase or Oracle. You can use its high-speed extract facility to import data into DB2 UDB from Oracle, Sybase, or DB2 for OS/390. On DB2 UDB, it uses parallelized versions of the mining algorithms for large-scale mining runs. The Intelligent Miner for Data supports English, French, Hungarian, Italian, Japanese, Korean, Portuguese, Russian, Spanish, and traditional Chinese data. In addition to providing a published API as a client interface, it also provides a server API.
FLASH is an advanced pattern-matching algorithm designed to identify similar, but not identical, data. It is useful in business applications such as insurance and finance, DNA and genetic matching research, biometrics, rational drug design and other data-intensive work.
Teiresias is a pattern-discovery algorithm used in a joint research project by IBM and Monsanto to reduce the research and development cycle for life sciences products. You can apply it to any database to discover previously unknown patterns quickly.
Similar to text mining, data mining discovers hidden patterns in the data - patterns that, when analyzed appropriately, can contribute to the organization's prosperity. Although you may think that structured data, as stored in a database, may have all the relationships, meanings, and constraints identified, you can gain a lot of knowledge by analyzing that same data with data mining tools. Patterns hidden within the data or within relationships among the data can, for example, guide decision makers to different marketing models.
The DB2 OLAP Server integrates the OLAP engine and APIs of Arbor Software's Essbase analytical processing engine with IBM's DB2 UDB. All the design, manipulation, calculation, and analysis functions of the Essbase server are available in the DB2 OLAP Server. The DB2 OLAP Server stores and manages the data in a DB2 UDB database using a star schema structure. You can populate the star schema with data processed by the Essbase calculation engine to improve query performance. You can access the data from clients supporting the Essbase client API or from any other clients using standard SQL. The API supports Visual Basic, C, C++, and other application development environments and works with Windows 95 or NT, Unix, and OS/2.
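For readers unfamiliar with star schemas, the sketch below shows the general shape of one and the kind of standard SQL a non-Essbase client could issue against it. It uses Python's built-in sqlite3 module as a stand-in for DB2 UDB, and every table name, column name, and data value is hypothetical; the actual schema that the DB2 OLAP Server generates is not described here.

```python
import sqlite3

# In-memory SQLite database standing in for a DB2 UDB database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables describe the axes of analysis.
    CREATE TABLE product_dim (
        product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE time_dim (
        time_id INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);

    -- The fact table holds the measures, keyed by the dimensions.
    CREATE TABLE sales_fact (
        product_id INTEGER REFERENCES product_dim(product_id),
        time_id    INTEGER REFERENCES time_dim(time_id),
        amount     REAL);

    INSERT INTO product_dim VALUES
        (1, 'Laptop', 'Hardware'),
        (2, 'Printer', 'Hardware'),
        (3, 'Database', 'Software');
    INSERT INTO time_dim VALUES (1, 'Q1', 1998), (2, 'Q2', 1998);
    INSERT INTO sales_fact VALUES
        (1, 1, 100), (2, 1, 50), (3, 1, 200), (1, 2, 120), (3, 2, 80);
""")

def sales_by_category_and_quarter(connection):
    """Join the fact table to its dimensions and summarize the measure."""
    return connection.execute("""
        SELECT p.category, t.quarter, SUM(f.amount)
        FROM sales_fact AS f
        JOIN product_dim AS p ON p.product_id = f.product_id
        JOIN time_dim AS t ON t.time_id = f.time_id
        GROUP BY p.category, t.quarter
        ORDER BY p.category, t.quarter
    """).fetchall()

rows = sales_by_category_and_quarter(conn)
```

The point of the layout is that any SQL-capable tool can answer a multidimensional question with a join from the central fact table out to the dimension tables, followed by a GROUP BY.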
There are also a number of tools you can use with the DB2 OLAP Server to extend its functionality further. The Web Gateway provides the standard OLAP functions from standard Web browsers. The Extended Spreadsheet Toolkit includes more than 20 macros and Visual Basic functions to integrate custom Lotus 1-2-3 or Microsoft Excel applications with the DB2 OLAP Server. SQL Drill-Through provides links between summary data in the DB2 OLAP Server and detailed data in relational databases. The Partitioning Option makes it possible to design and manage multidimensional databases (cubes or star schemas) that span OLAP applications or servers. The Adjustment Module integrates secure, auditable controls for adjustments into a comprehensive reporting, analysis, and planning environment. Currency Conversion converts financial data using different currency exchange rates. The SQL Interface provides direct access to more than 20 PC and SQL relational databases, including Oracle, Sybase, Informix, Microsoft SQL Server, and other middleware packages. Objects is a set of open, ActiveX, OLAP-aware objects through which users can develop OLAP applications on Windows 95 and NT clients with minimal programming.
Visual Warehouse is IBM's data warehouse and data mart solution targeted at the entire range of data warehouses - from departmental data marts up to the enterprise data warehouse. Visual Warehouse is not a single system, but a family of integrated tools. It consists of three components that cover the various steps in building, managing, and analyzing data warehouses and data marts: Visual Warehouse Desktop, Operations, and Agents. You use Visual Warehouse Desktop to define metadata, such as sources, targets, and mapping transformations. Operators use Visual Warehouse Operations to manage and monitor the various operational procedures related to data warehouse processing. The Visual Warehouse Agents perform the actual tasks specified by the Desktop and Operations components. You can also extend Visual Warehouse with specialized third-party tools for some of these steps.
Visual Warehouse Desktop provides the facilities to define relationships and mappings between online data and the data warehouse. It can map data from DB2, DB2 UDB, Oracle, Informix, SQL Server, CICS/VSAM, or IMS. Visual Warehouse can interoperate with other systems through metadata interchange facilities based on IBM's metadata tag language or the Metadata Coalition's Metadata Interchange Specification, an industry standard that simplifies interoperability among CASE tools; repositories; analysis tools; and extract, transformation, movement, and loading tools.
Visual Warehouse provides various tools to consolidate, cleanse, restructure, correlate, standardize, and summarize data from multiple source systems through so-called Business Views. The Business Views control how the data is transformed from the source databases into meaningful business information and automatically extracted, transferred, transformed, and refreshed in the data warehouse. The Business Views are graphically specified aggregations, summaries, and derivations of the source data, where multiple data sources are combined in a single view for decision makers. Visual Warehouse can maintain multiple copies of these data sets. You can specify the number of copies it must keep and when older copies must be deleted.
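The combination of a summarizing view and a fixed number of retained copies can be sketched in a few lines of Python. This is only an analogy for how a Business View behaves, not Visual Warehouse's interface: the class name, method names, and sample rows are invented, and the only summary implemented is a sum per region.

```python
from collections import defaultdict, deque

class BusinessViewSketch:
    """Hypothetical stand-in for a Business View that keeps N copies."""

    def __init__(self, max_copies):
        # A bounded deque discards the oldest copy once the limit is reached.
        self.copies = deque(maxlen=max_copies)

    def refresh(self, label, source_rows):
        """Summarize (region, amount) rows and store the result as a new copy."""
        totals = defaultdict(float)
        for region, amount in source_rows:
            totals[region] += amount
        self.copies.append((label, dict(totals)))

    def labels(self):
        """Labels of the copies currently retained, oldest first."""
        return [label for label, _ in self.copies]

    def latest(self):
        """The most recently built copy."""
        return self.copies[-1][1]

# Invented monthly extracts from two source systems, already combined.
view = BusinessViewSketch(max_copies=2)
view.refresh("jan", [("east", 10.0), ("west", 5.0), ("east", 2.5)])
view.refresh("feb", [("east", 8.0), ("west", 7.0)])
view.refresh("mar", [("east", 9.0), ("west", 1.0), ("west", 3.0)])
```

With `max_copies=2`, the third refresh pushes out the January copy, leaving February and March available to decision makers.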
You can also extend Visual Warehouse with various third-party tools. The ETI-Extract tool suite from Evolutionary Technologies International is a loader generator that generates extraction and transformation programs from a visual specification from virtually any data source to any data target in any programming language. The Vality Integrity Data Re-engineering tool from Vality Technology Inc. provides facilities to uncover hidden, undocumented values from legacy systems and correlate information across independent systems to deliver high-quality input data for the data warehouse. Integrity is reputed to be one of the best name and address "scrubbing" tools on the market.
IBM also markets its entire Data Replication Solution as full and differential data warehouse loading tools. This includes DataPropagator Relational Version 5, DataPropagator NonRelational, DataRefresher, DataJoiner, and Infospeed. DataPropagator Relational Version 5, for example, can incrementally capture changed source data for propagation to one or more data warehouses. DataJoiner, in conjunction with Visual Warehouse, provides a single SQL interface to access and join data from a wide variety of non-IBM source databases, such as Oracle, Sybase, and Informix. Infospeed replicates data from S/390s to a wide variety of Unix and NT servers.
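The difference between a full and a differential load is easiest to see in code. The Python sketch below derives a change list from two snapshots of a source table and applies only those changes to the warehouse copy. It is a simplification made under stated assumptions: a real capture tool would read changes as they happen rather than diffing snapshots, and the function names and sample data are hypothetical.

```python
def capture_changes(before, after):
    """Compare two snapshots (dicts keyed by primary key) and list the changes."""
    changes = []
    for key in sorted(after):
        if key not in before:
            changes.append(("insert", key, after[key]))
        elif before[key] != after[key]:
            changes.append(("update", key, after[key]))
    for key in sorted(before):
        if key not in after:
            changes.append(("delete", key, None))
    return changes

def apply_changes(target, changes):
    """Apply a change list to a target table in place and return it."""
    for operation, key, row in changes:
        if operation in ("insert", "update"):
            target[key] = row
        elif operation == "delete":
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {operation}")
    return target

# Invented source table at two points in time.
source_before = {1: "Smith", 2: "Jones", 3: "Brown"}
source_after = {1: "Smith", 2: "Jones-Lee", 4: "Garcia"}

# The warehouse starts as a full copy, then receives only the differences.
warehouse = dict(source_before)
change_list = capture_changes(source_before, source_after)
apply_changes(warehouse, change_list)
```

Here only three changes travel to the warehouse instead of the whole table, which is the appeal of incremental propagation when the source is large and the change rate is low.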
In addition, Visual Warehouse provides SQL, incremental updates, and bulk and parallel loading facilities to populate data warehouses.
You can store the data warehouse data in any DB2 database. Visual Warehouse comes bundled with DB2 UDB, but it can also use DB2 for OS/400 and DB2 for MVS. DB2 UDB is IBM's preferred platform because it is easy to manage through the DB2 Control Center, it is scalable from a single processor to SMP to MPP environments and can thus support hundreds of users and gigabytes to terabytes of data, and it has the facilities to process complex data warehouse queries in parallel. DB2 UDB's query rewrite facility is especially useful for optimizing the poorly structured queries often generated by the popular query and reporting tools. Visual Warehouse can also populate data warehouses implemented using other DBMSs through DataJoiner, IBM's multivendor database middleware that provides access to Oracle, Sybase, and Informix. Through IBM's Cross Platform Attachment, you can store the data from a Unix or Windows NT data warehouse on S/390 storage facilities to reuse its excess storage capacity and take advantage of the S/390 storage management facilities.
The Visual Warehouse Operations component includes scheduling and monitoring tools for periodic building and refreshing of the data warehouse, so DBAs can concentrate on managing exceptions rather than worrying about day-to-day operations. It collects status information and statistics about the build processes to enable analysis and tuning, so that these maintenance tasks run as efficiently as possible. Operators can customize their views to focus on their most important tasks through the new Work-In-Process console.
Visual Warehouse encompasses various tools for accessing and analyzing the data in the data warehouse. It supports a range of data access options including ODBC, JDBC, native DB2 clients, industry-standard SQL, and data analysis through Arbor's Essbase OLAP API. You can order it with data analysis tools from Business Objects or Cognos as part of the package. It can also provide wide-range information access from the Web or an intranet through all the familiar Web browsers.
The latest release, Visual Warehouse 3.1, runs on AIX, OS/2, and Windows NT. The addition of AIX and OS/2 results in improved performance for data warehouses on those platforms, because data does not need to flow through a Windows NT agent.
It should be obvious from this column that IBM's business intelligence initiative is not a single product, but rather the integration of a wide range of products - some new, some established - from various subject areas, in an attempt to address all the needs of the business users. In other words, the goal is to transform any variety of data source into a single integrated information source that you can use for more intelligent business decisions.
What's encouraging to me is that it is not merely a conglomerate of quasirelated proprietary products, but rather an open solution in which you can use the most applicable or best-of-breed products. For example, replication is not always a viable data warehouse population technique because of the complex transformations you sometimes have to perform. Similarly, an organization doesn't necessarily want to store its warehouse data in DB2 UDB or access it through tools supporting the Essbase API. With this solution, you can use specific best-of-breed products. For example, Essbase, BusinessObjects, and Cognos's Impromptu and PowerPlay are, in my opinion, the most useful and most widely used data analysis tools. Similarly, ETI's ETI-Extract is a very flexible, extendible data warehouse loader generator. It is one of the few tools in which you can include customized transformations written in C++, C, or Visual Basic to perform more than the traditional aggregation and summarization transformations.
Using IBM's business intelligence initiative, an organization can now implement its data warehouse or data mart solution to improve its decision-making process using an intelligent mix and match of the appropriate products and system components without being forced down a particular avenue that it may not want to follow. The solution is closed enough to ensure proper integration of the data sources into the data warehouse, but open enough to let users pick the appropriate mix and match of tools to satisfy their information requirements.

Figure 1. Intelligent Miner for Text.