The contemporary decision-support architecture (DSA), including data warehouses and OLAP tools, should be designed to accommodate the widest range of sources and uses of information. The kind of flexibility and speed demanded by today's business environments require that these architectures embrace change as a design principle, rather than trying to iterate to a set of fixed requirements. This acknowledgment of the sheer power of contingency is the driving force behind the explosion of data warehousing. The most tangible result of these efforts are the so-called "front-end" tools that face the clients (especially nontechnical end users), but unfortunately, the ultimate effect of these much-anticipated programs often leaves customers with the feeling of anesthetized anticlimax.
So what does all of this have to do with selecting the query and/or analysis tools for a data warehouse? Everything. In an earlier attempt to provide some framework for selecting OLAP tools in my article "Data, Data Everywhere" (Information Week, October 25, 1995, pages 60-74), I suggested that five qualities neatly partitioned the evaluation space: functionality, fit, performance, scalability, and future. Although much has changed in the past two years, those categories as axes of evaluation are holding up. Architectural issues are still necessary criteria for evaluation (it's rather pointless to choose a tool that cannot connect to your database, for example), but they are not sufficient. More than anything else, the often competing issues of performance and flexibility are the critical decision points.
In this article, I attempt to segment the offerings in this area based on how people use them (a cognitive approach). For example, when evaluating the issue of scalability from a technical perspective, we can make a pretty clear case that a two-tier SQL generator will hit the wall at some point when the number of simultaneous users increases. A cognitive approach deals with scalability from the application perspective: Can I expand my range of analysis from daily sales reporting to event- or alert-driven notification, without having to rearchitect the entire system? Although both approaches are necessary, the cognitive approach is designed to meet the requirements of the business.
IT-led initiatives provide invaluable ideas and assistance and are often the catalysts for productive change in organizations. The problem is that the nature of work today is too complex for simple solutions and rules of thumb. Worthy goals of standardized access, data quality, consistent development methodologies, and scalable architecture float to the top in the projects, but the business objectives are poorly understood and often missed completely. Content to decompose these complex issues into arbitrary categories of components (typically given currency by aggressive marketing of the vendors) -- such as data warehouse/data mart, MOLAP/ROLAP, relational/multidimensional, and thin/fat -- purchase decisions are based on these technical issues instead of their use and impact in an organization. Too many products, approaches, methodologies, and service offerings in this area are nothing more than repackaged jetsam of last year's latest thing. Because they aren't designed specifically to support a networked community of knowledge workers, the vendors create new categories of one and demonstrate how their product can fill it.
Any categorization of the market is arbitrary, because the tools are evolving rapidly. For example, Express (Oracle Corp.), Holos (Holistic Systems Inc., a subsidiary of Seagate Software-Information Management Group), Essbase (Arbor Software Corp.), and Pilot (Pilot Software Inc., a Cognizant Corp. company) have moved aggressively to position their multidimensional database (MDD) products as true hybrids, combining the features of a MDD with those of a relational OLAP (ROLAP) tool such as those of MicroStrategy Inc. or Information Advantage Inc. Also moving up the food chain are managed query tools such as Business Objects (Business Objects Inc.) and Esperant (Software AG of North America Inc.), that now offer some measure of desktop OLAP capabilities formerly the realm of PowerPlay (Cognos Corp.) or Brio (Brio Technology Inc.). The area is trendy, and new entrants are proliferating riotously. And, of course, everyone is waiting for the next shoe to drop, which is most likely to be a big entrance by Microsoft Corp. Microsoft acquired technology from Panorama Software Systems of Tel Aviv, Israel in the fall of 1996 and has been integrating it into what will be the company's first OLAP tool. Currently code-named Plato, the tool is expected to be released by the end of this year.
The technical perspective is concerned with issues such as data organization (relational, multidimensional, "other"), processing location (server-based, desktop, cooperative, n-tier, "fat," or "thin"), standards compliance (DCOM vs. CORBA, 3NF vs. Star Schema, Notes vs. Web, ActiveX vs. Java), and openness. This leads to distinctions such as MOLAP vs. ROLAP, "fat" vs. "thin" client, or "open" vs. "proprietary." Although these distinctions are useful for making decisions about the technical architecture, how they clarify the decision-making process for business purposes is less clear. In Table 1, I compare 10 differentiating categories. There is a yawning gap in the interpretation of apparently simple concepts such as openness, scalability, and security. Rather than inform and educate, the currently used terminology in IT just confuses businesspeople in the worst possible way -- it leaves them with the presumption of understanding where, in fact, there is none. I'll examine a few of these criteria in more depth.
What the "consumers" of these databases need is reliable, timely, clean, and understandable information that is presented through a medium that facilitates and enhances the consumers' ability to perform their jobs. Specifically, they require performance in loading, querying, and updating. The last aspect is especially worthy of note. Most data warehouse designs are serial and unidirectional, overlooking completely the value of data generated at the desktop, such as scenarios, what-if analysis, commentary, discussion, and even frameworks, models, and calculations. The notion that only operational systems capture data is a limiting factor for data warehouses, and most of the tools offered in the analytical space extend this notion. The only exceptions are the multidimensional databases, in varying degrees of strength.
Because of their positional arrangement of data, with relatively coarse indexing, it is rather easy to add or update data to most multidimensional databases, but it's correspondingly difficult to change the structure of a model. Table 2 (page 50) demonstrates that a "hypercube" design such as Essbase can accommodate desktop data input effortlessly; changing the arrangement of variables or structure can only be accomplished by an administrator. Gentia, with a more EIS-type of approach, requires much more effort to create the tableau for data entry, but its ability to construct new models on the fly gives it a real advantage. Holos combines the aforementioned elements of Essbase (namely its ability to capture values that fit into a preexisting structure) and Gentia (the ability to create fairly complex models on the fly) and can accommodate both types of updates nicely. In fact, its ability to separate "rules" from the rest of a structure makes it quite powerful as a generator and collector of on-the-fly scenarios. Express, based on a multicube model, is extremely flexible at viewing data but quite cumbersome at updating or calculating on the fly without extensive development in either the Express 4GL or Express Objects.
Relational databases, with few exceptions, can be modeled to accept input data effortlessly (the OLTP schema for transaction processing) or to provide good query performance (the "star" schema that is common in data warehousing), but not both. The approaches are mutually exclusive at this point, or else they require extensive behind-the-scenes work to perform. Because of their general-purpose approach, they provide no native support at all for modeling or what-if analysis. There has been some movement in this area, especially from Informix Software Inc.'s DataBlades and from Oracle, with impending integration of the Oracle database server with the Express OLAP server.
The sublime irony is that the evaluator is often a serial offender who most likely draws insight from the same well that brought you the DP backlog, the not-invented-here syndrome, and the Year 2000 problem. How something so central to the well-being of a firm as DSS can remain such a mystery to IT practitioners is beyond my understanding.
Performance should be defined in terms of output, not throughput. To businesspeople, efficiency of the solution has nothing to do with machine throughput, storage density, cache hits, or clever use of threads. Instead, the desired effect is to improve turnaround time, create a more fact-based domain for decision making, and turn personal discovery into organizational learning. Slice-and-dice and drill-down tools are only a piece of the puzzle. The different perspectives provide a good example of "doing things right or doing the right thing." Unless the tools can contribute to a meaningful change in the nature of the work, they accomplish little in the long run.
One of the current industry buzzwords is "closed-loop decision support," which is meant to describe the interoperability of analytic processes with operational ones -- for example, the ability to explore a customer database to uncover desirable (or undesirable) attributes and to immediately dispatch an action, such as creation of a mailing list. In our practice, we consider all decision-support systems incomplete unless they can "close the loop." Analysis of sales results is only useful to a point. The engine of value creation in DSS is the linking of the analysis to discussion, consensus, action, and results in a systematic way. Only those tools that can cooperate with workflow, the new "push" technologies, and alternative access methods and operational systems are capable. Single-platform technologies; closed, proprietary front ends; arcane 4GL languages; weak platform support; and obsolete architectures are the enemy. Pretty screens, nice charts, simple navigation, and adherence to "standards" can easily divert your attention.
In the thin OLAP model, with each request for data -- whether an original template or a drill, pivot, or slice operation -- a new request is dispatched to the database for processing, and the resulting data for display is returned to the client. This avoids building the "cube" on the client completely, requiring instead that the database perform the joins, aggregations, and calculations, returning only the final result for display. DSS/Agent (in two-tier mode) is a good example of this arrangement. Obviously, the "fat" OLAP model can look very much like this when the query is simple, but the difference between the approaches becomes more marked as the magnitude of the sampled data increases.
If viewed from the technical perspective, fat OLAP may look like a better bet with a growing community of clients, relieving the database server from escalating loads of complex queries. In fact, the criticism most often applied to the thin OLAP model is that it can crush the database over time. But this analysis is too limited. The fat OLAP model is very useful for certain kinds of analysis, but it falls apart completely for some very common problems. Unfortunately, the distinction between these types of queries is subtle, and the lack of understanding stems from taking a purely technical perspective.
In practice, many seemingly innocent queries cannot be run in the fat OLAP model. Regardless of the size of the final result set, any analysis that needs to "touch" a great deal of data to reach a conclusion will create scalability problems, because all of the data must be transferred to the client first. An example of this is queries that compare variable time periods, where selection is based on the existence of a record, such as, "Show the growth, by product group, by territory, during the last two Christmas selling seasons, for only stores that carried the products in both periods."
Figure 2 depicts the more robust three-tier OLAP models. The most common implementation is the fat/thin model, employed for the most part by hybrid OLAP tools (HOLAP) such as Holos, Express, and Pilot. Hybrid OLAP creates multidimensional models on the fly from data stored in relational databases, combining aspects of ROLAP and MOLAP. Other MOLAP tools, such as Essbase, can extract data from relational databases as source data, but all reading and writing from the Essbase "cube" must be from a persistent Essbase database -- thus I can't include it here in the hybrid mode. However, if we used a middle-tier server to build the cube from an Essbase database, the combination would qualify as a "fat/thin" model.
The client in this arrangement sends thin requests to the server, which dispatches "thin" SQL (namely a simple select statement) to the database, returning large results of usually raw data. The fat OLAP server performs most of the calculation and filtering work, and requests for information from the clients are dispatched by the OLAP server to the database server only as needed.
In the thin/thin model, the thin server creates complex requests that drive the database server in precisely the same manner as the two-tier thin OLAP model. Although caches of result sets may be stored on the thin OLAP server, most data requests flow back to the database engine. ROLAP tools use this model for the most part, although Information Advantage's tools can be tuned somewhat to act as either a thin or fat server; the same is true for Platinum Technology Inc.'s InfoBeacon product. Only MicroStrategy's DSS/Agent and DSS/Server suite adhere strictly to the thin/thin model. A new entry on the scene, InfoSpace Inc., appears to work in the same way, providing a full set of OLAP functions through a 100-percent Java implementation on both server and applet. (For more information on InfoSpace's SpaceOLAP product, see this month's review on page 27.)
The issues in thin/thin vs. fat/thin are somewhat different from those in two-tier mode. The fat/thin model allows for much more complex modeling, rather than relying on the limited capabilities of SQL, although MicroStrategy has some ingenious ways to circumvent this problem. In general, however, fat OLAP servers add capabilities that can't be easily duplicated by thin/thin models without further integration at either the server or client level. But the same scalability problems apply in three-tier systems that hold in two-tier: If your application requires the manipulation of much source data, fat/thin architectures have built-in overhead that is absent in the thin/thin models. Again, make sure that you understand the full range of analyses that are required. Underestimating this is a major cause of failure.
The second factor affecting the query and analysis tools market is the rising popularity of data mart suites. These bundled packages of software, hardware, and services will take a major part of the market. This will tend to put pressure on many vendors to join a coalition, but the lower margins will not be able to support their direct sales forces, and many vendors will disappear or be merged into larger organizations. Pay particular attention to the midterm viability of any vendor you are evaluating.
The newest, most promising technology is the "push" broadcasting started by PointCast Inc. Some industry analysts estimate that push technology will account for $6 billion per year by 2000, or almost one-third of the entire Web market. Decision-support applications are particularly well suited for this approach, with the ability to integrate OLAP applets, discussion groups, customizable agents, alert reporting, and, finally, a way to integrate operational and analytical workflows in a single network view. Pay close attention to vendors that are making progress in integrating with these technologies.
Organizations that foot the bill for all of this have not been served well. As happened to other good ideas before it, data warehousing is in danger of losing relevance unless it can produce sustainable results. Decision support requires a focused view on businesspeople, not just processes. IT is generally more skilled at specifying and developing systems that perform predictably under a finite number of conditions. When this approach is used in decision support, the results are always suboptimal. These highly structured solutions, though derived of good intentions, leave end users, in the etymologically radical sense of the word, a-stone-ished rather than empowered.
Taking a cognitive perspective will pay dividends by helping you make the right choices in query and analysis tools. The even more valuable byproduct will be your improved insight into how your organization really works.
| TABLE 1. Differentiating Categories for Query Tools | ||
|---|---|---|
| Technical Perspective | Cognitive Perspective | |
| Database Organization | Relational vs. "proprietary" | Reliable, understandable, available |
| Resource Efficiency | Impact on hardware/network resources | Impact on time and ability to complete work |
| ROI | Reducing the workforce | Reducing the workload |
| Design Criteria | Certified methodology | When can I have it? |
| Openness | Connection to other "approved" tools | Connection to the business |
| Features | Parallel load, multithreading, cache coherency | Ability to print a report, work remotely, export a spreadsheet |
| Conformance | Adherence to "standards" | Flexible enough to keep up with the business |
| Scalability | Able to scale in volume, concurrency, and complexity | Breadth, extensibility, and power of the tools |
| Security | Sealing from prying eyes | Work won't get lost or rendered useless by changes |
| Effect on Work | Empowering "users" to write their own reports | Connecting data to the rest of the work |
| TABLE 2. Interactive Input Characteristics of Databases | |||
|---|---|---|---|
| Input "Facts" | Input "Structure" | On-the-Fly Scenario | |
| (Arbor) Essbase | Excellent | Poor | Poor |
| Gentia | Fair | Good | Good |
| (Oracle) Express | Fair | Poor | Fair |
| (Seagate) Holos | Excellent | Good | Excellent |
| Relational - "Star" | Poor | not supported | not supported |
| Relational - OLTP | Excellent | not supported | not supported |

