
Web browsers can lower the cost of delivering information to business decision makers.
Corporations recognize that information placed in the hands of decision makers is a powerful tool. To meet decision makers' nearly insatiable appetite for information, data is being extracted from operational systems and placed in data warehouses. Data warehouses contain historical data organized by key business dimensions. For example, a data warehouse for a retailer contains daily product sales for each store. A data warehouse for a bank contains customer information for each bank service. Each warehouse summarizes individual transactions into time-series data for monitoring and analyzing performance.
Delivery of data warehouse information to decision makers throughout the enterprise and around the world has been an expensive challenge. Once data has been extracted and organized for user access, analytic software must be loaded on each user's PC, users must be trained, and ongoing user support staff must be recruited. User requirements, and even the users themselves, change constantly, resulting in a significant support burden. The World Wide Web offers a solution. In addition to simplifying the deployment of data warehouse access, an Intranet can introduce a new level of collaborative, interactive analysis and information sharing among decision makers.
Most Intranets currently manage unstructured content -- text, image, and audio data types -- as "static" HTML documents. A data warehouse stores structured content -- raw alphanumeric data. With the right tools and the right architecture, a data warehouse can be made accessible over an enterprise Intranet, forming the basis for a comprehensive enterprise information infrastructure. (See Figure 1, page 36.) There are three important advantages of such an infrastructure:
* intranet economics
* information integration
* user collaboration
The cost of client/server computing is high when communications, support, and other hidden costs are considered. In fact, an often-cited Gartner Group study ("A Guide for Estimating Client/Server Costs," by K. Dec, Gartner Group, Stamford, Conn., February 28, 1995) suggests that client/server computing is more costly than mainframe computing. Certainly the personal computer has become bloated with processing power, memory, software, and user-managed files of considerable proportions. The result is a "fat" client architecture.
Intranets will change the economics of supporting a large population of knowledge workers. An Intranet is a "thin" client architecture. Not only does an Intranet reduce communications costs, but some speculate that the personal computer may be replaced with a low-cost Intranet device. Whether corporations will actually replace existing PCs is debatable; at a minimum, an Intranet will extend the useful life of the latest round of PC upgrades.
The thin client model requires server distribution of applications software. This is where Sun's Java plays a role. Java allows software to be served to an Intranet browser in code fragments, or applets. The only portion of the application software that needs to be installed on the client is a browser such as Netscape Navigator, and application software is acquired only when it is needed for a specific application. Application software licensing will shift from per-seat pricing (licensed for each personal computer) to a server-centric licensing model. One possibility is metering that tracks the number of users who download an applet, or some other measure of usage. Another possibility is a tiered pricing model tied to server size. The result is likely to be another round of competitive pricing battles. In short, the economic benefits of an Intranet are lower communications costs, less expensive thin client hardware, and, in some cases, reduced application software licensing costs.
One of the most valuable assets of the enterprise is the operational data used in managing day-to-day business activities. This numeric data provides frequent measures of performance. By developing a data warehouse, corporations are organizing the data in a way that makes it useful to decision makers. When a data warehouse is put on an Intranet, users can toggle between structured data analysis (producing reports in columns and rows) and unstructured browsing. One software application can be used to view data both ways. The marketing manager at a retailer can display on his or her screen both an advertising image and a report on the sales of those products featured in the ad.
An Intranet is as much about communicating at an interactive level as it is about making structured and unstructured content easily accessible. Few people would disagree that decision making improves with timely, accurate, and complete information. An Intranet influences how the ideas and experience of a workgroup are exchanged as part of knowledge sharing. The experience and new ideas of users form the basis of rapid-fire questions; this is the analysis and problem resolution process. Unstructured and structured content searches provide answers to questions, but answering one question inevitably leads to more questions. The quality of the questions and the completeness of the answers are the basis for the best decisions. The key benefits of an Intranet are information-enriched communications and collaborative problem resolution. An Intranet supports these capabilities at both workgroup and enterprise scale, which is not possible under the communication constraints of a conventional LAN-based application.
Today, most users can communicate via a corporate email system. While an email system allows text files to be exchanged, it does not facilitate true collaboration. Lotus Notes is one step closer to a collaborative method of exchanging valuable information, but the focus is still textual file sharing. True collaboration requires interactive sharing of information in such a way that the recipient can continue an analysis or branch off in an entirely new direction without assistance. For example, if I receive a report from another Intranet user, I should immediately be able to drill down or drill up on any report dimension, pivot and rotate the results, add additional calculations as part of my analysis, and then pass my work to others in the organization -- this requires dynamic report creation based on data stored in a warehouse.
True collaboration for business decision making requires a higher level of interactive analysis and knowledge sharing than exists today in most text-oriented groupware products. Users need to dynamically explore the data warehouse and freely build on each other's analysis of a business issue, jumping to structured content searches at any point in the analysis process.
Data warehouses employ relational database management systems that use SQL to retrieve rows and columns of numeric data, while unstructured content is managed as HTML documents. The challenge in putting a data warehouse on an Intranet is in properly enabling SQL data warehouse access from HTML browsers. Four application software services must be provided:
* analytic layer
* file management
* security
* agents
Analytic Layer. Putting structured data content on the Intranet requires a server-resident analytic layer to generate SQL on the fly, perform computations, and format reports based on user requests. In essence, a specialized structured content Web server is required to support data warehouse access initiated by an HTML browser client. Often, this Web server will be the same hardware platform used to manage all or a portion of the structured content database. The Web server for structured content must be configured to support the higher processing loads of a robust analytic layer. The analytic layer will typically make heavy demands on the relational database layer, and the number of queries and results passed between these layers will be large; therefore, the layers should either be linked by a high-speed network connection or reside on the same machine. This processing capability could be supplied, for example, by a symmetric multiprocessor (SMP) configuration with enough memory to minimize virtual memory I/O operations. Because the sizes of the data sets being analyzed vary widely, these sizing parameters should be determined empirically through benchmarking.
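To make "generate SQL on the fly" concrete, here is a minimal sketch of the request-to-SQL step of such an analytic layer. Everything in it is a hypothetical illustration: the `daily_sales` table, the dimension and measure names, and the `ReportRequest`/`generate_sql` names are assumptions, and the `?` placeholder is the DB-API "qmark" style used by drivers such as Python's sqlite3 (other drivers use other styles). It shows how a drill-down becomes nothing more than a new request with an extra filter.

```python
# Hypothetical sketch of an analytic layer turning a browser report request
# into SQL. The table name, dimension names, and measure names are
# illustrative assumptions, not a real warehouse schema.
from dataclasses import dataclass, field

# Whitelists: only known names are spliced into the SQL text; user-supplied
# filter values travel separately as bound parameters.
DIMENSIONS = {"product", "store", "region", "week"}
MEASURES = {"units", "dollars"}


@dataclass
class ReportRequest:
    rows: list[str]                        # dimensions shown down the report
    measures: list[str]                    # facts to total
    filters: dict[str, str] = field(default_factory=dict)  # dimension -> value


def generate_sql(req: ReportRequest, table: str = "daily_sales") -> tuple[str, list[str]]:
    """Build a parameterized SELECT ... GROUP BY statement for one request."""
    if not req.rows:
        raise ValueError("at least one row dimension is required")
    for name in [*req.rows, *req.filters]:
        if name not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {name}")
    for name in req.measures:
        if name not in MEASURES:
            raise ValueError(f"unknown measure: {name}")
    columns = req.rows + [f"SUM({m}) AS {m}" for m in req.measures]
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    params = list(req.filters.values())
    if req.filters:
        sql += " WHERE " + " AND ".join(f"{d} = ?" for d in req.filters)
    sql += f" GROUP BY {', '.join(req.rows)} ORDER BY {', '.join(req.rows)}"
    return sql, params


# Drilling down from regions into the stores of one region is just a new request.
by_region = ReportRequest(rows=["region"], measures=["dollars"])
west_stores = ReportRequest(rows=["store"], measures=["dollars"], filters={"region": "West"})
print(generate_sql(west_stores)[0])
# SELECT store, SUM(dollars) AS dollars FROM daily_sales WHERE region = ? GROUP BY store ORDER BY store
```

Whitelisting dimension names while binding filter values as parameters is one reasonable design choice. It keeps browser-supplied text out of the generated SQL without limiting which reports a user can request.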
The analytic layer shares some capabilities with spreadsheet software. The power of a spreadsheet is derived from a user's ability to author custom calculations based on facts stored in its cells. Facts (numeric data) are stored in two dimensions: columns labeled A, B, C, D, etc., and rows numbered 1, 2, 3, 4, etc. Combining one value from each dimension, such as A1, provides a unique address. That address can then be used in a formula for a required calculation, such as A1-B1. The number of calculated rows and columns in a spreadsheet application often far exceeds the number of stored facts.
A spreadsheet provides two additional powerful capabilities. First, users can replicate formulas for calculations down a column or across a row easily. Second, the calculation logic is automatically updated if new rows or columns of data are inserted into a spreadsheet.
A data warehouse has multiple dimensions -- product, market, customer, outlet, vendor, period, etc. -- as opposed to a two-dimensional spreadsheet. The combination of values for each of the multiple data warehouse dimensions provides a unique address. The analytic layer on the structured content Web server allows users to apply calculations based on database dimensions to create more useful reports. Furthermore, once authored, calculations can be shared with other users much like calculation formulas are replicated within a spreadsheet. And, the calculation logic is maintained as the data warehouse is updated each day, week, or month.
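The following minimal sketch illustrates the idea of a multidimensional address and a calculation authored once in terms of dimensions. It assumes facts keyed by a hypothetical (product, market, period) tuple, and it uses "share of market" as an illustrative calculation. Neither the calculation nor the sample data comes from the article. When a new week is loaded, the same definition covers it, much as a copied spreadsheet formula does.

```python
# Hypothetical sketch: facts addressed by (product, market, period), and a
# calculation authored once in terms of dimensions rather than cells.
# The "share of market" calculation and the sample data are illustrative.
from collections import defaultdict

Address = tuple[str, str, str]  # (product, market, period)


def share_of_market(facts: dict[Address, float]) -> dict[Address, float]:
    """Each product's share of total sales in the same market and period.

    Like a spreadsheet formula copied down a column, the definition applies to
    every address, including ones added by later warehouse loads."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for (_product, market, period), value in facts.items():
        totals[(market, period)] += value
    return {
        (product, market, period): value / totals[(market, period)]
        for (product, market, period), value in facts.items()
        if totals[(market, period)] != 0
    }


facts = {
    ("cola", "Boston", "1996-W01"): 300.0,
    ("juice", "Boston", "1996-W01"): 100.0,
}
print(share_of_market(facts)[("cola", "Boston", "1996-W01")])  # 0.75

# A new week arrives; the same calculation covers it with no re-authoring.
facts[("cola", "Boston", "1996-W02")] = 50.0
print(share_of_market(facts)[("cola", "Boston", "1996-W02")])  # 1.0
```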
As in a spreadsheet, reports that users request from a data warehouse often contain more calculated rows and columns than raw data. Without a robust analytic layer in front of a data warehouse, the user is limited to a simple listing of stored data elements. The analytic layer is key to addressing the business questions that users must answer.
File Management. Collaboration requires interactive analysis and knowledge sharing. A report requested by one user is valuable when it is shared with other users to gain their insight and ideas. The recipients should be able to continue the analysis initiated by the original author. In this way, the analysis process becomes an interactive exchange. Many users can pursue different analysis paths from a common starting point.
To support interactive analysis of a data warehouse, users must be able to access, and modify their own copy of, the logic used to create a report. Users must be able to access public files and manage their personal files over an Intranet connection. A sophisticated server-based file management system is required to support user collaboration and maintain security at this level.
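A server-side file manager of this kind might separate a public area of shared report definitions from a personal area for each user. The minimal sketch below shows that split under stated assumptions. The class and field names are hypothetical, and authorization checks are deliberately omitted here. The key point is that a recipient works on a deep copy of the report *logic*, so branching off never alters the shared original.

```python
# Hypothetical sketch of server-side report file management: a public area
# of shared report definitions and a personal area per user. The class and
# field names are illustrative; authorization checks are omitted here.
import copy
from dataclasses import dataclass, field


@dataclass
class ReportDefinition:
    name: str
    owner: str
    logic: dict  # the report's query and calculation logic, not its output


@dataclass
class ReportStore:
    public: dict[str, ReportDefinition] = field(default_factory=dict)
    personal: dict[str, dict[str, ReportDefinition]] = field(default_factory=dict)

    def publish(self, report: ReportDefinition) -> None:
        self.public[report.name] = report

    def copy_to_personal(self, user: str, name: str) -> ReportDefinition:
        """Give a user an editable copy of a public report's logic."""
        original = self.public[name]
        mine = ReportDefinition(name, user, copy.deepcopy(original.logic))
        self.personal.setdefault(user, {})[name] = mine
        return mine

    def personal_files(self, user: str) -> list[str]:
        return sorted(self.personal.get(user, {}))


store = ReportStore()
store.publish(ReportDefinition("weekly_sales", "sales_vp", {"rows": ["region"]}))
mine = store.copy_to_personal("ann", "weekly_sales")
mine.logic["rows"].append("store")         # Ann branches off in her own copy
print(store.public["weekly_sales"].logic)  # {'rows': ['region']}
```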
Security. The liberal sharing of information and collaboration among users on an Intranet immediately raises data security issues. A data warehouse contains highly confidential performance data for the entire enterprise. Only a few users have security authorization to view data anywhere in a warehouse. Most users must be provided access only to a relevant portion of a warehouse. Data must be secured, but if too tightly controlled the value of the warehouse will never be fully realized.
The security issues are indeed complex. To illustrate, suppose a sales vice president is authorized to view financial data at the national, regional, and sales territory levels. At the territory level, the financial data would include all salary data for sales representatives. If the sales vice president creates a report and decides to share it, each regional manager should have access only to territory information for his or her own region, and be blocked from accessing territory information for other regions. In other words, if a user is not authorized to receive and access a report, that user must not be able to view the report or drill into areas where the user does not have authorization.
A subtle capability is implied here. Reports that are created from data stored in the warehouse should not simply be shared as text files. All reports should include the underlying logic, giving the recipient the ability to immediately analyze and modify the report, as well as the logic and assumptions supporting the analyses. Before a recipient can view the report or build upon the analysis, the authorization level for that user is verified. For effective collaboration, reports must be shared throughout the workgroup and enterprise. If the recipients are not authorized, their access to report logic should be denied.
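The sales vice president scenario can be sketched in a few lines. The model below is a hypothetical simplification. A shared report stores only its logic (here, just an optional region to drill into), and every execution is filtered by the *viewer's* authorized regions rather than the author's. The user names, regions, figures, and the region-only authorization model are all assumptions made for illustration.

```python
# Hypothetical sketch: a shared report carries its logic rather than its
# output, and every execution is filtered by the viewer's authorization.
# Users, regions, and figures are illustrative assumptions.
import dataclasses
from dataclasses import dataclass
from typing import Optional

# Territory-level facts: (region, territory, sales salary expense).
FACTS = [
    ("East", "Boston", 410_000.0),
    ("East", "New York", 520_000.0),
    ("West", "Seattle", 380_000.0),
    ("West", "Denver", 290_000.0),
]

# Regions each user may see; None means the entire warehouse.
AUTHORIZED_REGIONS: dict[str, Optional[set[str]]] = {
    "sales_vp": None,
    "east_manager": {"East"},
    "west_manager": {"West"},
}


@dataclass(frozen=True)
class SharedReport:
    author: str
    region: Optional[str] = None  # the report's logic: which region it covers

    def drill_into(self, region: str) -> "SharedReport":
        return dataclasses.replace(self, region=region)

    def run(self, viewer: str) -> list[tuple[str, str, float]]:
        """Re-execute the logic with the viewer's rights, not the author's."""
        if viewer not in AUTHORIZED_REGIONS:
            raise PermissionError(f"{viewer} is not authorized for this report")
        allowed = AUTHORIZED_REGIONS[viewer]
        if allowed is not None and self.region is not None and self.region not in allowed:
            raise PermissionError(f"{viewer} may not drill into {self.region}")
        return [
            row for row in FACTS
            if (self.region is None or row[0] == self.region)
            and (allowed is None or row[0] in allowed)
        ]


report = SharedReport(author="sales_vp")               # the VP's national report
print(len(report.run("sales_vp")))                      # 4
print([t for _, t, _ in report.run("east_manager")])    # ['Boston', 'New York']
```

Because the recipient receives logic rather than output, the same shared report yields different, correctly restricted results for each reader. An attempt to drill outside the reader's scope fails instead of leaking data.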
Encryption of data can provide a higher level of security than is generally available for business applications. For example, using Netscape's Secure Sockets Layer (SSL) protocol and the Netscape Commerce Server, data passing between the client and server can be encrypted. This enables business users to run important applications over unsecured communication lines without worrying about an intruder tapping into the network and viewing the transmitted information.
Agents. One of the common complaints about email and even voice mail is that a mailbox fills up faster than a user has time to isolate and address the really important issues. Agents work on behalf of users to isolate the important information each user seeks. An agent can be triggered by a predefined event or at a specified time interval, and it sends an alert to notify specific users on a "need-to-know" basis. Agents must be able to run continually as background processes on an Intranet logic server, because users are almost always disconnected, a result of the stateless nature of Web servers. This provides a means of automating the routine analysis process. And when users sign on to the network, the agents must be smart enough to notify them of conditions that occurred while they were disconnected. Because data warehouses tend to grow exponentially, it is critical that agents proactively monitor and manage activities, alerting decision makers only when specific conditions exist. It is unrealistic to believe that decision makers could be productive by aimlessly surfing through potentially hundreds of gigabytes of data looking for valuable insight. The decision maker should be free to concentrate on immediate and critical issues while the system ensures that developing conditions will not go undetected.
An example of the use of triggers, agents, and alerts is a forecast accuracy monitoring application. When each week's actual sales are updated to the data warehouse, the system automatically calculates the mean absolute percent error between forecasted and actual results over the last six weeks for every product. The mean absolute percent error is a simple way of representing forecast error over time. An alert is sent to each marketing manager responsible for the product forecast if a threshold for the mean absolute percent error calculation is exceeded. The marketing manager can then take appropriate actions to avert out-of-stocks or excessive inventory buildup.
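The forecast accuracy agent reduces to a small trigger body. This sketch computes MAPE as 100 × mean(|actual − forecast| / |actual|) over the last six weeks for each product. The 20 percent threshold, the skipping of zero-actual weeks, and the `on_weekly_load`/`Alert` names are assumptions. In a deployment, the function would be invoked by whatever runs after each weekly warehouse load.

```python
# Hypothetical sketch of the forecast-accuracy agent: after each weekly load,
# compute the six-week mean absolute percent error (MAPE) per product and
# alert the responsible marketing manager when it exceeds a threshold.
from dataclasses import dataclass

WINDOW_WEEKS = 6
MAPE_THRESHOLD = 20.0  # percent; an assumed value, tuned per business


def mape(actuals: list[float], forecasts: list[float]) -> float:
    """Mean absolute percent error, in percent: 100 * mean(|A - F| / |A|)."""
    pairs = [(a, f) for a, f in zip(actuals, forecasts) if a != 0]  # skip zero actuals
    if not pairs:
        raise ValueError("no nonzero actuals in window")
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


@dataclass
class Alert:
    manager: str
    product: str
    mape: float


def on_weekly_load(history: dict[str, list[tuple[float, float]]],
                   owners: dict[str, str],
                   threshold: float = MAPE_THRESHOLD) -> list[Alert]:
    """Trigger body: history maps product -> [(actual, forecast), ...], oldest first."""
    alerts = []
    for product, weeks in history.items():
        recent = weeks[-WINDOW_WEEKS:]
        if len(recent) < WINDOW_WEEKS:
            continue  # not enough history yet to judge accuracy
        error = mape([a for a, _ in recent], [f for _, f in recent])
        if error > threshold:
            alerts.append(Alert(owners[product], product, round(error, 1)))
    return alerts


history = {
    "cola": [(100.0, 110.0)] * 6,   # forecast 10% high every week
    "juice": [(100.0, 150.0)] * 6,  # forecast 50% high every week
}
owners = {"cola": "ann", "juice": "raj"}
print(on_weekly_load(history, owners))
# [Alert(manager='raj', product='juice', mape=50.0)]
```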
The Common Gateway Interface (CGI) facility of Web server software provides a method to execute server-resident software. Building secure applications for an Intranet requires a well-thought-out security strategy as well as an appropriate application architecture. Most Web applications give all users the same access permissions to the files the server can reach. It is certainly possible to send a DBMS query from a Web browser and enforce DBMS security just as if the query came from a traditional LAN-based client application. However, maintaining security for each report application and user creates a significant burden, so this approach is best suited to simple requests such as a query for a user's current credit card balance.
Business users require a system that maps them to their server accounts by verifying user names and passwords. Server applications can then run under the user's account, with access to that user's files secured by user, group, and permission levels. The same issue exists with database security: users must be mapped to the appropriate database user or group in the relational database to control the data each user can access. And, because the number of users may be large, administration of the security system should be centralized at the server and kept to a minimum.
A second issue with CGI is that each request normally starts a new program, so CGI by itself does not offer a continuous connection to the database. As a result, it cannot directly support an application requiring multiple interactive queries, which is a data warehousing requirement. One approach to solving this problem is to employ a message-based protocol, carried over CGI, between the client browser and the server-resident analytic layer. By mapping a user to a server account and starting a Unix process that executes as that user, a continuous connection between the logic layer and the database is maintained across iterative queries for the lifetime of that process. This enables efficient SQL query strategies and computational routines that meet user requirements for structured content analysis. For example, an HTML form can ask the user to enter a Unix username and password and a database username and password. Using Netscape's SSL facilities, the passwords are encrypted before being transmitted over the network. This information is then passed as parameters or through environment variables to a CGI program, which could be written in C, C++, or Perl. That program verifies the Unix username and password via operating system security features and starts a process that executes as that user. This process can then connect to the relational database management system (RDBMS) using the RDBMS's API (for example, Sybase's DB-Library) and the supplied username and password. Once this connection is established, many queries and result sets can be processed. A final output report can then be generated, converted to an HTML document, and sent back to the client browser for display.
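The long-lived, per-user process can be sketched as an object that owns one database connection and serves a chain of queries before rendering HTML. This is a minimal illustration under several assumptions. Python's built-in sqlite3 stands in for the warehouse RDBMS, the `sales` table and the class and function names are hypothetical, and the steps of verifying the Unix password and starting the process under the user's account are omitted entirely. Parameters are read from the CGI `QUERY_STRING` environment variable for brevity. A real login form would POST credentials, which CGI delivers on standard input.

```python
# Hypothetical sketch of the per-user analysis process described above.
# sqlite3 stands in for the warehouse RDBMS; authentication and starting the
# process under the user's Unix account are assumed to have happened already.
import html
import os
import sqlite3
from urllib.parse import parse_qs


def read_cgi_params(environ=os.environ) -> dict[str, str]:
    """CGI passes GET parameters in the QUERY_STRING environment variable."""
    parsed = parse_qs(environ.get("QUERY_STRING", ""))
    return {key: values[0] for key, values in parsed.items()}


class AnalysisSession:
    """Owns one database connection for its lifetime, so a chain of
    drill-down queries reuses it instead of reconnecting per request."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.queries_run = 0

    def query(self, sql: str, params=()) -> list[tuple]:
        self.queries_run += 1
        return self.conn.execute(sql, params).fetchall()

    def to_html(self, headers: list[str], rows: list[tuple]) -> str:
        """Format the final result set as an HTML table for the browser."""
        head = "".join(f"<th>{html.escape(h)}</th>" for h in headers)
        body = "".join(
            "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
            for row in rows
        )
        return f"<table><tr>{head}</tr>{body}</table>"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, store TEXT, dollars REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "Boston", 10.0), ("East", "Albany", 5.0), ("West", "Denver", 7.0)],
)
session = AnalysisSession(conn)
params = read_cgi_params({"QUERY_STRING": "region=East"})
session.query("SELECT region, SUM(dollars) FROM sales GROUP BY region")  # first query
rows = session.query(  # drill-down reuses the same open connection
    "SELECT store, SUM(dollars) FROM sales WHERE region = ? GROUP BY store ORDER BY store",
    (params["region"],),
)
print(session.to_html(["Store", "Dollars"], rows))
```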
The personal computer era transformed a generation of users into electronic "knowledge seekers." As knowledge seekers, users sought to load their personal computers with as much data and software as possible. Users wanted to be self-sufficient analysts; in fact, they had to be self-sufficient.
The network computer era allows users to evolve into "knowledge sharers" and emphasizes the powerful advantage of collaborative problem resolution. Knowledge sharing requires the free flow of all types of information among users, not just text file transfer, but an interactive data analysis capability that encourages the exchange of experiences and ideas.
An Intranet should be the basis for rethinking the enterprise information infrastructure. By putting a data warehouse on an Intranet and deploying structured content Web servers, organizations gain economic benefits and faster application deployment. More important in the long term, users will collaborate more freely on an Intranet, ideally resulting in thorough analysis of business issues, a free-flowing exchange of experience and ideas, and faster competitive response. Today's corporations want superior knowledge, and an Intranet is where knowledge is placed in the hands of decision makers. An Intranet is far more than a place for email and text file management; it is the infrastructure for a comprehensive decision support system.
Richard Tanler has over 25 years of data analysis and decision support experience, having held executive positions at Pepsico, McKesson, Information Resources, and Metaphor Computer Systems. He currently serves as chairman of Information Advantage Inc., headquartered in Minneapolis. Richard is a frequent speaker at data warehousing trade shows and seminars. You can email him at Rick.Tanler@infoadvan.com.