
Lessons learned while building a large commercial web site with immature tools.
Strategic Concepts has an interesting charter: develop a Web site for the insurance industry (http://www.insweb.com) that can be used by both the industry itself and by consumers to do research and to buy insurance. At Strategic Concepts, we now have over 40 people devoted to this single Web site. Since May 1995 we've learned a lot about which tools are available for developing a Web site, which tools actually work, and which tools are just not useful at all. We've also learned that the Internet is just barely ready for commercial use: it has some gaping holes and some usability issues that we had to overcome. This article summarizes some of the best and worst lessons we've learned in developing the InsWeb site. In the tradition of the Internet, we are distributing (free!) one of the most useful software tools we had to develop to ensure the success of the site.
The Internet offers the insurance industry two alluring possibilities: It is an effective medium to reach millions of customers in a cost-effective way, and it is perhaps the most effective medium ever devised for communicating such a complex subject comprehensively. The industry immediately understood the advantage of having customers key in their own data, which is captured electronically and can be transmitted through the entire insuring process, thereby eliminating errors and rekeying and ensuring accurate quotations. Less clear to those never exposed to the Web is the idea of using hypertext documents to clearly explain a subject like insurance, which has so many permutations and combinations. However, a simple demo of the power of hypertext to explain insurance coverages has fired the imaginations of both our clients and our site visitors.
Building upon these two commercial advantages, and the obvious strength of the Internet to publish material effectively (as has been testified to by the millions of URLs devoted to publishing), we started developing the InsWeb site and immediately faced the dilemma all site developers wrestle with today: For what level of HTML do you program? The lowest common denominator is HTML 1.0, but it is very restrictive. HTML 3.0 isn't totally defined or accepted yet, but Netscape is supporting it, and it is much richer. We opted for HTML 3.0 on the theory that we couldn't have mediocre-looking Web pages representing the insurance industry, and we believed that 3.0 extensions would be generally available soon and we would have to change page looks anyway if we didn't use the latest standard. Today that decision seems to be a good one.
Using the Web for commercial purposes assumes a level of user interaction that new products such as Java and Microsoft's VB Script are attempting to provide. Existing browsers are very restrictive at the client side, and the HTML form protocol works very much like a 1960s-style dumb terminal interface rather than an interactive tool. Further, the software tools we have evaluated so far for solving the scripting problems of multipage forms and multiform pages in HTML, or the problems of error correction and form validation, are mostly toys rather than usable industrial-strength products.
Designers of commercial Web sites today must live within the strict boundaries imposed by Web protocols. We are a commercial site that has to program to the popular, publicly used browsers. HTML 2.0 (or 3.0 for that matter) is already widely disseminated, and we feel that we can use it and still reach 80 percent of our audience. For anyone programming for a broad public audience, that will remain the state of the art for an extended period of time.
In an HTML-based Web application, the user completes multiple fields in a screen form and submits the form to the remote server. A script in the server accepts the data and returns another HTML page, which may contain another form. This mode of operation severely limits client interaction.
Adding to the burden, HTTP, the protocol that delivers HTML pages, does not maintain a context from page to page. Applications that require multiple entry or information screens must establish their own mechanism to maintain a session across multiple pages. Unfortunately, HTML form-based applications are the only choice available at this time.
With multiple page forms, a data set must be created or added to previously entered data stored on the server. Data entered at one stage in the process may be required for display in subsequent pages. Additional data may be generated by a script or application based on previous input and displayed in a subsequent page. Examples of this type of page include price quotes based on a previous entry or search results. Obviously, such a page cannot be loaded directly from disk; instead, a script must create the page with the appropriate data included. Further, the application needs a mechanism to allow the server script to verify entries and to ask for corrections, interrupting the originally scripted flow of interactive entry.
Our first solution was to use a Perl script that contained an HTML page. Perl variables embedded in the HTML text are resolved when the page is generated and sent to the client browser. This technique is effective and easy to implement, given Perl's powerful text-processing facilities. However, it required changes to the script to implement even the simplest modification to the HTML page. A second disadvantage of Perl is that it does not lend itself to foundation development, being a fairly restrictive language in terms of its overall capability. Yet another concern is that Perl does not have flexible database connectivity built in.
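The embedded-page technique can be sketched as follows. This is an illustration in Python rather than the Perl the site used, and the page text, the variable names, and the `render_quote_page` function are all hypothetical.

```python
from html import escape
from string import Template

# Hypothetical page text that lives inside the script itself.
# "$$" is a literal dollar sign; $name and ${premium} are resolved
# when the page is generated.
QUOTE_PAGE = Template("""<HTML>
<HEAD><TITLE>Your Quote</TITLE></HEAD>
<BODY>
<H1>Quote for $name</H1>
<P>Estimated annual premium: $$${premium}</P>
</BODY>
</HTML>""")


def render_quote_page(name: str, premium: float) -> str:
    """Resolve the embedded variables and return the finished page."""
    return QUOTE_PAGE.substitute(
        name=escape(name),            # keep user input from breaking the markup
        premium=f"{premium:.2f}",
    )


if __name__ == "__main__":
    print(render_quote_page("Pat", 512.5))
```

Because the markup sits inside the program, even rewording the heading means editing and retesting the script, which is the maintenance cost described above.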
We also learned that Perl is a somewhat complex language that the usual HTML page designers couldn't readily maintain. With a site projected to grow to thousands of pages, many of them changed frequently, we felt that it was not cost effective to require page developers with a programming background just so that they could maintain scripts. Furthermore, we determined that Perl would not serve as a solid platform for developing complex applications, particularly those based on a database.
Our current solution (see Figure 1) is to separate the HTML page design from the scripting by using HTML templates that can then be created and maintained separately from the processing script. An HTML template consists primarily of the normal HTML constructs, with the addition of special tags. These tags are translated by a special server program at runtime into a standard HTML formatted page, based on a data set and a class of translation routines. Programmers are necessary to maintain the server script itself, but page designers can maintain the pages by working with the skeleton template and inserting special tags.
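A minimal sketch of template translation, under stated assumptions: the article does not give the real tag syntax, so the `<INSERT NAME="...">` tag, the `TRANSLATORS` table, and the `translate` function below are invented for illustration and are not the InsWeb implementation.

```python
import re
from html import escape

# Hypothetical special tag. A page designer would place it in an
# otherwise ordinary HTML file.
TAG = re.compile(r'<INSERT\s+NAME="(\w+)"\s*>', re.IGNORECASE)


def format_money(value: str) -> str:
    """Translation routine: render a numeric string as dollars."""
    return f"${float(value):,.2f}"


# Translation routines keyed by field name; fields without an entry
# are inserted as plain text.
TRANSLATORS = {"premium": format_money}


def translate(template: str, data: dict) -> str:
    """Replace each special tag with the matching value from the data set."""
    def replace(match: re.Match) -> str:
        field = match.group(1)
        if field not in data:
            return ""                      # unknown field: emit nothing
        routine = TRANSLATORS.get(field, str)
        return escape(routine(data[field]))
    return TAG.sub(replace, template)


TEMPLATE = """<H1>Quote for <INSERT NAME="name"></H1>
<P>Annual premium: <INSERT NAME="premium"></P>"""

if __name__ == "__main__":
    print(translate(TEMPLATE, {"name": "Pat & Co", "premium": "1200"}))
```

With this split, the template can be edited in any HTML tool without touching `translate`, and the script can change without touching the template.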
This solution has allowed us to establish the concept of a session, that is, a sequence of pages that interact with the client based on a script that is controlled at the server. The session manager, in script form, can decide which pages are appropriate for sending, validate user entries, build new pages to correct entry errors, and build up a database of entries from multiple Web pages for further processing.
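The control flow of such a session manager might look roughly like the sketch below. The page names, the validation rules, and the `handle_submission` function are assumptions made for the example, not the code we distribute.

```python
# Hypothetical order of pages in one scripted session.
PAGES = ["driver", "vehicle", "coverage"]


def check_driver(fields: dict) -> list:
    return [] if fields.get("name", "").strip() else ["name is required"]


def check_vehicle(fields: dict) -> list:
    return [] if fields.get("year", "").isdigit() else ["year must be a number"]


def check_coverage(fields: dict) -> list:
    return []  # nothing to verify on this page in the sketch


VALIDATORS = {
    "driver": check_driver,
    "vehicle": check_vehicle,
    "coverage": check_coverage,
}


def handle_submission(session: dict, page: str, fields: dict) -> tuple:
    """Validate one submitted page and decide which page to send next.

    Returns (next_page, errors). On errors the same page is sent again
    so the user can correct the entries; otherwise the fields are added
    to the session's data set and the script advances.
    """
    errors = VALIDATORS[page](fields)
    if errors:
        return page, errors
    session["data"].update(fields)
    index = PAGES.index(page)
    next_page = PAGES[index + 1] if index + 1 < len(PAGES) else "done"
    return next_page, []
```

For example, submitting the `driver` page with an empty name returns `("driver", ["name is required"])`, while a valid submission returns `("vehicle", [])` and stores the name in `session["data"]`.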
Implementing sessions has allowed us to manage complex user interactivity under scripting control while still maintaining flexibility in Web page design and keeping our page development costs under control. Without session management we would have had to assign the more expensive programming staff to maintain the routine HTML code and page design. A session manager also brings other important benefits that are not readily apparent.
To use an example from our site, suppose that clients filling out insurance applications are asked to identify the insurance coverages they require. If a client needs to research a coverage before answering (for example, when applying for an auto insurance policy, a client may need to know what comprehensive coverage actually covers), then hypertext links to the appropriate explanations and articles within the site are available. The difficulty is that if the client were to follow too many hypertext links without bookmarking the return-to page, the browser cache might not retain the return location, and the insurance application data entered up to that point would be lost. But with the session manager concept, we have the ability within the script to return users to the page from which they started their research. The only restriction imposed by this type of session management is that users must remain within your site because crosslinks taken to other locations defeat the ability to manage the session.
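One way to picture the return-to behavior is the sketch below, which assumes the server keeps a small record per session. The `return_to` key and the three function names are hypothetical.

```python
def start_research(session: dict, current_page: str, article: str) -> str:
    """Record the form page the user is leaving, then send the article."""
    session["return_to"] = current_page
    return article


def follow_link(session: dict, article: str) -> str:
    """Further links inside the site leave the recorded return page alone."""
    return article


def finish_research(session: dict, default: str = "home") -> str:
    """Send the user back to the page where the research started."""
    return session.pop("return_to", default)


if __name__ == "__main__":
    session = {"data": {"name": "Pat"}}
    start_research(session, "coverage", "comprehensive.html")
    follow_link(session, "deductibles.html")
    print(finish_research(session))   # the page the user left
    print(session["data"])            # entries made so far are still there
```

Because the return location and the entered data are held at the server, neither depends on what the browser cache happens to retain.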
The InsWeb security system (see Figure 2) is designed to build a higher wall than exists in the usual installation. Interestingly, the security system is in place to protect the already publicly available pages. Insurance companies want to make sure that their published pages are not subject to tampering.
Intelligent agents and other interesting features are also possible with session management. Within InsWeb we are developing the agent concept in a variety of ways with special scripts that are designed to give the site real industrial-strength interactivity using existing HTML constructs. (Java could give us that capability today, but until the Java browser capability is widely deployed it cannot be used effectively in a public, commercial site like ours.) There are other tools available at some cost, but we have found that, for the most part, they are inadequate and inflexible when compared to our approach.
We looked at a dozen or so different shareware tools and a few other commercial products. We even liked many of them, but couldn't use them because they couldn't be integrated into our system and were inadequate as standalone tools. We found that virtually all of the tools were released too quickly (they had bugs), couldn't do everything we needed, or would cost too much to maintain for commercial use. We also found that many products were written for academic use (primarily page publishing), and they didn't have decent error correction facilities. We weren't simply interested in developing our own thing; we felt we were forced into doing it.
Although we are working on more capable session management ideas and intend to use whatever new tools become available, the session manager we have now is the best solution at present, and we have decided to make it available free of charge to Web site owners. You can download it for use only on your own site. Please do not resell it without our permission. If you wish to download the session management software, go to http://www.insweb.com/session.
An additional sidebar available on the Internet Systems Web site (inswebsb.html) describes how we are using this session concept and includes some sample code showing a simple interactive application. One of the important features of this software is that it maintains a method of identifying the end user throughout the session. A session numeric ID is assigned to a particular user and is passed by the manager from page to page. When the script receives a block of user data, it uses this session ID to find the existing data set for the session.
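The session ID mechanism described above can be sketched as follows. This is not the downloadable software: the hidden form field named `session_id`, the in-memory store, and the function names are assumptions made for illustration.

```python
import itertools

# Hypothetical in-memory store mapping session ID -> data set.
# A production site would likely persist this and use IDs that are
# hard to guess; a simple counter keeps the sketch short.
_next_id = itertools.count(1)
_sessions = {}


def open_session() -> int:
    """Assign a numeric session ID and create its empty data set."""
    session_id = next(_next_id)
    _sessions[session_id] = {}
    return session_id


def hidden_field(session_id: int) -> str:
    """Markup placed in every generated form so the ID travels with it."""
    return f'<INPUT TYPE="hidden" NAME="session_id" VALUE="{session_id}">'


def receive(form: dict) -> dict:
    """Find the session's data set from the submitted ID and add to it."""
    session_id = int(form["session_id"])
    data = _sessions[session_id]          # KeyError if the ID is unknown
    data.update({k: v for k, v in form.items() if k != "session_id"})
    return data
```

Each page the script sends includes `hidden_field(session_id)` in its form, so the next submission carries the ID back and `receive` can add the new entries to the existing data set.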
We believe that the Web has very sophisticated publishing tools, but tools for building interactive applications are not yet up to the demands of a commercial site. And we are not convinced that solutions currently in development are going to be sufficient in the near term. Our biggest fear is that new languages such as Java and Microsoft's VB Script will create complexities that are expensive to support.
For now, we are working with our own tools and scripts, and we find that they have been sufficient to develop the InsWeb site. As we write this in February 1996, the site has over 2000 Web pages of facts and research material, and scripts that can generate literally thousands of additional Web pages on the fly. The InsWeb site not only offers industrial-strength interactivity and security, it can also be supported cost-effectively.
Darrell J. Ticehurst, a director and president of Strategic Concepts Corp., has extensive experience in software development, marketing, and communications. He also serves as chief technical officer of Strategic Concepts. Keith D. Hilen, manager of Interactive Programming, is in charge of scripting for the InsWeb site. He has 17 years of experience in software development and is a specialist in networking, communications, operating systems, and systems programming. You can reach both Darrell and Keith via telephone at 415-373-0200. You can also contact Darrell via email at darrell@insweb.com and Keith at khilen@insweb.com.