A New Era of Document Management

By Stewart McKie
DBMS, June 1995

Here's a look at the latest techniques for managing documents in a client/server environment.


It is a sure bet that any business in operation today can benefit from document management--even those with the most advanced computing and communications infrastructures. I would bet my money on it because the term "document management" is a misnomer, a term of convenience, a link with the past. You should really think of document management as "electronic transaction management." In fact, the goal of document management software is to avoid the need for a tangible document to exist at all, by providing the tools for managing structured digital information. Once information is digitized, it can be manipulated in a variety of useful ways.

In this article I outline the basic principles of document management technology, and then discuss the key types of business documents and the ways in which current software tools can manage them. I also review how certain client/server-based products are providing a range of document management facilities. Finally, I provide a checklist of vendors and products in this marketplace to highlight some key players. (See Table 1.)

Document Management 101

Leaving aside true electronic transaction processes such as electronic funds transfer, electronic data interchange, and other software-initiated documents, document management begins with a source paper document. A sequence of events follows:
  1. Document input--scan or digitize the document using imaging and optical character recognition (OCR) technology. Tag and index the document data.
  2. Document management--manage the document information itself and the document workflow via rules, routes, and roles. Documents can be retrieved and annotated at this stage.
  3. Document output--archive the documents to other media for long-term storage.

A typical paper document has three key events: receipt, review, and ready-to-file. These events represent the document life cycle in a nutshell. The transitional document life cycle or workflow of a document takes place in the review stage. This may involve moving the document from an in-box, then stamping, annotating, and linking it to other attachments. The document life cycle may also involve incremental changes and additions. In any case, the document is eventually batched with others and archived in filing cabinets or on microfiche. Document management is all about managing these events and life cycles via software.
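
The three life-cycle events can be pictured as a small state machine. The Python sketch below is only an illustration: the class, the stage names, and the rule that annotation happens during review are assumptions drawn from the description above, not from any product.

```python
from enum import Enum


class Stage(Enum):
    RECEIPT = "receipt"
    REVIEW = "review"
    READY_TO_FILE = "ready-to-file"


# Allowed transitions between the three life-cycle events.
NEXT_STAGE = {
    Stage.RECEIPT: Stage.REVIEW,
    Stage.REVIEW: Stage.READY_TO_FILE,
}


class Document:
    def __init__(self, title):
        self.title = title
        self.stage = Stage.RECEIPT
        self.annotations = []

    def annotate(self, note):
        # In this sketch, annotation is only allowed during review.
        if self.stage is not Stage.REVIEW:
            raise ValueError("documents can only be annotated during review")
        self.annotations.append(note)

    def advance(self):
        # Move to the next stage; a ready-to-file document cannot advance.
        if self.stage not in NEXT_STAGE:
            raise ValueError("document is already ready to file")
        self.stage = NEXT_STAGE[self.stage]
        return self.stage


doc = Document("Supplier invoice")
doc.advance()                       # receipt -> review
doc.annotate("Check new pricing")
doc.advance()                       # review -> ready-to-file
```

Keeping the transitions in one table makes it easy to see, and to audit, which moves the software will allow.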

Remember the Paperless Office?

The paperless office concept is still far from a reality, and does not truly convey the scope of the change that sophisticated document management can provide to a business. Certainly, the idea and benefits of the paperless office remain compelling: The paperless office is cheaper, but electronic transaction technology is both faster and smarter. An electronic transaction (a digitized document) can move much faster through its life cycle than any paper equivalent. At the same time, it can allow more user interaction, be subject to a more rigidly controlled rule set, and automatically trigger related processes--advantages beyond the scope of a paper document. In addition, the only human interaction required is opening an envelope and scanning the contents into the document management system. Thus, it's easy to see how document management gets you the faster part of the equation. Smarter is where the real action is.

The World of Documents

Documents can originate from a variety of sources: traditional paper documents such as letters, invoices, orders, checks, and other structured business forms. Today, many documents originate from electronic formats such as fax, e-mail, and images or data keyed into database, word processor, and spreadsheet files. No matter where the document originates, the first priority of document management is to get it into a database--whether a relational or object database. Only then can you intelligently manage the document data. As a result, one of the biggest challenges facing document management vendors is providing a standard way of accepting document data from all of these source "spokes" and integrating it into one "hub" database for ongoing document management.

Another challenge is categorizing documents into types in order to determine the characteristics and behavior of a document and the tools required to manage it. I discuss the following four document types: request, informational, living, and business-critical.

Businesses receive requests for information every day. The documents these requests generate must be dealt with by certain people, are subject to certain rules and response times, and require some form of response. Because these documents are voluminous, regular archiving is important.

Businesses are run on informational documents: internal memos, reports, charts, plans, specifications, and so on. These documents demand circulation lists, and they may allow others to annotate or attach to them. In addition, they may require multiple versions to track their evolution. They are subject to frequent recall during their short life cycle, and, as with request documents, the large volume makes archiving important.

A living document is a specialized variant of the informational document, usually in the form of a plan, engineering diagram, architectural drawing, or work of digital art. For these documents, it's important to see subsections, view layers of detail, track multiple versions, limit access, and audit the interaction with the document.

Business-critical documents (such as invoices, checks, orders, and bills of lading) are specific to a business or department, and they initiate an important business process (usually involving money). These documents require detailed analysis to identify who originated them, what they are for, and how they should be handled. They may also require complex routing and authorizations depending on their content. Also, their workflows must be audited and their data must be available for online inquiries.

Key Features

These are only a few of the many types of documents. From these, we can extrapolate some of the features that document management software should offer in order to deal with all but the most specialized documents.

Tagging. Some systems can scan documents and automatically extract key data from the text. They can do this either by searching for data in certain sections of a document or by reading a bar code. However, most business documents do not use standard formats or provide bar codes. Therefore, a document management system must have some kind of "tagging" facility for new documents. This facility tags a document with key data, including its type, its origin, its key piece of information, its recipient, and the time and place of its arrival. Thus, if a bank receives an overdraft request, it might be tagged with IDs that identify each of these items.

Another common approach is to tag a document with a longer, descriptive file name, as well as information about the document content, author, and statistics (word count and so on). These attributes are often stored separately from the document itself and indexed for faster retrieval.
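
To make the tagging idea concrete, here is a minimal Python sketch of a tag record and a separate index keyed on tag values. The field names, the `TagIndex` class, and the sample bank values are all hypothetical; they follow the tag list described above but do not come from any vendor's product.

```python
from collections import defaultdict
from dataclasses import asdict, dataclass
from datetime import datetime


@dataclass(frozen=True)
class DocumentTags:
    doc_id: str
    doc_type: str          # e.g. "overdraft-request"
    origin: str            # who sent the document
    key_info: str          # the key piece of information
    recipient: str
    received_at: datetime
    received_where: str


class TagIndex:
    """Tags are held apart from the document files and indexed by value."""

    def __init__(self):
        self._by_value = defaultdict(set)   # (field, value) -> document ids

    def add(self, tags):
        for field_name, value in asdict(tags).items():
            if field_name != "doc_id":
                self._by_value[(field_name, value)].add(tags.doc_id)

    def find(self, **criteria):
        # Intersect the id sets for every field=value pair supplied.
        id_sets = [self._by_value.get(item, set()) for item in criteria.items()]
        return set.intersection(*id_sets) if id_sets else set()


index = TagIndex()
index.add(DocumentTags("D-1", "overdraft-request", "customer-1001", "account-77",
                       "loans-desk", datetime(1995, 6, 1, 9, 30), "branch-12"))
index.add(DocumentTags("D-2", "invoice", "vendor-42", "INV-900",
                       "accounts-payable", datetime(1995, 6, 1, 10, 0), "head-office"))
matches = index.find(doc_type="overdraft-request", recipient="loans-desk")
```

Because the index holds only IDs, a lookup never has to open the document files themselves.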

Security. Because many documents contain sensitive information, security is a must. Security should include controls within the document manager itself and controls to prevent access from third-party applications. In highly sensitive activities, you may need controls to prevent direct access to document files at the operating system level. Such controls might include encryption and sophisticated compression/decompression algorithms.

Organization. Document managers should be able to provide a logical view of a document library. This makes working with those documents easier and more intuitive for users. In many cases, this means using the familiar concepts of files, folders, and cabinets as a means of graphically displaying document organization. Alternatively, drill-down browsers manage hierarchical and multiversion documents in an effective, visual manner. Document organization also means that storage profiles can be defined to ensure that the system can make organizational decisions without forcing the user to determine where a document should be stored.

Searching. One benefit of electronic documents is that they are easily accessible on a LAN or WAN. Document search and retrieval should allow searches based on file names, logical operators, ranges, and matches to specific keywords. It should also support searches based on content profiles that retrieve documents according to relevancy to the search criteria, and then rank the results based on the relevancy or priority of the document content.
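
As a rough illustration of keyword search with relevancy ranking, the sketch below scores each document by how often the query keywords occur in it. Simple term counting is an assumption chosen for brevity; commercial retrieval engines use far more elaborate scoring. The sample library is invented.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and split it into alphanumeric words.
    return re.findall(r"[a-z0-9]+", text.lower())


def search(library, query):
    """Return (name, score) pairs for documents matching any query keyword,
    ranked by how often the keywords occur in the content."""
    terms = set(tokenize(query))
    results = []
    for name, content in library.items():
        counts = Counter(tokenize(content))
        score = sum(counts[term] for term in terms)
        if score > 0:
            results.append((name, score))
    # Highest score first; ties are broken by name for a stable order.
    return sorted(results, key=lambda r: (-r[1], r[0]))


library = {
    "memo.txt": "Pricing memo: new pricing discounts for volume orders.",
    "invoice.txt": "Invoice 1234 with a note about pricing.",
    "plan.txt": "Engineering plan for the new warehouse.",
}
ranked = search(library, "pricing discounts")
```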

Versioning and Check-In/Check-Out. Documents that require multiple iterations or have complex life cycles need versioning tools. These tools logically link multiple iterations of the same document. Sub-versioning within a document may be required for separate elements of the content. Also, for project-management purposes, you may require some means of auditing and reporting the versioning history of the document. The display of versioned documents requires more sophisticated display techniques to highlight changes (via red-lining, for example). Also, the ability to check in or check out whole or partial documents is invaluable for complex documents. For example, remote users can check out documents for manipulation on their laptop, and then check them back in when they have completed the changes.
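
The check-in/check-out discipline can be sketched in a few lines of Python. The class below is hypothetical and keeps every version in memory; it is meant only to show how a check-out lock and a version history fit together.

```python
class CheckedOutError(Exception):
    pass


class VersionedDocument:
    def __init__(self, name, content):
        self.name = name
        self.versions = [content]       # version 1 is stored at index 0
        self.checked_out_by = None

    def check_out(self, user):
        # Only one user may hold the document at a time.
        if self.checked_out_by is not None:
            raise CheckedOutError(f"{self.name} is checked out by {self.checked_out_by}")
        self.checked_out_by = user
        return self.versions[-1]

    def check_in(self, user, new_content):
        # Only the user holding the document may check in a new version.
        if self.checked_out_by != user:
            raise CheckedOutError(f"{user} does not hold {self.name}")
        self.versions.append(new_content)
        self.checked_out_by = None
        return len(self.versions)       # the new version number

    def history(self):
        return list(enumerate(self.versions, start=1))


doc = VersionedDocument("spec", "Draft A")
text = doc.check_out("ann")
try:
    doc.check_out("bob")                # refused while ann holds the document
    blocked = False
except CheckedOutError:
    blocked = True
new_version = doc.check_in("ann", text + " (revised)")
```

Earlier versions are never overwritten, so the full history remains available for auditing and reporting.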

Annotating and Attaching. It is often helpful to annotate a document or link it to other attachments after it has been scanned into the document manager. For example, a scanned invoice may contain a highlighted message about new pricing discounts, and a manager may wish to attach a note to the document to suggest that the vendor be contacted to discuss the new prices. Annotations may be graphical or textual. Attachments may be notes or other logically linked files (for example, you can attach the image of a check to an image of an invoice).

Accounting and Auditing. Documents that are literally "pieces of work," such as advertising copy or a product design, require accounting and auditing tools. Accounting lets you track and estimate the cost of document access. Auditing provides a means of knowing who accessed the document, when, and for what, in order to track the document.

Workflow. Many business documents have workflows that depend on the Seybold Group's three Rs of workflow: rules, routes, and roles. Rules let you define how the document is managed within its business process. Routes define where the document is routed in an organization in order to be processed effectively and efficiently. Roles determine who can interact with the document during its workflow, and what authority they have over the document content.
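
The three Rs lend themselves to a simple table-driven sketch. Everything in the Python example below is hypothetical: the role names, the invoice route, and the $500 approval threshold are invented for illustration.

```python
# Roles: who may interact with a document and what authority they have.
ROLES = {
    "clerk": {"view", "annotate"},
    "manager": {"view", "annotate", "approve"},
}

# Routes: the ordered stops a document type visits in the organization.
ROUTES = {
    "invoice": ["clerk", "manager"],
}


def apply_rules(doc_type, amount):
    """Rules: adjust the route according to the document's content."""
    route = list(ROUTES[doc_type])
    if doc_type == "invoice" and amount < 500:
        route.remove("manager")     # hypothetical rule: small invoices skip approval
    return route


def can(role, action):
    # Check a role's authority before allowing an action on a document.
    return action in ROLES.get(role, set())


small = apply_rules("invoice", 100)
large = apply_rules("invoice", 5000)
```

Keeping rules, routes, and roles in separate structures means each can be changed without touching the other two.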

Archival. Once a document has completed its life cycle, the document manager should include tools for deleting documents based on user-defined criteria or archiving documents to off-line storage devices such as optical, tape, or CD-ROM media. Ideally, you should still be able to retrieve even off-line documents from the document manager. Auditors and the IRS have a nasty habit of wanting immediate access to data long since archived.
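
One way to keep archived documents retrievable is to move the file off-line but leave its catalog entry in place, pointing at the storage medium. The Python sketch below assumes a simple dictionary catalog and an age-based criterion; both are illustrative choices, not features of any particular product.

```python
from datetime import date


def archive(catalog, today, max_age_days, medium="optical"):
    """Mark completed documents older than max_age_days as archived.

    The catalog entry is kept, with a pointer to the off-line medium,
    so the document can still be located later.
    """
    moved = []
    for doc_id, entry in catalog.items():
        age = (today - entry["completed"]).days
        if entry["location"] == "online" and age > max_age_days:
            entry["location"] = medium
            moved.append(doc_id)
    return moved


catalog = {
    "INV-1": {"completed": date(1994, 1, 10), "location": "online"},
    "INV-2": {"completed": date(1995, 5, 20), "location": "online"},
}
moved = archive(catalog, today=date(1995, 6, 1), max_age_days=365)
```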

Open Architecture. All documents should be stored in a database and use a data type that is compatible with other front-end tools. In practice, this means storing documents in an RDBMS or ODBMS using some form of BLOB (binary large object) format that can be accessed via SQL, ODBC, or OLE. Documents stored in proprietary formats suffer from the same drawbacks as data that is not open to direct manipulation by other tools.

Application Services

A document manager should be capable of providing a document management layer to other desktop applications such as word processors, spreadsheets, and desktop publishing for document file management. In this role it controls the applications' access to the document library, and it provides document manipulation functions to those applications. Also, the document manager may be attached to the host application's menus or embedded within its screen forms, providing seamless add-in functionality. This lets you retrieve documents from the database directly into the host application workspace.

Now that I have provided a broad functional overview of document management, in the following sections I discuss a few interesting document management solutions available today. I do not discuss some major heavyweights in this market such as Keyfile and Filenet, or offerings from leading office-automation specialists such as Xerox and Wang. Instead, I highlight open, scalable client/server solutions designed to integrate with other client/server business applications.

Watermark's Image-Enabling Software

Watermark Software Inc. basically created the image-enabling marketplace for desktop applications when it released a $149 Discovery Edition of its software in 1993. The initial mechanism for image-enabling a desktop application was Microsoft OLE--essentially making Watermark compatible with any OLE-compliant application. Because an OLE object can be a container of text, graphics, voice, or video, Watermark uses OLE as a window to its own image server, and as a means to launch its own document-management toolset.

Using the new Professional Edition, a Watermark-enabled application simply displays a thumbnail of the linked document in an OLE container located on a form--thereby reducing network traffic for retrieval and memory usage on the front end. Double-clicking on the thumbnail displays the actual image from the image server via a pointer to the image file itself. Users can manipulate the retrieved document directly in the launched Watermark application using tools such as sticky notes, text and voice annotations, redliners, and highlighters.

Watermark Professional Edition uses Windows NT and SQL Server to provide the foundation for the image server. The use of a popular SQL RDBMS back end also lets Watermark exploit document tagging or "relational image-enabling." In this scenario, documents can be tagged simply with a key value from another table, such as a customer ID or invoice ID. This key value is stored along with the document on the image server to allow rapid retrieval of multiple documents, based on these key "tags." Attaching a viewed Watermark image to the current database row displayed by the front end is easy: Simply use a pop-up dialog box to collect the relevant data.
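
The idea of tagging an image with a key value from another table can be sketched with any SQL database. The Python example below uses SQLite as a stand-in for SQL Server. The table names, columns, and file paths are assumptions made for illustration and are not Watermark's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id TEXT PRIMARY KEY, name TEXT);
    -- Each image row carries key values from other tables as its tags.
    CREATE TABLE image_tag (
        image_path  TEXT,
        customer_id TEXT,
        invoice_id  TEXT
    );
    CREATE INDEX image_tag_customer ON image_tag (customer_id);
""")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("C100", "Acme Supply"), ("C200", "Baker Freight")],
)
conn.executemany(
    "INSERT INTO image_tag VALUES (?, ?, ?)",
    [
        ("/images/0001.tif", "C100", "INV-1"),
        ("/images/0002.tif", "C100", "INV-2"),
        ("/images/0003.tif", "C200", "INV-9"),
    ],
)


def images_for_customer(customer_id):
    # Retrieve pointers to every image tagged with this customer's key.
    rows = conn.execute(
        "SELECT image_path FROM image_tag WHERE customer_id = ? ORDER BY image_path",
        (customer_id,),
    )
    return [path for (path,) in rows]


acme_images = images_for_customer("C100")
```

Only the pointer is stored in the tag table; the image itself stays on the image server until a user asks to view it.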

Watermark's technology is popular with business software vendors because it offers a simple and effective way to image-enable applications. Several accounting software vendors, including Great Plains Software Inc. (Fargo, N.D.), Solomon Software (Findlay, Ohio), and Flexi International Software Corp. (Shelton, Conn.) are using the technology to image-enable their financial and distribution suites. The software is already compatible with key groupware infrastructure components, including MAPI, VIM and MHS, Lotus Notes, and Delrina Corp.'s (Toronto) WinFax Pro. Integration kits are available for application development environments such as Microsoft Visual Basic and Powersoft PowerBuilder.

Documentum's Object-Based Document Management

Like Watermark, Documentum Inc. is a relatively new company with a growing reputation in the world of client/server document management. This is partly due to its open and scalable architecture. The Documentum Server runs on a variety of Unix platforms, and the client software is available for Windows, Macintosh, and Motif GUIs. Through DDE and OLE, Documentum can also reach out to other image repositories or RDBMS products from Oracle and Sybase.

Documentum takes an object-oriented approach to document management by using an object-relational architecture as the foundation for its document server. Documents and document objects are stored in an ODBMS, whereas other document data used for document retrieval is stored in relational formats so that documents can be accessed via SQL. Documentum calls the document library "DocBase," and the document manipulation objects "DocObjects."

DocObjects are the key to Documentum's strength. They are reusable and extensible components for managing the interaction with and workflow of a DocBase document. A DocObject, which represents a document, consists of four parts: content, metadata, operations, and relationships. The content is indexed and stored in the ODBMS, while the metadata and the document attributes (or tags) are indexed and stored in a relational format. Documentum provides its own query language, DQL, for bridging the two storage formats to allow complex document queries based on both content and attributes.

Documentum is designed for complex document management in an enterprise environment. Consequently, it offers functionality in three key areas: versioning, replication, and routing. Versioning is controlled via a Dynamic Document Assembly feature that lets you rebuild documents on a "bill of materials" basis, using templates to specify the level of granularity at which the document is assembled. This lets you reassemble a document at a certain point in time or during the workflow process, for example. Replication and routing are managed by the Documentum Relationship Manager combined with user-defined workflow objects or "routers." The Relationship Manager tracks parent/child links for document assembly and manages versioning and the encapsulation of subdocuments within main documents. It also manages routing relationships so that only certain people can see certain parts of, changes to, or notes on a document.
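
Assembling a document on a "bill of materials" basis can be illustrated with a short recursive routine. The Python sketch below is not Documentum's API: the component names, the integer version numbers, and the `assemble` function are hypothetical, and it only shows how parent/child links and versions combine to rebuild a document as it stood at a given point.

```python
# Each component has a list of (version_number, text) pairs.
COMPONENTS = {
    "intro": [(1, "Intro v1"), (3, "Intro v3")],
    "pricing": [(2, "Pricing v2")],
}

# Parent/child links: a parent document lists its subdocuments in order.
STRUCTURE = {"manual": ["intro", "pricing"]}


def version_as_of(name, as_of):
    """Latest text of a component whose version number is <= as_of, or None."""
    candidates = [(v, text) for v, text in COMPONENTS[name] if v <= as_of]
    return max(candidates)[1] if candidates else None


def assemble(name, as_of):
    """Rebuild a document from its parts, depth first, as it stood at as_of."""
    if name in STRUCTURE:
        parts = [assemble(child, as_of) for child in STRUCTURE[name]]
        return [text for part in parts for text in part]
    text = version_as_of(name, as_of)
    return [text] if text is not None else []


early = assemble("manual", as_of=1)
latest = assemble("manual", as_of=3)
```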

Documentum encourages integration with other applications via its System and Workspace APIs and its recently released Quickbuilder screenpainter. The company recently announced the integration of DocBase with Lotus Notes, and released version 2.0 of the core product. This new version adds more workflow capabilities, such as templates for automating complex task-based workflows, electronic sign-off and document distribution via e-mail, and an event-notification system driven by document triggers.

PC DOCS Open

PC DOCS (Document Organization and Control System) Open, from PC DOCS Inc., is a document management package originating from the automation of a major law firm in 1988. It offers a smorgasbord of document management features that cover the essentials for most document-oriented businesses. Originally based on a Novell NetWare/Btrieve platform, PC DOCS Open is a rewritten, client/server version of the original PC DOCS product.

PC DOCS Open's Document Management System (DMS) is based on document libraries that can reside in a logical multiserver architecture. Document Servers provide the file services, Library Servers store document data in an RDBMS based on Oracle or Sybase engines, and Index Servers can optionally store a real-time index of text in each document. Each document is described through a profile that includes: attributes (or tags), versions, attachments, document subcomponents, and audit trail history. PC DOCS supports Apple Macintosh, Windows, and DOS clients, and it can run remotely on laptops using the Watcom SQL RDBMS for local storage.

PC DOCS integrates with existing document authoring and editing tools by replacing the File Open and Save commands with dialogs from its own DMS. It also adds functionality in other areas, such as allowing the host application to access and use documents stored in PC DOCS libraries as the basis for a mail merge. Integration is available currently for a range of word processors, spreadsheets, messaging systems, and groupware (such as Lotus Notes, where the integration is particularly comprehensive). PC DOCS (like Watermark) supports the Open Document Management API (ODMA) for standardizing integration between a DMS and other desktop tools. C, C++, and Visual Basic developers can also use ODMA, which now supports OLE 2.0 and provides accessibility from any OLE-enabled tool such as PowerBuilder.

PC DOCS also manages document input and replication. The Workgroup Imaging product bundles the Watermark software for image-enabling. And by storing the application information with an individual document, PC DOCS can launch documents into their original authoring tool on the desktop. A document viewer can display up to 175 different document formats. Through PC DOCS' interchange agent software, PC DOCS can replicate new and edited documents to other PC DOCS or third-party servers. Finally, with PC DOCS Mobile, you can automatically check in or check out documents when a laptop is connected to the network, to enable secure and automated off-line document management.

Excalibur EFS

Excalibur Technologies' EFS document management and information retrieval software incorporates the text-retrieval capabilities of Excalibur's XRS adaptive pattern-recognition processing engine. EFS can manage the entire life cycle of electronic documents, including scanning the original paper copy. The XRS toolkit provides high-speed retrieval of digital information, including image, sound, video, and signal data--in addition to text. Excalibur's pattern-based retrieval technology allows for the search and retrieval of any type of digital information, including documents that incorporate images, signatures, voice annotation, and other components.

Excalibur EFS supports IBM RS/6000, Sun SPARC workstations, HP 9000, and Digital platforms. Clients can run under Windows, Macintosh, and X-terminals.

Challenges Ahead

The products I described are just four of the many document management products available, and I have only scratched the surface of their features. As in any other software niche, there remain a number of functional challenges for document management vendors.

Despite these challenges, today's document management systems offer measurable benefits to most corporations, and provide a catalyst for thorough business reengineering projects in paper-driven organizations.


Stewart McKie is principal of Pinpoint Inc., a financial software consulting firm based in Redmond, Washington. He also edits the CFO/Info newsletter.


TABLE 1. A list of document management vendors and their products

CATEGORY | VENDOR | PRODUCT | DESCRIPTION
Imaging | Compulink Management Center Inc., Torrance, Calif., 310-212-5465 | LaserFiche Executive | Integrated document and text management; provides scanner interface
Imaging | Diamond Head Software, Honolulu, Hawaii, 808-545-2377 | ImageBasic | Document imaging development toolkit for Visual Basic
Imaging | Visioneer Information, Palo Alto, Calif., 800-787-7007 | PaperPort | Complete scanning hardware and imaging software solution
Imaging | Wang Laboratories Inc., Lowell, Mass., 508-459-5000 | Open/Image | Text/image management product line
Imaging | Watermark Software Inc., Burlington, Mass., 617-229-2600 | Watermark | Windows-based application that integrates scanned paper documents with e-mail and databases
Document Management | PC DOCS Inc., Tallahassee, Fla., 904-942-3627 | PC DOCS | Lets users control DOS-based application files
Document Management | Documentum Inc., Pleasanton, Calif., 510-463-6800 | Documentum | Client/server object-oriented document management system
Document Management | Excalibur Technologies, San Diego, Calif., 619-625-7900 | Excalibur EFS | Document management with image, sound, text, and signal data retrieval
Document Management | Fulcrum Technologies Inc., Ottawa, Ontario, Canada, 613-238-1761 | SearchServer | Client/server text-retrieval engine
Document Management | Keyfile Corp., Nashua, N.H., 603-883-3800 | Document Server | Integrated document management system
Document Management | Saros Corp., Bellevue, Wash., 206-646-1066 | Mezzanine | Platform for integrated PC LAN-based solutions, including document management
Document Management | XSoft, Palo Alto, Calif., 415-424-0111 | Visual Recall | Document management system
Workflow | Action Technologies Inc., Alameda, Calif., 510-521-6190 | Action Workflow DocRoute | Lets users workflow-enable document management systems
Workflow | Highland Technologies Inc., Greenbelt, Md., 301-345-8200 | Highview 2.4 | Imaging and workflow development environment
Workflow | Reach Software Corp., Sunnyvale, Calif., 408-733-8685 | WorkMan | E-mail-enabled workflow management application
Workflow | Timeline Inc., Bellevue, Wash., 206-822-3140 | WinWork | A work process development toolkit
Workflow | UES-KIC, Dublin, Ohio, 614-792-9993 | KI Shell | Database-enabled cross-platform workflow management software
Workflow | XSoft, Palo Alto, Calif., 415-424-0111 | InConcert | A client/server workflow development framework



DBMS and Internet Systems (http://www.dbmsmag.com)
Copyright © 1995 Miller Freeman, Inc. ALL RIGHTS RESERVED
Redistribution without permission is prohibited.