Here's a look at the latest techniques for managing documents in a client/server environment.
It is a sure bet that any business in operation today can benefit from document management--even those with the most advanced computing and communications infrastructures. I would bet money on it because the term "document management" is a misnomer, a term of convenience, a link with the past. You should really think of document management as "electronic transaction management." In fact, the goal of document management software is to eliminate the need for a tangible document to exist at all, by providing the tools for managing structured digital information. Once information is digitized, it can be manipulated in a variety of useful ways.
In this article I outline the basic principles of document management technology, and then discuss the key types of business documents and the ways in which current software tools can manage them. I also review how certain client/server-based products are providing a range of document management facilities. Finally, I provide a checklist of vendors and products in this marketplace to highlight some key players. (See Table 1.)
Another challenge is categorizing documents into types in order to determine the characteristics and behavior of a document and the tools required to manage it. I discuss the following four document types: request, informational, living, and business-critical.
Businesses receive requests for information every day. The documents that these requests generate must be handled by specific people, are subject to rules and response times, and require some form of response. Because these documents are voluminous, regular archiving is important.
Businesses are run on informational documents: internal memos, reports, charts, plans, specifications, and so on. These documents demand circulation lists, and they may allow others to annotate or attach to them. In addition, they may require multiple versions to track their evolution. They are subject to frequent recall during their short life cycle, and, as with request documents, the large volume makes archiving important.
A living document is a specialized variant of the informational document, usually in the form of a plan, engineering diagram, architectural drawing, or work of digital art. For these documents, it's important to see subsections, view layers of detail, track multiple versions, limit access, and audit the interaction with the document.
Business-critical documents (such as invoices, checks, orders, and bills of lading) are specific to a business or department, and they initiate an important business process (usually involving money). These documents require detailed analysis to identify who originated them, what they are for, and how they should be handled. They may also require complex routing and authorizations depending on their content. Also, their workflows must be audited and their data must be available for online inquiries.
Tagging. Some systems can scan documents and automatically extract key data from the text. They can do this either by searching for data in certain sections of a document or by reading a bar code. However, most business documents do not use standard formats or provide bar codes. Therefore, a document management system must have some kind of "tagging" facility for new documents. This facility tags a document with key data, including its type, its origin, its key piece of information, its recipient, and the time and place of its arrival. Thus, if a bank receives an overdraft request, it might be tagged with IDs to identify:
Security. Because many documents contain sensitive information, security is a must. Security should include controls within the document manager itself and controls to prevent access from third-party applications. In highly sensitive activities, you may need controls to prevent direct access to document files at the operating system level. Such controls might include encryption and sophisticated compression/decompression algorithms.
Organization. Document managers should be able to provide a logical view of a document library. This makes working with those documents easier and more intuitive for users. In many cases, this means using the familiar concepts of files, folders, and cabinets as a means of graphically displaying document organization. Alternatively, drill-down browsers manage hierarchical and multiversion documents in an effective, visual manner. Document organization also means that storage profiles can be defined to ensure that the system can make organizational decisions without forcing the user to determine where a document should be stored.
Searching. One benefit of electronic documents is that they are easily accessible on a LAN or WAN. Document search and retrieval should allow searches based on file names, logical operators, ranges, and matches to specific keywords. It should also support searches based on content profiles that retrieve documents according to relevancy to the search criteria, and then rank the results based on the relevancy or priority of the document content.
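To make the idea of relevancy ranking concrete, here is a minimal Python sketch. It assumes a deliberately naive scoring rule (the number of times the query keywords occur in the text), and the names `Document`, `search`, and `library` are hypothetical; real products use far more sophisticated content profiles.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Document:
    name: str
    text: str


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(docs, query: str, require_all: bool = False) -> list:
    """Return (name, score) pairs, most relevant first.

    The score is the total number of keyword occurrences in the document
    (an assumed, simplistic relevancy measure). require_all=True behaves
    like a logical AND of the keywords; the default behaves like OR.
    """
    terms = set(tokenize(query))
    results = []
    for doc in docs:
        counts = Counter(tokenize(doc.text))
        matched = [t for t in terms if counts[t] > 0]
        if not matched or (require_all and len(matched) < len(terms)):
            continue
        results.append((doc.name, sum(counts[t] for t in matched)))
    # Highest score first; ties are broken alphabetically by name.
    return sorted(results, key=lambda r: (-r[1], r[0]))


# Hypothetical sample library used for illustration only.
library = [
    Document("memo.txt", "Overdraft policy memo: overdraft fees and overdraft limits."),
    Document("invoice.txt", "Invoice for consulting fees."),
    Document("plan.txt", "Project plan and schedule."),
]
```

With this sample library, `search(library, "overdraft fees")` ranks `memo.txt` (score 4) ahead of `invoice.txt` (score 1), and passing `require_all=True` returns only `memo.txt`.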
Versioning and Check-In/Check-Out. Documents that require multiple iterations or have complex life cycles need versioning tools. These tools logically link multiple iterations of the same document. Sub-versioning within a document may be required for separate elements of the content. Also, for project-management purposes, you may require some means of auditing and reporting the versioning history of the document. The display of versioned documents requires more sophisticated display techniques to highlight changes (via red-lining, for example). Also, the ability to check in or check out whole or partial documents is invaluable for complex documents. For example, remote users can check out documents for manipulation on their laptop, and then check them back in when they have completed the changes.
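The check-in/check-out cycle can be sketched in a few lines of Python. This is an illustration under simple assumptions (one exclusive lock per whole document, versions kept in memory); the class and method names are hypothetical and are not taken from any product discussed here.

```python
from dataclasses import dataclass, field
from typing import Optional


class CheckoutError(Exception):
    """Raised when a check-out or check-in violates the lock."""


@dataclass
class VersionedDocument:
    name: str
    versions: list = field(default_factory=list)  # content of each version, oldest first
    history: list = field(default_factory=list)   # (version number, user, comment)
    checked_out_by: Optional[str] = None

    def check_out(self, user: str) -> str:
        """Lock the document for one user and return the latest content."""
        if self.checked_out_by is not None:
            raise CheckoutError(f"{self.name} is checked out by {self.checked_out_by}")
        self.checked_out_by = user
        return self.versions[-1] if self.versions else ""

    def check_in(self, user: str, content: str, comment: str) -> int:
        """Store a new version, record it in the history, and release the lock."""
        if self.checked_out_by != user:
            raise CheckoutError(f"{user} has not checked out {self.name}")
        self.versions.append(content)
        version_number = len(self.versions)
        self.history.append((version_number, user, comment))
        self.checked_out_by = None
        return version_number
```

A remote user would call `check_out` before disconnecting the laptop and `check_in` on return; the `history` list then supplies the audit trail of who created each version and why.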
Annotating and Attaching. It is often helpful to annotate a document or link it to other attachments after it has been scanned into the document manager. For example, a scanned invoice may contain a highlighted message about new pricing discounts, and a manager may wish to attach a note to the document to suggest that the vendor be contacted to discuss the new prices. Annotations may be graphical or textual. Attachments may be notes or other logically linked files (for example, you can attach the image of a check to an image of an invoice).
Accounting and Auditing. Documents that are literally "pieces of work," such as advertising copy or a product design, require accounting and auditing tools. Accounting lets you track and estimate the cost of document access. Auditing records who accessed the document, when, and for what purpose, so that its handling can be traced.
Workflow. Many business documents have workflows that depend on the Seybold Group's three Rs of workflow: rules, routes, and roles. Rules let you define how the document is managed within its business process. Routes define where the document is routed in an organization in order to be processed effectively and efficiently. Roles determine who can interact with the document during its workflow, and what authority they have over the document content.
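One way to picture how rules, routes, and roles fit together is the following Python sketch. Everything in it is assumed for illustration: the role names, the actions, the two routes, and the 10,000 threshold are hypothetical and do not describe any vendor's workflow engine.

```python
# Roles: who may interact with a document and what authority each role has.
ROLES = {
    "clerk": {"review"},
    "manager": {"review", "approve"},
    "controller": {"review", "approve", "release_payment"},
}

# Routes: the ordered stops a document makes, as (role, action) pairs.
ROUTES = {
    "standard": [("clerk", "review"), ("manager", "approve")],
    "high_value": [
        ("clerk", "review"),
        ("manager", "approve"),
        ("controller", "release_payment"),
    ],
}


def choose_route(doc: dict) -> str:
    """Rule: pick a route from the document's content (assumed threshold)."""
    return "high_value" if doc["amount"] >= 10_000 else "standard"


def run_workflow(doc: dict) -> list:
    """Walk the chosen route, checking each role's authority, and return the trail."""
    trail = []
    for role, action in ROUTES[choose_route(doc)]:
        if action not in ROLES[role]:
            raise PermissionError(f"{role} may not perform {action}")
        trail.append(f"{role}:{action}")
    return trail
```

Here a 500 invoice follows the two-step standard route, while a 50,000 invoice also passes through the controller. Keeping the three tables separate means a rule can change without touching the routes or the role definitions.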
Archival. Once a document has completed its life cycle, the document manager should include tools for deleting documents based on user-defined criteria or archiving documents to off-line storage devices such as optical, tape, or CD-ROM media. Ideally, you should still be able to retrieve even off-line documents from the document manager. Auditors and the IRS have a nasty habit of wanting immediate access to data long since archived.
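User-defined archival criteria could be expressed roughly as in the Python sketch below. The retention periods, document types, and the `POLICY` table are assumptions made up for the example, as are the names `DocRecord`, `disposition`, and `sweep`.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DocRecord:
    name: str
    doc_type: str
    closed_on: date  # date the document completed its life cycle


# Hypothetical policy: action and retention period (in days) per document type.
POLICY = {
    "request": ("delete", 90),
    "invoice": ("archive", 365),
}


def disposition(doc: DocRecord, today: date) -> str:
    """Return 'delete', 'archive', or 'keep' for one document."""
    action, days = POLICY.get(doc.doc_type, ("keep", 0))
    if action != "keep" and today - doc.closed_on >= timedelta(days=days):
        return action
    return "keep"


def sweep(docs, today: date) -> dict:
    """Group document names by the action the policy assigns to them."""
    result = {"delete": [], "archive": [], "keep": []}
    for doc in docs:
        result[disposition(doc, today)].append(doc.name)
    return result
```

A real system would also keep a catalog entry for every archived document so that it can still be located on off-line media when an auditor asks for it.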
Open Architecture. All documents should be stored in a database and use a data type that is compatible with other front-end tools. In practice, this means storing documents in an RDBMS or ODBMS using some form of BLOB (binary large object) format that can be accessed via SQL, ODBC, or OLE. Documents stored in proprietary formats suffer from the same drawbacks as data that is not open to direct manipulation by other tools.
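As a small illustration of BLOB storage that remains open to SQL access, the following Python sketch uses the standard `sqlite3` module as a stand-in for a server RDBMS. The table layout and the function names are assumptions chosen for the example.

```python
import sqlite3

# An in-memory SQLite database stands in for the RDBMS back end.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE documents (
        doc_id    INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        mime_type TEXT NOT NULL,
        content   BLOB NOT NULL
    )
    """
)


def store_document(conn, name: str, mime_type: str, content: bytes) -> int:
    """Insert the raw bytes as a BLOB and return the new document ID."""
    cur = conn.execute(
        "INSERT INTO documents (name, mime_type, content) VALUES (?, ?, ?)",
        (name, mime_type, content),
    )
    conn.commit()
    return cur.lastrowid


def load_document(conn, doc_id: int) -> bytes:
    """Fetch the BLOB for one document, or raise KeyError if it is missing."""
    row = conn.execute(
        "SELECT content FROM documents WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    if row is None:
        raise KeyError(doc_id)
    return row[0]
```

Because the bytes sit in an ordinary table, any tool that can issue a `SELECT` against the database can retrieve the document without going through the document manager's own file format.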
Now that I have provided a broad functional overview of document management, in the following sections I discuss a few interesting document management solutions available today. I do not discuss some major heavyweights in this market such as Keyfile and Filenet, or offerings from leading office-automation specialists such as Xerox and Wang. Instead, I highlight open, scalable client/server solutions designed to integrate with other client/server business applications.
Using the new Professional Edition, a Watermark-enabled application simply displays a thumbnail of the linked document in an OLE container located on a form--thereby reducing network traffic for retrieval and memory usage on the front end. Double-clicking on the thumbnail displays the actual image from the image server via a pointer to the image file itself. Users can manipulate the retrieved document directly in the launched Watermark application using tools such as sticky notes, text and voice annotations, redliners, and highlighters.
Watermark Professional Edition uses Windows NT and SQL Server to provide the foundation for the image server. The use of a popular SQL RDBMS back end also lets Watermark exploit document tagging or "relational image-enabling." In this scenario, documents can be tagged simply with a key value from another table, such as a customer ID or invoice ID. This key value is stored along with the document on the image server to allow rapid retrieval of multiple documents, based on these key "tags." Attaching a viewed Watermark image to the current database row displayed by the front end is easy: Simply use a pop-up dialog box to collect the relevant data.
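The general idea of tagging images with key values from other tables can be sketched as follows. This is not Watermark's actual schema, which the article does not describe; the `image_tags` table, its columns, and the helper functions are hypothetical, and SQLite stands in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Each row links one stored image to one business key (assumed layout).
    CREATE TABLE image_tags (
        image_path TEXT NOT NULL,
        key_name   TEXT NOT NULL,   -- for example 'customer_id' or 'invoice_id'
        key_value  TEXT NOT NULL
    );
    CREATE INDEX idx_image_tags ON image_tags (key_name, key_value);
    """
)


def tag_image(conn, image_path: str, key_name: str, key_value: str) -> None:
    """Attach a key value from another table to a stored image."""
    conn.execute(
        "INSERT INTO image_tags (image_path, key_name, key_value) VALUES (?, ?, ?)",
        (image_path, key_name, key_value),
    )


def images_for(conn, key_name: str, key_value: str) -> list:
    """Return the paths of all images tagged with the given key, sorted by path."""
    rows = conn.execute(
        "SELECT image_path FROM image_tags "
        "WHERE key_name = ? AND key_value = ? ORDER BY image_path",
        (key_name, key_value),
    )
    return [row[0] for row in rows]
```

A front end displaying customer C100 would call `images_for(conn, "customer_id", "C100")` to list every scanned document linked to that row.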
Watermark's technology is popular with business software vendors because it offers a simple and effective way to image-enable applications. Several accounting software vendors, including Great Plains Software Inc. (Fargo, N.D.), Solomon Software (Findlay, Ohio), and Flexi International Software Corp. (Shelton, Conn.) are using the technology to image-enable their financial and distribution suites. The software is already compatible with key groupware infrastructure components, including MAPI, VIM and MHS, Lotus Notes, and Delrina Corp.'s (Toronto) WinFax Pro. Integration kits are available for application development environments such as Microsoft Visual Basic and Powersoft PowerBuilder.
Documentum takes an object-oriented approach to document management by using an object-relational architecture as the foundation for its document server. Documents and document objects are stored in an ODBMS, whereas other document data used for document retrieval is stored in relational formats so that documents can be accessed via SQL. Documentum calls the document library "DocBase," and the document manipulation objects "DocObjects."
DocObjects are the key to Documentum's strength. They are reusable and extensible components for managing the interaction with and workflow of a DocBase document. A DocObject, which represents a document, consists of four parts: content, metadata, operations, and relationships. The content is indexed and stored in the ODBMS, while the metadata and the document attributes (or tags) are indexed and stored in a relational format. Documentum provides its own query language, DQL, for bridging the two storage formats to allow complex document queries based on both content and attributes.
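The four-part structure can be pictured with a small Python model. This is purely a conceptual sketch: it is not Documentum's API and the `find` helper is not DQL. The class name `DocObjectSketch`, its fields, and `find` are hypothetical names chosen to mirror the four parts named above.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DocObjectSketch:
    content: str                                       # the document body
    metadata: dict                                     # attributes, or tags
    operations: dict = field(default_factory=dict)     # name -> callable taking the object
    relationships: dict = field(default_factory=dict)  # relation name -> list of objects

    def perform(self, operation: str):
        """Run a named operation against this object."""
        func: Callable = self.operations[operation]
        return func(self)


def find(objects, contains: str, **attrs) -> list:
    """Select objects by content AND attributes, the way a combined query would."""
    return [
        obj
        for obj in objects
        if contains.lower() in obj.content.lower()
        and all(obj.metadata.get(key) == value for key, value in attrs.items())
    ]
```

For example, `find(objects, "pump", status="approved")` returns only those objects whose text mentions "pump" and whose `status` attribute is `approved`, which is the kind of mixed content-and-attribute query the paragraph above describes.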
Documentum is designed for complex document management in an enterprise environment. Consequently, it offers functionality in three key areas: versioning, replication, and routing. Versioning is controlled via a Dynamic Document Assembly feature that lets you rebuild documents on a "bill of materials" basis, using templates to specify the level of granularity at which the document is assembled. This lets you reassemble a document at a certain point in time or during the workflow process, for example. Replication and routing are managed by the Documentum Relationship Manager combined with user-defined workflow objects or "routers." The Relationship Manager tracks parent/child links for document assembly and manages versioning and the encapsulation of subdocuments within main documents. It also manages routing relationships so that only certain people can see certain parts of, changes to, or notes on a document.
Documentum encourages integration with other applications via its System and Workspace APIs and its recently released Quickbuilder screenpainter. The company recently announced the integration of DocBase with Lotus Notes, and released version 2.0 of the core product. This new version adds more workflow capabilities, such as templates for automating complex task-based workflows, electronic sign-off and document distribution via e-mail, and an event-notification system driven by document triggers.
PC DOCS Open's Document Management System (DMS) is based on document libraries that can reside in a logical multiserver architecture. Document Servers provide the file services, Library Servers store document data in an RDBMS based on Oracle or Sybase engines, and Index Servers can optionally store a real-time index of text in each document. Each document is described through a profile that includes attributes (or tags), versions, attachments, document subcomponents, and audit trail history. PC DOCS supports Apple Macintosh, Windows, and DOS clients, and it can run remotely on laptops using the Watcom SQL RDBMS for local storage.
PC DOCS integrates with existing document authoring and editing tools by replacing the File Open and Save commands with dialogs from its own DMS. It also adds functionality in other areas, such as allowing the host application to access and use documents stored in PC DOCS libraries as the basis for a mail merge. Integration is available currently for a range of word processors, spreadsheets, messaging systems, and groupware (such as Lotus Notes, where the integration is particularly comprehensive). PC DOCS (like Watermark) supports the Open Document Management API (ODMA) for standardizing integration between a DMS and other desktop tools. C, C++, and Visual Basic developers can also use ODMA, which now supports OLE 2.0 and provides accessibility from any OLE-enabled tool such as PowerBuilder.
PC DOCS also manages document input and replication. The Workgroup Imaging product bundles the Watermark software for image-enabling. And by storing the application information with an individual document, PC DOCS can launch documents into their original authoring tool on the desktop. A document viewer can display up to 175 different document formats. Through PC DOCS' interchange agent software, PC DOCS can replicate new and edited documents to other PC DOCS or third-party servers. Finally, with PC DOCS Mobile, you can automatically check in or check out documents when a laptop is connected to the network, to enable secure and automated off-line document management.
Excalibur EFS supports IBM RS/6000, Sun SPARC workstations, HP 9000, and Digital platforms. Clients can run under Windows, Macintosh, and X-terminals.
TABLE 1. A list of document management vendors and their products
| CATEGORY | VENDOR | PRODUCT | DESCRIPTION |
|---|---|---|---|
| Imaging | Compulink Management Center Inc. Torrance, Calif. 310-212-5465 | LaserFiche Executive | Integrated document and text management; provides scanner interface |
| Imaging | Diamond Head Software Honolulu, Hawaii 808-545-2377 | ImageBasic | Document imaging development toolkit for Visual Basic |
| Imaging | Visioneer Information Palo Alto, Calif. 800-787-7007 | PaperPort | Complete scanning hardware and imaging software solution |
| Imaging | Wang Laboratories Inc. Lowell, Mass. 508-459-5000 | Open/Image | Text/image management product line |
| Imaging | Watermark Software Inc. Burlington, Mass. 617-229-2600 | Watermark | Windows-based application that integrates scanned paper documents with e-mail and databases |
| Document Management | PC DOCS Inc. Tallahassee, Fla. 904-942-3627 | PC DOCS | Lets users control DOS-based application files |
| Document Management | Documentum Inc. Pleasanton, Calif. 510-463-6800 | Documentum | Client/server object-oriented document management system |
| Document Management | Excalibur Technologies San Diego, Calif. 619-625-7900 | Excalibur EFS | Document management with image, sound, text, and signal data retrieval |
| Document Management | Fulcrum Technologies Inc. Ottawa, Ontario, Canada 613-238-1761 | SearchServer | Client/server text-retrieval engine |
| Document Management | Keyfile Corp. Nashua, N.H. 603-883-3800 | Document Server | Integrated document management system |
| Document Management | Saros Corp. Bellevue, Wash. 206-646-1066 | Mezzanine | Platform for integrated PC LAN-based solutions, including document management |
| Document Management | XSoft Palo Alto, Calif. 415-424-0111 | Visual Recall | Document management system |
| Workflow | Action Technologies Inc. Alameda, Calif. 510-521-6190 | Action Workflow DocRoute | Lets users workflow-enable document management systems |
| Workflow | Highland Technologies Inc. Greenbelt, Md. 301-345-8200 | Highview 2.4 | Imaging and workflow development environment |
| Workflow | Reach Software Corp. Sunnyvale, Calif. 408-733-8685 | WorkMan | E-mail-enabled workflow management application |
| Workflow | Timeline Inc. Bellevue, Wash. 206-822-3140 | WinWork | A work process development toolkit |
| Workflow | UES-KIC Dublin, Ohio 614-792-9993 | KI Shell | Database-enabled cross-platform workflow management software |
| Workflow | XSoft Palo Alto, Calif. 415-424-0111 | InConcert | A client/server workflow development framework |