I remember a Gary Larson cartoon that had a cavalry scout with his ear to the ground listening for approaching Indian horses. Of course, the Indians were about 50 feet away, coming down the canyon at full gallop. I've been in that position all too often in the computer industry, and nowhere more painfully than with the Internet. It's always at full gallop, and it always runs right over my head.
The text-based academic Internet (FTP, Gopher, and Archie) was a long-time friend. Then rather suddenly, along came this "graphical Internet" called the World Wide Web, based on pages created with some half-baked markup language called HTML (hypertext markup language). HTML was supposed to be a subset of standard generalized markup language (SGML). I wasn't enamored of SGML (having encountered it in the automobile industry and found it neither simple nor very standard), and I didn't expect to go gaga over HTML.
I'm still not crazy about HTML, but that's beside the point. Like a blunt-nosed scow, I finally came about and faced the mainstream. I needed to build Web sites, and for that I had to have more than a casual acquaintance with HTML. Having accomplished this awkward maneuver, I was content to piece together my Web pages, with a text editor if necessary. Then Java happened.
Java didn't happen because of the Web or because HTML left a lot to be desired for page layout, user-interface design, and data management. However, Java and its sibling, JavaScript, appeared to address many of HTML's shortcomings. Of course, Java is a full-blown (if not exactly hardened) programming language, and in order to add its benefits to my Web pages I had to take the big plunge. Learn it or leave it.
So I learned Java, just in time to discover that programs such as Jamba made it possible to add wonderful Java applets to Web pages without knowing much about Java. Oh well. I've since made half a career out of knowing something about Java, and it turns out Java is a lot more than an adjunct to Web pages. So I was content to mix HTML with Java applets and the occasional CGI or JavaScript program.
By now you probably see the pattern to this personal sagalet; perhaps it has similarities to your own experience. Building Web sites and Web applications is a restless occupation in more than one sense. It's not that I've been blindsided, just professionally ambushed. I was aware of shortcomings in the tools and languages, but I was often too busy to see how a growth-crazy software industry was trying to solve my barely perceived problems. Then more or less abruptly, I'd be faced with something new, improved, better, and of course, absolutely essential.
Another example: As I was staring at the first announcements about dynamic HTML (DHTML; see "Data Binding in Dynamic HTML" by Rick Dobson, DBMS, March 1998), roughly 18 months ago, I felt a strange but familiar twinge along my medulla oblongata. One part of me whispered, "Oh hell, don't pay any attention. Let Microsoft and Netscape fight over it, and when they get done chewing it up, if there's anything left, maybe you'll get interested." Another voice, less clear but persistent, whispered, "Be prepared." This Boy Scout in residence is the one that sent me into analytical action. In the case of DHTML, I decided to do the prudent thing and have a good look at it, even before support in the browsers was available. I'm glad I did.
But this column isn't about DHTML, which is already a very real enhancement to HTML, although Microsoft and Netscape are indeed continuing to chew on it. I'm here to write about the Next Big Thing. Just as DHTML addresses the obtuseness of the object model (or lack of such) in HTML and opens the doors to some real event handling, extensible markup language (XML) is going to open new doors to document structure and data management in HTML. In fact, it might do a lot more than that.
If you haven't immersed yourself in XML yet, then it's your turn to listen for the sound of hoofbeats a few yards away. On February 10, the World Wide Web Consortium (W3C) officially recommended XML, one of those rare instances where the acceptance of a standard signals the beginning of a race to implement and market the technology successfully.
XML is based on SGML, which means that XML has more to do with documents and data exchange than it does with the formatting concerns of HTML and DHTML. However, you'll also read that XML is supposed to extend the capabilities of HTML, which is true because XML lives within the HTML framework of the Web page. But hold on there: this is beginning to sound like a white paper. I'd better get off my high horse and get real. Let's start with some things HTML can't do that XML can.
The other day, I was cobbling together a demonstration Web site for an online store. Rather than using one of the storefront programs, I was doing this the old-fashioned way (with an HTML editing tool) and timing how long it took. Of course, I already knew that HTML couldn't handle the database connections by itself, so I was prepared to make CGI calls to do that. I also wanted to associate JavaScript with certain elements of pages, which meant coding in each page where the elements occurred. Once a page is loaded, HTML doesn't know subheadings from Shinola.
Things are different in a page with XML. Here's a very simple example:
<?xml version="1.0"?>
<ARTICLE>
  <TITLE>Net Developer</TITLE>
  <AUTHOR>
    <LNAME>King</LNAME>
    <FNAME>Nelson</FNAME>
    <EMAIL>nhking@winternet.com</EMAIL>
  </AUTHOR>
  <DATEPUB>05-01-98</DATEPUB>
  <DESCRIPTION>XML is the successor to HTML</DESCRIPTION>
</ARTICLE>
A quick glance at this sample will tell you that XML is familiar but has two big differences from HTML. XML deals with the structure of a document (for example, <ARTICLE> and the nested <AUTHOR>), and it can label data precisely (for example, <LNAME> or <DATEPUB>). The ability to represent structural complexity with tags is crucial because that's what makes XML object-oriented. To XML, every document (such as a series of Web pages) is an object, and every element in the document is also an object. These objects carry their own description (attributes or properties) and their own methods (such as links to scripting). Because XML is extensible, you can create tags and attributes that identify the structural elements of any document.
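To make the "every element is an object" idea a little more concrete, here is a minimal sketch that reads the sample document with the parser in Python's standard library and walks its nested elements. The choice of Python and the helper name outline are mine, for illustration only; the XML declaration is written in lowercase, as the XML 1.0 Recommendation requires.

```python
import xml.etree.ElementTree as ET

# The sample article document (declaration in lowercase, per XML 1.0).
ARTICLE_XML = """<?xml version="1.0"?>
<ARTICLE>
  <TITLE>Net Developer</TITLE>
  <AUTHOR>
    <LNAME>King</LNAME>
    <FNAME>Nelson</FNAME>
    <EMAIL>nhking@winternet.com</EMAIL>
  </AUTHOR>
  <DATEPUB>05-01-98</DATEPUB>
  <DESCRIPTION>XML is the successor to HTML</DESCRIPTION>
</ARTICLE>"""


def outline(element, depth=0):
    """Return (depth, tag, text) rows for an element and all its descendants.

    Hypothetical helper, not part of any XML toolkit.
    """
    text = (element.text or "").strip()
    rows = [(depth, element.tag, text)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(ARTICLE_XML)

# Print the document structure, indenting each level of nesting.
for depth, tag, text in outline(root):
    print("  " * depth + tag + (": " + text if text else ""))

# Because the data is labeled, a program can ask for it by name.
last_name = root.findtext("AUTHOR/LNAME")
```

The point of the sketch is only that structure and labels survive parsing: the program can tell that <LNAME> sits inside <AUTHOR> and can fetch it by path, which plain HTML markup does not allow.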
A key piece of an XML implementation is the development of a parser that can read and make sense of these XML tags. The desire to have a parser that wouldn't get bogged down in details was actually one of the reasons we have XML. The developers of XML started with proponents of HTML (simple but limited) on one end and SGML (powerful but complex) on the other. It quickly became apparent that SGML was too "heavy" (both in complexity and overhead) for efficient transmission over the Internet, and HTML was too "thin" in capability. XML is a compromise that can deliver about 80 percent of SGML's capability with only 20 percent of its complexity.
Those of you who are critically inclined may already have a question: How does the parser make an interpretation of a tag such as <FNAME>? Is this some kind of universal tag for "first name"? No, there's no way custom XML tags can mean the same thing anywhere and everywhere. There are two basic options: Use a set of custom tags that are known only within a particular application (your phone app knows all its own tags), or use a Document Type Definition (DTD).
While a DTD is not required for XML documents, using one has two important effects. An application can validate itself for the proper use of tags (sort of like a built-in syntax checker), and other applications can learn how to use the tags. The DTD section of an XML document will contain definitions of objects, their attributes, and their relationships. For example, a DTD might define <PHONENUMBER> as an object that must contain an <AREACODE> element but may optionally have an <EXTENSION>. As you can probably see, these are the familiar relationships for field validations.
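As a rough sketch of what that kind of rule looks like, the block below shows one possible DTD fragment for the phone number example and a hand-written check that mimics it. Several things here are assumptions of mine: the <NUMBER> element, the function name check_phone, and the decision to check the rules by hand. Python's standard library parser does not validate against DTDs, so this is a stand-in for a validating parser, not a real one, and it ignores element order.

```python
import xml.etree.ElementTree as ET

# Illustrative DTD fragment, shown for reference only (it is not parsed below).
# AREACODE is required, EXTENSION is optional; NUMBER is a hypothetical
# element added so the example resembles a complete phone number.
PHONE_DTD = """
<!ELEMENT PHONENUMBER (AREACODE, NUMBER, EXTENSION?)>
<!ELEMENT AREACODE (#PCDATA)>
<!ELEMENT NUMBER (#PCDATA)>
<!ELEMENT EXTENSION (#PCDATA)>
"""

REQUIRED = ("AREACODE", "NUMBER")   # must appear exactly once
OPTIONAL = ("EXTENSION",)           # may appear zero or one time


def check_phone(xml_text):
    """Return a list of rule violations; an empty list means the element passes.

    A hand-rolled approximation of the content model above.
    It does not check the order of child elements.
    """
    element = ET.fromstring(xml_text)
    if element.tag != "PHONENUMBER":
        return ["root element is %s, expected PHONENUMBER" % element.tag]

    children = [child.tag for child in element]
    problems = []
    for tag in REQUIRED:
        if children.count(tag) != 1:
            problems.append("%s must appear exactly once" % tag)
    for tag in OPTIONAL:
        if children.count(tag) > 1:
            problems.append("%s may appear at most once" % tag)
    for tag in children:
        if tag not in REQUIRED + OPTIONAL:
            problems.append("unexpected element %s" % tag)
    return problems


good = "<PHONENUMBER><AREACODE>612</AREACODE><NUMBER>5550100</NUMBER></PHONENUMBER>"
bad = "<PHONENUMBER><NUMBER>5550100</NUMBER></PHONENUMBER>"
print(check_phone(good))
print(check_phone(bad))
```

With a validating parser, the DTD itself would drive these checks, and any other application holding the same DTD could apply the same rules.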
Once these elements are defined in a DTD and you've placed the identified tags within a document, then you can associate style sheets and scripts with any of the elements. This process vastly increases their power. Scripts, for example, cannot by themselves reference a document's structure (such as a specific table or paragraph). With XML you can communicate to a script the name and properties of a structural element and have it act upon it. You could have an <EMAIL> tag with an associated script that generates the link, instead of entering a mailto: link by hand for every email address.
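The <EMAIL> idea can be sketched in a few lines. The example below is only an illustration, under the assumption that the "script" is a small Python function (email_links is a hypothetical name) that finds every <EMAIL> element in a document and produces the corresponding HTML mailto: anchor.

```python
import xml.etree.ElementTree as ET
from html import escape
from urllib.parse import quote


def email_links(xml_text):
    """Return one HTML mailto anchor for each non-empty EMAIL element."""
    root = ET.fromstring(xml_text)
    links = []
    for element in root.iter("EMAIL"):
        address = (element.text or "").strip()
        if address:
            # Percent-encode the address for the URL, escape it for display.
            links.append(
                '<a href="mailto:%s">%s</a>'
                % (quote(address, safe="@"), escape(address))
            )
    return links


author = "<AUTHOR><LNAME>King</LNAME><EMAIL>nhking@winternet.com</EMAIL></AUTHOR>"
for link in email_links(author):
    print(link)
```

The function never needs to know where in the document the addresses occur; the tag name alone is enough to find them.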
As far as I can tell (that is, in the absence of many fully completed XML products), the true power of XML is in its ability for self-definition, or, to use the 64-bit word, to provide metadata. An XML document can, for example, operate independently of a server. Once an XML document has been downloaded, it may contain enough instructions and even data to be run completely from the client-side browser. The document can be run by any XML-enabled browser and by any application prepared to deal with the content and structure defined by a DTD.
Making Something with XML
So what does XML give us? The list, buckaroos, is endless. Here are a few illustrative areas of application:
<JAVA="PROGRAMMING LANGUAGE"> and not an island or a beverage, will enable much more precise and faster retrieval of information.
Do you smell opportunity here? You're not alone. Everybody is dreaming up ways to use XML. IBM has set a small army of people to work on an XML extender for DB2, which will act as a repository for XML data. IBM is also working on an XML parser written in Java that will be used within applications. IBM's efforts reflect how XML is ultimately a database technology, perhaps one of the most important in years.
Among the most enthusiastic XML supporters I've met are the folks at Poet Software. Their principal product, the Poet Server, an object database management system (ODBMS), is being tailored as an XML Repository. Their argument, and it's a good one, is that, in the object-oriented world of Java and XML, it makes sense to use an object-oriented database system. Storing XML structures and data in an ODBMS will seem far more natural because the relationships between objects are retained in the database schema. I don't think this naturalness is possible with a relational database system, at least not with much efficiency; but as IBM is trying to show with DB2 (an RDBMS), it may still work.
In the last few months, I've seen some XML products, but I know dozens are coming. One group to watch closely will be the tool makers developing various XML utilities and programming environments. While I have yet to see something from the major language vendors (for example, Borland, Sybase, or Microsoft), companies such as DataChannel will soon release final versions of XML toolkits. The DataChannel XML Development Environment contains an XML parser, XML Viewer, and XML Server. This seems to be a typical mixture.
I expect by the time you read this that Sun, Microsoft, and several Java vendors will also be introducing XML development kits. This market could become fiercely competitive, with companies adding some unusual spins to their products to distinguish themselves. I think this will be particularly true for companies such as Microsoft, Perspecta, and webMethods, which are attempting to set standards for specific DTDs. I've found that there is already a good deal of overlap in some of the proposals.
In fact, if there is a bug in the soup for XML, it will be the need to negotiate standard DTDs for a majority of the significant applications. I say "negotiate" because one company or another is proposing most of the standards, and typically, they are not met with universal acceptance. A good example is Microsoft's championing of the Channel Definition Format (CDF). Currently, push platforms can't share information with users. The CDF, an XML application, specifies channels, the information carried by each channel, the schedule of updates, and other information that makes it possible to share push data. This is good for the consumer, but it has run into opposition and studied indifference from some companies (such as Netscape). While some industries will be able to adopt XML definitions rather quickly, I suspect the majority of DTDs will be slow to develop and perhaps even slower to implement.
In some sort of cosmic sense, XML could be the first true medium of data interchange over the world's first all-encompassing network. It could provide us with a structuring of information (or at least the handles for discovering meaning and structure) that the vast flood of information on the Internet must have if we're not to be inundated. I'm not sure if XML is a replacement for HTML, but it's likely to merge with DHTML documents in such a way as to make them inseparable. Is this an overzealous appraisal of XML? Who knows? While I'm gun-shy about missing the arrival of the Next Big Thing, I'm equally nervous about overestimating its importance.
I do know that I will keep an eye on the news releases for XML products. I'll continue to try to understand how it is used and where I can use it in my own Web applications. As you can probably tell, XML goes off in more directions than a bunch of wild horses. In the long run, this is great because we're likely to see a number of creative and useful products. We might even see something revolutionary. However, the short term is likely to be an overhyped and downright confusing mess. We'll need to scrutinize the vendor verbiage carefully, test the products, and do our own pilot projects to figure out who's selling <ham> and who's selling <porkbellies>.