Binary large objects (BLOBs) have been available in some DBMSs for a long while, but it is only now, with the influences of the object-oriented and Web worlds, that these datatypes are coming into their own. In some DBMSs, large objects are also called LOBs, where they drop the "binary" connotation to cater to large character data sets -- for example, word-processing documents and hyperlinked documents, such as Web pages.
For the remainder of this column, I will use the LOB acronym, except where a LOB is explicitly called a BLOB in the context of a particular product.
Borland International Inc.'s InterBase, initially developed and marketed by InterBase Inc., already had BLOBs as a standard datatype in its first release in 1986. At that stage we -- as young technocrats -- were looking for problems where we could apply this interesting solution. One of our InterBase clients at the time, the Electronic Engineering faculty of the University of Stellenbosch, was doing some interesting work on voice recognition. The group's researchers analyzed voice samples and broke the samples into words, syllables, phonetic sounds, and symbols, which they then stored in an InterBase database. They stored the voice samples and the frequency graphs in operating system files, which they tracked from the InterBase database.
No matter how much we tried to persuade them, they didn't want to store the voice samples and the frequency graphs as BLOBs in the database, because they were uncomfortable changing to this new approach. Raw data such as theirs had always been stored in operating system files, so, they reasoned, why change it? This kind of thinking has led to BLOBs' underutilization and lack of popularity.
Only with the relatively recent interest in object-oriented databases and object-oriented extensions to relational DBMS products are LOBs gaining the interest and acceptance that they should have received ages ago. The current hype around the Web is also furthering this interest. Most Web pages have graphical objects such as bitmaps and icons, binary objects such as sound and video, and unstructured textual data such as hypertext documents, all of which must be stored in the database. As a result, all of the major players on the DBMS market are following InterBase's early lead and including LOB support in their products.
Many DBMSs have datatypes that you could call MOBs: medium-sized objects. These are datatypes in which you can store a "smallish" document such as a 2,000-character memo or a smallish binary object such as a 2KB bitmap. In this month's column, however, I focus only on real LOBs: objects that at this stage seem almost limitless in size. Note, however, that most of the DBMS products currently limit the LOBs they support to 2GB in size.
In the remainder of this column, I investigate how the major players in the DBMS world (namely Borland InterBase, CA-OpenIngres, IBM DB2, Informix, Microsoft SQL Server, and Sybase SQL Server) address LOBs in their latest product offerings.
Some of the other systems follow more appropriate naming conventions for their LOB datatypes. Borland's InterBase 4.0 supports LOBs through its BLOB datatype. You can specify the subtype of an InterBase BLOB as 0 for binary data, 1 for ASCII data, and a negative number for a user-defined subtype. In IBM's DB2 release 2.1.1 you will find BLOB, CLOB, and DBCLOB datatypes to store data items larger than 4KB in size. DB2's Binary Large Object (BLOB) datatype is a byte string used to store binary data such as photographs, voice, and video. A Character Large Object (CLOB) datatype is a string in which the sequence of characters can be either single- or multibyte characters or a combination of both. An example of a CLOB is a word-processing document, such as a resume.
Other variations on this theme are Informix release 7.12, with its BYTE and TEXT datatypes to support large binary data items and large character streams, and Microsoft SQL Server 6.5 and Sybase SQL Server 11 with their IMAGE and TEXT datatypes.
At closer inspection, the reason for the differences between the various LOB datatypes becomes obvious. The LOB datatypes are not defined in the ANSI SQL-92 standard; therefore the various DBMS vendors have had a free reign in naming their LOB datatypes. I expect this will also be standardized in future, just like the standard datatypes. However, just like the current situation with the standard datatypes, each of the vendors will probably keep its own proprietary extensions to the basic LOB datatypes.
For example, in CA-OpenIngres you must use a data handler function to transform the data between the application program and the database, and vice versa. The data handler can then be specified in a SQL query wherever an expression or host language variable is acceptable. In the data handler function, you can use the PUT DATA and GET DATA embedded SQL statements and the ENDDATA SQL function to transmit or receive the components of the LOB data item, one segment at a time.
In Microsoft SQL Server 6.5, LOBs can be manipulated using a set of built-in Transact-SQL functions. These functions act like a SQL API to the LOBs. The functions include TEXTPTR to obtain a valid text pointer, READTEXT to read a segment of the object, and WRITETEXT and UPDATETEXT, to replace the whole object or change parts of the object, respectively. Alternatively, programmers can access LOBs using the relevant calls in the DB-LIBRARY API. The DB-LIBRARY API implements the same functionality that is available in Transact-SQL.
This problem is in part a result of the so-called "impedance mismatch" between the relational data model and the database support found in host programming languages such as C and C++. The relational DBMSs process sets of rows in single operations, giving sets of rows as the result. However, the host programming languages can only process single records at a time; they do not cater to the relational constructs such as a set of rows in a query result, nor do they cater to the nonstandard datatypes such as LOBs.
Programmers can also use ODBC to manipulate LOBs by using the SQLGetData and SQLPutData ODBC API calls in conjunction with SQLFetch ODBC operations. The functionality of these API calls is similar to the native DB-LIBRARY and Transact-SQL interfaces of Microsoft SQL Server and Sybase SQL Server. Using this kind of API can hide all of the respective DBMSs' implementation differences from the programmers.
But what are your options? You can use a standard API such as ODBC to hide the LOB's implementation and access details from you, but then you are at the mercy of the API's developers. You can only implement the type of access to your LOB data that their interface allows you to implement. Alternatively, you should investigate the object-oriented databases, where a LOB is implemented as just another object class, with the access details hidden in the object class's methods.
The other area of concern is the internal representation of the BLOB datatypes on different platforms. In a client/server environment with different platforms, you must pay careful consideration to the format and internal representation of the BLOB data to ensure that it is usable on all platforms. You must avoid platform-specific conversions and representations, which can render the BLOB data useless or unreadable on other platforms.
