DBMS, October 1997
DBMS Online: Server Side By Martin Rennhackkamp

Mobile Database Replication


Concepts, configurations, and concerns related to managing replication.

"Ground Control to Major Tom...commencing countdown, engines on..."
"This is Major Tom to ground control...for here am I sitting in a tin can, far above the moon--planet Earth is blue and there's nothing I can do..."
"Ground Control to Major Tom...your circuit's dead, there's something wrong...can you hear me Major Tom?"

No, I'm not on some strange medication. These words, written by David Bowie more than two decades ago, are pertinent because they highlight the importance of communication with a "home base." In any situation where multiple mobile databases are used or where there is a home-base database, you require replication to synchronize the data and operations between these databases.

In the four case studies I presented in my last column (DBMS, September 1997), I referred to various ways in which data is synchronized among different databases. Our traveling salesman dials his head office nightly to download new product information and new marketing collateral and upload sales and order data to the head office's database. The agents of the International Mission Board (IMB) synchronize their desktop databases from a master desktop database (which is regularly refreshed from a central database) when they return to their office periodically. Our botanist makes a radio or satellite link-up to her base laboratory to replicate her sighting data to the laboratory database. The reservations to transport liquid cargoes in Stolt Parcel Tankers' carrier and tanker ships--as well as their payroll, personnel, and scheduling data--is replicated from a central AS/400-based cargo booking system via satellite links to desktop databases on the various ships.

In this month's column, the second in my miniseries on mobile databases, I will review the concept of replication, various replication configurations, and issues related to managing replication in the mobile database context.

Replication

Let's begin with an informal definition of replication. Replication is the process where the transactions performed on one database (called the source database) are propagated asynchronously to one or more other databases (called the target database or databases) in a serialized manner.

There are three important terms used in this informal definition, which are relevant to mobile databases: transactions, asynchronous, and serialized.

Transaction-based replication means that an entire transaction is replicated to a specific target database, or not at all. A transaction is an atomic unit of processing on the source database. The effects of a partial transaction may leave the database in an inconsistent state. Thus, the operations propagated to any given target database must also adhere to the transaction boundaries and the transaction properties. For example, if a plant sighting is replicated from our botanist's desktop database to the base laboratory database, it should include the locality details that were recorded as part of the sighting. Similarly, an order replicated from our traveling salesperson's desktop database to the corporate database must contain all the order lines--that is, all the ordered items. Clients will not be satisfied if they receive only half their orders because the agent's replication facilities propagated only part of the transactions. It is important to note whether the replication facilities maintain the transaction boundaries. Although applications usually provide begin transaction ... commit boundaries, most DBMSs do not retain any knowledge of which changes were affected by which multirow and multitable transactions after the transactions have completed. The replication facilities must maintain this information explicitly.

Asynchronous means that transactions are replicated after they have been committed on the source database. There are two reasons that asynchronous replication is required. First, with mobile databases, the source and target databases may be disconnected from each other for relatively long periods of time. For instance, while our botanist is out on a field trip to a remote location, her mobile database is disconnected from the base laboratory database. Similarly, when the agents of the IMB are attending public-relations events, they are not connected to their master database. The second reason that asynchronous replication is required is that, if we were to use synchronous replication, the target databases would have to be accessed as part of the transaction on the source database. However, in the mobile database scenario, no transaction would be able to complete in a remote location because the target databases often cannot be accessed as part of the same transaction. Although it may sometimes be possible to establish the connection from a remote location, it may be impractical. It may take too long to connect to the target database and perform the transaction there as part of the transaction on the source database, or it may be too expensive to establish connections to the remote target database for every transaction performed. For example, our traveling salesman cannot really establish a dial-up call to his head office database from every client he visits. That would defeat the purpose of his carrying a mobile database and applications with him.

Asynchronicity--especially in the context of mobile databases--usually implies that you require some form of store-and-forward replication. The operations that have to be replicated from the source database must be stored in the source database until they can be propagated to the target database (or databases). When a transaction has to be replicated to multiple target databases, it must be stored on the source database until it has been replicated to all the identified target databases, and it may take much longer to establish a connection to some target databases than to others. For example, the data that needs to be replicated to the various traveling sales agents must be kept in the master corporate database until all the agents have dialed in to synchronize with the master database. It can even take longer in the case of the IMB, where the agents only synchronize with the master database when they return to the head office.

Throughout the asynchronous store-and-forward replication process, the transaction and serialization properties have to be maintained. No matter how long the transactions are stored, through how many hops they are propagated, or in what form they are stored and propagated, the transactions must be applied in their entirety, in the correct serial order, on each of the identified target databases.

Serialized means that transactions, as well as the operations within the transactions, are replicated in the same order in which they were applied. If the transactions, or even the operations within one transaction, are replicated to a target database in a different order from how they were applied on the source database, serious inconsistencies between the source and target databases may result. For example, if the locality details of a plant sighting are replicated to the base laboratory database from our botanist's mobile database before the generic details of the sighting are replicated, the referential integrity constraints on the target database may not permit the operation to occur. Similarly, the propagation of a cargo allocation to a specific ship may not be allowed on the target database if the cargo details were not propagated first.

Replication Configurations

In a typical mobile database scenario, there usually is a central database and a number of mobile databases. For example, Stolt Parcel Tankers has a central reservations system on an AS/400-based server and mobile desktop databases on each ship. The IMB has a master database in Richmond, Virginia, and agents take mobile desktop databases with them to the various public-relations events they attend. In most of these scenarios, you have the requirement for multidirectional replication:

( Transactions must be replicated from the central database to the mobile databases. For example, our traveling salesperson has to receive marketing data on any new marketable products. And the cargo reservations are replicated from the master reservation system to the relevant ships.

(Transactions have to be replicated from the mobile databases to the central database. For example, our botanist's sightings must be replicated to her base laboratory's database, and orders taken by our traveling salesperson must be replicated to the central database because sales commissions are calculated from this database.

( In some scenarios, transactions must be replicated among the various mobile databases. A number of botanists may want to share the localities of the plants they have identified, or the current inventory levels resulting from the orders placed for single items (such as individual farm implements) may have to be shared among a number of salespeople to prevent them from selling an implement that has just been sold by one of their colleagues. Replication among several mobile databases is quite complex because the various mobile databases may rarely--if ever--be connected to each other. In most situations, this form of replication has to take place via the central database. Two-way (actually, it's multiway) replication among a large number of mobile databases may be very complex to configure.

Managing Replication

For mobile replication, you require tools to activate and deactivate the replication facilities. When the mobile database is disconnected from the home-base database, you do not want either system to waste valuable resources by repeatedly attempting to replicate queued transactions to the other system. Both systems should be able to detect that the other system is temporarily unavailable and then cease all replication attempts. When the two systems are reconnected, they should detect the reconnection and automatically resume bidirectional replication. Alternatively, the user should be able to manually deactivate and reactivate the replication mechanisms, although this is the kind of administrative burden on the mobile database user that we want to avoid.

Setting up replication should be relatively straightforward. This is particularly important when the organization employs a large number of mobile databases, as do Stolt Parcel Tankers and the IMB. If setting up and configuring replication is a complex and time-consuming task (even though it is typically performed by the DBAs and system administrators and not mobile database users themselves), you may not want to use several mobile databases. You must keep in mind that the replication configuration may have to be changed from time to time. The applications, the database structures, and the source and target databases may change.

More important, replication mechanisms should be extremely easy to manage and monitor--as well as activate, deactivate, and reactivate. Replication from a mobile database may have to be activated manually from a remote site--for example, when the connection over a dial-up telephone line has been established. In the case of our traveling salesman, he may have to refresh his personal database nightly while visiting clients in some remote part of the country. For example, he may want to obtain the latest statistics on his clients before visiting them the next day. Furthermore, it should be just as easy to monitor that the replication facilities function correctly. Ideally, the replication mechanisms should be self-managing or configured in such a way that the user will be notified if they do not function correctly. In that case, the corrective steps, if any, should be easy to apply.

In last month's column, I also stressed that you cannot expect your mobile database users to perform extensive DBA tasks. Ideally, they should not be required to perform any DBA tasks at all. Similarly, you cannot expect mobile database users to be system administrators and replication experts, in addition to performing their day-to-day tasks. Although most DBMSs are becoming more self-managing and the others are providing increasing numbers of tools to manage them, the same does not necessarily apply to the replication facilities. I have not yet worked with or evaluated one DBMS's replication facilities that were easy to manage. The biggest problem, in my opinion, is conflict resolution. A conflict occurs when an operation from a source database has to update a row in a target database, but the row has already been updated by someone else. The problem is even worse when the same row is updated differently in two databases, and the two databases attempt to replicate their respective changes to each other. In most replication systems, if a conflict is detected between two systems, it must be resolved manually. One or two systems have automatic conflict-resolution mechanisms--for instance based on master copies of the data or based on timestamps applied to the respective rows--but can we be guaranteed that the logically correct solution (which corresponds to the real-world solution) is always applied? In the case of manual intervention, our mobile database user has to be trained in the technical details of conflict resolution of the replication software being used.

I Think My Spaceship Knows Where to Go...

Unlike Major Tom, sitting in his tin can far above the moon with nothing he can do, there is a lot you can do. When you evaluate the replication facilities offered by a particular vendor for implementation in mobile databases, you must check carefully for the issues I have described in this month's column. You must check that the replication facilities can propagate transactions asynchronously and serially. You have to review the possible replication configurations to all the required target databases--master-to-mobile, mobile-to-master, and mobile-to-mobile (if you want to replicate between the mobile databases themselves). For the mobile-to-mobile configuration, you should investigate how much effort is required to set it up on each mobile database, as well as the traffic load it will place on your communication channels. The most important aspect to examine, however, is the management of the replication facilities. They should be easy to install and configure, and they must be extremely simple to activate, deactivate, and reactivate. Similarly, the replication facilities should be easy to monitor and manage, especially where conflict resolution is concerned. Ideally, you should get some of the mobile database users involved in the evaluation process. Tasks that may seem quite simple for a DBA or system administrator may make your end users want to look for another job.


Martin Rennhackkamp is the owner and principal consultant of The Data Base Approach, a corporation specializing in relational and distributed databases, based in Cape Town, South Africa. You can email Martin at mr@dba.co.za or visit his Web site at www.dba.co.za.

What did you think of this article? Send a letter to the editor.


Subscribe to DBMS and Internet Systems -- It's free for qualified readers in the United States
October 1997 Table of Contents | Other Contents | Article Index | Search | Site Index | Home

DBMS and Internet Systems (http://www.dbmsmag.com)
Copyright © 1997 Miller Freeman, Inc. ALL RIGHTS RESERVED
Redistribution without permission is prohibited.
Please send questions or comments to dbms@mfi.com
Updated 9/15/97