DBMS Interview - July 1995
Most of us stroll the aisles of our local supermarkets without thinking about how all of the boxes and cans arrive on the shelves. Because the distribution system seems to work, we happily ignore it. But, if we peer behind the scenes, we find an industry ripe for reengineering. For example, a box of breakfast cereal may ride the roads and inhabit warehouses for nearly three months before you can buy it. Grocery chains may stock 60 to 90 days of inventory simply to hedge against unexpected surges in demand. To control inventory, grocery chains often buy and sell stock right off the trucks while theyıre in route. This style of just-in-case inventory control exacts a toll on manufacturers, distributors, retailers, and consumers alike.
Enter Non-Stop Logistics Corp., a two-year-old San Francisco-based company that hopes to change "just-in-case" to "just-in-time." Non-Stop Logistics has a plan to revamp the distribution system by establishing as many as 80 regional sort-and-load centers, through which manufacturers will ship their dry goods to grocery retailers or retail distribution centers. The sort-and-load centers will receive the goods from the manufacturers, mix-and-match them into store-ready pallets, and deliver them to retailers. The crux of the system will be Non-Stop Logisticsı ability to forecast the demand for products. By analyzing information stored in Oracle databases, Non-Stop Logistics will predict the amount of stock that manufacturers must deliver to the channel, thereby minimizing warehouse inventory, normalizing transportation routes, and speeding food to your table. Considering the potential of its plan, itıs no wonder that Non-Stop Logistics has received financial backing from companies such as Nielsen Marketing Research, GATX Logistics, and Schneider National Trucking, technology backing from Oracle Corp. (Redwood Shores, Calif.) and Sun Microsystems (Mountain View, Calif.), and development backing from EDS Consulting (Plano, Texas).
Reegineering the way an entire industry operates requires considerable computing power. In this case, Non-Stop Logistics has developed applications to predict product demands, track shipments, and manage inventory. Written in C++, these applications run on Sun Microsystems workstations and servers, and on graphical PCs. The data will reside in an Oracle database at each sort-and-load center. To make the object-oriented applications work with the relational DBMS, Non-Stop Logistics uses Persistence Softwareıs (San Mateo, Calif.) Relational Interface Generator. Persistence creates a mapping layer that defines the interface between C++ objects and data stored in the relational tables.
In April, DBMS Technical Editor Maurice Frank and Editor in Chief David Kalman interviewed Non-Stop Logisticsı George W. Earle and John C. White III about the challenges of building this system. Earle is a senior systems developer, serving as an assistant team leader and development manager for Non-Stop Logisticsı transportation optimization technology. Before joining Non-Stop Logistics, he worked at Trimble Navigation where he worked with DBMSs and signal-processing technology to record and analyze the reception quality of the companyıs Global Positioning System (GPS) products. John White is Non-Stop Logisticsı corporate DBA. He collaborates with a database architect to create the systemıs logical data models, and he implements the physical designs and manages the data. Before working at Non-Stop Logistics, White served briefly as a lead software engineer at Tesseract Corp. (San Francisco), where he helped move the companyıs human resource applications from mainframes to client/server and relational technology. For the previous 13 years, he worked at Bechtel Corp. as the Oracle Financials DBA. The following is an edited transcript of the interview.
DBMS: The supermarket industry seems to work smoothly, but Non-Stop Logisticsı premise is that itıs inefficient.
EARLE: Today, a manufacturer may make a deal with a distributor, and say, "You can have 20 truckloads of an item at this price, but you have to take all 20 truckloads. I canıt break it up." The distributor may take it, knowing heıs overbuying. He will then call somebody else (who may be a competitor), and say, "I can sell you this at such-and-such a markup." The next buyer may take it, knowing heıs also overbuying, and heıll sell a little bit more down the road. Thereıs a story going around about a distributor who did this and ended up buying his own stock at the other end because he sold too much of it.
WHITE: Seventy-five is often kicked around as the number of days a product sits on a truck or in a warehouse before you get a chance to buy it. Thereıs a shuttle warehouse, a plant warehouse, and a regional market warehouse, and, because there is no middleman, each company needs its own distribution center where it can break larger shipments into orders it can send to the stores. These can also be large warehouses.
EARLE: To the outsider, the system seems to work. There are three stores in my neighborhood, and they all have the same stuff, and it all gets there. The system works to the point that people donıt notice it, but itıs inefficient. The amount of cash the industry has to generate every month to buy all this stock ahead of time is amazing. A lot of the effort goes into forward buying and cash management.
WHITE: And inventory is a use of capital; by virtue of having 75 days of inventory, thatıs money not being used for something else in an extremely low-margin business. In grocery stores, one to two percent margins are typical. Sometimes one to two percent is great. If the grocery stores can free up some of that capital by reducing inventory to one- to two-weekıs worth, theyıll have tens of millions of dollars -- maybe more -- they can use for other things, including lowering prices.
DBMS: Exactly how will Non-Stop Logistics do business?
WHITE: We say weıre in the information business, because we own none of the products we move. The key is our ability to forecast. We tell the manufacturer to produce so much for a particular market, then we tell the line-haul (terminal-to-terminal) carriers that we expect to have the product picked up at a particular plant. Managing that takes computing power. The Oracle database is our information repository.
EARLE: The basic idea with Non-Stop is that we act as the middleman. We donıt own stores. We wonıt make any goods. We will figure out how much product needs to be in the marketplace for a particular day, then weıll ask the manufacturers to send us that much stock and put it in our sort-and-load center. Then weıll ask the distributors -- the retailers of the product -- to send us their orders, and weıll deliver them to the stores every day. Letıs say you want to predict how many boxes of Snack Ums that stores in the Bay Area will want tomorrow and the next day. You do that by maintaining a history of orders across all the stores. Then you use some statistical mathematics to predict what the stores will want in the future, based on the past. If you can do that, you donıt have to store 60 to 90 days of inventory. We may need only one or two weekıs worth of inventory, because as soon as we start depleting it, the forecast system will tell us weıll soon run out and that we need to order more. The main job of our software is to store the order data, predict when we need to order, and create information that gets sent to the trucking companies and the manufacturers. From that information, they can put the right amount of product on the dock, get it picked up, bring it in, and move it to the stores when the orders come in. We, like FedEx, will charge a price for moving the box. If you consider all the costs now from end to end, weıre going to save almost a dollar a case while weıre charging for moving the product. If the industry can save a few cents on a case, thatıs a big deal.
WHITE: Weıre not exactly sure what the numbers are, but itıs significant to the point that it causes the distributorsı eyes to light up.
EARLE: As a side benefit, we will have Activity-Based Costing (ABC) and Transportation Tracking subsystems that will be driven by our order concept. That will allow us to track every package, so there wonıt be any more Tylenol scares where nobody knows where the Tylenol came from. Weıll know that we picked up a case at a plant, and that it was delivered to a store. Weıll also know how much it cost to transport it. Weıll have an end-to-end cost model of what it takes to move a box from the plant to a store. That will tell us how the business is doing, and how we are doing against competitors.
DBMS: Why couldnıt a large manufacturer do what youıre trying to do?
EARLE: The big companies have efficiencies for themselves. They are big enough, and they supply enough of the country with certain products that they can do mostly just-in-time delivery to the stores from their plants. But they donıt share that ability with anyone else. In a way, weıre equalizing the playing field for other people.
WHITE: Thereıs now a move for manufacturers and distributors to get together -- called Efficient Consumer Response (ECR) -- to try to manage the inventory. But it doesnıt quite work because it is not industry-wide or market-wide.
EARLE: Itıs like trying to get every company to talk to every other company through EDI [Electronic Data Interchange]; it requires a massive infrastructure. But maybe you donıt need an electronic middleman; instead, you need a service middleman. For example, you talk to one company -- FedEx -- to ship a box. You donıt need EDI to hire a shipping company to move your box.
DBMS: How does the forecasting part of your application work?
EARLE: The forecasting does two main things: It models inventory in the sort-and-load center, and it predicts the orders that will be coming in the future. As orders are filled, inventory will drop. When you need to bring the inventory back up again, the system generates a "market need for product." The channel optimization part of the application then looks at the forecast, which might say, "We need 10 cases of cereal on Monday, 10 on Tuesday, 13 on Wednesday, 14 on Thursday, and 15 on Friday." But it would be inefficient and expensive to put only 10 cases on a truck. It would take a week to fill up a whole truck. The optimization moduleıs job is to aggregate all those cases into one truckload. It can also decide to aggregate different kinds of cereal, because one type of cereal may not fill a whole truck. The optimization process generates the trailer routes and release orders that go to the carrier and manufacturer, respectively.
DBMS: What is the status of the project?
EARLE: We started requirements and business definition about a year and a half ago. We proved and wrote the forecast software before the company could continue to be funded. Last fall, we wrote down everything everyone had in their heads. At the beginning of this year, we started implementing everything. This month [April] is our last month of writing the major applications. Over the next month or two, weıll be integrating and testing them. We have a suite of applications, including inventory management, forecasting, sort-and-load center operations, logistics optimization, customer information, and communications. We plan to go into business in mid-summer. Weıre opening our first sort-and-load center in North Carolina.
DBMS: How big is this system?
WHITE: We have about 430 tables in the Oracle database.
EARLE: and two to five times as many object classes that map to those tables. Persistence creates five classes for every database object. It sounds like a horrendous amount of work, but itıs somewhat symmetrical across all classes. For each table, you get a collection class, a key class, and the tableıs main class. The collection class lets you operate on a set of like objects (rows from one table) and iterate through them. The key class lets you set up the primary key for object retrieval. The main class provides read and write methods for each attribute and the classı query methods. Persistence generates two other classes, but the developer doesnıt use them directly.
WHITE: This is a work in progress, but weıre looking at 50 to 100 gigabytes for database size. There will be several of these databases around the country, but we canıt have a battalion of DBAs taking care of them. That makes for some interesting challenges in backup and recovery. We have 100 gigabytes online here now. We back up the database once a week, and, right now, I have the luxury of being able to do a cold backup. When the time comes, weıll be running in archive mode, doing incremental backups. Each sort-and-load center will be responsible for its own backup.
DBMS: Are you using any packaged applications?
WHITE: The only application that weıre buying is an Oracle-based accounting package called SQLTime, from Decision Data Systems (Lango, Fla.). We also have a subcontractor -- QSSI (Piscatawag, N.J.) -- that is customizing an operations application that lets people in the warehouse scan a product, putting the data in a format that we can use to create bills of lading, proofs of delivery, and that kind of thing. That application is running Recital (Recital Corp, Danvers, Mass.), a form of Xbase. It has an interface to our Oracle systems. We will also use Newton-like devices [personal digital assistants] for each truck driver. As the driver unloads a pallet at a store, heıll scan it and indicate that it was delivered.
DBMS: On what platforms will the applications run?
WHITE: Weıre running on Sun SPARC Solaris machines. We have a four-processor SPARC 1000 and a four-processor SPARC 1000e, which is an enhanced model. We also have some SPARC 10s, 20s, and 5s. We have a little bit of everything that Sun makes. We donıt have a 2000 yet, but Iım sure we will at some point. The 1000s are the primary database servers; these will be our production servers. The clients are Windows clients. The QSSI application runs on a SPARC 20 server; this is the Recital application that runs the scanners.
EARLE: The system architecture is such that each sort-and-load center does its own inventory, forecasting, and transportation optimization. Each center will also handle its operations and its communication to the local plants and the local distributors. The computing architecture is essentially a SPARC 1000, and the scanners operate like VT-100 terminals. Most of the network nodes will consist of those scanners. Once we have a lot of product moving through the sort-and-load center, 100 or so people will be connected to the SPARC 1000 in real-time, and every case will be scanned. The sort-and-load centers will be internetworked using a frame-relay network and will have a peer-to-peer architecture. We donıt want one big, monster data center holding terabytes of data. We donıt need that. For inquiry and reporting, we plan to buy write-once, read-many-times (WORM) CD-ROM drives. Every sort-and-load center will be pressing the database and delivering it here. The data is historical. Once we move a product, the data doesnıt change. Then weıll have a stackable carousel that holds the CD-ROMs. Weıll do the queries from there, or have another machine for queries. Nobody will be doing distributed queries [across sort-and-load centers]. The sort-and-load center managers will probably only want to know things about their own operations. The client/server part is the presentation of information for people who need to review customer information, change a contract, or update SKUs [stock-keeping units], or do anything to existing data. Weıre using Windows clients talking to the database across the wide-area network. Our goal was to be as homogeneous as possible. Everything is Sun, all the databases are Oracle. Each sort-and-load center has its own Oracle database. All the computers are talking TCP/IP, so thereıs no Novell, or anything like that.
WHITE: As a startup, we have no legacy systems, so we can design the system the right way. We donıt have to do any retrofitting. We can try to start with the best.
DBMS: What tools are you using for development, besides C++ and Persistence?
WHITE: Weıre using Oracle CASE as a front end to Interactive Development Environmentıs (San Francisco) STP [Software Through Pictures]. In Oracle CASE, we model the relations, foreign-key constraints, and so on, and generate the DDL. This DDL represents the persistent objects. We then import the DDL into STP, and create the class definitions there. We also have non-persistent objects that donıt go into the database. The super-model [containing both persistent and non-persistent objects] resides in STP.
EARLE: We have two camps here. We have the relational database modeling camp and the object modeling camp. They mirror each other now, because of the Persistence technology. The C++ programmers use STP to do the object model. From that tool, we generate Persistenceıs input files. Each file defines an object, what table it talks to, what fields in the table we need to access, and what relationships exist among tables. Then, we use the Persistence program to generate the .cc and .htt files. After that, we use SunSoftıs SunPro Workshop to code and link. The C++ itself is from Sun. We also use Rogue Wave Softwareıs (Corvallis, Ore.) Tools.h++ and Math.h++ to build our software. Our forecasting system uses Math.h++, and the rest of us use other classes in Tools.h++. For the presentation part of the application, weıre developing the GUI screens using Borland Internationalıs (Scotts Valley, Calif.) C++ and Object Windowing Library (OWL). Weıre using Persistence to map to the database using our same object model.
DBMS: Have you had problems in translating objects to relational models, either in performance or in functionality?
EARLE: Weıre a small company. The database people and the C++ people in a small company all sit close and design systems together. We developed both the object model and the relational database at the same time, so there is no mismatch. If we have a mismatch, we have to work it out.
WHITE: We are designing this as a team.
EARLE: Oracle CASE helps denormalize the database, and sometimes we end up with more tables than I think we might need. But itıs not a big deal. If the Oracle CASE modelers add another table, Persistence just generates another Persistence object and any necessary relationships. We just navigate through the generated C++ function.
WHITE: We havenıt run into any performance problems. The most performance-intensive application weıve written -- the core of it is the forecaster -- has to read a 15 million-row table. It must run a forecast for 3000 or 4000 SKUs in a four-hour time frame. The first time a developer wrote it, the thing would run for 20 years. Now itıs down to a few dozen seconds per SKU. Itıll run for an hour or two.
EARLE: In the beginning, we wrote to the database constantly. In the last revision, we changed the application to move all the data into C structures. Persistenceıs cache eliminates the need to program even that in. We just read all the objects, which are essentially C structures, into the programıs cache, and we access them in memory transparently.
DBMS: Is the object model pretty much a mirror image of the underlying relational model?
EARLE: Yes. For example, we have two important business objects and a corresponding table in the database for both of them. One is called a "release order" and the other is called a "trailer route." A release order is like a purchase order. It goes to a manufacturer and says, "We need this much product." The trailer route tells a carrier, "Go pick up this set of release orders on this date from the manufacturer, and bring it to the sort-and-load center." The database has forecast and non-forecast versions of these two business objects, so we actually have four tables. Our forecasting module will use inheritance to create new objects, and store them in the appropriate RDBMS tables. When we need to report all the trailer routes, we just ask the base object for them, and it will use the two appropriate tables in the database. The object program deals with a trailer route as an object. It has a constructor, a destructor, and it has get and set methods. These methods just happen to write to fields and records.
DBMS: How do you deal with complex models -- such as time series and hierarchies -- that RDBMSs donıt handle well?
EARLE: One of our classes has five hierarchies. The database deals with them as foreign keys; we deal with them as an inheritance hierarchy. Itıs probably the most complicated model we have. But we donıt see a problem, because there is an object for each table. Remember, we are using Oracle CASE to design the database and generate DDL for the database objects (including tables, views, indexes, and so on). From the output of the CASE tool, we then create an object model in STP. So, at the lowest level of detail we have a table in the RDBMS. The abstraction above that is what we call a non-persistent object. Thatıs where we handle time series and hierarchies, using a combination of C++ objects and SQL database triggers.
DBMS: Why did you decide to write your apps in C++ and not in a 4GL?
EARLE: If thereıs a religious war here, itıs been about the GUI. Itıs come down to productivity and the fact that weıre using tools that are very new. We looked at Inmarkıs zApp. We looked at all the GUI generators on Unix and PCs. We looked at Oracle Forms and we looked at PowerBuilder and all the PowerBuilder-like tools. Some of the tool companies shot themselves in the feet. One 4GL company told us it would take six months to be productive with their tool, but we had only two months to generate everything. We already had the C++ people, and we already had the Persistence object model, so we deemed that these tools would provide the more productive solution and would get us to market faster than buying a new 4GL tool.
WHITE: We have a very aggressive schedule. Our management has committed us to be in business mid-summer. Some tools companies are suffering under their own success because thereıs a shortage of talent. We couldnıt find the people to do some of this work for us. Oracle, for example, is selling like hotcakes, so itıs tough to find people -- even within Oracle itself -- to work with the new Forms product and with CDE2 [now Developer/2000 and Designer/2000].
EARLE: This is our 1500-page requirement spec for every system in the company. Weıre literally creating from the ground up a FedEx and all the systems to run a FedEx. Itıs not just a project to do a GUI to access an existing database. We donıt have an existing database. After this spec was written, we had six months to get 400 tables into CASE -- with all the attributes and foreign-key relationships --and 800 to 1000 C++ classes into the object modeling tool, then get it coded and working. The only reason it has worked to this point is that the tools, with some flaws, have been interoperable. Oracle CASE talks to STP, which talks to Persistence. Without those three companies we would have been dead in the water. It would never have worked. Weıre at the point where the C++ guys are done with the CASE tools. The database people are still using CASE to create the database, so they donıt have to write DDL by hand. This is the first company I know thatıs ever done that. My last company would not spend any money on any of these tools. We wrote all the code and DDL by hand. The fact that these tools have come through has made the company. Two people have created the database for this company. There are 10 to 12 C++ developers, and we have user-interface development people and SQL development people. In the other companies Iıve worked, two people couldnıt do a database this size.
WHITE: If youıre talking about doing something this size and not using a CASE tool, forget it. The CASE tool is a godsend. You do a fair amount of work up front --the hip bone is connected to the thigh bone sort of thing. Then you push the big mahogany button, and out comes the DDL. The power of the CASE tool makes it all hang together. And it makes the segue into Persistence much easier.
DBMS: What were the main obstacles that you had to overcome in working with this toolset?
EARLE: The tools didnıt work together out of the box. We had to do that work ourselves. There was a lot of hard work and a lot of understanding of what the tools were doing. Each has its own scripting language. We had to write scripts in STPıs script language to get the DDL into STP, and then we had to make sure that the scripts that STP generated for Persistence were correct. Persistence has a script language, from which we generate the C++ code. In the beginning it was tough, because, in a way, we had to learn two new 4GLs and get them to work seamlessly.
WHITE: The missing pieces and the limitations were minor. We have been fortunate to be able to spend most of our working time on the problems of the application, and not on getting the tools to work perfectly. We didnıt have to work hard on the tools. Once we perfected the import and export procedures, we had a scheduler to run the jobs later.
DBMS: After your system goes live, youıll have a lot of consumer information. Do you see a market for that information?
WHITE: For now, our core business is optimizing the distribution channel, as we discussed. An aftermarket for the information will come later. There has been some talk of it, especially because Nielsen is one of our backers. It would be possible in the future to see shifts in consumer purchasing if one company had a sale over a week and another company didnıt. We could see the shift because we cover the entire market.
DBMS: Do you have a feature set planned for "version 2"?
EARLE: Because weıll have just one or two sort-and-load centers to start, we donıt need the software for talking between them. If we have the sort-and-load center, with one distributor, we donıt need to do much analysis on the multiplant pick-up. The main thing we need to have working now is forecasting and transportation optimization. We need to make sure inventory works, and that we put the stuff on the truck correctly, and receive orders correctly.