DBMS Interview - September 1994
Based on database theory, object-oriented principles, and cognitive science, Wall Data's Semantic Object Modeling promises to turn non-technical users into database design experts.
Imagine that you own an ice cream parlor and you want to design a system to automate the purchase and management of your operation. You must keep the chocolate chip in stock, and prevent the rum raisin from spoiling. You sit down at the keyboard, and in a few hours -- with no special training -- you produce a complete forms-based application and relational database schema in Domain-Key Normal Form. Thus, instead of spending hours learning arcane computer skills, you have extra time to sample the wares. This is the vision of Dr. David M. Kroenke, chief technologist of Wall Data Inc.'s Salsa Business Unit. Kroenke is the creator of Semantic Object Modeling (SOM), an approach to building applications that lets users structure data in the language of their businesses, instead of the language of computers. He began developing SOM in 1987 and described it in the third edition of his popular textbook, Database Processing (Prentice-Hall, 1988). In 1992, he received funding from Wall Data to develop products from his technology. In April 1994, the company formed the Salsa Business Unit to develop commercial products based on the technology, and today offers an academic version of the Salsa modeling tool that generates relational database schemas from semantic object models.
Kroenke's experience in database management and application development technology spans more than 25 years. He's taught at a number of universities, including the University of Washington where he was the Hanson Professor of Management Science (1990 to 1991). From 1984 to 1986, Kroenke served as Microrim's vice president of development, managing the team that developed R:Base 5000. He's also held various positions as a programmer, systems analyst, and software manager, specializing in database management technology.
DBMS Editor in Chief David Kalman interviewed Dr. Kroenke at the June 1994 Database and Client/Server World Conference in Boston. In that conversation, Kroenke discussed the origins of SOM, explained its benefits, and hinted at its potential to simplify the use of database technology.
KROENKE: I started in the database business in 1968, before it had a name. I was in the military, working on a project to build a simulation of World War III. In that simulation, we had all the classic database problems, even though we didn't know that's what we were solving.
In 1973, I got out of the military and became a professor at Colorado State University. The relational model had just come out, and I became excited about that. In 1975, I wrote the first edition of my database book, then I left teaching and became a database disaster repair person. The vendors were selling DBMS products left and right, but nobody knew what to do with them. People would start projects, then have all sorts of problems.
Later, I worked with Boeing on the IPAD project to build a DBMS for engineering data, and out of that project came a product called RIM (Relational Information Manager), which later became R:Base.
When Wayne Erickson started Microrim, I joined the company and worked there until 1986, managing the team that produced R:Base 5000.
When we shipped R:Base 5000, I happened to be in customer support and took a call that haunted me and has driven my thinking over the last six or seven years. The caller said, "I bought your product to track my sales orders. So I looked in the documentation under 'S' for Sales Order. Instead, I found Select and Sort By. I thought maybe you had called it Invoice, so I looked under 'I,' and I found Index and Inverted List. This can't be a rare problem."
He was operating at a high level of abstraction, and we were down at the table level. That was when I first began to look for ways in which we could deliver the benefit of database technology without forcing people to learn it. The vehicle I drive to work has a transmission; I know nothing about the transmission, the fluid dynamics, or any part of it, yet it gets me to work.
DBMS: Some database experts insist that users must understand the underlying technology...
I completely disagree with that. It's a great mistake. Suppose I take my car to a repair shop. I park my car and people come running out. The mechanic grabs a left front fender, tears it off, and puts it in a pile of left front fenders. Then another mechanic comes out and grabs the right front fender and throws it in a pile of right front fenders, and grabs the steering wheel and throws it in a pile of steering wheels. Why are they doing that? Because it's convenient to manage the garage like that. From the user's perspective, this is idiotic. The user shows up with a sales order, and we tear it up into pieces: the customer, the salesperson, the order, and the item pieces. Why? For the convenience of managing the garage.
We did that in 1970 because it was all we could do with the hardware that we had at the time. If we want to tear the "semantic objects" apart for our benefit, that's great. But why should we expose that to the user? The user shows up with a sales order. They want to see a sales order. That's all they want to see. Why should they have to know anything at all about the underlying tables? Tables are the two-by-fours from which users construct houses. They want to live in houses, not piles of two-by-fours.
DBMS: How do you define "user" in that context?
I use the term "user" to mean someone who wants to benefit from the technology. One of our goals is to push that as far out as we can. If we can make the technology totally transparent, then the user could be the ultimate end user. It's a joke when vendors say, "We're going to have real end users use a relational DBMS." They are exposed to tables, referential integrity, intersection tables, and so forth. The typical end user can't deal with that stuff, and it doesn't seem to me that they should have to. If users can describe their world, we should be able to deal with the technology and hide it from them.
DBMS: What were the first steps you took in exploring how to hide the technical detail from the user?
After I had that conversation about the sales order, I thought to myself, I've really got a problem here, because I call myself an information scientist, but I don't know what information is. What's it made of? Where is it? If I take a piece of paper and put it in front of my dog, is there information on the paper? Probably not. If I put it in front of a human being, perhaps there is. How do I know when I have more information? Do I weigh more?
I began to do some reading and found that there are 200 or 300 years of history in the discipline of philosophy on these very questions, and I determined that a database is not a model of reality. I think Immanuel Kant was right; reality is forever unknowable. So what is a database? It's a model of somebody's model. Human beings create a model of their world, and a database is a model of that model.
I then began to focus on how people process symbols. How do we form these models in our heads? What's the natural way that we think about things? What's the architecture, if you will, of the human mind? In 1968, we were totally involved in the machine. Somewhere about the time I had the sales order conversation in customer support, I decided to do an about-face and put as much focus on the human mind as I had been putting on the architecture of the machine. If I could figure out how people model data and give them a tool with which they could express how they have it modeled, then I could compute the application program.
It's algorithmic. Where's the art in this? It's in forming the data model. Once I have what we call the semantic object model, the rest should be computable.
No businessperson would ever say they want to convert a one-to-many relationship to a many-to-many relationship. They would say, "It used to be that employees could be assigned to only one department and now they can be assigned to many departments." When they say that, the consequences on the databases and the application are severe, but they're also computable. We know exactly what's supposed to happen. We need to create an intersection table. We need to take the foreign keys from one table and fill the intersection table. We need to change all the SQL statements from processing two tables to processing three, using the intersection table. How do we do that today? By hand. How ridiculous. I find myself doing the same things over and over and over again. Why not let the computer do this?
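The three mechanical steps Kroenke lists (create the intersection table, move the foreign keys into it, rewrite two-table queries as three-table queries) can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under assumed, hypothetical table names (employee, department, employee_department), written by hand; it is not the output of any Wall Data tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Before: one-to-many. Each employee row carries a single department foreign key.
cur.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT,
    dept_id INTEGER REFERENCES department(dept_id)
);
INSERT INTO department VALUES (1, 'Sales'), (2, 'Shipping');
INSERT INTO employee VALUES (10, 'Ann', 1), (11, 'Bob', 2);
""")

# Step 1: create the intersection table for the many-to-many relationship.
cur.execute("""
CREATE TABLE employee_department (
    emp_id  INTEGER REFERENCES employee(emp_id),
    dept_id INTEGER REFERENCES department(dept_id),
    PRIMARY KEY (emp_id, dept_id)
)""")

# Step 2: copy the existing foreign keys into the intersection table.
# (A real migration would then drop employee.dept_id; it is kept here for brevity.)
cur.execute("""
INSERT INTO employee_department (emp_id, dept_id)
SELECT emp_id, dept_id FROM employee WHERE dept_id IS NOT NULL
""")

# The new business rule: an employee may now belong to a second department.
cur.execute("INSERT INTO employee_department VALUES (10, 2)")

# Step 3: a query that used to join two tables now joins three.
rows = cur.execute("""
SELECT e.name, d.name
FROM employee e
JOIN employee_department ed ON ed.emp_id = e.emp_id
JOIN department d ON d.dept_id = ed.dept_id
ORDER BY e.name, d.name
""").fetchall()
print(rows)
```

Every step follows directly from the change in cardinality, which is the point of the passage: nothing here requires judgment once the business rule is known.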
Between 1987 and 1989, I read philosophy, cognitive science, Immanuel Kant, and the hermeneutic philosophers, thinking about how we process symbols. A database application is an application for keeping track of symbols. About the time that Microsoft delivered Visual Basic, I began to do some prototypes. During this time, I was self-employed. I'd do some consulting and make enough money to continue working on this model, and back and forth. I built a prototype, and showed it to John Wall and Jim Simpson in early 1992.
DBMS: How did your work relate to their products and technology directions?
Our thinking was very similar. The Rumba family of products gave users the benefit of communications technology without forcing them to learn it. I wanted to give users or systems analysts the benefit of database technology without forcing them to learn or deal with it. Wall Data invested in that technology, and is now in the process of building products around it.
DBMS: What is a semantic object?
A semantic object is something you want to keep track of. Let's pick something from your world, such as Issues, Articles, and Authors. (See Figure 1, page 64.) In our modeling tool, you build these semantic objects from profiles. A profile is a generic data type that consists of a name and some properties. You can think of profiles as semantic domains. For example, you would use the PersonName profile in your Authors semantic object [or in any other semantic object that models a person]. This profile has a text data type with a length of 30. [When you use a profile in a specific semantic object, it becomes an attribute, inheriting all the properties of the profile. You can then customize the attribute. You can also customize the profiles.]
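The profile-to-attribute relationship can be pictured with a few lines of Python. This is a minimal sketch, assuming a profile carries only a name, a data type, and a length; the classes Profile and Attribute and the helper attribute_from are hypothetical names, not the Salsa tool's API.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Profile:
    """A reusable semantic domain: a name plus some properties."""
    name: str
    data_type: str
    length: Optional[int] = None


@dataclass(frozen=True)
class Attribute:
    """A profile placed in a specific semantic object."""
    name: str
    data_type: str
    length: Optional[int]
    source_profile: str


def attribute_from(profile: Profile, name: Optional[str] = None, **overrides) -> Attribute:
    """Inherit every property of the profile, then apply per-object customizations."""
    inherited = Attribute(
        name=name or profile.name,
        data_type=profile.data_type,
        length=profile.length,
        source_profile=profile.name,
    )
    return replace(inherited, **overrides)


person_name = Profile("PersonName", "text", 30)

# Dropped into Authors, the attribute inherits text(30) unchanged.
author_name = attribute_from(person_name, "AuthorName")
# Dropped into another object, the same profile is customized to a longer length.
contact_name = attribute_from(person_name, "ContactName", length=50)

print(author_name)
print(contact_name.length, person_name.length)
```

Customizing the attribute leaves the profile itself untouched, so other semantic objects that use PersonName keep the default length.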
To create an attribute in a semantic object, you just grab a profile and drop it in the semantic object [frame]. Some of these profiles are groups such as Phone, which is a combination of area code and local number. This is what the cognitive scientists call a semantic chunk. I want to chunk that, and put it under one name. It's like the story of Rumpelstiltskin; you say his name and he disappears. You say phone, and whatever's in phone disappears.
This is one example of the importance of modeling as opposed to creating tables. This will all end up in one table -- Name, Area Code, and Local Number -- but there's a richness here. We don't lose the fact that Area Code and Local Number are related to each other in the user's mind under another name called Phone. Typically, in just a relational structure, we lose that. We can do this again, with Fax and Address (which is maybe another group). Now, we can do the same in the Issue semantic object, adding Volume and Date.
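A sketch of how a group such as Phone might be flattened into ordinary columns of a single table while the grouping survives as metadata. It assumes a simple tree of hypothetical Leaf and Group objects and a column-naming convention (group name, underscore, attribute name) chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, Tuple


@dataclass
class Leaf:
    """A single attribute that becomes one column."""
    name: str
    sql_type: str


@dataclass
class Group:
    """A semantic chunk: several attributes filed under one name."""
    name: str
    members: list = field(default_factory=list)


def flatten(members, prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[str, str, Tuple[str, ...]]]:
    """Yield (column_name, sql_type, group_path) for every leaf attribute.

    The table gets flat columns; the group path is returned alongside each
    column so the fact that AreaCode and LocalNumber form a Phone is kept."""
    for m in members:
        if isinstance(m, Group):
            yield from flatten(m.members, prefix + (m.name,))
        else:
            yield "_".join(prefix + (m.name,)), m.sql_type, prefix


author = [
    Leaf("Name", "VARCHAR(30)"),
    Group("Phone", [Leaf("AreaCode", "CHAR(3)"), Leaf("LocalNumber", "CHAR(8)")]),
]

columns = list(flatten(author))
ddl = "CREATE TABLE Author (\n  " + ",\n  ".join(f"{c} {t}" for c, t, _ in columns) + "\n)"
print(ddl)
for column, _, group_path in columns:
    print(column, "belongs to group", group_path or "(none)")
```

A plain relational schema would keep only the DDL; the group path is the extra richness the model retains.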
Notice that I'm making up facts about your world, which is, in my mind, a great violation. I'm building a model of your world and, by the way, I'm telling you what your model is. This is part of the arrogance that information scientists sometimes have.
DBMS: What kind of information does each semantic object contain?
A semantic object is described by a set of attributes, some of which are action-oriented, and some of which are data-oriented. For example, a semantic object Order Entry Clerk has primarily action-oriented attributes, such as "open this form," "print this report," and "do these things at the end of the month." Other kinds of semantic objects, such as an Order, are primarily data. Some attributes, such as Salesperson, can be a blend of actions and data attributes.
DBMS: How does the model handle relationships?
Back to the DBMS magazine example, the Article semantic object can have an Issue Number, an Issue Date, and one or more Authors. The Author semantic object can have Author Name, Phone, Fax, Address, and one or more Articles. The model doesn't show the relationship [between Authors and Articles] per se. When we give the Article semantic object the Author attribute, the system automatically creates a relationship from the Author semantic object to the Article semantic object. We infer that it's a many-to-many relationship.
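The inference can be sketched as a small function, assuming each side of a relationship records only a maximum count (None meaning unbounded). The function names and the returned descriptions are illustrative assumptions, not the product's actual logic.

```python
from typing import Optional, Tuple


def is_many(max_count: Optional[int]) -> bool:
    """An unbounded (None) or greater-than-one maximum means 'many'."""
    return max_count is None or max_count > 1


def infer_relationship(a: str, b: str,
                       max_b_per_a: Optional[int],
                       max_a_per_b: Optional[int]) -> Tuple[str, str]:
    """Infer the cardinality and the relational structure it implies.

    max_b_per_a: how many Bs one A may have (None = no limit).
    max_a_per_b: how many As one B may have (None = no limit)."""
    a_has_many_b = is_many(max_b_per_a)
    b_has_many_a = is_many(max_a_per_b)
    if a_has_many_b and b_has_many_a:
        return "many-to-many", f"intersection table {a}_{b}"
    if a_has_many_b:
        return f"one {a} to many {b}", f"foreign key to {a} in {b}"
    if b_has_many_a:
        return f"one {b} to many {a}", f"foreign key to {b} in {a}"
    return "one-to-one", f"foreign key to {a} in {b}"


# An Author has one or more Articles; an Article has one or more Authors (at most 7).
print(infer_relationship("Author", "Article", None, 7))
```

The user only answers "how many?" on each side; the cardinality label, and whether an intersection table is needed, fall out of those two answers.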
This points out one of the fundamental problems with the Entity-Relationship [ER] model. People don't think of relationships that way. In the classroom, I'd just have to bang on the table to get people to think of one-to-one, one-to-many, and many-to-many relationships. People do think in terms of how many articles an author can write and how many authors an article can have. In this example, we have a many-to-many relationship. But people don't think that way. We shouldn't make people tell us things that way. We're trying to provide a tool whereby people can express things in the way that they think about them, and we'll compute the consequences.
DBMS: How do semantic objects deal with expressions other than simple attributes?
When you drop a formula -- an expression -- into an object, it makes sense of itself in the semantic context. Suppose we're modeling apartment management, and I say that Revenue is the sum of Payments. When we drop Revenue into Apartment, it's the sum of payments on the apartment. If I drop it into Lessor, it's the sum of payments made by the lessor. If I drop it into Owner, it's the sum of payments on all of that owner's buildings, on all of those leases, and so on.
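One way to picture this, assuming a flat list of payment records that already carry the apartment, lessor, and owner they belong to: the formula "sum of Payments" never changes, and only the grouping key comes from the object it is dropped into. The sample data and the function name revenue are invented for illustration.

```python
from collections import defaultdict

# Hypothetical payment records; each one knows the objects it relates to.
payments = [
    {"Apartment": "A1", "Lessor": "Lee", "Owner": "Ortiz", "amount": 900},
    {"Apartment": "A1", "Lessor": "Lee", "Owner": "Ortiz", "amount": 900},
    {"Apartment": "B2", "Lessor": "Kim", "Owner": "Ortiz", "amount": 1200},
    {"Apartment": "C3", "Lessor": "Kim", "Owner": "Park", "amount": 700},
]


def revenue(context: str, records) -> dict:
    """Revenue = sum of Payments, grouped by the object the formula sits in."""
    totals = defaultdict(int)
    for p in records:
        totals[p[context]] += p["amount"]
    return dict(totals)


print(revenue("Apartment", payments))
print(revenue("Lessor", payments))
print(revenue("Owner", payments))
```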
DBMS: Would you call that "polymorphism"?
That's not exactly the word I would use. But one of the things that's important about this technology is that it really is the integration of all the modeling concepts, with a lot of the object thinking, with some of this cognitive science and some philosophy thrown in.
DBMS: After adding several profiles that have many properties, a model could quickly get very complicated...
One of the things technologists sometimes say is that end users are afraid of complexity. We've had end users using this product. They've built very complex semantic object models on their own. We had a yacht broker build an album (a semantic object model) that some of our C++ programmers thought was incredibly complicated. I don't think end users are afraid of complexity, as long as it's their complexity.
DBMS: How does SOM deal with the issue that different people view things differently and may use different semantics?
The users' views can be either inconsistent or consistent with each other. There's nothing we can do if the views are inconsistent. We can't build a database; we can't build an application. If one user says that an Article has at most seven authors, and another user says it can have 85 authors, there's nothing we can do. The users must ferret out that inconsistency.
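The tool cannot decide which user is right, but it can surface the disagreement so the users can resolve it. A minimal sketch, assuming each user's view is recorded as a dictionary keyed by hypothetical (object, attribute, property) tuples; the function find_conflicts is illustrative only.

```python
from typing import Any, Dict, Hashable


def find_conflicts(views: Dict[str, Dict[Hashable, Any]]) -> Dict[Hashable, Dict[str, Any]]:
    """Collect each constraint across all users' views and return the ones users disagree on."""
    by_key: Dict[Hashable, Dict[str, Any]] = {}
    for user, constraints in views.items():
        for key, value in constraints.items():
            by_key.setdefault(key, {})[user] = value
    return {key: answers for key, answers in by_key.items()
            if len(set(answers.values())) > 1}


views = {
    "editor":    {("Article", "Author", "max_count"): 7,  ("Author", "Phone", "required"): False},
    "publisher": {("Article", "Author", "max_count"): 85, ("Author", "Phone", "required"): False},
}
print(find_conflicts(views))
```

Only the author limit is reported; the users must still decide between seven and 85 before a schema can be built.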
The graphics of this are significant. In our modeling tool, the object frame is a boundary. Here's an Author with a Name, Phone, Fax, Address, and a bunch of articles. Article is part of Author. They're not separate. When we think about Author, we think of the articles that author has written. Article is part of Author and, similarly, Issue has a bunch of articles. You probably want to know, what was the issue that had the article on such and such? Article is clearly part of Issue. We should model it that way. We shouldn't string all these entities out with all these little diagrams and the diagrams between them. Let's show people the boundaries as they are in their minds.
DBMS: What is the key benefit of SOM?
Several years ago, I had to build a database application to package two-by-fours into boxcars in the context of a shipment. A couple of weeks later, I had to package employees into offices in the context of a department. They were identical applications. The fundamental structure of the two applications was the same. My whole job was to find creative ways to use search and replace to change the words of one application into the words of the other. I wanted the ability to work at a higher level of abstraction.
I have an album here that the [previously mentioned] yacht broker built. With a few simple changes, we could use it to sell airplanes or art. The process of selling expensive things, such as art, airplanes, or yachts, is pretty much the same. By working at a higher level of abstraction, you can do things much, much faster.
DBMS: This is a decidedly non-technical approach to modeling. How does it relate to formal, object-oriented methodologies?
James Martin wrote a book years ago in which he stated that great engineering is simple engineering. If we start the modeling problem in the wrong place, we end up with all these little loose ends, funny terms, and silly stuff. If we start it correctly, it will be simple. I've been in the database business 25 or 26 years. There's no database or modeling problem I've ever had that we can't model here. Yet it seems very simple because it reflects what's in the user's mind.
If we slice the data-modeling problem in the right place, we give people the ability to ask, "Is this required or not required? How many of these things can you have? Is there an identifier? Are they grouped?" With that simplicity, we can model an incredible number of things, because those are the constructs that we have in our minds. If a semantic object is not simply explainable to a user, then it probably does not exist in that user's world, and we should not be dealing with it.
Tom Marshall at Auburn University did a study of this technology and found that it was incredibly transparent. He taught management majors with no experience in data modeling, gave them three hours of training with the technology, and they outperformed the graduates of his 45-hour database class.
DBMS: At some point you have to resolve the semantic object model to code, or expose it in a way that a programmer can use to construct an application. How does the abstraction become code?
I don't think we need to generate code. With this version of the technology -- which we made available to the academic community -- we can generate a schema. Students are using it right now to generate schemas for SQL DBMSs, Paradox, and Microsoft Access.
DBMS: After you generate the schema, the developer still has to deal with tables...
I'm talking about potentials, not products. Let's look at the semantic object, Author. How hard do you think it would be to generate a form from that semantic object?
We could generate a form. It would be no harder to connect Author to Article. If somebody said, "Give me all the authors that have ever written an article about SQL optimization," you could type into the form, "SQL optimization" and do query-by-form. The structure of the application is encoded in the semantic object.
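Query-by-form can be sketched as turning whichever form fields the user filled in into a parameterized WHERE clause. This assumes a hypothetical three-table schema (author, article, and an author_article intersection table) and a hand-written mapping from form fields to columns; a generator would derive that mapping from the semantic object.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (author_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE article (article_id INTEGER PRIMARY KEY, title TEXT, topic TEXT);
CREATE TABLE author_article (author_id INTEGER, article_id INTEGER);
INSERT INTO author VALUES (1, 'Chen'), (2, 'Diaz'), (3, 'Evans');
INSERT INTO article VALUES (100, 'Cost-based plans', 'SQL optimization'),
                           (101, 'Client/server basics', 'Networking');
INSERT INTO author_article VALUES (1, 100), (2, 100), (3, 101);
""")

# Hypothetical mapping from fields on the Author form to table columns.
# Only whitelisted column names reach the SQL text; values are bound as parameters.
FORM_FIELDS = {"author_name": "au.name", "article_topic": "ar.topic", "article_title": "ar.title"}


def query_by_form(conn, form: dict) -> list:
    """Return the names of authors matching every non-empty field on the form."""
    conditions, params = [], []
    for field_name, value in form.items():
        if value:
            conditions.append(f"{FORM_FIELDS[field_name]} = ?")
            params.append(value)
    sql = ("SELECT DISTINCT au.name FROM author au "
           "JOIN author_article aa ON aa.author_id = au.author_id "
           "JOIN article ar ON ar.article_id = aa.article_id")
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    sql += " ORDER BY au.name"
    return [row[0] for row in conn.execute(sql, params)]


# The user types "SQL optimization" into the topic field and says "Go find it."
print(query_by_form(conn, {"article_topic": "SQL optimization"}))
```

The user never sees the joins; they come from the Author-to-Article connection in the model.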
DBMS: How does it generate an appropriate relational schema?
We generate fully normalized schemas in Domain-Key Normal Form (DKNF), which is a much tighter Normal Form than Third Normal Form. If you want to denormalize, then model the denormalization. If you want to stick employee number, employee name, department number, and department phone into a semantic object, put it there. We'll generate a denormalized table.
What is the key to normalization? It's finding functional dependencies and analyzing them. Where do these functional dependencies come from? Semantics. The relational model has semantic underpinnings. Nobody ever talks about that. How do you find out what a functional dependency is? You go ask somebody, "What does this mean?"
As someone who's spent many hours in the classroom teaching normalization, I can say that it's been needlessly overcomplicated. Mrs. Gazernenplatz, our 8th grade English teacher, taught us normalization. She said that a paragraph should have a single theme, and if it had two themes, we should make it two paragraphs. That's normalization theory. I've written textbooks on this subject, and taught it for years. If you've got a table that has employee name, employee number, department name, and department phone, it has two themes. So break it up.
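The "two themes" example can be made concrete: check the functional dependency dept_name -> dept_phone, then split the table into an employee theme and a department theme by projection. Note that sample rows can only show that a dependency is violated, never prove that it holds; as Kroenke says, the dependency itself comes from asking what the data means. The helper names and data are hypothetical.

```python
def fd_holds(rows, lhs: str, rhs: str) -> bool:
    """True if, in these rows, each value of lhs maps to exactly one value of rhs."""
    seen = {}
    for r in rows:
        key = r[lhs]
        if key in seen and seen[key] != r[rhs]:
            return False
        seen[key] = r[rhs]
    return True


def project(rows, cols):
    """Relational projection: keep the listed columns and drop duplicate rows."""
    out = []
    for r in rows:
        t = {c: r[c] for c in cols}
        if t not in out:
            out.append(t)
    return out


employees = [
    {"emp_no": 1, "emp_name": "Ann", "dept_name": "Sales",    "dept_phone": "555-0100"},
    {"emp_no": 2, "emp_name": "Bob", "dept_name": "Sales",    "dept_phone": "555-0100"},
    {"emp_no": 3, "emp_name": "Cy",  "dept_name": "Shipping", "dept_phone": "555-0200"},
]

# Department phone depends on the department, not the employee: a second theme.
print(fd_holds(employees, "dept_name", "dept_phone"))

# One paragraph per theme: employees keep dept_name as a foreign key.
employee_table = project(employees, ["emp_no", "emp_name", "dept_name"])
department_table = project(employees, ["dept_name", "dept_phone"])
print(department_table)
```

Each department's phone number is now stored once, so it cannot disagree between two employee rows.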
DBMS: Working at the level of semantic objects, why deal with tables at all?
My goal is to hide tables from everybody -- from users, from systems analysts, and from myself. If I never have to write another SQL statement in my life, I'll be real happy. Our goal is that if we build this model, we can compute an application, and totally hide the tables. But we do have to remember that we live in a relational world, so we need to bind back to tables. We need to use what's there. One of the problems of object database management system (ODBMS) technology is that it asks people to make a leap. People don't do that. We're bringing forward some of the ideas of object thinking, tied back to relational engines.
DBMS: ODBMS vendors often talk about the "impedance mismatch" between object-oriented development environments and relational databases...
SOM is an object-oriented model, and we compute across this impedance mismatch. It can be done. We're doing it. I think it should be done, but a computer program should do it.
DBMS: How can SOM capture information about complex processes? For example, how would I model a rule to "send Dave e-mail when inventory reaches five?"
Even with this technology, you get to a point where you may need to write some code. It should be the exception, not the rule. All the code that has to do with materializing the form, or doing an intersection table to model the many-to-many relationships, should be done automatically.
When we get to the point of defining rules, such as, "If an order contains nuclear fuel, and it's going across the Mississippi River, we want specific things to happen," you will probably need to write some code. But every time somebody has to write some code, we have failed.
DBMS: Is there a higher level of abstraction to which you could apply a process methodology?
Yes. With our current technology, there are exciting things you can do with computed subtypes. [A subtype is a specialization of another semantic object.] Suppose you say a Gun is an item with a part number that starts with "G." A Minor is a person younger than 21. We can then state the constraint, "Don't sell guns to minors."
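Computed subtypes are membership tests rather than stored flags. A minimal Python sketch, assuming hypothetical Item and Person records and the two definitions given above (part number starting with "G"; age under 21); the constraint is then stated in terms of the subtypes, not the raw fields.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Item:
    part_number: str


@dataclass
class Person:
    name: str
    birth_date: date


def is_gun(item: Item) -> bool:
    """Computed subtype: a Gun is an Item whose part number starts with 'G'."""
    return item.part_number.startswith("G")


def age_on(person: Person, today: date) -> int:
    """Age in whole years on the given date."""
    years = today.year - person.birth_date.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (person.birth_date.month, person.birth_date.day):
        years -= 1
    return years


def is_minor(person: Person, today: date) -> bool:
    """Computed subtype: a Minor is a Person younger than 21."""
    return age_on(person, today) < 21


def check_sale(item: Item, buyer: Person, today: date) -> None:
    """Constraint stated over the subtypes: don't sell guns to minors."""
    if is_gun(item) and is_minor(buyer, today):
        raise ValueError(f"cannot sell {item.part_number} to minor {buyer.name}")


today = date(1994, 6, 15)
check_sale(Item("T-100"), Person("Pat", date(1980, 1, 1)), today)  # not a gun: allowed
try:
    check_sale(Item("G-17"), Person("Pat", date(1980, 1, 1)), today)
except ValueError as err:
    print(err)
```

If the business later redefines a Minor, only is_minor changes; the rule "don't sell guns to minors" stays as written.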
What you said a minute ago about "when this changes from this to that, do something," typically reflects some subtype change. For example, what's the difference between an order in process and a shipped order? A shipped order is an order where the shipping date is not null. Then, we might say something like, "When an order ships, do this." Businesspeople are talking about these things in their own vocabulary.
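The shipped-order example maps naturally onto ordinary SQL, shown here through Python's sqlite3 module: the subtype is a view defined by a null test, and "when an order ships, do this" is a trigger that fires when a row crosses into that subtype. Table, view, and trigger names are hypothetical, and this is a hand-written trigger, not something generated from a semantic object model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, ship_date TEXT);
CREATE TABLE shipment_events (order_id INTEGER, shipped_on TEXT);

-- The computed subtype: a shipped order is an order whose ship_date is not null.
CREATE VIEW shipped_orders AS
    SELECT * FROM orders WHERE ship_date IS NOT NULL;

-- "When an order ships, do this": fires only on the NULL-to-value transition.
CREATE TRIGGER order_shipped
AFTER UPDATE OF ship_date ON orders
WHEN OLD.ship_date IS NULL AND NEW.ship_date IS NOT NULL
BEGIN
    INSERT INTO shipment_events VALUES (NEW.order_id, NEW.ship_date);
END;
""")

conn.execute("INSERT INTO orders VALUES (1, NULL), (2, NULL)")
conn.execute("UPDATE orders SET ship_date = '1994-06-20' WHERE order_id = 1")

events = conn.execute("SELECT * FROM shipment_events").fetchall()
shipped = conn.execute("SELECT order_id FROM shipped_orders").fetchall()
print(events)
print(shipped)
```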
I used to think, if I could just think of the right set of questions, I could get the end users to tell me what they really want. The disadvantage of writing code is the maintainability issue. Change is the essence of this business.
DBMS: Again, it seems as though we're back to the issue of cognitive processes...
There's this wonderful theory, structuration theory, by the British sociologist Anthony Giddens. The best way to describe it is with the sketch by M.C. Escher of one hand drawing the other. That's the relationship between users and information systems. They don't just influence each other, they create each other. As soon as I deliver an information system, I enable the end users to do something new. As they do new things, they have a new set of requirements for the information system. I build the information system, change it, then deliver it again. And, again, they will have a new set of requirements for the information system. We have to plan on change. We have to assume it. The very presence of the thing we will produce will cause change.
When you write application code, it's hard to change. We had a rule here earlier that said that there could be no more than seven authors for each article. If I embed that rule in code, no non-programmer will ever be able to change it. If we put that limit right there in the model, then a nonprogramming person could come along and say, it's no longer seven, it's 14.
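The difference can be sketched by keeping the limit in the model as data and enforcing it with one generic check. The model layout and the function validate_count are hypothetical; the point is that changing seven to 14 is an edit to the model, not to the code.

```python
# Constraint values live in the model, not in application logic (hypothetical layout).
model = {("Article", "Author"): {"max_count": 7}}


def validate_count(model: dict, obj: str, attr: str, values: list) -> None:
    """Generic check: reads the current limit from the model at run time."""
    limit = model[(obj, attr)]["max_count"]
    if limit is not None and len(values) > limit:
        raise ValueError(f"{obj} allows at most {limit} {attr} values, got {len(values)}")


authors = [f"Author {i}" for i in range(10)]
try:
    validate_count(model, "Article", "Author", authors)
except ValueError as err:
    print(err)

# A non-programmer edits the model: "it's no longer seven, it's 14."
model[("Article", "Author")]["max_count"] = 14
validate_count(model, "Article", "Author", authors)  # now passes
```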
I don't want to write code because the semantics get buried in the application. My goal is to eliminate code completely. We're not there yet, but I'd like to get to the point where we can declare what should happen, and let the computer figure out how to do it.
DBMS: How does this vision of letting the computer do the work translate into a product strategy?
We're not prepared to discuss new products now. We have created the schema generator. This is not where we want to be in the long term, but we wanted to put it in the hands of students. I first wrote about the semantic object model in the third edition of my book. A lot of professors said it was interesting, and saw its advantages over the ER approach, but there were no tools for it. So Wall Data decided to develop this tool and make it available. It's available now to the academic community. There are now two textbooks on SOM, and approximately 100 universities are teaching it.
Recently, Wall Data created the Salsa Business Unit to develop commercial products based on the technology. This business unit has a development team, a couple of marketing people, and a business unit manager on board. We're just starting to have the initial conversations with product influencers to get a feel for the technology. We're not talking about specific products, but that won't be too far off.
DBMS: How do users avoid making mistakes in the model that would then propagate into the schema?
The tool's validation capability is one of its strengths. We try to catch errors as soon as we can, but there are some errors that we can only catch globally. For example, the tool can tell us about global logical inconsistencies in our model. We modeled an example from the InfoModeler [Asymetrix Corp.] documentation, and found an error in it. We know of [researchers] who are using the tool to model DNA. They don't want to generate a schema or anything like that, but they like the semantic chunking, relationships, and validation.
DBMS: How can SOM be used in the inverse, as a query mechanism?
There's a very close correlation between a semantic object and a form. You materialize a semantic object as a form. To construct a query, the user can type conditions into the form and say, "Go find it." People like to see their data in context.
DBMS: How does SOM handle data in unforeseen ways? In 10 years, I might want to view my author payments over time, and compare the rate of increase over years or quarters.
If you want to do something like that, build an object of it. Let's take a weird report. Suppose you want to take the phase of the moon, then cube it, and then find everybody whose percent of sales is less than the cubed phase of the moon. That's not something we can easily anticipate! We could build a semantic object of that query or report. We have the phase of the moon, then a formula to cube the phase of the moon. Then we have an action that says, "Print a report and specify the predicate using the cubed phase of the moon." If there's something you can't readily do with query-by-form, then build a semantic object of the query.
DBMS: How would SOM handle security?
Model security with the objects. If you have departments that can do some things, and some types of users within departments that can do other things, then model those things. Model the Department. Model the subtypes of Users and give them different sets of actions.
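A minimal sketch of security modeled as data, assuming that a user may perform an action only when both the department and the user's subtype permit it. The department names, subtypes, actions, and that intersection rule are all hypothetical choices for illustration.

```python
from typing import Dict, Set

# What each department may do (hypothetical).
DEPARTMENT_ACTIONS: Dict[str, Set[str]] = {
    "Sales": {"open order form", "print invoice"},
    "Accounting": {"print invoice", "run month-end"},
}

# What each subtype of User may do (hypothetical).
SUBTYPE_ACTIONS: Dict[str, Set[str]] = {
    "Clerk": {"open order form", "print invoice"},
    "Supervisor": {"open order form", "print invoice", "run month-end"},
}


def allowed_actions(department: str, subtype: str) -> Set[str]:
    """Actions permitted by both the user's department and the user's subtype."""
    return DEPARTMENT_ACTIONS.get(department, set()) & SUBTYPE_ACTIONS.get(subtype, set())


print(sorted(allowed_actions("Accounting", "Supervisor")))
print(sorted(allowed_actions("Accounting", "Clerk")))
```

Granting a new permission means editing a set in the model rather than adding a check to application code.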
DBMS: SOM is now being taught in schools and published in books. What's to keep other vendors from implementing it before you do?
It's always a risk. There's a great old poem by Rudyard Kipling, and I think it goes something like this: "They can copy what I do, but they can't copy my mind, so I'll leave them sweating and stealing, and a year-and-a-half behind." All we have (those of us in smaller companies trying to develop new ideas) is that year-and-a-half lead.
Album -- A container window for related semantic objects in the Salsa software.
Profile -- A predefined semantic "chunk" that may consist of one or more related attributes. For example, a Phone profile may consist of Area Code and Local Number attributes.
Semantic Object Modeling (SOM) -- A formal method that allows non-technical users to build data models by describing entities ("semantic objects") in their world. Semantic objects consist of attributes (fields) that describe the object. The software can then validate the model's logic, infer the complex relationships hidden within it, then automatically generate appropriate relational database schemas and applications.
Subtype Semantic Object -- A specialization of another object, where basic characteristics are inherited from a supertype object.