I went to a lot of trade shows at the end of 1997. One of the other press people described the products in the booths at these shows as going through three phases. The first year in the booth, they do not work but they sound nice and look pretty. The company would like you to give them money at this phase. The second year at the show, the products are actually working. You gave the company money and now you want results. The third year at the show, the vendors now have tools so you can manage the products you bought two years ago. Itýs finally time for real work, now that the learning curve has run its course.
To that end, you might want to look at some tools from BEZ Systems Inc. (www.bez.com) for measuring performance and capacity for Oracle on the IBM RS/6000 SP and NCR WorldMark MPP environments.
The tools are collectively named BEZPlus. Individually, the tools are Investigator, SerView DBA, Strategist, Appraiser, and CorpView DBA. Investigator is a performance-management tool that extracts measurement data, provides daily reporting, performs workload characterization, generates graphs, and creates a baseline model of existing performance bottlenecks. SerView DBA conducts a detailed analysis of SQL and tables accessed by different users ý a "drill down" of what you found with Investigator. Strategist models, evaluates, and predicts performance. This tool allows users to evaluate different alternatives and answer a variety of "what-if" questions regarding the impact of workload and database size growth, change in hardware configuration, scheduling, and more. Appraiser predicts performance for new applications and estimates the impact your database and application-tuning efforts have on existing applications. CorpView DBA shows the patterns in hardware use, arrival rate, response time by critical workloads, and how database objects are used. CorpView DBA can be used without the others if you wish.
BEZ feels that its market will be in data warehouse and distributed data mart environments, but this is useful stuff for production databases too.
How much do you really need management tools for a database? Are vendors just adding them in the third year to keep up revenue? That is one of those questions like "How much do you need a car?" The question does not give you enough data for a quick answer. New York City is not Wyoming. Let me give you one figure that someone came up with.
Anyone remember Btrieve Technologies? The company was a part of the former "Novell empire" that provided a nonrelational server database as an add-on. It also had a SQL that was pretty weak and was neglected while it was part of Novell. That seems to be changing now.
With one buyout and another, Btrieve left Novell and is now a separate company called Pervasive Software Inc. that went public in September 1997. This makes them a new company with a really good installed base and 11 years of history. Pervasiveýs databases are embedded in thousands of business solutions, including seven of the 10 most popular accounting packages.
A study Pervasive commissioned in October 1997 by the Aberdeen Group, a Boston-based computer industry research firm, reveals that Pervasiveýs Scalable SQL database is 52 percent less expensive to implement and maintain over a five-year period than Microsoftýs SQL Server. You can see a copy of the Aberdeen report on Pervasiveýs Web site at www.pervasive.com.
What are the big factors in the price difference? The first is that the two products are not completely compatible. Pervasive aims its products at software companies that want to embed those databases in their packages. This is not the same as a database that is designed to be used by multiple applications written in house.
The second factor is that Pervasiveýs Scalable SQL does not require ongoing support from a database administrator. According to the study, which is based on the total cost of a database application for 10 to 100 users over a five-year period, a single server running Microsoftýs SQL Server required up to 25 percent of a database administratorýs time to maintain the server and database, including adding and subtracting new users, redefining roles, and other ongoing maintenance chores requiring in-depth expertise in database operations.
I am sure that you will get higher personnel and management costs if you were to look at other SQL products ý Oracle, DB2, CA-OpenIngres, Informix, and other databases designed for larger-scale applications should be higher in absolute terms. But I do not know what percentage of the total costs their personnel would be. If anyone has figures for the effect of management tools on other SQL products, please share them with me.
The moral to the story is that personnel costs seem to be much higher than you would think.
More and more raw databases are being sent out on CD-ROMs because the price per unit to publish them and the cost of equipment to read them is so low per byte.
For example, you can get some of the Miller Freeman magazines on CD-ROM, and one Miller Freeman magazine, Dr. Dobbýs Journal (www.ddj.mfi.com), has been very successful with software and textbook collections on CD-ROM. If you missed an issue when you went to the dentistýs office, National Geographic and Mindscape are releasing 108 years of National Geographic on 30 CD-ROMs for $300 for educational institutions ý every page of every issue. As of this writing, I do not have a price for individual buyers. For more information, contact Mindscape at www.mindscapeschool.com or 800-231-3088.
On the same subject, a U.K. company, PAN Technology Ltd. (www.pantechnology.com), has launched CopyLok, a method of copy protecting CDs. CopyLok prevents copying on CD recordable devices and on expensive Laser Beam Recorders that produce replicators, glass masters, and stampers for injection molding. It allows any information to be loaded onto the hard disk of a computer, but will only allow that software to run if the original CD is in the drive. It stops friend-to-friend copying, illegal shop replication, and mass replication and also prevents Internet distribution.
The product looks good and will be popular with PC software and computer game companies, which have to deal with software piracy rates in China and Vietnam of 96 percent and 99 percent, respectively. CopyLok is the result of three years of research, and it is the first and only anticopying technology to have received Philips approval and a Philipsý Patent Application. It doesnýt contravene any of the Philips worldwide standards for CD-ROM recording, such as the Yellow or Blue Book.
The downside of all of this is that part of the protection scheme uses strong cryptography to recognize the CD. The Clinton administration is still opposed to strong crypto, so U.S. companies could be put in a situation where they have to publish their CD databases overseas to get protection. Once more, we disarm the victims and punish the innocent.
Along the same lines, the idea that security is something you add to the system after the fact ý if at all ý is amazing to me. I took a data warehouse class from Ralph Kimball in October 1997. One of his war stories was about a warehouse project whose creators were so proud of it, they made it available to anyone on their company intranet. They were thinking that it would be a good thing if the entire company could get this kind of information for their planning decisions.
The problem was that their suppliers had access to the intranet, and just about anyone outside the company could come in through the Internet via a supplier. This was an industrial spyýs dream come true. All the raw data for research and strategic planning of a competitor was there for the downloading.
You do not have to be such a Pollyanna to be a victim of data theft. I have a consulting job that involves one company downloading another companyýs proprietary database of timely logistical scheduling information and reselling it. Both companies use Internet sites to distribute their information.
The offending company was served with a court order to cease and desist, but at the time of this writing it is continuing to repost identical data within hours of its appearance on my clientýs site. The offender is claiming that it gets the data from unknown third parties. I think that I might have a few more columns on this topic in the near future.
Another source of fear on the Internet is how cookies work. (If you donýt know this term or others, visit the PC Webopaedia at www.sandybay.com/pc-web/ and look them up.) When you visit a Web site, you do not have a unique identifier and connection that the server can use to identify you. A cookie is a little "dropping" that a Web site can leave in a file on your hard disk. When you first visit that Web site, you will be asked to log on with your name and perhaps an email address or password. When you return to that particular site, the Web site can read your cookie and know who you are, without asking you again.
There are problems with this technology. The first is that people do not like having data written on their disk drives without their permission. The next problem is that any other Web sites can usually read all your cookies and get your identification codes. This is not very secure.
This monthýs puzzle is from Richard Romley. In this example we have a (nondata) warehouse made up of bins that contain inventory. This is a many-to-many relationship ý an item may exist in multiple bins and each bin may contain multiple items. The following table contains the quantity of a particular item in a particular bin. The primary key is, of course, item and bin.
CREATE TABLE Warehouse (item CHAR(5) NOT NULL, bin INTEGER NOT NULL, qty INTEGER NOT NULL CHECK (qty >= 0), PRIMARY KEY (item, bin));The table is populated with data for two items, ýAppleý and ýOrange.ý
INSERT INTO Warehouse VALUES (ýAppleý, 1, 5); INSERT INTO Warehouse VALUES (ýAppleý, 2, 0); INSERT INTO Warehouse VALUES (ýAppleý, 3, 2); INSERT INTO Warehouse VALUES (ýAppleý, 5, 9); INSERT INTO Warehouse VALUES (ýAppleý, 6, 1); INSERT INTO Warehouse VALUES (ýAppleý, 7, 6); INSERT INTO Warehouse VALUES (ýOrangeý, 1, 2); INSERT INTO Warehouse VALUES (ýOrangeý, 3, 3); INSERT INTO Warehouse VALUES (ýOrangeý, 4, 5); INSERT INTO Warehouse VALUES (ýOrangeý, 8, 1);
The problem is, given an item number and quantity, to figure out which bins to retrieve the item from, giving priority to the lower bin numbers. Show only the bins being used, and the quantity to be taken from each bin. If we want to get three apples, the results will be:
Results bin qty ============ 1 3
Because there are five apples in bin number 1, itýs the only bin we need. But suppose we wanted six apples, then we would want:
Results bin qty =========== 1 5 3 1
Now we take all five from bin number 1 but skip bin number 2 because it contains zero apples. Using the same logic, the solution for 11 items would be.
Results bin qty =========== 1 5 3 2 5 4
and so on. Can you do this in one query?
The best solution Richard Romley found is this query, which uses the parameters (:qty, :item) and some SQL-92 features.
SELECT W1.bin, CASE WHEN :qty >= SUM(W2.qty)
THEN W1.qty
ELSE :qty - (SUM(W2.qty) - W1.qty)
END AS qty
FROM Warehouse AS W1
INNER JOIN
Warehouse AS W2
ON W1.item = W2.item
AND W1.bin >= W2.bin
WHERE W1.item = :item
AND W1.qty > 0
GROUP BY W1.bin, W1.qty
HAVING SUM(W2.qty) - W1.qty < :qty
ORDER BY W1.bin;
This is a neat query that has a very powerful ýhavingý clause. Can you see why it works? The sum(w2.bin) is the total of the all bins up to this point, including the current bin, w1.bin. We are looking for the subset of bins that total just below the desired quantity. The algebra in the case expression then adds the fractional part in the current bin, w1.bin, needed to complete the desired total quantity.