Data mining is the latest craze sweeping the data warehousing and decision-support arenas. Data mining includes a variety of analytical techniques that promise to reveal previously unknown patterns and relationships in large transactional databases. (For more background on data mining, see Bruce Moxon's article, "Defining Data Mining," DBMS, August 1996, page S11.) The DataMind product family from DataMind Corp., a startup in the rapidly growing data mining market, is a Windows-based program targeted to end users who may be analytical power users but who are not steeped in the arcane mathematical, statistical, or artificial intelligence wizardry associated with the earliest data mining products.
DataMind runs on Windows 3.1, Windows for Workgroups 3.11, and Windows 95. (I did not try installing it on Windows NT.) The product uses Microsoft Excel for its user interface. The DataMind product family consists of three versions. DataMind Solo is designed for users whose data is stored in spreadsheet or text files. DataMind Professional Edition adds support for ODBC data sources, including RDBMSs. Unlike the other two end-user versions, DataMind DataCruncher is designed for use by MIS staff who create and run data mining studies on a server and then share the results with end users running DataMind Solo or Professional on their desktops. I evaluated DataMind Professional Edition using Windows 95 and Excel 7.0.
DataMind organizes your work into a study that defines input and output variables, one or more scenarios, and one or more data sources (also referred to as "domains"). A scenario is a collection of input and output variables. For example, one scenario might include all possible input variables, and another might be limited to a small number of variables. A study also includes numerous analytical reports, all of which are stored in a single spreadsheet file. The study and its reports represent an analytical model of your data.
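As a rough mental model, the pieces of a study can be pictured as a small data structure. The following Python sketch is only an illustration: the class names, field names, and the sample file name are hypothetical and do not reflect DataMind's internal file format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Scenario:
    """A named selection of input and output variables (hypothetical layout)."""
    name: str
    input_fields: List[str]
    output_field: Optional[str] = None


@dataclass
class Study:
    """Bundles data sources ("domains"), scenarios, and generated reports."""
    name: str
    domains: List[str] = field(default_factory=list)
    scenarios: List[Scenario] = field(default_factory=list)
    reports: List[str] = field(default_factory=list)


# One scenario uses every input variable; another is limited to a single one.
study = Study(name="churn", domains=["customers.txt"])
study.scenarios.append(Scenario("all inputs", ["age", "gender", "income"], "churned"))
study.scenarios.append(Scenario("age only", ["age"], "churned"))
```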
Completing a study involves three procedures: discovery, evaluation, and prediction. The discovery process includes the initial analysis of your data to detect and quantify relationships. The evaluation process compares the derived associations with the actual value in each record's output field to determine how predictable the results really are. The prediction process applies the model to another data set. You will probably perform these steps any number of times to fine-tune the models you build.
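The three procedures can be illustrated with a deliberately simple Python sketch. It assumes a toy model that counts how often each field-value pair co-occurs with each output value; this is a stand-in chosen for illustration, not DataMind's actual algorithm, and the function names and sample records are invented.

```python
from collections import Counter, defaultdict


def discover(records, output_field):
    """Discovery: count how often each (field, value) pair occurs with each outcome."""
    counts = defaultdict(Counter)
    for rec in records:
        outcome = rec[output_field]
        for name, value in rec.items():
            if name != output_field:
                counts[(name, value)][outcome] += 1
    return counts


def predict(model, record, output_field):
    """Prediction: add up the counts for the record's field-value pairs."""
    votes = Counter()
    for name, value in record.items():
        if name != output_field:
            votes.update(model.get((name, value), {}))
    return votes.most_common(1)[0][0] if votes else None


def evaluate(model, records, output_field):
    """Evaluation: fraction of records whose prediction matches the actual value."""
    hits = sum(predict(model, r, output_field) == r[output_field] for r in records)
    return hits / len(records)


train = [
    {"gender": "F", "income": "high", "buys": "yes"},
    {"gender": "F", "income": "low", "buys": "yes"},
    {"gender": "M", "income": "high", "buys": "yes"},
    {"gender": "M", "income": "low", "buys": "no"},
    {"gender": "M", "income": "low", "buys": "no"},
]
model = discover(train, "buys")
accuracy = evaluate(model, train, "buys")
guess = predict(model, {"gender": "M", "income": "low"}, "buys")
```

Evaluating against the same records used for discovery, as this sketch does, tends to flatter the model; a separate data set gives a more honest measure.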
Clicking on the DataMind icon launches Excel and starts a wizard that guides you through the creation of a study. After naming your study, you must choose a data source; your options are an ASCII text file, a selected range in an open Excel spreadsheet, an Excel file on disk (possibly exported from another database), or an ODBC data source. Unless your complete database is relatively small, your discovery data set will probably be a sample.
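If you need to draw the discovery sample yourself before handing a file to the product, a simple random sample is one reasonable choice. The Python sketch below assumes the records are already loaded into a list; the function name and the 10 percent fraction are illustrative, not something DataMind prescribes.

```python
import random


def sample_records(records, fraction, seed=0):
    """Return a simple random sample (without replacement) of the records."""
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    k = max(1, round(len(records) * fraction))
    return rng.sample(records, k)


full_table = list(range(1000))  # stand-in for 1,000 database rows
discovery_set = sample_records(full_table, 0.10)
```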
The data wizard's next step shows the field names, their usage, and the number of unique values in each field. Double-clicking on a field name displays these unique values as an indented list below the field name. At this point you typically choose an output field. If you do not, DataMind will proceed, but it will perform a segmentation analysis, a statistical procedure that groups records with similar characteristics. You can also mark fields such as address to be ignored, and you can indicate if a field is discrete or continuous. If a numeric field such as age contains numerous values, you should probably create groups. DataMind automatically creates five groups when you change a field from discrete to continuous. I could not find a way to revise the group boundaries while in the wizard, but the Scenario Specification dialog accessible from the Control Center or menus lets you perform such a revision later.
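To see what grouping a continuous field involves, consider the Python sketch below. It assumes equal-width groups, which is only one possible scheme; the review did not establish how DataMind chooses its five default boundaries, and the function names and sample ages are invented.

```python
def group_edges(values, n_groups=5):
    """Boundaries of n_groups equal-width groups spanning min..max."""
    lo, hi = min(values), max(values)
    return [lo + i * (hi - lo) / n_groups for i in range(n_groups + 1)]


def group_index(value, lo, hi, n_groups=5):
    """Zero-based group number for a value; the maximum falls in the last group."""
    if hi == lo:
        return 0
    idx = int((value - lo) * n_groups / (hi - lo))
    return min(max(idx, 0), n_groups - 1)


ages = [18, 25, 33, 41, 58, 68]
edges = group_edges(ages)
groups = [group_index(a, min(ages), max(ages)) for a in ages]
```

Equal-width groups are easy to explain but can leave some groups nearly empty when the data is skewed, which is one reason you might want to revise the boundaries afterward.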
Clicking on a report icon causes DataMind to generate the report in a new sheet within the Excel workbook. The new report sheet becomes active; you can return to the Control Center to view other reports by clicking on the Control Center tab.
A DataMind model includes numerous canned reports. The discovery reports reveal the associations and relationships divined by DataMind. A good starting point is the Discovery Model Summary, which shows each output field's values and the input field values most closely associated with each outcome. (See Figure 4.) Clicking on a "+" icon expands the display to show all input values. A floating toolbar with a single button provides access to additional discovery views that use charts and graphs to display more detail about how each input criterion (a specific field-value pair such as "Gender=Female") affects each output value. Other discovery reports summarize the study specification and the distribution of data used in the study. I found it helpful to study all of these reports to gain a thorough understanding of what DataMind's model is trying to say.
The Excel reports are quite colorful and well designed, but you should become familiar with DataMind's terms in order to interpret the results correctly. (For example, the vague term "specific criteria" indicates input variables that are always associated with an output variable.) You can also generate reports into a Microsoft Word document. DataMind summarizes the study variables and the criteria that affect each output variable's values. The Word reports use natural language statements and include some definitions of DataMind's terms.
You can also run an evaluation process against another dataset. However, specifying another dataset could be easier. When you run the evaluation process, a dialog asks if you want to use the discovery dataset. Answering "no" does not lead to a dataset selection dialog as I expected; you must dig around in other dialogs.
DataMind performs two types of predictions: batch and case. A batch prediction applies the model to a new dataset. A case prediction lets you examine and manipulate each record.
The Batch Prediction Summary report includes a row for each record and a column for each field (variable), plus additional columns indicating the three most likely predicted outcomes. If you have many input fields, this report will be hard to digest. I also had to click on the "+" icon because not all of the report columns were initially visible.
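The layout of such a batch report can be mimicked in a few lines of Python. This sketch assumes a hypothetical table of weights linking field-value pairs to outcomes; the weights, field names, and column names (pred_1 through pred_3) are invented for illustration and are not DataMind's scoring method.

```python
from collections import Counter

# Hypothetical weights: (field, value) -> {outcome: strength of association}
WEIGHTS = {
    ("gender", "F"): {"gold": 3, "silver": 1},
    ("gender", "M"): {"silver": 2, "bronze": 2},
    ("income", "high"): {"gold": 2, "silver": 2, "bronze": 1},
    ("income", "low"): {"bronze": 3, "silver": 1},
}


def rank_outcomes(record, weights, top_n=3):
    """Return up to top_n outcomes, highest combined weight first."""
    scores = Counter()
    for name, value in record.items():
        scores.update(weights.get((name, value), {}))
    return [outcome for outcome, _ in scores.most_common(top_n)]


def batch_predict(records, weights):
    """Copy each record and append three columns of ranked predictions."""
    rows = []
    for rec in records:
        ranked = rank_outcomes(rec, weights)
        ranked += [None] * (3 - len(ranked))  # pad when fewer than three outcomes
        rows.append({**rec, "pred_1": ranked[0], "pred_2": ranked[1], "pred_3": ranked[2]})
    return rows


report = batch_predict(
    [{"gender": "F", "income": "high"}, {"gender": "M", "income": "low"}],
    WEIGHTS,
)
```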
The Case Prediction option uses a form-like dialog that displays one record at a time, with each field and value listed vertically down the window. The most likely prediction is displayed above the fields in a drop-down list, so you can also see the second and third most likely predictions. The best part is that you can play what-if games in this window by altering values (such as decreasing income levels) to see the impact of the change. A "Why" button displays another dialog that lists each criterion and its impact on the output variable.
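The what-if idea is easy to sketch in Python. The example assumes a hypothetical additive scoring table; the weights, field names, and the why() helper are invented to illustrate the concept and say nothing about how DataMind computes its predictions.

```python
# Hypothetical weights: (field, value) -> {outcome: contribution}
WEIGHTS = {
    ("income", "high"): {"approve": 4, "decline": 1},
    ("income", "low"): {"approve": 1, "decline": 4},
    ("owns_home", "yes"): {"approve": 2, "decline": 1},
    ("owns_home", "no"): {"approve": 1, "decline": 2},
}


def score(record, weights):
    """Total contribution toward each outcome for one record."""
    totals = {}
    for name, value in record.items():
        for outcome, w in weights.get((name, value), {}).items():
            totals[outcome] = totals.get(outcome, 0) + w
    return totals


def best(record, weights):
    """Outcome with the highest total score."""
    totals = score(record, weights)
    return max(totals, key=totals.get)


def why(record, weights, outcome):
    """List each criterion with its contribution to the given outcome."""
    return [
        (f"{name}={value}", weights.get((name, value), {}).get(outcome, 0))
        for name, value in record.items()
    ]


case = {"income": "high", "owns_home": "no"}
what_if = {**case, "income": "low"}  # lower the income and re-score
before, after = best(case, WEIGHTS), best(what_if, WEIGHTS)
reasons = why(what_if, WEIGHTS, after)
```

Here lowering the income flips the top prediction, and the list returned by why() shows which criteria pushed it there.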
The thin user guide, which includes a brief tutorial, explains the basics, but a product that performs a function new to most users should provide more background information and more than one tutorial example. I was unable to find instructions in either the manual or the help system for several tasks, such as creating a domain (dataset).
At press time, DataMind Corp. planned to release version 1.1 in mid-September. This upgrade will support an unlimited number of discovered relationships for any single output value (version 1.0 can discover only up to 2,000 relationships). It will also import text files delimited with commas, spaces, or any user-defined delimiter (version 1.0 imports tab-delimited data). Also, DataCruncher will be available on Windows NT and HP-UX 10.x.
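For readers preparing their own extracts, the difference between delimiters is small in practice. The Python sketch below uses the standard csv module to read the same two-field record with a tab, a comma, or a user-chosen character; the function name and sample data are illustrative and unrelated to DataMind's import code.

```python
import csv
import io


def read_delimited(text, delimiter="\t"):
    """Parse delimited text with a header row into a list of dictionaries."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


tab_rows = read_delimited("age\tgender\n34\tF\n")
comma_rows = read_delimited("age,gender\n34,F\n", delimiter=",")
pipe_rows = read_delimited("age|gender\n34|F\n", delimiter="|")
```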

