
Applying Data Mining Techniques Using Enterprise Miner

Course Notes

Applying Data Mining Techniques Using Enterprise Miner Course Notes was developed by Sue Walsh. Some of the course notes are based on material developed by Will Potts and Doug Wielenga. Additional contributions were made by John Amrhein, Kate Brown, Iris Krammer, and Bob Lucas. Editing and production support was provided by the Curriculum Development and Support Department.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.

Applying Data Mining Techniques Using Enterprise Miner Course Notes Copyright 2002 by SAS Institute Inc., Cary, NC 27513, USA. All rights reserved. Printed in the United States of America. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc. Book code 58801, course code ADMT, prepared date 05Apr02.

For Your Information

Table of Contents
Course Description
Prerequisites
General Conventions

Chapter 1 Introduction to Data Mining
  1.1 Background
  1.2 SEMMA

Chapter 2 Predictive Modeling Using Decision Trees
  2.1 Introduction to Enterprise Miner
  2.2 Modeling Issues and Data Difficulties
  2.3 Introduction to Decision Trees
  2.4 Building and Interpreting Decision Trees

Chapter 3 Predictive Modeling Using Regression
  3.1 Introduction to Regression
  3.2 Regression in Enterprise Miner

Chapter 4 Variable Selection
  4.1 Variable Selection and Enterprise Miner

Chapter 5 Predictive Modeling Using Neural Networks
  5.1 Introduction to Neural Networks
  5.2 Visualizing Neural Networks

Chapter 6 Model Evaluation and Implementation
  6.1 Model Evaluation: Comparing Candidate Models
  6.2 Ensemble Models
  6.3 Model Implementation: Generating and Using Score Code

Chapter 7 Cluster Analysis
  7.1 K-Means Cluster Analysis
  7.2 Self-Organizing Maps

Chapter 8 Association and Sequence Analysis
  8.1 Introduction to Association Analysis
  8.2 Interpretation of Association and Sequence Analysis
  8.3 Dissociation Analysis (Self-Study)

Appendix A References
  A.1 References

Appendix B Index

Course Description

This course provides extensive hands-on experience with Enterprise Miner and covers the basic skills required to assemble analyses using the rich tool set of Enterprise Miner. It also covers concepts fundamental to understanding and successfully applying data mining methods.

After completing this course, you should be able to
  identify business problems and determine suitable analytical methods
  understand the difficulties presented by massive, opportunistic data
  assemble analysis-flow diagrams
  prepare data for analysis, including partitioning data and imputing missing values
  train, assess, and compare regression models, neural networks, and decision trees
  perform cluster analysis
  perform association and sequence analysis.

To learn more

A full curriculum of general and statistical instructor-based training is available at any of the Institute's training facilities. Institute instructors can also provide on-site training. For information on other courses in the curriculum, contact the SAS Education Division at 1-919-531-7321, or send e-mail to training@sas.com. You can also find this information on the Web at www.sas.com/training/ as well as in the SAS Training Course Catalog.
For a list of other SAS books that relate to the topics covered in this Course Notes, USA customers can contact our SAS Publishing Department at 1-800-727-3228 or send e-mail to sasbook@sas.com. Customers outside the USA, please contact your local SAS office. Also, see the Publications Catalog on the Web at www.sas.com/pubs for a complete list of books and a convenient order form.

Prerequisites

Before selecting this course, you should be familiar with Microsoft Windows and Windows-based software. No previous SAS software experience is necessary.

General Conventions

This section explains the various conventions used in presenting text, SAS language syntax, and examples in this book.
Typographical Conventions
You will see several type styles in this book. This list explains the meaning of each style:
  UPPERCASE ROMAN - is used for SAS statements, variable names, and other SAS language elements when they appear in the text.
  italic - identifies terms or concepts that are defined in text. Italic is also used for book titles when they are referenced in text, as well as for various syntax and mathematical elements.
  bold - is used for emphasis within text.
  monospace - is used for examples of SAS programming statements and for SAS character strings. Monospace is also used to refer to field names in windows, information in fields, and user-supplied information.
  select - indicates selectable items in windows and menus. This book also uses icons to represent selectable items.

Syntax Conventions

The general forms of SAS statements and commands shown in this book include only that part of the syntax actually taught in the course. For complete syntax, see the appropriate SAS reference guide.

    PROC CHART DATA= SAS-data-set;
       HBAR | VBAR chart-variables </ options>;
    RUN;

This is an example of how SAS syntax is shown in text:
  PROC and CHART are in uppercase bold because they are SAS keywords.
  DATA= is in uppercase to indicate that it must be spelled as shown.
  SAS-data-set is in italic because it represents a value that you supply. In this case, the value must be the name of a SAS data set.
  HBAR and VBAR are in uppercase bold because they are SAS keywords. They are separated by a vertical bar to indicate they are mutually exclusive; you can choose one or the other.
  chart-variables is in italic because it represents a value or values that you supply.
  </ options> represents optional syntax specific to the HBAR and VBAR statements. The angle brackets enclose the slash as well as options because if no options are specified you do not include the slash.
  RUN is in uppercase bold because it is a SAS keyword.

Types of Targets

Supervised Classification
  Event/no event (binary target)
  Class label (multiclass problem)
Regression
  Continuous outcome
Survival Analysis
  Time-to-event (possibly censored)

The main differences among the analytical methods for predictive modeling depend on the type of target variable. In supervised classification, the target is a class label (categorical). The training data consists of labeled cases. The aim is to construct a model (classifier) that can allocate cases to the classes using only the values of the inputs. Regression analysis is supervised prediction where the target is a continuous variable. (The term regression can also be used more generally; for example, logistic regression is a method used for supervised classification.) The aim is to construct a model that can predict the values of the target from the inputs. In survival analysis, the target is the time until some event occurs. The outcome for some cases is censored; all that is known is that the event has not yet occurred. Special methods are usually needed to handle censoring.

1.2 SEMMA

Define SEMMA. Introduce the tools available in Enterprise Miner.
Sample Explore Modify Model Assess
The tools in Enterprise Miner are arranged according to the SAS process for data mining, SEMMA. SEMMA stands for
  Sample - identify input data sets (identify input data, sample from a larger data set, partition data set into training, validation, and test data sets).
  Explore - explore data set statistically and graphically (plot the data, obtain descriptive statistics, identify important variables, perform association analysis).
  Modify - prepare the data for analysis (create additional variables or transform existing variables for analysis, identify outliers, impute missing values, modify the way in which variables are used for the analysis, perform cluster analysis, analyze data with SOMs or Kohonen networks).
  Model - fit a predictive model (model a target variable using a regression model, a decision tree, a neural network, or a user-defined model).
  Assess - compare competing predictive models (build charts plotting percentage of respondents, percentage of respondents captured, lift charts, profit charts).
Additional tools are available under the Utilities group.

Sample

Input Data Source

Sampling

Data Partition
Sample Nodes

The Input Data Source node reads data sources and defines their attributes for later processing by Enterprise Miner. This node can perform various tasks:
1. It enables you to access SAS data sets and data marts. Data marts can be defined using SAS/Warehouse Administrator software and set up for Enterprise Miner using the Enterprise Miner Warehouse Add-Ins.
2. It automatically creates the metadata sample for each variable in the data set.
3. It sets initial values for the measurement level and the model role for each variable. You can change these values if you are not satisfied with the automatic selections made by the node.
4. It displays summary statistics for interval and class variables.
5. It enables you to define target profiles for each target in the input data set.
For the purposes of this document, data sets and data tables are equivalent terms.

The Sampling node enables you to take random samples, stratified random samples, and cluster samples of data sets. Sampling is recommended for extremely large databases because it can significantly decrease model training time. If the sample is sufficiently representative, relationships found in the sample can be expected to generalize to the complete data set. The Sampling node writes the sampled observations to an output data set and saves the seed values that are used to generate the random numbers for the samples so that you may replicate the samples.

The Data Partition node enables you to partition data sets into training, test, and validation data sets. The training data set is used for preliminary model fitting. The validation data set is used to monitor and tune the model during estimation and is also used for model assessment. The test data set is an additional holdout data set that you
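As a rough illustration of what seeded sampling and partitioning involve, the following Python sketch splits a list of records into partitions while keeping the target proportions similar in each one. It is not how Enterprise Miner implements the Sampling or Data Partition nodes; the function name stratified_partition, the record layout, and the seed value are assumptions made for this example.

```python
import random
from collections import defaultdict


def stratified_partition(rows, target_key, fractions, seed):
    """Split rows into named partitions, stratifying on rows[target_key]."""
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("partition fractions must sum to 1")
    rng = random.Random(seed)  # a saved seed lets the same split be replicated
    strata = defaultdict(list)
    for row in rows:
        strata[row[target_key]].append(row)
    names = list(fractions)
    parts = {name: [] for name in names}
    for level in sorted(strata):
        members = strata[level]
        rng.shuffle(members)
        start = 0
        for i, name in enumerate(names):
            if i == len(names) - 1:
                end = len(members)  # last partition takes the remainder
            else:
                end = start + round(fractions[name] * len(members))
            parts[name].extend(members[start:end])
            start = end
    return parts


# Hypothetical data: 1000 records, 20% of which have the event BAD=1.
rows = [{"id": i, "BAD": 1 if i % 5 == 0 else 0} for i in range(1000)]
split = stratified_partition(rows, "BAD", {"train": 0.5, "validate": 0.5}, seed=12345)
```

Calling the function again with the same seed returns the same partitions, which is the point of saving the seed.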

of the interval target is computed from the value model and optionally adjusted by the posterior probabilities of the class target through the bias adjustment option. It also runs a posterior analysis that displays the value prediction for the interval target by the actual value and prediction of the class target. The score code of the Two Stage Model node is a composite of the class and value models. The value model is used to create the assessment plots in the Model Manager and also in the Assessment node. These modeling nodes utilize a directory table facility, called the Model Manager, in which you can store and assess models on demand. The modeling nodes also enable you to modify the target profile(s) for a target variable.

Assess

Assessment Reporter
Assess Nodes

The Assessment node provides a common framework for comparing models and predictions from any of the modeling nodes (Regression, Tree, Neural Network, and User Defined Model nodes). The comparison is based on the expected and actual profits or losses that would result from implementing the model. The node produces several charts that help to describe the usefulness of the model such as lift charts and profit/loss charts.

The Reporter node assembles the results from a process flow analysis into an HTML report that can be viewed with your favorite Web browser. Each report contains header information, an image of the process flow diagram, and a separate report for each node in the flow. Reports are managed in the Reports tab of the Project Navigator.
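To make the idea of a lift chart concrete, here is a minimal Python sketch that ranks cases by predicted probability and computes the cumulative response rate, the share of events captured, and the lift at each depth. The function name cumulative_lift and the sample scores are assumptions for illustration; the Assessment node's own calculations may differ in detail.

```python
def cumulative_lift(scores, outcomes, n_bins=10):
    """Return (depth, response rate, captured share, lift) for each cumulative bin."""
    total = len(scores)
    if total < n_bins:
        raise ValueError("need at least one case per bin")
    # Rank cases from highest to lowest predicted probability.
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    total_events = sum(outcomes)
    base_rate = total_events / total
    table = []
    for b in range(1, n_bins + 1):
        cutoff = round(b * total / n_bins)
        events = sum(y for _, y in ranked[:cutoff])
        response = events / cutoff          # event rate among the top cases
        captured = events / total_events    # share of all events found so far
        table.append((b / n_bins, response, captured, response / base_rate))
    return table


# Hypothetical scores and outcomes for ten cases.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
lift_table = cumulative_lift(scores, outcomes, n_bins=5)
```

A lift above 1 at a given depth means the model finds events at a higher rate than selecting cases at random; at full depth the lift is always 1.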
Other Types of Nodes

Scoring Nodes (Score, C*Score)

The Score node enables you to generate and manage predicted values from a trained model. Scoring formulas are created for both assessment and prediction. Enterprise Miner generates and manages scoring formulas in the form of SAS DATA step code, which can be used in most SAS environments even without the presence of Enterprise Miner.

The C*Score node translates the SAS DATA step score code that is generated by Enterprise Miner tools into a score function in the C programming language, as described in the ISO/IEC 9899 International Standard for Programming Languages - C handbook. You can save the score function to a plain text output file, enabling you to deploy the scoring algorithms in your preferred C or C++ development environment. The node produces a header file that is used to compile the C code. The C code compiles with any C compiler that supports the ISO/IEC 9899 International Standard for Programming Languages - C. It can be linked as a callable C function, for example, as a dynamic link library (DLL).

While C*Score output could be used as the core of a scoring system, it is not a complete scoring system. It provides only the functionality that is explicit in the SAS DATA step scoring code that is translated. The node runs only in Enterprise Miner, and will only translate the score code that is produced by Enterprise Miner nodes. The intended users of the C*Score node are programmers with experience writing code in the C or C++ languages.

The consumer credit department of a bank wants to automate the decision-making process for approval of home equity lines of credit. To do this, they will follow the recommendations of the Equal Credit Opportunity Act to create an empirically derived and statistically sound credit scoring model. The model will be based on data collected from recent applicants granted credit through the current process of loan underwriting. The model will be built from predictive modeling tools, but the created model must be sufficiently interpretable so as to provide a reason for any adverse actions (rejections). The HMEQ data set contains baseline and loan performance information for 5,960 recent home equity loans. The target (BAD) is a binary variable that indicates if an applicant eventually defaulted or was seriously delinquent. This adverse outcome occurred in 1,189 cases (20%). For each applicant, 12 input variables were recorded.

Name      Model Role   Measurement Level   Description
BAD       Target       Binary              1=defaulted on loan, 0=paid back loan
REASON    Input        Binary              HomeImp=home improvement, DebtCon=debt consolidation
JOB       Input        Nominal             Six occupational categories
LOAN      Input        Interval            Amount of loan request
MORTDUE   Input        Interval            Amount due on existing mortgage
VALUE     Input        Interval            Value of current property
DEBTINC   Input        Interval            Debt-to-income ratio
YOJ       Input        Interval            Years at present job
DEROG     Input        Interval            Number of major derogatory reports
CLNO      Input        Interval            Number of trade lines
DELINQ    Input        Interval            Number of delinquent trade lines
CLAGE     Input        Interval            Age of oldest trade line in months
NINQ      Input        Interval            Number of recent credit inquiries

The credit scoring model computes a probability of a given loan applicant defaulting on loan repayment. A threshold is selected such that all applicants whose probability of default is in excess of the threshold are recommended for rejection.
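The thresholding rule described above can be sketched in a few lines of Python. The threshold of 0.5 and the applicant probabilities are hypothetical values chosen for illustration; the course notes do not state a threshold here.

```python
def recommend(prob_default, threshold):
    """Recommend rejection when the default probability exceeds the threshold."""
    return "reject" if prob_default > threshold else "approve"


# Hypothetical predicted probabilities of default for three applicants.
applicants = {"A": 0.08, "B": 0.62, "C": 0.50}
THRESHOLD = 0.5  # assumed value for this example only
decisions = {name: recommend(p, THRESHOLD) for name, p in applicants.items()}
```

Because the rule rejects only probabilities in excess of the threshold, applicant C, who sits exactly at the threshold, is approved in this sketch.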
Project Setup and Initial Data Exploration
Using SAS Libraries

To identify a SAS data library, you assign it a library reference name, or libref. When you open Enterprise Miner, several libraries are automatically assigned and can be seen in the Explorer window.
1. Double-click on the Libraries icon in the Explorer window.

To define a new library:
2. Right-click in the Explorer window and select New.
3. In the New Library window, type a name for the new library. For example, type CRSSAMP.
4. Type in the path name or select Browse to choose the folder to be connected with the new library name. For example, the chosen folder might be located at C:\workshop\sas\dmem.
5. If you want this library name to be connected with this folder every time you open SAS, select Enable at startup.
6. Select OK. The new library is now assigned and can be seen in the Explorer window.

13. To make the newly created matrix active, click on My Matrix to highlight it.
14. Right-click on My Matrix and select Set to use.
15. To examine the decision criteria, select Edit Decisions.

By default, you attempt to maximize profit. Because your costs have already been built into your matrix, do not specify them here. Optionally, you could specify profits of 13 and 0 (rather than 12.32 and -0.68) and then use a fixed cost of 0.68 for Decision=1 and 0 for Decision=0, but that is not done in this example. If the cost is not constant for each person, Enterprise Miner allows you to specify a cost variable.

The radio buttons enable you to choose one of three ways to use the matrix or vector that is activated. You can choose to
  maximize profit (default) - use the active matrix on the previous page as a profit matrix, but do not use any information regarding a fixed cost or cost variable.
  maximize profit with costs - use the active matrix on the previous page as a profit matrix in conjunction with the cost information.
  minimize loss - consider the matrix or vector on the previous page as a loss matrix.

16. Close the Editing Decisions and Utilities window without modifying the table.
17. As discussed earlier, the proportions in the population are not represented in the sample. To adjust for this, select the Prior tab.

By default, there are three predefined prior vectors in the Prior tab:
  Equal Probability - contains equal probability prior values for each level of the target.
  Proportional to data - contains prior probabilities that are proportional to the probabilities in the data.
  None - (default) does not apply prior class probabilities.

18. To add a new prior vector, right-click in the open area where the prior profiles are activated and select Add. A new prior profile is added to the list, named Prior vector.
19. To highlight the new prior profile, select Prior vector.
20. Modify the prior vector to represent the true proportions in the population.
21. To make the prior vector the active vector, select Prior vector in the prior profiles list to highlight it.
22. Right-click on Prior vector and select Set to use.
23. Close the target profiler. Select Yes to save changes when prompted.
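The arithmetic behind a profit matrix and a prior vector can be sketched as follows. This is an illustration only, not Enterprise Miner's internal code. The profits 12.32 and -0.68 for Decision=1 come from the text above; the zero profits for Decision=0, the 50% sample event rate, and the 5% population event rate are assumptions made for this example.

```python
def adjust_for_priors(p_event, sample_event_rate, population_event_rate):
    """Rescale a posterior probability from sample proportions to population priors."""
    w1 = population_event_rate / sample_event_rate
    w0 = (1 - population_event_rate) / (1 - sample_event_rate)
    numerator = p_event * w1
    return numerator / (numerator + (1 - p_event) * w0)


# Profit matrix: outer key is the actual target level, inner key is the decision.
# The Decision=0 profits of 0 are assumed for this sketch.
PROFIT = {
    1: {1: 12.32, 0: 0.0},
    0: {1: -0.68, 0: 0.0},
}


def best_decision(p_event, profit=PROFIT):
    """Pick the decision with the larger expected profit for one case."""
    expected = {
        d: p_event * profit[1][d] + (1 - p_event) * profit[0][d]
        for d in (1, 0)
    }
    return max(expected, key=expected.get), expected


# A case scored at 0.5 on a balanced sample, with an assumed 5% population rate.
p_adjusted = adjust_for_priors(0.5, sample_event_rate=0.5, population_event_rate=0.05)
decision, expected_profit = best_decision(0.10)
```

With this matrix, Decision=1 has positive expected profit only when the adjusted probability exceeds 0.68 / 13, or about 0.052, which shows why correcting the posterior for the true population proportions matters before applying the matrix.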
Investigating Descriptive Statistics

The metadata is used to compute descriptive statistics for every variable.
1. Select the Interval Variables tab.

Investigate the descriptive statistics provided for the interval variables. Inspecting the minimum and maximum values indicates no unusual values (such as AGE=0 or TARGET_D<0). AGE has a high percentage of missing values (26%). TIMELAG has a somewhat smaller percentage (9%).
2. Select the Class Variables tab.

Investigate the number of levels, percentage of missing values, and the sort order of each variable. Observe that the sort order for TARGET_B is descending whereas the sort order for all the others is ascending. This occurs because you have a binary target event. It is common to code a binary target with a 1 when the event occurs and a 0 otherwise. Sorting in descending order makes the 1 the first level, and this identifies the target event for a binary variable. It is useful to sort other similarly coded binary variables in descending order as well for interpreting the results of a regression model.

If the maximum number of distinct values is greater than or equal to 128, the Class Variables tab will indicate 128 values.

Close the Input Data Source node and save the changes when prompted.

The Data Partition Node
1. Open the Data Partition node.
2. The right side enables you to specify the percentage of the data to allocate to training, validation, and testing data. Enter 50 for the values of training and validation. Observe that when you enter the 50 for training, the total percentage (110) turns red, indicating an inconsistency in the values. The number changes color again when the total percentage is 100. If the total is not 100%, the Data Partition node will not close.
3. Close the Data Partition node. Select Yes to save changes when prompted.

Preliminary Investigation
1. Add an Insight node to the workspace and connect it to the Data Partition node as illustrated below.
2. To run the flow from the Insight node, right-click on the node and select Run.
3. Select Yes when prompted to see the results. A portion of the output is shown below.

Observe that the upper-left corner has the numbers 2000 and 21, which indicate there are 2000 rows (observations) and 21 columns (variables). This represents a sample from either the training data set or the validation data set, but how would you know which one?
1. Close the Insight data set to return to the workspace.
2. To open the Insight node, right-click on the node in the workspace and select Open. The Data tab is initially active and is displayed below.

5. Select Tree imputation as the imputation method for both types of variables.
When using the tree imputation for imputing missing values, use the entire training data set for more consistent results. Regardless of the values set in this section, you can select any imputation method for any variable. This tab merely controls the default settings. 6. Select the Constant values subtab. This subtab enables you to replace certain values (before imputing, if desired, using the check box on the Defaults tab). It also enables you to specify constants for imputing missing values. 7. Enter U in the field for character variables.
8. Select the Tree Imputation tab. This tab enables you to set the variables that will be used when using tree imputation. Observe that target variables are not available, and rejected variables are not used by default. To use a rejected variable, you can set the Status to use, but that would be inappropriate here because the rejected variable TARGET_D is related to the target variable TARGET_B.
Suppose you want to change the imputation method for AGE to mean and CARDPROM to 20.
1. Select the Interval Variables tab.
2. To specify the imputation method for AGE, position the tip of your cursor on the row for AGE in the Imputation Method column and right-click.
3. Select Select Method → mean.
4. To specify the imputation method for CARDPROM, position the tip of your cursor on the row for CARDPROM in the Imputation Method column and right-click.
5. Select Select Method → set value.
6. Type 20 for the new value.
7. Select OK.
8. Specify none as the imputation method for TARGET_D in like manner.
Inspect the resulting window. A portion of the window appears below.
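The two imputation choices made above, the mean for AGE and the fixed value 20 for CARDPROM, amount to the following logic. This Python sketch is only an illustration of the idea; the function name impute and the three sample records are hypothetical, and the Replacement node computes its statistics from the data it is configured to use rather than from a toy list.

```python
from statistics import mean


def impute(rows, column, method, constant=None):
    """Return copies of rows with missing (None) values in column filled in."""
    observed = [r[column] for r in rows if r[column] is not None]
    if method == "mean":
        fill = mean(observed)      # mean of the non-missing values
    elif method == "set value":
        fill = constant            # user-supplied constant
    else:
        raise ValueError(f"unsupported method: {method}")
    return [
        dict(r, **{column: fill if r[column] is None else r[column]})
        for r in rows
    ]


# Hypothetical records with missing values marked as None.
records = [
    {"AGE": 30, "CARDPROM": 12},
    {"AGE": None, "CARDPROM": None},
    {"AGE": 50, "CARDPROM": 7},
]
step1 = impute(records, "AGE", "mean")
step2 = impute(step1, "CARDPROM", "set value", constant=20)
```

The function returns new dictionaries, so the original records keep their missing values and can still be inspected.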
Recall that the variables HOMEOWNR, PCOWNERS, and PETS should have the missing values set to U.
1. Select the Class Variables tab.
2. Control-click to select the rows for HOMEOWNR, PCOWNERS, and PETS.
3. Right-click on one of the selected rows in the Imputation Method column.
4. Select Select Method → default constant.
5. To change the imputation for TARGET_B to none, right-click on the row for TARGET_B in the Imputation Method column.
6. Choose Select Method → none.
7. Select the Output tab. While the Data tab shows the input data, the Output tab shows the output data set information.
8. Close the Replacement node, saving the changes when prompted.

Performing Variable Transformations

Some input variables have highly skewed distributions. In highly skewed distributions, a small percentage of the points may have a great deal of influence. On occasion, performing a transformation on an input variable may yield a better fitting model. This section demonstrates how to perform some common transformations. Add a Transform Variables node to the flow as shown below.
Open the Transform Variables node by right-clicking on it and selecting Open. The Variables tab is shown by default. It displays statistics for the interval-level variables including the mean, standard deviation, skewness, and kurtosis (calculated from the metadata sample). The Transform Variables node enables you to rapidly transform

7. Close the plot and select Yes to save the changes when prompted.
A new variable is added to the table. The new variable has the truncated name of the original variable followed by a random string of digits. Note that Enterprise Miner set the value of Keep to No for the original variable. If you wanted to use both the binned variable and the original variable in the analysis, you would need to modify this attribute for AVGGIFT and set the value of Keep to Yes, but that is not done here.
Examine the distribution of the new variable.
The View Info tool reveals that over 40% of the data is in each of the two lowest categories and approximately 10% of the data is in the highest category. Recall that the distributions of LOCALGOV, STATEGOV, FEDGOV, CARDGIFT, and TIMELAG were highly skewed to the right. A log transformation of these variables may provide more stable results. Begin by transforming CARDGIFT.
1. Position the tip of the cursor on the row for CARDGIFT and right-click.
2. Select Transform → log.
Inspect the resulting table.
The formula shows that Enterprise Miner has performed the log transformation after adding 1 to the value of CARDGIFT. Why has this occurred? Recall that CARDGIFT has a minimum value of zero. The logarithm of zero is undefined, and the logarithm of something close to zero is extremely negative. Enterprise Miner takes this information into account and actually uses the transformation log(CARDGIFT+1) to create a new variable with values greater than or equal to zero (because log(1)=0). Inspect the distribution of the transformed variable. It is much less skewed than before.
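The effect of the log(x+1) transformation on a right-skewed variable can be checked with a short Python sketch. The sample values are made up for illustration and are not taken from the course data; the skewness function uses the simple third standardized moment, which may differ from the statistic Enterprise Miner reports.

```python
import math


def log_plus_one(values):
    """Apply log(x + 1) so that a minimum of zero maps to zero."""
    return [math.log(v + 1) for v in values]


def skewness(values):
    """Third standardized moment (population form)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed counts with a minimum of zero.
raw = [0, 0, 1, 2, 3, 50]
transformed = log_plus_one(raw)
```

Comparing skewness(raw) with skewness(transformed) shows the long right tail being pulled in, while every transformed value stays at or above zero.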
Perform log transformations on the other variables (FEDGOV, LOCALGOV, STATEGOV, and TIMELAG).
1. Press and hold the Ctrl key on the keyboard.
2. While holding the Ctrl key, select each of the variables.
3. When all have been selected, release the Ctrl key.
4. Right-click on one of the selected rows and select Transform → log.
5. View the distributions of these newly created variables.

It may be appropriate at times to keep the original variable and the created variable although it is not done here. It is also not commonly done when the original variable and the transformed variable have the same measurement level. Close the node when you are finished, saving changes when prompted.
Fitting a Regression Model
1. Connect a Regression node to the diagram as shown.
2. Open the Regression node.
3. Find the Tools menu on the top of the session window and select Tools → Interaction Builder. This tool enables you to easily add interactions and higher-order terms to the model, although you do not do so now.

The input variables are shown on the left, and the terms in the model are shown on the right. The Regression node fits a model containing all main effects by default.

4. Select Cancel to close the Interaction Builder window when you are finished inspecting it.
5. Select the Selection Method tab. This tab enables you to perform different types of variable selection using various criteria. You can choose backward, forward, or stepwise selection. The default in Enterprise Miner is to construct a model with all input variables that have a status of use.
6. Select Stepwise using the arrow next to the Method field.

A regression model provides a better fit than other, more flexible, modeling methods when the relationship between the target and the inputs is linear in nature. 22. Close the Lift Chart and Assessment Tool windows to return to the workspace.

6.2 Ensemble Models

List the different types of ensemble models available in Enterprise Miner. Discuss different approaches to combined models. Generate and evaluate a combined model.

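Combined Ensemble Models

[Diagram: the training data is resampled into Sample 1, Sample 2, and Sample 3. One modeling method is applied to each sample to produce Model 1, Model 2, and Model 3, which are averaged to form the ensemble model that is applied to the score data.]
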
The Ensemble node creates a new model by averaging the posterior probabilities (for class targets) or the predicted values (for interval targets) from multiple models. The new model is then used to score new data. One common ensemble approach is to resample the training data and fit a separate model for each sample. The Ensemble node then integrates the component models to form a potentially stronger solution.

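[Diagram: Modeling Method A and Modeling Method B are each applied to the same training data to produce Model A and Model B, which are averaged to form the ensemble model that is applied to the score data.]
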
Another common approach is to use multiple modeling methods, such as a neural network and a decision tree, to obtain separate models from the same training data set. The Ensemble node integrates the component models from the two complementary modeling methods to form the final model solution. It is important to note that the ensemble model created from either approach can only be more accurate than the individual models if the individual models disagree with one another. You should always compare the model performance of the ensemble model with the individual models.
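Averaging posterior probabilities across component models can be sketched in Python as follows. The function name ensemble_average and the probabilities attributed to a tree and a neural network are hypothetical; the sketch shows only the averaging step, not how the Ensemble node builds or stores its component models.

```python
def ensemble_average(posteriors_by_model):
    """Average case-by-case posterior probabilities across component models."""
    n_models = len(posteriors_by_model)
    n_cases = len(posteriors_by_model[0])
    if any(len(p) != n_cases for p in posteriors_by_model):
        raise ValueError("every model must score the same cases")
    return [
        sum(p[i] for p in posteriors_by_model) / n_models
        for i in range(n_cases)
    ]


# Hypothetical posterior probabilities for three cases from two models.
tree_posteriors = [0.2, 0.8, 0.6]
network_posteriors = [0.4, 0.6, 0.2]
combined = ensemble_average([tree_posteriors, network_posteriors])
```

If the two lists were identical, the average would simply reproduce them, which is the sense in which an ensemble can improve on its components only when they disagree.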
Other Types of Ensemble Models
Stratified Bagging Boosting
The Ensemble node can also be used to combine the scoring code from stratified models. The modeling nodes generate different scoring formulas when operating on a stratification variable (for example, a group variable such as GENDER) that you define in a Group Processing node. The Ensemble node combines the scoring code into a single DATA step by logically dividing the data into IF-THEN-DO/END blocks. Bagging and boosting models are created by resampling the training data and fitting a separate model for each sample. The predicted values (for interval targets) or the posterior probabilities (for a class target) are then averaged to form the ensemble model. Bagging and boosting models are discussed in detail in the Decision Tree Modeling course.
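The resample-fit-average pattern described for bagging can be sketched for an interval target as follows. This is a simplified illustration using a one-input least squares line as the component model; the function names, the number of models, the seed, and the data are assumptions for this example, and boosting is not covered by the sketch.

```python
import random


def fit_line(points):
    """Least squares fit of y = intercept + slope * x for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        slope = 0.0  # degenerate resample: fall back to predicting the mean
    else:
        slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sxx
    return slope, mean_y - slope * mean_x


def bagged_predict(points, x_new, n_models=25, seed=2002):
    """Average the predictions of models fit to bootstrap resamples."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_models):
        sample = rng.choices(points, k=len(points))  # sample with replacement
        slope, intercept = fit_line(sample)
        predictions.append(intercept + slope * x_new)
    return sum(predictions) / len(predictions)


# Hypothetical training data lying exactly on the line y = 2x + 1.
training = [(x, 2 * x + 1) for x in range(10)]
prediction = bagged_predict(training, x_new=10)
```

Because the example data are perfectly linear, every resampled model recovers the same line; with noisy data the component models would differ and the average would smooth out their variation.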

2. Open the SOM/Kohonen node. The Variables tab appears first. Inspect the options.
As in k-means clustering, the scale of the measurements can heavily influence the determination of the clusters. Standardizing the inputs is recommended.
3. Select the Standardize radio button on the Variables tab to standardize the input variables.
4. Select the General tab. The default method is a Batch Self-Organizing Map.
You can specify one of three options in the method field using the drop-down arrow: Batch Self-Organizing Map, Kohonen Self-Organizing Map, and Kohonen Vector Quantization (VQ), which is a clustering method. The Cluster node is recommended over Kohonen VQ for clustering. Also, in many situations, Batch SOMs obtain satisfactory results and are computationally more efficient. However, Kohonen SOMs are recommended for highly nonlinear data.
5. To facilitate comparing the clusters from the SOM/Kohonen node to those determined in the Cluster node, choose a grid space that corresponds to six clusters. Use the arrows to specify 2 for the number of rows and 3 for the number of columns.
6. Close the SOM/Kohonen node and save the settings.
7. Run the diagram from the SOM/Kohonen node and view the results.
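To see why standardization matters before clustering, the following Python sketch rescales two inputs measured on very different scales. It assumes a z-score transformation using the sample standard deviation, which may differ from what the Standardize option does internally; the income and age values are hypothetical.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores: subtract the mean, divide by the sample std dev."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical inputs on very different scales.
income = [20000.0, 35000.0, 50000.0, 65000.0]
age = [25.0, 35.0, 45.0, 55.0]

z_income = standardize(income)
z_age = standardize(age)
print(z_income)
print(z_age)
```

After standardization both inputs have mean 0 and standard deviation 1, so neither dominates a distance calculation simply because of its units.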
Exploring the SOM/Kohonen Node Results
The SOM/Kohonen results window contains two parts. The left side displays the grid. The colors of the rectangles in the grid indicate the number of cases in each cluster. Clusters with lighter colors have lower frequency counts. Clusters with darker colors have higher frequency counts.
The plot on the right shows the normalized means for each input variable. The means are normalized using a scale transformation function. You may need to maximize or resize the window to see the complete plot. Note that there are three variables associated with the variable CLIMATE. In general, the SOM/Kohonen node constructs n dummy variables for a categorical variable with n levels. Initially, the overall normalized means are plotted for each input; however, if the window is not large enough or if there are many input variables, you will not be able to see the entire plot. In that case, you can use the scroll icon from the toolbar to view the others.
The Normalized Mean plot can be used to compare the overall normalized means with the normalized means in each cluster.
1. Select the Select Points icon from the toolbar.
2. Select the section for row 1, column 3 in the map. The section turns gray, indicating that it has been selected.
3. Select the Refresh Input Means Plot icon from the toolbar. Inspect the Normalized Mean plot.

Note that cluster three has lower-than-average incomes, consists of married females, has higher-than-average ages, and has slightly higher-than-average credit scores.
The other clusters can be compared with the overall average by repeating these steps. For example, inspect the normalized mean plot for row 2, column 3.
4. Select the Select Points icon from the toolbar.

The results now list association and dissociation rules. For example, among customers without a money market account, 65.58% have a savings account (rule 4).
Close the Association node when you have finished examining the results.
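The percentage quoted for a rule such as "no money market account implies savings account" is the rule's confidence: the share of cases matching the left-hand side that also match the right-hand side. The Python sketch below illustrates that calculation on a tiny hypothetical data set; the field names and records are invented for illustration and do not reproduce the 65.58% figure.

```python
def confidence(records, antecedent, consequent):
    """Share of records satisfying the antecedent that also satisfy the consequent."""
    matching = [r for r in records if antecedent(r)]
    if not matching:
        return None
    return sum(1 for r in matching if consequent(r)) / len(matching)


# Hypothetical customers; True means the customer holds that account.
customers = [
    {"money_market": False, "savings": True},
    {"money_market": False, "savings": True},
    {"money_market": False, "savings": False},
    {"money_market": False, "savings": True},
    {"money_market": True, "savings": True},
]

# Dissociation-style rule: NOT money market ==> savings.
conf = confidence(
    customers,
    antecedent=lambda r: not r["money_market"],
    consequent=lambda r: r["savings"],
)
print(conf)
```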



Figure 1: Example of Decision Tree
A decision tree creates a visual depiction of the interactions between variables. The decision tree is interpreted by following the binary splits of information until you have reached your desired objective. At each successive binary split, statistics are given on the defined population of an intended target variable. The intended target variable is the variable of interest for which the model was created. In the case of this project, the target variable was the movement of Reach Toothbrushes. Figure 1 illustrates a simple example of a decision tree. In this example, the final result is either good or bad. By following the binary splits in the data, we can determine what variables or factors would lead to a desired result. For instance, if Var A is greater than X and Var B is less than Y, the data suggest a good result; the same holds if Var A is less than or equal to X and Var C is not equal to Z.
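The two paths to a good result described above can be written as nested conditions. The Python sketch below is only an illustration of how such a tree is read: the thresholds passed in are hypothetical, and it assumes that every leaf not described as good is bad, which the text implies but the missing figure would confirm.

```python
def classify(var_a, var_b, var_c, x, y, z):
    """Follow the binary splits of the example tree to a leaf."""
    if var_a > x:
        # Right branch: the outcome depends on Var B.
        return "good" if var_b < y else "bad"
    # Left branch (Var A <= X): the outcome depends on Var C.
    return "good" if var_c != z else "bad"


# Hypothetical thresholds X = 10, Y = 5, Z = "red".
print(classify(12, 3, "red", x=10, y=5, z="red"))   # Var A > X and Var B < Y
print(classify(8, 9, "blue", x=10, y=5, z="red"))   # Var A <= X and Var C != Z
print(classify(12, 7, "blue", x=10, y=5, z="red"))  # Var A > X but Var B >= Y
```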
Regression
Regressions create a function of a target variable with respect to independent variables. In data mining, a multiple linear regression model is typically used to show linear relationships between the target and more than one independent variable. Other options besides linear models exist, but for this project only the linear model will be discussed. Multiple linear regression models take the form shown below for k independent variables.
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k + U$$
Here $\beta_0$ is the intercept, $\beta_1$ is the parameter associated with independent (known) variable $X_1$, $\beta_2$ is the parameter associated with $X_2$, and so on. The variable $U$ is the error term or disturbance within the model (Wooldridge, 2003). The purpose of regression models is to determine the best beta values in order to estimate the target variable using just data that we can control. In other words, regression models are appealing because they allow the user to forecast the target variable by using known variables and multiplying them by estimated coefficients (Wooldridge, 2003).
Neural Network
According to Tretter (2003), neural networks have their origins in artificial intelligence and started out as an attempt to model the workings of a brain. A neural network is a group of analytic methods that compares all observations across all variables to get the strongest relationships between variables and observations. They produce robust models that are often difficult to interpret (Tretter, 2003). A simple example of a neural network is shown below in Figure 2.

Step 5: Connect any parent variables that share a common child variable. In other words, draw a line between any two variables that cause a common variable.
Step 6: Strip arrow-heads from all edges of the lines or arcs connecting variables.
Step 7: Delete lines into and out of the back door independent variables.
Second Question: The second and final question is whether any way remains for the original independent variable to reach the target variable. If the answer is yes, repeat the steps beginning at the first question. If the answer is no, all independent variables have been identified.
An example of a directed acyclical graph is shown below in Figure 4. The example walks through the steps for finding the explanatory variables needed in the model when the target variable is F and the original independent variable is J.
Figure 4: Example of a Directed Acyclical Graph
Step 1 was given in defining F as the target variable and J as the original independent variable. The direct flow of causality from J to F would be through path J → E → F. Now for question 1: is it possible to follow the lines in the directed graph to get back to the target variable via the ancestors of the independent variable without running into converging arrows and without using the direct path? The answer is yes. The paths from F to J are shown below.
Path 1: F - I - C - H - J
Path 2: F - I - C - H - D - G - J
From these two paths it is clear that variable I is the back door variable for the model. Step 2 checks to ensure that variable I is not a descendant of variable J. Step 3 deletes all non-ancestors of the target variable, the original independent variable, and the back door independent variable(s). In this situation there are no non-ancestors of the target variable, so the graph remains unchanged. Step 4 deletes all arcs emanating from the target variable, as shown in Figure 5.
Figure 5: Step 4 of Directed Graphs
Step 5 connects any parent variables that share a common child variable, as shown in Figure 7.
Figure 7: Step 5 of Directed Graphs
Step 6 strips arrow-heads from all edges, as shown in Figure 8.
Figure 8: Step 6 of Directed Graphs
Step 7 deletes lines into and out of the back door independent variables, as shown in Figure 9.
Figure 9: Step 7 of Directed Graphs

The final question is whether any way remains for the original independent variable to reach the target variable. In this example the answer is no, so all independent variables that need to be in the model to determine whether the original independent variable explains the target variable have been identified. The variables are J and I.
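Steps 5 through 7 and the final question lend themselves to a short Python sketch. Because the figures are not reproduced here, the graph below is a small hypothetical one (variables X, Y, Z, W), not the graph from Figure 4, and the function names are invented for illustration. It assumes the earlier steps (1 through 4) have already been applied to the graph that is passed in.

```python
from itertools import combinations


def moralize(parents):
    """Steps 5 and 6: link parents that share a child, then drop arrow-heads.

    parents maps each child to the set of its parents.
    Returns a set of undirected edges (frozensets of two nodes).
    """
    edges = set()
    for child, pars in parents.items():
        for p in pars:
            edges.add(frozenset((p, child)))      # original arc, undirected
        for p, q in combinations(sorted(pars), 2):
            edges.add(frozenset((p, q)))          # link co-parents
    return edges


def still_connected(edges, start, goal, back_door):
    """Step 7 and the final question: delete lines touching back door
    variables, then test whether start can still reach goal."""
    adjacency = {}
    for edge in edges:
        a, b = tuple(edge)
        if a in back_door or b in back_door:
            continue
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


# Hypothetical graph: Z causes both X and Y; W also causes Y.
parents = {"X": {"Z"}, "Y": {"Z", "W"}}
edges = moralize(parents)

print(still_connected(edges, "X", "Y", back_door=set()))   # path via Z remains
print(still_connected(edges, "X", "Y", back_door={"Z"}))   # no path remains
```

In the hypothetical graph, removing the lines through Z leaves no route from X to Y, which corresponds to answering "no" to the final question.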
QUESTION OF SAS ENTERPRISE MINER VS. TETRAD
As stated earlier, the purpose of this project is two-fold: (1) to determine what variables play the most significant roles in predicting the movement of Reach Toothbrushes and (2) to compare results given by SAS Enterprise Miner with those obtained using TETRAD to determine how and if they complement or complete each other. So, why choose to test the differences in these programs? This question was one of personal curiosity. How would directed graphs stand in comparison with decision trees, regressions, and neural networks? Theoretically, similar results from the data are expected, with some differences in reporting. In recent years, directed graphs have been used in more research. For example, Bessler has used directed graphs in much of his research to create accurate models for organizations around the world. Examples of his use of directed graphs include a study of the impact of BSE and FMD on the beef industry in the UK (Chopra, 2005) and work with the Food and Agriculture Organization in Rome in 2003 on determining the causes of poverty and hunger and forming policy recommendations for dealing with world hunger (Dept. Ag Economics, 2004). To create this comparison, three models were estimated using SAS Enterprise Miner. TETRAD was then used with the same data to see whether the results confirmed or disagreed with the findings in SAS. If the findings disagreed, the models would be evaluated to determine the better model.

RESULTS

This section presents model results using Dominick's data. Summarized results for each model are presented in Appendices B, C, D, E, and F. The models suggested that price was the most significant factor in determining movement (Wang and Bessler, 2006).
SAS Enterprise Miner Results I
When the data were analyzed using SAS Enterprise Miner, the following lift chart (Figure 10) was produced comparing the decision tree, neural network, and regression models.
Figure 10: Lift Chart for SAS Enterprise Miner Results I

This explained a total of 62 weeks out of 243 weeks in which the product was carried, and 130 weeks when the price was less than $2.24. The data suggest that the majority of toothbrushes are sold when the price is $2.24 and when the percentage of working women is less than 0.3599.
TETRAD Results I
The initial results from TETRAD were inconclusive. TETRAD was able to compute an output, but did not report any causal relationships between any of the variables. The output appears in Appendix D. It is hypothesized that no causal relationships were reported because of the large number of zeros for price in the data. The entire population had 5,970 observations. Of those observations, only 370 contained non-zero prices for the target. Because of this, a modified version of the data, with all observations where the price of the target equaled zero removed, was rerun in TETRAD.
TETRAD Results II
When TETRAD was run with the modified data, it produced usable results. The directed graph shown below in Figure 13 is part of the final output, created at a 0.2 significance level. The output for the graph can be found in Appendix E.
Figure 13: Directed Graph at 0.2 Significance
This result differed from the results generated in SAS. In this graph working women were not included, and it shows that the move of the target variable causes price. The reason for some of the differences is that in SAS the target variable was defined, and in the directed graph it was not. The target variable stayed undefined because the program creating directed graphs does not account for a target variable; instead, it shows the relationships between all of the data points. Is it possible that the move could cause price? The answer depends on the situation. Could a change in the quantity of a good cause the sellers to change the price? It is possible. It is also possible that the price drives consumers to purchase more or less.
Because of this difference from SAS, the program was rerun with a 0.05 level of significance. The results of the change in significance are shown in Figure 14, which represents a portion of the entire directed graph. The output for the graph can be seen in Appendix F.
Figure 14: Directed Graph at 0.05 Significance

This graph differs from the original and includes different causal relationships between the same variables. One reason for this change in causal relation could be that the model is sensitive to changes in the level of significance, which makes the graph difficult to interpret. There are also many variables that clearly could not cause the stated event. A reason for this is that the model did not contain any limitation on variable causality.
SAS Enterprise Miner Results II
After completing successful TETRAD runs, the new data was run in SAS Enterprise Miner a final time. This time the data differed from the first, but only slightly. The output for these results can be seen in Appendix C. When the data were analyzed, it produced the lift chart shown in Figure 15.
Figure 15: Lift Chart for SAS Enterprise Miner Results II
The above lift chart suggested that the decision tree most accurately explained the target variable, followed by the baseline scenario with no model, then by the regression model, and finally by the neural network. The neural network, which in the second SAS Enterprise Miner evaluation was the weakest model according to the lift chart, produced the variable weights shown in Figure 16.
Figure 16: Neural Network Variable Weights Result II
These weights were again assigned within the hidden nodes of the neural network and were determined within the neural network to best explain the movement of Reach Toothbrushes. The regression model performed similarly to the first SAS Enterprise Miner analysis, again reducing to $Y = \beta_0$. Once again, the regression model was based on no other variables and created a horizontal line around the mean of the movement. Since no other variables were considered, the regression model could not account for changes in movement or changes in the price of the item; therefore, the model did not perform well.
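A regression with only an intercept predicts the same value for every observation, and under least squares that value is the mean of the target. The Python sketch below illustrates this with hypothetical weekly movement figures. It uses ordinary least squares for simplicity, whereas the models in this project were fitted by SAS Enterprise Miner, so it illustrates the idea rather than reproducing the fit.

```python
def fit_intercept_only(y):
    """Least-squares estimate of the intercept when there are no predictors."""
    return sum(y) / len(y)


def sum_squared_error(y, constant):
    """Sum of squared errors when every case is predicted as `constant`."""
    return sum((value - constant) ** 2 for value in y)


# Hypothetical weekly movement values.
movement = [1, 1, 2, 4, 6]

b0 = fit_intercept_only(movement)
print(b0)                                   # the mean of the target
print(sum_squared_error(movement, b0))      # error of the horizontal line
print(sum_squared_error(movement, 2.0))     # any other constant does worse
```

Because the prediction never changes, such a model cannot respond to price or any other input, which is consistent with its weak showing on the lift chart.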
As stated earlier, the SAS Enterprise Miner decision tree results best explained the movement of the target variable. The decision tree can be seen below in Figure 17.
Figure 17: Decision Tree Result I

This time, the decision tree had only two layers. The first showed the entire data set. The second layer split the data by whether the price was $2.24 or $2.49. If the price of the target was $2.24, the move was as follows.
Table 3: Explained Data of Second SAS Results Where Price = $2.24
% of Data Explained (by Movement of Target): 14.5%, 8.1%, 26.6%, 18.5%, 8.9%, 23.4%
This explained a total of 124 weeks out of 241 weeks when the price was equal to $2.24. If the price of the target was $2.49, the move was as follows.
Table 4: Explained Data of Second SAS Results Where Price = $2.49
% of Data Explained (by Movement of Target): 0.9%, 0.9%, 3.4%, 14.5%, 21.4%, 59.0%
This explained a total of 117 weeks out of 241 weeks when the price was equal to $2.49. When the price was lower, at $2.24, more toothbrushes were sold or moved from the store. This is a significant difference: when the price is $2.49, only one toothbrush moves per week 59% of the time, whereas when the price is $2.24, 4 or more toothbrushes are sold 49.2% of the time. Once again, the problem with these results is that there is no way to assign the movement to an individual store.
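The figures quoted above can be checked against the two tables with a few lines of Python. The movement labels did not survive in the tables, so this sketch assumes the rows run from the highest movement level down to a movement of one, with the first three rows covering 4 or more toothbrushes; that ordering is inferred from the 49.2% and 59% figures in the text rather than stated in the source.

```python
# Percent of weeks at each movement level, copied from Tables 3 and 4.
# Assumed order: highest movement first, movement of 1 last.
share_at_224 = [14.5, 8.1, 26.6, 18.5, 8.9, 23.4]
share_at_249 = [0.9, 0.9, 3.4, 14.5, 21.4, 59.0]

weeks_at_224 = 124
weeks_at_249 = 117
total_weeks = 241

# Share of weeks with 4 or more toothbrushes moved at $2.24.
four_or_more_at_224 = round(sum(share_at_224[:3]), 1)
# Share of weeks with only one toothbrush moved at $2.49.
one_only_at_249 = share_at_249[-1]

print(four_or_more_at_224)
print(one_only_at_249)
print(weeks_at_224 + weeks_at_249 == total_weeks)
```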

CONCLUSIONS

The purpose of this project was two-fold: (1) to determine what variables played the most significant parts in predicting the movement of Reach Toothbrushes and (2) to compare the results given by SAS Enterprise Miner and TETRAD to determine how and if they complement or complete each other. The variables that played the most significant part in predicting the movement of the target product were different in each program. SAS Enterprise Miner results indicated that price and possibly working women explained the movement of the target, while TETRAD suggested different causal relationships depending on the level of significance. The good news is that many of the directed graph variables that relate to movement are income related, which may suggest that the amount of income causes movement. TETRAD also shows a relationship between movement and price, although in the reverse causal direction; given the sensitivity of the causal direction to the significance level, this may indicate a non-directional correlation. In conclusion, the best predictors of movement for this particular target variable were price and income levels. SAS Enterprise Miner did a better job of explaining the data when there were a large number of zeros. TETRAD and SAS Enterprise Miner both gave similar results when the zeros were removed. Depending on the type and amount of data being used, either program may offer a better model. In this situation, however, the SAS Enterprise Miner model seems to be more consistent in giving the same results.

REFERENCES

Bessler, D. 2004. "Directed Acyclical Graphs: Lectures 23-26." Unpublished. Presented in Lecture Series Fall 2004 at Texas A&M University, College Station, Texas, 30 August - 5 December.
Chopra, A., and D. Bessler. 2005. "Impact of BSE and FMD on Beef Industry in UK." PPT. http://agecon2.tamu.edu/people/faculty/jin-yanhong/ABSdocuments/presentA.PPT. Found on WWW on 01 March 2006.
Coppock, D. 2002. "Data Modeling and Mining: Why Lift?" Covering Business Intelligence, Integration & Analytics. http://www.dmreview.com/article_sub.cfm?articleId=5329. Found on WWW 25 February 2006.
Department of Agricultural Economics, TAMU. 2004. "Staying ahead of the curve: David Bessler eyes the future through innovative research." AgEconnection, Vol. 11, No. 1. http://agecon.tamu.edu/publications/agEconnection/2004FallAgEconnection.pdf. Found on WWW on 01 March 2006.
Hamilton, H., E. Gurak, L. Findlater, and W. Olive. 2000. "Computer Science 831: Knowledge Discovery in Databases." http://www2.cs.uregina.ca/~dbd/cs831/index.html. Found on WWW on 25 February 2006.
James M. Kilts Center, GSB, University of Chicago. "Dominick's Finer Foods." http://gsbwww.uchicago.edu/kilts/research/db/dominicks/. Found on WWW 20 February 2006.
SAS Institute, Inc. 1990. SAS Procedures Guide, Version 6, Third Edition. Cary, NC: SAS Institute Inc.
SAS. http://www.sas.com. Found on WWW 23 February 2006.
Speed, M. 2005. "Special Topics: SAS Programming." Unpublished. Presented as Lecture Series Fall of 2005 at Texas A&M University, College Station, TX, 30 August - 5 December.
Spirtes, P., C. Glymour, and R. Scheines. "The TETRAD Project: Causal Models and Statistical Data." http://www.phil.cmu.edu/projects/tetrad/index.html. Found on WWW 25 February 2006.
Spirtes, P. Tetrad Software, created at Carnegie Mellon University.
Stergiou, C., and D. Siganos. "Neural Networks." http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html#What%20is%20a%20Neural%20Network. Found on WWW 23 February 2006.
Tretter, M. 2003. "Data Mining." Encyclopedia of Information Systems, Volume 1, pp. 477-488.
Wang, Z., and D. Bessler. 2006. "Price and quantity endogeneity in demand analysis: evidence from directed acyclic graphs." Agricultural Economics 34, pp. 87-95.
Wooldridge, J. 2003. Introductory Econometrics: A Modern Approach, 2nd Edition. Thomson South-Western.

APPENDIX A

Products Soap Toothpaste 1111310720 #FREEMAN STRWBRRY SH #FREEMAN HUMECTANT C RAVE MOUSSE RAVE SHAMP FINE #1 #TEGRIN SHAMPOO HERB GRECIAN FORMULA LIQU #JUST 4 MEN-M/B/S-SA #JUST FOR MEN M/B/S#JUST FOR MEN M/B/S#JUST FOR MEN M/B/S#JUST FOR MEN M/B/S #JUST FOR MEN M/B/S#JFM BRSH-IN GEL MBS #JUST FOR MEN NT BRN JUST FOR MEN LT.BROW JUST FOR MEN MED BRO DOVE BODY WASH DOVE SHOWER GEL DOVE BODY WASH ~DOVE /CARESS SHIPPE LEVER PK BAR ~LEVER 200 BATH 6 BA LEVER 2000 REG SIZE CLOSE UP RED GEL W/4 MNTDNT TARTR TP W/FR MENTADENT TP W/FREE MNTDNT TP FRESH W/FR MNTDNT TP COOL W/FRE MENTADENT REFILL TWN 16 OZ 16 OZ 6 OZ 15 OZ 7 OZ 4 OZ 1 CT 1 OZ 1 OZ 1 OZ 1 OZ 1 OZ 1 OZ 1 OZ 1 CT 1 CT 16 OZ 6 OZ 10 OZ 1 OZ 40 OZ 30 OZ 3.5 OZ 8.2 OZ 5.2 OZ 3.5 OZ 5.2 OZ 5.2 OZ 10.4 O

Shampoo

8.2 OZ 6 OZ 6 OZ 1.75 O 1.75 O 3.25 O 3.25 O 1 CT 1 CT 1 CT 1 CT

1204433930

PEPSODENT 28% FREE PEPSODENT ANTI TARTA PEPSODENT TP W/BAKIN OLD SPICE PUMP ORIGI OLD SPICE PUMP FRESH OLD SPICE SOOTHING G OLD SPICE SOOTHING G OS COL/HI END DEOD G OLD SPICE FRAG GIFT OLD SPICE FRAG GIFT OLD SPICE FRAG GIFT

Grooming Products

Demographic Variable Name    Description
age9        % Population under age 9
age60       % Population over age 60
ethnic      % Blacks & Hispanics
educ        % College Graduates
nocar       % With No Vehicles
income      Log of Median Income
incsigma    Std dev of Income Distribution (Approximated)
hsizeavg    Average Household Size
hsize1      % of households with 1 person
hsize2      % of households with 2 persons
hsize34     % of households with 3 or 4 persons
hsize567    % of households with 5 or more persons
hh3plus     % of households with 3 or more persons
hh4plus     % of households with 4 or more persons
hhsingle    % of households with 1 person
hhlarge     % of households with 5 or more persons
workwom     % Working Women with full time jobs
sinhouse    % Detached Houses
density     Trading Area in Sq Miles per Capita
hval150     % of Households with Value over $150,000
hval200     % of Households with Value over $200,000
hvalmean    Mean Household Value (Approximated)
single      % of Singles
retired     % of Retired
unemp       % of Unemployed
wrkch5      % of working women with children under 5
wrkch17     % of working women with children 6-17
nwrkch5     % of non-working women with children under 5
nwrkch17    % of non-working women with children 6-17
wrkch       % of working women with children
nwrkch      % of non-working women with children
wrkwch      % of working women with children under 5
wrkwnch     % of working women with no children
telephn     % of households with telephones
mortgage    % of households with mortgages
nwhite      % of population that is non-white
poverty     % of population with income under $15,000
shopcons    % of Constrained Shoppers
shophurr    % of Hurried Shoppers
shopavid    % of Avid Shoppers
shopstr     % of Shopping Strangers
shopunft    % of Unfettered Shoppers
shopbird    % of Shopper Birds
shopindx    Ability to Shop (Car and Single Family House)
shpindx     Ability to Shop (Car and Single Family House)

APPENDIX B

SAS Enterprise Miner Report using Original Data
"EM Workspace": SASUSER.MSDATA2
Input Data Settings:
Source Data: SASUSER.MSDATA2 (5,970 rows, 144 columns)
Output: EMDATA.VIEW_QME
Description: SASUSER.MSDATA2
Role: RAW
Metadata Sample: EMPROJ.SMP_VIUT (2,000 rows)
All variables
Interval Variables
Class Variables
Notes: not available
Data Partition
Partition Settings
Method: SIMPLE RANDOM
Partition percentages: Training: 65%, Validation: 35%, Test: 0%
Output
Log
Training Code
Notes: not available
Neural Network
Optimization plot:

[ TARGET=MOVETB ]
Fit Statistic                      Training    Validation    Test
Average Profit                         5.87          5.85
Misclassification Rate                 0.06          0.06
Average Error                          0.09          0.09
Average Squared Error                  0.02          0.02
Sum of Squared Errors                421.87        239.53
Root Average Squared Error             0.12          0.13
Root Final Prediction Error            0.13
Root Mean Squared Error                0.12          0.13
Error Function                      2379.77       1353.11
Mean Squared Error                     0.02          0.02
Maximum Absolute Error                 1.00          1.00
Final Prediction Error                 0.02
Divisor for ASE                    27167.00      14623.00
Model Degrees of Freedom             102.00
Degrees of Freedom for Error       23184.00
Total Degrees of Freedom           23286.00
Sum of Frequencies                  3881.00       2089.00
Sum Case Weights * Frequencies     27167.00      14623.00
Akaike's Information Criterion      2583.77
Schwarz's Bayesian Criterion        3405.44
Network settings
Objective function: Maximize profit
Assessment Matrix: Default profit
Utilities
Variables
Output
Log
Training Code
Score Code
Model assessment settings
Train data set is not selected for assessment.
Validation data set is selected for assessment.
Test data set is not selected for assessment.
Scored data set: 5000 observations are saved for interactive model assessment.
Confusion Matrix (Assessed Partition=VALIDATION) Regression Parameters: Estimates Table

Fit Statistics                         Training         Validation
Akaike's Information Criterion     2704.9838707
Average Squared Error               0.017666469       0.0172302407
Average Error Function             0.0991270244       0.0983152771
Degrees of Freedom for Error              23280
Model Degrees of Freedom                      6
Total Degrees of Freedom                  23286
Divisor for ASE                           27167              14623
Error Function                     2692.9838707       1437.6642977
Final Prediction Error             0.0176755754
Maximum Absolute Error             0.9955947137       0.9955947137
Mean Square Error                  0.0176710222       0.0172302407
Sum of Frequencies                         3881               2089
Number of Estimate Weights                    6
Root Average Sum of Squares          0.13291527       0.1312640115
Root Final Prediction Error        0.1329495221
Root Mean Squared Error            0.1329323971       0.1312640115
Schwarz's Bayesian Criterion       2753.3175163
Sum of Squared Errors              479.94496303       251.95781017
Sum of Case Weights Times Freq            27167              14623
Misclassification Rate             0.0626127287       0.0607946386
Total Profit for MOVETB                   22705              12208
Average Profit for MOVETB          5.8502963154        5.843944471
Target Information:
Name: MOVETB
Label: MOVETB
Measurement: ordinal
Objective function: Maximize profit
Assessment Matrix: Default profit
Utilities
Regression Settings:
Regression type: LOGISTIC
Link function: LOGIT
Selection method: Stepwise
Optimization technique: DEFAULT
Output
Log
Training Code
Score Code
Model assessment settings
Train data set is not selected for assessment.
Validation data set is selected for assessment.
Test data set is not selected for assessment.
Scored data set: 5000 observations are saved for interactive model assessment.
Confusion Matrix (Assessed Partition=VALIDATION) Notes: not available
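
The derived entries in the regression Fit Statistics listing can be cross-checked against one another. The sketch below is not SAS code, and the formulas are relationships that the reported training numbers happen to satisfy, not definitions quoted from SAS documentation. The function names are hypothetical.

```python
import math

# Training-column values copied from the regression Fit Statistics listing.
error_function = 2692.9838707
model_df = 6
total_df = 23286
df_error = 23280
sse = 479.94496303
divisor = 27167


def aic(err, p):
    # Assumed relationship: error function plus twice the model degrees of freedom.
    return err + 2 * p


def sbc(err, p, n):
    # Assumed relationship: error function plus p * ln(total degrees of freedom).
    return err + p * math.log(n)


def ase(sum_sq_err, div):
    # Average squared error: sum of squared errors over the "Divisor for ASE".
    return sum_sq_err / div


def fpe(ase_value, n, p, dfe):
    # Assumed relationship: ASE scaled by (total DF + model DF) / error DF.
    return ase_value * (n + p) / dfe


ase_train = ase(sse, divisor)
print("AIC ", aic(error_function, model_df))
print("SBC ", sbc(error_function, model_df, total_df))
print("ASE ", ase_train)
print("RASE", math.sqrt(ase_train))
print("FPE ", fpe(ase_train, total_df, model_df, df_error))
```

Each printed value can be compared with the corresponding row of the listing; a mismatch would suggest either a transcription error or that the assumed relationship does not hold for that statistic.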
Tree
Model assessment plot:
Average Squared Error              0.01        0.01
Sum of Squared Errors              167.57      87.66
Root Average Squared Error         0.08        0.08
Maximum Absolute Error             0.99        1.00
Divisor for ASE                    27167.00    14623.00
Total Degrees of Freedom           23286.00
Misclassification Rate             0.04        0.03
Number of Estimated Weights        4.00
Sum of Frequencies                 3881.00     2089.00
Sum Case Weights * Frequencies     27167.00    14623.00

[Tree node table: columns N*, VN*, Node, Leaf, N PRIORS, V N PRIORS, %V6 to %V1, %6 to %1; cell values not recoverable from the extracted text]
English rules
Sequence Matrix
Target information
  Name: MOVETB
  Label: MOVETB
  Measurement: ordinal
Tree settings
  Objective function: Maximize profit
  Assessment Matrix: Default profit
Utilities
  Splitting criterion: Entropy Reduction
  Minimum number of observations in a leaf: 5
  Observations required for a split search: 59
  Maximum number of branches from a node: 2
  Maximum depth of tree: 6
  Splitting rules saved in each node: 5
  Surrogate rules saved in each node: 0
  Treat missing as an acceptable value
  Model assessment measure: Misclassification Rate
  Subtree: Best assessment value
  Observations sufficient for split search: 3881
  Maximum tries in an exhaustive split search: 5000
  Use profit matrix during split search
  Do not use prior probability in split search
Log
Score Code
Model assessment settings
  Train data set is not selected for assessment.
  Validation data set is selected for assessment.
  Test data set is not selected for assessment.
  Scored data set: 5000 observations are saved for interactive model assessment.
Confusion Matrix (Assessed Partition=VALIDATION)
Notes: not available

Assessment

End Report

Path Information
  Name: Assessment_T256S1J_
  Target: MOVETB
  Description:
  Mining Function: Transform
  Subject: No subject
  Rating: 0
Metadata Information
Input Variables Required for Scoring
Output Variables Produced by Scoring
Target Variables
Datastep Score Code
C Score Code

APPENDIX C

SAS Enterprise Miner using Modified Data "EM Workspace" :
SASUSER.MSEDIT
Input Data Settings:
  Source Data: SASUSER.MSEDIT (370 rows, 132 columns)
  Output: EMDATA.VIEW_QME
  Description: SASUSER.MSEDIT
  Role: RAW
  Metadata Sample: EMPROJ.SMP_VIUT (370 rows)
All variables
Interval Variables
Class Variables
Notes: not available

Data Partition
Partition Settings
  Method: SIMPLE RANDOM
  Partition percentages: Training: 65%, Validation: 35%, Test: 0%
Output
Log
Training Code
Notes: not available

Tree
Model assessment plot:
Average Squared Error              0.12        0.116
Sum of Squared Errors              168.12      89.659
Root Average Squared Error         0.34        0.340
Maximum Absolute Error             0.99        0.991
Divisor for ASE                    1446.00     774.000
Total Degrees of Freedom           1205.00
Misclassification Rate             0.58        0.612
Number of Estimated Weights        2.00
Sum of Frequencies                 241.00      129.000
Sum Case Weights * Frequencies     1446.00     774.000

[Tree node table: columns N*, VN*, Node, Leaf, N PRIORS, V N PRIORS, %V6 to %V1, %6 to %1; cell values not recoverable from the extracted text]
English rules
Sequence Matrix
Target information
  Name: MOVETB
  Label:
  Measurement: ordinal
Tree settings
  Objective function: Maximize profit
  Assessment Matrix: Default profit
Utilities
  Splitting criterion: Entropy Reduction
  Minimum number of observations in a leaf: 5
  Observations required for a split search: 59
  Maximum number of branches from a node: 2
  Maximum depth of tree: 6
  Splitting rules saved in each node: 5
  Surrogate rules saved in each node: 0
  Treat missing as an acceptable value
  Model assessment measure: Average Profit
  Subtree: Best assessment value
  Observations sufficient for split search: 241
  Maximum tries in an exhaustive split search: 5000
  Use profit matrix during split search
  Do not use prior probability in split search
Log
Score Code
Model assessment settings
  Train data set is not selected for assessment.
  Validation data set is selected for assessment.
  Test data set is not selected for assessment.
  Scored data set: 5000 observations are saved for interactive model assessment.
Confusion Matrix (Assessed Partition=VALIDATION)
Notes: not available

Regression
Parameters: Estimates Table
Fit Statistics
Fit Statistic                      Training         Validation       Test
Akaike's Information Criterion     6881.393366
Average Squared Error              0.1977869986     0.188630491
Average Error Function             4.7520009446     4.5320080566
Degrees of Freedom for Error       1200
Model Degrees of Freedom           5
Total Degrees of Freedom           1205
Divisor for ASE                    1446             774
Error Function                     6871.393366      3507.7742358
Final Prediction Error             0.1994352236
Maximum Absolute Error             1                1
Mean Square Error                  0.1986111111     0.188630491
Sum of Frequencies                 241              129
Number of Estimate Weights         0
Root Average Sum of Squares        0.4447325023     0.4343161187
Root Final Prediction Error        0.4465817099
Root Mean Squared Error            0.4456580652     0.4343161187
Schwarz's Bayesian Criterion       6906.8645402
Sum of Squared Errors              286              146
Sum of Case Weights Times Freq     1446             774
Misclassification Rate             0.5933609959     0.5658914729
Total Profit for MOVETB            839              474
Average Profit for MOVETB          3.4813278008     3.6744186047

Target Information:
  Name: MOVETB
  Label:
  Measurement: ordinal
  Objective function: Maximize profit
  Assessment Matrix: Default profit
Utilities
Regression Settings:
  Regression type: LOGISTIC
  Link function: LOGIT
  Selection method: Stepwise
  Optimization technique: DEFAULT
Output
Log
Training Code
Score Code
Model assessment settings
  Train data set is not selected for assessment.
  Validation data set is selected for assessment.
  Test data set is not selected for assessment.
  Scored data set: 5000 observations are saved for interactive model assessment.
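
For the modified data, the per-partition summary figures follow from the raw counts in the regression listing. The sketch below recomputes them in Python as a plausibility check. The dictionary layout and the `summarize` helper are hypothetical, and reading the divisor-to-frequency ratio as a count of target levels is an assumption, not something the report states.

```python
import math

# Counts copied from the regression Fit Statistics for the modified data.
partitions = {
    "training":   {"freq": 241, "profit": 839, "sse": 286, "divisor": 1446},
    "validation": {"freq": 129, "profit": 474, "sse": 146, "divisor": 774},
}


def summarize(p):
    """Derive the per-partition averages from the raw counts."""
    ase = p["sse"] / p["divisor"]
    return {
        "average_profit": p["profit"] / p["freq"],
        "average_squared_error": ase,
        "root_average_sum_of_squares": math.sqrt(ase),
        # Ratio of "Divisor for ASE" to "Sum of Frequencies" (6 in this run).
        "divisor_per_case": p["divisor"] / p["freq"],
    }


summary = {name: summarize(p) for name, p in partitions.items()}

# Share of the 370 source rows that landed in the training partition.
total_rows = sum(p["freq"] for p in partitions.values())
training_share = partitions["training"]["freq"] / total_rows

for name, stats in summary.items():
    print(name, stats)
print("rows:", total_rows, "training share:", round(training_share, 2))
```

The training share rounds to 0.65, which is consistent with the 65%/35% simple random partition configured in the Data Partition node.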

Neural Network
Optimization plot:
[ TARGET=MOVETB ]
Average Profit                     3.44        3.67
Misclassification Rate             0.59        0.57
Average Error                      4.76        4.48
Average Squared Error              0.20        0.19
Sum of Squared Errors              287.73      145.10
Root Average Squared Error         0.45        0.43
Root Final Prediction Error        0.48
Root Mean Squared Error            0.46        0.43
Error Function                     6877.88     3464.82
Mean Squared Error                 0.22        0.19
Maximum Absolute Error             1.00        1.00
Final Prediction Error             0.23
Divisor for ASE                    1446.00     774.00
Model Degrees of Freedom           96.00
Degrees of Freedom for Error       1109.00
Total Degrees of Freedom           1205.00
Sum of Frequencies                 241.00      129.00
Sum Case Weights * Frequencies     1446.00     774.00
Akaike's Information Criterion     7069.88
Schwarz's Bayesian Criterion       7558.93

Network settings
  Objective function: Maximize profit
  Assessment Matrix: Default profit
Utilities
Variables
Output
Log
Training Code
Score Code
Model assessment settings
  Train data set is not selected for assessment.
  Validation data set is selected for assessment.
  Test data set is not selected for assessment.
  Scored data set: 5000 observations are saved for interactive model assessment.
Confusion Matrix (Assessed Partition=VALIDATION)

Assessment

End Report

Path Information
  Name: Assessment_T256S1J_
  Target: MOVETB
  Description:
  Mining Function: Transform
  Subject: No subject
  Rating: 0
Metadata Information
Input Variables Required for Scoring
Output Variables Produced by Scoring
Target Variables
Datastep Score Code
C Score Code

APPENDIX D

TETRAD Output Original Data Set
{ TETRAD II - Version 3.1
by Peter Spirtes, Richard Scheines, Christopher Meek, Thomas Richardson, Clark Glymour, Anne Boomsma and Herbert Hoijtink
Copyright (C) 1999
Output file: c:\fhout.txt
Parameters:
Sample Size: 5970
Continuous Data
P-value for Correlations
a1 a2 a3 a4 a5 a6 b1
b3 b5 c2 c4 c6 c8 d1 d3
d5 d7 d9 e2 e4 e6 f3 f5
f7 f9 g2 g4 g6 g8 h1 h3

APPENDIX E

TETRAD Output Modified Data at a 0.2 Significance Level
{ TETRAD II - Version 3.1
by Peter Spirtes, Richard Scheines, Christopher Meek, Thomas Richardson, Clark Glymour, Anne Boomsma and Herbert Hoijtink
Copyright (C) 1999
Output file: c:\tetout.txt
Parameters:
Sample Size: 370
Continuous Data
Significance Level = 0.2000
/Pattern
a1 -> a5
a9 -> a1
b1 -> a1
b4 -> a1
c5 -> a1
a1 -> c6
f6 -> a1
b1 -> a2
b7 -> a2
a2 -> b9
c1 -> a2
c3 -> a2
a3 -> a4
a8 -> a3
c3 -> a3
a9 -> a4
b1 -> a4
b3 -> a4
b7 -> a4
c1 -> a4
b6 -> a5
b8 -> a5
c2 -> a5
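
The /Pattern listing is a flat sequence of directed edges, so it can be turned into parent and child lookups with a few lines of Python. This is an illustrative sketch only: it is not part of TETRAD, the function names are hypothetical, and it assumes every edge is written as `source -> target` (other edge marks that a pattern may contain are not handled). Only the first seven edges of the listing are used.

```python
from collections import defaultdict

# First seven edges copied from the /Pattern listing.
pattern_text = "a1 -> a5 a9 -> a1 b1 -> a1 b4 -> a1 c5 -> a1 a1 -> c6 f6 -> a1"


def parse_edges(text):
    """Split whitespace-separated 'x -> y' triples into (x, y) tuples."""
    tokens = text.split()
    if len(tokens) % 3 != 0:
        raise ValueError("expected a sequence of 'source -> target' triples")
    edges = []
    for i in range(0, len(tokens), 3):
        source, arrow, target = tokens[i:i + 3]
        if arrow != "->":
            raise ValueError(f"unsupported edge mark: {arrow!r}")
        edges.append((source, target))
    return edges


def build_lookups(edges):
    """Return (parents, children) dictionaries keyed by variable name."""
    parents = defaultdict(list)
    children = defaultdict(list)
    for source, target in edges:
        parents[target].append(source)
        children[source].append(target)
    return parents, children


edges = parse_edges(pattern_text)
parents, children = build_lookups(edges)
print("parents of a1: ", parents["a1"])
print("children of a1:", children["a1"])
```

With lookups like these, it is easy to see at a glance which variables point into a1 and which ones a1 points to, instead of scanning the edge list by eye.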

 
