SPSS Classification Trees 13.0
Additional Publications
Additional copies of SPSS product manuals may be purchased directly from SPSS Inc. Visit the SPSS Web Store at http://www.spss.com/estore, or contact your local SPSS office, listed on the SPSS Web site at http://www.spss.com/worldwide. For telephone orders in the United States and Canada, call SPSS Inc. at 800-543-2185. For telephone orders outside of North America, contact your local office, listed on the SPSS Web site. The SPSS Statistical Procedures Companion, by Marija Norušis, has been published by Prentice Hall. A new version of this book, updated for SPSS 13.0, is planned. The SPSS Advanced Statistical Procedures Companion, also based on SPSS 13.0, is forthcoming. The SPSS Guide to Data Analysis for SPSS 13.0 is also in development. Announcements of publications available exclusively through Prentice Hall will be available on the SPSS Web site at http://www.spss.com/estore (select your home country, and then click Books).
Tell Us Your Thoughts
Your comments are important. Please let us know about your experiences with SPSS products. We especially like to hear about new and interesting applications using the SPSS system. Please send e-mail to suggest@spss.com or write to SPSS Inc.,
Attn.: Director of Product Planning, 233 South Wacker Drive, 11th Floor, Chicago, IL 60606-6412.
About This Manual
This manual documents the graphical user interface for the procedures included in the Classification Trees add-on module. Illustrations of dialog boxes are taken from SPSS for Windows. Dialog boxes in other operating systems are similar. Detailed information about the command syntax for features in this module is provided in the SPSS Command Syntax Reference, available from the Help menu.
Contacting SPSS
If you would like to be on our mailing list, contact one of our offices, listed on our Web site at http://www.spss.com/worldwide.
Contents
1 Creating Classification Trees
  Selecting Categories
  Validation
  Tree-Growing Criteria
    Growth Limits
    CHAID Criteria
    CRT Criteria
    QUEST Criteria
    Pruning Trees
    Surrogates
  Options
Feature                                       CHAID*   CRT   QUEST
Chi-square-based**                            X
Surrogate independent (predictor) variables            X     X
Tree pruning                                           X     X
Multiway node splitting                       X
Binary node splitting                                  X     X
Influence variables                           X        X
Prior probabilities                                    X     X
Misclassification costs                       X        X     X
Fast calculation                              X              X

*Includes Exhaustive CHAID.
**QUEST also uses a chi-square measure for nominal independent variables.
Selecting Categories
Figure 1-3 Categories dialog box
For categorical (nominal, ordinal) dependent variables, you can:
- Control which categories are included in the analysis.
- Identify the target categories of interest.
8 Chapter 1
Including/Excluding Categories
You can limit the analysis to specific categories of the dependent variable. Cases with values of the dependent variable in the Exclude list are not included in the analysis. For nominal dependent variables, you can also include user-missing categories in the analysis. (By default, user-missing categories are displayed in the Exclude list.)
Target Categories
Selected (checked) categories are treated as the categories of primary interest in the analysis. For example, if you are primarily interested in identifying those individuals most likely to default on a loan, you might select the bad credit-rating category as the target category. There is no default target category. If no category is selected, some classification rule options and gains-related output are not available. If multiple categories are selected, separate gains tables and charts are produced for each target category. Designating one or more categories as target categories has no effect on the tree model, risk estimate, or misclassification results.
Categories and Value Labels
This dialog box requires defined value labels for the dependent variable. It is not available unless at least two values of the categorical dependent variable have defined value labels.
To Include/Exclude Categories and Select Target Categories
E In the main Classification Tree dialog box, select a categorical (nominal, ordinal)
dependent variable with two or more defined value labels.
E Click Categories.
Validation
Figure 1-4 Validation dialog box
Validation allows you to assess how well your tree structure generalizes to a larger population. Two validation methods are available: crossvalidation and split-sample validation.
Crossvalidation
Crossvalidation divides the sample into a number of subsamples, or folds. Tree models are then generated, excluding the data from each subsample in turn. The first tree is based on all of the cases except those in the first sample fold, the second tree is based on all of the cases except those in the second sample fold, and so on. For each tree, misclassification risk is estimated by applying the tree to the subsample excluded in generating it.
You can specify a maximum of 25 sample folds. The higher the value, the fewer the number of cases excluded for each tree model. Crossvalidation produces a single, final tree model. The crossvalidated risk estimate for the final tree is calculated as the average of the risks for all of the trees.
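The averaging described above can be sketched in a few lines of Python. This is an illustration only, not the procedure's implementation: the round-robin fold assignment, the lower bound of 2 folds, and the stand-in majority-class "model" used in place of a tree are all assumptions, and the risk here is a plain misclassification proportion with no adjustment for priors or costs.

```python
from collections import Counter

def fold_indices(n_cases, n_folds):
    """Assign case indices to folds round-robin (an assumed scheme)."""
    # The upper limit of 25 comes from the text; the lower limit of 2 is assumed.
    if not 2 <= n_folds <= 25:
        raise ValueError("n_folds must be between 2 and 25")
    return [list(range(f, n_cases, n_folds)) for f in range(n_folds)]

def majority_class(labels):
    """Stand-in 'model': always predict the most frequent training label."""
    return Counter(labels).most_common(1)[0][0]

def crossvalidated_risk(labels, n_folds):
    """Average, over folds, the error rate on the cases each model excluded."""
    risks = []
    for held_out in fold_indices(len(labels), n_folds):
        held = set(held_out)
        train = [y for i, y in enumerate(labels) if i not in held]
        prediction = majority_class(train)
        errors = sum(1 for i in held_out if labels[i] != prediction)
        risks.append(errors / len(held_out))
    return sum(risks) / len(risks)

# Hypothetical data: 7 "good" and 3 "bad" credit ratings.
labels = ["good"] * 7 + ["bad"] * 3
risk = crossvalidated_risk(labels, n_folds=5)
print(round(risk, 2))
```

With more folds, each model is trained on a larger share of the cases, which matches the note that higher values exclude fewer cases per tree.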
Split-Sample Validation
With split-sample validation, the model is generated using a training sample and tested on a hold-out sample. You can specify a training sample size, expressed as a percentage of the total sample size, or a variable that splits the sample into training and testing samples. If you use a variable to define training and testing samples, cases with a value of 1 for the variable are assigned to the training sample, and all other cases are assigned to the testing sample. The variable cannot be the dependent variable, weight variable, influence variable, or a forced independent variable. You can display results for both the training and testing samples or just the testing sample. Split-sample validation should be used with caution on small data files (data files with a small number of cases). Small training sample sizes may yield poor models, since there may not be enough cases in some categories to adequately grow the tree.
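The variable-based assignment rule can be illustrated with a short Python sketch. The case records and the variable name `train_flag` are hypothetical; only the rule itself (a value of 1 means training, anything else means testing) comes from the text.

```python
def split_by_variable(cases, split_var):
    """Cases with value 1 go to training; all other cases go to testing."""
    training = [c for c in cases if c.get(split_var) == 1]
    testing = [c for c in cases if c.get(split_var) != 1]
    return training, testing

# Hypothetical cases; 'train_flag' is an illustrative variable name.
cases = [
    {"id": 1, "train_flag": 1},
    {"id": 2, "train_flag": 0},
    {"id": 3, "train_flag": 1},
    {"id": 4, "train_flag": 2},     # any value other than 1 means testing
    {"id": 5, "train_flag": None},  # a missing value is also "not 1" here
]
training, testing = split_by_variable(cases, "train_flag")
print([c["id"] for c in training], [c["id"] for c in testing])
```

How the procedure itself treats a missing value on the split variable is not stated in this section; routing it to the testing sample is this sketch's reading of "all other cases".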
Tree-Growing Criteria
The available growing criteria may depend on the growing method, level of measurement of the dependent variable, or a combination of the two.
Growth Limits
Figure 1-5 Criteria dialog box, Growth Limits tab
The Growth Limits tab allows you to limit the number of levels in the tree and control the minimum number of cases for parent and child nodes.
Maximum Tree Depth. Controls the maximum number of levels of growth beneath
the root node. The Automatic setting limits the tree to three levels beneath the root node for the CHAID and Exhaustive CHAID methods and five levels for the CRT and QUEST methods.
Minimum Number of Cases. Controls the minimum numbers of cases for nodes. Nodes that do not satisfy these criteria will not be split.
Increasing the minimum values tends to produce trees with fewer nodes. Decreasing the minimum values produces trees with more nodes. For data files with a small number of cases, the default values of 100 cases for parent nodes and 50 cases for child nodes may sometimes result in trees with no nodes below the root node; in this case, lowering the minimum values may produce more useful results.
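As a rough illustration of how the two minimums interact, the following Python sketch checks a proposed split against both limits. The function name and the exact way the check is applied are assumptions; only the default values of 100 and 50 come from the text.

```python
def can_split(parent_n, child_ns, min_parent=100, min_child=50):
    """Return True if a proposed split respects both minimum-case limits."""
    if parent_n < min_parent:
        return False
    return all(n >= min_child for n in child_ns)

# Hypothetical node sizes.
ok_default = can_split(180, [120, 60])       # parent and children large enough
small_child = can_split(180, [140, 40])      # one child below the minimum
small_file = can_split(80, [40, 40])         # parent below the default minimum
lowered = can_split(80, [40, 40], min_parent=50, min_child=25)
print(ok_default, small_child, small_file, lowered)
```

The last two calls mirror the small-data-file case: with the defaults the node cannot be split at all, while lowered minimums allow the tree to grow.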
CHAID Criteria
Figure 1-6 Criteria dialog box, CHAID tab
For the CHAID and Exhaustive CHAID methods, you can control:
Significance Level. You can control the significance value for splitting nodes and merging categories. For both criteria, the default significance level is 0.05.
For splitting nodes, the value must be greater than 0 and less than 1. Lower values tend to produce trees with fewer nodes. For merging categories, the value must be greater than 0 and less than or equal to 1. To prevent merging of categories, specify a value of 1. For a scale independent variable, this means that the number of categories for the variable in the final tree is the specified number of intervals (the default is 10). For more information, see Scale Intervals for CHAID Analysis on p. 14.
Chi-Square Statistic. For ordinal dependent variables, chi-square for determining node
splitting and category merging is calculated using the likelihood-ratio method. For nominal dependent variables, you can select the method:
Pearson. This method provides faster calculations but should be used with caution
on small samples. This is the default method.
Likelihood ratio. This method is more robust than Pearson but takes longer to
calculate. For small samples, this is the preferred method.
Model Estimation. For nominal and ordinal dependent variables, you can specify:
Maximum number of iterations. The default is 100. If the tree stops growing because the maximum number of iterations has been reached, you may want to increase the maximum or change one or more of the other criteria that control tree growth.
Minimum change in expected cell frequencies. The value must be greater than 0 and less than 1. The default is 0.05. Lower values tend to produce trees with fewer nodes.
Adjust significance values using Bonferroni method. For multiple comparisons,
significance values for merging and splitting criteria are adjusted using the Bonferroni method. This is the default.
Allow resplitting of merged categories within a node. Unless you explicitly prevent
category merging, the procedure will attempt to merge independent (predictor) variable categories together to produce the simplest tree that describes the model. This option allows the procedure to resplit merged categories if that provides a better solution.
Scale Intervals for CHAID Analysis
Figure 1-7 Criteria dialog box, Intervals tab
In CHAID analysis, scale independent (predictor) variables are always banded into discrete groups (for example, 0-10, 11-20, 21-30, etc.) prior to analysis. You can control the initial/maximum number of groups (although the procedure may merge contiguous groups after the initial split):
Fixed number. All scale independent variables are initially banded into the same
number of groups. The default is 10.
Custom. Each scale independent variable is initially banded into the number
of groups specified for that variable.
To Specify Intervals for Scale Independent Variables
E In the main Classification Tree dialog box, select one or more scale independent
variables.
E For the growing method, select CHAID or Exhaustive CHAID.
E Click Criteria.
E Click the Intervals tab.
In CRT and QUEST analysis, all splits are binary and scale and ordinal independent variables are handled the same way; so, you cannot specify a number of intervals for scale independent variables.
CRT Criteria
Figure 1-8 Criteria dialog box, CRT tab
The CRT growing method attempts to maximize within-node homogeneity. The extent to which a node does not represent a homogenous subset of cases is an indication of impurity. For example, a terminal node in which all cases have the same value for the dependent variable is a homogenous node that requires no further splitting because it is pure. You can select the method used to measure impurity and the minimum decrease in impurity required to split nodes.
Impurity Measure. For scale dependent variables, the least-squared deviation (LSD)
measure of impurity is used. It is computed as the within-node variance, adjusted for any frequency weights or influence values.
For categorical (nominal, ordinal) dependent variables, you can select the impurity measure:
Gini. Splits are found that maximize the homogeneity of child nodes with respect
to the value of the dependent variable. Gini is based on squared probabilities of membership for each category of the dependent variable. It reaches its minimum (zero) when all cases in a node fall into a single category. This is the default measure.
Twoing. Categories of the dependent variable are grouped into two subclasses.
Splits are found that best separate the two groups.
Ordered twoing. Similar to Twoing except that only adjacent categories can be
grouped. This measure is available only for ordinal dependent variables.
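For reference, the Gini measure is commonly written as 1 minus the sum of squared category proportions in a node. The Python sketch below uses that standard form with hypothetical category counts; it ignores frequency weights, priors, and costs, so it should be read as an illustration rather than the procedure's exact computation.

```python
def gini(counts):
    """Gini impurity of a node: 1 minus the sum of squared category proportions."""
    total = sum(counts)
    if total == 0:
        raise ValueError("node has no cases")
    return 1.0 - sum((c / total) ** 2 for c in counts)

pure = gini([10, 0])        # all cases in one category
even = gini([5, 5])         # two equally likely categories
mixed = gini([8, 1, 1])     # three categories, one dominant
print(pure, even, round(mixed, 2))
```

A pure node gives the minimum of zero, as stated above, and impurity grows as cases spread more evenly across categories.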
Misclassification Costs
Figure 1-12 Options dialog box, Misclassification Costs tab
For categorical (nominal, ordinal) dependent variables, misclassification costs allow you to include information about the relative penalty associated with incorrect classification. For example:
- The cost of denying credit to a creditworthy customer is likely to be different from the cost of extending credit to a customer who then defaults on the loan.
- The cost of misclassifying an individual with a high risk of heart disease as low risk is probably much higher than the cost of misclassifying a low-risk individual as high-risk.
- The cost of sending a mass mailing to someone who isn't likely to respond is probably fairly low, while the cost of not sending the mailing to someone who is likely to respond is relatively higher (in terms of lost revenue).
Misclassification Costs and Value Labels
This dialog box is not available unless at least two values of the categorical dependent variable have defined value labels.
To Specify Misclassification Costs
E Click Options.
E Click the Misclassification Costs tab.
E Click Custom.
E Enter one or more misclassification costs in the grid. Values must be non-negative. (Correct classifications, represented on the diagonal, are always 0.)
Fill Matrix. In many instances, you may want costs to be symmetric; that is, the cost of misclassifying A as B is the same as the cost of misclassifying B as A. The following controls can make it easier to specify a symmetric cost matrix:
Duplicate Lower Triangle. Copies values in the lower triangle of the matrix (below
the diagonal) into the corresponding upper-triangular cells.
Duplicate Upper Triangle. Copies values in the upper triangle of the matrix (above
the diagonal) into the corresponding lower-triangular cells.
Use Average Cell Values. For each cell in each half of the matrix, the two values
(upper- and lower-triangular) are averaged and the average replaces both values. For example, if the cost of misclassifying A as B is 1 and the cost of misclassifying B as A is 3, then this control replaces both of those values with the average (1+3)/2 = 2.
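The three fill controls can be mimicked with a small Python function. The matrix layout (rows as actual category, columns as predicted category) and the mode names are assumptions made for this sketch; the example values are the A/B costs of 1 and 3 used above.

```python
def fill_matrix(costs, mode):
    """Return a symmetric copy of a square cost matrix.

    mode: 'lower' copies below-diagonal values upward, 'upper' copies
    above-diagonal values downward, 'average' replaces each pair by its mean.
    """
    n = len(costs)
    result = [row[:] for row in costs]  # leave the input untouched
    for i in range(n):
        for j in range(i + 1, n):
            upper, lower = costs[i][j], costs[j][i]
            if mode == "lower":
                value = lower
            elif mode == "upper":
                value = upper
            elif mode == "average":
                value = (upper + lower) / 2
            else:
                raise ValueError("mode must be 'lower', 'upper', or 'average'")
            result[i][j] = result[j][i] = value
    return result

# Cost of misclassifying A as B is 1; cost of misclassifying B as A is 3.
costs = [[0, 1],
         [3, 0]]
print(fill_matrix(costs, "average"))
print(fill_matrix(costs, "lower"))
print(fill_matrix(costs, "upper"))
```

The diagonal is never modified, consistent with correct classifications always having a cost of 0.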
Profits
Figure 1-13 Options dialog box, Profits tab
For categorical dependent variables, you can assign revenue and expense values to levels of the dependent variable. Profit is computed as revenue minus expense.
Profit values affect average profit and ROI (return on investment) values in gains tables. They do not affect the basic tree model structure. Revenue and expense values must be numeric and must be specified for all categories of the dependent variable displayed in the grid.
Scale. By default, large trees are automatically scaled down in an attempt to fit
the tree on the page. You can specify a custom scale percentage of up to 200%.
Independent variable statistics. For CHAID and Exhaustive CHAID, statistics
include F value (for scale dependent variables) or chi-square value (for categorical dependent variables) as well as significance value and degrees of freedom. For CRT, the improvement value is shown. For QUEST, F, significance value, and degrees of freedom are shown for scale and ordinal independent variables; for nominal independent variables, chi-square, significance value, and degrees of freedom are shown.
Node definitions. Node definitions display the value(s) of the independent variable
used at each node split.
Tree in table format. Summary information for each node in the tree, including parent node number, independent variable statistics, independent variable value(s) for the node, mean and standard deviation for scale dependent variables, or counts and percentages for categorical dependent variables.
Figure 1-19 Tree in table format
Statistics
Figure 1-20 Output dialog box, Statistics tab
Available statistics tables depend on the measurement level of the dependent variable, the growing method, and other settings.
Model Summary. The summary includes the method used, the variables included in the
model, and the variables specified but not included in the model.
Figure 1-21 Model summary table
Risk. Risk estimate and its standard error. A measure of the tree's predictive accuracy.
For categorical dependent variables, the risk estimate is the proportion of cases incorrectly classified after adjustment for prior probabilities and misclassification costs. For scale dependent variables, the risk estimate is within-node variance.
Classification table. For categorical (nominal, ordinal) dependent variables, this table
Figure 1-28 Gains for percentiles table and index chart
Mean. Line chart of cumulative percentile mean values for the dependent variable.
Available only for scale dependent variables.
Average profit. Line chart of cumulative average profit. Available only for categorical
dependent variables with defined profits. For more information, see Profits on p. 21. The average profit chart plots the same values that you would see in the Profit column in the gain summary for percentiles table.
Figure 1-29 Gain summary for percentiles table and average profit chart
Return on investment (ROI). Line chart of cumulative ROI (return on investment).
ROI is computed as the ratio of profits to expenses. Available only for categorical dependent variables with defined profits. The ROI chart plots the same values that you would see in the ROI column in the gain summary for percentiles table.
Figure 1-30 Gain summary for percentiles table and ROI chart
Percentile increment. For all percentile charts, this setting controls the percentile increments displayed on the chart: 1, 2, 5, 10, 20, or 25.
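The two definitions used by the profit and ROI charts (profit is revenue minus expense; ROI is the ratio of profit to expense) reduce to simple arithmetic. The Python sketch below uses hypothetical revenue and expense values for two categories of a dependent variable; how the procedure aggregates these per node or per percentile is not covered here.

```python
def profit(revenue, expense):
    """Profit is revenue minus expense."""
    return revenue - expense

def roi(revenue, expense):
    """ROI is the ratio of profit to expense."""
    if expense == 0:
        raise ValueError("ROI is undefined when expense is zero")
    return profit(revenue, expense) / expense

# Hypothetical revenue and expense values per category.
values = {"Good": (150.0, 50.0), "Bad": (20.0, 50.0)}
results = {cat: (profit(r, e), roi(r, e)) for cat, (r, e) in values.items()}
print(results)
```

A negative profit, as for the hypothetical "Bad" category here, yields a negative ROI.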
Selection and Scoring Rules
Figure 1-31 Output dialog box, Rules tab
The Rules tab provides the ability to generate selection or classification/prediction rules in the form of SPSS command syntax, SQL, or simple (plain English) text. You can display these rules in the Viewer and/or save the rules to an external file.
Syntax. Controls the form of the selection rules in both output displayed in the Viewer
and selection rules saved to an external file.
SPSS. SPSS command language. Rules are expressed as a set of commands that define a filter condition that can be used to select subsets of cases or as COMPUTE
statements that can be used to score cases.
SQL. Standard SQL rules are generated to select or extract records from a database
or assign values to those records. The generated SQL rules do not include any table names or other data source information.
Simple text. Plain English pseudo-code. Rules are expressed as a set of logical
if...then statements that describe the model's classifications or predictions for each node. Rules in this form can use defined variable and value labels or variable names and data values.
61 Tree Editor
Note: If you apply rules in the form of SPSS command syntax to another data file, that data file must contain variables with the same names as the independent variables included in the final model, measured in the same metric, with the same user-defined missing values (if any).
The Classification Tree procedure assumes that:
- The appropriate measurement level has been assigned to all analysis variables.
- For categorical (nominal, ordinal) dependent variables, value labels have been defined for all categories that should be included in the analysis.
We'll use the file tree_textdata.sav to illustrate the importance of both of these requirements. This data file reflects the default state of data read or entered into SPSS before defining any attributes, such as measurement level or value labels. This file is located in the tutorial\sample_files directory of the SPSS installation directory.
Effects of Measurement Level on Tree Models
Both variables in this data file are numeric. By default, numeric variables are assumed to have a scale measurement level. But (as we will see later) both variables are really categorical variables that rely on numeric codes to stand for category values.
E To run a Classification Tree analysis, from the menus choose: Analyze > Classify > Tree
64 Chapter 3
The icons next to the two variables in the source variable list indicate that they will be treated as scale variables.
Figure 3-1 Classification Tree main dialog box with two scale variables
E Select dependent as the dependent variable.
E Select independent as the independent variable.
E Click OK to run the procedure.
E Open the Classification Tree dialog box again and click Reset.
E Right-click dependent in the source list and select Nominal from the context menu.
E Do the same for the variable independent in the source list.
65 Data Assumptions and Requirements
Now the icons next to each variable indicate that they will be treated as nominal variables.
Figure 3-2 Nominal icons in source list
E Select dependent as the dependent variable and independent as the independent variable, and click OK to run the procedure again.
Now let's compare the two trees. First, we'll look at the tree in which both numeric variables are treated as scale variables.
Figure 3-3 Tree with both variables treated as scale
Each node of the tree shows the predicted value, which is the mean value for the dependent variable at that node. For a variable that is actually categorical, the mean may not be a meaningful statistic. The tree has four child nodes, one for each value of the independent variable. Tree models will often merge similar nodes, but for a scale variable, only contiguous values can be merged. In this example, no contiguous values were considered similar enough to merge any nodes together.
Gains for Nodes
Figure 4-11 Gains for nodes
84 Chapter 4
The gains for nodes table provides a summary of information about the terminal nodes in the model.
- Only the terminal nodes (nodes at which the tree stops growing) are listed in this table. Frequently, you will be interested only in the terminal nodes, since they represent the best classification predictions for the model.
- Since gain values provide information about target categories, this table is available only if you specified one or more target categories. In this example, there is only one target category, so there is only one gains for nodes table.
- Node N is the number of cases in each terminal node, and Node Percent is the percentage of the total number of cases in each node.
- Gain N is the number of cases in each terminal node in the target category, and Gain Percent is the percentage of cases in the target category with respect to the overall number of cases in the target category (in this example, the number and percentage of cases with a bad credit rating).
- For categorical dependent variables, Response is the percentage of cases in the node in the specified target category. In this example, these are the same percentages displayed for the Bad category in the tree diagram.
- For categorical dependent variables, Index is the ratio of the response percentage for the target category compared to the response percentage for the entire sample.
Index Values
The index value is basically an indication of how far the observed target category percentage for that node differs from the expected percentage for the target category. The target category percentage in the root node represents the expected percentage before the effects of any of the independent variables are considered. An index value of greater than 100% means that there are more cases in the target category than the overall percentage in the target category. Conversely, an index value of less than 100% means there are fewer cases in the target category than the overall percentage.
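To make the column definitions concrete, the following Python sketch computes Node Percent, Gain Percent, Response, and Index for a pair of terminal nodes. The node counts are hypothetical, not the values from the credit-risk example, and sorting rows by index is a presentation choice made for this sketch.

```python
def gains_for_nodes(nodes):
    """Build gains rows from (node_id, node_n, gain_n) tuples for terminal nodes."""
    total_n = sum(node_n for _, node_n, _ in nodes)
    total_target = sum(gain_n for _, _, gain_n in nodes)
    overall_response = total_target / total_n  # expected rate at the root node
    rows = []
    for node_id, node_n, gain_n in nodes:
        response = gain_n / node_n
        rows.append({
            "node": node_id,
            "node_n": node_n,
            "node_pct": 100 * node_n / total_n,
            "gain_n": gain_n,
            "gain_pct": 100 * gain_n / total_target,
            "response_pct": 100 * response,
            "index_pct": 100 * response / overall_response,
        })
    rows.sort(key=lambda row: row["index_pct"], reverse=True)
    return rows

# Hypothetical terminal nodes: (node id, cases in node, target-category cases).
rows = gains_for_nodes([(1, 400, 300), (2, 600, 100)])
for row in rows:
    print(row["node"], round(row["response_pct"], 1), round(row["index_pct"], 1))
```

In this made-up data the overall target rate is 40%, so node 1 (75% response) has an index well above 100% and node 2 (about 16.7% response) has an index well below it.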
85 Using Classification Trees to Evaluate Credit Risk
Gains Chart
Figure 4-12 Gains chart for bad credit rating target category
This gains chart indicates that the model is a fairly good one. Cumulative gains charts always start at 0% and end at 100% as you go from one end to the other. For a good model, the gains chart will rise steeply toward 100% and then level off. A model that provides no information will follow the diagonal reference line.
Index Chart
Figure 4-13 Index chart for bad credit rating target category
The index chart also indicates that the model is a good one. Cumulative index charts tend to start above 100% and gradually descend until they reach 100%. For a good model, the index value should start well above 100%, remain on a high plateau as you move along, and then trail off sharply toward 100%. For a model that provides no information, the line will hover around 100% for the entire chart.
105 Building a Scoring Model
In contrast, node 2, which represents cases with an income of 75 or more, has a mean vehicle price of 60.9. Further investigation of the tree would show that age and education also display a relationship with vehicle purchase price, but right now we're primarily interested in the practical application of the model rather than a detailed examination of its components.
Risk Estimate
Figure 5-6 Risk table
None of the results we've examined so far tell us if this is a particularly good model. One indicator of the model's performance is the risk estimate. For a scale dependent variable, the risk estimate is a measure of the within-node variance, which by itself may not tell you a great deal. A lower variance indicates a better model, but the variance is relative to the unit of measurement. If, for example, price was recorded in ones instead of thousands, the risk estimate would be a thousand times larger. To provide a meaningful interpretation for the risk estimate with a scale dependent variable requires a little work:
- Total variance equals the within-node (error) variance plus the between-node (explained) variance.
- The within-node variance is the risk estimate value: 68.485.
- The total variance is the variance for the dependent variables before consideration of any independent variables, which is the variance at the root node.
- The standard deviation displayed at the root node is 21.576; so the total variance is that value squared: 465.524.
106 Chapter 5
The proportion of variance due to error (unexplained variance) is 68.485/465.524 = 0.147. The proportion of variance explained by the model is 1 - 0.147 = 0.853, or 85.3%, which indicates that this is a fairly good model. (This has a similar interpretation to the overall correct classification rate for a categorical dependent variable.)
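The arithmetic above can be reproduced directly from the two reported values, the risk estimate and the root-node standard deviation. The variable names in this short Python check are illustrative.

```python
risk_estimate = 68.485   # within-node (error) variance from the risk table
root_std_dev = 21.576    # standard deviation shown at the root node

total_variance = root_std_dev ** 2
unexplained = risk_estimate / total_variance
explained = 1 - unexplained

print(round(total_variance, 3), round(unexplained, 3), round(explained, 3))
```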
Applying the Model to Another Data File
Having determined that the model is reasonably good, we can now apply that model to other data files containing similar age, income, and education variables and generate a new variable that represents the predicted vehicle purchase price for each case in that file. This process is often referred to as scoring. When we generated the model, we specified that rules for assigning values to cases should be saved in a text file, in the form of SPSS command syntax. We will now use the commands in that file to generate scores in another data file.
E Open the data file tree_score_car.sav, located in the tutorial\sample_files folder
of the SPSS installation folder.
E Next, from the SPSS menus choose: File > New > Syntax
E In the command syntax window, type: INSERT FILE= 'c:\temp\car_scores.sps'.
If you used a different filename or location, make the appropriate changes.
Figure 5-7 Syntax window with INSERT command to run a command file
The INSERT command will run the commands in the specified file, which is the rules file that was generated when we created the model.
E From the command syntax window menus choose: Run > All
Figure 5-8 Predicted values added to data file
This adds two new variables to the data file:
- nod_001 contains the terminal node number predicted by the model for each case.
- pre_001 contains the predicted value for vehicle purchase price for each case.
Since we requested rules for assigning values for terminal nodes, the number of possible predicted values is the same as the number of terminal nodes, which in this case is 15. For example, every case with a predicted node number of 10 will have the same predicted vehicle purchase price: 30.56. This is, not coincidentally, the mean value reported for terminal node 10 in the original model. Although you would typically apply the model to data for which the value of the dependent variable is not known, in this example the data file to which we applied the model actually contains that information, and you can compare the model predictions to the actual values.
E From the menus choose: Analyze > Correlate > Bivariate
E Select Price of primary vehicle and pre_001.
Figure 5-9 Bivariate Correlations dialog box
E Click OK to run the procedure.
Figure 5-10 Correlation of actual and predicted vehicle price
The correlation of 0.92 indicates a very high positive correlation between actual and predicted vehicle price, which indicates that the model works well.
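For readers who want to see what the correlation measures, here is a minimal Python sketch of the Pearson coefficient computed by hand. The actual and predicted prices are made-up values, not the contents of tree_score_car.sav, so the result will not match the 0.92 reported above.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical actual and predicted vehicle prices (in thousands).
actual = [12.0, 25.5, 31.0, 48.2, 60.1]
predicted = [14.1, 22.8, 30.56, 45.0, 60.9]
r = pearson_r(actual, predicted)
print(round(r, 3))
```

Values close to 1 mean that cases with higher actual prices also tend to receive higher predicted prices.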
You can use the Classification Tree procedure to build models that can then be applied to other data files to predict outcomes. The target data file must contain variables with the same names as the independent variables included in the final model, measured in the same metric and with the same user-defined missing values (if any). However, neither the dependent variable nor independent variables excluded from the final model need to be present in the target data file.
The different growing methods deal with missing values for independent (predictor) variables in different ways:
- CHAID and Exhaustive CHAID treat all system- and user-missing values for each independent variable as a single category. For scale and ordinal independent variables, that category may or may not subsequently get merged with other categories of that independent variable, depending on the growing criteria.
- CRT and QUEST attempt to use surrogates for independent (predictor) variables. For cases in which the value for that variable is missing, other independent variables having high associations with the original variable are used for classification. These alternative predictors are called surrogates.
This example shows the difference between CHAID and CRT when there are missing values for independent variables used in the model. For this example, we'll use the data file tree_missing_data.sav, located in the tutorial\sample_files directory of the SPSS installation directory. Note: For nominal independent variables and nominal dependent variables, you can choose to treat user-missing values as valid values, in which case those values are treated like any other nonmissing values. For more information, see Missing Values in Chapter 1 on p. 27.
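The contrast between the two approaches can be sketched conceptually in Python. Everything here is hypothetical: the variable names, the split rules, the "<missing>" label, and the left/right routing are illustrative stand-ins and do not reproduce how the procedure selects or ranks surrogates.

```python
def chaid_category(value, missing_label="<missing>"):
    """CHAID-style handling: all missing values form one extra category."""
    return missing_label if value is None else value

def route_with_surrogates(case, splits):
    """CRT/QUEST-style handling: use the first split whose variable is present.

    splits is an ordered list of (variable, rule) pairs: the primary split
    first, then surrogates. Each rule returns True to send a case left.
    """
    for variable, goes_left in splits:
        value = case.get(variable)
        if value is not None:
            return "left" if goes_left(value) else "right"
    return None  # no usable variable for this case

# Hypothetical primary split on income with age as a surrogate.
splits = [
    ("income", lambda v: v < 75),
    ("age", lambda v: v < 40),
]
complete = route_with_surrogates({"income": 90, "age": 30}, splits)
missing_income = route_with_surrogates({"income": None, "age": 30}, splits)
print(chaid_category(None), complete, missing_income)
```

In the second call the income value is missing, so the case is routed by the surrogate rule on age instead.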