SPSS Data Validation 14.0
For more information about SPSS software products, please visit our Web site at http://www.spss.com or contact

SPSS Inc.
233 South Wacker Drive, 11th Floor
Chicago, IL 60606-6412
Tel: (312) 651-3000
Fax: (312) 651-3668

SPSS is a registered trademark and the other product names are the trademarks of SPSS Inc. for its proprietary computer software. No material describing such software may be produced or distributed without the written permission of the owners of the trademark and license rights in the software and the copyrights in the published materials.

The SOFTWARE and documentation are provided with RESTRICTED RIGHTS. Use, duplication, or disclosure by the Government is subject to restrictions as set forth in subdivision (c) (1) (ii) of The Rights in Technical Data and Computer Software clause at 52.227-7013. Contractor/manufacturer is SPSS Inc., 233 South Wacker Drive, 11th Floor, Chicago, IL 60606-6412.

General notice: Other product names mentioned herein are used for identification purposes only and may be trademarks of their respective companies.

TableLook is a trademark of SPSS Inc. Windows is a registered trademark of Microsoft Corporation. DataDirect, DataDirect Connect, INTERSOLV, and SequeLink are registered trademarks of DataDirect Technologies. Portions of this product were created using LEADTOOLS © 1991-2000, LEAD Technologies, Inc. ALL RIGHTS RESERVED. LEAD, LEADTOOLS, and LEADVIEW are registered trademarks of LEAD Technologies, Inc. Sax Basic is a trademark of Sax Software Corporation. Copyright 1993-2004 by Polar Engineering and Consulting. All rights reserved. Portions of this product were based on the work of the FreeType Team (http://www.freetype.org). A portion of the SPSS software contains zlib technology. Copyright 1995-2002 by Jean-loup Gailly and Mark Adler. The zlib software is provided as is, without express or implied warranty. A portion of the SPSS software contains Sun Java Runtime libraries. Copyright 2003 by Sun Microsystems, Inc. All rights reserved. The Sun Java Runtime libraries include code licensed from RSA Security, Inc. Some portions of the libraries are licensed from IBM and are available at http://oss.software.ibm.com/icu4j/.

SPSS Data Validation 14.0
Copyright 2005 by SPSS Inc. All rights reserved. Printed in the United States of America.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher.

ISBN 1-56827-372-X
Preface
SPSS 14.0 is a comprehensive system for analyzing data. The SPSS Data Validation optional add-on module provides the additional analytic techniques described in this manual. The Data Validation add-on module must be used with the SPSS 14.0 Base system and is completely integrated into that system.
Installation
To install the SPSS Data Validation add-on module, run the License Authorization Wizard using the authorization code that you received from SPSS Inc. For more information, see the installation instructions supplied with the SPSS Data Validation add-on module.
Compatibility
SPSS is designed to run on many computer systems. See the installation instructions that came with your system for specific information on minimum and recommended requirements.
Serial Numbers
Your serial number is your identification number with SPSS Inc. You will need this serial number when you contact SPSS Inc. for information regarding support, payment, or an upgraded system. The serial number was provided with your Base system.
Customer Service
If you have any questions concerning your shipment or account, contact your local office, listed on the SPSS Web site at http://www.spss.com/worldwide. Please have your serial number ready for identification.
Training Seminars
SPSS Inc. provides both public and onsite training seminars. All seminars feature hands-on workshops. Seminars will be offered in major cities on a regular basis. For more information on these seminars, contact your local office, listed on the SPSS Web site at http://www.spss.com/worldwide.
A rule is used to determine whether a case is valid. There are two types of validation rules:
Single-variable rules. Single-variable rules consist of a fixed set of checks that apply to a single variable, such as checks for out-of-range values. For single-variable rules, valid values can be expressed as a range of values or a list of acceptable values.
Cross-variable rules. Cross-variable rules are user-defined rules that can be applied to a single variable or a combination of variables. Cross-variable rules are defined by a logical expression that flags invalid values.
Validation rules are saved to the data dictionary of your data file. This allows you to specify a rule once and then reuse it.
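The two rule types can be pictured outside SPSS with a small Python sketch. It is only an illustration of the idea, not how SPSS stores or evaluates rules; the variable names (age, sex, admit_day, discharge_day), the rule name, and all helper functions are hypothetical.

```python
from typing import Callable, Dict, List

Case = Dict[str, float]  # one case: variable name -> value (hypothetical layout)


def range_rule(low: float, high: float) -> Callable[[float], bool]:
    """Single-variable rule: a value is valid if it lies in [low, high]."""
    return lambda value: low <= value <= high


def list_rule(allowed: List[float]) -> Callable[[float], bool]:
    """Single-variable rule: a value is valid if it is in a list of acceptable values."""
    allowed_set = set(allowed)
    return lambda value: value in allowed_set


def discharged_before_admitted(case: Case) -> bool:
    """Cross-variable rule: a logical expression that is True when the case is INVALID."""
    return case["discharge_day"] < case["admit_day"]


# Rules are defined once and can then be reused for every case.
SINGLE_VARIABLE_RULES = {"age": range_rule(0, 120), "sex": list_rule([0, 1])}
CROSS_VARIABLE_RULES = {"DischargedBeforeAdmitted": discharged_before_admitted}


def find_violations(case: Case) -> List[str]:
    """Return a description of every rule that the case violates."""
    problems = [
        f"{variable}: invalid value {case[variable]}"
        for variable, is_valid in SINGLE_VARIABLE_RULES.items()
        if not is_valid(case[variable])
    ]
    problems += [
        name for name, flags_invalid in CROSS_VARIABLE_RULES.items()
        if flags_invalid(case)
    ]
    return problems


cases = [
    {"age": 54, "sex": 1, "admit_day": 3, "discharge_day": 9},
    {"age": 130, "sex": 1, "admit_day": 5, "discharge_day": 6},
    {"age": 41, "sex": 7, "admit_day": 8, "discharge_day": 2},
]
for number, case in enumerate(cases, start=1):
    print(number, find_violations(case))
```

Note the asymmetry that the text describes: a single-variable rule states which values are valid, while a cross-variable rule is an expression that flags invalid cases.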
Load Predefined Validation Rules
You can quickly obtain a set of ready-to-use validation rules by loading predefined rules from an external data file that ships with SPSS.
To Load Predefined Validation Rules
E From the menus choose: Data > Validation > Load Predefined Rules
Figure 2-1 Load Predefined Validation Rules
Note that this process deletes any existing single-variable rules in the active dataset. Alternatively, you can use the Copy Data Properties Wizard to load rules from any data file.
Define Validation Rules
The Define Validation Rules dialog box allows you to create and view single-variable and cross-variable validation rules.
To Create and View Validation Rules
E From the menus choose: Data > Validation > Define Rules
The dialog box is populated with single-variable and cross-variable validation rules read from the SPSS data dictionary. When there are no rules, a new placeholder rule that you can modify to suit your purposes is created automatically.
E Select individual rules on the Single-Variable Rules and Cross-Variable Rules tabs to view and modify their properties.
Define Single-Variable Rules
Figure 2-2 Define Validation Rules: Single-Variable Rules tab
The Single-Variable Rules tab allows you to create, view, and modify single-variable validation rules.
Rules. The list shows single-variable validation rules by name and the type of variable to which the rule can be applied. When the dialog box is opened, it shows rules defined in the data dictionary, or, if no rules are currently defined, a placeholder rule called Single-Variable Rule 1. The following buttons appear below the Rules list:
New. Adds a new entry to the bottom of the Rules list. The rule is selected and assigned the name SingleVarRule n, where n is an integer so that the new rule's name is unique among single-variable and cross-variable rules.
Duplicate. Adds a copy of the selected rule to the bottom of the Rules list. The rule
Validate Data Output
Figure 3-5 Validate Data: Output tab
Casewise Report. If you have applied any single-variable or cross-variable validation rules, you can request a report that lists validation rule violations for individual cases.
Minimum Number of Violations. This option specifies the minimum number of rule violations required for a case to be included in the report. Specify a positive integer.
Maximum Number of Cases. This option specifies the maximum number of cases included in the case report. Specify a positive integer less than or equal to 1000.
Single-Variable Validation Rules. If you have applied any single-variable validation rules, you can choose how to display the results or whether to display them at all.
Summarize violations by analysis variable. For each analysis variable, this option shows all single-variable validation rules that were violated and the number of values that violated each rule. It also reports the total number of single-variable rule violations for each variable.
Summarize violations by rule. For each single-variable validation rule, this option reports variables that violated the rule and the number of invalid values per variable. It also reports the total number of values that violated each rule across variables.
Display descriptive statistics. This option allows you to request descriptive statistics for analysis variables. A frequency table is generated for each categorical variable. A table of summary statistics including the mean, standard deviation, minimum, and maximum is generated for the scale variables.
Move cases with validation rule violations. This option moves cases with single-variable or cross-variable rule violations to the top of the active dataset for easy perusal.
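As a rough illustration of what the two summaries and the casewise report contain, the following Python sketch tallies a list of violation records. The record layout, the sample data, and the function names are assumptions made for this example; only the 1000-case upper limit and the minimum-violations idea come from the text above, and using 1000 as a default is also an assumption.

```python
from collections import Counter

# Hypothetical violation records: (case number, variable, rule name).
violations = [
    (3, "age", "0 to 120"),
    (3, "sex", "0 to 1 Dichotomy"),
    (7, "sex", "0 to 1 Dichotomy"),
    (9, "age", "0 to 120"),
    (9, "age", "Nonnegative integer"),
]


def summarize_by_variable(records):
    """Per variable: number of violations per rule, plus the variable's total."""
    summary = {}
    for _case, variable, rule in records:
        summary.setdefault(variable, Counter())[rule] += 1
    return {var: (dict(c), sum(c.values())) for var, c in summary.items()}


def summarize_by_rule(records):
    """Per rule: number of invalid values per variable, plus the rule's total."""
    summary = {}
    for _case, variable, rule in records:
        summary.setdefault(rule, Counter())[variable] += 1
    return {rule: (dict(c), sum(c.values())) for rule, c in summary.items()}


def casewise_report(records, min_violations=1, max_cases=1000):
    """Cases with at least min_violations violations, capped at max_cases cases."""
    per_case = Counter(case for case, _variable, _rule in records)
    selected = sorted(c for c, n in per_case.items() if n >= min_violations)
    return [(case, per_case[case]) for case in selected[:max_cases]]


print(summarize_by_variable(violations))
print(summarize_by_rule(violations))
print(casewise_report(violations, min_violations=2))
```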
Validate Data Save
Figure 3-6 Validate Data: Save tab
The Save tab allows you to save variables that record rule violations to the active dataset.
Summary Variables. These are individual variables that can be saved. Check a box to save the variable. Default names for the variables are provided; you can edit them.
Empty case indicator. Empty cases are assigned the value 1. All other cases are
Statistics. The procedure produces peer groups, peer group norms for continuous and categorical variables, anomaly indices based on deviations from peer group norms, and variable impact values for variables that most contribute to a case being considered unusual.
Data Considerations
Data. This procedure works with both continuous and categorical variables. Each row represents a distinct observation, and each column represents a distinct variable upon which the peer groups are based. A case identification variable can be available in the data file for marking output, but it will not be used in the analysis. Missing values are allowed. The SPSS weight variable, if specified, is ignored.
The detection model can be applied to a new test data file. The elements of the test data must be the same as the elements of the training data. Depending on the algorithm settings, the missing value handling that is used to create the model may also be applied to the test data file prior to scoring.
Case order. Note that the solution may depend on the order of cases. To minimize order effects, randomly order the cases. To verify the stability of a given solution, you may want to obtain several different solutions with cases sorted in different random orders. In situations with extremely large file sizes, multiple runs can be performed, with a sample of cases sorted in different random orders.
Assumptions. The algorithm assumes that all variables are nonconstant and independent and assumes that no case has missing values for all of the input variables. Each continuous variable is assumed to have a normal (Gaussian) distribution, and each categorical variable is assumed to have a multinomial distribution. Empirical internal testing indicates that the procedure is fairly robust to violations of both the assumption of independence and the distributional assumptions, but be aware of how well these assumptions are met.
To Identify Unusual Cases
E From the menus choose: Data > Identify Unusual Cases
Figure 4-1 Identify Unusual Cases: Variables tab
E Select at least one analysis variable.
E Optionally, choose a case ID variable to use in labeling output.
Identify Unusual Cases Output
Figure 4-2 Identify Unusual Cases: Output tab
List of unusual cases and reasons why they are considered unusual. This option produces three tables:
The anomaly case index list displays cases that are identified as unusual and displays their corresponding anomaly index values.
The anomaly case peer ID list displays unusual cases and information concerning their corresponding peer groups.
The anomaly reason list displays the case number, the reason variable, the variable impact value, the value of the variable, and the norm of the variable for each reason.
All tables are sorted by anomaly index in descending order. Moreover, the IDs of the cases are displayed if the case identifier variable is specified on the Variables tab.
Summaries. The controls in this group produce distribution summaries.
Peer group norms. This option displays the continuous variable norms table (if any continuous variable is used in the analysis) and the categorical variable norms table (if any categorical variable is used in the analysis). The continuous variable norms table displays the mean and standard deviation of each continuous variable for each peer group. The categorical variable norms table displays the mode (most popular category), its frequency, and frequency percentage of each categorical variable for each peer group. The mean of a continuous variable and the mode of a categorical variable are used as the norm values in the analysis.
Anomaly indices. The anomaly index summary displays descriptive statistics for the anomaly index of the cases that are identified as the most unusual.
Reason occurrence by analysis variable. For each reason, the table displays the frequency and frequency percentage of each variable's occurrence as a reason. The table also reports the descriptive statistics of the impact of each variable. If the maximum number of reasons is set to 0 on the Options tab, this option is not available.
Cases processed. The case processing summary displays the counts and count percentages for all cases in the active dataset; the cases included and excluded in the analysis; and the cases in each peer group.
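The peer group norms described above (mean and standard deviation for continuous variables; mode, frequency, and frequency percentage for categorical variables) can be reproduced in miniature with the Python sketch below. It assumes that peer group membership is already known; the row layout, variable names, and the use of the sample standard deviation are assumptions for illustration and not a statement of how SPSS computes the tables.

```python
from collections import Counter, defaultdict
from statistics import mean, stdev

# Hypothetical cases that already carry a peer group ID.
rows = [
    {"peer": 1, "cost": 10.0, "smoker": "no"},
    {"peer": 1, "cost": 14.0, "smoker": "no"},
    {"peer": 1, "cost": 12.0, "smoker": "yes"},
    {"peer": 2, "cost": 2.0, "smoker": "yes"},
    {"peer": 2, "cost": 4.0, "smoker": "yes"},
]


def continuous_norms(rows, variable):
    """Per peer group: (mean, standard deviation); the mean serves as the norm."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["peer"]].append(row[variable])
    # Sample standard deviation is an assumption; each group needs >= 2 cases.
    return {g: (mean(values), stdev(values)) for g, values in groups.items()}


def categorical_norms(rows, variable):
    """Per peer group: (mode, frequency, frequency percentage); the mode is the norm."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[row["peer"]][row[variable]] += 1
    norms = {}
    for g, counts in groups.items():
        category, frequency = counts.most_common(1)[0]
        norms[g] = (category, frequency, 100.0 * frequency / sum(counts.values()))
    return norms


print(continuous_norms(rows, "cost"))
print(categorical_norms(rows, "smoker"))
```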
Identify Unusual Cases Save
Figure 4-3 Identify Unusual Cases: Save tab
Save Variables. Controls in this group allow you to save model variables to the active dataset. You can also choose to replace existing variables whose names conflict with the variables to be saved.
Anomaly index. Saves the value of the anomaly index for each case to a variable with the specified name.
Peer groups. Saves the peer ID, peer group size, and peer group size as a percentage for each case to variables with the specified rootname. For example, if the rootname Peer is specified, the variables Peerid, PeerSize, and PeerPctSize are generated. Peerid is the peer group ID of the case, PeerSize is the group's size, and PeerPctSize is the group's size as a percentage.
Reasons. Saves sets of reasoning variables with the specified rootname. A set of reasoning variables consists of the name of the variable as the reason, its variable impact measure, its own value, and the norm value. The number of sets depends on the number of reasons requested on the Options tab. For example, if the rootname Reason is specified, the variables ReasonVar_k, ReasonMeasure_k, ReasonValue_k, and ReasonNorm_k are generated, where k is the kth reason. This option is not available if the number of reasons is set to 0.
Export Model File. Allows you to save the model in XML format.
Identify Unusual Cases Missing Values
Figure 4-4 Identify Unusual Cases: Missing Values tab
The Missing Values tab is used to control handling of user-missing and system-missing values.
Exclude missing values from analysis. Cases with missing values are excluded from the analysis.
Include missing values in analysis. Missing values of continuous variables are substituted by their corresponding grand means, and missing categories of categorical variables are grouped and treated as a valid category. The processed variables are then used in the analysis. Optionally, you can request the creation of an additional variable that represents the proportion of missing variables in each case and use that variable in the analysis.
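The Include option can be sketched in Python as follows. This is a minimal illustration of the three steps named above (grand-mean substitution, a grouped missing category, and a missing-proportion variable); the row layout, the use of None for missing values, the "(missing)" label, and the name missing_proportion are all assumptions.

```python
from statistics import mean

MISSING_CATEGORY = "(missing)"  # hypothetical label for the grouped missing category


def include_missing(rows, continuous, categorical):
    """Return new rows in which missing values (None) are handled, not dropped."""
    # Grand mean of each continuous variable over its non-missing values.
    grand_means = {
        var: mean(r[var] for r in rows if r[var] is not None) for var in continuous
    }
    variables = list(continuous) + list(categorical)
    prepared = []
    for r in rows:
        new_row = {}
        for var in continuous:
            new_row[var] = grand_means[var] if r[var] is None else r[var]
        for var in categorical:
            new_row[var] = MISSING_CATEGORY if r[var] is None else r[var]
        # Optional extra variable: share of analysis variables missing in this case.
        n_missing = sum(r[var] is None for var in variables)
        new_row["missing_proportion"] = n_missing / len(variables)
        prepared.append(new_row)
    return prepared


rows = [
    {"cost": 10.0, "ward": "A"},
    {"cost": None, "ward": "B"},
    {"cost": 20.0, "ward": None},
]
for row in include_missing(rows, continuous=["cost"], categorical=["ward"]):
    print(row)
```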
Identify Unusual Cases Options
Figure 4-5 Identify Unusual Cases: Options tab
Criteria for Identifying Unusual Cases. These selections determine how many cases are included in the anomaly list.
Percentage of cases with highest anomaly index values. Specify a positive number that is less than or equal to 100.
Fixed number of cases with highest anomaly index values. Specify a positive integer that is less than or equal to the total number of cases in the active dataset and used in the analysis.
Identify only cases whose anomaly index value meets or exceeds a minimum value. Specify a non-negative number. A case is considered anomalous if its anomaly index value is larger than or equal to the specified cutoff point. This option is used together with the Percentage of cases and Fixed number of cases options. For example, if you specify a fixed number of 50 cases and a cutoff value of 2, the anomaly list will consist of, at most, 50 cases, each with an anomaly index value that is larger than or equal to 2.
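The way the cutoff combines with the percentage or fixed-number criterion can be sketched in Python. The function and its parameter names are hypothetical, and rounding the percentage up to a whole number of cases is an assumption; the text does not say how SPSS rounds.

```python
import math


def anomaly_list(indices, percentage=None, fixed_number=None, cutoff=None):
    """Select cases with the highest anomaly index values.

    indices maps a case number to its anomaly index. Pass either percentage
    or fixed_number; cutoff, if given, is applied on top of that selection.
    """
    ranked = sorted(indices.items(), key=lambda item: item[1], reverse=True)
    if percentage is not None:
        limit = math.ceil(len(ranked) * percentage / 100)  # rounding is an assumption
    else:
        limit = fixed_number
    top = ranked[:limit]
    if cutoff is not None:
        # Keep only cases whose index meets or exceeds the minimum value.
        top = [(case, value) for case, value in top if value >= cutoff]
    return top


indices = {1: 0.9, 2: 3.1, 3: 2.0, 4: 1.5, 5: 2.4}
print(anomaly_list(indices, fixed_number=4, cutoff=2))
print(anomaly_list(indices, percentage=40))
```

With a fixed number of 4 and a cutoff of 2, the list holds at most four cases, and here only three qualify.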
analysis variables.
E Select Hospital ID, Patient ID, and Attending Physician ID as case identifier variables.
E Click the Basic Checks tab.
Figure 5-2 Validate Data: Basic Checks tab
The default settings are, in fact, the settings that you want to run.
E Click OK.
Warnings
Figure 5-3 Warnings
The analysis variables passed the basic checks, and there are no empty cases, so a warning is displayed that explains why there is no output corresponding to these checks.
Incomplete Identifiers
Figure 5-4 Incomplete case identifiers
When there are missing values in case identification variables, the case cannot be properly identified. In this data file, case 288 is missing the Patient ID, while cases 573 and 774 are missing the Hospital ID.
Duplicate Identifiers
Figure 5-5 Duplicate case identifiers (first 11 shown)
A case should be uniquely identified by the combination of values of the identifier variables. The first 11 entries in the duplicate identifiers table are shown here. These duplicates are patients with multiple events who were entered as separate cases for each event. Because this information can be collected in a single row, these cases should be cleaned up.
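The two identifier checks can be sketched in Python as below. The field names and sample rows are hypothetical, and leaving incomplete cases out of the duplicate check is a choice made for this example rather than documented SPSS behavior.

```python
from collections import defaultdict

ID_VARS = ("hospital_id", "patient_id", "physician_id")  # hypothetical field names


def incomplete_identifiers(rows, id_vars=ID_VARS):
    """Case numbers (1-based) in which any identifier variable is missing (None)."""
    return [
        number
        for number, row in enumerate(rows, start=1)
        if any(row[var] is None for var in id_vars)
    ]


def duplicate_identifiers(rows, id_vars=ID_VARS):
    """Map each repeated identifier combination to the case numbers that share it."""
    seen = defaultdict(list)
    for number, row in enumerate(rows, start=1):
        key = tuple(row[var] for var in id_vars)
        if None not in key:  # incomplete cases are reported separately (assumption)
            seen[key].append(number)
    return {key: cases for key, cases in seen.items() if len(cases) > 1}


rows = [
    {"hospital_id": 1, "patient_id": 10, "physician_id": 7},
    {"hospital_id": 1, "patient_id": 11, "physician_id": 7},
    {"hospital_id": 1, "patient_id": 10, "physician_id": 7},
    {"hospital_id": None, "patient_id": 12, "physician_id": 8},
]
print(incomplete_identifiers(rows))
print(duplicate_identifiers(rows))
```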
Copying and Using Rules from Another File
The analyst notes that the variables in this data file are similar to the variables from another project. The validation rules that are defined for that project are saved as properties of the associated data file and can be applied to this data file by copying the data properties of the file.
E To copy rules from another file, from the menus choose: Data > Copy Data Properties
Figure 5-6 Copy Data Properties - Welcome
E Choose to copy properties from an external SPSS data file, patient_los.sav, which can be found in the \Tutorial\sample_files subdirectory of the SPSS installation directory.
E Click Next.
Figure 5-7 Copy Data Properties - Choose variables
These are the variables whose properties you want to copy from patient_los.sav to the corresponding variables in stroke_invalid.sav.
Figure 5-8 Copy Data Properties - Choose variable properties
E Deselect all properties except Custom Attributes.
E Click Next.
Figure 5-9 Copy Data Properties - Choose dataset properties
E Select Custom Attributes.
E Click Finish.
You are now ready to reuse the validation rules.
Figure 5-10 Validate Data: Single-Variable Rules tab
E To validate the stroke_invalid.sav data by using the copied rules, click the Dialog Recall toolbar button and choose Validate Data.
E Click the Single-Variable Rules tab.
The Analysis Variables list shows the variables that are selected on the Variables tab, some summary information about their distributions, and the number of rules attached to each variable. Variables whose properties were copied from patient_los.sav have rules that are attached to them. The Rules list shows the single-variable validation rules that are available in the data file. These rules were all copied from patient_los.sav. Note that some of these rules are applicable to variables that did not have exact counterparts in the other data file.
Figure 5-11 Validate Data: Single-Variable Rules tab
E Select Atrial fibrillation, History of transient ischemic attack, CAT scan result, and Died in hospital, and then apply the 0 to 1 Dichotomy rule.
E Apply 0 to 3 Categorical to Post-event rehabilitation.
E Apply 0 to 2 Categorical to Post-event preventative surgery.
E Apply Nonnegative integer to Length of stay for rehabilitation.
E Apply 1 to 4 Categorical to Recoded Barthel index at 1 month through Recoded Barthel index at 6 months.
E Click the Save tab.
Figure 5-12 Validate Data: Save tab
E Select Save indicator variables that record all validation rule violations. This process will make it easier to connect the case and variable that cause single-variable rule violations.
Rule Descriptions
Figure 5-13 Rule descriptions
The rule descriptions table displays explanations of rules that were violated. This feature is very useful for keeping track of a lot of validation rules.
Variable Summary
Figure 5-14 Variable summary
The variable summary table lists the variables that violated at least one validation rule, the rules that were violated, and the number of violations that occurred per rule and per variable.
Case Report
Figure 5-15 Case report
The case report table lists the cases (by both case number and case identifier) that violated at least one validation rule, the rules that were violated, and the number of times that the rule was violated by the case. The invalid values are shown in the Data Editor.
Figure 5-16 Data Editor with saved indicators of rule violations
A separate indicator variable is produced for each application of a validation rule. Thus, @0to3Categorical_anticlot_ is the application of the 0 to 3 Categorical single-variable validation rule to the variable Taking anti-clotting drugs. For a given case, the easiest way to figure out which variable's value is invalid is simply to scan the values of the indicators. A value of 1 means that the associated variable's value is invalid.
Figure 5-17 Data Editor with indicator of rule violation for case 175
Go to case 175, the first case with a rule violation. To speed your search, look at the indicators that are associated with variables in the variable summary table. It is easy to see that History of angina has the invalid value.
Figure 5-18 Data Editor with invalid value for History of angina
History of angina has a value of -1. While this value is a valid missing value for treatment and result variables in the data file, it is invalid here because the patient history values do not currently have user-missing values defined.
Defining Your Own Rules
The validation rules that were copied from patient_los.sav have been very useful, but you need to define a few more rules to finish the job. Additionally, sometimes patients that are dead on arrival are accidentally marked as having died at the hospital. Single-variable validation rules cannot catch this situation, so you need to define a cross-variable rule to handle the situation.
E Click the Dialog Recall toolbar button and choose Validate Data.
E Click the Single-Variable Rules tab. (You need to define rules for Hospital size, the variables that measure Rankin scores, and the variables corresponding to the unrecoded Barthel indices.)
E Click Define Rules.
Figure 5-19 Define Validation Rules: Single-Variable Rules tab
The currently defined rules are shown with 0 to 1 Dichotomy selected in the Rules list and the rule's properties displayed in the Rule Definition group.
E To define a rule, click New.
Figure 5-20 Define Validation Rules: Single-Variable Rules tab (1 to 3 Categorical defined)
E Type 1 to 3 Categorical as the rule name.
E For Valid Values, choose In a list.
E Type 1, 2, and 3 as the values.
E Deselect Allow system-missing values.
E To define the rule for Rankin scores, click New.
Figure 5-21 Define Validation Rules: Single-Variable Rules tab (0 to 5 Categorical defined)
E Type 0 to 5 Categorical as the rule name.
E For Valid Values, choose In a list.
E Type 0, 1, 2, 3, 4, and 5 as the values.
E Deselect Allow system-missing values.
E To define the rule for Barthel indices, click New.
Figure 5-22 Define Validation Rules: Single-Variable Rules tab (0 to 100 by 5 defined)
E Type 0 to 100 by 5 as the rule name.
E For Valid Values, choose In a list.
E Type 0, 5, ..., and 100 as the values.
E Deselect Allow system-missing values.
E Click Continue.
Figure 5-23 Validate Data: Single-Variable Rules tab (0 to 100 by 5 defined)
Now you need to apply the defined rules to analysis variables.
E Apply 1 to 3 Categorical to Hospital size.
E Apply 0 to 5 Categorical to Initial Rankin score and Rankin score at 1 month through Rankin score at 6 months.
E Apply 0 to 100 by 5 to Barthel index at 1 month through Barthel index at 6 months.
E Click the Cross-Variable Rules tab.
There are no currently defined rules.
Figure 5-24 Define Validation Rules: Cross-Variable Rules tab
When there are no rules, a new placeholder rule is automatically created.
E Type DiedTwice as the name of the rule.
E Type (doa=1) & (dhosp=1) as the logical expression. This will return a value of 1 if the patient is recorded as both having been dead on arrival and having died in the hospital.
E Click Continue.
The newly defined rule is automatically selected in the Cross-Variable Rules tab.
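For readers who think in code, the DiedTwice expression behaves like the Python function below. The variable names doa and dhosp come from the expression above; the sample cases and the function itself are hypothetical and only mirror the logic, not the SPSS evaluation engine.

```python
def died_twice(case):
    """Mirror of (doa=1) & (dhosp=1): 1 when both flags are 1, otherwise 0."""
    return int(case["doa"] == 1 and case["dhosp"] == 1)


# Hypothetical cases: only the second one is recorded as dying twice.
cases = [
    {"doa": 0, "dhosp": 1},
    {"doa": 1, "dhosp": 1},
    {"doa": 1, "dhosp": 0},
]
flagged = [number for number, case in enumerate(cases, start=1) if died_twice(case) == 1]
print(flagged)
```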
Cross-Variable Rules
Figure 5-25 Cross-variable rules
The cross-variable rules summary lists cross-variable rules that were violated at least once, the number of violations that occurred, and a description of each violated rule.
Figure 5-26 Case report
The case report now includes the cases that violated the cross-variable rule, as well as the previously discovered cases that violated single-variable rules. These cases all need to be reported to data entry for correction.
Summary
The analyst has the necessary information for a preliminary report to the data entry manager.
Related Procedures
The Validate Data procedure is a useful tool for data quality control. The Identify Unusual Cases procedure analyzes patterns in your data and identifies cases with a few significant values that vary from type.
Identify Unusual Cases Algorithm
This algorithm is divided into three stages:
Modeling. The procedure creates a clustering model that explains natural groupings (or clusters) within a dataset that would otherwise not be apparent. The clustering is based on a set of input variables. The resulting clustering model and sufficient statistics for calculating the cluster group norms are stored for later use.
Scoring. The model is applied to each case to identify its cluster group, and some indices are created for each case to measure the unusualness of the case with respect to its cluster group. All cases are sorted by the values of the anomaly indices. The top portion of the case list is identified as the set of anomalies.
Reasoning. For each anomalous case, the variables are sorted by their corresponding variable deviation indices. The top variables, their values, and the corresponding norm values are presented as the reasons why a case is identified as an anomaly.
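The scoring and reasoning stages can be made concrete with a deliberately simplified Python sketch. It is not the SPSS algorithm: the modeling stage is skipped (peer groups are supplied by hand), and the deviation measure (squared z-score against the group norm) and the index (a case's total deviation divided by its group's average) are stand-ins chosen for illustration. All names and data are hypothetical, and only continuous variables are handled.

```python
from statistics import mean, pstdev


def score_cases(rows, peers, variables, n_reasons=3):
    """Rank cases by a stand-in anomaly index and list their top reason variables."""
    # "Modeling" is assumed done: peers[i] is the peer group of rows[i].
    groups = {}
    for row, g in zip(rows, peers):
        groups.setdefault(g, []).append(row)
    norms = {
        g: {
            v: (mean(r[v] for r in members),
                pstdev([r[v] for r in members]) or 1.0)  # avoid dividing by zero
            for v in variables
        }
        for g, members in groups.items()
    }
    # Scoring: per-variable deviation of each case from its group norm.
    deviations = [
        {v: ((row[v] - norms[g][v][0]) / norms[g][v][1]) ** 2 for v in variables}
        for row, g in zip(rows, peers)
    ]
    group_average = {
        g: mean(sum(deviations[i].values()) for i, p in enumerate(peers) if p == g)
        for g in groups
    }
    results = []
    for number, (dev, g) in enumerate(zip(deviations, peers), start=1):
        total = sum(dev.values())
        index = total / group_average[g] if group_average[g] else 0.0
        # Reasoning: variables sorted by their deviation, largest first.
        reasons = sorted(dev, key=dev.get, reverse=True)[:n_reasons]
        results.append({"case": number, "anomaly_index": index, "reasons": reasons})
    return sorted(results, key=lambda r: r["anomaly_index"], reverse=True)


rows = [
    {"cost": 10, "stay": 5},
    {"cost": 11, "stay": 5},
    {"cost": 9, "stay": 5},
    {"cost": 50, "stay": 5},
]
ranked = score_cases(rows, peers=[1, 1, 1, 1], variables=["cost", "stay"])
print(ranked[0])
```

In this toy data the fourth case ranks first and cost is its leading reason, which matches the intent of the three stages even though the formulas are simplified.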
Identifying Unusual Cases in a Medical Database
A data analyst hired to build predictive models for stroke treatment outcomes is concerned about data quality because such models can be sensitive to unusual observations. Some of these outlying observations represent truly unique cases and are thus unsuitable for prediction, while other observations are caused by data entry errors in which the values are technically correct and thus cannot be caught by data validation procedures. This information is collected in stroke_valid.sav. Use Identify Unusual Cases to clean the data file. Syntax for reproducing these analyses can be found in detectanomaly_stroke.sps.
Running the Analysis
E To identify unusual cases, from the menus choose: Data > Identify Unusual Cases
Figure 6-1 Identify Unusual Cases: Variables tab
E Select Age category through Stroke between 3 and 6 months as analysis variables.
E Select Patient ID as the case identifier variable.
E Click the Output tab.
Figure 6-2 Identify Unusual Cases: Output tab
E Select Peer group norms, Anomaly indices, Reason occurrence by analysis variable, and Cases processed.
E Click the Save tab.
Figure 6-3 Identify Unusual Cases: Save tab
E Select Anomaly index, Peer groups, and Reasons.
Saving these results allows you to produce a useful scatterplot that summarizes the results.
E Click the Missing Values tab.
Figure 6-4 Identify Unusual Cases: Missing Values tab
E Select Include missing values in analysis. This process is necessary because there are a lot of user-missing values to handle patients who died before or during treatment. An extra variable that measures the proportion of missing values per case is added to the analysis as a scale variable.
E Click the Options tab.
Figure 6-5 Identify Unusual Cases: Options tab
E Type 2 as the percentage of cases to consider anomalous.
E Deselect Identify only cases whose anomaly index value meets or exceeds a minimum value.
E Type 3 as the maximum number of reasons.
E Click OK.
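The effect of these options can be sketched in Python as a simple selection rule: rank the cases by anomaly index and keep the top 2%, with no minimum cutoff applied. The function name `flag_anomalies` and the index values are hypothetical, and rounding the count down is an assumption; the procedure's own rounding rule may differ.

```python
import math


def flag_anomalies(anomaly_index, percent=2.0):
    """Return the IDs of the top `percent` of cases, ranked by anomaly index."""
    # Assumption: the number of flagged cases is rounded down.
    n_flag = math.floor(len(anomaly_index) * percent / 100)
    ranked = sorted(anomaly_index, key=anomaly_index.get, reverse=True)
    return ranked[:n_flag]


# Hypothetical anomaly index values for 100 cases, keyed by case ID.
indices = {case_id: 1 + case_id / 100 for case_id in range(1, 101)}

flagged = flag_anomalies(indices, percent=2.0)
print(flagged)
```

Because the minimum-value option is deselected, the selection depends only on rank, so exactly the requested share of cases is listed regardless of how large their index values are.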
Figure 6-10 Anomaly case reason list (first 8 cases)
This configuration makes it easy to compare the relative contributions of the top three reasons for each case. Case 843 is, as suspected, considered anomalous because of its unusually large value of cost. In contrast, no single reason contributes more than 0.10 to the unusualness of case 501.
Scale Variable Norms
Figure 6-11 Scale variable norms
The scale variable norms report the mean and standard deviation of each variable for each peer group and overall. Comparing the values gives some indication of which variables contribute to peer group formation. For example, the mean for Length of stay for rehabilitation is fairly constant across all three peer groups, meaning that this variable does not contribute to peer group formation. In contrast, Total treatment and rehabilitation costs in thousands and Missing Proportion each provide some insight into peer group membership. Peer group 1 has the highest average cost and the fewest missing values. Peer group 2 has very low costs and a lot of missing values. Peer group 3 has middling costs and missing values. This organization suggests that peer group 2 is composed of patients who were dead on arrival, thus incurring very little cost and causing all of the treatment and rehabilitation variables to be missing. Peer group 3 likely contains many patients who died during treatment, thus incurring the treatment costs but not the rehabilitation costs and causing the rehabilitation variables to be missing. Peer group 1 is likely composed almost entirely of patients who survived through treatment and rehabilitation, thus incurring the highest costs.
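A comparable table of norms can be sketched with the Python standard library. The example below is an illustration only: the function name `scale_norms`, the field names, and the cost values are hypothetical, and the use of the sample standard deviation is an assumption about how the norms are computed.

```python
from collections import defaultdict
from statistics import mean, stdev


def scale_norms(rows, group_key, variable):
    """Mean and sample standard deviation of a variable per peer group and overall."""
    groups = defaultdict(list)
    for row in rows:
        if row[variable] is not None:      # skip missing values
            groups[row[group_key]].append(row[variable])
    norms = {g: (mean(vals), stdev(vals)) for g, vals in sorted(groups.items())}
    overall = [v for vals in groups.values() for v in vals]
    norms["overall"] = (mean(overall), stdev(overall))
    return norms


# Hypothetical costs (in thousands) for six cases in three peer groups.
rows = [
    {"peer": 1, "cost": 40}, {"peer": 1, "cost": 50},
    {"peer": 2, "cost": 1}, {"peer": 2, "cost": 3},
    {"peer": 3, "cost": 20}, {"peer": 3, "cost": 30},
]

norms = scale_norms(rows, "peer", "cost")
for group, (avg, sd) in norms.items():
    print(group, avg, round(sd, 2))
```

Group means that differ widely, as in this made-up example, are the pattern that suggests a variable contributes to peer group formation; near-identical means suggest that it does not.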
Categorical Variable Norms
Figure 6-12 Categorical variable norms (first 10 variables)
The categorical variable norms serve much the same purpose as the scale norms, but categorical variable norms report the modal (most popular) category and the number and percentage of cases in the peer group that fall into that category. Comparing the values can be somewhat trickier; for example, at first glance, it may appear that Gender contributes more to cluster formation than Smoker because the modal category for Smoker is the same for all three peer groups, while the modal category for Gender differs for peer group 3. However, because Gender has only two values, you can infer that 49.2% of the cases in peer group 3 have a value of 0, which is very similar to the percentages in the other peer groups. By contrast, the percentages for Smoker range from 72.2% to 81.4%.
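The modal-category norm and the two-category inference can be illustrated with a short Python sketch. The function name `categorical_norm` and the group size of 1,000 cases are hypothetical; only the 49.2% share of value 0 is taken from the text.

```python
from collections import Counter


def categorical_norm(values):
    """Return the modal category, its count, and its percentage of the cases."""
    counts = Counter(values)
    category, count = counts.most_common(1)[0]
    return category, count, 100.0 * count / len(values)


# Hypothetical peer group of 1,000 cases in which 49.2% have Gender = 0.
gender = [1] * 508 + [0] * 492

mode, count, percent = categorical_norm(gender)
other_percent = 100.0 - percent   # valid only because Gender has two values
print(mode, count, percent, round(other_percent, 1))
```

The modal category flips as soon as one value passes 50%, so a change in mode between peer groups can reflect a difference of only a few percentage points. The percentages are therefore a better basis for comparison than the modal category alone.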
Figure 6-13 Categorical variable norms (selected variables)
The suspicions that were raised by the scale variable norms are confirmed further down in the categorical norms table. Peer group 2 is entirely composed of patients who were dead on arrival, so all treatment and rehabilitation variables are missing. Most of the patients in peer group 3 (69.0%) died during treatment, so the modal category for rehabilitation variables is (Missing Value).
Anomaly Index Summary
Figure 6-14 Anomaly index summary
The table provides summary statistics for the anomaly index values of cases in the anomaly list.
Reason Summary
Figure 6-15 Reason summary (treatment and rehabilitation variables)
For each variable in the analysis, the table summarizes the variable's role as a primary reason. Most variables, such as variables from Dead on arrival to Post-event rehabilitation, are not the primary reason that any of the cases are on the anomaly list. Barthel index at 1 month is the most frequent reason, followed by Total treatment and rehabilitation costs in thousands. The variable impact statistics are summarized, with the minimum, maximum, and mean impact reported for each variable, along with the standard deviation for variables that were the reason for more than one case.
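A summary of this kind can be sketched in Python from a list of primary reasons, one per anomalous case. The function name `reason_summary`, the variable names, and the impact values are hypothetical; the sample standard deviation is an assumption and is reported only when a variable is the primary reason for more than one case.

```python
from collections import defaultdict
from statistics import mean, stdev


def reason_summary(primary_reasons):
    """Summarize (variable, impact) pairs, one pair per anomalous case."""
    by_variable = defaultdict(list)
    for variable, impact in primary_reasons:
        by_variable[variable].append(impact)
    summary = {}
    for variable, impacts in by_variable.items():
        summary[variable] = {
            "count": len(impacts),
            "min": min(impacts),
            "max": max(impacts),
            "mean": mean(impacts),
            # A standard deviation needs at least two cases.
            "std": stdev(impacts) if len(impacts) > 1 else None,
        }
    return summary


# Hypothetical primary reasons for three anomalous cases.
primary = [("barthel1", 0.30), ("barthel1", 0.20), ("cost", 0.58)]

summary = reason_summary(primary)
for variable, stats in summary.items():
    print(variable, stats)
```

Variables that never appear as a primary reason are simply absent from the result, which corresponds to the rows of the table that show no occurrences.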
Scatterplot of Anomaly Index by Variable Impact
The tables contain a lot of useful information, but it can be difficult to grasp the relationships. Using the saved variables, you can construct a graph that makes this process easier.
E To produce this scatterplot, from the menus choose: Graphs > Scatter/Dot...
Figure 6-16 Scatterplot dialog box
E Click Define.
Figure 6-17 Simple Scatterplot dialog box
E Select Anomaly Index as the y variable and Reason Variable Impact Measure 1 as the x variable.
E Select Peer Group ID as the variable to set markers by.
E Click OK.
These selections produce the scatterplot.
Figure 6-18 Scatterplot of anomaly index by impact measure of first reason variable
Inspection of the graph leads to several observations:
The case in the upper right corner belongs to peer group 3 and is both the most anomalous case and the case with the largest contribution made by a single variable.
Moving down along the y axis, we see that there are three cases belonging to peer group 3, with anomaly index values just above 2.00. These cases should be investigated more closely as anomalous.
Moving along the x axis, we see that there are four cases belonging to peer group 1, with variable impact measures approximately in the range of 0.23 to 0.33. These cases should be investigated more thoroughly because these values separate the cases from the main body of points in the plot.
Peer group 2 seems fairly homogeneous in the sense that its anomaly index and variable impact values do not vary widely from their central tendencies.