Casio FX-100D
About Casio FX-100D
Here you can find information about the Casio FX-100D, such as the manual and user reviews.
Casio FX-100D manual (user guide) is ready to download for free.
At the bottom of the page users can write a review. If you own a Casio FX-100D, please write about it to help other people.
Manual
Download
(English) Casio FX-100D, size: 3.9 MB
Casio FX-100D
User reviews and opinions
| LaurieF |
6:31pm on Wednesday, September 1st, 2010 |
| great little stand. not for pro use at all but it serves it's purpose. it's [$] stand and worth every nickle. Folds Small","Good Stability". works like i expected it. pretty well known item. Folds Small","Lightweight","Unfolds Quickly Unstable","Weak Construction | |
| swodisimus |
3:16pm on Wednesday, August 11th, 2010 |
| I bought this for my SE Asia/Japan trip. Everywhere I went, there were people("tourist) to help me take picture... | |
| fyreMAN |
9:07pm on Wednesday, August 4th, 2010 |
| I bought this product for a recent cruise that I took in the Carribean. The little contraption is quite a useful one. I bought this for my older sister to use with her camera. | |
| chadpage |
6:20am on Friday, June 25th, 2010 |
| Brilliant Amazon once again This product is great, and great fun to use! Gorillapod Excellent product allowing you to set up your camera wherever you are. Slightly expensive for what it is. | |
| rrstiff |
8:50am on Monday, June 21st, 2010 |
| I have a Nikon D80 DSLR with a 18-200mm zoom lens. So I wanted a "flexpod" that can handle up to 7 lbs. I bought this product for a recent cruise that I took in the Carribean. The little contraption is quite a useful one. I bought this for my older sister to use with her camera. | |
| rshri |
11:26pm on Sunday, June 20th, 2010 |
| "I got this Tripod as a early christmas present and i started to play around with it and it was just too short and if there is something to wrap the l... | |
| lefritz |
3:13am on Wednesday, June 16th, 2010 |
| I bought this for my SE Asia/Japan trip. Everywhere I went, there were people("tourist) to help me take picture... | |
| markymo01 |
11:36am on Monday, May 24th, 2010 |
| This is a greta little companion to our standard digital camera. It hangs on to just about anything and is quick to set up. Lamp posts, fence posts, trees, chairs. A great idea to add stabilty on all surfaces. Coupled with the timer on my SLR we no longer have to ask strangers to "Take a photo of us"...brilliant. | |
| danofsteel |
3:56pm on Friday, March 19th, 2010 |
| Gorillapod Since this was sold with the Gorillapod clip as a single purchase I was expecting to be able to use them both immediately. However not so. Joby does the job You would find it hard to go wrong with this product. | |
| Bertrand Russel |
9:36am on Thursday, March 18th, 2010 |
| This is my second Gorilla Pod. Small in size but because of it's flexibility can mount virtually anywhere at any level! Ideal travel companion. It's great to be able to mount your camera anywhere and get a nice stable shot! Folds Small","Good Stability","Lightweight","Unfolds Quickly | |
Comments posted on www.ps2netdrivers.net are solely the views and opinions of the people posting them and do not necessarily reflect the views or opinions of us.
Documents

For real work: Microsoft Excel does basic analysis (especially if you switch on the Analysis ToolPak, available in Excel 97 from the Tools → Add-Ins menu, and thereafter from Tools → Data Analysis) and can generate quite good graphs, with a little playing. We will not cover proper statistics packages (such as SPSS) at Part IB. In any case, in the exams you'll be required to do basic statistical tests with a calculator, so don't become reliant on a computer yet.
1.1 Basic mathematics
If any of this (apart from the stuff in wavy lines) causes you problems, because for some reason you haven't done NST IA Elementary Maths, you should speak to your Director of Studies about catching up to this level. Some of it isn't used in the stats course but is common in psychology (e.g. logarithms are used in psychophysics).

Fractions, percentages
$$\frac{5}{100} = 5\% = 0.05$$
Notation to be familiar with
$\Delta x$: a small change in x (pronounced "delta-x").
$\sum x$: the sum of x (i.e. add up all the x's that you have).
$\sum_{i=1}^{n} x_i$: a more precise way of specifying summation; this means "for every value of i from 1 to n, take the sum of $x_i$", or $x_1 + x_2 + x_3 + \dots + x_n$.
$\ll, <, \le, =, \ge, >, \gg$: much less than, less than, less than or equal to, equal to, greater than or equal to, greater than, much greater than.
$\ne, \approx, \cong, \equiv$: does not equal, approximately equals, approximately equals, is equivalent/identical to.
$\Rightarrow, \Leftarrow, \Leftrightarrow$: implies, is implied by, implies and is implied by.
$\propto$: is proportional to.
$\infty$: infinity.
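Powers (a summary), though nothing beyond $x^2$ and $\sqrt{x}$ is used in IB statistics
$$x^a x^b = x^{a+b} \qquad \frac{x^a}{x^b} = x^{a-b} \qquad (x^a)^b = x^{ab}$$
$$x^1 = x \qquad x^0 = 1 \qquad x^{-1} = \frac{1}{x} \qquad x^{-n} = \frac{1}{x^n}$$
$$x^{1/2} = \sqrt{x} \qquad x^{1/n} = \sqrt[n]{x} \qquad x^{a/b} = \sqrt[b]{x^a}$$

Logarithms (a summary), though not needed for IB statistics
$$\log_a b = c \iff b = a^c$$
$$\log_a x + \log_a y = \log_a xy \qquad \log_a x - \log_a y = \log_a \frac{x}{y}$$
$$\log_a (x^n) = n \log_a x \qquad \log_a x^y = y \log_a x$$
$$\log_{10}(x) = \lg(x) \qquad \log_e(x) = \ln(x) \qquad e = 2.718281828\ldots$$
$$\log_a c = \frac{\log_b c}{\log_b a} \qquad \log_a x = \frac{\log_b x}{\log_b a} \qquad \log_a x = \log_a b \, \log_b x$$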
Calculus
If f(x) is some function of x, then the function giving the gradient of f(x) is the first derivative of f(x) with respect to x, written variously $f'(x) = \dot{f} = \frac{d}{dx} f(x)$. If f(x) is some function of x, then the area under the curve of f(x) is given by the integral of f(x) with respect to x: $\int f(x)\,dx$. This is called the indefinite integral, because it doesn't specify which parts of the curve we want the area under. The area under the curve f(x) from x = a to x = b is given by the definite integral $\int_a^b f(x)\,dx$.
1.2 Basic terminology

Variables and measurement
When we measure something that can vary, it is termed a variable. We can distinguish between discrete variables, which can only take certain values (e.g. in mammals, sex is a discrete variable which can take one of the two values male and female), and continuous variables, which can take any value (such as height). We can also distinguish between quantitative data and frequency data (also called categorical or qualitative data). Height is measured (quantified), and is therefore quantitative. If we count the number of males and females in the room, each person falls into one category or the other, and the data we end up with are frequencies (e.g. there are 26 males and 29 females).

While we're at it, we can also distinguish several types of measurement scale. Nominal scales aren't really scales at all; they're categories (e.g. male/female, Labour/Conservative/Lib Dem). The categories are different, but the nature of their difference isn't relevant. Ordinal scales rank things, but do not specify how far apart they are on a scale. For example, in the Army a lieutenant ranks lower than a captain, who ranks lower than a major; however, it doesn't make sense to ask whether a major is more or less above a captain than a captain is above a lieutenant. Interval scales have meaningful differences; 10°C is as far above −10°C as 40°C is above 20°C. However, interval scales do not have a meaningful zero point (0°C is not the absence of temperature), so we can't say that 40°C is twice as hot as 20°C. Ratio scales have a true zero point. 40 K is twice as hot as 20 K (because 0 K is the absence of heat); 3 m is twice as far as 1.5 m.

Frequently we come across a variable that can take many values. For example, suppose we have a group of 30 people and we want to know something about their heights. We might call X the variable that represents their height. We'll be able to make 30 different measurements of X; we might call them $X_1, X_2, \ldots, X_{30}$.
Each measurement is a single observation drawn from our variable. (Variables are often referred to by upper-case letters, such as X. Individual values of a variable are referred to by corresponding lower-case letters, such as x, or by the upper-case letter with a subscript, such as $X_1$, $X_2$, $X_i$, or by the lower-case letter with a subscript, such as $x_1$, $x_2$, $x_i$.)

Populations and samples
Taking this a step further, we can distinguish populations from samples. If all we want to know is the height of our 30 people, we can measure it and that's the end of the matter. Our measured sample is the same as our total population. But very often, we want to estimate something about a population by measuring a sample of that population that is very far from being the whole population. For example, if we want to know the height of 20-year-old human males in general, then we'd be unable in practice to measure the whole population, but we could measure 30 male 20-year-old Cambridge psychology undergraduates. This would be convenient, and we would get a number that would be a definitive measurement of our particular set of subjects, but would also be an estimator of the height of all 20-year-old male Cambridge undergraduates, and an estimator of the height of all 20-year-old male humans. However, it wouldn't necessarily be a very good estimator of the latter: the sample may not be very representative of the whole population (average height in the UK is shorter than in Germany but taller than in Japan) and, more importantly, may be systematically different from the population mean (university students might be taller than similarly-aged UK males in general). The latter is called bias. If we want to obtain a sample that is likely to be a good estimator of the whole population, we should draw a random sample: one where every member of the population has an equal chance of being picked to be in our sample.
Studies based on nonrandom samples may lack generality (or external validity), so studying the effects of a potential memory-enhancing drug on Cambridge students might tell you a lot about what it'll do to other university students, but not to the adult population as a whole.
Descriptive and inferential statistics
"Statistics" itself can mean a couple of things. Descriptive statistics is the business of describing things, you'll be shocked to learn; newspapers are full of it ("Henman's average serving speed was X"). In research, it also includes the business of looking at the distribution of your data ("is there an even spread of ability in my subjects or do I have a high-performing subgroup and a low-performing subgroup?"). The job of having a look at the distribution of a data set before analysing it in detail is called exploratory data analysis (EDA), a set of techniques developed by a statistician called Tukey. Inferential statistics is the business of inferring conclusions about a population from studies conducted with a sample. When we measure an attribute (such as height) from a whole population, we've measured a parameter of the population. If we measure the same thing with a sample, we've measured a statistic of the sample. So inferential statistics is also the business of inferring parameters from statistics (in this specialized sense). We tend to use Greek letters for parameters, such as $\mu$ and $\sigma$, but Roman letters for statistics (such as $\bar{x}$ and s).

Exerting control: independent and dependent variables, between- and within-subject designs
If we manipulate or control a variable, it is termed an independent variable. We might test the reaction times of a group of people having given them one of three different doses of a drug; drug dose would then be a (discrete) independent variable. We might want to know how the drug's effect depends on their body weight; body weight would then be a (continuous) independent variable. The thing that we measure is the dependent variable, in this case reaction time. When we come to manipulate independent variables, we must consider randomness, just as we do when we choose samples from populations.
If we are going to give our drug to some of our subjects and no drug to other subjects, we must consider several factors. First, we probably do not want the subjects to know whether they are receiving the drug or not, because this knowledge might in some way affect their performance; we would therefore give the non-drug group a placebo (Latin for "I shall please"; a sugar pill given by doctors to placate patients they think don't need drug treatment). The groups should be unaware of, or "blind" to, whether they receive drug or placebo; ideally, the person running the experiment should also be unaware, so he/she can't bias performance in any way. This would make the study a double-blind, placebo-controlled study.

However, we must also make sure that our drug group does not differ from the placebo group in some important way. If the drug group were all male and the placebo group were all female, any potential effects of our drug would be confounded with the effects of the subjects' sex; our study would be uninterpretable; it would not have internal validity. Similarly, if the subjects who are going to receive the drug have better reaction times to begin with than the subjects who are going to receive placebo, our results might not mean what we think they mean. Ideally, we would like our two groups to be matched for all characteristics other than the variable we want to manipulate (drug v. placebo). We can try to craft matched groups by measuring things that we think are relevant (e.g. reaction time on the task we're going to use or a similar task, age, IQ, sex). But we probably can't explicitly match groups on every variable that might potentially be a confound; eventually we need a mechanism to decide which group a subject goes in, and that method should be random assignment. So in our example, if we have plenty of subjects, we could just randomly assign them to the drug group or the placebo group.
Or we could match them a bit better by ranking them in order of reaction time performance and, working along from the best to the worst, take pairs of subjects (from the best pair to the worst pair), and from each pair assign one to the drug group and one to the placebo group at random. Random assignment takes care of all the factors you haven't thought of. For example, if your subjects are all going to do an IQ test in your suite of testing rooms, you should seat them randomly, in case one room's hotter than the others, or nearer the builders' radio outside, or whatever. Common confounding factors that are always worth thinking about are time and who collects the data.
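The ranked-pairs procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `matched_pair_assignment`, the subject identifiers, and the baseline reaction times are all hypothetical and do not come from the handout.

```python
import random

def matched_pair_assignment(baseline_rt, seed=None):
    """Split subjects into (drug, placebo) groups using ranked pairs.

    baseline_rt maps a subject id to a baseline reaction time.
    """
    rng = random.Random(seed)
    # Rank subjects from best (fastest) to worst (slowest).
    ranked = sorted(baseline_rt, key=baseline_rt.get)
    drug, placebo = [], []
    # Walk along the ranking two subjects at a time.
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # chance decides who in the pair gets the drug
        drug.append(pair[0])
        placebo.append(pair[1])
    if len(ranked) % 2 == 1:
        # An unpaired last subject is assigned by a coin flip.
        (drug if rng.random() < 0.5 else placebo).append(ranked[-1])
    return drug, placebo

# Hypothetical baseline reaction times in milliseconds.
baseline = {"s1": 310, "s2": 295, "s3": 402, "s4": 350, "s5": 288, "s6": 377}
drug_group, placebo_group = matched_pair_assignment(baseline, seed=1)
print("drug:", drug_group)
print("placebo:", placebo_group)
```

The groups end up matched on baseline performance, while the coin flip within each pair still protects against confounds nobody thought to measure.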
If you're not in full control of the independent variable, your conclusions may be limited. For example, suppose you find your drug improves reaction-time performance in people whose (pre-drug or baseline) performance was bad, but not in people whose baseline performance was good. You might conclude that your drug improves performance up to some sort of ceiling. However, suppose that all your good performers were women and all the bad performers were men. In that case, you can't distinguish a performance-dependent effect from a sex-dependent effect.

So far, we've been talking about between-subjects designs, in which you do one thing to some subjects (e.g. giving them drug) and another to others (e.g. giving them placebo). A very powerful method that you might consider is to use a within-subjects design, in which every person gets tested on drug and on placebo, at separate times. The two types of design require different statistical analysis, which we'll discuss later; basically, in a within-subjects design, two measurements from the same person are related/similar in a way that two measurements from two different people aren't, and you have to take account of that. Within-subjects designs are very powerful, but they do have some problems to do with time: order and practice effects. If everybody does your task on placebo first and then on drug, and they get better, the effect might be due to practice rather than the drug. There are other kinds of effects that can arise if everyone experiences treatments in a particular order. You must design your experiment to avoid such potential confounds.

1.3 Plotting data
The first thing we should do before analysing any set of data is to look at it. For this, it's helpful to have some kind of graphical way of representing it. Here are a few.

Histograms and grouped histograms
Data set 1
Here we have a large list of measurements of something (it doesn't matter what), but we don't get much sense of the distribution. A histogram plots the frequency with
Left: Frequency histogram. The x axis (abscissa) shows values or categories; the y axis (ordinate) shows the frequency with which an observation fell into the appropriate category. This histogram looks rather noisy because there are too many categories. Right: Histogram with data grouped in more sensible categories. The same data as on the left. Each category (on the x axis) represents an interval. In this example, the value printed on the x axis is the midpoint of the interval; thus, 45 denotes those values falling into the range 42.5 to 47.5 (this is just done to save a bit of space). Choose your own interval size to make the histogram look sensible; $\sqrt{n}$ categories is often a good choice when there are n observations. If you ever choose to make the intervals not all equal in width (you might call this asking for trouble), you should make the area of each bar proportional to the number of observations, rather than the height.
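As a rough illustration of grouping data into equal-width intervals, the sketch below bins a list of values into about $\sqrt{n}$ categories and labels each category by its midpoint, as in the caption above. The function name `grouped_frequencies` and the data values are hypothetical; they are not the handout's Data set 1.

```python
import math
from collections import Counter

def grouped_frequencies(values, n_bins=None):
    """Return (interval midpoint, frequency) pairs for equal-width bins."""
    if n_bins is None:
        # Rule of thumb: about sqrt(n) categories for n observations.
        n_bins = max(1, round(math.sqrt(len(values))))
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width if all values are equal
    # The largest value is placed in the last bin rather than a bin of its own.
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
    return [(lo + (k + 0.5) * width, counts[k]) for k in range(n_bins)]

# Hypothetical measurements (16 observations, so 4 categories).
data = [41, 43, 44, 46, 47, 47, 48, 50, 51, 52, 52, 53, 55, 57, 58, 61]
for midpoint, freq in grouped_frequencies(data):
    print(f"{midpoint:5.1f} | {'#' * freq}")
```

Passing a different `n_bins` shows how too many categories makes the picture noisy and too few hides the shape.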
This was easy to find, because we had an odd number of observations. If we had an even number of observations then we'd add up the two closest to the middle and divide by two. For example, if the two middle values are 17 and 18, the median is (17 + 18) ÷ 2 = 17.5.
Why use the median? Like the mode, it isn't affected by extreme scores (outliers). However, it is also less amenable to mathematical analysis than the mean.

The mean is most people's idea of the "average". For a sample with n observations $x_1, x_2, \ldots, x_n$, the sample mean of X is written $\bar{x}$ and calculated as follows:
$$\bar{x} = \frac{\sum x}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
(The two notations are simply different ways of saying "sum all of the observations and divide by the number of observations".) The mean of data set 2 above is 16.4. The population mean is written $\mu$ (but we don't normally measure this directly, as discussed earlier). The mean of a given sample may not match the population mean (measure ten tuna fish: is the mean of your sample identical to the mean of all the tuna in the world, or have you caught tuna that are slightly bigger/smaller than average?) but on average, if you took a lot of samples, the average of all the sample means would be the same as the population mean. We say the sample mean is a good estimator of the population mean (in fact, it's the best estimator).

The mean has certain disadvantages. It is influenced strongly by extreme values (try changing just one datum to 10,000 in the data set above and recalculating the mean). There may well be no individual datum whose value is the same as the mean. Interpreting it requires some justification that the underlying data are being measured on an interval scale. However, it is eminently amenable to mathematical analysis and has certain other properties which make it the most widely-used measure of central tendency; for example, it includes information from every observation.
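To see how differently the median and the mean react to an outlier, here is a small sketch using Python's standard `statistics` module. The numbers are made up for illustration and are not the handout's data set 2.

```python
from statistics import mean, median

# Hypothetical observations (odd count, so the median is the middle value).
data = [12, 14, 15, 17, 18, 19, 21]
print(median(data), mean(data))

# Replace the largest datum with an extreme value: the median does not move,
# but the mean is dragged far away from every "typical" observation.
with_outlier = data[:-1] + [10000]
print(median(with_outlier), mean(with_outlier))

# With an even number of observations, the median is the average of the
# two middle values (here 17 and 18).
even = data + [22]
print(median(even))
```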
1.5 Measures of dispersion (variability)
Knowing a measure of central tendency doesn't tell us all we need to know about a set of data. Two data sets can have the same mean but very different variability; for example, {9, 10, 11} and {5, 10, 15} both have a mean of 10. It's often very important to have a measure of variability; there are several.

Range
This is simply the distance from the lowest to the highest point. The range of {9, 10, 11} is 2; the range of {5, 10, 15} is 10. The range is simple, but is easily distorted by extreme values.

Interquartile range
We talked about this when considering boxplots. It is the range of the middle 50% of observations; it is the distance between the first and third quartiles (the 25th and 75th percentiles). This is not distorted by extreme values; in fact, it may not pay enough attention to values at the edge of a distribution!

The average deviation (zero, and therefore useless)
We could measure how much each observation, $x_i$, deviates from the mean, $\bar{x}$, and take the average of these deviations. However, the positive and negative deviations from the mean always cancel out, so the average deviation is zero.

The mean absolute deviation (nobody uses it)
One stage further: we take the deviation from the mean for each observation, and take its absolute value (dropping any minus sign), i.e. $|x_i - \bar{x}|$. We then take the mean of these values:
$$\text{m.a.d.} = \frac{\sum |x_i - \bar{x}|}{n}$$
Though this one makes some sense, nobody uses it. Instead, they use the variance, the standard deviation, and the standard error of the mean. We'll cover the last of these when we look at difference tests, but we'll consider the other two here.

The variance (IMPORTANT)
The population variance, $\sigma^2$, is worked out as follows. Take each deviation from the mean; square it (this eliminates negative values); sum all these together; divide by n, the number of observations (this gives the average squared deviation per observation).
$$\sigma_X^2 = \frac{\sum (x_i - \mu)^2}{n}$$
However, since we rarely measure whole populations, we rarely use the population variance. Instead, we usually measure samples of the population (and therefore estimate the population variance from a sample variance). The sample variance, $s^2$, is just the same except that deviations are taken from the sample mean $\bar{x}$ and we divide by n − 1, not n. The formula on the far right is one that's mathematically identical but a bit easier to use in practice.
$$s_X^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n - 1}$$
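The standard deviation (SD) (IMPORTANT)
The standard deviation (SD) is the square root of the variance (i.e. loosely, it's the average deviation from the mean). So the population standard deviation, $\sigma$, is
$$\sigma_X = \sqrt{\sigma_X^2} = \sqrt{\frac{\sum (x_i - \mu)^2}{n}}$$
and the sample standard deviation, s, is
$$s_X = \sqrt{s_X^2} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n - 1}}$$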
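The two forms of the sample variance can be checked against each other numerically. The sketch below is illustrative only: the helper names are hypothetical, and the data are the two small sets used earlier in this section, {9, 10, 11} and {5, 10, 15}.

```python
import math
from statistics import pvariance, stdev, variance

def sample_variance_definitional(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

def sample_variance_shortcut(xs):
    """Equivalent form: (sum of x^2 minus (sum of x)^2 / n) divided by n - 1."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

a = [9, 10, 11]
b = [5, 10, 15]
for xs in (a, b):
    s2 = sample_variance_definitional(xs)
    # Both forms, and the standard library, should agree.
    print(xs, s2, sample_variance_shortcut(xs), variance(xs), math.sqrt(s2))

# Dividing by n instead of n - 1 gives the (smaller) population variance.
print(pvariance(a), variance(a))
```

Both data sets have a mean of 10, but the second has a much larger variance and standard deviation, which is exactly the difference the mean alone cannot show.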
If the data are normally distributed (see below), 68% of observations fall within one SD of the mean, and 95% of cases fall within 2 SD. For example, if the age of a group of subjects is normally distributed, and the mean age is 45 with a standard deviation of 10, then 95% of the cases would be between 25 and 65. Some calculators refer to the population SD as $\sigma_n$ and the sample SD as $\sigma_{n-1}$.

The coefficient of variation (CV) (not often used)
The coefficient of variation is the standard deviation divided by the mean:
$$CV = \frac{s_X}{\bar{x}}$$
The standard deviation often increases with the mean. For example, if you rate something on a scale with a range of 0 to 10 (perhaps with a mean of 5) then the (population) SD can't be bigger than 5. If your scale was 0 to 100, with a mean of 50, your SD could be as high as 50. By dividing the SD by the mean, the CV becomes independent of this sort of thing.

Discrete random variables, treated formally (A-Level Further Maths.)
A random variable (RV) is a measurable or countable quantity that can take any of a range of values and which has a probability distribution associated with it, i.e. there is a means of giving the probability of the variable taking a particular value. If the values an RV can take are real numbers (i.e. an infinite number of possibilities) then the RV is said to be continuous; otherwise it is discrete. The probability that a discrete RV X has the value x is denoted P(x). We can then define the mean or expected value:
$$E[X] = \sum x P(x)$$
and the variance:
$$\mathrm{Var}[X] = E\left[(x - E[X])^2\right] = \sum (x - E[X])^2 P(x) = \sum x^2 P(x) - (E[X])^2 = E[X^2] - (E[X])^2$$
and the standard deviation, $\sigma$, where
$$\sigma^2 = \mathrm{Var}[X]$$
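As a concrete instance of these definitions, the sketch below computes E[X] and Var[X] for a fair six-sided die, using exact fractions so that the two forms of the variance can be compared without rounding error. The die is an assumed example, not one from the handout, and the function names are hypothetical.

```python
from fractions import Fraction

def expectation(dist):
    """E[X] = sum of x * P(x) over all values x."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var[X] = sum of (x - E[X])^2 * P(x)."""
    mu = expectation(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def variance_shortcut(dist):
    """Var[X] = E[X^2] - (E[X])^2."""
    return sum(x * x * p for x, p in dist.items()) - expectation(dist) ** 2

# A fair die: each face 1..6 has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expectation(die), variance(die), variance_shortcut(die))
```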
In our example, z = (5.5 − 4.25)/0.383 = 3.26. We have converted our potassium level of 5.5 mM to a Z score of 3.26. We can then use our tables of the standard normal distribution (you've got a copy) to find out how likely a Z score of 3.26 (or higher) is to have come from the standard normal distribution; this is answering the same question as "how likely is a potassium level of 5.5 mM to have come from the distribution of plasma potassium in healthy people?" Our tables tell us that we want the probability that Z ≥ 3.26, and that's 1 minus the probability that Z < 3.26, which is 0.9994; so the answer to our question is 1 − 0.9994 = 0.0006. In other words, it's highly unlikely that a plasma potassium of 5.5 mM would be found in a healthy population. Our patient's probably not healthy; better watch it, because if the potassium level goes too high, he'll have a cardiac arrest.

Z scores carry information on their own, because you automatically know what the mean and standard deviation are (0 and 1, respectively). Significant values of Z scores are extreme (big positive numbers or big negative numbers). Non-significant values of Z scores are close to zero. Sometimes, information is presented in a normalized form. For example, IQ scores are transformed to a distribution with a mean of 100 and an SD of 15; knowing this, you can work out what proportion of the population have an IQ over 120.

(2) Assumptions of statistical tests
Second, many statistical tests assume that the data being tested are normally distributed. We will return to this point later.

(3) Confidence intervals
Third, we can work out confidence intervals on any measurement we make. We saw an example above: we said that 95% of healthy people have a potassium concentration in the range 3.5 to 5.0 mM. That is the same as saying the 95% confidence interval (CI) for the healthy-person data is 3.5 to 5.0 mM. For any given set of data X, we can work out 95% confidence intervals as follows:
1. Calculate the mean, $\mu$, and standard deviation, $\sigma$.
2. The Z scores that enclose 95% of the population are −1.96 and +1.96. Why? Well, our tables tell us that the area (probability) under the Z curve to the left of z = −1.96, written $\Phi(-1.96)$, is 0.025. Similarly, they tell us that $\Phi(+1.96) = 0.975$. Therefore the area under the normal curve between z = −1.96 and z = +1.96 is $\Phi(+1.96) - \Phi(-1.96) = 0.95$.
3. $Z = (X - \mu)/\sigma$, therefore $X = \mu + Z\sigma$. Therefore the X scores corresponding to Z scores of ±1.96 are $\mu \pm 1.96\sigma$, the 95% confidence intervals.
For our potassium example, we had a mean of 4.25 and an SD of 0.383; therefore, our 95% confidence intervals are 4.25 − (1.96 × 0.383) and 4.25 + (1.96 × 0.383), or 3.5 and 5.0. Try working out the 95% confidence intervals for IQ scores.
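The table look-ups in the potassium example can be reproduced with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch for checking the arithmetic, not a replacement for learning to use the tables; the variable names are arbitrary.

```python
from statistics import NormalDist

mu, sigma = 4.25, 0.383      # healthy-population potassium (mM), from the text
std_normal = NormalDist()    # the standard normal distribution, N(0, 1)

# Z score for a potassium level of 5.5 mM, and the upper-tail probability.
z = (5.5 - mu) / sigma
p_upper = 1 - std_normal.cdf(z)
print(f"z = {z:.2f}, P(Z >= z) = {p_upper:.4f}")

# The Z value with 97.5% of the area to its left (about 1.96),
# and the resulting 95% limits mu - z*sigma and mu + z*sigma.
z95 = std_normal.inv_cdf(0.975)
lower, upper = mu - z95 * sigma, mu + z95 * sigma
print(f"95% limits: {lower:.1f} to {upper:.1f} mM")

# Normalized scores: proportion of the population with IQ over 120
# when IQ is distributed with mean 100 and SD 15.
p_iq_over_120 = 1 - NormalDist(100, 15).cdf(120)
print(f"P(IQ > 120) = {p_iq_over_120:.3f}")
```

The tail probability computed from the unrounded Z score differs slightly from the table value because the tables are entered at z rounded to 3.26.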
Deviations from normality
Not everything you measure will be normally distributed. Here's a normal distribution and some non-normal distributions:

[Figures illustrating bimodality and skew.]

Continuous random variables; probability density functions (A-Level Further Maths.)
For a continuous random variable X, the probability of an exact value x occurring is zero, so we must work with the probability density function (p.d.f.), f(x). This is defined as
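$$P(a \le x \le b) = \int_a^b f(x)\,dx$$
where
$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$
and
$$\forall x: f(x) \ge 0$$
($\forall x$ means "for all values of x"). The mean or expected value E[X] is defined as
$$E[X] = \int_{-\infty}^{\infty} x f(x)\,dx$$
The variance, Var[X], is given by
$$\mathrm{Var}[X] = \int_{-\infty}^{\infty} x^2 f(x)\,dx - (E[X])^2$$
The cumulative distribution function (c.d.f.), F(a), is given by
$$F(a) = \int_{-\infty}^{a} f(x)\,dx$$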
i.e.
$$F(a) = P(x \le a)$$
$$P(a \le x \le b) = F(b) - F(a)$$

Definition of a normal distribution
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
This distribution is often abbreviated to $N(\mu, \sigma^2)$.

The standard normal distribution
The standard normal distribution is N(0, 1), i.e. a normal distribution in which $\mu = 0$ and $\sigma = \sigma^2 = 1$. A standard normal random variable is frequently referred to as Z. The p.d.f. is frequently referred to as $\phi(z)$, and the c.d.f. as $\Phi(z)$. So
$$\phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$$
$$\Phi(z) = \int_{-\infty}^{z} \phi(t)\,dt$$

Transforming any normal distribution to the standard normal distribution
As we've seen, if X is a normally-distributed random variable with mean $\mu$ and standard deviation $\sigma$, and Z is a standard normal random variable, then:
$$z = \frac{x - \mu}{\sigma}$$
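The normal p.d.f. and the standard normal c.d.f. can be written out directly and compared with the standard library. The sketch below assumes the usual closed form of the c.d.f. in terms of the error function, $\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$, which is a standard identity but is not stated in the handout; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def standard_normal_cdf(z):
    """Phi(z), written with the error function instead of a table look-up."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma = 4.25, 0.383          # the potassium example again
x = 5.0
z = (x - mu) / sigma             # transform to the standard normal

# The hand-written density agrees with the library's.
print(normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))

# P(X <= x) for the original distribution equals Phi(z).
print(standard_normal_cdf(z), NormalDist(mu, sigma).cdf(x))

# The area between z = -1.96 and z = +1.96 is about 0.95.
print(standard_normal_cdf(1.96) - standard_normal_cdf(-1.96))
```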
1.7 Probability
How much probability do you have to know? Not very much. You need to know what a probability is, what P(A) and P(¬A) mean, and preferably what P(B|A) means. If you're not keen on probability, you can skip the rest of this section and move on to the logic of null hypothesis testing. If you're a bit more capable mathematically, you may like to read this section: probability is at the heart of statistical testing and you'll be streaks ahead of many researchers if you have a solid grasp of probabilistic reasoning.

Basic notation in probability
$P(A)$: probability of an event A.
$P(\neg A)$: probability of the event "not-A", the opposite of A. "Not-A" is variously written as $\neg A$, ~A or $\bar{A}$.
$P(A \cup B)$: probability of A or B (or both) happening (the notation is like set union: $\cup$).
$P(A \cap B)$: probability of A and B both happening (the notation is like set intersection: $\cap$).
$P(B \mid A)$: probability of B, given that A has already happened.

Basic laws of probability
If P(A) = 0, then A will never happen; if P(A) = 1, then A is certain to happen. Probabilities are always in this range:
[1] $0 \le P(A) \le 1$
Pick a card; there are 52 equally-likely outcomes; 13 are clubs, so P(♣) = 13/52:
[2] $P(A) = \dfrac{\text{number of ways in which A occurs}}{\text{number of ways in which all equally likely events, including A, occur}}$
Either A happens or ¬A happens (I flip a coin, it either comes up heads or tails):
$P(A) + P(\neg A) = 1$
[3] $P(\neg A) = 1 - P(A)$

Odds
Odds are another way of expressing probability: they're the ratio of P(A) to P(¬A). For example, Tiger Woods might be the favourite to win a tournament at odds of 9:5, often stated "9 to 5 on" (= 9/5 = 1.8). This means that for every 14 times he plays the tournament, he'd be expected to win 9 times and lose 5. If the event that Tiger Woods wins is A and his odds are x, we can write
or to write that in a shorter form:
$$P(A) = \sum_{j=1}^{k} P(B_j)\,P(A \mid B_j)$$
Therefore, from [9],
$$P(B_i \mid A) = \frac{P(B_i)\,P(A \mid B_i)}{\sum_{j=1}^{k} P(B_j)\,P(A \mid B_j)}$$
So suppose there are three assembly lines; lines X, Y and Z account for 50%, 30% and 20% of the total output. Quality control records show that line X produces 0.4% faulty cans, Y produces 0.6% faulty cans, and Z produces 1.2% faulty cans. Using Bayes' theorem in the form of [10] will tell us that the chance our faulty can comes from line X is 0.32 (similarly, 0.29 for line Y and 0.39 for line Z).
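The assembly-line figures can be verified with a short calculation. The sketch below applies Bayes' theorem over several mutually exclusive alternatives; the function name `posterior` and the dictionary layout are illustrative choices, while the percentages are those given in the text.

```python
def posterior(priors, likelihoods):
    """P(B_i | A) for each alternative B_i, given P(B_i) and P(A | B_i)."""
    joint = {k: priors[k] * likelihoods[k] for k in priors}
    total = sum(joint.values())          # P(A), by the law of total probability
    return {k: v / total for k, v in joint.items()}

share = {"X": 0.50, "Y": 0.30, "Z": 0.20}          # P(line)
fault_rate = {"X": 0.004, "Y": 0.006, "Z": 0.012}  # P(faulty | line)

p_line_given_faulty = posterior(share, fault_rate)
for line, p in p_line_given_faulty.items():
    print(f"P(line {line} | faulty) = {p:.2f}")
```

Line Z makes only a fifth of the cans but is the most likely source of a faulty one, because its fault rate is three times that of line X.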
Let's take a simple, fictional example in which only two things may happen.
Q. The prevalence of a disease in the general population is 0.005 (0.5%). You have a blood test that detects the disease in 99% of cases: P(positive | disease) = 0.99. However, it also has a false-positive rate of 5%: P(positive | no disease) = 0.05. A patient of yours tests positive. What is the probability he has the disease?
A. We'd like to find P(disease | positive). By [9],
$$\begin{aligned}
P(\text{dis} \mid \text{pos}) &= \frac{P(\text{dis})\,P(\text{pos} \mid \text{dis})}{P(\text{pos})} \\
&= \frac{P(\text{dis})\,P(\text{pos} \mid \text{dis})}{P(\text{dis})\,P(\text{pos} \mid \text{dis}) + P(\neg\text{dis})\,P(\text{pos} \mid \neg\text{dis})} \\
&= \frac{0.005 \times 0.99}{0.005 \times 0.99 + 0.995 \times 0.05} \\
&= 0.09
\end{aligned}$$
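The same calculation as a small function, so that the effect of the base rate can be explored. The function and parameter names are hypothetical; the first call uses the numbers from the question above, and the second uses an assumed, much higher prevalence purely for comparison.

```python
def p_disease_given_positive(prevalence, sensitivity, false_positive_rate):
    """P(disease | positive) by Bayes' theorem for a two-outcome world."""
    true_pos = prevalence * sensitivity                  # P(dis) P(pos | dis)
    false_pos = (1 - prevalence) * false_positive_rate   # P(not dis) P(pos | not dis)
    return true_pos / (true_pos + false_pos)

# The example in the text: rare disease, good test.
p_rare = p_disease_given_positive(0.005, 0.99, 0.05)
print(f"prevalence 0.5%: {p_rare:.2f}")

# Hypothetical comparison: the same test in a group where 20% have the disease.
p_common = p_disease_given_positive(0.20, 0.99, 0.05)
print(f"prevalence 20%:  {p_common:.2f}")
```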
So even though our test is pretty good and has a 99% true positive rate or sensitivity (a 1% false negative rate) and a 5% false positive rate (a 95% true negative rate or specificity), our positive-testing patient still only has a 9% chance of having the disease, because it's rare in the first place.

Bayesian inference
Suppose we have a hypothesis H. Initially, we believe it to be true with probability P(H); we therefore believe it to be false with probability P(¬H). We conduct an experiment that produces data D. We knew how likely D was to arise if H was true, P(D|H), and we knew how likely D was to arise if H was false, P(D|¬H). We can therefore use Bayes' theorem [9] to update our view of the probability of H:
$$P(H \mid D) = \frac{P(H)\,P(D \mid H)}{P(D)}$$
$$P(H \mid D) = \frac{P(H)\,P(D \mid H)}{P(H)\,P(D \mid H) + P(\neg H)\,P(D \mid \neg H)}$$
This can be expressed another way (Abelson, 1995, p. 42):
[11]
$$\frac{P(H \mid D)}{P(\neg H \mid D)} = \frac{P(H)}{P(\neg H)} \times \frac{P(D \mid H)}{P(D \mid \neg H)}$$
or
posterior odds = prior odds × relative likelihood
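To illustrate the odds form, the sketch below reuses the numbers from the disease-testing example (H = "has the disease", D = "tests positive") and checks that updating the odds gives the same answer as the probability form. The function names are hypothetical.

```python
import math

def posterior_odds(prior_prob, p_d_given_h, p_d_given_not_h):
    """posterior odds = prior odds x relative likelihood."""
    prior_odds = prior_prob / (1 - prior_prob)
    relative_likelihood = p_d_given_h / p_d_given_not_h
    return prior_odds * relative_likelihood

def odds_to_probability(odds):
    """Convert odds in favour of H back to a probability."""
    return odds / (1 + odds)

odds = posterior_odds(0.005, 0.99, 0.05)
p_from_odds = odds_to_probability(odds)

# The probability form of Bayes' theorem, for comparison.
p_direct = (0.005 * 0.99) / (0.005 * 0.99 + 0.995 * 0.05)

print(f"posterior odds = {odds:.4f}, P(H | D) = {p_from_odds:.4f}")
```

The odds form makes the roles of the two ingredients explicit: the data multiply whatever odds you started with by a fixed factor, here 0.99/0.05.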
1.8 The logic of null hypothesis testing; interpreting p values
We will come across a range of statistical tests. Most produce a test statistic and an associated p value; you will see these quoted in scientific journals time and time again ($F_{2,47}$ = 10.7, p < .001; $F_{3,18}$ = 4.52, p = .016; $t_{60}$ = 1.96, p = .055). They all work on the same principle: that of null hypothesis testing.

Null hypothesis testing approaches the questions we want to ask backwards. We typically obtain some data. Let's say we measure the weight of a hundred 18-year-old women who are either joggers (50) or non-joggers (50). We would like to know whether the mean weights of these two groups differ. Obviously, it's highly unlikely that the means will be exactly the same. Suppose the joggers are slightly lighter on average. How big a difference counts as "significantly different"?

The conventional logic is as follows. Either the difference arises through chance, or there is some systematic difference (such as that jogging makes you thin, or that being thin encourages you to take up jogging). Our research hypothesis (sometimes written $H_1$) is that the joggers are different from the non-joggers (that our two samples come from different underlying populations). We'll invent a corresponding null hypothesis (sometimes written $H_0$) that the observed differences arise purely through chance. We'll then test the likelihood that our data could have been obtained if this null hypothesis were true. If this probability (the so-called p value) is very low, we will reject the null hypothesis: chance processes don't appear to be a sufficient explanation for our data, so something systematic must be going on; we'll say that there is a significant difference between our two groups. If the p value isn't low enough, we will retain the null hypothesis (applying Occam's razor, because the null hypothesis is the simplest on offer) and say that the groups do not differ significantly.
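One way to make this logic concrete is a permutation test, sketched below. This is not the method the course goes on to use (that is the t test and its relatives); it is swapped in here only because it implements "how often would chance alone produce a difference this big?" almost literally. The weights are invented for illustration and the function name is hypothetical.

```python
import random

def permutation_p_value(group_a, group_b, n_perm=10000, seed=0):
    """Estimate P(difference in means at least as large as observed | null).

    Under the null hypothesis the group labels are arbitrary, so we shuffle
    them repeatedly and count how often the shuffled difference in means is
    at least as extreme as the one we actually observed.
    """
    rng = random.Random(seed)
    n_a = len(group_a)

    def mean_diff(values):
        a, b = values[:n_a], values[n_a:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    pooled = list(group_a) + list(group_b)
    observed = mean_diff(pooled)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean_diff(pooled) >= observed:
            hits += 1
    return hits / n_perm

# Hypothetical weights in kg, invented for this sketch.
joggers = [52, 55, 57, 58, 60, 61, 63, 64]
non_joggers = [56, 59, 61, 63, 64, 66, 68, 71]
p = permutation_p_value(joggers, non_joggers)
print(f"p = {p:.3f}")
```

A small p means that random relabelling rarely reproduces a difference as large as the observed one, so chance alone looks like a poor explanation; a large p means the null hypothesis is retained.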
The exact meaning of a p value

Let's say we run a statistical test to examine whether these two groups differ. It produces a test statistic (such as a t value; we'll consider how this works later) and a p value, let's say 0.01. What does this mean? For shorthand, let's call D the event of obtaining a set of data, H the research hypothesis, and ¬H the null hypothesis.

Correct: If the null hypothesis were true [if it were true that there were no systematic difference between the means in the populations from which the samples came], the probability that the observed means would have been as different as they were, or more different, is 0.01. This being strong grounds for doubting the viability of the null hypothesis, the null hypothesis is rejected.
Correct: P(D | ¬H) = 0.01.
Wrong: The probability that the null hypothesis is true is 0.01.
Wrong: The probability that the research hypothesis is false is 0.01.
Wrong: P(¬H | D) = 0.01.
Wrong: The probability that the null hypothesis is false is 0.99.
Wrong: The probability that the research hypothesis is true is 0.99.
Wrong: P(H | D) = 0.99.
It's easy to think that these are all saying the same thing, but they're not. Compare (1) the probability of testing positive for a very rare disease if you have it, P(positive | diseased), with (2) the probability of having it if you test positive for it, P(diseased | positive). If you think the two should be the same, you're neglecting the base rates of the disease: typically, the second probability is less than the first, as it's very unlikely for anybody to have a very rare disease, even those who test positive. Doctors intuitively get this wrong all the time. Substitute in P(rich | won the lottery) and P(won the lottery | rich): the first probability is much higher, because winning the lottery is so rare.

Bayes' theorem and Bayesian statistics

The formal way to relate what we get from significance tests, P(data | hypothesis), to what we really want, P(hypothesis | data), is by using Bayes' theorem (see section on probability). This is perhaps the simplest expression to use in this case:
posterior odds = prior odds × relative likelihood.

For example, suppose that a climatologist calculates that a 1°C rise in temperature one summer had a probability of 0.01 of occurring by chance (p = 0.01). What does that tell us? It does not tell us that there's a 99% probability that it was due to the greenhouse effect. It does not even tell us that there's a 99% probability that it was not due to chance. The Bayesian approach would be this: suppose that reasonable people believed the odds were 2:1 in favour of the greenhouse hypothesis (H) before this new evidence was collected; these are the prior odds. Now, we've been told that P(D | ¬H) = 0.01. We need to know the probability that a 1°C temperature rise would occur if the greenhouse hypothesis were true; that is, P(D | H). Suppose this is 0.03. Then the relative likelihood is 0.03/0.01 = 3. So the posterior odds are 2 × 3 = 6 in favour of the greenhouse hypothesis; odds of 6:1 equate to P(H | D) = 6/7 = 0.86.

Type I and Type II error; power

Although p values speak for themselves in one sense, it's very common for researchers to use them as a yes/no decision-making device. I won't debate the wisdom of this now, but this is how it works. A threshold probability, usually called α (alpha), is chosen; typically, α = 0.05. If a given p value is less than α, the null hypothesis is rejected; if p ≥ α, the null hypothesis is retained. You might see this logic described in papers like this: "the two groups were significantly different (p < 0.05)", or "a significance level of α = 0.05 was adopted throughout our study; the two groups were significantly different". Obviously, if α = 0.05 and the null hypothesis is true, then there is a 0.05 (one in twenty) chance that we will label as significant an effect that arose by chance. If this happens, and we decide that an effect was not attributable to chance when actually it did arise by chance, we're said to have made a Type I error. The probability of making a Type I error is α.
Conversely, the probability of correctly not rejecting the null hypothesis when it is true is 1 − α. The opposite mistake is failing to reject the null hypothesis when it is false; that is, ascribing your data to chance when it actually arose from a systematic effect. This is called a Type II error; its probability is labelled β (beta). Conversely, the probability of correctly rejecting the null hypothesis when it is in fact false is 1 − β; this is called the power of the test. If your power is 0.8, it means that you will detect genuine effects with p = 0.8.

| Decision | True state of the world: H0 true | True state of the world: H0 false |
| Reject H0 | Type I error, p = α | Correct decision, p = 1 − β = power |
| Do not reject H0 | Correct decision, p = 1 − α | Type II error, p = β |
One-tailed and two-tailed tests

There's one other thing we should consider when we talk about α and Type I error. Let's go back to the example of our joggers. Presumably our leading hypothesis is that joggers will be thinner than non-joggers, so we want to be able to detect if the mean weight of joggers is less than that of non-joggers, and we might choose α = 0.05. But what will we do if the joggers actually weigh more? Well, this depends on what kind of test we decided on. If we were only interested in the difference between the groups if the joggers weighed less, we would use a one-tailed (directional) test, so that if there was less than a 5% probability that chance alone could have produced a difference in the direction we expect then we would reject the null hypothesis. But if we want to be able to detect a difference in either direction, we must use a two-tailed (nondirectional) test. In that case, we must allocate our 5% to the two ways in which we could find a difference (joggers weigh more; joggers weigh less), so we'd allocate 2.5% to each tail of the distribution. This is shown in the figure below (plotted on a normal distribution; you might like to think of it in terms of the joggers and the potassium examples).

In general, unless you would genuinely not be interested in both possible outcomes (quite a rare situation), you should use a two-tailed test. What you must not do is to run a one-tailed test (α = 0.05), find a non-significant result, then look at the data, realize the difference is in the other direction to the one you predicted, and decide then to do a two-tailed test (α = 0.05). What you have actually done is to allocate 5% to one tail, then allocate another 2.5% to the other tail, meaning that you have actually run a sort of asymmetric two-tailed test with a total α of 0.075 (7.5%). Decide what test you want in advance of analysing the data.
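To make the allocation of α to the tails concrete, here is a minimal sketch using the standard normal distribution from Python's `statistics` module. The helper name `critical_z` is invented for this illustration, and the normal distribution is only a stand-in for whatever test statistic is actually in use.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def critical_z(alpha, two_tailed=True):
    """Critical z value when alpha is split over two tails or put in one."""
    tail_area = alpha / 2 if two_tailed else alpha
    return standard_normal.inv_cdf(1 - tail_area)


one_tailed = critical_z(0.05, two_tailed=False)
two_tailed = critical_z(0.05, two_tailed=True)

# Switching from one-tailed to two-tailed after seeing the data:
# 5% sits in the predicted tail and a further 2.5% in the other tail.
total_alpha = (1 - standard_normal.cdf(one_tailed)) + standard_normal.cdf(-two_tailed)

print(f"one-tailed critical z = {one_tailed:.3f}")
print(f"two-tailed critical z = {two_tailed:.3f}")
print(f"total alpha after switching = {total_alpha:.3f}")
```

The two-tailed threshold is further from zero than the one-tailed one, which is why a result can pass a one-tailed test and fail the two-tailed test at the same nominal α.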
Figure: One-tailed and two-tailed tests.

The danger of running multiple significance tests

Every time you run a test, if the null hypothesis is true, you run the risk of making a Type I error with probability α. So if you run n tests, you have n chances to make a Type I error. What's the probability that you don't make any Type I errors when you run n tests? Well, the probability that you don't make a Type I error on each test is 1 − α, so the probability you make no Type I errors when you run n tests is $(1 - \alpha)^n$. So the probability that you make at least one Type I error when you run n tests when the null hypothesis is true is $1 - (1 - \alpha)^n$. If you set α = 0.05, you must expect on average one in every 20 tests to come up significant when it isn't (Type I error) if the null hypothesis is in fact true. If you run 20 tests and the null hypothesis is true, the probability of making at least one Type I error is $1 - (1 - 0.05)^{20} = 0.64$. This is why running lots of tests willy-nilly is a Bad Idea: eventually, something will turn up significant, but that doesn't mean it really is.

This doesn't mean that 5% of all your significant results are wrong. You can only make Type I errors when the null hypothesis is true! In practice, on some occasions the null hypothesis will be false, so we can't make a Type I error. Therefore, something less than 5% of the tests we run will produce Type I errors; α is the maximum Type I error rate.

Is there a difference between p = 0.04 and p = 0.0001?

Yes. Whether you look on p values as expressing the degree of confidence with which you reject the null hypothesis, or as information you can use to update your opinions of the world in Bayesian fashion, p values have real meaning. Some people will argue that as long as p < α you needn't report the actual p value, but this approach takes information away from the reader.

p = 0.06
What happens if you run a well-designed experiment in which you give a treatment to one group of people and not another, measure some aspect of their performance, test for a difference between your groups and get p = 0.06? You could do one of several things. (1) Re-run your experiment with more subjects; perhaps you did not have enough statistical power to detect the size of effect that your treatment produced. You might have been spared this embarrassment if you had tried to calculate your statistical power in advance; you might then have realised your experiment was under-powered in the first place. (2) Report your experiment as showing a trend towards an effect; it's not like p = 0.04 is somehow magically better than p = 0.06, after all. (3) Use α = 0.1 rather than α = 0.05. However, not only will journal editors definitely be upset with this (for no real reason; there's nothing magical about α = 0.05), but it is highly dubious to change your α only after you've run your experiment. After all, you're only doing it to shore up a not-quite-significant result, and you're therefore distorting the results. You should have chosen α in advance. Similarly, it is very dubious to add subjects to your original experiment until it reaches significance: you're only doing this because your original data was near significance and you want it to be significant. If you had a compelling reason to want your treatment to have no effect, you wouldn't be doing this, so you're biasing the experiment by this kind of post-hoc fiddling.

What does "not significant" mean?

What happens when you want to prove that a hypothesis is not true? Suppose your contention is that jogging doesn't affect body weight; you take two identical groups of people, set half of them jogging for a couple of months while the rest eat pies, and measure their weights. You find no difference between the groups (p = 0.12). What does this mean?
It means that you have failed to reject the null hypothesis: there is a fair chance (0.12) that a difference at least as large as the one you observed could have arisen by chance alone. It does not mean that you have proven the null hypothesis. Take an extreme example: your null hypothesis is that all people have two arms. Just because the next 5,000 people you meet all have two arms (failure to reject the null hypothesis) does not mean that you have proved the null hypothesis. You can do two things when you fail to reject the null hypothesis: (1) view it as an inconclusive result, or (2) act as if the null hypothesis were true until further evidence comes along.

Really, you should choose your levels of α and β to meet the needs of your study. If you want to avoid Type I errors (e.g. telling someone they have an ulcer when they don't), set α low. If you want to avoid Type II errors (e.g. telling them to go home and rest when they're about to die from a gastric haemorrhage), set α higher. The other thing you can do when you're designing an experiment is to make sure the power is high enough to detect effects with a reasonable probability, such as by using enough subjects. If you take two people and make one jog, you'll never find a significant difference between the jogging and non-jogging groups, but that doesn't mean people should believe you when you say that jogging doesn't reduce weight. If you used half a million people and still found no effect, your study might command more attention.

A statistical fallacy to avoid: A differs from C, B doesn't differ from C

If you test three groups and find that A is significantly different from C, but B is not significantly different from C, do not conclude that A is significantly different from B. To see why, imagine that A is smaller than B, and B is smaller than C.
Then we might find a difference between A and C (p = 0.04) and no difference between B and C (p = 0.06), but the p values are just on either side of our threshold of 0.05 and A and B might be nearly the same! Making this conceptual mistake is quite common. Similarly, just because A isn't significantly different from B, and B isn't significantly different from C, doesn't mean that A isn't significantly different from C.
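Returning to the danger of running multiple significance tests discussed earlier in this section, the probability of at least one Type I error can be tabulated with a short sketch. The function name is invented for illustration, and the calculation assumes the tests are independent and that the null hypothesis is true for every one of them.

```python
def familywise_error_rate(alpha, n_tests):
    """P(at least one Type I error) over n independent tests, all with H0 true."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 5, 10, 20):
    rate = familywise_error_rate(0.05, n_tests)
    print(f"{n_tests:2d} tests at alpha = 0.05: {rate:.2f}")
```

For 20 tests this reproduces the figure of 0.64 quoted in the text.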

Left: Frequency histogram. The x axis (abscissa) shows values or categories; the y axis (ordinate) shows the frequency with which an observation fell into the appropriate category. This histogram looks rather noisy because there are too many categories. Right: Histogram with data grouped in more sensible categories. The same data as on the left. Each category (on the x axis) represents an interval. In this example, the value printed on the x axis is the midpoint of the interval; thus, 45 denotes those values falling into the range 42.5–47.5 (this is just done to save a bit of space). Choose your own interval size to make the histogram look sensible; √n categories is often a good choice when there are n observations. If you ever choose to make the intervals not all equal in width (you might call this asking for trouble), you should make the area of each bar proportional to the number of observations, rather than the height.
1.4. Measures of central tendency: taking the average
Data set 10
Let's take a set of 15 numbers (above). Where's the middle or the "average"? There are several ways we might answer this question.

The mode is the value that occurs most commonly; in this case, 18. If we wanted to be formal, we could say that these data are from a variable we measured called X. We could therefore say that Mo(X) = 18. If there are two modes and they're in some sense adjacent, we might use the mean of the two, $\frac{Mo_1 + Mo_2}{2}$. If they're far apart, then the distribution is bimodal and we'd report both modes. Why use the mode? It can be applied to nominal (categorical) data. It isn't affected by extreme scores. It may be the most meaningful; if you want to buy a job-lot of shoes that are all the same size, you should buy shoes that are the modal size of the population you're going to sell them to. By definition, for an observation $x_i$ taken at random from a variable X, P($x_i$ = mode) > P($x_i$ = any other score). Why might you not use it? If your categories are not particularly meaningful, nor will be your mode. It is also less amenable to mathematical analysis than the mean.

The median is the value that's in the middle if we lined all the values up in order. (More precisely, it's the value at or below which 50% of the scores fall when the data are arranged in numerical order, as below.) Here, it's 17. This is written Med(X) = 17, or sometimes $\tilde{x} = 17$.
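As a rough sketch of how the mode and median are found in practice, the following uses Python's `statistics` module. The 15 scores here are hypothetical: the original data set did not survive in this copy, so these numbers were simply chosen to give a mode of 18 and a median of 17, as in the text.

```python
from statistics import median, multimode

# Hypothetical scores (NOT the original data set), already in numerical order.
scores = [10, 12, 13, 14, 15, 16, 17, 17, 18, 18, 18, 19, 20, 22, 25]

modes = multimode(scores)   # list of the most common value(s)
middle = median(scores)     # 8th of the 15 ordered values

print("mode(s):", modes)
print("median:", middle)
```

`multimode` returns a list, so a bimodal distribution would show up as two entries rather than one.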
Basic notation in probability
P(A): probability of an event A.
P(¬A): probability of the event not-A, the opposite of A. This is variously written as ¬A, ~A or $\bar{A}$.
P(A ∪ B): probability of A or B (or both) happening (the notation is like set union: ∪). Sometimes written P(A or B).
P(A ∩ B): probability of A and B both happening (the notation is like set intersection: ∩). Sometimes written P(A, B) or P(A and B).
P(B | A): probability of B, given that A has already happened, known as the conditional probability of B given A.
Basic laws of probability
If P(A) = 0, then A will never happen (is impossible); if P(A) = 1, then A is certain to happen. Probabilities are always in this range: 0 ≤ P(A) ≤ 1 [1]
Pick a card; there are 52 equally-likely outcomes; 13 are clubs, so P(♣) = 13/52. In general:
$$P(A) = \frac{\text{number of ways in which } A \text{ occurs}}{\text{number of ways in which all equally likely events, including } A \text{, occur}}$$
Either A happens or ¬A happens (I flip a coin, it either comes up heads or tails):
$$P(A) + P(\neg A) = 1$$
$$P(\neg A) = 1 - P(A)$$
Odds are another way of expressing probability: they're the ratio of P(A) to P(¬A). For example, Tiger Woods might be the favourite to win a tournament at odds of 9:5, often stated "9 to 5 on" (= 9/5 = 1.8). This means that for every 14 times he plays the tournament, he'd be expected to win 9 times and lose 5. If the event that Tiger Woods wins is A and his odds are x, we can write
$$\frac{P(A)}{P(\neg A)} = x$$
Therefore
$$\frac{P(A)}{1 - P(A)} = x$$
$$x - xP(A) = P(A)$$
$$x = (1 + x)P(A)$$
$$P(A) = \frac{x}{1 + x}$$
So in the case of Tiger Woods, since x = 1.8, P(A) = 0.64. In general
$$\text{probability} = \frac{\text{odds}}{1 + \text{odds}}$$
If the odds on a player were quoted as "3 to 1 against", the odds on them losing are 3:1, so the odds on them winning are 1:3 (i.e. the probability of them winning is 1/4 = 0.25).
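The conversion between odds and probability derived above can be checked with exact fractions. This is a small sketch; the function names are made up for the example.

```python
from fractions import Fraction


def probability_from_odds(odds):
    """probability = odds / (1 + odds), where odds = P(A) / P(not A)."""
    return odds / (1 + odds)


def odds_from_probability(p):
    """Inverse conversion: odds = P(A) / (1 - P(A))."""
    return p / (1 - p)


favourite = probability_from_odds(Fraction(9, 5))   # "9 to 5 on"
outsider = probability_from_odds(Fraction(1, 3))    # "3 to 1 against"

print(favourite, float(favourite))
print(outsider, float(outsider))
print(odds_from_probability(favourite))
```

Using `Fraction` keeps the arithmetic exact, so the round trip from odds to probability and back recovers the original odds.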
The rest of the basic laws of probability
If A and B are mutually exclusive events (P(A ∩ B) = 0) then
$$P(A \cup B) = P(A) + P(B)$$
In the more general case,
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
If A and B are independent events (that is, the fact that A has happened doesn't affect the likelihood that B will happen, and vice versa: P(B) = P(B | A) and P(A) = P(A | B)) then
$$P(A \cap B) = P(A) \times P(B)$$
If I toss a fair coin and roll a fair die, the probability of getting a six and a head is 1/6 × 1/2 = 1/12. The probability of getting a six or a head or both is 1/6 + 1/2 − 1/12 = 7/12. In the more general case:
$$P(A \cap B) = P(A) \times P(B \mid A)$$
If I have a bag that initially contains 4 red marbles and 6 blue marbles, and I withdraw marbles one by one, the probability of picking a red marble first (event A) and a blue marble second (event B) is 4/10 × 6/9 = 4/15.
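These laws can be checked by brute force: list every equally likely outcome and count the ones in which the event occurs. The sketch below does this for the coin-and-die and marble examples above; the helper `probability` is an illustrative name, not part of any library.

```python
from fractions import Fraction
from itertools import permutations, product


def probability(outcomes, event):
    """Count equally likely outcomes in which the event occurs."""
    outcomes = list(outcomes)
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))


# Fair coin and fair die: 2 x 6 = 12 equally likely outcomes.
coin_and_die = list(product("HT", range(1, 7)))
p_six_and_head = probability(coin_and_die, lambda o: o[0] == "H" and o[1] == 6)
p_six_or_head = probability(coin_and_die, lambda o: o[0] == "H" or o[1] == 6)

# Bag of 4 red and 6 blue marbles, two draws without replacement:
# every ordered pair of distinct marbles is equally likely.
bag = ["R"] * 4 + ["B"] * 6
draws = list(permutations(range(len(bag)), 2))
p_red_then_blue = probability(
    draws, lambda d: bag[d[0]] == "R" and bag[d[1]] == "B"
)

print(p_six_and_head, p_six_or_head, p_red_then_blue)
```

The counts agree with the multiplication and addition rules: 1/12, 7/12 and 4/15.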
2 sY = 2
and its square root, $s_{residual}$ (sometimes written $s_{Y \cdot X}$ to show that Y has been predicted from X), is called the standard error of the estimate. (It's like a standard deviation: the square root of the variance of the errors is the standard deviation of the errors, abbreviated to the standard error.) We can also express the residual variance and its square root like this:
$$s_{residual}^2 = MS_{residual} = \frac{SS_{residual}}{df_{residual}} = \frac{SS_Y (1 - r^2)}{n - 2} = \frac{s_Y^2 (n - 1)(1 - r^2)}{n - 2} = s_Y^2 (1 - r_{adj}^2)$$
$$s_{residual} = s_Y \sqrt{(1 - r^2) \frac{n - 1}{n - 2}} = s_Y \sqrt{1 - r_{adj}^2}$$
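A quick numerical check can confirm that the different expressions for the residual variance agree. The sketch below fits a least-squares line by hand; the x and y values are made up purely for illustration, and $r_{adj}^2$ is taken to be $1 - (1 - r^2)(n - 1)/(n - 2)$, as the formulas above imply.

```python
from math import isclose, sqrt

# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.3, 6.6]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)
sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

slope = sp_xy / ss_x
intercept = mean_y - slope * mean_x
r = sp_xy / sqrt(ss_x * ss_y)

# Route 1: sum of squared prediction errors over its degrees of freedom.
ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
var_residual_direct = ss_residual / (n - 2)

# Route 2: from the variance of Y and r.
var_y = ss_y / (n - 1)
var_residual_from_r = var_y * (n - 1) * (1 - r ** 2) / (n - 2)

# Route 3: from the adjusted r squared.
r_adj_squared = 1 - (1 - r ** 2) * (n - 1) / (n - 2)
var_residual_from_r_adj = var_y * (1 - r_adj_squared)

print(var_residual_direct, var_residual_from_r, var_residual_from_r_adj)
print("standard error of the estimate:", sqrt(var_residual_direct))
```

All three routes give the same number up to floating-point rounding.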
Actually, it's generally easiest to do the calculations in terms of the sums of squares, not variances, because then we don't have to worry about all these degrees-of-freedom corrections ($r$ and $r_{adj}$ and this n − 1, n − 2 business). You can't add two variances together unless they're based on the same number of degrees of freedom, but you can add sums of squares together any way you like, and we find that
$$SS_{residual} = SS_Y (1 - r^2)$$
$$r^2 = \frac{SS_Y - SS_{residual}}{SS_Y}$$
$$SS_Y = SS_Y (r^2) + SS_{residual}$$
In other words, the total variability in Y is made up of a component that's related to X ($SS_Y r^2 = SS_Y - SS_{residual}$, which we can also write as $SS_{\hat{Y}}$, the variability in the predicted value of Y) and a component that's residual error ($SS_{residual}$). Translated to our cholesterol example, people vary in their cholesterol levels ($SS_X$), they vary in their heart attack risk ($SS_Y$), a certain amount of the variability in their heart attack risk is predictable from their cholesterol ($SS_{\hat{Y}}$), and a certain amount of variability is left over after you've made that prediction ($SS_{residual}$). Or,
$$SS_Y = SS_{\hat{Y}} + SS_{residual} \quad \text{where} \quad r^2 = \frac{SS_{\hat{Y}}}{SS_Y}$$

2.6. Advanced real-world topics
As with all the wavy-line sections, this section certainly isn't intended to be learned! It's for use with real-world problems that you may encounter. You will not be tested on any of this in the exam.
What's regression to the mean?
Something related to regression, but quite interesting. It was discovered by Galton in 1886. He measured the heights of lots of families, and calculated the mid-parent height (the average of the mother's and the father's height), call it X, and the heights of their adult children, call it Y. He found that the average mid-parent height was $\bar{x}$ = 68.2 inches; so was the average height of the children ($\bar{y}$ = 68.2 inches). Now, consider those parents with a mid-parent height of 70–71 inches: the mean height of their children was 69.5 inches. That is, the height of these children (69.5) was closer to the mean of all the children ($\bar{y}$ = 68.2) than the height of the parents (70–71) was to the mean of all the parents ($\bar{x}$ = 68.2). But this wasn't a genetic phenomenon, it was a statistical phenomenon, and it worked backwards: if you took children with a height of 70–71 inches, the mean mid-parent height of their parents was 69.0 inches. This is called regression to the mean.

Why does it happen? Suppose we have the variables X and Y, with standard deviations $s_X$ and $s_Y$, and the correlation between them is r. We've previously seen that
$$r_{XY} = \frac{cov_{XY}}{s_X s_Y}$$
and the regression slope b is
$$b = \frac{cov_{XY}}{s_X^2}$$
Therefore,
$$\text{slope} = b = r \frac{s_Y}{s_X}$$
So a change of one standard deviation in X is associated with a change of r standard deviations in Y. And we know the regression line always goes through the point at the means of both X and Y; that is, the point {$\bar{x}$, $\bar{y}$}. Therefore, unless there is perfect correlation (unless r = 1), the predicted value of Y is always fewer standard deviations from its mean than X is from its mean. Remember that predicting Y from X is different from predicting X from Y, unless the two are perfectly correlated? This is another way of saying the same thing.

Examples of regression to the mean (from Bland & Altman, 1994)

If we are trying to treat high blood pressure, we might measure blood pressure at time 1, then treat, and then measure again at time 2. We might see that blood pressure goes down most in those who had the highest blood pressure at time 1, and we might interpret this as an effect of the treatment. We'd be wrong; this is regression to the mean. It would happen even if the treatment had no effect. The two sets of observations (time 1, time 2) will never be perfectly correlated (because of measurement error and biological variation); r < 1. So if the difference between our high blood pressure subgroup and the whole population was q at time 1, it will be rq at time 2; i.e. the difference from the population mean will have shrunk. We should have compared our treated group to a randomized control group.

In one study, people reported their own weight and had their weight measured objectively. A regression was used to predict reported weight from measured weight; the regression slope was less than 1. This might lead you to conclude that very fat people underestimate their weight when they report it, and very thin people overestimate it. But we'd never have expected perfect correlation. All this might be is regression to the mean, and if we'd predicted measured weight from reported weight, we'd also have a slope less than one, from which we might have concluded the opposite: that very fat people overestimated their weights and very thin people underestimated them.

When scientific papers are submitted to journals, referees criticize them and editors select the best ones to publish on the basis of the referees' reports. Because referees' judgements always contain some error, they cannot be perfectly correlated with any measure of the true quality of the paper. Therefore, because of regression to the mean, the average quality of the papers that the editor accepts will be less than he thinks, and the average quality of those rejected will be higher than he thinks.
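The shrinkage described above follows directly from the slope $b = r \, s_Y / s_X$ and the fact that the line passes through the two means. Here is a minimal sketch; the means of 68.2 inches come from the Galton example, but the standard deviations (2.5 inches) and the correlation (r = 0.5) are assumed values chosen only for illustration, not Galton's figures.

```python
from math import isclose


def predicted_y(x, mean_x, mean_y, sd_x, sd_y, r):
    """Regression prediction: the line has slope r * sd_y / sd_x and passes through the means."""
    slope = r * sd_y / sd_x
    return mean_y + slope * (x - mean_x)


def distance_in_sds(value, mean, sd):
    """How many standard deviations a value lies from its mean."""
    return (value - mean) / sd


# Means from the text; SDs and r are ASSUMED for this sketch.
mean_x = mean_y = 68.2
sd_x = sd_y = 2.5
r = 0.5

x = 70.5  # a tall mid-parent height
y_hat = predicted_y(x, mean_x, mean_y, sd_x, sd_y, r)
z_x = distance_in_sds(x, mean_x, sd_x)
z_y = distance_in_sds(y_hat, mean_y, sd_y)

print(f"X is {z_x:.2f} SDs above its mean")
print(f"predicted Y is {z_y:.2f} SDs above its mean")
```

Whatever values are plugged in, the predicted Y lies r times as many standard deviations from its mean as X does from its own, so it is closer to the mean whenever r is less than 1 in magnitude.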
Partial correlation: dealing with the effects of a third variable
Sometimes we are interested in the relationship between two variables and know that a third variable is also influencing the situation. Imagine we examine the correlation between IQ (X) and income (Y), and find it to be positive, but we suspect that one reason that higher IQ predicts higher income is because people with higher IQs are more likely to get into university, stay for higher degrees, and so on, and it's the degree that gets you the higher income, not your IQ itself. So is there any further relationship between IQ and income once you've taken into account this effect of studying for longer? One way of investigating this is to look at the correlation between IQ (X) and (say) number of years of study (Z), and the correlation between income (Y) and number of years of study (Z). We can then calculate the partial correlation between IQ (X) and income (Y) having taken account of the relationship of each of these to number of years of study. We call this partialling out the effects of number of years of study. We term the partial correlation coefficient between X and Y with the effects of Z partialled out $r_{xy.z}$, and calculate it like this:
$$r_{xy.z} = \frac{r_{xy} - r_{xz} r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}$$
Let's use some fictional numbers to illustrate this: suppose that the correlation between IQ and income is $r_{xy}$ = 0.6, the correlation between IQ and years of study is $r_{xz}$ = 0.8, and the correlation between income and years of study is $r_{yz}$ = 0.7. Then the correlation between IQ and income having partialled out the effect of years of study would be only $r_{xy.z}$ = 0.09. This would mean that $r_{xy.z}^2$ = 0.0081, so only 0.8% of the variability in income is predictable from IQ once you've taken account of the number of years of study, even though $r_{xy}^2$ = 0.36 = 36% of the variability in income is
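The partial correlation formula above is easy to evaluate directly. This sketch plugs in the fictional correlations from the text; the function name is invented for the example.

```python
from math import sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between X and Y with the effects of Z partialled out."""
    numerator = r_xy - r_xz * r_yz
    denominator = sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator


# Fictional values from the text: IQ (X), income (Y), years of study (Z).
r_xy_z = partial_correlation(r_xy=0.6, r_xz=0.8, r_yz=0.7)

print(f"partial correlation = {r_xy_z:.2f}")
```

The unrounded value is slightly above 0.09, so squaring it gives a little more than the 0.0081 quoted in the text, which squares the rounded figure.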
We've already discussed the differences between one- and two-tailed tests (p. 23). We've already talked about making multiple comparisons between groups (p. 24).
Paired and unpaired tests (related and unrelated data)
When we come to look at the difference between two samples of data, the samples can be related or unrelated. Suppose we want to compare the speed with which people can rotate figures mentally in two conditions: on land and underwater. (1) We could take a group of landlubbers and a group of divers, and compare them. There would be no particular relationship between individual data points from the land sample and the underwater sample. We would use statistical methods that are described as unrelated, unpaired, or between-subjects. (2) Alternatively, we could measure the same group of people in two conditions, on land and underwater. In this situation, there is a relationship between one subject's score on land and the same subject's score underwater: they are likely to be more similar than they would be by chance alone, because they come from the same person. Our statistical methods must reflect this fact; the techniques we would use are described as related, paired, or within-subjects.

It is absolutely not acceptable to fail to take account of relationships like this between data. A classic example of this sort of error is something called pseudoreplication. Suppose you test Alice, Bob, and Celia on land, and Eric, Frankie, and Greg underwater. You obtain 6 observations, n = 3 for each group. Your groups are not related. So far, so good. But suppose you want more than 6 observations; you might measure each subject three times. This would give you observations A1, A2, A3, B1, … on land, and E1, E2, E3, F1, … underwater. The error is to analyse this as if you had 18 observations (n = 9 for each group). This is wrong, because A1, A2 and A3 are all related, more so than A1 and B1, or A1 and E1. We will not cover the analytical techniques required for this sort of situation, where we have multiple variables (in this case, land/underwater as a between-subjects variable, observation 1/observation 2/observation 3 as a within-subjects variable); that's covered in the Part II course.
If you have data like these, the simplest thing is to obtain some sort of overall score for each subject (e.g. take Alice's overall score to be the mean of A1, A2, and A3) and analyse those. If you have data from only one subject, then you can consider the data to be unrelated for the purposes of analysis, but your conclusions only apply to that subject. For example, if you measured my ability to remember sequences of digits (my digit span) ten times when I'm on dry land and ten times when I'm underwater, you could treat the data as unrelated: they have no relationship to each other beyond the fact that they come from the same subject, and that's part of your analytical context anyway. You would have a sample (n = 10) of my dry-land digit span, and a sample (n = 10) of my underwater digit span. If the dry-land scores were significantly higher than the underwater scores, you could conclude that my digit span was better
The essence of a two-sample t test. We have two samples with means $\bar{x}_1$ and $\bar{x}_2$. If the distance (difference) between means ($\bar{x}_2 - \bar{x}_1$) is big enough, we say that the two samples are significantly different (which is to say, the two samples come from underlying populations whose means are different). We measure the distance between the means somehow in terms of the standard deviations of the samples, $s_1$ and $s_2$.
Overview

If we have two independent (unrelated) groups, $X_1$ and $X_2$, with equal variances ($s_1^2 = s_2^2$), we can ask if they are significantly different from each other. The null hypothesis is that the two underlying populations have the same mean ($\mu_1 = \mu_2$). We can calculate a t statistic, which has the same general form as before: it's the difference between means divided by the standard error of that difference, and this time it has ($n_1 + n_2 - 2$) degrees of freedom.
$$t_{n_1 + n_2 - 2} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}}$$
where $s_p^2$ (called the pooled variance) is given by
$$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$$
If the groups are of the same size ($n_1 = n_2 = n$), then the formula becomes simpler:
$$t_{2n - 2} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2 + s_2^2}{n}}}$$
This test assumes that the two samples come from populations with equal variances ($\sigma_1^2 = \sigma_2^2$), whether or not $n_1 = n_2$. If this assumption is violated, we must use the unequal variances version of this test (see below).
Example
Suppose we collect young horses and assign them to one of two groups at random. We feed one group (n = 10) FastDope, a drug that we suspect of having performance-enhancing properties. The other group (n = 10) are given a placebo. They are then timed running along a 1 km racetrack and their speed is calculated in m s⁻¹. The null hypothesis is that the speeds of the drugged and undrugged groups do not differ. We find that the speeds of the drugged group (group 1) are {14.6, 12.6, 12.2, 15.0, 12.5, 12.1, 13.1, 12.2, 14.1, 14.2} and the speeds of the placebo group (group 2) are {10.8, 11.9, 9.7, 9.3, 12.0, 9.6, 10.7, 8.9, 12.5, 12.0}. Since $n_1 = n_2$, we can use the simpler of the two formulae for t, and can therefore calculate
$$t_{n_1 + n_2 - 2} = t_{18} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2 + s_2^2}{n}}} = \frac{13.26 - 10.74}{\sqrt{\frac{(1.11)^2 + (1.31)^2}{10}}} = \frac{2.52}{0.543} = 4.64$$
For 18 df, the critical value of t for a two-tailed α = 0.05 is 2.101. Since our t statistic exceeds this critical value, we reject the null hypothesis; the drugged group ran faster.
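The worked example above can be reproduced with the general pooled-variance formula, which reduces to the simpler one when the groups are the same size. This is a sketch using only Python's standard library; the function name `pooled_t` is chosen for illustration, and the critical value of 2.101 is the tabled figure quoted in the text rather than something computed here.

```python
from math import sqrt
from statistics import mean, variance

drugged = [14.6, 12.6, 12.2, 15.0, 12.5, 12.1, 13.1, 12.2, 14.1, 14.2]
placebo = [10.8, 11.9, 9.7, 9.3, 12.0, 9.6, 10.7, 8.9, 12.5, 12.0]


def pooled_t(sample1, sample2):
    """Equal-variances two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # statistics.variance is the sample variance (divides by n - 1).
    pooled_variance = (
        (n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)
    ) / df
    standard_error = sqrt(pooled_variance / n1 + pooled_variance / n2)
    return (mean(sample1) - mean(sample2)) / standard_error, df


t, df = pooled_t(drugged, placebo)
critical_t = 2.101  # two-tailed, alpha = 0.05, 18 df (tabled value from the text)

print(f"t({df}) = {t:.2f}")
print("reject H0" if abs(t) > critical_t else "retain H0")
```

Comparing the absolute value of t with the critical value keeps the test two-tailed; the sign of t then tells you which group had the higher mean.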
How did we derive this t test? If you're interested, see page 59.
If there is a significant difference, which way round is it?
The F test
The F statistic is a ratio of two variances. If the two variances are equal, F = 1. If they're not, F ≠ 1. How much more/less than 1 does it need to be before we declare the difference significant? We find that from tables of critical values of F (see p. 126). The F distribution is based on two numbers for the degrees of freedom: one for the numerator, and one for the denominator. We might write this as $F_{a,b}$ where a is the number of df for the numerator and b is the number of df for the denominator:
$$F_{n_1 - 1,\, n_2 - 1} = \frac{s_1^2}{s_2^2}$$
In practice, tables of F don't give critical values for F < 1; they only give critical values for F > 1 (if you had F < 1, you could always take the reciprocal, 1/F, and test that). So to make sure that our F > 1, we always put the biggest variance on the top (numerator) of the ratio, and the smallest variance on the bottom (denominator). So if the variances are different, the F statistic will be bigger than 1. In other words:
$$F_{n_1 - 1,\, n_2 - 1} = \frac{s_1^2}{s_2^2} \quad \text{if } s_1^2 > s_2^2$$
$$F_{n_2 - 1,\, n_1 - 1} = \frac{s_2^2}{s_1^2} \quad \text{if } s_2^2 > s_1^2$$
So you can run an F test on your data before choosing a t test; if it's significant (especially if $n_1 \neq n_2$), use the unequal variances t test; if it isn't, use the equal variances t test.
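The rule of putting the bigger variance on top can be written as a small helper. The sketch below applies it to the horse speeds from the earlier t test example; the function name is made up, and the resulting F still has to be compared against a tabled critical value, which is not computed here.

```python
from statistics import variance

drugged = [14.6, 12.6, 12.2, 15.0, 12.5, 12.1, 13.1, 12.2, 14.1, 14.2]
placebo = [10.8, 11.9, 9.7, 9.3, 12.0, 9.6, 10.7, 8.9, 12.5, 12.0]


def variance_ratio_f(sample1, sample2):
    """F ratio with the larger sample variance as numerator.

    Returns (F, numerator df, denominator df).
    """
    var1, var2 = variance(sample1), variance(sample2)
    if var1 >= var2:
        return var1 / var2, len(sample1) - 1, len(sample2) - 1
    return var2 / var1, len(sample2) - 1, len(sample1) - 1


f_ratio, df_numerator, df_denominator = variance_ratio_f(drugged, placebo)
print(f"F({df_numerator}, {df_denominator}) = {f_ratio:.2f}")
```

Because the degrees of freedom travel with the variances, swapping the order of the two samples gives exactly the same result.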
One more thing, though: if you want to test whether the variances are different with α = 0.05 (two-tailed), you must run the F test itself with α = 0.025. If you run the test with tabled values for α = 0.05 (one-tailed), your actual two-tailed α will be 0.1. Why? Well, asking whether the variances are different without specifying the direction of the difference is a two-tailed test. The critical values of F, however, are for a one-tailed test (because we only test significance when F > 1, rather than F < 1). You've forced it to become a two-tailed test by calculating F in such a way that F > 1; you must therefore double the stated one-tailed α to get the two-tailed α. The Tables & Formulae (p. 126) give both the one-tailed and two-tailed equivalent values of α; you should use the two-tailed equivalent for testing whether two variances are different in this context.
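The rule "put the bigger variance on top" is easy to get wrong by hand, so here is a minimal sketch of it in Python's standard library. The function name `variance_ratio` is hypothetical, and the data are the horse speeds from the t test example; the critical value still has to come from tables (using the one-tailed α = 0.025 column for a two-tailed α = 0.05), since the standard library does not provide the F distribution.

```python
from statistics import variance

def variance_ratio(a, b):
    """Return (F, df_numerator, df_denominator), with the larger
    sample variance in the numerator so that F >= 1."""
    var_a, var_b = variance(a), variance(b)  # sample variances (n - 1)
    if var_a >= var_b:
        return var_a / var_b, len(a) - 1, len(b) - 1
    return var_b / var_a, len(b) - 1, len(a) - 1

drugged = [14.6, 12.6, 12.2, 15.0, 12.5, 12.1, 13.1, 12.2, 14.1, 14.2]
placebo = [10.8, 11.9, 9.7, 9.3, 12.0, 9.6, 10.7, 8.9, 12.5, 12.0]

F, df_num, df_den = variance_ratio(drugged, placebo)
print(f"F({df_num},{df_den}) = {F:.2f}")  # F(9,9) = 1.40
```

Here the placebo group has the larger variance, so it ends up in the numerator regardless of the order in which the groups are passed.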
One-tailed, two-tailed: notes only for Part II students using this for revision
When using F tests as part of analysis of variance (ANOVA), covered in Part II, use the one-tailed critical values of F. Why? Because ANOVA compares a measure of effect size (MSeffect) with a measure of variability (MSerror): F = MSeffect/MSerror. MSeffect gets bigger no matter what the direction of the effect. We are only interested in whether an effect is bigger than we'd expect by chance; given the assumptions of ANOVA, it would not be sensibly possible to get an effect smaller than we'd expect by chance alone. So we use the one-tailed critical values of F. It is in this (one-tailed) sense that F = t², as discussed below.
Relationship between the F test and the t test
The t test is actually a special case of the F test:
$$ F_{1,k} = t_k^2 \quad \text{and} \quad t_k = \sqrt{F_{1,k}} $$
where k is the number of degrees of freedom. In other words, a t test on k df is directly equivalent to an F test on 1 and k df. The difference is that the t distribution is symmetrical about zero, since it deals with the differences between things, so values of t can be positive or negative. The F test deals with squared values, which are always positive, so F ratios are always positive (see Keppel, 1991, p. 121).
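The identity F = t² can be illustrated numerically. The sketch below is my own illustration, not the handout's derivation: it reuses the horse-speed data from the t test example, computes the pooled-variance t, and then computes a one-way ANOVA F (MSeffect/MSerror) for the same two groups. Equal group sizes are assumed throughout.

```python
from math import sqrt, isclose
from statistics import mean, variance

g1 = [14.6, 12.6, 12.2, 15.0, 12.5, 12.1, 13.1, 12.2, 14.1, 14.2]
g2 = [10.8, 11.9, 9.7, 9.3, 12.0, 9.6, 10.7, 8.9, 12.5, 12.0]
n = len(g1)  # equal group sizes assumed

# Pooled-variance t: with equal n, the pooled variance is simply
# the mean of the two sample variances.
pooled_var = (variance(g1) + variance(g2)) / 2
t = (mean(g1) - mean(g2)) / sqrt(pooled_var * (2 / n))

# One-way ANOVA on the same two groups
grand_mean = mean(g1 + g2)
ms_effect = n * ((mean(g1) - grand_mean) ** 2
                 + (mean(g2) - grand_mean) ** 2) / (2 - 1)  # 1 df
ms_error = pooled_var                                       # 2n - 2 df
F = ms_effect / ms_error

print(f"t = {t:.3f}, t squared = {t * t:.3f}, F = {F:.3f}")
print("F equals t squared:", isclose(F, t * t, rel_tol=1e-9))
```

Note that F carries no sign, which is the point made above: the direction of the difference is visible in t but not in F.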
3.7. Assumptions of the t test
For any t test:
You're testing hypotheses about the mean, which only makes sense if the mean is meaningful (it may not be if the measurement scale you used wasn't an interval or ratio scale; see p. 7).
The maths behind the t test assumes that the underlying populations of the scores (or difference scores, for the paired t test) are normally distributed. (Large samples help to make up for lack of normality, but see below for more explanation. Casual rule of thumb: if n > 15 and the data don't look too weird, it's probably OK to use a t test; if n > 30, it's usually fine.) If this assumption is violated, you can't use any form of t test.
For two independent samples, to use the equal-variance t test, we assume
The two samples come from populations with equal variances ($\sigma_1^2 = \sigma_2^2$), whether or not n1 = n2.
The t test is fairly robust to violations of this assumption if n1 = n2, but not if n1 ≠ n2. The case where the variances are unequal and so are the ns is the only situation in which the t test becomes liberal (see p. 46), particularly if the sample with the smaller n has the larger variance (Boneau, 1960).
More detailed explanation of the normality assumption
The assumption about normally-distributed data was stated above casually. This is the full explanation; I'll use "scores" to refer to the numbers being analysed.
1. The logic behind the t test doesn't make assumptions about the distribution of the scores per se, but it does assume that the means taken from samples of size n are themselves normally distributed (see p. 57). This is always true if the scores themselves are normally distributed, but is also true if the scores are not themselves normally distributed but the sample size is large (e.g. >15 if the scores are not too far from a normal distribution; >30 if the scores are very non-normal); this is a consequence of something called the Central Limit Theorem (see p. 57). (See also Frank & Althoen, 1994, pp. 388-390, 401-406.) For the two-sample t test, simply read "difference scores" instead of "scores".
2. The t test also makes assumptions about the distribution of the variance of the samples: it assumes that they have a χ² distribution (see pp. 58 and 81), which is true if the underlying scores are normally distributed, but may not be true if they're not (e.g. highly skewed; see Howell, 1997, p. 177, and core.ecu.edu/psyc/wuenschk/StatHelp/t-CLM.txt).
3. There is an additional reason for wanting the scores themselves to be normally distributed. If they aren't, the sample mean and the sample variance (or standard deviation, if you prefer) are not independent. For example, consider a positively skewed set of scores (see p. 17). Because low scores are generally closer together than high ones, samples with low means will tend to have lower variances than samples with high means. This skew can make the t test less powerful. The two-sample t test can also give distorted results if the two samples have different skew (see also Howell, 1997, p. 201).
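Point 1 can be seen in a small simulation. The sketch below is an illustration under assumptions of my own choosing (exponentially distributed scores as the "very non-normal" population, sample sizes of 1 and 30, 5000 repetitions, and a fixed random seed); it compares the skewness of individual scores with the skewness of means of samples of 30.

```python
import random
from statistics import mean, pstdev

def skewness(values):
    """Population skewness: mean of cubed standardized deviations."""
    m, s = mean(values), pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)

def sample_means(n, reps, rng):
    """Means of `reps` samples of size n from a positively skewed
    (exponential) population. The distribution is an assumption
    chosen for illustration only."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n
            for _ in range(reps)]

rng = random.Random(1)  # fixed seed so the run is repeatable
skew_scores = skewness(sample_means(1, 5000, rng))    # raw scores
skew_means = skewness(sample_means(30, 5000, rng))    # means, n = 30

print(f"skewness of raw scores:      {skew_scores:.2f}")
print(f"skewness of means of n = 30: {skew_means:.2f}")
```

The means should come out far less skewed than the raw scores, which is the Central Limit Theorem at work; the means are still not perfectly symmetric, which is why the rule of thumb asks for larger samples when the scores are very non-normal.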
For the null hypothesis, p(positive sign) = p(negative sign) = 0.5. So if the number of non-zero difference scores n > 10, and x is the number of difference scores of one sign (e.g. positive), we can use the normal approximation to the binomial distribution to get a quick answer. The mean of this distribution is np = n/2, and the variance is npq = n/4. So we can calculate a Z score:

$$ z = \frac{x - \dfrac{n}{2}}{\sqrt{\dfrac{n}{4}}} = \frac{2x - n}{\sqrt{n}} $$
and test that Z score in the usual way (see p. 15).
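As a rough illustration of this normal approximation, the sketch below computes the Z score and a two-tailed p value from the standard normal distribution. The function name `sign_test_z` and the example counts (15 positive signs out of 20 non-zero differences) are hypothetical and not taken from the text.

```python
from math import sqrt, erfc

def sign_test_z(x, n):
    """Normal approximation to the sign test.
    x: number of difference scores of one sign
    n: number of non-zero difference scores"""
    if n <= 10:
        raise ValueError("normal approximation is suggested only for n > 10")
    z = (x - n / 2) / sqrt(n / 4)
    # Two-tailed p: area in both tails beyond |z| of the standard normal
    p_two_tailed = erfc(abs(z) / sqrt(2))
    return z, p_two_tailed

z, p = sign_test_z(x=15, n=20)  # hypothetical counts
print(f"z = {z:.3f}, two-tailed p = {p:.3f}")
```

For small n the approximation should not be used, which is why the sketch refuses to run for n of 10 or fewer.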
Comparing the sign test to the Wilcoxon matched-pairs signed-rank test
From our discussion of the Wilcoxon matched-pairs signed-rank test (p. 67), you'll see that the sign test is pretty similar in overall logic, except that the sign test throws away even more information about the distribution (it doesn't care about the magnitudes of the difference scores at all, just their signs). You pay a price in power, but gain generality; the sign test is a nonparametric test that can be used with ordinal or even categorical data.
5.5. Supplementary material: the multinomial distribution
If we want to consider more than two alternatives for each trial, we need to use the multinomial distribution. Let there be n trials and k alternatives for each trial, numbered from 1 to k, each with the probabilities p1, p2, …, pk. Then the probability of obtaining exactly X1 outcomes of event 1, X2 outcomes of event 2, … and Xk outcomes of event k is given by

$$ p(X_1, X_2, \dots, X_k) = \frac{n!}{X_1!\,X_2! \cdots X_k!}\; p_1^{X_1}\, p_2^{X_2} \cdots p_k^{X_k} $$

An example: if we had a die with two black sides, three red sides, and one white side, then for each trial p(black) = 2/6, p(red) = 3/6, and p(white) = 1/6. So if we roll the die 10 times, then the probability of obtaining exactly 4 blacks, 5 reds, and 1 white is

$$ p(4,5,1) = \frac{10!}{4!\,5!\,1!} \left(\frac{2}{6}\right)^{4} \left(\frac{3}{6}\right)^{5} \left(\frac{1}{6}\right)^{1} = 0.081 $$
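The die example can be reproduced with a short function. This is a minimal sketch using Python's standard library (`math.prod` needs Python 3.8 or later); the function name `multinomial_probability` is an illustrative choice.

```python
from math import factorial, prod

def multinomial_probability(counts, probs):
    """Probability of observing exactly counts[i] outcomes of event i
    in sum(counts) independent trials with probabilities probs[i]."""
    if len(counts) != len(probs):
        raise ValueError("need one probability per event")
    n = sum(counts)
    # n! / (X1! X2! ... Xk!), kept as an exact integer
    coefficient = factorial(n)
    for x in counts:
        coefficient //= factorial(x)
    return coefficient * prod(p ** x for p, x in zip(probs, counts))

# Die with two black, three red and one white side, rolled 10 times
p = multinomial_probability([4, 5, 1], [2 / 6, 3 / 6, 1 / 6])
print(round(p, 3))  # 0.081
```

With k = 2 the same function gives ordinary binomial probabilities, since the binomial distribution is the two-alternative case.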
5.6. Supplementary material: the χ² distribution; an outline of deriving the χ² test; other points
The χ² distribution
The χ² probability density functions are shown in the figure below; you can see that the shape of the distribution depends on the number of degrees of freedom, k. It is a positively skewed distribution, especially when k is small. The distribution is often written as $\chi^2_{df}$, or sometimes χ²(df). To obtain critical values of χ², we need to know the value of χ² above which (say) 5% of the area falls. In practice, we'll get this from tables (p. 130) or a computer.
Relationship between χ² and the normal distribution
If we have a normal random variable N(μ, σ²), we can sample one value x from it, convert it to a standard normal variable z, and square it:
b) Design an experiment to determine whether the left or the right ear is more accurate in recognising words presented, one to each ear, simultaneously. What particular problems does such an experiment encounter? c) Suppose you are interested in determining the mechanisms that are responsible for retroactive and proactive interference. Design an experiment that might help elucidate these mechanisms.
Q7 (2003, Paper 1). In an initial experiment to measure the reaction times for discriminating positive affect faces (expressing happiness) from negative affect faces (expressing sadness) the following ten reaction times from ten subjects were recorded in milliseconds (msec): 578 618
Within what interval is there a 95% probability that the true population mean lies (assuming that the 10 observations have been sampled randomly from a normally distributed population)? In a subsequent experiment, 12 subjects were randomly assigned to two groups. One group was given a caffeine tablet (condition A) while the other group was given a placebo, a sugar pill with no physiological effect (condition B). Reaction times were then taken for subjects in both groups on the positive affect versus negative affect face discrimination test. These are given below. Reaction time score (msec) 630 654
Condition A: Condition B:
643 586
507 593
Is there a significant difference between the two groups?
Q8 (2003, Paper 2). Answer one of the following three questions on Experimental Design. State what data you would collect and what statistical procedure(s) you would use, and give the reasons for your choices. (1) Design an experiment to demonstrate blocking in humans. Indicate how you would investigate the processes responsible for the effect in further experiments. (2) It has been claimed that aspirin adversely affects the operation of the active mechanism in the cochlea. Design an experiment to test this claim. (3) Design an experiment to investigate whether people who are blind perform a mental imagery task. Be sure to choose a task that requires visual mental imagery and cannot be performed using semantic knowledge.
Q9 (2004, Paper 1). An experimenter is interested in schizotypy, a personality type thought to predispose to the development of schizophrenia. One aspect of schizotypy is magical ideation, the tendency of a person to believe in magical phenomena. Schizophrenia has been suggested to involve abnormal development of the left/right asymmetry of the cerebral hemispheres, and abnormalities of the dopaminergic neurotransmitter system. High dopamine levels in one side of the brain are known to make animals turn to the opposite side, and consequently, the experimenter wonders if people with magical ideation might have higher dopamine levels in the right-hand side of their brain, which might result in a tendency to turn left more often than would be expected by chance. She selects a group of 16 subjects at random and measures their Magical Ideation scores using standard techniques. She then equips her subjects with devices to measure their turning behaviour. She gives each subject one of either dopamine agonist (increases levels of dopamine [actually, mimics dopamine!]) or placebo (in a
Q1 (2000 Paper 1)
Hidden assumptions or, waffle you can skip in order to get to the answer.
Before we answer the question, we must decide what kind of techniques to use. Should we use parametric or nonparametric techniques? There's no necessarily right answer. We could summarize the theoretical arguments like this:
The purist might say that the Beck Depression Inventory (BDI) is an ordinal rating scale; higher scores indicate more depression somehow, but the differences between two ratings are not quantitatively meaningful (i.e. the difference between 0 and 10 is not necessarily the same as between 20 and 30), like our example on p. 7 of Army ranks (lieutenant, captain, major, etc.). This approach would mean that parametric techniques would not be applicable and we should use a nonparametric analysis.
Alternatively, we might treat the BDI as an interval scale for the purposes of analysis, which would allow us to use statistics such as the mean and standard deviation, and appropriate parametric tests (assuming their other assumptions are met). There are several rationales for this, discussed for example by Velleman & Wilkinson (1993), who argue that the meaning of a scale is largely what you make of it, so if you're happy to speak about "a 3-point change in a BDI score" then you should be happy to use parametric techniques here. As Francis Bacon (1620) said, "truth emerges more readily from error than from confusion." And while "common" doesn't imply "correct", it is certainly common to analyse rating scales with parametric techniques: the first paper on depression I looked at for this purpose analysed the Hamilton Depression Rating Scale using analysis of variance, which is a parametric technique (Mayberg et al., 2000); the second I found did the same with BDI scores (Allen et al., 1998).
Being pragmatic, are there other barriers to using one or other technique? Part (b), comparing men and women before treatment, could be approached parametrically with an unpaired t test or non-parametrically with a Mann-Whitney U test, so no problem there. Part (c) could be approached parametrically with Pearson's r or non-parametrically with Spearman's rs correlation, so no problem there. But part (a) asks which treatment is more effective, so we have to look at some difference between pre-treatment and post-treatment scores for each subject. If we take the differences (e.g. Post minus Pre scores) and analyse these, in whatever way, we have already made the assumption of an interval scale of measurement simply by calculating those differences, so we might as well use parametric techniques unless their other assumptions are violated. If we don't do this, the only information available is whether a subject improved or not. Since 15/15 Cogth patients improved, and 14/15 Couns patients improved, we're never going to find a significant difference with some form of categorical test (and a χ² test won't be valid since there will be expected values <5 and highly uneven across rows/columns, which violates the assumption of normality).
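$$ \sum x = 11(-5) + 29(-4) + \dots = -174, \qquad \bar{x} = \frac{\sum x}{n} = \frac{-174}{1000} = -0.174 $$

$$ s_X = \sqrt{\frac{\sum x^2 - \dfrac{(\sum x)^2}{n}}{n - 1}} = \sqrt{\frac{\sum x^2 - \dfrac{(-174)^2}{1000}}{999}} = 1.515 $$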
(b) The second phase is to determine the expected values for a normally-distributed variable with a mean of −0.174 and an SD of 1.515 if we measured them in a categorical way. It would be reasonable to suppose that the 0D category includes people measuring −0.5 to +0.5D; the +1D category includes people from +0.5 to +1.5D, and so on. So we can calculate Z scores for each boundary between dioptre categories, work out the proportion we expect to find in each category, and multiply by 1000 to get the number of observations we expect. That gives us expected values like this:
Category   Expected value
−5D          1.93
−4D         11.90
−3D         48.27
−2D        128.35
−1D        224.13
0D         257.05
+1D        193.63
+2D         95.79
+3D         31.10
+4D          6.62
+5D          0.92
(c) Finally, we run a χ² test with these as the expected values. Since we have made the observed and expected values agree as to n, $\bar{x}$, and $s_X$, we have lost three degrees of freedom (rather than a conventional χ² test, in which we make them agree only as to n, and lose only one df). So df = k − 3 (where k is the number of categories) = 8. We find that $\chi^2_8$ = 534.4, p = 2., which we could call very significant indeed! The lens powers do deviate significantly from a normal distribution. Out of interest, here are the histograms of the observed values and corresponding expected values (for the same n) based on a normal distribution having the same mean and standard deviation. (Trivia: the posh name for this type of deviation from normality, too peaked in the middle, is leptokurtic.)
[Figure: histograms of the number of observations at each lens power (D), observed versus expected (if normal).]
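Step (b) above, turning a normal distribution into expected counts per category, can be sketched in code. This is an illustration under stated assumptions: the mean is taken as −0.174 D (the negative sign is inferred from the asymmetry of the expected values), the SD as 1.515 D, n as 1000, and each category is assumed to span half a dioptre either side of its label. The function names are my own.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def expected_counts(categories, mu, sigma, n, half_width=0.5):
    """Expected number of observations in each category, where a
    category c covers the interval c - half_width to c + half_width."""
    return {
        c: n * (normal_cdf(c + half_width, mu, sigma)
                - normal_cdf(c - half_width, mu, sigma))
        for c in categories
    }

expected = expected_counts(range(-5, 6), mu=-0.174, sigma=1.515, n=1000)
for category, count in expected.items():
    print(f"{category:+d}D  {count:7.2f}")
```

The output should agree closely with the tabulated expected values (about 257 for the 0D category), with small discrepancies because the mean and SD used here are already rounded. Computing the χ² statistic itself would also need the observed counts, which are not reproduced in this section.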
9. Experimental design tips and glossary
9.1. About the experimental design questions
There's no right answer to an experimental design question. Good experimental design requires that you understand the question well: if you don't know what retroactive and proactive interference are, for example, you'll have difficulty with Q3(c). But in general, what things should you think about when designing experiments?

One vital thing is to establish what you'll measure. What numbers will you actually write down? For example, for exam question Q3(a), will you measure baby looking times? Baby approach distances? Proportion of occasions on which the baby approaches? Another example: exam Q8 (part 2) asks about investigating whether aspirin adversely affects the operation of the active mechanism in the cochlea. Will you measure detection thresholds at different frequencies? Will you measure frequency selectivity (and how)? Will you measure otoacoustic emissions? To choose, you need to know what aspects of hearing actually depend on the active mechanism of the cochlea; obviously, not all aspects of hearing do.

You also need to determine your subjects. Will you use people? Owls? Rats? Dissected cochleas? If you use humans, who? Psychology undergraduates? People recruited from newspaper adverts? Will you impose restrictions on the age or sex of your subjects? Will you exclude them if they have a history of mental illness, head injury, or ototoxic antibiotic use? If you think the specifics are not important, you might simply say that you'd recruit subjects with normal hearing. Sometimes your experimental technique influences who you recruit: you can't give PET scans to young women (potential egg damage from radiation), you can't put people with implanted magnetic metal into an MRI scanner (it accelerates hard and tends to kill), and you might be careful before giving a drug that makes people unhappy to those with a history of severe depression.

Will you use a correlative or a causal technique?
In our aspirin example, will you seek out people who use lots of aspirin and compare them to people who don't? If so, do you need to match these groups somehow? What might confound your interpretation of any differences? Or will you use a causal technique, in which you give aspirin to subjects in some fashion and compare them to people who didn't receive aspirin?

Many psychological experiments are based on simple intervention studies, where treatments are controlled by the experimenter. If done properly, these allow inferences to be made about whether the treatment caused an effect. They may use between-subjects (between-groups) or within-subjects designs.
A simple between-groups design: assign subjects (at random) to groups. Treat each group differently, before giving them all the same test. If the test results for the groups are different, this is evidence that the treatment influences performance.
A simple within-subjects design: test the same subjects repeatedly after (or during) different treatments, counterbalancing for the order in which you give the treatments and so on. If the results differ across treatment conditions, this is evidence that the treatment influences performance.

Sometimes we have to use more complex experimental designs. For example, when more than one treatment is given, they can all be given in a between-subjects fashion, or all in a within-subjects fashion, or some between- and some within-subjects. Sometimes we have to consider differences between groups that are not assigned by the experimenter, such as gender, age, IQ, prior illness, prior drug use. The exact interpretation of differences between groups in such an experiment may be more complicated, as the effect of one variable may be confounded by another.
For example, if you give all the male subjects treatment A, and all the female subjects treatment B, you won't be able to tell whether a difference between the groups is due to the treatment difference or the sex difference; these two variables are confounded. In general, random assignment of subjects to groups is a good way to get around this problem.

So will you use a between-subjects or a within-subjects design? In our aspirin example, will you test subjects on aspirin and the same subjects off aspirin? Or will you give some subjects aspirin and some not?
If you use a within-subjects design, will you test off aspirin and then on? Or on then off? Will there be effects of practice on the task that affect the interpretation in this case? Will the drug have permanent or long-lasting effects? Should you randomize or counterbalance the order (so some subjects get aspirin first and some get placebo first)? When you're not giving the drug, will you give nothing, or a placebo? If you use a placebo, will the experimenter be blind as to the condition the subject is in? The importance of the experimenter's awareness, or lack of it, applies more generally whenever there is the possibility that the experimenter's expectations may influence the subject, or influence the recording of the data, consciously or unconsciously. If you use a between-subjects design, how will you assign subjects to groups?
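Random assignment and counterbalancing are simple to carry out mechanically. The sketch below shows one possible way of doing each; the subject IDs, group labels, function names and random seed are all hypothetical, chosen only to match the aspirin example.

```python
import random

def assign_between_subjects(subjects, conditions, rng):
    """Between-subjects design: shuffle the subjects, then deal them
    out to the conditions in turn (equal groups when divisible)."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, subject in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(subject)
    return groups

def counterbalance_order(subjects, orders, rng):
    """Within-subjects design: give each treatment order to an equal
    share of subjects, with the allocation itself randomized."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    return {subject: orders[i % len(orders)]
            for i, subject in enumerate(shuffled)}

rng = random.Random(2024)  # fixed seed so the allocation can be audited
subjects = [f"S{i:02d}" for i in range(1, 13)]  # hypothetical subject IDs

groups = assign_between_subjects(subjects, ["aspirin", "placebo"], rng)
orders = counterbalance_order(
    subjects,
    [("aspirin", "placebo"), ("placebo", "aspirin")],
    rng,
)
print(groups)
print(orders)
```

Letting the computer do the shuffling removes the temptation (conscious or not) for the experimenter to steer particular subjects into particular conditions.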
Sometimes the question asks about how you will analyse the data you collect. The design of your experiment partly determines the analytical technique. Do you have quantitative or categorical data? The use of a between-subjects or a within-subjects design influences the "relatedness" of your data, and may determine whether you will use related (e.g. paired) or unrelated (unpaired) statistical tests. What will your null hypothesis be? There may be some things you can't specify in advance (e.g. you may prefer to use parametric tests like the t test, but you don't always know whether their assumptions will be met until you collect the data; if their assumptions are violated, you may need to use nonparametric tests instead, and you can simply say that).

Good designs are simple, and answer the question clearly. If you find an effect in your experiment, will the interpretation be simple? Sometimes you may need to design a series of experiments rather than a single experiment. Think about what each experiment should aim to establish; keep each experiment as simple as possible. Sometimes the most sensible choice of the next experiment depends on the results of previous experiments; you can do no more than anticipate likely outcomes or lines of investigation if you are outlining a proposed series of experiments.

Good designs are also practical. If your design calls for the use of a zero-gravity cell culture environment, it may be impractical or expensive, but if the question is important (will it save lives? improve the lot of millions?), maybe it's worth it. On the other hand, if you could do the same experiment with a set of headphones and a signal generator, that's probably preferable. If your design involves inducing permanent hearing damage in volunteer humans, it's highly questionable ethically.
When using animals, experimenters always seek to refine experiments to minimize suffering and distress, reduce the number of animals used, and replace live animals with alternatives when possible.

Finally, your design may be excellent, or it may just be the best thing you can think of during the exam. If you spot problems or flaws in your design, discuss them. You're not expected to design something perfect, but if you know there are problems, talk about them. Many of these issues were also discussed in Section 1.










