
Analysis of variance


 

 

Section 1
1.1 Review Concepts
1.2 Reports
1.3 Correlation and Regression
1.4 Limitations of Hypothesis Testing
1.5 The Concept of Interactions

Hypothesis testing:

Each test, such as a

  • t test
  • chi-square test, or
  • Pearson's correlation coefficient,

has an associated p value. Most testing is performed at the 5% significance level: if p < .05, the result is significant.

 

Process:

  1. Start with a question or research hypothesis
  2. Collect data
  3. Calculate the appropriate test statistic and its associated p value
  4. Compare the p value to the significance level you are testing at, e.g. p < .05 or p < .001
  5. Reach a conclusion about the population
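The process above can be sketched in code. This is a minimal illustration with made-up data, using a large-sample z approximation because the t distribution is not in Python's standard library:

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Step 1: question -- is the sample mean different from a claimed value of 100?
# (the data and the claimed value are made up for illustration)
sample = [104, 98, 110, 105, 99, 107, 103, 101, 96, 108]
claimed_mean = 100

# Steps 2-3: compute the test statistic and its p value
n = len(sample)
z = (mean(sample) - claimed_mean) / (stdev(sample) / sqrt(n))
p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed p value

# Steps 4-5: compare to the 5% significance level and conclude
significant = p < .05
print(f"z = {z:.2f}, p = {p:.3f}, significant: {significant}")
```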

 

  • Exercises 1-1.1.5

 

95% Confidence Interval

How strong is the correlation in the tested population? By testing the correlation in the sample we can draw an inference.

How reliable is this estimate, and how close is it to the actual correlation in the population?

The 95% confidence interval gives a range within which the population correlation ρ (rho) is likely to lie: it's between this and that. At 95% we can be 95% confident the interval captures the true value.
When the interval is very broad we are not able to accurately estimate the strength of the correlation. The solution is to use a larger sample, i.e. the larger the sample, the narrower the interval.
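As a sketch of how the interval narrows with sample size, here is the usual Fisher z-transformation calculation for the 95% confidence interval around Pearson's r (the r and n values are illustrative):

```python
from math import atanh, tanh, sqrt

def pearson_ci_95(r, n):
    """95% confidence interval for the population correlation rho,
    using the Fisher z transformation."""
    z = atanh(r)               # transform r to Fisher's z
    se = 1 / sqrt(n - 3)       # standard error of z
    lo, hi = z - 1.96 * se, z + 1.96 * se
    return tanh(lo), tanh(hi)  # transform back to the correlation scale

# the same r = .40 gives a much narrower interval with a larger sample
print(pearson_ci_95(.40, 30))   # wide: cannot pin down the strength
print(pearson_ci_95(.40, 300))  # narrow: a much more precise estimate
```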

Observational v experimental studies: confounding and causation.
Experimental: the researcher manipulates some variables (the independent variable, IV), holds others constant where possible, and measures the outcome (the dependent variable, DV).
Observational: existing conditions or groups are used. In observational studies we cannot determine causation, only correlation.
Reasons we can't draw causal conclusions:
• a nuisance variable acting as a confounding factor

Experimental design: a properly designed experiment can avoid confounding factors.
Three designs that control for confounding factors are
1) independent samples design
2) repeated measures design
3) matched pairs design

Independent samples design: properly constructed, this includes random assignment, and each group receives one condition only.

Repeated measures design: all participants try all conditions. This design is very effective at avoiding many nuisance variables.

Matched pairs design: participants are assigned to different groups, but the groups are matched in terms of gender, age, etc.

Properly designed experiments allow us to draw causal conclusions. While this is the best scenario, sometimes ethics or circumstance do not allow it and we must do an observational study instead.

 

Exercises 1.1.6-10

Choosing the appropriate analysis

Analysis will depend on how the variables are measured and what hypothesis or question we are trying to address.

 

Levels of measurement

  1. Metric or categorical

Metric

-Interval: the difference between the scores has meaning, e.g. height in cm; this also applies to the scales routinely used in psych research.

Categorical

-Ordinal: has a natural order, such as 1st born, 2nd born, etc.

-Nominal: data that can't naturally be ordered.

 

Types of research

-Ours is bigger than yours (Independent samples t-test)

-Our this is bigger than our that (Paired samples t-test)

-We are more likely than you are (Crosstabs/Chi-square statistic)

-More of this means more of that (Correlation/regression)
Scenarios:

Relationship between two categorical variables then FREQUENCY

  1. Crosstab to describe the nature of the relationship
  2. Chi square to see if that relationship is significant

Relationship between two metric variables then EXPLORE

  1. Scatterplot, Pearson's correlation, and linear regression to describe the relationship
  2. Pearson's r to test significance

Relationship between one metric and one categorical

  1. Boxplots of metric variable for each category to describe relationship, or table/graph of group means.
  2. Our method of testing significance depends on the design of the study:

repeated measures= paired samples t test,

independent samples= independent samples t test

When we are testing 3 or more conditions we use ANOVA

 

Steps to choosing analysis

  • Identify how variables are measured
  • Decide whether the research hypothesis is asking about a relationship within a variable or between variables; this will change the type of analysis.

 

Analysis Chart

 

 

SPSS Instructions

Frequency Table (proportion of respondents who…)

Analyse/ Descriptive statistics/frequencies

Click on the relevant variables, transfer them, and OK

 

Pie Chart (as above click chart pie chart)

 

Summary Statistics/Box plot/Histogram (distribution of variables in sample)

Analyse/Descriptive statistics/explore

Transfer applicable variable into dependent list

-histogram: plots/histogram/turn off stem and leaf/ok

-percentiles as above then statistics/click percentiles/ok

 

Binomial Test (the difference between two nominal data sets)

Analyse/nonparametric tests/legacy dialogs/binomial

Transfer variable into test variable list

(Define dichotomy) TBA

 

One Sample T-Test (have mean variables changed? From one sample measured at different times)

Analyse/compare means/one sample t test

Transfer variable into test variables and add the original mean into “test value” to compare current and former means.

 

Descriptive stats for subgroups (such as men and women, or Australian v non-Australian)

Analyse/descriptive/explore

Add variable to dependent list and subgroup to factors list

 

Independent samples T test (like the one sample test, but comparing groups defined by a categorical variable)

Analyse/compare means/independent samples t test/

Add variable to the Test variable and grouping to the grouping variable.

Define groupings by codes in data/ ok
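Outside SPSS, the same test statistic can be computed from its definition. A minimal sketch with made-up scores (equal-variances, pooled form):

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group1, group2):
    """Independent-samples t statistic with pooled variance
    (the p value would then come from the t distribution with these df)."""
    n1, n2 = len(group1), len(group2)
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    t = (mean(group1) - mean(group2)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2  # t statistic and degrees of freedom

# made-up scores for two independent groups
group_a = [15, 14, 17, 16, 13]
group_b = [18, 19, 17, 20, 18]
t, df = independent_t(group_a, group_b)
print(f"t({df}) = {t:.2f}")
```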

 

Paired samples T test

Compute the difference variable:

Transform/compute variable

Name what the difference will be called, e.g. diff_species/type and add a label/ok

Then use explore to produce distributions.

Produce output:

Analyse/compare means/paired sample

Transfer two original variables into paired variables box/ok
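The compute-variable step above mirrors how the paired test works: it is a one-sample t test on the difference scores. A small sketch with made-up before/after data:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Paired-samples t: a one-sample t test on the difference scores."""
    diffs = [a - b for a, b in zip(after, before)]  # the computed 'diff' variable
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# made-up before/after scores for the same six participants
before = [10, 12, 9, 11, 13, 10]
after = [12, 14, 9, 13, 15, 12]
t, df = paired_t(before, after)
print(f"t({df}) = {t:.2f}")
```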

 

Scatterplots/correlations/regression stats (relationship between 2 metric variables)

Produce a scatterplot to check there are no outliers and no curvature

Graphs/Chart builder/choose plot options

Ensure element properties is set to variable value.

Make sure the IV goes on the horizontal axis and the DV on the vertical/ok

Click the line icon if you want a regression line.

 

Scatterplot separated

Graphs/chartbuilder/simpledot/separated scatter

Transfer IV to horizontal and DV to vertical, and the factor to Set Colour

Open graph and choose separated fit line.

 

Split file

 

 

Pearson's correlation coefficient:

Analyse/correlate/bivariate

Transfer variables to variables box/ok

Regression coefficients:

Analyse/regression/linear

Transfer IV and DV, then ok

 

Crosstabs/chi square (relationship between two categorical variables)

Analyse/descriptive stats/crosstabs

Transfer IV into column and DV into row

For percentages click cells/tick column

For chi-square: click statistics/tick chi-square (χ²)/ok
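The chi-square statistic SPSS reports can be sketched from the crosstab counts directly (the table below is made up):

```python
def chi_square(table):
    """Pearson chi-square statistic for a crosstab given as rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            x2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, df

# made-up 2x2 crosstab: rows = categories of the DV, columns = groups
x2, df = chi_square([[30, 10],
                     [20, 40]])
print(f"chi-square({df}) = {x2:.2f}")
```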

 

 

 

Correlation and Regression Revisited

Correlation and regression explore the relationship between two variables; where both are metric we would perform a correlation and regression. Where the hypothesis is of the form “more of this means more/less of that”, correlation and regression are appropriate.

 

Scatterplot

Explores the relationship between the variables.

DV goes on the VERTICAL and IV goes on the HORIZONTAL.

To identify the DV:

Ask which is more plausible: does this depend on that, or that on this?

Where this is difficult look at the hypothesis.

Make sure you are clear about the perspective you are taking before you analyse the data.

  • Check for outliers (data well separated from the main group); if there are any, exclude them.
  • Curve in the relationship: if there is a curve then Pearson's r and regression can't be used.

Description of scatterplot

  1. Direction (pos/neg)
  2. Form: linear
  3. Strength: very weak, weak, moderate, strong, very strong

 

Pearsons r and regression line

Where scatterplots are subjective and somewhat open to interpretation, Pearson's r and regression stats are a more objective measure of the strength of the relationship. Where there are no outliers or curve, these can be used.

 

Strength of relationships

.75 and above: strong linear

.45–.74: moderate linear

.25–.44: weak linear

below .25: extremely weak

 

Look at the r² (coefficient of determination) statistic to gauge the importance of the relationship. How much of the variation is attributable to the relationship between the variables is shown in the value of r², where r² = r × r.

In conjunction with the strength of the relationship, this tells us how much of the variation came from the actual relationship between the variables.
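As a quick check of the arithmetic, a few illustrative r values and their r² values:

```python
# r squared: the proportion of variation in the DV explained by the IV
for r in (.25, .45, .75):
    print(f"r = {r:.2f}  ->  r squared = {r * r:.4f} "
          f"({r * r:.0%} of the variation explained)")
```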

 

The regression line, or line of best fit, quantifies how much more or less of one variable goes with another.

Slope (of the regression line): the bottom line of the unstandardized coefficients table, to two decimal places.

This tells us how much the DV increases or decreases for each unit of the IV. Where it makes more sense we can multiply it out to larger increments such as 500 grams, 1 metre, or 10 years.

Vertical intercept: the top line of the table.

This tells us the predicted value of the DV when the IV is zero. Depending on the range in the sample data and the research question, this may or may not be useful information.

 

Significance testing and the report

When writing a report we use the scatterplot, correlation, regression and 95% confidence interval to present our findings, including whether the relationship was significant and can be generalised to the whole population.

 

The report should include

  • Hypothesis or question
  • Information about the sample
  • Strength of the relationship
  • Significance and supporting stats (r, n, p)
  • If significant, an interpretation of Pearson's r
  • If the scales are familiar, an interpretation of the slope and/or intercept
  • Conclusion relating back to the question

 

Regression

 

Checking the direction of the relationship

Where a hypothesis is directional such as predicting people who do this will have more of that…our results may support our directional hypothesis or be contrary.

Where it's contrary, still report the significance stats but explicitly mention that the results are contrary to expectations; be careful not to draw unfounded causal conclusions.

Where a result is contrary always reflect on why.

 

Limitations of hypothesis testing

Significance v Importance

Significance shows whether the relationship that exists in the sample is likely to exist in the population, but it does not say much else. It does not tell us how strong the relationship is. This is where we use the 95% confidence interval. Much more informative than saying the relationship is significant is the ability to say within what range the relationship lies and how strong it is, i.e. how important the relationship is to the overall results. E.g. while we may find a significant correlation, the interval may run from .15 to .30, which is only very weak to weak.

Where you find a significant result but a weak 95% confidence interval, this should be detailed in the report.

 

 

Not significant v no relationship

If we don't find a significant result, it's because either there is no correlation or the sample wasn't large enough to detect it.

 

The 95% confidence interval can indicate when we may need a larger sample, i.e. if the result is not significant and the interval for ρ (rho) runs from 0.29 to 0.80, we know the correlation could be anywhere from weak to very strong, and a larger sample may be able to narrow this down.

For large samples we can be confident in our results.

 

Power Analysis

Tells us how large a sample we need in order to be reasonably confident of detecting a correlation if one exists in the population. If there is a really strong correlation in the population then you only need a relatively small sample to detect it, and vice versa.

In the case where you are interested in detecting correlations that are very weak (think allergic reaction) then you will need a much larger sample.

 

Size of sample will depend on:

  1. How weak a correlation you want to be able to detect
  2. How confident you want to be of detecting it
  3. The level of significance you're going to test at.

 

Common decisions are to:

  • test at 5% significance, i.e. α = .05; this means we accept a 5% chance of being wrong.
  • Use 80% power, i.e. an 80% chance of detecting a relationship if one exists.
  • Strength of correlation: r = .30 or stronger

 

As an example, if we wanted an 80% chance of detecting r = .30, we would read across the top line of the power table to 0.30 and down the vertical to .80, finding we need a sample of n = 85 in this instance.
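The table lookup can be approximated in code via the Fisher z transformation; this sketch reproduces the n = 85 figure for r = .30 at 80% power (an approximation, not a replacement for the published tables):

```python
from math import atanh, ceil
from statistics import NormalDist

def n_for_correlation(r, power=.80, alpha=.05):
    """Approximate sample size needed to detect a population correlation r,
    using the Fisher z transformation (two-tailed test)."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)  # critical z for the significance level
    z_beta = nd.inv_cdf(power)           # z for the desired power
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)

print(n_for_correlation(.30))  # -> 85, matching the value read from the table
```

Stronger correlations need smaller samples, e.g. `n_for_correlation(.50)` comes out much lower.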

 

While the best-case scenario is to do this before you collect data, this is not always possible; in that case we would do this analysis as part of our overall analysis to see whether the sample was sufficient or a larger one was needed. We would add this to our report.

 

The concept of interaction

What happens when the relationship between two variables, such as age against intelligence, differs within the categories of another variable, such as occupation or location?

 

We can get SPSS to show the data for each category within a variable in the scatterplot, with a consequent regression analysis, to see the pattern; this allows us to see the interaction between two IVs, for example.

The results can be much clearer, e.g. men may have a wildly different relationship with the DV than women. Lumping both genders together may skew the results.

 

Interaction refers to the idea that the relationship with the DV for one category of an IV is different to that for another category, and these relationships must be described separately.

 

 

 

Revision

The – sign on a t test is completely arbitrary and really just depends on the way the data was coded.

 

Reporting analysis

-Hypothesis

-Information about the sample

-comparison of two means

-whether the difference was significant, together with supporting stats (t, df and p value)

-if significant, an interpretation of the 95% confidence interval for the difference

-conclusion relating back to the question

 

Checking direction of the difference

Where a hypothesis is directional we must check the group means to see if they are consistent with it; if not, this needs to be pointed out in the report (“contrary to expectations”).

 

Assumptions underlying the independent sample t-test

  • All scores are independent of each other (no confounding)
  • DV is metric
  • Normal distribution
  • All pops have same variance

 

Checking normality

Where the sample is normally distributed we assume the population is too.

To test normality we can use a normality (Q-Q) plot: the observed values in the distribution are plotted against the values you would expect for a normal distribution with the same mean and SD. If the distribution is normal, the points should all line up closely along the diagonal line. Normality can be checked with a histogram and a Q-Q plot.
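The pairs of values behind a normal (Q-Q) plot can be sketched directly: sort the data, then pair each observed score with the quantile expected under a normal distribution with the same mean and SD (the data here is made up):

```python
from statistics import NormalDist, mean, stdev

def qq_points(data):
    """(expected, observed) pairs for a normal Q-Q plot."""
    xs = sorted(data)
    n = len(xs)
    nd = NormalDist(mean(data), stdev(data))
    # expected quantile for the i-th ordered score
    expected = [nd.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, xs))

# roughly normal data: points should sit close to the line y = x
sample = [4.8, 5.1, 5.3, 4.9, 5.0, 5.2, 4.7, 5.4, 5.0, 5.1]
for exp, obs in qq_points(sample):
    print(f"expected {exp:5.2f}   observed {obs:5.2f}")
```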

 

When positively skewed, the points begin below the line, rise above it, then drop below again. Negatively skewed is the opposite. A bimodal distribution has two peaks and follows an S curve. This is only useful for large samples.

 

Normality plots are preferred to histograms as they are generally more accurate.

 

Where the distribution is normal, a t-test is appropriate; where it is not normal, use a nonparametric test.

 

Levenes test

Where p < .05 the test is significant; where not significant, we can assume the equal variances assumption is met.
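Levene's test is essentially an ANOVA on the absolute deviations of scores from their group means. A sketch of the W statistic (the p value would come from the F distribution; the scores are made up):

```python
from statistics import mean

def levene_w(*groups):
    """Levene's W statistic (mean-centred form): an ANOVA on the
    absolute deviations of scores from their group means."""
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean([v for g in z for v in g])
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in z) / (k - 1)
    within = sum((v - mean(g)) ** 2 for g in z for v in g) / (n - k)
    return between / within

# a low-spread group versus a high-spread group (made-up scores)
w = levene_w([2, 3, 4, 3], [1, 5, 9, 5])
print(round(w, 2))
```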

 

Non sig T tests- Power

 

-alternate hypothesis (research Question)

-Null hypothesis

 

Type I and II errors

When testing at the .05 level we are accepting a 5% chance we could be wrong no matter what.

Type I: concluding a relationship exists when in fact there is no relationship in the population.

Type II: concluding no relationship exists when an actual relationship does exist in the population.

 

Methods of reducing error

-increase sample size

-increase size of treatment effect (choosing treatments likely to have larger effects)

-decrease background variation (reduce nuisance variables)

-use a more sensitive experimental design.

 

 

Reports

It was hypothesised that older people would tend to watch less television.

In a random sample of 200 Australians, there was a weak, negative, linear relationship between age and time spent watching television, and Pearson’s r shows that this relationship is significant, r = −.29, n = 200, p < .001. The 95% confidence interval for Pearson’s correlation indicates that the strength of the relationship is between ρ = .16 and ρ = .41. In the sample, for each additional ten years in age, on average, respondents watched 1.3 hours less television per week.

As expected, older people do tend to watch less television.

 

————

It was hypothesised that students who spent more time working on statistics exercises in the week leading up to the exam would achieve higher exam results.

In a random sample of 100 students, there was a weak, negative, linear relationship between time spent on exercises and exam result, and Pearson’s r shows that this relationship is significant, r = −.26, n = 100, p = .010. The 95% confidence interval for Pearson’s correlation indicates that the strength of the relationship is between ρ = .06 and ρ = .43. In the sample, for each additional hour of study, on average, exam score was 1.2 marks lower.

Contrary to expectations, students who spend more time working on statistics exercises in the week leading up to the exam tend to get lower marks.

_____________

It was hypothesised that students who complete more revision exercises in the week leading up to the exam tend to get a higher exam mark.

In a random sample of 200 students, there was an extremely weak, positive, linear relationship between number of revision exercises completed and exam mark, and Pearson’s r shows that this relationship is not significant, r = .10, n = 200, p = .159.

There is insufficient evidence to suggest that there is a relationship between number of revision exercises completed in the week leading up to the exam and exam mark. A power analysis indicates that there was a 99% chance of detecting a correlation as weak as ρ = .30 if one exists in the population, so we can be confident that if there is any correlation in the population it is only weak.

____________

It was hypothesised that students who complete more revision exercises in the week leading up to the exam tend to get a higher exam mark.

In a random sample of 20 students, there was a weak, positive, linear relationship between number of revision exercises completed and exam mark, and Pearson’s r shows that this relationship is not significant, r = .27, n = 20, p = .243.

There is insufficient evidence to suggest that there is a relationship between number of revision exercises completed in the week leading up to the exam and exam mark. A power analysis indicates that there was only a 25% chance of detecting a correlation as weak as ρ = .30 if one exists in the population. The study should be repeated with more participants.

_____________

It was hypothesised that girls in year 8 would have higher English achievement scores than boys.

In a random sample of 240 children in year 8, the average English achievement score was higher for the girls (M = 17.71, SD = 4.52, n = 120) than for the boys (M = 15.52, SD = 4.14, n = 120), and an independent samples t-test shows that this difference in mean achievement is significant, t(238) = 3.91, p < .001. The 95% confidence interval indicates that the average English score is between 1.09 and 3.29 marks higher for girls in year 8 than for boys in year 8.

As expected, girls have higher English achievement scores than boys.

______________

Analysis of variance

 

Single factor analysis

A single IV with two or more conditions, like testing hyperactivity against jelly beans of three different colours.
Analysis of variance is based on the concept of variance, i.e. identifying the sources of variance and assessing their relative importance.
A) When there is no variation in test scores it's easy to identify effects of the IV.
B) It's important to limit nuisance variables, but we can never remove them completely.
C) Nuisance variables can occur both within and between groups. This is called background variation.
The more variation within groups, the harder it is to tell the effect of the IV.

-The IV effect is clearest when background variation is small and there are large differences between treatments.
The analysis of variance provides us with an objective measurement of whether the difference between the group means is due to chance or due to the IV.

With ANOVA we begin by looking at possible sources of variation. Variation within groups cannot be explained by the IV, so it must be due to nuisance variables.

Variation between groups may be caused by the IV, background variation, or a confounding factor. Even with random assignment it is only unlikely, not impossible, that nuisance variables differ between groups. To test this we use the F ratio.

The F ratio is variation between groups over variation within groups. Both reflect background variation (BV), but the between-groups variation also includes the treatment effect (TE), so:
F = (BV + TE) over BV

If the IV does not affect the DV then TE = 0, so F = (BV + 0) over BV, which should equal approximately 1, as numerator and denominator would be very similar values.

If the IV does affect the DV then we would expect BV + TE to be greater than BV, so the ratio would be more than 1: F > 1.

Calculating the F ratio
In order to calculate the F ratio we need to calculate the variation between groups and within groups, expressed as mean squares.
F = MS(a), the mean square between groups, over
MS(w), the mean square within groups

To find the sum of squared deviations within groups, SS(w), we take each score, subtract its group mean, square the result, and add them all up.
To get the mean square within groups, MS(w), we also need the degrees of freedom within groups, df(w).

Calculate df(w) = n (total number of scores) minus a (number of treatment groups).

Mean square within groups: MS(w) = SS(w) (a result like 42) OVER
df(w) (a result like 9), giving e.g. 42 / 9 = 4.67.

We do the same for the mean square between groups, MS(a); then

put it all together:

F = MS(a) (between groups) OVER
MS(w) (within groups), giving a result like 0.89.

With a result like that, close to 1, the differences could easily have occurred by chance; where F is well above 1 we can be fairly confident it was because of the treatment effect.
Basically we are looking for a value well above 1 here to show the IV had an effect.

When reporting F, it usually comes in a table that reports within- and between-groups statistics for the sum of squares (SS), degrees of freedom (df), and mean squares (MS), along with the final F score.
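The whole calculation can be sketched end to end; this reproduces the SS, df, MS, and F quantities from the steps above using made-up jelly-bean data:

```python
from statistics import mean

def one_way_anova(*groups):
    """F ratio for a single-factor, independent-groups design:
    F = MS(a) between groups over MS(w) within groups."""
    a = len(groups)                  # number of treatment groups
    n = sum(len(g) for g in groups)  # total number of scores
    grand = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (a - 1)  # df(a) = a - 1
    ms_within = ss_within / (n - a)    # df(w) = n - a
    return ms_between / ms_within

# made-up hyperactivity scores for three jelly-bean colours
red, green, blue = [4, 5, 6], [7, 8, 9], [4, 6, 5]
f = one_way_anova(red, green, blue)
print(round(f, 2))  # -> 9.0: well above 1, suggesting a treatment effect
```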

Sampling distribution of F
To decide if the F statistic is significant we need its sampling distribution.
1) Start with the assumption that all treatments are equally effective, i.e. no difference in the population means.

To build this distribution in the real world we would run many trials and chart all the F statistics; some would be well below F = 1 and some above.

We generally stick to our 5% level for F ratios, i.e. p < .05, meaning we are willing to be wrong 5% of the time: there is only a 5% chance F would be as high as the critical value if there were no
difference in the population means.

Two treatment conditions
Where only two conditions exist, the results will be the same as an independent samples t test (in fact F = t²).

Effect size
When we find a difference between the sample means and it's significant, we need to determine if it's important. To do this we look at both the difference in means and the variability observed.

E.g. if we were counting blue-eyed children and found 3 in a group of 5, that would be quite notable, but 3 in a group of 20 would not be.

To measure how important, we use η² (the Greek letter eta, squared).

η² = SS(a) (sum of squares between groups) over SS(t) (total sum of squares).
This is a lot like r² and tells us the proportion of variation that can be explained by the IV.

Levels of η² (Cohen):
.01 small, .06 medium, .14 large
E.g. η² = .16 means 16% of the variation in scores can be explained by the IV.
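The η² calculation itself is one division; a small illustration with hypothetical sums of squares:

```python
def eta_squared(ss_between, ss_total):
    """eta squared = SS(a) / SS(t): the proportion of variation explained by the IV."""
    return ss_between / ss_total

# hypothetical sums of squares: SS(a) = 18 between groups, SS(t) = 24 total
e2 = eta_squared(18, 24)
print(f"eta squared = {e2:.2f}: {e2:.0%} of the variation is explained by the IV")
# by Cohen's guidelines (.01 small, .06 medium, .14 large) this would be large
```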

Power analysis for one-way ANOVA
When conducting an experiment it is important to check the power of the sample; with a very small sample we are unlikely to be able to draw many conclusions even if the effect is large.

How large a sample depends on:
a) the size of the effect we want to detect, e.g. η² = .06 (moderate) with an 80% chance of detection
b) the level of statistical significance: a .05 risk of making a Type I error, i.e. seeing a relationship where one does not exist
c) the risk we are willing to take of making a Type II error: a .20 risk of failing to detect a relationship where one actually exists.

Cohen's power analysis tables are in the appendix.

Presenting results:
F ratio, degrees of freedom, and significance level.
Where the results are not significant, the means and df can paint a picture of why, relating to anything from sample size to treatments, groups, and background variation.

An alternative to showing a table is to show a graph.

Four assumptions of the F ratio

1) normal distribution
2) same variance (homogeneity of variance)
3) scores independent of each other (not confounded)
4) measurement is metric.

Assumption 1 can be checked with a Q-Q plot, and assumption 2 with Levene's test.

Reports should include
-research question
-type of analysis
-table of relevant stats
-conclusion relating to the significance and relationships
-relevant significance stats
-where significant, discussion of group means relating back to the question
-overall conclusion

Analytical comparisons in the single-factor independent-groups design
The F ratio is an omnibus statistic, meaning that if it is significant we can be confident there is a difference in the means, but we cannot say which means differ.

Comparisons allow us to test which means differ. In this way we can see whether there is a difference between any two means or sets of means.

When we look at comparisons we take into account the nuisance or background variation, but compare only the particular group means of interest. We do this by specifying coefficients.

experiment wise v


 

