Explain the difference between an independent and dependent sample.
Testing a Hypothesis for Dependent and Independent Samples

We have learned about hypothesis testing for proportions and means with both large and small samples. However, the examples in those lessons involved only one sample. In this lesson we will apply the principles of hypothesis testing to situations involving two samples. There are many situations in everyday life where we would perform statistical analysis involving two samples. For example, suppose that we wanted to test a hypothesis about the effect of two medications on curing an illness. Or we may want to test the difference between the mean SAT scores of males and females. In both of these cases, we would analyze both samples, and the hypothesis would address the difference between two sample means.

In this Concept, we will identify situations with different types of samples, calculate the pooled estimate of population variance for two samples, and calculate the test statistic for hypotheses about the difference of proportions or means between samples.

Dependent and Independent Samples

When we are working with one sample, we know that we need to select a random sample from the population, measure the sample statistic, and then make a hypothesis about the population based on that sample. When we work with two independent samples, we assume that if the samples are selected at random (or, in the case of medical research, the subjects are randomly assigned to a group), the two samples will vary only by chance and the difference will not be statistically significant. In short, when we have independent samples we assume that the scores of one sample do not affect the other.

Independent samples can occur in two scenarios. First, when testing the difference of the means between two fixed populations, we test the differences between samples from each population. When both samples are randomly selected, we can make inferences about the populations. Second, when working with subjects (people, pets, etc.), if we select a random sample and then randomly assign half of the subjects to one group and half to another, we can make inferences about the populations.

Dependent samples are a bit different. Two samples of data are dependent when each score in one sample is paired with a specific score in the other sample. In short, these types of samples are related to each other. Dependent samples can occur in two scenarios. In one, a group may be measured twice, such as in a pretest-posttest situation (scores on a test before and after the lesson). In the other, an observation in one sample is matched with an observation in the second sample.

To distinguish between tests of hypotheses for independent and dependent samples, we use a different symbol for hypotheses with dependent samples. For dependent sample hypotheses, we use the delta symbol \(\delta\) to symbolize the difference between the two samples. Therefore, in our null hypothesis we state that the difference of scores across the two measurements is equal to 0, that is, \(\delta = 0\), or:

\[ H_0: \delta = \mu_1 - \mu_2 = 0 \]

Calculating the Pooled Estimate of Population Variance

When testing a hypothesis about two independent samples, we follow a process similar to the one used when testing one random sample. However, when computing the test statistic, we need to calculate the estimated standard error of the difference between sample means:

\[ \text{s.e.}(\bar{x}_1 - \bar{x}_2) = \sqrt{s^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} \]

where \(n_1\) and \(n_2\) are the sizes of the two samples and \(s^2\) is the pooled sample variance, which is computed as:

\[ s^2 = \frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2} \]

Often, the top part of this formula is simplified by substituting the symbol \(SS\) for the sum of the squared deviations. Therefore, the formula is often expressed as:

\[ s^2 = \frac{SS_1 + SS_2}{n_1 + n_2 - 2} \]

Calculating \(s^2\)

Suppose we have two independent samples of student reading scores.
The data are as follows:

Sample 1    Sample 2
7           12
8           14
10          18
4           13
6           11
            10

From these samples, we can calculate a number of descriptive statistics that will help us solve for the pooled estimate of variance:

Descriptive Statistic                                          Sample 1    Sample 2
Number \(n\)                                                   5           6
Sum of Observations \(\sum x\)                                 35          78
Mean of Observations \(\bar{x}\)                               7           13
Sum of Squared Deviations \(\sum_{i=1}^{n} (x_i - \bar{x})^2\)   20          40

Using the formula for the pooled estimate of variance, we find that:

\[ s^2 = \frac{SS_1 + SS_2}{n_1 + n_2 - 2} = \frac{20 + 40}{5 + 6 - 2} = \frac{60}{9} \approx 6.67 \]

We will use this information to calculate the test statistic needed to evaluate the hypotheses.

Testing Hypotheses with Independent Samples

When testing hypotheses with two independent samples, we follow steps similar to those for testing one random sample: state the hypotheses, set the critical values from the chosen alpha level, compute the test statistic, and decide whether to reject the null hypothesis.
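As a check on the pooled estimate for the reading scores, the calculation can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library; the function names `pooled_variance` and `standard_error` are hypothetical and not part of the lesson.

```python
from math import sqrt


def sum_squared_deviations(sample):
    """SS: sum of squared deviations of each score from the sample mean."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample)


def pooled_variance(sample1, sample2):
    """Pooled estimate of variance: (SS1 + SS2) / (n1 + n2 - 2)."""
    ss_total = sum_squared_deviations(sample1) + sum_squared_deviations(sample2)
    return ss_total / (len(sample1) + len(sample2) - 2)


def standard_error(sample1, sample2):
    """Estimated standard error of the difference between the sample means."""
    s2 = pooled_variance(sample1, sample2)
    return sqrt(s2 * (1 / len(sample1) + 1 / len(sample2)))


# Reading scores from the two independent samples in the lesson.
reading_1 = [7, 8, 10, 4, 6]
reading_2 = [12, 14, 18, 13, 11, 10]

s2 = pooled_variance(reading_1, reading_2)
se = standard_error(reading_1, reading_2)
print(round(s2, 2))  # 6.67, matching the value in the lesson
print(round(se, 2))
```

Keeping the sum of squared deviations in its own helper mirrors the \(SS\) shorthand used in the pooled variance formula.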
When stating the null hypothesis, we assume there is no difference between the means of the two independent samples. Therefore, our null hypothesis in this case would be:

\[ H_0: \mu_1 = \mu_2 \quad \text{or} \quad H_0: \mu_1 - \mu_2 = 0 \]

As in the one-sample test, the critical values that we set to evaluate these hypotheses depend on our alpha level, and our decision regarding the null hypothesis is carried out in the same manner. However, since we have two samples, we calculate the test statistic a bit differently and use the formula:

\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\text{s.e.}(\bar{x}_1 - \bar{x}_2)} \]

where:

\(\bar{x}_1 - \bar{x}_2\) is the difference between the sample means
\(\mu_1 - \mu_2\) is the difference between the hypothesized population means
\(\text{s.e.}(\bar{x}_1 - \bar{x}_2)\) is the standard error of the difference between sample means

Evaluating the Difference Between Two Samples

The head of the English department is interested in the difference in writing scores between remedial freshman English students who are taught by different teachers. The incoming freshmen needing remedial services are randomly assigned to one of two English teachers and are given a standardized writing test after the first semester. We take a sample of eight students from one class and nine from the other. Is there a difference in achievement on the writing test between the two classes? Use a 0.05 significance level.

First, we generate our hypotheses based on the two samples:

\[ H_0: \mu_1 = \mu_2 \qquad H_a: \mu_1 \neq \mu_2 \]

This is a two-tailed test. For this example, we have two independent samples from the population and a total of 17 students that we are examining. Since our sample size is so low, we use the t-distribution. In this example, we have 15 degrees of freedom (the number in the samples minus 2), and with a 0.05 significance level and the t-distribution, we find that our critical values are 2.131 standard scores above and below the mean.

To calculate the test statistic, we first need to find the pooled estimate of variance from our sample.
The data from the two groups are as follows:

Sample 1    Sample 2
35          52
51          87
66          76
42          62
37          81
46          71
60          55
55          67
53

From these samples, we can calculate several descriptive statistics that will help us solve for the pooled estimate of variance:

Descriptive Statistic                                          Sample 1    Sample 2
Number \(n\)                                                   9           8
Sum of Observations \(\sum x\)                                 445         551
Mean of Observations \(\bar{x}\)                               49.44       68.875
Sum of Squared Deviations \(\sum_{i=1}^{n} (x_i - \bar{x})^2\)   862.22      1058.88

Therefore:

\[ s^2 = \frac{SS_1 + SS_2}{n_1 + n_2 - 2} = \frac{862.22 + 1058.88}{9 + 8 - 2} \approx 128.07 \]

and the standard error of the difference of the sample means is:

\[ \text{s.e.}(\bar{x}_1 - \bar{x}_2) = \sqrt{s^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} = \sqrt{128.07 \left( \frac{1}{9} + \frac{1}{8} \right)} \approx 5.50 \]

Using this information, we can finally solve for the test statistic:

\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\text{s.e.}(\bar{x}_1 - \bar{x}_2)} = \frac{(49.44 - 68.875) - 0}{5.50} \approx -3.53 \]

Since -3.53 is less than the lower critical value of -2.131, we decide to reject the null hypothesis and conclude there is a significant difference in the achievement of the students assigned to different teachers.

Testing Hypotheses about the Difference in Proportions between Two Independent Samples

Suppose we want to test if there is a difference between proportions of two independent samples. As discussed in the previous lesson, proportions are used extensively in polling and surveys, especially by people trying to predict election results. It is possible to test a hypothesis about the proportions of two independent samples by using a method similar to the one described above. We might perform these hypothesis tests in the following scenarios:
In testing hypotheses about the difference in proportions of two independent samples, we state the hypotheses and set the criterion for rejecting the null hypothesis in ways similar to the other hypothesis tests. In these types of tests we set the proportions of the samples equal to each other in the null hypothesis, \(H_0: p_1 = p_2\), and use the appropriate standard table to determine the critical values (remember, for small samples we generally use the t-distribution and for samples over 30 we generally use the z-distribution).

When solving for the test statistic in large samples, we use the formula:

\[ z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\text{s.e.}(p_1 - p_2)} \]

where:

\(\hat{p}_1, \hat{p}_2\) are the observed sample proportions
\(p_1, p_2\) are the population proportions under the null hypothesis
\(\text{s.e.}(p_1 - p_2)\) is the standard error of the difference between independent proportions

As with the standard error of the difference between independent sample means, we need to do a bit of work to calculate the standard error of the difference between independent proportions. To find the standard error under the null hypothesis, we assume that \(p_1 = p_2 = p\) and we use all the data to estimate \(p\):

\[ \hat{p} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2} \]

Now the standard error of the difference is:

\[ \text{s.e.}(p_1 - p_2) = \sqrt{\hat{p}(1 - \hat{p}) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} \]

Because \(p_1 - p_2 = 0\) under the null hypothesis, the test statistic is now:

\[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p}) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \]

Determining Statistical Difference

Suppose that we are interested in finding out which of two cities is more satisfied with the services provided by the city government. We take a survey and find the following results:

Number Satisfied           City 1          City 2
Yes                        122             84
No                         78              66
Sample Size                \(n_1 = 200\)   \(n_2 = 150\)
Proportion who said Yes    0.61            0.56

Is there a statistical difference in the proportions of citizens that are satisfied with the services provided by the city government? Use a 0.05 level of significance.

First, we establish the null and alternative hypotheses:

\[ H_0: p_1 = p_2 \qquad H_a: p_1 \neq p_2 \]

Since we have a large sample size, we will use the z-distribution. At a 0.05 level of significance, our critical values are \(\pm 1.96\). To solve for the test statistic, we must first solve for the standard error of the difference between proportions:

\[ \hat{p} = \frac{122 + 84}{200 + 150} \approx 0.589 \]

\[ \text{s.e.}(p_1 - p_2) = \sqrt{0.589 (1 - 0.589) \left( \frac{1}{200} + \frac{1}{150} \right)} \approx 0.053 \]
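The standard-error step for the city survey can be sketched in Python. This is a minimal illustration that assumes the pooled-proportion approach described in this section; the function name `two_proportion_z` is hypothetical.

```python
from math import sqrt


def two_proportion_z(yes1, n1, yes2, n2):
    """Return (pooled proportion, standard error, z) for H0: p1 = p2."""
    p1_hat = yes1 / n1
    p2_hat = yes2 / n2
    # Under the null hypothesis both samples estimate the same p,
    # so all of the data are combined into one pooled estimate.
    p_pooled = (yes1 + yes2) / (n1 + n2)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    return p_pooled, se, z


# Survey counts from the lesson: 122 of 200 satisfied in City 1, 84 of 150 in City 2.
p_pooled, se, z = two_proportion_z(122, 200, 84, 150)
print(round(p_pooled, 3))  # 0.589
print(round(se, 3))        # 0.053
print(round(z, 2))         # 0.94, inside the critical values of -1.96 and 1.96
```

Passing counts rather than rounded proportions avoids carrying rounding error from 0.61 and 0.56 into the pooled estimate.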
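Therefore, the test statistic is:

\[ z = \frac{\hat{p}_1 - \hat{p}_2}{\text{s.e.}(p_1 - p_2)} = \frac{0.61 - 0.56}{0.053} \approx 0.94 \]

Since 0.94 does not exceed the critical value 1.96, the null hypothesis is not rejected. Therefore, we conclude that the difference in the proportions could have occurred by chance, and that we do not have sufficient evidence of a difference in the level of satisfaction between citizens of the two cities.

Testing Hypotheses with Dependent Samples

When testing a hypothesis about two dependent samples, we follow the same process as when testing one random sample or two independent samples: state the hypotheses, set the critical values from the chosen alpha level, compute the test statistic, and decide whether to reject the null hypothesis.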
As mentioned in the section above, our null hypothesis for two dependent samples states that there is no difference between the scores across the two samples: \(H_0: \delta = \mu_1 - \mu_2 = 0\). We set the criterion for evaluating the hypothesis in the same way that we do in our other examples, by first establishing an alpha level and then finding the critical values by using the t-distribution table.

Calculating the test statistic for dependent samples is a bit different since we are dealing with two sets of data. The quantity that we first need to calculate is \(\bar{d}\), which is the difference in the means of the two samples. Therefore, \(\bar{d} = \bar{x}_1 - \bar{x}_2\). We also need to know the standard error of the difference between the two samples. Since our population variance is unknown, we estimate it by first using the formula for the standard deviation of the sample differences:

\[ s_d^2 = \frac{\sum (d - \bar{d})^2}{n - 1} \qquad s_d = \sqrt{s_d^2} \]

where:

\(s_d^2\) is the sample variance
\(d\) is the difference between corresponding pairs within the sample
\(\bar{d}\) is the difference between the means of the two samples
\(n\) is the number in the sample
\(s_d\) is the standard deviation

With the standard deviation, we can calculate the standard error using the following formula:

\[ \text{s.e.}(\bar{d}) = \frac{s_d}{\sqrt{n}} \]

After we calculate the standard error, we can use the general formula for the test statistic:

\[ t = \frac{\bar{d} - \delta}{\text{s.e.}(\bar{d})} \]

Evaluating the Relationship Between Two Samples

A math teacher wants to determine the effectiveness of her statistics lesson and gives a pre-test and a post-test to 9 students in her class. Our null hypothesis is that there is no difference between the means of the two samples, and our alternative hypothesis is that the two means of the samples are not equal.
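In other words, we are testing whether or not these two samples are related, or:

\[ H_0: \delta = \mu_1 - \mu_2 = 0 \qquad H_a: \delta = \mu_1 - \mu_2 \neq 0 \]

The results for the pre- and post-tests are below:

Subject    Pre-test Score    Post-test Score    \(d\) (difference)    \(d^2\)
1          78                80                 2                     4
2          67                69                 2                     4
3          56                70                 14                    196
4          78                79                 1                     1
5          96                96                 0                     0
6          82                84                 2                     4
7          84                88                 4                     16
8          90                92                 2                     4
9          87                92                 5                     25
Sum        718               750                32                    254
Mean       79.7              83.3               3.6

Using the information from the table above, first solve for the standard deviation of the differences, then the standard error of the difference, and finally the test statistic. Here each \(d\) is the post-test score minus the pre-test score, so \(\bar{d} = 32/9 \approx 3.56\).

Standard Deviation:

\[ s_d^2 = \frac{\sum (d - \bar{d})^2}{n - 1} = \frac{\sum d^2 - \frac{(\sum d)^2}{n}}{n - 1} = \frac{254 - \frac{32^2}{9}}{8} \approx 17.53 \qquad s_d = \sqrt{17.53} \approx 4.19 \]

Standard Error of the Difference:

\[ \text{s.e.}(\bar{d}) = \frac{s_d}{\sqrt{n}} = \frac{4.19}{\sqrt{9}} \approx 1.40 \]

Test Statistic (t-Test):

\[ t = \frac{\bar{d} - \delta}{\text{s.e.}(\bar{d})} = \frac{3.56 - 0}{1.40} \approx 2.55 \]

With 8 degrees of freedom (number of observations minus 1) and a significance level of 0.05, we find our critical values to be \(\pm 2.306\). Since our test statistic exceeds the upper critical value, we can reject the null hypothesis that the mean difference is zero and conclude that the lesson had an effect on student achievement.

Example

Example 1

You have obtained the number of years of education from one random sample of 38 police officers from City A and the number of years of education from a second random sample of 30 police officers from City B. The average years of education for the sample from City A is 15 years with a standard deviation of 2 years. The average years of education for the sample from City B is 14 years with a standard deviation of 2.5 years. Is there a statistically significant difference between the education levels of police officers in City A and City B?

First, find the test statistic. Using the pooled estimate of variance from this lesson:

\[ s^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} = \frac{37 (2)^2 + 29 (2.5)^2}{38 + 30 - 2} \approx 4.99 \]

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} = \frac{15 - 14}{\sqrt{4.99 \left( \frac{1}{38} + \frac{1}{30} \right)}} \approx 1.83 \]

This is a t-statistic with 66 degrees of freedom. This is a two-sided test, with a p-value of 0.07. Since this is greater than 0.05, we fail to reject the null hypothesis. This means that we do not have evidence of a statistically significant difference between the education levels of police officers in the two cities.

Review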
(a) Does this scenario involve dependent or independent samples? Explain. (b) What would the hypotheses be for this scenario? (c) Compute the pooled estimate for population variance. (d) Calculate the estimated standard error for this scenario. (e) What is the test statistic and at an alpha level of .05 what conclusions would you make about the null hypothesis?
(a) What would be the hypotheses for this scenario? (b) Calculate the estimated standard deviation for this scenario. (c) Compute the standard error of the difference for these samples. (d) What is the test statistic and at an alpha level of .05 what conclusions would you make about the null hypothesis?
Complete the hypothesis test to determine if the two types of music have different effects upon the ability of college students to perform a series of mental tasks requiring concentration. (source: Vassar College)
Review (Answers)

To view the Review answers, open this PDF file and look for section 8.6.

Vocabulary

Term: Definition
dependent samples: Dependent samples occur when you have two samples that do affect one another.
independent samples: Independent samples occur when you have two samples that do not affect one another.
likelihood: The likelihood is the test statistic (t) associated with two dependent samples.
proportions associated with two independent samples: The z-score is the test statistic associated with two independent samples used when testing the proportion associated with two independent samples.

Additional Resources

Video: t-Test Two Sample for Means Hypothesis Test
Practice: Dependent and Independent Samples

This page titled 9.7: Dependent and Independent Samples is shared under a CK-12 license and was authored, remixed, and/or curated by CK-12 Foundation via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.

What is the difference between two dependent samples and two independent samples?
Dependent samples occur when you have two samples that do affect one another. Independent samples occur when you have two samples that do not affect one another.
What is a dependent sample?
Two samples are dependent (or consist of matched pairs) if the members of one sample can be used to determine the members of the other sample. Tip: words such as dependent, repeated, before and after, matched pairs, and paired are hints that the samples are dependent.
What is the difference between independent and paired samples?
Paired-samples t tests compare scores on two different variables but for the same group of cases; independent-samples t tests compare scores on the same variable but for two different groups of cases.
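To make the contrast concrete, here is a minimal Python sketch that computes both kinds of t statistic with the standard library, using the writing-score and pre-test/post-test data from this lesson. The function names `independent_t` and `paired_t` are hypothetical, and the independent test assumes the pooled-variance form used in this lesson.

```python
from math import sqrt
from statistics import mean, stdev, variance


def independent_t(sample1, sample2):
    """Pooled-variance t statistic and degrees of freedom for two separate groups."""
    n1, n2 = len(sample1), len(sample2)
    pooled = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    se = sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(sample1) - mean(sample2)) / se, n1 + n2 - 2


def paired_t(first, second):
    """t statistic and degrees of freedom for paired scores on the same cases."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for a, b in zip(first, second)]  # second minus first, pair by pair
    se = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / se, len(diffs) - 1


# Independent samples: writing scores from two different classes.
class_1 = [35, 51, 66, 42, 37, 46, 60, 55, 53]
class_2 = [52, 87, 76, 62, 81, 71, 55, 67]
t_ind, df_ind = independent_t(class_1, class_2)
print(round(t_ind, 2), df_ind)  # -3.53 15

# Dependent samples: the same nine students before and after the lesson.
pre = [78, 67, 56, 78, 96, 82, 84, 90, 87]
post = [80, 69, 70, 79, 96, 84, 88, 92, 92]
t_pair, df_pair = paired_t(pre, post)
print(round(t_pair, 2), df_pair)  # 2.55 8
```

The paired version works on the list of within-pair differences, which is why it requires equal-length samples, while the independent version never matches individual scores across groups.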
What does it mean when samples are independent?
Independent samples are samples that are selected randomly so that their observations do not depend on the values of other observations. Many statistical analyses are based on the assumption that samples are independent.