########################################################
# Name: John Palomino
# CSC-315
# Lab #6: Hypothesis Tests -- Proportions
########################################################
##########################################################################
# Add R code to the script below and create a Notebook to complete
# the steps and explicitly answer the following questions
##########################################################################
# A CBS news poll that surveyed 2,226 registered voters towards the
# end of August 2020 found that 1,158 (52%) answered 'Yes' to the question
# "Are you better off today than you were 4 years ago". Let p.hat be the
# proportion of all registered voters that would answer 'Yes' to that question.
# Link: https://www.cbsnews.com/news/republicans-economy-coronavirus-opinion-poll-cbs-news-battleground-tracker/
#
# Use the above information to answer questions 1 - 2
Question 1 -- [9 / 12 points] ❌Your calculations are correct, but you need to use dnorm to generate the correct curve for the normal distributions
# 1) Complete the following questions to carry out a hypothesis test 'manually' using
# the information from the CBS poll.
# a) What is the mean and standard deviation of the distribution of p.hat
# under the null hypothesis that p = 0.50.
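# Under H0 (p = 0.50), p.hat is approximately normal with mean p_0 and
# standard deviation sqrt(p_0*(1-p_0)/n). The mean is the null value 0.5,
# not the observed proportion 1158/2226.
p_0=0.5
n=2226
mean.p.hat=p_0
mean.p.hat
## [1] 0.5
std.dev=sqrt(p_0*(1-p_0)/n)
std.dev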
## [1] 0.0105976
# b) Graph the distribution of p.hat and draw a vertical line at
# p.hat = 1158/2226, the proportion answering 'Yes' to the question.
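# Under H0, p.hat is approximately normal with mean p_0 and sd std.dev,
# so draw its density curve with dnorm rather than a histogram of random draws.
x=seq(p_0-4*std.dev,p_0+4*std.dev,length.out=200)
plot(x,dnorm(x,mean=p_0,sd=std.dev),type="l",xlab="p.hat",ylab="density")
abline(v=1158/2226,col="blue",lwd=2)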

# c) Calculate the z test statistic and graph its distribution under the
# null hypothesis as was done in class, drawing a vertical line at the
# z statistic. Find the p-value based on this test statistic. The
# z statistic should be around 1.907568 and the p-value should be about
# 0.056447.
p.hat=1158/2226
p_0=0.5
z_statistic=(p.hat-p_0)/std.dev
z_statistic
## [1] 1.907568
p_value=2*(1-pnorm(z_statistic))
p_value
## [1] 0.05644713
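# Under H0 the z statistic is standard normal; plot its density with dnorm
# and keep z_statistic intact instead of overwriting it with random draws.
z=seq(-4,4,length.out=200)
plot(z,dnorm(z),type="l",xlab="z",ylab="density")
abline(v=z_statistic,col="blue",lwd=2)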

Question 2 -- [8 / 8 points] ✅
# 2) You will now use prop.test to complete the hypothesis test, by
# following the steps below.
# a) Use the prop.test function to conduct the hypothesis test without
# the continuity correction. Calculate the z test statistic from the
# prop.test object and extract the p-value (Note: these should match the
# test statistic and p-value from parts 1(b) and 1(c)).
prop.test(1158,2226,p=0.5,alternative="two.sided",correct=FALSE)
##
## 1-sample proportions test without continuity correction
##
## data: 1158 out of 2226, null probability 0.5
## X-squared = 3.6388, df = 1, p-value = 0.05645
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.4994447 0.5409169
## sample estimates:
## p
## 0.5202156
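# A minimal sketch of the extraction step asked for in 2(a). The names res,
# z_from_test and p_from_test are illustrative and do not appear in the
# original script. For a one-sample test, prop.test reports X-squared = z^2,
# so the sign of z is recovered here from the direction of p.hat - p_0.

```r
res <- prop.test(1158, 2226, p = 0.5, alternative = "two.sided", correct = FALSE)

# X-squared is z^2; restore the sign from the sample estimate
z_from_test <- unname(sign(res$estimate - 0.5) * sqrt(res$statistic))

# The p-value is stored directly on the test object
p_from_test <- res$p.value

z_from_test   # should agree with the manual z statistic (about 1.9076)
p_from_test   # should agree with the manual p-value (about 0.0564)
```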
# b) Your p-value should be about 0.0564. Based on this p-value, state the
# conclusion regarding whether registered voters believe that
# they are better off today than they were 4 years ago, using a level
# of significance value of 0.05. Is there evidence that a majority agree
# or disagree with the statement?
# Since the p-value (about 0.0564) is greater than 0.05, we fail to reject the null hypothesis.
# There is therefore not sufficient evidence that a majority agree or disagree with the statement.
Question 3 -- [10 / 16 points] ❌For (a), what is p the probability of? The null probability should be 1/6, which is the probability of guessing a die roll correctly by chance.
For (b), what is the z test statistic?
# 3) A person who claims to be a psychic says that she can correctly
# predict the outcome of a roll of a standard die (with numbers 1-6)
# more times than what would be expected by chance. When you roll a
# die 50 times, she correctly predicts the outcome 12 times.
# Use the prop.test function to complete (b) and (c).
# a) State the null and alternative hypothesis corresponding to this claim.
# Make sure to define the parameter (p) which should be used in your hypotheses.
# The alternative hypothesis should also be two-tailed.
# p = the probability that the psychic correctly predicts the outcome of a die roll
# H0: p is equal to 1/6 (the chance of guessing a roll correctly)
# HA: p is not equal to 1/6
# b) Find the z test statistic
# The null probability is 1/6, the chance of guessing a die roll correctly.
# prop.test applies the continuity correction by default, which is what
# gives the p-value of about 0.2295 quoted in part (d).
res=prop.test(12,50,p=1/6,alternative="two.sided")
# z is the square root of X-squared (1.444); it is positive because the
# sample proportion (0.24) is above 1/6, giving z of about 1.20.
z_statistic=sqrt(res$statistic)
z_statistic
# c) Find the p-value
# Same test as in (b), with null probability 1/6; the p-value is about 0.2295.
res=prop.test(12,50,p=1/6,alternative="two.sided")
res$p.value
# d) The p-value should be around 0.2295. Using this p-value, state the conclusion
# regarding whether or not the person's ability to predict the outcome of the
# die is different than what would be expected by chance.
# Since the p-value is greater than 0.05, we fail to reject the null hypothesis.
# Thus, there is not sufficient evidence that the person's ability to predict
# the outcome of the die is different than would be expected by chance.
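# A short sketch of where the p-value of about 0.2295 comes from, assuming
# the usual continuity-corrected z statistic (|x - n*p_0| reduced by 0.5).
# The variable names below are illustrative and not part of the original lab.

```r
x_correct <- 12      # correct predictions
n_rolls   <- 50      # die rolls
p_chance  <- 1/6     # probability of a correct guess under H0

# Continuity-corrected z: shrink |x - n*p_0| by 0.5 before standardizing
z_cc <- (abs(x_correct - n_rolls * p_chance) - 0.5) /
  sqrt(n_rolls * p_chance * (1 - p_chance))
p_cc <- 2 * pnorm(-abs(z_cc))

# prop.test uses correct = TRUE by default, so the two p-values should agree
res_cc <- prop.test(x_correct, n_rolls, p = p_chance)
c(manual = p_cc, prop.test = res_cc$p.value)
```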
Question 4 -- [8 / 12 points] ❌You need to make sure you are finding the area in both tails. This is best accomplished by using: 2*pnorm(-abs(Z)).
# 4) Find the p-values associated with the following z test statistics, and
# state whether you would reject or fail to reject the null hypothesis
# at alpha = 0.05.
# a) z = 2.15
# Two-sided p-value: area in both tails beyond |z|
p_value=2*pnorm(-abs(2.15))
p_value
# The p-value (about 0.0316) is less than 0.05, so we reject the null hypothesis.
# b) z = -1.94
p_value=2*pnorm(-abs(-1.94))
p_value
# The p-value (about 0.0524) is greater than 0.05, so we fail to reject the null hypothesis.
# c) z = 1.05
p_value=2*pnorm(-abs(1.05))
p_value
# The p-value (about 0.2937) is greater than 0.05, so we fail to reject the null hypothesis.
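# The two-tailed rule from the feedback, 2*pnorm(-abs(Z)), can be wrapped in
# a small helper and applied to all three z statistics at once. This is only
# a sketch; two_sided_p, z_values and p_values are hypothetical names.

```r
# Two-sided p-value for a z statistic: area in both tails beyond |z|
two_sided_p <- function(z) 2 * pnorm(-abs(z))

z_values <- c(a = 2.15, b = -1.94, c = 1.05)
p_values <- two_sided_p(z_values)

p_values          # p-values for parts (a), (b), (c)
p_values < 0.05   # TRUE means reject H0 at alpha = 0.05
```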
# In May of 2020, a survey of 10,957 American adults found that 72% of them
# (7,889) would "definitely" or "probably" get a coronavirus vaccine if one
# were available. In September, a similar survey of 10,093 American
# adults found that only 51% (5,147) would get a vaccine.
# Ref: https://www.pewresearch.org/science/2020/09/17/u-s-public-now-divided-over-whether-to-get-covid-19-vaccine/
# For questions (5) and (6), you will carry out a hypothesis test to evaluate whether
# the proportion of American adults willing to get a vaccine has changed.
# Note that the null and alternative hypotheses are as follows:
# H0: pMay - pSept = 0
# H1: pMay - pSept != 0
# where pMay and pSept are the proportion of American adults, in May and September,
# who would "definitely" or "probably" get a coronavirus vaccine.
Question 5 -- [16 / 20 points] ❌For (c) you need to use common.prop instead of p1 and p2. Otherwise what you have is correct, though you didn't do (e).
# 5) First, carry out the hypothesis test manually by completing the steps below.
# a) Show that the difference between sample proportions is about 0.210039
pMay=7889/10957
pSep=5147/10093
prop.diff=pMay-pSep
prop.diff
## [1] 0.210039
# b) Show that the estimate of the common population proportion
# (i.e., the value of p on slide 17) is about 0.6193.
y1=7889
y2=5147
n1=10957
n2=10093
common.prop=(y1+y2)/(n1+n2)
common.prop
## [1] 0.6192874
# c) Show that the standard deviation of the difference between the two
# proportions (i.e., the value of the denominator of Z on slide 17)
# is approximately 0.006699.
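# Under H0 the two groups share one proportion, so the standard deviation
# uses the pooled estimate common.prop from (b), not p1 and p2 separately.
std.dev=sqrt(common.prop*(1-common.prop)*(1/n1+1/n2))
std.dev
# approximately 0.006699, as stated in the question
# d) Calculate the z test statistic by dividing your answer to (a) by your answer
# to (c)
z_statistic=prop.diff/std.dev
z_statistic
# approximately 31.35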
# e) Find the z test statistic by carrying out the hypothesis test WITHOUT the
# continuity correction, and calculate the z test statistic from the prop.test
# object (this should match your answer to (d))
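# A possible way to complete (e), sketched under the assumption that the
# two-sample prop.test statistic without continuity correction equals the
# square of the pooled z statistic. The names res2, p_may, p_sep and
# z_two_sample are illustrative only.

```r
# Two-sample test: vector of successes, then vector of sample sizes
res2 <- prop.test(c(7889, 5147), c(10957, 10093),
                  alternative = "two.sided", correct = FALSE)

p_may <- 7889 / 10957
p_sep <- 5147 / 10093

# X-squared is z^2; the sign of z follows the sign of p_may - p_sep
z_two_sample <- unname(sign(p_may - p_sep) * sqrt(res2$statistic))
z_two_sample   # should match (d) when (c) uses the pooled proportion
```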
Question 6 -- [4 / 8 points] ❌Your conclusion is correct, but for
prop.test, you need to specify a vector of "successes" and a vector of sample sizes:
res <- prop.test(c(7889, 5147), c(10957, 10093))
# 6) Complete the hypothesis test by carrying out the steps below:
# prop.test needs a vector of successes and a vector of sample sizes to
# compare the two groups; passing p=pSep instead runs a one-sample test.
prop.test(c(7889,5147),c(10957,10093),alternative="two.sided",correct=FALSE)
# X-squared is about 983 on 1 df, and the p-value is < 2.2e-16.
# a) To be a little more accurate, use prop.test to find the p-value WITH the
# continuity correction. What is the p-value?
prop.test(c(7889,5147),c(10957,10093),alternative="two.sided",correct=TRUE)
# With the continuity correction the p-value is still < 2.2e-16.
# b) The p-value should be very close to 0 (well below 0.05). Assuming that this
# is the case, state the conclusion regarding whether the proportion of
# Americans willing to get a coronavirus vaccine has changed between May and
# September of 2020.
# Since the p-value is less than 0.05, we reject the null hypothesis. Hence, there is
# evidence that the proportion of Americans willing to get a coronavirus vaccine changed
# between May and September of 2020.
Question 7 -- [8 / 8 points] ✅Great job!
# 7) For question (3), the null hypothesis is that the "psychic" does not do
# better or worse than random chance; the alternative hypothesis is that
# the psychic's predictive abilities are different than what is expected
# from random chance. Complete the following to specify what it would mean
# if a Type I or Type II error occurred, in the context of the problem.
# A Type I error means that we conclude that the psychic's predictive
# abilities are different than what is expected
# from random chance, but in reality the "psychic" does not do better or
# worse than random chance.
# A Type II error means that we conclude that the "psychic" does not do better or worse than random chance
# but in reality, the psychic's predictive abilities are different than what is expected
# from random chance.
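# To make the Type I error concrete, the sketch below simulates many
# "psychics" who are purely guessing (H0 true, p = 1/6) and counts how often
# prop.test rejects at alpha = 0.05. The seed, the number of simulations and
# all variable names are arbitrary choices made for illustration.

```r
set.seed(315)        # arbitrary seed, used only for reproducibility
n_rolls  <- 50       # rolls per simulated experiment
p_chance <- 1/6      # true success probability when H0 holds
n_sim    <- 2000     # number of simulated experiments

# Number of correct guesses in each simulated experiment under H0
n_correct <- rbinom(n_sim, size = n_rolls, prob = p_chance)

# Two-sided p-value from prop.test (default continuity correction) per experiment
p_vals <- sapply(n_correct, function(x) prop.test(x, n_rolls, p = p_chance)$p.value)

# Share of experiments that wrongly reject H0: the estimated Type I error rate
type1_rate <- mean(p_vals < 0.05)
type1_rate
```

# Each rejection counted here is a Type I error, because every simulated
# "psychic" is guessing at random.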
Question 8 -- [8 / 8 points] ✅
# 8) For question (5), the null hypothesis is that there is no change in
# the proportion of American adults willing to get a coronavirus vaccine
# between May and September; the alternative hypothesis is that the
# proportion has changed. Complete the following to specify what it would mean
# if a Type I or Type II error occurred, in the context of the problem.
# A Type I error means that we conclude that the proportion has changed
# but in reality, there is no change in the proportion of American adults willing to get a coronavirus vaccine
# between May and September.
# A Type II error means that we conclude that there is no change in
# the proportion of American adults willing to get a coronavirus vaccine
# between May and September
# but in reality, the proportion has changed.