Total grade: 71/92 (77%)

########################################################
# Name: John Palomino
# CSC-315
# Lab #6: Hypothesis Tests -- Proportions 
########################################################
  
##########################################################################
# Add R code to the script below and create a Notebook to complete
# the steps and explicitly answer the following questions
##########################################################################

#    A CBS News poll that surveyed 2,226 registered voters towards the
#    end of August 2020 found that 1,158 (52%) answered 'Yes' to the question
#    "Are you better off today than you were 4 years ago?" Let p be the
#    proportion of all registered voters that would answer 'Yes' to that question,
#    and let p.hat be the sample proportion answering 'Yes'.
#    Link: https://www.cbsnews.com/news/republicans-economy-coronavirus-opinion-poll-cbs-news-battleground-tracker/
# 
#    Use the above information to answer questions 1 - 2

Question 1 -- [9 / 12 points] ❌

Your calculations are correct, but you need to use dnorm to generate the correct curves for the normal distributions.

# 1) Complete the following questions to carry out a hypothesis test 'manually'
#    using the information from the CBS poll.
# a) What is the mean and standard deviation of the distribution of p.hat
#    under the null hypothesis that p = 0.50?
# GD: this gives error
#mean=p.hat
p_0=0.5
n=2226
mean=p_0                      # under H0, p.hat is centered at p_0 = 0.5
std.dev=sqrt(p_0*(1-p_0)/n)   # standard deviation of p.hat under H0
std.dev
## [1] 0.0105976
# b) Graph the distribution of p.hat and draw a vertical line at 
#    p.hat = 1158/2226, the proportion answering 'Yes' to the question.
set.seed(0.5)
p.hat=rnorm(50,1158/2226,0.0105976)
hist(p.hat,col="sky blue")
abline(v=1158/2226,col="blue",lwd=2)
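The grader's note for Question 1 asks for a density curve drawn with dnorm rather than a histogram of simulated draws. A minimal sketch, assuming the null model p = 0.5 and the standard error from part (a); the names `se` and `x` are illustrative, not part of the assignment.

```r
# Null hypothesis and sample size from the CBS poll
p_0 <- 0.5
n <- 2226
se <- sqrt(p_0 * (1 - p_0) / n)   # SD of p.hat under H0, about 0.0106
p.hat <- 1158 / 2226              # observed sample proportion

# Grid of values spanning about +/- 4 standard errors around p_0
x <- seq(p_0 - 4 * se, p_0 + 4 * se, length.out = 500)

# Exact normal density of p.hat under H0 (no random sampling needed)
plot(x, dnorm(x, mean = p_0, sd = se), type = "l", lwd = 2,
     xlab = "p.hat", ylab = "density",
     main = "Distribution of p.hat under H0: p = 0.5")
abline(v = p.hat, col = "blue", lwd = 2)   # observed proportion
```

Because dnorm evaluates the density directly, the curve is centered at the null value 0.5 and does not depend on a random seed, unlike a histogram of 50 rnorm draws.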

# c) Calculate the z test statistic and graph its distribution under the 
#    null hypothesis as was done in class, drawing a vertical line at the 
#    z statistic. Find the p-value based on this test statistic. The 
#    z statistic should be around 1.907568 and the p-value should be about 
#    0.05645.
p.hat=1158/2226
p_0=0.5
z_statistic=(p.hat-p_0)/std.dev
z_statistic
## [1] 1.907568
p_value=2*(1-pnorm(z_statistic))
p_value
## [1] 0.05644713
z_statistic=rnorm(50,0,1)
hist(z_statistic,col="sky blue")
abline(v=1.907568,col="blue",lwd=2)
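Similarly for part (c), the null distribution of the z statistic is the standard normal. A sketch, assuming the two-sided test from part (c); shading both tails beyond |z| shows where the p-value comes from. The names `z`, `x`, `right`, and `left` are illustrative.

```r
# z statistic from the CBS poll under H0: p = 0.5
p.hat <- 1158 / 2226
p_0 <- 0.5
n <- 2226
z <- (p.hat - p_0) / sqrt(p_0 * (1 - p_0) / n)   # about 1.9076

# Standard normal density curve drawn with dnorm
x <- seq(-4, 4, length.out = 500)
plot(x, dnorm(x), type = "l", lwd = 2,
     xlab = "z", ylab = "density", main = "Distribution of z under H0")

# Shade both tails beyond |z|; their combined area is the two-sided p-value
right <- x[x >= abs(z)]
left  <- x[x <= -abs(z)]
polygon(c(abs(z), right, max(right)), c(0, dnorm(right), 0), col = "skyblue")
polygon(c(min(left), left, -abs(z)), c(0, dnorm(left), 0), col = "skyblue")
abline(v = z, col = "blue", lwd = 2)

p_value <- 2 * pnorm(-abs(z))   # about 0.0564
```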

Question 2 -- [8 / 8 points] ✅

# 2) You will now use prop.test to complete the hypothesis test by
#    following the steps below.
# a) Use the prop.test function to conduct the hypothesis test without
#    the continuity correction. Calculate the z test statistic from the
#    prop.test object and extract the p-value. (Note: these should match the
#    test statistic and p-value from part 1(c).)
prop.test(1158,2226,p=0.5,alternative="two.sided",correct=FALSE)
## 
##  1-sample proportions test without continuity correction
## 
## data:  1158 out of 2226, null probability 0.5
## X-squared = 3.6388, df = 1, p-value = 0.05645
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.4994447 0.5409169
## sample estimates:
##         p 
## 0.5202156
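Part (a) also asks for the z statistic and p-value to be pulled out of the prop.test object, which the output above only prints. A minimal sketch: prop.test returns an `htest` list whose `statistic` is the chi-squared value (X-squared = z^2) and whose `p.value` holds the p-value, so taking the square root and restoring the sign of p.hat - p_0 recovers z. The names `res`, `z`, and `p_value` are illustrative.

```r
# One-sample test without continuity correction, as in part 2(a)
res <- prop.test(1158, 2226, p = 0.5, alternative = "two.sided", correct = FALSE)

# X-squared equals z^2, so z is its square root with the sign of (p.hat - p_0)
p.hat <- unname(res$estimate)
z <- sign(p.hat - 0.5) * sqrt(unname(res$statistic))
p_value <- res$p.value

z        # should match the manual value from 1(c), about 1.9076
p_value  # about 0.0564
```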
# b) Your p-value should be about 0.0564. Based on this p-value, state the 
#    conclusion regarding whether registered voters believe that
#    they are better off today than they were 4 years ago, using a level
#    of significance value of 0.05. Is there evidence that a majority agree 
#    or disagree with the statement?
#    Since the p-value (0.0564) is greater than 0.05, we fail to reject the null hypothesis.
#    Therefore, there is not sufficient evidence at the 0.05 level that a majority
#    agree or disagree with the statement.

Question 3 -- [10 / 16 points] ❌

For (a), what is p the probability of? The null probability should be 1/6, which is the probability of guessing a die roll correctly by chance. For (b), what is the z test statistic?

# 3) A person who claims to be a psychic says that she can correctly
#    predict the outcome of a roll of a standard die (with numbers 1-6)
#    more times than what would be expected by chance. When you roll a
#    die 50 times, she correctly predicts the outcome 12 times.
#    Use the prop.test function to complete (b) and (c).
# a) State the null and alternative hypotheses corresponding to this claim.
#    Make sure to define the parameter (p) which should be used in your hypotheses.
#    The alternative hypothesis should also be two-tailed.
#    p = the probability to compare against
#    H0: p is equal to 0.5
#    HA: p is not equal to 0.5
# b) Find the z test statistic
prop.test(12,50,p=0.5,alternative="two.sided",correct=FALSE)
## 
##  1-sample proportions test without continuity correction
## 
## data:  12 out of 50, null probability 0.5
## X-squared = 13.52, df = 1, p-value = 0.000236
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.1429739 0.3741268
## sample estimates:
##    p 
## 0.24
# c) Find the p-value
prop.test(12,50,p=0.5,alternative="two.sided",correct=FALSE)
## 
##  1-sample proportions test without continuity correction
## 
## data:  12 out of 50, null probability 0.5
## X-squared = 13.52, df = 1, p-value = 0.000236
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.1429739 0.3741268
## sample estimates:
##    p 
## 0.24
# d) The p-value should be around 0.2295. Using this p-value, state the conclusion
#    regarding whether or not the person's ability to predict the outcome of the 
#    die is different than what would be expected by chance.
#    Since the p-value is greater than 5%, we fail to reject the null hypothesis.
#    Thus, there is no significant evidence that the person's ability to predict
#    the outcome of the die is different than what would be expected by chance.
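Following the grader's note, p here is the probability that the person correctly predicts a single roll, which is 1/6 under pure guessing. A sketch of (a) through (c), assuming H0: p = 1/6 versus HA: p != 1/6; with prop.test's default continuity correction the p-value comes out near the 0.2295 quoted in (d). The names `x`, `n`, `p_0`, `res`, and `z` are illustrative.

```r
# H0: p = 1/6 (chance of guessing one roll of a fair die)
# HA: p != 1/6 (two-tailed)
x <- 12    # correct predictions
n <- 50    # rolls
p_0 <- 1/6

# prop.test applies the continuity correction by default (correct = TRUE)
res <- prop.test(x, n, p = p_0, alternative = "two.sided")

# (b) z statistic: square root of X-squared, signed by (p.hat - p_0)
z <- sign(x / n - p_0) * sqrt(unname(res$statistic))   # about 1.20

# (c) p-value
p_value <- res$p.value                                   # about 0.2295
```

With this p-value well above 0.05, the conclusion in (d) stands: there is no significant evidence that her predictions differ from chance.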
Question 4 -- [8 / 12 points] ❌

You need to make sure you are finding the area in both tails. This is best accomplished by using: 2*pnorm(-abs(Z)).

# 4) Find the p-values associated with the following z test statistics, and
#    state whether you would reject or fail to reject the null hypothesis
#    at alpha = 0.05.
# a)   z = 2.15
p_value=1-pnorm(2.15)
#    Since the p-value is less than 5%, we reject the null hypothesis.
# b)   z = -1.94
p_value=1-pnorm(-1.94)
p_value
## [1] 0.9738102
#    Since the p-value is greater than 5%, we fail to reject the null hypothesis.
# c)   z = 1.05
p_value=pnorm(1.05)
p_value
## [1] 0.8531409
#    Since the p-value is greater than 5%, we fail to reject the null hypothesis.
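As the grader notes, a two-sided p-value must count the area in both tails. A minimal sketch applying 2*pnorm(-abs(z)) to all three statistics; `z`, `p_values`, and `decision` are illustrative names.

```r
# z test statistics from parts (a), (b), and (c)
z <- c(a = 2.15, b = -1.94, c = 1.05)

# Two-sided p-value: area in both tails beyond |z|
p_values <- 2 * pnorm(-abs(z))

# Decision at alpha = 0.05
decision <- ifelse(p_values < 0.05, "reject H0", "fail to reject H0")
data.frame(z = z, p_value = round(p_values, 4), decision = decision)
```

The decisions match the ones above, but the p-values themselves change: about 0.0316 for (a), 0.0524 for (b), and 0.2937 for (c).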
#    In May of 2020, a survey of 10,957 American adults found that 72% of them
#    (7,889) would "definitely" or "probably" get a coronavirus vaccine if one 
#    were available. In September, a similar survey of 10,093 American
#    adults found that only 51% (5,147) would get a vaccine.

#    Ref: https://www.pewresearch.org/science/2020/09/17/u-s-public-now-divided-over-whether-to-get-covid-19-vaccine/

#    For questions (5) and (6), you will carry out a hypothesis test to evaluate whether
#    the proportion of American adults willing to get a vaccine has changed.
#    Note that the null and alternative hypotheses are as follows:

#    H0: pMay - pSept = 0
#    H1: pMay - pSept != 0

#   where pMay and pSept are the proportion of American adults, in May and September,
#   who would "definitely" or "probably" get a coronavirus vaccine.

Question 5 -- [16 / 20 points] ❌

For (c) you need to use common.prop instead of p1 and p2. Otherwise what you have is correct, though you didn't do (e).

# 5) First, carry out the hypothesis test manually by completing the steps below.
# a) Show that the estimated difference between the population proportions is
#    about 0.210039.
pMay=7889/10957
pSep=5147/10093
prop.diff=pMay-pSep
prop.diff
## [1] 0.210039
# b) Show that the estimate of the common population proportion 
#     (i.e., the value of p on slide 17) is about 0.6193.
y1=7889
y2=5147
n1=10957
n2=10093
common.prop=(y1+y2)/(n1+n2)
common.prop
## [1] 0.6192874
# c) Show that the standard deviation of the difference between the two 
#    proportions (i.e., the value of the denominator of Z on slide 17)
#    is approximately 0.006699.
p1=pMay
p2=pSep
std.dev=sqrt((p1*(1-p1)/n1)+(p2*(1-p2)/n2))
std.dev
## [1] 0.006569563
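Per the grader's note on (c), the standard error under H0: pMay = pSept uses the pooled (common) proportion rather than the separate sample proportions. A sketch reusing the survey counts; `se.pooled` and `z` are illustrative names.

```r
# Counts from the May and September Pew surveys
y1 <- 7889; n1 <- 10957
y2 <- 5147; n2 <- 10093

# Pooled estimate of the common proportion under H0
common.prop <- (y1 + y2) / (n1 + n2)   # about 0.6193

# Pooled standard error of pMay - pSept under H0, about 0.006699
se.pooled <- sqrt(common.prop * (1 - common.prop) * (1 / n1 + 1 / n2))

# z statistic for part (d) using the pooled standard error
prop.diff <- y1 / n1 - y2 / n2          # about 0.2100
z <- prop.diff / se.pooled              # about 31.35
```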
# d) Calculate the z test statistic by dividing your answer to (a) by your answer
#    to (c)
z_statistic=prop.diff/std.dev
z_statistic
## [1] 31.97152
# e) Find the z test statistic by carrying out the hypothesis test WITHOUT the
#    continuity correction, and calculate the z test statistic from the prop.test
#    object (this should match your answer to (d)).
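Part (e) was left blank. A minimal sketch, assuming the two-sample form of prop.test from the grader's note (a vector of successes and a vector of sample sizes) without the continuity correction. Because X-squared equals z^2 for the pooled two-proportion test, its square root should reproduce the z obtained with the pooled standard error (about 31.35), not the 31.97 from the unpooled denominator used in (c). The names `res`, `est`, and `z` are illustrative.

```r
# Two-sample test: successes and sample sizes passed as vectors
res <- prop.test(c(7889, 5147), c(10957, 10093), correct = FALSE)

# z statistic: square root of X-squared, signed by pMay - pSept
est <- unname(res$estimate)
z <- sign(est[1] - est[2]) * sqrt(unname(res$statistic))   # about 31.35
z
```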

Question 6 -- [4 / 8 points] ❌

Your conclusion is correct, but for prop.test, you need to specify a vector of "successes" and a vector of sample sizes:
res <- prop.test(c(7889, 5147), c(10957, 10093))


# 6) Complete the hypothesis test by carrying out the steps below:
prop.test(7889,10957,p=pSep,alternative="two.sided",correct=FALSE)
## 
##  1-sample proportions test without continuity correction
## 
## data:  7889 out of 10957, null probability pSep
## X-squared = 1934.3, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5099574
## 95 percent confidence interval:
##  0.7115132 0.7283253
## sample estimates:
##         p 
## 0.7199963
# a) To be a little more accurate, use prop.test to find the p-value WITH the
#    continuity correction. What is the p-value?
prop.test(7889,10957,p=pSep,alternative="two.sided",correct=TRUE)
## 
##  1-sample proportions test with continuity correction
## 
## data:  7889 out of 10957, null probability pSep
## X-squared = 1933.5, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5099574
## 95 percent confidence interval:
##  0.7114672 0.7283705
## sample estimates:
##         p 
## 0.7199963
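For (a), the two-sample call from the grader's note, with the continuity correction (prop.test's default), gives the intended p-value. The one-sample calls above instead compare the May sample against pSep as a fixed null value, which treats the September estimate as if it were known exactly. A sketch; `res` is an illustrative name.

```r
# Two-sample test with continuity correction (correct = TRUE is the default)
res <- prop.test(c(7889, 5147), c(10957, 10093), correct = TRUE)

res$statistic   # X-squared, slightly below the uncorrected value
res$p.value     # effectively 0, far below 0.05
```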
# b) The p-value should be very close to 0 (well below 0.05). Assuming that this 
#    is the case, state the conclusion regarding whether the proportion of 
#    Americans willing to get a coronavirus vaccine has changed between May and 
#    September of 2020.
#    Since the p-value is less than 5%, we reject the null hypothesis. Hence, there is
#    strong evidence that the proportion of Americans willing to get a coronavirus
#    vaccine changed between May and September of 2020.

Question 7 -- [8 / 8 points] ✅

Great job!

# 7) For question (3), the null hypothesis is that the "psychic" does not do
#    better or worse than random chance; the alternative hypothesis is that
#    the psychic's predictive abilities are different than what is expected
#    from random chance. Complete the following to specify what it would mean
#    if a Type I or Type II error occurred, in the context of the problem.
#    A Type I error means that we conclude that the psychic's predictive
#    abilities are different than what is expected from random chance,
#    but in reality, the "psychic" does not do better or worse than random chance.
#    A Type II error means that we conclude that the "psychic" does not do better
#    or worse than random chance, but in reality, the psychic's predictive
#    abilities are different than what is expected from random chance.

Question 8 -- [8 / 8 points] ✅

# 8) For question (5), the null hypothesis is that there is no change in
#    the proportion of American adults willing to get a coronavirus vaccine
#    between May and September; the alternative hypothesis is that the
#    proportion has changed. Complete the following to specify what it would mean
#    if a Type I or Type II error occurred, in the context of the problem.
#    A Type I error means that we conclude that the proportion has changed,
#    but in reality, there is no change in the proportion of American adults
#    willing to get a coronavirus vaccine between May and September.
#    A Type II error means that we conclude that there is no change in
#    the proportion of American adults willing to get a coronavirus vaccine
#    between May and September, but in reality, the proportion has changed.