Total grade: 54/66 (82%)

####################################################################
# Name: John Palomino
# Lab 7: Hypothesis testing for population means.

# For these questions, we will assume that the Central 
# Limit Theorem applies (i.e., that the populations are normally 
# distributed or that 'n' is sufficiently large), and that the 
# samples are representative of the population of interest. 

# Turn in a Notebook that answers the questions below
####################################################################

library(readr)
library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# Read in our survey data
survey <- read_csv('https://gdancik.github.io/CSC-315/data/datasets/csc315_survey_fall_2021.csv')
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   `HS GPA` = col_double(),
##   `College GPA` = col_double(),
##   CatOrDogPerson = col_character(),
##   Alcohol = col_double(),
##   Sleep = col_double(),
##   FavoriteMeal = col_character(),
##   Streaming = col_character(),
##   Mobile = col_character(),
##   FavoriteSeason = col_character(),
##   Writing = col_character(),
##   FruitsOrVeggies = col_character(),
##   LaurelOrYany = col_character()
## )
##########################################################################
Question 1 -- [16 / 16 points] ✅

Great job!

# 1) Assume that the mean amount of sleep an adult gets is 8 hours
#    per night. Is there evidence that students in CSC 315
#    get a different amount of sleep? (Here we assume that our
#    survey results are representative of all CSC 315 students).

# (a) State the null and alternative hypotheses (done for you):
#     H0: mu_sleep = 8
#     HA: mu_sleep != 8
#     where mu_sleep is the mean amount of sleep a CSC 315 student gets
#     per night.

# (b) Calculate / find the test statistic (and specify the degrees of freedom)
x=survey$Sleep
n=length(x)
mu_sleep=mean(x)
sigma_sleep=sd(x)
t_stat <-(mu_sleep-8)/(sigma_sleep/sqrt(n))
t_stat
## [1] -6.02186
# (c) Find the p-value using the t.test function
test<-t.test(x, y= NULL,alternative = "two.sided",mu=8,conf.level = 0.95)
test
## 
##  One Sample t-test
## 
## data:  x
## t = -6.0219, df = 17, p-value = 1.374e-05
## alternative hypothesis: true mean is not equal to 8
## 95 percent confidence interval:
##  6.124501 7.097721
## sample estimates:
## mean of x 
##  6.611111
test$p.value
## [1] 1.373551e-05
# (d) Find the p-value 'manually' based on the test statistic and
#     appropriate degrees of freedom 
pvalue=2*pt(t_stat,df=(n-1))
pvalue
## [1] 1.373551e-05
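
The manual calculation in (d) works with `2*pt(t_stat, ...)` only because the test statistic is negative. A sign-independent version takes the lower tail of `-abs(t)`. The sketch below checks that form against `t.test`, using R's built-in `sleep` data set as a stand-in for the survey; the `extra` column and the null value of 0 are assumptions for illustration and are not part of the lab.

```r
# Stand-in data: 'extra' from R's built-in 'sleep' data set
# (an assumption for illustration; the lab itself uses survey$Sleep and mu = 8)
x <- sleep$extra
mu0 <- 0

n <- length(x)
t_manual <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# two-sided p-value that is valid whether t is positive or negative
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)

# cross-check against t.test; both comparisons should be TRUE
res <- t.test(x, mu = mu0)
all.equal(unname(res$statistic), t_manual)
all.equal(res$p.value, p_manual)
```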
# (e) State the conclusion regarding the null and alternative hypotheses in 
#     the context of this problem.
# Since the p-value is less than 0.05, we reject the null hypothesis that
# the mean amount of sleep a CSC 315 student gets is 8 hours per night.
# Hence, there is evidence that students in CSC 315 get a different amount
# of sleep.
##########################################################################

##########################################################################
Question 2 -- [17 / 24 points] ❌

Make sure to use ggplot for your plots. For (e) you should get the test statistic from the test1 object. For a Type I error, you need to state both the conclusion and the reality. Your conclusion (that the amounts of sleep differ) is correct, but to be a Type I error the reality would be that there is no difference in hours of sleep.
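
One possible ggplot version of the boxplot in (b) is sketched below. The helper name `sleep_boxplot`, the title and axis label text, and the toy data frame are all hypothetical and only illustrate the call; in the lab, `survey` would be passed in instead.

```r
library(ggplot2)

# hypothetical helper: side-by-side boxplots of Sleep by CatOrDogPerson
sleep_boxplot <- function(df) {
  ggplot(df, aes(x = CatOrDogPerson, y = Sleep, fill = CatOrDogPerson)) +
    geom_boxplot() +
    theme_classic() +
    theme(legend.position = "none") +
    ggtitle("Hours of sleep for cat and dog people") +
    labs(x = "preference", y = "hours of sleep")
}

# made-up values for illustration only (not the survey data)
toy <- data.frame(
  CatOrDogPerson = c("Cat", "Cat", "Cat", "Dog", "Dog", "Dog"),
  Sleep = c(6, 7, 5.5, 7, 8, 6.5)
)
p <- sleep_boxplot(toy)
p
```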

# 2) Is there evidence that the amount of sleep that a 'cat' person gets
#    differs from that of a 'dog' person in CSC 315?

# (a) State the null and alternative hypotheses (done for you):
#     H0: mu_cat - mu_dog = 0
#     HA: mu_cat - mu_dog != 0,
#     where mu_cat and mu_dog are the mean hours of sleep for students preferring
#     cats and dogs, respectively

# (b) Create side-by-side boxplots (using ggplot) showing hours of sleep for 'cat'
#     and 'dog' people. Make sure to label the y-axis and give the chart a title.
boxplot(Sleep~CatOrDogPerson, data=survey,col=c("sky blue","red"))

# (c) We will now formally test the hypotheses that mean amount of sleep
#     is different between 'cat' and 'dog' people. The command 
#     t.test(x,y) will perform a two-sample t-test for the null 
#     hypothesis that the 'x' and 'y' populations have the same mean, where 'x' 
#     is a vector of observations from the first population and 'y' is a vector of
#     observations from the second population.
#     Use the t.test function to find the test statistic and the 
#     corresponding degrees of freedom. Note that in your call to t.test,
#     'x' is a vector of hours of sleep for 'cat' people and 'y' is a 
#     vector of hours of sleep for 'dog' people, which you can get using the 'split'
#     function.
test1=t.test(Sleep~CatOrDogPerson, data=survey)
test1
## 
##  Welch Two Sample t-test
## 
## data:  Sleep by CatOrDogPerson
## t = -1.7925, df = 15.792, p-value = 0.09223
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.6985960  0.1430404
## sample estimates:
## mean in group Cat mean in group Dog 
##          6.222222          7.000000
# (d) Find the p-value from the result of the t.test function
test1$p.value
## [1] 0.09222587
# (e) Find the p-value 'manually' based on the test statistic and
#     appropriate degrees of freedom from the t.test result (which 
#     is stored in the $parameter object)
pvalue1=2*pt(-1.7925,df=15.792)
pvalue1
## [1] 0.09222862
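
The hard-coded values in (e) are rounded, which is why the manual p-value (0.09222862) differs slightly from `test1$p.value` (0.09222587). Pulling the statistic and degrees of freedom from the test object avoids this. A minimal sketch, using the built-in `sleep` data set as a stand-in; the object name `res` is hypothetical, and in the lab it would be `test1`:

```r
# Welch two-sample t-test on stand-in data
# (an assumption for illustration; the lab uses Sleep ~ CatOrDogPerson)
res <- t.test(extra ~ group, data = sleep)

t_obs <- unname(res$statistic)  # test statistic at full precision
df    <- unname(res$parameter)  # Welch degrees of freedom

# two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_obs), df = df)

# should be TRUE: the manual value matches the one stored in the object
all.equal(p_manual, res$p.value)
```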
# (f) State the conclusion regarding the null and alternative hypotheses in 
#     the context of this problem.
# We fail to reject the null hypothesis because the p-value is greater than 0.05.
# Thus, there is insufficient evidence that the amount of sleep that a 'cat'
# person gets differs from that of a 'dog' person in CSC 315.

# (g) What would it mean in the context of this problem if a Type I 
#     error occurred?
# In the context of this problem, a Type I error means concluding that there is evidence 
# that the amount of sleep that a 'cat' person gets differs from that of 
# a 'dog' person in CSC 315.

##########################################################################

##########################################################################
Question 3 -- [7 / 10 points] ❌

For (a) and (c) you need to multiply by 2 to get the area in both tails. The degrees of freedom should also be n - 1, and not n.
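
Note that in (c), `pt(1.11, ...)` is the lower-tail area, so doubling it directly would give a value above 1; for a positive statistic the upper tail is the one to double. A sign-independent sketch that handles all three parts (the helper name `two_sided_p` is hypothetical):

```r
# hypothetical helper: two-sided p-value for a one-sample t-test
# with test statistic t and sample size n (df = n - 1)
two_sided_p <- function(t, n) {
  2 * pt(-abs(t), df = n - 1)
}

p_a <- two_sided_p(2.78, 45)
p_b <- two_sided_p(-3.3, 51)
p_c <- two_sided_p(1.11, 100)

# decision at alpha = 0.05: TRUE means reject the null hypothesis
c(a = p_a, b = p_b, c = p_c) < 0.05
```

With the corrected tails and degrees of freedom the decisions are unchanged: reject for (a) and (b), fail to reject for (c).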

# 3) Find the p-values associated with the following t test
#    statistics, for a one-sample t-test, and state whether you would reject
#    or fail to reject the null hypothesis at alpha = 0.05:

# (a) t = 2.78, n = 45
pvalue_a=1-pt(2.78,df=45)
pvalue_a
## [1] 0.0039536
# Reject the null hypothesis

# (b) t = -3.3, n = 51
pvalue_b=2*pt(-3.3,df=51)
pvalue_b
## [1] 0.00176857
# Reject the null hypothesis

# (c) t = 1.11, n = 100
pvalue_c=pt(1.11,df=100)
pvalue_c
## [1] 0.8651695
# Fail to reject the null hypothesis

##########################################################################

# use the cereal data to complete the last question
cereal <- read.delim("http://pastebin.com/raw/0G6DrHyC")


#################################################################################
Question 4 -- [14 / 16 points] ❗

Your subset function call is not correctly filtering. Actually, the filtering is not necessary since it is already done at the start of the problem. You could use cereal instead of cereal1 in your call to t.test. Everything else is correct.
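
The warnings arise because `==` compares element by element and recycles the shorter vector, so odd-numbered rows are compared against "lower" and even-numbered rows against "middle". A membership test uses `%in%` instead. A small sketch with a made-up vector (the values in `shelf` are illustrative, not the cereal data):

```r
# made-up factor for illustration only
shelf <- factor(c("lower", "middle", "middle", "lower", "lower"))

# '==' recycles c("lower", "middle") along the vector:
# position 1 vs "lower", 2 vs "middle", 3 vs "lower", ...
# (R also warns here because 5 is not a multiple of 2)
recycled <- suppressWarnings(shelf == c("lower", "middle"))

# '%in%' tests each element for membership in the set
member <- shelf %in% c("lower", "middle")

recycled  # TRUE TRUE FALSE FALSE TRUE
member    # TRUE TRUE TRUE TRUE TRUE
```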

# 4) The 'sugars' column contains the sugar content (in grams), while
#    the 'shelf' column contains the shelf in which the cereal is
#    shelved on, with 1 = lower shelf, 2 = middle shelf (which is at
#    eye level for children), and 3 = top shelf. The code below constructs
#    a boxplot comparing sugar content across only the lower and middle
#    shelves
#################################################################################

# remove data from the top shelf (Note: make sure dplyr is loaded
# before running the next statement)
cereal <- filter(cereal, shelf != 3)

# change shelf column to factor
cereal$shelf <- factor(cereal$shelf)
levels(cereal$shelf) <- c("lower", "middle")

# generate boxplot
ggplot(cereal) + geom_boxplot(aes(shelf, sugars, fill = shelf)) +
  theme_classic() + theme(legend.position = "none") +
  ggtitle("Sugar content in cereals by shelf level") +
  labs(x = "shelf level", y = "sugar content (grams)")
## Warning: Removed 1 rows containing non-finite values (stat_boxplot).

# Now let's formally test whether mean sugar content differs
# between the lower shelf and the middle shelf.

# (a) State the null and alternative hypotheses, making sure
#     to define the 'mu' parameters
#   H0: mu_lower - mu_middle = 0
#   HA: mu_lower - mu_middle != 0,

#   where mu_lower and mu_middle are the mean sugar contents for lower and 
#   middle shelves, respectively

# (b) Use the t.test function to find the test statistic 
#     and corresponding degrees of freedom
cereal1=subset(cereal,cereal$shelf==c("lower","middle"))
## Warning in `==.default`(cereal$shelf, c("lower", "middle")): longer object
## length is not a multiple of shorter object length
## Warning in is.na(e1) | is.na(e2): longer object length is not a multiple of
## shorter object length
test2=t.test(sugars~shelf, data=cereal1)
test2
## 
##  Welch Two Sample t-test
## 
## data:  sugars by shelf
## t = -1.6962, df = 17.856, p-value = 0.1072
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.0615071  0.8615071
## sample estimates:
##  mean in group lower mean in group middle 
##                  5.9                  9.5
# (c) Find the p-value
test2$p.value
## [1] 0.1072084
# (d) State the conclusion regarding the null and alternative hypotheses in 
#     the context of this problem.
# We fail to reject the null hypothesis since the p-value is greater than 0.05.
# Therefore, there is insufficient evidence that the mean sugar content differs
# between the lower shelf and the middle shelf.
#################################################################################