Overview: The goal of this mini project is for you to complete an exploratory data analysis by applying the concepts and tools you learned during the first month of class. We will soon be progressing to statistics and statistical machine learning, but the first step in working with data should always be EDA, as it can be a window offering powerful insights into your data.
1. Hate crimes
The hate crimes dataset is compiled per state and describes the prevalence of hate crimes. The data’s features include:
- state: state name
- median_household_income: median household income
- share_unemployed_seasonal: share of the population that is unemployed, seasonally adjusted
- share_population_in_metro_areas: share of the population that lives in metropolitan areas
- share_population_with_high_school_degree: share of adults 25 and older with a high-school degree
- share_non_citizen: share of the population that are not U.S. citizens
- share_white_poverty: share of white residents who are living in poverty
- gini_index: Gini index
- share_non_white: share of the population that is not white
- share_voters_voted_trump: share of 2016 U.S. presidential voters who voted for Donald Trump
- hate_crimes_per_100k_splc: hate crimes per 100,000 population, Southern Poverty Law Center, Nov. 9-18, 2016
- avg_hatecrimes_per_100k_fbi: average annual hate crimes per 100,000 population, FBI, 2010-2015
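The two hate crime measures cover different time spans: the SPLC figure counts the 10 days from Nov. 9 to Nov. 18, 2016, while the FBI figure is an annual average. A minimal sketch of one way to put them on a comparable per-day scale follows. The helper name `add_daily_rates`, the new column names, and the 365-day year are assumptions made for illustration, and the toy values are made up rather than taken from the dataset.

```r
# Hypothetical helper: express both hate crime measures per 100,000 people per day.
add_daily_rates <- function(hate_crimes) {
  splc_days <- 10   # Nov. 9-18, 2016, counted inclusively
  fbi_days <- 365   # assumption: treat one year as 365 days
  hate_crimes$splc_per_100k_per_day <-
    hate_crimes$hate_crimes_per_100k_splc / splc_days
  hate_crimes$fbi_per_100k_per_day <-
    hate_crimes$avg_hatecrimes_per_100k_fbi / fbi_days
  hate_crimes
}

# Made-up placeholder rows, NOT values from the real dataset.
toy <- data.frame(
  state = c("A", "B"),
  hate_crimes_per_100k_splc = c(0.5, NA),
  avg_hatecrimes_per_100k_fbi = c(3.65, 1.825)
)
toy_rates <- add_daily_rates(toy)
print(toy_rates)
```

Missing values stay missing after the division, so any state without a reported figure still has to be handled explicitly before plotting or summarizing.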
2. Bike sharing
The bike sharing dataset lists individual bike rides in the New York City Citibike system. The features include:
- tripduration: length of trip (seconds)
- starttime: start time and date
- stoptime: end time and date
- start station id: id of the start station
- start station name: name of the start station
- start station latitude: latitude of the start station
- start station longitude: longitude of the start station
- end station id: id of the end station
- end station name: name of the end station
- end station latitude: latitude of the end station
- end station longitude: longitude of the end station
- bikeid: the id of the bike used
- usertype: type of rider (Customer = 24-hour pass or 3-day pass user; Subscriber = Annual Member)
- birth year: year of birth of rider
- gender: gender of rider (0 = unknown; 1 = male; 2 = female)
This data was already processed to remove trips taken by staff as they service and inspect the system, trips that are taken to/from “test” stations, and any trips that were below 60 seconds in length (potentially false starts or users trying to re-dock a bike to ensure it’s secure).
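Because trip length is stored in seconds and gender is stored as a numeric code, a small recoding step can make tables and plots easier to read. The sketch below shows one possible approach. The helper name `tidy_rides` and the new column `trip_minutes` are hypothetical, and the three toy rows are made up rather than taken from the dataset.

```r
# Hypothetical helper: make the coded columns readable before plotting.
tidy_rides <- function(rides) {
  # tripduration is recorded in seconds; minutes are easier to interpret.
  rides$trip_minutes <- rides$tripduration / 60
  # Gender codes as documented: 0 = unknown, 1 = male, 2 = female.
  rides$gender <- factor(rides$gender,
                         levels = c(0, 1, 2),
                         labels = c("unknown", "male", "female"))
  rides$usertype <- factor(rides$usertype,
                           levels = c("Customer", "Subscriber"))
  rides
}

# Made-up placeholder rides, NOT rows from the real dataset.
toy_rides <- data.frame(
  tripduration = c(90, 600, 1800),
  usertype = c("Subscriber", "Customer", "Subscriber"),
  gender = c(1, 0, 2)
)
tidy <- tidy_rides(toy_rides)

# Cross-tabulate rider type against gender.
print(table(tidy$usertype, tidy$gender))
```

Any value outside the documented codes becomes `NA` in the resulting factor, which makes unexpected entries easy to spot.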
Exploratory Data Analysis
During your recent studio and homework assignments, you performed cursory explorations of several reasonably large datasets. Your goal in this mini-project is to spend some time doing a more in-depth exploration of a dataset of your choice.
You should feel free to use all the R tools you have learned so far to try to uncover an interesting story in the data and to create informative visualizations. You should then write a short summary of your discoveries in R Markdown, suppressing the code but displaying the relevant visualizations. Your aim in this write-up is to convince us that your conclusions are valid.
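One way to hide code while still showing plots in the knitted report is to set knitr chunk options once, in a setup chunk near the top of the `.Rmd` file. This is a sketch of one possible configuration, not a required setup. Hiding messages and warnings as well is an assumption about what a clean write-up needs.

```r
# Intended for a setup chunk near the top of the R Markdown file.
knitr::opts_chunk$set(
  echo = FALSE,     # hide the R code in the knitted report
  message = FALSE,  # assumption: also hide package start-up messages
  warning = FALSE   # assumption: also hide warnings
)
```

The same options can instead be set on individual chunks if only some of the code should be hidden.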
This mini-project is a pair programming project because collaboration will give you a chance to vet your ideas. Your partner can find holes in your thinking, and likewise you in theirs. We also encourage you to visit the course staff during their office hours, so you can run your ideas by them as well for further (constructive) criticism.