logo Use CA10RAM to get 10%* Discount.
Order Nowlogo
(5/5)

The purpose of this checkpoint is to test your skills with reading CSV files into Data Frames and writing back to CSV.

INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

Python homework

The purpose of this checkpoint is to test your skills with reading CSV files into Data Frames and writing back to CSV. This is a common task during the Data Preparation phase of the data mining life cycle which you will learn about later.

Complete the follow tasks in order by writing Python code:

  1. Download the dataset below: top_100_books.csv
    • This is a real dataset representing the top 100 best-selling books available on Amazon.com
    • It includes the following fields:
      • rank: number in the top 100 list
      • title: the name of the book
        • keep in mind that this is real data; please don't be offended by some of the book titles
      • author: the author who wrote the book
      • rating: the average rating of all verified customers who purchased the book
      • reviews: the number of customers who wrote a review (not the total number who purchased the book)
      • type: the book format; hardcover, paperback, mass market paperback, or board book
      • price: the listed price of the book on amazon.com
  1. Read this dataset into a Pandas DataFrame
  2. There was a clerical error that needs to be fixed: update the price of the book titled "Man's Search for Meaning" to $9.99
  3. Because there is only one case of type = "Mass Market Paperback", let's combine that book with "Paperback" by changing its type to "Paperback". This is a common data cleaning task when there are categorical values with very few instances.
  4. We suspect that, in general, customers rate books relative to the price they had to pay for it. In order to test this hypothesis (which we will learn how to do later), we must first create a new calculated field representing rating/price. Create this field, call it "ratingByPrice", and add it to the DataFrame as the last column
  5. Write your updated DataFrame to a new CSV file called "top_100_books_cleaned.csv"

For this checkpoint, you are going to create a function that will import any DataFrame and print out a report of the following information:

  • For each column, print:
    1. The column header name
    2. The number of unique values
    3. The total number of values

NOTE: These tasks can be performed using basic functions for DataFrames. See the table in Chapter 7.3 or find examples online to make this task easier.

Your output for the included dataset may look as simple as this:

Column, Nunique, Count
age: 47, 1338
sex: 2, 1338
bmi: 548, 1338
children: 6, 1338
smoker: 2, 1338
region: 4, 1338
charges: 1337, 1338

 

For this checkpoint, use the IMDB dataset to visualize and analyze data about movies from 2006 to 2016. You will use the pandas and matplotlib libraries to create four different charts.

  1. Create a bar chart that shows the top 10 highest grossing films.
    • The title for the graph should be “Top 10 Highest Grossing Films”
    • Use the colors ("red","green","blue","orange","pink")
    • The x label should be 'Film'
    • The y label should be 'Revenue (Millions)'
    • The x ticks should be the title of the movies
    • Make the rotation of the xticks vertical
    • The bar chart should look something like this:
(5/5)
Attachments:

Related Questions

. Introgramming & Unix Fall 2018, CRN 44882, Oakland University Homework Assignment 6 - Using Arrays and Functions in C

DescriptionIn this final assignment, the students will demonstrate their ability to apply two ma

. The standard path finding involves finding the (shortest) path from an origin to a destination, typically on a map. This is an

Path finding involves finding a path from A to B. Typically we want the path to have certain properties,such as being the shortest or to avoid going t

. Develop a program to emulate a purchase transaction at a retail store. This program will have two classes, a LineItem class and a Transaction class. The LineItem class will represent an individual

Develop a program to emulate a purchase transaction at a retail store. Thisprogram will have two classes, a LineItem class and a Transaction class. Th

. SeaPort Project series For this set of projects for the course, we wish to simulate some of the aspects of a number of Sea Ports. Here are the classes and their instance variables we wish to define:

1 Project 1 Introduction - the SeaPort Project series For this set of projects for the course, we wish to simulate some of the aspects of a number of

. Project 2 Introduction - the SeaPort Project series For this set of projects for the course, we wish to simulate some of the aspects of a number of Sea Ports. Here are the classes and their instance variables we wish to define:

1 Project 2 Introduction - the SeaPort Project series For this set of projects for the course, we wish to simulate some of the aspects of a number of

Ask This Question To Be Solved By Our ExpertsGet A+ Grade Solution Guaranteed

expert
Um e HaniScience

858 Answers

Hire Me
expert
Muhammad Ali HaiderFinance

580 Answers

Hire Me
expert
Husnain SaeedComputer science

862 Answers

Hire Me
expert
Atharva PatilComputer science

825 Answers

Hire Me

Get Free Quote!

400 Experts Online