Task 1. Conceptual modeling of a data warehouse (3 marks)
Consider a sample data warehouse domain presented below.
A large network of hotels would like to create a data warehouse to store information about their hotels located in the different cities of different countries, hotel guests visiting the rooms in hotels, staff working at the hotels. The management of the network would like to store the following information in the data warehouse.
Each hotel is described by its location (country, city, building number), email address and link to a Web page. A hotel offers the rooms to its customers. A room has a unique number within a hotel. A room number consists of a floor number and a unique number at a floor. For example, room 25 at 5th floor has a number 0525.
A hotel employs staff members. A staff member has a unique employee number, first name, last name, and date of birth. Staff members belong to either administration group or maintenance group. Among the other duties, administration staff members are allowed to perform check-in and check-out of hotel guests. Maintenance staff members perform the maintenance works in the rooms occupied by hotel guests.
Hotel guests stay in hotel rooms of different types. On check-in day a start date of a visit is recorded and on check-out day an end date of a visit is recorded. Besides check-in and check- out dates, the data warehouse must contain data about the room number and the amount of money of each visit for each guest. The data warehouse must also contain data about the total number of facilities used by hotel guests and the total number of maintenances performed in a room during a visit.
A hotel guest is described by a number of identification documents, first name, last name, date of birth and nationality. A hotel guest uses a credit card to pay for his/her stay in a hotel. A credit card number and a name of bank that issued a card is recorded.
A management of hotel network would like to get from a data warehouse information about:
The total number of visits per hotel and per
The average number of maintenance per room type, per hotel and per
The total amount of money paid by guests per hotel and per
The total number of facilities used per hotel and per
The average length (in days) of each visit per
Create a conceptual schema of a data warehouse use a graphical notation with UMLetlet
The software tool UMLetlet 14.2 can be downloaded from Moodle. You should use the "Conceptual modelling" notation in the software tool.
Present OLAP operations for queries (a) to (e) above. The OLAP operations should use algebraic operators that are introduced in lecture notes in Week 4 and Week 5. (Hint: You can use a “Rollup*” operator in the above queries and for (e) you can also define a new measure for “length of stay” by using an “Addmeasure” )
A file solution1.pdf with solutions to questions (1) and (2) above.
Task 2. Logical modelling and implementation of external tables (3 marks)
Consider the conceptual schema of a data warehouse in an attached file named “conceptual_model_for_task2.bmp”, which is in the assignment resources folder on Moodle.
Perform a logical design to transform the given conceptual schema into a star schema. Use UMLetlet 14.2 diagram software tool and apply a "Logical modelling" notation to draw the logical (star)
Based on the previous step, for each table that you defined in the logical schema, implement an external table in Hive. Each table must contain at least three (Note: you need to create the table source files and upload them to HDFS.)
After the external tables are created, use DESCRIBE and SELECT statements in HQL to describe the tables and list the first 3 rows from each table,
Note. In a star schema, all level tables must be flat (i.e., denormalized)
A file solution2.pdf which contains:
A drawing of a logical schema for question (1) and
All of your inputs and outputs for questions (2) and (3) in Zeppelin or Terminal in the BigDataVM virtual
Task 3. Data manipulation and queries in Hive (2 marks)
The objective of this task is to re-create two relational tables taken from TPC-W benchmark database as external Hive tables and to implement SELECT statements that retrieve information from the tables.
The SQL script file dbcreate.sql contains SQL CREATE TABLE statements to create the relational tables in Oracle-based implementation of TPC-W benchmark. Data files item.tbl and author.tbl contain sample synthetic data.
Use HQL to implement the following operations:
Create two external Hive tables ITEM and AUTHOR and load the data stored in the files tbl and author.tbl into two tables. (You can use statements in the SQL script dbcreate.sql.)
Retrieves the following information from the Hive
Find the total number of rows in the table ITEM and the total number of rows in the table AUTHOR.
Find the titles of items (I_TITLE) that have largest costs (I_COST).
DescriptionIn this final assignment, the students will demonstrate their ability to apply two ma
Path finding involves finding a path from A to B. Typically we want the path to have certain properties,such as being the shortest or to avoid going t
Develop a program to emulate a purchase transaction at a retail store. Thisprogram will have two classes, a LineItem class and a Transaction class. Th
1 Project 1 Introduction - the SeaPort Project series For this set of projects for the course, we wish to simulate some of the aspects of a number of
1 Project 2 Introduction - the SeaPort Project series For this set of projects for the course, we wish to simulate some of the aspects of a number of