I need help with the remaining parts of my assignment. I have to write functions to analyze the file, but I am at a loss as to how to do this.
HW 8 (Capstone) – File Inspector
Background
• While all the original applications for computers back in the 1950s and many, if not most, applications today are
mathematical, there is a large field of computing that deals with text processing. These applications range from editing and
word processing systems to typesetting to literary analysis. Having some exposure to this type of problem, and
some experience in solving it, is invaluable in preparation for a successful career in software development.
Assignment
• Draw a flowchart, using LucidChart, to document the logic required to complete this program. Save the
flowchart as a PDF file.
• Develop a C++ program, following the logic in your flowchart, to catalog certain facts about the contents of a
text file, including the longest and shortest words, total number of words, and the number of times each word in
a selected group occurs within the file.
• Create the following data storage variables:
o A set of parallel collections (vectors or arrays), each with 50 rows, as follows:
▪ A collection of strings to hold the selected words.
▪ A collection of whole number values for word counts
▪ A collection of floating point values for word frequencies.
o Two strings to hold the shortest and longest words found in the data file.
o A whole number variable to hold the Total Word Count.
• Initialize all counters to zero.
• Read the words from a file named “50Words.txt” into the collection of strings. Clear the collection of word
counts and word frequencies to all zeroes.
• Ask the user to select one of the instructor-provided data files using a menu. Give the user an unlimited number
of attempts to enter a valid selection. Do not proceed until a valid selection has been entered. Do not have the
user enter the filename – have them select a file by number or letter instead.
• Reset all the word counts and word frequencies to zero.
• Open and read the user-selected data file. Examine each word as it is read as follows:
o For the purposes of this assignment, a “word” is any sequence of printable (non-whitespace) characters,
including letters, decimal digits, and punctuation. Words are separated from each other by whitespaces
(spaces, tabs, and newline characters).
o Remove leading and trailing punctuation. All internal punctuation, including hyphens, apostrophes,
periods, etc. must remain in the word. Use the ispunct function in the cctype library to identify
punctuation characters. Continue with this word only if its length is still greater than zero.
o Convert all remaining characters in the word to lower-case.
o Increment the Total Word Count by one.
o If necessary, update the Shortest Word string.
o If necessary, update the Longest Word string.
o If the word matches any word in the collection of strings, increment the corresponding occurrence
count in the collection of word counts.
• After all words have been read from the data file, calculate the frequency of occurrence of each of the
50 selected words by dividing each word’s occurrence count by the Total Word Count, and save the calculated
value in the corresponding element of the collection of word frequencies.
• Display the name of the data file, the shortest word, longest word, and total word count.
• Display the contents of the collection of 50 selected words, their counts, and their frequencies of occurrence in
neat columns. Format the word frequencies with 4 decimal places, right-justified in an 8-character field.
• Repeat this entire process from asking the user to select a data file to displaying results until the user selects the
“Exit” option from the file menu.
Style
• Document your program according to the guidelines presented in class.
Bonus
• Declare a collection of at least 50 numbers. Use each to accumulate a count of all the words of that length in the
selected data file. For instance, array[5] should hold a count of all words that had a length of exactly 5 characters
(after leading and trailing punctuation were removed). After all other output is complete, display the contents of
this collection, formatted as “N words of length M”, substituting the count for N and the collection index for M.
Do not display word lengths for which the count is zero.
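The bonus tally described above might look something like the sketch below. The function names tallyWordLength and formatLengthReport are hypothetical, and the sketch assumes each word has already had its leading and trailing punctuation removed. The collection size is whatever the caller declares (for example, 51 elements so that indexes 1 through 50 are usable). Words too long for the collection are simply ignored here, which is an assumption the assignment does not address.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: counts one already-cleaned word by its length.
// lengthCounts[n] holds the number of words that are exactly n characters long.
void tallyWordLength(const std::string& cleanedWord, std::vector<int>& lengthCounts)
{
    if (cleanedWord.size() < lengthCounts.size())
    {
        ++lengthCounts[cleanedWord.size()];
    }
    else
    {
        // Word is longer than the collection can index; ignored in this sketch.
    }
}

// Hypothetical helper: builds the "N words of length M" lines,
// skipping every length whose count is zero.
std::string formatLengthReport(const std::vector<int>& lengthCounts)
{
    std::ostringstream report;
    for (std::size_t length = 1; length < lengthCounts.size(); ++length)
    {
        if (lengthCounts[length] > 0)
        {
            report << lengthCounts[length] << " words of length " << length << '\n';
        }
        else
        {
            // Zero words of this length: nothing is displayed.
        }
    }
    return report.str();
}
```

Returning the report as a string keeps the function easy to check; the result can be sent to the console with cout once all other output is complete.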
Objectives
• Develop skill in writing and editing C++ code with Visual Studio
• Develop skill in building and running C++ programs with Visual Studio
• Develop skill in reading data values from the console
• Develop skill in formatting data to the console
• Develop skill in designing, implementing, and testing combinations of decision-making and looping structures.
• Gain experience in dealing with parallel arrays and vectors.
• Gain experience in processing strings as words.
• Gain experience in processing strings as sequences of characters.
• Gain experience in reading and writing text files.
• Understand the consequences of errors in program logic.
Reference
• This Wikipedia article shows the comparative ranking of English words:
http://en.wikipedia.org/wiki/Most_common_words_in_English
Data Files:
• 50Words.txt
• HW8DataFiles.zip – contains the following files:
o A Short History of the world.txt (742KB)
o Apology.txt (Plato, 105KB)
o CallOfTheWild.txt (195KB)
o LegendOfSleepyHollow.txt (89KB)
o Leviathan.txt (Thomas Hobbes, 1.2MB)
o ModestProposal.txt (Jonathan Swift, 39KB)
o OccurrenceAtOwlCreek.txt (Ambrose Bierce, 41KB)
o SherlockHolmes.txt (Sir Arthur Conan Doyle, 581KB)
o TaleOfTwoCities.txt (775KB)
o TomSawyer.txt (Mark Twain, 407KB)
o WarOfTheWorlds.txt (356KB)
o WizardOfOz.txt (L. Frank Baum, 228KB)
Turn in a single zip file containing your flowchart and a single C++ source code file. Name the zip file “First_Last_HW8”,
where “First Last” is replaced with your First and Last names.
Rubric
Issue Poss. Earned
FLOWCHART 0 0
Flowchart uses only standard flowchart symbols 3 3
Flowchart includes explicit Start and Stop blocks 3 3
All blocks in the flowchart are connected with arrowed lines 3 3
Only Decision blocks have multiple outputs 3 3
The only block with no exit arrows is the Stop block 3 3
SOURCE CODE 0
Source code is readable and neatly organized 3 3
Source code uses standard C++ input and output statements 3 3
Source code uses standard C++ assignment statements 3 3
Source code matches logic in the flowchart 3 3
FUNCTIONALITY – TESTED IN MICROSOFT VISUAL STUDIO 0 0
Program reads the “50Words.txt” file only one time to initialize the string collection 8 8
Program asks the user to pick a data file from a menu and does not proceed until a valid selection is made 8 8
Program reads the selected data file, analyzing it word-by-word 10 10
Program removes leading and trailing punctuation from each word 8 8
Program converts each word to lowercase 8 8
Program displays the total word count 8 8
Program displays the longest and shortest words in the data file 8 8
Program displays all words from the 50Words.txt file, their counts, and their frequencies of occurrence 15 15
Program allows the user to select another file after each one is analyzed 5 5
BONUS FUNCTIONALITY 0 0
Program counts the number of words of each length, from 1 character up to at least 40 characters 20 0
Program displays all counts, including those with zero occurrences -5 0
DEBITS – CODING COMPONENTS & TECHNIQUES 0 0
Flowchart not submitted -50 0
Flowchart not submitted as a pdf document (.pdf) -15 0
Source code not submitted -50 0
Each dead-end block in the flowchart -5 0
Source code not submitted as a C++ source code file (.cpp) -15 0
Assignment submitted to Canvas in separate files instead of a single zip file -15 0
Assignment submitted late -15 0
Program compiles with warnings (will run) -10 0
Program compiles with errors (will not run) -35 0
Program runs with errors or crashes -35 0
Program uses C-style input and output (scanf and printf) -10 0
Program does not offer the user all of the available data files in a menu -10 0
Program requires the user to select a file by entering its filename -10 0
Program does not read entire data file -5 0
Program removes all punctuation from each word (not just leading and trailing) -5 0
Incorrect total word count (+/- 10%) -5 0
Incorrect longest or shortest word -5 0
Program does not use collections to store selected words, counts, and frequencies -10 0
Incorrect counts for Selected Words (+/- 10%) -5 0
Incorrect frequency of occurrence for Selected Words (+/- 10%) -5 0
Program does not reset word counts and word frequencies for each data file -10 0
DEBITS – CODING STYLE AND DOCUMENTATION 0 0
No descriptive names for variables -5 0
Inconsistent indentation -5 0
Excessive blank lines -5 0
Using magic numbers -5 0
Using C-strings instead of string data types -20 0
Using goto statements (each instance) -20 0
Using global variables -20 0
No documentary comments -5 0
No else clause for each if statement -5 0
No default case for each switch statement -5 0
Total 105 105
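For the rubric item about displaying each selected word with its count and frequency, one way to get neat columns is the iomanip manipulators. The sketch below is only an illustration: formatResultRow is a made-up name, and the widths for the word and count columns are assumptions. Only the frequency column (4 decimal places, right-justified, 8 characters wide) comes from the assignment text.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats one row of the results table.
std::string formatResultRow(const std::string& word, int count, double frequency)
{
    const int WORD_WIDTH = 12;          // assumed column width
    const int COUNT_WIDTH = 8;          // assumed column width
    const int FREQUENCY_WIDTH = 8;      // from the assignment
    const int FREQUENCY_DECIMALS = 4;   // from the assignment

    std::ostringstream row;
    row << std::left << std::setw(WORD_WIDTH) << word
        << std::right << std::setw(COUNT_WIDTH) << count
        << std::fixed << std::setprecision(FREQUENCY_DECIMALS)
        << std::setw(FREQUENCY_WIDTH) << frequency;
    return row.str();
}
```

The same manipulators can be applied directly to cout; setw only affects the next item written, while left, right, fixed, and setprecision stay in effect until changed.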