- Exercise 3
Write a program IdentyfyWordsMain that reads a text file (such as HistoryOfProgramming)
and divides the text into a sequence of words (word = sequence of letters). All non-letters
(except whitespace) should be removed. Save the result in a new file (words.txt). Example:
Text
====
Computer programming, History of programming
From Wikipedia, the free encyclopedia (081110)
The earliest known programmable machine (that is a machine whose
behavior can be controlled by changes to a
"program") was Al-Jazari's programmable humanoid robot in 1206.
Sequence of words
=================
Computer programming History of programming
From Wikipedia the free encyclopedia
The earliest known programmable machine that is a machine whose
behavior can be controlled by changes to a
program was Al Jazaris programmable humanoid robot in
All exceptions related to file handling shall be handled within the program.
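The cleaning rule can be sketched for a single line of text as follows. This is a minimal sketch, not the expected solution: the class name WordCleaner and the method clean are hypothetical, and treating a hyphen as a word separator (so that "Al-Jazari's" becomes "Al Jazaris", as in the example) is an assumption read off the sample output rather than a stated rule. File reading, writing and exception handling are left out.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative and not part of the assignment.
class WordCleaner {
    // Matches any character that is neither a letter nor whitespace.
    private static final Pattern NON_LETTERS = Pattern.compile("[^\\p{L}\\s]");

    // Turns hyphens into spaces (assumption, see above), removes every other
    // non-letter, non-whitespace character, and trims the ends of the line.
    static String clean(String line) {
        String withoutHyphens = line.replace('-', ' ');
        return NON_LETTERS.matcher(withoutHyphens).replaceAll("").trim();
    }

    public static void main(String[] args) {
        System.out.println(clean("From Wikipedia, the free encyclopedia (081110)"));
    }
}
```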
- Exercise 4
Create a class Word, representing a word. Two words should be considered equal if they
consist of the same sequence of letters and we consider upper case and lower case as
equal. For example hello, Hello and HELLO are considered to be equal. The methods
equals and hashCode define the meaning of "equality". Thus, the class Word should
look like the following.
public class Word implements Comparable<Word> {
    private String word;

    public Word(String str) { ... }
    public String toString() { return word; }

    /* Override Object methods */
    public int hashCode() { ... }               // compute a hash value for word
    public boolean equals(Object other) { ... } // true if two words are equal

    /* Implement Comparable */
    public int compareTo(Word w) { ... }        // compares two words lexicographically
}
Note:
- If you want, you can add more methods. The methods mentioned above are the
minimum requirement.
- Exercises 5 and onward are based on Exercise 4, so test all methods carefully before
proceeding.
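One way to keep equals, hashCode and compareTo consistent with each other is to derive all three from a single normalized key. The sketch below illustrates that idea on a hypothetical class CaseInsensitiveText (not the required class Word); lower-casing with Locale.ROOT is an assumption about how "upper case and lower case as equal" could be realized.

```java
import java.util.Locale;

// Hypothetical class illustrating a case-insensitive equals/hashCode/compareTo trio.
final class CaseInsensitiveText implements Comparable<CaseInsensitiveText> {
    private final String text; // original spelling, returned by toString
    private final String key;  // lower-cased form used for all comparisons

    CaseInsensitiveText(String text) {
        this.text = text;
        this.key = text.toLowerCase(Locale.ROOT);
    }

    @Override
    public String toString() { return text; }

    // Equal objects must produce equal hash codes, so hash the same key
    // that equals compares.
    @Override
    public int hashCode() { return key.hashCode(); }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof CaseInsensitiveText)) return false;
        return key.equals(((CaseInsensitiveText) other).key);
    }

    // Lexicographic order on the key, so compareTo returns 0 exactly when equals is true.
    @Override
    public int compareTo(CaseInsensitiveText other) { return key.compareTo(other.key); }

    public static void main(String[] args) {
        CaseInsensitiveText a = new CaseInsensitiveText("hello");
        CaseInsensitiveText b = new CaseInsensitiveText("HELLO");
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
    }
}
```

A quick test of the kind the note asks for is to check that hello, Hello and HELLO are pairwise equal, share one hash code, and compare as 0.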
- Exercise 5
Create a program WordCount1Main doing the following:
For each word in the file words.txt
- Create an object of the class Word
- Add the object to a set of the type java.util.HashSet
- Add the object to a set of the type java.util.TreeSet
Note:
- The size of the sets should correspond to the number of different words in the file. (Our
tests gave 350 words for the file HistoryOfProgramming)
- An iteration over the words in the TreeSet should give the words in alphabetical
order.
- Since our definition of a word is not very precise (similar to the WarAndPeace exercise
in Assignment 2), we do not expect all of you to end up with exactly 350 words. But it should be rather close.
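The behaviour described in the notes can be tried out on a small input before running the full file. In the sketch below, lower-cased strings stand in for Word objects (an assumption made to keep the example self-contained), and the class SetDemo and its method collect are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo; lower-cased strings stand in for Word objects.
class SetDemo {
    // Adds every whitespace-separated token of text to target and returns target.
    static Set<String> collect(String text, Set<String> target) {
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                target.add(token.toLowerCase());
            }
        }
        return target;
    }

    public static void main(String[] args) {
        String text = "the History of the history";
        Set<String> hashed = collect(text, new HashSet<>());
        Set<String> sorted = collect(text, new TreeSet<>());
        System.out.println(hashed.size() + " " + sorted.size());
        // Iterating a TreeSet visits its elements in sorted order.
        for (String word : sorted) {
            System.out.println(word);
        }
    }
}
```

Both sets report the same size because duplicates are ignored on insertion; only the TreeSet guarantees an iteration order.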
- Exercise 6
Given the following interface
public interface WordSet extends Iterable {
    public void add(Word word);          // Add word if not already added
    public boolean contains(Word word);  // Return true if word contained
    public int size();                   // Return current set size
    public String toString();            // Print contained words
}
Implement the interface using a) Hashing, b) Binary Search Tree.
In the case of hashing, a rehash shall be performed when the number of
inserted elements equals the number of buckets. For the binary search tree, the
elements shall be sorted using the method compareTo. The names of the two
implementations shall be HashWordSet and TreeWordSet.
Note: You are not allowed to use any predefined collection classes from the Java
library. However, you are allowed to use arrays.
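The rehash rule can be sketched with a chained hash table that uses only an array and hand-written nodes. The sketch below stores plain strings instead of Word objects and is not an implementation of WordSet; the class name StringHashSet, the initial capacity of 8 and the choice to double the number of buckets on each rehash are all assumptions.

```java
// Hypothetical sketch: a chained hash set of strings built from an array and nodes only.
class StringHashSet {
    private static final class Node {
        final String value;
        Node next;
        Node(String value, Node next) { this.value = value; this.next = next; }
    }

    private Node[] buckets = new Node[8]; // initial capacity is an arbitrary choice
    private int size;

    // Maps a hash code to a bucket index; floorMod keeps the result non-negative.
    private static int indexFor(String value, int bucketCount) {
        return Math.floorMod(value.hashCode(), bucketCount);
    }

    boolean contains(String value) {
        for (Node n = buckets[indexFor(value, buckets.length)]; n != null; n = n.next) {
            if (n.value.equals(value)) return true;
        }
        return false;
    }

    void add(String value) {
        if (contains(value)) return; // ignore duplicates
        int i = indexFor(value, buckets.length);
        buckets[i] = new Node(value, buckets[i]);
        size++;
        // Rehash when the number of inserted elements equals the number of buckets.
        if (size == buckets.length) rehash();
    }

    // Doubles the bucket array (assumption) and reinserts every stored value.
    private void rehash() {
        Node[] old = buckets;
        buckets = new Node[old.length * 2];
        for (Node head : old) {
            for (Node n = head; n != null; n = n.next) {
                int i = indexFor(n.value, buckets.length);
                buckets[i] = new Node(n.value, buckets[i]);
            }
        }
    }

    int size() { return size; }
    int bucketCount() { return buckets.length; }

    public static void main(String[] args) {
        StringHashSet set = new StringHashSet();
        for (int i = 0; i < 8; i++) set.add("w" + i);
        System.out.println(set.size() + " elements in " + set.bucketCount() + " buckets");
    }
}
```

Every element has to be reinserted during a rehash because its bucket index depends on the number of buckets.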
- Exercise 7
Repeat Exercise 5 using the new implementations HashWordSet and TreeWordSet.
The program shall be called WordCount2Main. The two notes of Exercise 5
still apply.
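For the tree-based set, the alphabetical-order note follows from an in-order traversal of the binary search tree. The sketch below shows this for plain strings; it is not an implementation of WordSet, and the class name StringTreeSet and the space-separated toString format are assumptions.

```java
// Hypothetical sketch: an unbalanced binary search tree of strings.
class StringTreeSet {
    private static final class Node {
        final String value;
        Node left, right;
        Node(String value) { this.value = value; }
    }

    private Node root;
    private int size;

    // Inserts value at the position determined by compareTo; duplicates are ignored.
    void add(String value) {
        if (root == null) { root = new Node(value); size++; return; }
        Node current = root;
        while (true) {
            int cmp = value.compareTo(current.value);
            if (cmp == 0) return;
            if (cmp < 0) {
                if (current.left == null) { current.left = new Node(value); size++; return; }
                current = current.left;
            } else {
                if (current.right == null) { current.right = new Node(value); size++; return; }
                current = current.right;
            }
        }
    }

    int size() { return size; }

    // In-order traversal: left subtree, node, right subtree, which yields sorted order.
    private static void appendInOrder(Node n, StringBuilder out) {
        if (n == null) return;
        appendInOrder(n.left, out);
        out.append(n.value).append(' ');
        appendInOrder(n.right, out);
    }

    @Override
    public String toString() {
        StringBuilder out = new StringBuilder();
        appendInOrder(root, out);
        return out.toString().trim();
    }

    public static void main(String[] args) {
        StringTreeSet set = new StringTreeSet();
        for (String w : new String[] {"of", "the", "history", "of"}) set.add(w);
        System.out.println(set.size() + ": " + set);
    }
}
```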