KAIST
EE 209: Programming Structures for EE

Assignment 2: A Regular Expression Module

Purpose

The purpose of this assignment is to help you understand regular expression and how we implement it in C. C. It also will give you more opportunity to use the GNU/Unix programming tools, especially bash, emacs, gcc209, gdb.

Rules

Study the implementation of the regular expression by Rob Pike and Brighan Kernighan. Implementing ?, +, \x, is "on your own". That is, you must not consult with the course instructor, lab TAs, Moodle, etc., except perhaps to clarify requirements. That part is worth 20% of this assignment.

Background

Regular expressions describe patterns of text. They consist of literal chracters such as 'a', 'I', '9', and "wild cards" which have special meanings such as matching "multiple occurence of some character", "any one character", etc. For example, KA*IST matches any text that starts with K, followed by zero or more occurrence of 'A', and ends with IST. KIST and KAAAIST match it while KA%IST doesn't. * is a wild card that matches zero or more occurrences of the previous character and K,A,I,S,T are literals.

The list of wild cards for this assigment is as follows:

c  matches any literal character 'c' unless 'c' is a wild card character
^  matches the beginning of the input string
$  matches the end of the input string
.  matches any one character
?  matches zero or one occurrence of the previous character
*  matches zero or more occurrenece of the previous character
+  matches one or more occurrence of the previous character
\x matches the character, 'x' if 'x' is a wild card ('^', '$', '.', '?', '*', '+', '\') or 'x' can one of the characters below:
   \d matches any decimal digit (0,1,2,3,4,5,6,7,8,9)
   \D matches any character that is not a decimal digit
   \s matches any whitespace character. Whitespace characters are: space, form-feed ('\f'), newline ('\n'),
      carriage return ('\r'), horizontal tab ('\t'), and vertical tab ('\v'). 
   \S matches any character that is not a whiltespace character.



 Regular Expression Examples 




Regular Expression


Match

Not match




hello

hello, Chello, hello EE209


hallo




^hello$

hello

hello EE209, Chello




KA..T

KAIST, KA33T

KIST




(\s+cool EE2\d\d)
( cool EE219), (    cool EE209)
(cool EE222)






Your Task

Your task is to write a C program named mygrep that performs the regular expression matching as below.



 Functionality 

Your program should take the regular expression as the first parameter
and match it against the text either from standout input stream or a
list of files.

  mygrep 'regexp' somefile or
  mygrep 'regexp' file1 ...


mygrep should read one line at a time and prints out the line if the regular expression matches it. The output should look like

filename:line1
filename:line2
...
Total # of matches: 7


If standard input is used, use "stdin" as the file name. For example,

$ ./mygrep 'argv' mygrep.c
mygrep.c:main(int argc, char *argv[])
mygrep.c:    nmatch = grep(argv[1], stdin, NULL);
mygrep.c:      f = fopen(argv[i], "r");
mygrep.c:	fprintf(stderr, "can't open %s:", argv[i]);
mygrep.c:      nmatch += grep(argv[1], f, argv[i]);
Total # of matching lines: 5


$ ./mygrep 'argv' < mygrep.c
stdin:main(int argc, char *argv[])
stdin:    nmatch = grep(argv[1], stdin, NULL);
stdin:      f = fopen(argv[i], "r");
stdin:	fprintf(stderr, "can't open %s:", argv[i]);
stdin:      nmatch += grep(argv[1], f, argv[i]);
Total # of matching lines: 5




Logistics

Develop on lab machines, using emacs to create source
code and gcc209 to compile.

A skeleton C file is available
here: mygrep.c It implements the basic file
I/O, so all you need is to fill out the rest of the functionalities.
Feel free to use this file as your starting point (of course, you
don't have to use it), and change any part in the code if you'd like.



You can assume each input line is less than or equal to 1024
bytes. That is, it is OK if your program behaves incorrectly if some
input line is larger than 1024 bytes, but your program should not
crash. (grep() in the skeleton file already guards against this condition)



Your program should produce an error if the regular expression in the
wrong format. For example, '\abc' is wrong regular expression since
'\a' is not allowed. In that case, the execution should stop with an
error message, "wrong regular expression format:\abc". 




We provide reference program (mygrep) and some test cases at here.
Download all files and enable the execution bit by



chmod u+x samplemygrep 

chmod u+x test_regex.sh 




Assuming your binary is 'mygrep', you can run the test cases with your program by



./test_regex.sh ./mygrep



Feel free to modify the test scripts and files. Compare the difference from samplemygrep and your mygrep.




Create a readme text file that contains:


Your name and your student ID

A description of whatever help (if any) you received from others while doing the assignment, and the names of any individuals with whom you collaborated, as prescribed by the course "Policy" web page.

(Optionally) An indication of how much time you spent doing the assignment.

(Optionally) Your assessment of the assignment.

(Optionally) Any information that will help us to grade your work in the most favorable light. In particular you should describe all known bugs.


Submit your work electronically on Moodle via following commands:

mkdir 20091234_assign2
mv readme mygrep.c 20091234_assign2
tar zcf 20091234_assign2.tar.gz 20091234_assign2




Grading

We will grade your work on quality from the user's point of view
and from the programmer's point of view. From the user's point of
view, your module has quality if it behaves as it should. The correct
behavior of the mygrep is defined by the previous
sections of this assignment specification and by the
given samplemygrep program.

From the programmer's point of view, your program has quality if it is
well styled and thereby simple to maintain. See the specifications of
the previous assignment for guidelines concerning style. Specifically,
function modularity will a prominent part of your grade. To encourage
good coding practices, we will deduct points if gcc209
generates warning messages.

Regular Expression	Match	Not match
`hello`	`hello`, `Chello`, `hello EE209`	`hallo`
`^hello$`	`hello`	`hello EE209`, `Chello`
`KA..T`	`KAIST`, `KA33T`	`KIST`
`(\s+cool EE2\d\d)`	`( cool EE219)`, `( cool EE209)`	`(cool EE222)`

KAIST EE 209: Programming Structures for EE

Assignment 2: A Regular Expression Module

Purpose

Rules

Background

Regular Expression Examples

Your Task

Functionality

Logistics

Grading

KAIST
EE 209: Programming Structures for EE