Training Exercises: advanced level

So far, we have considered only problems with a single part. In this section of the manual, we will learn how to rate problems with multiple parts. Each part of a problem usually asks a different question about the same expression. For example, the first part may ask students to calculate the first derivative of a certain expression, and the second part may ask for the second derivative. Since we are interested in formulaic questions and ignore numerical ones, a rater would normally ignore the parts of a problem that pose numerical questions. However, ignoring numerical parts involves some subtleties. For example, consider the problem below:

In this problem, the first part is formulaic while the second part asks for an answer in numerical format. The rater's responsibility is to notice when a student switches between problem parts. If the switch is not noticed in time, the rater may erroneously assign many Resubmission labels to formulaic answers, since these are automatically reproduced in every entry even while the student works on the second, numerical part. A good practice is to check the other parts for changes whenever there is a Resubmission in one part. As soon as changes are detected, there are two possibilities:

1. The rater may find that Resubmissions in one part are accompanied by input or changes in another part of the question that has numerical format, as was shown in the previous picture. As soon as the numerical part gets modified, the rater should simply ignore all such entries (in both parts). The rater should resume grading the first, formulaic part once the student resumes modifying it or stops modifying the second, numerical part.

2. The rater may find that Resubmissions in the first part are accompanied by input or changes in another part of the question that has formulaic format, as shown in the picture below (part 3). In this case, the rater should start rating the latter part while making sure that there are no changes in the first part. Our experience shows that, in practice, several parts are not modified at the same time (that is, in the same entry), though this is possible. In such a case, the rater should record all labels that apply to the changes in each part.

Now we will look at how multiple parts are represented in entry files. Different problem parts correspond to different columns in an entry file. Below we show an example of such a file with two formulaic parts and one numerical part, corresponding to the picture above. In this example, the first two entries reflect changes in the first part (formulaic), the next two entries reflect changes in the second part (numerical), and the last two entries reflect changes in the third part (formulaic). Note that repetitions of one part are not rated as Resubmissions because there are changes in another part.

answer, part 1	answer, part 2	answer, part 3	Reworking
x/(x^2+10)
x/(x^2+10)^0.5			X
x/(x^2+10)^0.5	3*(3^2+10)^0.5
x/(x^2+10)^0.5	3/(3^2+10)^0.5
x/(x^2+10)^0.5	3/(3^2+10)^0.5	10/(x^2+10)^-1.5
x/(x^2+10)^0.5	3/(3^2+10)^0.5	10/(x^2+10)^1.5	X
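
The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration, not part of the rating procedure itself: the function names `changed_parts` and `classify_entries`, the action names "rate", "ignore" and "resubmission", and the way the entries are typed in as lists are all assumptions made for this example. The sketch compares each entry with the previous one, finds which part changed, and applies the two rules given earlier (ignore the entry if a numerical part changed, otherwise rate the formulaic part that changed).

```python
from typing import List, Set, Tuple


def changed_parts(prev: List[str], curr: List[str]) -> Set[int]:
    """Return the 1-based numbers of the parts whose answer differs
    between two consecutive entries. Missing cells count as empty."""
    width = max(len(prev), len(curr))
    prev = prev + [""] * (width - len(prev))
    curr = curr + [""] * (width - len(curr))
    return {i + 1 for i in range(width) if prev[i] != curr[i]}


def classify_entries(
    entries: List[List[str]], numerical_parts: Set[int]
) -> List[Tuple[str, List[int]]]:
    """For each entry, suggest what the rater should do (hypothetical names):
    ("ignore", [])        a numerical part changed in this entry
    ("rate", [parts])     only formulaic parts changed; rate those parts
    ("resubmission", [])  nothing changed at all
    """
    actions = []
    prev: List[str] = []  # before the first entry, every part is empty
    for curr in entries:
        changed = changed_parts(prev, curr)
        if not changed:
            actions.append(("resubmission", []))
        elif changed & numerical_parts:
            actions.append(("ignore", []))
        else:
            actions.append(("rate", sorted(changed)))
        prev = curr
    return actions


# The six entries of the example file (answer columns only).
entries = [
    ["x/(x^2+10)", "", ""],
    ["x/(x^2+10)^0.5", "", ""],
    ["x/(x^2+10)^0.5", "3*(3^2+10)^0.5", ""],
    ["x/(x^2+10)^0.5", "3/(3^2+10)^0.5", ""],
    ["x/(x^2+10)^0.5", "3/(3^2+10)^0.5", "10/(x^2+10)^-1.5"],
    ["x/(x^2+10)^0.5", "3/(3^2+10)^0.5", "10/(x^2+10)^1.5"],
]

# Part 2 is the numerical part in this example.
for number, action in enumerate(classify_entries(entries, {2}), start=1):
    print(number, action)
```

Such a helper only points to the part that changed in each entry. Choosing the actual label for a formulaic change remains the rater's job.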

The following exercises conclude our training course. These files are taken from real WeBWorK assignments. Only about 10-30 percent of the entries are formulaic; the others are in numeric or multiple-choice format and thus should be ignored. Skipping through the entries should be done carefully, since it is easy to miss formulaic strings embedded among dozens of irrelevant ones. When grading a large number of files generated by WeBWorK, one may use the fact that formulaic expressions are always located at the same place (problem set and problem number) for all students in a particular course.

The training process can be regarded as complete when the rater reaches a performance level above 90% correct. Note that the chance level for ratings is about 20% (since we have 5 kinds of labels, the probability of randomly picking the right one is 1/5 = 20%). If there are multiple raters, it is a good idea to check the consistency of their results. Good performance would mean that whenever one rater is wrong, the other is right, and vice versa. In other words, the errors in rating are due to random factors rather than to common difficulties in evaluating student responses.
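
The checks in this paragraph are simple enough to sketch in Python. The example below is a minimal illustration under stated assumptions: the function names `accuracy` and `shared_errors` are hypothetical, the letters A to E are placeholders standing in for the five labels, and the ratings are made-up data, not results from real raters. It computes each rater's fraction of correct labels against an answer key and lists the entries on which both raters are wrong.

```python
from typing import List, Sequence


def accuracy(ratings: Sequence[str], key: Sequence[str]) -> float:
    """Fraction of entries where the rater's label matches the answer key."""
    if len(ratings) != len(key):
        raise ValueError("ratings and key must have the same length")
    return sum(r == k for r, k in zip(ratings, key)) / len(key)


def shared_errors(
    rater_1: Sequence[str], rater_2: Sequence[str], key: Sequence[str]
) -> List[int]:
    """0-based positions of the entries that BOTH raters labeled incorrectly."""
    return [
        i
        for i, (a, b, k) in enumerate(zip(rater_1, rater_2, key))
        if a != k and b != k
    ]


# Placeholder labels: "A".."E" stand in for the five kinds of labels.
key = list("ABCDE" * 4)   # 20 made-up entries
rater_1 = list(key)
rater_1[19] = "A"         # one mistake (the key says "E")
rater_2 = list(key)
rater_2[3] = "A"          # one mistake on a different entry (the key says "D")

chance_level = 1 / 5      # five labels, one picked at random

print("chance level:", chance_level)
print("rater 1:", accuracy(rater_1, key))
print("rater 2:", accuracy(rater_2, key))
print("entries both raters got wrong:", shared_errors(rater_1, rater_2, key))
```

An empty list of shared errors is the pattern described above as good performance: the raters' mistakes fall on different entries. Many shared errors would instead suggest a common difficulty with particular responses.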

Exercises	Answer Key	Problem description
Exercise 1	Answer 1
Exercise 2	Answer 2
Exercise 3	Answer 3
Exercise 4	Answer 4
Exercise 5	Answer 5

Last update: June 3, 2006