So zestyping and I have been working on a grading scheme for an assignment in our computer graphics class. The assignment is to rasterize a polygon (with integer vertexes), and given our conventions stated in class, there is only one correct answer. This lends it self to automatic grading by test cases.

The initial suggestion is something like the following. Students submit solutions and test suites (a set of test cases) for the assignment. Students start with 100 points. They lose up to 80 for failing the TA’s test-suites. They lose 2 points for each student test suite they fail up to a maximum of 20 points. They gain 1 point for each student who fails their test suite up to a maximum of 10 points.

This is a good start, and it has several nice properties. Firstly it is important that students submit test suites, and it is all or nothing to pass the suite. This prevents students from submitting many very similar test cases in their suite. Secondly it is important that every time a student gains some points, that some other student (or students) loses as many or more points. If this were not the case and Alice loses 1 point, and Bob gains 2 points on a test case, then they could cooperate to set up a symmetric situation and they both gain 1 point for adding errors to their programs. This is why this is an inverse prisoner’s dilemma problem. We want set up the weights so the cooperation is not encouraged, and preferably discouraged. Students shouldn’t want to write the code that says ‘if the test case is exactly my friends test case, then fail’.

Now because of the lower cutoff (the fact they can only lose up to 20 points), it is possible to have one student gain 1 point without the other student losing 2 points. However to cooperate they have to both lose their 20 points, and the most they could each gain is 10. So it is only advantageous if they have resigned themselves to losing those 20 points already.

So I picture this problem as a bipartite graph. Let’s say their are 150 student submissions on the left side, and there are 150 student test suites + 5 TA test suites on the right side. Draw an edge every time an submission fails a test suite. Points flow out of submission (for failing) and into the test suites (for writing a good suite). Edges are given two weights. The number of points lost for failing that test, and the number of points gained for writing a good suite. The number of points lost by a student is the sum of the loss weights for the edges on their submission, and the number of points gained by a student is the sum of their gain weights on their edge. So long as on each edge the loss weight >= gain weight, I believe students will not cooperate to beat the grading system. Students start out with 100 points.

Suppose the loss weights on each edge is 2 and the gain weight is 1 on each edge. Except the TA’s loss weights will be 16 and the gain weights will be 0. So if there are no edges connected to a submission, the student loss will be 0, and they will get at least 100/100, which is good. However as it stands if a student fails all test suites, they lose 80 + 2×150 = 380 points. If a student as a test suite that everyone fails, they gain 150 points. Both these numbers are too big. So we can cap the number of points gain at 10. This is fine. We can cap the number of points they lose by 100 (so they never get less than 0), but this is a bit worrisome. By putting a lower cap on, there is the potential to create points in this exchange, and possible give students a chance to cooperate to bet the grading system.

There is another practical problem. Suppose there are 10 students that couldn’t program to save their life. They will fail all test cases, and then drop the course. However this means that everyone can get 10 points no matter what crappy test suite they submit. To solve this we can put the gain weight on submission `S` at say 10/`n`, where `n` is the number if incident edges to `S`. So if someone fails every test, the number of points gained by each student is only 10/150=0.06. So now with our 10 flaky students, every student only gains 10×0.06=0.6 bonus points. But if only one test case breaks the submission, there is a gain weight of 10/1=10 points, which is greater than the loss weight of 2. We can’t have that.

This could be fixed by having the gain weight equal to 1/`n`. I’m a little concerned that this isn’t enough incentive to get students to submit test cases.

My friend Alex suggested this solution which solves this problem and the lower cap problem. Set the gain weight = loss weight = `k`×ln(`n`+1)/`n`. So the number of points lost by a student is `k`×ln(`n`+1). `k` is choosen so that even if the student fail all test cases they only lose only 20 points. With 150 students, `k` is about 4. No lower cap is needed anymore. Now if Alice makes a test suite is the only suite that Bob fails, Alice gets 4×ln(1+1)/1≈2.77259 points, which is pretty good. And this number stays reasonably high even if Bob fails up to 10 suites:

Prelude> (map (\n->4/n*log(n+1)) [1..10]) [2.77259,2.19722,1.84839,1.60944,1.43341,1.29727,1.18825,1.09861,1.02337,0.959158]

Now if everyone only fails Alice’s suite she would get 150×2.77259≈415.888 points. So an upper cap is still needed, but upper caps are fine.

Now some might say this scheme is too confusing for students. I don’t think this is the case. All they need to know is:

- If their program is bug-free they will get at least 100 points.
- The more student test suites they fail, the more points they will lose up to 20.
- The more TA test suites they fail, the more points they will lose up to 80
- They will not get less than 0.
- The more students that fail their test suites, the more bonus points they will get up to a maximum of 10.

If they want to think about how the scoring system achieves this, and prevents abuse, they are welcome to think about it, but understanding the scoring system won’t help them get more points.

So if anyone has comments on this grading algorithm, let me know. Comments are preferred by Monday, but even after that is fine.

Oh, there are some side issues. Automatic grading is sometimes ‘unfair’ because one ‘little bug’ can cause you to fail all test cases. To deal with this, it is suggested to run submissions similar to the ACM style. When a program is submitted, the student is informed right away how many test suites they passed and how many test suites were applied. They can keep submitting until they pass all the test suites or until the deadline.