Sunday, 23 October 2016

How do assessment and grading practices at Waikato compare with U.S. universities?

Bill Walstad and Laurie Miller (both University of Nebraska-Lincoln) have a new paper in the latest issue of the Journal of Economic Education, summarising grading policies and practices across the U.S. (sorry, I don't see an ungated version anywhere). The data comes from a survey of 175 instructors of principles courses (whether microeconomics, macroeconomics, or combined).

The most interesting thing about this paper was the comparison with our two first-year economics papers at the University of Waikato. Now, we don't actually teach principles of microeconomics in a single paper. Instead, we have ECON100 which is essentially business economics, and ECON110, which is more of a survey course with a welfare economics flavour. However, both fit a similar role to principles courses as taught at U.S. institutions. So, in this post I look to see how they compare.

First, Walstad and Miller look at average grades, and find that the average grade is a B-. Now, a direct comparison of grades between U.S. and New Zealand institutions is somewhat unhelpful, but it is interesting that the average grade for ECON100 is usually in the high B- range (on the Waikato grading scale), while ECON110 is usually in the mid-to-high B range. So at least on the surface, things appear similar.

In terms of grading practices, Walstad and Miller note:
The grading policies that instructors adopt to determine a grade are quite different across instructors. The majority of instructors (74 percent) calculate grades based on an absolute standard such as the percentage of points earned in a course.
Count us in the majority for both ECON100 and ECON110. Next, Walstad and Miller look at grade adjustments. They note:
Regardless of whether an absolute or relative standard is used as the grading policy, student grades can be adjusted at the margin... An instructor can decide at the end of the course to give students bonus points to increase the class average or for meeting some requirement, such as having excellent attendance. The bonus adjustment, however, seems to be more the exception than standard practice because it is used by only 15 percent of instructors.
Again, we are in the majority here for both ECON100 and ECON110, but then:
Another type of positive adjustment is to increase a grade near a grade cutoff. This cutoff adjustment is more widely used than the bonus adjustment because while 13 percent say that they will increase a grade if it is very close to a cutoff, another 56 percent replied that maybe they would increase a grade.
This is something we often do. Given that marks are measured with some error, it makes sense to give students who are on the boundary of a grade the benefit of the doubt (in most cases - if a student wasn't attending class or missed handing in assessments, we are less inclined to move them over the grade boundary).
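For concreteness, here's a minimal sketch of how an absolute grading policy with a cutoff bump might work. The grade boundaries, the one-mark bump window, and the engagement test are all illustrative assumptions, not the official Waikato scale:

```python
# Illustrative absolute grading scale with a cutoff bump. The
# boundaries and the 1-mark bump window are assumptions for the sketch.
GRADE_CUTOFFS = [
    (90, "A+"), (85, "A"), (80, "A-"),
    (75, "B+"), (70, "B"), (65, "B-"),
    (60, "C+"), (55, "C"), (50, "C-"),
    (0, "D"),
]

def letter_grade(mark, engaged=True, bump=1.0):
    """Map a percentage mark to a letter grade.

    Students within `bump` marks of the next boundary get the benefit
    of the doubt, but only if they engaged with the course (attended
    class, handed in assessments).
    """
    effective = mark + (bump if engaged else 0.0)
    for cutoff, grade in GRADE_CUTOFFS:
        if effective >= cutoff:
            return grade
    return "D"

print(letter_grade(69.5))                 # "B"  (bumped over the boundary)
print(letter_grade(69.5, engaged=False))  # "B-" (no benefit of the doubt)
```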

Next, Walstad and Miller look at extra credit:
What is more popular for increasing grades than bonus points or a cutoff bump among almost half of instructors (46 percent) is to give students extra credit for some type of activity or project. The ones most often given extra credit are for participating in an instructor-sanctioned event or activity outside of class (46 percent), such as attending a guest lecture on an economic topic. Also considered highly valuable is doing something extra for class such as writing something (35 percent); taking an extra quiz, homework, or class assignment (14 percent); or bringing new material to class (10 percent). Students also can be rewarded with extra credit points for good attendance (10 percent); contributing to a class discussion (6 percent); or participating in a class experiment, game, or project (5 percent)...
Among the 46 percent of instructors who use extra credit, the average allocation to the course grade is 4 percent, and the median is 2.5 percent, indicating that the percentage allocation for extra credit is positively skewed, but with large clumps of responses at 3 percent (29 percent) and 5 percent (20 percent). 
I have given extra credit in ECON110 for the last few years, both for attendance (at randomly-selected lectures) and for completing in-class experiments, exercises and short surveys (where the data will be used later in the same topic or a later topic). This semester in ECON100, we gave extra credit for the first time, for being in class for spot quizzes in randomly-selected lectures. In ECON110, extra credit could be worth up to 3 percent of a student's overall grade, and in ECON100 up to 2.5 percent. So, even though we are in the (large) minority here, both classes are around the median in terms of the weighting of extra credit in the overall grade.
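In code, the capped extra credit top-up might look something like this (the mechanics here are my simplified assumption, not a description of our actual marks systems):

```python
def overall_mark(base_mark, extra_credit_earned, cap):
    """Add earned extra credit to the base mark, up to a fixed cap,
    without letting the overall mark exceed 100."""
    return min(base_mark + min(extra_credit_earned, cap), 100.0)

# An ECON110 student on 68% who earned all available extra credit:
print(overall_mark(68.0, extra_credit_earned=3.0, cap=3.0))  # 71.0
# The ECON100 cap is lower, so credit earned beyond it doesn't count:
print(overall_mark(68.0, extra_credit_earned=4.0, cap=2.5))  # 70.5
```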

Lastly, Walstad and Miller summarise the types of assessment used:
Exams constitute the largest component of a course grade (65 percent). The number of exams that are administered can range from as few as one to as many as six, but the majority of instructors give three exams, in which case the exam grade weights are 30, 30, and 40. When a final exam is given, it is most likely comprehensive (69 percent) rather than limited to the content covered since the previous exam (31 percent). The predominant type of questions on these exams is multiple-choice (56 percent) followed by numerical or graphical problems (21 percent) and short-answer questions (15 percent). Very few instructors (8 percent) allow students to retake an exam to improve their score, and if they do, the exam is a different one from what they previously took.
The other contributions to a course grade come from homework or problem sets (15 percent) and quizzes (10 percent). The majority of the type of items for homework or problem sets are numerical or graphical problems (49 percent) followed by multiple-choice items (26 percent) and short-answer questions (16 percent). By contrast, multiple-choice items are more likely to be used for quizzes (50 percent) than problems (18 percent) or short-answer questions (14 percent).
Exams here include tests, so ECON100 (80 percent from tests and the final exam, weighted 15+15+50) and ECON110 (60 percent from tests, weighted 30+30) are similar to the U.S. institutions. Both the ECON100 exam and the ECON110 final test are comprehensive. ECON100 is predominantly multiple choice (60 percent), with the rest short-answer or graphical problems, while ECON110 has no multiple choice at all - it is entirely short-answer, numerical, or graphical problems. We don't allow students to retake an exam or test.
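The calculation underlying all of these schemes is the same simple weighted average. Here's a quick sketch using the ECON100-style test and exam weights from above; the component marks and the 20 percent weight I've assumed for the Aplia quizzes are invented for illustration:

```python
def course_mark(components):
    """Compute a final mark from {name: (mark out of 100, weight)} pairs.
    Weights are percentages and must sum to 100."""
    assert abs(sum(w for _, w in components.values()) - 100) < 1e-9
    return sum(mark * weight / 100 for mark, weight in components.values())

# ECON100-style weights: tests/exam at 15 + 15 + 50, with the remaining
# 20 percent assumed to be the Aplia quizzes. Marks are invented.
econ100 = {
    "test 1": (70.0, 15),
    "test 2": (60.0, 15),
    "exam": (70.0, 50),
    "quizzes": (80.0, 20),
}
print(course_mark(econ100))  # 70.5
```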

Where we differ most from the U.S. institutions is in the use of homework or problem sets. ECON100 doesn't use these at all, but does have quizzes (using the online testing system Aplia). ECON110 doesn't have quizzes but does have weekly assignments (similar to homework or problem sets, but more applied).

Overall though, to me it seems that in both ECON100 and ECON110 we are following current practice in U.S. institutions in our grading and assessment practices. So our students can feel pretty confident we are following best practice. Phew!

Friday, 21 October 2016

A cautionary tale on analysing classroom experiments

Back in June I wrote a post about this paper by Tisha Emerson and Linda English (both Baylor University) on classroom experiments. The takeaway message (at least for me) from the Emerson and English paper was that there is such a thing as too much of a good thing - there are diminishing marginal returns to classroom experiments, and the optimal number of experiments in a semester class is between five and eight.

Emerson and English have a companion paper published in the latest issue of the Journal of Economic Education, where they look at additional data from their students over the period 2002-2013 (sorry, I don't see an ungated version anywhere). In this new paper, they slice and dice the data in a number of ways that differ from the AER paper (more on that in a moment). They find:
After controlling for student aptitude, educational background, and other student characteristics, we find a positive, statistically significant relationship between participation in experiments and positive learning. In other words, exposure to the experimental treatment is associated with students answering more questions correctly on the posttest (despite missing the questions initially on the pretest). We find no statistically significant difference between participation in experiments and negative learning (i.e., missing questions on the posttest that were answered correctly on the pretest). These results are consistent with many previous studies that found a positive connection between participation in experiments and overall student achievement.
Except, those results aren't actually consistent with other studies, many of which find that classroom experiments have significant positive impacts on overall learning. The problem is the measure "positive learning". This counts the number of TUCE (Test of Understanding of College Economics) questions that students got wrong on the pre-test, but right on the post-test. The authors make a case for this positive learning measure as a preferred measure over the net effect on TUCE, but I don't buy it.

Most teachers would be interested in the net, overall effect of classroom experiments on learning. If classroom experiments increase students' learning in one area, but reduce it in another, so that the overall effect is zero, then that is the important thing. Which means that "negative learning" (the number of TUCE questions that students got right on the pre-test, but wrong on the post-test) must also be counted. And while Emerson and English find no effect on negative learning, when they run the analysis on the net overall change in TUCE scores (which you can get by subtracting their negative learning measure from their positive learning measure), they find that the effect of classroom experiments is statistically insignificant. That is, there is no net effect of classroom experiments on students' TUCE performance.
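The distinction between the three measures is easy to pin down in code. This is my reconstruction from the paper's descriptions, not Emerson and English's actual code:

```python
def learning_measures(pre, post):
    """Count positive, negative, and net learning across test items.

    pre, post: lists of booleans, True where the student answered that
    TUCE question correctly on the pre-test / post-test.
    """
    positive = sum(1 for p, q in zip(pre, post) if not p and q)  # wrong -> right
    negative = sum(1 for p, q in zip(pre, post) if p and not q)  # right -> wrong
    return {"positive": positive, "negative": negative, "net": positive - negative}

# A student who gains one question but loses another shows 'positive
# learning' even though their TUCE score hasn't improved at all:
print(learning_measures([True, False, True], [False, True, True]))
# {'positive': 1, 'negative': 1, 'net': 0}
```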

Next, Emerson and English start to look at the relationship between various individual experiments and TUCE scores (both overall TUCE scores and scores for particular subsets of questions). They essentially run a bunch of regressions, where in each regression the dependent variable (positive or negative learning) is regressed against a dummy variable for participating in a given experiment, as well as a bunch of control variables. This part of the analysis is problematic because of the multiple comparisons problem - when you run dozens of regressions, you can expect one in ten of them to show your variable of interest is statistically significant (at the 10% level) simply by chance. The more regressions you run, the more of these 'pure chance' statistically significant findings you will observe.

Now, there are statistical adjustments you can make to the critical t-values for statistical significance. In the past, I have been as guilty as anyone of not making those adjustments when they may be necessary. At least I'm aware of it though. My rule of thumb in cases where multiple comparisons might be an issue is that if there isn't some pattern to the results, then what you are observing is possibly not real at all, and the results need to be treated with due caution. In this case, there isn't much of a pattern at all, and the experiments that show statistically significant results (especially those that are significant only at the 10% level) may be showing effects that aren't 'real' (in the sense that they may be nothing more than pure chance results).
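To see the scale of the problem, and the standard Bonferroni fix (divide the significance level by the number of comparisons), a back-of-the-envelope calculation helps. The 30 regressions here are an illustrative number, not a count from the paper:

```python
alpha = 0.10  # significance level used for each individual test
m = 30        # number of regressions run (illustrative)

# Expected number of 'significant' results when every true effect is zero:
print(f"expected false positives: {alpha * m:.1f}")   # 3.0

# Probability of at least one false positive across m independent tests:
print(f"P(at least one): {1 - (1 - alpha) ** m:.2f}")  # 0.96

# Bonferroni correction: test each coefficient at alpha / m instead,
# which holds the family-wise error rate at (roughly) alpha:
print(f"Bonferroni-adjusted level: {alpha / m:.4f}")   # 0.0033
```

With 30 tests, something is almost guaranteed to come up 'significant' even when nothing is going on - which is exactly why a pattern in the results matters more than any individual star.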

So, my conclusion on this new Emerson and English paper is that not all classroom experiments are necessarily good for learning, and the overall impact might be neutral. Some experiments are better than others, so if you are limiting yourself to five (as per my previous post), this new article might help you select those that may work best (although it would be more helpful if they had been more specific about exactly which experiments they were using!).

Thursday, 20 October 2016

New research coming on surf rage in New Zealand

Last month I wrote a post about the escalation of surf gang violence:
Surf breaks are a classic common resource. They are rival (one surfer on the break reduces the amount of space available for other surfers), and non-excludable (it isn't easy to prevent surfers from paddling out to the break). The problem with common resources is that, because they are non-excludable (open access), they are over-consumed. In this case, there will be too many surfers competing to surf at the best spots.
The solution to the problem of common resources is to somehow convert them from open access to closed access. That is, to make them excludable somehow. And that's what the surf gangs do, by enforcing rules such as 'only locals can surf here'.
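A toy example makes the over-consumption point concrete. Suppose each surfer's benefit from a session falls as the break gets more crowded; all of the numbers below are invented purely for illustration:

```python
def benefit(n):
    """Per-surfer benefit from a session when n surfers share the break."""
    return 10 - n

cost = 2.0  # each surfer's opportunity cost of a session

# Open access: surfers keep paddling out while their own benefit covers
# their own cost, ignoring the crowding they impose on everyone else.
open_access_n = max(n for n in range(1, 11) if benefit(n) >= cost)

# Social optimum: the number of surfers that maximises total net benefit.
optimal_n = max(range(1, 11), key=lambda n: n * (benefit(n) - cost))

print(open_access_n, optimal_n)  # 8 surfers under open access, 4 at the optimum
```

Open access packs twice as many surfers onto the break as the social optimum and dissipates the gains from surfing it - which is exactly the gap that 'locals only' rules, for all their ugliness, close.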
Now a Massey PhD student is starting a new study on 'surf rage' in New Zealand. The Bay of Plenty Times reports:
The surf at Mount Maunganui will be used as a location to explore surf rage - with locals saying it is real.
Massey University PhD student Jahn Gavala said surf rage, with surfers protecting their local surf and leading to intimidation and physical assault, was prevalent across New Zealand.
"People have ownership of, or mark certain spaces in the surf zones. They form packs of surfers. They use verbal intimidation, physical intimidation and the raging is being physically beaten up - boards broken, cars broken."
Mr Gavala planned to observe surfers at six top surf breaks including Mount Maunganui over summer.
Seems like a good excuse to hang out at the beach and call it research. On a more serious note though, I hope Gavala reads the extensive work of Elinor Ostrom on private solutions to common resource problems, of which surf rage is one example.

Wednesday, 19 October 2016

Brexit and the chocolate war

I've avoided adding to the sheer volume of stuff that's been written about Brexit. However, in this case I'm willing to make an exception. The New Zealand Herald recently ran a story about the reopening of the 'chocolate war':
A 30-year battle between Britain and the European Union over chocolate, which was settled by a court ruling only in 2003, could reopen when the UK quits the bloc, former Deputy Prime Minister Nick Clegg warned Monday.
British chocolate manufacturers fought for the right to sell chocolate containing vegetable fat, which their continental competitors said was not as pure as the products they were marketing and should be branded "vegelate" or "chocolate substitute."
In 2000 a compromise was reached to call it "family chocolate" and the European Court ordered Italy and Spain, the most vociferous opponents, to allow its sale three years later.
"The chocolate purists, I guarantee, will quite quickly start fiddling with the definition of chocolate to make it much more difficult for British exporters to export elsewhere in Europe," Clegg said after a speech in central London...
Arguments over "common definition" will sit alongside tariff barriers and customs controls as obstacles to British food and drink manufacturers if Britain leaves the EU single market, Clegg said as he introduced a report on the UK's 27 billion pound (NZ$46 billion) food and drink sector.
It seems somewhat obvious that Brexit will lead to an increase in trade barriers between Britain and the European Union. However, most people are concentrating on the implications in terms of tariffs (essentially, taxes on imports or exports that make traded goods more expensive).

Fewer people are considering the rise of non-tariff trade barriers. Non-tariff trade barriers exist where the government privileges local firms (or consumers) over the international market, but does so without direct intervention such as tariffs or quotas. Because they don't involve an explicit tariff or quota, these trade barriers are somewhat hidden from view. However, having rules that prevent UK chocolate from being sold or marketed as chocolate in the European Union would certainly fit the definition, given that it would make it difficult (if not impossible) for British chocolate manufacturers to export to Europe (at least without renaming their products 'vegelate' - yuck!).

Other than the rekindling of the 'chocolate war', I wonder how many other non-tariff trade barriers will arise after Brexit is triggered?