Thursday, 22 June 2017

Why researchers need to name their teaspoons

The annual Christmas issue of the British Medical Journal always has at least one 'interesting' paper included. For example, I recently blogged about the study on Pokemon Go and obesity. I recently read this 2005 paper (open access) by Megan Lim, Margaret Hellard, and Campbell Aitken (all from the Burnet Institute in Melbourne) about the Institute's mysteriously disappearing teaspoons. The paper explains:
In January 2004 the authors found their tearoom bereft of teaspoons. Although a flunky (MSCL) was rapidly dispatched to purchase a new batch, these replacements in turn disappeared within a few months. Exasperated by our consequent inability to stir in our sugar and to accurately dispense instant coffee, we decided to respond in time honoured epidemiologists' fashion and measure the phenomenon.
Here's what they did:
At the completion of the pilot study we carried out a longitudinal cohort study. We purchased and numbered a further 54 stainless steel teaspoons. In addition we purchased and discreetly numbered 16 teaspoons of higher quality. The teaspoons were distributed (stratified by spoon type) throughout the eight tearooms, with a higher proportion allocated to those tearooms with the highest teaspoon losses in the pilot study.
We carried out counts of the teaspoons weekly for two months then fortnightly for a further three months.
They then essentially conducted a very simple survival analysis of the teaspoons. They found that:
After five months, 56 (80%) of 70 teaspoons had disappeared. The half life of the teaspoons was 81 days (that is, half had disappeared permanently after that time) compared with 63 days in the pilot study...
If you think this study is inconsequential, think again:
If we assume that the annual rate of teaspoon loss per employee can be applied to the entire workforce of the city of Melbourne (about 2.5 million), an estimated 18 million teaspoons are going missing in Melbourne each year. Laid end to end, these lost teaspoons would cover over 2700 km—the length of the entire coastline of Mozambique—and weigh over 360 metric tons—the approximate weight of four adult blue whales. 
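The figures in that passage can be sanity-checked with some quick arithmetic. The sketch below uses only the numbers quoted above, and treats "over 2700 km" and "over 360 metric tons" as roughly 2700 and 360. The variable names are illustrative, and the implied per-spoon values are back-of-the-envelope inferences, not numbers reported in the paper.

```python
# Figures quoted from the paper (rounded as they appear in the quote).
workforce = 2_500_000           # workers in the city of Melbourne
teaspoons_lost = 18_000_000     # estimated teaspoons lost per year
total_length_km = 2700          # lost teaspoons laid end to end
total_weight_tonnes = 360       # combined weight of lost teaspoons

# Implied values per worker and per teaspoon.
spoons_per_worker = teaspoons_lost / workforce
length_per_spoon_cm = total_length_km * 100_000 / teaspoons_lost    # km -> cm
weight_per_spoon_g = total_weight_tonnes * 1_000_000 / teaspoons_lost  # t -> g

print(f"Teaspoons lost per worker per year: {spoons_per_worker:.1f}")
print(f"Implied teaspoon length: {length_per_spoon_cm:.1f} cm")
print(f"Implied teaspoon weight: {weight_per_spoon_g:.1f} g")
```

A teaspoon of about 15 cm and 20 g is plausible, so the headline totals at least hang together with the assumed loss rate.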
There is an economics aspect to the study. Teaspoons in a common area are subject to the 'tragedy of the commons', as the authors explain:
The tragedy of the commons applies equally well to teaspoons. In the Burnet Institute the commons consists of a communally owned set of teaspoons; teaspoon users (consciously or otherwise) make decisions that their own utility is improved by removing a teaspoon for personal use, whereas everyone else's utility is reduced by only a fraction per head (“after all, there are plenty more spoons…”). As more and more teaspoon users make the same decision, the teaspoon commons is eventually destroyed. The fact that teaspoons were lost significantly more rapidly from the Burnet Institute's communal tearooms (the “commons”) compared with programme linked rooms, correlates neatly with Hardin's principle.
The tragedy of the commons arises because the resource (teaspoons) is rival (one person taking a teaspoon reduces the number of teaspoons left available for everyone else) and non-excludable (it isn't easy to prevent someone taking a teaspoon). One solution to the tragedy of the commons is to create property rights, which would make the teaspoons excludable. In this case, everyone in the Institute would have their own named teaspoons, with a rule that no one can use others' teaspoons without some suitably dire punishment befalling them.

As with most of these BMJ papers, this one was an interesting diversion. Also, if you haven't ever heard of counterphenomenological resistentialism, I recommend you read the paper and be enlightened.

Wednesday, 21 June 2017

Are men more likely to cheat if working with more women, or do cheating men prefer to work with more women?

I recently read this 2013 paper by Masanori Kuroki (Occidental College), published in the journal Economics Letters (sorry I don't see an ungated version anywhere). In the paper, Kuroki looks at whether the sex ratio in the workplace affects the likelihood of marital infidelity for men and for women. She used data from the 1998 General Social Survey (as an aside, you can get the data for yourself here - not just for 1998 but for every wave of the survey, which is a pretty cool resource).

Her measure of the sex ratio of the workplace is based on a self-reported measure in response to this question:
"Are the people who work at this location mostly men or women?" Individuals respond on a 7-point scale: (1) all women, (2) almost all women (e.g. 95%), (3) mostly women (e.g. 70%), (4) about half men and half women, (5) mostly men (e.g. 70%), (6) almost all men (e.g. 95%), and (7) all men.
She then converted the categorical measure to a numerical measure (which just screams out "measurement error", but we'll put that to one side as there is a more important issue with the paper). Her measure of marital infidelity is based on this question:
"Have you ever had sex with someone other than your husband or wife while you were married?"
I'm sure you can immediately see a problem here. The interpretation of the results is not straightforward, since the cross-sectional correlation that results from her analysis will be between current workplace sex ratio and whether the person has ever been unfaithful. Here's what Kuroki finds:
An increase in one standard deviation in a fraction of coworkers of the opposite sex is predicted to increase the likelihood of an extramarital affair by 2.9 percentage points. Considering that 22% of people have committed infidelity in the sample, this magnitude is not trivial...
Next I run separate regressions for men and women... The coefficient on the fraction of coworkers of the opposite sex continues to be positive and statistically significant for men but not for women. An increase in one standard deviation in a fraction of female coworkers is predicted to increase the likelihood of an extramarital affair by 4.6 percentage points for men. 
The coefficient may imply a result that is not trivial, but it is correlation not causation, and as noted above the interpretation is not straightforward. Does it mean that men are more likely to be unfaithful if their current workplace has a high proportion of women, or that men who have a history of infidelity are more likely to choose to work in a workplace with a high proportion of women? The answer is not clear.
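To make the measurement-error concern concrete, here is a minimal sketch of how the 7-point response could be turned into a "fraction of coworkers of the opposite sex". The mapping simply reuses the example percentages in the question wording. That is an assumption for illustration, not necessarily the conversion used in the paper, and the names `SHARE_WOMEN` and `opposite_sex_share` are hypothetical.

```python
# Assumed share of women in the workplace for each response category,
# taken from the example percentages in the survey question.
SHARE_WOMEN = {
    1: 1.00,  # all women
    2: 0.95,  # almost all women
    3: 0.70,  # mostly women
    4: 0.50,  # about half men and half women
    5: 0.30,  # mostly men (70% men)
    6: 0.05,  # almost all men (95% men)
    7: 0.00,  # all men
}


def opposite_sex_share(response: int, respondent_is_male: bool) -> float:
    """Fraction of coworkers of the opposite sex implied by a response."""
    share_women = SHARE_WOMEN[response]
    return share_women if respondent_is_male else 1.0 - share_women


# A man and a woman who both answer "mostly women" get different values.
print(opposite_sex_share(3, respondent_is_male=True))
print(opposite_sex_share(3, respondent_is_male=False))
```

Every respondent who answers "mostly women" is assigned the same 70%, whether their workplace is really 60% or 90% women, which is where the measurement error creeps in.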

Sunday, 18 June 2017

Book Review: You are Not So Smart

I recently read a 2011 book by David McRaney, "You are Not So Smart". The book has 48 short chapters, each of which is devoted to a different cognitive bias, heuristic, or logical fallacy, all of which demonstrate how all of us are not so smart. As McRaney puts it in the introduction:
These are components of your mind, like organs in your body, which under the best conditions serve you well. Life, unfortunately, isn't always lived under the best conditions. Their predictability and dependability have kept confidence men, magicians, advertisers, psychics, and peddlers of all manner of pseudoscientific remedies in business for centuries. It wasn't until psychology applied rigorous scientific method to human behavior that these self-deceptions became categorized and quantified...
You will soon realize you are not so smart, and thanks to a plethora of cognitive biases, faulty heuristics, and common fallacies of thought, you are probably deluding yourself minute by minute just to cope with reality.
Many of the heuristics and biases are ones that I cover in the behavioural economics topic in my ECON110 class, such as framing, the Dunning-Kruger effect, procrastination and present bias. All are supported by appropriate citations to research and interesting anecdotes. And some of the bits are priceless, such as this (on introspection):
Is there a certain song you love, or a photograph? Perhaps there is a movie you keep returning to over the years or a book. Go ahead and imagine one of those favorite things. Now, in one sentence, try to explain why you like it. Chances are, you will find it difficult to put into words, but if pressed you will probably be able to come up with something.
The problem is, according to research, your explanation is probably going to be total bullshit.
The only problem with this book is that, while it presents a lot of problems with our cognitive processes, it is very light on solutions. And when solutions are presented, they can sometimes be inconsistent. For instance, the solution to normalcy bias (where you pretend everything is normal, even in the wake of a major crisis) is repetition of warnings. However, in the next paragraph McRaney points out the cases of Y2K, swine flu, and SARS, where media over-hyping has led people to become complacent!

Overall, I found this book to be an easy and interesting read, and I recommend it to anyone who wants to understand why we are not so smart. If you're looking for more, McRaney has a website/blog, and a follow-up book, "You are Now Less Dumb", which I look forward to reviewing in the future.

Wednesday, 14 June 2017

How segregated is Auckland, and New Zealand?

A story published on the Newsroom site last month discussed diversity and segregation in Auckland:
How often do we hear that Auckland is this wonderfully diverse city where immigration has produced an exciting multicultural mix and made it a truly dynamic city to live in?
The portrayal of Auckland as a place where different ethnicities live side by side and share the fruits of its booming economy suits many narratives, but is it a myth?
Associate Professor of Pacific Studies at the University of Auckland Damon Salesa says the residential segregation in Auckland is remarkably high in Auckland, and not far behind what you would find in South Africa or parts of the American South.
“The most segregated population is actually European New Zealanders in Auckland. These people have no window or vision on the rest of Auckland…. the city many European New Zealanders live in is not diverse at all."
Now, I've been working for the last couple of years on developing new projections of future ethnic diversity as part of the MBIE-funded CADDANZ (Capturing the Diversity Dividend of Aotearoa New Zealand) project. Those projections make use of some fairly sophisticated population projections and microsimulation methods, but I won't be talking about those here just yet. And given continuing record rates of net migration, understanding these changes is pretty important.

Before thinking about future diversity, it pays to have a good understanding of current and past diversity. So, my very able research assistant Tobias has been working on estimating measures of segregation and isolation for New Zealand (and Auckland) recently. And it seems that Salesa has things quite wrong, based on the data (or he's reading the data quite differently to us).

There are several ways of measuring the extent of segregation of a population group (by which I mean how separated a given population group, such as Europeans, is from other population groups). Two widely used measures of residential segregation (for groups) are:

  1. The Index of Segregation (IS), which measures the proportion of people in a population subgroup that would have to relocate in order to make their distribution identical to that of all other groups (on average); and
  2. The Modified Isolation Index (MII), which measures the extent to which members of a population subgroup are disproportionately located in the same area as other members of their group. 
Both measures range from 0 (low segregation or isolation) to 1 (high segregation or isolation). I'll focus on the Index of Segregation in this post (but the MII results are similar [*]). To calculate the measures we use data from the New Zealand Census 2001-2013 based on the number of people belonging to each of five ethnic groups (European, Asian, Maori, Pacific, and Other). [**]
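For readers who want to see the mechanics, here is a minimal sketch of both measures using one common textbook formulation: the Index of Segregation as a dissimilarity index between the group and everyone else, and the Modified Isolation Index as the isolation index rescaled by the group's overall population share. These formulas and the toy area counts are assumptions for illustration, and may differ in detail from the calculations behind the charts in this post.

```python
def index_of_segregation(group, total):
    """Half the summed absolute difference between the group's distribution
    across areas and the distribution of everyone else."""
    group_sum = sum(group)
    rest_sum = sum(total) - group_sum
    return 0.5 * sum(
        abs(g / group_sum - (t - g) / rest_sum) for g, t in zip(group, total)
    )


def modified_isolation_index(group, total):
    """Isolation index (average own-group share experienced by group
    members), adjusted for the group's share of the whole population."""
    group_sum = sum(group)
    share = group_sum / sum(total)
    isolation = sum(
        (g / group_sum) * (g / t) for g, t in zip(group, total) if t > 0
    )
    return (isolation - share) / (1 - share)


# Hypothetical counts for three areas: group members and total residents.
group_counts = [80, 15, 5]
area_totals = [100, 100, 100]
print(round(index_of_segregation(group_counts, area_totals), 3))
print(round(modified_isolation_index(group_counts, area_totals), 3))
```

With these definitions, a group spread evenly across all areas scores 0 on both measures, and a group living entirely apart from everyone else scores 1.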

First, here's the picture for the country as a whole:

Clearly, the most segregated ethnic group is the Pacific group, and segregation of that group has been declining since 2001. The Asian group is the next most segregated, followed by European and Maori (which are about even). The "Other" ethnic group is the least segregated of all. So, that doesn't seem to support Salesa's assertion that Europeans are the most segregated ethnic group. Also, they haven't been becoming more segregated over time. But his statement was about Auckland, so here's the same picture for Auckland:

Again, the Pacific group is the most segregated. Asians are clearly less segregated in Auckland than in the rest of the country, and in Auckland there is little to choose between that group and Europeans. Maori are also less segregated in Auckland than in the country as a whole, but notice the trend towards less segregation for Maori is the same for both Auckland and the whole country. Again, there is little support for Salesa in these data.

Those first two charts are based on data at the area unit level (an area unit is a geographical unit that in urban areas equates roughly to the size of a suburb). You might worry that using these arbitrary boundaries matters, but it doesn't much. Here's the picture for the whole country based on 2014 electoral boundaries:

Notice that it doesn't look much different from the country as a whole using area unit boundaries (the first chart in this post), in terms of the ranking of the different ethnicities' levels of segregation (and if we look at electorates in Auckland only, it looks similar to the second chart in this post). Overall, we can probably conclude that the European ethnic group is not the most segregated (in Auckland, or in the country as a whole). If we are concerned about segregation, we should be looking at the Pacific and Asian groups (and particularly the Pacific group in Auckland).

As a bonus, it's interesting to note that New Zealand is much more segregated by ethnicity than by political affiliation. Here's segregation by vote share from the last five national elections (2002-2014) plotted on the same scale (and with five groups, to make it most comparable to the ethnic segregation data above):

New Zealand is much less segregated politically than ethnically, and none of the political parties' supporters seem vastly more segregated than any other (though I wonder whether Labour and NZ First will continue their previous trajectories through this year's election).


[*] The MII results accentuate some of the differences, and some of the rankings are slightly different, but the overall picture is the same in that the Pacific group is the most isolated.

[**] These calculations are based on the 'total response' ethnicity for each area. In the Census, people can report that they belong to more than one ethnic group, and the 'total response' counts these people once for each group they belong to. So, the total number of reported people by ethnicity in an area will always be greater than the total number of people in an area. I can't see that this would bias the statistics in any serious way, however.

Monday, 12 June 2017

Is Roger Federer more loss averse than Serena Williams?

I've really enjoyed watching the French Open tennis the last couple of weeks, and congratulations to Jelena Ostapenko and Rafael Nadal for their excellent wins. I especially like watching tennis because I can read a book in between points (and watching Nadal helps with this too, since he takes so long between points I can read a lot more!), and the payoff is likely to be a couple of quick book reviews on my blog (in spite of also reading a Terry Pratchett novel).

Anyway, the point of this post is in the title, which is also the title of a new paper published in the journal Applied Economics (ungated earlier version here), authored by Nejat Anbarci (Deakin University), Peren Arin (Zayed University in Abu Dhabi), Cagla Okten (Bilkent University in Turkey), and Christina Zenker (Zayed University). In the paper, the authors use data on the service speeds from 32 matches (19 men's; 13 women's) from the 2013 Dubai Tennis Championships, to investigate whether player effort obeys Prospect Theory.

Prospect Theory was introduced by Daniel Kahneman and Amos Tversky in this 1979 paper. The theory suggests that people are loss averse (they weigh losses more heavily than equivalent gains), and also that they are risk averse when they are in the domain of gains (relative to their reference point), but risk seeking when they are in the domain of losses. What that all means is that:

  1. People will exert more effort to avoid losses than they will to capture equivalent gains; and
  2. People will engage in more risky behaviour if they are 'behind' than if they are 'ahead'.
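Both implications fall out of the shape of the Prospect Theory value function. The sketch below uses the power-function form and the parameter estimates from Tversky and Kahneman's later (1992) work, namely a curvature of 0.88 and a loss-aversion coefficient of 2.25. Those specifics are assumptions for illustration and are not taken from the 1979 paper or from Anbarci et al.

```python
ALPHA = 0.88    # curvature for gains and losses (assumed, 1992 estimate)
LAMBDA = 2.25   # loss-aversion coefficient (assumed, 1992 estimate)


def value(x: float) -> float:
    """Subjective value of a gain or loss x relative to the reference point."""
    if x >= 0:
        return x ** ALPHA
    return -LAMBDA * ((-x) ** ALPHA)


# Loss aversion: a loss of 100 hurts more than a gain of 100 pleases.
print(value(100), value(-100))

# Risk aversion in gains: a sure 50 beats a 50/50 gamble on 100 or nothing.
print(value(50) > 0.5 * value(100))

# Risk seeking in losses: a 50/50 gamble on losing 100 or nothing beats
# a sure loss of 50.
print(0.5 * value(-100) > value(-50))
```

The function is concave above the reference point and convex below it, which is what produces the switch from risk aversion to risk seeking.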
Anbarci et al. test this with their data. They have a measure of whether players are ahead or behind (based on the score in the current game, the current set, and the match overall), and a measure of player effort (their service speed). They find that:
...(i) a server will put more effort into his/her serve speed when behind in score than when ahead in score, (ii) players’ effort levels and thus serve speeds get less sensitive to losses or gains when score difference gets too large and (iii) overall servers will be more risk averse in the domain of gains than in the domain of losses.
This seems to support Prospect Theory. And when they look at male and female players separately, they find that:
...male players are more risk seeking in the loss domain as they increase serve speed when behind in set score while female players are more risk averse in the gain domain since they decrease serve speed when ahead in set score. Hence, although we find evidence for behaviour consistent with loss aversion for both males and females, its manifestation differs significantly between the two sexes.
So the results are not unequivocally in favour of Prospect Theory, but at least they are consistent with it. The only issue I see with the paper is that the sample size is relatively small, based on data from a single tournament. This means that, when the authors try to exclude other explanations for their results, the statistical insignificance of those findings is not altogether convincing. They also don't answer the actual question that is the title of their paper (and this post), but with sufficient additional data it might be possible to assess the loss aversion of individual players. Hopefully, some intrepid souls are following up this work with some more in-depth analysis involving many tournaments.

Sunday, 11 June 2017

Book Review: The Price of Fish

Regular readers of my blog may have noticed that in my book reviews, I rarely review a book that I don't like. That is probably because I tend to have a pretty good filter most of the time for the books I read, so that I don't pick up a book that I won't like. However, that isn't always the case and I just finished reading "The Price of Fish" by Michael Mainelli and Ian Harris and was pretty disappointed.

The subtitle of the book is "A new approach to wicked economics and better decisions", and so I guess I was expecting something that would challenge my thinking a little bit. Unfortunately, while the authors claim to be developing a new framework for decision-making, my impression of the book is that it was a particularly shallow collection of references to work that others have done, where none of the references do a particularly good job of explaining the key issues or why that particular reference is important. Essentially, the book could adequately be summarised as: "here are a bunch of ideas we have read about. We're not going to explain them to you, but believe us when we say that by collecting them together we have created a new way of thinking about decision making". Blech.

To give you some idea, try this passage on Elinor Ostrom's work:
Ostrom derived eight design principles for systems that successfully manage common-pool resources:

  • Clearly defined boundaries
  • Equivalence between costs and benefits (appropriation and provision rules) in local conditions.
  • Collective-choice arrangements.
  • Monitoring.
  • Graduated sanctions.
  • Conflict-resolution mechanisms.
  • Recognition of rights to organize.
  • The use of nested enterprises.
That's pretty much all you get. The authors never explain this list of bullet points (and this is not the only time they do this). It just seems to be something they have picked up and thought important (and it is), but it isn't well integrated into their overall narrative (unless their narrative really is that they are collecting a bunch of ideas they have read about?). It is all very well to assume some prior knowledge of your readers, or give them teasers so that they go out and read more about certain topics, but Mainelli and Harris do this far too often for my liking. 

Finally, their "new approach to wicked economics and better decisions" isn't pulled together until page 287 (out of 310), so you have to wait a while to see what they are suggesting. But to me the new approach isn't all that new, and doesn't really rely on the previous 280+ pages of material.

So if you're looking for some good reading over the semester break, this isn't it. I suggest avoiding this book, as there are much better alternatives available. If you want a new approach to economics, you would be better off with Alvin Roth's "Who Gets What - and Why" (see my review here).

Saturday, 10 June 2017

The gender gap in economics

I was interested to read recently the new paper in the Papers and Proceedings issue of the American Economic Review by Wendy Stock (Montana State University), entitled "Trends in Economics and Other Undergraduate Majors" (sorry, I don't see an ungated version anywhere). Stock used data on some 1.5 million degrees awarded in the U.S. over the period 2001-2014. To me, the key results start with this:
An average of 26,500 degrees were conferred to economics majors per year. Females earned just under one-third and minority students earned just over one-tenth of these majors...
Economics’ share of first majors remained flat at about 1.7 percent during 2001–2014, while its share of second majors grew from 3.5 to 4.6 percent... only a tiny and unchanging fraction of female and minority students (0.9 and 1.1 percent, respectively) first majored in economics during 2001–2014. Thus, although we have seen substantial growth over time in female and minority students graduating from college, the demographic mix of majors in economics has not mirrored these trends. However, from 2001–2014, economics’ share of second majors grew from 2.3 to 2.9 percent among females and from 1.8 to 2.8 percent among minority students.
Despite some increase in economics majors among female students, there remains a significant gender gap (and this is a point I have raised before). Stock's paper is relatively silent on the reasons for the gap. However, some earlier work may shed some light on this.

A 2012 paper by Tisha Emerson (Baylor), KimMarie McGoldrick (University of Richmond), and Kevin Mumford (Purdue) published in the Journal of Economic Education, suggested that:
...apprehension over the gender gap also includes suggestions that the absence of peers and role models may have a dampening effect on female interest in the discipline... Thus, the gender gap could be (partially) self-perpetuating.
Emerson et al. used data from nearly 600,000 students from the Multiple-Institution Database for Investigating Engineering Longitudinal Development (MIDFIELD), and investigated which students took introductory economics, which students progressed to an intermediate theory course, and which students progressed to an economics major. They found that:
...females and minorities are less likely to take an introductory economics course. Further, students with higher standardized SAT/ACT scores are significantly less likely to enroll in introductory economics courses. However, women with higher aptitude are relatively more likely to enroll in an introductory course as are female minorities...
At the intermediate theory level, we found significant differences in the probability of majoring by gender, which remain even after accounting for gender differences in the influence of aptitude and course performance...
...the probability that females major in economics is positively correlated with the proportion of women in their theory course. A 1-percent increase in the percentage of male students in their theory course actually decreases a female student’s likelihood of majoring in economics by 59 percent. 
They concluded that:
...course performance matters, and our grading practices and policies relative to that of other disciplines may either draw or repel students.
Which relates to a point I made in an earlier post this week. There may be good reasons for us to include more writing in introductory economics, in order to attract more female students into the discipline. Also:
...because once students commit to an economics major they stick by that decision, one possible channel for increasing the number of women who major in economics is to generate significant interest in the subject prior to their first course, perhaps through greater exposure to economics at the high school level.
This is something that we have little control over at university level. And the rise in business studies courses at high school, and consequent decline of economics courses, is likely to be a challenge for the future of the economics major at all universities. This may especially become a problem at Waikato from next year, where a greater proportion of students will have to elect their majors from the start of their first year of their degree.

The gender gap is not just apparent in U.S. data. In a 2015 paper published in the journal CESifo Economic Studies (ungated earlier version here) and authored by Mirco Tonin and Jackline Wahba (both University of Southampton), we see somewhat similar results for the U.K. They report:
...using administrative data for enrolment in 2008, we find that females represent 57% of those enrolling for undergraduate courses in all fields, but only 27% in Economics.
Tonin and Wahba have access to data from a 50% sample of all applicants to UK universities in 2008, being nearly 960,000 applications, from over 230,000 applicants. Once they restrict the sample to those who actually enrolled at a university, that leaves a little over 185,000 applicants. This dataset allows them to look at whether there is a gender bias in applications, offers, or acceptance of offers by students. They find:
...[no] evidence that females are less likely to accept an offer for a bachelor's degree in Economics, thus indicating that, conditional on applying, Economics is not lower in the ranking for females than for males. We also do not find evidence of universities discriminating against female applicants. What we find is that girls are less likely to apply for a bachelor's degree in Economics to start with, while they are over-represented in certain degrees that are associated with more female-concentrated occupations such as Nursing and Education... even among those who have studied Maths, females are less likely to apply for an Economics degree than males, suggesting that differences in the choice of A level subjects cannot explain the whole gap.
Again, it appears that part of the problem is occurring before female students even set foot on the university campus. Are female students simply less interested in economics by the time they apply for university?

Coming back to the Stock paper I mentioned first in this post though, she suggests something that might be worth consideration:
Encouraging combinations of second majors in economics with majors that are growing among female and minority groups may also increase gender and racial diversity among economics undergraduates.
Perhaps the most effective course of action at the margin is to make economics an attractive second major for students in other disciplines, or an attractive option as a minor or as part of an inter-disciplinary major or minor. Given the significant upheaval of the degree structures and majors here at Waikato, I look forward to reporting in the future on some of our developments in that space. In the meantime though, I will shortly submit a proposal for a Summer Research Scholarship student to use our accumulated data on first-year economics students at Waikato, to further investigate the gender gap in economics. I look forward to reporting on that in a future post as well.

Wednesday, 7 June 2017

Trump, Paris, and the repeated prisoners' dilemma

I just finished reading this 2008 paper by Garrett Jones (George Mason University), entitled "Are smarter groups more cooperative? Evidence from prisoner’s dilemma experiments, 1959–2003", and published in the Journal of Economic Behavior & Organization (ungated earlier version here). In the paper, Jones collates data from 36 studies of the repeated prisoners' dilemma that were undertaken among U.S. college students between 1959 and 2003.

The classic prisoners' dilemma game goes like this (with lots of variants; this is the version I use in ECON100):
Bonnie and Clyde are two criminals who have been captured by police. The police have enough evidence to convict both Bonnie and Clyde of the minor offence of carrying an unregistered gun. This would result in a sentence of one year in jail for each of them.
However, the police suspect Bonnie and Clyde of committing a bank robbery (but they have no evidence). The police question Bonnie and Clyde separately and offer them a deal: if they confess to the bank robbery they will get immunity (and be set free) but the other criminal would get a sentence of 20 years. However, if both criminals confess they would both receive a sentence of 8 years (since their testimonies would not be needed).
The outcome of the game is that both criminals have a dominant strategy to confess. Confessing results in a payoff that is always better than the alternative, no matter what the other criminal decides to do. The Nash equilibrium is that both criminals will confess. However, that assumes that the game is not repeated.
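The dominant strategy can be checked mechanically. The sketch below encodes the jail sentences from the story above (1 year each if both stay silent, 8 years each if both confess, and 0 versus 20 years if only one confesses) and searches for outcomes where each criminal is playing a best response to the other. The function names are illustrative only.

```python
from itertools import product

ACTIONS = ("confess", "silent")

# Years in jail for (Bonnie, Clyde) under each pair of actions.
YEARS = {
    ("confess", "confess"): (8, 8),
    ("confess", "silent"): (0, 20),
    ("silent", "confess"): (20, 0),
    ("silent", "silent"): (1, 1),
}


def best_response(player: int, other_action: str) -> str:
    """Action that minimises the player's jail time, given the other's action.
    Player 0 is Bonnie and player 1 is Clyde."""
    def years(own_action: str) -> int:
        if player == 0:
            profile = (own_action, other_action)
        else:
            profile = (other_action, own_action)
        return YEARS[profile][player]

    return min(ACTIONS, key=years)


def nash_equilibria():
    """Action pairs where each player is best-responding to the other."""
    return [
        (bonnie, clyde)
        for bonnie, clyde in product(ACTIONS, repeat=2)
        if best_response(0, clyde) == bonnie and best_response(1, bonnie) == clyde
    ]


# Confessing is the best response whatever the other criminal does.
print([best_response(0, other) for other in ACTIONS])
print(nash_equilibria())
```

Both criminals confess in the only equilibrium, even though both staying silent would leave each of them with one year rather than eight.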

In a repeated prisoners' dilemma game, where the game is played not once but many times with the same players, we may be able to move away from the unsatisfactory Nash equilibrium, towards the preferable outcome, through cooperation. Both criminals might come to an agreement that they will both remain silent. However, this cooperative outcome requires a level of trust between the two criminals.

Jones's data records the proportion of the time that the repeated prisoners' dilemma (RPD) resulted in cooperation in each of the 36 studies. He then shows that there is a large positive correlation between the average SAT scores at the college that the study was undertaken at (a measure of average intelligence), and the proportion of students choosing to cooperate in the RPD game. This is the key figure (2006 average SAT score is on the x-axis, and the proportion of students cooperating is on the y-axis): [*]

The regression results support this, and the effects are relatively large:
Using our weakest results, those from the 2006 SAT regression, one sees that moving from “typical” American universities in the database such as Kent State and San Diego State (with SAT scores around 1000) to elite schools like Pomona College and MIT (with scores around 1450) implies a rise in cooperation from around 30% to around 51%, a 21% increase. Thus, substantially more cooperation is likely to occur in RPD games played at the best schools. It indeed appears that smarter groups are better at cooperating in the RPD environment.
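A quick calculation helps put the size of that effect in context. The sketch below uses only the rounded figures in the quote (SAT scores of about 1000 and 1450, cooperation of about 30% and 51%) and assumes the relationship is linear between those two points, which is an assumption made for illustration.

```python
# Rounded figures from the quoted passage.
sat_low, sat_high = 1000, 1450
coop_low, coop_high = 0.30, 0.51

# Absolute change in cooperation, in percentage points.
pp_change = (coop_high - coop_low) * 100

# Relative change in cooperation, as a percentage of the starting level.
relative_change = (coop_high - coop_low) / coop_low * 100

# Implied change per 100 SAT points, assuming a straight line between points.
pp_per_100_sat = pp_change / (sat_high - sat_low) * 100

print(f"{pp_change:.0f} percentage points")
print(f"{relative_change:.0f}% relative increase")
print(f"{pp_per_100_sat:.2f} percentage points per 100 SAT points")
```

The "21% increase" in the quote is a change of 21 percentage points. Relative to the starting level of 30%, cooperation rises by about 70%.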
So, the results are clear. Smarter groups are more likely to cooperate in a repeated prisoners' dilemma game. This brings me to the Paris climate change agreement, which as I noted in this earlier post, is an example of a repeated prisoners' dilemma. If more intelligent groups are more likely to cooperate in this situation, what does exiting the Paris agreement (i.e. not cooperating) imply about the Trump administration's collective level of intelligence?


[*] The results are not sensitive to the choice of which year's average SAT scores to use, and similar figures are shown in the paper for 1966 and 1970 SAT scores, and 2003 ACT scores.


Tuesday, 6 June 2017

Congratulations Dame Peggy Koopman-Boyden

I was delighted to learn on Monday that my former (now retired) colleague, Peggy Koopman-Boyden, was made a Dame Companion of the New Zealand Order of Merit. Dame Peggy and I have a long history of collaboration in research on ageing, dating back to 2006 when we were brought into the beleaguered Enhancing Wellbeing in an Ageing Society (EWAS) project (see here for details on that project). That was followed by two projects funded by the Foundation for Research, Science and Technology (see here for details), and the Ministry of Science and Innovation (later the Ministry of Business, Innovation and Employment) (see here for the final output of that project). However, despite all those projects we were both on, we actually only co-wrote one research paper together - this 2015 working paper entitled "Labour Force Participation, Human Capital and Wellbeing among Older New Zealanders" (co-authored with Matthew Roskruge at Massey).

I suspect there are probably some people who are pretty unhappy about this honour. Dame Peggy was responsible for handling the merger of the School of Social Science and the School of Arts at the University of Waikato, and that process ruffled a lot of feathers. However, her honour has less to do with her work here, and more as a recognition of the immense contributions she has made in service of older people in New Zealand - service which it has been a privilege for me to observe first-hand on many occasions. The Waikato Times article gives only a small taste of her contributions:
Dame Peggy led major research projects for the Foundation for Research, Science and Technology during the 1990s and 2000s and recently completed a multiyear programme of research on active ageing funded by the Ministry of Business, Innovation and Employment.
In 1997, she was appointed a Companion of the New Zealand Order of Merit for her services to the elderly. 
And in 2005, she became the founding president of the Waikato branch of the New Zealand Association of Gerontology, a position she held until 2012. She has been a member of the Age Concern Advisory Research Committee since 2010.
She has also been chair of Hamilton City Council's Older Persons Advisory Panel and now chairs the steering group of Hamilton's Age Friendly accreditation of the Institute of Healthy Ageing.
Dame Peggy was made Emeritus Professor of the University of Waikato last year. She was previously Director of NIDEA for a short time before retirement, but retirement has clearly not slowed her down and she is actively involved in a number of projects, including her goal of Hamilton becoming New Zealand's first age-friendly city.

Dame Peggy was a great mentor to me as I was starting out my independent research career (and finishing off my PhD), and has always been a great source of advice and ideas for continuing research. I am blessed to have had her share part of my career journey, and I look forward to being able to use her formal title in person soon (I'm sure that will draw a smile and a gentle rebuke).

Congratulations Dame Peggy Koopman-Boyden!

Monday, 5 June 2017

On multiple choice vs. constructed response in tests and exams

Periodically I run into arguments about whether it is appropriate to use multiple choice questions as a form of assessment, or as a large component of assessment that combines multiple choice and constructed response (such as fill-in-the-blanks, calculations, short answer questions, or short or long essays) questions. Before I get into those debates though, it is worth considering what the purpose of assessment is.

From the perspective of the institution, assessment can be seen to have one or both of two main purposes. The first purpose is to certify that students have some minimum standard of knowledge (what is termed criterion-referencing), and a pass-fail type assessment is all that is required. The second purpose is to rank students in order to assign grades (what is termed norm-referencing), and an assessment that facilitates ranking students (thus providing a range of grades) is required.

However, from the perspective of the student (and the lecturer or teacher), the purpose of assessment is to signal the students' ability in the subject matter that is being assessed. Signalling is necessary to overcome the information asymmetry between student and lecturer or teacher - students know how competent they are, but the lecturer doesn't (or at least, the students should have a better idea than the lecturer does prior to the assessment). Students signal their ability by how well they answer the questions they are faced with. A signal is effective if it is costly, and if it is costly in a way that makes it unattractive to those with lower quality attributes to attempt (or makes it impossible for them to do so). In this case, it is difficult to do well in an assessment, and more difficult (or impossible) for less able students, which makes assessment an effective signal of student ability.

Now that we recognise that, for students, assessment is all about signalling, we can see why most lecturers would probably prefer to construct assessments based purely on constructed response questions. It is difficult for less able students to hide from a short answer or essay question, whereas it may be possible for them to do somewhat better on multiple choice questions, given that the answer is provided for them (they just have to select it). The quality of the signal (how well the assessment reveals student ability) is much greater for constructed response questions than it is for multiple choice questions. The trade-off for lecturers is that constructed response questions are more time intensive (costly) to mark, and the marking is more subjective.

The arguments about the use of multiple choice questions I encounter usually come from both directions - there should be more multiple choice (and less constructed response), and there should be less multiple choice (and more constructed response). The former argument says that, since assessment is about assigning grades, all you need from assessment is some way to rank students. Since both multiple choice and constructed response rank students, provided the rankings are fairly similar we should prefer multiple choice questions (and potentially eliminate constructed response questions entirely). The latter argument says that multiple choice questions are biased in favour of certain groups of students (usually male students), or that they are poor predictors of student ability since students can simply guess the answer.

It is difficult to assess the merits of these arguments. To date I have resisted the first argument (in fact, ECON110 has no multiple choice questions at all), and the second argument is best answered empirically. Fortunately, Stephen Hickson (University of Canterbury) has done some of the leg work in terms of answering questions about the impact of question format in introductory economics classes, published in two papers in 2010 and 2011. The analyses are based on data from over 8000 students in introductory economics classes at the University of Canterbury over the period from 2002 to 2007.

In the first paper, published in the journal New Zealand Economic Papers (ungated earlier version here), Hickson looks at which students are advantaged (or disadvantaged) by the question format (multiple choice, MC, or constructed response, CR) used in tests and exams. He finds that, when students' characteristics are investigated individually:
...females perform worse in both MC and CR but the disadvantage is significantly greater in MC compared to CR... All ethnicities perform worse compared with the European group in both MC and CR with a greater relative disadvantage in CR. The same is true for students whose first language is not English and for international students compared to domestic.
However, when controlling for all characteristics simultaneously (and controlling for student ability), only gender and language remain statistically significant. Specifically, it appears that female students have a relative advantage in constructed response (and relative disadvantage in multiple choice), while for students for whom English is not their first language, the reverse is true. Hickson goes on to show that the female advantage in constructed response applies for macroeconomics, but not microeconomics, but I find those results less believable, since they also show that students with Asian, Pacific, and 'other' ethnicities have an advantage in constructed response in microeconomics.

The takeaway from that first paper is that a mix of multiple choice and constructed response is probably preferable to avoid gender bias in the assessment, and that the greater the proportion of the assessment that is constructed response, the more non-native English speakers will be disadvantaged.

The second paper, which Hickson co-authored with Bob Reed (also University of Canterbury), was published in the International Review of Economics Education (ungated earlier version here). This paper addresses the question of whether multiple choice and constructed response questions test the same underlying knowledge (which, if they did, would ensure the ranking of students would be much the same regardless of question type). In the paper, they essentially run regressions that use multiple choice results to predict constructed response results, and vice versa. They find that:
...the regression of CR scores on MC scores leaves a substantial residual. The first innovation of our study is that we are able to demonstrate that this residual is empirically linked to student achievement. Since the residual represents the component of CR scores that cannot be explained by MC scores, and since it is significantly correlated with learning outcomes, we infer that CR questions contain “new information about student achievement” and therefore do not measure the same thing as MC questions...
...we exploit the panel nature of our data to construct a quasi-counterfactual experiment. We show that combining one CR and one MC component always predicts student achievement better than combining two MC components.
In the latter experiment, they found that a combination of constructed response and multiple choice always does a better job of predicting students' GPA than does a combination of multiple choice and more multiple choice. They didn't test whether a combination of constructed response and more constructed response was better still, which would have been interesting. However, the takeaway from this paper is that multiple choice and constructed response questions are measuring different things, and a mix is certainly preferable to asking only multiple choice questions. On a related point, Hickson and Reed note:
Bloom’s (1956) taxonomy predicts that MC questions are more likely to test the lower levels of educational objectives (i.e., Knowledge, Comprehension, Application, and, perhaps, Analysis). While CR questions test these as well, they are uniquely suited for assessing the more advanced learning goals (Synthesis and Evaluation).
The more of the higher-level skills you want to assess, the more constructed response should probably be used. This provides another argument for using a combination of multiple choice (to test lower levels on Bloom's taxonomy) and constructed response (to test the higher levels), which is what we do in ECON100.
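The residual argument in the Hickson and Reed paper can be illustrated with a small simulation. This is only a sketch on made-up data, not their data or their exact specification. I assume two hypothetical skills ("recall" and "synthesis"), that multiple choice scores reflect only the first, and that constructed response scores and GPA reflect both.

```python
import random


def mean(values):
    return sum(values) / len(values)


def ols_fit(x, y):
    """Slope and intercept from a simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def correlation(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Simulated students. Every distribution and weight below is an assumption.
rng = random.Random(0)
n = 500
recall = [rng.gauss(0, 1) for _ in range(n)]     # hypothetical lower-level skill
synthesis = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical higher-level skill
mc = [r + rng.gauss(0, 0.5) for r in recall]
cr = [r + s + rng.gauss(0, 0.5) for r, s in zip(recall, synthesis)]
gpa = [r + s + rng.gauss(0, 0.5) for r, s in zip(recall, synthesis)]

# Regress CR scores on MC scores and keep the part MC cannot explain.
slope, intercept = ols_fit(mc, cr)
residual = [c - (intercept + slope * m) for m, c in zip(mc, cr)]

# If CR carries new information about achievement, the residual tracks GPA.
resid_gpa_corr = correlation(residual, gpa)
# By construction, an OLS residual is uncorrelated with the regressor.
resid_mc_corr = correlation(residual, mc)

print(round(resid_gpa_corr, 2), round(resid_mc_corr, 2))
```

In this simulated setup the residual is strongly correlated with GPA, which is the pattern Hickson and Reed interpret as constructed response questions measuring something that multiple choice questions miss. If the two formats measured exactly the same thing, the residual would be noise and that correlation would be close to zero.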

Sunday, 4 June 2017

The puzzle of newspaper pricing

In the simple economic model of supply and demand, when demand decreases the price decreases (except in some special cases such as when supply is perfectly elastic). Typically, the same holds true for other market structures such as monopoly - when demand decreases, the monopolist's profit-maximising price usually decreases.

That makes the recent experience of newspaper subscriptions increasing in price, in the wake of decreasing demand, somewhat of a puzzle. Fortunately, a 2016 paper by Adithya Pattabhiramaiah (Georgia Tech), S. Sriram (University of Michigan), and Shrihari Sridhar (Texas A&M) unpacks the puzzle for us, and ultimately it rests on the fact that this is a platform market (or a two-sided market) - a market where a firm brings together two sides (e.g. in this case the readers and the advertisers), both of whom benefit by the existence of the platform (the newspaper), and both of whom may (or may not) be charged (in this case, the readers are charged a subscription, and the advertisers are charged for advertising). As Nobel Prize winner Jean Tirole has noted, in platform markets it is common for one of the two sides to subsidise the other. To be more precise, the side of the market with relatively more inelastic demand (or greater willingness-to-pay for access to the platform) will subsidise the side of the market with relatively more elastic demand (or lower willingness-to-pay for access to the platform).

So, one potential explanation for the puzzle of increasing subscription prices for readers in the face of decreasing demand is that the decreasing number of readers reduces the value of advertising in the newspaper, which reduces advertisers' willingness-to-pay for advertising, which in turn reduces the optimal subsidy the newspaper will apply to subscriptions. And it is this explanation that Pattabhiramaiah et al. set out to test.
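The subsidy logic can be sketched with a deliberately simple model. This is my own toy setup, not the model in the paper. I assume linear reader demand q = a - b * p, a constant marginal cost per copy, and a fixed amount of advertising revenue earned per reader. All parameter values are hypothetical.

```python
def optimal_subscription_price(a, b, cost, ad_revenue_per_reader):
    """Profit-maximising subscription price under linear reader demand q = a - b * p.

    Profit is (p - cost + ad_revenue_per_reader) * (a - b * p), so the
    first-order condition gives p* = (a / b + cost - ad_revenue_per_reader) / 2.
    """
    return (a / b + cost - ad_revenue_per_reader) / 2


def profit(p, a, b, cost, ad_revenue_per_reader):
    """Newspaper profit from subscriptions plus advertising at price p."""
    return (p - cost + ad_revenue_per_reader) * (a - b * p)


# Hypothetical baseline: strong reader demand and valuable advertising.
baseline = optimal_subscription_price(a=100, b=2, cost=10, ad_revenue_per_reader=20)

# Reader demand falls, advertising revenue per reader unchanged: the price falls.
demand_only = optimal_subscription_price(a=90, b=2, cost=10, ad_revenue_per_reader=20)

# Reader demand falls AND advertising revenue per reader collapses: the price rises.
both = optimal_subscription_price(a=90, b=2, cost=10, ad_revenue_per_reader=5)

print(baseline, demand_only, both)  # 20.0 17.5 25.0
```

In this toy model, a fall in reader demand on its own lowers the optimal price, as the simple monopoly story predicts. Once each reader is worth less to advertisers, the newspaper has less reason to hold the subscription price down, and the optimal price can end up higher than before even though demand has fallen.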

They use data for 2006-2011 from a top-50 regional newspaper in the U.S., supplemented by microdata from 5565 subscribers, to construct models of: (1) the subscribers' decisions about whether to subscribe (and which of three subscription choices to select); (2) the advertisers' decisions about whether to advertise (and whether to do so in classifieds, display, or inserts); and (3) the newspaper's decision about subscription prices. A bit of background first though:
During the period of our analysis (i.e., 2006-2011), 14% of households in the newspaper’s market subscribed to the focal newspaper...
Conditional on subscribing to the focal newspaper, 72.4% (71.6%) of readers within (outside) the core market opt for the Daily option. The corresponding numbers for the Weekend and Sunday only options are 5.4% (4.4%) and 22.2% (23.9%), respectively...
The Daily option witnessed the steepest price increase of nearly 77%, both within and outside the core market, while prices of the Weekend and Sunday only options also increased by 52% and 38%, respectively...
On average, across the three options, the newspaper’s circulation witnessed steep year-on-year declines within (outside) the core market of between 7-10% (2-6%)...
While display and inserts lost 57.7% and 43.4%, respectively during our analysis period, Classifieds ad revenues experienced the steepest decline of 88.3%...
Between 2006 and 2011, classifieds ad rates at the focal newspaper declined by 66%, possibly as a result of the growing popularity of Craigslist. The rates for display ads and inserts experienced smaller declines of 16.7% and 10.8%, respectively.
So, subscription prices for readers increased (and readership decreased), while advertising rates (the price advertisers pay) and advertising revenue both decreased. However, that decrease in readership was likely to be the result of both the increased price (a movement along the demand curve) and changing reader preferences (a decrease in demand, or a shift of the demand curve to the left). This creates an identification problem (which I've written on before, here) - which of the movement along the demand curve or the shift in the demand curve has contributed the most to the change in price? [*]

Pattabhiramaiah et al. then used their model to examine:
...how optimal markups evolved between 2006 and 2011 in each of the three cases: actual markups (computed based on our model parameters), the case where we switch off the decline in readers’ preferences, and the case where we switch off the decline in the incentive to subsidize readers at the expense of advertisers.
They find that:
...within the core market, the decline in readers’ preferences accounted for between 8-21% of the increase in subscription prices. On the other hand, nearly 79-92% of the increase in subscription prices between 2006 and 2011 can be traced back to the decreasing incentive on the part of the newspaper to subsidize readers at the expense of advertisers.
So, their results support the argument that the puzzle is explained by decreasing subsidies from the advertiser side of the platform market to the reader side.

There are two final (statistical) points I want to make about this paper, which may leave us with some concern about the results. The first is illustrated by this quote from the paper:
We find that the correlation between the subscription of the local newspaper and the local subscription of national newspapers is 0.8, suggesting that it is not a weak instrument... Overall, these results suggest that the instruments, along with other exogenous variables, explain 83% of the variation in readership. Compared to the first stage regression with only the exogenous variables, but excluding the instruments, the proposed instrumental variables improve the R-squared by 12-14% for ad rates and 11-15% for readership. Therefore, we contend that we do not have a weak instruments problem.
What the hell? There are actual statistical tests that you can run for testing whether you have weak instruments (Stock and Yogo have a whole chapter on it here), but rather than report the results of those tests they obfuscate instead? On the basis of what they have written, we are left with little idea about whether their instruments do a good job or not (for more on instrumental variables, see my post here). The second issue is somewhat hidden in a footnote:
Our in-sample MAPD ranges between 17.4%-17.8%. The out of sample MAPD range between 12.1%-16.8%.
The MAPD (Mean Absolute Percentage Deviation) is a measure of the error in their model, and it is extremely unusual for a model to show a lower error on data that was held out for validating the model (out-of-sample data) than on data that was used to construct the model (in-sample data). After all, most models are constructed to minimise the in-sample error. So this should leave us a little concerned about their model (or their calculation of MAPD). Or perhaps it's just luck that their model does a better job of predicting the 2010-2011 period than the 2006-2010 period?
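For readers unfamiliar with the measure, here is a minimal sketch of how a mean absolute percentage deviation is usually calculated. The footnote quoted above doesn't give the authors' formula, so the standard definition is assumed, and the numbers are made up for illustration.

```python
def mapd(actual, predicted):
    """Mean absolute percentage deviation, in percent (standard definition assumed)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and the same length")
    deviations = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(deviations) / len(deviations)


# Hypothetical values: deviations of 10%, 10% and 0% average to about 6.7%.
example = mapd([100, 200, 400], [110, 180, 400])
print(round(example, 1))
```

The same function would be applied once to the data used to fit the model (in-sample) and once to the data held back (out-of-sample), and we would normally expect the second number to be the larger of the two.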

[HT: Marginal Revolution]


[*] Pattabhiramaiah et al. use their data to first eliminate the alternative possibilities of increasing quality of the newspaper leading to an increase in price, or increasing marginal costs leading to an increase in price (in fact, they find that marginal costs actually declined over the period).

Saturday, 3 June 2017

Pakistan should subsidise LPG for heating and cooking

Indoor air pollution is a serious problem in many (or most) developing and middle-income countries. The main source of indoor air pollution is the burning of solid fuels (such as firewood, animal dung, or crop residues) for heating and cooking purposes. In Pakistan alone, the World Health Organization estimates that over 70,000 deaths annually can be attributed to indoor air pollution (see here), with the global total being around 1.6 million deaths annually.

When we think about demerit goods (goods that society would prefer there was less consumption of), there are typically two solutions. The first option is a command-and-control policy that prohibits or limits the consumption of the good. In this case, governments could ban the use of firewood. However, it is unlikely that such a ban is feasible. The second option is to tax the good, but again in this case taxing is not feasible as firewood, animal dung, and crop residues can be obtained at low (or no) cost by rural households direct from the source.

In a new working paper, Muhammad Irfan, Gazi Hassan and I use household data from Pakistan to estimate the price and fuel expenditure elasticities of demand for various fuels used for heating and cooking. Specifically, we pooled data from three waves of the Pakistan Social and Living Standard Measurement Survey (2007-08, 2010-11 and 2013-14), and used Deaton and Muellbauer's LA-AIDS (Linear Approximate Almost Ideal Demand System) model. That sounds complicated and fancy (and it is), but the output is pretty simple - it estimates all of the price and fuel expenditure elasticities for the different fuels (natural gas, LPG, firewood, agricultural waste/crop residues, animal dung, and kerosene). We were also able to estimate different elasticities for rural and urban households.

We found that all fuel types except natural gas were price inelastic at the national level and for urban households. In rural areas, natural gas and LPG were found to be more price elastic compared with urban areas. Fuel expenditure elasticities for all fuels were found to be positive and between zero and one.
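As a reminder of what those terms mean, here is a minimal sketch using hypothetical numbers (not estimates from our paper). Demand is price inelastic when the absolute value of the price elasticity is less than one.

```python
def price_elasticity(pct_change_quantity, pct_change_price):
    """Price elasticity of demand: % change in quantity over % change in price."""
    return pct_change_quantity / pct_change_price


def classify(elasticity):
    """Label demand as inelastic, elastic, or unit elastic."""
    magnitude = abs(elasticity)
    if magnitude < 1:
        return "inelastic"
    if magnitude > 1:
        return "elastic"
    return "unit elastic"


# Hypothetical: a 10% price rise reduces the quantity demanded by 4%.
inelastic_case = price_elasticity(-4, 10)
# Hypothetical: a 10% price rise reduces the quantity demanded by 15%.
elastic_case = price_elasticity(-15, 10)

print(inelastic_case, classify(inelastic_case))
print(elastic_case, classify(elastic_case))
```

The expenditure elasticities work the same way, with the percentage change in total fuel expenditure in the denominator. A value between zero and one means that spending on that fuel rises when total fuel expenditure rises, but less than proportionately.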

Finally, we ran a fairly simple policy simulation to test how much solid fuel use could be reduced by subsidising the cleaner-burning fuels (LPG and natural gas). We found that subsidising LPG dominates a subsidy of natural gas, producing a greater reduction in solid fuel use at a lower total cost to the government. If the government wants to subsidise only one clean fuel, it should subsidise LPG instead of natural gas. So, while it may be unusual for an economist to advocate in favour of a subsidy, in this case it probably makes a lot of sense, if you want to reduce the burden of disease from indoor air pollution.

Finally, this paper is also the first research paper from Muhammad's PhD thesis, so congratulations to him on that achievement, and I look forward to reporting on his future work in later posts.

Thursday, 1 June 2017

Paul Krugman on interstellar trade

Economists (and academics) should never be too busy to have fun. I sometimes regret that I'm too busy to read some of the more fun contributions of other economists. For instance, this 2010 article by Nobel Prize winner Paul Krugman, titled "The Theory of Interstellar Trade" and published in Economic Inquiry, was sitting in my to-be-read pile for way too long before I picked it up this week. I can't be faulted too much for this relative to others - Krugman originally wrote the paper in 1978 (see this version), and took 32 years to get around to having it published.

The paper extends the ideas of trade between nations (or between planets within a single star system) to the case of trade between planets of different star systems. In that case, travel between the stars would require travel at close to the speed of light, where time dilation becomes an issue. The paper is full of memorable quotes, like:
It should be noted that, while the subject of this article is silly, the analysis actually does make sense. This article, then, is a serious analysis of a ridiculous subject, which is of course the opposite of what is usual in economics.
It isn't the only humorous bit (which shouldn't be a surprise - regular readers of my blog will recall that Economic Inquiry is the journal responsible for classics like "Riccardo Trezzi is immortal" (see my post on that one here), and "On the Efficiency of AC/DC: Bon Scott versus Brian Johnson"). The Krugman paper has other points of interest, such as the imaginary Figure 2, on which he explains:
Readers who find Figure 2 puzzling should recall that a diagram of an imaginary axis must, of course, itself be imaginary.
And this:
Readers may, however, wish to use general relativity to extend the analysis to trade between planets with large relative motion. This extension is left as an exercise for interested readers because the author does not understand the theory of general relativity, and therefore cannot do it himself.
I guess that leaves an opening for further extensions of this work, and there are certainly enough former physicists working in economics (especially finance) that someone must take up the challenge sooner or later (or sooner and later, time being relative and all that).
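For anyone wondering how big the time dilation issue mentioned above actually is, here is a minimal sketch using the standard special relativity formula. This is textbook physics rather than anything taken from Krugman's paper, and the speed and trip length are hypothetical.

```python
import math


def lorentz_factor(speed_fraction_of_c):
    """Lorentz factor for a speed given as a fraction of the speed of light."""
    if not 0 <= speed_fraction_of_c < 1:
        raise ValueError("speed must be at least 0 and below the speed of light")
    return 1 / math.sqrt(1 - speed_fraction_of_c ** 2)


def ship_time(planet_time_years, speed_fraction_of_c):
    """Time elapsed on board for a trip measured in the planets' frame."""
    return planet_time_years / lorentz_factor(speed_fraction_of_c)


# Hypothetical trip: 10 years as measured on the planets, at 60% of light speed.
gamma = lorentz_factor(0.6)
onboard_years = ship_time(10, 0.6)
print(round(gamma, 2), round(onboard_years, 2))
```

At 60% of the speed of light, a trip that takes 10 years from the planets' point of view takes only about 8 years for those on board, which is why it matters whose clock you use when working out the interest cost on goods in transit.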

Paul Krugman blogs for the New York Times. Despite having read his blog for a while, to me this paper on interstellar trade is his most entertaining writing.