Three years ago, the “world’s largest pre-college science fair” was passing through town. I was asked to volunteer as a judge, and my university system was pleased to have a faculty member at the event. I was a student in the fair in 1988, too, so I was curious to see it from the other side.
What I saw was appalling.
After the experience was over, I got a pro forma email asking me to provide any remarks and recommendations. Here is what I sent back to the organizer, edited for length and to accommodate the uninitiated.
May 16, 2011
I actually do have some remarks; I’ve been really troubled since the judging took place last week. I’d like to know what response my remarks might elicit, if any, and it is my hope that they are taken seriously.
To make a very long story short, I’m troubled by what seems to be a systemic problem in the judging process. I’m not taking this lightly, and I find that it compromises the integrity of the fair in a serious way. I didn’t say much about it at the time — I was quickly dismissed — but after detailed consideration and deliberation with colleagues, I am very concerned about the process. I am reluctant to participate as a judge again because my involvement would compromise my professional standing if my colleagues knew how this process took place.
There was a shortage of judges in my category; some projects were officially seen by only two judges [which is far below the standard set by the fair; there were 30+ judges on my panel, as there were many projects]. Because the finalists did not receive enough interviews to constitute a representative sample, the scores from the official judging sessions were thrown out, and the decisions in caucus were not informed by the votes at all. In fact, the group voted by majority to ignore the scores, even though not all of the judges were present in the caucus.
During the caucus process, each of the judges advocated for what they considered to be the top projects for the grand, 1st place, and 2nd place awards. Each judge introduced what they considered the strengths of the project and of the finalist who conducted it, and then those with direct experience of the finalist and project contributed their opinions. Of course, given the breadth of projects, much of the science was outside the expertise of the judges who were assigned to a particular project. We all communicated our areas of expertise and made a point to interview students whose projects fell within our expertise, even if we were not officially assigned as judges to a particular poster.
The evaluation of most projects was based on the students’ abilities to articulate the purpose of the project, why they conducted it, and what they learned from the experience. When the judges were evaluating the projects, there was a strong emphasis on shared interests and on how much the finalists convinced the judges that they were personally inspired by conducting their projects. When asked why they thought certain projects should be selected, the judges emphasized that they could really relate to a student, found the student likable, and found their fascination, and their decision to conduct the project, to be superb.
These criteria are faulty in several ways. The central problem with this approach is its inherent cultural bias. Depending on their cultural background, students may be trained not to communicate great passion in their science, as they may believe that communicating inspiration would distract from the significance of the project. The cultural bias in this criterion is even more insidious because the way a judge evaluates excitement and likability is biased toward individuals from the judge’s own cultural group. One judge who heavily advocated for a project — which won the grand award in the section — said (this is a rough quote from my notes): “My main reason for this choice is that I made a connection with her; she’s very personable, and you could feel her enthusiasm for science more than the other finalists.”
Not all projects were perfect — though some approached perfection. There were some amazing students. Clearly, the judges were looking for areas in which some students were more superlative than others, or for flaws. This was the area in which cultural bias was most overt. The flaws of students whose ethnic backgrounds resembled the judges’ were overlooked, while the flaws (in some cases, incorrectly evaluated by those who were not experts in the field) of the underrepresented students were scrutinized and amplified by the judges. As one example, a few judges specifically stated that they would give a student (of similar cultural background) a “free pass” on the statistics of the project, while the statistics of other projects (by students of different cultural backgrounds) were heavily scrutinized by the same judges. This happened even though the student with the free pass had simpler statistics that were central to the finding. In this case, I cannot come to any conclusion other than that it was cultural bias — and I’m working hard to keep an open mind. This was not just the case with one or two projects, but a consistent pattern that I detected near the end of the caucus session after considering many of the deliberations.
Let me put it this way: given the way the judges picked their top projects, it was a virtual impossibility for a student’s project to rise to the top of the list of finalists if that student had an unfamiliar accent or did not share a cultural background with the judges. I do not take this charge lightly, and I do not make it without understanding its gravity.
I think the number of judges is a problem, and the qualifications of the judges are a problem, but the prevailing problem that taints the integrity of the process is the overt cultural bias that occurred. I do not think this cultural bias — which some might simply call systemic racism — is rooted in any kind of overt racism among the judges. The judges were evaluating based on criteria that they could use, and in the absence of scientific expertise, they were drawing merely on intuition, which does not work well for finalists from different cultural backgrounds.
None — absolutely zero — of the judges came from groups that are traditionally underrepresented in the sciences. Given the ways in which top students were evaluated, I can’t imagine a scenario in which a student from an underrepresented group could have come out of the caucus with the top prize. I was surprised that I was the only person in the room who was conversant in Spanish — while the group was struggling to fairly evaluate the projects of several students from Puerto Rico. The translators were clearly not professionals and had difficulty with the science and the technical vocabulary. (I interceded in one case, when I was walking past, to clear things up. I wish I had more time for this task, but I had my own judging assignments to attend to.)
I think this can be fixed using three approaches. First, the judges need to be experts in their scientific discipline, with advanced degrees and an active research agenda. I greatly respect and admire K-12 educators, but some of the judges with this background were not prepared to evaluate most projects, which were conducted in a university laboratory under the mentorship of a Ph.D. scientist. [more on this topic here, by the way.] Second, the pool of judges should only be certified after there is adequate participation of individuals from underrepresented groups. Third, the recruitment of judges should do more to draw on local talent — nearly none of the faculty or professional researchers in the LA area were judging on my panel, which mostly consisted of volunteer educators flying in from around the country. Since Los Angeles is a highly diverse area with scientists from many backgrounds, recruiting judges who represent the ethnic diversity of the city is not an unreasonable task.
Please let me know what response there might be to my concerns.
So, what was the response? After asking three times for a reply, this is what I eventually received:
October 03, 2011
As for your concerns about the number and quality of judges, I assure you that there was no one more concerned than I was about the small number of judges who actually showed up and interviewed students. The no-show rate was considerably higher than I had anticipated. Those who did show up, however, were well qualified, with only a small number of exceptions, and I don’t know that any of those exceptions were on your panel. I’d have to look at the detailed records to be sure. I suspect, rather, that most people were being modest in their introductions if you got the impression that they weren’t good enough to be there.
Here was my reply:
October 03, 2011
Thanks for getting back to me.
Just to be clear, my concerns were not so much about the qualifications of the judges and the degrees they hold, but rather about the fact that the judging criteria — as stated by several judges in caucus — were not connected to any of the criteria specified by [the organization], and as a consequence, the judging was culturally biased.
A number of students who lacked an upper-middle-class white upbringing had no chance at winning a prize because of their background. Moreover, some of the top prizes in my category were awarded on the basis of likability and enthusiasm — and the finalists’ ability to connect with judges on a personal level — even though the experts on the panel found major flaws and errors in their projects that should have knocked them out of contention. I found this appalling, and I couldn’t leave the process without having expressed my concerns.
Nobody on these panels can understand all of the projects fully — there is far too much breadth and depth. I’m not expecting that. What I did find were people who preferred to rely on their own impression of a finalist’s likability and were inclined to disregard the opinions of the judges who were experts in particular subfields. Ultimately, the majority of the panel voted in caucus for finalists they personally got along with, even when other judges clearly showed that these finalists had a poor understanding of their own projects, or discovered that the finalists had misrepresented their work.
This reminds me of the sausage-factory analogy. I think it’s just best for everyone if I stop eating the sausage.
After that? Crickets. Or frogs. The sound of silence.
The fair passes through town again in three months. I’m torn between showing up to fight cultural bias and steering clear. I hope they invest more heavily in diversifying the pool of judges this time around. Maybe my best contribution would be to work hard to improve that pool. A ton of money is poured into organizing this fair, and with just a little bit of it, they could invest more in efforts to recruit a qualified and diverse set of judges. In the meantime, I suppose my efforts at judging are best spent on science fairs in local public schools.
At the time this happened, I talked about it with one colleague of mine who was in my section (whom I got to know during the day-long process of judging). She was not as concerned as I was because she didn’t perceive structural bias. She thought the judges were underprepared and in over their heads, but not discriminating on the basis of ethnicity. She’s a very smart and very reasonable person with great judgment. So, I could merely be hypersensitive. On the other hand, the most parsimonious conclusion is probably that a room full of non-underrepresented-minority judges came up with an unfairly biased outcome. In our current environment, anything else would be quite remarkable.
I’ve had concerns about posting this because of non-target effects on the participants in the science fair. However, since I am equally qualified to judge in several categories of the fair, the only readers who could know which particular section I was involved in would be the officials with records of the fair. Of course, it should go without saying that none of the students did anything wrong or unfair. I am only taking this to a public venue after being blown off by the people in charge. The people in the organization that puts on the fair are well intentioned, and I would like to see opportunity extended to all.
My aim is respectful conversation. I hope I might be able to get a more constructive and detailed response from the organization in charge of the event. I think the best-case scenario is that the organization adopts guidelines that require efforts to diversify the judging body, and invests the time and money required to make sure that these guidelines are both followed and enforced.
Keep in mind that my own participation in the fair in 1988 was the product of capricious judging, at least at my own high school, and my own gender and ethnicity helped me get there. One could argue that that experience was one of the things that led to me becoming a scientist. So sitting on the sidelines is tantamount to silently profiting from an ongoing injustice.
How would you size up this situation, and what would you do?