How can track record matter in double-blind grant reviews?


We should have double-blind grant reviews. I made this argument a couple of weeks ago, and it was met with general agreement. Except for one thing, which I now address.

Some readers said that double-blind reviews can’t work, or are inadvisable, because of the need to evaluate the PI’s track record. I disagree with my whole heart. I think we can make it work. If our community is going to make progress on diversity and equity like we keep trying to do, then we have to make it work.

We can’t just put up our hands and say, “We need to keep it the same because the alternative won’t work” because the status quo is clearly biased in a way that continues to damage our community.

It’s instructive to note that nearly everybody who pointed to “track record” as a qualm fell into the same demographic group as myself: tenured white guy. I think it’s no accident that members of the group that stand to lose the most from having their identity blinded are the first to raise concerns that the process can’t work. The enfranchised want to keep their franchise. I get this is how things work.

I don’t want to dismiss these concerns of tenured white guys like myself. Because like it or not, these folks are still the decision makers. If white men weren’t running the show, I’d bet a well-made sandwich and a glass of unsweetened iced tea that we’d probably have a lot more double-blind review than we do at the moment. So these people — my people — are the folks that I need to convince. I’m taking this demographic — which is my demographic — seriously. I just don’t want to scream “bigot bias unfair privilege yadda yadda” because we’ve got to build a common vision for change. I hear you, I respect your opinion — heck in many ways I’m one of you — and I’m taking the time out to write this post so that we might find some common ground.

In our status quo, reviewers evaluate the track record of grant seekers. In the absence of other variables, this should result in a more fair and higher quality decision. However, the mechanism that allows this evaluation of track record (knowing the identity of the PI) also facilitates conscious and unconscious bias. This has a demonstrable, known, and problematic negative effect on members of our community: women and ethnic minorities. If you argue that the lack of double-blind review doesn’t result in unfairly biased decisions, then your argument has an impossible uphill climb.

Which is worse?

  1. Keeping a grant review process with a bias against women and minority scientists that perpetuates a long history of harmful exclusion and discrimination
  2. Working to find a grant review process that uses a double-blind method, so that a set of reviews are created without use of the name, institution, or track record of the PI

If you say the second thing is worse than the first thing, then, well, I just can’t even. That’s an attitude that I don’t want in the scientific community. That’s an attitude that doesn’t belong in this 21st century in these United States of America. I want to find common ground, but if you are so unconcerned about bias against women and underrepresented minorities that you aren’t willing to rethink how we can review grants more fairly, then I can’t imagine where that common ground might even be, maybe as far as Titan, as close as Ceres, but probably as far as the Kuiper belt. Our common ground cannot be one that we currently occupy on this Earth. Equity is a foundational value for our community.

Once we can get our head around the fact that biases in the review process are causing unfair and adverse decisions for historically excluded subsets of our community, then I see two ways to process this challenge:

  1. “Since we really can’t give a grant to someone without seeing their track record when we do reviews, then we’ll just have to live with the insidious effects of these biases that will continue to harm my colleagues.”
  2. “I do think it’s useful to consider track record in grant reviews, but I also see that blinded reviews are important to remove bias. Rather than dismissing the idea of double-blind reviews, I’d like to seek alternative ways to conduct reviews that remove this bias.”

If you fall into the first category, and think that the biases caused by a lack of double-blind peer reviews are not a big problem, could you do me a favor before you share your opinion? Could you ask a few women and a few scientists from underrepresented groups what they think about this? To see if they share your opinion?

If you’re a member of a group that is the beneficiary of unconscious bias against one’s competition in the applicant pool, then you don’t have a disinterested stake in the maintenance of this bias against people competing against you for grants. When it comes to the importance of double-blind reviews, my opinion shouldn’t count as much because I’m in the demographic category that stands to benefit from having my identity known.

I see two ways to conduct double-blind reviews of grants that also let the funding agency take the track record of the PI into account:

First, track record can be factored in by the program officer (which I mentioned in my original post). I do think it’s important that a person who is awarded a grant not have a history of squandering prior support. If a project is worthy of funding, then in my opinion a program director is quite capable of making the call that a PI isn’t qualified to do the project because of their track record. It’s been argued that this is not a binary issue, that ‘a track record worthy of receiving funds for a proposed project’ is a complex thing that panels can assess better than program officers. That’s a valid opinion, and I think I might even agree with it, but that also opens a door for bias. I can understand that people with a strong track record, especially those from non-marginalized groups, would hate to lose an advantage over people who have not yet established a track record, and also would think that it’s unfair that people with a poor track record receive funds. Just as track record isn’t binary, neither is bias, as was pointed out yesterday: “Insider status isn’t binary, of course.” You don’t get the benefits of review as an insider if your name isn’t on the proposal.

Second, in the comments to my original post, Emilio Bruna pointed out that there could be a two-stage review process, in which track record is assessed after the panel reviews a proposal. When a project is considered, federal agencies make a point of emphasizing that it’s projects that are funded, not people. Of course, when research takes place, we need to make sure that the people conducting the project are well qualified to do it in an excellent manner. But first, we must recognize the value of the project itself. There is no reason that we can’t do this double-blind assessment of the project, and then leave it to a different panel to ensure that the most worthy projects are being conducted by highly qualified parties with an appropriate track record.

These two ideas about implementing double-blind review and including track record as a variable aren’t just mine. After I wrote those paragraphs, I found this document on the NSF site that discusses ideas for enhancements to the review process. Near the bottom of the second page are a clear “Version 1” and “Version 2” for double-blind review that match mine. That document is from 2011.

How much time, trouble, or expense is it worth to conduct procedures that protect the marginalized members of our community from bias? I’m sick of people saying it’s too hard, or too much work, to make things more fair. Once the people who are getting screwed over by bias start saying we can’t make the system more fair, then maybe we should stop trying. But when a person who benefits from the bias, albeit inadvertently, says that it can’t be done or it’s not worth the effort? Pffft.

Even if it were possible for panel reviewers to take track record into account without gender or ethnicity bias (which is, of course, not possible), there are three reasons why I am not so hot on using track record in panels anyway.

First, the discussion of the PI’s track record is an open door to introducing spurious issues into the review process. A lot of us work in small academic communities where we know one another moderately well, either from personal interactions or by reputation. I often get grants to review from colleagues of mine. It’s my job to separate out my personal thoughts about the person from my assessment of the project. I try, and I hope I’m doing well at it of course. But a system that is designed to rely on forthright behavior of all community members is bound to have insidious outcomes. We can’t list people with whom we have a formal conflict (collaborators, mentees, mentors, and so on) because once you’re in the game for a decade or two, you have a history with most of the players. I can’t guarantee I’m being unbiased, as hard as I try. And there must be plenty of scientists who are not even trying. Are you going to be the person to claim that you are capable of writing an unbiased review of a proposal from someone you have a history with (of any kind)? If so, I imagine there will be a long line of sociologists and psychologists ready to point out how you’re wrong (which is presumably why, in sociology and psychology journals, they usually have double-blind reviews).

Second, consider that the track record itself is reflective of the bias against the marginalized members of our community. Even if there were no unconscious bias in the process, using track record as a measure of merit would still be flawed, because all the crap that most people have to deal with gets in the way of producing a track record equivalent to that of a white man. Women are less likely to publish first-authored articles because there is a bias against them in the peer-review process. This is just a fact. Track record itself is a manifestation of bias. If we compare people on the basis of their track records, then the scientists from marginalized groups will, on average, come out a little behind because of the systemic resistance against their efforts to do science.

Third, in my experience, panels don’t seem to be that good at sizing up whether or not a PI is qualified to do the work, and this part of the evaluation process appears to be particularly prone to bias based on institutional and personal affiliations. Have you had a reviewer tell you that you weren’t capable of doing something, even though you had clearly demonstrated in the proposal that you have already done that thing plenty of times? I’ve not only experienced this, but have heard this from many people. Bias against small institutions is egregious when single-blind review is implemented. How often have you seen in a grant review that “The applicant was trained in a good lab,” or “The academic pedigree of the PI is an indicator that the project will be successful”? Statements like these clearly imply the converse assessment. The only way to get rid of this bias is to double-blind the process. It’s hard to not wonder about how these biases are actually operating on a day-to-day basis. (drugmonkey seems to have been wondering about them this weekend as well.)

I suspect that opponents to double-blind review may not be adequately attuned to the pervasive biases against people in other demographic categories.

As a white guy, if I want to do science, then I can just go ahead and do science. But other members of the community have roadblocks put up in front of them all of the time. I haven’t had a senior scientist hit on me at a meeting, or had a supervisor target me with inappropriate advances, or had anybody doubt my scientific ability because of the identity that I was born with. If we take the time to listen to scientists who say that our established structures are barriers to success, we should be prepared to create evaluations that take into account this uneven playing field. Because some members of our community need to be twice as good just to keep pace with other members of the community, we need to be consistently intentional about identifying and implementing mechanisms to reduce bias.

Reviewers are routinely incapable of abstracting how different conditions result in different track records. I think most men (myself included) are not capable of understanding how the experience of being a woman in science affects one’s professional trajectory. I regularly hear stories from women that still raise the (few remaining) hairs on my head, that make me wonder how it’s even possible to operate in such a hostile environment. But I won’t say “I don’t know how you do it” to women, because that just normalizes the unacceptable condition of our community. Instead, I’m saying, “You shouldn’t have to experience this bias, and here is what I’m doing to change it.”

I know a little bit about how reviewers fail to understand how the academic environment shapes one’s track record. I just went through a bunch of my reviews from the past 6 years. Of the reviews that remark on track record (and a bunch do not), I’d say about half say that I have a particularly strong track record. The other half say I have a mediocre-to-marginally-acceptable track record. And there rarely is any middle ground. What’s the difference between one set of reviews and the other? I don’t have any idea. I can take some guesses though. It might be the people who say I have a great record are people who personally know me and/or my work and think highly of it. It might be the people who say that I have a strong record have looked at my institutional background, taken into account my historic teaching load, and that I have an undergraduate-powered laboratory. (Maybe the difference is panelists vs. ad-hoc reviewers?) As for the people who think my record is weak? I guess they’re comparing me to themselves, or to other people who have PhD students and work in universities that don’t take teaching that seriously. Or maybe they know my role in the academic community and still think that I’ve underperformed. I have no idea, really.

So my grant reviews have a bimodal distribution with respect to the assessment of my track record, which may or may not be caused by biases or blindness to the experience of others. I don’t want to generalize from my own experiences, and I don’t want to put biases against researchers in primarily undergraduate institutions in the same category as biases against women and underrepresented minorities. It’s just my only experience of being in a marginalized demographic, aside from being veg. And both are my choice.

A recent paper on biases in peer review concluded with:

Peer review is a flawed process, full of easily identified defects with little evidence that it works. Nevertheless, it is likely to remain central to science and journals because there is no obvious alternative, and scientists and editors have a continuing belief in peer review. How odd that science should be rooted in belief.

There is some chance that you’ve read this far and are thinking, “Why is it that for you and others, so many of these issues in doing science have to deal with gender, ethnicity, privilege and other socioeconomic sociology?” The answer to that question is really simple: It’s because scientists are people. If we don’t work to fix our individual and structural biases, then things are not going to get better. If the moral arc of the universe bends towards justice, that’s only because people like us need to keep pulling on it.

12 thoughts on “How can track record matter in double-blind grant reviews?”

  1. Yes! Thanks for a great follow-up post, Terry.

    I have actually seen an NSF rejection based on identity, but I think it was (sadly) reasonable. The proposal was a single PI proposing to do some sophisticated modeling with some data that the PI was collecting. The PI had a great track record — as a field ecologist. The proposal made it through preproposals, but was declined at the full proposal stage, based on “it’s not clear the PI has the skills to do the described work.” Accurate. It was going to be the postdoc (me) that would do the modeling. But I was just a student at the time and I think I wasn’t even listed on the proposal. So I think this was a quite reasonable rejection based on the information the panel had. Sad for me because it would have been a fun project.

    So, like you, I think it has to be a two-tier process, where the first round is double-blind. We need to make sure PIs can do the stated work. And we know that NSF also makes decisions based on “portfolio.” So while we want to discourage funds going to those with poor track records, program officers need identity to make sure that “no/little record” isn’t too disadvantaged against “good record.” This helps counteract bias against early career researchers.

    The thing I’m still a bit uncomfortable with is that someone still has to take identity into account. And program officers are people too. They also have biases. Maybe it’s better for them to do it than panelists, because that’s fewer people to target for unconscious bias training and the like. And because they’re already mindful of needing a balanced portfolio. There may still be bias. But a two-tiered system with program officers assessing track record after a double-blind review of project content is the best idea I’ve seen put forth. And it would be a big step in the right direction, even if it might not be perfect.

  2. The small world problem of science is not a trivial one. Double blinding might work, but it’s also a bit predicated on PIs not openly discussing directions they are taking or new grants they are submitting, no? Because the type of work, specific system, and even just style of writing can give someone’s identity away; maybe imperfect double blinding is better than no double blinding at all, but I wonder if biases would still creep in just because reviewers are familiar with the work they’re reviewing because that PI gave a talk about it at a conference somewhere talking about their future directions.

    I agree with a lot of what you say here and it is worth trying something, but it’s not as easy to double blind science as it was for, say, classical music auditions, where all they needed was a screen to shield players from the eyes of evaluators.

  3. I had thoughts similar to Margaret’s last paragraph while reading this post. It may be easier to reduce bias among program officers, especially since they’re trying to balance portfolios, as Margaret pointed out. But it’s also risky to rely on just one person to do the only evaluation that considers identity. If that person has biases (conscious or unconscious) against women, or racial minorities, or people from small schools, or whatever, there’s no one to correct them. At least in a panel, those thoughts would get voiced out loud, and someone else could say, “Well, actually, I think we should be basing our evaluation on factors X and Y.”

    I haven’t thought very much about this, but it almost seems like two panels would be necessary, one to judge the science and one to judge the scientist (as discussed above). I wonder how this would work logistically. Everyone has a lot on their plate, and people might not be keen to volunteer for more panels. But people have also been blogging about early career investigators wanting to review and not getting a chance, so getting enough reviewers might not be a problem (at least at places like NSF, which seems to welcome early career reviewers).

  4. Good points. This will be even harder – but perhaps even more valuable – for funding systems that fund investigators/research programs rather than projects (NSERC Discovery in Canada being what I have in mind). Here increased emphasis on the track record, and decreased on the project, intensifies all the problems you mention; but as a direct consequence, would also intensify the damage done by any implicit bias.

  5. Separating the review of project from that of investigator (including investigator’s record) is indeed the best proposal I’ve seen to provide real momentum in countering implicit bias. (As has been noted, institutional identity as well as individual scientist identity does seem to have an unfair influence on proposal outcomes…) As someone who has served on NSF panels and a Committee of Visitors (reviewing the reviews, as it were), it seems as if this could be effective in some of the larger NSF programs. However, I’ve also had experience with the DFG (the German equivalent of the NSF) and with NIH. In both cases, it seems that most individuals working in a particular field or on a particular study system are so well acquainted with one another that the description of the science itself “gives away” the identity of the investigator. And in the German case, it seemed at times that literally every researcher had some sort of formal “conflict” with every other — advisor, mentor, grad student, postdoctoral trainee, collaborator. (And Germany is the largest of the European scientific communities!) This is not to say that the double blind review of the project shouldn’t be a starting place – it just shouldn’t be relied on as a sole solution for the issues of implicit bias.

  6. A couple people have made well-informed and insightful comments about the “but they can figure out the identity anyway” problem. Thanks for contributing to the discussion! I don’t like to chime in too much in the comments on my own post, but I have a couple thoughts to add on this, which I didn’t include in the original post for brevity (in a 3000 word post, I realize), but I’ll add them now.

    First, at least when it comes to publishing peer-reviewed papers, we know that reviewers aren’t successful at guessing the identity of the author all of the time. This comes from a conversation with an editor of a journal that’s gone double-blind, and also from the fact that going double-blind changes outcomes. If everybody knew regardless of blinding, then you wouldn’t expect to see a change! Since we see that the number of women getting papers accepted goes up after double-blinding in journals, we know that it does have an effect, even if people can often guess!

    Second, in small fields you can guess who wrote a proposal, and that kind of thing can happen in the US too. However, I think blinding would still have a huge effect. Knowing the lab group that originated the ideas is hugely different than knowing who the PI is, and also being 100% certain rather than taking a guess. For example, a lot of former grad students and postdocs will write a proposal to build on one direction of the work from their former PI, but in their own lab and not under the umbrella of a PI. Knowing the lab that started a line of work is different than knowing the lab that is writing this proposal. There are people who have their academic enemies, or friends, in certain places, but they also rarely have grudges against that person’s proteges, or they also might wonder if an enemy-of-an-enemy is writing the proposal instead of the enemy. For people who have intentions of helping or hindering colleagues, there’s a big difference between a guess and knowing for sure. (For example, there is a person in one of my subfields who has overtly done things to obstruct me, for reasons I don’t understand. (This is stuff I have direct knowledge of, not just me guessing.) Once in a long while, I do get stuff of his to review, and actually since it’s really good I’ve been glad to provide positive reviews. But if I weren’t so generous, it’d be easy in my heart to find reasons to sabotage his stuff. In a double-blind situation, I wouldn’t have that option, because I would have trouble distinguishing a proposal or paper from him versus one of his students, and he has wonderful students who have never done anything to wrong me and have always been very collegial. So in this case, for that guy, double-blinding would protect his interests if I were into retribution.)

    In short, because double-blinding does change things in the right direction, then we know that it changes things. It’s not the solution to all of our problems, of course, but I’m heartened to see comments with folks recognizing that it’s a necessary step.

  7. It seems like reviewing the track record and research proposal independently would overcome some of the issues.

  8. NASA has moved in this direction at least for Hubble observing proposals – for the last couple of years, the proposers are listed (but not on the cover page) but the PI is not identified, and the previous section on the PI’s earlier work with Hubble has been eliminated. Some of these points could of course be gleaned from the proposal text, but without as much certainty and much later in the reading of proposals.

  9. Great post. There’s an aspect of track record that I find important when evaluating applications: can the applicants do what they propose? I can see how this has terrible sides on bias and on giving a pass to people you might know or like. However, it is critical whether people can do things or they just talk up the wazoo.
    It’s easy for people to see a paper on New Amazing Bunny Hopping Analysis and add it to their proposal. Can they do it?
    Without info on the applicants I will need to 1- give them the benefit of the doubt, giving a pass to bluffers, or 2- demand preliminary data, which we know is biased too.
    Anyone can write an amazing proposal claiming the most amazing techniques from the leading labs in the field. They just need to read the papers and make the claims. Can they actually do them? Have they convinced the people who actually do these things to collaborate? Being able to propose without backing it up is open season for bluffers. There’s a lot of them.
    Some people bluff knowingly, others have no idea that things are way more difficult to do than they think they will be. With funding levels so low, I’d rather give the support to people who know what they are doing. The question is how to reduce bias while keeping science strong. A two-level review will be difficult because of the added work, and because track record is connected to what will actually happen.
