Over my sabbatical year, I planned to become comfortably proficient with data manipulation and analysis in R. I'm getting there. (I was doing a lot more over sabbatical, of course, but this was one of my main objectives.) I figure it'll take at least a few more manuscripts before I'm comfortable. As I really should be cranking out a dissertation's worth of stuff in the next year, I have plenty of opportunity to get better, and the rate-limiting step for me is sorting out the code.
I'm going to briefly describe the resources I've used and the challenges and constraints I've faced, but the main reason I'm writing this post is that I'm hoping the comments will become a clearinghouse of suggestions and perspectives. A lot of y'all did things differently than me, and some of you are literally pros at teaching this stuff to others. Please leave a comment with your thoughts about how all kinds of folks can best learn R, because I imagine that's where this post will be most valuable to others.
I think my circumstance is rather common, but not one I hear folks discussing much. I went to grad school when R was not much of a thing, finishing up in 1999. In the early 2000s, I spent a bit of time trying to get ramped up, but didn't get far once I realized it involved delving into the documentation for S. As a PI at a primarily undergraduate institution, I've been running my lab steadily, running stats with a range of tools that worked just fine for me. Over time, more and more ecologists have started using R. In recent years, for many approaches, it has switched from a tool of choice to a tool of no-other-choice.
For a lot of folks in my boat, there isn't a community of savvy folks around who can give us tips and troubleshoot off the cuff. My department now has a couple of recently hired professors who are good to great with R, but there really isn't anybody else I can readily consult in person. So this is pretty much a solitary endeavor. I was really interested in attending a week-long short course, because I've heard how useful they are, but I just couldn't find that week in my schedule, even on sabbatical.
If I am going to keep working on what I've been working on, I now have to use R. It's not a bad thing. It's just, well, cumbersome to pick up this tool when I've already mastered a toolset that did what I needed. A lot of other ecologists from my generation aren't exactly in this situation: they may or may not have learned R, but most of their analyses and figures are run by members of their lab. I've talked to a lot of mid-career and senior folks who concede they should learn R, but for them it's less of a pressing need because not knowing it doesn't grind the work in their lab to a halt. I had relied on collaborators for this kind of stuff here and there, but it became clear long ago that this isn't sustainable. The upshot is that I spent my sabbatical learning what folks now typically learn in grad school, often after getting a taste as undergraduates. And until my department gets on the same page and gives these skills to all of our students as part of the curriculum (which won't happen overnight by any means), I'm going to be the guy who teaches my lab members to do this, that is, if it's something we make happen. Even so, I won't be available to help my students troubleshoot very often, even now that my help might actually be useful.
I’m coming at this from the perspective of being very comfortable with statistics. A lot of the resources out there for teaching R are teaching statistics as much as they are teaching how to code usefully in R. This totally makes sense for junior scientists who are learning the theory and the math while they are learning to code in R, but this also means a lot of the materials out there aren’t built with me in mind.
How did I go about starting out?
I started with a book: the second edition of Beckerman et al.'s Getting Started with R. It did precisely what it said it would do: it got me started with R. I was stationed at a desk for a few straight days, mostly walking through the lessons bit by bit, occasionally dipping into my own datasets and looking other stuff up along the way. In my experience, if you know stats but don't know R, this book will get you to a point where you're comfortable with the basics. The book points out a lot of obvious things in a very obvious way, which might seem slow, but I found that to be a feature and not a bug. I breezed through it, and by the end, I was good to go with the basics. The book also pointed me toward some standard resources. When googling how to do something in R, you often come upon useful answers on Stack Exchange. Also, the RStudio cheatsheets are super helpful.
Having gotten a little familiar with base R, I'm now much happier working with dplyr and the 'hadleyverse' as much as possible.
(By the way, what’s the deal with the name dplyr? Here’s the deal:
I'd also like to point out that I'm under no illusion that using R is anything more than an incremental step to keep from becoming outdated, and this is a good thing to keep in mind regardless of your career stage. I imagine the standard might soon be Python, or something else that isn't even on the horizon. It might not be long before R is uniformly seen as the antiquated way of doing things. In a few decades, you might look back at your R code the way senior folks now chuckle about punchcards and Fortran.
This site hosted a discussion about good ways to learn R four years ago. Since then, the tools and support for using R have evolved substantially, but it might still be useful to see the comments from back then. So I'm hoping new comments on this post can help steer folks in useful directions.
By the way, this post is coming out just as the annual meeting of the Ecological Society of America is ramping up, which means a lot of folks probably won't see it right away (and is probably why I won't be engaging much in the comments, either). If you happen to be at ESA, please do say hi if you see me around! (And if you wake up early Thursday morning, feel free to catch my Ignite talk.)