Update 15 May 2013: If you’re a newbie to R and want to know where to start, the comments on this post are now replete with (what I surmise to be) wonderful suggestions. Of course learning in the presence of those who know R is best, but this is a great set of suggested resources regardless of your environment.
I’m not that old, but I already feel myself getting a little stale.
How did this happen? Well, I guess it’s because I’m a professor and this is just the default rate of entropy.
When I was an undergrad, one of our introductory bio professors was a kindly man who was the archetype of deadwood. He had a separate slide carousel for every lecture in his course. When it was his time to teach, all he did to prepare was to pull the carousel off of the shelf. He didn’t have any idea what he was going to say until he saw the slide appear on the screen. Then, he would say the same thing he’d been saying for that slide over the past 20 years. It was just so obvious. One day, the slide projector broke. What did he do? He cancelled class.
This kind of thing is even more common now than it was back then, because few people had so many carousels at their disposal. It’s just done with PowerPoint.
I’ve worked hard to keep my teaching from becoming stale. And since I’m doing a lot of research, then I can’t get stale at doing research either, right? If only that were true.
I imagine that molecular biologists all had to learn the ropes of PCR as machines and reagents became commercially available, and then relatively cheap and efficient. Nobody’s out there doing allozymes for population genetics after all, I would hope. And the same is true for RNAi, and now with nextgen sequencing approaches to genomics. In my flavor of work, there isn’t as much required to stay current, but nonetheless I’m still getting behind. If only I had the time to run just to stay in place.
At least I’ve diagnosed this condition and can fight the entropy. Just as I keep the dishes mostly clean in my house and have the oil changed in my car on time, I’ve got to stay fresh as a practicing scientist too. It isn’t easy.
This occurred to me, in part, when reading something that Joan Strassmann wrote (in the context of picking a good PhD advisor) that grad students are probably better at using R than their own advisor. I guess that’s the case in most labs, even if their advisors might have better statistical acumen.
If you’re a serious ecologist, nowadays, then R is an essential or near-essential tool. Here’s a confession: I’m useless with R. This is a problem. And it’s not a little problem, it’s a big problem.
I suspect that I’m not the only one in this boat, though I haven’t really heard anybody else admit to it. With every day that passes in which I still can’t use R, the less effectively I can collaborate, the more reliant I am on others, and the less able I am to apply the most current tools to the experiments I’m running. There is a single analysis, one that I should be able to do in R in an hour, that’s keeping me from submitting a manuscript that is otherwise pretty much done. That’s a problem.
Now, I’m not a statistical dunderhead. (I teach our graduate biostatistics class, but obviously teaching a class in something doesn’t mean you’re an expert.) I design my experiments with specific tests in mind, and I choose ones that work, and I use model selection understanding the power and limitations of the approach. I understand frequentist vs. Bayesian perspectives even though I don’t choose to say anything that would start a disagreement. (If you read my stuff, you can decide for yourself if I know what I’m talking about.) I guess you’ll probably just take me at my word that I’m not stupid when it comes to stats.
But there are a few analyses that I just can’t run easily, like NMDS or a GLMM. This is because I mostly use a powerful menu-driven version of SAS called JMP. It does nearly everything I want, and quite well. But there are a few analyses that I can’t run in JMP, which are becoming more and more relevant to the questions which I’m asking in my lab.
How did I get into this situation? Well, when do people learn R? In grad school. When I was in grad school, R was not the standard tool. Before then, I used SPSS on a mainframe (NO, not with punchcards) and a variety of easy-to-use programs on a Mac. (Statview was unparalleled for simple exploratory data analyses on Macs, and it was bought up by SAS and orphaned so that people would use JMP instead. The world has moved on without it.) By the time I was finishing up grad school in the late ’90s, R wasn’t in widespread use but it was ramping up. None of my fellow grad students were using it at the time, and I wasn’t behind the curve.
A few years later, while I was starting on the tenure track in the early 2000s, I put aside a little time to figure out R. That was a disaster and I couldn’t even get it to read my files. I had a few halfhearted attempts, but I couldn’t find the time. I looked into taking a short course, getting a book, but I didn’t have the time to make it happen. At this point, it wasn’t a critical failing, but I saw that more and more people were using R, and that I wasn’t one of them.
My lack of R mojo isn’t a teaching problem. Even if I was an R pro, I don’t think I’d use this in my course because the class is about understanding how statistics works and how to apply them, not how to use the software. I use JMP in the course because it is so easy to use, and I’m not going to waste instructional time on software tutorials. (We should have a separate class or seminar or experience that teaches students to use R, but it can’t fit in this class.) I’ve talked to people who teach with R in their courses, and they’ve reported that you either have to make it a course about learning stats, or learning R, but you can’t do both well with 45 hours of class time. Clearly, by using R you actually learn what you’re doing statistically, because that’s part of understanding coding. So I hear. But I’m not going to spend half of my time in class dealing with coding errors and stress when my students still don’t fundamentally understand probability, randomness and the actual nature of a null hypothesis.
While not a teaching problem, my lack of R mojo is a research problem. I am on it. I’ve been aware of this for a while, and I’ve found a way to deal with it.
For the last month, I’ve had sitting in my backpack wherever I go, what appears to be the exact resource I need: Beckerman and Petchey’s Getting Started with R: An Introduction for Biologists. From my quick browse, I feel mighty confident that using R like a pro is now only a matter of finding the time, and it doesn’t seem as insurmountable.
My hope is that, this summer, I find the time to actually remove the book from my bag and use it. This is the point in the narrative where I could explain everything I’ve done in the last month that would explain why I haven’t found the time to get to it, but you know the story. I won’t try to out-busy you.
This summer is already booked. Learning to use R to some degree of proficiency is going to take the amount of time that it would take to write a whole manuscript, or nearly write a whole grant. I have to decide which one of those things I’m not going to do to keep my skills sharp. Of course, I’ll be using R in the context of a manuscript. It’s just that this manuscript will take 2-3 times longer to write because of my R learning curve.
Maintenance isn’t optional. Learning R feels more like an engine replacement than an oil change, but I’ve got enough miles that I guess I’ve got to make the investment to avoid being sold for scrap.
Kodak stopped making the carousel projector less than ten years ago. I still have a carousel sitting around my lab, containing slides from the last talk I gave in this format. It wasn’t that long ago, really. (In the early 2000s, the Entomological Society of America hadn’t yet switched to accepting digital projection. That’s what’s still in the carousel.)
The world changes really quickly. As I’m doing my day-to-day faculty job, the world will be passing me by unless I actively work to keep pace. I always wondered how some people became deadwood. Now, I see how easy it is. It’s not about giving up, and it’s not about not caring. It’s about not strategically and systematically planning to keep up, which takes you away from immediate responsibilities. I’ve avoided this particular maintenance task for ten years, and just like when I go to get oil changed in the car, I’m not thrilled to spend my time that way. Of course, I’m glad that I can continue to drive a working car that will last a long while, and I’m glad that my soon-to-be-developed R mojo will keep me fresh for a good long while as well.
38 thoughts on “Drifting towards deadwood, or not: learning to use R (updated)”
Gosh, that reads like my life story and has stimulated me to stop putting off learning R; I have been procrastinating for years – will get Andrew and Owen’s book too ;-)
SAS user here. It is like R with complete documentation and references for all of the things that you want to do. Yeah, it costs money, but so does the time you spend trying to figure out an analysis that shouldn’t take more than a day or two to execute. R is just an extension of the statistical “machismo” so many ecologists exhibit. Despite the fact that this makes statisticians giggle at them.
If it works for you and does everything you need, awesome. I figure getting up to speed with real SAS or R would take the same time, and it does look like R is more the future.
Sorry Josh, but in my view R with one of the GUIs folks have mentioned is no harder to learn than SAS or JMP. And honestly, I found regular ol’ command line R much easier to pick up than SAS. So for some users, R is a free lunch (as in, literally free, and no harder to learn or use).
As for statisticians giggling at ecologists who use R, are you so sure that you’ve talked to a large and representative sample of statisticians? Because you haven’t talked to the ones I’ve talked to, apparently.
As for R being a cause of statistical machismo in ecology, I’m unclear why you’d say that, given the other views you expressed in your comment. If you think R is hard to use, and no more capable than SAS (as you seem to), then shouldn’t you think that widespread uptake of R is actually helping to *prevent* ecologists from doing over-complicated analyses that they otherwise could do, and do more easily, in SAS? My own view (which I can’t back up with any proper research, so it’s probably not worth much) is that the trend towards increasingly-complicated stats in ecology has more causes than just easy availability of fancy R packages (e.g., easy availability of other software such as WinBUGS, and causes that are independent of scientists’ choice of statistical software).
Obviously, I’m just speaking from personal experience here and your mileage may vary. I’d hope you’d say the same. I’m sure that SAS was and continues to be the right choice for you, and if you personally don’t like R, that’s totally fair enough. But frankly and with respect, I think you’re overgeneralizing from your own experience when you imply that people who use R must somehow be uninformed or bad at decision-making, so that they’re deservedly getting laughed at behind their backs by people who actually know what they’re doing. Frankly, I and the many colleagues of mine who use R know what we’re doing, and we chose R in light of our familiarity with various alternatives, including SAS.
Jeremy, this seems to be an important topic for you and you have read a great deal into those 5 short sentences. You’ve also made some interesting assumptions that are clearly a reflection of your own biases. It really was just a bit of sarcasm directed at the die-hard users of R who often seem to suffer so.
In any case, I should clarify the statisticians giggling part. It has nothing to do with which tool you choose to use (such as R or SAS); it has to do with the weird attitudes and steadfast dedication that some ecologists have about what somehow makes some analytical tools “better” than others. That is what the statisticians I’ve spoken with find humorous. We’ve all gotten a glimpse of this with lots of such unsolicited recommendations for using analytical tools by other scientists in reviews, at meetings, etc., who say things like: “the BEST way to analyze a mixed-model design is, of course, with …” But this advice is typically given by scientists who are not qualified to make any such assessments, at least so far as their relative (to statisticians) understanding of how the computations are carried out by the tools, or even when to apply certain analyses.
I wholeheartedly agree with Terry’s sentiment – use what works for you. I do not struggle with R nor do I think that R is somehow causal for statistical machismo, it is simply one of the more recent reasons for expressing it. R is a tool, and according to your reply, apparently, so am I.
Thank you for clarifying Josh. Apologies for reading more into your comment than was intended. I misunderstood your sarcasm as being directed at R users generally, and your remark about giggling statisticians as statisticians giggling at R users.
By the way, “Giggling Statisticians” could be a good indie band name.
Hey, somebody else who remembers Statview! I used to use that a lot. And SuperANOVA (remember that one?).
For linear mixed models, I’m actually surprised to hear that SAS won’t do what you want and R will. For a linear mixed model in a paper I’m working on, I actually had to rope in a collaborator who uses SAS, because the lme4 package in R wouldn’t do what I wanted.
As I was reading your post, I was getting ready to suggest Petchey and Beckerman’s book to you, but then I saw you already had it. For any readers who are interested, here’s my review of it:
Full disclosure: Owen Petchey and Andy Beckerman are friends of mine.
Another thing that might help a bit with the R learning curve: there’s an add-on package called R Commander (Rcmdr in the packages list, I think), that adds a menu-driven interface to R for various simple statistical methods (up to and including things like generalized linear models and PCA), and various common graphs. I find it very helpful, especially for graphing. And it’s a good learning tool because there’s a window that shows you what commands you would’ve typed into the R command line to get R to do whatever your menu choices in R Commander told R to do.
I definitely know where you’re coming from on this post. I’ve been starting to have the same feelings. Not so much in terms of statistical software, but in other ways. Still figuring out what to do about it, but you’re right that something needs to be done.
On the teaching side, I just had this problem solved for me. I’ve been assigned to teach (and heavily revise) our intro biostats course. Never taught it, or any other intro-level course, before. I’m really looking forward to it. I’m going to go get some pedagogical training and learn to use some new technologies I’ve never used before (like clickers). And it’s going to force me to get back to doing some of the teaching techniques (like pair-&-share and minute papers) that I used to use as a new prof but got away from over the years because of falling into a rut and getting complacent.
Jeremy: you can do any kind of General/Generalized Linear/Nonlinear Mixed Model in regular SAS, but JMP (the GUI-driven SAS spin-off) is far more limited in its capabilities. You can do a General Linear Mixed Model, or a Generalized Linear Model (not mixed), but you can’t do a GzLMM. I think the options for setting, e.g., covariance structures are also pretty limited.
Terry: even though I just finished up my Ph.D., I’m in practically the same boat. I took an intro stats class that used R, then an advanced stats seminar that used SAS. Because I got lots of useful code out of the advanced class, and the cost was minimal ($50/year), I stuck with SAS. I just started learning R again in the last year of my Ph.D., and now I’m using it (and learning more as I go) in my post-doc. Part of me wishes I’d learned R earlier. On the other hand, I ran a lot of GzLMMs and I still go back to SAS for those – I much prefer PROC GLIMMIX to the various R GzLMM packages.
Another great thing about R is it now interfaces with many other frequently-used ecology programs, e.g., ArcGIS and MARK. It really is the analysis program of the present and (at least the immediate) future.
As for learning R, if you can find the time (ha!), I highly recommend checking out Jeff Leek’s course videos and lecture notes from the Data Analysis in R MOOC he taught for Coursera last fall. I also keep a copy of this reference card on my desk at all times. Michael Crawley’s “The R Book” is another great resource.
What Nicole said about what JMP can and can’t do. Thanks for the additional tips. Starting R now is a LOT easier than it was 10 years ago, it seems. Thanks for sharing the review. (I had read it, and if I recall correctly it helped me decide to buy the book, so I should have included it in the post, so thanks for sharing the link after my failure to do so.)
I’ll admit to being in EXACTLY the same boat. Know I need to learn R. Even got some tutorials from my grad program where they switched the labs I took using SPSS into R modules. Haven’t opened them. We are an SPSS campus so I deal with that. Reading this post prompted me to buy the book you suggest. We need a support group — or a grant to just dedicate a week, maybe two, to learning the darn thing. It is on the list but pretty far down!
I’d do a 3 day workshop or something, maybe, if the timing was right. Hmm. I’m not up to organizing one, though. Just like the parable of the mouse who had the perfect plan to protect everyone from the cat. But no volunteers to tie the bell on the cat’s neck.
Just remembered that I once did a post arguing that sometimes people should stick with old fashioned ways of doing things, because it works for them:
That post was prompted in part by my frustration with a few commenters who insisted that, if I wasn’t filtering the literature and deciding what to read using keyword searches and Google Scholar recommendations, I must be doing it wrong.
In that post, I said that I’d know when my own ways of doing things were starting to break down, and would change if they were. And similarly, your post was prompted because you realized that your usual choice of statistical software isn’t cutting it anymore. But do you ever worry that in some areas you might not realize that you’re becoming deadwood until it’s too late? Are there areas where you worry that the world could pass you by without you realizing that you’re being passed? And if so, are there strategies to keep that from happening?
Really good point. If it works, it works. But can you diagnose when it stops working as well or becomes less relevant? I bet that I’m getting less relevant in ways that I don’t know about, surely. Some deadwood faculty have landed upon that status by choice, while others think they’re doing a great job and they aren’t, because they’re in such a rut they can’t see out. Strategies to see out of the rut? Other than genuine humility, I imagine it’s maintaining solid relationships with the next generation.
Of course, as I noted in the comments on that old post of mine, if you become famous/important enough the rest of the world will adjust to *your* way of doing things, and even see your old-fashioned approach as interesting. Like how the President of Harvard back in the late ’80s or early ’90s was thought interesting (rather than a Luddite) for still using a manual typewriter. I’m sure everyone at Harvard just made whatever adjustments were necessary to accommodate the President in this. Of course, whether this “strategy” works for things like choice of statistical software, I’m not sure… ;-)
In my opinion, the worst way to learn R is to try to figure it out on your own. That’s especially true if you’re new to coding. I started out learning on my own, but it was awful, and I probably would have quit (under the guise of being too busy to learn it right then) if my mentor hadn’t been expecting me to figure it out. Luckily, he had time to help me when I got stuck.
I’ve heard of several schools where professors and grad students have R clubs; they meet once a week for an hour or something, and work on the same exercises/problems/goals. For instance, one week, you might explore graphing – lines, points, colors, legends, axes, etc. When you get stuck on something, you can work it out with your neighbor, and then go on to conquer the world with R.
As for using R to teach – I wish I had used R in my undergrad classes. I don’t remember anything I ever did on point and click software, but that may or may not be the software’s fault. :P Also, R is a really good place to explore probability and randomness! Don’t rule it out yet? Maybe after you’re an R Master, you’ll be able to more easily incorporate it in your classes?
And finally, Jeremy recommended R Commander. I use Rstudio. I haven’t tried R Commander, so I can’t give you a comparison of the two. Maybe someone else can weigh in on that.
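For readers curious about the suggestion above that R is a good place to explore probability and randomness, here is a minimal sketch of the kind of exercise that could be meant. It is not the commenter’s own code. It simulates many two-group experiments in which the null hypothesis is true by construction and counts how often a t-test comes out “significant” anyway. The seed, the sample sizes, and all variable names are assumptions chosen purely for illustration.

```r
# Illustration only: every number and name below is a hypothetical choice.
set.seed(42)            # arbitrary seed so the run is repeatable

n_experiments <- 2000   # how many simulated studies to run
n_per_group   <- 20     # observations in each of the two groups

# Each simulated study draws both groups from the SAME normal distribution,
# so any difference between the groups is due to chance alone.
p_values <- replicate(n_experiments, {
  a <- rnorm(n_per_group, mean = 0, sd = 1)
  b <- rnorm(n_per_group, mean = 0, sd = 1)
  t.test(a, b)$p.value
})

# Proportion of studies that look "significant" even though no effect exists.
false_positive_rate <- mean(p_values < 0.05)
print(false_positive_rate)
```

The printed proportion should land close to 0.05, which is one concrete way to see what a significance threshold means. Adding `hist(p_values)` afterwards would show that p-values are spread roughly evenly between 0 and 1 when the null hypothesis holds.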
Hey, speaking of stale, do people still say “mojo?” Hehehe…
I grok your point.
I’ve never tried RStudio but I hear that it’s basically a newer, better version of R Commander. So yeah, might be worth checking RStudio first. I still use R Commander because it works for me and you can’t teach an old dog new tricks. ;-)
Second (third?) the recommendation of R Studio. I first learned R in the old command-line environment. R Studio is a HUGE improvement on it. I haven’t tried R Commander, so I’m not sure how different they are – but definitely recommend using one or the other of the GUIs.
One more recommendation for easing the transition to R: Check out RStudio (http://www.rstudio.com) – it is an absolutely fantastic desktop environment for R. You have a console, workspace browser detailing all variables and data objects, figure window, and editing window (so you can figure out a command – then immediately put it into a script to save all of your analyses to go into the supplemental docs of your paper; and save all of those byzantine plotting commands for when your reviewers want you to re-run all of your analyses and regenerate your figures because they’re sure if you would just…). It has changed my R-Life, and it will change yours too.
I second the recommendation for the R book by Crawley and modelers may also want to check out Hank Stevens’s book on modelling in R: A Primer for Ecology in R (disclosure: Hank is a friend). I look forward to checking out Beckerman and Petchey…
Finally, I am making the transition to teaching with R – I use it in my intermediate-level Mathematical Biology course and will be introducing it into my Intro Stats course. Especially in the latter context, I have heard that the MOSAIC package is terrific. Basically, the authors have put together and simplified the core commands that you need in R to teach basic stats including probability, sampling/resampling, t/X^2/anova/regression, and plotting (for which they supply tons of data). They have also added functionality to teach introductory calculus (differentiation/integration, etc.) to try to integrate (math pun!) education in both mathematical modeling and statistical analysis. I’m just getting to know it but it comes highly recommended by a great colleague.
And it is great for undergrads that R is free and platform independent. They can do hw on their laptops.
One of the challenges of being at a teaching institution is that we are expected to outfit our students with research-grade tools – which means we have to keep up on them ourselves. Or, like you said, we can CHOOSE to be deadwood – but we cannot ignore that it is a choice….
This is inspiring, if your students can emerge both learning stats AND learning R. Once I get sped up, perhaps I could hit you up. (Today, though, was fun with EstimateS.)
I second the recommendation to check out the Stevens book on R as a platform for teaching introductory ecological modeling. I’ve written several labs based on his code in my own courses. Full disclosure: Hank’s a friend of mine too.
Thanks for the tip on Project MOSAIC Drew, I’d never heard of it. Just checked it out very briefly and it looks very useful. Just put up a post on Dynamic Ecology pointing folks to it and asking if anyone’s used the mosaic package or other Project MOSAIC resources in their teaching. I’m teaching intro biostats for the first time this fall (well, the first time on a non-emergency basis), so I’m quite keen to learn about what resources are out there and how other people teach the subject.
I picked up R late in my PhD and early in my postdoc, coming from SPSS / JMP, and learned most of it on my own. It was a slog, but it made me more cognizant of how statistics work (vs. click-click-click-results). I had to think about what the error terms were, how my models were structured, etc.
Perhaps it’s not the case any longer, but when I was starting ~4 years ago, Rcmdr was a Windows-only GUI, which was useless for me using R on a Mac. My biggest challenge at first was just importing data, and learning the difference between data frames, arrays, and which works when. In high school, I took BASIC programming (oh to start a line of code with “10” again!), and R is staggeringly similar in structure.
That said, R is still clunky for a couple of things I do (like certain ANOVA post-hoc tests, or plots with x error bars), but on the whole, I consider myself about 85% migrated over to R.
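The early hurdle mentioned in this comment, telling data frames apart from arrays, can be sketched in a few lines of R. This is a minimal illustration with invented data, not anything from the commenter; the object names and values are hypothetical.

```r
# A data frame is a table whose columns may each have a different type.
# (Site names, counts, and the burned flag are made up for illustration.)
ants <- data.frame(
  site    = c("A", "B", "C"),
  species = c(12L, 7L, 15L),
  burned  = c(TRUE, FALSE, TRUE)
)
str(ants)                 # shows one type per column
print(mean(ants$species)) # columns are pulled out by name with $

# A matrix (a two-dimensional array) holds values of ONE type only.
counts <- matrix(1:6, nrow = 2, ncol = 3)
print(dim(counts))        # rows, then columns

# Mixing text and numbers in a matrix silently converts everything to text,
# which is a common source of confusion when importing data.
mixed <- cbind(c("A", "B"), c(1, 2))
print(class(mixed[1, 2]))
```

As a rough rule of thumb, field data with a mix of labels and measurements usually belongs in a data frame, while a matrix suits purely numeric grids such as a site-by-species table.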
I don’t know about R Commander, but I know you can use R Studio on a Mac. We (+the grad students, and some profs) started an informal R Club at my Ph.D. school (which I highly recommend!), and the participants always had a mix of Macs and PCs. R Studio was virtually identical across platforms, other than (naturally) file access and management.
Good to know! A local useR group started at my PhD school after I left, and has been a huge success from what I hear from previous colleagues.
I’ve also used it in teaching a grad class where we wanted to use just a particular package. As long as there’s decent documentation (provided by the instructor, or the package author), the students took to it well. Though beware poorly-maintained packages!
Personally I find the SAS language to be much more intuitive, but I am slogging through R and getting better at it and I am glad because it is becoming an indispensable tool – the last two packages I’ve used simply do not exist in SAS. A few hopefully helpful hints:
1. As they said above, R-Studio is helpful.
2. StackOverflow is a great website for finding answers to coding questions on R.
3. I am not fluent in R-help or in R-error messages. I find the language used to be really frustrating (R can shove its “atomic vector” error messages), but luckily I am not alone. StackOverflow and Google are my friends.
4. Really, if your data are ready to roll then R is pretty straightforward. It’s data preparation and formatting that kills me. Last week I had a particularly vexing day:
Warning: apparently the post made me sound suicidal. It was intended to be a bit more tongue in cheek than that, but in my defense R was doing the dumbest things.
Good luck, Terry!
My take: don’t let a book stand in the way of learning R. If you want to learn Spanish, are you going to learn more efficiently by picking up a book with Spanish vocabulary and grammar or by spending a year in Spain immersing yourself in the language and culture? So, the next time you open a data table in JMP, hit command-Q and open it in R instead. Not a gui-R either (those are very limited). Immerse yourself. If you wait until you’ve finished some book, you’ll be waiting a long time. Read the book after you’ve been scripting for a year. It will actually be valuable then. Frankly, I cannot see how people do any work without programming or scripting. Canned statistics packages are just very limited and limiting.
I’ll be sure to do that tonight after I grade all my exams, feed my kid dinner and do bedtime reading, submit a grant, edit my student’s manuscript, and pack for the field.
I understand the frustration. But it sounds like you are waiting to start and finish a book before jumping in, and I’m recommending that you start with a real problem, such as the one that is keeping you from publishing. Don’t wait on reading a book. Ultimately however, learning R may not be worth the time it will take to reach that comfortable level when you choose to solve a problem in R rather than JMP or Excel. This can be months to, more likely, years, especially with a family, four classes, etc. etc. Actually, scripting is something that can be done in small time chunks. And completing each of these little chunks is immensely satisfying. A project can be broken up into smaller projects. For example, simply reading in your data is a really small chunk. But start by reading in the data you need for your paper!
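As a rough idea of how small that first chunk can be, here is a minimal sketch in R. Because no real data file accompanies this post, the example first writes a tiny made-up CSV to a temporary location; the column names, values, and variable names are all hypothetical. With real data, only the `read.csv()` line and the path would be needed.

```r
# Stand-in for a real data file: write a tiny CSV to a temporary location.
# (Column names and values are invented for illustration.)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "site,treatment,abundance",
  "A,control,12",
  "A,burned,7",
  "B,control,15",
  "B,burned,9"
), csv_path)

# The actual small chunk: read the file into a data frame and inspect it.
dat <- read.csv(csv_path, stringsAsFactors = FALSE)
str(dat)                       # column names and types
print(head(dat))               # first few rows
print(summary(dat$abundance))  # quick numeric summary of one column

# One possible next chunk: mean abundance for each treatment.
print(aggregate(abundance ~ treatment, data = dat, FUN = mean))
```

Checking `str(dat)` right after import is a cheap habit: it shows immediately whether a numeric column was read as text, which is the kind of problem that otherwise surfaces later as a confusing error.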
This I get, I see your point. I just need to do it. That is, in fact, the plan, and this book just provides structure for it. Thanks for the encouragement!
1. Yes, R Commander is quite good, especially for teaching purposes. It also has about 10 add on packages now that do all sorts of useful things. I have used R-Cmdr in several undergrad stats classes and other advanced courses for about four years now and haven’t had any more trouble than with SPSS (yes, that’s still our standard purchased package). For one thing R-Cmdr is just like SPSS; it’s a code writer GUI. There really isn’t much difference, except in the tables routines, but there the “models” package rules; it has options for producing both SAS and SPSS format tables and you can call it right from R-Cmdr. I have also used RStudio, but I seldom have projects that can take advantage of it effectively. And R-Cmdr is better for teaching and a lot easier for usual research problems (that I do, that is).
2. Go to Robert Muenchen’s R for SAS and SPSS Users (NY: Springer 2009). I’ve found this book invaluable; you want to know how to do something you’ve done in SAS or SPSS in R, there it is. With comparable code, I might add. This site – http://www.statmethods.net – Quick R, it’s called, is also useful.
3. Download the ZELIG package. This is Gary King’s shop’s universal can opener for all known linear models. Believe me, if you’ve thought of it, there’s a routine in here that’ll do it. The main thing is that it vastly simplifies R code for complex regression techniques. It also allows you to put together matching routines quite easily as well. Problem = the manual is huge and doesn’t have an index. You pretty much need to read the routine descriptions to see what they are doing.
4. I’m a political scientist. In our field, almost everyone has migrated to either R or STATA. You can still find SAS and SPSS installations and all the older (ahem!) faculty used them at one time or another. But getting a continuously updated, massively complete, FREE stats package is chasing the rest of the competition off the field.
Thanks so much! This is really appreciated.
The automatic spellchecker played me false: it’s package gmodels, not models. I ought to just turn the Christ-bitten thing off. Glad you found the post useful.