What are best ways to learn R?

Over my year of sabbatical, I planned to become comfortably proficient with data manipulation and analysis with R. I’m getting there. (I was doing a lot more over sabbatical of course, but this was one of my main objectives.) I figure it’ll take at least a few more manuscripts to get comfortable. As I really should be cranking out a dissertation’s worth of stuff in the next year, I have plenty of opportunity to get better, and the rate limiting step for me is sorting out the code.

The statistics of busy, or the management of approachability

In one Seinfeld episode, George puts on an annoyed busy-all-the-time act at work. Consequently, nobody bothers him with work.

Academia is a cult of busy. We all are very busy, and often complain about it when we shouldn’t. However, being busy is part of becoming more efficient.

I must be the worst statistics professor

Several times a year, students contact me to tell me that I was the worst professor ever.

To be precise, former biostatistics students contact me with the simplest, and often ignorant, statistics questions. These questions are so basic that it is clear that I have failed in my job as a stats professor.

With a basic dataset, a student might ask, “what test should I use?”  Last month I had a student drop by my office with a result of p = 0.0071 to ask me to tell him whether or not his result is significant. Without a hint of irony.

If you taught comparative vertebrate anatomy, how would you feel if a recent student of yours came into your office and pointed at his biceps, and then asked, “What muscle is this?” This is how I feel when students come to me for statistical consulting.

My one-semester graduate biostatistics class doesn’t go into much depth and covers the standard, mostly univariate, frequentist statistical approaches that you find in similar courses. We spend a decent chunk of the course on probability, experimental design and why people do statistical tests and what the results mean. We spend a lot of time – and by a lot I mean a lot – on what p-values are, and the relationships among probability, null hypotheses, variance, distributions, and errors. (And when I say we spend time on it, I don’t mean that I lecture a lot about it. I mean students actively figure this stuff out. And their exams show that, at least at the time, they really understand it.) I am convinced that, at the end of the semester, these students really understand the main concepts.

But then they don’t understand it after the course is over.

If a student came to me and said, “I realize that we didn’t learn how to do a GLM in class but from my reading I think that might be the best choice here, and I was wondering if you had the time to talk about it,” I’d ask her to pull up a chair. But when a student says, “I know I was in your class a couple years ago, but I’m looking at this dataset that I already collected and I don’t know where to start,” I’m not going to lift the paperwork that is probably occupying all three chairs in my office.

The hypothetical students-who-forgot-the-name-of-the-biceps-muscle are professional frustrations, and evidence of educational failure, for three reasons:

  1. They forgot something really simple. Though the name of a muscle is a mere fact, it is one thing you’d expect a student in a vertebrate anatomy course to remember, if not know before even starting the course.
  2. The students were intellectually lazy and didn’t decide to look up the answer, but instead just asked a former professor.
  3. The students demonstrated a personal lack of regard for the professor’s effort in teaching the course, by showing unawareness of the fact that the professor expects students to remember basic facts after the course ends and also empowered them to look up basic information. Or to put it in fake-biblical terms: the students had the temerity to think that we are there to feed them fish sandwiches instead of showing them how to fish and how to bake bread. In my view, this extends beyond personal laziness, by showing that the students don’t bother to show that they respect our role as teachers.

On the bright side, there is one positive aspect of the fact that students come in to ask me dumb stats questions. They think positively enough about my course that they think that I’m useful for statistical advice. That’s not much of an upside, but hey, it’s all I’ve got as far as I can see.

I’m long past taking this personally. I don’t get insulted when students come to me asking me to reteach a very basic concept of probability that we studied for a whole semester. I see that this is their problem, and not mine. I don’t take it home with me. There’s a 0% probability that this issue will keep me from sleeping. However, if I’m trying to be more effective at my job, then I need to confront the issue raised by these interactions. It’s a form of teaching assessment, in which I’m doing poorly. (Some students do thank me quite generously for what they learned in the course, and I’m not forgetting that input either.)

How do I handle these outrageous questions? It varies, because I haven’t (yet) developed an a priori approach to the situation. In some cases, I might simply chimp grin a bit and say something like “you really don’t recall how to test a null hypothesis?” or “So you’re telling me that you haven’t been able to find anything about what to do when your independent variables are categorical and your response is continuous?”
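For readers who want the second question made concrete: a categorical predictor with a continuous response is the classic setup for a one-way ANOVA. The sketch below is only an illustration, written in Python with the standard library. The data are hypothetical, and it swaps the usual F-distribution lookup for a permutation test (shuffling the group labels) so that it needs no statistics package. In R, the analogous model would typically be fit with `aov`.

```python
import random
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        group_mean = mean(g)
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def permutation_p_value(groups, n_permutations=5000, seed=1):
    """Share of label shuffles whose F is at least as large as the observed F."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        # Re-cut the shuffled values into groups of the original sizes.
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical data: a continuous response measured under three treatments.
treatments = {
    "control": [4.1, 3.9, 4.3, 4.0, 4.2],
    "low": [4.6, 4.8, 4.5, 4.9, 4.7],
    "high": [5.4, 5.6, 5.3, 5.7, 5.5],
}
groups = list(treatments.values())
f_obs = f_statistic(groups)
p = permutation_p_value(groups)
print(f"F = {f_obs:.2f}, permutation p = {p:.4f}")
```

The null hypothesis here is that the treatment labels are interchangeable. The p-value estimates how often shuffled labels produce group differences at least as large as the ones observed.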

The bulk of the class is biomedically-oriented Master’s students who have some need for statistics with their thesis but don’t think that they’ll need to practice stats for everyday use. So, each semester, I make a point of saying at the outset, “When you design your thesis experiments, you’re welcome to consult with me about the process. But if you don’t discuss your stats with me before collecting your data, then there won’t be much I can do to help you.” I’ve had to remind a couple of students of this fact: one had a horribly pseudoreplicated design, and another failed to run a necessary control.

While I’m not the most amazing professor, I don’t genuinely think I’m the worst, either. I just think that some students think it’s acceptable to offload course material from their brain as soon as the course is over, and do not feel obligated to go back to the hardware store if they lost the tools that they picked up during the course. And among students who are stats-phobic and math-phobic, which is a sizable fraction of the population in the course, they’re just glad they survived. It’s particularly frustrating because my guiding principle in this course is to teach a small number of fundamental concepts in a way that they are supposed to stick with the students for a long time. At least in this class, I know it isn’t happening, with at least some of them. I honestly don’t know what I can do, if anything, to make sure that students really remember what a null hypothesis looks like and what a p-value means. But it is clear that some of them genuinely forget it, presumably because they think it’s not important.

So, when I teach this class again in the fall, I have to find a way to make biostats personally important, with students who don’t see the usefulness of stats in their professional future. Wish me luck.

 

How all ecology grad students can benefit from an OTS course

If you’ve only just started grad school, or if you’re getting ready to finish, there are a ton of great reasons to take the OTS course this summer. The Organization for Tropical Studies courses aren’t just for tropical biologists, and the experience is useful for all ecology grad students.

  • Breadth of research methods — Gain experience in running experiments in a great variety of biomes, fields, and taxa. No matter your speciality, it can be useful and important to know how to mark insects, do biogeochemistry and microbial ecology, dissect flowers and do pollination experiments, mist net birds and bats, make and analyze sound recordings, and much, much more.
  • Making connections — You will work very closely with a large number of faculty from universities all over the United States and elsewhere. More important, you’re in the course with a bunch of other grad students who are typically fun-loving and academically talented. The course is a work-hard, play-hard environment and you’ll go back home with new friends and colleagues, some of whom you’ll stay in touch with for the remainder of your career. You want to emerge from grad school with a network that goes well beyond your own institution. This is a great way to make that happen.
  • Experimental design — This course will have you designing and conducting experiments at many different sites in small groups. This really helps you learn how to develop the right questions, design the most appropriate experiments, and make sure that you’ve had the best analysis in mind the whole time.
  • Data analysis — Because you are involved in so many experiments, you gain experience with many kinds of analysis. The course has expert faculty including well-recognized statistical gurus who communicate in common English. You’ll get training in R to give you the tools that you need.
  • Science communication skills — Learn how to produce media that communicate your science with the public, by working with PhD scientists/filmmakers. Here are the tremendous results from a brief science communication project on the OTS course, from a post on the National Geographic Explorers Journal. The course runs its own blog and you have an opportunity to create podcasts and posts.
  • Experience with conservation in action — You’ll have the chance to interact with land managers and conservation professionals on the sites of ongoing projects. If you’re thinking about getting into this aspect of the ecology business, you’ll have experiences and opportunities for making connections.
  • Tropical nature — If you haven’t ever spent time in the tropics, the biological diversity is stunning compared to the meager biota of the temperate zone. You get to see these biomes in the company of researchers who are experts in this environment and conduct a number of experiments. If you want to learn natural history and biodiversity, this is a chance to be in the field with the experts who can show you what you want to learn.
  • Units — You get six credit hours from the University of Costa Rica that (typically) count towards the coursework requirements of your program. So, there’s that, too.

Speaking just from my own experience, the course gave me so many skills — and ideas — that have been useful in many unpredictable ways. I’ve yet to meet anybody who has taken the course who has said it is anything short of incredibly useful, and I think everybody has rated it as a spectacular experience. In the course of your graduate career, it definitely is worth your time.

Here’s a pdf flyer with more info.

Here is the link to the course for summer 2014, with its list of great faculty and remarkable sites the course visits, and instructions on how to apply. The deadline for applications is just over a week away, but then there are rolling admissions afterwards.

Drifting towards deadwood, or not: learning to use R (updated)

Update 15 May 2013: If you’re a newbie to R and want to know where to start, the comments on this post are now replete with (what I surmise to be) wonderful suggestions. Of course learning in the presence of those who know R is best, but this is a great set of suggested resources regardless of your environment.

I’m not that old, but I already feel myself getting a little stale.

How did this happen? Well, I guess it’s because I’m a professor and this is just the default rate of entropy.

When I was an undergrad, one of our introductory bio professors was a kindly man who was the archetype of deadwood. He had a separate slide carousel for every lecture in his course. When it was his time to teach, all he did to prepare was to pull the carousel off of the shelf. He didn’t have any idea what he was going to say until he saw the slide appear on the screen. Then, he would say the same thing he’d been saying for that slide over the past 20 years. It was just so obvious. One day, the slide projector broke. What did he do? He cancelled class.

This kind of thing is even more common now than it was back then, because few people had so many carousels at their disposal. Now it’s just done with PowerPoint.

I’ve worked hard to keep my teaching from becoming stale. And since I’m doing a lot of research, then I can’t get stale at doing research either, right? If only that were true.

I imagine that molecular biologists all had to learn the ropes at PCR as machines and reagents became commercially available, and then relatively cheap and efficient. Nobody’s out there doing allozymes for population genetics after all, I would hope. And the same is true for RNAi, and now with nextgen sequencing approaches to genomics. In my flavor of work, there isn’t as much required to stay current, but nonetheless I’m still getting behind. If only I had the time to run just to stay in place.

At least I’ve diagnosed this condition and can fight the entropy. Just as I keep the dishes mostly clean in my house and I have the oil changed in my car on time, I’ve got to stay fresh as a practicing scientist too. It isn’t easy.

This occurred to me, in part, when reading something that Joan Strassmann wrote (in the context of picking a good PhD advisor) that grad students are probably better at using R than their own advisor. I guess that’s the case in most labs, even if their advisors might have better statistical acumen.

If you’re a serious ecologist, nowadays, then R is an essential or near-essential tool. Here’s a confession: I’m useless with R. This is a problem. And it’s not a little problem, it’s a big problem.

I suspect that I’m not the only one in this boat, though I haven’t really heard anybody else admit to it. With every day that passes in which I still can’t use R, the less effectively I can collaborate, the more reliant I am on others, and the less able I am to apply the most current tools to the experiments which I’m running. There is a single analysis that I should be able to do in R in an hour, and it’s keeping me from submitting a manuscript that otherwise is pretty much done. That’s a problem.

Now, I’m not a statistical dunderhead. (I teach our graduate biostatistics class, but obviously teaching a class in something doesn’t mean you’re an expert.) I design my experiments with specific tests in mind, and I choose ones that work, and I use model selection understanding the power and limitations of the approach. I understand frequentist vs. Bayesian perspectives even though I don’t choose to say anything that would start a disagreement. (If you read my stuff, you can decide for yourself if I know what I’m talking about.) I guess you’ll probably just take me at my word that I’m not stupid when it comes to stats.

But there are a few analyses that I just can’t run easily, like NMDS or a GLMM. This is because I mostly use a powerful menu-driven version of SAS called JMP. It does nearly everything I want, and quite well. But there are a few analyses that I can’t run in JMP, which are becoming more and more relevant to the questions which I’m asking in my lab.

How did I get into this situation? Well, when do people learn R? In grad school. When I was in grad school, R was not the standard tool. Before then, I used SPSS on a mainframe (NO, not with punchcards) and a variety of easy-to-use programs on a Mac. (Statview was unparalleled for simple exploratory data analyses on Macs, and it was bought up by SAS and orphaned so that people would use JMP instead. The world has moved on without it.) By the time I was finishing up grad school in the late ’90s, R wasn’t in widespread use but it was ramping up. None of my fellow grad students were using it at the time, and I wasn’t behind the curve.

A few years later, while I was starting on the tenure track in the early 2000s, I put aside a little time to figure out R. That was a disaster and I couldn’t even get it to read my files. I had a few halfhearted attempts, but I couldn’t find the time. I looked into taking a short course, getting a book, but I didn’t have the time to make it happen. At this point, it wasn’t a critical failing, but I saw that more and more people were using R, and that I wasn’t one of them.

My lack of R mojo isn’t a teaching problem. Even if I was an R pro, I don’t think I’d use this in my course because the class is about understanding how statistics works and how to apply them, not how to use the software. I use JMP in the course because it is so easy to use, and I’m not going to waste instructional time on software tutorials. (We should have a separate class or seminar or experience that teaches students to use R, but it can’t fit in this class.) I’ve talked to people who teach with R in their courses, and they’ve reported that you either have to make it a course about learning stats, or learning R, but you can’t do both well with 45 hours of class time. Clearly, by using R you actually learn what you’re doing statistically, because that’s part of understanding coding. So I hear. But I’m not going to spend half of my time in class dealing with coding errors and stress when my students still don’t fundamentally understand probability, randomness and the actual nature of a null hypothesis.

While not a teaching problem, my lack of R mojo is a research problem. I am on it. I’ve been aware of this for a while, and I’ve found a way to deal with it.

For the last month, I’ve had sitting in my backpack wherever I go, what appears to be the exact resource I need: Beckerman and Petchey’s Getting Started with R: An Introduction for Biologists. From my quick browse, I feel mighty confident that using R like a pro is now only a matter of finding the time, and it doesn’t seem as insurmountable.

My hope is that, this summer, I find the time to actually remove the book from my bag and use it. This is the point in the narrative where I could explain everything I’ve done in the last month that would explain why I haven’t found the time to get to it, but you know the story. I won’t try to out-busy you.

This summer is already booked. Learning to use R to some degree of proficiency is going to take the amount of time that it would take to write a whole manuscript, or nearly write a whole grant. I have to decide which one of those things I’m not going to do to keep my skills sharp. Of course, I’ll be using R in the context of a manuscript. It’s just that this manuscript will take 2-3 times longer to write because of my R learning curve.

Maintenance isn’t optional. Learning R feels more like an engine replacement instead of an oil change, but I’ve got enough miles that I guess I’ve got to make the investment to avoid being sold for scrap.

Kodak stopped making the carousel projector less than ten years ago. I still have a carousel sitting around my lab, containing slides from the last talk I gave in this format. It wasn’t that long ago, really. (In the early 2000s, the Entomological Society of America hadn’t yet switched to accepting digital projection. That’s what’s still in the carousel.)

The world changes really quickly. As I’m doing my day-to-day faculty job, the world will be passing me by unless I actively work to keep pace. I always wondered how some people became deadwood. Now, I see how easy it is. It’s not about giving up, and it’s not about not caring. It’s about not strategically and systematically planning to keep up, which takes you away from immediate responsibilities. I’ve avoided this particular maintenance task for ten years, and just like when I go to get oil changed in the car, I’m not thrilled to spend my time that way. Of course, I’m glad that I can continue to drive a working car that will last a long while, and I’m glad that my soon-to-be-developed R mojo will keep me fresh for a good long while as well.