I have some more thoughts on the topic: “the part-time R user.”
I am thinking a bit more about the diversity of R users. It occurs to me that simply dividing R users into two groups, beginning and advanced, neglects a very important group: the part-time R user. This leaves us teachers and package developers with an unfortunate bias.
The concept of “beginning R user” implies a user who has near infinite time to adapt to our advanced R user work style and other nonsense. “Beginning” is a transient state; one feels we can temporarily accommodate the beginners on our path to assuming them away.
However, for a language such as R, which deliberately targets non-programmer populations (such as statisticians, scientists, medical professionals, and more), we must assume there is a permanent population of users that have other things going on in their lives. These are users that come to R to make statistical inferences, do science, study social policy, or perform some other non-programming task.
This means we R package developers have at least the following responsibilities:
- Our packages should be simple and intuitive (have low “cognitive load”).
- Our packages should obey common design principles such as the principle of least surprise.
- Our packages should have sensible and meaningful documentation and examples.
- Our functions should have sensible and safe defaults (you don’t have to set obscure options to get sensible behavior).
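As one small illustration of the "least surprise" and "safe defaults" items above, here is a minimal sketch in base R. The helper name `select_columns` is hypothetical and not taken from any package. It wraps data.frame indexing so that the result is always a data.frame, whereas plain `d[, "x"]` quietly returns a vector when only one column is requested.

```r
# Hypothetical helper, for illustration only (not from any package).
# It always returns a data.frame, so a part-time user does not have to
# remember the drop = FALSE option when asking for a single column.
select_columns <- function(d, columns) {
  stopifnot(is.data.frame(d), is.character(columns))
  missing_columns <- setdiff(columns, names(d))
  if (length(missing_columns) > 0) {
    # Fail early with a message that names the problem columns.
    stop("unknown column(s): ", paste(missing_columns, collapse = ", "))
  }
  d[, columns, drop = FALSE]
}

d <- data.frame(x = 1:3, y = c("a", "b", "c"))
one <- select_columns(d, "x")          # still a data.frame
two <- select_columns(d, c("x", "y"))  # also a data.frame
```

The design choice here is simply that the sensible behavior is the default, and the error message tells the user what went wrong rather than leaving them to decode it.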
There are a lot more consequences that one can derive from the “part-time user” principle. However, I think the principle itself is probably the most important point.
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.
An outstanding observation about part-time users, John. There are many like that, and in general they just want to use R to do what they need; they don’t intend to become R experts. So we must plan for them. They may even be the largest of the three groups. I’ll also mention I have really enjoyed your recent posts, excellent commentary and food for thought.
Thank you so much for this, I wish I had lots more time to get used to R, it is such a fantastic resource.
A few ideas I would like to mention: when I first heard about R I thought it was a statistics package. And it got worse: the academics who mentioned R talked about R Commander, so I actually wrongly thought that R was R Commander.
I look back with total embarrassment, and I now understand that R really is a basic programming language that can be used to design statistical procedures or even more challenging complex packages.
The students I try to teach hate anything mathematical; I would say their maths level is around that of an eight year old. A few years ago I taught a bio-stats unit using R Commander as their training wheels. Yes, I know there are other packages like RStudio, but even this is too advanced for my students. The students gave me a “Best Salesperson” award for R Commander.
I really think R Commander is a good starting point for beginners of R, but I have learnt you need to spend a lot of time playing with the scripting language of R. R is a great resource; the more I use it the more I love it. But really I fall into the part-time user group, mainly using it a bit like Mathcad.
I always look forward to my R Blogger emails and see what the R Community is up to.
Once again, thanks for mentioning us part-time R users. I would love to get to an advanced stage, but it’s just finding the time.
John, you have hit on an important theme to help R grow in popularity and usage. As an “intermediate” R user who will never become an “expert” (one who writes sophisticated functions or extends extant packages), may I offer a few more ideas? One is that it would be good if a package or RStudio helped intermediates create their own cheat sheets of tricks and tools. It is true that “Find in files” works quite well most of the time, but other times we all could use a repository. Second, it would be great if someone could click on an error message and bring up help for the error message. As is, they are often obscure. Third, I wish there were a package that could look at my last X scripts or .Rnw files and tell me the most commonly used functions. Those are the ones intermediates should strive to study and become more competent with. Perhaps these ideas will help you extend your thinking.
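The third idea (finding the most commonly used functions in a set of scripts) can be roughly sketched in base R. This is only a sketch under stated assumptions: `count_function_calls` is a hypothetical name, it handles plain .R scripts only (an .Rnw file would first need its code chunks extracted, for example with knitr), and it counts every function-call token without regard to which package the function comes from.

```r
# Hypothetical helper, for illustration only: tally function calls in R scripts.
count_function_calls <- function(files) {
  calls <- character(0)
  for (f in files) {
    # keep.source = TRUE is needed so that parse data is recorded.
    exprs <- parse(file = f, keep.source = TRUE)
    pd <- utils::getParseData(exprs)
    # Tokens of type SYMBOL_FUNCTION_CALL are names used as f(...) calls.
    calls <- c(calls, pd$text[pd$token == "SYMBOL_FUNCTION_CALL"])
  }
  sort(table(calls), decreasing = TRUE)
}

# Small demonstration on a temporary script.
script <- tempfile(fileext = ".R")
writeLines(c("x <- c(1, 2, 3)",
             "m <- mean(x)",
             "s <- sum(x) + mean(x)"),
           script)
usage <- count_function_calls(script)
print(usage)
```

In practice one would pass something like `list.files(pattern = "[.]R$", full.names = TRUE)` for a project directory, perhaps limited to the most recently modified files.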
Very correct and timely thoughts. Fully support.
As a part time R user myself (a few hours per week), I totally agree. More consistency in syntax would be nice.
Engineer here. While I tend to use R significantly more than others, I’d never consider myself an expert. I actually do find quite a few of the discussions beyond what I’m willing to investigate because I have a job to do. Anything to make the tool easier for us intermediate users would be helpful.
This is pretty much me. I’m quite statistically literate, and love the idea of R, but generally find the help files extremely unhelpful. Like anything, R is easier the more you use it, but I simply don’t have the time to master it. I _know_ I should use it more, but I can do most things in 10% of the time in SYSTAT.
I discovered R with R Commander and FactoMineR. I began programming with RStudio when I needed to do deeper analysis. A colleague of mine, an R expert, helped me in the beginning.
As a part-time R user, once a quarter on average, I am frustrated by the lack of proper documentation to explain key packages, and by the change of behaviour of functions when there is a package upgrade.
A good example is dplyr. It is a key package for me. However, documentation does not explain all the features (need to search forums) and during the upgrades to 0.6 and 0.7, some functions changed behaviour (not a best practice in IT…).
Thus every quarter, I have to expect that existing code will crash or start having a strange behaviour…
Please, R experts making packages, don’t forget part-time R users :-)
Where would you suggest packages fall along the line of data.frame vs vector input for functions? I’ve noticed some recent “self-styled tidyverse” packages primarily use data.frames for input, with extra arguments allowing the user to designate which columns in the table should be used for things like timestamps, measures, etc.
On the one hand, most actual workflows primarily manipulate data.frames; in fact, this is the best way to mentally organize related data. However, a data.frame is more complicated than a vector, extra arguments are extra things to know, and a standard of data.frame-wanting functions could hinder new users from learning about vectors.
The last point may be a bit too purist.
My group is experimenting with data.frame-centered interfaces (at least one argument in is a data.frame, and what comes out is a data.frame or list of such). A simple example of that is our WVPlots package. Each plot takes a data.frame and the names of the columns you want to work with, even though it would make sense to take columns in many cases. Our group thinks the conceptual uniformity is a big plus. Also, if tidyeval obsoletes code in the future we will fix the issue in WVPlots, and no WVPlots user should have to be made to suffer.
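To make the "data.frame in, column names as strings, data.frame out" convention concrete, here is a minimal sketch. It is not the WVPlots API; `column_summary` and its arguments are hypothetical names chosen only to show the shape of such an interface.

```r
# Hypothetical function, for illustration only (not part of WVPlots).
# Input: a data.frame plus the name of the column to work with.
# Output: a one-row data.frame, so results compose with other table tools.
column_summary <- function(d, column_name) {
  stopifnot(is.data.frame(d),
            is.character(column_name),
            length(column_name) == 1,
            column_name %in% names(d))
  values <- d[[column_name]]
  data.frame(column = column_name,
             n = length(values),
             n_missing = sum(is.na(values)),
             mean = mean(values, na.rm = TRUE),
             stringsAsFactors = FALSE)
}

d <- data.frame(age = c(30, 41, NA, 25),
                height = c(1.70, 1.80, 1.60, 1.75))
res <- column_summary(d, "age")
print(res)
```

Because the column is named by an ordinary string, the same call works unchanged inside loops or other functions, which is part of the conceptual uniformity described above.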
Yes!! I am a part-time R user – I use SAS mostly but would like to move to R for the flexibility and for the ease of producing reproducible reports. However, I am a full-time faculty member and I *don’t* have hours to while away making the adjustment – sometimes I just have to get the job done. How oblique some of the documentation is (for example, sometimes I find it hard to even figure out what package a command is in!) really holds me back from making the full transition. I am so glad you highlighted this.
I have been using R for about 15 years and still consider myself a part-time user. No complaints though, it’s an excellent bit of software.
Using np++ and NppToR makes life easier and allows documentation on the fly. I have wondered sometimes what the percentage breakdown is of users by profession. Are there any stats on that?
This describes me perfectly. I use R to get things done in science, rather than it being the main thing I do. Some documentation is great and really makes using that package easy; for others it is a struggle just to work out how to do the basics. This is fine however, as I learn more about R by doing. I am also an isolated user in my (very small) organisation, so R groups are great, even if it means I have to travel to get to them.
A great comment. I am a statistician and have been using R since its inception for teaching and research. For much of my work I can use pretty basic R; however, there are applications where I need more advanced techniques, and I like that I can find this information in R user groups.
While I don’t consider myself an R “expert,” it is by far the programming language I know best, and I know R the way I know many technical skills: just enough to accomplish all the tasks I want but little extra. I just happen to have needed to accomplish many tasks in R, so I’m fairly competent.
I now maintain a package because there were many things I do in my social science workflow that I thought were unwieldy, redundant, and so on. My package had this part-time R user in mind from the get-go: for the most part, its functions are meant to simplify tasks that are doable for the intermediate user but would take too much time and googling to work out in all their minutiae. For instance, I wanted to have a summary of linear regression models that had a column with VIFs or substituted Huber-White standard errors instead of the homoskedastic ones. Many part-timers could work out how to get those two pieces of info and some would know how to take the summary function’s table, save it as another object, and substitute/append the columns of interest…but that’s a very slow workflow.
And then thinking about how a user asks the function to do things is worthwhile. For instance, because it’s *usually* not so hard on the programming side, I tried to make my package’s functions accept unquoted variable names whenever possible. The lower-competency or lowest frequency portion of the part-time R users will sometimes trip up on simple things like that even if they know they matter after working through errors.
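A minimal sketch of one way to accept unquoted variable names in base R follows. The commenter's package is not shown in the text, so this is not its implementation; `get_column` is a hypothetical name, and the approach shown (capturing the argument with `substitute()`) is just one common technique.

```r
# Hypothetical helper, for illustration only.
# substitute() captures the argument as written, so both height and
# "height" end up as the string "height".
get_column <- function(d, column) {
  column_name <- as.character(substitute(column))
  stopifnot(is.data.frame(d), length(column_name) == 1)
  if (!(column_name %in% names(d))) {
    stop("column '", column_name, "' not found; available columns: ",
         paste(names(d), collapse = ", "))
  }
  d[[column_name]]
}

d <- data.frame(height = c(1.7, 1.8), weight = c(65, 80))
a <- get_column(d, height)    # unquoted name
b <- get_column(d, "height")  # quoted name also works
```

One caveat of this simple version: if the user passes a variable that holds a column name (say `nm <- "height"; get_column(d, nm)`), the function looks for a column literally called `nm`. That trade-off is part of why some developers prefer interfaces that take column names as strings.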
While I don’t like un-quoted variable names that much, I believe they are considered good idiom in R. So your point that including that ability helps beginning R users is a good one; it is considered one of the right ways in R, and it is something an R user will reasonably expect.