Menu Home

A Better Example of the Confused By The Environment Issue

Our interference from then environment issue was a bit subtle. But there are variations that can be a bit more insidious.

Please consider the following.

library("dplyr")

# unrelated value that happens 
# to be in our environment
z <- "y"


data.frame(x = 1, y = 2, z = 3) %>%
  select(-z) 
#   x y
# 1 1 2

data.frame(x = 1, y = 2) %>% # oops, no "z"
  select(-z) 
#   x
# 1 1

# notice column "y" was removed and
# no error or warning was signalled.

When the data.frame has a lot of columns, and is coming from somewhere else (even as an argument to a function): we may not notice the column loss until very much later (making for hard debugging or even unreliable results).

Categories: Opinion

Tagged as:

jmount

Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.

4 replies

    1. It sounds like it may be senstive to package versions. And also I apologize to anybody who’s answer gets mangled by the comment system.

%d bloggers like this: