My BARUG rquery talk went very well, thank you very much to the attendees for being an attentive and generous audience. (John teaching rquery at BARUG, photo credit: Timothy Liu) I am now looking for invitations to give a streamlined version of this talk privately to groups using R who […]
Estimated reading time: 1 minute
I would like to thank LinkedIn for letting me speak with some of their data scientists and analysts. John Mount discussing rquery SQL generation at LinkedIn. If you have a group using R at database or Spark scale, please reach out ( jmount at win-vector.com ). We (Win-Vector LLC) have […]
Estimated reading time: 49 seconds
Trick question: is a 10,000 cell numeric data.frame big or small? In the era of “big data” 10,000 cells is minuscule. Such data could be fit on fewer than 1,000 punched cards (or less than half a box). The joking answer is: it is small when they are selling you […]
Estimated reading time: 6 minutes
Win-Vector LLC recently announced the rquery R package, an operator based query generator. In this note I want to share some exciting and favorable initial rquery benchmark timings.
Estimated reading time: 5 minutes
A big “thank you!!!” to Microsoft for hosting our new introduction to seplyr. If you are working R and big data I think the seplyr package can be a valuable tool.
Estimated reading time: 1 minute
Win-Vector LLC is proud to introduce two important new tool families (with documentation) in the 0.5.0 version of seplyr (also now available on CRAN): partition_mutate_se() / partition_mutate_qt(): these are query planners/optimizers that work over dplyr::mutate() assignments. When using big-data systems through R (such as PostgreSQL or Apache Spark) these planners […]
Estimated reading time: 2 minutes
Win-Vector LLC has been working on porting some significant large scale production systems from SAS to R. From this experience we want to share how to simulate, in R with Apache Spark (via Sparklyr), a nifty SAS feature: the vectorized “block if(){}else{}” structure.
Estimated reading time: 2 minutes
We have just released a major update of the cdata R package to CRAN. If you work with R and data, now is the time to check out the cdata package.
Estimated reading time: 2 minutes
Some Announcements: Dr. Nina Zumel will be presenting “Myths of Data Science: Things you Should and Should Not Believe”, Sunday, October 29, 2017 10:00 AM to 12:30 PM at the She Talks Data Meetup (Bay Area). ODSC West 2017 is soon. It is our favorite conference and we will be […]
Estimated reading time: 1 minute