Menu Home

Worry Over Columns, not Rows

I say: if you are a data scientist or working on an analytics project, worry over columns not rows. In analytics “rows” are instances, and “columns” are possible measurements. For example: each click on a website might generate a row recording the visit, and this row would be populated with […]

Technical books are amazing opportunities

Nina and I have been sending out drafts of our book Practical Data Science with R 2nd Edition for technical review. A few of the reviews came back from reviewers that described themselves with variations of: Senior Business Analyst for COMPANYNAME. I have been involved in presenting graphs of data […]

Using Excel versus using R

Here is a video I made showing how R should not be considered “scarier” than Excel to analysts. One of the takeaway points: it is easier to email R procedures than Excel procedures. Win-Vector’s John Mount shows a simple analysis both in Excel and in R. A save of the […]

Data science project planning

Given the range of wants, diverse data sources, required innovation and methods it often feels like data science projects are immune to planning, scoping and tracking. Without a system to break a data science project into smaller observable components you greatly increase your risk of failure. As a followup to […]

Setting expectations in data science projects

How is it even possible to set expectations and launch data science projects? Data science projects vary from “executive dashboards” through “automate what my analysts are already doing well” to “here is some data, we would like some magic.” That is you may be called to produce visualizations, analytics, data […]

SQL Screwdriver

Note February 11, 2020: this articles is out of date, we suggest using the methods of Using PostgreSQL in R: A quick how-to instead. We discuss a “medium scale data” technique that we call “SQL Screwdriver.” Previously we discussed some of the issues of large scale data analytics. A lot […]