Worry Over Columns, not Rows

I say: if you are a data scientist or working on an analytics project, worry over columns not rows. In analytics “rows” are instances, and “columns” are possible measurements. For example: each click on a website might generate a row recording the visit, and this row would be populated with […]

Wondering How To Think About Data Science

I just got back from a workshop meeting called Digital Transformation of Decision Analysis. This was a workshop organized by Eyas Raddad, David Matheson, and John-Mark Agosta. It was sponsored by The Society of Decision Professionals and Microsoft. Microsoft generously hosted at their new Experience Center at the Microsoft Silicon […]

Working in CRAN’s World

Part of the deal of having a package up on CRAN is: at any time one may be sent an automated email like the following. “Dear maintainer, Please see the problems shown on URL. Please correct before TODAY+14DAYS to safely retain your package on CRAN. The CRAN Team” If this […]

Method Warnings

Introduction: The data algebra is a Python system for designing data transformations that can be used in Pandas or SQL. The new 1.3.0 version introduces a lot of early checking and warnings to make designing data transforms more convenient and safer. An Example: I’d like to demonstrate some of these […]

How to Re-Map Many Columns in a Database

Introduction: A surprisingly tricky problem in doing data science or analytics in the database is the situation where one has to re-map a large number of columns. This occurs, for example, in the vtreat data preparation system. In the vtreat case, a large number of the variable encodings reduce to table-lookup […]