Our friends at Dataspora have a nice article on the more modern Map Reduce languages. It is a very good read, and clearly a lot of thought went into preparing it.
In passing, we are rightfully taken to task for hiding a huge glob of code in a tar file that few people are likely to open. Using higher-order tools could indeed make the code smaller, perhaps small enough that we could share it in a more readable format. It is a good point, and our only answer is that we at Win-Vector LLC see ourselves as tool builders delivering complete tools that perform well-defined tasks (like a logistic regression), so that most people do not have to open the tar file (but they can if they need to). That is: we believe in higher-order languages and tools, and we supply some of them. We also, however, like to minimize external dependencies so that our code can run on more systems.
Back to the tar file issue. We had been meaning to get our code up on GitHub or some other public source control system. Instead, we have HTMLified it (with some cross-reference links; it still isn’t pretty). (edit: it is now up: WinVector/Logistic)
And Antonio Piccolboni, thanks for the great article.
Categories: Computer Science
jmount
Data Scientist and trainer at Win-Vector LLC. One of the authors of Practical Data Science with R.
Hi, and thanks for your comments. I just wanted to apologize if my remarks came across as taking you to task for using Java in that logistic regression example. It is clear that when data sets are large enough and programs are going to be run on a regular basis on large clusters, there is little alternative to that, and the dependencies argument is also a factor. What I meant was to commend you for the effort, but also to decry the code sprawl that even an expert implementor of this type of algorithm cannot avoid. Maybe advancements in compiler technology and cheaper hardware will eventually let us code only for beauty; until then, as you say, a variety of tools and pragmatism are in order.
Antonio, I truly liked your article and don’t feel you said anything wrong. Thanks for the link and comments.