Nina Zumel and I ( John Mount ) have been working very hard on producing an exciting new book called “Practical Data Science with R.” The book has now entered Manning Early Access Program (MEAP) which allows you to subscribe to chapters as they become available and give us feedback […]
Estimated reading time: 59 seconds
There is no excuse for a digital creative person to not use some sort of version control or source control. In the past disk space was too dear, version control systems were too expensive and software was not powerful enough; this is no longer the case. Unless your work is […]
Estimated reading time: 10 minutes
We describe briefly the powerful simulation technique known as “importance sampling.” Importance sampling is a technique that allows you to use numerical simulation to explore events that, at first look, appear too rare to be reliably approximated numerically. The correctness of importance sampling follows almost immediately from the definition of […]
Estimated reading time: 1 minute
We share our admiration for a set of results called “locality sensitive hashing” by demonstrating a greatly simplified example that exhibits the spirit of the techniques.
Estimated reading time: 42 seconds
We extend the ideas of from Automatic Differentiation with Scala to include the reverse accumulation. Reverse accumulation is a non-obvious improvement to automatic differentiation that can in many cases vastly speed up calculations of gradients.
Estimated reading time: 1 minute
This article is a worked-out exercise in applying the Scala type system to solve a small scale optimization problem. For this article we supply complete Scala source code (under a GPLv3 license) and some design discussion.
Estimated reading time: 36 minutes
We describe the “the local to global principle.” It is a principle used to break algorithmic problem solving into two distinct phases (local criticism followed by global solution) and is an aid both in the design and in the application of algorithms. Instead of giving a formal definition of the […]
Estimated reading time: 53 minutes
What makes a good graph? When faced with a slew of numeric data, graphical visualization can be a more efficient way of getting a feel for the data than going through the rows of a spreadsheet. But do we know if we are getting an accurate or useful picture? How […]
Estimated reading time: 22 minutes
Our first “exciting technique” article is about a statistical language called “R.” R is a language for statistical analysis available from http://cran.r-project.org/ . The things you can immediately do with it are incredible. You can import a spreadsheet and immediately spot relationships, trend and anomalies. R gives you instant access […]
Estimated reading time: 3 minutes