
Programs reduced to statistics

An interesting article on programming languages by Guillaume Marceau is making the rounds:
The speed, size and dependability of programming languages. The article points out very clearly what some of the differences in major programming languages are. The author uses benchmarking and graphs in an interesting way.

I have had a soft spot for this kind of study ever since I read: Donald E. Knuth: An Empirical Study of FORTRAN Programs. Softw., Pract. Exper. 1(2): 105-133 (1971). In that article Knuth admits to breaking into people’s accounts to collect statistics on what evil people were feeding into the FORTRAN compiler.

Let’s look at the gestalt of a few popular programming languages, using the following button-sized excerpts from Marceau’s article:


To build these graphs 19 challenge problems were implemented in 72 programming languages. Each square is a programming language, the x-axis is run time and the y-axis is code size (large is bad on both of these). Each line segment connects the code size and run time of one example program run to the centroid of all such runs for the language. We all know code size is not a very good stand-in for programming difficulty (compare C, a merely primitive language, to C++, an outright programmer-hostile language), but the pictures actually tell a credible story.
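The construction of one such square can be sketched in a few lines of Python. This is only an illustration of the centroid-and-segments idea described above, not Marceau's actual plotting code; the function names and the sample measurements are hypothetical.

```python
from statistics import mean

def centroid(runs):
    """Return the mean (run_time, code_size) point of a list of runs."""
    xs, ys = zip(*runs)
    return (mean(xs), mean(ys))

def star_segments(runs):
    """Pair every run with the language centroid: one line segment per run."""
    center = centroid(runs)
    return [(run, center) for run in runs]

# Hypothetical (run_time, code_size) measurements for one language; not real data.
example_runs = [(1.0, 4.0), (2.0, 6.0), (3.0, 2.0)]
center = centroid(example_runs)
segments = star_segments(example_runs)
```

Drawing every pair in `segments` as a line gives the star shape for a language. A tall, narrow star on the left means consistently fast runs with widely varying code size.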

  • GCC (or C) is very fast but takes a lot of code (its graph is a vertical bar running up and down the left).
  • Java mostly works like C, but every once in a while lets you down on performance (this is leaving out that Java is far safer than C and far more wasteful of memory).
  • JavaScript and Ruby have such bad implementations that their centers are off the graph (this brings up a point the original authors well understand: you cannot benchmark a language, only a specific run of a specific program using a specific language implementation).
  • Perl and Erlang have similar run-time performance (though they are at completely opposite poles of elegance; elegance is not plotted on the graph).
  • Ruby’s implementation makes Python look fast.
  • OCaml lives up to its reputation of being simultaneously very expressive and efficient (but expressive power is not a direct measure of ease of use, think of APL).

The benchmarking depends on people donating example programs, and the problem types are heavily biased towards the puzzle area (where C, Java and OCaml excel) and not towards the “it’s a one-liner because it is already done in a framework” area (Perl, Python, Ruby).

For all the problems inherent in such a study, I think it is actually interesting what a little quantitative data lets us think about.

Categories: Computer Science



Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.
