I am going to come out and say it: I am emotionally done with 32 bit machines and operating systems. My sympathy for them is at an end.
I know that ARM is still 32 bit, but in that case you get something big back in exchange: the ability to deploy on smartphones and tablets. For PCs and servers, 32 bit addressing’s time is long past, yet we still have to code for and regularly run into these machines and operating systems. The time/space savings of 32 bit representations are nothing compared to the loss of capability in sticking with that architecture and the wasted effort in coding around it. My work is largely data analysis in a server environment, and it is just getting ridiculous to not be able to always assume at least a 64 bit machine.

Some dust-ups associated with 32 bit addressing and word-size include:
- The failure of Go’s heuristic garbage collector (32 bit values are too subject to collision to allow the heuristic “looks like a pointer” determination to be reliable; see the sketch after this list).
- MongoDB quietly losing data after 2GB.
- GNU Hurd’s 2GB file partition limit.
- Limited arithmetic and small memory limits in tagged native-word Lisp implementations.
(Notice that the Emacs Lisp variant this article is talking about has a native machine integer that tops out at under 300 million, giving it only 24 bit addressing.)
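To see why the “looks like a pointer” heuristic is so fragile at 32 bits, here is a quick back-of-the-envelope sketch (the 1 GB heap size is just an illustrative assumption, not a number from any of the reports above): a modest heap occupies a large fraction of a 32 bit address space, so arbitrary integer values collide with it routinely, while the same heap is a vanishing fraction of a 64 bit address space.

// Back-of-the-envelope: chance an arbitrary word value happens to fall inside
// the heap's address range. The 1 GB heap size is an illustrative assumption.
public class PointerCollision {
    public static void main(String[] args) {
        final double heapBytes = Math.pow(2.0, 30);   // assume a 1 GB heap
        final double space32 = Math.pow(2.0, 32);     // 32 bit address space
        final double space64 = Math.pow(2.0, 64);     // 64 bit address space
        System.out.println("32 bit collision odds: " + (heapBytes / space32)); // 0.25
        System.out.println("64 bit collision odds: " + (heapBytes / space64)); // ~5.8e-11
    }
}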
I say “associated with” because none of these failures was really due to 32 bit addressing/word-size alone. They are all due to 32 bit addressing combined with one more shortcut (heuristic instead of correct garbage collector, not checking error codes, memory-mapping partitions, pointer mangling and so on). So the problem is that 32 bit addressing has already spent a lot of your luck.
And we do still encounter these machines. Some common hold-outs of 32 bits in a server environment include:
- Amazon EC2 small/medium instances.
- Cheap Hadoop clusters.
- Cheap virtual machine strategies (lots of small 32 bit virtual instances, often hosted on a 64 bit machine).
The sad part is that all of these environments are remote/virtual and therefore only a simple configuration change away from being 64 bit.
Don’t you feel your software deserves access to more than $40 worth of memory?
Or from a computational point of view: my aging laptop only takes around 1.2 seconds to count up from zero and overflow into negative numbers in Java:
public class Count {
    public static void main(String[] args) {
        final long t0MS = System.currentTimeMillis();
        // spin until the int counter overflows and wraps to a negative value
        for (int i = 0; i >= 0; ++i) {
        }
        final long tFMS = System.currentTimeMillis();
        System.out.println("MS elapsed: " + (tFMS - t0MS));
    }
}
We all know 32 bits represents a trade-off of space for expressiveness. But I don’t think enough people remember they are settling for the expressiveness of about $40 of memory and 1.2 seconds of computation. That is how far Moore’s law has moved what we should settle for. The sweet spot in trading code complexity versus machine effort moves rapidly, so compromises that made sense in the past rapidly become hindrances when not re-evaluated.
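To make those two numbers concrete, here is a rough back-of-the-envelope calculation (the ~$10/GB RAM price simply back-solves the $40 figure above, and the 1.2 seconds is the measurement from the Count loop; treat both as rough):

// Back-of-the-envelope on what 32 bits buys you. The ~$10/GB RAM price just
// back-solves the $40 figure above; the 1.2 seconds is the measurement above.
public class ThirtyTwoBitBudget {
    public static void main(String[] args) {
        final double dollarsPerGB = 10.0;         // assumed RAM price
        final double addressableGB = 4.0;         // 2^32 bytes = 4 GB
        System.out.println("memory budget: $" + (dollarsPerGB * addressableGB));

        final double secondsToOverflowInt = 1.2;  // measured by Count above
        // A long has 2^32 times as many non-negative values as an int, so the
        // same counting loop over a long would take about 2^32 times longer.
        final double secondsToOverflowLong = secondsToOverflowInt * Math.pow(2.0, 32);
        final double years = secondsToOverflowLong / (365.25 * 24 * 3600);
        System.out.println("same loop over a long: about " + Math.round(years) + " years");
    }
}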
I will end with a personal example: a lot of our clients are in what I call the region of “medium data” (many gigabytes, but not terabytes). In this range you can, on a 64 bit architecture, perform very rapid and agile analyses using relational DBs and standard in-memory analysis suites (like vanilla R). However, at this scale on a 32 bit machine (or a cluster of 32 bit machines) you tend to resort to the big data techniques designed for terabyte situations: map reduce and out-of-core extensions (like Revolution Analytics or ff). These methods can limit your expressiveness, take longer to code for and take longer to run (using the network and disk more often). And you are still only in the “medium data” regime (you may not yet have enough data to hit “The Unreasonable Effectiveness of Data” effect, so you still need the agility of trying models to make progress, as you don’t yet have enough data for the data to simply construct a dominant model). For many analysis tasks, delaying the switch from small/medium data techniques to “big data methods” has significant advantages.
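As a simplified illustration of that memory ceiling, the sketch below holds a few hundred million doubles in RAM and computes their mean. On a 64 bit JVM with a few gigabytes of heap (say, -Xmx4g) this is routine; a 32 bit JVM simply cannot be given enough heap to hold it, and you are pushed toward the out-of-core techniques above. The array size is an arbitrary choice for illustration.

// Sketch of a "medium data" in-memory workload: 400 million doubles is about
// 3.2 GB, comfortable on a 64 bit JVM (run with something like -Xmx4g) but
// beyond what a 32 bit JVM can address. Sizes here are purely illustrative.
public class MediumData {
    public static void main(String[] args) {
        final int n = 400000000;           // ~3.2 GB of doubles
        final double[] x = new double[n];
        for (int i = 0; i < n; ++i) {
            x[i] = 0.5 * i;                // stand-in for real analysis data
        }
        double sum = 0.0;
        for (int i = 0; i < n; ++i) {
            sum += x[i];
        }
        System.out.println("mean: " + (sum / n));
    }
}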
It all comes down to how much your time is worth.