My opinion on “5” == 5

Every programmer should have an opinion on what the outcome of expressions like "5" == 5 should be, and perhaps even a guess as to what the answer is in their most familiar programming language.

In my opinion, SQL gets it right. For example, we get the following in Google BigQuery.

SELECT "5" = 5
-- No matching signature for operator = for argument types: STRING, INT64. Supported signature: ANY = ANY at [1:8]

That is a nice, safe, early error that can prevent a lot of confusing data processing bugs down the line. This may be clearer in a related example.

SELECT "5" IN (1, 2, 3)
-- No matching signature for operator IN for argument types literal STRING and {INT64} at [1:12]

This follows from the expectation in SQL that columns and lists have homogeneous types.

In this context, it is likely the user meant SELECT 5 IN (1, 2, 3), i.e. to check if an integer was in a given set of integers. It is unlikely the user actually meant SELECT "5" IN (1, 2, 3). The second form can never be true in SQL, by simple type inspection, so it is useful to have it disallowed. Violations of expectations are caught early, and are thus easy to find and avoid.
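To make the "early error" idea concrete outside of SQL, here is a minimal Python sketch of a membership test that refuses mismatched types. The helper name strict_in is hypothetical, invented for this illustration; it is not how BigQuery implements its check, just one way to imitate the behavior.

```python
def strict_in(value, candidates):
    """Hypothetical helper: membership test that rejects mismatched types early."""
    candidates = list(candidates)
    expected = type(value)
    for candidate in candidates:
        # Fail loudly instead of quietly returning False for a comparison
        # that could never succeed.
        if type(candidate) is not expected:
            raise TypeError(
                f"cannot test {expected.__name__} against {type(candidate).__name__}"
            )
    return value in candidates


print(strict_in(5, [1, 2, 3]))  # same types, so this is an ordinary membership test

try:
    strict_in("5", [1, 2, 3])
except TypeError as err:
    print("rejected:", err)
```

The point of the design is the same as in the SQL example: the mistake is reported where it is made, rather than surfacing later as a filter that silently matches nothing.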

Now let’s try this in a few more languages.

For fun, from memory, what is the outcome of evaluating the expression "5" == 5 in Python 3?

It turns out it is False, which is useful and understandable in a general-purpose programming language. Python deals with heterogeneous lists and sets. Thus, "5" == 5 and "5" in {1, 2, 3} are sensible expressions, given the language context.
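The Python behavior described above can be checked directly. This is a small sketch; the variable names are only for illustration.

```python
# Comparing a string to an integer is allowed and simply yields False.
str_equals_int = "5" == 5

# Membership uses the same equality, so a string is never found among integers.
str_in_int_set = "5" in {1, 2, 3}

# Heterogeneous containers are legal, which is why such comparisons need a defined answer.
mixed = [5, "5"]
matches = [item for item in mixed if item == 5]

print(str_equals_int, str_in_int_set, matches)  # False False [5]
```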

And now we get to the odd one.

R is a language I love and routinely work in. Heck, I even wrote a book about working in it, which I am very proud of.

However, I can’t defend R’s return value for "5" == 5. This turns out to be TRUE. One can, of course, guess at what sort of implicit casting is supporting this, but R isn’t a language where strings and numbers are generally equivalent. Yet we have 5 %in% c("5") evaluating to TRUE. R mostly enforces homogeneous types, but it does so by quiet implicit type conversion (a buggy gift that keeps on giving). A well-informed R user expects c(5, "5") to be a vector of strings; I am less convinced that many expect "5" == 5 to evaluate to TRUE.

One would expect that equal values always support the same operations. However, in R, 5 + 1 is a sensible expression and "5" + 1 is not. So, I find it hard to argue that 5 and "5" are equivalent values.
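For contrast, here is a small Python sketch of that "equal values support the same operations" expectation. The helper supports_add_one is a hypothetical name made up for this example. In Python the two answers line up: the values are not equal, and they do not support the same operation.

```python
def supports_add_one(value):
    """Hypothetical helper: report whether value + 1 is a valid expression."""
    try:
        value + 1
        return True
    except TypeError:
        return False


a, b = "5", 5
are_equal = a == b
same_operations = supports_add_one(a) == supports_add_one(b)

print(are_equal, same_operations)  # False False
```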

And that is our programming foible for the day.

John Mount