Artificial intelligence, like machine learning before it, is making big money off what I call the “sell ∀ ∃ as ∃ ∀ scam.”
The scam works as follows:
- Build a system that solves problems, but with an important user-facing control. For AI systems like GPT-X this is “prompt engineering.” For machine learning it is commonly hyper-parameters.
- Convince the user that it is their job to find an instantiation or setting of this control to make the system work for their tasks. Soften this by implying there is a setting of the control that works for all of their problems, so finding that setting is worth the trouble. This is the “∃ ∀” claim: that there exists (∃) a setting or configuration that makes the system work for all (∀) of your examples.
- In practice, just make the setting or control complicated enough to provide memorization or over-fitting. That is: exploit the fact that for sufficiently rich systems it is relatively easy to provide a “∀ ∃” system: one where for every (∀) task, there exists (∃) a setting that gives the correct answer for that one task. It is just that there is no one setting useful for all tasks. This can devolve into something cryptomorphic to “the system can copy the answer from its input to its output.”
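The ∀∃ trick can be made concrete with a toy sketch. Here the “system” and its user-facing “setting” are hypothetical names of my own; the setting is rich enough to smuggle the answer straight through, which is exactly the degenerate copy-input-to-output case described above.

```python
# Toy "system" whose user-facing control is rich enough to carry the answer.
# All names here are illustrative, not from any real AI/ML library.

def system(task_input, setting):
    # A degenerate "model": the setting simply IS the output,
    # i.e. the answer is copied from the control to the output.
    return setting

tasks = {"2+2": 4, "3*3": 9, "10-7": 3}

# The forall-exists (∀ ∃) direction is easy: for every task there exists
# a setting (here, the answer itself) under which the system "works".
per_task_settings = {t: answer for t, answer in tasks.items()}
assert all(system(t, per_task_settings[t]) == a for t, a in tasks.items())

# The exists-forall (∃ ∀) claim fails: no single setting solves every task.
assert not any(
    all(system(t, s) == a for t, a in tasks.items())
    for s in set(tasks.values())
)
```

The point of the sketch: per-task success proves very little when the control is expressive enough to memorize each answer separately.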
Hiding the right answer in variations. Stephen Potter, Three-upmanship, Holt, Rinehart and Winston, 1962.
In my opinion one can see this scam, hiding some debt in with an asset, spreading.
The earliest modeling systems, such as linear regression, had no hyper-parameters. An under-specified algorithm was not considered a fully specified method.
Machine learning systems, such as random forest, have hyper-parameters. However, the authors specified how to set them in the paper and supplied an implementation that set the hyper-parameters. Hyper-parameters were seen as embarrassing technical debt, to be solved before releasing the system into the wild.
Then we get to boosting, neural nets, large language models, and others. These are very useful methods. But, as they move hyper-parameter selection fully to the users, they become unfalsifiable. If they didn’t work, it is because you didn’t pick the right hyper-parameters or training procedures. These are among my favorite methods, but they do have a bit of a fortress-like “no true Scotsman” defense built in. And, if you are only rewarding whatever is first to appear correct, you will (by selection bias) end up only rewarding scams.
To conclude: one must have different standards for developing systems than for testing, deploying, or using systems. Or: testing on your training data is a common way to cheat, but so is training on your test data.
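The “training on your test data” cheat can be demonstrated in a few lines. The setup below is a hypothetical illustration of my own (not from the post): the labels are pure noise, and the “model” has no real signal; its predictions depend only on a hyper-parameter used as a random seed. Tuning that hyper-parameter on the test set still makes the useless model look predictive on that same test set.

```python
# Sketch: selecting a hyper-parameter on the test set inflates apparent
# accuracy even when there is no signal at all. Hypothetical illustration.
import random

rng = random.Random(0)
n = 100
test_labels = [rng.randint(0, 1) for _ in range(n)]   # pure noise
fresh_labels = [rng.randint(0, 1) for _ in range(n)]  # a genuinely new test set

def predictions(hyper_parameter):
    # A "model" with no real signal: its outputs depend only on the
    # hyper-parameter, which here is just a random seed.
    r = random.Random(hyper_parameter)
    return [r.randint(0, 1) for _ in range(n)]

def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# The cheat: "tune" the hyper-parameter against the test set itself.
best = max(range(1000), key=lambda h: accuracy(predictions(h), test_labels))

cheating_score = accuracy(predictions(best), test_labels)  # looks well above chance
honest_score = accuracy(predictions(best), fresh_labels)   # back near chance
```

On noise labels, chance accuracy is 0.5; searching 1000 seeds against the test set typically finds one scoring noticeably higher, while a fresh held-out set exposes the model as worthless. This is why development, testing, and deployment need separate standards and separate data.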
Picayune: word missing, it seems: “as they the hyper-parameter selection fully to the users”
Thanks! Should be fixed now (blunder edit on my part).
Also “is is”
With deep neural networks the magic is that adding more hyperparameters doesn’t make them overfit, as opposed to all previous models.
This is the statisticians’ fallacy: “adding more hyperparameters will lead to overfitting.”
Deep learning doesn’t overfit as quickly as the old-school wide and shallow nets, but with fixed data, increasing model complexity will result in overfitting, eventually.