
Neural net image salad again (with code)

Alexander Mordvintsev, Christopher Olah, and Mike Tyka recently posted a great research blog article where they tried to visualize what an image classification neural net “wants to see.” They achieve this by optimizing the input image so that it produces a fixed pattern of internal node activations in the net. This generated truly beautiful and fascinating phantasmagorical images (an “image salad,” by analogy to word salad). It is sort of like a search for eigenfaces (but a lot more fun).

A number of researchers had previously done this (many cited in their references), but the authors added more good ideas:

  • Enforce a “natural image constraint” through insisting on near-pixel correlations.
  • Start the search from another real image. For example: if the net’s internal activation is constrained to recognize buildings and you start the image optimization from a cloud, you can get a cloud with building structures. This is a great way to force interesting pareidolia-like effects.
  • They then “apply the algorithm iteratively on its own outputs and apply some zooming after each iteration.” This gives them wonderful fractal architecture with repeating motifs and beautiful interpolations.
  • Freeze the activation pattern on intermediate layers of the neural network.
  • (not claimed, but plausible given the look of the results) Use the access to the scoring gradient for final image polish (likely cleans up edges and improves resolution).

From Michael Tyka’s Inceptionism gallery

Likely this used a lot of GPU cycles. The question is, can we play with some of the ideas on our own (and on the cheap)? The answer is yes.

I share complete instructions and complete code for a baby (couple-of-evenings) version of related effects.

What we need to optimize images through a neural net scoring function is at least the following:

  • A trained image recognizing neural net. This supplies our objective function. I chose Caffe after seeing it featured in another fun article.
  • Somewhere to run the whole thing. I chose Amazon EC2. I tried to assemble complete instructions for installing Caffe on a fresh EC2 instance.
  • A source of images and image transformations. Instead of modifying images directly (which is likely a bit of work to do effectively for an arbitrary scoring net) I decided to use an evolving image process I already had access to: my 1995 genetic art project. The code is getting a bit creaky, but is available here. This system already had a cross-over combinator for the underlying formulas that generate the images, so we have a ready process we can try to optimize over (through crude evolutionary algorithms). Exact EC2 instructions are in the included file ec2Steps.txt.
  • A “natural image constraint.” I dashed this off quickly by saying an image is “natural” if skimage.restoration.denoise_tv_chambolle doesn’t pull pixels too far away from the original image. What we are fighting is the now well-known issue that deep convolutional neural nets seem to (unfortunately) base a lot of their classification on what humans consider to be visual static (see the references included in the original article), so something as simple as a regularization control should work here.
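The “natural image” check in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the code from the project: the function names `naturalness_penalty` and `penalized_score`, the mean-absolute-difference measure, the `weight` and `strength` values, and the assumption of a 2-D grayscale image with values in [0, 1] are all mine. Only `denoise_tv_chambolle` comes from the text.

```python
import numpy as np


def naturalness_penalty(image, denoise=None, weight=0.1):
    """Mean absolute pixel change caused by denoising (0 means 'already smooth').

    `image` is assumed to be a 2-D grayscale array with values in [0, 1].
    `denoise` is any function image -> image; by default it is scikit-image's
    total-variation denoiser, the routine named in the text.
    """
    image = np.asarray(image, dtype=float)
    if denoise is None:
        # Imported lazily so the function can also be exercised with a stand-in.
        from skimage.restoration import denoise_tv_chambolle

        def denoise(img):
            return denoise_tv_chambolle(img, weight=weight)

    smoothed = np.asarray(denoise(image), dtype=float)
    return float(np.mean(np.abs(smoothed - image)))


def penalized_score(class_probability, image, strength=1.0, denoise=None):
    """Hypothetical combined objective: classifier confidence minus static penalty."""
    return class_probability - strength * naturalness_penalty(image, denoise=denoise)
```

An image full of pixel-level static changes a lot under total-variation denoising and so gets a large penalty, while a smooth image is left almost untouched. How strongly to weigh the penalty against the classifier score is a tuning choice.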

Given this set-up I decided to optimize the genetic art for “crab-like pictures” (as defined by the classification categories from the chosen pre-trained neural net). This is a tip of the hat to Michael Witbrock (one of my collaborators, along with Scott Neal Reilly on the 1995 genetic art project) who inspired us with a (probably apocryphal) story of crabs perhaps naturally selected to have patterns resembling human faces.
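One plausible way to turn a classifier’s per-class output into a single “crab-like” score is to add up the probabilities of every class whose label mentions crabs. The sketch below assumes that approach; the post does not say exactly how the score was computed, and the function name, the keyword match, and the toy labels and probabilities are made up for illustration.

```python
def category_probability(probabilities, labels, keyword="crab"):
    """Sum the class probabilities whose label text contains `keyword`."""
    if len(probabilities) != len(labels):
        raise ValueError("need exactly one label per probability")
    return sum(p for p, label in zip(probabilities, labels) if keyword in label.lower())


# Toy stand-ins for a classifier's softmax output and its label list.
toy_labels = ["rock crab", "fiddler crab", "tabby cat", "jellyfish"]
toy_probabilities = [0.5, 0.3, 0.15, 0.05]
crab_score = category_probability(toy_probabilities, toy_labels)  # about 0.8
```

With a real net, `probabilities` would be the softmax vector for one image and `labels` the matching class names shipped with the pre-trained model.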

Samurai crab, H. japonica and stylized Kabuki samurai face (inset). From: Samurai Crabs: Transmogrified Japanese warriors, the product of artificial selection, or pareidolia?

A quick run yielded an image that the neural net was 99.12% sure was some sort of crab:

Here it is rendered at 256×256 (the net’s concept space):

Artificial “crab” image (rendered 256×256, as this is the net’s concept space).

And re-rendered at a higher resolution (with some anti-aliasing):


The genetic art project really only seems to have so many images in its concept space, but even with a crude evolutionary optimizer over its underlying representation (which is text formulas, not images) it can evolve pictures that fool the image classification net (the adversary seems to have the easy side in adversarial machine learning). With more care (better “natural image” function, richer representation language) we could probably do a lot more.
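For readers who want the shape of a crude evolutionary optimizer, here is a minimal sketch. It is not the genetic art code: genomes here are plain lists of numbers rather than text formulas, and `evolve`, `toy_score`, `toy_crossover`, and `toy_mutate` are hypothetical names with a toy objective standing in for “render the formula, then ask the net how crab-like it is.”

```python
import random


def evolve(population, score, crossover, mutate, generations=50, keep=4, rng=None):
    """Keep the `keep` best genomes each generation and breed the rest from them.

    Returns the best genome, its score, and the list of "new record" events
    as (generation, score, genome) tuples.
    """
    rng = rng or random.Random(0)
    population = list(population)
    best = max(population, key=score)
    best_score = score(best)
    records = [(0, best_score, best)]
    for generation in range(1, generations + 1):
        parents = sorted(population, key=score, reverse=True)[:keep]
        children = []
        while len(children) < len(population) - keep:
            a, b = rng.sample(parents, 2)  # two distinct parents
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children  # parents survive, so the best never gets worse
        top = max(population, key=score)
        if score(top) > best_score:
            best, best_score = top, score(top)
            records.append((generation, best_score, best))
    return best, best_score, records


def toy_score(genome):
    # Toy objective: highest (zero) when every gene equals 0.5.
    return -sum((g - 0.5) ** 2 for g in genome)


def toy_crossover(a, b, rng):
    # Uniform crossover: each gene comes from one parent or the other.
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def toy_mutate(genome, rng):
    # Small Gaussian jitter on every gene.
    return [g + rng.gauss(0.0, 0.05) for g in genome]
```

Swapping `toy_score` for a function that renders a formula and returns the classifier score (minus any “natural image” penalty) gives the general loop described above. The `records` list plays the role of the “new record” images saved during an unattended run.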

Now the “winner” was not a very legible or natural image (so we need a better “natural image” filter, which we could definitely develop). But check out the renders of some images we got on the way to this one.

  • PicR000011
  • PicR000013
  • PicR000021
  • PicR000026

These are images saved as “new record matches” while running the genetic art unattended. Obviously purely artificial images scored against a low-resolution image classifier are not going to have as many realistic features as images built starting from high resolution sources on a high-resolution net (and repeated and re-zoomed). But I think there is something here. The image classification neural net seems to work as a passable “is interesting” function. This is noteworthy because one of the inspirations for my 1995 project was:

Shumeet Baluja, Dean Pomerleau and Todd Jochem, “Simulating User’s Preferences: Towards Automated Artificial Evolution for Computer Generated Images” Technical Report CMU-CS-93-198, Carnegie Mellon University. Pittsburgh, PA. October 1993.

A paper whose goal was to train a neural net to recognize interesting images from a stream of artificial images (trained from previous user decisions).

And for a more generative approach to image synthesis check out Scott Draves’ 1993 Fuse work (warning: some of the image sources were pornographic).



Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.