Prog Rock Mixture Density Network

I recently learned about Mixture Density Networks (MDNs) and decided to try using one to learn the distribution of two classic prog rock album covers.

The album covers I used were King Crimson's "In The Court of the Crimson King" (painted by Barry Godber, 1969) and Gentle Giant's self-titled debut album (painted by George Underwood, 1970). Both covers are basically just giant faces, which is why I felt they might produce an interesting combination when fed through a Mixture Density Network. Here is an image showing the two album covers and the resulting output:

I used 2 hidden layers with 6 hidden units and a final layer that output the parameters of a mixture of 5 Gaussian distributions. I trained the network for a total of 1000 iterations using a batch size of 25. The network took the X, Y pixel positions of both images as input and output a prediction of the combined RGB values.
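
To make the final layer more concrete, here is a minimal NumPy sketch of how a mixture density output can be split into mixture parameters and scored. This is not my training code, just an illustration under some assumptions: the function names (split_mdn_params, mdn_nll) are made up for this example, the target is assumed to be a 3-dimensional RGB value, each component is assumed to have its own mean and sigma per color channel, and the sigmas are assumed to be produced by exponentiating the raw outputs.

```python
import numpy as np

K = 5  # number of Gaussian components (as in the post)
D = 3  # target dimensions: R, G, B (assumed for this sketch)


def split_mdn_params(raw, k=K, d=D):
    """Split raw outputs of shape (N, k + 2*k*d) into pi, mu, sigma.

    Hypothetical helper: the layout of `raw` is an assumption.
    """
    logits = raw[:, :k]
    mu = raw[:, k:k + k * d].reshape(-1, k, d)
    log_sigma = raw[:, k + k * d:].reshape(-1, k, d)

    # Softmax keeps the mixing weights positive and summing to 1.
    # Subtracting the row max first avoids overflow in exp.
    logits = logits - logits.max(axis=1, keepdims=True)
    exp_logits = np.exp(logits)
    pi = exp_logits / exp_logits.sum(axis=1, keepdims=True)

    # Exponentiating keeps the standard deviations strictly positive.
    sigma = np.exp(log_sigma)
    return pi, mu, sigma


def mdn_nll(pi, mu, sigma, y):
    """Mean negative log-likelihood of targets y (N, d) under the mixture."""
    y = y[:, None, :]  # (N, 1, d), broadcasts against (N, k, d)

    # Log density of a diagonal Gaussian, per dimension, then summed.
    log_norm = (-0.5 * np.log(2.0 * np.pi)
                - np.log(sigma)
                - 0.5 * ((y - mu) / sigma) ** 2)
    log_comp = log_norm.sum(axis=2)  # (N, k)

    # log sum_k pi_k * N(y | mu_k, sigma_k), computed stably.
    log_mix = np.log(pi) + log_comp
    m = log_mix.max(axis=1, keepdims=True)
    log_lik = m[:, 0] + np.log(np.exp(log_mix - m).sum(axis=1))
    return -log_lik.mean()


# Tiny check: all-zero raw outputs give equal weights, zero means, unit sigmas.
raw = np.zeros((4, K + 2 * K * D))
pi, mu, sigma = split_mdn_params(raw)
nll = mdn_nll(pi, mu, sigma, np.zeros((4, D)))
print(pi[0], nll)
```

In a real setup the raw array would come from the last layer of the feedforward network, and the loss returned by mdn_nll is what training would minimize. With all-zero outputs the mixture collapses to a single standard Gaussian, so the loss at the origin is just 1.5 * log(2 * pi).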

Overall, it looks like the first image (King Crimson) dominates the output; however, there are traces of the second image (Gentle Giant), such as the eyebrows and beard.

Here are the TensorBoard visualizations of my means and sigmas during training: