Deep Dream (Google) – Computerphile

Yeah, we’ve been late to the party. Google Deep Dream has been out for a while, but… it follows on so nicely from our neural network talks. Let’s talk about how it works. Google Deep Dream is a strange computer program that outputs…
kinda psychedelic, trippy images. This is one from Google’s gallery,
and it’s some kind of… strange… I mean…
what is it? We don’t know… Kinda strange… There’s sort of a viaduct here, and that looks like a fountain…
and some grass… Weird, sort of artistic image,
but generated entirely by computer. Now, at the time this came out, most people were having lots of fun
playing with this, and playing with online generators… But no one, you know, really
talked about, deep down, how it worked.
– Kinda looks like a sort of digital Salvador Dalí, doesn’t it?
It does a bit, yeah. And I think we quite instinctively
quite like the idea that computers can do art,
in some way… For what it’s worth, that’s not
quite art yet, I don’t think, but we…
we… you know, it’s a bit of fun. It also has some interesting
implications as to what a neural network is doing underneath. We talked a bit about this in the video where we
looked into a network to see how it classified digits. This is a similar kind of thing
where you can see that the lower level layers
are doing some things and the higher level layers are
doing other things. So let’s try and break down
what it’s doing and then we can see
some funny images I’ve run on myself. Now, Google’s “GoogLeNet” (which is the name for the network that Google released,
I think in… 2012 [it was 2014], as part of the ImageNet competition) is quite complicated, right? They have these modules of
groups of convolutional layers called “Inception modules”, which is a very cool name for something which is… probably not quite as cool as that, but… it’s a cool name. The idea is you go
deeper and deeper into the network and you get
more and more powerful in classification out
of it, ok? But, at its core,
it’s still a classification network. So it’s saying:
“What is this a picture of?” “It’s a picture of a cat.”
Right? “A hundred percent confident:
picture of a cat, definitely…” “Oh, no, it could be a dog.”
Right? So… What I’m gonna do is:
I’m gonna draw a network. But I’m gonna draw my
sort of standard… multi-layer network (but it’s got nothing
to do with convolutions and nothing to do with
GoogLeNet) because it’s easier to visualize. So I’m over-simplifying it,
but on the other hand… …the same things apply to this
small network I’m going to draw, as to the big network. So, remember that we have
some input neurons here and then we have some intermediate neurons and then, finally, our output neurons. Now, if people’ve been back to
the videos we did on this, this neuron here calculates
a weighted sum of all these neurons, and this one calculates
a weighted sum of all these neurons. So, if we were going back to our house analogy, all right? This one here could be number of windows, this could be square footage, this could be if it’s got a pool or not, right? And this is taking some combinations of those things and trying to start to work towards
the price of the house. And this one takes some combination of those combinations and starts working towards the price of the house. Now, when we talked about
convolutional networks, these neurons were replaced with
image convolutions, like Sobel edge detectors,
and other things, right? Where the actual convolutions themselves have been learned. So, the early layers are going to be finding
lines and corners and things like these. Later on we are gonna start to find
objects, boxes, circles, things that have multiple lines
and corners as part of them. And finally, at the top, we start to move
towards the actual objects we are trying to classify. Cats and dogs and bikes. And then finally we get an output
that lights up if it’s a cat, right? That’s the key. I mentioned this very briefly in a video, and I’m gonna mention it here again,
very briefly, because backpropagation is not
for a Computerphile video; there’s a lot of detailed analysis of
backpropagation online for people who are interested in it, right? It’s fairly mathematically complex. It talks a lot about partial derivatives, and multivariable calculus, and things like this. We won’t be doing any of that in this video, so please don’t turn off…, right? But the theory is that if we put an image in at this point, we can calculate these weighted sums, and we can propagate it through
and get a value out that says how much of a cat is in this image, right? That’s essentially what we are doing. When we actually want to train this network to do something, what we do is: we know we are looking for a cat, so we try and change these weights to better predict it. So we have something called
a “cost function” here, C, and what we’re trying to do is work out
how we affect C by changing this particular weight here… So, some relationship between
the weights and the cost function. Now… when we train a neural network, what we do is we try and minimize this cost function. So the cost function might be something
like prediction error, or Euclidean distance, or some sort of softmax, right? But the point is that this gives us a value
for how good our guess is, and then we alter all our weights going backwards to say: let’s change these weights a bit so that that error goes down and we get a little bit closer with our prediction, right? So we go forward to get our prediction, we calculate the error, and we propagate the error backwards. So, that’s all the background you’re gonna need to know how Google Deep Dream works.
What Google Deep Dream does is… forget the cost function completely. We’ve already trained the network. What we want to do is maximize these values here, or these values here. So… Think about it…
if this is a picture of a cat, right? So, I’m putting in a picture of a cat here. Then, what’s gonna happen is it’s gonna
calculate the weighted sums, and then one of the cat neurons is gonna light up.
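As a rough sketch of the training step described a moment ago (this is not code from the video): a toy network in plain Python whose neurons are weighted sums, a cost C, and a loop that nudges every weight in whichever direction makes C go down. The layer sizes, the data, the learning rate and all the function names are made up for illustration. The slope of C with respect to each weight is estimated here by simply nudging the weight and re-measuring C, which is a slow stand-in for backpropagation rather than backpropagation itself.

```python
import math
import random

N_IN, N_HID = 3, 2  # assumed toy sizes: 3 inputs, 2 intermediate neurons, 1 output


def forward(x, weights):
    """Forward pass: every neuron computes a weighted sum of the layer before it."""
    hidden_w = [weights[i * N_IN:(i + 1) * N_IN] for i in range(N_HID)]
    out_w = weights[N_HID * N_IN:]
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in hidden_w]
    return sum(w * h for w, h in zip(out_w, hidden))


def cost(weights, data):
    """C: mean squared difference between the prediction and the known answer."""
    return sum((forward(x, weights) - target) ** 2 for x, target in data) / len(data)


def slope_of_cost(weights, data, eps=1e-5):
    """How C changes when each weight is nudged (central finite differences)."""
    slopes = []
    for i in range(len(weights)):
        up = weights[:i] + [weights[i] + eps] + weights[i + 1:]
        down = weights[:i] + [weights[i] - eps] + weights[i + 1:]
        slopes.append((cost(up, data) - cost(down, data)) / (2 * eps))
    return slopes


random.seed(0)
weights = [random.uniform(-0.5, 0.5) for _ in range(N_HID * N_IN + N_HID)]
# Made-up examples: (input features, known answer), e.g. 1.0 = "cat", 0.0 = "not a cat"
data = [([1.0, 0.0, 0.5], 1.0), ([0.0, 1.0, 0.2], 0.0), ([1.0, 1.0, 0.0], 1.0)]

cost_before = cost(weights, data)
for _ in range(300):
    slopes = slope_of_cost(weights, data)
    # Move each weight a little way downhill so the error shrinks
    weights = [w - 0.05 * s for w, s in zip(weights, slopes)]
cost_after = cost(weights, data)

print(cost_before, cost_after)
```

The point to hold on to is that training changes the weights and leaves the inputs alone; Deep Dream does the opposite.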
But also, if you think about this layer, if we are working back through it, it might be because this one lit up, which is maybe “there’s ears”, and this one lit up, which is maybe “paws”. And maybe this one lit up because there was “a few lines in a row”, and this one is sort of “furry texture”,
or something, you know. And we’re getting lower and lower level
as we go through, and in the end it’s because this one lit up, which is “edges”, and this one lit up, which is “corners in a certain place”. The values in here influence the values
here, and here, and here, and here… and they end up converging on our thing. So, what we want to do to make
our Google Deep Dream images is change the image to make these bigger, right? So this is the amount of “ear” in our image. If we can just make that as big as possible,
we can say “more ears please!”… – “There’s a bit of ears going on there…”
“Well I want more.” “I want more ears, I want more paws,
I want more bits of cat.” So instead of minimizing this cost function, we’re maximizing the sum of these,
or the squared sum of these. Let’s not do any more maths, right? Let’s look at some pictures. I have my landscape image. Now, if it’s looking very boring to you, it’s because I haven’t passed this through
Google Deep Dream yet, right? But what I’ll do is:
I’ll pass this input into Google Deep Dream, and for every area in the image
it starts to light up some of these neurons. Because maybe, although this isn’t a picture of a cat, maybe there are some kind of “catty” features in it. You know, like the edge of a leaf might
be kind of the same shape as an ear, or this texture, this grass,
kind of the same texture as fur. So, some of the same neurons are gonna light up, right? So what we do is we try
to make those bigger by altering our input image, ok? So just like we were trying to…
train our network to get better, we train our image to get better, to be more of these features. Now, of course… this network is trained on lots of things
other than cats, so anything that looks at all plausible, it’s gonna try and maximize that effect. So, this is a picture I ran through it,
here, ok? We’ve done some strange things. In the sky here we’ve got what kinda looks
like buildings appearing. And then down here we’ve got
some animals appearing. And s… I don’t know what that is! Some sort of… Dalek. And then this weird animal here, if we zoom in on that… I mean… it’s anybody’s guess
what kind of animal that is. But this is what’s so cool about Google Deep Dream: you don’t know what you’re going to get, and it’s going to depend on your input image. So… you know… The features that it found in the input image… “Oh, that feature looks a bit like a bunch of lines,
which in turn look a bit like… the edge of a cat’s head… …make it look more like that.” And if you keep doing this process, it starts to converge on weird animals that have interesting features.
– So is that multiple iterations?
Yes! I think it does about… 40 iterations by default. So it… tweaks the image
40 times. In actual fact, Google Deep Dream
does it at different scales, as well. But we’ve… I sort of gloss over that
because it’s not hugely important, but it runs a small version of the image first, makes it a bit bigger and runs it again,
makes it a bit bigger and runs it again. So you could then take this image,
and put it back into the front and make it more: “I want more
of these weird shapes and weird animals”. So I take this image and I put it in, and I get something that’s really weird. So, it’s the same, but just more of it. It bears no resemblance to it,
I mean, there’s a tiny bit of tree left here… But this… it bears very little resemblance
to our original image, apart from this generic area of ground and the sky… But on the other hand, we’ve got all kinds of… there’s a weird car appearing here, and some actual full-on buildings
starting to appear. Because, later on in this network, some of these neurons are going to be representing building shapes. And so it’s trying to make it “more building shape”. What’s this for? Right? Why are we doing this? I mean… that’s a question you’ve got to ask Google,
cause I’m not entirely sure… But no, it’s… There are two things… It’s fun, right? so…
Mostly it’s fun. Most people aren’t interested in what the neural
network is doing underneath, they like cool, trippy images. One of the problems with neural networks is
they are a black box. So we… design them with an architecture and then we run them,
and they get… I don’t know… 80% accuracy on some task, and that’s very good, and then we say no more about it. We now have a program that can
classify these things at 80% accuracy. In many ways we don’t really care about how it did it (if it does it). But, if we want to improve these things
beyond the 80%, and beyond 90%,
and get better and better… it’s a good idea to try and understand what’s going on underneath. So there are some papers out there (Google, working on it, have published papers as well) that are trying to understand what it is that the lowest
layers of the network are doing and the highest layers of the network are doing for different tasks. Intuitively, the lowest levels are edges and things… and, as we go up, hierarchical groups of these things, so… buildings and so on. One thing you could do… is you could… instead of maximizing this layer, which represents very high level objects, we can maximize one of the layers down here, which represents edges and things. So here’s another picture from Google Deep Dream where I’ve maximized
a lower layer. So you can see that instead of
starting to form objects, it’s now just starting to form patterns of lines and textures. And that’s because that’s the only thing that’s described at this lower level of the network.
– So now we are on Van Gogh.
Yeah, right?
So yeah… impressionism, huh? This is much better than I can paint, as well. The idea is the lower levels
of a network are doing things like this….
And the higher layers of a network
are looking for more complex objects. That’s basically what a neural network does. This network has been trained on
somewhere around defined and closed objects, so cats, dogs, bikes, people, buildings and so on. The network that I showed at the beginning is trained only on buildings, which is why many of the things
that have been generated in it look like buildings. Often… some of the objects you see start to look very similar. So you’ve got a building here that looks like a building, and that one that looks kinda similar, with this spike on. And that’s because the network’s been
trained on certain objects, and these objects get a good response, and then it maximizes those things. So the question was then: what if I want to generate an image
that makes it look more like a cat? Specifically a cat, rather than just cats and dogs and buildings
and bikes, and all these different things. So what we do is we put a cat image into it, into the network, and we find out which of these light up for a cat, specifically a cat, right? And then we… instead of maximizing all of them, we maximize only those ones, ok? So we’re basically saying “now… more of it, please… but more of only the specific
interesting cat ones.” I chose cats because people on the
Internet have a lot of pictures of cats. They’re very easy to obtain… So I put in a picture.
Here’s some pictures, some cats I put in. So, when I put this into the… into the neural network, it’s going to classify this as a cat, or multiple cats. And it’s going to do that by finding combinations
of features that look like cats. So if I pin down the learning to do this, I can start to make my image look more like a cat. So you can see that some eyes have appeared, there’s a kinda… nose here. That looks… let’s face it,
it’s not really a cat, but it’s more a cat than the landscape was. It’s a pretty weird image,
all things considered. And this is at a high level. If we do the same thing for a lower layer, we can’t get all that hierarchical sort of ears plus eyes plus nose. We can only get the low level things. So we can, say, do this one, which is almost entirely fur and eyes. Right? So… you can see that the clouds kinda look a bit like fur, so it’s made them look more like that, and the eyes… all down here. So, this is a different kind of image that we’ve produced, by trying to make it look a bit like a cat, but only at a low level, so, you know, what are the low level
features that make a cat. Ok? How strange! And finally, you can do it to people, so I put in a picture with some people’s faces. And… out we get this incredibly weird… picture of sort of weird… harpy baby things… that kinda…
gives me nightmares. You could argue, in some ways
we’re gaining an intuition about what the lower levels do,
what the higher levels do. Predominantly, it’s just for fun. There are other papers that do… an outputting of network layers and trying
to work out what it is that each layer is doing but in this way it just…
generates cool images. So, if you want to use Google Deep Dream, you just need an input image and maybe a reference image, like a cat, to target it towards, but really you don’t need much more than that. And then you can just get going on it.
– Is there a website or something?
So… no, so, actually, it’s, it’s Python code which goes into Caffe, which is a deep learning library. So that’s how I generated these images. People have obviously put a website frontend on this; it’s very easy to find websites that do the same thing. But in actual fact, they would just
be running the source code behind the scenes. If you look for the source code,
it’s actually not very long, because this process of backpropagation is already
coded up in these libraries. So what we need to do is tell it… instead of maximi… minimizing this C, we wanna maximize the value of these things. And you just change a few numbers around, and send it backwards through the network. It’s, you know, not… It sounds complicated, but it really isn’t that complicated once you actually look at the code. (other video reproducing)
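The idea that closes the video (leave the trained weights alone, and instead push the image in whatever direction makes the chosen neurons respond more) can be sketched in a few lines. This is not the actual Caffe-based code. It is a toy, single-layer "network" in plain Python with made-up fixed weights and a four-pixel "image", and the names `activations`, `lit_by` and `dream` are invented for this illustration. It maximizes the plain sum of the selected activations; the default of 40 steps echoes the iteration count mentioned earlier, and the step size is an arbitrary small value.

```python
import math

# Fixed, already-"trained" weights (made up): 3 neurons, each looking at 4 pixels
WEIGHTS = [
    [0.5, -0.3, 0.8, 0.1],
    [-0.6, 0.9, 0.2, -0.4],
    [0.3, 0.3, -0.7, 0.5],
]


def activations(image):
    """How strongly each neuron lights up: a squashed weighted sum of the pixels."""
    return [math.tanh(sum(w * p for w, p in zip(row, image))) for row in WEIGHTS]


def objective(image, selected):
    """The thing to maximize: the sum of the selected neurons' activations."""
    acts = activations(image)
    return sum(acts[j] for j in selected)


def lit_by(reference, threshold=0.5):
    """Indices of the neurons that respond strongly to a reference image."""
    return [j for j, a in enumerate(activations(reference)) if a > threshold]


def dream(image, selected=None, steps=40, step_size=0.1):
    """Gradient ascent on the pixels; the weights are never changed."""
    if selected is None:
        selected = range(len(WEIGHTS))  # "more of everything"
    image = list(image)  # work on a copy so the input is left untouched
    for _ in range(steps):
        acts = activations(image)
        grad = [0.0] * len(image)
        for j in selected:
            # d(a_j)/d(pixel_i) = (1 - a_j^2) * w_ji, sent backwards to the pixels
            back = 1.0 - acts[j] ** 2
            for i, w in enumerate(WEIGHTS[j]):
                grad[i] += back * w
        image = [p + step_size * g for p, g in zip(image, grad)]
    return image


landscape = [0.2, 0.1, -0.1, 0.3]      # made-up input image
everything = range(len(WEIGHTS))
dreamed = dream(landscape)

cat = [1.0, 1.0, 1.0, -1.0]            # made-up reference image
cat_neurons = lit_by(cat)              # which neurons light up for the "cat"
cat_dreamed = dream(landscape, selected=cat_neurons)

print(objective(landscape, everything), objective(dreamed, everything))
print(cat_neurons, objective(landscape, cat_neurons), objective(cat_dreamed, cat_neurons))
```

Passing `selected=cat_neurons` is the "more of only the cat ones" variant: the reference image only decides which neurons count, and the landscape is still the thing being altered.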

