
Google DeepMind’s Deep Q-Learning & Superhuman Atari Gameplays | Two Minute Papers #27

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. This one is going to be huge, certainly one of my favorites. This work combines several techniques that we have talked about earlier. If you don't know some of these terms, that's perfectly okay: you can remedy this by clicking on the popups or checking the description box, but you'll get the idea even from watching only this episode.

So, first, we have a convolutional neural network, which helps process images and understand what is depicted in them. Then we have a reinforcement learning algorithm, which helps create strategies; or, to be more exact, it decides what our next action should be, which buttons we push on the joystick. This technique mixes these two concepts together, and we call it Deep Q-learning. It is able to learn to play games the same way a human would: it is not exposed to any additional information in the code; all it sees is the screen and the current score.
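To make this more concrete, here is a minimal sketch of the core Deep Q-learning pieces in PyTorch. This is not DeepMind's code; the input shape, layer sizes, and function names below are illustrative assumptions. The structure, though, follows the idea above: a convolutional network maps raw screen pixels to one Q-value per joystick action, an epsilon-greedy rule picks actions, and a temporal-difference loss nudges Q(s, a) toward the observed reward plus the discounted value of the next screen.

```python
# Minimal Deep Q-learning sketch (illustrative assumptions, not DeepMind's code).
# Assumes 84x84 grayscale screens, stacked 4 frames deep.
import random
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Convolutional network: raw pixels in, one Q-value per joystick action out."""
    def __init__(self, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),  # Q(s, a) for every possible action
        )

    def forward(self, x):
        return self.net(x)

def select_action(q_net, state, epsilon, n_actions):
    """Epsilon-greedy: usually take the best-looking action, sometimes explore."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return q_net(state.unsqueeze(0)).argmax(dim=1).item()

def td_loss(q_net, target_net, batch, gamma=0.99):
    """One-step TD loss: pull Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    states, actions, rewards, next_states, dones = batch
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * (1.0 - dones) * next_q
    return nn.functional.smooth_l1_loss(q_values, targets)
```

In the full method, the training batch is sampled from an experience replay memory, and target_net is a periodically updated copy of the Q-network; both tricks stabilize learning.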
When it starts learning to play an old game, Atari Breakout, the algorithm at first loses all of its lives without any sign of intelligent action. If we wait a bit, it becomes better at playing the game, roughly matching the skill level of an adept player. But here's the catch: if we wait even longer, we get something absolutely spectacular. It finds out that the best way to win the game is to dig a tunnel through the bricks and hit them from behind. I really didn't know this, and this is an incredible moment: I can use my computer, this box next to me, to create new knowledge, to find out new things I hadn't known before. This is completely absurd; science fiction is not the future, it is already here.

It also plays many other games. The percentages show the game scores relative to a human player: above 70% means it's great, and above 100% it's superhuman.

As a follow-up work, scientists at DeepMind started experimenting with 3D games, and after a few days of training, it could learn to drive on ideal racing lines and pass others with ease. I've had a driving license for a while now, but I still don't always get the ideal racing lines right. Bravo.

I have heard the complaint that this is not real intelligence because it doesn't know the concept of a ball or what exactly it is doing. Edsger Dijkstra once said, "The question of whether machines can think... is about as relevant as the question of whether submarines can swim." Beyond the fact that rigorously defining intelligence leans more into the domain of philosophy than science, I'd like to add that I am perfectly happy with effective algorithms. We use these techniques to accomplish different tasks, and they are really good problem solvers. In the Breakout game, you, as a person, learn the concept of a ball in order to use this knowledge as machinery to perform better. If this is not the case, then whoever knows a lot but can't use it to achieve anything useful is not an intelligent being, but an encyclopedia.

What about the future? There are two major unexplored directions: the algorithm doesn't have long-term memory, and even if it had, it wouldn't be able to generalize its knowledge to other, similar tasks. Super exciting directions for future work. Thanks for watching and for your generous support, and I'll see you next time!


20 thoughts on "Google DeepMind's Deep Q-Learning & Superhuman Atari Gameplays | Two Minute Papers #27"

  1. "Whoever knows allot, but can't use it to achieve anything useful, is not an intelligent being but an encyclopaedia."
    — Károly Zsolnai-Fehér

  2. Just imagine using this for the stock market: have it research every event in government, business, all that. This could make so much money in the stock market.

  3. You said in the video that it knows the score. How is that: is it being fed the score, or is it reading it off the screen? If the latter, it probably has no idea what the score is, only that it's changing.

  4. There are actually some other implicit priors (of knowledge) given to the network. When all the possible actions are pre-specified in the last layer of the network, this sets up the problem to be more readily solved. What if we gave the network a great deal more outputs it could generate (corresponding to different key presses) and it had to learn which keys are the right ones? This would, in some sense, make this closer to the way humans learn to play these games, no? Though, with the action space thus increased considerably and irrelevant actions added (ones that don't do anything), I wonder how well DQN would learn.

  5. I wonder, if it started to play Overwatch now, what its main pick would be 6 months from now.

    I'm guessing Sombra

  6. Are different CNN feature-extraction layers needed for different games? How would one CNN know what a positive label from one game corresponds to in another?

  7. It is incredible that this video is 2 years old, and now we have an AI which beats DOTA2 players at the highest competitive level.

  8. Hi! Great video! I see you use the Standard YouTube License instead of Creative Commons. Would you authorize me to show this video in a non-profit presentation about Machine Learning that I will give at work for other employees? Of course, proper attribution would be given. Thanks!
