OpenAI Product Presentation Deck

Released by

OpenAI

Category

Technology

Published

November 2018

Transcriptions

#1 OpenAI: Progress towards the OpenAI mission
  Ilya Sutskever, Co-founder and Chief Scientist, OpenAI
  November 9, 2018

#2 OpenAI's mission
  OpenAI's mission is to ensure that artificial general intelligence (AGI), by which we mean highly autonomous systems that outperform humans at most economically valuable work, benefits all of humanity.
  - The OpenAI Charter

#3 Technical progress from OpenAI

#4 OpenAI Five

#5 Dota [image]

#6 Dota is hard
  - Partial observability
  - 120 heroes (we integrated 18)
  - 20,000 actions per game, massive action space
  - Pros dedicate their lives to the game, 10K+ hrs of deliberate practice

#7 Dota is popular
  - Largest professional scene
  - Annual prize pool of $40M+

#8 [Screenshot: OpenAI Five match #2, showing the net worth table and an OpenAI (Bot) player on a killing spree]

#9 Our approach
  - Very large scale reinforcement learning, millennia of practice
  - LSTM policy = honeybee brain
  - Self play
  - Reward shaping

#10 Reinforcement learning (RL) actually works!
  - Nearly all RL experts believed that RL can't solve tasks as hard as Dota
  - Horizon too long

#11 [Chart: OpenAI Five estimated Dota rating (MMR), 2,000 to 8,000, by date (2018), May 6 to Aug. 26. Opponents marked: OpenAI Dev Team, Amateur Team, Semi-Pro Team, Test Team A, Test Team B, Team paiN, Caster Team, Chinese Superstar Team, Blitz + Audience. Annotations: Mirror Necrophos, Lich, Crystal Maiden, Viper, Sniper; Witch Doctor, Gyrocopter, Earthshaker, Tidehunter; Mirror Death Prophet; Picking Composition; Drafting; Single Courier]

#12 Dactyl

#13 Dexterity [video frame]

#14 Diverse objects

#15 Strategy: Sim 2 Real
  [Images: real-world environment, simulation environment]

#16 Domain randomization
  Train in simulation: randomize perception and physics

#17 Transfer to the Real World
  [Diagram labels: CONV, CONV, CONV; Domain randomization; Object Pose; Fingertip Locations; LSTM; Actions]

#18 Curiosity-based exploration

#19 Core idea
  - Novel states = reward
  - Fix all bugs
  - Very hard to do

#20 Montezuma's Revenge
  Our agent trained with RND shows a wide range of capabilities:
  [game screen]

#21 Montezuma's Revenge
  It goes left...
  [game screen]

#22 [Chart: Montezuma's Revenge, mean episodic return (0 to 8K) vs. frames (0 to 1.6B)]

#23 [Chart: Montezuma's Revenge, mean episodic return (0 to 8K) vs. frames (0 to 1.6B)]

#24 [Chart: Progress in Montezuma's Revenge, game score (0 to 10,000) by year (2013 to 2019), with Average Human as a reference level. Methods plotted: SARSA, Linear, DQN, Gorila, DDQN, MP-EB, Duel. DQN, Prior. DQN, DDQN-CTS, DQN-PixelCNN, A3C, A3C-CTS, Pop-Art, Feature-EB, UBE, BASS-hash, ES, C51, IMPALA, Ape-X, Rainbow, RND]

#25 Mario
  [Game screen: World 1-1]

#26 Curiosity early in training
  [Plot: intrinsic reward]

#27 Curiosity late in training
  [Plot: intrinsic reward]

#28 The OpenAI Mission

#29 OpenAI's mission
  OpenAI's mission is to ensure that artificial general intelligence (AGI), by which we mean highly autonomous systems that outperform humans at most economically valuable work, benefits all of humanity.
  - The OpenAI Charter

#30 Impact of AGI
  - Generate massive wealth
    - Potential to end poverty, achieve material abundance

#31 Impact of AGI
  - Generate massive wealth
    - Potential to end poverty, achieve material abundance
  - Generate science and technology
    - cure disease, extend life, superhuman healthcare
    - mitigate global warming, clean the oceans, etc.
    - massively improve education and psychological well-being

#32 Why is OpenAI's mission relevant today?
  - We review progress in the field over the past 6 years
  - Our conclusion: near term AGI should be taken as a serious possibility

#33 Algorithms

#34 Deep Learning at the root of it all
  During the past 6 years, deep learning repeatedly and rapidly broke through "insurmountable" barriers
  [Image: A multilayered Perceptron trainable with backpropagation]

#35 Vision (2012-2016)
  HOG feature (2005), image by Antonio Torralba
  [Image panels: "The image patch" (car), "What the detector sees"]

#36 Vision (2012-2016)
  ImageNet Classification Error (Top 5)
  2011 (XRCE)           26.0
  2012 (AlexNet)        16.4
  2013 (ZF)             11.7
  2014 (VGG)             7.3
  2014 (GoogLeNet)       6.7
  Human                  5.0
  2015 (ResNet)          3.6
  2016 (GoogLeNet-v4)    3.1

#37 Translation (2014-2018)
  [Diagram: encoder-decoder translation of "He loved to eat" into "Er liebte zu essen"; labels: Embed, Encoder, Decoder, Softmax, NULL]
  [Chart: BLEU score on EN to FR translation on the WMT dataset, 2014 to 2018]

#38 Image generation (2014-2018)
  GANs over the years: 2014
  Goodfellow et al, 2014

#39 Image generation (2014-2018)
  GANs over the years: 2015
  Radford et al, 2015

#40 Image generation (2014-2018)
  GANs over the years: 2017
  Karras et al, 2017

#41 Image generation (2014-2018)
  GANs over the years: 2018
  Brock et al, 2018

#42 Reinforcement Learning (2013-2018)
  DQN (2013)
  Mnih et al, 2013

#43 Reinforcement Learning (2013-2018)
  TRPO (2015)
  Schulman et al, 2015

#44 Reinforcement Learning (2013-2018)
  AlphaGo (2016)
  Silver et al, 2016
  [Image: Go board, Lee Sedol vs. AlphaGo]

#45 Reinforcement Learning (2013-2018)
  OpenAI Five (2018)
  Very large scale: +100,000 CPU cores, +1000 GPUs
  [Screenshot: OpenAI Five vs. paiN Gaming, with an "OpenAI Five Predictions" overlay]

#46 Compute

#47 Compute grows rapidly
  - Neural networks can usefully consume all available compute
  - Neural networks are extremely parallelizable
  - 300,000x increase in neural net compute used for the largest neural net experiments over the past 6 years

#48 [Chart: Petaflop/s-days to train (axis 0 to 1900). Systems plotted: Early production convnet (1989), TD-Gammon v2.1 (1993), Speech RNN (1994), TD-Gammon v3.1 (1998), AlexNet (2012), Dropout (2012), Visualizing & Understanding CNNs (2013), DQN (2013), Seq2Seq (2014), GoogleNet (2014), VGG (2014), DeepSpeech2 (2015), ResNets (2015), Neural Architecture Search (2016), Neural Machine Translation (2016), Xception (2016), TI7 Dota 1v1 (2017), AlphaGoZero (2017), AlphaZero (2017), TI8 Dota 5v5 (2018)]

#49 Formidable challenges remain
  - Unsupervised learning
  - Robust classification
  - Reasoning
  - Abstraction
  - ???

#50 We've been breaking through barriers for 6 years
  Will this trend continue, or will it stop? And if so, when?

#51 THIS TALK'S GOAL IS TO PRESENT EVIDENCE THAT:
  While highly uncertain, near-term AGI should be taken as a serious possibility.
  Means proactively thinking about risks:
  - Machines pursuing goals misspecified by their operator
  - Malicious humans subverting deployed systems
  - Out-of-control economy that grows without resulting in improvements to human lives

#52 Thank you
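
Slide 47's figure of a 300,000x increase in compute over 6 years implies a doubling time, which the short calculation below works out. This is only a sketch: it assumes growth was a steady exponential over the whole period and uses the slide's rounded numbers, and the function name `doubling_time_months` is illustrative rather than anything from the talk.

```python
import math


def doubling_time_months(growth_factor: float, years: float) -> float:
    """Months per doubling, assuming steady exponential growth (an assumption)."""
    doublings = math.log2(growth_factor)  # number of times the quantity doubled
    return years * 12 / doublings


growth = 300_000  # growth factor quoted on slide 47
years = 6         # period quoted on slide 47

doublings = math.log2(growth)
months = doubling_time_months(growth, years)
print(f"{doublings:.1f} doublings, one every {months:.1f} months")
```

Under these assumptions the slide's numbers correspond to roughly 18 doublings, or about one doubling every 4 months.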
