OpenAI Product Presentation Deck

Creator: OpenAI
Category: Technology
Published: September 2018

Transcription
#1 OpenAI: our goal
- Build safe AGI
- Make it beneficial, and make its benefits widely distributed

#2 Components of AGI
- Do hard things in simulation
- Transfer skills from the simulation to the real world
- Learn world models
- Safety and deployment

#3 Components of AGI
(repeat of slide #2)

#4 Our recent results
- OpenAI Five
- Dactyl
- Unsupervised language understanding

#5 OpenAI Five

#6 (Dota 2 gameplay screenshot)

#7 Dota is popular
- Largest professional scene
- Annual prize pool of $40M+

#8 Dota is popular
(photo)

#9 Dota is hard
- Strategy, tactics
- Partial observability
- Games are long
- 120 heroes, surprising interactions
- 20,000 actions per game, massive action space
- Pros dedicate their lives to the game: 10K+ hrs of deliberate practice

#10 Our approach
- Very large scale reinforcement learning: millennia of practice
- LSTM policy ≈ honeybee brain
- Self-play
- Reward shaping

#11 Reinforcement learning (RL) actually works!
- Nearly all RL experts believed that RL can't solve tasks as hard as Dota
- Horizon too long

#12 Reinforcement learning (RL) actually works
- Pure RL had been applied only to simple games and simple simulated robotics

#13 Skepticism about RL
(chart: "HalfCheetah-v1 (TRPO, Different Random Seeds)", average return over timesteps for two groups of five runs with different random seeds; from Henderson et al., 2017)

#14 RL: the basics
- Add noise to actions
- Use reward to tell if the action was good
(a code sketch follows after slide #22)

#15 The key: actor-critic
(diagram: noise is added to the actor's actions; a value network scores how good the resulting rewards are)

#16 LSTM policy
- A recurrent neural network that's easy to train
(diagram of an unrolled LSTM)

#17 LSTM policy
(diagram of an unrolled LSTM)

#18 Self-play
- 80% of the time: play against self
- 20% of the time: play against past versions
(sketched in code after slide #22)

#19 Cool facts
- 100k+ CPU cores
- 1k+ GPUs
- RL time horizon of 5 minutes
- gamma = .9997
- Games last for 20,000 moves
(the horizon arithmetic is worked through after slide #22)

#20 Skepticism about RL
(repeat of slide #13)

#21 LSTM policy
(repeat of slide #16)

#22 Team spirit
- At first, each LSTM greedily maximized its own reward
- Over time, the reward of each LSTM was made equal to the reward of the team
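Slides #14-#15 compress the whole method into two ideas: perturb actions with noise, and use the reward, relative to a learned value baseline (the critic), to decide whether the perturbed action was good. Below is a minimal sketch of that loop in PyTorch; the layer sizes, learning rate, and stand-in data are illustrative assumptions, not OpenAI Five's actual configuration (which used PPO with LSTM policies at enormous scale).

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, obs_dim=4, act_dim=2):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())
        self.mu = nn.Linear(64, act_dim)                    # mean action
        self.value = nn.Linear(64, 1)                       # critic: expected return
        self.log_std = nn.Parameter(torch.zeros(act_dim))   # exploration noise scale

    def forward(self, obs):
        h = self.body(obs)
        dist = torch.distributions.Normal(self.mu(h), self.log_std.exp())
        return dist, self.value(h).squeeze(-1)

def update(model, opt, obs, actions, returns):
    """Actions that beat the critic's estimate become more likely;
    the critic regresses toward the observed returns."""
    dist, value = model(obs)
    advantage = returns - value.detach()          # "was the action good?"
    logp = dist.log_prob(actions).sum(-1)
    loss = -(logp * advantage).mean() + (returns - value).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

model = ActorCritic()
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
obs = torch.randn(32, 4)       # stand-in for observations from a simulator
dist, _ = model(obs)
actions = dist.sample()        # noise added to actions here
returns = torch.randn(32)      # stand-in for shaped rewards
update(model, opt, obs, actions, returns)
```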
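The 80/20 self-play mix on slide #18 is easy to state precisely. A sketch; the class and method names here are hypothetical, not from the deck:

```python
import random

class OpponentPool:
    """80% of games against the current policy, 20% against a past snapshot,
    so the agent keeps beating its older selves rather than forgetting them."""
    def __init__(self):
        self.past_versions = []            # frozen copies of earlier policies

    def snapshot(self, policy_params):
        self.past_versions.append(policy_params)

    def sample_opponent(self, current_params):
        if not self.past_versions or random.random() < 0.8:
            return current_params                  # play against self
        return random.choice(self.past_versions)   # play against a past version
```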
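Slide #19's gamma = .9997 and "RL time horizon of 5 minutes" are consistent if the horizon is read as the discount's half-life. A quick check; the moves-per-second figure is an assumption derived from the deck's "20,000 moves" per game and a typical match length of roughly 45 minutes, not something the slides state:

```python
import math

gamma = 0.9997
# Discount half-life: number of steps until a future reward counts half as much.
half_life_steps = math.log(0.5) / math.log(gamma)    # ~2,310 steps
# Assumption: ~20,000 moves spread over a ~45-minute game.
moves_per_second = 20_000 / (45 * 60)                # ~7.4 moves/s
print(half_life_steps / moves_per_second / 60)       # ~5.2 minutes
```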
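Slide #22's annealing from selfish to shared reward can be written as a single blend. A sketch under the assumption that the interpolation is linear; the parameter name team_spirit is illustrative:

```python
import numpy as np

def blend_rewards(own_rewards, team_spirit):
    """team_spirit = 0: each agent keeps its own reward (greedy).
    team_spirit = 1: every agent receives the team-average reward."""
    own_rewards = np.asarray(own_rewards, dtype=float)
    return (1 - team_spirit) * own_rewards + team_spirit * own_rewards.mean()

print(blend_rewards([3.0, 0.0, 1.0, 0.0, 1.0], team_spirit=0.0))  # selfish
print(blend_rewards([3.0, 0.0, 1.0, 0.0, 1.0], team_spirit=1.0))  # fully shared
```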
#23 Results
(chart: "OpenAI Five - Estimated Dota Rating (MMR)", rising from roughly 1,000 to over 7,000 between May and August 2018, with wins over an open dev team (May), an amateur team (June), a semi-pro team (July), and a test team (August), each annotated with the hero lineup used)

#24 Remaining tasks
- Beat the strongest teams

#25 Dactyl: learning dexterous manipulation

#26 Diverse objects

#27 Domain randomization
(a code sketch follows after slide #34)

#28 Train in simulation
- Domain randomization: distributed workers collect experience on randomized environments at large scale
- We train a control policy using reinforcement learning. It chooses the next action based on fingertip positions and the object pose.
- We train a convolutional neural network to predict the object pose given three simulated camera images.
(diagram: robot states and actions flow between the randomized simulator and the policy; simulated cameras feed the pose-prediction network)

#29 Transfer to the real world
- We combine the pose estimation network and the control policy to transfer to the real world.
(diagram: real camera images yield the object pose; object pose and fingertip locations yield actions)

#30 OpenAI Rapid

#31 OpenAI Rapid
(diagram: a shared training stack of cloud APIs, PPO, LSTM policies, and reward plumbing; Dactyl plugs in through a robot API, randomizations, and a vision-to-state network, while OpenAI Five plugs in through team spirit)

#32 Vision-based system

#33 Vision architecture
(diagram: each of the three camera images passes through conv layers, a ResNet stack, spatial softmax, and pooling; the per-camera features are concatenated, and fully connected layers output the object position and rotation)

#34 Policy architecture
(diagram: noisy observations (fingertip positions and object pose) and the goal are normalized, passed through a fully connected ReLU layer into an LSTM, which outputs the action distribution over finger joint positions)
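Slides #27-#28 hinge on resampling the simulator's physics every episode, so that robust behavior, not quirks of any single simulated world, is what gets rewarded. A minimal sketch; the parameter names and ranges are invented for illustration:

```python
import random
from dataclasses import dataclass

@dataclass
class PhysicsParams:
    friction: float
    cube_mass: float        # kg
    actuator_delay: float   # seconds

def randomized_params():
    """Sample a fresh 'world' so the policy cannot overfit one simulator."""
    return PhysicsParams(
        friction=random.uniform(0.5, 1.5),
        cube_mass=random.uniform(0.03, 0.30),
        actuator_delay=random.uniform(0.0, 0.03),
    )

# Each distributed worker resets its simulator with new parameters per episode:
for episode in range(3):
    print(f"episode {episode}: {randomized_params()}")
```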
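The two diagrams on slides #33-#34 translate naturally into two small modules: a per-camera convolutional pose estimator and a recurrent control policy that consumes the pose. A sketch in PyTorch; every layer size and dimension here is a guess, not Dactyl's published configuration:

```python
import torch
import torch.nn as nn

class PoseEstimator(nn.Module):
    """Slide #33, roughly: per-camera conv features, concatenated,
    then heads for object position and rotation."""
    def __init__(self, n_cameras=3):
        super().__init__()
        self.per_camera = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.position = nn.Linear(32 * n_cameras, 3)   # xyz
        self.rotation = nn.Linear(32 * n_cameras, 4)   # quaternion

    def forward(self, images):  # images: (batch, n_cameras, 3, H, W)
        feats = [self.per_camera(images[:, i]) for i in range(images.shape[1])]
        h = torch.cat(feats, dim=-1)
        return self.position(h), self.rotation(h)

class DactylPolicy(nn.Module):
    """Slide #34, roughly: normalize noisy observations plus goal, fully
    connected ReLU, LSTM, then the mean action over finger joint targets."""
    def __init__(self, obs_dim=24, act_dim=20, hidden=256):
        super().__init__()
        self.norm = nn.LayerNorm(obs_dim)
        self.fc = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.action_mean = nn.Linear(hidden, act_dim)

    def forward(self, obs_seq, state=None):  # obs_seq: (batch, time, obs_dim)
        h, state = self.lstm(self.fc(self.norm(obs_seq)), state)
        return self.action_mean(h), state

pos, rot = PoseEstimator()(torch.randn(2, 3, 3, 64, 64))  # 3 cameras, 64x64 RGB
acts, _ = DactylPolicy()(torch.randn(2, 10, 24))          # 10-step sequence
print(pos.shape, rot.shape, acts.shape)
```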
1956: "Space travel is utter bilge" -Riet Woolley, Astronomer Royal of the UK Replica of Sputnik 1 (Launched October 4, 1957)#46Lessons from history of science Lessons from history of Al Fundamental limits of deep learning Practical limits on compute Perceptions "We have the impression that many people in the connectionist community do not understand that [back-propagation] is merely a particular way to compute a gradient and have assumed that back- propagation is a new learning scheme that somehow gets around the basic limitations of hill-climbing." -Minsky & Papert (1988)#471960s: Perceptron 1970s: expert systems 1980s: backprop 1990s: SVM + kernel trick 2012-: ImageNet 2020-: ?? History of AI-narrative I'd heard Guyon Vapnik Sutskever, Krizhevsky, and Hinton Cortes#48Community reacted to hype around perceptrons 456641 Rosenblatt & perception (1950) "The Navy revealed the embryo of an electronic computer today that it expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence. Later perceptrons will be able to recognize people and call out their names and instantly translate speech in one language to speech and writing in another language, it was predicted." -New York Times (1959)#49Papert Perceptrons (1969) "There was some hostility in the energy behind the research reported in Perceptrons... Part of our drive came, as we quite plainly acknowledged in our book, from the fact that funding and research energy were being dissipated on...misleading attempts to use connectionist methods in practical applications." -Papert (1988)#50Minsky Where was the money going? "In the late 1950s and early 1960s, after Rosenblatt's work, there was a great wave of neural network research activity. There were maybe thousands of projects. For example Stanford Research Institute had a good project. But nothing happened. The machines were very limited. So I would say by 1965 people were getting worried. They were trying to get money to build bigger machines, but they didn't seem to be going anywhere." -Minsky (1989)#51Neural networks revival "In the early 1980s, dramatic decreases in computing costs brought about a 'democratization' in the access to computing resources." -A Sociological Study of the Official History of the Perceptrons Controversy (1996) Rummelhart McClelland Hinton Anderson#52Tesauro with TD-Gammon (1992) History of AI-alternate narrative Deep learning has consistently scaled with compute for 60 years New levels of compute steers researchers to develop new algorithms such as backpropagation, not the other way around Fads have been more political than technical "It took great chutzpah for Gerald Tesauro to start wasting computer cycles on temporal difference learning in the game of Backgammon" ([32k parameters; total training: 5s on modern GPU] -Pollack & Blair (1997)#53Lessons from history of science Lessons from history of Al Fundamental limits of deep learning Practical limits on compute "The speed with which those who once declaimed, 'It's impossible' can switch to, 'I said it could be done all the time' is really astounding." 
#40 Can the current AI boom scale to AGI?

#41 GOAL IS TO PRESENT EVIDENCE THAT:
While highly uncertain, near-term AGI should be taken as a serious possibility.

#42
- Lessons from history of science
- Lessons from history of AI
- Fundamental limits of deep learning
- Practical limits on compute

"With few exceptions, scientists seem to make rather poor prophets; this is rather surprising, for imagination is one of the first requirements of a good scientist. Yet, time and again, distinguished astronomers and physicists have made utter fools of themselves by declaring publicly that such-and-such a project was impossible." -Profiles of the Future (1962)

#43 Moving goalposts
Simon Newcomb:
- 1901: heavier-than-air flight is impossible
- 1908: it's possible, but won't be important, since flying machines will never scale to both a pilot and a passenger
(photo: first airplane flight, at Kitty Hawk, December 17, 1903)

#44 Aversion to scale
On extending V-2 (14-ton rocket) technology to a 5-ton payload (200-ton rocket):
- America: never going to happen
- Russia: let's do this
- Result: Russia first to space
(photos: replica German V-2 rocket, invented 1942; replica Russian R-7 rocket, the first reliable means to transport objects into Earth orbit, launched 1957)

#45 Habitual detractors
1936: "It must be said at once that the whole procedure sketched in the present volume presents difficulties of so fundamental a nature that we are forced to dismiss the notion as essentially impracticable, in spite of the author's insistent appeal to put aside prejudice and to recollect the supposed impossibility of heavier-than-air flight before it was actually accomplished."
1956: "Space travel is utter bilge" -Richard van der Riet Woolley, Astronomer Royal of the UK
(photo: replica of Sputnik 1, launched October 4, 1957)

#46
- Lessons from history of science
- Lessons from history of AI
- Fundamental limits of deep learning
- Practical limits on compute

Perceptrons: "We have the impression that many people in the connectionist community do not understand that [back-propagation] is merely a particular way to compute a gradient and have assumed that back-propagation is a new learning scheme that somehow gets around the basic limitations of hill-climbing." -Minsky & Papert (1988)

#47 History of AI: the narrative I'd heard
- 1960s: Perceptron
- 1970s: expert systems
- 1980s: backprop
- 1990s: SVM + kernel trick
- 2012-: ImageNet
- 2020-: ??
(photos: Guyon, Vapnik, Cortes; Sutskever, Krizhevsky, and Hinton)

#48 Community reacted to hype around perceptrons
(photo: Rosenblatt with the perceptron)
"The Navy revealed the embryo of an electronic computer today that it expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence. Later perceptrons will be able to recognize people and call out their names and instantly translate speech in one language to speech and writing in another language, it was predicted." -New York Times (1958)

#49 Papert
Perceptrons (1969)
"There was some hostility in the energy behind the research reported in Perceptrons... Part of our drive came, as we quite plainly acknowledged in our book, from the fact that funding and research energy were being dissipated on... misleading attempts to use connectionist methods in practical applications." -Papert (1988)

#50 Minsky
Where was the money going?
"In the late 1950s and early 1960s, after Rosenblatt's work, there was a great wave of neural network research activity. There were maybe thousands of projects. For example Stanford Research Institute had a good project. But nothing happened. The machines were very limited. So I would say by 1965 people were getting worried. They were trying to get money to build bigger machines, but they didn't seem to be going anywhere." -Minsky (1989)

#51 Neural networks revival
"In the early 1980s, dramatic decreases in computing costs brought about a 'democratization' in the access to computing resources." -A Sociological Study of the Official History of the Perceptrons Controversy (1996)
(photos: Rumelhart, McClelland, Hinton, Anderson)

#52 History of AI: alternate narrative
- Deep learning has consistently scaled with compute for 60 years
- New levels of compute steer researchers to develop new algorithms such as backpropagation, not the other way around
- Fads have been more political than technical
"It took great chutzpah for Gerald Tesauro to start wasting computer cycles on temporal difference learning in the game of Backgammon" -Pollack & Blair (1997) [32k parameters; total training: 5 seconds on a modern GPU]
(photo: Tesauro with TD-Gammon, 1992)

#53
- Lessons from history of science
- Lessons from history of AI
- Fundamental limits of deep learning
- Practical limits on compute

"The speed with which those who once declaimed, 'It's impossible' can switch to, 'I said it could be done all the time' is really astounding." -Profiles of the Future (1962)

#54 Before: Fantasy: one algorithm to solve speech recognition, machine translation, object recognition better than decades of domain-specific ingenuity
The result: AlexNet (2012), a 60M-parameter neural network which got far better performance on the ImageNet dataset than any other approach
After: Neural networks became the dominant approach in these fields
(figures: HOG features (2005), figure by Torralba; ImageNet classification error, top 5)

#55 Before: Deep learning is about static datasets
The result: DQN (2013), a 150,000-parameter feed-forward neural network which learned to play a number of Atari games purely from pixels and score
After: RL + deep networks might actually be able to observe and act in the world
(screenshot: Atari gameplay)

#56 Before: Deep learning is just perception
The result: Neural Machine Translation (2015), a 380M-parameter neural network mapping an input sequence in one language to an output sequence in another
After: Deep learning can solve even the hardest supervised learning problems, regardless of type signature
(diagram: an encoder embeds "He loved to eat"; a decoder emits "Er liebte zu essen" token by token through a softmax)

#57 Before: RL can't actually solve hard tasks
The result: AlphaGo (2016), a 72-million-parameter feedforward neural network, coupled with MCTS, to defeat top humans at Go
After: RL + MCTS can solve hard problems, given: discrete actions, modest action space, simulator at test time
(screenshot: the Lee Sedol vs. AlphaGo match)

#58 Before: RL can't solve hard tasks on its own; long-term planning is a fundamental barrier
The result: OpenAI Five (2018), a 100M-parameter LSTM competitive with (not yet exceeding) top humans in the esports game Dota 2
After: RL can solve extremely hard problems with long-term planning, given only a training-time simulator
(photo: OpenAI Five on stage against the professional team paiN Gaming)

#59 Before: Deep RL is limited to games and other perfectly-simulatable problems
The result: Dactyl (2018), a 1.5M-parameter LSTM, trained in simulation by the OpenAI Five training system & learning algorithm, deployed on a physical robot hand to manipulate a cube
After: Deep RL can cross the reality gap given only a "good enough" simulator at training time
(photo: the robot hand reorienting a cube toward a goal orientation)

#60 Before: AI progress is driven by labeled datasets
The result: Unsupervised NLP (2018), a 117M-parameter transformer trained by reading 7,000 self-published books which, with a small amount of supervised fine-tuning, sets state-of-the-art on a huge variety of NLP datasets
After: Unlabeled data can be even more important than labeled
(the results table from slide #36)

#61 Can we put a confident bound on near-term compute progress?
- Doubling period: 3.5 months
- AlexNet to AlphaGo Zero: a 300,000x increase in compute
(chart: "Petaflop/s-day (Training)" on a log scale against year, 2012-2018, marking AlexNet, Dropout, Visualizing and Understanding Conv Nets, DQN, Seq2Seq, GoogLeNet, VGG, DeepSpeech2, ResNets, Neural Architecture Search, Neural Machine Translation, Xception, TI7 Dota 1v1, AlphaGo Zero, AlphaZero, and TI8 Dota 5v5)
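Slide #61's two headline numbers are mutually consistent, which is worth checking. A doubling every 3.5 months compounded over the roughly 5.5 years from AlexNet (mid-2012) to AlphaGo Zero (late 2017) lands at the same order of magnitude as the quoted 300,000x; the exact dates are assumptions:

```python
import math

doubling_period_months = 3.5
months = 5.5 * 12                                          # mid-2012 to late 2017, approx.
print(f"{2 ** (months / doubling_period_months):,.0f}x")   # ~470,000x, same order as 300,000x
# Inverting: a 300,000x increase at one doubling per 3.5 months takes
print(math.log2(300_000) * 3.5 / 12, "years")              # ~5.3 years
```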
#62
(bar chart: training compute in petaflop/s-days, on a linear scale from 0 to 1,900, for LeNet-5 (1989), TD-Gammon v2.1 (1993), Speech RNN (1994), TD-Gammon v3.1 (1998), AlexNet (2012), Dropout (2012), Visualizing & Understanding CNNs (2013), DQN (2013), Seq2Seq (2014), GoogLeNet (2014), VGG (2014), DeepSpeech2 (2015), ResNets (2015), Neural Architecture Search (2016), Neural Machine Translation (2016), Xception (2016), TI7 Dota 1v1 (2017), AlphaGo Zero (2017), AlphaZero (2017), and TI8 Dota 5v5 (2018))

#63 THIS TALK'S GOAL IS TO PRESENT EVIDENCE THAT:
While highly uncertain, near-term AGI should be taken as a serious possibility.
Taking it seriously means proactively thinking about risks:
- Machines pursuing goals misspecified by their operator
- Malicious humans subverting deployed systems
- An out-of-control economy that grows without resulting in improvements to human lives

#64 Thank You
