OpenAI Product Presentation Deck
Before
AI progress is driven by labeled datasets.
The result
Unsupervised NLP (2018)
117M-parameter transformer trained by
reading 7,000 self-published books which,
with a small amount of supervised fine-tuning,
sets a new state of the art on a huge variety of NLP
datasets
After
Unlabeled data can be even more important
than labeled data.
Dataset           Task
SNLI              Textual Entailment
MNLI Matched      Textual Entailment
MNLI Mismatched   Textual Entailment
SciTail           Textual Entailment
QNLI              Textual Entailment
RTE               Textual Entailment
STS-B             Semantic Similarity
QQP               Semantic Similarity
MRPC              Semantic Similarity
RACE              Reading Comprehension
ROCStories        Commonsense Reasoning
COPA              Commonsense Reasoning
SST-2             Sentiment Analysis
CoLA              Linguistic Acceptability
GLUE              Multi-Task Benchmark
SOTA
80.6
80.1
82.3
66.1
77.6
Ours
82.1
88.3
56.0
82.0
70.3
59.0
78.6
91.3
45.4