Multitask Prompted Training Enables Zero-Shot Task Generalization

Victor Sanh; Albert Webson; Colin Raffel; Stephen H. Bach; Lintang Sutawika; Zaid Alyafeai; Antoine Chaffin; Arnaud Stiegler; Teven Le Scao; Arun Raja; Manan Dey; M Saiful Bari; Canwen Xu; Urmish Thakker; Shanya Sharma; Eliza Szczechla; Taewoon Kim; Gunjan Chhablani; Nihal V. Nayak; Debajyoti Datta; Jonathan Chang; Mike Tian-Jian Jiang; Han Wang; Matteo Manica; Sheng Shen; Zheng-Xin Yong; Harshit Pandey; Michael Mckenna; Rachel Bawden; Thomas Wang; Trishala Neeraj; Jos Rozen; Abheesht Sharma; Andrea Santilli; Thibault Fevry; Jason Alan Fries; Ryan Teehan; Tali Bers; Stella Biderman; Leo Gao; Thomas Wolf; Alexander M. Rush

Conference paper. Year: 2022


Authors and affiliations

Victor Sanh: Hugging Face
Albert Webson: Department of Computer Science
Colin Raffel: Hugging Face
Stephen H. Bach: Department of Computer Science
Lintang Sutawika: Konvergen AI
Zaid Alyafeai: King Fahd University of Petroleum and Minerals
Antoine Chaffin: Institut de Recherche en Informatique et Systèmes Aléatoires; IMATAG [Rennes]
Arnaud Stiegler: Hyperscience
Teven Le Scao: Hugging Face
Arun Raja: Institute for Infocomm Research - I²R [Singapore]
Manan Dey: SAP AI Research
M Saiful Bari: School of Computer Engineering [Singapore]
Canwen Xu: Department of Computer Science and Engineering [Univ California San Diego]; Hugging Face
Urmish Thakker: SambaNova Systems
Shanya Sharma: Walmart Labs
Eliza Szczechla: Scott Tiger S.A.
Taewoon Kim: Vrije Universiteit Amsterdam [Amsterdam]
Gunjan Chhablani: Oracle
Nihal V. Nayak: Department of Computer Science
Debajyoti Datta: University of Virginia
Jonathan Chang: AsusTeK Computer
Mike Tian-Jian Jiang: ZEALS
Han Wang: New York University [New York]
Matteo Manica: IBM Research [Zurich]
Sheng Shen: University of California [Berkeley]
Zheng-Xin Yong: Brown University
Harshit Pandey: No affiliation
Michael Mckenna: Parity
Rachel Bawden: Automatic Language Modelling and ANAlysis & Computational Humanities (PersonId: 9441; IdHAL: rachel-bawden; ORCID: 0000-0001-9553-1768; IdRef: 233174591)
Thomas Wang: Hugging Face; Automatic Language Modelling and ANAlysis & Computational Humanities
Trishala Neeraj: CyberCube
Jos Rozen: Naver Labs Europe [Meylan]
Abheesht Sharma: Birla Institute of Technology and Science
Andrea Santilli: Università degli Studi di Roma "La Sapienza" = Sapienza University [Rome]
Thibault Fevry: Point72
Jason Alan Fries: Stanford University
Ryan Teehan: Charles River Analytics
Tali Bers: Brown University
Stella Biderman: EleutherAI; Booz Allen Hamilton Inc
Leo Gao: EleutherAI
Thomas Wolf: Hugging Face
Alexander M. Rush: Hugging Face

Abstract

Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., 2020). It has been hypothesized that this is a consequence of implicit multitask learning in language models’ pretraining (Radford et al., 2019). Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale, we develop a system for easily mapping any natural language task into a human-readable prompted form. We convert a large set of supervised datasets, each with multiple prompts with diverse wording. These prompted datasets allow for benchmarking the ability of a model to perform completely held-out tasks. We fine-tune a pre-trained encoder-decoder model (Raffel et al., 2020; Lester et al., 2021) on this multitask mixture covering a wide variety of tasks. The model attains strong zero-shot performance on several standard datasets, often outperforming models up to 16x its size. Further, our approach attains strong performance on a subset of tasks from the BIG-bench benchmark, outperforming models up to 6x its size. All trained models are available at https://github.com/bigscience-workshop/t-zero, and all prompts are available at https://github.com/bigscience-workshop/promptsource.
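
To make the idea of a "prompted form" concrete, the following is a minimal sketch of how one supervised example might be rendered through several differently worded templates, each producing an (input text, target text) pair. It is an illustration only, not the PromptSource API or the authors' implementation: the `PromptTemplate` class, the `LABEL_WORDS` verbalizer, the template wordings, and the sample example are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Example = Dict[str, str]

# Hypothetical verbalizer for an NLI-style dataset (an assumption for illustration).
LABEL_WORDS = {"entailment": "Yes", "neutral": "Maybe", "contradiction": "No"}


@dataclass(frozen=True)
class PromptTemplate:
    """A named, human-readable template over the fields of an example."""
    name: str
    input_format: str  # str.format pattern that references example fields

    def apply(self, example: Example) -> Tuple[str, str]:
        # Render the input text and map the label to a natural-language target.
        text = self.input_format.format(**example)
        target = LABEL_WORDS[example["label"]]
        return text, target


# Several wordings for the same task, so a model sees diverse prompts.
TEMPLATES: List[PromptTemplate] = [
    PromptTemplate(
        "does_this_imply",
        '{premise}\nQuestion: Does this imply that "{hypothesis}"? Yes, no, or maybe?',
    ),
    PromptTemplate(
        "can_we_infer",
        'Suppose {premise} Can we infer that "{hypothesis}"? Yes, no, or maybe?',
    ),
]


def to_prompted(example: Example) -> List[Tuple[str, str]]:
    """Return one (input, target) pair per template for a single example."""
    return [template.apply(example) for template in TEMPLATES]


example = {
    "premise": "A dog is running in the park.",
    "hypothesis": "An animal is outside.",
    "label": "entailment",
}
pairs = to_prompted(example)
for text, target in pairs:
    print(text, "->", target)
```

Under these assumptions, every dataset reduces to the same text-to-text shape, which is what would allow many tasks to be mixed for training an encoder-decoder model and a held-out task to be evaluated simply by writing prompts for it.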

Domains

Computation and Language [cs.CL]

Main file

2110.08207.pdf (1.99 MB)

2110.08207 (1).pdf (1.99 MB)

Origin: Files produced by the author(s)

Contributor: Rachel Bawden

https://inria.hal.science/hal-03540072

Submitted on: Tuesday, 10 January 2023, 10:23:39

Last modified on: Wednesday, 14 February 2024, 16:54:02

Dates and versions

hal-03540072, version 1 (10 January 2023)

Identifiers

HAL Id: hal-03540072, version 1

Cite

Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, et al. Multitask Prompted Training Enables Zero-Shot Task Generalization. ICLR 2022 - Tenth International Conference on Learning Representations, Apr 2022, Online. ⟨hal-03540072⟩


Collections

INSTITUT-TELECOM UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA CENTRALESUPELEC INRIA2 GENCI UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES INSA-GROUPE ANR PRAIRIE-IA UR1-MATH-NUM

654 views

173 downloads
