About the Modeling approach category

Very large models trained on very large datasets display emergent prompting and priming behaviors: conditioned on a task description and/or a few priming example pairs, the model generates an output for a given input and achieves non-trivial performance on a wide range of benchmarks (cf. GPT-3). However, these behaviors feel somewhat accidental. When a large enough left-to-right (a.k.a. causal) language model is trained on enough text, it learns non-trivial “skills” that are incidental to the language modeling objective and for which supervised signal was thought to be necessary. However, prompting and priming remain a fickle art because models are brittle to the format of the prompt and examples (AutoPrompt, How Can We Know What Language Models Know?, What Makes Good In-Context Examples for GPT-3?). We want to make these prompting abilities explicit and robust rather than rely solely on poorly understood emergent behaviors, by providing as much training signal that resembles prompting as possible.
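To make the setup concrete, here is a minimal sketch of the prompting/priming pattern described above. The function name and the `Input:`/`Output:` format are illustrative assumptions, not a fixed convention:

```python
def build_prompt(task_description, priming_pairs, query):
    """Condition the model on a task description and a few priming
    (input, output) example pairs, then ask for the query's output.
    The model is expected to complete the final 'Output:' line."""
    lines = [task_description]
    for x, y in priming_pairs:  # the "priming" (few-shot) examples
        lines.append(f"Input: {x}\nOutput: {y}")
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

prompt = build_prompt(
    "Translate English to French.",
    [("cheese", "fromage"), ("dog", "chien")],
    "cat",
)
print(prompt)
```

With zero priming pairs, the same function produces the 0-shot setting: only the task description and the query condition the model.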

Towards this goal, the biggest “modeling” question is how to augment the training signal so that the model learns prompting behaviors, i.e. conditions on an input (and potentially some additional context and metadata), receives a prompt asking it to process and react to that input, and generates the output response. Ideally, we would like to extract on the order of 100x more prompting signal from our data sources than causal language modeling on raw web text provides.
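One way to augment the signal, sketched below under assumptions of our own (the template strings and field names are hypothetical), is to render existing supervised (input, output) pairs through natural-language prompt templates, so the causal LM sees explicit prompting-shaped text during training:

```python
import random

# Hypothetical templates: several phrasings of the same task, so the
# model sees varied prompt formats rather than one fixed pattern.
TEMPLATES = [
    "Review: {text}\nIs this review positive or negative? {label}",
    "{text}\nSentiment: {label}",
]

def to_prompted_example(example, rng=random):
    """Render a {'text': ..., 'label': ...} pair through a randomly
    chosen template, yielding a plain-text training instance."""
    template = rng.choice(TEMPLATES)
    return template.format(**example)

sample = {"text": "Great film!", "label": "positive"}
print(to_prompted_example(sample))
```

Each supervised example can be rendered through many templates, which is one route to the 100x multiplication of prompting signal mentioned above.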

The desired outcomes of this working group are:

  • A/ A very large multilingual model trained with the best-identified training objective and data format for our moonshot goal. Multiple sizes can be trained, but we expect the largest one to be on the order of magnitude of GPT-3 (~150 to 200 billion parameters) and to be trained on the French national supercomputer.

  • B/ A public codebase gathering the code, tools (and potentially processed dataset) developed in this project. Code should be documented so that it can be a starting point for future endeavors.

  • C/ Focused paper(s) describing the model, the modeling approach with experiments, and analysis of the design choices.

The moonshot goal of the project is to answer in natural language any prompted query expressed in natural language in a 0-shot manner.

Find more information about the WG Modeling here.
WG Modeling Chair: Victor Sanh

Thanks a lot for launching this topic!

I am wondering whether this formulation might be a bit too restrictive: in many applications of such large language models, it will not be a human interacting directly with the language model, but artificial agents (potentially themselves interacting with humans).

In that case, artificial agents may prompt/condition the language model in an artificial language, e.g. using logical forms, conditioning on JSON files (as is now possible with GPT-3), or using programming/script code (as done in a number of fun uses of GPT-3). One might also want to include knowledge graphs as part of the prompt. So I guess we might want to specify the scope of the targeted queries/conditioning. What do you think?