Question from Dory: What is the expected timeline for data collection in different languages?

Where, how, and in what format should we give you data? Where should we add the description? Can you run preprocessing at scale on your side? Where should the preprocessing scripts be gathered?

If you want to help find and process data, the best thing to do is to join the relevant working groups!

For contributing specific data, you should join the Data Sourcing and Representativeness Working Group, currently chaired by Angie McMillan-Major, Pedro Ortiz, and Zeerak Waseem. More specifically, join the Localized Data Sourcing collaborative tasks for the relevant languages.

The Data Tooling Working Group will be responsible for organizing all of the preprocessing scripts. We have definitely reserved compute time for the preprocessing.

You can sign up for updates from these working groups by filling out the Google Form, and we’ll be organizing further meetings and scoping out the organization in the next couple of weeks :hugs: