AI Sweden Preps Region's First Language Model
If the King of Sweden wants help writing his annual Christmas speech this year, he could get it from the same AI model that’s available to his 10 million subjects.
As a test, the researchers prompted the model, called GPT-SW3, to write one of the royal messages, and it did a pretty good job, according to Magnus Sahlgren, who leads natural language understanding research at AI Sweden, a consortium launching the country’s journey into the age of machine learning.
“Later, our Minister for Digitization visited us and asked the model to generate arguments for policy positions, and it came up with some very clever ones. He also intuitively figured out how to prompt the model to generate good text,” Sahlgren said.
Early successes have inspired work on an even bigger and more powerful version of the language model that he hopes will serve any citizen, business or government agency in Scandinavia.
A multilingual model
The current version contains 3.6 billion parameters and is smart enough to do some cool things in Swedish. Sahlgren’s team aims to train a state-of-the-art model with 175 billion parameters that can handle all sorts of language tasks in the Nordic languages of Swedish, Danish and Norwegian and, he hopes, Icelandic as well.
For example, a startup can use it to automatically generate product descriptions for an e-commerce site by giving only product names. Government agencies can use it to quickly classify and route citizen questions.
Businesses can ask it to summarize reports so they can react fast. Hospitals can run distilled versions of the model privately on their own systems to improve patient care.
“It’s a foundational model that we’ll provide as a service for any task people want to solve,” said Sahlgren, who has worked at the intersection of language and machine learning since earning his doctorate in computational linguistics in 2006.
Permission to speak freely
A model that speaks a nation’s own language is increasingly seen as a strategic asset, a keystone of digital sovereignty in a world that speaks thousands of languages across nearly 200 countries.
Most language services today focus on Chinese or English, the two most widely spoken languages in the world. They are usually created in China or the United States, and they are not free.
“It’s important for us to have models built in Sweden for Sweden,” Sahlgren said.
Small team, great system
“We’re a small country and a core team of about six people, but we can create a cutting-edge resource like this for people to use,” he added.
That’s because Sweden has a powerful engine in BerzeLiUs, a 300-petaflop AI supercomputer at Linköping University. The team trained the initial GPT-SW3 model using just 16 of the 60 nodes in this NVIDIA DGX SuperPOD.
The next model may exercise all of the system’s nodes. Jobs that large require robust software like the NVIDIA NeMo Megatron framework.
“It lets us scale our training up to the full supercomputer, and we were lucky to have access to experts on the NeMo development team. Without NVIDIA, it would have been much more complicated to come this far,” he said.
A workflow for any language
NVIDIA engineers have created a recipe based on NeMo and an emerging process called p-tuning that quickly optimizes massive models and is designed to work with any language.
In an initial test, one model nearly doubled its accuracy after NVIDIA engineers applied the techniques.
Plus, it requires one-tenth the data, reducing the need for tens of thousands of hand-tagged records. This opens the door for users to fine-tune a model with the relatively small, industry-specific datasets they have.
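To make the core idea concrete, here is a toy sketch of the soft-prompt technique that p-tuning builds on: a few trainable “virtual token” vectors are prepended to the input while the pretrained weights stay frozen, so only a handful of numbers are learned from a small labeled dataset. This is an illustrative assumption, not NVIDIA’s recipe. Real p-tuning in NeMo trains the virtual tokens through a small prompt-encoder network on a full transformer; this sketch omits the encoder (making it closer to plain prompt tuning) and replaces the language model with a hypothetical frozen embedding table and linear scoring head. All names and values are made up for illustration.

```python
import math

# Hypothetical frozen "pretrained" pieces: a tiny 2-d embedding table and a
# linear scoring head. Neither is modified during tuning.
EMBED = {
    "great": [1.0, 0.2],
    "good": [0.8, 0.1],
    "bad": [0.1, 0.9],
    "awful": [0.0, 1.0],
    "service": [0.5, 0.5],
}
HEAD_W = [1.0, -1.0]
HEAD_B = -0.8  # deliberately biased toward predicting the negative class

# A small hypothetical task-specific dataset: (tokens, label).
DATA = [
    (["great", "service"], 1),
    (["good"], 1),
    (["bad", "service"], 0),
    (["awful"], 0),
]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(prompt, tokens):
    """Prepend the virtual-token vectors to the frozen word embeddings,
    mean-pool the sequence and score it with the frozen linear head.
    Returns (probability of label 1, sequence length)."""
    seq = prompt + [EMBED[t] for t in tokens]
    n = len(seq)
    pooled = [sum(v[i] for v in seq) / n for i in range(len(HEAD_W))]
    score = sum(w * x for w, x in zip(HEAD_W, pooled)) + HEAD_B
    return sigmoid(score), n


def loss(prompt, data):
    """Mean binary cross-entropy over the dataset."""
    total = 0.0
    for tokens, y in data:
        p, _ = forward(prompt, tokens)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


def p_tune(data, num_virtual=2, lr=0.5, steps=200):
    """Gradient descent on the virtual-token vectors only; the embedding
    table and head are never touched."""
    dim = len(HEAD_W)
    prompt = [[0.0] * dim for _ in range(num_virtual)]
    for _ in range(steps):
        grads = [[0.0] * dim for _ in range(num_virtual)]
        for tokens, y in data:
            p, n = forward(prompt, tokens)
            # d(BCE)/d(score) = p - y; d(score)/d(prompt[j][i]) = HEAD_W[i] / n
            for j in range(num_virtual):
                for i in range(dim):
                    grads[j][i] += (p - y) * HEAD_W[i] / n / len(data)
        for j in range(num_virtual):
            for i in range(dim):
                prompt[j][i] -= lr * grads[j][i]
    return prompt


if __name__ == "__main__":
    untuned = [[0.0, 0.0], [0.0, 0.0]]
    tuned = p_tune(DATA)
    print("loss before:", round(loss(untuned, DATA), 3))
    print("loss after: ", round(loss(tuned, DATA), 3))
```

The design point the sketch illustrates is the one the article highlights: because only the few virtual-token parameters are trained, a small task-specific dataset can be enough to steer a much larger frozen model.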
“We hope to inspire a lot of entrepreneurship in industry, startups and the public sector by using our technology to develop their own apps and services,” said Sahlgren.
Write the next chapter
Meanwhile, NVIDIA developers are already working on ways to improve the enabling software.
One test shows great promise for using widely available English datasets to train new capabilities into models designed for any language. In another effort, they are applying p-tuning techniques to inference so models can learn on the fly.
Zenodia Charpy, a senior solutions architect at NVIDIA based in Gothenburg, shares the enthusiasm of the AI Sweden team she supports. “We’ve only just started trying new and better ways to tackle these big language challenges – there’s a lot more to come,” she said.
The GPT-SW3 model will be available by the end of the year through an early access program.