South African AI learns all 11 official written languages: MzansiLM marks a cultural milestone


Artificial intelligence (AI) is rapidly becoming part of everyday life. From answering questions and translating documents to helping students learn and businesses communicate, AI tools are shaping how people interact with information.

But for millions of South Africans, there has always been a problem.

Most AI systems understand English far better than they understand the languages spoken around South African dinner tables, in taxis, at family gatherings and in communities across the country.

That may be starting to change.

The team behind MzansiLM at UCT. From left: Simbarashe Mawere, Anri Lombard, Dr Jan Buys and Dr Francois Meyer. Picture: UCT

The project was led by Anri Lombard and Dr Jan Buys from the University of Cape Town's (UCT) Department of Computer Science, working alongside Dr Francois Meyer and a wider group of research collaborators. Together, they developed MzansiLM, a new artificial intelligence language model trained specifically on South Africa's 11 official written languages.

Alongside the model, the team also created MzansiText, a multilingual dataset designed to support future AI development in local languages.

While the technical achievement is significant, its cultural importance may be even greater.

Why language matters in culture and technology

Language is more than a tool for communication.

It carries history, humour, identity, values and ways of seeing the world. Every South African language contains cultural knowledge that cannot always be translated neatly into English.

In my culture, language is often the first connection to home.

It is the language spoken by grandparents, the phrases shared at family gatherings, the stories passed from one generation to the next and the expressions that lose their meaning when translated.

Yet many modern technologies still struggle to understand languages such as isiNdebele, Sepedi, Setswana and Tshivenda.

According to the UCT researchers, nine of South Africa’s 11 official written languages are considered “low-resource” languages, meaning there is relatively little digital text available to train AI systems.

As a result, speakers of these languages are often left behind as AI technology advances.

Building an AI model for South Africans

The new MzansiLM model was designed to address that challenge.

Unlike many existing AI models that focus on only a handful of African languages, MzansiLM was developed to support all 11 official written languages.

The model contains 125 million parameters, which makes it much smaller than commercial AI systems. Even so, the researchers found that it performed competitively on several language tasks and even outperformed much larger open-source models on certain South African language benchmarks.

Importantly, MzansiLM is not a chatbot like ChatGPT. Instead, it serves as a foundation that researchers and developers can adapt for specific tasks, such as summarisation and translation, or build on to create language-learning tools and information services in local languages.

That could eventually make digital services more accessible to people who prefer communicating in their home language.

WATCH: African-Languages AI: Mzansi LM, Translation Gaps, and the Future of Local LLMs

Video: TechNolgia Talks

Preserving language in the digital age

For decades, efforts to preserve indigenous languages focused on schools, books, radio stations and community programmes.

Today, another challenge has emerged: ensuring those languages remain visible in the digital world.

If future technologies only understand English, valuable cultural knowledge risks becoming harder to access online.

Projects such as MzansiLM represent an important step towards a more inclusive digital future, one where technology reflects the linguistic diversity of the country it serves.

The UCT team has made both the MzansiText dataset and MzansiLM publicly available, allowing other researchers and developers to build on their work.

For many South Africans, that means something bigger than artificial intelligence.

It means ensuring that the languages that carry our stories, identities and cultures have a place in the technologies shaping the future.

READ: The paper "MzansiText and MzansiLM: An Open Corpus and Decoder-Only Language Model for South African Languages" is available on arXiv.
