
UNICODE Locales for African Languages

Archived Page

This is a page from the Kamusi archives. The information below may be out of date, and the links may no longer be valid. For current information, please visit the main Kamusi site. If you know of links or information on this page that can be updated, please let us know.

The Unicode Consortium is partnering with ANLoc, the African Network for Localization, a project sponsored by Canada's International Development Research Centre (IDRC), to help extend modern computing on the African continent. ANLoc's vision is to empower Africans to participate in the digital age by enabling their languages in computers. A sub-project of ANLoc, called Afrigen, focuses on creating African locales. During the last 12 months, no fewer than 150 volunteers have teamed up with Afrigen-ANLoc and gathered locale data for 72 African languages. The Afrigen-ANLoc data collection tool was developed by Louise Berthilson of IT46, and the project is managed by Martin Benjamin, director of Kamusi Project International.

According to Ethnologue, there are an estimated 2,100 living languages spoken in Africa. The Afrigen-ANLoc project's stated mission is to create viable locale data for at least 100 of these languages and to upstream the data to the Unicode Consortium's Common Locale Data Repository (CLDR) project. Implementing fundamental locale data within CLDR is a critical step toward computer applications that can be localized into these African languages, thus reaching a population that has perhaps never before been able to use its native languages on computers and mobile phones.

Unicode CLDR provides key building blocks for software to support the world's languages. Unicode CLDR is by far the largest and most extensive standard repository of locale data. This data is used by a wide spectrum of companies for their software internationalization and localization: adapting software to the conventions of different languages for such common software tasks as formatting of dates, times, time zones, numbers, and currency values; sorting text; choosing languages or countries by name; transliterating different alphabets; and many others.
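To make the idea of locale data concrete, here is a minimal sketch of how an application might use per-locale symbols to format a number. It does not use the CLDR data files or any real CLDR API: the locale identifiers, the `NumberSymbols` type, and the `format_number` helper are all hypothetical, and the separator values are placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumberSymbols:
    """Separators a locale uses when writing numbers."""
    decimal: str
    group: str


# Hypothetical locale identifiers with placeholder symbols, NOT real CLDR data.
SYMBOLS = {
    "xx-comma-point": NumberSymbols(decimal=".", group=","),
    "xx-point-comma": NumberSymbols(decimal=",", group="."),
    "xx-space-comma": NumberSymbols(decimal=",", group=" "),
}


def format_number(value: float, locale_id: str, places: int = 2) -> str:
    """Format a number using the separators recorded for the given locale."""
    symbols = SYMBOLS[locale_id]
    # Format with Python's fixed separators first, then swap in the locale's own.
    text = f"{value:,.{places}f}"
    whole, _, fraction = text.partition(".")
    whole = whole.replace(",", symbols.group)
    if fraction:
        return whole + symbols.decimal + fraction
    return whole


if __name__ == "__main__":
    for locale_id in SYMBOLS:
        print(locale_id, format_number(1234567.891, locale_id))
```

The point of the sketch is the separation of concerns: the formatting code stays the same for every language, and only the data table changes. A shared repository of such data is what lets software support a new language without new code.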

The upcoming 1.8 release of CLDR will incorporate data for a total of 54 African languages. Of these, 41 are completely new to the CLDR project, while the other 13 already existed in CLDR and were enhanced with additional data. These languages are spoken in 26 countries spread across the entire African continent. The Afrigen-ANLoc project selected approximately 200 candidate languages, including all official languages recognized by a national government and all languages with at least 500,000 native speakers; additional languages are also incorporated in the project when volunteers step forward. Data is collected through the Afrigen-ANLoc project by native-speaking volunteers around the world and entered via a web-based utility designed specifically for this purpose. The data is then reviewed for accuracy and merged into the CLDR repository.
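The selection rule described above can be sketched as a simple filter. This is only an illustration of the stated criteria, not the project's actual tooling: the `Language` record, its field names, and the sample entries are hypothetical placeholders, and only the 500,000-speaker threshold comes from the text.

```python
from dataclasses import dataclass

SPEAKER_THRESHOLD = 500_000  # from the text: at least 500,000 native speakers


@dataclass(frozen=True)
class Language:
    name: str             # placeholder names, not real language data
    native_speakers: int
    is_official: bool     # recognized as official by a national government
    has_volunteers: bool  # volunteers have stepped forward


def is_candidate(lang: Language) -> bool:
    """Apply the three routes into the project described in the text."""
    return (
        lang.is_official
        or lang.native_speakers >= SPEAKER_THRESHOLD
        or lang.has_volunteers
    )


def select_candidates(languages):
    """Return the names of all languages that qualify, in input order."""
    return [lang.name for lang in languages if is_candidate(lang)]


if __name__ == "__main__":
    # Made-up records for illustration only.
    sample = [
        Language("Language A", 20_000, is_official=True, has_volunteers=False),
        Language("Language B", 750_000, is_official=False, has_volunteers=False),
        Language("Language C", 40_000, is_official=False, has_volunteers=True),
        Language("Language D", 40_000, is_official=False, has_volunteers=False),
    ]
    print(select_candidates(sample))
```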

"The partnership with Afrigen has been a huge benefit for us," says John Emmons, vice-chair of the Unicode CLDR technical committee and lead CLDR engineer for IBM. "The Afrigen effort has allowed us to bring many new languages on board that we wouldn't be able to do through our normal process, while still maintaining the level of quality and consistency that we require for every language."

For more information about the Unicode CLDR project (including charts), see the Unicode CLDR project site.

IDRC is a Crown corporation created by the Parliament of Canada in 1970 to help developing countries use science and technology to find practical, long-term solutions to the social, economic, and environmental problems they face. Its support is directed toward creating a local research community whose work will build healthier, more equitable, and more prosperous societies. For more information about IDRC, please see the IDRC website.

For more information about the Afrigen-ANLoc project, see the Afrigen-ANLoc project site.

For more information about the African Network for Localization, see the ANLoc website.



Copyright

The Kamusi Project dictionaries and the Kamusi Project databases are intellectual property protected by international copyright law, ©2007 through ©2018, under the joint ownership of Kamusi Project International and Kamusi Project USA. Further explanation may be found on our Copyright page.

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

