A 🌎👅🔢🛣 global linguistic data infrastructure for societal and technological applications
The development of Human Language Technology (HLT) faces three major constraints:
•Adequate 🔢 data does not exist in codified form for most languages.
•No system exists to effectively and consistently share existing 🔢 data.
•Technical tools often do not span languages or projects.
We are organizing a 🌎👪 global consortium to address the challenges of linguistic 👅🔢 data head-on, to do for language what the Human Genome Project has done for genetics and the Human Brain Project is doing for neuroscience. The partners in this collaboration will systematically produce 🔢 data for dozens or hundreds of languages, and put that data in the service of 🎓 knowledge, communication, and HLT. The signatories seek to participate in the creation of a global infrastructure for linguistic data, either through direct 👅📃 language documentation or through the use of that 🔢 data for the development of advanced technologies and social services.
To date, we have more than 60 letters of intent from interested partners. In fact, we paused taking new letters because, while many professionals 🌍 worldwide recognize the vast potential advantages of joining this collaboration, we have been unable to find a visionary financier or institutional grant source that will support languages outside of the usual lucrative suspects.
These are the languages for which we have datasets that we are actively working to put online. Languages that are active and ready for you to search are marked with "A" in the list below.
•A = Active language, aligned and searchable
•c = Data 🔢 elicited through the Comparative African Word List
•d = Data from independent sources that Kamusi participants align by playing 🐥📊 DUCKS
•e = Data from the 🎮 games you can play on 😂🌎🤖 EmojiWorldBot
•P = Pending language, data in queue for alignment
•w = Data from 🔠🕸 WordNet teams
We are actively creating new software for you to make use of and contribute to the 🎓 knowledge we are bringing together. Learn about software that is ready for you to download or in development, and the unique data systems we are putting in place for advanced language learning and technology:
Our biggest struggle is keeping Kamusi online and keeping it free. We cannot charge money for our services because that would block access to the very people we most want to benefit: the students and speakers of languages around the world that are almost always excluded from information technology. So we ask, request, beseech, and beg you to please support our work by donating as generously as you can to help build and maintain this unique public resource.
Answers to general questions you might have about Kamusi services.
We are building this page around real questions from members of the Kamusi community. Send us a question that you think will help other visitors to the site, and we will often post the answer here.