Our partners at Long Now are hard at work on PanLex, a repository of term-to-term translations from as many sources as they can collect. Our task is to take their crude data, which is based on spelling, and refine it into meaning.
That is to say, PanLex data currently shows that l-i-g-h-t in English matches 9,658 terms in 1,599 language varieties. From the data, we cannot tell whether a term in Language 2 means "light - not heavy" or "light - not dark" or "light - not serious". Nor can we tell that for Language 3. This means the PanLex list gives us no way to infer which term in Language 2 that matches l-i-g-h-t is equivalent to which term in Language 3 that matches the same English spelling.
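The ambiguity can be seen in a minimal sketch. The records below are a toy illustration, not actual PanLex data or the PanLex schema: French and German stand in for Language 2 and Language 3, and the function name `pivot_pairs` is hypothetical. Pivoting through the English spelling alone produces every combination of terms, valid or not.

```python
from itertools import product

# Toy spelling-based records: (English spelling, language code, term).
# Illustrative only; this is not the PanLex data format.
spelling_links = [
    ("light", "fra", "léger"),   # not heavy
    ("light", "fra", "clair"),   # not dark
    ("light", "deu", "leicht"),  # not heavy
    ("light", "deu", "hell"),    # not dark
]

def pivot_pairs(links, spelling, lang_a, lang_b):
    """Pair every lang_a term with every lang_b term sharing one English spelling."""
    a_terms = [t for s, l, t in links if s == spelling and l == lang_a]
    b_terms = [t for s, l, t in links if s == spelling and l == lang_b]
    return list(product(a_terms, b_terms))

candidates = pivot_pairs(spelling_links, "light", "fra", "deu")
for pair in candidates:
    print(pair)
```

Of the four candidate pairs, only two (léger/leicht and clair/hell) are real translations. Nothing in the spelling-based records says which two.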
PanLex currently has 24.9 million expressions from 11,044 language varieties, resulting in some 1.3 billion direct spelling-based translations.
Kamusi is preparing to align these terms based on meaning, so that the term for "light - not dark" in Language 2 matches the term for that idea in Language 3, Language 4, and Language 1599, and never gets confused with "light - not heavy" or "light - not serious" when connecting any of those languages. We have already developed the core functionality in our DUCKS software (Data Unified Conceptual Knowledge Sets), which lets speakers match a given term in their language with the right definition in English or another bridge language. When we have the financial horsepower, we can automate the system so that it arranges the PanLex data into 11,044 DUCKS sets and, with proper authentication, sets players to work choosing the right meanings for their terms.
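As a rough sketch of what meaning-based alignment makes possible, suppose speakers have already tagged each term with a concept. The data layout, the concept labels, and the function names `align_by_concept` and `translate` are assumptions made for illustration; they do not describe how DUCKS stores or processes its data.

```python
from collections import defaultdict

# Hypothetical result of speakers' choices: (language code, term, concept label).
sense_choices = [
    ("fra", "léger", "light/not-heavy"),
    ("fra", "clair", "light/not-dark"),
    ("deu", "leicht", "light/not-heavy"),
    ("deu", "hell", "light/not-dark"),
]

def align_by_concept(choices):
    """Group terms by concept, then by language."""
    by_concept = defaultdict(dict)
    for lang, term, concept in choices:
        by_concept[concept].setdefault(lang, []).append(term)
    return by_concept

def translate(by_concept, term, src, dst):
    """Return dst-language terms that share a concept with the src-language term."""
    results = []
    for langs in by_concept.values():
        if term in langs.get(src, []):
            results.extend(langs.get(dst, []))
    return results

aligned = align_by_concept(sense_choices)
print(translate(aligned, "clair", "fra", "deu"))  # ['hell']
```

Because terms are linked through a shared concept rather than a shared spelling, "clair" connects only to "hell" and never to "leicht".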
Using DUCKS and other computational techniques, we can in principle convert the 1.3 billion raw PanLex connections into tens of billions of valid translations among all the world's languages. We are ready to blast off technically, and are now seeking visionary funders who can light the fuse.
These are the languages for which we have datasets that we are actively working toward putting online. Languages that are Active for you to search are marked with "A" in the list below.
•A = Active language, aligned and searchable
•c = Data 🔢 elicited through the Comparative African Word List
•d = Data from independent sources that Kamusi participants align playing 🐥📊 DUCKS
•e = Data from the 🎮 games you can play on 😂🌎🤖 EmojiWorldBot
•P = Pending language, data in queue for alignment
•w = Data from 🔠🕸 WordNet teams
We are actively creating new software for you to make use of and contribute to the 🎓 knowledge we are bringing together. Learn about software that is ready for you to download or in development, and the unique data systems we are putting in place for advanced language learning and technology:
Our biggest struggle is keeping Kamusi online and keeping it free. We cannot charge money for our services because that would block access for the very people we most want to benefit: the students and speakers of languages around the world that are almost always excluded from information technology. So we ask, request, beseech, and beg you to please support our work by donating as generously as you can to help build and maintain this unique public resource.
Answers to general questions you might have about Kamusi services.
We are building this page around real questions from members of the Kamusi community. Send us a question that you think will help other visitors to the site, and we will often post the answer here.