Artificial intelligence is an ongoing quest for computer science, and nowhere more so than in understanding and producing natural language. Great resources have been directed toward "machine learning", where computers learn from reactions to previous results. If 51% of people asking in English for "weather in Naples" click on the result for the city in Florida, the machine learns to provide the US answer by default. Someone using those results to pack for a trip to Italy will understand why this real-world example of learning might not be so intelligent. Yet statistically based inference remains the primary method upon which translation models are predicated.
At Kamusi, we aim to learn first directly from people, and to learn from patterns only when we have confidence they apply. That is, if we come across the party term (a multiword expression) "bust a gut" and it isn't already in our database, we don't blithely tell our users that it means "lacerate one's intestine". Rather, using Pre-D, the user indicates that the words play together to form a unique meaning. From the crowd, we can find out what that meaning is, and how the term is expressed in other languages. This human-derived knowledge, validated by procedures to find community consensus, can then be fixed as fact that serves as the basis for intelligent inferences. Machine learning can be applied to the repetitive aspects of language that leave little room for nuance, such as "busted a gut" being the past tense of the phrase. Or we can use learning algorithms to propose connections for human testing, such as whether a Hausa phrase that has been matched to a German phrase, which in turn has been matched to our English phrase, produces a legitimate Hausa <-> English pair. Because the 7000 languages that could potentially be in Kamusi yield roughly 25 million translation pairs, we cannot possibly find qualified bilinguals to verify direct connections among all of them, or to confirm every prediction we make about grammatical patterns. However, the more we learn directly from people, the more confident we can be in the inferences that follow. Our objective with machine learning, then, is to find potential facts that can be verified over time by real people, and then used for the rest of time as a confirmed knowledge base and an iterative seed for further learning. The more human intelligence we can pack into our language computation infrastructure, the less artificial the results of "artificial intelligence" will be, and the more likely we will be to bring the right wear for our linguistic travels.
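The pair count and the pivot idea in the paragraph above can be illustrated with a short Python sketch. It assumes a simple in-memory graph of human-verified links between (language, expression) terms; the names `pair_count`, `build_graph`, and `pivot_candidates`, the language codes, and the placeholder phrases `HAUSA_PHRASE` and `GERMAN_PHRASE` are hypothetical and do not describe Kamusi's actual data model. It counts unordered language pairs as n(n - 1)/2 and proposes Hausa <-> English candidates that share a verified German pivot, treating each candidate as a hypothesis for bilingual reviewers rather than a fact.

```python
from collections import defaultdict
from math import comb

# Hypothetical term representation: (language code, expression).
Term = tuple[str, str]


def pair_count(n_languages: int) -> int:
    """Unordered language pairs: n * (n - 1) / 2."""
    return comb(n_languages, 2)


def build_graph(verified_links: list[tuple[Term, Term]]) -> dict[Term, set[Term]]:
    """Turn human-verified translation links into a symmetric adjacency map."""
    graph: dict[Term, set[Term]] = defaultdict(set)
    for a, b in verified_links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def pivot_candidates(graph, source_lang: str, target_lang: str) -> set[tuple[Term, Term]]:
    """Propose source <-> target matches that share one verified pivot term.

    Pairs that people have already linked directly are skipped. Every result
    is only a hypothesis to send to bilingual reviewers, not a confirmed fact.
    """
    candidates = set()
    for term, neighbours in graph.items():
        if term[0] != source_lang:
            continue
        for pivot in neighbours:
            # The pivot must be in a third language (e.g. German).
            if pivot[0] in (source_lang, target_lang):
                continue
            for far in graph.get(pivot, ()):
                if far[0] == target_lang and far not in neighbours:
                    candidates.add((term, far))
    return candidates


# Hypothetical verified links: Hausa <-> German and German <-> English.
links = [
    (("hau", "HAUSA_PHRASE"), ("deu", "GERMAN_PHRASE")),
    (("deu", "GERMAN_PHRASE"), ("eng", "bust a gut")),
]
graph = build_graph(links)
print(pair_count(7000))  # 24496500
print(pivot_candidates(graph, "hau", "eng"))
```

For 7000 languages this gives 24,496,500 unordered pairs (about 49 million if each translation direction is counted separately), which matches the rough figure of 25 million. Once people confirm a direct Hausa <-> English link, the same candidate is no longer proposed, so reviewers only see new hypotheses.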
These are the languages for which we have datasets that we are actively working toward putting online. Languages that are Active for you to search are marked with "A" in the list below.
•A = Active language, aligned and searchable
•c = Data elicited through the Comparative African Word List
•d = Data from independent sources that Kamusi participants align by playing 🐥📊 DUCKS
•e = Data from the games you can play on 😂🌎🤖 EmojiWorldBot
•P = Pending language, data in queue for alignment
•w = Data from WordNet teams
We are actively creating new software that lets you use, and contribute to, the knowledge we are bringing together. Learn about software that is ready to download or still in development, and about the unique data systems we are putting in place for advanced language learning and technology:
Our biggest struggle is keeping Kamusi online and keeping it free. We cannot charge money for our services, because that would shut out the very people we most want to benefit: the students and speakers of languages around the world that are almost always excluded from information technology. So we ask, request, beseech, and beg you to please support our work by donating as generously as you can to help build and maintain this unique public resource.
Answers to general questions you might have about Kamusi services.
We are building this page around real questions from members of the Kamusi community. Send us a question that you think will help other visitors to the site, and we will often post the answer here.