People often ask, "What is the difference between Kamusi and other projects that deal with words in many languages?" Here, we answer that question with regard to the Open Multilingual Wordnet (OMW).
OMW is a fantastic resource that makes complete data for many individual Wordnet projects freely available to the public. We use its search features on a regular basis, and recommend you do the same. In addition, the files we used as the initial source for certain languages in Kamusi were downloaded directly from the OMW, benefiting from their labor formatting diverse files from the original language providers. If you wish to use those files, please follow our links to the OMW or original project sources where the data resides. We do not publish or redistribute Wordnets as such, unless requested by their originators or produced within the Kamusi system. We try to credit OMW on all language pages that have gained from their groundbreaking efforts.
There are, however, several important differences between OMW and Kamusi.
Wordnets are the endpoint for OMW, and the starting point for Kamusi. That is, the objective of OMW is to publish Wordnets as they are produced by their teams, or an extended form that is automatically constructed from Wiktionary. The objective for Kamusi is to publish comprehensive data about a language, mixing a variety of sources in an ever-continuing effort to expand and perfect the data. To that end, Wordnet data may be modified in Kamusi (for example, a definition might be changed) or deleted, additional data elements might be added (for example, plural forms or gender information), and new entries merged from entirely different sources. Kamusi is a "living" dictionary that changes over time, so we do not promise that you will find any given Wordnet in a fixed form. In fact, we hope for quite the opposite, and encourage you to visit OMW if you want the original.
OMW makes data available to the public in a grid format that shows all its languages together. Kamusi does not offer this service, though it is a good idea that we might emulate in some form in the future (probably by allowing the user to choose the languages that are shown in the grid, up to and including all languages). The OMW is the best existing method we have found to visualize how the different Wordnets relate to one another.
OMW and Kamusi both offer the option to display results for a search term between two languages. Differences in presentation include the ability to click on a result to launch a reverse search in Kamusi, and the inclusion of English pivot information from the Princeton Wordnet (PWN) in every result where PWN has been used as the basis for linking the two other languages. Individual users might like one approach better than the other. Try them both, and use the one you prefer. Here's a search on Kamusi; to conduct the same search on OMW, enter "casa" and select Spanish to Basque. If you want to share a search result directly as a hyperlink, in an email, or on social media, you'll need to use Kamusi. Keep in mind that certain languages might be available in only one or the other.
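To illustrate the idea of linking two languages through an English pivot, here is a minimal sketch in Python. It is not the code used by Kamusi or OMW. The concept identifier, the language codes, the table layout, and the function name `translate_via_pivot` are all assumptions invented for this example, and the data is a toy sample.

```python
from typing import Dict, List

# Toy data, invented for illustration only. Each language maps its words
# to the identifiers of the pivot concepts they express.
WORD_TO_CONCEPTS: Dict[str, Dict[str, List[str]]] = {
    "spa": {"casa": ["concept-house"]},
    "eus": {"etxe": ["concept-house"]},
}

# English pivot label for each concept (hypothetical identifiers).
PIVOT_LABEL: Dict[str, str] = {"concept-house": "house"}


def translate_via_pivot(term: str, source: str, target: str) -> List[Dict[str, str]]:
    """Return target-language words that share a pivot concept with `term`."""
    results = []
    source_concepts = WORD_TO_CONCEPTS.get(source, {}).get(term, [])
    for concept in source_concepts:
        for word, concepts in WORD_TO_CONCEPTS.get(target, {}).items():
            if concept in concepts:
                results.append(
                    {
                        "target": word,
                        "concept": concept,
                        "pivot": PIVOT_LABEL.get(concept, ""),
                    }
                )
    return results


if __name__ == "__main__":
    # Spanish "casa" to Basque, keeping the English pivot in the result.
    for hit in translate_via_pivot("casa", "spa", "eus"):
        print(hit["target"], "via", hit["pivot"])
```

Because each result carries the pivot label alongside the target word, a display can show the English concept that connects the two languages, and a reverse search is the same call with the source and target swapped.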
OMW and Kamusi use different Wordnets for French. OMW uses WOLF and Kamusi uses WoNeF. Both are imperfect. In our assessment, WoNeF quality is high enough for public use, whereas WOLF has too many errors to publish. We welcome Francophones to join us in improving WoNeF data.
Kamusi has the objective of eliciting the contributions of users in improving the data for their language. These features are being rolled out slowly to registered users. We invite and beseech you to participate. Once new data from the public is validated, it is published on top of the Wordnet data. For the pure Wordnet form, the user is advised to consult OMW or the original source.
Kamusi makes data available in many different formats. The full dictionary is available to search on the Web, on Android and iOS devices, and as a bot on a variety of messaging and social media platforms. In addition, data is embedded in a variety of our other tools. For example, our visual dictionary often deploys data from Wordnet when showing image tags in the user's chosen language. OMW is primarily concerned with sharing data in an uncomplicated form. Many Kamusi products use the data to underlie more complicated user services.
PWN is structured around a deep set of ontological relationships between concepts. In brief, if a beak is a part of a bird and a wing is also part of a bird, that logical connection is charted in PWN. OMW tracks these ontological relationships in English. Kamusi does not currently do so, though we have it on our task list. We intend to build interesting applications on top of the Wordnet and other ontologies when we have the horsepower. For example, expect a Chinese logical dictionary that leads the user to the entry for wing when they search for beak, without needing to know the character for wing in advance. At present (January 2018), though, the best place to access Wordnet ontologies is the OMW.
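The beak and wing example can be sketched in a few lines of Python. This is only an illustration of how part-of relationships let a search for one term lead to another. It does not reflect PWN's actual data format or any Kamusi software, and the names `PART_OF` and `related_parts` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Toy part-of relations taken from the example in the text:
# (part, whole) pairs. A real ontology holds many more relations.
PART_OF: List[Tuple[str, str]] = [
    ("beak", "bird"),
    ("wing", "bird"),
]


def build_indexes(pairs: List[Tuple[str, str]]):
    """Index the relations in both directions for quick lookup."""
    wholes_of: Dict[str, Set[str]] = defaultdict(set)
    parts_of: Dict[str, Set[str]] = defaultdict(set)
    for part, whole in pairs:
        wholes_of[part].add(whole)
        parts_of[whole].add(part)
    return wholes_of, parts_of


def related_parts(term: str, pairs: List[Tuple[str, str]] = PART_OF) -> List[str]:
    """Return the other parts of every whole that `term` belongs to."""
    wholes_of, parts_of = build_indexes(pairs)
    siblings: Set[str] = set()
    for whole in wholes_of.get(term, set()):
        siblings |= parts_of[whole]
    siblings.discard(term)  # a term is not its own sibling
    return sorted(siblings)


if __name__ == "__main__":
    # Searching for "beak" leads to "wing" through the shared whole "bird".
    print(related_parts("beak"))
```

In a multilingual setting, the terms would be concept identifiers rather than English words, so the same lookup could lead a user from one entry to a related entry in any aligned language.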
Kamusi is mixing in languages that are not part of the Global Wordnet, and also designing systems to serve these new languages back as stand-alone Wordnets. We seek datasets from numerous languages, and process them through our DUCKS system so that concepts are aligned across languages. Wordnet is the source of many concepts for the alignment system, and we maintain those references even if we revise the original entry. Thus, over time, Kamusi will meld Wordnet data with many languages from outside the Wordnet community. In addition, Kamusi is bringing in data in current Wordnet languages from sources that are not part of Wordnet, such as terminology banks, Wiktionary, and repositories of named entities (people, places, and organizations). Each source will be credited on pages linked to the entries where their data appears. This expanded data is core to Kamusi's long term mission, and outside the scope of OMW.
In sum, OMW does some things that Kamusi does not, Kamusi does some things that OMW does not, and some things are done by both. We encourage you to use both services and take advantage of the different ways they may provide you with the answers you need.
These are the languages for which we have datasets that we are actively working toward putting online. Languages that are Active for you to search are marked with "A" in the list below.
• A = Active language, aligned and searchable
• c = Data 🔢 elicited through the Comparative African Word List
• d = Data from independent sources that Kamusi participants align playing 🐥📊 DUCKS
• e = Data from the 🎮 games you can play on 😂🌎🤖 EmojiWorldBot
• P = Pending language, data in queue for alignment
• w = Data from 🔠🕸 WordNet teams
We are actively creating new software for you to make use of and contribute to the 🎓 knowledge we are bringing together. Learn about software that is ready for you to download or in development, and the unique data systems we are putting in place for advanced language learning and technology:
Our biggest struggle is keeping Kamusi online and keeping it free. We cannot charge money for our services because that would block access to the very people we most want to benefit: the students and speakers of languages around the world that are almost always excluded from information technology. So we ask, request, beseech, and beg you to please support our work by donating as generously as you can to help build and maintain this unique public resource.
Answers to general questions you might have about Kamusi services.
We are building this page around real questions from members of the Kamusi community. Send us a question that you think will help other visitors to the site, and we will often post the answer here.