In a new collaboration between «Tilde», a language technology company, and FOLD, a platform for creative industries, the content accumulated on FOLD's website will become a language corpus of more than 10,500 parallel sentences in Latvian and English. The corpus will be published on open data portals, helping to improve machine translation technology for creative industries.
Machine translation, the automatic translation of text, has become an indispensable aid in reducing language barriers and translating more efficiently. Strong machine translation support is already available for major European languages such as English, Spanish and French, but the Latvian language can also boast high-quality language technologies. In the age of Big Data, the sharing of data and information creates a fertile environment for new ideas and the development of technology. Online media content is valuable data that can be used, for example, to improve language technologies.
«Machine translation systems learn from language corpora that consist of word and sentence pairs; for example, a sentence in Latvian is paired with its translation in English. The larger and more diverse the language data the system learns from, the more accurately it can translate. Therefore, collecting language corpora is an essential part of developing language technologies, and identifying and sharing translated content is becoming the norm in today’s content circulation,» explains Roberts Rozis, language resource manager at «Tilde». «FOLD publishes topical content on Latvian and foreign creative industries, and the Latvian and English versions of that content match closely. The accumulated linguistic data is therefore an outstanding resource for building a language corpus. Processing it will yield more than 10,500 parallel sentences.»
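The sentence pairing described above can be illustrated with a minimal Python sketch. It is not «Tilde's» actual pipeline: the function names `build_parallel_corpus` and `to_tsv`, the tab-separated output format and the sample sentences are all assumptions made for illustration, and the sentences are not taken from FOLD's content. The sketch also assumes the two sides are already aligned sentence by sentence, which in practice is a separate step.

```python
import csv
import io


def build_parallel_corpus(lv_sentences, en_sentences):
    """Pair sentences by position, dropping empty and duplicate pairs.

    Hypothetical helper: assumes both lists are already sentence-aligned.
    """
    if len(lv_sentences) != len(en_sentences):
        raise ValueError("both sides must have the same number of sentences")
    seen = set()
    pairs = []
    for lv, en in zip(lv_sentences, en_sentences):
        pair = (lv.strip(), en.strip())
        # Skip pairs with an empty side and pairs already collected.
        if not pair[0] or not pair[1] or pair in seen:
            continue
        seen.add(pair)
        pairs.append(pair)
    return pairs


def to_tsv(pairs):
    """Serialise the pairs as tab-separated text, one pair per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(pairs)
    return buffer.getvalue()


# Illustrative sample data only, not from the FOLD corpus.
lv = ["Labdien!", "Paldies!", "Paldies!", "  "]
en = ["Hello!", "Thank you!", "Thank you!", "Empty"]

corpus = build_parallel_corpus(lv, en)
print(to_tsv(corpus), end="")
```

Each line of the output holds one Latvian sentence and its English translation, which is the kind of paired data a translation system is trained on.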
«Tilde» works on machine translation systems as well as spell-checking, speech recognition and speech synthesis tools. The technology is built on neural networks and other artificial intelligence methods, which are trained on large amounts of data. Popular machine translation systems like Google Translate are well versed in everyday words and expressions and focus on widely spoken languages, whereas «Tilde» is working to make machine translation available for the Latvian language, including for specific industries with their own terminology and usage. For this reason, «Tilde» regularly looks for cross-sectoral partners who are ready to share their linguistic data.
«From the outset we decided that we would create FOLD content in both Latvian and English, so that foreigners could read about Latvian creative industries, and that we would ensure that the written language is correct. In creative industries, where new terms come in from English almost every day, it is often challenging to compose comprehensible sentences in Latvian. The fact that our translated articles have also proven useful for the development of machine translation systems is a positive assessment of the quality of FOLD’s texts, and we are very pleased to be able to help improve Latvian language technologies,» says Evelīna Ozola, founder of FOLD.
«Tilde» also invites other companies and organisations to share their linguistic data and participate in the development of machine translation technology for the Latvian language.