Meta has unveiled a new multimodal and multilingual AI translation model that aims to make translation on the go effortless.
The model, called SeamlessM4T, handles both speech and text across many languages. It supports:
- Speech recognition for nearly 100 languages
- Speech-to-text translation for nearly 100 input and output languages
- Speech-to-speech translation, supporting nearly 100 input languages and 36 (including English) output languages
- Text-to-text translation for nearly 100 languages
- Text-to-speech translation, supporting nearly 100 input languages and 35 (including English) output languages
Meta is releasing the model, which researchers and developers will be able to use in their own apps after obtaining a license. Meta describes SeamlessM4T as an “all-in-one system that performs multiple tasks across speech and text.”
Alongside the model, Meta is also releasing the metadata of ‘SeamlessAlign,’ which it says is the biggest open multimodal translation dataset, with 270,000 hours of mined speech and text alignments.
Conventional translation systems chain several separate models together to provide this functionality, whereas SeamlessM4T can reportedly do it all with a single model. “Building a universal language translator, like the fictional Babel Fish in The Hitchhiker’s Guide to the Galaxy, is challenging because existing speech-to-speech and speech-to-text systems only cover a small fraction of the world’s languages. But we believe the work we’re announcing today is a significant step forward in this journey,” Meta wrote in its blog post.
The company also stated that the model’s single-system approach reduces errors and delays and improves the efficiency and quality of translations.
Image credit: Meta