Machine learning could help read languages that still haven’t been deciphered, revealing things we never knew about history
In recent years, machine learning has significantly transformed the study of linguistics, thanks to the availability of huge annotated databases and techniques for having machines learn from them. As a result, machine translation from one language to another is now commonplace. Though machine output is not always perfect, machine learning has given linguistics a fresh perspective. Given enough text, machines can learn almost any language or dialect from across the world and help translate to and from it.
This ability is now being extended to ancient languages that academics, historians and linguists have not yet been able to decipher: languages that were spoken, written or both many centuries BCE. Jiaming Luo and Regina Barzilay from the Massachusetts Institute of Technology (MIT) and Yuan Cao from Google’s AI Lab in Mountain View, California, have developed a machine learning system capable of deciphering lost languages. They have conducted numerous trials and have successfully deciphered a language found on a relic discovered by the British archaeologist Arthur Evans in 1886. This language dates back to approximately 1400 BCE. Their approach, though, was slightly different from the usual machine learning techniques.
Generally, machine translation is powered by the understanding that words are related to each other in similar ways, irrespective of what language is involved. The process for machine learning begins with mapping out the relations for a specific language, which would require huge databases of text. A machine then searches this database to pick out how often each word appears next to every other word. This pattern of appearances is identified as a unique signature that would define the word in a multi-dimensional parameter space. The word can then be thought of as a vector within this space. And this vector acts as a powerful constraint on how the word can appear in any translation the machine comes up with. These vectors obey some simple mathematical rules. For example: king – man + woman = queen. And a sentence can be thought of as a set of vectors that follow one after the other to form a kind of trajectory through this space.
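To make the idea of a word's "signature" concrete, here is a minimal sketch in Python. The three-sentence corpus, the one-word window and the hand-picked two-dimensional vectors are all assumptions chosen for illustration; real systems learn vectors with hundreds of dimensions from enormous text collections.

```python
from collections import Counter
from math import sqrt


def cooccurrence_vectors(sentences, window=1):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = {}
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            neighbours = words[lo:i] + words[i + 1:hi]
            vectors.setdefault(word, Counter()).update(neighbours)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus (an assumption, not data from the research).
corpus = [
    "the king rules the land",
    "the queen rules the land",
    "the dog chases the cat",
]
vectors = cooccurrence_vectors(corpus)

# "king" and "queen" share the same neighbours, so their signatures match.
king_queen = cosine(vectors["king"], vectors["queen"])
king_dog = cosine(vectors["king"], vectors["dog"])

# Hand-picked toy vectors with axes (royalty, gender), purely illustrative.
toy = {"king": (1, 1), "queen": (1, -1), "man": (0, 1), "woman": (0, -1)}
analogy = tuple(k - m + w for k, m, w in zip(toy["king"], toy["man"], toy["woman"]))
```

In this toy setup "king" is more similar to "queen" than to "dog", and the arithmetic king – man + woman lands exactly on the vector chosen for "queen". With learned vectors the result is only approximately equal, so the nearest word is taken instead.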
The fundamental idea that drives machine translation is that words in different languages occupy the same points in their respective parameter spaces. This makes it possible to map an entire language onto another with a one-to-one correspondence. Translation can then be described as finding similar trajectories through these spaces. During this process, the machine doesn’t really need to understand what a sentence means.
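Mapping one space onto another can be sketched in a few lines. The example below is a deliberately simplified illustration, not the method of any particular translation system: it assumes two-dimensional word vectors, assumes the second space is just a rotated copy of the first, and assumes a small seed list of known word pairs (`seed_pairs`) from which to estimate that rotation. All words and coordinates are hypothetical.

```python
from math import atan2, cos, dist, sin

# Hypothetical 2-D vectors; the Spanish space is the English one rotated by 90 degrees.
english = {"king": (1.0, 1.0), "queen": (1.0, -1.0), "man": (0.0, 1.0), "woman": (0.0, -1.0)}
spanish = {"rey": (-1.0, 1.0), "reina": (1.0, 1.0), "hombre": (-1.0, 0.0), "mujer": (1.0, 0.0)}

# Assumed seed dictionary: a few word pairs whose translations are already known.
seed_pairs = [("king", "rey"), ("man", "hombre")]


def fit_rotation(pairs):
    """Estimate the rotation angle that best carries English vectors onto Spanish ones."""
    cross = sum(english[e][0] * spanish[s][1] - english[e][1] * spanish[s][0] for e, s in pairs)
    dot = sum(english[e][0] * spanish[s][0] + english[e][1] * spanish[s][1] for e, s in pairs)
    return atan2(cross, dot)


def translate(word, angle):
    """Rotate an English vector into the Spanish space and return the nearest Spanish word."""
    x, y = english[word]
    mapped = (x * cos(angle) - y * sin(angle), x * sin(angle) + y * cos(angle))
    return min(spanish, key=lambda s: dist(spanish[s], mapped))


angle = fit_rotation(seed_pairs)
queen_translation = translate("queen", angle)
woman_translation = translate("woman", angle)
```

Here the rotation is fitted only on "king" and "man", yet "queen" and "woman" still land on "reina" and "mujer", because the two spaces share the same geometry. No meaning is consulted at any point.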
In their quest to use machine translation to decipher languages that have been lost entirely, Luo et al. rely on constraints on how languages are known to evolve over time. The principle is that a language can change only in certain ways: the symbols in related languages still appear with similar distributions, related words still have the same order of characters, and so on. Applying these constraints makes a lost language simpler to decipher. For this to work, however, the progenitor language must be known. So, if one can figure out the progenitor of a lost language, the machine translation process being developed can be used to decipher it.
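The flavor of these constraints can be shown with a toy search. This is not the researchers' algorithm; it is a brute-force sketch that assumes a three-symbol lost script, a tiny known vocabulary and a one-to-one symbol mapping, and it uses edit distance as a stand-in for "related words keep the same order of characters". Every word and symbol below is hypothetical.

```python
from itertools import permutations


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute or match
        prev = curr
    return prev[-1]


def decode(word, mapping):
    """Rewrite a lost-script word using a candidate symbol-to-letter mapping."""
    return "".join(mapping[symbol] for symbol in word)


def best_mapping(lost_words, lost_symbols, known_letters, known_vocab):
    """Try every one-to-one mapping and keep the one closest to the known vocabulary."""
    best, best_cost = None, None
    for letters in permutations(known_letters, len(lost_symbols)):
        mapping = dict(zip(lost_symbols, letters))
        cost = sum(min(edit_distance(decode(w, mapping), k) for k in known_vocab)
                   for w in lost_words)
        if best_cost is None or cost < best_cost:
            best, best_cost = mapping, cost
    return best, best_cost


# Hypothetical data: two inscriptions in an unknown script, three known related words.
lost_words = ["123", "231"]
known_vocab = ["tan", "ante", "nat"]
mapping, cost = best_mapping(lost_words, "123", "tan", known_vocab)
```

In this toy case the search settles on 1→t, 2→a, 3→n, reading the inscriptions as "tan" and "ant"; the second is one edit away from the known word "ante", much as a cognate might be. Trying every mapping is only feasible for tiny alphabets, which is why a real system needs something far smarter than exhaustive search.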
One of the biggest challenges that linguists face is fatigue, which is not a problem for machines. In some cases, linguists might even use a trial-and-error approach, attempting to decipher a lost language into every known language. This is feasible only with machine translation; it would be practically impossible through human effort alone.
Machine translation could help us find out much more about ancient civilizations and how things panned out over the centuries. Who knows, with this approach, we might even discover Atlantis!