Over the past decade or so, many of the marquee businesses have been related to the human need/desire to communicate. Facebook, Twitter, and Google led the pack for tech IPOs (initial public offerings). The (apparently) impending multibillion-dollar purchase of Beats by Apple seems to indicate communications technology will continue to be a focus of investment. There is every reason to believe cultural change will continue to be driven by these innovations.
With this in mind, I’ll suggest a communications technology that is a daring project, fitting Neal Stephenson’s “big stuff” challenge. His criteria are these:
- They should be achievable, without violating basic constraints (so no time travel or faster-than-light travel).
- It’s okay, even desirable, that they should require the work of a career to complete.
- They should have a practical purpose.
- It’s okay for the project to be a technological fix to a major human problem.
Against these criteria, I’ll suggest poetic translation as a “big stuff” project. By this I mean automated translation to/from foreign languages that is full, rich, and immediate. We already have what I’d call tourist translation, where small bits of input create understandable, delayed outputs that are flawed and lacking in nuance and style.
The ideal. In the year 2054, I head into a meeting that will be conducted in Arabic. I don’t speak the language, so I dial it into my Fluent system. At the meeting, I understand talks, including the charts. I am able to fully and immediately engage in conversations. There is no misunderstanding.
How? Perhaps I swap a chip into my head. Maybe I inhale a virus. Or it’s a simple Matrix-like download of a new capability. But what I experience is communication that approaches that of a native speaker.
The intermediate. In the year 2024, I head into a meeting wearing headphones and a Google Glass augmented reality system with Fluent installed. Charts are subtitled for me, and I get a slightly delayed simultaneous translation. A colored line in my field of vision indicates the confidence level of the translation. The others in the meeting are similarly equipped, so they get the gist of what I’m saying in near real-time with appropriate cautions. The cautions clue all of us to repeat and ask questions to clarify.
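One way to picture the colored confidence line is as a simple mapping from a score to a color and a caution. The Python below is only a sketch under assumptions of my own: that the translation engine reports a confidence between 0 and 1, and that three bands are enough. The thresholds, colors, wording of the cautions, and the function name `indicator` are all invented for illustration.

```python
from typing import Optional, Tuple

# Hypothetical thresholds; a real system would have to tune these with users.
HIGH = 0.85
MEDIUM = 0.60


def indicator(confidence: float) -> Tuple[str, Optional[str]]:
    """Map a translation confidence (0.0 to 1.0) to a line color and a caution."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence >= HIGH:
        return "green", None  # no caution needed
    if confidence >= MEDIUM:
        return "yellow", "Consider rephrasing."
    return "red", "Ask the speaker to repeat or clarify."


if __name__ == "__main__":
    for score in (0.95, 0.70, 0.30):
        color, caution = indicator(score)
        print(score, color, caution)
```

The point of the bands is the behavior they prompt: a red line tells everyone in the room to slow down and ask questions rather than trust the translation.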
How? An interesting approach Nuance Communications, Inc., took to improving its Dragon Dictate voice recognition system was making it available for free on smartphones. This provided them with a huge number of samples of people speaking English in a variety of accents and dialects. Often, they spoke in common phrases.
Something similar might jumpstart translation. One of the things Turing Award winner John Cocke did to improve automated translation was to take Canadian government records (which, by law, are available in English and French) and use them as a base.
So, given how much content is translated and available digitally nowadays, a huge number of samples could be used as a starting point for translation. Contextual information (who the speaker is, past interests, audience/listener, purpose for coming together, location, stress levels, emotion) could be added to improve the accuracy.
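The idea of using already-translated text as a base can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any production system works: the three sentence pairs are made up, the function names are hypothetical, and the word-by-word estimate is a simplified form of what the literature calls IBM Model 1. It ignores word order, context, and everything else listed above.

```python
from collections import defaultdict

# Made-up parallel corpus (English, French). Real corpora hold millions of pairs.
PAIRS = [
    ("the dog", "le chien"),
    ("the book", "le livre"),
    ("a book", "un livre"),
]


def train_word_table(pairs, iterations=20):
    """Estimate t[(f, e)], roughly P(target word f | source word e), by EM."""
    corpus = [(src.split(), tgt.split()) for src, tgt in pairs]
    # Start with equal weight for every word pair seen in the same sentence pair.
    t = {(f, e): 1.0 for es, fs in corpus for f in fs for e in es}
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for es, fs in corpus:
            for f in fs:
                # Split one unit of credit for f among the source words.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # Re-estimate the table from the expected counts.
        t = {(f, e): c / total[e] for (f, e), c in count.items()}
    return t


def translate_word(t, word):
    """Return the most probable target word and its probability."""
    candidates = {f: p for (f, e), p in t.items() if e == word}
    if not candidates:
        return word, 0.0  # unknown word: pass it through with zero confidence
    best = max(candidates, key=candidates.get)
    return best, candidates[best]


if __name__ == "__main__":
    table = train_word_table(PAIRS)
    for w in "the dog a book".split():
        print(w, translate_word(table, w))
```

Nobody tells the program that “book” means “livre.” It works that out because the two words keep turning up together, which is why the number of samples matters so much. The probability attached to each guess is also a natural source for the kind of confidence indicator described earlier.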
Components. These seem to be essential, though how they are approached may vary:
- Advanced translation engine. Fast, working off a large database, able to detect and adjust to accents and dialects.
- Input device. Audio and video. Able to pick clear signals out of ambient noise and images. For the speaker, able to pick up verbal expressions, not just individual words. For the listener, focused, based on his/her attention (since more than one person may be speaking).
- Output device. Immediate text translation. Near real-time translation of speech with expression, confidence factors (if needed), and accessible references.
- Contextual information. A mix of static data, based on things like scheduling and location, and dynamic data, based on gestures, mood, and environmental changes.
- Personalization. Specific knowledge of speakers and listeners to handle nonstandard word use, nuance, and subtext.
Benefits. Several seem likely:
- Universal participation. Many communities today are isolated because of language. The less broadly your language is spoken, the more limited is the content available to you. And the more difficult it is to have your views known and heard. In principle, Fluent could obliterate this barrier.
- Understanding. As the experiences and ideas of others become available, the potential for correct analysis of their behaviors and the development of empathy may increase. Also, direct communication may decrease the chances of misunderstanding.
- Perspective. Communicating regularly with a more diverse set of people with very different experiences has the potential to broaden personal perspectives on issues, concerns, approaches, and philosophies.
- Questions. Access to other cultures will challenge accepted norms, wisdom, and approaches.
- Knowledge. Whole areas of knowledge – current and historical – will become accessible.
- Collaboration. Jargon, as well as language, should be a subject of translation. This could open the door to conversations and joint projects across different disciplines.
Does this meet Stephenson’s challenge?
- Achievable. Language processing is one of the most intensive and complex activities in the brain. Creating a chip that does this work is probably a stretch. But the hardware and software for an augmented reality system seem to be within reach.
- Career objective. Is this big enough to take decades? A recent TED talk about developing futuristic input/output designs for “Minority Report” seemed wistful because what the designers had imagined as far-future technology became real so quickly. Once the challenge is out there in a way that catches the imaginations of engineers (and businesses), things can happen quickly. (Though not always. Virtual reality continues to lurch forward at an uneven pace.) I suspect getting something better – meaning an improvement dramatic enough to change culture – can be achieved within ten years. But the full expression of the Fluent concept seems to be further off.
- Practical purpose. Yes.
- A technological fix for a major human problem. A great deal of misunderstanding, sometimes tragic, occurs because of differences in language. But conversations, especially those with emotional importance, have subtext. What is said is not what is meant. Connotation is more important than denotation. And language is just one part of culture. The larger context provides meaning. So this is a useful fix, but it will have unintended side effects.
I think there’s also something poetic about matching Stephenson’s example of the Tall Tower with an end to the confusion of languages.