Notes from recent meetings with emerging companies
Developing a new approach to automatically translating text documents from one language to another.
What's new Fluent has just begun describing details of its technology and sketching out a product roadmap.
Profile Getting machines to understand human, or so-called natural, language is one of the great challenges in computer science. Every year, it seems, scientists refine their techniques and get a few steps closer to the perhaps unreachable goal of making the computer as fluent as a human being. As yet, no algorithm solves the problem completely and definitively.
Evidently, the thing that most defines us as human is not easily reducible to a simple transfer or processing of some mathematical stuff called information.
Machine translation of written texts, as compared to getting computers to interpret and act on written or spoken commands in real-time, might appear to be a fairly easy task. After all, the computer can take its time, within reason, in analyzing a document and creating its translation. And with documents of any size, lots of clues as to meaning and context would seem to exist in the text itself.
Sure enough, there are a number of machine translation programs commercially available in the form of desktop and enterprise software products and as Web-based services. But their accuracy rate generally hovers in the 70% range. That may suffice for the quick turnaround of a technical product data sheet, say, but in most cases, the software's output needs cleaning up by people familiar with the target language. And that costs money.
Fluent Machines claims to have come up with a breakthrough in machine translation, or MT, as it's often called. Eli Abir, its chief technologist, has devised an enhancement to the so-called example-based approach to translation. In essence, this approach involves building a large database of sentence pairs, each in a different language but with equivalent meaning: "I closed the door" and "J'ai fermé la porte," for instance.
Through a process of searching this database of examples, matches can be found for new sentences and sentence fragments, and fairly accurate translations can be constructed automatically. If needed, a set of grammatical rules can be applied to refine the initial translation and produce a considerably more accurate text. Much research in example-based translation has been done at Carnegie-Mellon University, led by Jaime Carbonell, an old hand in artificial intelligence. Numerous other universities have looked into it, too, often with funding from the military.
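To make the example-based idea concrete, here is a minimal Python sketch of the lookup step. It is our own illustration, not Fluent's code or Carnegie-Mellon's: only the first sentence pair comes from the text above, the other two pairs are invented for the demonstration, and the function name best_example and the word-overlap scoring (borrowed from Python's standard difflib module) are assumptions.

```python
from difflib import SequenceMatcher

# Toy example database of (English, French) sentence pairs.
# Only the first pair appears in the article; the others are illustrative.
EXAMPLES = [
    ("I closed the door", "J'ai fermé la porte"),
    ("I opened the door", "J'ai ouvert la porte"),
    ("I closed the window", "J'ai fermé la fenêtre"),
]


def best_example(sentence, examples=EXAMPLES):
    """Return (score, source, target) for the stored pair closest to `sentence`.

    Similarity is measured on word sequences, so the score reflects how many
    words the new sentence shares, in order, with each stored source sentence.
    """
    words = sentence.lower().split()
    scored = [
        (SequenceMatcher(None, words, src.lower().split()).ratio(), src, tgt)
        for src, tgt in examples
    ]
    return max(scored, key=lambda item: item[0])


score, source, target = best_example("She closed the door")
print(f"closest example: {source!r} -> {target!r} (similarity {score:.2f})")
```

A real system would go on to patch the retrieved translation (here, swapping the subject) and would search fragments as well as whole sentences; the sketch stops at retrieval.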
Mr. Abir's insight into the problem is twofold. The first part centers on an improved way of determining how to connect translated sentence fragments into proper sentences. Second, he seems to have come up with a better way of undertaking the extensive, automated statistical analysis of large volumes of matching texts that's required to find and match those fragments. Previous example-based systems, we understand, stored every matching pair of sentences that they were given, thus creating a huge, unwieldy database. But Mr. Abir has figured out a way to store only what's useful and thereby reduce the size of that database and improve the speed and usefulness of searches in it.
Instead of seeking to match complete sentences, Mr. Abir's algorithm focuses on determining the frequency of practically every word string in each of a pair of matching
Clearly, words that match in meaning don't necessarily show up in exactly the same sequential order in two corresponding sentences written in different languages. But Mr. Abir figures that it's quite possible for a computer to identify the words (and strings of words) that each word typically appears in conjunction with. Boxers tend to punch other boxers and win fights, for instance, while dogs bark and airplanes fly, land, and crash. This kind of association eventually leads to a long list of matching words and sentence fragments.
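The counting step can be sketched in a few lines of Python. This is a simplified illustration of word-string co-occurrence counting in general, not Mr. Abir's algorithm, whose details Fluent has not published; the three sentence pairs, the two-word limit on string length, and the names ngrams and cooccurrence are all assumptions made for the example.

```python
from collections import Counter
from itertools import product

# Hypothetical aligned sentence pairs (English, French) for illustration.
PAIRS = [
    ("the dog barks", "le chien aboie"),
    ("the dog sleeps", "le chien dort"),
    ("the plane lands", "l'avion atterrit"),
]


def ngrams(words, max_len=2):
    """Return every contiguous word string of 1..max_len words."""
    return [
        " ".join(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    ]


def cooccurrence(pairs, max_len=2):
    """Count how often each source word string shares a pair with each target string."""
    counts = Counter()
    for src, tgt in pairs:
        src_strings = set(ngrams(src.split(), max_len))
        tgt_strings = set(ngrams(tgt.split(), max_len))
        # Each (source string, target string) combination is counted once per pair.
        counts.update(product(src_strings, tgt_strings))
    return counts


counts = cooccurrence(PAIRS)
print(counts[("dog", "chien")], counts[("dog", "aboie")], counts[("plane", "chien")])
```

Raw counts alone are not enough: "the" also turns up alongside "chien" in two pairs here. A working system would have to normalize by how common each string is overall, and would need far more text, before the counts pointed reliably at true equivalents.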
The assumption is that in any language at any given moment, there is in current use a finite number of these DNA-like "blocks of meaning." The trick is identifying them somehow without manually poring over vast amounts of text. Mr. Abir estimates this number at between 1 billion and 5 billion blocks for modern tongues like English and German. That's a big number, no doubt, but by analyzing enough document pairs that are known to be good translations of each other, he reckons, his software should be able to identify the different word strings in each language that carry the same meaning. And, what's more, the code can determine how these chunks typically fit together when used properly.
Fluent's is a purely statistical approach, with no regard for grammatical or syntactical rules or the actual semantics of words. That's not entirely new, we gather; though relatively young, statistical MT has its own rich history. But by virtue of the way Fluent does its analysis, company officials argue, the more matching texts the system is fed, the better it can translate texts in any of the languages it handles. In other words, its English-to-French abilities will improve not only with every new English-and-French text pair it's given but also with new German-to-French pairs. And it should improve incrementally with every contribution of a new translation, including those supplied by human translators who refine Fluent-produced texts and feed the system their corrections.
So far, Fluent tells us, the firm has only just finished prototypes of its database builder and word-string connector programs, the two key algorithms it has developed. And now, the company is scrambling to find translated document pairs to analyze. Thousands of these are available on the Web, of course, but the company hopes to strike deals with government agencies, including the military, and explore other sources.
Having reviewed Fluent's technology, Mr. Carbonell has given it a generous endorsement: "...clearly the most promising and theoretically important MT development in the past several years." We'd be considerably more impressed with this assessment if we weren't aware that the Carnegie-Mellon professor has also joined Fluent's board of directors. Still, we have to believe Fluent is on to something if someone of Mr. Carbonell's stature actually joins the firm.
Estimates of the document translation market vary, but the rapid globalization of markets is making it imperative to get product manuals, website content, and other documentation translated as quickly and with as little cost as possible. Even e-mail needs translating, preferably in near real-time. Fluent sees worldwide language-translation revenues of $5.7 billion in 2001 growing to $7.6 billion by 2006, with MT's portion growing from $73 million to $117 million over the same period. That's not exactly the biggest potential market we've heard touted by an entrepreneur, but the implication is significant improvements in MT's accuracy would tend to give MT a bigger slice of the pie. And, it could well drive demand for new applications not considered economical right now: translating daily newspapers, for instance, and providing access to
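For readers who want to check the arithmetic behind Fluent's forecast, a short Python sketch follows. The dollar figures are the ones Fluent cites above; the helper names share and cagr and the choice of compound annual growth as the yardstick are ours.

```python
# Fluent's cited forecast, in U.S. dollars.
TOTAL_REVENUE = {2001: 5.7e9, 2006: 7.6e9}   # all language translation
MT_REVENUE = {2001: 73e6, 2006: 117e6}       # machine translation only


def share(year):
    """MT's fraction of total translation revenue in the given year."""
    return MT_REVENUE[year] / TOTAL_REVENUE[year]


def cagr(start, end, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


for year in (2001, 2006):
    print(f"MT share in {year}: {share(year):.2%}")

print(f"total market growth per year: {cagr(TOTAL_REVENUE[2001], TOTAL_REVENUE[2006], 5):.1%}")
print(f"MT growth per year: {cagr(MT_REVENUE[2001], MT_REVENUE[2006], 5):.1%}")
```

On these numbers MT stays below 2% of the market throughout, moving from roughly 1.3% to roughly 1.5%, even though it grows faster than the market as a whole (about 10% a year against about 6%).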
Fluent plans to pursue a service-based business model, with availability starting within a year. The choice of that model is dictated largely by the fact that the core database will be quite large (tens, maybe hundreds, of gigabytes) and the actual translation of new texts will consume great amounts of memory and processing capacity. Company officials tell us that they can foresee an enterprise version of the system eventually being produced for installation on dedicated sets of servers. But for now, the plan is to translate documents submitted over the Web to a central site. This is the business model already in place at WorldLingo, which retains banks of trained translators who work on their own and with the aid of MT systems. The firm can call on translators who have expertise not only in specific languages but also in selected subject domains. (We submitted this article for machine translation into Spanish at WorldLingo's website. The Spanish text came back with the option of having a human translator touch it up for $110.)
Fluent's background isn't typical for the technology industry. It is actually one of two wholly owned subsidiaries of an entity called Meaningful Machines. The sister startup is called Internet Driver, which is also commercializing technology conceived by Mr. Abir. Internet Driver offers a browser plug-in and hosted service that together enable non-English-speakers to navigate the Web in their own languages and character sets. The Web's URLs are available only in English. Fluent has exclusive license to patents filed by Meaningful that cover use of Mr. Abir's technology for human language translation apps. Mr. Abir is described as an inventor who served in the Israeli army and then owned three restaurants and other small businesses in the U.S. Fluent's backer, Apple Core Holdings, is a real-estate company and hotel operator in New York that, in the '90s, began investing in early-stage tech firms such as Register.com, GoAmerica, Cryptek, and Javelin Technologies.
An intriguing, offbeat story, we believe. Now, the challenge will be to create a robust, commercial product and to convince investors that there's something worthwhile here. Last year, another MT system, called Gedanken and developed by a New York outfit called Applied Knowledge Systems, fell off the map when a planned takeover by The Translation Group fell through. (We notice that Translation, publicly listed, itself seems to have gone belly up; its phone is disconnected and website unreachable.) Even if it's as technically advanced as Mr. Carbonell insists, Mr. Abir's idea won't necessarily translate into profits right away.
Upside Rampant globalization is making rapid and low-cost language translation a must for many companies and agencies.
Downside The technology is still unproven, particularly from a commercial point of view. Though Fluent says its approach can potentially increase in accuracy to 99%-plus over time, it's not clear how accurate it will be upon first delivery, or how fast it will operate.
CEO and Chairman Steve Klein, chairman and CEO of Apple Core.
HQ New York
Financing $4.1 million in one round
Investors Apple Core Holdings
ComputerLetter is a trademark and service mark of Technologic Partners
©2002 Technologic Partners