Translation Challenge

Thursday, 2006-08-10; 02:14:00

What with me posting the answer to the Znax challenge, I thought I'd post another interesting question. Well, a scenario, really:

Suppose you're trying to communicate with a race of aliens. You have two machine translators: one of them translates exclusively from your language to the alien language. The other translates exclusively from the alien language to your own. You know that sentences in your language correspond to the same number of sentences in the other language. That is, you can divide up a paragraph into sentence-sized chunks, give them to the first translator separately, and reassemble the translated sentences into a coherent paragraph, with no information being lost.

The two translators you have are perfect when applied to single sentences. If you fed one translator a single sentence, you can be guaranteed that the translated sentence perfectly conveys the meaning you intended in the corresponding language. However, there are an unknown number of flaws in one or both of your translators: you feed the first translator a 1000 sentence document, feed the result to the second translator, and you don't get back what you originally fed to the first translator. Your job is to identify where the problems are. Note that the exact flaw is perfectly reproducible: each time you feed the document through the two translators, you get the exact same flawed result every time.

It's worth noting two additional properties of the translators. First, "flaws" can produce all kinds of different results. It can simply garble a sentence. It can add additional sentences to the translation. Or, it can remove necessary sentences. (Think of what might happen if you were the translator: in the course of reading over the original document, you might misinterpret a sentence. You might misalign your eyesight when jumping to the next line, in which case you might read a sentence or two over again. You might also accidentally skip over a sentence or two because of the same misalignment. You might also skip over whole paragraphs if you translated the document in multiple sessions.)

The second property of the translators is that they work in a strictly linear fashion. That means that if the translator correctly translates the first fifty sentences correctly, a flaw in the translation of the 51st sentence cannot affect the already completed translation of the first 50 sentences. (This would also be true if you were translating a document yourself.)

The problem is, these are prototype translators and are the first of their kind. You don't have any reference translators. If you did, you could simply feed the 1000 sentence document to the first perfect translator, and compare it sentence by sentence to the output of the imperfect translator. You could do the same thing for the second translator as well.

One way to solve the problem would be to simply feed each sentence one by one through the first translator, and then compare the results sentence by sentence to the translation provided when translating the entire document. Since you know your translators are perfect when applied to single sentences, and no information is lost when feeding the translators individual sentences one after another as opposed to feeding them the entire document at once, you can do this comparison. Then, when you are confident that you've found any problems in the first translator, you feed the translated document sentence by sentence through the second translator, and repeat the process.

There is another complication, though. Each time you feed something to one of the translators, it takes a minute to process, regardless of the length of the document. (It's like those annoying copiers that can xerox fifty pages, print 50 copies, and collate them in two minutes flat, but it takes almost 20 minutes just to start them up.) So a 1 sentence document and an 1000 sentence document will both take one minute to process when fed to one of the translators. You obviously want to minimize the amount of time used in fixing the problem. Using the above solution, it takes you 2002 minutes to identify all the problems. (If you had two reference perfect translators, you would only need 4 minutes, but you don't have these references.)

How do you minimize the time it takes to locate all the problems in the two translators? Remember, there are an unknown number of flaws in one or both of your translators.

I'll post the solution in a week. Feel free to discuss in the comments: don't worry about spoiling it for anyone else if you think you have the solution.

Yes, this does have to do with D2X-XL. You'll see why soon enough. The scenario is kind of contrived, and I hope I've taken care of any loopholes with careful wording. Even if the scenario doesn't hold up to scrutiny, though, I do have a highly related story regarding D2X-XL.

Technological Supernova   D2X-XL   Older   Newer   Post a Comment