AI translation of literary texts is "fine", but readers still prefer human translations
Abstract
Human readers prefer human-translated literary works over machine translations, finding the human versions easier, clearer, and more immersive, even though they cannot reliably tell the two apart and automatic machine translation metrics favor the automated versions.
AI translation of literary works is increasingly common. While the content may be rendered adequately, we do not know enough about how readers experience it in terms of immersiveness and literary effect, aspects poorly captured by automatic machine translation metrics or human evaluation targeting fluency and adequacy. We ask 15 avid readers to compare recently published human translations (HT) to machine translations (MT) generated with an agentic large language model (LLM)-based pipeline, for 15 recent novels in French, Polish, and Japanese and translated into English. Readers evaluated approximately 8K-word excerpts in two conditions: immersive reading of the whole excerpt (30 comparisons) and close reading of 386 aligned HT-MT chunk pairs (772 comparisons), with two readers per book and in alternating order of presentation. Overall, readers find MT "fine", but prefer HT (slightly at excerpt-level 19/30, more clearly at chunk-level 522/772) for its ease, clarity, and immersive nature. Readers' highlights show that MT's quality varies more within one book than HT's does. Crucially, readers cannot reliably tell the two apart (17/30 guess correctly) and tend to prefer the version they believe to be human. Automatic metrics, including LLM-as-a-judge approaches, fail to recover reader preferences and favor MT. We release LAIT (Literary AI Translation), a reader-centered evaluation dataset with 1K reader comments, 2K judgments and preference ratings, and 7.2K span-level annotations, along with our evaluation protocol and supporting interface.
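To put the reported counts in perspective, the following minimal sketch converts them into shares and runs a naive exact binomial test against a 50/50 split. This is not the authors' analysis: the helper names (`binom_two_sided_p`, `summarize`) are hypothetical, and the test assumes every comparison is independent, which the study design does not guarantee, since two readers judged each book and chunk pairs are clustered within excerpts.

```python
from math import comb

def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probability of every outcome that is no more likely than
    the observed count k under the null success probability p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(q for q in pmf if q <= threshold))

# Counts taken from the abstract: (successes, total comparisons).
counts = {
    "excerpt-level HT preference": (19, 30),
    "chunk-level HT preference": (522, 772),
    "correct HT/MT identification": (17, 30),
}

def summarize(counts):
    """Return {name: (share, naive two-sided p-value vs. 50/50)}."""
    return {
        name: (k / n, binom_two_sided_p(k, n))
        for name, (k, n) in counts.items()
    }

if __name__ == "__main__":
    for name, (share, p_value) in summarize(counts).items():
        print(f"{name}: {share:.1%} (naive p = {p_value:.3g})")
```

Under these simplifying assumptions, the two excerpt-level counts (19/30 and 17/30) fall within what a coin flip could plausibly produce at that sample size, while the chunk-level preference (522/772) does not. That matches the abstract's wording of a slight preference at excerpt level and a clearer one at chunk level.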
Community
AI translation is continually getting more attention.
Translating a work of fiction has never been as easy or as fast as it is with AI.
So what is lost in AI literary translation?
To answer this question, we asked 15 avid readers to read two English translations of a novel excerpt (~8K words), one by a published literary translator and one by AI.
They evaluated the translations in two setups: (1) immersive reading, in which they read the entire excerpt at once, and (2) close reading, in which they compared shorter passages side by side.
We found that human translation is preferred more often than AI translation and receives higher ratings.
However, during immersive reading, AI translations were still considered readable: more than half received a high rating (4 or 5) for 'willingness to continue reading'.
Close reading, however, showed that the quality of AI translations was inconsistent and could vary significantly even within one book.
Another aspect of the research was to see if readers could successfully detect the AI-translated text.
Only readers who pointed to concrete writing issues (e.g., run-on sentences) succeeded in detecting AI, and some were misled by what they believed to be AI tells, such as em-dashes.
We also introduce the LAIT (Literary AI Translation) dataset, which contains:
- literary excerpts in translation
- 1K readers' comments
- 2K ratings & preference judgments
- 7.2K span-level annotations