# Machine translation
Machine translation is the task of translating text from a source language into a different target language.

Results marked with * report the mean test score over the window of 21 consecutive evaluations with the best average dev-set BLEU score, as in Chen et al. (2018).
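For clarity, the sketch below shows one way that checkpoint-window selection could be implemented. The lists `dev_bleu` and `test_bleu` are hypothetical per-evaluation scores; the windowing logic is an interpretation of the protocol described above, not code from Chen et al. (2018).

```python
# Sketch of the "*" evaluation protocol: pick the window of 21 consecutive
# evaluations with the highest average dev-set BLEU, then report the mean
# test-set BLEU over that same window. dev_bleu/test_bleu are hypothetical
# per-evaluation scores aligned by index.

def windowed_test_score(dev_bleu, test_bleu, window=21):
    assert len(dev_bleu) == len(test_bleu) and len(dev_bleu) >= window
    best_start = max(
        range(len(dev_bleu) - window + 1),
        key=lambda i: sum(dev_bleu[i:i + window]) / window,
    )
    return sum(test_bleu[best_start:best_start + window]) / window

# Usage with made-up numbers:
# windowed_test_score(dev_bleu=[27.1, 27.4, ...], test_bleu=[27.0, 27.2, ...])
```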
### WMT 2014 EN-DE
Models are evaluated on the English-German dataset of the Ninth Workshop on Statistical Machine Translation (WMT 2014), with performance measured by BLEU.
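As an illustration of BLEU scoring, here is a minimal sketch using the sacreBLEU library; this is only an assumption about tooling, since the papers in the table generally report tokenized BLEU with their own scripts, so sacreBLEU numbers are not directly comparable to the scores below. The file names are hypothetical.

```python
# Minimal BLEU-scoring sketch with sacreBLEU (pip install sacrebleu).
# hypotheses.txt and references.txt are hypothetical files with one
# sentence per line, aligned line by line.
import sacrebleu

with open("hypotheses.txt", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]
with open("references.txt", encoding="utf-8") as f:
    references = [line.strip() for line in f]

# corpus_bleu takes a list of hypothesis strings and a list of
# reference streams (one list per reference set).
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.1f}")
```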
| Model | BLEU | Paper / Source |
| ----- | ---- | -------------- |
| Transformer Big + BT (Edunov et al., 2018) | 35.0 | Understanding Back-Translation at Scale |
| DeepL | 33.3 | DeepL Press release |
| Admin (Liu et al., 2020) | 30.1 | Very Deep Transformers for Neural Machine Translation |
| MUSE (Zhao et al., 2019) | 29.9 | MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning |
| DynamicConv (Wu et al., 2019) | 29.7 | Pay Less Attention With Lightweight and Dynamic Convolutions |
| TaLK Convolutions (Lioutas et al., 2020) | 29.6 | Time-aware Large Kernel Convolutions |
| AdvSoft + Transformer Big (Wang et al., 2019) | 29.52 | Improving Neural Language Modeling via Adversarial Training |
| Transformer Big (Ott et al., 2018) | 29.3 | Scaling Neural Machine Translation |
| RNMT+ (Chen et al., 2018) | 28.5* | The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation |
| Transformer Big (Vaswani et al., 2017) | 28.4 | Attention Is All You Need |
| Transformer Base (Vaswani et al., 2017) | 27.3 | Attention Is All You Need |
| MoE (Shazeer et al., 2017) | 26.03 | Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer |
| ConvS2S (Gehring et al., 2017) | 25.16 | Convolutional Sequence to Sequence Learning |
### WMT 2014 EN-FR
Similarly, models are evaluated on the English-French dataset of the Ninth Workshop on Statistical Machine Translation (WMT 2014), with performance measured by BLEU.
| Model | BLEU | Paper / Source |
| ----- | ---- | -------------- |
| DeepL | 45.9 | DeepL Press release |
| Transformer Big + BT (Edunov et al., 2018) | 45.6 | Understanding Back-Translation at Scale |
| Admin (Liu et al., 2020) | 43.8 | Understanding the Difficulty of Training Transformers |
| MUSE (Zhao et al., 2019) | 43.5 | MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning |
| TaLK Convolutions (Lioutas et al., 2020) | 43.2 | Time-aware Large Kernel Convolutions |
| DynamicConv (Wu et al., 2019) | 43.2 | Pay Less Attention With Lightweight and Dynamic Convolutions |
| Transformer Big (Ott et al., 2018) | 43.2 | Scaling Neural Machine Translation |
| RNMT+ (Chen et al., 2018) | 41.0* | The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation |
| Transformer Big (Vaswani et al., 2017) | 41.0 | Attention Is All You Need |
| MoE (Shazeer et al., 2017) | 40.56 | Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer |
| ConvS2S (Gehring et al., 2017) | 40.46 | Convolutional Sequence to Sequence Learning |
| Transformer Base (Vaswani et al., 2017) | 38.1 | Attention Is All You Need |
### WMT22 shared task on Large-Scale Machine Translation Evaluation for African Languages
Go back to the README