Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference. Also, most NMT systems have difficulty with rare words. These issues have hindered NMT’s use in practical deployments and services, where both accuracy and speed are essential. In this work, we present GNMT, Google’s Neural Machine Translation system, which attempts to address many of these issues. Our model consists of a deep LSTM network with 8 encoder and 8 decoder layers using attention and residual connections. To improve parallelism and therefore decrease training time, our attention mechanism connects the bottom layer of the decoder to the top layer of the encoder. To accelerate the final translation speed, we employ low-precision arithmetic during inference computations. To improve handling of rare words, we divide words into a limited set of common sub-word units (“wordpieces”) for both input and output. This method provides a good balance between the flexibility of “character”-delimited models and the efficiency of “word”-delimited models, naturally handles translation of rare words, and ultimately improves the overall accuracy of the system. Our beam search technique employs a length-normalization procedure and uses a coverage penalty, which encourages generation of an output sentence that is most likely to cover all the words in the source sentence. On the WMT’14 English-to-French and English-to-German benchmarks, GNMT achieves results competitive with the state of the art. Using a human side-by-side evaluation on a set of isolated simple sentences, it reduces translation errors by an average of 60% compared to Google’s phrase-based production system.
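The abstract names two beam-search refinements, length normalization and a coverage penalty, without giving formulas. The following is a minimal sketch of how such a rescoring step might look. It assumes the functional forms usually quoted from the full paper, lp(Y) = ((5 + |Y|) / 6)^alpha and cp = beta * sum_i log(min(sum_j p_ij, 1)); these forms, the function names, the default values of `alpha` and `beta`, and the `eps` floor are all assumptions made for illustration, not something stated on this page.

```python
import math


def length_penalty(length, alpha):
    """Assumed form: lp(Y) = ((5 + |Y|) / 6) ** alpha. alpha = 0 disables it."""
    return ((5.0 + length) / 6.0) ** alpha


def coverage_penalty(attention, beta, eps=1e-9):
    """Assumed form: beta * sum_i log(min(sum_j p_ij, 1)).

    attention[j][i] is the attention weight that output step j places on
    source position i. eps is a hypothetical floor that avoids log(0) when
    a source position receives no attention at all.
    """
    num_source = len(attention[0])
    total = 0.0
    for i in range(num_source):
        covered = sum(step[i] for step in attention)
        total += math.log(max(min(covered, 1.0), eps))
    return beta * total


def rescore(log_prob, attention, alpha=0.2, beta=0.2):
    """Length-normalized log-probability plus coverage penalty for one hypothesis."""
    output_length = len(attention)
    return log_prob / length_penalty(output_length, alpha) + coverage_penalty(attention, beta)


# Two hypotheses with the same raw log-probability over a two-word source.
full_coverage = [[1.0, 0.0], [0.0, 1.0]]   # each source word attended once
skips_a_word = [[1.0, 0.0], [1.0, 0.0]]    # second source word never attended

print(rescore(-2.0, full_coverage))
print(rescore(-2.0, skips_a_word))
```

Under these assumptions the hypothesis that never attends to the second source word receives the lower score, which is the behaviour the abstract describes: the penalty favours outputs that cover every source word.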

References in zbMATH (referenced in 27 articles, 1 standard article)

Showing results 1 to 20 of 27.
Sorted by year.

  1. Dash, Tirtharaj; Srinivasan, Ashwin; Baskar, A.: Inclusion of domain-knowledge into GNNs using mode-directed inverse entailment (2022)
  2. Daubechies, I.; DeVore, R.; Foucart, S.; Hanin, B.; Petrova, G.: Nonlinear approximation and (Deep) ReLU networks (2022)
  3. Benigni, Lucas; Péché, Sandrine: Eigenvalue distribution of some nonlinear models of random matrices (2021)
  4. Ding, Man; Han, Congying; Guo, Tiande: High generalization performance structured self-attention model for knapsack problem (2021)
  5. Fan, Angela; Bhosale, Shruti; Schwenk, Holger; Ma, Zhiyi; El-Kishky, Ahmed; Goyal, Siddharth; Baines, Mandeep; Celebi, Onur; Wenzek, Guillaume; Chaudhary, Vishrav; Goyal, Naman; Birch, Tom; Liptchinsky, Vitaliy; Edunov, Sergey; Auli, Michael; Joulin, Armand: Beyond English-centric multilingual machine translation (2021)
  6. Fan, Jianqing; Ma, Cong; Zhong, Yiqiao: A selective overview of deep learning (2021)
  7. Ghods, Alireza; Cook, Diane J.: A survey of deep network techniques all classifiers can adopt (2021)
  8. Hu, Yifan; Hu, Changwei; Tran, Thanh; Kasturi, Tejaswi; Joseph, Elizabeth; Gillingham, Matt: What’s in a name? -- Gender classification of names with character based machine learning models (2021)
  9. Tavares de Sousa, Nelson; Hasselbring, Wilhelm: JavaBERT: Training a transformer-based model for the Java programming language (2021) arXiv
  10. Perez-Beltrachini, Laura; Lapata, Mirella: Multi-document summarization with determinantal point process attention (2021)
  11. Tripathy, Jatin Karthik; Sethuraman, Sibi Chakkaravarthy; Cruz, Meenalosini Vimal; Namburu, Anupama; P., Mangalraj; R., Nandha Kumar; S., Sudhakar Ilango; Vijayakumar, Vaidehi: Comprehensive analysis of embeddings and pre-training in NLP (2021)
  12. Wang, Baoxun; Xu, Zhen; Zhang, Huan; Qiu, Kexin; Zhang, Deyuan; Sun, Chengjie: LocalGAN: modeling local distributions for adversarial response generation (2021)
  13. Xiao, Danyang; Mei, Yuan; Kuang, Di; Chen, Mengqiang; Guo, Binbin; Wu, Weigang: EGC: entropy-based gradient compression for distributed deep learning (2021)
  14. Duarte, Victor; Duarte, Diogo; Fonseca, Julia; Montecinos, Alexis: Benchmarking machine-learning software and hardware for quantitative economics (2020)
  15. Jagtap, Ameya D.; Kharazmi, Ehsan; Karniadakis, George Em: Conservative physics-informed neural networks on discrete domains for conservation laws: applications to forward and inverse problems (2020)
  16. Kool, Wouter; van Hoof, Herke; Welling, Max: Ancestral Gumbel-top-k sampling for sampling without replacement (2020)
  17. Liu, Minliang; Liang, Liang; Sun, Wei: A generic physics-informed neural network-based constitutive model for soft biological tissues (2020)
  18. Sirignano, Justin; Spiliopoulos, Konstantinos: Mean field analysis of neural networks: a law of large numbers (2020)
  19. Tang, Meng; Liu, Yimin; Durlofsky, Louis J.: A deep-learning-based surrogate model for data assimilation in dynamic subsurface flow problems (2020)
  20. Xu, Jiayang; Duraisamy, Karthik: Multi-level convolutional autoencoder networks for parametric prediction of spatio-temporal dynamics (2020)
