Sequence to Sequence Modeling

In this project, we explain sequence-to-sequence modeling using PyTorch. The source code is available on GitHub.


What is the problem?

Machine Translation (MT) is one of the areas of NLP that has been profoundly affected by advances in deep learning. In fact, progress in MT can be divided into the pre-deep learning and deep learning eras. Today, the top-performing machine translation systems are based solely on neural networks, which gave rise to the term Neural Machine Translation (NMT).

When we use the term neural machine translation, we mean applying deep learning techniques to the task of machine translation. It was after the success of neural networks in image classification that researchers started applying them to machine translation. Around 2013, research groups began achieving breakthrough results in NMT and boosted state-of-the-art performance. Unlike traditional statistical machine translation, NMT is based on an end-to-end neural network, which improves the performance of machine translation systems.

We dedicate this project to a core deep learning-based model for sequence-to-sequence modeling, and in particular machine translation: an encoder-decoder architecture based on Long Short-Term Memory (LSTM) networks.
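To make the architecture concrete, here is a minimal encoder-decoder sketch in PyTorch. All sizes and class names are hypothetical choices for illustration, not the repo's exact model: the encoder compresses the source sentence into its final hidden and cell states, and the decoder is initialized with those states and emits one target token per step.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim, hid_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)

    def forward(self, src):                      # src: (batch, src_len)
        embedded = self.embedding(src)           # (batch, src_len, emb_dim)
        _, (hidden, cell) = self.lstm(embedded)  # final states summarize the source
        return hidden, cell

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim, hid_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, tgt_token, hidden, cell):  # tgt_token: (batch, 1)
        embedded = self.embedding(tgt_token)
        output, (hidden, cell) = self.lstm(embedded, (hidden, cell))
        logits = self.out(output.squeeze(1))     # (batch, vocab_size)
        return logits, hidden, cell

# One decoding step on toy data (toy vocabulary sizes 10 and 12)
enc, dec = Encoder(10, 8, 16), Decoder(12, 8, 16)
src = torch.randint(0, 10, (2, 5))               # batch of 2 source sentences
hidden, cell = enc(src)
start = torch.zeros(2, 1, dtype=torch.long)      # assumed start-of-sentence token id
logits, hidden, cell = dec(start, hidden, cell)
print(logits.shape)  # torch.Size([2, 12])
```

At inference time, the decoder loop would feed each step's most likely token back in as the next input until an end-of-sentence token is produced.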

What makes the problem a problem?

Although the scope of sequence-to-sequence modeling is broader than machine translation alone, the main focus of seq2seq research has been MT, due to its great importance in real-world applications. Furthermore, machine translation is a bridge toward universal human-machine conversation.

What is the secret sauce here?

Here, we pursue a few primary goals that we hope make this work unique compared to the many other available tutorials:

1. We call this repo "from scratch" because we do NOT assume any implementation background on the part of the reader.

2. Instead of using high-level package modules, simple RNN architectures are used for demonstration purposes. This helps the reader understand everything from scratch. The downside, however, is relatively slow training. This should not cause any trouble, as we train a very small model.

3. The difference between uni-directional and bi-directional LSTMs has been clarified using the simple encoder-decoder implementation.
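On point 2, "from scratch" means writing out the recurrence yourself rather than calling a ready-made layer. Below is an illustrative single LSTM step in NumPy (the repo's actual code may differ; all dimensions and weight initializations here are hypothetical), unrolled over a short toy sequence:

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4   # hypothetical toy sizes

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix and bias per gate, applied to [h_prev; x] concatenated.
W = {g: rng.normal(size=(hidden_dim, hidden_dim + input_dim)) * 0.1
     for g in ("i", "f", "o", "g")}
b = {g: np.zeros(hidden_dim) for g in ("i", "f", "o", "g")}

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([h_prev, x])
    i = sigmoid(W["i"] @ z + b["i"])   # input gate
    f = sigmoid(W["f"] @ z + b["f"])   # forget gate
    o = sigmoid(W["o"] @ z + b["o"])   # output gate
    g = np.tanh(W["g"] @ z + b["g"])   # candidate cell state
    c = f * c_prev + i * g             # new cell state
    h = o * np.tanh(c)                 # new hidden state
    return h, c

h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
for x in rng.normal(size=(5, input_dim)):  # unroll over 5 time steps
    h, c = lstm_step(x, h, c)
print(h.shape)  # (4,)
```

Writing the loop explicitly is slower than a fused library kernel, which is exactly the speed trade-off mentioned above.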

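On point 3, the uni- vs bi-directional distinction can be seen directly from output shapes. A bi-directional LSTM runs a second pass over the reversed sequence and concatenates the two directions' outputs at each step, doubling the per-step output dimension. A small PyTorch sketch with hypothetical sizes:

```python
import torch
import torch.nn as nn

x = torch.randn(2, 7, 8)  # (batch, seq_len, input_dim), toy values

uni = nn.LSTM(8, 16, batch_first=True)
bi = nn.LSTM(8, 16, batch_first=True, bidirectional=True)

out_uni, _ = uni(x)
out_bi, _ = bi(x)

print(out_uni.shape)  # torch.Size([2, 7, 16])
print(out_bi.shape)   # torch.Size([2, 7, 32]), forward+backward concatenated
```

In an encoder-decoder setup, this is why a bi-directional encoder's states typically need to be reshaped or projected before initializing a uni-directional decoder.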
Who cares?

This tutorial is intended for developers and researchers who really want to start from scratch and learn everything spoon by spoon. The goal is to provide as much detail as possible so that others do NOT have to spend time working out the hidden, yet very important, details.
