Some ideas about further research on neural network language models

In this post, the limits of existing neural network language models are analysed, and some possible directions for further research on neural network language models are proposed.

Clarifying a misunderstanding of the back-propagation through time method

It is easy to get confused about the back-propagation through time (BPTT) algorithm when starting to implement it in applications; at least I was. In this post, whether BPTT should always be implemented with truncation will be discussed, and a common misunderstanding of BPTT will be explained.
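As a taste of what truncation means in practice, here is a minimal sketch of truncated BPTT for a simple recurrent language model, written with PyTorch. The model sizes, the toy data, and the truncation length are illustrative assumptions, not code from the post; the point is that the hidden state is carried across chunks while detach() cuts the computation graph, so gradients flow back at most trunc_len steps.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim, trunc_len = 1000, 32, 64, 20  # toy sizes

embed = nn.Embedding(vocab_size, embed_dim)
rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
head = nn.Linear(hidden_dim, vocab_size)
params = list(embed.parameters()) + list(rnn.parameters()) + list(head.parameters())
opt = torch.optim.SGD(params, lr=0.1)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(vocab_size, (1, 201))  # one toy token sequence
hidden = torch.zeros(1, 1, hidden_dim)        # (num_layers, batch, hidden)

for start in range(0, 200, trunc_len):
    inp = tokens[:, start:start + trunc_len]          # current chunk
    tgt = tokens[:, start + 1:start + trunc_len + 1]  # next-word targets
    hidden = hidden.detach()  # truncation: stop gradients at the chunk boundary
    out, hidden = rnn(embed(inp), hidden)
    loss = loss_fn(head(out).reshape(-1, vocab_size), tgt.reshape(-1))
    opt.zero_grad()
    loss.backward()  # backpropagates through at most trunc_len time steps
    opt.step()
```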

Sampling approximation of gradient

Sampling approximation of the gradient is a speed-up technique for training neural network language models, proposed by Bengio and Senecal. Three algorithms were presented by Bengio and Senecal, but only the importance sampling method worked well for neural network language models. This post focuses mainly on importance sampling and conveys it in a simpler and easier way.
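To give a flavor of the technique, the sketch below approximates the expensive "negative" term of the softmax gradient by sampling a few words from a proposal distribution instead of summing over the whole vocabulary. The uniform proposal, the toy sizes, and the variable names are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_dim, n_samples = 10000, 64, 25  # toy sizes

W = rng.normal(0, 0.1, (vocab_size, hidden_dim))  # output-layer weights
h = rng.normal(0, 1.0, hidden_dim)                # hidden state for one position
target = 42                                       # index of the next word

# Proposal distribution Q (uniform here; in practice a unigram model).
Q = np.full(vocab_size, 1.0 / vocab_size)

# The exact gradient of -log softmax w.r.t. W needs all vocab_size scores:
#   dW[i] = (P(i | h) - 1[i == target]) * h
# Importance sampling replaces the expectation under P with a
# self-normalized weighted average over a few samples drawn from Q.
samples = rng.choice(vocab_size, size=n_samples, p=Q)
weights = np.exp(W[samples] @ h) / Q[samples]  # unnormalized importance weights
weights /= weights.sum()                       # self-normalization

grad_W = np.zeros_like(W)
grad_W[target] -= h                  # positive (target) term, computed exactly
for s, w in zip(samples, weights):   # negative term, approximated by sampling
    grad_W[s] += w * h
```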

Class-based neural network language model

The idea of word classes has been used extensively in language modeling to improve perplexity or increase speed. In this post, some research on introducing word classes into neural network language modeling will be described, and an extension of word classes, the hierarchical neural network language model, will also be included.
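For a concrete picture of why classes increase speed, here is a minimal sketch of the class-based factorization P(w | h) = P(c(w) | h) * P(w | c(w), h): one softmax over classes followed by one softmax over the words in that class, instead of a single softmax over the full vocabulary. The random class assignment and the toy sizes are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, n_classes, hidden_dim = 10000, 100, 64
word2class = rng.integers(n_classes, size=vocab_size)  # toy class assignment

W_class = rng.normal(0, 0.1, (n_classes, hidden_dim))
W_word = rng.normal(0, 0.1, (vocab_size, hidden_dim))
h = rng.normal(0, 1.0, hidden_dim)  # hidden state for one position

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def class_based_prob(word):
    c = word2class[word]
    p_class = softmax(W_class @ h)[c]          # softmax over n_classes classes
    members = np.flatnonzero(word2class == c)  # words belonging to this class
    p_word = softmax(W_word[members] @ h)      # softmax over ~100 class members
    return p_class * p_word[members == word][0]

# Normalizes over roughly n_classes + |class| scores (~200 here)
# instead of all 10000 words.
print(class_based_prob(42))
```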

Some tips for building neural network language models

In this post, some tips on the implementation details of neural network language models will be summarized, including initializing neural network language models, treating the data set as one long sequence or as multiple independent sentences, choosing words or characters as the input level, and dealing with out-of-vocabulary words.
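As one small example of the last point, a common way to deal with out-of-vocabulary words is to cap the vocabulary at the most frequent words and map every rare or unseen word to a special <unk> token. The toy corpus and the cutoff below are illustrative assumptions.

```python
from collections import Counter

corpus = "the cat sat on the mat the dog sat on the log".split()
max_vocab = 5  # keep only the most frequent words (toy cutoff)

counts = Counter(corpus)
vocab = {"<unk>": 0}  # reserve id 0 for unknown words
for word, _ in counts.most_common(max_vocab - 1):
    vocab[word] = len(vocab)

def encode(tokens):
    # Any word outside the vocabulary falls back to the <unk> id.
    return [vocab.get(t, vocab["<unk>"]) for t in tokens]

print(encode("the cat chased the bird".split()))  # 'chased', 'bird' -> 0
```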