Transformer Encoder

The Transformer encoder is commonly used in NLP tasks. It's the backbone of models like BERT, RoBERTa, and DistilBERT, and of the encoder part of T5.

The overall Transformer encoder is made up of N identical layers, where each layer has two main sub-layers:

  1. Multi-Head Self-Attention
  2. Feed-Forward Neural Network (FFN)

Each sub-layer is wrapped in a residual connection followed by layer normalization.
Input Embeddings 
	→ [Positional Encoding added] 
	→ [Encoder Layer 1] 
	→ [Encoder Layer 2] 
	→ ... 
	→ [Encoder Layer N] 
	→ Final Encoder Output
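
To make the layer structure concrete, here is a minimal sketch of one encoder layer and the stack of N layers. It assumes PyTorch and the common post-norm arrangement; the hyperparameters (d_model, n_heads, d_ff, dropout) are illustrative defaults, not values from this post.

```python
# Minimal sketch of a Transformer encoder layer and stack (assumed PyTorch;
# all sizes below are illustrative defaults).
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        # Sub-layer 1: multi-head self-attention
        self.self_attn = nn.MultiheadAttention(d_model, n_heads,
                                               dropout=dropout, batch_first=True)
        # Sub-layer 2: position-wise feed-forward network (FFN)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, padding_mask=None):
        # Residual connection + layer norm around self-attention
        attn_out, _ = self.self_attn(x, x, x, key_padding_mask=padding_mask)
        x = self.norm1(x + self.dropout(attn_out))
        # Residual connection + layer norm around the FFN
        ffn_out = self.ffn(x)
        x = self.norm2(x + self.dropout(ffn_out))
        return x

class Encoder(nn.Module):
    def __init__(self, n_layers=6, **kwargs):
        super().__init__()
        # N identical layers, applied in sequence as in the diagram above
        self.layers = nn.ModuleList([EncoderLayer(**kwargs) for _ in range(n_layers)])

    def forward(self, x, padding_mask=None):
        for layer in self.layers:
            x = layer(x, padding_mask)
        return x

# Example: a (batch, seq_len, d_model) tensor passes through all N layers
out = Encoder(n_layers=6)(torch.randn(2, 10, 512))
```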

Inside a Single Encoder Layer

Read more...

Tokenization and Word Embedding

Tokenization and word embedding are two common NLP preprocessing steps that happen before anything else. The reason is that AI models can only take numbers as input and produce numbers as output, while NLP mostly deals with language data such as words and sentences. Tokenization and word embedding are therefore needed to turn natural language into numbers that models can work with.
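
As a rough sketch of that text-to-numbers pipeline (assuming PyTorch; the tiny vocabulary and embedding size below are made up for illustration):

```python
# Toy illustration: text -> token IDs (tokenization) -> vectors (embedding)
import torch
import torch.nn as nn

vocab = {"<unk>": 0, "i": 1, "love": 2, "transformers": 3}  # made-up vocabulary
sentence = "i love transformers"

# Tokenization: each word becomes a number (its ID in the vocabulary)
token_ids = torch.tensor([[vocab.get(w, vocab["<unk>"]) for w in sentence.split()]])
print(token_ids)      # tensor([[1, 2, 3]])

# Word embedding: each ID is looked up as a learnable vector
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)
vectors = embedding(token_ids)
print(vectors.shape)  # torch.Size([1, 3, 8])
```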

What is Tokenization?

Tokenization is usually pretty straightforward. Given a word sequence, tokenization turns it into a sequence of numbers, where each unit of the input sequence usually corresponds to one number in the output sequence.
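
For instance, a minimal word-level tokenizer could be sketched as below; real tokenizers such as BPE or WordPiece split text into sub-word units instead, but the core idea of mapping each unit to one ID is the same. The class and corpus here are purely illustrative, not a real library API.

```python
# Minimal word-level tokenizer (illustrative sketch)
class ToyTokenizer:
    def __init__(self, corpus):
        # Build a vocabulary: every distinct word gets one integer ID
        words = sorted({w for text in corpus for w in text.lower().split()})
        self.word_to_id = {"<unk>": 0, **{w: i + 1 for i, w in enumerate(words)}}
        self.id_to_word = {i: w for w, i in self.word_to_id.items()}

    def encode(self, text):
        # One word (sequence unit) -> one number
        return [self.word_to_id.get(w, 0) for w in text.lower().split()]

    def decode(self, ids):
        return " ".join(self.id_to_word[i] for i in ids)

tok = ToyTokenizer(["Transformers handle sequences", "Attention is all you need"])
ids = tok.encode("Attention handles sequences")  # unknown word -> <unk> ID 0
print(ids)              # [2, 0, 6]
print(tok.decode(ids))  # "attention <unk> sequences"
```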

Read more...