Transformer Inference

 


The encoder runs only once for the input sentence.

All 6 decoder layers are executed each time a new target token is predicted.

Also, masking happens during inference as well.



Comments

Popular posts from this blog

Extracting Tables and Text from Images Using Python

Positional Encoding in Transformer

Chain Component in LangChain