Cross Attention in Decoder Block of Transformer

 


[Figure: the Transformer architecture, with the cross-attention layer in the decoder block marked]

Notice where the cross attention is marked: two arrows come in from the encoder block, and one comes in from the decoder block.

Why do we need to consider the Encoder block?


Now, let's say we have already predicted 2 words and need to predict the 3rd word. What will it depend on?

Of course, on the first 2 words already generated by the decoder block, and on the context of the original sentence from the Encoder block.

So, we need to figure out the relationship between these two.


How will we get the relationship?
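The relationship is computed with the same scaled dot-product attention used everywhere else in the Transformer; only the source of Q, K, and V changes:

Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V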





For an English → Hindi translation, the inputs are wired up as follows (a minimal code sketch is shown below):

q : Hindi (from the Decoder block)

k : English (from the Encoder block)

v : English (from the Encoder block)
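
Here is a minimal PyTorch sketch of this wiring, assuming a standard nn.MultiheadAttention layer; the CrossAttention class name, tensor sizes, and variable names are made up for illustration.

import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    """Cross attention: queries from the decoder, keys/values from the encoder."""
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

    def forward(self, decoder_states, encoder_outputs):
        # q comes from the decoder (Hindi side); k and v come from the encoder (English side)
        out, weights = self.attn(
            query=decoder_states,     # (batch, tgt_len, d_model)
            key=encoder_outputs,      # (batch, src_len, d_model)
            value=encoder_outputs,    # (batch, src_len, d_model)
        )
        return out, weights

# Usage sketch with made-up sizes: 2 Hindi words predicted so far, 6 English source tokens.
d_model, num_heads = 512, 8
decoder_states = torch.randn(1, 2, d_model)
encoder_outputs = torch.randn(1, 6, d_model)

cross_attn = CrossAttention(d_model, num_heads)
out, weights = cross_attn(decoder_states, encoder_outputs)
print(out.shape)      # torch.Size([1, 2, 512])
print(weights.shape)  # torch.Size([1, 2, 6])

The attention-weight shape (1, 2, 6) captures the intuition above: each of the 2 already-predicted Hindi positions distributes its attention over the 6 English tokens, which is exactly the relationship between the decoder's output so far and the encoder's sentence context.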

