
Showing posts from July, 2025

Positional Encoding in Transformer

1. Why Position Matters in Transformers
Transformers rely on self-attention, which processes all tokens in parallel. Unlike RNNs, they have no built-in sense of word order, so sentences like “Ravi killed the lion” and “The lion killed Ravi” look identical to a vanilla Transformer, which is clearly a problem.

🧪 Idea #1: The Naïve Approach
A simple fix would be to add the token’s index (its position) to the embedding vector. Issues: Unbounded values: position IDs can become huge (e.g. 100,000+ in long texts), destabilizing training. Discrete steps: sharp jumps between integers disrupt gradient flow.

🧪 Idea #2: Normalize the Position Numbers
What if we divide the position numbers by a constant so they stay small and smooth? That helps a bit: the values no longer explode. Issues: if you compare the two sentences, the word at the second position now gets different values, 1 for sentence 1 and 0.5 for sentence 2, so the neural network will get confused while training, what a...
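A minimal sketch of these two ideas in Python, assuming 0-based token indices and normalization by the last index; the helper names and the two example sentences are illustrative, not taken from the post.

def naive_position_ids(tokens):
    # Idea #1: use the raw token index as the position signal.
    # Unbounded: the values keep growing with text length.
    return list(range(len(tokens)))

def normalized_positions(tokens):
    # Idea #2: squeeze indices into [0, 1] by dividing by the last index
    # (one possible "constant"; the post's exact choice is not shown here).
    last = max(len(tokens) - 1, 1)
    return [i / last for i in range(len(tokens))]

sent1 = "Ravi slept".split()                      # 2 tokens (illustrative)
sent2 = "The lion killed Ravi yesterday".split()  # 5 tokens (illustrative)

print(naive_position_ids(sent2))     # [0, 1, 2, 3, 4] -> grows without bound
print(normalized_positions(sent1))   # [0.0, 1.0]
print(normalized_positions(sent2))   # [0.0, 0.25, 0.5, 0.75, 1.0]
# The token at index 1 is encoded as 1.0 in sent1 but 0.25 in sent2, so the
# same position gets inconsistent values across sentences of different lengths.

This inconsistency across sentence lengths is the problem the excerpt points at when it notes the second word getting 1 in one sentence and 0.5 in the other.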

How Is the Word Embedding Generated?

How Is the Word Embedding Generated? | A Simple Guide with Examples

What is a Word Embedding?
Imagine you have a word like “apple.” Now think of representing “apple” as a point in a space, say a 3D space, where each dimension represents a feature or meaning category. For example: Dimension 1: Tech (related to Apple Inc.); Dimension 2: Fruit (the edible apple); Dimension 3: Vehicle (a rare but possible use). Let’s say our model generates this vector for “apple”: apple → [0.2, 0.8, 0.00001], where the dimensions represent [tech, fruit, vehicle]. This tells us that “apple” here refers mostly to the fruit (0.8), a little to tech (0.2), and barely at all to vehicles.

Similar Words Stay Closer
In this vector space, words with similar meanings sit close together and words with different meanings are far apart. 🐶 dog and 🐱 cat might be nearby. 🚗 car and 🌳 tree would be far apart. 🍎 “apple” (fruit) and 🍇 “grape” will likely be close. This closeness is captur...
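To make “similar words stay closer” concrete, here is a small hand-made sketch. The apple vector is the one from the post; the grape and car vectors are made-up toy values, and cosine similarity is used as one common closeness measure (the excerpt is cut off before naming the post’s own).

import math

# Toy 3-dimensional "embeddings" with hand-picked values; a real model learns
# these numbers from data, and its dimensions are not neatly interpretable.
embeddings = {
    "apple": [0.2, 0.8, 0.00001],  # [tech, fruit, vehicle], as in the post
    "grape": [0.05, 0.9, 0.0],     # another fruit -> should land nearby
    "car":   [0.1, 0.0, 0.95],     # a vehicle -> should land far away
}

def cosine_similarity(a, b):
    # Closeness measure: 1.0 means pointing the same way, near 0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embeddings["apple"], embeddings["grape"]))  # ~0.98 (close)
print(cosine_similarity(embeddings["apple"], embeddings["car"]))    # ~0.03 (far apart)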