How Is the Word Embedding Generated? | A Simple Guide with Examples
What is a Word Embedding?
Imagine you have a word like “apple.”
Now, think of representing “apple” as a point in a space — let’s say in a 3D space. Each dimension can represent a feature or meaning category. For example:
- Dimension 1: Tech (related to Apple Inc.)
- Dimension 2: Fruit (edible apple)
- Dimension 3: Vehicle (rare but possible use)
Let’s say our model generates this vector for “apple”: [0.2, 0.8, 0.01].
This tells us that “apple” here mostly refers to the fruit (0.8), a little to tech (0.2), and barely at all to vehicles (0.01).
Similar Words Stay Closer
In this vector space, words with similar meanings are closer together, and words with different meanings are far apart.
- 🐶 dog and 🐱 cat might be nearby.
- 🚗 car and 🌳 tree would be far apart.
- 🍎 “apple” (fruit) and 🍇 “grape” will likely be close.
This closeness is measured with cosine similarity, which captures how directionally aligned two vectors are: the smaller the angle between them, the more similar the words.
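The idea above can be sketched in a few lines of Python. The vectors below are hand-made toy values (not learned by any model) using the three dimensions from earlier:

```python
import math

def cosine_similarity(a, b):
    # cosine similarity = dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors: [tech, fruit, vehicle]
apple = [0.2, 0.8, 0.01]
grape = [0.0, 0.9, 0.0]
car   = [0.1, 0.0, 0.95]

print(cosine_similarity(apple, grape))  # high: similar meaning
print(cosine_similarity(apple, car))    # low: different meaning
```

Running this shows “apple” and “grape” pointing in nearly the same direction, while “apple” and “car” are nearly orthogonal.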
So, do we generate embeddings manually?
No, embeddings aren’t manually assigned. They're learned from data.
Here's an example.
Let’s say we’re generating embeddings for the word “apple”, and we again use 3 features: [fruit, tech, other].
Here's how different sentences might influence the embedding:
| Sentence | Embedding (approx.) [fruit, tech, other] | What the model learns |
|---|---|---|
| apple is healthy | [0.6, 0, 0] | fruit gets a high weight |
| apple is better than grapes | [0.7, 0, 0] | the fruit weight grows with more data |
| apple launched new iPhone | [0.7, 0.2, 0] | the model realizes “apple” also relates to tech |
Over time, the model notices that “healthy” and “grapes” often co-occur with apple the fruit, while “iPhone” signals the tech brand. This shapes the embedding, and the features get sharper as the model processes larger amounts of data.
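As a rough sketch of this learning-from-co-occurrence idea: real models (word2vec, GloVe, etc.) learn embeddings by gradient descent, but we can mimic the intuition with simple co-occurrence counts. The cue-word buckets below are hypothetical, chosen just for this illustration:

```python
from collections import Counter

# Hypothetical cue words hinting at each feature dimension
FRUIT_CUES = {"healthy", "grapes", "eat", "banana"}
TECH_CUES = {"iphone", "macbook"}

sentences = [
    "apple is healthy",
    "apple is better than grapes",
    "apple launched new iphone",
]

# Count how often "apple" co-occurs with each kind of cue word
counts = Counter()
for s in sentences:
    words = set(s.lower().split())
    if "apple" in words:
        counts["fruit"] += len(words & FRUIT_CUES)
        counts["tech"] += len(words & TECH_CUES)

# Normalize the counts into a tiny [fruit, tech] embedding
total = sum(counts.values()) or 1
embedding = [counts["fruit"] / total, counts["tech"] / total]
print(embedding)  # leans toward fruit, with some tech signal
</antml>```

With these three sentences the fruit dimension dominates, matching the table above; feeding in more tech sentences would shift the balance.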
But Wait — Context Matters!
Let’s say you are working on a search application that shows the top match for a user’s query. Assume the user has searched for “should I eat apple daily?”
Now consider two different sentences:
1. “Apple launched iPhone, while I was eating banana.”
2. “Apple is healthy.”
Should the word “apple” have the same embedding in both?
NO!
Because in (1), apple = tech brand
While in (2), apple = fruit
If we use the same embedding [0.5, 0.5, 0] for both, we’re mixing meanings. This hurts search, recommendations, and understanding.
Instead, we need embeddings that are context-aware.
This is Where Self-Attention Comes In
Modern NLP models (like BERT, GPT) use self-attention to build dynamic embeddings based on the surrounding words.
So, for example:
| Sentence | Embedding for “apple” [fruit, tech, other] |
|---|---|
| Apple launched iPhone, while I was eating banana. | [0.01, 0.9, 0] |
| Apple is healthy. | [0.9, 0.1, 0] |
This is powerful — this is how models disambiguate meaning and generate relevant results in applications like search, translation, and chat. We will learn about this in my next blog.
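To make the mechanism less abstract, here is a toy self-attention sketch: each word’s output vector becomes a weighted average of all words’ vectors, with weights from dot-product similarity. The starting vectors are hand-made illustrations in [tech, fruit, other] order, not learned parameters:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical static embeddings: [tech, fruit, other]
words = ["apple", "is", "healthy"]
x = [
    [0.5, 0.5, 0.0],  # "apple" is ambiguous on its own
    [0.0, 0.1, 0.9],  # "is"
    [0.1, 0.9, 0.0],  # "healthy" carries a strong fruit signal
]

# For each word, score every word by dot product, softmax the
# scores into attention weights, then blend all vectors by weight
contextual = []
for q in x:
    weights = softmax([dot(q, k) for k in x])
    blended = [sum(w * v[d] for w, v in zip(weights, x)) for d in range(3)]
    contextual.append(blended)

print(contextual[0])  # "apple" pulled toward the fruit dimension
```

After attention, the “apple” vector’s fruit component grows beyond its ambiguous starting value because “healthy” sits nearby in the sentence; real models add learned query/key/value projections on top of this core idea.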
Real-World Use Case: Semantic Search
Suppose you’re building a search engine.
A user types:
“Healthy fruits like apple”
Now, your system should NOT return iPhones.
Thanks to contextual embeddings, your model understands that “apple” here refers to fruit, not the brand.
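A minimal ranking sketch shows why this works. The sentence vectors below are hypothetical stand-ins for what a contextual model like BERT might produce (again in [tech, fruit, other] order); a real system would compute them with an embedding model:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical contextual sentence embeddings: [tech, fruit, other]
docs = {
    "Apple launched the new iPhone": [0.9, 0.05, 0.05],
    "Apples and grapes are healthy fruits": [0.05, 0.9, 0.05],
}
query = [0.1, 0.85, 0.05]  # "Healthy fruits like apple" leans fruit

# Rank documents by cosine similarity to the query embedding
best = max(docs, key=lambda d: cosine(query, docs[d]))
print(best)  # the fruit sentence ranks first, not the iPhone one
```

Because the query embedding leans toward the fruit dimension, the fruit sentence wins the ranking and the iPhone result is suppressed.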
In Short
- A word is represented as a vector of numbers, each capturing a feature.
- These vectors are learned from data, based on how words co-occur in sentences.
- To make them context-aware, modern models use self-attention, which allows the same word to have different meanings based on context. You will learn about this in my next blog.
Why This Matters
Whether you're building a chatbot, search tool, or recommendation engine, understanding how embeddings work can help you:
- Improve accuracy
- Reduce irrelevant results
- Deliver human-like understanding