How Is the Word Embedding Generated? | A Simple Guide with Examples


What is a Word Embedding?

Imagine you have a word like “apple.”

Now, think of representing “apple” as a point in a space — let’s say in a 3D space. Each dimension can represent a feature or meaning category. For example:

  • Dimension 1: Tech (related to Apple Inc.)

  • Dimension 2: Fruit (edible apple)

  • Dimension 3: Vehicle (rare but possible use)

Let’s say our model generates this vector for “apple”:

apple → [0.2, 0.8, 0.00001] where dimensions represent [tech, fruit, vehicle]

This tells us that “apple” here is mostly referring to fruit (0.8), a little to tech (0.2), and barely related to vehicles.
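To make this concrete, here is a minimal sketch of what such a lookup table could look like in Python. The feature names and all the numbers are illustrative; real embeddings are learned, and their dimensions rarely map to clean, human-readable features like these.

```python
# Illustrative only: these feature names and values are made up for
# this example; real embedding dimensions are not human-readable.
features = ["tech", "fruit", "vehicle"]

embeddings = {
    "apple": [0.2, 0.8, 0.00001],
    "grape": [0.1, 0.9, 0.0],    # assumed values
    "iphone": [0.9, 0.0, 0.0],   # assumed values
}

for feature, weight in zip(features, embeddings["apple"]):
    print(f"{feature}: {weight}")
```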


Similar Words Stay Closer

In this vector space, words with similar meanings are closer together, and words with different meanings are far apart.

  • 🐶 dog and 🐱 cat might be nearby.

  • 🚗 car and 🌳 tree would be far apart.

  • 🍎 “apple” (fruit) and 🍇 “grape” will likely be close.

This closeness is captured using cosine similarity, which measures how "directionally" close the two vectors are.
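Cosine similarity is simple to compute yourself. Here is a small sketch using NumPy; the grape and car vectors are assumed values, chosen only to show one high score and one low score.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between the two vectors:
    # ~1.0 = same direction, ~0.0 = unrelated directions.
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# [tech, fruit, vehicle]; grape and car are assumed for illustration
apple = [0.2, 0.8, 0.00001]
grape = [0.1, 0.9, 0.0]
car   = [0.0, 0.0, 0.9]

print(cosine_similarity(apple, grape))  # close to 1: both mostly "fruit"
print(cosine_similarity(apple, car))    # close to 0: unrelated directions
```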


So, do we generate embeddings manually?

No. Embeddings aren’t manually assigned; they’re learned from data.

Here's an example.

Let’s say we’re generating embeddings for the word “apple”, and we use 3 features again: [tech, fruit, vehicle].

Here's how different sentences might influence the embedding:

Sentence → Embedding (approx.) [tech, fruit, vehicle]

  • “apple is healthy” → [0, 0.6, 0] (the fruit dimension gets a high weight)

  • “apple is better than grapes” → [0, 0.7, 0] (the fruit weight keeps growing with more data)

  • “apple launched new iPhone” → [0.2, 0.7, 0] (the model realizes apple relates to tech as well)

Over time, the model notices that words like “healthy” and “grapes” often co-occur with apple (the fruit), while “iPhone” signals the tech brand. This gradually shapes the embedding, and it keeps improving as the model processes larger amounts of data.
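This co-occurrence-driven learning is exactly what classic embedding algorithms such as Word2Vec automate. Here is a minimal sketch using the gensim library; the toy corpus is far too small to learn anything meaningful and only shows the mechanics.

```python
from gensim.models import Word2Vec

# Toy corpus; a real model trains on millions of sentences.
sentences = [
    ["apple", "is", "healthy"],
    ["apple", "is", "better", "than", "grapes"],
    ["apple", "launched", "new", "iphone"],
    ["grapes", "are", "healthy"],
]

# vector_size=3 mirrors the 3-feature example above;
# real models typically use 100 to 768 dimensions.
model = Word2Vec(sentences, vector_size=3, window=2, min_count=1, epochs=50)

print(model.wv["apple"])                       # the learned 3-d vector
print(model.wv.similarity("apple", "grapes"))  # cosine similarity
```

Note that the dimensions Word2Vec learns won’t correspond to tech, fruit, or vehicle; the model picks whatever axes best predict co-occurrence.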


But Wait — Context Matters!

Let’s say you are building a search application that returns the top match for a user’s query, and the user has searched for “should I eat apple daily?”

Now consider two different sentences:

  1. “Apple launched iPhone, while I was eating banana.”

  2. “Apple is healthy.”

From what we have learned so far, the embedding vector for “apple” would be the same in both sentences.

Should the word “apple” have the same embedding in both?

NO!

Because in (1), apple = tech brand, while in (2), apple = fruit.

If we use the same embedding [0.5, 0.5, 0], we’re mixing meanings. This hurts search, recommendations, and understanding.

Instead, we need embeddings that are context-aware.


This is Where Self-Attention Comes In

Modern NLP models (like BERT, GPT) use self-attention to build dynamic embeddings based on the surrounding words.

So, for example:

Sentence → Embedding for “apple” [tech, fruit, vehicle]

  • “Apple launched iPhone, while I was eating banana.” → [0.9, 0.01, 0]

  • “Apple is healthy.” → [0.1, 0.9, 0]

The same word gets different vectors depending on the context.
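You can see this with an off-the-shelf BERT model from the Hugging Face transformers library. A minimal sketch, assuming “apple” survives as a single token in the bert-base-uncased tokenizer (it does for this word):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def apple_vector(sentence):
    # Return the 768-d contextual embedding of the "apple" token.
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (tokens, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[tokens.index("apple")]

v1 = apple_vector("Apple launched iPhone, while I was eating banana.")
v2 = apple_vector("Apple is healthy.")

# Same word, two different vectors: similarity is well below 1.0.
print(torch.nn.functional.cosine_similarity(v1, v2, dim=0).item())
```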

This is powerful: it is how models disambiguate meaning and generate relevant results in applications like search, translation, and chat. We will learn more about self-attention in my next blog.


Real-World Use Case: Semantic Search

Suppose you’re building a search engine.

A user types:
“Healthy fruits like apple”

Now, your system should NOT return iPhones.
Thanks to contextual embeddings, your model understands that “apple” here refers to fruit, not the brand.
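Libraries such as sentence-transformers make this practical. A minimal sketch, using a commonly used small model and two made-up documents:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # a small, popular model

# Made-up documents for illustration
documents = [
    "Apple unveiled the new iPhone at its launch event.",
    "Apples and grapes are healthy fruits rich in fiber.",
]
query = "Healthy fruits like apple"

doc_vecs = model.encode(documents)
query_vec = model.encode(query)

# Rank documents by cosine similarity to the query
scores = util.cos_sim(query_vec, doc_vecs)[0]
print(documents[int(scores.argmax())])  # the fruit sentence should rank first
```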


In Short

  • A word is represented as a vector of numbers, each capturing a feature.

  • These vectors are learned from data, based on how words co-occur in sentences.

  • To make them context-aware, modern models use self-attention, which allows the same word to have different meanings based on context. You will learn about this in my next blog.


Why This Matters

Whether you're building a chatbot, search tool, or recommendation engine, understanding how embeddings work can help you:

  • Improve accuracy

  • Reduce irrelevant results

  • Deliver human-like understanding
