# Vector Similarity Visualization

[Run the Vector Similarity Visualization Fullscreen](https://dmccreary.github.io/Digital-Transformation-with-AI-Spring-2026/sims/vector-similarity/main.html)
## About This MicroSim
This visualization demonstrates how word embeddings capture semantic relationships. Words with similar meanings cluster together in the embedding space, and cosine similarity measures how closely related two words are.
## Iframe Embedding

```html
<iframe src="https://dmccreary.github.io/Digital-Transformation-with-AI-Spring-2026/sims/vector-similarity/main.html"
        height="652px"
        width="100%"
        scrolling="no">
</iframe>
```
## How to Use
- Explore Clusters: Notice how semantically related words cluster together
- Click Two Words: Select any two words to calculate their similarity
- Compare Metrics: View cosine similarity and Euclidean distance
- Test Hypotheses: Try words from same vs. different categories
## Understanding Word Embeddings
| Concept | Description |
|---|---|
| Embedding | Dense vector representation of a word |
| Dimension | Number of values in the vector (typically 300-1536) |
| Cosine Similarity | Cosine of the angle between two vectors (typically 0-1 for word pairs; higher means more similar) |
| Semantic Space | Geometric space where meaning is encoded |
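To make the first two rows of the table concrete, the sketch below shows what an embedding looks like as data. The words, values, and 4-dimensional size are invented for illustration; real models produce much longer vectors (typically 300-1536 values per word).

```python
import numpy as np

# Each word maps to a dense vector of floats. These 4-dimensional values
# are made up for illustration; real embedding models use 300-1536 dimensions.
embeddings = {
    "car":        np.array([0.81, 0.10, 0.05, 0.42]),
    "automobile": np.array([0.78, 0.12, 0.07, 0.45]),
    "banana":     np.array([0.02, 0.88, 0.61, 0.09]),
}

for word, vector in embeddings.items():
    print(f"{word:<12} dimensions={len(vector)} values={vector}")
```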
## Cosine Similarity

Cosine similarity is the cosine of the angle between two vectors: it captures how closely their directions align, regardless of vector length. Values near 1 mean the two words point in nearly the same direction in the embedding space:
| Value | Interpretation |
|---|---|
| 0.8 - 1.0 | Very similar (synonyms, same category) |
| 0.6 - 0.8 | Related concepts |
| 0.4 - 0.6 | Loosely related |
| 0.0 - 0.4 | Unrelated or opposite |
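A minimal sketch of how these values are computed, reusing the invented 4-dimensional vectors from above (a real system would get its vectors from an embedding model). Cosine similarity is the dot product of the two vectors divided by the product of their lengths; the Euclidean distance shown alongside it in the MicroSim is simply the straight-line distance between the two vector endpoints, which, unlike cosine similarity, is sensitive to vector length.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # cos(theta) = (a . b) / (||a|| * ||b||)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    # Straight-line distance between the two vector endpoints.
    return float(np.linalg.norm(a - b))

# Invented vectors: two same-category words and one unrelated word.
car        = np.array([0.81, 0.10, 0.05, 0.42])
automobile = np.array([0.78, 0.12, 0.07, 0.45])
banana     = np.array([0.02, 0.88, 0.61, 0.09])

print(cosine_similarity(car, automobile))   # ~0.99 -> very similar (same category)
print(cosine_similarity(car, banana))       # ~0.17 -> unrelated
print(euclidean_distance(car, automobile))  # small distance for the similar pair
```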
## Why Semantic Search Outperforms Keyword Matching
| Keyword Search | Semantic Search |
|---|---|
| Requires exact word match | Finds conceptually similar content |
| "car" won't find "automobile" | "car" finds "automobile", "vehicle" |
| Fails with synonyms | Understands synonymy |
| No context understanding | Captures meaning |
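The sketch below illustrates that contrast with two toy documents and hard-coded stand-in vectors (a real system would embed both the query and the documents with the same embedding model): an exact keyword match on "car" finds nothing, while ranking by cosine similarity still surfaces the document that only says "automobile".

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

documents = {
    "doc1": "Our automobile repair shop is open on weekends.",
    "doc2": "Bananas are an excellent source of potassium.",
}

# Keyword search: an exact-word match on "car" finds nothing.
query = "car"
keyword_hits = [d for d, text in documents.items() if query in text.lower().split()]
print(keyword_hits)  # [] -- "car" never appears verbatim

# Semantic search: compare vectors instead of raw words.
# These vectors are invented placeholders for real model output.
doc_vectors = {
    "doc1": np.array([0.79, 0.11, 0.06, 0.44]),  # about automobiles
    "doc2": np.array([0.03, 0.85, 0.64, 0.10]),  # about bananas
}
query_vector = np.array([0.81, 0.10, 0.05, 0.42])  # stand-in embedding of "car"

ranked = sorted(doc_vectors,
                key=lambda d: cosine_similarity(query_vector, doc_vectors[d]),
                reverse=True)
print(ranked[0])  # doc1 -- retrieved despite sharing no keyword with the query
```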
## Learning Objectives
After using this tool, students should be able to:
- Understand (Bloom's L2): Explain how vector similarity captures semantic relationships
- Apply (Bloom's L3): Interpret cosine similarity values
- Analyze (Bloom's L4): Compare semantic search with keyword matching
## Lesson Plan

### Activity 1: Cluster Analysis (10 minutes)
- Identify the 5 semantic clusters in the visualization
- Predict which words will have highest similarity
- Test your predictions by clicking word pairs
### Activity 2: Cross-Category Comparison (15 minutes)
- Find the highest similarity between words in DIFFERENT categories
- Find the lowest similarity between words in the SAME category
- Explain the results
## Discussion Questions
- Why do words in the same category have higher similarity?
- What business problems can semantic search solve that keyword search cannot?
- How does embedding quality affect RAG system performance?
## Applications in RAG Systems
| Component | Role of Embeddings |
|---|---|
| Document Chunking | Split documents into embeddable segments |
| Vector Storage | Store embeddings in vector database |
| Query Embedding | Convert user query to same vector space |
| Retrieval | Find chunks with highest similarity to query |
| Context Assembly | Provide relevant chunks to LLM |
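A minimal end-to-end sketch of that pipeline, with invented chunk text and placeholder vectors standing in for real embedding-model output and a real vector database:

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# 1-2. Chunking and vector storage: each chunk is stored with its embedding.
# The text and 4-dimensional vectors are invented placeholders; in practice the
# vectors come from an embedding model and live in a vector database.
chunk_store = [
    {"text": "Refunds are processed within 14 days.",             "vector": np.array([0.90, 0.10, 0.20, 0.10])},
    {"text": "Our headquarters are located in Minneapolis.",      "vector": np.array([0.10, 0.80, 0.30, 0.20])},
    {"text": "Returned items must include the original receipt.", "vector": np.array([0.80, 0.20, 0.30, 0.10])},
]

# 3. Query embedding: the user question is mapped into the same vector space.
query = "How long does it take to get my money back?"
query_vector = np.array([0.85, 0.15, 0.25, 0.10])  # placeholder for the embedded query

# 4. Retrieval: rank chunks by cosine similarity to the query and keep the top k.
top_k = sorted(chunk_store,
               key=lambda chunk: cosine_similarity(query_vector, chunk["vector"]),
               reverse=True)[:2]

# 5. Context assembly: the retrieved chunks become the context given to the LLM.
context = "\n".join(chunk["text"] for chunk in top_k)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```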
## Related Concepts
- Chapter 5: Custom GPTs, Agents, and RAG Systems
- Vector Database
- Retrieval Augmented Generation
- Embedding Models
## References
- Mikolov, T., et al. (2013). Efficient Estimation of Word Representations in Vector Space. ICLR.
- Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global Vectors for Word Representation. EMNLP.
- Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. EMNLP.
## Self-Assessment Quiz
Test your understanding of vector similarity and word embeddings.
Question 1: What is a word embedding?
- A) A physical object embedded in text
- B) A dense numerical vector that represents the meaning of a word
- C) A type of font style
- D) A grammar checking tool
Answer
B) A dense numerical vector that represents the meaning of a word - Word embeddings convert words into multi-dimensional vectors where semantic relationships are preserved as geometric relationships.
Question 2: What does cosine similarity measure?
- A) The physical distance between two objects
- B) The angle between two vectors, indicating how similar their directions are
- C) The size of two vectors
- D) The color difference between vectors
Answer
B) The angle between two vectors, indicating how similar their directions are - Cosine similarity measures the cosine of the angle between vectors, with values closer to 1 indicating more similar meanings.
Question 3: In a well-trained embedding space, what happens to words with similar meanings?
- A) They are placed far apart
- B) They cluster together in the vector space
- C) They are deleted
- D) They become identical
Answer
B) They cluster together in the vector space - Words with similar meanings (like "car" and "automobile") are positioned near each other in the embedding space.
Question 4: Why does semantic search outperform keyword matching?
- A) Semantic search is always faster
- B) Semantic search finds conceptually similar content even without exact word matches
- C) Keyword matching is illegal
- D) Semantic search uses less computing power
Answer
B) Semantic search finds conceptually similar content even without exact word matches - Semantic search using embeddings can find documents about "automobiles" when searching for "cars" because it understands meaning, not just word presence.
Question 5: How are vector embeddings used in RAG (Retrieval Augmented Generation) systems?
- A) They are not used in RAG
- B) They enable finding relevant document chunks based on semantic similarity to user queries
- C) They replace the language model
- D) They generate random content
Answer
B) They enable finding relevant document chunks based on semantic similarity to user queries - RAG systems embed both documents and queries into the same vector space, then retrieve chunks with high similarity to provide relevant context to the LLM.