# 🧠 RAG Architecture & Vector Embeddings
## Overview
GigMatch AI uses **Retrieval-Augmented Generation (RAG)** with **vector embeddings** to perform intelligent semantic matching between workers and gigs. This goes far beyond simple keyword matching!
## 🏗️ Architecture
```
┌─────────────────────────────────────────────────────────────┐
│ DATA INGESTION │
├─────────────────────────────────────────────────────────────┤
│ 50 Workers + 50 Gigs (JSON) │
│ ↓ │
│ Text Enrichment (skills, bio, location, etc.) │
│ ↓ │
│ HuggingFace Embeddings (all-MiniLM-L6-v2) │
│ ↓ │
│ Vector Storage (ChromaDB) │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ QUERY PIPELINE │
├─────────────────────────────────────────────────────────────┤
│ User Query (worker profile or gig post) │
│ ↓ │
│ Convert to Search Query │
│ ↓ │
│ Embed Query (HuggingFace) │
│ ↓ │
│ Semantic Search (Vector Similarity) │
│ ↓ │
│ Retrieve Top K Results │
│ ↓ │
│ Calculate Match Scores │
│ ↓ │
│ Return Results to Agent │
└─────────────────────────────────────────────────────────────┘
```
## 🦙 LlamaIndex Integration
### Why LlamaIndex?
1. **Sponsor Recognition** - LlamaIndex is a hackathon sponsor 🎉
2. **Production-Ready** - Battle-tested RAG framework
3. **Easy Integration** - Simple API for vector operations
4. **Flexible** - Supports multiple vector stores and embeddings
### Implementation
```python
from llama_index.core import VectorStoreIndex, Document, Settings, StorageContext
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

# Initialize the embedding model and register it globally
Settings.embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# Create documents with rich text (metadata keeps the full worker record)
worker_doc = Document(
    text=f"Name: {name}, Skills: {skills}, Location: {location}...",
    metadata=worker_data,
)

# Create the vector index backed by the Chroma vector store
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
)

# Query
query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("Looking for plumber in Rome...")
```
## 🤗 HuggingFace Embeddings
### Model: all-MiniLM-L6-v2
**Why this model?**
- ✅ Fast inference (only 23M parameters)
- ✅ Good quality embeddings (384 dimensions)
- ✅ Pre-trained on semantic similarity
- ✅ HuggingFace sponsor recognition 🤗
**Performance:**
- Embedding time: ~20ms per text
- Vector size: 384 dimensions
- Cosine similarity for matching
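As a quick check of those numbers, a single text can be embedded directly through the same LlamaIndex wrapper (a minimal sketch; the example string is illustrative and timings depend on hardware):
```python
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# Embed one profile-style text and inspect the vector size
vector = embed_model.get_text_embedding("Experienced plumber, pipe repair, Rome")
print(len(vector))  # 384
```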
### How Embeddings Work
1. **Text → Vector**: Each worker/gig is converted to a 384-dimensional vector
2. **Semantic Meaning**: Similar meanings = similar vectors
3. **Cosine Similarity**: Measures the angle between vectors (scores range from -1 to 1; in practice roughly 0 to 1 for these embeddings)
4. **Top K**: Return K most similar vectors
**Example:**
```python
text1 = "Experienced plumber, pipe repair, Rome"
text2 = "Looking for plumbing services, leak fix, Rome"
# After embedding:
vec1 = [0.23, -0.45, 0.67, ...] # 384 dimensions
vec2 = [0.21, -0.43, 0.69, ...] # 384 dimensions
# Cosine similarity: 0.94 (very similar!)
```
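The same comparison can be reproduced end to end with sentence-transformers (a sketch; the 0.94 above is illustrative, so the exact score you get will differ slightly):
```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Encode both texts into 384-dimensional vectors
vec1, vec2 = model.encode([
    "Experienced plumber, pipe repair, Rome",
    "Looking for plumbing services, leak fix, Rome",
])

# Cosine similarity between the two vectors
score = util.cos_sim(vec1, vec2).item()
print(f"Similarity: {score:.2f}")
```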
## 📊 ChromaDB Vector Store
### Why ChromaDB?
- ✅ Simple local setup (no server needed)
- ✅ Fast vector search
- ✅ Native Python API
- ✅ Persistence support
- ✅ Perfect for demo/hackathon
### Collections
**Workers Collection:**
- 50 worker profiles
- Indexed by skills, experience, location
- Searchable by semantic similarity
**Gigs Collection:**
- 50 gig posts
- Indexed by requirements, project details
- Searchable by semantic similarity
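A minimal setup sketch for both collections (the `./chroma_db` path is an illustrative choice, not a project constant):
```python
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore

# Persistent local client: vectors survive restarts, no server required
chroma_client = chromadb.PersistentClient(path="./chroma_db")

workers_collection = chroma_client.get_or_create_collection("workers")
gigs_collection = chroma_client.get_or_create_collection("gigs")

# Wrap a collection so LlamaIndex can use it as a vector store
vector_store = ChromaVectorStore(chroma_collection=workers_collection)
```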
## 🎯 Semantic Matching Algorithm
### Traditional Keyword Matching (OLD)
```python
# Problem: Only finds exact keyword matches
if "plumbing" in worker_skills and "plumbing" in gig_requirements:
    score += 1  # Match!
```
### Semantic Matching with RAG (NEW)
```text
# Solution: Understands meaning and context
Query: "Need someone to fix leaking pipes"
Embedding: [0.23, -0.45, 0.67, ...]
Worker 1: "Plumber, pipe repair specialist"
Embedding: [0.21, -0.43, 0.69, ...]
Similarity: 0.94 ← HIGH MATCH!
Worker 2: "Electrician, wiring expert"
Embedding: [-0.11, 0.52, -0.33, ...]
Similarity: 0.12 ← LOW MATCH
# Semantic search finds Worker 1 even though
# the word "plumbing" wasn't explicitly mentioned!
```
### Advantages
1. **Synonym Understanding**: "plumber" ≈ "pipe specialist"
2. **Context Awareness**: "fix pipes" ≈ "repair plumbing"
3. **Related Concepts**: "garden" ≈ "landscaping" ≈ "outdoor"
4. **Multi-language**: Tolerates minor wording variations, though all-MiniLM-L6-v2 is primarily English-trained, so full multilingual support would need a multilingual model
5. **Fuzzy Matching**: Small typos and phrasing variations still land close in vector space
## 🔬 Match Score Calculation
### Components
1. **Semantic Similarity** (70% weight)
- Cosine similarity from vector embeddings
- Range: 0.0 to 1.0
- Higher = better semantic match
2. **Keyword Overlap** (20% weight)
- Exact skill matches
- Experience level alignment
- Calculated as: matched_skills / required_skills
3. **Location Match** (10% weight)
- Geographic proximity
- Remote work consideration
- Binary: 1.0 (same location/remote) or 0.5 (different)
### Final Formula
```python
semantic_score = cosine_similarity(query_vec, doc_vec)
keyword_score = len(matched_skills) / len(required_skills)
location_score = 1.0 if location_match else 0.5
final_score = (
    semantic_score * 0.7
    + keyword_score * 0.2
    + location_score * 0.1
) * 100  # Convert to 0-100 scale
```
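Wrapped into a small helper, the same formula looks like this (a sketch: the function and argument names are illustrative, only the weights come from the formula above):
```python
def compute_match_score(
    semantic_score: float,       # cosine similarity from the vector search, 0.0-1.0
    matched_skills: set[str],
    required_skills: set[str],
    location_match: bool,
) -> float:
    """Combine semantic, keyword, and location signals into a 0-100 score."""
    keyword_score = len(matched_skills & required_skills) / max(len(required_skills), 1)
    location_score = 1.0 if location_match else 0.5
    return (
        semantic_score * 0.7
        + keyword_score * 0.2
        + location_score * 0.1
    ) * 100


# Example: strong semantic match, 2 of 3 required skills, same city
print(compute_match_score(0.94, {"plumbing", "pipe repair"},
                          {"plumbing", "pipe repair", "welding"}, True))  # ≈ 89.1
```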
## 📈 Performance & Scalability
### Current Setup (Demo)
- 50 workers + 50 gigs = 100 vectors
- Average query time: ~100ms
- Embedding model loaded in memory: ~100MB
- Total memory usage: ~200MB
### Production Scaling
**For 10,000 entries:**
- ✅ Still fast (<500ms per query)
- ✅ ChromaDB handles easily
- ✅ Consider batch embedding for ingestion (see the sketch at the end of this section)
**For 100,000+ entries:**
- Use hosted vector DB (Pinecone, Weaviate)
- Batch processing for embeddings
- Caching layer for frequent queries
- GPU acceleration for embedding
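For large ingestions, the embedding step can be batched instead of running one document at a time (a sketch using the LlamaIndex batch API; the batch size and the `build_profile_text` helper are illustrative assumptions):
```python
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    embed_batch_size=64,  # tunable, illustrative value
)

# build_profile_text is a hypothetical helper that enriches each record into text
texts = [build_profile_text(worker) for worker in workers]
vectors = embed_model.get_text_embedding_batch(texts, show_progress=True)
```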
## 🎨 Benefits for the Hackathon
### Why This is WOW
1. **Not Just LLM Calls**: Real vector database with semantic search
2. **Sponsor Integration**: LlamaIndex 🦙 + HuggingFace 🤗
3. **Production Patterns**: Proper RAG architecture
4. **Scalable**: Easy to extend to 1000s of entries
5. **Explainable**: Can show similarity scores
### Demo Impact
Judges will see:
- ✅ "Powered by LlamaIndex + HuggingFace"
- ✅ Semantic similarity scores in results
- ✅ Better matches than keyword search
- ✅ 100 entries in vector database
- ✅ Real-time vector search
## 🔮 Future Enhancements
### Easy Wins
- [ ] Add filters (location, budget, experience)
- [ ] Implement hybrid search (semantic + keyword)
- [ ] Add reranking with cross-encoders (sketched below)
- [ ] Cache popular queries
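Cross-encoder reranking, for example, could sit on top of the existing vector search (a hedged sketch of a possible future step, not current project code; the model name is a common public checkpoint):
```python
from sentence_transformers import CrossEncoder

# Rerank the top-K candidates returned by the vector search
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "Need someone to fix leaking pipes in Rome"
candidates = [
    "Plumber, pipe repair specialist, Rome",
    "Electrician, wiring expert, Milan",
]

# The cross-encoder scores each (query, candidate) pair jointly
scores = reranker.predict([(query, c) for c in candidates])
ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
print(ranked[0])
```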
### Advanced
- [ ] Fine-tune embedding model on gig data
- [ ] Multi-modal embeddings (add images)
- [ ] Graph relationships between skills
- [ ] Temporal embeddings (availability matching)
## 📚 Code Examples
### Creating the Index
```python
# 1. Load data
workers = load_workers_from_json()

# 2. Create documents
documents = []
for worker in workers:
    text = f"""
    Name: {worker['name']}
    Skills: {', '.join(worker['skills'])}
    Experience: {worker['experience']}
    Location: {worker['location']}
    """
    doc = Document(text=text, metadata=worker)
    documents.append(doc)

# 3. Create vector store
chroma_collection = chroma_client.create_collection("workers")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

# 4. Build index
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
)
```
### Querying the Index
```python
# 1. Create query
query = f"""
Looking for: {', '.join(required_skills)}
Location: {location}
Experience: {experience_level}
"""
# 2. Get query engine
query_engine = index.as_query_engine(similarity_top_k=5)
# 3. Execute query
response = query_engine.query(query)
# 4. Extract results
for node in response.source_nodes:
    worker_data = node.metadata
    similarity_score = node.score
    print(f"Match: {worker_data['name']}, Score: {similarity_score}")
```
## 🎯 Key Takeaways
1. **RAG = Better Matches**: Semantic understanding > keyword matching
2. **LlamaIndex = Easy**: Production RAG in <100 lines of code
3. **HuggingFace = Quality**: Great embeddings, sponsor recognition
4. **ChromaDB = Fast**: Local vector store, perfect for demo
5. **Scalable = Future-proof**: Architecture works at scale
---
**This is what makes GigMatch AI stand out in the hackathon!** 🚀