AarnavNoble committed 1f4cee2 (verified · 1 parent: e5a10da)

Upload README.md with huggingface_hub

README.md changed (+198 -3). The previous version's front matter also set `emoji: 🐠` and `colorTo: indigo`, and it was followed by the line "Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference". This commit removes the emoji and that line, sets `colorTo: purple`, and adds the README content.
 
---
title: Roam
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
---
 
# roam

An AI-powered travel itinerary generator. You give it a destination, trip duration, transport mode, and your interests. It returns a day-by-day itinerary with stops ordered to minimize travel time.

Most "AI" travel apps are LLM wrappers: prompt GPT, display the output. Roam builds an actual ML stack underneath (retrieval, learned ranking, and route optimization), and the LLM only writes up the result.

---

## How it works

```
User Input (destination, days, transport, goals)
         │
         ▼
┌─────────────────┐
│  RAG Retrieval  │  ← FAISS vector search over scraped Wikivoyage + Reddit content
│                 │    sentence-transformers (all-MiniLM-L6-v2) embeddings
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   POI Fetcher   │  ← Overpass API (OpenStreetMap): local places only, chains filtered
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   Preference    │  ← LightGBM LambdaRank model trained on (goal, POI, relevance) triplets
│     Ranker      │    8 features: semantic similarity + category match signals
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  VRP Optimizer  │  ← OR-Tools TSP with time windows: minimizes daily travel time
│                 │    Assigns POIs across days, respects a 10-hour daily budget
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  LLM Synthesis  │  ← Groq (Llama 3.3 70B) generates a natural-language itinerary
│                 │    from the optimized route + retrieved travel context
└────────┬────────┘
         │
         ▼
Mobile App (React Native + MapLibre)
```

---

## ML Components

### 1. RAG Pipeline (`backend/ml/rag/`)
Retrieval-Augmented Generation over real travel content, so the LLM is grounded instead of being prompted blind.

- Scrapes Wikivoyage travel guides and Reddit trip reports for each destination
- Chunks text into overlapping 512-word windows
- Embeds chunks with `sentence-transformers/all-MiniLM-L6-v2` (384-dimensional vectors, runs locally)
- Stores them in a FAISS flat index (cosine similarity computed as inner product on L2-normalized vectors)
- At query time, retrieves the top-5 most semantically relevant chunks to ground the LLM
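
A minimal sketch of the chunking and retrieval steps. It assumes whitespace tokenization and a 64-word overlap, since the README gives the window size but not the overlap. `chunk_words`, `normalize`, and `top_k` are hypothetical helpers and are not the functions in `backend/ml/rag/`. Real embeddings would come from `SentenceTransformer("all-MiniLM-L6-v2").encode(...)`, and here toy NumPy vectors stand in for them. In FAISS the same exact search is `faiss.normalize_L2(x)`, then `faiss.IndexFlatIP(d)`, `index.add(x)`, and `index.search(q, k)`.

```python
import numpy as np

# Hypothetical default: the README states the 512-word window but not the overlap size.
DEFAULT_OVERLAP = 64


def chunk_words(text: str, size: int = 512, overlap: int = DEFAULT_OVERLAP) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def normalize(vectors: np.ndarray) -> np.ndarray:
    """L2-normalize each row so that an inner product equals cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


def top_k(query: np.ndarray, index: np.ndarray, k: int = 5) -> list[tuple[int, float]]:
    """Exact inner-product search over pre-normalized rows (what a flat IP index computes)."""
    scores = index @ normalize(query[None, :])[0]
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]


# Toy 2-D vectors standing in for 384-dimensional MiniLM embeddings.
index = normalize(np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
print(top_k(np.array([1.0, 0.1]), index, k=2))
```

A flat index compares the query against every stored vector. That is exact and cheap at the scale of a few cities' worth of chunks.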

### 2. Learning-to-Rank (`backend/ml/ranker/`)
A trained model scores POIs against user goals instead of matching keywords.

- **Features**: cosine similarity between the goal embedding and the POI description, category match signals (food/nature/history/nightlife), name specificity, tag richness
- **Model**: LightGBM with the `lambdarank` objective, a learning-to-rank method widely used in search ranking that optimizes NDCG
- **Training data**: synthetic (goal, POI list, relevance scores) scenarios covering 5 travel styles
- **Feedback hook**: stubbed for online learning; thumbs-up/down signals are intended to trigger incremental retraining
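
LambdaRank treats each goal as a query. The rows for one query must be contiguous, and training optimizes NDCG within each query. The sketch below shows that data layout and the NDCG@k metric in plain Python. The row format `(goal_id, features, relevance)` and the helper names are hypothetical and do not come from `trainer.py`.

```python
import math
from collections import defaultdict


def group_by_goal(rows):
    """Sort (goal_id, features, relevance) rows by goal and return X, y, and group sizes.

    Each goal acts as one query: its rows are made contiguous, and `group`
    lists how many rows each query contributes, in order.
    """
    buckets = defaultdict(list)
    for goal_id, features, relevance in rows:
        buckets[goal_id].append((features, relevance))
    X, y, group = [], [], []
    for goal_id in sorted(buckets):
        items = buckets[goal_id]
        X.extend(features for features, _ in items)
        y.extend(relevance for _, relevance in items)
        group.append(len(items))
    return X, y, group


def dcg_at_k(relevances, k):
    """Discounted cumulative gain with the common (2^rel - 1) / log2(position + 1) form."""
    return sum((2 ** rel - 1) / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(scores, relevances, k=5):
    """NDCG@k: DCG of the model's ordering divided by DCG of the ideal ordering."""
    ranked = [rel for _, rel in sorted(zip(scores, relevances), key=lambda pair: -pair[0])]
    best = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(ranked, k) / best if best > 0 else 0.0
```

With real feature vectors and integer relevance labels, training would be `lightgbm.LGBMRanker(objective="lambdarank").fit(X, y, group=group)`.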

### 3. VRP Route Optimizer (`backend/ml/optimizer/`)
Formulates itinerary generation as a constrained routing problem (greedy day assignment plus a per-day TSP with time windows) rather than just sorting stops by distance.

- Builds an N×N travel-time matrix (OpenRouteService API, with a Haversine fallback)
- Solves a TSP for each day with OR-Tools, using time windows (opening hours) and visit-duration constraints
- Assigns days greedily, spreading ranked POIs across the trip while respecting a 10-hour daily budget
- Returns estimated arrival times for each stop
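
A minimal sketch of two steps from this list: the Haversine fallback for the travel-time matrix and a greedy day assignment under the 10-hour (600-minute) budget. Several parts are assumptions: the average speeds, the per-stop travel estimate, the "least-loaded day first" rule, and the helper names. The project's `scheduler.py` may assign days differently. The per-day ordering with time windows, which the project does with OR-Tools, is not shown.

```python
import math

EARTH_RADIUS_KM = 6371.0
# Assumed average speeds; the README does not say what the fallback uses.
SPEED_KMH = {"walking": 5.0, "driving": 30.0}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def travel_time_matrix(points, transport="walking"):
    """N x N straight-line travel times in whole minutes (used when no routing API is available)."""
    speed = SPEED_KMH[transport]
    return [[round(haversine_km(p, q) / speed * 60) for q in points] for p in points]


def assign_days(pois, days, budget_min=600):
    """Greedily place ranked POIs into days without exceeding the daily time budget.

    `pois` holds (name, visit_minutes, travel_minutes) tuples in ranked order, where
    travel_minutes is a hypothetical per-stop travel estimate. POIs that fit on no
    day are dropped, so lower-ranked stops are the first to go.
    """
    plan = [[] for _ in range(days)]
    used = [0] * days
    for name, visit, travel in pois:
        cost = visit + travel
        # Try the least-loaded day first so stops spread evenly across the trip.
        for d in sorted(range(days), key=lambda d: used[d]):
            if used[d] + cost <= budget_min:
                plan[d].append(name)
                used[d] += cost
                break
    return plan
```

For example, `assign_days([("a", 300, 30), ("b", 300, 30), ("c", 200, 20), ("d", 500, 0)], days=2)` puts `a` and `c` on day 1 and `b` on day 2, and drops `d` because it fits on neither day.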

---

## Stack

| Layer | Tech |
|---|---|
| Mobile | React Native (Expo) + MapLibre |
| Backend | Python + FastAPI |
| Embeddings | sentence-transformers (`all-MiniLM-L6-v2`) |
| Vector Store | FAISS |
| Ranking | LightGBM LambdaRank |
| Route Optimization | Google OR-Tools (TSP/VRP) |
| LLM | Groq API (Llama 3.3 70B) |
| POI Data | OpenStreetMap / Overpass API |
| Routing | OpenRouteService |
| Geocoding | Nominatim |

Everything except Groq is free and open source. Groq has a free tier.

---

## Project Structure

```
roam/
├── backend/
│   ├── api/
│   │   └── routes.py               # FastAPI endpoints; wires up the full pipeline
│   ├── ml/
│   │   ├── rag/
│   │   │   ├── scraper.py          # Wikivoyage + Reddit scraper
│   │   │   ├── chunker.py          # Overlapping text chunker
│   │   │   ├── embedder.py         # sentence-transformers encoding
│   │   │   ├── vector_store.py     # FAISS index build/save/load
│   │   │   ├── retriever.py        # Query-time retrieval
│   │   │   └── build_pipeline.py   # One-shot index builder
│   │   ├── ranker/
│   │   │   ├── features.py         # Feature extraction (embeddings + metadata)
│   │   │   ├── model.py            # LightGBM LambdaRank model
│   │   │   ├── trainer.py          # Training on synthetic data
│   │   │   └── scorer.py           # Runtime scoring + feedback hook
│   │   └── optimizer/
│   │       ├── distance.py         # Travel time matrix (ORS + Haversine fallback)
│   │       ├── vrp.py              # OR-Tools TSP solver with time windows
│   │       └── scheduler.py        # Day assignment + route optimization
│   ├── services/
│   │   ├── overpass.py             # OSM POI fetcher (chains filtered)
│   │   ├── nominatim.py            # Geocoding
│   │   └── groq_client.py          # LLM synthesis
│   └── main.py
├── mobile/
│   ├── app/
│   │   ├── index.tsx               # Home screen (trip input)
│   │   └── itinerary.tsx           # Results screen (list + map view)
│   └── services/
│       └── api.ts                  # Typed API client
└── data/                           # FAISS index + trained model (gitignored)
```

---

## Setup

### Backend

```bash
cd roam
python3 -m venv .venv && source .venv/bin/activate
pip install -r backend/requirements.txt

# Add your Groq API key (free at console.groq.com)
cp backend/.env.example backend/.env
# Edit backend/.env and set GROQ_API_KEY

# Build the RAG index (scrapes and embeds ~8 cities; takes ~5 min)
python -m backend.ml.rag.build_pipeline

# Train the ranker
python -m backend.ml.ranker.trainer

# Start the API
uvicorn backend.main:app --reload
```

### Mobile

```bash
cd mobile
npm install
cp .env.example .env
npx expo start
```

Scan the QR code with **Expo Go** (iOS / Android). Your phone and your development machine must be on the same Wi-Fi network.

---

## API

### `POST /api/itinerary`

```json
{
  "destination": "Tokyo",
  "days": 3,
  "transport": "walking",
  "goals": ["food", "history", "hidden gems"]
}
```

Returns a structured day-by-day itinerary with stops, arrival times, descriptions, and coordinates.
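
A small standard-library client sketch for this endpoint. It assumes the API runs locally on uvicorn's default `http://127.0.0.1:8000`. The helper names and the 120-second timeout are hypothetical choices. The response is returned as parsed JSON because the README describes its contents but not its exact schema.

```python
import json
import urllib.request

# uvicorn's default bind address; change this for a deployed backend.
BASE_URL = "http://127.0.0.1:8000"


def build_itinerary_request(destination, days, transport, goals, base_url=BASE_URL):
    """Build (but do not send) a POST /api/itinerary request with a JSON body."""
    body = json.dumps({
        "destination": destination,
        "days": days,
        "transport": transport,
        "goals": list(goals),
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/itinerary",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_itinerary(*args, timeout=120, **kwargs):
    """Send the request and decode the JSON response (the API must be running)."""
    request = build_itinerary_request(*args, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage: `fetch_itinerary("Tokyo", 3, "walking", ["food", "history", "hidden gems"])`. Generation includes an LLM call, so a generous timeout is safer than the default.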

### `POST /api/feedback`

```json
{
  "poi_id": 12345,
  "relevant": true
}
```

Logs positive/negative relevance signals for future ranker retraining.
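
The README only says that feedback is logged for later retraining. The sketch below shows one way that loop could work: store each request body as a JSON line, then collapse the votes into binary per-POI labels that could be merged into the ranker's training data. The JSON Lines format, the helper names, and the majority-vote rule are all assumptions. They do not describe what the feedback hook in `scorer.py` does.

```python
import json
from collections import Counter
from pathlib import Path


def feedback_line(poi_id: int, relevant: bool) -> str:
    """Serialize one POST /api/feedback body as a JSON line (hypothetical log format)."""
    return json.dumps({"poi_id": poi_id, "relevant": bool(relevant)}) + "\n"


def append_feedback(path: str, poi_id: int, relevant: bool) -> None:
    """Append one feedback event to a JSON Lines log file."""
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(feedback_line(poi_id, relevant))


def feedback_labels(lines, min_votes: int = 1) -> dict[int, int]:
    """Collapse logged votes into labels: 1 if net-positive, 0 if net-negative.

    POIs with fewer than `min_votes` events, or with a tied vote, get no label.
    """
    up, down = Counter(), Counter()
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines in the log
        event = json.loads(line)
        (up if event["relevant"] else down)[event["poi_id"]] += 1
    labels = {}
    for poi_id in up.keys() | down.keys():
        if up[poi_id] + down[poi_id] < min_votes or up[poi_id] == down[poi_id]:
            continue
        labels[poi_id] = 1 if up[poi_id] > down[poi_id] else 0
    return labels
```

A file opened with `Path(...).open(encoding="utf-8")` can be passed directly as `lines`, because iterating a text file yields its lines.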