import streamlit as st
import pandas as pd
import numpy as np
from sklearn.neighbors import NearestNeighbors
# Load data
df = pd.read_csv("cities.csv")
df_preprocessed = np.loadtxt("df_processed.csv", delimiter=',') # exported after preprocessing step in lecture example
st.title("City Similarity Recommender")
# City selection
selected_places = st.multiselect("Select one or more cities you like:", options=df['place'].tolist())
# Optional sub-region filter
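# dropna() guards against missing sub-regions, which would break sorted() on mixed str/float values
subregion_options = ['All'] + sorted(df['sub-region'].dropna().unique().tolist())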
selected_subregion = st.selectbox("Optionally filter recommended city by sub-region:", options=subregion_options)
if selected_places:
    # Get row indices of the selected cities
    # (assumes df keeps its default RangeIndex, aligned row-for-row with df_preprocessed)
    selected_indices = df[df['place'].isin(selected_places)].index.tolist()

    # Compute the average feature vector
    avg_vector = df_preprocessed[selected_indices].mean(axis=0).reshape(1, -1)

    # Apply sub-region filtering if selected
    if selected_subregion != 'All':
        candidate_mask = (df['sub-region'] == selected_subregion).to_numpy(copy=True)
    else:
        candidate_mask = np.ones(len(df), dtype=bool)
    # Never recommend a city the user already selected
    candidate_mask[selected_indices] = False

    candidate_indices = np.where(candidate_mask)[0]
    candidate_vectors = df_preprocessed[candidate_indices]

    if len(candidate_indices) == 0:
        st.warning("No other cities are available for this sub-region.")
    else:
        # Use NearestNeighbors to find the closest candidates (at most 3;
        # fewer if the filter leaves fewer than 3 cities)
        n_neighbors = min(3, len(candidate_indices))
        nn = NearestNeighbors(n_neighbors=n_neighbors, metric='euclidean')
        nn.fit(candidate_vectors)
        distances, indices = nn.kneighbors(avg_vector)
        top_indices = [candidate_indices[i] for i in indices[0]]

        st.subheader("Top 3 Similar Cities")
        for rank, idx in enumerate(top_indices, 1):
            city = df.iloc[idx]
            st.markdown(f"### {rank}. {city['place']}")
            st.markdown(f"Country: {city['alpha-2']}")
            st.markdown(f"Region: {city['region']}")
            st.markdown(f"Sub-region: {city['sub-region']}")
            # Smaller distance means a closer match to the averaged selection
            st.markdown(f"Euclidean distance (lower is more similar): {distances[0][rank - 1]:.4f}")
            st.markdown("---")
else:
    st.info("Please select at least one city to get a recommendation.")
# --- Explanatory Sections ---
with st.expander("Technique Used"):
    st.markdown("""
    This recommender uses **unsupervised learning techniques** to find the most similar cities based on selected examples. Here's how it works:

    - The app uses **scaled numerical features** that describe cities in terms of cost, safety, infrastructure, and lifestyle.
    - When you select cities you like, their feature vectors are averaged into a single query vector.
    - We then run a **k-Nearest Neighbors (k-NN)** search to find the top 3 cities closest to that vector by **Euclidean distance**, excluding the cities you selected.
    - Optionally, results can be limited to a **specific sub-region**.
    """)
with st.expander("Why this is Unsupervised Learning"):
    st.markdown("""
    This approach is a classic example of **unsupervised learning (UML)**:

    - There are **no labels or targets** provided: we don't tell the model what to predict.
    - Instead, we explore the **structure of the data** to find **similarities** and groupings.
    - The recommendation is based purely on feature-based proximity; no training labels are used.

    Techniques like clustering and similarity search (like k-NN) are key tools in the UML toolbox, making this system a real-life application of UML.
    """)