title: RecommendationSystemForApps
emoji: 😻
colorFrom: pink
colorTo: yellow
sdk: gradio
sdk_version: 6.2.0
app_file: app.py
pinned: false
macOS App Recommendation System
Submitted by Lior Feinstein & Karin Mamaev
Video Presentation
A semantic search engine for the Mac App Store that recommends applications based on natural language user prompts.
Project Goal
The primary objective of this project is to build a recommendation system that helps users discover macOS applications by describing their needs in natural language (e.g., "I need a productivity app to organize my homework"). Unlike traditional keyword search, this system utilizes semantic embeddings to understand the intent and context of the user's request.
Dataset Overview
We utilized the mac-app-store-apps-metadata dataset sourced from Hugging Face.
- Source: macpaw-research/mac-app-store-apps-metadata
- Collection Period: Dec 2023 – Jan 2024 (via public iTunes Search API)
- Scale: Over 87,000 unique apps (Filtered to 31,438 US-store configurations for this project).
- Key Features:
  - trackName: Official application title.
  - description: Full feature explanation (primary source for Text Recommendations).
  - primaryGenreName: Main category (e.g., Productivity, Utilities).
  - artworkUrl512: URL for the 512x512 app icon.
  - averageUserRating: Performance metric (0.0 - 5.0).
Data Cleaning & Preprocessing
Before analysis, the raw data underwent rigorous cleaning:
- Missing Values: Price NaNs were filled with 0, assuming unlisted prices indicated free apps.
- Dimensionality Reduction: Dropped irrelevant columns (e.g., advisories, isGameCenterEnabled, bundleId, sellerName) to focus on content-relevant features.
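The two cleaning steps above can be sketched with pandas roughly as follows. This is a minimal illustration, not the project's actual notebook code: the column name `price`, the helper `clean_metadata`, and the sample rows are assumptions made for the example.

```python
import pandas as pd

# Columns named in the text as irrelevant to content-based recommendation.
IRRELEVANT_COLUMNS = ["advisories", "isGameCenterEnabled", "bundleId", "sellerName"]


def clean_metadata(df: pd.DataFrame, price_col: str = "price") -> pd.DataFrame:
    """Fill missing prices with 0 and drop non-content columns.

    `price_col` is an assumed column name; adjust it to the real schema.
    """
    cleaned = df.copy()
    # Assumption stated in the text: an unlisted price means the app is free.
    cleaned[price_col] = cleaned[price_col].fillna(0)
    # errors="ignore" keeps the sketch usable if a column is already absent.
    return cleaned.drop(columns=IRRELEVANT_COLUMNS, errors="ignore")


# Tiny illustrative frame (made-up rows, not taken from the dataset).
sample = pd.DataFrame(
    {
        "trackName": ["App A", "App B"],
        "price": [None, 4.99],
        "bundleId": ["com.example.a", "com.example.b"],
    }
)
cleaned_sample = clean_metadata(sample)
```

Working on a copy leaves the raw frame untouched, which makes it easier to re-run the cleaning with a different assumption about missing prices.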
Outlier Analysis
We analyzed numerical outliers to distinguish between data errors and legitimate anomalies.
1. File Size:
- Observation: Extreme values were retained. Modern software naturally ranges from kilobytes (tiny utility scripts) to gigabytes (professional video editors/games). These are not errors but accurate representations of the ecosystem.
2. Price:
- Observation: High-value outliers were retained. These represent legitimate niche professional software, which naturally carries a higher cost.
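The text does not say which rule was used to identify outliers. One common choice is the 1.5 × IQR rule, sketched below as an assumption. The helper `iqr_outlier_flags` and the sample prices are hypothetical. The function only flags values and never removes them, in line with the decision above to retain extreme values.

```python
from statistics import quantiles


def iqr_outlier_flags(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] without dropping them."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the sample
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]


# Made-up prices: mostly free or cheap apps plus one expensive professional tool.
prices = [0.0, 0.0, 0.99, 1.99, 2.99, 4.99, 299.99]
flags = iqr_outlier_flags(prices)
```

Flagged rows can then be inspected manually to decide whether they are data errors or legitimate anomalies.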
Exploratory Data Analysis (EDA)
Correlations & Distribution
We generated a correlation heatmap to identify linear relationships between features.
- Finding: No strong linear correlations were detected between most metadata features, suggesting that non-linear methods (like embeddings) are necessary for recommendations.
App Characteristics
- Most applications in the store are lightweight utilities.
Temporal Trends:
This line chart displays the monthly number of app releases for the top five genres over time, illustrating the growth trends and release frequency in the macOS market.
Starting from 2019, the number of app releases appears to have grown.
Notably, the dataset contains no apps younger than 2 years or older than 17 years.
Insights:
The histogram illustrates the distribution of average user ratings, revealing a distinct left-skewed pattern where the vast majority of apps are concentrated in the 4 to 5-star range. The data indicates that the mode is a 5-star rating, with over 3,000 apps achieving this score, which suggests that users in this dataset are overwhelmingly satisfied with the applications. Conversely, ratings between 1 and 3 are relatively rare, implying that poorly rated apps are outliers within this sample.
The chart shows a clear decline in average user ratings as the time since the last update increases. Apps updated most recently (within 2 years) have the highest ratings, while very old apps (over 3 years since last update) receive the lowest average ratings, suggesting that regular updates are associated with better user satisfaction.
The plot shows that apps tend to receive higher average ratings during their mid-lifecycle (around 3–6 years after release), while very new and much older apps generally have lower ratings.
The heatmap reveals that popularity is a stronger predictor of high user satisfaction than price, with the highest average ratings consistently appearing in the "Very High" popularity tier across all price points. Notably, the most expensive apps ($20+) that achieve high popularity reach the peak average rating of 4.82, suggesting that premium products meeting high demand offer the greatest perceived value.
Embeddings & Feature Engineering
1. Image Embeddings (Visual Search)
- Model: CLIP (Contrastive Language-Image Pre-Training)
- Process: We utilized the CLIP model to generate embeddings for the app icons. To facilitate iterative testing, we implemented a persistent caching system using Google Drive. This pipeline checks for existing files via MD5 hashes to prevent redundant downloads.
- Optimization: Only missing images were fetched and resized to 224x224 pixels, drastically reducing wait times.
- Total Embeddings: 31,408 processed images.
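The caching idea above can be sketched as follows. This is an illustration under assumptions, not the project's pipeline: it hashes the icon URL (the text does not say whether the URL or the file content is hashed), the names `cached_icon_path` and `fetch_icon` are hypothetical, and the download function is passed in so the sketch stays independent of any HTTP library. Resizing to 224x224 and the CLIP encoding step are omitted.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_icon_path(url: str, cache_dir: Path) -> Path:
    """Map an icon URL to a stable file name derived from its MD5 hash."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.png"


def fetch_icon(url: str, cache_dir: Path, download: Callable[[str], bytes]) -> Path:
    """Return the cached icon path, downloading only when the file is missing."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cached_icon_path(url, cache_dir)
    if not path.exists():
        # Cache miss: fetch once and persist for later runs.
        path.write_bytes(download(url))
    return path
```

Pointing `cache_dir` at a mounted Google Drive folder would make the cache persist across notebook sessions, which is the behaviour described above.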
2. Text Embeddings (Semantic Search)
- Model: all-MiniLM-L6-v2
- Feature Engineering: Created a "Master Text" column by concatenating trackName, description, genres, and category.
- Why this model? It is lightweight and specifically optimized to identify deep semantic relationships, projecting user queries into a 384-dimensional latent space.
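A minimal sketch of how the "Master Text" field could be assembled from one app record. The helper `build_master_text`, the space separator, the handling of list-valued genres, and the example record are all assumptions for illustration. The actual column was presumably built directly on the DataFrame.

```python
MASTER_TEXT_FIELDS = ("trackName", "description", "genres", "category")


def build_master_text(app: dict) -> str:
    """Concatenate the text fields of one app record into a single string."""
    parts = []
    for field in MASTER_TEXT_FIELDS:
        value = app.get(field)
        # Assumption: genres may arrive as a list, so join it into one string.
        if isinstance(value, (list, tuple)):
            value = ", ".join(str(v) for v in value)
        if value:  # skip missing or empty fields
            parts.append(str(value).strip())
    return " ".join(parts)


# Made-up record for illustration only.
example_app = {
    "trackName": "Focus Clock",
    "description": "A simple timer for focused work sessions.",
    "genres": ["Productivity", "Utilities"],
    "category": "Productivity",
}
master_text = build_master_text(example_app)

# The resulting strings would then be encoded, for example with the
# sentence-transformers package (not executed here):
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("all-MiniLM-L6-v2")
#   embeddings = model.encode(list_of_master_texts)  # one 384-dim vector per text
```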
Clustering Analysis
We employed unsupervised learning to validate that our embeddings captured semantic meaning.
- Algorithm: K-Means (Chosen over DBSCAN due to the "Curse of Dimensionality" in 384-dim text data).
- Optimization: We selected K=10 based on the K-Means elbow method and the silhouette score.
t-SNE Visualization
- Figure A (Full Projection): Shows natural boundary overlap (characteristic of semantic text).
- Figure B (Core-Sampled): We filtered for "Cluster Cores" (top 50% confidence). This revealed distinct, highly separated functional archetypes.
Cluster Validation: How We Know It Worked
We validated the quality of our model by analyzing the semantic coherence of the generated clusters. The results confirmed that the model successfully captured fine-grained distinctions that go beyond standard App Store categories:
- Granular Separation of "Productivity": The model successfully deconstructed the broad "Productivity" genre into distinct functional use cases:
  - Cluster 0 (Document Management): Aggregated tools specifically for file manipulation (e.g., TinyPDF Pro, Duplicate File Finder).
  - Cluster 1 (Time & Focus): Formed a completely separate group for temporal tools (e.g., Nice Timer 2, Focus Clock).
- Modality Awareness: The embeddings demonstrated precise awareness of media types by completely separating Audio, Video, and Graphics apps into their own exclusive clusters.
Conclusion: These results indicate that the embeddings distinguish "managing files" from "managing time" as semantically different concepts, even though the two share the same store category. The clusters are coherent and intuitive, and they provide a solid foundation for the recommendation engine.
Final Result: Recommendation App
How it Works
- Input: User provides a natural language query (e.g., "I need a tool to edit videos").
- Processing: The input is transformed using the all-MiniLM-L6-v2 model.
- Similarity Search: We calculate Cosine Similarity between the user's query vector and the pre-computed dataset vectors.
- Output: The system returns the top 3 apps with the highest similarity scores.
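The similarity search and top-3 selection described above can be sketched with NumPy as follows. This is an illustrative version rather than the code in app.py: the function name `top_k_similar` is hypothetical, and toy 2-D vectors stand in for the 384-dimensional embeddings. It assumes no vector has zero length.

```python
import numpy as np


def top_k_similar(query_vec: np.ndarray, app_vecs: np.ndarray, k: int = 3):
    """Return (row index, cosine similarity) pairs for the k closest rows."""
    # Normalise so that a dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    m = app_vecs / np.linalg.norm(app_vecs, axis=1, keepdims=True)
    scores = m @ q
    # Sort in descending order of similarity and keep the k best matches.
    best = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in best]


# Toy vectors standing in for pre-computed app embeddings and a query embedding.
app_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
query_vec = np.array([1.0, 0.1])
results = top_k_similar(query_vec, app_vecs, k=3)
```

The returned indices can be used to look up trackName and artworkUrl512 for display in the Gradio interface. Normalising the dataset vectors once ahead of time would avoid repeating that work on every query.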
Deployment
The final application was deployed to Hugging Face Spaces using Gradio for the user interface.
Link to app: RecommendationSystemForApps








