Space: evalstate/hidden-gems

evalstate (HF Staff) committed on Feb 12

Commit ddd502d (verified) · Parent: 122ab90

Deploy hidden gems MCP server

Files changed (4)

Dockerfile +3 -23
README.md +80 -28
hidden-gems.md +24 -44
hidden_gems_tool.py +119 -176

Dockerfile CHANGED

@@ -1,38 +1,18 @@
 FROM python:3.13-slim
-# Install system dependencies required by fast-agent and HF Spaces
 RUN apt-get update && \
-    apt-get install -y \
-      bash \
-      git git-lfs \
-      wget curl procps \
-      && rm -rf /var/lib/apt/lists/*
-# Install uv for fast, reliable package management
 COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
-# Set working directory
 WORKDIR /app
-# Install fast-agent-mcp from PyPI
 RUN uv pip install --system --no-cache fast-agent-mcp
-# Copy all files from the Space repository to /app
 COPY --link ./ /app
-# Ensure /app is owned by uid 1000 (required for HF Spaces)
 RUN chown -R 1000:1000 /app
-# Switch to non-root user
 USER 1000
-# Expose port 7860 (HF Spaces default)
 EXPOSE 7860
-# Run fast-agent serve with token passthrough
-CMD ["fast-agent", "serve", \
-     "--card", "hidden-gems.md", \
-     "--transport", "http", \
-     "--instance-scope", "request", \
-     "--host", "0.0.0.0", \
-     "--port", "7860"]

 FROM python:3.13-slim
 RUN apt-get update && \
+    apt-get install -y bash git git-lfs wget curl procps && \
+    rm -rf /var/lib/apt/lists/*
 COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
 WORKDIR /app
 RUN uv pip install --system --no-cache fast-agent-mcp
 COPY --link ./ /app
 RUN chown -R 1000:1000 /app
 USER 1000
 EXPOSE 7860
+CMD ["fast-agent", "serve", "--card", "hidden-gems.md", "--transport", "http", "--host", "0.0.0.0", "--port", "7860"]

README.md CHANGED

@@ -1,43 +1,95 @@
 ---
-title: Hidden Gems Finder
-emoji: 💎
-colorFrom: purple
-colorTo: blue
 sdk: docker
 app_port: 7860
-license: mit
-short_description: Discover undervalued HF models with high engagement ratios
 ---
-# Hidden Gems Finder MCP Server
-An MCP server that helps you discover undervalued Hugging Face models. It finds models with high "likes per download" ratios - quality models that haven't blown up yet!
-## What it does
-The Hidden Gems Finder analyzes models from the Hugging Face Hub and identifies those where:
-- The community engagement (likes) is high relative to downloads
-- Downloads are in the "sweet spot" (not too obscure, not too popular)
-- Model quality signals suggest undervalued potential
-## Usage
-Connect to this MCP server to:
-- Find hidden gems across all model types
-- Search for gems in specific categories (text-generation, image-to-text, etc.)
-- Get detailed model information with gem scores
-- Export results as JSON
-## Algorithm
-The tool calculates a "gem score" based on:
-1. **Ratio** = likes per 1000 downloads (raw quality signal)
-2. **Age balance** - excludes very new (<14 days) and very old (>2 years) models
-3. **Download sweet spot** - boosts models with 100-10k downloads
-4. **Minimum thresholds** - filters out low-engagement models
-Higher scores indicate models that are loved by the community but haven't gone mainstream yet.
-## Configuration
-This Space uses token passthrough - clients authenticate with their own HF tokens.

 ---
+title: Hidden Gems - Undervalued HF Models
 sdk: docker
 app_port: 7860
 ---
+# 🔍 Hidden Gems Finder
+An MCP Server that discovers undervalued Hugging Face models with high likes-to-downloads ratios — quality models that haven't gone viral yet!
+## What are Hidden Gems?
+The **Hidden Gem Score** = Likes / Downloads
+- **High ratio** = Loved by the community, but not widely adopted (undervalued!)
+- **Low ratio** = Popular but mainstream (already discovered)
+This helps you find quality models before they blow up.
+## MCP Tools
+### `find_hidden_gems`
+Search for undervalued models with various filters:
+**Parameters:**
+- `limit` (int): Models to fetch from API (default: 100)
+- `min_downloads` (int): Filter out brand-new models (default: 100)
+- `top` (int): Number of results to return (default: 20)
+- `pipeline_tag` (str, optional): Filter by type like "text-generation", "image-to-text"
+- `sort_by` (str): Sort by "ratio" (default), "likes", "downloads", or "trending"
+**Example:** Find text generation gems
+```json
+{
+  "limit": 200,
+  "min_downloads": 100,
+  "top": 10,
+  "pipeline_tag": "text-generation",
+  "sort_by": "ratio"
+}
+```
+### `get_model_details`
+Get detailed information about a specific model:
+**Parameters:**
+- `model_id` (str): The model ID (e.g., "microsoft/DialoGPT-medium")
+**Example:**
+```json
+{"model_id": "openbmb/MiniCPM-SALA"}
+```
+## Usage Examples
+### Find Top 20 Hidden Gems
+```
+find_hidden_gems()
+```
+### Find Text Generation Gems
+```
+find_hidden_gems(pipeline_tag="text-generation", top=15)
+```
+### Deep Search
+```
+find_hidden_gems(limit=500, min_downloads=500, top=50, sort_by="likes")
+```
+### Check Specific Model
+```
+get_model_details("xai-org/grok-1")
+```
+## Connecting to Claude Desktop
+Add to your `claude_desktop_config.json`:
+```json
+{
+  "mcpServers": {
+    "hidden-gems": {
+      "command": "npx",
+      "args": ["-y", "mcp-remote", "https://evalstate-hidden-gems.hf.space/mcp"]
+    }
+  }
+}
+```
+## Environment Variables
+- `HF_TOKEN`: Optional Hugging Face token for higher API rate limits
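The README defines the Hidden Gem Score as plain likes divided by downloads, with `min_downloads`, `top` and `sort_by` shaping the result. The sketch below reproduces that ranking offline so the effect of the ratio is visible without calling the Hub: the most-liked model ends up last when ranked by ratio. It assumes invented sample records shaped like `/api/models` entries; `rank_gems` and `SAMPLE_MODELS` are hypothetical names, not part of the Space.

```python
# Hypothetical sample records (invented numbers), shaped like /api/models entries.
SAMPLE_MODELS = [
    {"id": "org/popular", "likes": 900, "downloads": 1_000_000, "trendingScore": 50},
    {"id": "org/gem", "likes": 120, "downloads": 2_000, "trendingScore": 5},
    {"id": "org/too-new", "likes": 10, "downloads": 40, "trendingScore": 1},
    {"id": "org/solid", "likes": 60, "downloads": 20_000, "trendingScore": 12},
]


def rank_gems(models, min_downloads=100, top=20, sort_by="ratio"):
    """Rank models by likes/downloads, mirroring the README's parameters."""
    rows = []
    for m in models:
        likes, downloads = m.get("likes"), m.get("downloads")
        # Skip records missing either count, and models under min_downloads.
        if likes is None or downloads is None or downloads < min_downloads:
            continue
        rows.append({
            "id": m.get("id"),
            "likes": likes,
            "downloads": downloads,
            "trendingScore": m.get("trendingScore") or 0,
            "ratio": likes / downloads if downloads > 0 else 0.0,
        })
    # Map sort_by onto a record key; unknown values fall back to the ratio.
    key = {"likes": "likes", "downloads": "downloads", "trending": "trendingScore"}.get(sort_by, "ratio")
    rows.sort(key=lambda r: r[key], reverse=True)
    return rows[:top]


for row in rank_gems(SAMPLE_MODELS):
    print(row["id"], round(row["ratio"], 4))
```

Ranked by ratio the order is `org/gem` (0.06), `org/solid` (0.003), `org/popular` (0.0009), while `org/too-new` is dropped because its 40 downloads fall under the default `min_downloads` of 100. With `sort_by="likes"` the popular model moves back to the top.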

hidden-gems.md CHANGED

@@ -1,65 +1,45 @@
 ---
 type: agent
 name: hidden-gems
-default: true
-description: |
-  Find undervalued Hugging Face models with high likes-to-downloads ratio.
-  Discover hidden gems - quality models that haven't blown up yet.
 function_tools:
   - hidden_gems_tool.py:find_hidden_gems
   - hidden_gems_tool.py:get_model_details
-  - hidden_gems_tool.py:list_model_types
-model: kimi
 ---
 # Hidden Gems Finder
-You are a specialized assistant that helps users discover undervalued Hugging Face models. You use the "hidden gems" algorithm to find models with high community engagement relative to their download count.
-## What Makes a "Hidden Gem"
-A hidden gem is a model where:
-- **Likes are high** relative to downloads (quality indicator)
-- **Downloads are in the sweet spot** (100-10k) - not too obscure, not too popular
-- **Age is balanced** - not brand new, not ancient
-- **Community loves it** but it hasn't gone mainstream yet
-## How to Help Users
-1. **Find gems** - Use `find_hidden_gems` to search for undervalued models
-2. **Filter by type** - Help users find gems in specific categories (text-generation, image-to-text, etc.)
-3. **Get details** - Use `get_model_details` for deep dives on specific models
-4. **Explain scores** - Interpret gem scores and ratios for users
-## Gem Score Interpretation
-- **Ratio**: Likes per 1000 downloads (higher = more loved per download)
-- **Gem Score**: Adjusted ratio accounting for age, downloads, and timing
-- **Score > 100**: Exceptional quality signal
-- **Score 50-100**: Strong hidden gem
-- **Score 20-50**: Worth investigating
-- **Score < 20**: Decent but more mainstream
-## Common Model Types
-- `text-generation` - LLMs and text completion models
-- `image-to-text` - Vision-language models, OCR
-- `text-classification` - Sentiment analysis, topic classification
-- `image-classification` - Vision models
-- `text-to-image` - Diffusion models, image generation
-- `text-to-speech` - TTS models
-- `automatic-speech-recognition` - ASR models
-- `audio-to-audio` - Audio processing models
 ## Response Style
-Be enthusiastic about discoveries! Present gems in a clear format:
-```
-💎 Hidden Gem: {model_id}
-   Score: {gem_score} | Ratio: {ratio} likes/1k downloads
-   Likes: {likes} | Downloads: {downloads}
-   Type: {pipeline_tag} | Age: {age_days} days
-```
-Help users understand WHY a model is a gem and what makes it special.

 ---
 type: agent
 name: hidden-gems
 function_tools:
   - hidden_gems_tool.py:find_hidden_gems
   - hidden_gems_tool.py:get_model_details
+default: true
+description: Discover undervalued Hugging Face models with high likes-to-downloads ratios - hidden gems that haven't gone viral yet
 ---
 # Hidden Gems Finder
+You are a specialized tool for discovering undervalued Hugging Face models. Your purpose is to help users find "hidden gems" - high-quality models that have received significant community appreciation (likes) relative to their download numbers.
+## Capabilities
+1. **Search for hidden gems** across different categories:
+   - Filter by minimum downloads to avoid brand-new models
+   - Filter by pipeline tag (text-generation, image-to-image, etc.)
+   - Sort by ratio (default), likes, downloads, or trending score
+2. **Get detailed information** about specific models
+## The "Hidden Gem Score"
+The key metric is **Likes / Downloads ratio**:
+- High ratio = Model is loved by the community but not widely adopted yet
+- Low ratio = Model is popular but mainstream
+This helps identify quality models before they blow up!
+## Usage Guidelines
+When users ask about hidden gems:
+1. Ask if they have a specific category/pipeline in mind
+2. Suggest appropriate filters based on their needs
+3. Present results in a clear, ranked format
+4. Highlight interesting findings with context
 ## Response Style
+- Be enthusiastic about discoveries
+- Explain why high-ratio models are valuable
+- Suggest next steps (trying the model, checking the model card, etc.)
+- Use data to back up recommendations
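The agent card leaves presentation to the model ("Present results in a clear, ranked format"). If the same ranked view is wanted outside the agent, for example when calling the tool directly, a small formatter over the JSON returned by `find_hidden_gems` is enough. This is a sketch assuming the output shape visible in `hidden_gems_tool.py` (`count`, `showing`, `gems`, or a lone `error` key); `format_ranked` is a hypothetical helper and the payload values are invented.

```python
import json


def format_ranked(payload: str) -> str:
    """Render find_hidden_gems-style JSON as a numbered, ranked list."""
    data = json.loads(payload)
    # The tool reports failures as {"error": ...}; pass them through unchanged.
    if "error" in data:
        return f"Error: {data['error']}"
    lines = [f"Showing {data['showing']} of {data['count']} candidates"]
    # Gems arrive already sorted by the tool, so list position is the rank.
    for rank, gem in enumerate(data["gems"], start=1):
        lines.append(
            f"{rank}. {gem['id']} ({gem.get('pipeline_tag', 'unknown')}): "
            f"{gem['likes']} likes / {gem['downloads']} downloads, "
            f"ratio {gem['ratio']:.4f}"
        )
    return "\n".join(lines)


# Hypothetical payload in the shape the new tool returns (values invented).
sample = json.dumps({
    "count": 2,
    "showing": 1,
    "filters": {"min_downloads": 100, "pipeline_tag": "text-generation", "sort_by": "ratio"},
    "gems": [{"id": "org/gem", "likes": 120, "downloads": 2000, "ratio": 0.06,
              "pipeline_tag": "text-generation"}],
})
print(format_ranked(sample))
```

Keeping the raw counts next to the ratio follows the card's "use data to back up recommendations" guideline, since a ratio alone hides whether a model has 20 downloads or 20,000.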

hidden_gems_tool.py CHANGED

@@ -1,205 +1,148 @@
-"""
-Hidden Gems Finder Tool
-Finds Hugging Face models with high likes-to-downloads ratio - undervalued
-quality models that haven't blown up yet.
-"""
-import json
 import os
-import urllib.request
-from datetime import datetime
-from typing import Any
-HF_API_BASE = "https://huggingface.co/api"
-def fetch_models(limit: int = 500, model_type: str | None = None) -> list[dict[str, Any]]:
-    """Fetch models from Hugging Face API."""
-    url = f"{HF_API_BASE}/models?limit={limit}"
-    if model_type:
-        url += f"&filter={model_type}"
-    headers = {}
-    if token := os.environ.get("HF_TOKEN"):
-        headers["Authorization"] = f"Bearer {token}"
-    req = urllib.request.Request(url, headers=headers)
-    with urllib.request.urlopen(req, timeout=60) as response:
-        return json.loads(response.read().decode("utf-8"))
-def parse_date(date_str: str) -> datetime:
-    """Parse ISO date string."""
-    return datetime.fromisoformat(date_str.replace("Z", "+00:00"))
-def calculate_gem_score(model: dict[str, Any]) -> dict[str, Any]:
-    """Calculate a 'hidden gem' score based on likes-to-downloads ratio."""
-    downloads = model.get("downloads", 0)
-    likes = model.get("likes", 0)
-    created_str = model.get("createdAt", "")
-    if not downloads or not likes or not created_str:
-        return {**model, "gem_score": 0.0, "ratio": 0.0}
-    # Calculate raw ratio (likes per 1000 downloads)
-    ratio = (likes / downloads) * 1000 if downloads > 0 else 0
-    # Parse creation date
-    created_at = parse_date(created_str)
-    age_days = (datetime.now(datetime.now().astimezone().tzinfo) - created_at).days
-    # Penalize very new models (< 14 days)
-    newness_penalty = max(0, (14 - age_days) / 14) if age_days < 14 else 0
-    # Penalize very old models (> 2 years)
-    age_penalty = max(0, (age_days - 730) / 365) if age_days > 730 else 0
-    # Penalize very low downloads (< 100)
-    download_penalty = max(0, (100 - downloads) / 100) if downloads < 100 else 0
-    # Boost for moderate downloads (100-10k)
-    download_boost = 1.0
-    if 100 <= downloads <= 10000:
-        download_boost = 1.2
-    elif downloads > 100000:
-        download_boost = 0.7
-    # Calculate adjusted gem score
-    base_score = ratio * download_boost
-    gem_score = base_score * (1 - newness_penalty) * (1 - age_penalty) * (1 - download_penalty)
-    return {
-        "id": model["id"],
-        "likes": likes,
-        "downloads": downloads,
-        "ratio": round(ratio, 2),
-        "gem_score": round(gem_score, 2),
-        "pipeline_tag": model.get("pipeline_tag") or "unknown",
-        "created_at": created_str,
-        "age_days": age_days,
-        "tags": model.get("tags", [])[:5],
-    }
 def find_hidden_gems(
-    limit: int = 500,
-    top: int = 10,
-    model_type: str | None = None,
-    min_likes: int = 10,
-    min_downloads: int = 50,
-    max_downloads: int = 500000,
 ) -> str:
     """
-    Find hidden gem models on Hugging Face Hub.
     Args:
-        limit: Number of models to fetch from API (default: 500)
-        top: Number of top gems to return (default: 10)
-        model_type: Filter by model type (e.g., 'text-generation', 'image-to-text')
-        min_likes: Minimum likes required (default: 10)
-        min_downloads: Minimum downloads required (default: 50)
-        max_downloads: Maximum downloads to exclude popular models (default: 500000)
     Returns:
-        JSON string with list of hidden gem models, sorted by gem score
     """
     try:
-        models = fetch_models(limit, model_type)
-        # Calculate gem scores
-        scored_models = [calculate_gem_score(m) for m in models]
-        # Filter by criteria
-        filtered = [
-            m for m in scored_models
-            if m["likes"] >= min_likes
-            and m["downloads"] >= min_downloads
-            and m["downloads"] <= max_downloads
-            and m["gem_score"] > 0
-        ]
-        # Sort by gem score (descending)
-        gems = sorted(filtered, key=lambda x: x["gem_score"], reverse=True)[:top]
-        result = {
-            "total_analyzed": len(models),
-            "candidates": len(filtered),
-            "gems_found": len(gems),
-            "model_type_filter": model_type,
-            "gems": gems,
-        }
-        return json.dumps(result, indent=2)
-    except Exception as e:
-        return json.dumps({"error": str(e)}, indent=2)
 def get_model_details(model_id: str) -> str:
     """
-    Get detailed information about a specific model.
     Args:
-        model_id: The model ID (e.g., 'microsoft/bitnet-b1.58-2B-4T')
     Returns:
-        JSON string with model details
     """
     try:
-        url = f"{HF_API_BASE}/models/{model_id}"
-        headers = {}
-        if token := os.environ.get("HF_TOKEN"):
-            headers["Authorization"] = f"Bearer {token}"
-        req = urllib.request.Request(url, headers=headers)
-        with urllib.request.urlopen(req, timeout=30) as response:
             model = json.loads(response.read().decode("utf-8"))
-        scored = calculate_gem_score(model)
-        # Add extra details
-        details = {
-            **scored,
-            "library_name": model.get("library_name"),
-            "config": model.get("config", {}),
-            "siblings_count": len(model.get("siblings", [])),
-            "card_data": model.get("cardData", {}),
-        }
-        return json.dumps(details, indent=2)
     except Exception as e:
-        return json.dumps({"error": str(e)}, indent=2)
-def list_model_types() -> str:
-    """
-    List common model types/pipeline tags available for filtering.
-    Returns:
-        JSON string with list of model types
-    """
-    common_types = [
-        "text-generation",
-        "text-classification",
-        "image-to-text",
-        "image-classification",
-        "text-to-image",
-        "text-to-speech",
-        "automatic-speech-recognition",
-        "question-answering",
-        "summarization",
-        "translation",
-        "feature-extraction",
-        "sentence-similarity",
-        "token-classification",
-        "zero-shot-classification",
-        "audio-to-audio",
-        "video-classification",
-        "image-to-image",
-    ]
-    return json.dumps({"model_types": common_types}, indent=2)
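The removed `calculate_gem_score`, described in the old README's Algorithm section, adjusted the raw ratio for age and download volume before ranking; the new tool replaces all of that with the bare ratio. For comparison, here is a condensed restatement of the removed scoring plus the score bands from the removed agent card. It takes `now` as a parameter so results are reproducible; `legacy_gem_score` and `legacy_band` are hypothetical names, and this is a sketch of the removed logic rather than code from either revision.

```python
from datetime import datetime, timedelta, timezone


def legacy_gem_score(likes: int, downloads: int, created_at: datetime, now: datetime) -> float:
    """Condensed restatement of the removed calculate_gem_score (score only)."""
    if not likes or not downloads:
        return 0.0
    ratio = likes / downloads * 1000  # likes per 1,000 downloads
    age_days = (now - created_at).days
    # Linear penalties: under 14 days old, over 2 years old, under 100 downloads.
    newness_penalty = (14 - age_days) / 14 if age_days < 14 else 0.0
    age_penalty = (age_days - 730) / 365 if age_days > 730 else 0.0
    download_penalty = (100 - downloads) / 100 if downloads < 100 else 0.0
    # Multiplier: 1.2 for 100-10k downloads, 0.7 above 100k, 1.0 otherwise.
    if 100 <= downloads <= 10_000:
        boost = 1.2
    elif downloads > 100_000:
        boost = 0.7
    else:
        boost = 1.0
    # Past ~3 years age_penalty exceeds 1 and the score turns negative;
    # the removed find_hidden_gems discarded those via its gem_score > 0 filter.
    score = ratio * boost * (1 - newness_penalty) * (1 - age_penalty) * (1 - download_penalty)
    return round(score, 2)


def legacy_band(score: float) -> str:
    """Map a score to the bands listed in the removed agent card."""
    # The card's 50-100 and 20-50 ranges both include 50; here 50 goes to the higher band.
    if score > 100:
        return "Exceptional quality signal"
    if score >= 50:
        return "Strong hidden gem"
    if score >= 20:
        return "Worth investigating"
    return "Decent but more mainstream"


# Example: 120 likes on 2,000 downloads, 200 days old.
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
example = legacy_gem_score(120, 2000, now - timedelta(days=200), now)
print(example, legacy_band(example))
```

The example scores 72.0 (a ratio of 60 per 1k downloads times the 1.2 sweet-spot boost), a "Strong hidden gem". The same counts on a 7-day-old model halve to 36.0, and 50 likes on 500 downloads at 1,000 days old drop from 120 to about 31.23 under the age penalty, which is exactly the kind of adjustment the plain likes/downloads ratio no longer makes.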

+"""Hidden Gems Tool - Find undervalued Hugging Face models."""
 import os
+import json
+from typing import Optional
+from urllib.request import urlopen, Request
+from urllib.error import HTTPError
 def find_hidden_gems(
+    limit: int = 100,
+    min_downloads: int = 100,
+    top: int = 20,
+    pipeline_tag: Optional[str] = None,
+    sort_by: str = "ratio"
 ) -> str:
     """
+    Find hidden gem models with high likes-to-downloads ratios.
     Args:
+        limit: Number of models to fetch from API (default: 100)
+        min_downloads: Minimum downloads to filter out very new models (default: 100)
+        top: Number of top results to return (default: 20)
+        pipeline_tag: Filter by pipeline tag like "text-generation", "image-to-text" (optional)
+        sort_by: Sort results by "ratio" (default), "likes", "downloads", or "trending"
     Returns:
+        JSON string with list of hidden gem models ranked by score
     """
+    # Build API URL
+    api_url = f"https://huggingface.co/api/models?limit={limit}"
+    if pipeline_tag:
+        api_url += f"&pipeline_tag={pipeline_tag}"
+    # Fetch models
+    headers = {}
+    token = os.environ.get("HF_TOKEN")
+    if token:
+        headers["Authorization"] = f"Bearer {token}"
     try:
+        req = Request(api_url, headers=headers)
+        with urlopen(req, timeout=60) as response:
+            models = json.loads(response.read().decode("utf-8"))
+    except HTTPError as e:
+        return json.dumps({"error": f"API error: {e.code} - {e.reason}"})
+    except Exception as e:
+        return json.dumps({"error": f"Failed to fetch models: {str(e)}"})
+    # Calculate hidden gem scores
+    results = []
+    for model in models:
+        likes = model.get("likes")
+        downloads = model.get("downloads")
+        if likes is None or downloads is None:
+            continue
+        if downloads < min_downloads:
+            continue
+        ratio = likes / downloads if downloads > 0 else 0
+        results.append({
+            "id": model.get("id"),
+            "likes": likes,
+            "downloads": downloads,
+            "ratio": round(ratio, 6),
+            "pipeline_tag": model.get("pipeline_tag") or "unknown",
+            "library_name": model.get("library_name") or "unknown",
+            "createdAt": model.get("createdAt") or "unknown",
+            "trendingScore": model.get("trendingScore") or 0,
+            "tags": model.get("tags", [])
+        })
+    # Sort results
+    sort_key = {
+        "likes": "likes",
+        "downloads": "downloads",
+        "trending": "trendingScore"
+    }.get(sort_by, "ratio")
+    results.sort(key=lambda x: x[sort_key], reverse=True)
+    # Take top N
+    top_results = results[:top]
+    return json.dumps({
+        "count": len(results),
+        "showing": len(top_results),
+        "filters": {
+            "min_downloads": min_downloads,
+            "pipeline_tag": pipeline_tag,
+            "sort_by": sort_by
+        },
+        "gems": top_results
+    }, indent=2)
 def get_model_details(model_id: str) -> str:
     """
+    Get detailed information about a specific Hugging Face model.
     Args:
+        model_id: The model ID (e.g., "microsoft/DialoGPT-medium")
     Returns:
+        JSON string with detailed model information
     """
+    api_url = f"https://huggingface.co/api/models/{model_id}"
+    headers = {}
+    token = os.environ.get("HF_TOKEN")
+    if token:
+        headers["Authorization"] = f"Bearer {token}"
     try:
+        req = Request(api_url, headers=headers)
+        with urlopen(req, timeout=30) as response:
             model = json.loads(response.read().decode("utf-8"))
+    except HTTPError as e:
+        if e.code == 404:
+            return json.dumps({"error": f"Model '{model_id}' not found"})
+        return json.dumps({"error": f"API error: {e.code} - {e.reason}"})
     except Exception as e:
+        return json.dumps({"error": f"Failed to fetch model: {str(e)}"})
+    # Calculate hidden gem score
+    likes = model.get("likes", 0)
+    downloads = model.get("downloads", 0)
+    ratio = likes / downloads if downloads > 0 else 0
+    result = {
+        "id": model.get("id"),
+        "likes": likes,
+        "downloads": downloads,
+        "hidden_gem_score": round(ratio, 6),
+        "pipeline_tag": model.get("pipeline_tag") or "unknown",
+        "library_name": model.get("library_name") or "unknown",
+        "tags": model.get("tags", []),
+        "createdAt": model.get("createdAt"),
+        "lastModified": model.get("lastModified"),
+        "cardExists": model.get("cardExists", False),
+        "widgetData": model.get("widgetData", []),
+        "siblings": [s.get("rfilename") for s in model.get("siblings", [])[:10]],
+        "description": (model.get("cardData") or {}).get("tags", [])
+    }
+    return json.dumps(result, indent=2)
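Both new functions splice caller input into the request URL with f-strings, so a `pipeline_tag` containing `&` or a space would be sent unescaped. One way to harden that is the standard library's `urllib.parse`, sketched below; `build_models_url` and `build_model_url` are hypothetical helpers, not part of the commit, and they keep the same base URL and query parameter names the tool already uses.

```python
from typing import Optional
from urllib.parse import quote, urlencode

HF_API_BASE = "https://huggingface.co/api"  # base URL hardcoded in the new tool


def build_models_url(limit: int, pipeline_tag: Optional[str] = None) -> str:
    """Listing URL for /api/models with query values escaped by urlencode."""
    params: dict = {"limit": limit}
    if pipeline_tag:
        params["pipeline_tag"] = pipeline_tag
    return f"{HF_API_BASE}/models?{urlencode(params)}"


def build_model_url(model_id: str) -> str:
    """Per-model URL; '/' is kept so an 'org/name' ID stays two path segments."""
    return f"{HF_API_BASE}/models/{quote(model_id, safe='/')}"


print(build_models_url(200, "text-generation"))
print(build_model_url("openbmb/MiniCPM-SALA"))
```

For ordinary tags and model IDs the result is identical to the f-string version, for example `https://huggingface.co/api/models?limit=200&pipeline_tag=text-generation`; only values with reserved characters differ (`"a b&c"` is encoded as `a+b%26c` instead of splitting into a second query parameter).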