update
- .clinerules/.clinerules +12 -0
- .clinerules/temporal-memory-bank.md +157 -0
- .gitignore +55 -0
- Dockerfile +16 -0
- app-reference.py +259 -0
- app.py +17 -0
- memory-bank/activeContext.md +23 -0
- memory-bank/changelog.md +12 -0
- memory-bank/productContext.md +16 -0
- memory-bank/progress.md +22 -0
- memory-bank/projectBrief.md +9 -0
- memory-bank/systemPatterns.md +20 -0
- memory-bank/techContext.md +27 -0
- requirements.txt +7 -0
.clinerules/.clinerules
ADDED
@@ -0,0 +1,12 @@
# .clinerules (this file) is the entry point and master control file for Cline's rules

# All communication, including documentation, is in Chinese

# Discuss requirements with the user: when you start a requirements survey, record the survey tasks first, then ask one question at a time until the survey is complete

# The Python environment manager is conda and the environment name is airs; if the current environment is not airs, activate it with `conda activate airs`

# Core task:
Create a Hugging Face Space that loads a Hugging Face model for users to call
.clinerules/temporal-memory-bank.md
ADDED
@@ -0,0 +1,157 @@
---
description: Describes Cline's Memory Bank system, its structure, and workflows for maintaining project knowledge across sessions.
author: https://github.com/nickbaumann98 https://github.com/chisleu
version: 1.0
tags: ["memory-bank", "knowledge-base", "core-behavior", "documentation-protocol"]
globs: ["memory-bank/**/*.md", "*"]
---

# Cline's Memory Bank (Time-Aware Version)

I am Cline, an expert software engineer with a unique characteristic: my memory resets completely between sessions. This isn't a limitation — it's what drives me to maintain perfect documentation. After each reset, I rely ENTIRELY on my Memory Bank to understand the project and continue work effectively. I MUST read ALL memory bank files at the start of EVERY task — this is not optional.

## Memory Bank Structure

The Memory Bank is located in a folder called 'memory-bank'. Create it if it does not already exist.
The Memory Bank consists of core files and optional context files, all in Markdown format. Files build upon each other in a clear hierarchy:

```mermaid
flowchart TD
    PB[projectBrief.md] --> PC[productContext.md]
    PB --> SP[systemPatterns.md]
    PB --> TC[techContext.md]

    PC --> AC[activeContext.md]
    SP --> AC
    TC --> AC

    AC --> P[progress.md]
    AC --> CL[changelog.md]
```

### Core Files (Required)
1. `projectBrief.md`
   - Foundation document that shapes all other files
   - Created at project start if it doesn't exist
   - Defines core requirements and goals
   - Source of truth for project scope

2. `productContext.md`
   - Why this project exists
   - Problems it solves
   - How it should work
   - User experience goals

3. `activeContext.md`
   - Current work focus
   - Recent changes
   - Next steps
   - Active decisions and considerations
   - Important patterns and preferences
   - Learnings and project insights
   - Maintain a sliding window of the **10 most recent events** (date + summary).
   - When a new event is added (the 11th), delete the oldest to retain only 10.
   - This helps me reason about recent changes without bloating the file.

4. `systemPatterns.md`
   - System architecture
   - Key technical decisions
   - Design patterns in use
   - Component relationships
   - Critical implementation paths

5. `techContext.md`
   - Technologies used
   - Development setup
   - Technical constraints
   - Dependencies
   - Tool usage patterns

6. `progress.md`
   - What works
   - What's left to build
   - Current status
   - Known issues
   - Evolution of project decisions

7. `changelog.md`
   - Chronological log of key changes, decisions, or versions
   - Follows a `CHANGELOG.md` convention with version/date headers
   - Example format:
     ```markdown
     ## [1.0.3] - 2025-06-14
     ### Changed
     - Switched from REST to GraphQL
     - Refactored notification system for async retries

     ### Fixed
     - Resolved mobile auth bug on Android

     ### Added
     - Timeline.md summary added to support project retrospectives
     ```

---

## Core Workflows

### Plan Mode
```mermaid
flowchart TD
    Start[Start] --> ReadFiles[Read Memory Bank]
    ReadFiles --> CheckFiles{Files Complete?}

    CheckFiles -->|No| Plan[Create Plan]
    Plan --> Document[Document in Chat]

    CheckFiles -->|Yes| Verify[Verify Context]
    Verify --> Strategy[Develop Strategy]
    Strategy --> Present[Present Approach]
```

### Act Mode
```mermaid
flowchart TD
    Start[Start] --> Context[Check Memory Bank]
    Context --> Update[Update Documentation]
    Update --> Execute[Execute Task]
    Execute --> Document[Document Changes]
```

---

## Documentation Updates

Updates occur when:
1. Discovering new project patterns
2. After significant changes
3. When user requests **update memory bank**
4. When context changes or decisions occur
5. When **time-based updates** are needed

### Update Process
```mermaid
flowchart TD
    Start[Update Process]

    subgraph Process
        P1[Review ALL Files]
        P2[Document Current State]
        P3[Clarify Next Steps]
        P4[Document Insights & Patterns]
        P5[Update progress.md]
        P6[Slide activeContext.md to keep latest 10 entries]
        P7[Append changelog.md]

        P1 --> P2 --> P3 --> P4 --> P5 --> P6 --> P7
    end

    Start --> Process
```

---

## Reminder

After every memory reset, I begin completely fresh. The Memory Bank is my only link to previous work. It must be maintained with precision and clarity — especially with time-aware reasoning. Read, interpret, and act on temporal data carefully.
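The sliding-window rule for activeContext.md (keep the 10 most recent events, drop the oldest when an 11th arrives) can be sketched in a few lines of Python. This is a hypothetical illustration: the `- YYYY-MM-DD: summary` bullet format and the `trim_events` helper name are assumptions, not part of the rules file.

```python
# Hypothetical sketch of the activeContext.md sliding window: keep only the
# 10 most recent events, dropping the oldest when an 11th arrives.
# The "- YYYY-MM-DD: summary" bullet format is an illustrative assumption.
MAX_EVENTS = 10

def trim_events(lines):
    """Keep the last MAX_EVENTS event lines (events are appended newest-last)."""
    return lines[-MAX_EVENTS:]

events = [f"- 2026-01-{day:02d}: event {day}" for day in range(1, 13)]  # 12 events
kept = trim_events(events)
print(len(kept))  # 10
print(kept[0])    # - 2026-01-03: event 3
```

Appending newest-last and slicing the tail keeps the file bounded without any date parsing.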
.gitignore
ADDED
@@ -0,0 +1,55 @@
# Environment variables
.env
.env.local
.env.*.local

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# Virtual environments
venv/
ENV/
env/

# Model cache
my_model_cache/
*.bin
*.safetensors

# IDE
.vscode/
.idea/
*.swp
*.swo
*~

# OS
.DS_Store
Thumbs.db

# Logs
*.log
logs/

# Temporary files
*.tmp
*.temp
Dockerfile
ADDED
@@ -0,0 +1,16 @@
# Read the doc: https://huggingface.co/docs/hub/spaces-sdks-docker
# you will also find guides on how best to write your Dockerfile

FROM python:3.12.4

RUN useradd -m -u 1000 user
USER user
ENV PATH="/home/user/.local/bin:$PATH"

WORKDIR /app

COPY --chown=user ./requirements.txt requirements.txt
RUN pip install --no-cache-dir --upgrade -r requirements.txt

COPY --chown=user . /app
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
app-reference.py
ADDED
@@ -0,0 +1,259 @@
#!/usr/bin/env python3
"""
FastAPI application for FunctionGemma with HuggingFace login support.
This file is designed to be run with: uvicorn app:app --host 0.0.0.0 --port 7860
"""

import os
import re
import sys
import time
from pathlib import Path

from fastapi import FastAPI
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
from huggingface_hub import login

# Global variables
model_name = None
pipe = None
tokenizer = None  # Global tokenizer, shared by the pipeline and token counting
app = FastAPI(title="FunctionGemma API", version="1.0.0")


def check_and_download_model():
    """Check whether the model exists in the local cache; download it if not."""
    global model_name, tokenizer

    # Use TinyLlama - a fully public model
    # model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
    model_name = "unsloth/functiongemma-270m-it"
    # model_name = "Qwen/Qwen3-0.6B"
    cache_dir = "./my_model_cache"

    # Check if the model already exists in the cache
    model_path = Path(cache_dir) / f"models--{model_name.replace('/', '--')}"
    snapshot_path = model_path / "snapshots"

    if snapshot_path.exists() and any(snapshot_path.iterdir()):
        print(f"✓ Model {model_name} already exists in cache")
        tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir=cache_dir)
        return model_name, cache_dir

    print(f"✗ Model {model_name} not found in cache")
    print("Downloading model...")

    # Log in to Hugging Face (optional, needed for gated models)
    token = os.getenv("HUGGINGFACE_TOKEN")
    if token:
        try:
            print("Logging in to Hugging Face...")
            login(token=token)
            print("✓ HuggingFace login successful!")
        except Exception as e:
            print(f"⚠ Login failed: {e}")
            print("Continuing without login (public models only)")
    else:
        print("ℹ No HUGGINGFACE_TOKEN set - using public models only")

    try:
        # Download tokenizer
        print("Loading tokenizer...")
        tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir=cache_dir)
        print("✓ Tokenizer loaded successfully!")

        # Download model
        print("Loading model...")
        model = AutoModelForCausalLM.from_pretrained(model_name, cache_dir=cache_dir)
        print("✓ Model loaded successfully!")

        print(f"✓ Model and tokenizer downloaded successfully to {cache_dir}")
        return model_name, cache_dir

    except Exception as e:
        print(f"✗ Error downloading model: {e}")
        print("\nPossible reasons:")
        print("1. Model requires authentication - set HUGGINGFACE_TOKEN in .env")
        print("2. Model is gated and you don't have access")
        print("3. Network connection issues")
        sys.exit(1)


def initialize_pipeline():
    """Initialize the text-generation pipeline with the model."""
    global pipe, model_name, tokenizer

    if model_name is None:
        model_name, _ = check_and_download_model()

    if tokenizer is None:  # Ensure the tokenizer is loaded
        tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir="./my_model_cache")

    print(f"Initializing pipeline with {model_name}...")
    pipe = pipeline("text-generation", model=model_name, tokenizer=tokenizer)
    print("✓ Pipeline initialized successfully!")


# API endpoints
@app.get("/")
def greet_json():
    return {
        "message": "FunctionGemma API is running!",
        "model": model_name,
        "status": "ready"
    }


@app.get("/health")
def health_check():
    return {"status": "healthy", "model": model_name}


@app.get("/generate")
def generate_text(prompt: str = "Who are you?"):
    """Generate text using the model."""
    if pipe is None:
        initialize_pipeline()

    messages = [{"role": "user", "content": prompt}]
    result = pipe(messages, max_new_tokens=1000)
    return {"response": result[0]["generated_text"]}


@app.post("/chat")
def chat_completion(messages: list):
    """Chat completion endpoint."""
    if pipe is None:
        initialize_pipeline()

    result = pipe(messages, max_new_tokens=200)
    return {"response": result[0]["generated_text"]}


@app.post("/v1/chat/completions")
def openai_chat_completions(request: dict):
    """
    OpenAI-compatible chat completions endpoint.

    Expected request format:
    {
        "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        "messages": [
            {"role": "user", "content": "Hello"}
        ],
        "max_tokens": 100,
        "temperature": 0.7
    }
    """
    if pipe is None:
        initialize_pipeline()

    messages = request.get("messages", [])
    model = request.get("model", model_name)
    max_tokens = request.get("max_tokens", 1000)
    temperature = request.get("temperature", 0.7)

    # Debug logging of the incoming request
    print('\n\n request')
    print(request)
    print('\n\n messages')
    print(messages)
    print('\n\n model')
    print(model)
    print('\n\n max_tokens')
    print(max_tokens)
    print('\n\n temperature')
    print(temperature)

    # Generate response
    result = pipe(
        messages,
        max_new_tokens=max_tokens,
        # temperature=temperature
    )

    result = convert_json_format(result)

    completion_id = f"chatcmpl-{int(time.time())}"
    created = int(time.time())

    return_json = {
        "id": completion_id,
        "object": "chat.completion",
        "created": created,
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": {
                    "role": "assistant",
                    "content": result["generations"][0][0]["text"]
                },
                "finish_reason": "stop"
            }
        ],
        "usage": {
            "prompt_tokens": 0,
            "completion_tokens": 0,
            "total_tokens": 0
        }
    }

    # Calculate prompt tokens
    if tokenizer:
        prompt_text = ""
        for message in messages:
            prompt_text += message.get("content", "") + " "
        prompt_tokens = len(tokenizer.encode(prompt_text.strip()))
        return_json["usage"]["prompt_tokens"] = prompt_tokens

    # Calculate completion tokens
    if tokenizer and result["generations"]:
        completion_text = result["generations"][0][0]["text"]
        completion_tokens = len(tokenizer.encode(completion_text))
        return_json["usage"]["completion_tokens"] = completion_tokens

    return_json["usage"]["total_tokens"] = return_json["usage"]["prompt_tokens"] + return_json["usage"]["completion_tokens"]

    print('\n\n return_json')
    print(return_json)
    print('return over! \n\n')

    return return_json


# Initialize model on startup
@app.on_event("startup")
async def startup_event():
    """Initialize the model when the app starts."""
    print("=" * 60)
    print("FunctionGemma FastAPI Server")
    print("=" * 60)
    print("Initializing model...")
    initialize_pipeline()
    print("\n" + "=" * 60)
    print("Server ready at http://0.0.0.0:7860")
    print("Available endpoints:")
    print("  GET  /                    - Welcome message")
    print("  GET  /health              - Health check")
    print("  GET  /generate?prompt=... - Generate text with prompt")
    print("  POST /chat                - Chat completion")
    print("  POST /v1/chat/completions - OpenAI-compatible endpoint")
    print("=" * 60 + "\n")


def convert_json_format(input_data):
    """Convert HF pipeline output into a {"generations": [...]} structure."""
    output_generations = []
    for item in input_data:
        generated_text_list = item.get('generated_text', [])

        assistant_content = ""
        for message in generated_text_list:
            if message.get('role') == 'assistant':
                assistant_content = message.get('content', '')
                break  # Assuming only one assistant response per generated_text

        # Remove <think>...</think> tags
        clean_content = re.sub(r'<think>.*?</think>\s*', '', assistant_content, flags=re.DOTALL).strip()

        output_generations.append([
            {
                "text": clean_content,
                "generationInfo": {
                    "finish_reason": "stop"
                }
            }
        ])

    return {"generations": output_generations}
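For quick sanity-checking, the response normalization in `convert_json_format` can be exercised standalone, without loading the model. This is a self-contained copy of the function; the sample chat transcript is made up for illustration.

```python
import re

# Standalone copy of convert_json_format from app-reference.py: the HF
# text-generation pipeline returns a list of {"generated_text": [messages]},
# from which the assistant reply is extracted and any <think>...</think>
# block is stripped. The sample transcript below is illustrative only.
def convert_json_format(input_data):
    output_generations = []
    for item in input_data:
        assistant_content = ""
        for message in item.get("generated_text", []):
            if message.get("role") == "assistant":
                assistant_content = message.get("content", "")
                break
        clean_content = re.sub(r"<think>.*?</think>\s*", "", assistant_content, flags=re.DOTALL).strip()
        output_generations.append([
            {"text": clean_content, "generationInfo": {"finish_reason": "stop"}}
        ])
    return {"generations": output_generations}

sample = [{"generated_text": [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "<think>a greeting</think> Hi there!"},
]}]
out = convert_json_format(sample)
print(out["generations"][0][0]["text"])  # Hi there!
```

The `re.DOTALL` flag matters here: reasoning blocks usually span multiple lines, and without it `.` would not match newlines inside `<think>...</think>`.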
app.py
ADDED
@@ -0,0 +1,17 @@
from fastapi import FastAPI

# Initialize the FastAPI application
app = FastAPI(title="HF-Model-Runner API", version="0.0.1")

model_name = None

@app.get("/")
def greet_json():
    return {
        "message": "HF-Model-Runner API is running! Visit /docs for API documentation.",
        "model": model_name,
        "status": "ready"
    }
memory-bank/activeContext.md
ADDED
@@ -0,0 +1,23 @@
# Active Context

**Current Work Focus:**
- Integrating a Hugging Face model into `app.py`.
- Creating API endpoints for model interaction.

**Recent Changes:**
- 2026-01-01: Created `projectBrief.md`, `productContext.md`, `systemPatterns.md`, `techContext.md`, `activeContext.md`, `progress.md`, and `changelog.md` in the `memory-bank` directory.
- 2026-01-01: Modified `app.py` to implement the basic FastAPI structure.
- 2026-01-01: Integrated a Hugging Face sentiment analysis model (`distilbert-base-uncased-finetuned-sst-2-english`) into `app.py` and added a `/predict` API endpoint.

**Next Steps:**
- Finalize deployment on Hugging Face Spaces.

**Active Decisions and Considerations:**
- The FastAPI application will run on port 7860, as is common for Hugging Face Spaces.
- The initial `app.py` now includes a functional model inference endpoint.

**Important Patterns and Preferences:**
- Adhere to the Memory Bank documentation structure and update process.

**Learnings and Project Insights:**
- The Memory Bank is crucial for maintaining context across sessions.
memory-bank/changelog.md
ADDED
@@ -0,0 +1,12 @@
# Changelog

## [0.0.1] - 2026-01-01
### Added
- Initial setup of `memory-bank` directory and core documentation files:
  - `projectBrief.md`
  - `productContext.md`
  - `systemPatterns.md`
  - `techContext.md`
  - `activeContext.md`
  - `progress.md`
- Defined initial project scope, product context, system architecture, technical stack, active work focus, and project progress.
memory-bank/productContext.md
ADDED
@@ -0,0 +1,16 @@
# Product Context

This project provides a web API for a Hugging Face model, allowing other applications or users to interact with the model programmatically.

**Problems it solves:**
- Enables easy access to Hugging Face models via a standard API.
- Simplifies integration of AI models into other services.

**How it should work:**
- Users send HTTP requests to the API endpoints.
- The API processes the request, interacts with the loaded Hugging Face model, and returns a response.

**User experience goals:**
- Simple and intuitive API interface.
- Fast and reliable model inference.
- Clear documentation for API usage.
memory-bank/progress.md
ADDED
@@ -0,0 +1,22 @@
# Progress

**What Works:**
- The `memory-bank` directory has been created.
- Core Memory Bank files (`projectBrief.md`, `productContext.md`, `systemPatterns.md`, `techContext.md`, `activeContext.md`) have been initialized with relevant project context.

**What's Left to Build:**
- Implement the minimal FastAPI application in `app.py`.
- Ensure `requirements.txt` contains `fastapi` and `uvicorn`.
- Integrate a Hugging Face model.
- Create API endpoints for model interaction.
- Finalize deployment on Hugging Face Spaces.

**Current Status:**
- Documentation setup is nearly complete.
- Ready to proceed with code implementation.

**Known Issues:**
- None at this stage.

**Evolution of Project Decisions:**
- Initial focus on establishing a robust documentation foundation before coding.
memory-bank/projectBrief.md
ADDED
@@ -0,0 +1,9 @@
# Project Brief

This project aims to create a Hugging Face Space application that loads and exposes a Hugging Face model for user interaction via a FastAPI interface.

**Core Requirements:**
- Implement a minimal FastAPI application in `app.py`.
- Load a Hugging Face model.
- Provide an API endpoint to interact with the loaded model.
- Deploy the application on Hugging Face Spaces.
memory-bank/systemPatterns.md
ADDED
@@ -0,0 +1,20 @@
# System Patterns

**System Architecture:**
- FastAPI for the web API.
- Hugging Face Transformers library for model loading and inference.
- Deployed on Hugging Face Spaces.

**Key Technical Decisions:**
- Use FastAPI for its performance and automatic interactive API documentation (Swagger UI).
- Leverage Hugging Face's ecosystem for model management and deployment.

**Design Patterns in Use:**
- **MVC (Model-View-Controller) variant:** FastAPI acts as the controller, handling requests and responses. The Hugging Face model is the "model" (data/logic). There's no explicit "view" as it's an API.
- **Dependency Injection:** FastAPI's dependency injection system will be used for managing model loading and other resources.

**Component Relationships:**
- `app.py`: Main FastAPI application, defines routes and interacts with the model.
- Hugging Face Model: Loaded and used by `app.py` for inference.
- `requirements.txt`: Specifies Python dependencies.
- `Dockerfile` (if used): Defines the environment for deployment.
memory-bank/techContext.md
ADDED
@@ -0,0 +1,27 @@
# Tech Context

**Technologies Used:**
- **Python:** Primary programming language.
- **FastAPI:** Web framework for building the API.
- **Hugging Face Transformers:** Library for loading and using pre-trained models.
- **Uvicorn:** ASGI server to run the FastAPI application.

**Development Setup:**
- **Conda:** Environment management for Python.
- **pip:** Package installer for Python.
- **Git:** Version control.

**Technical Constraints:**
- Deployment on Hugging Face Spaces requires adherence to their environment specifications (e.g., `requirements.txt`, `app.py` as the main entry point).
- Model size and inference speed will be factors for performance on Hugging Face Spaces.

**Dependencies:**
- `fastapi`
- `uvicorn`
- `transformers` (for model loading)
- `torch` or `tensorflow` (as backend for transformers, depending on the model)

**Tool Usage Patterns:**
- `conda activate airs`: To activate the development environment.
- `pip install -r requirements.txt`: To install dependencies.
- `uvicorn app:app --host 0.0.0.0 --port 7860`: To run the FastAPI application locally (Hugging Face Spaces typically uses port 7860).
requirements.txt
ADDED
@@ -0,0 +1,7 @@
fastapi
uvicorn[standard]
transformers
huggingface_hub
torch
accelerate
python-multipart