update
- .clinerules/.clinerules +12 -0
- .clinerules/temporal-memory-bank.md +157 -0
- .gitignore +55 -0
- Dockerfile +16 -0
- app-reference.py +259 -0
- app.py +17 -0
- memory-bank/activeContext.md +23 -0
- memory-bank/changelog.md +12 -0
- memory-bank/productContext.md +16 -0
- memory-bank/progress.md +22 -0
- memory-bank/projectBrief.md +9 -0
- memory-bank/systemPatterns.md +20 -0
- memory-bank/techContext.md +27 -0
- requirements.txt +7 -0
.clinerules/.clinerules
ADDED
@@ -0,0 +1,12 @@
# .clinerules (this file) is the entry point and master control file for Cline's rules

# All communication, including documentation, is in Chinese

# Discuss requirements with the user: when you start a requirements survey, record the survey tasks first, then ask one question at a time until the survey is complete

# The Python environment manager is conda and the environment name is airs; if the current environment is not airs, activate it with `conda activate airs`

# Core task:
Create a Hugging Face Space that loads a Hugging Face model for users to call
.clinerules/temporal-memory-bank.md
ADDED
@@ -0,0 +1,157 @@
---
description: Describes Cline's Memory Bank system, its structure, and workflows for maintaining project knowledge across sessions.
author: https://github.com/nickbaumann98 https://github.com/chisleu
version: 1.0
tags: ["memory-bank", "knowledge-base", "core-behavior", "documentation-protocol"]
globs: ["memory-bank/**/*.md", "*"]
---

# Cline's Memory Bank (Time-Aware Version)

I am Cline, an expert software engineer with a unique characteristic: my memory resets completely between sessions. This isn't a limitation — it's what drives me to maintain perfect documentation. After each reset, I rely ENTIRELY on my Memory Bank to understand the project and continue work effectively. I MUST read ALL memory bank files at the start of EVERY task — this is not optional.

## Memory Bank Structure

The Memory Bank is located in a folder called 'memory-bank'. Create it if it does not already exist.
The Memory Bank consists of core files and optional context files, all in Markdown format. Files build upon each other in a clear hierarchy:

```mermaid
flowchart TD
    PB[projectBrief.md] --> PC[productContext.md]
    PB --> SP[systemPatterns.md]
    PB --> TC[techContext.md]

    PC --> AC[activeContext.md]
    SP --> AC
    TC --> AC

    AC --> P[progress.md]
    AC --> CL[changelog.md]
```

### Core Files (Required)
1. `projectBrief.md`
   - Foundation document that shapes all other files
   - Created at project start if it doesn't exist
   - Defines core requirements and goals
   - Source of truth for project scope

2. `productContext.md`
   - Why this project exists
   - Problems it solves
   - How it should work
   - User experience goals

3. `activeContext.md`
   - Current work focus
   - Recent changes
   - Next steps
   - Active decisions and considerations
   - Important patterns and preferences
   - Learnings and project insights
   - Maintain a sliding window of the **10 most recent events** (date + summary).
   - When a new event is added (the 11th), delete the oldest to retain only 10.
   - This helps me reason about recent changes without bloating the file.

4. `systemPatterns.md`
   - System architecture
   - Key technical decisions
   - Design patterns in use
   - Component relationships
   - Critical implementation paths

5. `techContext.md`
   - Technologies used
   - Development setup
   - Technical constraints
   - Dependencies
   - Tool usage patterns

6. `progress.md`
   - What works
   - What's left to build
   - Current status
   - Known issues
   - Evolution of project decisions

7. `changelog.md`
   - Chronological log of key changes, decisions, or versions
   - Follows a `CHANGELOG.md` convention with version/date headers
   - Example format:
     ```markdown
     ## [1.0.3] - 2025-06-14
     ### Changed
     - Switched from REST to GraphQL
     - Refactored notification system for async retries

     ### Fixed
     - Resolved mobile auth bug on Android

     ### Added
     - Timeline.md summary added to support project retrospectives
     ```

---

## Core Workflows

### Plan Mode
```mermaid
flowchart TD
    Start[Start] --> ReadFiles[Read Memory Bank]
    ReadFiles --> CheckFiles{Files Complete?}

    CheckFiles -->|No| Plan[Create Plan]
    Plan --> Document[Document in Chat]

    CheckFiles -->|Yes| Verify[Verify Context]
    Verify --> Strategy[Develop Strategy]
    Strategy --> Present[Present Approach]
```

### Act Mode
```mermaid
flowchart TD
    Start[Start] --> Context[Check Memory Bank]
    Context --> Update[Update Documentation]
    Update --> Execute[Execute Task]
    Execute --> Document[Document Changes]
```

---

## Documentation Updates

Updates occur when:
1. Discovering new project patterns
2. After significant changes
3. When user requests **update memory bank**
4. When context changes or decisions occur
5. When **time-based updates** are needed

### Update Process
```mermaid
flowchart TD
    Start[Update Process]

    subgraph Process
        P1[Review ALL Files]
        P2[Document Current State]
        P3[Clarify Next Steps]
        P4[Document Insights & Patterns]
        P5[Update progress.md]
        P6[Slide activeContext.md to keep latest 10 entries]
        P7[Append changelog.md]

        P1 --> P2 --> P3 --> P4 --> P5 --> P6 --> P7
    end

    Start --> Process
```

---

## Reminder

After every memory reset, I begin completely fresh. The Memory Bank is my only link to previous work. It must be maintained with precision and clarity — especially with time-aware reasoning. Read, interpret, and act on temporal data carefully.
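The sliding-window rule for activeContext.md (keep the 10 most recent events, drop the oldest when an 11th arrives) can be sketched in a few lines of Python. This is a hypothetical illustration: the `- YYYY-MM-DD: summary` bullet format and the `trim_events` helper name are assumptions, not part of the rules file.

```python
# Hypothetical sketch of the activeContext.md sliding window: keep only the
# 10 most recent events, dropping the oldest when an 11th arrives.
# The "- YYYY-MM-DD: summary" bullet format is an illustrative assumption.
MAX_EVENTS = 10

def trim_events(lines):
    """Keep the last MAX_EVENTS event lines (events are appended newest-last)."""
    return lines[-MAX_EVENTS:]

events = [f"- 2026-01-{day:02d}: event {day}" for day in range(1, 13)]  # 12 events
kept = trim_events(events)
print(len(kept))  # 10
print(kept[0])    # - 2026-01-03: event 3
```

Appending newest-last and slicing the tail keeps the file bounded without any date parsing.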
.gitignore
ADDED
@@ -0,0 +1,55 @@
# Environment variables
.env
.env.local
.env.*.local

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# Virtual environments
venv/
ENV/
env/

# Model cache
my_model_cache/
*.bin
*.safetensors

# IDE
.vscode/
.idea/
*.swp
*.swo
*~

# OS
.DS_Store
Thumbs.db

# Logs
*.log
logs/

# Temporary files
*.tmp
*.temp
Dockerfile
ADDED
@@ -0,0 +1,16 @@
# Read the doc: https://huggingface.co/docs/hub/spaces-sdks-docker
# you will also find guides on how best to write your Dockerfile

FROM python:3.12.4

RUN useradd -m -u 1000 user
USER user
ENV PATH="/home/user/.local/bin:$PATH"

WORKDIR /app

COPY --chown=user ./requirements.txt requirements.txt
RUN pip install --no-cache-dir --upgrade -r requirements.txt

COPY --chown=user . /app
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
app-reference.py
ADDED
@@ -0,0 +1,259 @@
#!/usr/bin/env python3
"""
FastAPI application for FunctionGemma with HuggingFace login support.
This file is designed to be run with: uvicorn app:app --host 0.0.0.0 --port 7860
"""

import os
import re
import sys
import time
from pathlib import Path

from fastapi import FastAPI
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
from huggingface_hub import login

# Global variables
model_name = None
pipe = None
tokenizer = None  # Global tokenizer, shared by the pipeline and token counting
app = FastAPI(title="FunctionGemma API", version="1.0.0")


def check_and_download_model():
    """Check whether the model exists in the local cache; download it if not."""
    global model_name, tokenizer

    # Use TinyLlama - a fully public model
    # model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
    model_name = "unsloth/functiongemma-270m-it"
    # model_name = "Qwen/Qwen3-0.6B"
    cache_dir = "./my_model_cache"

    # Check if the model already exists in the cache
    model_path = Path(cache_dir) / f"models--{model_name.replace('/', '--')}"
    snapshot_path = model_path / "snapshots"

    if snapshot_path.exists() and any(snapshot_path.iterdir()):
        print(f"✓ Model {model_name} already exists in cache")
        tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir=cache_dir)
        return model_name, cache_dir

    print(f"✗ Model {model_name} not found in cache")
    print("Downloading model...")

    # Log in to Hugging Face (optional, needed for gated models)
    token = os.getenv("HUGGINGFACE_TOKEN")
    if token:
        try:
            print("Logging in to Hugging Face...")
            login(token=token)
            print("✓ HuggingFace login successful!")
        except Exception as e:
            print(f"⚠ Login failed: {e}")
            print("Continuing without login (public models only)")
    else:
        print("ℹ No HUGGINGFACE_TOKEN set - using public models only")

    try:
        # Download tokenizer
        print("Loading tokenizer...")
        tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir=cache_dir)
        print("✓ Tokenizer loaded successfully!")

        # Download model
        print("Loading model...")
        model = AutoModelForCausalLM.from_pretrained(model_name, cache_dir=cache_dir)
        print("✓ Model loaded successfully!")

        print(f"✓ Model and tokenizer downloaded successfully to {cache_dir}")
        return model_name, cache_dir

    except Exception as e:
        print(f"✗ Error downloading model: {e}")
        print("\nPossible reasons:")
        print("1. Model requires authentication - set HUGGINGFACE_TOKEN in .env")
        print("2. Model is gated and you don't have access")
        print("3. Network connection issues")
        sys.exit(1)


def initialize_pipeline():
    """Initialize the text-generation pipeline with the model."""
    global pipe, model_name, tokenizer

    if model_name is None:
        model_name, _ = check_and_download_model()

    if tokenizer is None:  # Ensure the tokenizer is loaded
        tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir="./my_model_cache")

    print(f"Initializing pipeline with {model_name}...")
    pipe = pipeline("text-generation", model=model_name, tokenizer=tokenizer)
    print("✓ Pipeline initialized successfully!")


# API endpoints
@app.get("/")
def greet_json():
    return {
        "message": "FunctionGemma API is running!",
        "model": model_name,
        "status": "ready"
    }


@app.get("/health")
def health_check():
    return {"status": "healthy", "model": model_name}


@app.get("/generate")
def generate_text(prompt: str = "Who are you?"):
    """Generate text using the model."""
    if pipe is None:
        initialize_pipeline()

    messages = [{"role": "user", "content": prompt}]
    result = pipe(messages, max_new_tokens=1000)
    return {"response": result[0]["generated_text"]}


@app.post("/chat")
def chat_completion(messages: list):
    """Chat completion endpoint."""
    if pipe is None:
        initialize_pipeline()

    result = pipe(messages, max_new_tokens=200)
    return {"response": result[0]["generated_text"]}


@app.post("/v1/chat/completions")
def openai_chat_completions(request: dict):
    """
    OpenAI-compatible chat completions endpoint.

    Expected request format:
    {
        "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        "messages": [
            {"role": "user", "content": "Hello"}
        ],
        "max_tokens": 100,
        "temperature": 0.7
    }
    """
    if pipe is None:
        initialize_pipeline()

    messages = request.get("messages", [])
    model = request.get("model", model_name)
    max_tokens = request.get("max_tokens", 1000)
    temperature = request.get("temperature", 0.7)

    # Debug logging of the incoming request
    print('\n\n request')
    print(request)
    print('\n\n messages')
    print(messages)
    print('\n\n model')
    print(model)
    print('\n\n max_tokens')
    print(max_tokens)
    print('\n\n temperature')
    print(temperature)

    # Generate response
    result = pipe(
        messages,
        max_new_tokens=max_tokens,
        # temperature=temperature
    )

    result = convert_json_format(result)

    completion_id = f"chatcmpl-{int(time.time())}"
    created = int(time.time())

    return_json = {
        "id": completion_id,
        "object": "chat.completion",
        "created": created,
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": {
                    "role": "assistant",
                    "content": result["generations"][0][0]["text"]
                },
                "finish_reason": "stop"
            }
        ],
        "usage": {
            "prompt_tokens": 0,
            "completion_tokens": 0,
            "total_tokens": 0
        }
    }

    # Calculate prompt tokens
    if tokenizer:
        prompt_text = ""
        for message in messages:
            prompt_text += message.get("content", "") + " "
        prompt_tokens = len(tokenizer.encode(prompt_text.strip()))
        return_json["usage"]["prompt_tokens"] = prompt_tokens

    # Calculate completion tokens
    if tokenizer and result["generations"]:
        completion_text = result["generations"][0][0]["text"]
        completion_tokens = len(tokenizer.encode(completion_text))
        return_json["usage"]["completion_tokens"] = completion_tokens

    return_json["usage"]["total_tokens"] = return_json["usage"]["prompt_tokens"] + return_json["usage"]["completion_tokens"]

    print('\n\n return_json')
    print(return_json)
    print('return over! \n\n')

    return return_json


# Initialize model on startup
@app.on_event("startup")
async def startup_event():
    """Initialize the model when the app starts."""
    print("=" * 60)
    print("FunctionGemma FastAPI Server")
    print("=" * 60)
    print("Initializing model...")
    initialize_pipeline()
    print("\n" + "=" * 60)
    print("Server ready at http://0.0.0.0:7860")
    print("Available endpoints:")
    print("  GET  /                    - Welcome message")
    print("  GET  /health              - Health check")
    print("  GET  /generate?prompt=... - Generate text with prompt")
    print("  POST /chat                - Chat completion")
    print("  POST /v1/chat/completions - OpenAI-compatible endpoint")
    print("=" * 60 + "\n")


def convert_json_format(input_data):
    """Convert HF pipeline output into a {"generations": [...]} structure."""
    output_generations = []
    for item in input_data:
        generated_text_list = item.get('generated_text', [])

        assistant_content = ""
        for message in generated_text_list:
            if message.get('role') == 'assistant':
                assistant_content = message.get('content', '')
                break  # Assuming only one assistant response per generated_text

        # Remove <think>...</think> tags
        clean_content = re.sub(r'<think>.*?</think>\s*', '', assistant_content, flags=re.DOTALL).strip()

        output_generations.append([
            {
                "text": clean_content,
                "generationInfo": {
                    "finish_reason": "stop"
                }
            }
        ])

    return {"generations": output_generations}
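For quick sanity-checking, the response normalization in `convert_json_format` can be exercised standalone, without loading the model. This is a self-contained copy of the function; the sample chat transcript is made up for illustration.

```python
import re

# Standalone copy of convert_json_format from app-reference.py: the HF
# text-generation pipeline returns a list of {"generated_text": [messages]},
# from which the assistant reply is extracted and any <think>...</think>
# block is stripped. The sample transcript below is illustrative only.
def convert_json_format(input_data):
    output_generations = []
    for item in input_data:
        assistant_content = ""
        for message in item.get("generated_text", []):
            if message.get("role") == "assistant":
                assistant_content = message.get("content", "")
                break
        clean_content = re.sub(r"<think>.*?</think>\s*", "", assistant_content, flags=re.DOTALL).strip()
        output_generations.append([
            {"text": clean_content, "generationInfo": {"finish_reason": "stop"}}
        ])
    return {"generations": output_generations}

sample = [{"generated_text": [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "<think>a greeting</think> Hi there!"},
]}]
out = convert_json_format(sample)
print(out["generations"][0][0]["text"])  # Hi there!
```

The `re.DOTALL` flag matters here: reasoning blocks usually span multiple lines, and without it `.` would not match newlines inside `<think>...</think>`.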
app.py
ADDED
@@ -0,0 +1,17 @@
from fastapi import FastAPI

# Initialize the FastAPI application
app = FastAPI(title="HF-Model-Runner API", version="0.0.1")

model_name = None

@app.get("/")
def greet_json():
    return {
        "message": "HF-Model-Runner API is running! Visit /docs for API documentation.",
        "model": model_name,
        "status": "ready"
    }
memory-bank/activeContext.md
ADDED
@@ -0,0 +1,23 @@
# Active Context

**Current Work Focus:**
- Integrating a Hugging Face model into `app.py`.
- Creating API endpoints for model interaction.

**Recent Changes:**
- 2026-01-01: Created `projectBrief.md`, `productContext.md`, `systemPatterns.md`, `techContext.md`, `activeContext.md`, `progress.md`, and `changelog.md` in the `memory-bank` directory.
- 2026-01-01: Modified `app.py` to implement the basic FastAPI structure.
- 2026-01-01: Integrated a Hugging Face sentiment analysis model (`distilbert-base-uncased-finetuned-sst-2-english`) into `app.py` and added a `/predict` API endpoint.

**Next Steps:**
- Finalize deployment on Hugging Face Spaces.

**Active Decisions and Considerations:**
- The FastAPI application will run on port 7860, as is common for Hugging Face Spaces.
- The initial `app.py` now includes a functional model inference endpoint.

**Important Patterns and Preferences:**
- Adhere to the Memory Bank documentation structure and update process.

**Learnings and Project Insights:**
- The Memory Bank is crucial for maintaining context across sessions.
memory-bank/changelog.md
ADDED
@@ -0,0 +1,12 @@
# Changelog

## [0.0.1] - 2026-01-01
### Added
- Initial setup of `memory-bank` directory and core documentation files:
  - `projectBrief.md`
  - `productContext.md`
  - `systemPatterns.md`
  - `techContext.md`
  - `activeContext.md`
  - `progress.md`
- Defined initial project scope, product context, system architecture, technical stack, active work focus, and project progress.
memory-bank/productContext.md
ADDED
@@ -0,0 +1,16 @@
# Product Context

This project provides a web API for a Hugging Face model, allowing other applications or users to interact with the model programmatically.

**Problems it solves:**
- Enables easy access to Hugging Face models via a standard API.
- Simplifies integration of AI models into other services.

**How it should work:**
- Users send HTTP requests to the API endpoints.
- The API processes the request, interacts with the loaded Hugging Face model, and returns a response.

**User experience goals:**
- Simple and intuitive API interface.
- Fast and reliable model inference.
- Clear documentation for API usage.
memory-bank/progress.md
ADDED
@@ -0,0 +1,22 @@
# Progress

**What Works:**
- The `memory-bank` directory has been created.
- Core Memory Bank files (`projectBrief.md`, `productContext.md`, `systemPatterns.md`, `techContext.md`, `activeContext.md`) have been initialized with relevant project context.

**What's Left to Build:**
- Implement the minimal FastAPI application in `app.py`.
- Ensure `requirements.txt` contains `fastapi` and `uvicorn`.
- Integrate a Hugging Face model.
- Create API endpoints for model interaction.
- Finalize deployment on Hugging Face Spaces.

**Current Status:**
- Documentation setup is nearly complete.
- Ready to proceed with code implementation.

**Known Issues:**
- None at this stage.

**Evolution of Project Decisions:**
- Initial focus on establishing a robust documentation foundation before coding.
memory-bank/projectBrief.md
ADDED
@@ -0,0 +1,9 @@
# Project Brief

This project aims to create a Hugging Face Space application that loads and exposes a Hugging Face model for user interaction via a FastAPI interface.

**Core Requirements:**
- Implement a minimal FastAPI application in `app.py`.
- Load a Hugging Face model.
- Provide an API endpoint to interact with the loaded model.
- Deploy the application on Hugging Face Spaces.
memory-bank/systemPatterns.md
ADDED
@@ -0,0 +1,20 @@
# System Patterns

**System Architecture:**
- FastAPI for the web API.
- Hugging Face Transformers library for model loading and inference.
- Deployed on Hugging Face Spaces.

**Key Technical Decisions:**
- Use FastAPI for its performance and automatic interactive API documentation (Swagger UI).
- Leverage Hugging Face's ecosystem for model management and deployment.

**Design Patterns in Use:**
- **MVC (Model-View-Controller) variant:** FastAPI acts as the controller, handling requests and responses. The Hugging Face model is the "model" (data/logic). There's no explicit "view" as it's an API.
- **Dependency Injection:** FastAPI's dependency injection system will be used for managing model loading and other resources.

**Component Relationships:**
- `app.py`: Main FastAPI application, defines routes and interacts with the model.
- Hugging Face Model: Loaded and used by `app.py` for inference.
- `requirements.txt`: Specifies Python dependencies.
- `Dockerfile` (if used): Defines the environment for deployment.
memory-bank/techContext.md
ADDED
@@ -0,0 +1,27 @@
# Tech Context

**Technologies Used:**
- **Python:** Primary programming language.
- **FastAPI:** Web framework for building the API.
- **Hugging Face Transformers:** Library for loading and using pre-trained models.
- **Uvicorn:** ASGI server to run the FastAPI application.

**Development Setup:**
- **Conda:** Environment management for Python.
- **pip:** Package installer for Python.
- **Git:** Version control.

**Technical Constraints:**
- Deployment on Hugging Face Spaces requires adherence to their environment specifications (e.g., `requirements.txt`, `app.py` as the main entry point).
- Model size and inference speed will be factors for performance on Hugging Face Spaces.

**Dependencies:**
- `fastapi`
- `uvicorn`
- `transformers` (for model loading)
- `torch` or `tensorflow` (as backend for transformers, depending on the model)

**Tool Usage Patterns:**
- `conda activate airs`: To activate the development environment.
- `pip install -r requirements.txt`: To install dependencies.
- `uvicorn app:app --host 0.0.0.0 --port 7860`: To run the FastAPI application locally (Hugging Face Spaces typically uses port 7860).
requirements.txt
ADDED
@@ -0,0 +1,7 @@
fastapi
uvicorn[standard]
transformers
huggingface_hub
torch
accelerate
python-multipart