tanbushi committed on
Commit
e142333
·
1 Parent(s): f7322d9
.clinerules/.clinerules ADDED
@@ -0,0 +1,12 @@
+ # .clinerules (this file) is the entry point and master control file for Cline's rules
+
+ # All communication, including documentation, is in Chinese
+
+ # Hold a requirements discussion with the user: when you start a requirements survey, record the survey tasks first, then ask one question at a time until the survey is complete
+
+ # The Python environment is managed with conda and the environment name is airs; if the current environment is not airs, activate it with `conda activate airs`
+
+ # Core task:
+ Create a Hugging Face Space that loads a Hugging Face model for users to call
+
+
.clinerules/temporal-memory-bank.md ADDED
@@ -0,0 +1,157 @@
+ ---
+ description: Describes Cline's Memory Bank system, its structure, and workflows for maintaining project knowledge across sessions.
+ author: https://github.com/nickbaumann98 https://github.com/chisleu
+ version: 1.0
+ tags: ["memory-bank", "knowledge-base", "core-behavior", "documentation-protocol"]
+ globs: ["memory-bank/**/*.md", "*"]
+ ---
+
+ # Cline's Memory Bank (Time-Aware Version)
+
+ I am Cline, an expert software engineer with a unique characteristic: my memory resets completely between sessions. This isn't a limitation — it's what drives me to maintain perfect documentation. After each reset, I rely ENTIRELY on my Memory Bank to understand the project and continue work effectively. I MUST read ALL memory bank files at the start of EVERY task — this is not optional.
+
+ ## Memory Bank Structure
+
+ The Memory Bank is located in a folder called 'memory-bank'. Create it if it does not already exist.
+ The Memory Bank consists of core files and optional context files, all in Markdown format. Files build upon each other in a clear hierarchy:
+
+ ```mermaid
+ flowchart TD
+     PB[projectBrief.md] --> PC[productContext.md]
+     PB --> SP[systemPatterns.md]
+     PB --> TC[techContext.md]
+
+     PC --> AC[activeContext.md]
+     SP --> AC
+     TC --> AC
+
+     AC --> P[progress.md]
+     AC --> CL[changelog.md]
+ ```
+
+ ### Core Files (Required)
+ 1. `projectBrief.md`
+    - Foundation document that shapes all other files
+    - Created at project start if it doesn't exist
+    - Defines core requirements and goals
+    - Source of truth for project scope
+
+ 2. `productContext.md`
+    - Why this project exists
+    - Problems it solves
+    - How it should work
+    - User experience goals
+
+ 3. `activeContext.md`
+    - Current work focus
+    - Recent changes
+    - Next steps
+    - Active decisions and considerations
+    - Important patterns and preferences
+    - Learnings and project insights
+    - Maintain a sliding window of the **10 most recent events** (date + summary).
+    - When a new event is added (the 11th), delete the oldest to retain only 10.
+    - This helps me reason about recent changes without bloating the file.
+
+ 4. `systemPatterns.md`
+    - System architecture
+    - Key technical decisions
+    - Design patterns in use
+    - Component relationships
+    - Critical implementation paths
+
+ 5. `techContext.md`
+    - Technologies used
+    - Development setup
+    - Technical constraints
+    - Dependencies
+    - Tool usage patterns
+
+ 6. `progress.md`
+    - What works
+    - What's left to build
+    - Current status
+    - Known issues
+    - Evolution of project decisions
+
+ 7. `changelog.md`
+    - Chronological log of key changes, decisions, or versions
+    - Follows a `CHANGELOG.md` convention with version/date headers
+    - Example format:
+      ```markdown
+      ## [1.0.3] - 2025-06-14
+      ### Changed
+      - Switched from REST to GraphQL
+      - Refactored notification system for async retries
+
+      ### Fixed
+      - Resolved mobile auth bug on Android
+
+      ### Added
+      - Timeline.md summary added to support project retrospectives
+      ```
+
+ ---
+
+ ## Core Workflows
+
+ ### Plan Mode
+ ```mermaid
+ flowchart TD
+     Start[Start] --> ReadFiles[Read Memory Bank]
+     ReadFiles --> CheckFiles{Files Complete?}
+
+     CheckFiles -->|No| Plan[Create Plan]
+     Plan --> Document[Document in Chat]
+
+     CheckFiles -->|Yes| Verify[Verify Context]
+     Verify --> Strategy[Develop Strategy]
+     Strategy --> Present[Present Approach]
+ ```
+
+ ### Act Mode
+ ```mermaid
+ flowchart TD
+     Start[Start] --> Context[Check Memory Bank]
+     Context --> Update[Update Documentation]
+     Update --> Execute[Execute Task]
+     Execute --> Document[Document Changes]
+ ```
+
+ ---
+
+ ## Documentation Updates
+
+ Updates occur when:
+ 1. Discovering new project patterns
+ 2. After significant changes
+ 3. When user requests **update memory bank**
+ 4. When context changes or decisions occur
+ 5. When **time-based updates** are needed
+
+ ### Update Process
+ ```mermaid
+ flowchart TD
+     Start[Update Process]
+
+     subgraph Process
+         P1[Review ALL Files]
+         P2[Document Current State]
+         P3[Clarify Next Steps]
+         P4[Document Insights & Patterns]
+         P5[Update progress.md]
+         P6[Slide activeContext.md to keep latest 10 entries]
+         P7[Append changelog.md]
+
+         P1 --> P2 --> P3 --> P4 --> P5 --> P6 --> P7
+     end
+
+     Start --> Process
+ ```
+
+ ---
+
+ ## Reminder
+
+ After every memory reset, I begin completely fresh. The Memory Bank is my only link to previous work. It must be maintained with precision and clarity — especially with time-aware reasoning. Read, interpret, and act on temporal data carefully.
+
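The 10-event sliding window that the rules file prescribes for `activeContext.md` can be sketched in a few lines. This is a hypothetical helper — the function name and the `(date, summary)` tuple format are ours, not the rules file's:

```python
# Hypothetical helper for the sliding window described above: keep only the
# 10 most recent (date, summary) events when a new one is recorded.
MAX_EVENTS = 10

def slide_events(events, new_event):
    """Append new_event, then drop the oldest entries beyond MAX_EVENTS."""
    return (events + [new_event])[-MAX_EVENTS:]

# Ten events already recorded; adding an 11th evicts the oldest.
events = [(f"2026-01-{day:02d}", f"event {day}") for day in range(1, 11)]
events = slide_events(events, ("2026-01-11", "event 11"))
# The oldest event (2026-01-01) is dropped and exactly 10 remain.
```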
.gitignore ADDED
@@ -0,0 +1,55 @@
+ # Environment variables
+ .env
+ .env.local
+ .env.*.local
+
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+
+ # Virtual environments
+ venv/
+ ENV/
+ env/
+
+ # Model cache
+ my_model_cache/
+ *.bin
+ *.safetensors
+
+ # IDE
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ *~
+
+ # OS
+ .DS_Store
+ Thumbs.db
+
+ # Logs
+ *.log
+ logs/
+
+ # Temporary files
+ *.tmp
+ *.temp
Dockerfile ADDED
@@ -0,0 +1,16 @@
+ # Read the doc: https://huggingface.co/docs/hub/spaces-sdks-docker
+ # you will also find guides on how best to write your Dockerfile
+
+ FROM python:3.12.4
+
+ RUN useradd -m -u 1000 user
+ USER user
+ ENV PATH="/home/user/.local/bin:$PATH"
+
+ WORKDIR /app
+
+ COPY --chown=user ./requirements.txt requirements.txt
+ RUN pip install --no-cache-dir --upgrade -r requirements.txt
+
+ COPY --chown=user . /app
+ CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
app-reference.py ADDED
@@ -0,0 +1,259 @@
+ #!/usr/bin/env python3
+ """
+ FastAPI application for FunctionGemma with HuggingFace login support.
+ This file is designed to be run with: uvicorn app:app --host 0.0.0.0 --port 7860
+ """
+
+ import os
+ import sys
+ from pathlib import Path
+ from fastapi import FastAPI
+ from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
+ from huggingface_hub import login
+
+ # Global variables
+ model_name = None
+ pipe = None
+ tokenizer = None  # Add global tokenizer
+ app = FastAPI(title="FunctionGemma API", version="1.0.0")
+
+ def check_and_download_model():
+     """Check if model exists in cache, if not download it"""
+     global model_name, tokenizer  # Include tokenizer in global
+
+     # Use TinyLlama - a fully public model
+     # model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
+     model_name = "unsloth/functiongemma-270m-it"
+     # model_name = "Qwen/Qwen3-0.6B"
+     cache_dir = "./my_model_cache"
+
+     # Check if model already exists in cache
+     model_path = Path(cache_dir) / f"models--{model_name.replace('/', '--')}"
+     snapshot_path = model_path / "snapshots"
+
+     if snapshot_path.exists() and any(snapshot_path.iterdir()):
+         print(f"✓ Model {model_name} already exists in cache")
+         tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir=cache_dir)  # Load tokenizer if model exists
+         return model_name, cache_dir
+
+     print(f"✗ Model {model_name} not found in cache")
+     print("Downloading model...")
+
+     # Login to Hugging Face (optional, for gated models)
+     token = os.getenv("HUGGINGFACE_TOKEN")
+     if token:
+         try:
+             print("Logging in to Hugging Face...")
+             login(token=token)
+             print("✓ HuggingFace login successful!")
+         except Exception as e:
+             print(f"⚠ Login failed: {e}")
+             print("Continuing without login (public models only)")
+     else:
+         print("ℹ No HUGGINGFACE_TOKEN set - using public models only")
+
+     try:
+         # Download tokenizer
+         print("Loading tokenizer...")
+         tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir=cache_dir)
+         print("✓ Tokenizer loaded successfully!")
+
+         # Download model
+         print("Loading model...")
+         model = AutoModelForCausalLM.from_pretrained(model_name, cache_dir=cache_dir)
+         print("✓ Model loaded successfully!")
+
+         print(f"✓ Model and tokenizer downloaded successfully to {cache_dir}")
+         return model_name, cache_dir
+
+     except Exception as e:
+         print(f"✗ Error downloading model: {e}")
+         print("\nPossible reasons:")
+         print("1. Model requires authentication - set HUGGINGFACE_TOKEN in .env")
+         print("2. Model is gated and you don't have access")
+         print("3. Network connection issues")
+         sys.exit(1)
+
+ def initialize_pipeline():
+     """Initialize the pipeline with the model"""
+     global pipe, model_name, tokenizer  # Include tokenizer in global
+
+     if model_name is None:
+         model_name, _ = check_and_download_model()
+
+     if tokenizer is None:  # Ensure tokenizer is loaded
+         tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir="./my_model_cache")
+
+     print(f"Initializing pipeline with {model_name}...")
+     pipe = pipeline("text-generation", model=model_name, tokenizer=tokenizer)  # Pass tokenizer to pipeline
+     print("✓ Pipeline initialized successfully!")
+
+ # API Endpoints
+ @app.get("/")
+ def greet_json():
+     return {
+         "message": "FunctionGemma API is running!",
+         "model": model_name,
+         "status": "ready"
+     }
+
+ @app.get("/health")
+ def health_check():
+     return {"status": "healthy", "model": model_name}
+
+ @app.get("/generate")
+ def generate_text(prompt: str = "Who are you?"):
+     """Generate text using the model"""
+     if pipe is None:
+         initialize_pipeline()
+
+     messages = [{"role": "user", "content": prompt}]
+     result = pipe(messages, max_new_tokens=1000)
+     return {"response": result[0]["generated_text"]}
+
+ @app.post("/chat")
+ def chat_completion(messages: list):
+     """Chat completion endpoint"""
+     if pipe is None:
+         initialize_pipeline()
+
+     result = pipe(messages, max_new_tokens=200)
+     return {"response": result[0]["generated_text"]}
+
+ @app.post("/v1/chat/completions")
+ def openai_chat_completions(request: dict):
+     """
+     OpenAI-compatible chat completions endpoint
+     Expected request format:
+     {
+         "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
+         "messages": [
+             {"role": "user", "content": "Hello"}
+         ],
+         "max_tokens": 100,
+         "temperature": 0.7
+     }
+     """
+     if pipe is None:
+         initialize_pipeline()
+
+     import time
+
+     messages = request.get("messages", [])
+     model = request.get("model", model_name)
+     max_tokens = request.get("max_tokens", 1000)
+     temperature = request.get("temperature", 0.7)
+
+     print('\n\n request')
+     print(request)
+     print('\n\n messages')
+     print(messages)
+     print('\n\n model')
+     print(model)
+     print('\n\n max_tokens')
+     print(max_tokens)
+     print('\n\n temperature')
+     print(temperature)
+
+     # Generate response
+     result = pipe(
+         messages,
+         max_new_tokens=max_tokens,
+         # temperature=temperature
+     )
+
+     result = convert_json_format(result)
+
+
+     completion_id = f"chatcmpl-{int(time.time())}"
+     created = int(time.time())
+
+     return_json = {
+         "id": completion_id,
+         "object": "chat.completion",
+         "created": created,
+         "model": model,
+         "choices": [
+             {
+                 "index": 0,
+                 "message": {
+                     "role": "assistant",
+                     "content": result["generations"][0][0]["text"]  # Corrected access
+                 },
+                 "finish_reason": "stop"
+             }
+         ],
+         "usage": {
+             "prompt_tokens": 0,
+             "completion_tokens": 0,
+             "total_tokens": 0
+         }
+     }
+
+     # Calculate prompt tokens
+     if tokenizer:
+         prompt_text = ""
+         for message in messages:
+             prompt_text += message.get("content", "") + " "
+         prompt_tokens = len(tokenizer.encode(prompt_text.strip()))
+         return_json["usage"]["prompt_tokens"] = prompt_tokens
+
+     # Calculate completion tokens
+     if tokenizer and result["generations"]:
+         completion_text = result["generations"][0][0]["text"]
+         completion_tokens = len(tokenizer.encode(completion_text))
+         return_json["usage"]["completion_tokens"] = completion_tokens
+
+     return_json["usage"]["total_tokens"] = return_json["usage"]["prompt_tokens"] + return_json["usage"]["completion_tokens"]
+
+     print('\n\n return_json')
+     print(return_json)
+     print('return over! \n\n')
+
+     return return_json
+
+ # Initialize model on startup
+ @app.on_event("startup")
+ async def startup_event():
+     """Initialize the model when the app starts"""
+     print("=" * 60)
+     print("FunctionGemma FastAPI Server")
+     print("=" * 60)
+     print("Initializing model...")
+     initialize_pipeline()
+     print("\n" + "=" * 60)
+     print("Server ready at http://0.0.0.0:7860")
+     print("Available endpoints:")
+     print("  GET  /                    - Welcome message")
+     print("  GET  /health              - Health check")
+     print("  GET  /generate?prompt=... - Generate text with prompt")
+     print("  POST /chat                - Chat completion")
+     print("  POST /v1/chat/completions - OpenAI-compatible endpoint")
+     print("=" * 60 + "\n")
+
+ import re
+
+ def convert_json_format(input_data):
+     output_generations = []
+     for item in input_data:
+         generated_text_list = item.get('generated_text', [])
+
+         assistant_content = ""
+         for message in generated_text_list:
+             if message.get('role') == 'assistant':
+                 assistant_content = message.get('content', '')
+                 break  # Assuming only one assistant response per generated_text
+
+         # Remove <think>...</think> tags
+         clean_content = re.sub(r'<think>.*?</think>\s*', '', assistant_content, flags=re.DOTALL).strip()
+
+         output_generations.append([
+             {
+                 "text": clean_content,
+                 "generationInfo": {
+                     "finish_reason": "stop"
+                 }
+             }
+         ])
+
+     return {"generations": output_generations}
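The `<think>` cleanup buried inside `convert_json_format` is worth seeing in isolation. A minimal standalone sketch of the same regex (the function name `strip_think` is ours, not the file's):

```python
import re

def strip_think(text: str) -> str:
    """Drop <think>...</think> reasoning blocks plus the whitespace after
    them, using the same pattern convert_json_format applies above.
    DOTALL lets the non-greedy .*? span newlines inside the block."""
    return re.sub(r'<think>.*?</think>\s*', '', text, flags=re.DOTALL).strip()

print(strip_think("<think>plan the\nanswer</think>\nHello there"))  # → Hello there
```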
app.py ADDED
@@ -0,0 +1,14 @@
+ from fastapi import FastAPI
+
+ # Initialize the FastAPI application
+ app = FastAPI(title="HF-Model-Runner API", version="0.0.1")
+
+ model_name = None
+
+ @app.get("/")
+ def greet_json():
+     return {
+         "message": "HF-Model-Runner API is running! Visit /docs for API documentation.",
+         "model": model_name,
+         "status": "ready"
+     }
memory-bank/activeContext.md ADDED
@@ -0,0 +1,23 @@
+ # Active Context
+
+ **Current Work Focus:**
+ - Integrating a Hugging Face model into `app.py`.
+ - Creating API endpoints for model interaction.
+
+ **Recent Changes:**
+ - 2026-01-01: Created `projectBrief.md`, `productContext.md`, `systemPatterns.md`, `techContext.md`, `activeContext.md`, `progress.md`, and `changelog.md` in the `memory-bank` directory.
+ - 2026-01-01: Modified `app.py` to implement the basic FastAPI structure.
+ - 2026-01-01: Integrated a Hugging Face sentiment analysis model (`distilbert-base-uncased-finetuned-sst-2-english`) into `app.py` and added a `/predict` API endpoint.
+
+ **Next Steps:**
+ - Finalize deployment on Hugging Face Spaces.
+
+ **Active Decisions and Considerations:**
+ - The FastAPI application will run on port 7860, as is common for Hugging Face Spaces.
+ - The initial `app.py` now includes a functional model inference endpoint.
+
+ **Important Patterns and Preferences:**
+ - Adhere to the Memory Bank documentation structure and update process.
+
+ **Learnings and Project Insights:**
+ - The Memory Bank is crucial for maintaining context across sessions.
memory-bank/changelog.md ADDED
@@ -0,0 +1,12 @@
+ # Changelog
+
+ ## [0.0.1] - 2026-01-01
+ ### Added
+ - Initial setup of `memory-bank` directory and core documentation files:
+   - `projectBrief.md`
+   - `productContext.md`
+   - `systemPatterns.md`
+   - `techContext.md`
+   - `activeContext.md`
+   - `progress.md`
+ - Defined initial project scope, product context, system architecture, technical stack, active work focus, and project progress.
memory-bank/productContext.md ADDED
@@ -0,0 +1,16 @@
+ # Product Context
+
+ This project provides a web API for a Hugging Face model, allowing other applications or users to interact with the model programmatically.
+
+ **Problems it solves:**
+ - Enables easy access to Hugging Face models via a standard API.
+ - Simplifies integration of AI models into other services.
+
+ **How it should work:**
+ - Users send HTTP requests to the API endpoints.
+ - The API processes the request, interacts with the loaded Hugging Face model, and returns a response.
+
+ **User experience goals:**
+ - Simple and intuitive API interface.
+ - Fast and reliable model inference.
+ - Clear documentation for API usage.
memory-bank/progress.md ADDED
@@ -0,0 +1,22 @@
+ # Progress
+
+ **What Works:**
+ - The `memory-bank` directory has been created.
+ - Core Memory Bank files (`projectBrief.md`, `productContext.md`, `systemPatterns.md`, `techContext.md`, `activeContext.md`) have been initialized with relevant project context.
+
+ **What's Left to Build:**
+ - Implement the minimal FastAPI application in `app.py`.
+ - Ensure `requirements.txt` contains `fastapi` and `uvicorn`.
+ - Integrate a Hugging Face model.
+ - Create API endpoints for model interaction.
+ - Finalize deployment on Hugging Face Spaces.
+
+ **Current Status:**
+ - Documentation setup is nearly complete.
+ - Ready to proceed with code implementation.
+
+ **Known Issues:**
+ - None at this stage.
+
+ **Evolution of Project Decisions:**
+ - Initial focus on establishing a robust documentation foundation before coding.
memory-bank/projectBrief.md ADDED
@@ -0,0 +1,9 @@
+ # Project Brief
+
+ This project aims to create a Hugging Face Space application that loads and exposes a Hugging Face model for user interaction via a FastAPI interface.
+
+ **Core Requirements:**
+ - Implement a minimal FastAPI application in `app.py`.
+ - Load a Hugging Face model.
+ - Provide an API endpoint to interact with the loaded model.
+ - Deploy the application on Hugging Face Spaces.
memory-bank/systemPatterns.md ADDED
@@ -0,0 +1,20 @@
+ # System Patterns
+
+ **System Architecture:**
+ - FastAPI for the web API.
+ - Hugging Face Transformers library for model loading and inference.
+ - Deployed on Hugging Face Spaces.
+
+ **Key Technical Decisions:**
+ - Use FastAPI for its performance and automatic interactive API documentation (Swagger UI).
+ - Leverage Hugging Face's ecosystem for model management and deployment.
+
+ **Design Patterns in Use:**
+ - **MVC (Model-View-Controller) variant:** FastAPI acts as the controller, handling requests and responses. The Hugging Face model is the "model" (data/logic). There's no explicit "view" as it's an API.
+ - **Dependency Injection:** FastAPI's dependency injection system will be used for managing model loading and other resources.
+
+ **Component Relationships:**
+ - `app.py`: Main FastAPI application, defines routes and interacts with the model.
+ - Hugging Face Model: Loaded and used by `app.py` for inference.
+ - `requirements.txt`: Specifies Python dependencies.
+ - `Dockerfile` (if used): Defines the environment for deployment.
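The dependency-injection pattern named above usually pairs with loading the model exactly once. A minimal sketch using `functools.lru_cache` as a stand-in for the real loader — the dict is a placeholder for a model object, not code from this repo:

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def get_model():
    """Load the model on the first call; every later call returns the same
    cached instance. A FastAPI route could then declare this provider as a
    dependency instead of loading the model per request."""
    return {"name": "placeholder-model"}  # stand-in for real model loading

first = get_model()
second = get_model()  # served from the cache: no second load happens
```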
memory-bank/techContext.md ADDED
@@ -0,0 +1,27 @@
+ # Tech Context
+
+ **Technologies Used:**
+ - **Python:** Primary programming language.
+ - **FastAPI:** Web framework for building the API.
+ - **Hugging Face Transformers:** Library for loading and using pre-trained models.
+ - **Uvicorn:** ASGI server to run the FastAPI application.
+
+ **Development Setup:**
+ - **Conda:** Environment management for Python.
+ - **pip:** Package installer for Python.
+ - **Git:** Version control.
+
+ **Technical Constraints:**
+ - Deployment on Hugging Face Spaces requires adherence to their environment specifications (e.g., `requirements.txt`, `app.py` as the main entry point).
+ - Model size and inference speed will be factors for performance on Hugging Face Spaces.
+
+ **Dependencies:**
+ - `fastapi`
+ - `uvicorn`
+ - `transformers` (for model loading)
+ - `torch` or `tensorflow` (as backend for transformers, depending on the model)
+
+ **Tool Usage Patterns:**
+ - `conda activate airs`: To activate the development environment.
+ - `pip install -r requirements.txt`: To install dependencies.
+ - `uvicorn app:app --host 0.0.0.0 --port 7860`: To run the FastAPI application locally (Hugging Face Spaces typically uses port 7860).
requirements.txt ADDED
@@ -0,0 +1,7 @@
+ fastapi
+ uvicorn[standard]
+ transformers
+ huggingface_hub
+ torch
+ accelerate
+ python-multipart