# Nanbeige4.1-3B Inference Server
Lightweight remote LLM inference service for Enterprise ReAct Agent systems.
## Overview
This Hugging Face Space hosts the **Nanbeige4.1-3B** model as a remote inference API, designed to work with local agent orchestration systems. The model runs entirely in this Space, while all agent logic, tools, and memory systems run on the user's local machine.
## Model Information
- **Model**: [Nanbeige/Nanbeige4.1-3B](https://huggingface.co/Nanbeige/Nanbeige4.1-3B)
- **Parameters**: 3B
- **Context Window**: 8K tokens
- **Capabilities**: Tool calling, reasoning, 500+ tool invocation rounds
- **License**: Apache 2.0

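An 8K context fills quickly over long agent runs, so message history usually needs trimming before each call. A minimal sketch, assuming a crude ~4-characters-per-token estimate (a real tokenizer would be more accurate):

```python
def trim_history(messages, max_tokens=8192, chars_per_token=4):
    """Drop the oldest non-system messages until the estimated token
    count fits the context window (crude character-based estimate)."""
    budget = max_tokens * chars_per_token
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and sum(len(m["content"]) for m in system + rest) > budget:
        rest.pop(0)  # discard the oldest turn first
    return system + rest
```

System messages are kept so the agent's instructions survive truncation.
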
## API Endpoints
### POST /chat
Main chat completion endpoint (OpenAI-compatible).
**Request:**
```json
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "tools": [...],
  "stream": false,
  "max_tokens": 2048,
  "temperature": 0.6,
  "top_p": 0.95
}
```
**Response:**
```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "Nanbeige/Nanbeige4.1-3B",
  "choices": [...],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 50,
    "total_tokens": 70
  }
}
```
### GET /chat
Web interface for testing.
### GET /health
Health check endpoint.
## Usage with Local Agent

```python
import requests

# Replace with your Space's URL; allow generous time for cold starts.
response = requests.post(
    "https://your-space.hf.space/chat",
    json={
        "messages": [{"role": "user", "content": "Hello!"}],
        "temperature": 0.6
    },
    timeout=120
)
response.raise_for_status()
result = response.json()
```
## Hardware Requirements

- **GPU**: Recommended (CUDA-compatible)
- **CPU**: Fallback supported
- **Memory**: ~8GB RAM minimum

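The ~8GB figure lines up with rough weight math: 3B parameters at 2 bytes each (fp16/bf16) is about 5.6GiB before activations and KV cache. A back-of-the-envelope check:

```python
params = 3e9                  # 3B parameters
bytes_per_param = 2           # fp16 / bf16 weights
weights_gib = params * bytes_per_param / 1024**3
print(f"{weights_gib:.1f} GiB for weights alone")  # ≈ 5.6 GiB
```
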
## Local Agent Repository
For the complete local agent system that connects to this Space, see the companion repository.