---
title: React
emoji: 🌍
colorFrom: yellow
colorTo: gray
sdk: gradio
sdk_version: 6.8.0
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# Nanbeige4.1-3B Inference Server

Lightweight remote LLM inference service for Enterprise ReAct Agent systems.

## Overview
This Hugging Face Space hosts the Nanbeige4.1-3B model as a remote inference API, designed to work with local agent orchestration systems. The model runs entirely in this Space, while all agent logic, tools, and memory systems run on the user's local machine.
## Model Information
- Model: Nanbeige/Nanbeige4.1-3B
- Parameters: 3B
- Context Window: 8K tokens
- Capabilities: Tool calling, reasoning, 500+ tool invocation rounds
- License: Apache 2.0
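The tool-calling capability listed above uses OpenAI-style function schemas passed in the request's `tools` array. A minimal sketch of such a schema — the `get_weather` tool is purely illustrative and not provided by this Space:

```python
# Hypothetical tool definition in the OpenAI function-calling schema.
# "get_weather" is an illustrative example, not a tool shipped with this Space.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# This list would go into the "tools" field of the POST /chat request body.
tools = [weather_tool]
```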
## API Endpoints

### POST /chat

Main chat completion endpoint (OpenAI-compatible).
**Request:**

```json
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "tools": [...],
  "stream": false,
  "max_tokens": 2048,
  "temperature": 0.6,
  "top_p": 0.95
}
```
**Response:**

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "Nanbeige/Nanbeige4.1-3B",
  "choices": [...],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 50,
    "total_tokens": 70
  }
}
```
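Assuming the response follows the OpenAI chat-completion shape shown above, the assistant's reply — or a requested tool call — lives under `choices[0].message`. A sketch against a mock response (all values illustrative):

```python
# Mock response in the OpenAI chat-completion shape; values are illustrative.
response = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello! How can I help?"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 20, "completion_tokens": 50, "total_tokens": 70},
}

message = response["choices"][0]["message"]

# When the model decides to call a tool, OpenAI-style responses carry a
# "tool_calls" list on the message instead of plain content; check it first.
if message.get("tool_calls"):
    for call in message["tool_calls"]:
        print("tool requested:", call["function"]["name"])
else:
    print(message["content"])
```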
### GET /chat

Web interface for testing.

### GET /health

Health check endpoint.
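A local orchestrator can probe this endpoint before issuing chat requests. A minimal sketch, assuming `/health` simply returns HTTP 200 when the Space is ready (the response body is not specified here):

```python
import urllib.error
import urllib.request


def is_healthy(base_url, timeout=10):
    """Return True if GET /health answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, TimeoutError):
        # Connection refused, DNS failure, or timeout: treat as not ready.
        return False
```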
## Usage with Local Agent

```python
import requests

response = requests.post(
    "https://your-space.hf.space/chat",
    json={
        "messages": [{"role": "user", "content": "Hello!"}],
        "temperature": 0.6
    }
)
result = response.json()
```
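In practice the call benefits from a timeout and a couple of retries, since Spaces can cold-start. A sketch under those assumptions — the Space URL is a placeholder, and the `max_retries`/`timeout` values are illustrative, not requirements of this API:

```python
import time

import requests


def build_payload(messages, **params):
    """Assemble the JSON body for POST /chat: messages plus sampling params."""
    return {"messages": messages, **params}


def chat(base_url, messages, max_retries=3, timeout=120, **params):
    """POST to /chat with a timeout and simple retries for cold starts."""
    for attempt in range(max_retries):
        try:
            resp = requests.post(
                f"{base_url}/chat",
                json=build_payload(messages, **params),
                timeout=timeout,
            )
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # simple exponential backoff: 1s, 2s, ...


if __name__ == "__main__":
    result = chat(
        "https://your-space.hf.space",  # placeholder: substitute your Space URL
        [{"role": "user", "content": "Hello!"}],
        temperature=0.6,
    )
    print(result["choices"][0]["message"]["content"])
```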
## Hardware Requirements
- GPU: Recommended (CUDA-compatible)
- CPU: Fallback supported
- Memory: ~8GB RAM minimum
## Local Agent Repository

For the complete local agent system that connects to this Space, see the companion repository.