---
title: React
emoji: 🌍
colorFrom: yellow
colorTo: gray
sdk: gradio
sdk_version: 6.8.0
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Nanbeige4.1-3B Inference Server

A lightweight remote LLM inference service for enterprise ReAct agent systems.

## Overview

This Hugging Face Space hosts the Nanbeige4.1-3B model as a remote inference API, designed to work with local agent orchestration systems. The model runs entirely in this Space, while all agent logic, tools, and memory systems run on the user's local machine.

## Model Information

- **Model:** Nanbeige/Nanbeige4.1-3B
- **Parameters:** 3B
- **Context window:** 8K tokens
- **Capabilities:** Tool calling, reasoning, 500+ tool-invocation rounds
- **License:** Apache 2.0

## API Endpoints

### POST /chat

Main chat completion endpoint (OpenAI-compatible).

**Request:**

```json
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "tools": [...],
  "stream": false,
  "max_tokens": 2048,
  "temperature": 0.6,
  "top_p": 0.95
}
```
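The `tools` entries are elided above. Assuming the endpoint follows the OpenAI function-calling schema (an assumption, since the request example does not show it), one entry would look like the following; the `get_weather` function is purely hypothetical:

```json
{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Return the current weather for a city.",
    "parameters": {
      "type": "object",
      "properties": {
        "city": {"type": "string", "description": "City name"}
      },
      "required": ["city"]
    }
  }
}
```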

**Response:**

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "Nanbeige/Nanbeige4.1-3B",
  "choices": [...],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 50,
    "total_tokens": 70
  }
}
```
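The `choices` array is elided above. Assuming each entry follows the standard OpenAI `chat.completion` shape (a `message` object with `role` and `content` fields — an assumption, not confirmed by this README), extracting the assistant's reply is a one-liner:

```python
# Extract the assistant's reply from a chat.completion response.
# The sample below is illustrative only, not real server output.
sample = {
    "id": "chatcmpl-example",
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello! How can I help?"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 20, "completion_tokens": 50, "total_tokens": 70},
}

reply = sample["choices"][0]["message"]["content"]
print(reply)
```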

### GET /chat

Web interface for testing.

### GET /health

Health check endpoint.

## Usage with Local Agent

```python
import requests

response = requests.post(
    "https://your-space.hf.space/chat",
    json={
        "messages": [{"role": "user", "content": "Hello!"}],
        "temperature": 0.6
    }
)
result = response.json()
```
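For repeated calls it can help to build request bodies once, with the defaults shown in the request example (`stream`, `max_tokens`, `temperature`, `top_p`). A minimal sketch — the helper name and structure are this example's own choices, not part of the Space's API:

```python
# Defaults taken from the request example in this README.
DEFAULTS = {"stream": False, "max_tokens": 2048, "temperature": 0.6, "top_p": 0.95}


def build_payload(messages, tools=None, **overrides):
    """Build a /chat request body, letting keyword overrides win over defaults."""
    payload = {"messages": messages, **DEFAULTS, **overrides}
    if tools is not None:
        payload["tools"] = tools
    return payload


# Pass the result as `json=` to requests.post, as in the snippet above:
payload = build_payload([{"role": "user", "content": "Hello!"}], temperature=0.2)
```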

## Hardware Requirements

- **GPU:** Recommended (CUDA-compatible)
- **CPU:** Fallback supported
- **Memory:** ~8 GB RAM minimum

## Local Agent Repository

For the complete local agent system that connects to this Space, see the companion repository.