Minecraft Bot LLM Backend

01 / Architecture

🗺 System processing flow

Minecraft Bot (Node.js / Python) ──▶ Bearer Token (Header Auth) ──▶ FastAPI Server (:7860 /v1/chat/completions) ──▶ llama-cpp-python (Qwen2.5-Coder GGUF)

📦 Hosted on

Hugging Face Spaces (Docker)

· Model downloaded at build time · ENV secrets injected at runtime
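
The flow above can be traced at the HTTP level. The following is a minimal sketch rather than the project's client: it only builds the request a bot would send, so the Bearer header and the endpoint path are visible. The helper name `build_chat_request`, the localhost URL, and the token value are assumptions for illustration.

```python
import json
import urllib.request


def build_chat_request(base_url: str, token: str, user_text: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for /v1/chat/completions."""
    body = {
        "model": "qwen2.5-coder-7b-instruct",
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # checked by the FastAPI server
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder host and token, for illustration only
req = build_chat_request("http://localhost:7860", "example-token", "Chop the nearest oak_log.")
```

Passing `req` to `urllib.request.urlopen` would send it; in practice the OpenAI client libraries shown later produce the same header from their `apiKey` / `api_key` argument.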

02 / Model Selection

🧠 Recommended model: Qwen2.5-Coder-7B-Instruct Q4_K_M

Model	Size	RAM (approx.)	Coding	Reasoning	License	GGUF
★ Qwen2.5-Coder-7B-Instruct Q4_K_M	7B	~5.5 GB	✔ Excellent	✔ Strong	Apache 2.0	✔
DeepSeek-Coder-V2-Lite-Instruct Q4	16B MoE	~8 GB	✔ Excellent	✔ Excellent	DeepSeek	✔
Phi-3.5-mini-instruct Q4	3.8B	~2.5 GB	✔ Good	✔ Good	MIT	✔
CodeLlama-7B-Instruct Q4	7B	~5 GB	✔ Good	✘ Weaker	Llama 2	✔

⚠ HF Spaces free tier RAM ~ 16 GB · Qwen2.5-Coder-7B Q4_K_M (~4.4 GB file, ~5.5 GB runtime) is the safest choice in terms of both resource usage and quality.
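
To see where a runtime figure above the file size can come from, here is a rough back-of-the-envelope sketch of the key/value cache on top of the weights. The layer count, KV head count, and head dimension are assumed values for a Qwen2.5-7B-class model and should be checked against the model's own config; the function name is hypothetical.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bytes_per_value: int = 2) -> int:
    """Size of the KV cache: one K and one V tensor per layer, per token (fp16 by default)."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_value


# Assumed architecture values (verify against the model's config.json)
N_LAYERS, N_KV_HEADS, HEAD_DIM = 28, 4, 128

cache = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, n_ctx=4096)
model_file_gb = 4.4  # file size quoted above
total_gb = model_file_gb + cache / 1e9

print(f"KV cache at 4096 tokens: {cache / 1e6:.0f} MB")
print(f"Weights + KV cache: {total_gb:.2f} GB")
```

Under these assumptions the cache adds roughly 235 MB at a 4096-token context. Compute buffers and the Python process itself come on top of that, which is plausibly what the ~5.5 GB runtime estimate accounts for.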

03 / Source Files

📁 Configuration files & source code

🐳 Dockerfile

DOCKER

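# ── Stage 1: builder (compile llama-cpp-python)
FROM python:3.11-slim AS builder
# The package list is abbreviated ("...") in the source
RUN apt-get update && apt-get install -y build-essential cmake wget ...

# llama-cpp-python 0.3.x reads GGML_* CMake options (LLAMA_BLAS is the older name)
ENV CMAKE_ARGS="-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS"
COPY requirements.txt .
RUN pip install -r requirements.txt --target /build/deps

# ── Stage 2: runtime (slim image)
FROM python:3.11-slim
# Assumption: the OpenBLAS build needs these shared libraries at runtime
RUN apt-get update && apt-get install -y --no-install-recommends libopenblas0 libgomp1 \
    && rm -rf /var/lib/apt/lists/*
RUN useradd -m -u 1000 user   # HF Spaces requires a non-root user

# Carry the compiled dependencies over from the builder stage
WORKDIR /app
COPY --from=builder /build/deps /app/deps
ENV PYTHONPATH=/app/deps
COPY app.py .
# Create the model directory as root so the non-root user can write to it
RUN mkdir -p /app/models && chown user:user /app/models
USER user

# Download the GGUF model at BUILD time (~4.4 GB)
RUN python -c "from huggingface_hub import hf_hub_download; \
    hf_hub_download(repo_id='Qwen/Qwen2.5-Coder-7B-Instruct-GGUF', \
    filename='qwen2.5-coder-7b-instruct-q4_k_m.gguf', \
    local_dir='/app/models')"

ENV MODEL_PATH=/app/models/qwen2.5-coder-7b-instruct-q4_k_m.gguf
# BEARER_TOKEN is injected from the HF Secret; do not hard-code it here!

EXPOSE 7860
CMD ["python", "-m", "uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]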

🐍 app.py — FastAPI Server

PYTHON

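import os
import secrets

from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer
from llama_cpp import Llama
from pydantic import BaseModel

BEARER_TOKEN = os.environ.get("BEARER_TOKEN", "")  # ← from the HF Secret

app = FastAPI()
bearer_scheme = HTTPBearer()
llm = Llama(model_path=os.environ["MODEL_PATH"], n_ctx=int(os.environ.get("N_CTX", "4096")))

# Request schema (assumed fields and defaults; not shown in the source)
class ChatMessage(BaseModel):
    role: str
    content: str

class ChatCompletionRequest(BaseModel):
    messages: list[ChatMessage]
    max_tokens: int = 512
    temperature: float = 0.2

# Auth dependency: fail closed if the secret is unset, compare in constant time
def verify_token(creds: HTTPAuthorizationCredentials = Depends(bearer_scheme)):
    supplied, expected = creds.credentials.encode(), BEARER_TOKEN.encode()
    if not BEARER_TOKEN or not secrets.compare_digest(supplied, expected):
        raise HTTPException(401, "Invalid Bearer Token")

# OpenAI-compatible endpoint
@app.post("/v1/chat/completions", dependencies=[Depends(verify_token)])
async def chat_completions(request: ChatCompletionRequest):
    result = llm.create_chat_completion(
        messages=[{"role": m.role, "content": m.content} for m in request.messages],
        max_tokens=request.max_tokens,
        temperature=request.temperature,
    )
    return result  # already an OpenAI-style dict (the source wraps it in ChatCompletionResponse)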
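
The endpoint's response follows the OpenAI chat completion schema. As a rough sketch of that shape, the helper below wraps a piece of generated text in the fields an OpenAI client expects. The function name `build_chat_completion_response` and the hard-coded `finish_reason` are assumptions for illustration, not the project's actual wrapper.

```python
import time
import uuid


def build_chat_completion_response(text: str, model: str,
                                   prompt_tokens: int, completion_tokens: int) -> dict:
    """Wrap generated text in the OpenAI `chat.completion` response shape."""
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": text},
                "finish_reason": "stop",  # simplification: always reports a normal stop
            }
        ],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        },
    }


example = build_chat_completion_response("mine oak_log", "qwen2.5-coder-7b-instruct", 42, 3)
```

Clients read the reply from `choices[0].message.content`, so that path is the part of the schema that matters most for the bot.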

📋 requirements.txt

TXT

llama-cpp-python==0.3.4     # GGUF inference engine
fastapi==0.115.6          # API framework
uvicorn[standard]==0.32.1  # ASGI server
pydantic==2.10.3          # Schema validation
huggingface-hub==0.27.0   # Model download
httpx==0.28.1             # HTTP client

🟨 client_test.js — Node.js client

NODE.JS

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://<username>-<space>.hf.space/v1",
  apiKey:  process.env.BEARER_TOKEN,
});

const response = await client.chat.completions.create({
  model:    "qwen2.5-coder-7b-instruct",
  messages: [
    { role: "system",  content: "You are a Minecraft bot brain..." },
    { role: "user",    content: "Bot at x=120. Nearest: oak_log. Chop it." },
  ],
  max_tokens:  512,
  temperature: 0.2,
});

// Print the assistant's reply
console.log(response.choices[0].message.content);

🐍 client_test.py — Python client

PYTHON

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://<username>-<space>.hf.space/v1",
    api_key=os.environ.get("BEARER_TOKEN"),
)

response = client.chat.completions.create(
    model="qwen2.5-coder-7b-instruct",
    messages=[
        {"role": "system", "content": "You are a Minecraft bot brain..."},
        {"role": "user",   "content": "Bot at x=120. Nearest: oak_log. Chop it."},
    ],
    max_tokens=512,
    temperature=0.2,
)

# Print the assistant's reply
print(response.choices[0].message.content)

04 / Deployment

🔐 Configuring the Secret on Hugging Face Spaces

Open the Space's Settings

Go to your Space → Settings tab → scroll down to the "Repository secrets" section.

Add a new Secret

Click "New secret" → enter BEARER_TOKEN in the Name field → enter your secret token in the Value field.

Save & Rebuild

HF injects this value into the container as an environment variable at runtime. The container rebuilds automatically. The token never appears in the image layers, and it stays out of the logs as long as the application does not print it.

Use the token when calling the API

The client must send the header: Authorization: Bearer <your-token>. Store the token in a client-side environment variable (BEARER_TOKEN) to avoid hard-coding it.
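
For the token value itself, any long random string works. One possible way to produce it, sketched here with Python's standard `secrets` module, is shown below; the helper names `generate_bearer_token` and `auth_header` are hypothetical.

```python
import secrets


def generate_bearer_token(n_bytes: int = 32) -> str:
    """Return a URL-safe random token to paste into the secret's Value field."""
    return secrets.token_urlsafe(n_bytes)


def auth_header(token: str) -> dict:
    """Build the Authorization header the server expects."""
    return {"Authorization": f"Bearer {token}"}


token = generate_bearer_token()
headers = auth_header(token)
```

Generate the token once on a trusted machine, store it in the HF Secret and in the client's BEARER_TOKEN environment variable, and avoid writing it to any file that is committed.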

🚫 Never do the following

✘ Hard-code the token in the Dockerfile, app.py, or any other file committed to the repo

✘ Use ENV in the Dockerfile to set BEARER_TOKEN (it would be exposed in an image layer)

✘ Print the token to the console or to a log file
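
As a safety net for the last rule, a logging filter can scrub the token from records before they are written. This is a minimal sketch using the standard `logging` module; the class name `RedactTokenFilter`, the logger name, and the example token are assumptions, and it does not replace the rule of never logging the token in the first place.

```python
import logging


class RedactTokenFilter(logging.Filter):
    """Replace any occurrence of the secret in a log record with a placeholder."""

    def __init__(self, secret: str):
        super().__init__()
        self.secret = secret

    def filter(self, record: logging.LogRecord) -> bool:
        if self.secret:
            message = record.getMessage()  # message with its args already applied
            if self.secret in message:
                record.msg = message.replace(self.secret, "[REDACTED]")
                record.args = None
        return True  # never drop the record, only rewrite it


# Example wiring with a placeholder token
logger = logging.getLogger("bot-backend")
handler = logging.StreamHandler()
handler.addFilter(RedactTokenFilter("example-token"))
logger.addHandler(handler)
logger.warning("auth failed for token %s", "example-token")
```

Attaching the filter to the handler rather than to a single logger means records propagated from child loggers pass through it as well.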

05 / Summary

📊 System specifications

Setting	Value	Notes
Port	7860	Default for HF Spaces Docker
Model RAM usage	~5.5 GB	Qwen2.5-Coder-7B Q4_K_M, safe within 16 GB
Context window	4096 tokens	Tunable via the N_CTX env var
API format	OpenAI v1	/v1/chat/completions · /v1/models
Auth method	Bearer Token	Read from the HF Secret BEARER_TOKEN
Build strategy	Multi-stage	Builder + slim runtime, model pre-downloaded
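
The environment-driven values above (N_CTX, BEARER_TOKEN, MODEL_PATH) could be read in one place at startup. The following is a minimal sketch under that assumption; the `Settings` dataclass and `load_settings` function are hypothetical names, and the validation rules are one reasonable choice rather than the project's documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    model_path: str
    n_ctx: int
    bearer_token: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read runtime settings from environment variables, defaulting N_CTX to 4096."""
    n_ctx = int(env.get("N_CTX", "4096"))
    if n_ctx <= 0:
        raise ValueError("N_CTX must be a positive integer")
    token = env.get("BEARER_TOKEN", "")
    if not token:
        # Refuse to start without a token instead of serving an open endpoint
        raise RuntimeError("BEARER_TOKEN is not set")
    return Settings(model_path=env.get("MODEL_PATH", ""), n_ctx=n_ctx, bearer_token=token)


# Example with an explicit mapping instead of the real environment
settings = load_settings({"BEARER_TOKEN": "example-token", "N_CTX": "2048"})
```

Failing at startup when BEARER_TOKEN is missing makes a misconfigured Space visible immediately, rather than leaving the API reachable without authentication.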