Instructions to use NovaAI6868/BaiHu-gguf with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use NovaAI6868/BaiHu-gguf with llama-cpp-python:
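    # !pip install llama-cpp-python
    from llama_cpp import Llama

    # Load the main model weights. BaiHu-v2.F16-mmproj.gguf is only the
    # multimodal projector and cannot be loaded as the model itself.
    llm = Llama.from_pretrained(
        repo_id="NovaAI6868/BaiHu-gguf",
        filename="BaiHu-v2.Q4_K_M.gguf",
    )

    # Note: image input additionally needs a multimodal chat handler loaded with
    # the mmproj file; which handler fits this model is not stated on this page.
    llm.create_chat_completion(
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image in one sentence."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        },
                    },
                ],
            }
        ]
    )
- Notebooks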
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use NovaAI6868/BaiHu-gguf with llama.cpp:
Install (macOS, Linux)
    curl -LsSf https://llama.app/install.sh | sh
    # Start a local OpenAI-compatible server with a web UI:
    llama serve -hf NovaAI6868/BaiHu-gguf:F16
    # Run inference directly in the terminal:
    llama cli -hf NovaAI6868/BaiHu-gguf:F16
Install from WinGet (Windows)
    winget install llama.cpp
    # Start a local OpenAI-compatible server with a web UI:
    llama serve -hf NovaAI6868/BaiHu-gguf:F16
    # Run inference directly in the terminal:
    llama cli -hf NovaAI6868/BaiHu-gguf:F16
Use pre-built binary
    # Download pre-built binary from:
    # https://github.com/ggerganov/llama.cpp/releases
    # Start a local OpenAI-compatible server with a web UI:
    ./llama-server -hf NovaAI6868/BaiHu-gguf:F16
    # Run inference directly in the terminal:
    ./llama-cli -hf NovaAI6868/BaiHu-gguf:F16
Build from source code
    git clone https://github.com/ggerganov/llama.cpp.git
    cd llama.cpp
    cmake -B build
    cmake --build build -j --target llama-server llama-cli
    # Start a local OpenAI-compatible server with a web UI:
    ./build/bin/llama-server -hf NovaAI6868/BaiHu-gguf:F16
    # Run inference directly in the terminal:
    ./build/bin/llama-cli -hf NovaAI6868/BaiHu-gguf:F16
Use Docker
docker model run hf.co/NovaAI6868/BaiHu-gguf:F16
- LM Studio
- Jan
- vLLM
How to use NovaAI6868/BaiHu-gguf with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm
    # Start the vLLM server:
    vllm serve "NovaAI6868/BaiHu-gguf"
    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/chat/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "NovaAI6868/BaiHu-gguf",
        "messages": [
          {
            "role": "user",
            "content": [
              { "type": "text", "text": "Describe this image in one sentence." },
              { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
            ]
          }
        ]
      }'
Use Docker
docker model run hf.co/NovaAI6868/BaiHu-gguf:F16
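The curl request above can also be sent from Python with only the standard library. The following is a minimal sketch, assuming the vLLM server from the previous step is listening on `http://localhost:8000`; the helper names `build_payload` and `describe_image` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "NovaAI6868/BaiHu-gguf"
# Assumption: the default vLLM port, as used in the curl example.
BASE_URL = "http://localhost:8000/v1"


def build_payload(image_url, prompt="Describe this image in one sentence."):
    """Build an OpenAI-style chat request with one text part and one image part."""
    return {
        "model": MODEL_ID,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def describe_image(image_url, base_url=BASE_URL):
    """POST the request to the server and return the assistant's reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(image_url)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling `describe_image(url)` requires a running server. The llama.cpp server examples on this page use port 8080, so for that server `base_url="http://localhost:8080/v1"` would be the corresponding value, and the `model` field may need to match the ID that server reports.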
- Ollama
How to use NovaAI6868/BaiHu-gguf with Ollama:
ollama run hf.co/NovaAI6868/BaiHu-gguf:F16
- Unsloth Studio
How to use NovaAI6868/BaiHu-gguf with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
    curl -fsSL https://unsloth.ai/install.sh | sh
    # Run unsloth studio
    unsloth studio -H 0.0.0.0 -p 8888
    # Then open http://localhost:8888 in your browser
    # Search for NovaAI6868/BaiHu-gguf to start chatting
Install Unsloth Studio (Windows)
    irm https://unsloth.ai/install.ps1 | iex
    # Run unsloth studio
    unsloth studio -H 0.0.0.0 -p 8888
    # Then open http://localhost:8888 in your browser
    # Search for NovaAI6868/BaiHu-gguf to start chatting
Using HuggingFace Spaces for Unsloth
    # No setup required
    # Open https://huggingface.co/spaces/unsloth/studio in your browser
    # Search for NovaAI6868/BaiHu-gguf to start chatting
- Pi
How to use NovaAI6868/BaiHu-gguf with Pi:
Start the llama.cpp server
    # Install llama.cpp:
    brew install llama.cpp
    # Start a local OpenAI-compatible server:
    llama serve -hf NovaAI6868/BaiHu-gguf:F16
Configure the model in Pi
    # Install Pi:
    npm install -g @mariozechner/pi-coding-agent
    # Add to ~/.pi/agent/models.json:
    {
      "providers": {
        "llama-cpp": {
          "baseUrl": "http://localhost:8080/v1",
          "api": "openai-completions",
          "apiKey": "none",
          "models": [
            { "id": "NovaAI6868/BaiHu-gguf:F16" }
          ]
        }
      }
    }
Run Pi
    # Start Pi in your project directory:
    pi
- Hermes Agent
How to use NovaAI6868/BaiHu-gguf with Hermes Agent:
Start the llama.cpp server
    # Install llama.cpp:
    brew install llama.cpp
    # Start a local OpenAI-compatible server:
    llama serve -hf NovaAI6868/BaiHu-gguf:F16
Configure Hermes
    # Install Hermes:
    curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
    hermes setup
    # Point Hermes at the local server:
    hermes config set model.provider custom
    hermes config set model.base_url http://127.0.0.1:8080/v1
    hermes config set model.default NovaAI6868/BaiHu-gguf:F16
Run Hermes
hermes
- Atomic Chat
- OpenClaw
How to use NovaAI6868/BaiHu-gguf with OpenClaw:
Start the llama.cpp server
    # Install llama.cpp:
    brew install llama.cpp
    # Start a local OpenAI-compatible server:
    llama serve -hf NovaAI6868/BaiHu-gguf:F16
Configure OpenClaw
    # Install OpenClaw:
    npm install -g openclaw@latest
    # Register the local server and set it as the default model:
    openclaw onboard --non-interactive --mode local \
      --auth-choice custom-api-key \
      --custom-base-url http://127.0.0.1:8080/v1 \
      --custom-model-id "NovaAI6868/BaiHu-gguf:F16" \
      --custom-provider-id llama-cpp \
      --custom-compatibility openai \
      --custom-text-input \
      --accept-risk \
      --skip-health
Run OpenClaw
openclaw agent --local --agent main --message "Hello from Hugging Face"
- Docker Model Runner
How to use NovaAI6868/BaiHu-gguf with Docker Model Runner:
docker model run hf.co/NovaAI6868/BaiHu-gguf:F16
- Lemonade
How to use NovaAI6868/BaiHu-gguf with Lemonade:
Pull the model
    # Download Lemonade from https://lemonade-server.ai/
    lemonade pull NovaAI6868/BaiHu-gguf:F16
Run and chat with the model
lemonade run user.BaiHu-gguf-F16
List all available models
lemonade list
---
language: zh
license: mit
tags:
- multimodal
- vision
- audio
- video
- chinese
- gguf
- llama.cpp
pipeline_tag: image-text-to-text
---
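# 白虎-v2 (BaiHu-v2)

白虎-v2 is a multimodal large language model that accepts text, image, audio, and video input. It targets multimodal understanding and generation tasks in Chinese-language settings.

---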
## Model Overview

- **Model name**: 白虎-v2
- **Context length**: 131,072 tokens
- **Vocabulary size**: 262,144
- **Data type**: float16
- **Supported modalities**: text / image / audio / video

---
## Repository Files

| File | Description |
|------|-------------|
| `BaiHu-v2.Q4_K_M.gguf` | Main model, GGUF quantized (Q4_K_M); suited to local CPU/GPU inference |
| `BaiHu-v2.F16-mmproj.gguf` | Multimodal projector (mmproj), FP16; used together with the main model for image/audio/video understanding |
| `Modelfile` | Example model configuration file for llama.cpp / Ollama |

> Recommended pairing: `BaiHu-v2.Q4_K_M.gguf` + `BaiHu-v2.F16-mmproj.gguf`
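
Both recommended files can be fetched programmatically. The sketch below is one possible approach, assuming the `huggingface_hub` package is installed and the files sit at the repository root under the names listed in the table; `fetch_pair` and the `models` directory are illustrative names, not part of this repository.

```python
from pathlib import Path

REPO_ID = "NovaAI6868/BaiHu-gguf"
# File names taken from the repository file table.
MODEL_FILE = "BaiHu-v2.Q4_K_M.gguf"
MMPROJ_FILE = "BaiHu-v2.F16-mmproj.gguf"


def fetch_pair(download=None, local_dir="models"):
    """Download the main model and its projector; return their local paths."""
    if download is None:
        # Imported lazily so this module loads without huggingface_hub installed.
        from huggingface_hub import hf_hub_download
        download = hf_hub_download
    return {
        name: Path(download(repo_id=REPO_ID, filename=name, local_dir=local_dir))
        for name in (MODEL_FILE, MMPROJ_FILE)
    }
```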
---

## Capabilities

- Multi-turn conversation in Chinese
- Image captioning and visual question answering
- Audio understanding
- Video understanding
- Tool calling

---
## Usage

### Inference with llama.cpp / Ollama

Create an Ollama model from the `Modelfile` in this repository:

```bash
ollama create BaiHu-v2 -f Modelfile
ollama run BaiHu-v2
```

### llama.cpp command line

```bash
# The prompt means "Please describe this image:".
./llama-cli \
  -m BaiHu-v2.Q4_K_M.gguf \
  --mmproj BaiHu-v2.F16-mmproj.gguf \
  --image example.jpg \
  -p "请描述这张图片:"
```
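
The same invocation can be scripted, for example to caption several images in a loop. This is a minimal sketch that assumes the `llama-cli` binary and both GGUF files are in the current directory, exactly as in the command above; `build_command` and `describe` are hypothetical helper names.

```python
import subprocess


def build_command(image, prompt, binary="./llama-cli",
                  model="BaiHu-v2.Q4_K_M.gguf",
                  mmproj="BaiHu-v2.F16-mmproj.gguf"):
    """Return the argument list for the CLI invocation shown above."""
    return [binary, "-m", model, "--mmproj", mmproj, "--image", image, "-p", prompt]


def describe(image, prompt="请描述这张图片:"):
    """Run the CLI once and return its standard output as text."""
    result = subprocess.run(
        build_command(image, prompt),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the arguments as a list rather than a shell string avoids quoting problems with the non-ASCII prompt.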

### Using transformers (full model)

For the full PyTorch/Safetensors version, see the companion repository. This repository provides only the GGUF quantized version.

---
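## Model Configuration

- **Text model**: 35 layers, hidden size 1536, 8 attention heads
- **Vision encoder**: 16 layers, hidden size 768, 280 image tokens
- **Audio encoder**: 12 layers, hidden size 1024
- **Video**: supports sampling 32 frames, with at most 70 soft tokens per frame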
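
A few back-of-the-envelope numbers follow from these figures. The sketch below only does arithmetic on values quoted in this card; it assumes a single vocabulary-by-hidden embedding matrix stored in float16, which the card does not state explicitly, so the result says nothing about the size of the quantized Q4_K_M file.

```python
# Figures copied from this model card.
CONTEXT_LENGTH = 131_072
VOCAB_SIZE = 262_144
HIDDEN_SIZE = 1_536
IMAGE_TOKENS = 280
VIDEO_FRAMES = 32
MAX_TOKENS_PER_FRAME = 70
BYTES_PER_FLOAT16 = 2


def embedding_table_bytes():
    """Size of a vocab x hidden embedding matrix in float16 (assumed layout)."""
    return VOCAB_SIZE * HIDDEN_SIZE * BYTES_PER_FLOAT16


def max_video_tokens():
    """Upper bound on soft tokens for one video clip."""
    return VIDEO_FRAMES * MAX_TOKENS_PER_FRAME


def context_share(tokens):
    """Fraction of the context window taken up by `tokens`."""
    return tokens / CONTEXT_LENGTH


print(embedding_table_bytes() / 2**20)             # 768.0 (MiB)
print(max_video_tokens())                          # 2240
print(f"{context_share(max_video_tokens()):.2%}")  # 1.71%
print(f"{context_share(IMAGE_TOKENS):.2%}")        # 0.21%
```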
---

## Training

- **Training framework**: Unsloth
- **Unsloth version**: 2026.6.8
- **Optimization goal**: improve Chinese instruction following and dialogue quality while preserving multimodal capabilities

---
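## Disclaimer

Content generated by this model may be influenced by its training data. Do not treat model output as professional advice (medical, legal, financial, etc.). The model may hallucinate or produce biased or inaccurate information; use it with caution and verify its output yourself.

---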
## License

This model is open-sourced under the **MIT License**. Read and comply with the terms of the MIT License before using this model.

---
## Acknowledgments

- [Unsloth](https://unsloth.ai/)
- [llama.cpp](https://github.com/ggerganov/llama.cpp)