MiniCPM5 Speech Feedback LoRA v12
LoRA adapter for openbmb/MiniCPM5-1B, tuned to produce compact speech-rehearsal coaching from a transcript plus computed stats. It is intended for the hackathon speech-feedback app and should be used with MiniCPM5 thinking mode off.
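The adapter expects a transcript together with precomputed stats. The app's real stats and prompt layout are not published in this card, so the sketch below is purely illustrative: `RehearsalStats`, `compute_stats`, `build_user_message`, and the filler-word list are hypothetical names and choices. It only shows the general idea of turning a timed transcript into pace and filler counts and folding them into a user message.

```python
import re
from dataclasses import dataclass

# Hypothetical filler list; the app's actual stats may be computed differently.
FILLERS = {"um", "uh", "er", "ah"}


@dataclass
class RehearsalStats:
    word_count: int
    words_per_minute: float
    filler_count: int


def compute_stats(transcript: str, duration_seconds: float) -> RehearsalStats:
    # Lowercase word tokens, keeping internal apostrophes ("don't").
    words = re.findall(r"[a-z']+", transcript.lower())
    minutes = duration_seconds / 60 if duration_seconds > 0 else 0.0
    wpm = len(words) / minutes if minutes else 0.0
    fillers = sum(1 for w in words if w in FILLERS)
    return RehearsalStats(len(words), round(wpm, 1), fillers)


def build_user_message(transcript: str, stats: RehearsalStats) -> str:
    # Illustrative layout only; this is NOT the app's prompt format.
    return (
        "Review this rehearsal using the stats and transcript.\n"
        f"Words: {stats.word_count}\n"
        f"Pace: {stats.words_per_minute} wpm\n"
        f"Filler words: {stats.filler_count}\n\n"
        f"Transcript:\n{transcript}"
    )
```

For example, `compute_stats("Um, today I want to, uh, talk about habits.", 6.0)` yields 9 words, a pace of 90.0 wpm, and 2 fillers.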
This adapter is released under Apache 2.0, matching the base model. The base model openbmb/MiniCPM5-1B is © OpenBMB, licensed under Apache 2.0; please retain the attribution if you redistribute or build on this adapter.
The target output is a short scorecard:
- one specific earned strength
- one to three prioritized fixes (one by default)
- one concrete next run action
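The adapter emits plain text, but the target shape above can be expressed as a simple structural check, similar in spirit to the "shape failures" counted in the evaluation notes. The sketch below assumes the reply has already been parsed into fields; `Scorecard` and `shape_errors` are hypothetical names, not part of the app or this repository.

```python
from dataclasses import dataclass, field


@dataclass
class Scorecard:
    # Hypothetical parsed form of one model reply.
    strength: str
    fixes: list[str] = field(default_factory=list)
    next_action: str = ""


def shape_errors(card: Scorecard) -> list[str]:
    """Return shape problems; an empty list means the card matches the target shape."""
    errors = []
    if not card.strength.strip():
        errors.append("missing strength")
    if not 1 <= len(card.fixes) <= 3:
        errors.append(f"expected 1-3 fixes, got {len(card.fixes)}")
    if any(not fix.strip() for fix in card.fixes):
        errors.append("empty fix")
    if not card.next_action.strip():
        errors.append("missing next action")
    return errors
```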
Training
- Base model: `openbmb/MiniCPM5-1B`
- Method: TRL `SFTTrainer` with PEFT LoRA
- Assistant-only loss: enabled via a training-only chat template with `{% generation %}` blocks
- Train rows: 135
- Validation rows held out from SFT training: 26
- Smoke rows: 8
- Epochs: 4
- Steps: 68
- Per-device train batch size: 4
- Gradient accumulation: 2
- Effective batch size: 8
- Learning rate: `1e-4`
- Scheduler: cosine
- Warmup ratio: `0.03`
- Precision: bf16
- LoRA rank: 16
- LoRA alpha: 32
- LoRA dropout: 0.05
- Target modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
- Training GPU: Modal L4
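As a sanity check, the reported 68 steps follow from the row count and batch settings. The sketch below assumes a single GPU and that the trainer counts `floor(batches / accumulation)` optimizer updates per epoch; `ceil(135 / 8)` also gives 17 here, so the total is the same under either rounding. Warmup as `ceil(total_steps * warmup_ratio)` reflects how Transformers typically converts a warmup ratio, and the LoRA scale uses the standard (non-rsLoRA) `alpha / r` rule; treat both as assumptions rather than logged values.

```python
import math


def training_schedule(rows, per_device_batch, grad_accum, epochs, warmup_ratio):
    """Derive optimizer-step counts from dataset size and batch settings (single device)."""
    batches_per_epoch = math.ceil(rows / per_device_batch)  # last batch may be partial
    steps_per_epoch = batches_per_epoch // grad_accum       # optimizer updates per epoch
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch": per_device_batch * grad_accum,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


schedule = training_schedule(rows=135, per_device_batch=4, grad_accum=2, epochs=4, warmup_ratio=0.03)
lora_scale = 32 / 16  # LoRA alpha / rank: adapter updates are scaled by 2.0
```

With the settings above, `schedule` reports an effective batch of 8, 17 steps per epoch, 68 total steps, and 3 warmup steps.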
The v12 split excluded canary rows from training, cleared repeated augment weights, and removed concrete Sarah/car examples from the prompt to avoid leakage into unrelated speeches.
Evaluation Notes
Checked with the app's current prompt and deterministic post-generation cleaner.
- Held-out eval: 15 rows
- Held-out shape failures: 0
- Held-out quote-faithfulness failures: 0
- Held-out leaked Sarah/car canary phrases: 0
- Held-out fix-count distribution: 14 one-fix outputs, 1 two-fix output
- Perspective canary rerun: 24 rows, 0 shape failures, 0 quote-faithfulness failures
- Perseverance grammarian leak canary: passed; no Sarah/car phrase leakage
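The quote-faithfulness and canary-leak counts above can be approximated with simple string checks. This is a minimal sketch under stated assumptions: the real canary phrase list and the app's cleaner are not published here, so `CANARY_PHRASES`, `quote_failures`, and `leaked_canaries` are hypothetical, and "faithful" is taken to mean that every double-quoted span appears verbatim (case- and whitespace-insensitive) in the transcript.

```python
import re

# Hypothetical canary phrases standing in for the unpublished Sarah/car list.
CANARY_PHRASES = ("sarah", "the car")


def _norm(text: str) -> str:
    # Lowercase and collapse runs of whitespace.
    return " ".join(text.lower().split())


def quoted_spans(feedback: str) -> list[str]:
    # Text between straight or curly double quotes.
    return re.findall(r'["\u201c]([^"\u201d]+)["\u201d]', feedback)


def quote_failures(feedback: str, transcript: str) -> list[str]:
    """Quoted spans in the feedback that do not occur verbatim in the transcript."""
    source = _norm(transcript)
    return [q for q in quoted_spans(feedback) if _norm(q) not in source]


def leaked_canaries(feedback: str, transcript: str) -> list[str]:
    """Canary phrases present in the feedback but absent from the transcript."""
    def has(phrase: str, text: str) -> bool:
        return re.search(rf"\b{re.escape(phrase)}\b", text.lower()) is not None
    return [p for p in CANARY_PHRASES if has(p, feedback) and not has(p, transcript)]
```

A reply counts as leaking only when a canary phrase appears in the feedback without appearing in the speaker's own transcript.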
Remaining limitations:
- Some advice is still rough or awkward on thin transcripts.
- The adapter can choose a weaker strength or miss deeper structural coaching.
- A deterministic cleaner is still used by the app to catch narrow actor, ownership, quote, and negation mistakes.
- This is not a general-purpose public-speaking evaluator; it is tuned for concise rehearsal feedback in the app's prompt format.
Inference
Load the base model and attach this PEFT adapter:
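```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

base_id = "openbmb/MiniCPM5-1B"
adapter_id = "build-small-hackathon/minicpm5-speech-feedback-lora-v12"

# Load the tokenizer and base model, then attach the LoRA adapter for inference.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id).eval()

messages = [
    {"role": "system", "content": "You are a direct, candid speech rehearsal coach."},
    {"role": "user", "content": "Review this rehearsal using the stats and transcript..."},
]

# enable_thinking=False is forwarded to the chat template to keep MiniCPM5 thinking mode off.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    enable_thinking=False,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Greedy decoding; max_new_tokens is an illustrative value, not a documented app setting.
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[1]
reply = tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
print(reply)
```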
Use the full app prompt and cleaner from the repository for production behavior.