AngelCare: Cosmos Reason2 8B (Fine-Tuned)
Fine-tuned nvidia/Cosmos-Reason2-8B for 8-class elderly safety video classification.
Available Models
| Branch | Method | Accuracy | Description |
|---|---|---|---|
| main | QLoRA (4-bit NF4, r=16) | 85.0% | Best model; recommended for deployment |
| lora | LoRA (fp16, r=16) | 83.3% | Alternative fine-tune for comparison |
Both branches contain fully merged models (the adapter is baked into the base weights), so they load directly without the base model or separate adapter files.
Raw LoRA/QLoRA adapters are also available under the qlora-adapter and lora-adapter branches.
Usage
from transformers import AutoModelForCausalLM, AutoProcessor
import torch
# Load QLoRA model (best, main branch)
model = AutoModelForCausalLM.from_pretrained(
"Chloepv/Angelcare-Cosmos-Reason2-8B",
torch_dtype=torch.bfloat16,
device_map="auto",
)
processor = AutoProcessor.from_pretrained("Chloepv/Angelcare-Cosmos-Reason2-8B")
# Load LoRA model (alternative)
# model = AutoModelForCausalLM.from_pretrained("Chloepv/Angelcare-Cosmos-Reason2-8B", revision="lora", ...)
Task
Given a short video clip (~5-10s) of an elderly person, classify their activity into one of 8 safety categories:
| ID | Label | Risk Level |
|---|---|---|
| 0 | Fall Detected | CRITICAL |
| 1 | Prolonged Immobility | CRITICAL |
| 2 | Unsteady Movement | MEDIUM |
| 3 | Distress Posture | HIGH |
| 4 | Normal Walking | SAFE |
| 5 | Normal Sitting | SAFE |
| 6 | Normal Daily Activity | SAFE |
| 7 | Resting or Sleeping | SAFE |
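As a minimal sketch, the class table above can be held in a small lookup so downstream code can map a predicted ID to its label and risk level. The names `CLASS_TABLE` and `needs_attention` are hypothetical and not part of the released model; the rule "anything not SAFE needs attention" is an assumption made for illustration.

```python
# (label, risk level) per class ID, copied from the table above.
CLASS_TABLE = {
    0: ("Fall Detected", "CRITICAL"),
    1: ("Prolonged Immobility", "CRITICAL"),
    2: ("Unsteady Movement", "MEDIUM"),
    3: ("Distress Posture", "HIGH"),
    4: ("Normal Walking", "SAFE"),
    5: ("Normal Sitting", "SAFE"),
    6: ("Normal Daily Activity", "SAFE"),
    7: ("Resting or Sleeping", "SAFE"),
}


def needs_attention(class_id: int) -> bool:
    """Return True for any class whose risk level is not SAFE (illustrative rule)."""
    _, risk = CLASS_TABLE[class_id]
    return risk != "SAFE"


for class_id, (label, risk) in CLASS_TABLE.items():
    flag = "ALERT" if needs_attention(class_id) else "ok"
    print(f"{class_id}: {label:<22} {risk:<8} {flag}")
```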
Output Format
The model outputs structured JSON:
{
"prediction_class_id": 0,
"prediction_label": "Fall Detected",
"risk_level": "CRITICAL",
"video_description": "The person falls from standing position onto the floor.",
"risk_assessment": {
"is_at_risk": true,
"recommended_action": "Call emergency services immediately"
}
}
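Since downstream code will usually consume this JSON programmatically, a small validator can help catch malformed generations. The sketch below is one possible approach using only the standard library; `parse_prediction` is a hypothetical helper, and slicing from the first `{` to the last `}` is an assumption about how stray text around the object might be tolerated.

```python
import json

REQUIRED_KEYS = {
    "prediction_class_id",
    "prediction_label",
    "risk_level",
    "video_description",
    "risk_assessment",
}
RISK_LEVELS = {"CRITICAL", "HIGH", "MEDIUM", "SAFE"}


def parse_prediction(text: str) -> dict:
    """Parse raw model output into a dict, raising ValueError if it is malformed."""
    # Tolerate text around the JSON object by slicing from the first "{" to the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    try:
        data = json.loads(text[start:end + 1])
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc}") from exc

    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if data["prediction_class_id"] not in range(8):
        raise ValueError("prediction_class_id must be an integer from 0 to 7")
    if data["risk_level"] not in RISK_LEVELS:
        raise ValueError(f"unknown risk_level: {data['risk_level']!r}")
    assessment = data["risk_assessment"]
    if not isinstance(assessment, dict) or not isinstance(assessment.get("is_at_risk"), bool):
        raise ValueError("risk_assessment.is_at_risk must be a boolean")
    return data
```

A caller might wrap this in `try/except ValueError` and treat a failed parse as "needs human review" rather than as a SAFE prediction.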
Results
Overall
| Metric | Base Model | QLoRA | LoRA |
|---|---|---|---|
| Accuracy | 26.7% | 85.0% | 83.3% |
| JSON Compliance | 65% | 100% | 100% |
| Inference Time | 6.4s | 1.9s | 1.9s |
Per-Class Accuracy (QLoRA)
| Class | Accuracy |
|---|---|
| Fall (n=11) | 90.9% |
| Unsteady (n=11) | 72.7% |
| Distress (n=11) | 100% |
| Sitting (n=11) | 81.8% |
| Daily (n=10) | 100% |
| Resting (n=3) | 100% |
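The per-class figures can be cross-checked against the overall QLoRA accuracy. The sketch below recovers the number of correct predictions per class by rounding, and assumes the test clips not listed in the table were all misclassified, which matches the 0% accuracy reported under Limitations for Immobility and Walking.

```python
# (test samples, reported accuracy in %) per class, from the table above.
per_class = {
    "Fall": (11, 90.9),
    "Unsteady": (11, 72.7),
    "Distress": (11, 100.0),
    "Sitting": (11, 81.8),
    "Daily": (10, 100.0),
    "Resting": (3, 100.0),
}
TOTAL_TEST = 60  # test set size from Training Details

# Recover whole-number correct counts from the rounded percentages.
correct = {name: round(n * acc / 100) for name, (n, acc) in per_class.items()}

listed = sum(n for n, _ in per_class.values())
unlisted = TOTAL_TEST - listed          # clips from classes not in the table
total_correct = sum(correct.values())   # assumes 0 correct among unlisted clips
overall = 100 * total_correct / TOTAL_TEST

print(f"listed={listed}, unlisted={unlisted}")
print(f"correct={total_correct}/{TOTAL_TEST} -> {overall:.1f}%")
```

Under these assumptions the script reports 51 of 60 correct, i.e. 85.0%, in agreement with the Overall table.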
Training Details
- Dataset: 277 train / 60 test samples (LLaVA format, source-stratified split)
- Sources: Harvard Dataverse, GMDC-SA24, NTU RGB+D 120, DIY annotated, personal clips, Cosmos Transfer 2.5 synthetic
- Hardware: 1x NVIDIA H100 80GB (Nebius)
- QLoRA config: r=16, alpha=32, dropout=0.05, lr=2e-4, 10 epochs, effective batch=8
- LoRA config: r=16, alpha=32, dropout=0.05, lr=1e-4, 10 epochs, effective batch=8
- Framework: TRL SFTTrainer + PEFT
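To make the hyperparameters above more concrete, here is a rough, library-free sketch of what they imply for a training run. The key names in `LORA_KWARGS` mirror the argument names of PEFT's `LoraConfig`; the per-device batch size is not stated in this card, so `per_device_batch` is a hypothetical input, and the step count is approximate because it depends on how the trainer handles the final partial batch.

```python
import math

TRAIN_SAMPLES = 277
EPOCHS = 10
EFFECTIVE_BATCH = 8

# Adapter hyperparameters shared by both runs (names follow PEFT's LoraConfig).
LORA_KWARGS = {"r": 16, "lora_alpha": 32, "lora_dropout": 0.05}
LEARNING_RATE = {"qlora": 2e-4, "lora": 1e-4}


def grad_accum_steps(effective_batch: int, per_device_batch: int, num_devices: int = 1) -> int:
    """Gradient accumulation steps needed to reach the effective batch size."""
    per_step = per_device_batch * num_devices
    if effective_batch % per_step:
        raise ValueError("effective batch must be a multiple of per-step batch")
    return effective_batch // per_step


# Standard LoRA scales the adapter update by lora_alpha / r.
scaling = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]

# Approximate optimizer steps, counting the last partial batch as a full step.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS

print(f"LoRA scaling factor: {scaling}")
print(f"~{steps_per_epoch} steps/epoch, ~{total_steps} steps total")
print(f"accumulation steps at per-device batch 2: {grad_accum_steps(EFFECTIVE_BATCH, 2)}")
```

With only a few hundred optimizer steps in total, results on a 60-clip test set are likely to be sensitive to seed and split, which is worth keeping in mind when comparing the QLoRA and LoRA numbers.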
Limitations
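- Classes with few training samples (Prolonged Immobility, Normal Walking) score 0% on the test set due to insufficient data; they are omitted from the per-class table above, which covers 57 of the 60 test clips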
- Trained primarily on indoor surveillance-style videos; may not generalize to outdoor or unusual camera angles
- Small dataset (277 training samples); more data would likely improve performance significantly
Citation
@misc{angelcare2026,
title={AngelCare: Fine-tuning Cosmos Reason2 8B for Elderly Safety Video Classification},
author={Chloe PV},
year={2026},
url={https://huggingface.co/Chloepv/Angelcare-Cosmos-Reason2-8B}
}