AngelCare — Cosmos Reason2 8B (Fine-Tuned)

Fine-tuned nvidia/Cosmos-Reason2-8B for 8-class elderly safety video classification.

Available Models

| Branch | Method | Accuracy | Description |
|--------|--------|----------|-------------|
| `main` | QLoRA (4-bit NF4, r=16) | 85.0% | Best model — recommended for deployment |
| `lora` | LoRA (fp16, r=16) | 83.3% | Alternative fine-tune for comparison |

Both are fully merged models (adapter weights baked into the base). Load them directly — no separate base model or adapter download is needed.

Raw LoRA/QLoRA adapters are also available on the `qlora-adapter` and `lora-adapter` branches.

Usage

```python
from transformers import AutoModelForCausalLM, AutoProcessor
import torch

# Load QLoRA model (best, main branch)
model = AutoModelForCausalLM.from_pretrained(
    "Chloepv/Angelcare-Cosmos-Reason2-8B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("Chloepv/Angelcare-Cosmos-Reason2-8B")

# Load LoRA model (alternative, lora branch)
# model = AutoModelForCausalLM.from_pretrained("Chloepv/Angelcare-Cosmos-Reason2-8B", revision="lora", ...)
```

Task

Given a short video clip (~5-10s) of an elderly person, classify their activity into one of 8 safety categories:

| ID | Label | Risk Level |
|----|-------|------------|
| 0 | Fall Detected | CRITICAL |
| 1 | Prolonged Immobility | CRITICAL |
| 2 | Unsteady Movement | MEDIUM |
| 3 | Distress Posture | HIGH |
| 4 | Normal Walking | SAFE |
| 5 | Normal Sitting | SAFE |
| 6 | Normal Daily Activity | SAFE |
| 7 | Resting or Sleeping | SAFE |
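
For downstream code, the table above can be captured as a simple lookup. This is an illustrative sketch — the `CLASSES` dict and `risk_level` helper mirror the table and are not an API shipped with the model:

```python
# Class-ID -> (label, risk level) mapping, mirroring the table above.
CLASSES = {
    0: ("Fall Detected", "CRITICAL"),
    1: ("Prolonged Immobility", "CRITICAL"),
    2: ("Unsteady Movement", "MEDIUM"),
    3: ("Distress Posture", "HIGH"),
    4: ("Normal Walking", "SAFE"),
    5: ("Normal Sitting", "SAFE"),
    6: ("Normal Daily Activity", "SAFE"),
    7: ("Resting or Sleeping", "SAFE"),
}

def risk_level(class_id: int) -> str:
    """Return the risk level for a predicted class ID."""
    return CLASSES[class_id][1]
```

Keeping the mapping in one place makes it easy to route alerts by risk level rather than by individual class ID.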

Output Format

The model outputs structured JSON:

```json
{
  "prediction_class_id": 0,
  "prediction_label": "Fall Detected",
  "risk_level": "CRITICAL",
  "video_description": "The person falls from standing position onto the floor.",
  "risk_assessment": {
    "is_at_risk": true,
    "recommended_action": "Call emergency services immediately"
  }
}
```
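
Because the fine-tuned model reaches 100% JSON compliance (see Results), a thin parsing layer is enough to consume its output. The `parse_prediction` helper below is a hypothetical sketch for illustration, not part of the model's API:

```python
import json

def parse_prediction(raw: str) -> dict:
    """Parse the model's JSON output and sanity-check the fields used for alerting."""
    pred = json.loads(raw)
    assert pred["risk_level"] in {"CRITICAL", "HIGH", "MEDIUM", "SAFE"}
    assert 0 <= pred["prediction_class_id"] <= 7
    return pred

# Example output in the documented format.
raw = """{"prediction_class_id": 0, "prediction_label": "Fall Detected",
          "risk_level": "CRITICAL",
          "video_description": "The person falls from standing position onto the floor.",
          "risk_assessment": {"is_at_risk": true,
                              "recommended_action": "Call emergency services immediately"}}"""

pred = parse_prediction(raw)
# Escalate only the risk levels that warrant intervention.
needs_alert = pred["risk_level"] in {"CRITICAL", "HIGH"}
```

In a deployment you would still want a fallback path (retry or discard) in case a generation ever fails to parse.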

Results

Overall

| Metric | Base Model | QLoRA | LoRA |
|--------|-----------|-------|------|
| Accuracy | 26.7% | 85.0% | 83.3% |
| JSON Compliance | 65% | 100% | 100% |
| Inference Time | 6.4 s | 1.9 s | 1.9 s |

Per-Class Accuracy (QLoRA)

| Class | Accuracy |
|-------|----------|
| Fall (n=11) | 90.9% |
| Unsteady (n=11) | 72.7% |
| Distress (n=11) | 100% |
| Sitting (n=11) | 81.8% |
| Daily (n=10) | 100% |
| Resting (n=3) | 100% |

Training Details

  • Dataset: 277 train / 60 test samples (LLaVA format, source-stratified split)
  • Sources: Harvard Dataverse, GMDC-SA24, NTU RGB+D 120, DIY annotated, personal clips, Cosmos Transfer 2.5 synthetic
  • Hardware: 1x NVIDIA H100 80GB (Nebius)
  • QLoRA config: r=16, alpha=32, dropout=0.05, lr=2e-4, 10 epochs, effective batch=8
  • LoRA config: r=16, alpha=32, dropout=0.05, lr=1e-4, 10 epochs, effective batch=8
  • Framework: TRL SFTTrainer + PEFT
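
The hyperparameters above correspond roughly to the following PEFT/TRL configuration. This is a sketch of the QLoRA run: the target modules and the per-device/accumulation split behind the effective batch of 8 are assumptions, not values read from the actual training script.

```python
from peft import LoraConfig
from trl import SFTConfig

# QLoRA run (main branch); the LoRA run differs only in lr=1e-4 and no 4-bit quantization.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
)

training_args = SFTConfig(
    learning_rate=2e-4,
    num_train_epochs=10,
    per_device_train_batch_size=1,   # assumed split:
    gradient_accumulation_steps=8,   # 1 x 8 = effective batch of 8
    bf16=True,
)
```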

Limitations

  • Classes with few training samples (Prolonged Immobility, Normal Walking) have 0% test accuracy; there was too little data to learn them
  • Trained primarily on indoor surveillance-style videos; may not generalize to outdoor scenes or unusual camera angles
  • Small dataset (277 training samples) — more data would likely improve performance significantly

Citation

```bibtex
@misc{angelcare2026,
  title={AngelCare: Fine-tuning Cosmos Reason2 8B for Elderly Safety Video Classification},
  author={Chloe PV},
  year={2026},
  url={https://huggingface.co/Chloepv/Angelcare-Cosmos-Reason2-8B}
}
```