AngelCare — Cosmos Reason2 8B (Fine-Tuned)

Fine-tuned nvidia/Cosmos-Reason2-8B for 8-class elderly safety video classification.

Available Models

| Branch | Method | Accuracy | Description |
|--------|--------|----------|-------------|
| `main` | QLoRA (4-bit NF4, r=16) | 85.0% | Best model — recommended for deployment |
| `lora` | LoRA (fp16, r=16) | 83.3% | Alternative fine-tune for comparison |

Both are fully merged models (adapter weights baked into the base). Load them directly — no separate base model or adapter download is needed.

Raw LoRA/QLoRA adapters are also available on the `qlora-adapter` and `lora-adapter` branches.

Usage

```python
from transformers import AutoModelForCausalLM, AutoProcessor
import torch

# Load QLoRA model (best, main branch)
model = AutoModelForCausalLM.from_pretrained(
    "Chloepv/Angelcare-Cosmos-Reason2-8B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("Chloepv/Angelcare-Cosmos-Reason2-8B")

# Load LoRA model (alternative, lora branch)
# model = AutoModelForCausalLM.from_pretrained("Chloepv/Angelcare-Cosmos-Reason2-8B", revision="lora", ...)
```

Task

Given a short video clip (~5-10s) of an elderly person, classify their activity into one of 8 safety categories:

| ID | Label | Risk Level |
|----|-------|------------|
| 0 | Fall Detected | CRITICAL |
| 1 | Prolonged Immobility | CRITICAL |
| 2 | Unsteady Movement | MEDIUM |
| 3 | Distress Posture | HIGH |
| 4 | Normal Walking | SAFE |
| 5 | Normal Sitting | SAFE |
| 6 | Normal Daily Activity | SAFE |
| 7 | Resting or Sleeping | SAFE |
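
For downstream code, the table above can be captured as a simple lookup. This is an illustrative sketch — the `CLASSES` dict and `risk_level` helper mirror the table and are not an API shipped with the model:

```python
# Class-ID -> (label, risk level) mapping, mirroring the table above.
CLASSES = {
    0: ("Fall Detected", "CRITICAL"),
    1: ("Prolonged Immobility", "CRITICAL"),
    2: ("Unsteady Movement", "MEDIUM"),
    3: ("Distress Posture", "HIGH"),
    4: ("Normal Walking", "SAFE"),
    5: ("Normal Sitting", "SAFE"),
    6: ("Normal Daily Activity", "SAFE"),
    7: ("Resting or Sleeping", "SAFE"),
}

def risk_level(class_id: int) -> str:
    """Return the risk level for a predicted class ID."""
    return CLASSES[class_id][1]
```

Keeping the mapping in one place makes it easy to route alerts by risk level rather than by individual class ID.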

Output Format

The model outputs structured JSON:

```json
{
  "prediction_class_id": 0,
  "prediction_label": "Fall Detected",
  "risk_level": "CRITICAL",
  "video_description": "The person falls from standing position onto the floor.",
  "risk_assessment": {
    "is_at_risk": true,
    "recommended_action": "Call emergency services immediately"
  }
}
```
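
Because the fine-tuned model reaches 100% JSON compliance (see Results), a thin parsing layer is enough to consume its output. The `parse_prediction` helper below is a hypothetical sketch for illustration, not part of the model's API:

```python
import json

def parse_prediction(raw: str) -> dict:
    """Parse the model's JSON output and sanity-check the fields used for alerting."""
    pred = json.loads(raw)
    assert pred["risk_level"] in {"CRITICAL", "HIGH", "MEDIUM", "SAFE"}
    assert 0 <= pred["prediction_class_id"] <= 7
    return pred

# Example output in the documented format.
raw = """{"prediction_class_id": 0, "prediction_label": "Fall Detected",
          "risk_level": "CRITICAL",
          "video_description": "The person falls from standing position onto the floor.",
          "risk_assessment": {"is_at_risk": true,
                              "recommended_action": "Call emergency services immediately"}}"""

pred = parse_prediction(raw)
# Escalate only the risk levels that warrant intervention.
needs_alert = pred["risk_level"] in {"CRITICAL", "HIGH"}
```

In a deployment you would still want a fallback path (retry or discard) in case a generation ever fails to parse.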

Results

Overall

| Metric | Base Model | QLoRA | LoRA |
|--------|-----------|-------|------|
| Accuracy | 26.7% | 85.0% | 83.3% |
| JSON Compliance | 65% | 100% | 100% |
| Inference Time | 6.4 s | 1.9 s | 1.9 s |

Per-Class Accuracy (QLoRA)

| Class | Accuracy |
|-------|----------|
| Fall (n=11) | 90.9% |
| Unsteady (n=11) | 72.7% |
| Distress (n=11) | 100% |
| Sitting (n=11) | 81.8% |
| Daily (n=10) | 100% |
| Resting (n=3) | 100% |

Training Details

  • Dataset: 277 train / 60 test samples (LLaVA format, source-stratified split)
  • Sources: Harvard Dataverse, GMDC-SA24, NTU RGB+D 120, DIY annotated, personal clips, Cosmos Transfer 2.5 synthetic
  • Hardware: 1x NVIDIA H100 80GB (Nebius)
  • QLoRA config: r=16, alpha=32, dropout=0.05, lr=2e-4, 10 epochs, effective batch=8
  • LoRA config: r=16, alpha=32, dropout=0.05, lr=1e-4, 10 epochs, effective batch=8
  • Framework: TRL SFTTrainer + PEFT
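
The hyperparameters above correspond roughly to the following PEFT/TRL configuration. This is a sketch of the QLoRA run: the target modules and the per-device/accumulation split behind the effective batch of 8 are assumptions, not values read from the actual training script.

```python
from peft import LoraConfig
from trl import SFTConfig

# QLoRA run (main branch); the LoRA run differs only in lr=1e-4 and no 4-bit quantization.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
)

training_args = SFTConfig(
    learning_rate=2e-4,
    num_train_epochs=10,
    per_device_train_batch_size=1,   # assumed split:
    gradient_accumulation_steps=8,   # 1 x 8 = effective batch of 8
    bf16=True,
)
```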

Limitations

  • Classes with few training samples (Prolonged Immobility, Normal Walking) have 0% test accuracy; there was too little data to learn them
  • Trained primarily on indoor surveillance-style videos; may not generalize to outdoor scenes or unusual camera angles
  • Small dataset (277 training samples) — more data would likely improve performance significantly

Citation

```bibtex
@misc{angelcare2026,
  title={AngelCare: Fine-tuning Cosmos Reason2 8B for Elderly Safety Video Classification},
  author={Chloe PV},
  year={2026},
  url={https://huggingface.co/Chloepv/Angelcare-Cosmos-Reason2-8B}
}
```