File size: 6,730 Bytes

---
license: apache-2.0
base_model: unsloth/gemma-3n-E2B-it
library_name: peft
tags:
  - lora
  - peft
  - rl
  - grpo
  - openenv
  - voice
  - indic
  - hindi
  - tamil
  - kannada
  - hinglish
  - schema-drift
  - gemma-3n
  - text-generation
  - tool-use
language:
  - en
  - hi
  - ta
  - kn
pipeline_tag: text-generation
datasets: []
inference: false
---

# DriftCall — Gemma-3n-E2B LoRA (apache-2.0)

LoRA adapter for **`unsloth/gemma-3n-E2B-it`**, GRPO-tuned on
[**DriftCall**](https://saumilyajj-driftcall.hf.space) — an OpenEnv-compliant
voice-first Indic concierge environment where vendor APIs **mutate
mid-episode** and the agent must keep its promise to the user across the
schema drift.

```
trained on:    DriftCall (OpenEnv v1.0 — 5 reward components, 20 drift patterns)
hardware:      1× NVIDIA H100 80GB HBM3 (bf16, 16-bit LoRA)
trainer:       native PyTorch GRPO (no TRL)
curriculum:    3 stages × 240 GRPO steps total · group size 2
reward:        five deterministic components (no LLM judge), Brier-calibrated,
               uncertain-floor at 0.50
```

The companion env, demo, REST API, and full project site all live at one
HF Space: **<https://huggingface.co/spaces/saumilyajj/driftcall>**.

---

## Model details

| Field | Value |
|---|---|
| Base model | [`unsloth/gemma-3n-E2B-it`](https://huggingface.co/unsloth/gemma-3n-E2B-it) (Gemma-3n-E2B Instruction-tuned, Unsloth-quantised checkpoint) |
| Adapter type | PEFT / LoRA |
| `r` | 16 |
| `lora_alpha` | 32 |
| `lora_dropout` | 0.0 (Unsloth fast path) |
| Precision | 16-bit LoRA on bf16 base |
| File | `adapter_model.safetensors` · 84.6 MB · plus tokenizer (33.4 MB) |
| Languages | Hindi · Tamil · Kannada · English · Hinglish |
| License | Apache-2.0 |

**This is an adapter-only release.** No merged-fp16 weights are published —
naive 4-bit → 16-bit merging produces silently broken weights for this base
(see DriftCall DESIGN.md §10.5). Always load on top of the base.

---

## Training

| Stage | Drift regime | Steps | Initial weights |
|---|---|---:|---|
| 1 | no drift | 70 | base Gemma-3n-E2B-it |
| 2 | single-pattern drift | 100 | stage-1 adapter |
| 3 | compound drift | 70 | stage-2 adapter |

- **Algorithm:** Group Relative Policy Optimization (GRPO), native PyTorch
  loop in `scripts/train_driftcall_grpo.py` (1300 LOC, no TRL dependency).
- **Group size (`G`):** 2 rollouts per goal — small for GRPO; signal is
  primarily compounded across the curriculum rather than per-step.
- **Curriculum:** language weights and drift patterns are stage-controlled
  (no drift → single pattern → compound). Held-out 50-episode eval +
  200-episode reward-hacking probe (`cells/step_18..20`).
- **Wandb runs:** `vasudeo118-lnmiit/driftcall` project — three runs
  (`mypquww4`, the s2 run, `og9xqlwy`).

### Reward function — five components, no LLM judge

| ID | Component | Weight | Implementation |
|---:|---|---:|---|
| R1 | `task_completion` | 0.40 | `cells.step_08_rewards:task_completion` |
| R2 | `drift_detection` | 0.20 | `cells.step_08_rewards:drift_detection` |
| R3 | `constraint_adherence` | 0.20 | `cells.step_08_rewards:constraint_adherence` |
| R4 | `format_compliance` | 0.10 | `cells.step_08_rewards:format_compliance` |
| R5 | `anti_hack_penalty` | 0.10 | `cells.step_08_rewards:anti_hack_penalty` |

Calibration pipeline:

```
quality        = combine_quality(R1..R5, weights)
brier          = brier_penalty(confidence, R1)
reward_raw     = quality * (1 - brier)
reward         = apply_uncertain_floor(reward_raw, confidence, quality)  # floor=0.50
final         := clamp(reward, -1.0, 1.0)
```

**Hard rule:** every reward bit traces to a deterministic schema- and
trace-grounded check. There is no LLM-as-a-judge anywhere in the pipeline.

---

## How to use

```python
from unsloth import FastModel
from peft import PeftModel

model, tokenizer = FastModel.from_pretrained(
    "unsloth/gemma-3n-E2B-it",
    max_seq_length=4096,
    load_in_4bit=False,         # 16-bit LoRA path; matches training
    full_finetuning=False,
)
model = PeftModel.from_pretrained(model, "DGXAI/gemma-3n-e2b-driftcall-lora")
model.eval()

prompt = (
    "BRIEF: 9 baje se pehle ek veg thali ₹500 ke andar Indiranagar mein.\n\n"
    "Reply with EXACTLY one JSON object matching the DriftCallAction schema."
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0, inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

### Or — run it against the live env over OpenEnv REST

```bash
# Public bearer token for the hackathon Space.
curl -X POST https://saumilyajj-driftcall.hf.space/reset \
  -H "Authorization: Bearer driftcall-demo" \
  -H "X-Session-Id: smoke-001" \
  -H "Content-Type: application/json" \
  -d '{"seed": 42, "curriculum_stage": 2}'
```

The OpenEnv gym client lives at
[`deploy/inference/`](https://github.com/saumilyagupta/openenv-DGXAI/tree/google/gemma-3n-E4B-it/DRIFTCALL/deploy/inference)
and wraps `/reset`, `/step`, `/state`, `/close` in a gymnasium-style API.

---

## Limitations

- **Small training run.** 240 GRPO steps at G=2 is a smoke + push validation,
  not a learning run. Step-0-and-after reward fluctuates in `[0.175, 0.300]`,
  largely against the uncertain-floor at 0.50. Real lift comes after several
  thousand steps with G=4–8.
- **Tool-use, not tool-execution.** The agent emits JSON DriftCallAction
  payloads. Side effects (`cab.book`, `payment.charge`, …) are realised by
  the env's mock vendor surface, not by real infrastructure.
- **Indic ASR is upstream.** Voice input goes through `faster-whisper-small`;
  this model never sees raw audio. Code-switched Hinglish accuracy is bounded
  by Whisper.
- **Reward components are deterministic, not perfect.** R5 (`anti_hack_penalty`)
  catches known patterns; novel exploits would need to be added to the probe
  set in `cells/step_20_probe.py`.
- **Not safety-aligned beyond Gemma-3n's defaults.** Off-task or adversarial
  inputs are not specifically guarded for in this run.

---

## Citation / acknowledgement

DriftCall is built on top of:

- [`unsloth/gemma-3n-E2B-it`](https://huggingface.co/unsloth/gemma-3n-E2B-it) — base model
- [Unsloth](https://github.com/unslothai/unsloth) — fast LoRA path
- [`hexgrad/Kokoro-82M`](https://huggingface.co/hexgrad/Kokoro-82M) — TTS in the env's audio pipeline
- [`Systran/faster-whisper-small`](https://huggingface.co/Systran/faster-whisper-small) — ASR in the env's audio pipeline

Source: <https://github.com/saumilyagupta/openenv-DGXAI> · branch `google/gemma-3n-E4B-it`.

Hackathon: DGX Hackathon 2026 — Indic Voice + RL track.