---
license: apache-2.0
base_model: unsloth/gemma-3n-E2B-it
library_name: peft
tags:
- lora
- peft
- rl
- grpo
- openenv
- voice
- indic
- hindi
- tamil
- kannada
- hinglish
- schema-drift
- gemma-3n
- text-generation
- tool-use
language:
- en
- hi
- ta
- kn
pipeline_tag: text-generation
datasets: []
inference: false
---
# DriftCall — Gemma-3n-E2B LoRA (apache-2.0)
LoRA adapter for **`unsloth/gemma-3n-E2B-it`**, GRPO-tuned on
[**DriftCall**](https://saumilyajj-driftcall.hf.space) — an OpenEnv-compliant
voice-first Indic concierge environment where vendor APIs **mutate
mid-episode** and the agent must keep its promise to the user across the
schema drift.
```
trained on: DriftCall (OpenEnv v1.0 — 5 reward components, 20 drift patterns)
hardware: 1× NVIDIA H100 80GB HBM3 (bf16, 16-bit LoRA)
trainer: native PyTorch GRPO (no TRL)
curriculum: 3 stages, 240 GRPO steps total (70 + 100 + 70) · group size 2
reward: five deterministic components (no LLM judge), Brier-calibrated,
        uncertain-floor at 0.50
```
The companion env, demo, REST API, and full project site all live at one
HF Space: **<https://huggingface.co/spaces/saumilyajj/driftcall>**.
---
## Model details
| Field | Value |
|---|---|
| Base model | [`unsloth/gemma-3n-E2B-it`](https://huggingface.co/unsloth/gemma-3n-E2B-it) (Gemma-3n-E2B instruction-tuned, Unsloth-hosted checkpoint) |
| Adapter type | PEFT / LoRA |
| `r` | 16 |
| `lora_alpha` | 32 |
| `lora_dropout` | 0.0 (Unsloth fast path) |
| Precision | 16-bit LoRA on bf16 base |
| File | `adapter_model.safetensors` · 84.6 MB · plus tokenizer (33.4 MB) |
| Languages | Hindi · Tamil · Kannada · English · Hinglish |
| License | Apache-2.0 |
**This is an adapter-only release.** No merged-fp16 weights are published —
naive 4-bit → 16-bit merging produces silently broken weights for this base
(see DriftCall DESIGN.md §10.5). Always load on top of the base.
---
## Training
| Stage | Drift regime | Steps | Initial weights |
|---|---|---:|---|
| 1 | no drift | 70 | base Gemma-3n-E2B-it |
| 2 | single-pattern drift | 100 | stage-1 adapter |
| 3 | compound drift | 70 | stage-2 adapter |
- **Algorithm:** Group Relative Policy Optimization (GRPO), native PyTorch
loop in `scripts/train_driftcall_grpo.py` (1300 LOC, no TRL dependency).
- **Group size (`G`):** 2 rollouts per goal. This is small for GRPO, so the
group-relative signal at any single step is noisy; the learning signal is
expected to accumulate across the curriculum stages rather than per step.
- **Curriculum:** language weights and drift patterns are stage-controlled
(no drift → single pattern → compound). Evaluation uses a held-out
50-episode set plus a 200-episode reward-hacking probe (`cells/step_18..20`).
- **Wandb runs:** `vasudeo118-lnmiit/driftcall` project — three runs
(`mypquww4`, the s2 run, `og9xqlwy`).
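
To make the group-size point concrete, here is a minimal sketch of the group-relative advantage in its commonly described form (reward minus group mean, divided by group standard deviation). It is not taken from `scripts/train_driftcall_grpo.py`; the function name, the `eps` term, and the choice of population standard deviation are assumptions for illustration.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalise each rollout's reward against its own group.

    Assumption: population std plus a small eps; the project's native
    loop may normalise differently.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# With G=2 the normalised advantages are always close to +1 and -1
# (or exactly 0 when both rollouts tie), whatever the size of the gap.
print(group_relative_advantages([0.300, 0.175]))
print(group_relative_advantages([0.200, 0.200]))
```

Under this formulation a group of two carries only the sign of the comparison between the two rollouts, which is one way to read why larger groups give a richer per-step signal.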
### Reward function — five components, no LLM judge
| ID | Component | Weight | Implementation |
|---:|---|---:|---|
| R1 | `task_completion` | 0.40 | `cells.step_08_rewards:task_completion` |
| R2 | `drift_detection` | 0.20 | `cells.step_08_rewards:drift_detection` |
| R3 | `constraint_adherence` | 0.20 | `cells.step_08_rewards:constraint_adherence` |
| R4 | `format_compliance` | 0.10 | `cells.step_08_rewards:format_compliance` |
| R5 | `anti_hack_penalty` | 0.10 | `cells.step_08_rewards:anti_hack_penalty` |
Calibration pipeline:
```
quality = combine_quality(R1..R5, weights)
brier = brier_penalty(confidence, R1)
reward_raw = quality * (1 - brier)
reward = apply_uncertain_floor(reward_raw, confidence, quality) # floor=0.50
final = clamp(reward, -1.0, 1.0)
```
**Hard rule:** every reward bit traces to a deterministic schema- and
trace-grounded check. There is no LLM-as-a-judge anywhere in the pipeline.
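
The calibration pipeline can be sketched in Python as follows. This is an illustration under stated assumptions, not the code in `cells.step_08_rewards`: it assumes `combine_quality` is a plain weighted sum, that every component (including `anti_hack_penalty`) is scored in `[0, 1]` with 1 as the best value, and that `brier_penalty` is the squared gap between the stated confidence and R1. The behaviour of `apply_uncertain_floor` is not described in this card, so the sketch accepts it as an optional callable and otherwise skips that step.

```python
WEIGHTS = {
    "task_completion": 0.40,
    "drift_detection": 0.20,
    "constraint_adherence": 0.20,
    "format_compliance": 0.10,
    "anti_hack_penalty": 0.10,
}


def combine_quality(components, weights=WEIGHTS):
    # Assumption: a plain weighted sum of the five component scores.
    return sum(weights[name] * components[name] for name in weights)


def brier_penalty(confidence, outcome):
    # Assumption: squared error between stated confidence and R1.
    return (confidence - outcome) ** 2


def clamp(value, low=-1.0, high=1.0):
    return max(low, min(high, value))


def calibrated_reward(components, confidence, uncertain_floor=None):
    quality = combine_quality(components)
    brier = brier_penalty(confidence, components["task_completion"])
    reward = quality * (1.0 - brier)
    if uncertain_floor is not None:
        # Hypothetical hook standing in for apply_uncertain_floor.
        reward = uncertain_floor(reward, confidence, quality)
    return clamp(reward)


perfect = dict.fromkeys(WEIGHTS, 1.0)
print(calibrated_reward(perfect, confidence=1.0))
```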
---
## How to use
```python
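from unsloth import FastModel
from peft import PeftModel

# Load the base model in 16-bit, matching the precision used in training.
model, tokenizer = FastModel.from_pretrained(
    "unsloth/gemma-3n-E2B-it",
    max_seq_length=4096,
    load_in_4bit=False,  # 16-bit LoRA path; matches training
    full_finetuning=False,
)
# Attach the DriftCall LoRA adapter on top of the base weights.
model = PeftModel.from_pretrained(model, "DGXAI/gemma-3n-e2b-driftcall-lora")
model.eval()

# Hinglish brief: "A veg thali under ₹500 in Indiranagar before 9 o'clock."
prompt = (
    "BRIEF: 9 baje se pehle ek veg thali ₹500 ke andar Indiranagar mein.\n\n"
    "Reply with EXACTLY one JSON object matching the DriftCallAction schema."
)
# `text=` is passed by keyword so the call also works if the returned object
# is a multimodal processor rather than a plain tokenizer.
inputs = tokenizer(text=prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0, inputs.input_ids.shape[1]:], skip_special_tokens=True))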
```
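
Because the prompt asks for exactly one JSON object, it can help to validate the decoded text before passing it on. The helper below is a sketch using only the standard library; `parse_single_json_object` is a hypothetical name, and it checks only that the output is a single JSON object, since the fields of the DriftCallAction schema are not listed in this card.

```python
import json


def parse_single_json_object(text):
    """Return the one JSON object in `text`, or raise ValueError."""
    stripped = text.strip()
    try:
        obj, end = json.JSONDecoder().raw_decode(stripped)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object, got another JSON type")
    if stripped[end:].strip():
        raise ValueError("unexpected trailing text after the JSON object")
    return obj


# The key names below are placeholders, not the real DriftCallAction fields.
print(parse_single_json_object(' {"example_key": "example_value"} '))
```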
### Or — run it against the live env over OpenEnv REST
```bash
# Public bearer token for the hackathon Space.
curl -X POST https://saumilyajj-driftcall.hf.space/reset \
  -H "Authorization: Bearer driftcall-demo" \
  -H "X-Session-Id: smoke-001" \
  -H "Content-Type: application/json" \
  -d '{"seed": 42, "curriculum_stage": 2}'
```
The OpenEnv gym client lives at
[`deploy/inference/`](https://github.com/saumilyagupta/openenv-DGXAI/tree/google/gemma-3n-E4B-it/DRIFTCALL/deploy/inference)
and wraps `/reset`, `/step`, `/state`, `/close` in a gymnasium-style API.
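
For callers who prefer Python without the gym client, the same `/reset` call as the curl example can be sketched with the standard library. `build_reset_request` is a hypothetical helper, not part of `deploy/inference/`; the URL, headers, and body mirror the curl command, and the response format is not documented in this card, so the sending step is left commented out.

```python
import json
import urllib.request

BASE_URL = "https://saumilyajj-driftcall.hf.space"


def build_reset_request(seed, curriculum_stage, session_id, token="driftcall-demo"):
    """Build (but do not send) the POST /reset request shown in the curl example."""
    body = json.dumps({"seed": seed, "curriculum_stage": curriculum_stage})
    return urllib.request.Request(
        f"{BASE_URL}/reset",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "X-Session-Id": session_id,
            "Content-Type": "application/json",
        },
    )


req = build_reset_request(seed=42, curriculum_stage=2, session_id="smoke-001")
print(req.get_method(), req.full_url)

# Sending requires network access and a running Space:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(resp.read().decode("utf-8"))
```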
---
## Limitations
- **Small training run.** 240 GRPO steps at G=2 is a smoke test that validates
the training and push pipeline, not a learning run. From step 0 onward the
reward fluctuates in `[0.175, 0.300]`, largely against the uncertain-floor at
0.50. Real lift is expected only after several thousand steps with G=4–8.
- **Tool-use, not tool-execution.** The agent emits JSON DriftCallAction
payloads. Side effects (`cab.book`, `payment.charge`, …) are realised by
the env's mock vendor surface, not by real infrastructure.
- **Indic ASR is upstream.** Voice input goes through `faster-whisper-small`;
this model never sees raw audio. Code-switched Hinglish accuracy is bounded
by Whisper.
- **Reward components are deterministic, not perfect.** R5 (`anti_hack_penalty`)
catches known patterns; novel exploits would need to be added to the probe
set in `cells/step_20_probe.py`.
- **Not safety-aligned beyond Gemma-3n's defaults.** This run adds no specific
guards against off-task or adversarial inputs.
---
## Citation / acknowledgement
DriftCall is built on top of:
- [`unsloth/gemma-3n-E2B-it`](https://huggingface.co/unsloth/gemma-3n-E2B-it) — base model
- [Unsloth](https://github.com/unslothai/unsloth) — fast LoRA path
- [`hexgrad/Kokoro-82M`](https://huggingface.co/hexgrad/Kokoro-82M) — TTS in the env's audio pipeline
- [`Systran/faster-whisper-small`](https://huggingface.co/Systran/faster-whisper-small) — ASR in the env's audio pipeline
Source: <https://github.com/saumilyagupta/openenv-DGXAI> · branch `google/gemma-3n-E4B-it`.
Hackathon: DGX Hackathon 2026 — Indic Voice + RL track.