| license: apache-2.0 | |
| base_model: unsloth/gemma-3n-E2B-it | |
| library_name: peft | |
| tags: | |
| - lora | |
| - peft | |
| - rl | |
| - grpo | |
| - openenv | |
| - voice | |
| - indic | |
| - hindi | |
| - tamil | |
| - kannada | |
| - hinglish | |
| - schema-drift | |
| - gemma-3n | |
| - text-generation | |
| - tool-use | |
| language: | |
| - en | |
| - hi | |
| - ta | |
| - kn | |
| pipeline_tag: text-generation | |
| datasets: [] | |
| inference: false | |
| # DriftCall — Gemma-3n-E2B LoRA (apache-2.0) | |
| LoRA adapter for **`unsloth/gemma-3n-E2B-it`**, GRPO-tuned on | |
| [**DriftCall**](https://saumilyajj-driftcall.hf.space) — an OpenEnv-compliant | |
| voice-first Indic concierge environment where vendor APIs **mutate | |
| mid-episode** and the agent must keep its promise to the user across the | |
| schema drift. | |
| ``` | |
| trained on: DriftCall (OpenEnv v1.0 — 5 reward components, 20 drift patterns) | |
| hardware: 1× NVIDIA H100 80GB HBM3 (bf16, 16-bit LoRA) | |
| trainer: native PyTorch GRPO (no TRL) | |
| curriculum: 3 stages, 240 GRPO steps total (70 + 100 + 70) · group size 2 | |
| reward: five deterministic components (no LLM judge), Brier-calibrated, | |
| uncertain-floor at 0.50 | |
| ``` | |
| The companion env, demo, REST API, and full project site all live at one | |
| HF Space: **<https://huggingface.co/spaces/saumilyajj/driftcall>**. | |
| --- | |
| ## Model details | |
| | Field | Value | | |
| |---|---| | |
| | Base model | [`unsloth/gemma-3n-E2B-it`](https://huggingface.co/unsloth/gemma-3n-E2B-it) (Gemma-3n-E2B instruction-tuned, Unsloth-hosted checkpoint, loaded in bf16) | |
| | Adapter type | PEFT / LoRA | | |
| | `r` | 16 | | |
| | `lora_alpha` | 32 | | |
| | `lora_dropout` | 0.0 (Unsloth fast path) | | |
| | Precision | 16-bit LoRA on bf16 base | | |
| | File | `adapter_model.safetensors` · 84.6 MB · plus tokenizer (33.4 MB) | | |
| | Languages | Hindi · Tamil · Kannada · English · Hinglish | | |
| | License | Apache-2.0 | | |
| **This is an adapter-only release.** No merged-fp16 weights are published — | |
| naive 4-bit → 16-bit merging produces silently broken weights for this base | |
| (see DriftCall DESIGN.md §10.5). Always load on top of the base. | |
| --- | |
| ## Training | |
| | Stage | Drift regime | Steps | Initial weights | | |
| |---|---|---:|---| | |
| | 1 | no drift | 70 | base Gemma-3n-E2B-it | | |
| | 2 | single-pattern drift | 100 | stage-1 adapter | | |
| | 3 | compound drift | 70 | stage-2 adapter | | |
| - **Algorithm:** Group Relative Policy Optimization (GRPO), native PyTorch | |
| loop in `scripts/train_driftcall_grpo.py` (1300 LOC, no TRL dependency). | |
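| - **Group size (`G`):** 2 rollouts per goal, which is small for GRPO. With two | |
| rollouts the group-relative advantage is a noisy pairwise comparison, so the | |
| learning signal accumulates across the curriculum rather than within a step. | |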
| - **Curriculum:** language weights and drift patterns are stage-controlled | |
| (no drift → single pattern → compound). Evaluation uses a held-out | |
| 50-episode set plus a 200-episode reward-hacking probe (`cells/step_18..20`). | |
| - **Wandb runs:** `vasudeo118-lnmiit/driftcall` project — three runs | |
| (`mypquww4`, the s2 run, `og9xqlwy`). | |
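To make the group-size point concrete, here is a minimal sketch of a group-relative advantage computation of the kind GRPO uses. It is not taken from `scripts/train_driftcall_grpo.py`: the function name, the use of the population standard deviation, and the `eps` term are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalise each rollout's reward against its own group.

    Assumption: advantages are (r - group mean) / (group std + eps),
    using the population standard deviation. The DriftCall trainer may
    normalise differently.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# With G=2 the two advantages are always (close to) +1 and -1,
# regardless of how far apart the two rewards are.
pair = group_relative_advantages([0.300, 0.175])

# A tie carries no signal at all: both advantages are zero.
tie = group_relative_advantages([0.2, 0.2])
```

Under these assumptions a group of two only tells the optimiser which rollout was better, not by how much, which is why larger groups give a richer per-step signal.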
| ### Reward function — five components, no LLM judge | |
| | ID | Component | Weight | Implementation | | |
| |---:|---|---:|---| | |
| | R1 | `task_completion` | 0.40 | `cells.step_08_rewards:task_completion` | | |
| | R2 | `drift_detection` | 0.20 | `cells.step_08_rewards:drift_detection` | | |
| | R3 | `constraint_adherence` | 0.20 | `cells.step_08_rewards:constraint_adherence` | | |
| | R4 | `format_compliance` | 0.10 | `cells.step_08_rewards:format_compliance` | | |
| | R5 | `anti_hack_penalty` | 0.10 | `cells.step_08_rewards:anti_hack_penalty` | | |
| Calibration pipeline: | |
| ``` | |
| quality = combine_quality(R1..R5, weights) | |
| brier = brier_penalty(confidence, R1) | |
| reward_raw = quality * (1 - brier) | |
| reward = apply_uncertain_floor(reward_raw, confidence, quality) # floor=0.50 | |
| final = clamp(reward, -1.0, 1.0) | |
| ``` | |
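The pipeline above can be sketched in plain Python as follows. This is a hedged illustration, not the code in `cells.step_08_rewards`: it assumes every component score lies in `[0, 1]`, that `combine_quality` is a weighted sum, and that `brier_penalty` is the squared error between the stated confidence and R1. The uncertain-floor rule is not specified in this card, so it is left as an optional, caller-supplied function (`floor_fn` is a hypothetical parameter).

```python
# Weights copied from the reward table in this card.
WEIGHTS = {
    "task_completion": 0.40,
    "drift_detection": 0.20,
    "constraint_adherence": 0.20,
    "format_compliance": 0.10,
    "anti_hack_penalty": 0.10,
}


def combine_quality(scores, weights=WEIGHTS):
    """Weighted sum of component scores (assumed to lie in [0, 1])."""
    return sum(weights[name] * scores[name] for name in weights)


def brier_penalty(confidence, outcome):
    """Squared error between stated confidence and the realised outcome."""
    return (confidence - outcome) ** 2


def clamp(value, lo=-1.0, hi=1.0):
    return max(lo, min(hi, value))


def calibrated_reward(scores, confidence, floor_fn=None):
    """Quality scaled by (1 - Brier penalty), then clamped to [-1, 1].

    floor_fn stands in for apply_uncertain_floor, whose exact rule is
    not given here; when it is None the floor step is skipped.
    """
    quality = combine_quality(scores)
    brier = brier_penalty(confidence, scores["task_completion"])
    reward = quality * (1.0 - brier)
    if floor_fn is not None:
        reward = floor_fn(reward, confidence, quality)
    return clamp(reward)


example_scores = {
    "task_completion": 1.0,
    "drift_detection": 0.5,
    "constraint_adherence": 0.5,
    "format_compliance": 0.5,
    "anti_hack_penalty": 0.5,
}
example_reward = calibrated_reward(example_scores, confidence=0.5)
```

In this example the quality is 0.7 and the Brier penalty is 0.25, so an agent that completes the task but reports only 50% confidence gets roughly 0.525 before any floor is applied.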
| **Hard rule:** every reward bit traces to a deterministic schema- and | |
| trace-grounded check. There is no LLM-as-a-judge anywhere in the pipeline. | |
| --- | |
| ## How to use | |
| ```python | |
| from unsloth import FastModel | |
| from peft import PeftModel | |
| model, tokenizer = FastModel.from_pretrained( | |
| "unsloth/gemma-3n-E2B-it", | |
| max_seq_length=4096, | |
| load_in_4bit=False, # 16-bit LoRA path; matches training | |
| full_finetuning=False, | |
| ) | |
| model = PeftModel.from_pretrained(model, "DGXAI/gemma-3n-e2b-driftcall-lora") | |
| model.eval() | |
| prompt = ( | |
| "BRIEF: 9 baje se pehle ek veg thali ₹500 ke andar Indiranagar mein.\n\n" | |
| "Reply with EXACTLY one JSON object matching the DriftCallAction schema." | |
| ) | |
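| # FastModel may return a multimodal processor for Gemma-3n whose first | |
| # positional argument is not text, so pass the prompt as `text=` explicitly. | |
| inputs = tokenizer(text=prompt, return_tensors="pt").to(model.device) | |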
| out = model.generate(**inputs, max_new_tokens=256, do_sample=False) | |
| print(tokenizer.decode(out[0, inputs.input_ids.shape[1]:], skip_special_tokens=True)) | |
| ``` | |
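Because the prompt asks for exactly one JSON object, it can help to validate the decoded reply before passing it to the env. The sketch below only checks the shape of the reply; the DriftCallAction schema is not reproduced in this card, so no field names are validated. `parse_action` is a hypothetical helper, and the `"tool"` key in the example is an illustrative assumption.

```python
import json


def parse_action(reply: str) -> dict:
    """Return the reply as a dict, or raise ValueError.

    Rejects invalid JSON, trailing content after the first value,
    and top-level values that are not JSON objects.
    """
    try:
        payload = json.loads(reply.strip())
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not a single valid JSON value: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError(f"reply must be a JSON object, got {type(payload).__name__}")
    return payload


# Hypothetical reply; the real DriftCallAction fields may differ.
action = parse_action(' {"tool": "cab.book"} ')
```

A reply that wraps the object in prose, emits two objects, or returns a bare list raises `ValueError`, which can then be counted as a format failure.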
| ### Or — run it against the live env over OpenEnv REST | |
| ```bash | |
| # Public bearer token for the hackathon Space. | |
| curl -X POST https://saumilyajj-driftcall.hf.space/reset \ | |
| -H "Authorization: Bearer driftcall-demo" \ | |
| -H "X-Session-Id: smoke-001" \ | |
| -H "Content-Type: application/json" \ | |
| -d '{"seed": 42, "curriculum_stage": 2}' | |
| ``` | |
| The OpenEnv gym client lives at | |
| [`deploy/inference/`](https://github.com/saumilyagupta/openenv-DGXAI/tree/google/gemma-3n-E4B-it/DRIFTCALL/deploy/inference) | |
| and wraps `/reset`, `/step`, `/state`, `/close` in a gymnasium-style API. | |
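For callers who prefer Python to `curl`, the same `/reset` request can be built with the standard library. This is a minimal sketch rather than the client in `deploy/inference/`: `build_reset_request` is a hypothetical helper, and the response format is not documented in this card, so the network call is left commented out.

```python
import json
import urllib.request

BASE_URL = "https://saumilyajj-driftcall.hf.space"


def build_reset_request(seed, curriculum_stage, session_id, token="driftcall-demo"):
    """Build (but do not send) the POST /reset request shown in the curl example."""
    body = json.dumps({"seed": seed, "curriculum_stage": curriculum_stage}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/reset",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "X-Session-Id": session_id,
            "Content-Type": "application/json",
        },
    )


req = build_reset_request(seed=42, curriculum_stage=2, session_id="smoke-001")

# Sending is left to the caller so the sketch has no network side effects:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     first_observation = json.load(resp)
```

Reusing the same `X-Session-Id` on later `/step`, `/state`, and `/close` calls is presumably what ties them to this episode, although that is an assumption based on the header name.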
| --- | |
| ## Limitations | |
| - **Small training run.** 240 GRPO steps at G=2 is a smoke test that validates | |
| the training and push pipeline, not a learning run. From step 0 onward the | |
| reward fluctuates in `[0.175, 0.300]`, largely against the uncertain-floor at | |
| 0.50. Real lift would likely need several thousand steps with G=4–8. | |
| - **Tool-use, not tool-execution.** The agent emits JSON DriftCallAction | |
| payloads. Side effects (`cab.book`, `payment.charge`, …) are realised by | |
| the env's mock vendor surface, not by real infrastructure. | |
| - **Indic ASR is upstream.** Voice input goes through `faster-whisper-small`; | |
| this model never sees raw audio. Code-switched Hinglish accuracy is bounded | |
| by Whisper. | |
| - **Reward components are deterministic, not perfect.** R5 (`anti_hack_penalty`) | |
| catches known patterns; novel exploits would need to be added to the probe | |
| set in `cells/step_20_probe.py`. | |
| - **Not safety-aligned beyond Gemma-3n's defaults.** This run adds no specific | |
| guard against off-task or adversarial inputs. | |
| --- | |
| ## Citation / acknowledgement | |
| DriftCall is built on top of: | |
| - [`unsloth/gemma-3n-E2B-it`](https://huggingface.co/unsloth/gemma-3n-E2B-it) — base model | |
| - [Unsloth](https://github.com/unslothai/unsloth) — fast LoRA path | |
| - [`hexgrad/Kokoro-82M`](https://huggingface.co/hexgrad/Kokoro-82M) — TTS in the env's audio pipeline | |
| - [`Systran/faster-whisper-small`](https://huggingface.co/Systran/faster-whisper-small) — ASR in the env's audio pipeline | |
| Source: <https://github.com/saumilyagupta/openenv-DGXAI> · branch `google/gemma-3n-E4B-it`. | |
| Hackathon: DGX Hackathon 2026 — Indic Voice + RL track. | |