	---
	base_model: unsloth/gemma-3-12b-it
	library_name: peft
	pipeline_tag: text-generation
	tags:
	- lora
	- peft
	- gemma-3
	- adhd
	- neurodivergent
	- task-initiation
	- build-small-hackathon
	- backyard-ai
	- modal
	- zerogpu
	license: other
	---

	# NeuroBait

	NeuroBait is a LoRA fine-tune of `unsloth/gemma-3-12b-it` for ADHD and
	neurodivergent task-initiation conversations.

	It is designed for the moment when a person knows what they need to do, but the
	first move still feels too heavy. The model aims to respond with warm, short,
	agency-preserving language rather than shame, pressure, or a generic productivity
	script.

	Try the live Gradio app:

	```text
	https://huggingface.co/spaces/build-small-hackathon/NeuroBait
	```

	## Build Small Hackathon Submission

	NeuroBait is submitted for the Build Small Hackathon.

	- Primary track: Backyard AI
	- Why this track: NeuroBait focuses on a specific, real, everyday problem,
	ADHD task initiation, and turns a small model into a practical companion for
	that moment.
	- Bonus quest fit: Well-Tuned, because this repo publishes the fine-tuned
	LoRA adapter used by the Space.
	- Bonus quest fit: Off-Brand, because the app uses custom Gradio UI and
	anti-shame product copy rather than a default chatbot shell.
	- Sponsor fit: Modal-powered, because fine-tuning and generation evaluation
	were run on Modal GPU infrastructure.

	The project follows the hackathon shape: fine-tune a small-enough open model,
	publish the model on Hugging Face, and deploy a working Gradio app as a Hugging
	Face Space.

	## Intended Behavior

	NeuroBait should:

	- respond in short, natural prose,
	- avoid visible labels such as `Micro-action`, `Hook`, or `Stakes`,
	- avoid guilt framing and productivity shame,
	- preserve user agency,
	- ask one light question when context is too sparse,
	- offer one tiny concrete action when enough context exists.

	It should not act as a medical device, diagnostic tool, therapist, emergency
	support system, or replacement for professional care.

	## Training Data

	Run #4 used a bilingual Indonesian/English conversational dataset:

	- 270 train conversations
	- 30 eval conversations
	- multi-turn `messages[]` format
	- official NeuroBait system prompt prepended to each example
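
As an illustration of the record shape described above, the following minimal sketch prepends a system message to a multi-turn `messages[]` conversation. The prompt text, the sample turns, and the helper name `with_system_prompt` are placeholders invented for this example; the real system prompt and dataset are not published in this repo.

```python
from typing import Dict, List

Message = Dict[str, str]

# Placeholder only: the official NeuroBait system prompt is not reproduced here.
SYSTEM_PROMPT = "<official NeuroBait system prompt>"


def with_system_prompt(
    messages: List[Message], system_prompt: str = SYSTEM_PROMPT
) -> List[Message]:
    """Return a copy of the conversation with the system prompt as the first message."""
    if messages and messages[0]["role"] == "system":
        # A system message is already present: leave the conversation unchanged.
        return list(messages)
    return [{"role": "system", "content": system_prompt}] + list(messages)


# Hypothetical multi-turn example (invented for illustration, not taken from the dataset).
example = {
    "messages": [
        {"role": "user", "content": "I need to start my report but I can't move."},
        {"role": "assistant", "content": "That sounds heavy. Want to just open the file?"},
        {"role": "user", "content": "Okay, it's open."},
        {"role": "assistant", "content": "Nice. What's the title going to be?"},
    ]
}

prepared = with_system_prompt(example["messages"])
```

The helper leaves conversations that already start with a system message untouched, so applying it twice does not duplicate the prompt.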

	The dataset is intentionally not included in this model repo.

	## Training Configuration

	- Base: `unsloth/gemma-3-12b-it` (dense Gemma 3 12B)
	- Method: 16-bit LoRA, not QLoRA, via Unsloth
	- LoRA rank: 16
	- LoRA alpha: 16
	- LoRA dropout: 0
	- Target modules: q/k/v/o/gate/up/down projections
	- Epochs: 3
	- Learning rate: 2e-4
	- Effective batch size: 8
	- Max sequence length: 2048
	- Scheduler: cosine
	- Warmup ratio: 0.05
	- Optimizer: 8-bit AdamW
	- Precision: bf16
	- Chat template: `gemma-3`
	- Response-only loss masking: instruction marker `<start_of_turn>user\n`,
	response marker `<start_of_turn>model\n`
	- Checkpoints: disabled with `save_strategy="no"` to avoid a known Unsloth/TRL
	checkpoint pickling issue
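
For orientation, the settings above can be written with the field names that PEFT's `LoraConfig` and Hugging Face `TrainingArguments` use. This is a sketch of one plausible mapping, not the project's training script: the projection module names and the `adamw_8bit` optimizer identifier are assumptions, and the split of the effective batch size into per-device batch size and gradient accumulation steps is not stated in this card.

```python
# Values come from the list above; key names follow PEFT / Transformers conventions.
lora_settings = {
    "r": 16,
    "lora_alpha": 16,
    "lora_dropout": 0.0,
    # Assumed module names for the q/k/v/o/gate/up/down projections.
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}

training_settings = {
    "num_train_epochs": 3,
    "learning_rate": 2e-4,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.05,
    "optim": "adamw_8bit",  # assumed identifier for the 8-bit AdamW optimizer
    "bf16": True,
    "save_strategy": "no",
}

# Not library arguments: the card reports these as plain numbers.
effective_batch_size = 8
max_seq_length = 2048

# Standard LoRA scales the adapter update by alpha / r, so rank 16 with alpha 16 gives 1.0.
lora_scale = lora_settings["lora_alpha"] / lora_settings["r"]
```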

	Training ran on Modal with an H100 80GB GPU.

	## Deployment

	The deployed Space runs on Hugging Face ZeroGPU.

	Runtime path:

	- Gradio Space
	- `transformers` + `peft`
	- 4-bit bitsandbytes NF4 loading
	- base model: `unsloth/gemma-3-12b-it`
	- LoRA adapter: `build-small-hackathon/NeuroBait`

	Unsloth is used for training, not for Space inference. The dense Gemma 3 12B base
	was chosen because it deploys cleanly through the standard
	`transformers` + `peft` path on ZeroGPU.

	## Run #4 Results

	Training completed 102 steps.
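
The step count is consistent with the training configuration, under the assumption that the final partial batch of each epoch still counts as one optimizer step:

```python
import math

train_conversations = 270
effective_batch_size = 8
epochs = 3

# Assumption: the last, smaller batch in each epoch still triggers an optimizer step.
steps_per_epoch = math.ceil(train_conversations / effective_batch_size)  # 34
total_steps = steps_per_epoch * epochs  # 102
print(steps_per_epoch, total_steps)
```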

	Training summary:

	- train conversations: 270
	- eval conversations: 30
	- train loss: 1.7501
	- eval loss: 1.8844

	Treat these loss values as a weak training diagnostic. NeuroBait is evaluated
	primarily by comparing its generated behavior against the base model.

	Generation eval summary over 8 held-out or novel prompts:

	- base persona average: 2.25 / 4
	- fine-tuned persona average: 4.0 / 4
	- base average words: 80.4
	- fine-tuned average words: 55.1
	- base label leaks: 5
	- fine-tuned label leaks: 0
	- base action-cue responses: 5
	- fine-tuned action-cue responses: 4

	Qualitatively, the fine-tuned adapter produced shorter, more conversational
	responses and did not leak literal structure labels, while the base model leaked
	labels in 5 of 8 prompts.
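
The word-count and label-leak figures above are simple to compute. The sketch below shows one way to do it; the function names, the "label followed by a colon" matching rule, and the sample responses are assumptions for illustration, and the project's actual evaluation script may differ. The label list comes from the Intended Behavior section.

```python
import re
from typing import List

# Structure labels that should never appear in a reply (see "Intended Behavior").
LABELS = ("Micro-action", "Hook", "Stakes")

# Assumed rule: a leak is a label used as a heading, i.e. followed by a colon.
_LEAK_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(label) for label in LABELS) + r")\b\s*:",
    re.IGNORECASE,
)


def has_label_leak(response: str) -> bool:
    """True if the response contains a structure label used as a heading."""
    return _LEAK_PATTERN.search(response) is not None


def count_label_leaks(responses: List[str]) -> int:
    """Number of responses that leak at least one label."""
    return sum(has_label_leak(r) for r in responses)


def average_words(responses: List[str]) -> float:
    """Mean whitespace-separated word count per response."""
    return sum(len(r.split()) for r in responses) / len(responses)


# Invented sample responses, not taken from the evaluation set.
samples = [
    "Hook: your future self will thank you.",
    "Want to just open the doc and look at it for a minute?",
]
leaks = count_label_leaks(samples)
mean_words = average_words(samples)
```

Matching on the colon keeps ordinary uses of a word such as "hook" in natural prose from being counted as a leak.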

	## Loading

	Example adapter loading path:

	```python
	import torch
	from peft import PeftModel
	from transformers import AutoModelForImageTextToText, AutoTokenizer, BitsAndBytesConfig

	base_model = "unsloth/gemma-3-12b-it"
	adapter_id = "build-small-hackathon/NeuroBait"

	# Load the tokenizer from the adapter repo.
	tokenizer = AutoTokenizer.from_pretrained(adapter_id)

	# Load the base model in 4-bit NF4 with bf16 compute.
	model = AutoModelForImageTextToText.from_pretrained(
	    base_model,
	    quantization_config=BitsAndBytesConfig(
	        load_in_4bit=True,
	        bnb_4bit_compute_dtype=torch.bfloat16,
	        bnb_4bit_quant_type="nf4",
	    ),
	    device_map="auto",
	)

	# Attach the NeuroBait LoRA adapter and switch to inference mode.
	model = PeftModel.from_pretrained(model, adapter_id)
	model.eval()
	```
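
Once the adapter is loaded, a reply can be generated through the tokenizer's chat template. The following is a minimal sketch rather than the Space's actual inference code: `generate_reply`, the `max_new_tokens` default, the sample user message, and the placeholder system prompt are assumptions, and `model` and `tokenizer` are the objects created in the loading example.

```python
def generate_reply(model, tokenizer, messages, max_new_tokens=128):
    """Generate one assistant turn for a list of {"role", "content"} messages."""
    # Render the conversation with the model's chat template and tokenize it.
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Keep only the newly generated tokens, dropping the echoed prompt.
    prompt_length = inputs["input_ids"].shape[-1]
    return tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)


messages = [
    # Placeholder: the official NeuroBait system prompt is not reproduced here.
    {"role": "system", "content": "<official NeuroBait system prompt>"},
    {"role": "user", "content": "I keep staring at my inbox and can't start."},
]
# reply = generate_reply(model, tokenizer, messages)
```

The call is left commented out because it needs the loaded 12B model and a GPU.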

	## Limitations

	- The model is not a medical or crisis-support system.
	- It may still ask a clarifying question when the user expected a direct nudge.
	- It may be too brief, too playful, or too gentle for some contexts.
	- Long-term personalization should be handled by product logic, not only by the
	model.
	- App-initiated reminders should be handled by the app layer, not generated
	spontaneously by the model.

	## Source

	Public source repo:

	```text
	https://github.com/Subrata15/NeuroBait-Build-Small-Model
	```

	Codex trace dataset:

	```text
	https://huggingface.co/datasets/build-small-hackathon/NeuroBait-Codex-Traces
	```