chimbiwide's picture
Update README.md
c47f8b4 verified
---
base_model: unsloth/gemma-3-1b-it
tags:
- text-generation-inference
- transformers
- unsloth
- gemma3_text
license: gemma
language:
- en
---
# Gemma3NPC-1b
**A new attempt in training Gemma3NPC.**
***Tensorboard data are available!***
---
It's been a while since the last Gemma3NPC model release, in the mean while we were working on some other models like [GemmaThink](https://huggingface.co/collections/chimbiwide/gemmathink).
Now we are back with the newest **Gemma3NPC-1b**, trained using our [RolePlay-NPCv2](https://huggingface.co/datasets/chimbiwide/RolePlay-NPCv2) dataset.
---
### Training Parameters
We trained this model as a rank-32 LoRA adapter with two epoches over `RolePlay-NPCv2` using a 80GB A100 in Google Colab. For this run, we employed a learning rate of `2e-5` and a total batch size of 8 and gradient accumulation steps of 4. A cosine learning rate scheduler was used with an 150-step warmup. With a gradient clipping of 1.0.
Check out our training notebook [here](https://github.com/chimbiwide/Gemma3NPC/blob/main/Training/Gemma3NPC_1b.ipynb).
---
### Changes & Performance
With this new 1b model, we used much more aggresive training parameters and added some NSFW dataset to experiment with the results. We noticed a few really interesting responses:
- **There seems to be some sign of "reasoning"**
![image](https://cdn-uploads.huggingface.co/production/uploads/67d5b5a056a9d31aa0b49687/K-RdDLXbkZSNuf-bFZU8P.png)
![image](https://cdn-uploads.huggingface.co/production/uploads/67d5b5a056a9d31aa0b49687/WTPMNS2A8skZ0cwm43YTH.png)
- **The model is less likely to break out of character**
-
Something up to the users to explore for themselves, remember to provide a roleplaying prompt first!
---
### Future Work
Now, we will be focusing on further improving Gemma3NPC, not only just through training parameters.
1. Better data (most of our data are old and need an update), either collected or synthetically generated.
2. Better & new models, expand beyond Gemma3 model family, our next goal is a Qwen3 based model.
3. Adding GRPO into the training loop.
These improvements serve our ultimate goal of creating an small agentic NPC model, with good RP quality and tool-calling for dynamic in-game interactions.
We also plan to create some sort of a Unity game demo,it's on its way.