Hercules LLM collection
Collection
Hercules Gemma 3N collection • 6 items • Updated
How to use EpistemeAI/Hercules-Coder-E4B-it with Transformers:
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("EpistemeAI/Hercules-Coder-E4B-it", dtype="auto")How to use EpistemeAI/Hercules-Coder-E4B-it with Unsloth Studio:
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for EpistemeAI/Hercules-Coder-E4B-it to start chatting
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for EpistemeAI/Hercules-Coder-E4B-it to start chatting
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for EpistemeAI/Hercules-Coder-E4B-it to start chatting
pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
model_name="EpistemeAI/Hercules-Coder-E4B-it",
max_seq_length=2048,
)curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for EpistemeAI/Hercules-Coder-E4B-it to start chattingirm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for EpistemeAI/Hercules-Coder-E4B-it to start chatting# No setup required# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for EpistemeAI/Hercules-Coder-E4B-it to start chattingpip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
model_name="EpistemeAI/Hercules-Coder-E4B-it",
max_seq_length=2048,
)This finetuned model is specialized in STEM like LCB, CodeForce, AIME24, AIME25, AMC23, MATH500.
Note:
Use unsloth inference
!pip install --upgrade transformers
import torch
from transformers import pipeline
model_id = "EpistemeAI/Hercules-Coder-E4B-it"
pipe = pipeline(
"text-generation",
model=model_id,
torch_dtype=torch.bfloat16,
device_map="auto"
)
print(pipe("Write me a Python function to calculate the nth fibonacci number."))
Benchmark results (5 shot):
| Tasks | Version | Filter | n-shot | Metric | Value | |
|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 5 | acc | ↑ | 0.5759 |
| hellaswag | 1 | none | 5 | acc | ↑ | 0.7651 |
| winogrande | 1 | none | 5 | acc | ↑ | 0.7526 |
GPQA Diamond result
| Tasks | Version | Filter | n-shot | Metric | Value | |
|---|---|---|---|---|---|---|
| gpqa_diamond_zeroshot | 1 | none | 0 | acc | ↑ | 0.2516 |
| none | 0 | acc_norm | ↑ | 0.2516 |
This gemma3n model was trained 2x faster with Unsloth and Huggingface's TRL library.
@misc{liu2025rstarcoderscalingcompetitivecode,
title={rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset},
author={Yifei Liu and Li Lyna Zhang and Yi Zhu and Bingcheng Dong and Xudong Zhou and Ning Shang and Fan Yang and Mao Yang},
year={2025},
eprint={2505.21297},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.21297},
}
# Gated model: Login with a HF token with gated access permission hf auth login