How to use from
Unsloth Studio
# Gated model: Login with a HF token with gated access permission
hf auth login
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for EpistemeAI/Hercules-Coder-E4B-it to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for EpistemeAI/Hercules-Coder-E4B-it to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for EpistemeAI/Hercules-Coder-E4B-it to start chatting
Load model with FastModel
pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="EpistemeAI/Hercules-Coder-E4B-it",
    max_seq_length=2048,
)
Quick Links

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Log in or Sign Up to review the conditions and access this model content.

This finetuned model is specialized in STEM like LCB, CodeForce, AIME24, AIME25, AMC23, MATH500.

Note:

  • Currently only text is supported.
  • Ollama: ollama run hf.co/unsloth/gemma-3n-E4B-it-GGUF:Q4_K_XL - auto-sets correct chat template and settings
  • Set temperature = 1.0, top_k = 64, top_p = 0.95, min_p = 0.0
  • Gemma 3n max tokens (context length): 32K. Gemma 3n chat template:

Use unsloth inference

!pip install --upgrade transformers

import torch
from transformers import pipeline
model_id = "EpistemeAI/Hercules-Coder-E4B-it"
pipe = pipeline(
    "text-generation", 
    model=model_id, 
    torch_dtype=torch.bfloat16, 
    device_map="auto"
)
print(pipe("Write me a Python function to calculate the nth fibonacci number."))

Benchmark results (5 shot):

Tasks Version Filter n-shot Metric Value
arc_challenge 1 none 5 acc ↑ 0.5759
hellaswag 1 none 5 acc ↑ 0.7651
winogrande 1 none 5 acc ↑ 0.7526

GPQA Diamond result

Tasks Version Filter n-shot Metric Value
gpqa_diamond_zeroshot 1 none 0 acc ↑ 0.2516
none 0 acc_norm ↑ 0.2516

Uploaded finetuned model

  • Developed by: EpistemeAI
  • License: apache-2.0
  • Finetuned from model : unsloth/gemma-3n-e4b-unsloth-bnb-4bit

This gemma3n model was trained 2x faster with Unsloth and Huggingface's TRL library.

Citations

@misc{liu2025rstarcoderscalingcompetitivecode,
      title={rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset}, 
      author={Yifei Liu and Li Lyna Zhang and Yi Zhu and Bingcheng Dong and Xudong Zhou and Ning Shang and Fan Yang and Mao Yang},
      year={2025},
      eprint={2505.21297},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.21297}, 
}
Downloads last month
-
Safetensors
Model size
8B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for EpistemeAI/Hercules-Coder-E4B-it

Quantizations
1 model

Collection including EpistemeAI/Hercules-Coder-E4B-it

Paper for EpistemeAI/Hercules-Coder-E4B-it