How to use from the Transformers library
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="issai/foggen")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("issai/foggen")
model = AutoModelForCausalLM.from_pretrained("issai/foggen")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

FogGen: Self-Aware Edge–Cloud LLM Router

A 0.6B-parameter edge LLM trained to emit a calibrated, verbalized confidence score before its answer, enabling efficient edge–cloud routing without an external router.

FogGen overview: (a) self-aware routing at inference, (b) self-evolving training loop

FogGen is a small, self-aware edge model that knows when to answer locally and when to defer to a stronger cloud model. At inference (figure (a)), it emits a confidence score c and then an answer in one forward pass; if c ≥ τ, the local answer is returned, otherwise the query is routed to the cloud. Training (figure (b)) is a self-evolving loop: in each round, the current checkpoint samples N=8 generations per question from itself to derive confidence buckets, and is then fine-tuned (SFT) on (question, confidence, answer) triples.
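The card does not spell out how the eight self-samples are turned into a confidence bucket. One plausible reading, offered here only as a sketch, is that the fraction of correct samples is snapped to the nearest of the five bucket values. The helper name `confidence_bucket` and the tie-breaking rule (ties go to the lower bucket) are assumptions, not the authors' documented procedure.

```python
# Bucket values as listed in the model card.
BUCKETS = (0.0, 0.25, 0.5, 0.75, 1.0)


def confidence_bucket(num_correct: int, num_samples: int = 8) -> float:
    """Snap the empirical accuracy of self-samples to the nearest bucket.

    Assumption: when the fraction is equidistant from two buckets,
    the lower bucket is chosen (a conservative tie-break).
    """
    if num_samples <= 0 or not 0 <= num_correct <= num_samples:
        raise ValueError("need num_samples > 0 and 0 <= num_correct <= num_samples")
    frac = num_correct / num_samples
    # Sort key: distance to the bucket first, then the bucket value for ties.
    return min(BUCKETS, key=lambda b: (abs(b - frac), b))


# Hypothetical example: 6 of 8 sampled answers matched the gold label.
print(confidence_bucket(6))  # 0.75
```

Under this reading, a question the checkpoint answers correctly in every sample is labelled 1.0 and one it never answers correctly is labelled 0.0, which would then serve as the confidence target in the SFT triple.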

The released checkpoint is the endpoint (R14) of a 14-round chain trained across seven domains: finance, science, coding, law, math, Kazakh culture, medical.

Quick demo

from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("issai/foggen", torch_dtype="bfloat16", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("issai/foggen")

SYSTEM = """You are a self-aware multiple-choice assistant.

Rules:
- Do not output <think> tags.
- First, assess your confidence in solving this question.
- Then give your answer.
- Output format:
  Confidence: <0.0|0.25|0.5|0.75|1.0>
  Final answer: <OPTION_LETTER>"""

question = """A firm reports $400M in total liabilities and $600M in shareholders' equity.
What is the firm's debt-to-equity ratio?

A. 0.67
B. 1.00
C. 1.50
D. 2.00"""

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": question},
]
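# return_dict=True yields input_ids plus attention_mask, which generate() accepts as keyword arguments.
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", return_dict=True,
                                       add_generation_prompt=True, enable_thinking=False).to(model.device)
# Greedy decoding; only the newly generated tokens are decoded below.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))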
# Expected:
#   Confidence: 1.0
#   Final answer: A

How routing works

import re

def route_query(model_output: str, tau: float = 0.5):
    """Parse FogGen output. Returns (action, confidence, answer).
    action is 'keep_local' if confidence >= tau, else 'route_to_cloud'."""
    conf_match = re.search(r"Confidence\s*:\s*([\d.]+)", model_output)
    ans_match  = re.search(r"Final\s+answer\s*:\s*([A-D])", model_output)
    if not conf_match:
        # No parsable confidence: fall back to the cloud model.
        return "route_to_cloud", None, None
    confidence = float(conf_match.group(1))
    answer = ans_match.group(1) if ans_match else None
    return ("keep_local" if confidence >= tau else "route_to_cloud", confidence, answer)

At τ=0.5 on the trained domains, the model routes ~22% of queries to the cloud while achieving 67.8% mean system accuracy.
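To show where the routing decision sits in a full request path, here is a minimal sketch that wraps an edge model and a cloud model behind one function. The names `parse_output`, `answer_with_routing`, `edge_fn`, and `cloud_fn` are hypothetical; the two callables stand in for whatever local and remote generation you actually use. Unlike `route_query` above, this sketch also defers to the cloud when no answer letter can be parsed, which is a design assumption rather than documented behaviour.

```python
import re
from typing import Callable, Optional, Tuple


def parse_output(text: str) -> Tuple[Optional[float], Optional[str]]:
    """Extract (confidence, answer letter) from a FogGen-style completion."""
    conf = re.search(r"Confidence\s*:\s*([01](?:\.\d+)?)", text)
    ans = re.search(r"Final\s+answer\s*:\s*([A-D])", text)
    return (float(conf.group(1)) if conf else None,
            ans.group(1) if ans else None)


def answer_with_routing(
    question: str,
    edge_fn: Callable[[str], str],   # hypothetical: returns raw edge-model text
    cloud_fn: Callable[[str], str],  # hypothetical: returns the cloud answer
    tau: float = 0.5,
) -> Tuple[str, str]:
    """Return (source, answer), where source is 'edge' or 'cloud'."""
    confidence, answer = parse_output(edge_fn(question))
    # Keep the local answer only if both fields parsed and confidence >= tau.
    if confidence is not None and answer is not None and confidence >= tau:
        return "edge", answer
    return "cloud", cloud_fn(question)


# Stub models for illustration only; no real inference happens here.
confident_edge = lambda q: "Confidence: 1.0\nFinal answer: A"
unsure_edge = lambda q: "Confidence: 0.25\nFinal answer: B"
stub_cloud = lambda q: "C"

print(answer_with_routing("example question", confident_edge, stub_cloud))
print(answer_with_routing("example question", unsure_edge, stub_cloud))
```

Raising `tau` sends more queries to the cloud; lowering it keeps more answers local.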

Model details

| Field | Value |
| --- | --- |
| Base model | Qwen/Qwen3-0.6B |
| Parameters | 0.6B |
| Training method | LoRA SFT (rank=16, α=32, all-linear), bf16, 2 epochs/round |
| Rounds | 14 sequential rounds (R0 → R14) |
| Training data | ~1800 SFT rows × 14 rounds |
| Domains | finance, science, coding, law, math, Kazakh culture, medical |
| Cloud teacher | Qwen3-30B-A3B-Instruct-2507 |
| Output format | Confidence: <bucket>\nFinal answer: <letter> |
| Confidence buckets | 5 discrete values: 0.0, 0.25, 0.5, 0.75, 1.0 |
| License | Apache 2.0 (inherited from base) |

Performance

System accuracy at τ=0.5 on seven MCQ domains (full test sets, ~16,200 questions), compared against random routing and a cloud-only baseline (Qwen3-30B-A3B-Instruct-2507):

| Domain | Cloud only | R14 raw | Random @ τ=0.5 | FogGen @ τ=0.5 | Cloud routed |
| --- | --- | --- | --- | --- | --- |
| Finance | 69.5% | 57.0% | 59.9% | 65.8% | 23.3% |
| Science | 72.7% | 56.9% | 60.1% | 64.5% | 20.4% |
| Coding | 74.2% | 61.8% | 64.2% | 69.5% | 19.7% |
| Law | 70.7% | 55.3% | 58.4% | 62.4% | 20.1% |
| Math | 60.1% | 42.2% | 50.8% | 58.1% | 47.7% |
| Kazakh culture | 95.8% | 91.3% | 91.4% | 91.9% | 1.0% |
| Medical | 74.0% | 52.6% | 57.1% | 62.2% | 20.9% |
| Mean | 73.9% | 59.6% | 63.1% | 67.8% | 21.9% |

Mean lift over Random at τ=0.5: +4.6 percentage points (system accuracy minus random-routing accuracy, averaged across the seven domains).
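The card does not define the Random baseline. The reported values are consistent with one common reading: route the same fraction of queries to the cloud as FogGen does, but pick them uniformly at random, so expected accuracy is a weighted mix of edge and cloud accuracy. The sketch below checks that assumed reading against the table rows and recomputes the mean lift; the function name `random_routing_accuracy` is illustrative.

```python
def random_routing_accuracy(edge_acc: float, cloud_acc: float,
                            cloud_fraction: float) -> float:
    """Expected accuracy when a random `cloud_fraction` of queries goes to the cloud.

    Assumption: this is how the Random column was obtained; the card does not say.
    """
    return (1.0 - cloud_fraction) * edge_acc + cloud_fraction * cloud_acc


# (cloud only, R14 raw, Random, FogGen, cloud routed), all in percent,
# copied from the performance table.
ROWS = {
    "Finance":        (69.5, 57.0, 59.9, 65.8, 23.3),
    "Science":        (72.7, 56.9, 60.1, 64.5, 20.4),
    "Coding":         (74.2, 61.8, 64.2, 69.5, 19.7),
    "Law":            (70.7, 55.3, 58.4, 62.4, 20.1),
    "Math":           (60.1, 42.2, 50.8, 58.1, 47.7),
    "Kazakh culture": (95.8, 91.3, 91.4, 91.9, 1.0),
    "Medical":        (74.0, 52.6, 57.1, 62.2, 20.9),
}

lifts = []
for domain, (cloud, edge, reported_random, foggen, routed) in ROWS.items():
    estimate = random_routing_accuracy(edge, cloud, routed / 100.0)
    lifts.append(foggen - reported_random)
    print(f"{domain:15s} estimated={estimate:5.1f} reported={reported_random:5.1f}")

mean_lift = sum(lifts) / len(lifts)
print(f"Mean lift over Random: {mean_lift:+.1f} points")
```

Because the table values are rounded to one decimal place, the estimates can differ from the reported Random column by about a tenth of a point.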

Baseline comparison

Direct comparison against AutoMix (Aggarwal et al., 2024) on the same R14 checkpoint, same evaluation sets:

| Method | SysAcc | Cloud routed | Δ over Random | Fwd passes / query |
| --- | --- | --- | --- | --- |
| AutoMix | 67.2% | 29.0% | +3.7 | 9 (1 answer + 8 verify) |
| FogGen (ours) | 67.8% | 21.9% | +4.6 | 1 |

FogGen reaches slightly higher system accuracy (67.8% vs. 67.2%) while routing fewer queries to the cloud (21.9% vs. 29.0%) and using one forward pass per query instead of nine.

Open-ended generalization

The MCQ-trained chain transfers zero-shot to open-ended (OE) task types. Local accuracy and routing benefit at τ=0.5 on three held-out OE benchmarks:

| Benchmark | Format | R14 raw | R14 Δ@τ=0.5 |
| --- | --- | --- | --- |
| SQuAD v1.1 | extractive RC | 81.0% | +1.4 |
| TruthfulQA gen | adversarial factual | 36.5% | −0.7 (anti-calibrated) |
| GSM8K (CoT) | math word-problems | 52.0% | +2.2 |

One additional round of OE training (R15, 1876 SFT rows) lifts local accuracy on these three benchmarks to 86.5% / 40.0% / 58.0% respectively; see issai/foggen-r15-oe.

Citation

Paper coming soon.

Acknowledgements

Thanks to the Qwen team at Alibaba for the base model and cloud teacher.
