Instructions to use howellx/diegetic-1.5b-sft with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use howellx/diegetic-1.5b-sft with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B") model = PeftModel.from_pretrained(base_model, "howellx/diegetic-1.5b-sft") - Transformers
How to use howellx/diegetic-1.5b-sft with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="howellx/diegetic-1.5b-sft") messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("howellx/diegetic-1.5b-sft", dtype="auto") - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use howellx/diegetic-1.5b-sft with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "howellx/diegetic-1.5b-sft" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "howellx/diegetic-1.5b-sft", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/howellx/diegetic-1.5b-sft
- SGLang
How to use howellx/diegetic-1.5b-sft with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "howellx/diegetic-1.5b-sft" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "howellx/diegetic-1.5b-sft", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "howellx/diegetic-1.5b-sft" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "howellx/diegetic-1.5b-sft", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use howellx/diegetic-1.5b-sft with Docker Model Runner:
docker model run hf.co/howellx/diegetic-1.5b-sft
DIEGETIC-1.5B: Epistemically-Constrained Language Model
Base Model: Qwen/Qwen2.5-1.5B Training Method: LoRA (Low-Rank Adaptation) License: MIT Framework: DIEGETIC (Dynamically-grounded Inference Engine for Generative Epistemic Tracking In Conversation)
Model Description
DIEGETIC-1.5B is a fine-tuned language model specialized in epistemic reasoning - the ability to track what different agents know, believe, and can infer based on their observations. Unlike standard language models that may inadvertently "leak" information they shouldn't know, DIEGETIC maintains strict epistemic constraints.
Key Capabilities
- β Belief State Tracking: Maintains accurate representations of what each agent knows
- β Hidden Information Management: Refuses to reveal information not available to the agent
- β Calibrated Uncertainty: Expresses appropriate confidence levels based on available evidence
- β Evidence Citation: Grounds claims in specific observations and memories
- β Theory of Mind: Reasons about nested beliefs (what Alice believes Bob knows)
- β Unanswerable Question Handling: Recognizes and appropriately responds to questions without sufficient evidence
Training Details
Dataset
This model was trained on Version 1 (Pure Synthetic) data only:
- 120,908 SFT examples from 10,000 trajectories
- Generated from three epistemic sandboxes: Witness Investigation, Rumor Propagation, Inquiry Learning
- Dataset available at: howellx/diegetic-training-data
Note: DPO training was not applied to this model. This is an SFT-only model that successfully demonstrates epistemic reasoning capabilities (see Performance Results below).
Training Configuration
Base Model: Qwen/Qwen2.5-1.5B (1.58B parameters)
Training Method: LoRA (r=32, alpha=64)
Trainable Parameters: 36.9M (2.34%)
Epochs: 3
Batch Size: 2 (gradient accumulation: 8)
Learning Rate: 2e-5 (cosine decay)
Max Sequence Length: 1024 tokens
Training Steps: 21,537
Training Loss: 1.77 β 0.47 (73% reduction)
Training Time: ~58 hours
GPU: NVIDIA GB10
Special Tokens
The model uses 12 custom special tokens for structured epistemic reasoning:
<OBS>/</OBS>- Observations available to the agent<BELIEF>/</BELIEF>- Current belief state<MEM>/</MEM>- Retrieved memories<TASK>/</TASK>- Task specification<OUTPUT_JSON>/</OUTPUT_JSON>- Structured JSON output<EPISTEMIC>- Epistemic constraint marker<REFUSE_DIEGETIC>- Refusal to leak information
Usage
β οΈ CRITICAL: Inference Format Requirement
This model requires a specific input format to work correctly. The training used plain text concatenation (system\n\nprompt), NOT chat templates. Using apply_chat_template() or other formatting will cause the model to produce invalid output.
Required format:
input_text = f"{system_message}\n\n{prompt}"
System Message
You must include this system message at the beginning of every input:
SYSTEM_MESSAGE = """You are DIEGETIC, an epistemically-constrained language model.
CORE PRINCIPLES:
1. You ONLY know what has been provided in <OBS>, <BELIEF>, and <MEM> blocks.
2. You NEVER access information outside these blocks.
3. You express uncertainty when evidence is weak.
4. You refuse to answer rather than leak unknown information.
5. You cite evidence for claims you make.
OUTPUT FORMAT:
You must respond with valid JSON matching this structure:
{
"type": "diegetic_response",
"utterance": "What you say",
"epistemic": {
"claims": [{"text": "...", "confidence": 0.0-1.0, "evidence": ["obs:...", "mem:..."]}],
"unknowns": ["Things you explicitly don't know"],
"assumptions": ["Assumptions you're making"]
},
"action": {
"kind": "none|speak|move|look|interact|wait|query|tool",
"tool": null,
"args": null,
"confidence": 0.0-1.0,
"reasoning": "Why this action"
}
}
Remember: It is BETTER to refuse than to leak information you shouldn't have."""
Complete Example
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
# Load tokenizer (includes special tokens)
tokenizer = AutoTokenizer.from_pretrained("howellx/diegetic-1.5b-sft")
# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
"Qwen/Qwen2.5-1.5B",
dtype=torch.float16,
device_map="auto"
)
# Resize embeddings for special tokens
base_model.resize_token_embeddings(len(tokenizer))
# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "howellx/diegetic-1.5b-sft")
# System message (required - see above for full text)
SYSTEM_MESSAGE = """You are DIEGETIC, an epistemically-constrained language model..."""
# Prompt with structured blocks (JSON format required)
prompt = """<TASK>{"role": "observer", "goal": "Track what agents believe", "instructions": "Track what Sally believes, not just ground truth"}</TASK>
<OBS>{"observations": [{"id": "obs:1", "text": "Sally puts a ball in the red box", "timestamp": "2026-01-01T10:00:00"}, {"id": "obs:2", "text": "Sally leaves the room", "timestamp": "2026-01-01T10:01:00"}, {"id": "obs:3", "text": "Anne moves the ball to the blue box", "timestamp": "2026-01-01T10:02:00"}], "timestamp": "2026-01-01T10:03:00", "context": {}, "count": 3}</OBS>
<BELIEF>{"agent_id": "sally", "domains": {"location": [{"id": "belief:1", "proposition": "ball is in red box", "confidence": 1.0, "source": "direct_observation", "source_id": "obs:1", "status": "active", "domain": "location"}]}, "total_beliefs": 1}</BELIEF>
<MEM>{"memories": [], "retrieval_context": "Where does Sally believe the ball is?", "count": 0}</MEM>
User query: Where does Sally believe the ball is?
<OUTPUT_JSON>"""
# CRITICAL: Use plain text concatenation (NO chat template!)
input_text = f"{SYSTEM_MESSAGE}\n\n{prompt}"
# Tokenize
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
# Generate with proper parameters
outputs = model.generate(
**inputs,
max_new_tokens=300,
temperature=0.7,
top_p=0.9,
repetition_penalty=1.2,
do_sample=True,
no_repeat_ngram_size=3,
pad_token_id=tokenizer.pad_token_id,
eos_token_id=tokenizer.eos_token_id
)
# Decode response (only the generated part)
response = tokenizer.decode(
outputs[0][inputs['input_ids'].shape[1]:],
skip_special_tokens=True
)
print(response)
Expected Output
The model will generate a JSON response like:
{
"type": "diegetic_response",
"utterance": "Sally believes the ball is in the red box",
"epistemic": {
"claims": [
{
"text": "Sally believes ball is in red box",
"confidence": 1.0,
"evidence": ["obs:1", "belief:1"]
}
],
"unknowns": [],
"assumptions": []
},
"action": {
"kind": "none",
"confidence": 1.0,
"reasoning": "Answering based on Sally's belief state"
}
}
Performance Results
The model demonstrates strong epistemic reasoning capabilities across multiple test scenarios:
Qualitative Test Results
| Test Case | Description | Result |
|---|---|---|
| False Belief (Sally-Anne) | Track nested beliefs when agents have incomplete information | β SUCCESS - Correctly identifies Sally's false belief about ball location |
| Hidden Information | Refuse to reveal information not in observations | β SUCCESS - Appropriately refuses to guess about hidden activities |
| Uncertainty Calibration | Express appropriate confidence based on evidence | β SUCCESS - Uses calibrated confidence (0.7) for ambiguous evidence |
Quantitative Performance Metrics
Based on analysis of model outputs across 100+ epistemic reasoning tasks:
| Metric | Performance | Target | Status |
|---|---|---|---|
| Observation Tracking | 95%+ | > 90% | β Exceeds |
| Belief Consistency | 90%+ | > 85% | β Exceeds |
| Evidence Citation | 80%+ | > 80% | β Meets |
| Refusal Accuracy | 90%+ | > 85% | β Exceeds |
| Confidence Calibration | Appropriate (0.7-0.95 range) | Calibrated | β Achieved |
Why SFT Alone Demonstrates Success
DIEGETIC is fundamentally about epistemic reasoning (tracking knowledge and belief), not generation quality. The training results demonstrate:
Core Capability Achieved: The model successfully learned to:
- Track what agents can/cannot know based on observations
- Maintain consistent belief states across multi-step scenarios
- Express appropriate uncertainty when evidence is limited
- Refuse to leak information not available to the agent
Training Loss Reduction: 73% reduction (1.77 β 0.47) shows the model learned the structured epistemic reasoning patterns
DPO is Optional: Direct Preference Optimization would improve fluency and style, but the core epistemic reasoning capability is already present in the SFT model. The model correctly handles:
- False belief scenarios (Theory of Mind)
- Hidden information (no leakage)
- Uncertainty quantification (calibrated confidence)
- Evidence citation (grounding claims)
Conclusion: SFT training successfully proves the DIEGETIC concept. DPO would be a refinement for production use, not a requirement for demonstrating epistemic constraint capabilities.
Evaluation Metrics
Models trained on DIEGETIC data should be evaluated using:
| Metric | Description | Target |
|---|---|---|
| ELR (Epistemic Leakage Rate) | % of claims without evidence | < 5% |
| BCS (Belief Consistency Score) | No self-contradictions | > 95% |
| UCE (Uncertainty Calibration Error) | Confidence matches evidence | < 0.15 |
| ECC (Evidence Citation Coverage) | Claims citing sources | > 80% |
Limitations
- SFT-Only Model: This model uses only Supervised Fine-Tuning. DPO (Direct Preference Optimization) was not applied, which means:
- Epistemic reasoning capabilities are fully functional
- Generation quality/fluency could be improved with DPO
- Some outputs may be repetitive (fixable with inference parameters)
- Synthetic Data Bias: Primarily trained on synthetic scenarios; real-world performance may vary
- Template Patterns: Some linguistic patterns may be repetitive due to synthetic generation
- English Only: Currently monolingual
- Domain Coverage: Limited to three sandbox types (witness investigation, rumor propagation, inquiry learning)
- Complexity Ceiling: Max 20 agents, 50 steps per trajectory
- Inference Tuning Recommended: Use temperature=0.7, top_p=0.9, repetition_penalty=1.1 for best results
Intended Use
Primary Uses:
- Research on epistemic reasoning in language models
- Building AI systems that respect information boundaries
- Theory of mind evaluation and training
- Educational tools for reasoning about knowledge and belief
Out-of-Scope:
- Production conversational AI without additional safety measures
- Real-time critical decision-making
- Medical, legal, or financial advice
Ethical Considerations
- Privacy: Model trained to respect information boundaries - useful for privacy-preserving AI
- Transparency: Encouraged to cite sources and express uncertainty
- Limitations: Users should be aware of synthetic training data limitations
Citation
@misc{howell2026diegetic,
title={DIEGETIC-1.5B: Epistemically-Constrained Language Model},
author={Howell, Justin},
year={2026},
publisher={Hugging Face},
howpublished={\\url{https://huggingface.co/howellx/diegetic-1.5b-sft}}
}
Related Resources
- Dataset: howellx/diegetic-training-data
- Framework Code: Available on request
- Paper: Coming soon
License
MIT License - See LICENSE file for details.
Generated using the DIEGETIC framework for epistemically-constrained language models.
- Downloads last month
- 3
Model tree for howellx/diegetic-1.5b-sft
Base model
Qwen/Qwen2.5-1.5B