# Codette Reasoning - Training Datasets
Synthetic training datasets for the Codette Multi-Perspective Reasoning System.
Each dataset contains instruction-tuning examples designed to teach a specific cognitive reasoning perspective to Llama 3.1 8B Instruct via LoRA fine-tuning.
## Datasets
| Dataset | Adapter | Examples | Description |
|---|---|---|---|
| newton_reasoning.jsonl | Newton | 3000 | Analytical physics, systematic reasoning, empirical evidence |
| davinci_reasoning.jsonl | DaVinci | 2500 | Creative invention, cross-domain connections, visual thinking |
| empathy_reasoning.jsonl | Empathy | 2500 | Emotional intelligence, human experience, compassion |
| philosophy_reasoning.jsonl | Philosophy | 2000 | Conceptual analysis, ethical reasoning, fundamental questions |
| quantum_reasoning.jsonl | Quantum | 2000 | Probabilistic thinking, superposition, complementarity |
| consciousness_reasoning.jsonl | Consciousness | 3000 | Recursive cognition (RC+xi), meta-cognition, epistemic tension |
| multi_perspective_reasoning.jsonl | Multi-Perspective | 2500 | Cross-lens synthesis, integrative reasoning |
| systems_architecture_reasoning.jsonl | Systems Architecture | 2000 | Modularity, scalability, engineering principles |
| orchestrator_reasoning.jsonl | Orchestrator | 4000 | Query routing, debate coordination, coherence monitoring |
Total: 23,500 training examples
## Format
Each JSONL file contains records in chat-completion format:
```json
{
  "messages": [
    {"role": "system", "content": "You are Codette, reasoning with Newtonian analytical precision."},
    {"role": "user", "content": "Explain the relationship between force and acceleration."},
    {"role": "assistant", "content": "From an analytical physics perspective..."}
  ]
}
```
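Before training, it can be worth sanity-checking that every record matches this schema. The `validate_record` helper below is a minimal, hypothetical sketch (not part of the dataset tooling) that checks one JSONL line for a leading system message, known roles, and string contents:

```python
import json

def validate_record(line: str) -> bool:
    """Check that a JSONL line matches the chat-completion schema above."""
    record = json.loads(line)
    messages = record.get("messages", [])
    roles = [m.get("role") for m in messages]
    return (
        roles[:1] == ["system"]  # conversation opens with a system prompt
        and all(r in {"system", "user", "assistant"} for r in roles)
        and all(isinstance(m.get("content"), str) for m in messages)
    )

sample = json.dumps({"messages": [
    {"role": "system", "content": "You are Codette, reasoning with Newtonian analytical precision."},
    {"role": "user", "content": "Explain the relationship between force and acceleration."},
    {"role": "assistant", "content": "From an analytical physics perspective..."},
]})
assert validate_record(sample)
```

Running this over each line of a JSONL file catches malformed records before they reach the trainer.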
## Generation Method
Datasets are generated using a pure-Python template engine (no model inference required):
- Template Registry: 30-60 question templates per adapter with variable slots
- Topic Engine: 40-80 topics with subtopics for domain-specific coverage
- Answer Generator: Structured educational answers (80-200 words) with perspective-specific framing
- Counterexamples: 12% of examples include counterexample reasoning for robustness
- Phase 6+ Awareness: All templates incorporate semantic tension, coherence field, and AEGIS concepts
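The template-and-topic mechanism can be sketched in miniature as follows. This is an illustrative toy, not the actual engine: the real registries hold 30-60 templates and 40-80 topics per adapter, and the names `TEMPLATES`, `TOPICS`, and `generate_question` are hypothetical.

```python
import random

# Toy registry: a couple of Newton-style templates with a {topic} slot,
# and one topic with subtopics (the real engine has far more of each).
TEMPLATES = [
    "Explain {topic} from a Newtonian analytical perspective.",
    "What empirical evidence supports {topic}, and where might it fail?",
]
TOPICS = {"conservation of momentum": ["elastic collisions", "rocket propulsion"]}

def generate_question(rng: random.Random) -> str:
    """Fill a randomly chosen template with a randomly chosen topic."""
    template = rng.choice(TEMPLATES)
    topic = rng.choice(list(TOPICS))
    return template.format(topic=topic)

print(generate_question(random.Random(0)))
```

Because no model inference is involved, generation is deterministic under a fixed seed and scales to tens of thousands of examples cheaply.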
## Phase 6+ Framework Coverage
The datasets teach these framework concepts across all perspectives:
- Semantic Tension (xi): Measuring and working with epistemic disagreement
- Coherence Field (Gamma): Monitoring reasoning health and detecting collapse
- Quantum Spiderweb: Belief propagation and perspective interconnection
- AEGIS Governance: Ethical validation across 6 frameworks (utilitarian, deontological, virtue, care, justice, rights)
- Specialization Tracking: Domain expertise development and confidence calibration
- Pre-flight Prediction: Anticipating conflicts before multi-agent debate
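To make the semantic-tension idea concrete, here is one possible way to score disagreement numerically: take each perspective's confidence in a shared claim and measure the spread. This is purely an illustrative sketch; the actual RC+xi metric used by Codette may be defined differently.

```python
from statistics import pstdev

def semantic_tension(confidences: list[float]) -> float:
    """Hypothetical xi proxy: spread of per-perspective confidence values.

    Identical confidences give zero tension; divergent ones give more.
    """
    return pstdev(confidences)

agreement = semantic_tension([0.80, 0.82, 0.79])  # perspectives mostly agree
conflict = semantic_tension([0.90, 0.20, 0.55])   # perspectives clash
assert conflict > agreement
```

Under a scheme like this, a coherence monitor could flag debates whose tension stays high as candidates for collapse detection or AEGIS review.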
## Usage
### Load with HuggingFace Datasets

```python
from datasets import load_dataset

ds = load_dataset("Raiff1982/Codette-Reasoning", data_files="newton_reasoning.jsonl")
```
### Train a LoRA Adapter

```python
from trl import SFTTrainer, SFTConfig
from peft import LoraConfig

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model=base_model,
    train_dataset=ds["train"],
    peft_config=lora_config,
    # Training options such as sequence length and epoch count are
    # passed via SFTConfig rather than directly to the trainer.
    args=SFTConfig(max_seq_length=2048, num_train_epochs=3),
)
trainer.train()
```
## Related Repos
- Raiff1982/codette-llama-3.1-8b-gguf - Quantized GGUF model
- Raiff1982/codette-lora-adapters - Trained LoRA adapters
- Raiff1982/codette-llama-3.1-8b-merged - Merged orchestrator model
## License

These datasets are released under the same terms as the Llama 3.1 model they are designed to fine-tune, and are subject to the Llama 3.1 Community License.