Codette Reasoning - Training Datasets

Synthetic training datasets for the Codette Multi-Perspective Reasoning System.

Each dataset contains instruction-tuning examples designed to teach a specific cognitive reasoning perspective to Llama 3.1 8B Instruct via LoRA fine-tuning.

Datasets

Dataset                               Adapter               Examples  Description
newton_reasoning.jsonl                Newton                3000      Analytical physics, systematic reasoning, empirical evidence
davinci_reasoning.jsonl               DaVinci               2500      Creative invention, cross-domain connections, visual thinking
empathy_reasoning.jsonl               Empathy               2500      Emotional intelligence, human experience, compassion
philosophy_reasoning.jsonl            Philosophy            2000      Conceptual analysis, ethical reasoning, fundamental questions
quantum_reasoning.jsonl               Quantum               2000      Probabilistic thinking, superposition, complementarity
consciousness_reasoning.jsonl         Consciousness         3000      Recursive cognition (RC+xi), meta-cognition, epistemic tension
multi_perspective_reasoning.jsonl     Multi-Perspective     2500      Cross-lens synthesis, integrative reasoning
systems_architecture_reasoning.jsonl  Systems Architecture  2000      Modularity, scalability, engineering principles
orchestrator_reasoning.jsonl          Orchestrator          4000      Query routing, debate coordination, coherence monitoring

Total: 23,500 training examples (the sum of the per-dataset counts listed above)

Format

Each JSONL file contains records in chat-completion format:

{
  "messages": [
    {"role": "system", "content": "You are Codette, reasoning with Newtonian analytical precision."},
    {"role": "user", "content": "Explain the relationship between force and acceleration."},
    {"role": "assistant", "content": "From an analytical physics perspective..."}
  ]
}
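A record in this shape can be checked before training with a few lines of standard-library Python. The sketch below is illustrative only: the `validate_record` helper and the rule that every record is exactly system, user, assistant in that order are assumptions drawn from the example above, not a documented schema.

```python
import json

# Assumed turn order, taken from the example record above (not a documented schema).
EXPECTED_ROLES = ["system", "user", "assistant"]


def validate_record(line: str) -> dict:
    """Parse one JSONL line and check that it looks like the example record."""
    record = json.loads(line)
    messages = record["messages"]
    roles = [m["role"] for m in messages]
    if roles != EXPECTED_ROLES:
        raise ValueError(f"unexpected role sequence: {roles}")
    for m in messages:
        # Every turn should carry non-empty text.
        if not isinstance(m["content"], str) or not m["content"].strip():
            raise ValueError(f"empty content for role {m['role']!r}")
    return record


sample = json.dumps({
    "messages": [
        {"role": "system", "content": "You are Codette, reasoning with Newtonian analytical precision."},
        {"role": "user", "content": "Explain the relationship between force and acceleration."},
        {"role": "assistant", "content": "From an analytical physics perspective..."},
    ]
})
record = validate_record(sample)
print(len(record["messages"]))
```

To check a whole file, call `validate_record` on each non-empty line and collect the line numbers that raise.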

Generation Method

Datasets are generated using a pure-Python template engine (no model inference required):

  1. Template Registry: 30-60 question templates per adapter with variable slots
  2. Topic Engine: 40-80 topics with subtopics for domain-specific coverage
  3. Answer Generator: Structured educational answers (80-200 words) with perspective-specific framing
  4. Counterexamples: 12% of examples include counterexample reasoning for robustness
  5. Phase 6+ Awareness: All templates incorporate semantic tension, coherence field, and AEGIS concepts
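The steps above can be pictured with a toy generator. This is a minimal sketch under stated assumptions, not the project's actual engine: the templates, topics, answer wording, and the `make_example` helper are all invented for illustration, and the stub answers are far shorter than the 80-200 words the real generator targets. Only the 12% counterexample rate and the record layout come from this document.

```python
import random

# Hypothetical miniature of the pipeline; templates, topics, and wording are
# invented for illustration and are not taken from the actual generator.
SYSTEM_PROMPT = "You are Codette, reasoning with Newtonian analytical precision."
TEMPLATES = [
    "Explain the relationship between {topic} and {subtopic}.",
    "How would you analyze {subtopic} when studying {topic}?",
]
TOPICS = {
    "force": ["acceleration", "friction"],
    "energy": ["conservation", "entropy"],
}
COUNTEREXAMPLE_RATE = 0.12  # share of examples that add counterexample reasoning


def make_example(rng: random.Random) -> dict:
    """Fill one question template and build a chat-format record."""
    topic = rng.choice(sorted(TOPICS))
    subtopic = rng.choice(TOPICS[topic])
    question = rng.choice(TEMPLATES).format(topic=topic, subtopic=subtopic)
    answer = f"From an analytical physics perspective, {topic} and {subtopic} are linked by ..."
    if rng.random() < COUNTEREXAMPLE_RATE:
        answer += " A counterexample worth testing is ..."
    return {"messages": [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
        {"role": "assistant", "content": answer},
    ]}


rng = random.Random(0)  # fixed seed so repeated runs produce the same examples
examples = [make_example(rng) for _ in range(1000)]
with_counter = sum("counterexample" in ex["messages"][2]["content"].lower() for ex in examples)
print(f"{with_counter / len(examples):.1%} of examples include a counterexample")
```

Because every step is a dictionary lookup or a string format, a generator of this kind runs without a GPU or any model inference, which matches the "pure-Python" claim above.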

Phase 6+ Framework Coverage

The datasets teach these framework concepts across all perspectives:

  • Semantic Tension (xi): Measuring and working with epistemic disagreement
  • Coherence Field (Gamma): Monitoring reasoning health and detecting collapse
  • Quantum Spiderweb: Belief propagation and perspective interconnection
  • AEGIS Governance: Ethical validation across 6 frameworks (utilitarian, deontological, virtue, care, justice, rights)
  • Specialization Tracking: Domain expertise development and confidence calibration
  • Pre-flight Prediction: Anticipating conflicts before multi-agent debate

Usage

Load with HuggingFace Datasets

from datasets import load_dataset

ds = load_dataset("Raiff1982/Codette-Reasoning", data_files="newton_reasoning.jsonl")
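To work with several adapters at once, `load_dataset` also accepts a dictionary for `data_files`, in which case each key becomes a split name. The sketch below assumes the file names listed in the Datasets table; the short split names and the `load_all_adapters` helper are hypothetical choices made for this example.

```python
# File names as listed in the Datasets table; the keys are hypothetical split names.
ADAPTER_FILES = {
    "newton": "newton_reasoning.jsonl",
    "davinci": "davinci_reasoning.jsonl",
    "empathy": "empathy_reasoning.jsonl",
    "philosophy": "philosophy_reasoning.jsonl",
    "quantum": "quantum_reasoning.jsonl",
    "consciousness": "consciousness_reasoning.jsonl",
    "multi_perspective": "multi_perspective_reasoning.jsonl",
    "systems_architecture": "systems_architecture_reasoning.jsonl",
    "orchestrator": "orchestrator_reasoning.jsonl",
}


def load_all_adapters(repo_id: str = "Raiff1982/Codette-Reasoning"):
    """Load every adapter file as its own split of a single DatasetDict."""
    from datasets import load_dataset  # imported lazily; the call needs network access
    return load_dataset(repo_id, data_files=ADAPTER_FILES)
```

With this mapping, `load_all_adapters()["newton"]` would hold the same records that the single-file call above exposes as `ds["train"]`.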

Train a LoRA Adapter

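from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer
from peft import LoraConfig

# Base model named in the introduction. The Hub ID is an assumption and the repo is gated.
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# LoRA on the attention projections only
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Training hyperparameters go in SFTConfig rather than in SFTTrainer itself.
# Recent TRL releases name the length limit max_length; older ones use max_seq_length.
training_args = SFTConfig(
    output_dir="codette-newton-lora",  # hypothetical output path
    max_length=2048,
    num_train_epochs=3,
)

trainer = SFTTrainer(
    model=base_model,
    args=training_args,
    train_dataset=ds["train"],  # ds comes from the loading example above
    peft_config=lora_config,
)
trainer.train()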

Related Repos

License

The datasets are released under the same terms as the Llama 3.1 model they are designed to fine-tune, and are therefore subject to the Llama 3.1 Community License.
