---
license: other
license_name: research-only
license_link: LICENSE
language:
- en
- zh
tags:
- reflexive-intelligence
- multi-reward-grpo
- cognitive-architecture
- financial-reasoning
- observer-depth
- phase-transition
- ouroboros
- mixture-of-experts
pipeline_tag: text-generation
library_name: transformers
---

# Ouroboros V24: Cognitive Architecture for Reflexive Financial Reasoning

**Ouroboros V24** is the latest iteration of a cognitive architecture designed for autonomous financial decision-making. It is built on a 35B-parameter Mixture-of-Experts (MoE) base model with ~3B active parameters and was trained through **24 iterative rounds** of multi-reward GRPO with a **54-dimensional cognitive reward topology**.

> ⚠️ **Weights are not publicly released.** This model card documents the architecture and training methodology. For research collaboration inquiries, contact the author.

## Architecture

### Base Model

- **Type**: Mixture-of-Experts (MoE)
- **Total Parameters**: ~35B
- **Active Parameters**: ~3B per token
- **Context Window**: 32K tokens
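
To illustrate how an MoE layer activates only a fraction of its parameters per token, here is a minimal top-k routing sketch. The expert count, the value of `k`, and the gate scores are illustrative assumptions; this card does not state the routing configuration of the base model.

```python
import math

def top_k_route(gate_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights.

    gate_logits: one router score per expert (hypothetical values).
    Returns (expert_index, weight) pairs whose weights sum to 1.
    """
    # Indices of the k largest logits.
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax restricted to the selected experts (numerically stabilized).
    m = max(gate_logits[i] for i in top)
    exps = [math.exp(gate_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# Hypothetical example: 8 experts, 2 active per token.
routes = top_k_route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.3, 0.2], k=2)

# Share of parameters touched per token, from the figures listed above.
active_fraction = 3 / 35
```

Only the selected experts run for a given token, which is why the active parameter count (~3B) can be far below the total (~35B).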

### Training Methodology

- **Algorithm**: R-GRPO (Reflexive Group Relative Policy Optimization)
- **Training Rounds**: 24 iterative cycles (V1 → V24)
- **Adapter Strategy**: 20-layer sequential LoRA merge chain
- **Reward Architecture**: SCRGNDWMT (9-tier, 54 sub-dimensions)
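
A sequential LoRA merge chain can be pictured as repeatedly folding a low-rank update into the dense weight, so that each round starts from the previously merged result. The sketch below uses the standard LoRA merge rule `W' = W + (alpha / r) * B @ A` on toy matrices; the helper names, shapes, and values are assumptions for illustration, not the project's actual merge code.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def merge_lora(weight, lora_a, lora_b, alpha):
    """Fold one adapter into a dense weight: W' = W + (alpha / r) * B @ A."""
    rank = len(lora_a)               # A has shape (r, in_features)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)   # B has shape (out_features, r)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

def merge_chain(weight, adapters):
    """Merge adapters in order; each step builds on the previous merged weight."""
    for lora_a, lora_b, alpha in adapters:
        weight = merge_lora(weight, lora_a, lora_b, alpha)
    return weight

# Toy example: a 2x2 weight and two rank-1 adapters (hypothetical values).
base = [[1.0, 0.0], [0.0, 1.0]]
adapters = [
    ([[1.0, 0.0]], [[0.5], [0.0]], 1.0),  # adds 0.5 to W[0][0]
    ([[0.0, 1.0]], [[0.0], [2.0]], 1.0),  # adds 2.0 to W[1][1]
]
merged = merge_chain(base, adapters)
```

In an iterative training setup, each new adapter would typically be trained on top of the already merged weights before being folded in; the sketch only shows the merge arithmetic.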

### 9-Tier Reward Topology (SCRGNDWMT)

| Tier | Name | Sub-dimensions | Description |
|------|------|----------------|-------------|
| **S** | Structure | 6 | XML formatting, JSON decision blocks |
| **C** | Content | 7 | Domain expertise, data fidelity, causal depth |
| **R** | Reasoning | 5 | Temporal-causal chains, counterfactual depth |
| **G** | Game Theory | 5 | K-level thinking, deception detection, coalition |
| **N** | Narrative | 4 | Scenario construction, debate, arc coherence |
| **D** | Data Fidelity | 3 | Numerical accuracy, source attribution |
| **W** | World Model | 6 | Regime detection, cross-market transmission, macro |
| **M** | Metacognition | 7 | Self-awareness, Bayesian confidence, falsification |
| **T** | Temporal-Causal | 5 | Causal chains, temporal depth, granularity |
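
One way to turn a tiered reward like this into the single scalar that GRPO needs is to average the sub-scores within each tier and then take a weighted mean across tiers. The sketch below does that with the tier codes and sub-dimension counts from the table; the aggregation rule, the equal default weights, and the `[0, 1]` score range are assumptions, since the card does not describe how the tiers are combined.

```python
# Tier codes, names, and sub-dimension counts copied from the table above.
TIERS = {
    "S": ("Structure", 6),
    "C": ("Content", 7),
    "R": ("Reasoning", 5),
    "G": ("Game Theory", 5),
    "N": ("Narrative", 4),
    "D": ("Data Fidelity", 3),
    "W": ("World Model", 6),
    "M": ("Metacognition", 7),
    "T": ("Temporal-Causal", 5),
}

def aggregate_reward(sub_scores, tier_weights=None):
    """Combine per-tier sub-scores into one scalar reward.

    sub_scores: dict mapping tier code -> list of scores in [0, 1],
    one per sub-dimension. Each tier is averaged, then tiers are
    combined with a weighted mean (equal weights unless overridden).
    """
    weights = tier_weights or {code: 1.0 for code in TIERS}
    total, norm = 0.0, 0.0
    for code, (name, n_dims) in TIERS.items():
        scores = sub_scores[code]
        if len(scores) != n_dims:
            raise ValueError(f"{name} expects {n_dims} scores, got {len(scores)}")
        total += weights[code] * sum(scores) / n_dims
        norm += weights[code]
    return total / norm

# Hypothetical completions: one perfect, one scoring zero on every tier but S.
perfect = {code: [1.0] * n for code, (_, n) in TIERS.items()}
only_structure = {code: [1.0 if code == "S" else 0.0] * n
                  for code, (_, n) in TIERS.items()}
```

Averaging within a tier first keeps tiers with many sub-dimensions from dominating the total; a flat sum over all sub-dimensions would be an equally plausible design.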

### V24 Upgrades (from V22)

- **C7 (CausalChainDepthV2)**: Multi-step causal chains with time-lag annotations
- **M7 (BayesianConfidence)**: Calibrated confidence field in JSON decisions
- **W3 (CrossMarketPath)**: Structural contagion paths (Market A → Mechanism → Market B)
- **M5 (FalsificationV2)**: Quantitative, price-based invalidation conditions
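
To make the M7 and M5 items concrete, here is a sketch of what checking a JSON decision block for a confidence field and price-based invalidation conditions could look like. The field names (`action`, `confidence`, `invalidation`, `instrument`, `operator`, `price`) and the ticker `XYZ` are hypothetical; the card does not publish the decision schema.

```python
import json

def validate_decision(raw):
    """Parse a JSON decision block and check confidence and invalidation fields.

    All field names are hypothetical placeholders for illustration.
    """
    decision = json.loads(raw)
    conf = decision["confidence"]
    is_number = isinstance(conf, (int, float)) and not isinstance(conf, bool)
    if not is_number or not 0.0 <= conf <= 1.0:
        raise ValueError("confidence must be a number in [0, 1]")
    conditions = decision["invalidation"]
    if not conditions:
        raise ValueError("at least one invalidation condition is required")
    for cond in conditions:
        if cond["operator"] not in ("<", ">"):
            raise ValueError("operator must be '<' or '>'")
        if not isinstance(cond["price"], (int, float)):
            raise ValueError("price must be numeric")
    return decision

def is_invalidated(decision, prices):
    """Return True if any price-based invalidation condition is met."""
    for cond in decision["invalidation"]:
        price = prices[cond["instrument"]]
        if cond["operator"] == "<" and price < cond["price"]:
            return True
        if cond["operator"] == ">" and price > cond["price"]:
            return True
    return False

example = json.dumps({
    "action": "hold",
    "confidence": 0.62,
    "invalidation": [{"instrument": "XYZ", "operator": "<", "price": 95.0}],
})
decision = validate_decision(example)
```

A quantitative condition of this kind can be checked mechanically against later prices, which is what distinguishes it from a free-text caveat.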

### Key Training Parameters

| Parameter | Value |
|-----------|-------|
| Learning rate | 5 × 10⁻⁷ |
| Group size | 12 |
| Max completion tokens | 1000 |
| Temperature | 1.15 |
| β-annealing | Stable (β=0.05) ↔ Break-up (β=0.03) |
| LoRA rank | ≥ 10 |
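
For readers unfamiliar with GRPO, the group size and β values above enter the algorithm roughly as follows: each prompt is sampled 12 times, rewards are normalized within that group to form advantages, and β scales the KL penalty against the reference policy. The sketch below shows the standard group-relative advantage; the R-GRPO variant used here may differ, and the phase-switching helper is an assumption because the card does not say when β changes.

```python
import statistics

GROUP_SIZE = 12        # completions sampled per prompt (from the table)
BETA_STABLE = 0.05     # KL coefficient in the "Stable" phase (from the table)
BETA_BREAKUP = 0.03    # KL coefficient in the "Break-up" phase (from the table)

def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO advantage: (r_i - mean(r)) / (std(r) + eps) within one group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

def kl_beta(phase):
    """Look up the KL coefficient for a named phase (switching rule assumed)."""
    return {"stable": BETA_STABLE, "break-up": BETA_BREAKUP}[phase]

# Hypothetical scalar rewards for one group of 12 completions.
rewards = [0.2, 0.4, 0.4, 0.5, 0.5, 0.6, 0.6, 0.6, 0.7, 0.8, 0.8, 0.9]
advantages = group_relative_advantages(rewards)
```

Because advantages are centered within each group, a completion is rewarded only for beating its siblings on the same prompt, not for an absolute score.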

## Key Results

### Reflexive Intelligence Emergence

During V17 training, reflexive reasoning emerged through a **discontinuous phase transition** at Step 153: after 150+ steps of zero reflexivity scores, the capability appeared spontaneously and persisted through subsequent steps. This is documented in Papers 1-3 of the research program.
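
A discontinuous onset of this kind can be located in a logged score series by looking for the first step at which the score rises above zero and stays there. The sketch below is one simple way to do that; the `hold` window, the threshold, and the synthetic series are assumptions for illustration and are not the V17 training log.

```python
def find_onset(scores, threshold=0.0, hold=5):
    """Return the first index where scores exceed `threshold` for `hold`
    consecutive steps, or None if no sustained run exists."""
    run_start = None
    for step, score in enumerate(scores):
        if score > threshold:
            if run_start is None:
                run_start = step
            if step - run_start + 1 >= hold:
                return run_start
        else:
            # A drop back to the threshold resets the candidate onset.
            run_start = None
    return None

# Synthetic series: flat zeros, then a sustained jump at index 10.
synthetic = [0.0] * 10 + [0.4, 0.5, 0.45, 0.5, 0.55, 0.6]
onset = find_onset(synthetic)
```

Requiring a sustained run filters out isolated non-zero scores, so a single lucky completion is not mistaken for the emergence of the capability.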

### V24 Training (ongoing)

- **54-dimensional reward** actively guiding cognitive development
- **Bayesian confidence calibration** observed from Step 18
- **Cross-market causal reasoning** emerging by Step 25
- **Zero gradient failures** through 55+ steps

## Research Program

This model is part of a six-paper research program:

| Paper | Title | DOI |
|-------|-------|-----|
| P1 | Reflexive Intelligence in LLMs | [10.5281/zenodo.19557261](https://doi.org/10.5281/zenodo.19557261) |
| P2 | Observer Depth (ReflexBench) | [10.5281/zenodo.19627242](https://doi.org/10.5281/zenodo.19627242) |
| P3 | When Rewards Collide (Multi-Reward GRPO) | [10.5281/zenodo.19665969](https://doi.org/10.5281/zenodo.19665969) |
| P4 | Ouroboros V22 Architecture | [10.5281/zenodo.19666786](https://doi.org/10.5281/zenodo.19666786) |
| P5 | The Cognitive Lifecycle | [10.5281/zenodo.19666806](https://doi.org/10.5281/zenodo.19666806) |
| P6 | Cognitive Reward Topology | [10.5281/zenodo.19666829](https://doi.org/10.5281/zenodo.19666829) |

## Related Resources

| Resource | Link |
|----------|------|
| **ReflexBench Dataset** | [MMJBDS/reflexbench](https://huggingface.co/datasets/MMJBDS/reflexbench) |
| **ReflexBench Eval Results** | [MMJBDS/reflexbench-eval](https://huggingface.co/datasets/MMJBDS/reflexbench-eval) |
| **Papers Repository** | [github.com/mmjbds/ouroboros-papers](https://github.com/mmjbds/ouroboros-papers) |
| **Evaluation Code** | [github.com/mmjbds/reflexbench](https://github.com/mmjbds/reflexbench) |

## Citation

```bibtex
@article{zhang2026ouroborosv22,
  title={Ouroboros V22: Bayesian Scenario Simulation and Recurrent Depth Cognition},
  author={Zhang, Mian},
  year={2026},
  doi={10.5281/zenodo.19666786}
}
@article{zhang2026topology,
  title={Cognitive Reward Topology: A Nine-Tier Architecture for Multi-Reward GRPO},
  author={Zhang, Mian},
  year={2026},
  doi={10.5281/zenodo.19666829}
}
```

## Author

- **Mian Zhang**, Independent AI Researcher
- **ORCID**: [0009-0001-9556-3839](https://orcid.org/0009-0001-9556-3839)
- **Email**: 373743743@qq.com
- **GitHub**: [@mmjbds](https://github.com/mmjbds)
- **Twitter/X**: [@Henry_Avery666](https://x.com/Henry_Avery666)
- **LinkedIn**: [henryavery-mianzhang](https://linkedin.com/in/henryavery-mianzhang)

## License

This model card is released under CC BY 4.0. Model weights are not publicly available.