title: README
emoji: 💻
colorFrom: pink
colorTo: red
sdk: static
pinned: false
license: mit
short_description: 'ISAAC OS — Neural v1 (Agentic-Lite deterministic evaluation '
ISAAC OS — Neural v1 (Deterministic Evaluation, Agentic-Lite)
Organization: Isaac-AI-OS
Model ID: isaac-20b
Policy Version: agentic-lite-v1
Artifacts Dataset: isaac-20b-eval-artifacts
Docker Digest: isaac-hf@sha256:6fc9f0d85dfe56daba8fc92496718226f056014b3e84ee7a823df1d9271a57c0
ISAAC is a self-verifying neural operating system designed for reproducible, auditable AI.
The Agentic-Lite evaluation mode enforces deterministic sampling (temperature=0, top_p=0, seed=7) and code-only normalization, producing byte-identical artifacts across runs.
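
As a rough illustration of the fixed sampling settings above, the sketch below builds a completion request with `temperature=0`, `top_p=0`, and `seed=7`, and serializes it with a stable key order. The `build_request` helper and the `model`/`prompt` field names are assumptions for illustration only; the actual ISAAC request schema is not described here.

```python
import json

# Sampling parameters quoted in the text above.
DETERMINISTIC_SAMPLING = {"temperature": 0, "top_p": 0, "seed": 7}


def build_request(prompt: str, model: str = "isaac-20b") -> str:
    """Serialize a completion request with the fixed sampling parameters.

    The "model" and "prompt" field names are hypothetical; only the
    sampling values come from the README.
    """
    payload = {"model": model, "prompt": prompt, **DETERMINISTIC_SAMPLING}
    # Sorted keys and fixed separators keep the serialized bytes stable
    # across runs, so identical inputs give identical request bodies.
    return json.dumps(payload, sort_keys=True, separators=(",", ":"))


if __name__ == "__main__":
    print(build_request("def add(a, b):"))
```

Serializing with sorted keys is a small design choice that removes one source of byte-level drift on the client side; it does not by itself guarantee deterministic model output.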
🔍 Current Subset Results
| Benchmark | Split | Metric | Score |
|---|---|---|---|
| HumanEval | N=5 | pass@1 | 0.60 |
| MBPP | N=5 | pass@1 | 0.80 |
| SWE-Bench Lite | N=1 | resolved (via fallback_dataset_patch) | 1 / 1 |
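
To make the subset scores concrete: assuming a single deterministic sample per problem, pass@1 reduces to the fraction of problems whose one completion passes its tests. Under that assumption, 0.60 on N=5 corresponds to 3 of 5 problems and 0.80 to 4 of 5. The `pass_at_1` helper below is a hypothetical sketch, not the project's evaluation code.

```python
def pass_at_1(passed: int, total: int) -> float:
    """Fraction of problems solved with one sample per problem."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    return passed / total


if __name__ == "__main__":
    # Counts implied by the table above under the one-sample assumption.
    print("HumanEval:", pass_at_1(3, 5))
    print("MBPP:", pass_at_1(4, 5))
```

With only five problems per benchmark, each problem moves the score by 0.20, so these numbers are best read as smoke-test results rather than full-benchmark estimates.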
🧩 Reproducibility
- Deterministic “Agentic-Lite” mode: single plan, no concurrency, fixed seeds.
- Evaluation artifacts (LM, Code, SWE) are all published for cross-verification.
- Manifest pinned to Docker digest above for full audit trace.
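
One way to cross-verify published artifacts is to compare SHA-256 digests of the files from two runs; byte-identical artifacts produce identical digests. The sketch below assumes artifacts are available as in-memory bytes keyed by file name. The `diff_runs` helper and the example file names are hypothetical and are not part of the published tooling.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def diff_runs(run_a: dict[str, bytes], run_b: dict[str, bytes]) -> list[str]:
    """List artifact names that are missing from one run or differ in bytes."""
    mismatched = []
    for name in sorted(set(run_a) | set(run_b)):
        if name not in run_a or name not in run_b:
            mismatched.append(name)
        elif sha256_hex(run_a[name]) != sha256_hex(run_b[name]):
            mismatched.append(name)
    return mismatched


if __name__ == "__main__":
    # Hypothetical artifact names and contents, for illustration only.
    first = {"code_eval.json": b'{"pass@1": 0.6}'}
    second = {"code_eval.json": b'{"pass@1": 0.6}'}
    print(diff_runs(first, second))
```

An empty result means every artifact matched byte for byte; any name in the list is a starting point for the audit.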
📂 Artifacts
💡 Roadmap
- Add `logprobs` to `/v1/completions` → full MC reasoning (MMLU/ARC/HellaSwag).
- Enable kernel-level determinism for multi-node HA.
- Publish “Agentic Swarm” uplift appendix once replay bundles are live.