petermant committed on
Commit 98cddd7 · verified · 1 Parent(s): 05fdb3b

Update README.md

  1. README.md +39 -1
README.md CHANGED
@@ -5,6 +5,44 @@ colorFrom: pink
  colorTo: red
  sdk: static
  pinned: false
+ license: mit
+ short_description: 'ISAAC OS — Neural v1 (Agentic-Lite deterministic evaluation '
  ---
+ # ISAAC OS — Neural v1 (Deterministic Evaluation, Agentic-Lite)
 
- Edit this `README.md` markdown file to author your organization card.
+ **Organization:** [Isaac-AI-OS](https://huggingface.co/Isaac-AI-OS)
+ **Model ID:** [`isaac-20b`](https://huggingface.co/Isaac-AI-OS/isaac-20b)
+ **Policy Version:** `agentic-lite-v1`
+ **Artifacts Dataset:** [`isaac-20b-eval-artifacts`](https://huggingface.co/datasets/Isaac-AI-OS/isaac-20b-eval-artifacts)
+ **Docker Digest:** `isaac-hf@sha256:6fc9f0d85dfe56daba8fc92496718226f056014b3e84ee7a823df1d9271a57c0`
+
+ ---
+
+ ISAAC is a **self-verifying neural operating system** designed for reproducible, auditable AI.
+ The *Agentic-Lite* evaluation mode enforces deterministic sampling (`temperature=0`, `top_p=0`, `seed=7`) and code-only normalization, producing byte-identical artifacts across runs.
+
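The sampling settings named in the paragraph above can be written down as a request body. The sketch below is illustrative only: `build_request`, the `prompt` and `model` fields, and the overall payload layout are assumptions modeled on common completions-style APIs, not ISAAC's confirmed interface. Only `temperature=0`, `top_p=0`, and `seed=7` come from the README.

```python
import json

# Sampling settings quoted in the README; everything else here is hypothetical.
DETERMINISTIC_SAMPLING = {"temperature": 0, "top_p": 0, "seed": 7}


def build_request(prompt: str, model: str = "isaac-20b") -> str:
    """Serialize a completions-style request body with fixed sampling settings.

    Keys are sorted and separators fixed so the same prompt always yields
    the same bytes, which makes the request itself easy to hash or diff.
    """
    payload = {"model": model, "prompt": prompt, **DETERMINISTIC_SAMPLING}
    return json.dumps(payload, sort_keys=True, separators=(",", ":"))


body = build_request("def add(a, b):")
print(body)
```

Fixing the serialization as well as the sampling parameters is a design choice for this sketch: it removes one more source of run-to-run variation when requests are logged alongside results.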
+ ### 🔍 Current Subset Results
+
+ | Benchmark | Split | Metric | Score |
+ |------------|--------|---------|------:|
+ | **HumanEval** | N=5 | pass@1 | **0.60** |
+ | **MBPP** | N=5 | pass@1 | **0.80** |
+ | **SWE-Bench Lite** | 1/1 instance resolved | resolved via fallback_dataset_patch | **1 / 1** |
+
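As a reading aid for the table: assuming one deterministic sample per problem (which the fixed-seed setup suggests, though the README does not state it), pass@1 is simply the fraction of problems whose generated solution passes its tests. The helper `pass_at_1` is hypothetical, and the outcome lists are made up to match the reported fractions rather than taken from the published artifacts.

```python
def pass_at_1(passed: list[bool]) -> float:
    """Fraction of problems solved when each problem gets exactly one sample."""
    if not passed:
        raise ValueError("need at least one problem")
    return sum(passed) / len(passed)


# Illustrative outcomes only: 3 of 5 and 4 of 5 reproduce the reported scores.
humaneval_like = [True, True, True, False, False]
mbpp_like = [True, True, True, True, False]

print(f"{pass_at_1(humaneval_like):.2f}")
print(f"{pass_at_1(mbpp_like):.2f}")
```

With N=5, each problem moves the score by 0.20, so these subset numbers are coarse and should be read as a smoke test rather than a full benchmark result.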
+ ---
+
+ ### 🧩 Reproducibility
+ - Deterministic “Agentic-Lite” mode: single plan, no concurrency, fixed seeds.
+ - Evaluation artifacts (LM, Code, SWE) are all published for cross-verification.
+ - Manifest pinned to Docker digest above for full audit trace.
+
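One way to check the byte-identical claim is to hash every file produced by two separate runs and compare the digests. This is a minimal sketch under stated assumptions: the function names and the idea of one directory per run are hypothetical, and the README does not describe how ISAAC itself performs this comparison.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digests(run_dir: Path) -> dict[Path, str]:
    """Map each file's path (relative to run_dir) to its SHA-256 digest."""
    return {
        p.relative_to(run_dir): sha256_of(p)
        for p in run_dir.rglob("*")
        if p.is_file()
    }


def runs_match(run_a: Path, run_b: Path) -> bool:
    """True if both run directories hold the same files with identical bytes."""
    return digests(run_a) == digests(run_b)
```

Calling `runs_match(Path("run1"), Path("run2"))` on two artifact directories (hypothetical names) returns `False` if any file differs by even one byte or is missing from either side.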
+ ### 📂 Artifacts
+ - [LM normalized results](https://huggingface.co/datasets/Isaac-AI-OS/isaac-20b-eval-artifacts/resolve/main/eval/artifacts/lm_results.norm.json)
+ - [Code benchmarks](https://huggingface.co/datasets/Isaac-AI-OS/isaac-20b-eval-artifacts/resolve/main/eval/artifacts/code/summary.json)
+ - [SWE-Bench Lite results](https://huggingface.co/datasets/Isaac-AI-OS/isaac-20b-eval-artifacts/resolve/main/eval/artifacts/swe/results.json)
+ - [Manifest](https://huggingface.co/datasets/Isaac-AI-OS/isaac-20b-eval-artifacts/resolve/main/eval/artifacts/manifest.json)
+
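The four links above share one URL prefix, so fetching them programmatically takes only a small helper. The sketch below uses the Python standard library; `artifact_url` and `fetch_json` are hypothetical names, and no assumption is made about the JSON schema of any artifact since the README does not document it.

```python
import json
from urllib.request import urlopen

# Common prefix of the artifact links listed above.
BASE = (
    "https://huggingface.co/datasets/Isaac-AI-OS/"
    "isaac-20b-eval-artifacts/resolve/main/eval/artifacts"
)


def artifact_url(relative_path: str) -> str:
    """Build the download URL for one artifact under eval/artifacts/."""
    return f"{BASE}/{relative_path.lstrip('/')}"


def fetch_json(relative_path: str, timeout: float = 30.0):
    """Download and parse one JSON artifact (needs network access)."""
    with urlopen(artifact_url(relative_path), timeout=timeout) as resp:
        return json.load(resp)


print(artifact_url("manifest.json"))
```

For example, `fetch_json("code/summary.json")` would retrieve the code benchmark summary, provided the dataset is publicly reachable.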
+ ### 💡 Roadmap
+ - Add `logprobs` to `/v1/completions` → full MC reasoning (MMLU/ARC/HellaSwag).
+ - Enable kernel-level determinism for multi-node HA.
+ - Publish “Agentic Swarm” uplift appendix once replay bundles are live.
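On the first roadmap item: multiple-choice benchmarks are commonly scored by comparing the log-likelihood the model assigns to each answer option, which is why `logprobs` support matters. The sketch below shows that selection step only. `pick_choice` is a hypothetical helper, the numbers are made up, and how `/v1/completions` will eventually return log-probabilities is not specified in the README.

```python
def pick_choice(
    token_logprobs: dict[str, list[float]],
    length_normalize: bool = False,
) -> str:
    """Return the option label whose continuation has the highest log-likelihood.

    token_logprobs maps an option label to the per-token log-probabilities
    of that option's text given the question.
    """

    def score(lps: list[float]) -> float:
        total = sum(lps)
        # Optionally average per token so longer options are not penalized.
        return total / len(lps) if length_normalize and lps else total

    return max(token_logprobs, key=lambda label: score(token_logprobs[label]))


# Made-up numbers purely for illustration.
example = {"A": [-2.1, -0.7], "B": [-0.3, -0.4], "C": [-1.5, -1.2, -0.1]}
print(pick_choice(example))
```

Whether to length-normalize is a scoring decision that changes results when options differ in token count, so it would need to be pinned in the evaluation policy alongside the sampling settings.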