walidsobhie-code committed on
Commit 3bc915b · 1 Parent(s): 2088481

Update README: add requested badges (Apache 2.0, OpenRouter, Hugging Face, HumanEval, MBPP) and acknowledge sub-agent enhancements

Files changed (1)
  1. README.md +7 -1
README.md CHANGED
@@ -1,6 +1,10 @@
  <p align="center">
  <img src="https://img.shields.io/github/stars/my-ai-stack/stack-2.9" alt="Stars">
- <img src="https://img.shields.io/github/license/my-ai-stack-stack-2.9" alt="License">
+ <img src="https://img.shields.io/github/license/my-ai-stack/stack-2.9?logo=apache" alt="License: Apache 2.0">
+ <img src="https://img.shields.io/badge/OpenRouter-Supported-green?logo=openrouter" alt="OpenRouter">
+ <img src="https://img.shields.io/badge/Hugging%20Face-Model-green?logo=huggingface" alt="Hugging Face">
+ <img src="https://img.shields.io/badge/HumanEval-Evaluation%20In%20Progress-yellow?logo=python" alt="HumanEval">
+ <img src="https://img.shields.io/badge/MBPP-Evaluation%20In%20Progress-yellow?logo=python" alt="MBPP">
  <img src="https://img.shields.io/python version/3.10+-blue" alt="Python">
  <img src="https://img.shields.io/discord" alt="Discord">
  </p>
@@ -44,6 +48,8 @@ These scores were therefore **unverifiable** and potentially misleading.
 
  We are rebuilding the evaluation infrastructure with proper methodology:
 
+ **🔬 Recent Enhancement**: This release's documentation improvements, OpenRouter integration, tool system documentation, and evaluation audit were performed by autonomous sub-agents. See [EVALUATION.md](EVALUATION.md) for details.
+
  1. **Official datasets**: HumanEval (164 problems), MBPP (500 problems)
  2. **Reproducible runs**: Full logs, config files, and per-problem results
  3. **Standard metrics**: Pass@1 with confidence intervals, using k≥100 samples
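
The "Standard metrics" item in the hunk above can be illustrated with a minimal sketch. It assumes each problem is sampled `n` times and `c` of those samples pass the tests, and it uses the unbiased pass@k estimator introduced with HumanEval, `1 - C(n-c, k) / C(n, k)`. The function names `pass_at_k` and `mean_with_ci` are hypothetical and not taken from this repository, and the normal-approximation interval across problems is one possible choice, not necessarily the method the README refers to.

```python
import math
from statistics import mean, stdev


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed, k: budget being scored.
    Computes 1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples exist, so any k draws include a pass.
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


def mean_with_ci(scores, z: float = 1.96):
    """Mean of per-problem scores with a normal-approximation interval.

    Assumption: problems are treated as independent draws; z = 1.96
    corresponds to roughly 95% coverage. Bounds are clipped to [0, 1].
    """
    m = mean(scores)
    if len(scores) < 2:
        return m, m, m
    half_width = z * stdev(scores) / math.sqrt(len(scores))
    return m, max(0.0, m - half_width), min(1.0, m + half_width)


if __name__ == "__main__":
    # Hypothetical (n, c) counts for three problems, for illustration only.
    counts = [(100, 25), (100, 80), (100, 0)]
    per_problem = [pass_at_k(n, c, k=1) for n, c in counts]
    estimate, low, high = mean_with_ci(per_problem)
    print(f"pass@1 = {estimate:.3f} (approx. 95% CI {low:.3f} to {high:.3f})")
```

For `k = 1` the estimator reduces to `c / n`, so drawing many samples per problem mainly tightens the per-problem estimate; the interval above reflects variation across problems only.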