Space: singhalamaan116/EcoEval-LLM
singhalamaan116 committed 11 days ago

Commit 4751408 (verified) · 1 Parent(s): 8c35076

Create README.md

Files changed (1): README.md

@@ -1,13 +1,44 @@
 ---
-title: EcoEval LLM
-emoji: 🏆
-colorFrom: pink
-colorTo: green
 sdk: gradio
-sdk_version: 6.0.1
 app_file: app.py
-pinned: false
-short_description: 'Framework that benchmarks models for energy usage '
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
+title: EcoEval-LLM
+emoji: 🌱
+colorFrom: green
+colorTo: blue
 sdk: gradio
+sdk_version: 4.0.0
 app_file: app.py
+pinned: true
 ---
+# 🌱 EcoEval-LLM: Energy & Carbon Benchmarking for LLM Code Generation
+**EcoEval-LLM** benchmarks code generation models on:
+- ✅ Task correctness (unit-test based pass rate)
+- ⏱ Runtime
+- ⚡ Energy consumption (kWh)
+- 🌍 CO₂ emissions (kg) via [CodeCarbon](https://github.com/mlco2/codecarbon)
+It runs a small benchmark of Python programming tasks, executes the generated code against unit tests, and measures the environmental footprint of the run.
+## How it works
+1. You choose:
+   - A Hugging Face Hub model ID (e.g. `Salesforce/codegen-350M-multi`)
+   - A built-in Python benchmark dataset
+2. The app:
+   - Loads the model and tokenizer via `transformers`
+   - Generates code for each task
+   - Executes unit tests to check correctness
+   - Wraps the whole process in a `CodeCarbon.EmissionsTracker` to measure energy and CO₂
+3. Results:
+   - Run-level summary (accuracy, runtime, energy, CO₂, energy per task, CO₂ per passed task)
+   - Per-task pass/fail and runtime
+   - Persistent leaderboard (`runs.csv`) across Space sessions
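
Reviewer note: the run-level summary listed above combines a few totals into per-task figures. The diff does not show how `app.py` computes them, so the following is only a minimal sketch, assuming "energy per task" is total energy divided by the number of tasks and "CO₂ per passed task" is total emissions divided by the number of passing tasks. `RunSummary` and `summarize_run` are hypothetical names, not part of this repository; in the real app the energy and CO₂ totals would come from the CodeCarbon tracker.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunSummary:
    """Hypothetical container for the run-level metrics named in the README."""
    accuracy: float
    runtime_s: float
    energy_kwh: float
    co2_kg: float
    energy_per_task_kwh: float
    co2_per_passed_task_kg: Optional[float]


def summarize_run(passed: List[bool], runtime_s: float,
                  energy_kwh: float, co2_kg: float) -> RunSummary:
    """Derive per-task metrics from per-task pass flags and run totals.

    Assumed definitions (not confirmed by the source):
      accuracy             = passed tasks / all tasks
      energy per task      = total kWh / all tasks
      CO2 per passed task  = total kg / passed tasks (None if nothing passed)
    """
    total = len(passed)
    if total == 0:
        raise ValueError("at least one task result is required")
    n_passed = sum(passed)
    return RunSummary(
        accuracy=n_passed / total,
        runtime_s=runtime_s,
        energy_kwh=energy_kwh,
        co2_kg=co2_kg,
        energy_per_task_kwh=energy_kwh / total,
        # Avoid dividing by zero when no task passes.
        co2_per_passed_task_kg=(co2_kg / n_passed) if n_passed else None,
    )


if __name__ == "__main__":
    # Illustrative numbers only, not measured results.
    summary = summarize_run([True, False, True, True],
                            runtime_s=12.0, energy_kwh=0.002, co2_kg=0.0008)
    print(summary)
```

Returning `None` when no task passes keeps a failed run from showing a misleading zero or infinite CO₂ cost on the leaderboard; the actual app may handle that case differently.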
+## Run locally
+```bash
+git clone <this-repo-url>
+cd EcoEval-LLM
+pip install -r requirements.txt
+python app.py