ArshVerma committed on Apr 5

Commit

c90ac2d

1 Parent(s): c1972ef

Skip DB init in TESTING; add reset_db

Avoid initializing the SQLite DB during tests by checking the TESTING env var in app.lifespan and logging accordingly. Add scripts/reset_db.py to remove and re-create the DB file for fresh state during development/benchmarks. Update tests: conftest sets TESTING=true and test_api uses a shared client fixture instead of creating TestClient per test. Update results.json sample data (IDs and durations) as part of test/sample data refresh.

Files changed (9)

DEPLOYMENT.md +63 -0
app.py +6 -2
codelens_env/config.py +2 -0
codelens_env/database.py +14 -0
dashboard/vercel.json +15 -0
results.json +46 -46
scripts/reset_db.py +47 -0
tests/conftest.py +2 -0
tests/test_api.py +4 -8

DEPLOYMENT.md ADDED

	@@ -0,0 +1,63 @@

+# CodeLens. Deployment Guide (Production)
+Follow this guide to deploy **CodeLens. v1.0.0** to production. This configuration uses **Vercel** for the frontend, **Render** for the backend, and **Supabase** or **Neon** for the PostgreSQL database.
+---
+## 1. 🗄️ Setup the Database (PostgreSQL)
+The SQLite database is a file on local disk, and that disk is ephemeral on Render/Vercel, so the data is lost on every restart. For production you **must** use a managed PostgreSQL service.
+1.  **Go to [Supabase](https://supabase.com)** or [Neon](https://neon.tech).
+2.  **Create a new Project** called "CodeLens".
+3.  **Copy your Connection String** (it should look like `postgres://user:pass@host:5432/dbname`).
+4.  **Important**: Keep this URL secret. It contains your database password and is the value of `DATABASE_URL`.
+---
+## 2. 🚀 Setup the Backend (Render)
+Render will host your FastAPI API and your Dockerized environment.
+1.  **Go to [Render Dashboard](https://dashboard.render.com)**.
+2.  **New -> Web Service** and connect your GitHub repository.
+3.  **Configure**:
+    -   **Runtime**: `Docker`.
+    -   **Environment Variables**:
+        -   `DATABASE_URL`: (Paste your Supabase/Neon URL here).
+        -   `API_KEY_ENABLED`: `true` (highly recommended for production).
+        -   `API_KEY`: A strong secret password.
+        -   `APP_ENV`: `production`.
+4.  **Deploy**: Render will automatically build the `Dockerfile` in the root and start the service.
+5.  **Identify**: Copy your Render URL (e.g., `https://codelens-api.onrender.com`).
+---
+## 3. 🎨 Setup the Frontend (Vercel)
+Vercel will host your React/Vite dashboard.
+1.  **Go to [Vercel](https://vercel.com)**.
+2.  **Import** your `dashboard` folder (or the whole repository and set the root directory to `dashboard`).
+3.  **Update `vercel.json`**:
+    -   Open [`dashboard/vercel.json`](dashboard/vercel.json).
+    -   Replace `YOUR_BACKEND_URL.render.com` in both rewrite destinations (`https://` and `wss://`) with your **real** Render hostname.
+4.  **Deploy**: Vercel will build the React application and provide a global dashboard link.
+---
+## 4. 🤖 Running Remote Evaluations
+Once deployed, you can run the benchmark script from your local machine (or any CI) against your **production** instance:
+```bash
+python scripts/evaluate.py --url https://your-render-url.com --api-key YOUR_SECRET_KEY
+```
+---
+> [!CAUTION]
+> **Database Migrations**: A new PostgreSQL instance starts with no tables. On application startup, the lifespan hook calls `create_db_and_tables()` to create them, so no manual SQL is required.
+> [!TIP]
+> **Vercel Rewrites**: The `vercel.json` rewrite rule is what allows the frontend to talk to the backend without CORS issues. Ensure the URL is exactly correct.
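
The Vercel Rewrites tip depends on the capture group in each `source` pattern being substituted for `$1` in the matching `destination`. The sketch below illustrates that mapping using Python's `re` module as a stand-in for Vercel's own matcher, so it is an approximation rather than Vercel's implementation. The function name `apply_rewrite` is hypothetical, and the hostname is the example one from the guide.

```python
import re
from typing import Optional

# Rules modeled on dashboard/vercel.json, with the placeholder host
# swapped for the guide's example hostname (an assumption for illustration).
REWRITES = [
    {"source": "/api/(.*)", "destination": "https://codelens-api.onrender.com/$1"},
    {"source": "/ws/(.*)", "destination": "wss://codelens-api.onrender.com/ws/$1"},
]


def apply_rewrite(path: str, rewrites=REWRITES) -> Optional[str]:
    """Return the destination for the first rule whose source matches, else None."""
    for rule in rewrites:
        match = re.fullmatch(rule["source"], path)
        if match:
            # Substitute the first capture group for the $1 placeholder.
            return rule["destination"].replace("$1", match.group(1))
    return None
```

Under these rules `/api/health` maps to `/health` on the backend: the `/api` prefix is dropped, while the `/ws` prefix is kept.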

app.py CHANGED

@@ -42,9 +42,13 @@ logger = logging.getLogger("codelens_env")
 @asynccontextmanager
 async def lifespan(app: FastAPI):
     # Startup
-    create_db_and_tables()
     cleanup_task = asyncio.create_task(cleanup_expired_episodes())
-    logger.info(f"CodeLens API started — DB at {settings.db_path}")
     yield

 @asynccontextmanager
 async def lifespan(app: FastAPI):
     # Startup
+    if not os.getenv("TESTING"):
+        create_db_and_tables()
+        logger.info(f"CodeLens API started — DB at {settings.db_path}")
+    else:
+        logger.info("CodeLens API running in TESTING mode — DB initialization skipped")
     cleanup_task = asyncio.create_task(cleanup_expired_episodes())
     yield
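
One detail of the `os.getenv("TESTING")` check is worth spelling out: it tests for any non-empty string, so `TESTING=false` or `TESTING=0` would also skip DB initialization. The sketch below mirrors that check and contrasts it with a stricter parse. Both function names are hypothetical and neither is part of the commit.

```python
import os
from typing import Mapping


def db_init_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Mirror the lifespan check: initialize the DB unless TESTING is non-empty."""
    return not environ.get("TESTING")


def testing_flag_strict(environ: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical stricter parse: only common truthy spellings count as on."""
    return environ.get("TESTING", "").strip().lower() in {"1", "true", "yes", "on"}
```

With the check as committed, `db_init_enabled({"TESTING": "false"})` is `False`, which may surprise someone who sets the variable to disable test mode.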

codelens_env/config.py CHANGED

@@ -1,4 +1,5 @@
 from functools import lru_cache
 from pydantic_settings import BaseSettings, SettingsConfigDict
 class Settings(BaseSettings):
@@ -19,6 +20,7 @@ class Settings(BaseSettings):
     rate_limit_per_minute: int = 60        # requests per minute per IP
     # Persistence
     db_path: str = "./data/codelens.db"
     db_echo: bool = False    # Set True to log all SQL queries

 from functools import lru_cache
+from typing import Optional
 from pydantic_settings import BaseSettings, SettingsConfigDict
 class Settings(BaseSettings):
     rate_limit_per_minute: int = 60        # requests per minute per IP
     # Persistence
+    database_url: Optional[str] = None
     db_path: str = "./data/codelens.db"
     db_echo: bool = False    # Set True to log all SQL queries

codelens_env/database.py CHANGED

@@ -7,6 +7,20 @@ from codelens_env.models import EpisodeResult, TaskId
 def get_engine():
     settings = get_settings()
     Path(settings.db_path).parent.mkdir(parents=True, exist_ok=True)
     return create_engine(
         f"sqlite:///{settings.db_path}",

 def get_engine():
     settings = get_settings()
+    if settings.database_url:
+        # Support Render/Heroku 'postgres://' URLs by converting to 'postgresql://'
+        url = settings.database_url
+        if url.startswith("postgres://"):
+            url = url.replace("postgres://", "postgresql://", 1)
+        return create_engine(
+            url,
+            echo=settings.db_echo,
+            pool_pre_ping=True,  # Ensure connections are alive
+        )
+    # Fallback to local SQLite
     Path(settings.db_path).parent.mkdir(parents=True, exist_ok=True)
     return create_engine(
         f"sqlite:///{settings.db_path}",
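
The scheme rewrite in `get_engine()` can be read in isolation as a small pure function. The sketch below extracts it under a hypothetical name, `normalize_database_url`. To my knowledge the rewrite is needed because SQLAlchemy 1.4 and later no longer accept `postgres` as a dialect name, while some providers still hand out URLs in that form.

```python
def normalize_database_url(url: str) -> str:
    """Rewrite a leading 'postgres://' scheme to 'postgresql://'; leave other URLs alone."""
    if url.startswith("postgres://"):
        # count=1 so only the scheme is touched, never a later occurrence.
        return url.replace("postgres://", "postgresql://", 1)
    return url
```

Already-normalized URLs and the SQLite fallback URL pass through unchanged, so the function is safe to apply to any configured value.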

dashboard/vercel.json ADDED

	@@ -0,0 +1,15 @@

+{
+  "rewrites": [
+    {
+      "source": "/api/(.*)",
+      "destination": "https://YOUR_BACKEND_URL.render.com/$1"
+    },
+    {
+      "source": "/ws/(.*)",
+      "destination": "wss://YOUR_BACKEND_URL.render.com/ws/$1"
+    }
+  ],
+  "framework": "vite",
+  "buildCommand": "npm run build",
+  "outputDirectory": "dist"
+}
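
The placeholder host appears in two destinations, one `https://` and one `wss://`, so a manual edit can easily miss one. The sketch below shows one possible way to fill in both from a single backend URL. The helper `fill_backend_host` is hypothetical and not part of the repository.

```python
from urllib.parse import urlparse

PLACEHOLDER_HOST = "YOUR_BACKEND_URL.render.com"


def fill_backend_host(config: dict, backend_url: str) -> dict:
    """Return a copy of a vercel.json-style dict with the placeholder host replaced."""
    host = urlparse(backend_url).netloc
    if not host:
        raise ValueError(f"backend_url must include a scheme and host: {backend_url!r}")
    rewrites = [
        # Replacing only the host keeps each rule's own scheme (https or wss).
        {**rule, "destination": rule["destination"].replace(PLACEHOLDER_HOST, host)}
        for rule in config.get("rewrites", [])
    ]
    return {**config, "rewrites": rewrites}
```

Loading the file with `json.load`, passing the result through this function, and writing it back with `json.dump` would update both rules in one step.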

results.json CHANGED

@@ -1,6 +1,6 @@
 [
   {
-    "episode_id": "0d8d0a22-2a81-48a1-8dbe-c9662da7dc2b",
     "task_id": "bug_detection",
     "seed": 0,
     "final_score": 0.0,
@@ -9,10 +9,10 @@
     "issues_total": 1,
     "noise_penalties": 0,
     "terminated_reason": "terminal_action",
-    "duration_seconds": 0.03
   },
   {
-    "episode_id": "6ddc5a56-86c1-43f2-8839-b6ad9eac2ad9",
     "task_id": "bug_detection",
     "seed": 1,
     "final_score": 0.0,
@@ -24,7 +24,7 @@
     "duration_seconds": 0.02
   },
   {
-    "episode_id": "df00496e-0e19-4049-82d9-6a0f2a1e8f2a",
     "task_id": "bug_detection",
     "seed": 2,
     "final_score": 0.9167,
@@ -33,10 +33,10 @@
     "issues_total": 1,
     "noise_penalties": 5,
     "terminated_reason": "noise_exhausted",
-    "duration_seconds": 0.06
   },
   {
-    "episode_id": "26837a97-d3ad-4fd7-8e74-d0d713b9b137",
     "task_id": "bug_detection",
     "seed": 3,
     "final_score": 0.9167,
@@ -45,10 +45,10 @@
     "issues_total": 1,
     "noise_penalties": 5,
     "terminated_reason": "noise_exhausted",
-    "duration_seconds": 0.04
   },
   {
-    "episode_id": "41b3e01d-5498-42a1-a3f9-44bb87f542b6",
     "task_id": "bug_detection",
     "seed": 4,
     "final_score": 0.8267,
@@ -60,7 +60,7 @@
     "duration_seconds": 0.03
   },
   {
-    "episode_id": "8331b8bf-f397-49be-bc07-62bb0bdddd5c",
     "task_id": "bug_detection",
     "seed": 5,
     "final_score": 0.0,
@@ -69,10 +69,10 @@
     "issues_total": 1,
     "noise_penalties": 0,
     "terminated_reason": "terminal_action",
-    "duration_seconds": 0.02
   },
   {
-    "episode_id": "22341ab4-ea27-416b-ba40-53f93b6090c9",
     "task_id": "bug_detection",
     "seed": 6,
     "final_score": 0.0,
@@ -84,7 +84,7 @@
     "duration_seconds": 0.02
   },
   {
-    "episode_id": "9fc11809-6ba4-4bf6-b00e-9b8ffc61218f",
     "task_id": "bug_detection",
     "seed": 7,
     "final_score": 0.0,
@@ -96,7 +96,7 @@
     "duration_seconds": 0.02
   },
   {
-    "episode_id": "dce54792-7fba-4e8c-8feb-404aa655172f",
     "task_id": "bug_detection",
     "seed": 8,
     "final_score": 0.9167,
@@ -105,10 +105,10 @@
     "issues_total": 1,
     "noise_penalties": 5,
     "terminated_reason": "noise_exhausted",
-    "duration_seconds": 0.04
   },
   {
-    "episode_id": "be559f48-b216-49b9-8f3d-ad9b50da3751",
     "task_id": "bug_detection",
     "seed": 9,
     "final_score": 0.0,
@@ -120,7 +120,7 @@
     "duration_seconds": 0.03
   },
   {
-    "episode_id": "b2029cb1-c72e-438a-bd96-98a0aa9bc141",
     "task_id": "security_audit",
     "seed": 0,
     "final_score": 0.0,
@@ -132,7 +132,7 @@
     "duration_seconds": 0.03
   },
   {
-    "episode_id": "e383f50e-5797-40d9-9f54-aa4142247a6e",
     "task_id": "security_audit",
     "seed": 1,
     "final_score": 1.0,
@@ -141,10 +141,10 @@
     "issues_total": 1,
     "noise_penalties": 5,
     "terminated_reason": "noise_exhausted",
-    "duration_seconds": 0.04
   },
   {
-    "episode_id": "6560439d-9050-4c94-96b1-c31111081c2e",
     "task_id": "security_audit",
     "seed": 2,
     "final_score": 0.0,
@@ -156,7 +156,7 @@
     "duration_seconds": 0.03
   },
   {
-    "episode_id": "d9a9d12d-384c-4dd2-86cb-b55ea7604e60",
     "task_id": "security_audit",
     "seed": 3,
     "final_score": 0.85,
@@ -165,10 +165,10 @@
     "issues_total": 1,
     "noise_penalties": 5,
     "terminated_reason": "noise_exhausted",
-    "duration_seconds": 0.04
   },
   {
-    "episode_id": "463b8d96-ad63-472f-ad93-f20c99afe96e",
     "task_id": "security_audit",
     "seed": 4,
     "final_score": 0.0,
@@ -180,7 +180,7 @@
     "duration_seconds": 0.03
   },
   {
-    "episode_id": "15e9c92a-559a-4ba6-b75b-b12ac6055f91",
     "task_id": "security_audit",
     "seed": 5,
     "final_score": 0.0,
@@ -189,10 +189,10 @@
     "issues_total": 1,
     "noise_penalties": 5,
     "terminated_reason": "noise_exhausted",
-    "duration_seconds": 0.06
   },
   {
-    "episode_id": "38bd8513-7511-4a2f-8642-fc23f50b61bd",
     "task_id": "security_audit",
     "seed": 6,
     "final_score": 0.0,
@@ -204,7 +204,7 @@
     "duration_seconds": 0.03
   },
   {
-    "episode_id": "8a6e48fd-1b17-475f-b495-a1cd8ce2e70c",
     "task_id": "security_audit",
     "seed": 7,
     "final_score": 0.0,
@@ -216,7 +216,7 @@
     "duration_seconds": 0.03
   },
   {
-    "episode_id": "3ec40833-ce55-4a63-ad90-55b2e35370d1",
     "task_id": "security_audit",
     "seed": 8,
     "final_score": 0.0,
@@ -225,10 +225,10 @@
     "issues_total": 1,
     "noise_penalties": 5,
     "terminated_reason": "noise_exhausted",
-    "duration_seconds": 0.03
   },
   {
-    "episode_id": "87fa57a3-27f3-4f9b-a0ad-008ff9b4c7f2",
     "task_id": "security_audit",
     "seed": 9,
     "final_score": 0.0,
@@ -240,7 +240,7 @@
     "duration_seconds": 0.03
   },
   {
-    "episode_id": "8a878e69-51c2-4992-8276-616cbc798efd",
     "task_id": "architectural_review",
     "seed": 0,
     "final_score": 0.0,
@@ -249,10 +249,10 @@
     "issues_total": 1,
     "noise_penalties": 0,
     "terminated_reason": "terminal_action",
-    "duration_seconds": 0.02
   },
   {
-    "episode_id": "c85766aa-d231-4998-8d95-b4dbbfd5aea6",
     "task_id": "architectural_review",
     "seed": 1,
     "final_score": 0.059,
@@ -261,10 +261,10 @@
     "issues_total": 1,
     "noise_penalties": 5,
     "terminated_reason": "noise_exhausted",
-    "duration_seconds": 0.04
   },
   {
-    "episode_id": "29122fdb-ed30-4a46-8b04-4aa8c2025e6c",
     "task_id": "architectural_review",
     "seed": 2,
     "final_score": 0.661,
@@ -273,10 +273,10 @@
     "issues_total": 1,
     "noise_penalties": 5,
     "terminated_reason": "noise_exhausted",
-    "duration_seconds": 0.04
   },
   {
-    "episode_id": "1388c816-8e96-4014-b255-81382e9353ba",
     "task_id": "architectural_review",
     "seed": 3,
     "final_score": 0.658,
@@ -288,7 +288,7 @@
     "duration_seconds": 0.03
   },
   {
-    "episode_id": "adee24d2-cb77-48b6-b00b-72c39d7d7c7e",
     "task_id": "architectural_review",
     "seed": 4,
     "final_score": 0.058,
@@ -300,7 +300,7 @@
     "duration_seconds": 0.03
   },
   {
-    "episode_id": "12508a15-eb46-4628-b750-cde1b5f91828",
     "task_id": "architectural_review",
     "seed": 5,
     "final_score": 0.657,
@@ -309,10 +309,10 @@
     "issues_total": 1,
     "noise_penalties": 5,
     "terminated_reason": "noise_exhausted",
-    "duration_seconds": 0.04
   },
   {
-    "episode_id": "520fb8ad-58fe-48fa-8a96-89e552e12924",
     "task_id": "architectural_review",
     "seed": 6,
     "final_score": 0.059,
@@ -324,7 +324,7 @@
     "duration_seconds": 0.03
   },
   {
-    "episode_id": "86982fa4-cee7-48fc-b8d1-347a2f650f6d",
     "task_id": "architectural_review",
     "seed": 7,
     "final_score": 0.664,
@@ -333,10 +333,10 @@
     "issues_total": 1,
     "noise_penalties": 5,
     "terminated_reason": "noise_exhausted",
-    "duration_seconds": 0.04
   },
   {
-    "episode_id": "086e541a-776e-4ac6-b44e-1eeb76d88b4e",
     "task_id": "architectural_review",
     "seed": 8,
     "final_score": 0.039,
@@ -345,10 +345,10 @@
     "issues_total": 1,
     "noise_penalties": 5,
     "terminated_reason": "noise_exhausted",
-    "duration_seconds": 0.03
   },
   {
-    "episode_id": "3bcd91ce-0d0d-48c5-ab25-bc8534906f5b",
     "task_id": "architectural_review",
     "seed": 9,
     "final_score": 0.075,
@@ -357,6 +357,6 @@
     "issues_total": 1,
     "noise_penalties": 5,
     "terminated_reason": "noise_exhausted",
-    "duration_seconds": 0.03
   }
 ]

 [
   {
+    "episode_id": "27c65a00-9315-4d7d-b457-f8d64b0da466",
     "task_id": "bug_detection",
     "seed": 0,
     "final_score": 0.0,
     "issues_total": 1,
     "noise_penalties": 0,
     "terminated_reason": "terminal_action",
+    "duration_seconds": 0.02
   },
   {
+    "episode_id": "8bdcee56-8a56-4ec8-88fc-a70e78ab48f2",
     "task_id": "bug_detection",
     "seed": 1,
     "final_score": 0.0,
     "duration_seconds": 0.02
   },
   {
+    "episode_id": "d3dcda88-ce7e-4965-9409-4e97c98cf444",
     "task_id": "bug_detection",
     "seed": 2,
     "final_score": 0.9167,
     "issues_total": 1,
     "noise_penalties": 5,
     "terminated_reason": "noise_exhausted",
+    "duration_seconds": 0.04
   },
   {
+    "episode_id": "822dfb7d-d67b-4dc9-9cfe-866665a9e5b9",
     "task_id": "bug_detection",
     "seed": 3,
     "final_score": 0.9167,
     "issues_total": 1,
     "noise_penalties": 5,
     "terminated_reason": "noise_exhausted",
+    "duration_seconds": 0.03
   },
   {
+    "episode_id": "c35c2096-b7b8-4a54-b0c7-d8fa1e1538bf",
     "task_id": "bug_detection",
     "seed": 4,
     "final_score": 0.8267,
     "duration_seconds": 0.03
   },
   {
+    "episode_id": "9b4e5191-57da-4b10-b31e-5b8236b9b1f4",
     "task_id": "bug_detection",
     "seed": 5,
     "final_score": 0.0,
     "issues_total": 1,
     "noise_penalties": 0,
     "terminated_reason": "terminal_action",
+    "duration_seconds": 0.01
   },
   {
+    "episode_id": "c1e64a16-512a-4716-b0f9-5d9fe14a142d",
     "task_id": "bug_detection",
     "seed": 6,
     "final_score": 0.0,
     "duration_seconds": 0.02
   },
   {
+    "episode_id": "cc8bd066-506e-46de-ba0f-0a446f045945",
     "task_id": "bug_detection",
     "seed": 7,
     "final_score": 0.0,
     "duration_seconds": 0.02
   },
   {
+    "episode_id": "20dd7158-a49f-4b6c-adcd-6b76b4274e94",
     "task_id": "bug_detection",
     "seed": 8,
     "final_score": 0.9167,
     "issues_total": 1,
     "noise_penalties": 5,
     "terminated_reason": "noise_exhausted",
+    "duration_seconds": 0.03
   },
   {
+    "episode_id": "14b9d1a5-03b7-45b2-8b23-b6c387bced5b",
     "task_id": "bug_detection",
     "seed": 9,
     "final_score": 0.0,
     "duration_seconds": 0.03
   },
   {
+    "episode_id": "2b1c4b7f-cf05-4312-89d2-49883b9ed6c1",
     "task_id": "security_audit",
     "seed": 0,
     "final_score": 0.0,
     "duration_seconds": 0.03
   },
   {
+    "episode_id": "47145114-7a99-4c81-baa9-1f9c4221f48a",
     "task_id": "security_audit",
     "seed": 1,
     "final_score": 1.0,
     "issues_total": 1,
     "noise_penalties": 5,
     "terminated_reason": "noise_exhausted",
+    "duration_seconds": 0.03
   },
   {
+    "episode_id": "5d9d3f74-a127-46b2-a331-45f30f0d2f6f",
     "task_id": "security_audit",
     "seed": 2,
     "final_score": 0.0,
     "duration_seconds": 0.03
   },
   {
+    "episode_id": "f383afc4-df2d-4a60-abd1-a583cb2de538",
     "task_id": "security_audit",
     "seed": 3,
     "final_score": 0.85,
     "issues_total": 1,
     "noise_penalties": 5,
     "terminated_reason": "noise_exhausted",
+    "duration_seconds": 0.03
   },
   {
+    "episode_id": "df23e18b-c21a-4390-9afa-f77eb1b8050b",
     "task_id": "security_audit",
     "seed": 4,
     "final_score": 0.0,
     "duration_seconds": 0.03
   },
   {
+    "episode_id": "bcec83af-4aaf-4997-922a-1556ca63fcf3",
     "task_id": "security_audit",
     "seed": 5,
     "final_score": 0.0,
     "issues_total": 1,
     "noise_penalties": 5,
     "terminated_reason": "noise_exhausted",
+    "duration_seconds": 0.03
   },
   {
+    "episode_id": "65afedd0-9ad1-45f7-9615-dafeaa390338",
     "task_id": "security_audit",
     "seed": 6,
     "final_score": 0.0,
     "duration_seconds": 0.03
   },
   {
+    "episode_id": "0e8b65e5-d59c-41fb-8038-1f3b1d4b657b",
     "task_id": "security_audit",
     "seed": 7,
     "final_score": 0.0,
     "duration_seconds": 0.03
   },
   {
+    "episode_id": "ae477e21-2718-477e-8a11-e46bda49bea7",
     "task_id": "security_audit",
     "seed": 8,
     "final_score": 0.0,
     "issues_total": 1,
     "noise_penalties": 5,
     "terminated_reason": "noise_exhausted",
+    "duration_seconds": 0.04
   },
   {
+    "episode_id": "0cd3d18c-cdf6-4752-9dc8-cade848cee0e",
     "task_id": "security_audit",
     "seed": 9,
     "final_score": 0.0,
     "duration_seconds": 0.03
   },
   {
+    "episode_id": "d9f79316-57ea-42ad-b191-797ea895951b",
     "task_id": "architectural_review",
     "seed": 0,
     "final_score": 0.0,
     "issues_total": 1,
     "noise_penalties": 0,
     "terminated_reason": "terminal_action",
+    "duration_seconds": 0.01
   },
   {
+    "episode_id": "65112da3-d319-4eb3-9774-1c43313fe1ec",
     "task_id": "architectural_review",
     "seed": 1,
     "final_score": 0.059,
     "issues_total": 1,
     "noise_penalties": 5,
     "terminated_reason": "noise_exhausted",
+    "duration_seconds": 0.03
   },
   {
+    "episode_id": "c1d6bd63-1abf-403b-97be-cf1962959910",
     "task_id": "architectural_review",
     "seed": 2,
     "final_score": 0.661,
     "issues_total": 1,
     "noise_penalties": 5,
     "terminated_reason": "noise_exhausted",
+    "duration_seconds": 0.03
   },
   {
+    "episode_id": "fab56551-f2ae-42be-88eb-854ef236b29a",
     "task_id": "architectural_review",
     "seed": 3,
     "final_score": 0.658,
     "duration_seconds": 0.03
   },
   {
+    "episode_id": "8f4b8fa9-f281-4bb2-ae71-7ad6c7ca7fc9",
     "task_id": "architectural_review",
     "seed": 4,
     "final_score": 0.058,
     "duration_seconds": 0.03
   },
   {
+    "episode_id": "7eaa445d-774d-44b1-80d9-06e978b022df",
     "task_id": "architectural_review",
     "seed": 5,
     "final_score": 0.657,
     "issues_total": 1,
     "noise_penalties": 5,
     "terminated_reason": "noise_exhausted",
+    "duration_seconds": 0.03
   },
   {
+    "episode_id": "103d6e2b-e40d-49ff-8f36-6910d463b48b",
     "task_id": "architectural_review",
     "seed": 6,
     "final_score": 0.059,
     "duration_seconds": 0.03
   },
   {
+    "episode_id": "5206e9d8-0ee0-482c-ad3d-34adf2ce57a7",
     "task_id": "architectural_review",
     "seed": 7,
     "final_score": 0.664,
     "issues_total": 1,
     "noise_penalties": 5,
     "terminated_reason": "noise_exhausted",
+    "duration_seconds": 0.03
   },
   {
+    "episode_id": "c147abd7-f887-44a2-b166-5382e292d793",
     "task_id": "architectural_review",
     "seed": 8,
     "final_score": 0.039,
     "issues_total": 1,
     "noise_penalties": 5,
     "terminated_reason": "noise_exhausted",
+    "duration_seconds": 0.08
   },
   {
+    "episode_id": "6fe54153-a878-47d7-a521-5cd70f55f6b8",
     "task_id": "architectural_review",
     "seed": 9,
     "final_score": 0.075,
     "issues_total": 1,
     "noise_penalties": 5,
     "terminated_reason": "noise_exhausted",
+    "duration_seconds": 0.09
   }
 ]

scripts/reset_db.py ADDED

	@@ -0,0 +1,47 @@

+#!/usr/bin/env python3
+"""
+Reset the CodeLens database: deletes the SQLite file and re-initializes tables.
+Useful for clearing test data and starting fresh evaluation benchmarks.
+"""
+import os
+import sys
+from pathlib import Path
+# Add project root to path
+sys.path.insert(0, str(Path(__file__).parent.parent))
+from codelens_env.config import get_settings
+from codelens_env.database import create_db_and_tables
+def reset_db():
+    settings = get_settings()
+    db_path = Path(settings.db_path)
+    # 1. Delete existing database file
+    if db_path.exists():
+        print(f"Removing existing database at: {db_path}")
+        try:
+            os.remove(db_path)
+            print("Successfully deleted old records.")
+        except Exception as e:
+            print(f"Error deleting file: {e}")
+            sys.exit(1)
+    else:
+        print(f"No existing database found at {db_path}")
+    # 2. Re-initialize
+    print("Re-initializing schema...")
+    try:
+        create_db_and_tables()
+        print("Database reset successfully. You now have a clean dashboard.")
+    except Exception as e:
+        print(f"Error re-initializing: {e}")
+        sys.exit(1)
+if __name__ == "__main__":
+    confirm = input("This will permanently delete all leaderboard and episode data. Proceed? [y/N]: ")
+    if confirm.lower() == 'y':
+        reset_db()
+    else:
+        print("Reset aborted.")
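
Note that `reset_db()` only removes the file at `settings.db_path`. When `database_url` is set, `get_engine()` connects to PostgreSQL instead, so deleting the file would presumably leave the remote data untouched. The sketch below isolates the file-removal step using `Path.unlink(missing_ok=True)` (Python 3.8+), which avoids the separate existence check. The names `remove_sqlite_file` and `demo` are hypothetical.

```python
import tempfile
from pathlib import Path


def remove_sqlite_file(db_path: Path) -> bool:
    """Delete the SQLite file if present; return True when a file was there."""
    existed = db_path.exists()
    db_path.unlink(missing_ok=True)  # no error if the file is already gone
    return existed


def demo() -> tuple:
    """Exercise the helper against a throwaway file in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        db_file = Path(tmp) / "codelens.db"
        db_file.write_bytes(b"")
        first = remove_sqlite_file(db_file)   # file exists, gets removed
        second = remove_sqlite_file(db_file)  # nothing left to remove
        return first, second, db_file.exists()
```

A guard that refuses to run when `database_url` is configured would be a natural companion, though that is a design suggestion and not something the commit does.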

tests/conftest.py CHANGED

@@ -1,4 +1,6 @@
 import pytest
 from fastapi.testclient import TestClient
 from sqlmodel import SQLModel, Session, create_engine
 from sqlmodel.pool import StaticPool

 import pytest
+import os
+os.environ["TESTING"] = "true"
 from fastapi.testclient import TestClient
 from sqlmodel import SQLModel, Session, create_engine
 from sqlmodel.pool import StaticPool
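
Setting `os.environ["TESTING"]` at import time in `conftest.py` is simple, but the value stays set for the rest of the process. If that ever becomes a problem, a scoped alternative is a context manager that restores the previous value on exit. The sketch below is one such approach; `testing_env` is a hypothetical helper and not part of the commit.

```python
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def testing_env(value: str = "true") -> Iterator[None]:
    """Set TESTING for the duration of the block, then restore the prior state."""
    previous = os.environ.get("TESTING")
    os.environ["TESTING"] = value
    try:
        yield
    finally:
        if previous is None:
            # The variable was unset before, so remove it again.
            os.environ.pop("TESTING", None)
        else:
            os.environ["TESTING"] = previous
```

pytest's built-in `monkeypatch.setenv` offers the same restore-on-teardown behavior inside a fixture.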

tests/test_api.py CHANGED

@@ -3,15 +3,13 @@ from fastapi.testclient import TestClient
 from app import app
 from codelens_env.models import TaskId, ActionType, Category, Severity, Verdict
-def test_api_health():
-    client = TestClient(app)
     response = client.get("/health")
     assert response.status_code == 200
     assert response.json()["status"] == "ok"
     assert response.json()["env_ready"] is True
-def test_api_workflow():
-    client = TestClient(app)
     # 1. Reset
     reset_resp = client.post("/reset", json={"task_id": "bug_detection", "seed": 1})
@@ -34,8 +32,7 @@ def test_api_workflow():
     assert result_resp.status_code == 200
     assert result_resp.json()["final_score"] >= 0
-def test_api_leaderboard():
-    client = TestClient(app)
     # Submit a score
     sub = {
         "agent_name": "test_agent",
@@ -55,8 +52,7 @@ def test_api_leaderboard():
     assert len(bug_entries) > 0
     assert bug_entries[0]["agent_name"] == "test_agent"
-def test_api_invalid_episode():
-    client = TestClient(app)
     response = client.post("/step/nonexistent-id", json={
         "action_type": "comment",
         "body": "hello"

 from app import app
 from codelens_env.models import TaskId, ActionType, Category, Severity, Verdict
+def test_api_health(client):
     response = client.get("/health")
     assert response.status_code == 200
     assert response.json()["status"] == "ok"
     assert response.json()["env_ready"] is True
+def test_api_workflow(client):
     # 1. Reset
     reset_resp = client.post("/reset", json={"task_id": "bug_detection", "seed": 1})
     assert result_resp.status_code == 200
     assert result_resp.json()["final_score"] >= 0
+def test_api_leaderboard(client):
     # Submit a score
     sub = {
         "agent_name": "test_agent",
     assert len(bug_entries) > 0
     assert bug_entries[0]["agent_name"] == "test_agent"
+def test_api_invalid_episode(client):
     response = client.post("/step/nonexistent-id", json={
         "action_type": "comment",
         "body": "hello"