Spaces:
Sleeping
Sleeping
| # RAGDebugEnv Reference Context | |
| This file is a fast orientation guide for contributors and coding agents working in this repository. | |
| ## What This Repository Implements | |
| A simulated OpenEnv environment for debugging RAG retrieval pipelines by RL-style interaction. | |
| - Server app: `server/app.py` | |
| - Environment logic: `server/rag_debug_env_environment.py` | |
| - Action/observation models: `models.py` | |
| - Client: `client.py` | |
| - Inference script: `inference.py` (competition-ready, uses OpenAI client) | |
| - Corpus build pipeline: `corpora/build_corpus.py` and `corpora/stages/*` | |
| ## Project Layout (Current) | |
| ```text | |
| rag_debug_env/ | |
| corpora/ | |
| build_corpus.py | |
| software/ | |
| climate/ | |
| medical/ | |
| stages/ | |
| s1_load.py | |
| s2_chunk.py | |
| s3_queries.py | |
| s4_multihop.py | |
| s5_embed.py | |
| s6_grade.py | |
| verify.py | |
| playground.py | |
| docs/ | |
| ARCHITECTURE.md | |
| BUILD_STATUS.md | |
| CLAUDE.md | |
| CORPUS_BUILD_PLAN.md | |
| MODELS_REFERENCE.md | |
| outputs/ | |
| eval_agent.py | |
| train_grpo.py | |
| server/ | |
| app.py | |
| constants.py | |
| corpus.py | |
| fault_math.py | |
| rag_debug_env_environment.py | |
| client.py | |
| inference.py | |
| models.py | |
| pyproject.toml | |
| openenv.yaml | |
| Dockerfile | |
| ``` | |
| ## Runtime Facts To Keep In Mind | |
| - All tasks currently use `max_steps=10` | |
| - `PipelineConfig.similarity_threshold` default is `0.3` (not `0.7`) | |
| - Task 3 starts with `embedding_model=legal` intentionally | |
| - Task success is based on task score thresholds in `_check_success`, not raw coverage alone | |
| - Synthetic corpus fallback exists in `server/corpus.py` for missing artifacts | |
| - HF Spaces port is `7860` (set in README frontmatter, Dockerfile, and openenv.yaml) | |
| ## Corpus Build Facts | |
| `corpora/build_corpus.py` runs all six stages and calls `verify_corpus`. | |
| Outputs per domain: | |
| - `docs.json` | |
| - `chunks.json` | |
| - `queries.json` | |
| - `ground_truth.json` | |
| - `S_true_general.npy` | |
| - `S_true_medical.npy` | |
| - `S_true_legal.npy` | |
| - `S_true_code.npy` | |
| - `corpus_stats.json` | |
| ## Scripts | |
| - `outputs/eval_agent.py` β GPT-4o-mini zero-shot eval agent (actively usable). | |
| - `outputs/train_grpo.py` β GRPO training scaffold (stub with TODOs). | |
| - `inference.py` β Competition inference script with [START]/[STEP]/[END] logging. | |
| ## Commands | |
| ```bash | |
| # Build corpus for one domain | |
| python -m corpora.build_corpus --domain software | |
| # Build all domains | |
| python -m corpora.build_corpus --domain all | |
| # Run server | |
| uvicorn server.app:app --host 0.0.0.0 --port 7860 | |
| # Run baseline evaluator | |
| python outputs/eval_agent.py --task 1 --episodes 3 | |
| # Run inference script | |
| python inference.py | |
| # Validate OpenEnv integration | |
| openenv validate | |
| ``` | |