rag_debug_env / docs /CLAUDE.md
vankap-grover's picture
Upload folder using huggingface_hub
ac224ce verified
# RAGDebugEnv Reference Context
This file is a fast orientation guide for contributors and coding agents working in this repository.
## What This Repository Implements
A simulated OpenEnv environment for debugging RAG retrieval pipelines by RL-style interaction.
- Server app: `server/app.py`
- Environment logic: `server/rag_debug_env_environment.py`
- Action/observation models: `models.py`
- Client: `client.py`
- Inference script: `inference.py` (competition-ready, uses OpenAI client)
- Corpus build pipeline: `corpora/build_corpus.py` and `corpora/stages/*`
## Project Layout (Current)
```text
rag_debug_env/
corpora/
build_corpus.py
software/
climate/
medical/
stages/
s1_load.py
s2_chunk.py
s3_queries.py
s4_multihop.py
s5_embed.py
s6_grade.py
verify.py
playground.py
docs/
ARCHITECTURE.md
BUILD_STATUS.md
CLAUDE.md
CORPUS_BUILD_PLAN.md
MODELS_REFERENCE.md
outputs/
eval_agent.py
train_grpo.py
server/
app.py
constants.py
corpus.py
fault_math.py
rag_debug_env_environment.py
client.py
inference.py
models.py
pyproject.toml
openenv.yaml
Dockerfile
```
## Runtime Facts To Keep In Mind
- All tasks currently use `max_steps=10`
- `PipelineConfig.similarity_threshold` default is `0.3` (not `0.7`)
- Task 3 starts with `embedding_model=legal` intentionally
- Task success is based on task score thresholds in `_check_success`, not raw coverage alone
- Synthetic corpus fallback exists in `server/corpus.py` for missing artifacts
- HF Spaces port is `7860` (set in README frontmatter, Dockerfile, and openenv.yaml)
## Corpus Build Facts
`corpora/build_corpus.py` runs all six stages and calls `verify_corpus`.
Outputs per domain:
- `docs.json`
- `chunks.json`
- `queries.json`
- `ground_truth.json`
- `S_true_general.npy`
- `S_true_medical.npy`
- `S_true_legal.npy`
- `S_true_code.npy`
- `corpus_stats.json`
## Scripts
- `outputs/eval_agent.py` β€” GPT-4o-mini zero-shot eval agent (actively usable).
- `outputs/train_grpo.py` β€” GRPO training scaffold (stub with TODOs).
- `inference.py` β€” Competition inference script with [START]/[STEP]/[END] logging.
## Commands
```bash
# Build corpus for one domain
python -m corpora.build_corpus --domain software
# Build all domains
python -m corpora.build_corpus --domain all
# Run server
uvicorn server.app:app --host 0.0.0.0 --port 7860
# Run baseline evaluator
python outputs/eval_agent.py --task 1 --episodes 3
# Run inference script
python inference.py
# Validate OpenEnv integration
openenv validate
```