# RAGDebugEnv Reference Context

This file is a fast orientation guide for contributors and coding agents working in this repository.

## What This Repository Implements

A simulated OpenEnv environment for debugging RAG retrieval pipelines through RL-style interaction.

- Server app: `server/app.py`
- Environment logic: `server/rag_debug_env_environment.py`
- Action/observation models: `models.py`
- Client: `client.py`
- Inference script: `inference.py` (competition-ready, uses the OpenAI client)
- Corpus build pipeline: `corpora/build_corpus.py` and `corpora/stages/*`

## Project Layout (Current)

```text
rag_debug_env/
  corpora/
    build_corpus.py
    software/
    climate/
    medical/
    stages/
      s1_load.py
      s2_chunk.py
      s3_queries.py
      s4_multihop.py
      s5_embed.py
      s6_grade.py
      verify.py
      playground.py
  docs/
    ARCHITECTURE.md
    BUILD_STATUS.md
    CLAUDE.md
    CORPUS_BUILD_PLAN.md
    MODELS_REFERENCE.md
  outputs/
    eval_agent.py
    train_grpo.py
  server/
    app.py
    constants.py
    corpus.py
    fault_math.py
    rag_debug_env_environment.py
  client.py
  inference.py
  models.py
  pyproject.toml
  openenv.yaml
  Dockerfile
```

## Runtime Facts To Keep In Mind

- All tasks currently use `max_steps=10`
- `PipelineConfig.similarity_threshold` default is `0.3` (not `0.7`)
- Task 3 starts with `embedding_model=legal` intentionally
- Task success is based on task score thresholds in `_check_success`, not raw coverage alone
- Synthetic corpus fallback exists in `server/corpus.py` for missing artifacts
- HF Spaces port is `7860` (set in README frontmatter, Dockerfile, and openenv.yaml)
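The defaults above are easy to misremember, so a minimal sketch of them as plain Python may help. `PipelineConfigSketch`, `MAX_STEPS`, and `task3_initial_config` are hypothetical stand-ins written for this guide; the real `PipelineConfig` lives in `models.py` and likely has more fields. The `"general"` default for `embedding_model` is an assumption, not something this guide states.

```python
from dataclasses import dataclass

# Hypothetical stand-in for illustration only; the real PipelineConfig is in models.py.
@dataclass
class PipelineConfigSketch:
    similarity_threshold: float = 0.3  # documented default (not 0.7)
    embedding_model: str = "general"   # assumed default; not stated in this guide

# All tasks currently share the same step budget.
MAX_STEPS = 10

def task3_initial_config() -> PipelineConfigSketch:
    """Task 3 intentionally starts on the 'legal' embedding model."""
    return PipelineConfigSketch(embedding_model="legal")
```

A check such as `task3_initial_config().embedding_model == "legal"` can serve as a quick reminder that the Task 3 starting model is deliberate and should not be "fixed".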

## Corpus Build Facts

`corpora/build_corpus.py` runs all six stages (`s1_load` through `s6_grade`) and then calls `verify_corpus`.

Outputs per domain:

- `docs.json`
- `chunks.json`
- `queries.json`
- `ground_truth.json`
- `S_true_general.npy`
- `S_true_medical.npy`
- `S_true_legal.npy`
- `S_true_code.npy`
- `corpus_stats.json`
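To confirm that a domain build produced every file listed above, a small check like the following can be used. This is a sketch, not part of the repository: `missing_artifacts` is a hypothetical helper, and it assumes the artifacts are written directly into the domain directory (for example `corpora/software/`), which this guide does not state explicitly.

```python
from pathlib import Path

# Artifact names taken from the per-domain output list in this guide.
EXPECTED_ARTIFACTS = [
    "docs.json",
    "chunks.json",
    "queries.json",
    "ground_truth.json",
    "S_true_general.npy",
    "S_true_medical.npy",
    "S_true_legal.npy",
    "S_true_code.npy",
    "corpus_stats.json",
]

def missing_artifacts(domain_dir):
    """Return the expected artifact names that are not present in domain_dir."""
    root = Path(domain_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (root / name).is_file()]

if __name__ == "__main__":
    # Assumed location; adjust if the build writes its outputs elsewhere.
    print(missing_artifacts("corpora/software"))
```

An empty list means all nine artifacts exist. A non-empty list suggests the server may fall back to the synthetic corpus for that domain.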

## Scripts

- `outputs/eval_agent.py` — GPT-4o-mini zero-shot eval agent (actively usable).
- `outputs/train_grpo.py` — GRPO training scaffold (stub with TODOs).
- `inference.py` — Competition inference script with [START]/[STEP]/[END] logging.

## Commands

```bash
# Build corpus for one domain
python -m corpora.build_corpus --domain software

# Build all domains
python -m corpora.build_corpus --domain all

# Run server
uvicorn server.app:app --host 0.0.0.0 --port 7860

# Run baseline evaluator
python outputs/eval_agent.py --task 1 --episodes 3

# Run inference script
python inference.py

# Validate OpenEnv integration
openenv validate
```