rag_debug_env / docs /CLAUDE.md
vankap-grover's picture
Upload folder using huggingface_hub
ac224ce verified

RAGDebugEnv Reference Context

This file is a fast orientation guide for contributors and coding agents working in this repository.

What This Repository Implements

A simulated OpenEnv environment for debugging RAG retrieval pipelines by RL-style interaction.

  • Server app: server/app.py
  • Environment logic: server/rag_debug_env_environment.py
  • Action/observation models: models.py
  • Client: client.py
  • Inference script: inference.py (competition-ready, uses OpenAI client)
  • Corpus build pipeline: corpora/build_corpus.py and corpora/stages/*

Project Layout (Current)

rag_debug_env/
  corpora/
    build_corpus.py
    software/
    climate/
    medical/
    stages/
      s1_load.py
      s2_chunk.py
      s3_queries.py
      s4_multihop.py
      s5_embed.py
      s6_grade.py
      verify.py
      playground.py
  docs/
    ARCHITECTURE.md
    BUILD_STATUS.md
    CLAUDE.md
    CORPUS_BUILD_PLAN.md
    MODELS_REFERENCE.md
  outputs/
    eval_agent.py
    train_grpo.py
  server/
    app.py
    constants.py
    corpus.py
    fault_math.py
    rag_debug_env_environment.py
  client.py
  inference.py
  models.py
  pyproject.toml
  openenv.yaml
  Dockerfile

Runtime Facts To Keep In Mind

  • All tasks currently use max_steps=10
  • PipelineConfig.similarity_threshold default is 0.3 (not 0.7)
  • Task 3 starts with embedding_model=legal intentionally
  • Task success is based on task score thresholds in _check_success, not raw coverage alone
  • Synthetic corpus fallback exists in server/corpus.py for missing artifacts
  • HF Spaces port is 7860 (set in README frontmatter, Dockerfile, and openenv.yaml)

Corpus Build Facts

corpora/build_corpus.py runs all six stages and calls verify_corpus.

Outputs per domain:

  • docs.json
  • chunks.json
  • queries.json
  • ground_truth.json
  • S_true_general.npy
  • S_true_medical.npy
  • S_true_legal.npy
  • S_true_code.npy
  • corpus_stats.json

Scripts

  • outputs/eval_agent.py — GPT-4o-mini zero-shot eval agent (actively usable).
  • outputs/train_grpo.py — GRPO training scaffold (stub with TODOs).
  • inference.py — Competition inference script with [START]/[STEP]/[END] logging.

Commands

# Build corpus for one domain
python -m corpora.build_corpus --domain software

# Build all domains
python -m corpora.build_corpus --domain all

# Run server
uvicorn server.app:app --host 0.0.0.0 --port 7860

# Run baseline evaluator
python outputs/eval_agent.py --task 1 --episodes 3

# Run inference script
python inference.py

# Validate OpenEnv integration
openenv validate