RAGDebugEnv Reference Context

This file is a fast orientation guide for contributors and coding agents working in this repository.

What This Repository Implements

A simulated OpenEnv environment for debugging RAG retrieval pipelines through RL-style interaction.

Server app: server/app.py
Environment logic: server/rag_debug_env_environment.py
Action/observation models: models.py
Client: client.py
Inference script: inference.py (competition-ready, uses OpenAI client)
Corpus build pipeline: corpora/build_corpus.py and corpora/stages/*

Project Layout (Current)

rag_debug_env/
  corpora/
    build_corpus.py
    software/
    climate/
    medical/
    stages/
      s1_load.py
      s2_chunk.py
      s3_queries.py
      s4_multihop.py
      s5_embed.py
      s6_grade.py
      verify.py
      playground.py
  docs/
    ARCHITECTURE.md
    BUILD_STATUS.md
    CLAUDE.md
    CORPUS_BUILD_PLAN.md
    MODELS_REFERENCE.md
  outputs/
    eval_agent.py
    train_grpo.py
  server/
    app.py
    constants.py
    corpus.py
    fault_math.py
    rag_debug_env_environment.py
  client.py
  inference.py
  models.py
  pyproject.toml
  openenv.yaml
  Dockerfile

Runtime Facts To Keep In Mind

All tasks currently use max_steps=10
PipelineConfig.similarity_threshold default is 0.3 (not 0.7)
Task 3 intentionally starts with embedding_model=legal
Task success is based on task score thresholds in _check_success, not raw coverage alone
Synthetic corpus fallback exists in server/corpus.py for missing artifacts
HF Spaces port is 7860 (set in README frontmatter, Dockerfile, and openenv.yaml)
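
To make the similarity_threshold default concrete, here is a minimal sketch of how such a threshold could gate retrieval results. It assumes the threshold acts as an inclusive lower bound on similarity scores before top-k ranking; the function name filter_by_threshold, the top_k parameter, and the sample scores are hypothetical and are not taken from server/rag_debug_env_environment.py.

```python
from typing import List, Tuple

# Default stated in this guide for PipelineConfig.similarity_threshold.
DEFAULT_SIMILARITY_THRESHOLD = 0.3


def filter_by_threshold(
    scores: List[float],
    threshold: float = DEFAULT_SIMILARITY_THRESHOLD,
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return up to top_k (chunk_index, score) pairs with score >= threshold.

    Hypothetical helper: the real environment may apply the threshold
    differently (for example, strictly greater-than).
    """
    kept = [(i, s) for i, s in enumerate(scores) if s >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


if __name__ == "__main__":
    sample_scores = [0.82, 0.45, 0.31, 0.12]  # made-up similarity scores
    print(filter_by_threshold(sample_scores))                 # default 0.3
    print(filter_by_threshold(sample_scores, threshold=0.7))  # stricter value
```

With these sample scores, the 0.3 default keeps three chunks while 0.7 keeps only one, which is why assuming the wrong default can make a healthy pipeline look like it has a coverage fault.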

Corpus Build Facts

corpora/build_corpus.py runs all six stages and calls verify_corpus.

Outputs per domain:

docs.json
chunks.json
queries.json
ground_truth.json
S_true_general.npy
S_true_medical.npy
S_true_legal.npy
S_true_code.npy
corpus_stats.json
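
A quick way to confirm a domain build is complete is to check for the files listed above. The sketch below is illustrative only: it assumes the artifacts are written directly into a per-domain directory such as corpora/software/, and the helper name missing_artifacts is hypothetical. It does not replace verify_corpus.

```python
from pathlib import Path
from typing import List

# File names copied from the per-domain output list in this guide.
EXPECTED_ARTIFACTS = [
    "docs.json",
    "chunks.json",
    "queries.json",
    "ground_truth.json",
    "S_true_general.npy",
    "S_true_medical.npy",
    "S_true_legal.npy",
    "S_true_code.npy",
    "corpus_stats.json",
]


def missing_artifacts(domain_dir: Path) -> List[str]:
    """Return the expected artifact names that are not present in domain_dir."""
    return [name for name in EXPECTED_ARTIFACTS if not (domain_dir / name).is_file()]


if __name__ == "__main__":
    # Assumed location; adjust if the build writes artifacts elsewhere.
    domain_dir = Path("corpora") / "software"
    missing = missing_artifacts(domain_dir)
    if missing:
        print(f"{domain_dir}: missing {', '.join(missing)}")
    else:
        print(f"{domain_dir}: all expected artifacts present")
```

If any file is reported missing, the server may be using the synthetic corpus fallback in server/corpus.py rather than the built corpus.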

Scripts

outputs/eval_agent.py — GPT-4o-mini zero-shot eval agent (actively usable).
outputs/train_grpo.py — GRPO training scaffold (stub with TODOs).
inference.py — Competition inference script with [START]/[STEP]/[END] logging.

Commands

# Build corpus for one domain
python -m corpora.build_corpus --domain software

# Build all domains
python -m corpora.build_corpus --domain all

# Run server
uvicorn server.app:app --host 0.0.0.0 --port 7860

# Run baseline evaluator
python outputs/eval_agent.py --task 1 --episodes 3

# Run inference script
python inference.py

# Validate OpenEnv integration
openenv validate