Spaces:

vankap-grover
/

rag_debug_env

Sleeping

App Files Files Community

rag_debug_env / docs /CLAUDE.md

vankap-grover

Upload folder using huggingface_hub

ac224ce verified about 2 months ago

preview code

raw

history blame contribute delete

2.62 kB

	# RAGDebugEnv Reference Context

	This file is a fast orientation guide for contributors and coding agents working in this repository.

	## What This Repository Implements

	A simulated OpenEnv environment for debugging RAG retrieval pipelines by RL-style interaction.

	- Server app: `server/app.py`
	- Environment logic: `server/rag_debug_env_environment.py`
	- Action/observation models: `models.py`
	- Client: `client.py`
	- Inference script: `inference.py` (competition-ready, uses OpenAI client)
	- Corpus build pipeline: `corpora/build_corpus.py` and `corpora/stages/*`

	## Project Layout (Current)

	```text
	rag_debug_env/
	corpora/
	build_corpus.py
	software/
	climate/
	medical/
	stages/
	s1_load.py
	s2_chunk.py
	s3_queries.py
	s4_multihop.py
	s5_embed.py
	s6_grade.py
	verify.py
	playground.py
	docs/
	ARCHITECTURE.md
	BUILD_STATUS.md
	CLAUDE.md
	CORPUS_BUILD_PLAN.md
	MODELS_REFERENCE.md
	outputs/
	eval_agent.py
	train_grpo.py
	server/
	app.py
	constants.py
	corpus.py
	fault_math.py
	rag_debug_env_environment.py
	client.py
	inference.py
	models.py
	pyproject.toml
	openenv.yaml
	Dockerfile
	```

	## Runtime Facts To Keep In Mind

	- All tasks currently use `max_steps=10`
	- `PipelineConfig.similarity_threshold` default is `0.3` (not `0.7`)
	- Task 3 starts with `embedding_model=legal` intentionally
	- Task success is based on task score thresholds in `_check_success`, not raw coverage alone
	- Synthetic corpus fallback exists in `server/corpus.py` for missing artifacts
	- HF Spaces port is `7860` (set in README frontmatter, Dockerfile, and openenv.yaml)

	## Corpus Build Facts

	`corpora/build_corpus.py` runs all six stages and calls `verify_corpus`.

	Outputs per domain:

	- `docs.json`
	- `chunks.json`
	- `queries.json`
	- `ground_truth.json`
	- `S_true_general.npy`
	- `S_true_medical.npy`
	- `S_true_legal.npy`
	- `S_true_code.npy`
	- `corpus_stats.json`

	## Scripts

	- `outputs/eval_agent.py` — GPT-4o-mini zero-shot eval agent (actively usable).
	- `outputs/train_grpo.py` — GRPO training scaffold (stub with TODOs).
	- `inference.py` — Competition inference script with [START]/[STEP]/[END] logging.

	## Commands

	```bash
	# Build corpus for one domain
	python -m corpora.build_corpus --domain software

	# Build all domains
	python -m corpora.build_corpus --domain all

	# Run server
	uvicorn server.app:app --host 0.0.0.0 --port 7860

	# Run baseline evaluator
	python outputs/eval_agent.py --task 1 --episodes 3

	# Run inference script
	python inference.py

	# Validate OpenEnv integration
	openenv validate
	```