Spaces:

FractalAI
/

Research

Sleeping

Research / README.md

Ready to Rumble

c57d186 verified 3 months ago

1.41 kB


	# Fathom R1 Chat — Full‑stack (React + FastAPI)

	ChatGPT‑style UI on React + a FastAPI backend that calls FractalAIResearch/Fathom-R1-14B via `transformers`.

	## Run with Docker (GPU)

	> Requires an NVIDIA GPU + NVIDIA Container Toolkit.

	```bash
	docker build -t fathom-r1-chat .
	docker run --gpus all -p 8000:8000 -e MODEL_ID=FractalAIResearch/Fathom-R1-14B -e QUANTIZE=auto fathom-r1-chat
	# Open http://localhost:8000
	```

	### Notes
	- Model is derived from DeepSeek-R1-Distill-Qwen-14B and targets 16K context usage. Use the tokenizer chat template.
	- For long answers, bump `max_new_tokens` in the request.
	- If you need private HF access, pass `-e HUGGING_FACE_HUB_TOKEN=...`.

	## Dev mode (run separately)
	```bash
	# backend
	cd backend
	python3 -m venv .venv && source .venv/bin/activate
	pip install --index-url https://download.pytorch.org/whl/cu121 torch==2.4.1+cu121
	pip install -r requirements.txt
	uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

	# frontend (new terminal)
	cd frontend
	npm ci
	npm run dev
	```

	## API
	- `POST /api/chat` with `{ messages: [{role, content}, ...], max_new_tokens, temperature, top_p }` → `{ reply, model }`

	## Hardware
	- 14B parameter model; for comfortable generation use >=24–40 GB VRAM or 4/8‑bit quantization on 16–24 GB GPUs.

	## License
	- MIT (model card states MIT) and this template is MIT.