	---
	title: RadExtract
	emoji: 🗂️
	colorFrom: blue
	colorTo: green
	sdk: docker
	pinned: false
	license: apache-2.0
	header: mini
	app_port: 7870
	tags:
	- medical
	- nlp
	- radiology
	- langextract
	- gemini
	- structured-data
	---

	# RadExtract: Radiology Report Structuring Demo

	[![🤗 Hugging Face Spaces](https://img.shields.io/badge/🤗%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/google/radextract)
	[![LangExtract](https://img.shields.io/badge/Powered%20by-LangExtract-green)](https://github.com/google/langextract)
	[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

	A demonstration application powered by [LangExtract](https://github.com/google/langextract) that structures radiology reports using Gemini models. Transform unstructured radiology text into organized, interactive segments with clinical significance annotations.

	## Try the Demo

	[Launch RadExtract Demo](https://huggingface.co/spaces/google/radextract)

	Transform unstructured radiology reports into structured data with highlighted findings that are precisely mapped back to the original source text.

	## Key Features

	- Structured Output: Organizes reports into anatomical sections with clinical significance
	- Interactive Highlighting: Click any finding to see its exact source in the original text
	- Clinical Significance: Annotates findings as minor, significant, or grounding
	- Character-Level Mapping: Precise attribution back to source text
	- Multi-Model Support: Gemini 2.5 Flash (fast) and Pro (comprehensive)
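Click-to-source highlighting depends on the character-level mapping. Each finding carries start and end offsets into the original report, so a UI can wrap exactly those spans. RadExtract's frontend does this in JavaScript. The Python function below is only a minimal sketch of the idea. It assumes 0-based, end-exclusive, non-overlapping offsets, and the name `highlight` is hypothetical.

```python
import html


def highlight(text: str, intervals: list[tuple[int, int]]) -> str:
    """Wrap each (start, end) span of `text` in <mark> tags.

    Assumes 0-based, end-exclusive offsets that do not overlap.
    Text both inside and outside the spans is HTML-escaped.
    """
    parts: list[str] = []
    cursor = 0  # end of the last span already emitted
    for start, end in sorted(intervals):
        if not 0 <= start <= end <= len(text) or start < cursor:
            raise ValueError(f"invalid or overlapping interval ({start}, {end})")
        parts.append(html.escape(text[cursor:start]))
        parts.append("<mark>" + html.escape(text[start:end]) + "</mark>")
        cursor = end
    parts.append(html.escape(text[cursor:]))  # trailing unhighlighted text
    return "".join(parts)


report = "FINDINGS: Normal heart and lungs. IMPRESSION: Normal study."
print(highlight(report, [(10, 32)]))
# FINDINGS: <mark>Normal heart and lungs</mark>. IMPRESSION: Normal study.
```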

	## Quick Start

	### Setup

	```bash
	git clone https://huggingface.co/spaces/google/radextract
	cd radextract
	python -m venv venv
	source venv/bin/activate
	pip install -e ".[dev]"
	cp env.list.example env.list
	# Edit env.list and set KEY=your_gemini_api_key_here
	```
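`env.list` is also passed to `docker run --env-file` in the Docker section, so it uses Docker's env-file format. Each line is `NAME=value`, and blank lines and lines starting with `#` are ignored. You can reuse the same file for local runs instead of exporting `KEY` by hand. The loader below is a sketch of that approach. `load_env_file` is a hypothetical helper, not part of RadExtract, and it only approximates Docker's parsing. For example, it ignores Docker's bare-`NAME` pass-through form.

```python
from __future__ import annotations

import os
from pathlib import Path


def load_env_file(path: str | Path, *, override: bool = False) -> dict[str, str]:
    """Load NAME=value pairs from a Docker-style env file into os.environ.

    Blank lines and lines starting with '#' are skipped. Whitespace around
    each line is trimmed; quotes in values are kept as-is. Bare NAME lines
    (Docker's pass-through form) are ignored by this sketch.
    """
    loaded: dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)  # split on the first '=' only
        loaded[name] = value
        # By default, variables already set in the shell take precedence.
        if override or name not in os.environ:
            os.environ[name] = value
    return loaded
```

For example, call `load_env_file("env.list")` at startup, before the app reads `os.environ["KEY"]`.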

	### Local Development

	```bash
	source venv/bin/activate
	export KEY=your_gemini_api_key_here
	python app.py
	```

	The app is then available at http://localhost:7870.

	## API Usage

	### Example Request
	```bash
	curl -X POST \
	-H 'X-Model-ID: gemini-2.5-flash' \
	-H 'X-Use-Cache: true' \
	-d 'FINDINGS: Normal heart and lungs. IMPRESSION: Normal study.' \
	http://localhost:7870/predict
	```
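You can make the curl call above from Python using only the standard library. The following is a sketch based on that example. The URL and the `X-Model-ID` and `X-Use-Cache` headers come from the curl command. The helper names are hypothetical, and so is the assumption that `X-Use-Cache: false` disables caching. Check `app.py` for what the server actually accepts.

```python
import json
import urllib.request
from typing import Any

PREDICT_URL = "http://localhost:7870/predict"


def build_predict_request(
    report_text: str,
    model_id: str = "gemini-2.5-flash",
    use_cache: bool = True,
    url: str = PREDICT_URL,
) -> urllib.request.Request:
    """Build a POST request mirroring the curl example.

    The report is sent as the raw request body. No Content-Type is set, so
    urllib falls back to application/x-www-form-urlencoded, as `curl -d` does.
    """
    return urllib.request.Request(
        url,
        data=report_text.encode("utf-8"),
        headers={
            "X-Model-ID": model_id,
            # "false" is assumed (not documented) to bypass the cache.
            "X-Use-Cache": "true" if use_cache else "false",
        },
        method="POST",
    )


def predict(report_text: str, **kwargs: Any) -> dict:
    """Send the request and decode the JSON response (the server must be running)."""
    request = build_predict_request(report_text, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)
```

With the local server running, `predict("FINDINGS: Normal heart and lungs. IMPRESSION: Normal study.")["segments"]` should return the structured segments shown below.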

	### Response Format
	```json
	{
	  "segments": [
	    {
	      "type": "body",
	      "label": "Chest",
	      "content": "Normal heart and lungs",
	      "intervals": [{"startPos": 10, "endPos": 32}],
	      "significance": "minor"
	    }
	  ],
	  "text": "Chest:\n- Normal heart and lungs",
	  "annotated_document_json": {...}
	}
	```
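Each interval points into the submitted report, not into the reformatted `text` field, so a client can check the mapping by slicing the original input. The sketch below assumes `startPos` and `endPos` are 0-based, end-exclusive character offsets into the request body. That assumption matches the example, where characters 10 to 32 of the request above spell `Normal heart and lungs`. `annotated_document_json` is left out because its contents are elided here, and `segment_spans` is a hypothetical helper.

```python
def segment_spans(source: str, response: dict) -> list[tuple[str, str]]:
    """Return (label, source substring) for every interval of every segment."""
    spans: list[tuple[str, str]] = []
    for segment in response.get("segments", []):
        for interval in segment.get("intervals", []):
            start, end = interval["startPos"], interval["endPos"]
            spans.append((segment["label"], source[start:end]))
    return spans


source = "FINDINGS: Normal heart and lungs. IMPRESSION: Normal study."
response = {
    "segments": [{
        "type": "body",
        "label": "Chest",
        "content": "Normal heart and lungs",
        "intervals": [{"startPos": 10, "endPos": 32}],
        "significance": "minor",
    }],
    "text": "Chest:\n- Normal heart and lungs",
}

for label, span in segment_spans(source, response):
    print(f"{label}: {span!r}")
# Chest: 'Normal heart and lungs'
```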

	## Architecture

	- Backend: Flask on Python 3.10+, type-annotated and checked with mypy
	- NLP Engine: [LangExtract](https://github.com/google/langextract) for structured extraction
	- AI Models: Google Gemini 2.5 (Flash/Pro)
	- Frontend: Vanilla JavaScript with interactive UI
	- Deployment: Docker + Hugging Face Spaces
	- Package Details: See [pyproject.toml](https://huggingface.co/spaces/google/radextract/blob/main/pyproject.toml) for dependencies, metadata, and tooling

	## Project Structure

	```
	radextract/
	├── app.py # Flask API endpoints
	├── structure_report.py # Core structuring logic
	├── sanitize.py # Text preprocessing & normalization
	├── prompt_instruction.py # LangExtract prompt
	├── cache_manager.py # Response caching
	├── static/ # Frontend assets
	└── templates/ # HTML templates
	```
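`cache_manager.py` handles response caching, and the `X-Use-Cache` request header presumably controls whether it is used. This README does not document its keying scheme. The class below is a hypothetical sketch of one reasonable design, not RadExtract's implementation. It keys on both the model ID and the exact report text, because the same report can produce different output from Flash and Pro.

```python
import hashlib
import json
from typing import Any, Callable


class ResponseCache:
    """Hypothetical in-memory cache keyed on (model_id, report text)."""

    def __init__(self) -> None:
        self._store: dict[str, Any] = {}

    @staticmethod
    def key(model_id: str, report_text: str) -> str:
        # JSON-encode the pair so ("a", "bc") and ("ab", "c") cannot collide,
        # then hash it to get a fixed-length key.
        payload = json.dumps([model_id, report_text], ensure_ascii=False)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_compute(
        self,
        model_id: str,
        report_text: str,
        compute: Callable[[], Any],
        use_cache: bool = True,
    ) -> Any:
        """Return the cached result, or call `compute` and store its result."""
        if not use_cache:
            return compute()  # bypass: neither read nor write the cache
        k = self.key(model_id, report_text)
        if k not in self._store:
            self._store[k] = compute()
        return self._store[k]
```

Hashing keeps keys the same length regardless of report size. A persistent cache would also need an eviction or invalidation policy, for example when the prompt in `prompt_instruction.py` changes.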

	## Development

	### Setup
	```bash
	git clone https://huggingface.co/spaces/google/radextract
	cd radextract
	python -m venv venv
	source venv/bin/activate
	pip install -e ".[dev]"
	```

	### Code Quality
	```bash
	# Format code
	pyink . && isort .

	# Type checking
	mypy . --ignore-missing-imports

	# Run tests
	pytest
	```

	### Docker
	```bash
	# Build and run
	docker build -t radextract .
	docker run -p 7870:7870 --env-file env.list radextract
	```

	## License

	Apache License 2.0 - see [LICENSE](LICENSE) for details.

	## Related Projects

	- [LangExtract](https://github.com/google/langextract): Core NLP library

	---

	Built for the medical AI community \| Hosted on Hugging Face Spaces

	## Disclaimer

	This is not an officially supported Google product. If you use RadExtract or LangExtract in production or publications, please cite accordingly and acknowledge usage. Use is subject to the [Apache 2.0 License](LICENSE). For health-related applications, use of LangExtract is also subject to the [Health AI Developer Foundations Terms of Use](https://developers.google.com/health-ai-foundations/terms).