---
title: RadExtract
emoji: 🗂️
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: apache-2.0
header: mini
app_port: 7870
tags:
- medical
- nlp
- radiology
- langextract
- gemini
- structured-data
---
# RadExtract: Radiology Report Structuring Demo
[![🤗 Hugging Face Spaces](https://img.shields.io/badge/🤗%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/google/radextract)
[![LangExtract](https://img.shields.io/badge/Powered%20by-LangExtract-green)](https://github.com/google/langextract)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

A demonstration application powered by [LangExtract](https://github.com/google/langextract) that structures radiology reports using Gemini models. Transform unstructured radiology text into organized, interactive segments with clinical significance annotations.
## Try the Demo
**[Launch RadExtract Demo](https://huggingface.co/spaces/google/radextract)**

Transform unstructured radiology reports into structured data with highlighted findings that are precisely mapped back to the original source text.
## Key Features
- **Structured Output**: Organizes reports into anatomical sections with clinical significance
- **Interactive Highlighting**: Click any finding to see its exact source in the original text
- **Clinical Significance**: Annotates findings as minor, significant, or grounding
- **Character-Level Mapping**: Precise attribution back to source text
- **Multi-Model Support**: Gemini 2.5 Flash (fast) and Pro (comprehensive)
## Quick Start
### Setup
```bash
git clone https://huggingface.co/spaces/google/radextract
cd radextract
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
cp env.list.example env.list
# Edit env.list and set KEY=your_gemini_api_key_here
```
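A common first-run problem is starting the app with the placeholder key still in `env.list`. The file uses simple `NAME=value` lines, the format `docker run --env-file` reads. As a minimal sketch, the check below assumes the example file uses the same `your_gemini_api_key_here` placeholder shown above. `read_env_file` and `check_api_key` are hypothetical helpers, not part of RadExtract, and the parser is a simplified approximation of Docker's rules:

```python
"""Sanity-check env.list before starting the app (hypothetical helpers)."""
from __future__ import annotations

from pathlib import Path

# Assumed placeholder, taken from the setup comment above.
PLACEHOLDER = "your_gemini_api_key_here"


def read_env_file(path: str | Path) -> dict[str, str]:
    """Parse NAME=value lines, a simplified take on Docker's --env-file format.

    Blank lines and lines starting with '#' are ignored, and surrounding
    whitespace is trimmed. Only the first '=' splits name from value, so
    values may themselves contain '='. Lines without '=' are skipped here;
    Docker would instead pass such variables through from the host.
    """
    env: dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            env[name.strip()] = value
    return env


def check_api_key(path: str | Path = "env.list") -> str:
    """Return the KEY value, or exit if it is missing or still the placeholder."""
    key = read_env_file(path).get("KEY", "")
    if not key or key == PLACEHOLDER:
        raise SystemExit(f"Set KEY in {path} to your Gemini API key.")
    return key
```

Run `check_api_key()` from the repository root before `docker run` or `python app.py`. It fails fast with a readable message instead of an authentication error from the model API.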
### Local Development
```bash
source venv/bin/activate
export KEY=your_gemini_api_key_here
python app.py
```
Access at: http://localhost:7870
## API Usage
### Example Request
```bash
curl -X POST \
-H 'X-Model-ID: gemini-2.5-flash' \
-H 'X-Use-Cache: true' \
-d 'FINDINGS: Normal heart and lungs. IMPRESSION: Normal study.' \
http://localhost:7870/predict
```
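The same request can be sent from Python using only the standard library. This is a minimal sketch. `build_predict_request` and `predict` are hypothetical helper names. Like the curl example, it assumes the endpoint takes the raw report text as the POST body and returns JSON. Like `curl -d`, `urllib` sends `application/x-www-form-urlencoded` as the content type when none is set.

```python
"""Minimal stdlib client for the /predict endpoint (hypothetical helpers)."""
from __future__ import annotations

import json
import urllib.request

BASE_URL = "http://localhost:7870"


def build_predict_request(
    report: str,
    model_id: str = "gemini-2.5-flash",
    use_cache: bool = True,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build a POST request mirroring the curl example's headers and body."""
    return urllib.request.Request(
        f"{base_url}/predict",
        data=report.encode("utf-8"),
        headers={
            "X-Model-ID": model_id,
            # Assumption: the server treats "false" as "skip the cache".
            "X-Use-Cache": "true" if use_cache else "false",
        },
        method="POST",
    )


def predict(report: str, **kwargs) -> dict:
    """Send a report to a running server and decode the JSON response."""
    request = build_predict_request(report, **kwargs)
    # Model calls can be slow, so allow a generous timeout (in seconds).
    with urllib.request.urlopen(request, timeout=300) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage, with the app running locally:
# result = predict("FINDINGS: Normal heart and lungs. IMPRESSION: Normal study.")
# print(result["text"])
```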
### Response Format
```json
{
  "segments": [{
    "type": "body",
    "label": "Chest",
    "content": "Normal heart and lungs",
    "intervals": [{"startPos": 10, "endPos": 32}],
    "significance": "minor"
  }],
  "text": "Chest:\n- Normal heart and lungs",
  "annotated_document_json": {...}
}
```
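Each interval points back into the submitted report, so you can recover a segment's exact source span by slicing. In the example above, `"FINDINGS: "` is 10 characters and `"Normal heart and lungs"` is 22, which fits `startPos` being inclusive and `endPos` exclusive. The sketch below assumes that reading. It also assumes the offsets refer to the report text as sent; if the server normalizes text first (see `sanitize.py`), they may refer to the normalized text instead. `Interval`, `Segment`, `parse_segments`, and `source_spans` are hypothetical names:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Interval:
    start: int  # startPos: first character of the span (inclusive)
    end: int    # endPos: one past the last character (exclusive)


@dataclass
class Segment:
    type: str
    label: str
    content: str
    significance: str | None = None
    intervals: list[Interval] = field(default_factory=list)


def parse_segments(response: dict) -> list[Segment]:
    """Convert the "segments" list of a /predict response into dataclasses."""
    return [
        Segment(
            type=seg["type"],
            label=seg["label"],
            content=seg["content"],
            significance=seg.get("significance"),
            intervals=[
                Interval(iv["startPos"], iv["endPos"])
                for iv in seg.get("intervals", [])
            ],
        )
        for seg in response.get("segments", [])
    ]


def source_spans(report: str, segment: Segment) -> list[str]:
    """Return the substrings of the original report that a segment maps to."""
    return [report[iv.start:iv.end] for iv in segment.intervals]
```

With the example request, `source_spans` returns `["Normal heart and lungs"]` for the `Chest` segment. Comparing these spans against `content` gives a quick test of the character-level mapping.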
## Architecture
- **Backend**: Flask on Python 3.10+, with type hints checked by mypy
- **NLP Engine**: [LangExtract](https://github.com/google/langextract) for structured extraction
- **AI Models**: Google Gemini 2.5 (Flash/Pro)
- **Frontend**: Vanilla JavaScript with interactive UI
- **Deployment**: Docker + Hugging Face Spaces
- **Package Details**: See [pyproject.toml](https://huggingface.co/spaces/google/radextract/blob/main/pyproject.toml) for dependencies, metadata, and tooling
## Project Structure
```
radextract/
├── app.py                 # Flask API endpoints
├── structure_report.py    # Core structuring logic
├── sanitize.py            # Text preprocessing & normalization
├── prompt_instruction.py  # LangExtract prompt
├── cache_manager.py       # Response caching
├── static/                # Frontend assets
└── templates/             # HTML templates
```
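`cache_manager.py` handles response caching, and the API exposes it through the `X-Use-Cache` header. One common design keys cached responses on both the model ID and the exact report text, so Flash and Pro results never overwrite each other. The sketch below shows that idea with an in-memory dict. It is not RadExtract's actual implementation, and `ResponseCache` and its methods are hypothetical names:

```python
from __future__ import annotations

import hashlib
from typing import Any, Callable


class ResponseCache:
    """In-memory cache of structured responses keyed by (model ID, report text)."""

    def __init__(self) -> None:
        self._store: dict[str, dict[str, Any]] = {}

    @staticmethod
    def make_key(model_id: str, report: str) -> str:
        # Hash model and text together. The NUL separator keeps pairs such as
        # ("a", "bc") and ("ab", "c") from producing the same key.
        payload = f"{model_id}\0{report}".encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def get_or_compute(
        self,
        model_id: str,
        report: str,
        compute: Callable[[], dict[str, Any]],
        use_cache: bool = True,
    ) -> dict[str, Any]:
        """Return a cached response, or call `compute` and store its result."""
        if not use_cache:
            # Bypass the cache entirely: no lookup and no write.
            return compute()
        key = self.make_key(model_id, report)
        if key not in self._store:
            self._store[key] = compute()
        return self._store[key]
```

In a Flask handler, `compute` would wrap the call into the structuring logic, and `use_cache` would come from the `X-Use-Cache` header.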
## Development
### Setup
```bash
git clone https://huggingface.co/spaces/google/radextract
cd radextract
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
```
### Code Quality
```bash
# Format code
pyink . && isort .
# Type checking
mypy . --ignore-missing-imports
# Run tests
pytest
```
### Docker
```bash
# Build and run
docker build -t radextract .
docker run -p 7870:7870 --env-file env.list radextract
```
## License
Apache License 2.0 - see [LICENSE](LICENSE) for details.
## Related Projects
- **[LangExtract](https://github.com/google/langextract)**: Core NLP library
---
**Built for the medical AI community** | **Hosted on Hugging Face Spaces**
## Disclaimer
This is not an officially supported Google product. If you use RadExtract or LangExtract in production or publications, please cite accordingly and acknowledge usage. Use is subject to the [Apache 2.0 License](LICENSE). For health-related applications, use of LangExtract is also subject to the [Health AI Developer Foundations Terms of Use](https://developers.google.com/health-ai-foundations/terms).