---
title: SpaceDebris Localizer
emoji: 🛰️
colorFrom: blue
colorTo: gray
sdk: gradio
sdk_version: 5.12.0
app_file: app.py
pinned: true
license: mit
---
# SpaceDebris Localizer
Use **NVIDIA LocateAnything-3B** to locate space debris, satellite fragments, and spacecraft components in orbital imagery.
Orbital debris is a growing threat to satellite operations and crewed spaceflight. This project demonstrates how state-of-the-art vision-language grounding models can be applied to identify and localize objects in space imagery, from satellite solar panels and antennas to rocket bodies and debris fields. Built as a Hugging Face Spaces application, it provides a natural-language interface: describe what you're looking for, and the model draws bounding boxes around matching objects in the image.
## Why This Matters
There are over 36,000 tracked objects in Earth orbit, and millions of smaller fragments too tiny to track. Traditional detection pipelines require specialized training data and domain-specific models. Vision-language grounding models like LocateAnything-3B offer a different approach: describe the target in natural language and let the model find it. This prototype explores whether general-purpose visual grounding can serve as a rapid-deployment tool for orbital debris awareness, satellite inspection, and space situational awareness workflows.
## Architecture
```
User uploads image + text prompt
         │
         ▼
┌─────────────────────┐
│ Gradio Interface    │
│ (app.py)            │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│ LocateAnythingWorker│
│ (src/inference.py)  │
│ ┌─────────────────┐ │
│ │ nvidia/         │ │
│ │ LocateAnything- │ │
│ │ 3B (3B params)  │ │
│ └─────────────────┘ │
└────────┬────────────┘
         │ raw text with <box> tokens
         ▼
┌─────────────────────┐
│ Output Parser       │
│ (src/parsing.py)    │
│ Regex → BBox list   │
└────────┬────────────┘
         │ structured BBox objects
         ▼
┌─────────────────────┐
│ Visualizer          │
│ (src/visualization) │
│ Draw boxes + labels │
└────────┬────────────┘
         │
         ▼
Annotated image + JSON metadata
```
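The parsing stage in the diagram can be sketched roughly as follows. This README does not document the exact `<box>` token format the model emits, so the pattern below assumes a hypothetical `<box>x1,y1,x2,y2</box>` layout in pixel coordinates. `BBox`, `parse_boxes`, and `to_json` are illustrative names and may not match what `src/parsing.py` actually defines.

```python
import json
import re
from dataclasses import asdict, dataclass

# ASSUMPTION: boxes appear as <box>x1,y1,x2,y2</box>; the real token format may differ.
BOX_PATTERN = re.compile(
    r"<box>\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*,"
    r"\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*</box>"
)


@dataclass(frozen=True)
class BBox:
    """Axis-aligned box with (x1, y1) top-left and (x2, y2) bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float


def parse_boxes(raw_text: str) -> list[BBox]:
    """Extract every well-formed box from the raw model output."""
    boxes = []
    for match in BOX_PATTERN.finditer(raw_text):
        x1, y1, x2, y2 = (float(value) for value in match.groups())
        # Normalise corner order so that x1 <= x2 and y1 <= y2.
        boxes.append(BBox(min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)))
    return boxes


def to_json(boxes: list[BBox]) -> str:
    """Serialise boxes as JSON metadata to accompany the annotated image."""
    return json.dumps([asdict(box) for box in boxes])
```

Malformed spans (for example a `<box>` with only two numbers) are skipped without raising, so one bad token does not discard the rest of the detections.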
## Setup
### Prerequisites
- Python 3.10+
- CUDA-capable GPU (recommended) or CPU (slow)
- ~8 GB GPU memory for bfloat16 inference
### Local Installation
```bash
git clone https://github.com/YOUR_USERNAME/space-debris-localizer.git
cd space-debris-localizer
pip install -e ".[dev]"
```
### Run Locally
```bash
python app.py
```
The app launches at `http://localhost:7860`. The first run downloads the model weights (~6 GB).
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `MODEL_ID` | `nvidia/LocateAnything-3B` | Hugging Face model ID |
| `DEVICE` | `cuda` | Device (`cuda` or `cpu`) |
| `DTYPE` | `bfloat16` | Model precision |
| `MAX_NEW_TOKENS` | `8192` | Max generation tokens |
| `GENERATION_MODE` | `hybrid` | `fast`, `slow`, or `hybrid` |
| `PORT` | `7860` | Gradio server port |
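
As an illustration of how these variables might be consumed, the sketch below reads them with the defaults from the table. `Settings` and `load_settings` are hypothetical names; the way `app.py` actually loads its configuration is not shown in this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_MODES = ("fast", "slow", "hybrid")


@dataclass(frozen=True)
class Settings:
    model_id: str
    device: str
    dtype: str
    max_new_tokens: int
    generation_mode: str
    port: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from the environment, falling back to the documented defaults."""
    env = os.environ if env is None else env
    mode = env.get("GENERATION_MODE", "hybrid")
    if mode not in VALID_MODES:
        raise ValueError(f"GENERATION_MODE must be one of {VALID_MODES}, got {mode!r}")
    return Settings(
        model_id=env.get("MODEL_ID", "nvidia/LocateAnything-3B"),
        device=env.get("DEVICE", "cuda"),
        dtype=env.get("DTYPE", "bfloat16"),
        # Numeric values arrive as strings and are converted here.
        max_new_tokens=int(env.get("MAX_NEW_TOKENS", "8192")),
        generation_mode=mode,
        port=int(env.get("PORT", "7860")),
    )
```

For example, `DEVICE=cpu PORT=8080 python app.py` would override two values and leave the rest at their defaults.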
## Deployment to Hugging Face Spaces
### Automatic Sync via GitHub Actions
1. Create a Hugging Face Space at [huggingface.co/new-space](https://huggingface.co/new-space) (select Gradio SDK)
2. Set these GitHub repository secrets:
   - `HF_TOKEN`: your Hugging Face [access token](https://huggingface.co/settings/tokens)
   - `HF_USERNAME`: your Hugging Face username
   - `HF_SPACE_NAME`: your space name
3. Push to `main`. GitHub Actions will sync the repo to your HF Space automatically.
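
The workflow file itself is not shown in this README. Assuming it pushes over HTTPS using the three secrets above, the authenticated remote could be assembled as in this sketch; `space_remote_url` is a hypothetical helper, not part of the project.

```python
from urllib.parse import quote


def space_remote_url(username: str, space_name: str, token: str) -> str:
    """Return an authenticated HTTPS remote for pushing to a Space repository."""
    user = quote(username, safe="")
    return (
        f"https://{user}:{quote(token, safe='')}"
        f"@huggingface.co/spaces/{user}/{quote(space_name, safe='')}"
    )
```

The resulting URL would be passed to `git push <url> main`. Because it embeds the token, it should never be echoed to CI logs.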
### Manual Push
```bash
# Clone your HF Space repo
git clone https://huggingface.co/spaces/YOUR_USERNAME/space-debris-localizer
cd space-debris-localizer
# Copy project files
cp -r /path/to/space-debris-localizer/* .
git add . && git commit -m "deploy" && git push
```
## Example Prompts
- `Locate all the instances that match the following description: space debris.`
- `Locate all the instances that match the following description: solar panel.`
- `Locate a single instance that matches the following description: spacecraft.`
- `Locate all the instances that match the following description: antenna.`
- `Locate all the instances that match the following description: rocket body.`
- `Locate all the instances that match the following description: thermal blanket.`
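
The prompts above follow two templates, one for all instances and one for a single instance. A small helper such as the hypothetical `build_prompt` below could generate them from a short description; the template strings are copied from the examples, everything else is an assumption.

```python
ALL_TEMPLATE = "Locate all the instances that match the following description: {}."
SINGLE_TEMPLATE = "Locate a single instance that matches the following description: {}."


def build_prompt(description: str, single: bool = False) -> str:
    """Wrap a short object description in one of the two prompt templates."""
    # Drop surrounding whitespace and a trailing period so the template adds exactly one.
    cleaned = description.strip().rstrip(".")
    if not cleaned:
        raise ValueError("description must not be empty")
    template = SINGLE_TEMPLATE if single else ALL_TEMPLATE
    return template.format(cleaned)
```

Calling `build_prompt("solar panel")` reproduces the second example prompt, and `build_prompt("spacecraft", single=True)` reproduces the third.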
## Known Limitations
- **Domain gap:** The model was trained on general grounding data (COCO, LVIS, RefCOCO, etc.), not specifically on orbital imagery. Performance on space scenes is exploratory.
- **Small debris:** Objects below a few pixels are unlikely to be grounded reliably.
- **Image quality:** Detection depends heavily on image resolution and contrast.
- **No confidence calibration:** The model does not output calibrated confidence scores; displayed confidence is a placeholder.
- **GPU strongly recommended:** CPU inference is supported but extremely slow because the model has 3B parameters.
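
One way to act on the small-debris limitation is to flag very small boxes instead of presenting them as reliable detections. The sketch below is illustrative only: `split_by_size` is a hypothetical helper, and the 4-pixel default is an arbitrary placeholder, not a threshold measured for this model.

```python
def split_by_size(boxes, min_side_px=4.0):
    """Separate boxes into (kept, too_small) by their shorter side in pixels.

    `boxes` is an iterable of (x1, y1, x2, y2) tuples. ASSUMPTION: the 4 px
    default is an arbitrary illustration, not a value tuned for this model.
    """
    kept, too_small = [], []
    for x1, y1, x2, y2 in boxes:
        shorter_side = min(abs(x2 - x1), abs(y2 - y1))
        target = kept if shorter_side >= min_side_px else too_small
        target.append((x1, y1, x2, y2))
    return kept, too_small
```

Boxes in `too_small` could be drawn in a different colour or omitted from the JSON metadata, depending on how conservative the workflow needs to be.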
## Future Work
- Fine-tune on orbital debris datasets (e.g., ESA's DISCOS, ESA Clean Space imagery)
- Integrate with real satellite imagery APIs (e.g., ESA Copernicus, Planet Labs)
- Add temporal tracking across image sequences
- Support video input for debris tracking
- Add point-based localization for centroid estimation
- Deploy with a quantized model for faster CPU inference
## Tech Stack
- **Model:** [nvidia/LocateAnything-3B](https://huggingface.co/nvidia/LocateAnything-3B)
- **Framework:** Gradio 5.x, Hugging Face Transformers
- **Language:** Python 3.10+
- **CI/CD:** GitHub Actions
- **Deployment:** Hugging Face Spaces
## License
MIT License. The underlying LocateAnything-3B model is subject to the [NVIDIA License](https://huggingface.co/nvidia/LocateAnything-3B/blob/main/LICENSE) (non-commercial research use).