
	---
	title: SpaceDebris Localizer
	emoji: 🛰️
	colorFrom: blue
	colorTo: gray
	sdk: gradio
	sdk_version: 5.12.0
	app_file: app.py
	pinned: true
	license: mit
	---

	# SpaceDebris Localizer

	Use NVIDIA LocateAnything-3B to locate space debris, satellite fragments, and spacecraft components in orbital imagery.

	Orbital debris is a growing threat to satellite operations and crewed spaceflight. This project demonstrates how state-of-the-art vision-language grounding models can be applied to identify and localize objects in space imagery — from satellite solar panels and antennas to rocket bodies and debris fields. Built as a Hugging Face Spaces application, it provides a natural-language interface: describe what you're looking for, and the model draws bounding boxes around matching objects in the image.

	## Why This Matters

	There are over 36,000 tracked objects in Earth orbit, and millions of smaller fragments too tiny to track. Traditional detection pipelines require specialized training data and domain-specific models. Vision-language grounding models like LocateAnything-3B offer a different approach: describe the target in natural language and let the model find it. This prototype explores whether general-purpose visual grounding can serve as a rapid-deployment tool for orbital debris awareness, satellite inspection, and space situational awareness workflows.

	## Architecture

	```
	User uploads image + text prompt
	         │
	         ▼
	┌─────────────────────┐
	│ Gradio Interface    │
	│ (app.py)            │
	└────────┬────────────┘
	         │
	         ▼
	┌─────────────────────┐
	│ LocateAnythingWorker│
	│ (src/inference.py)  │
	│  ┌─────────────────┐│
	│  │ nvidia/         ││
	│  │ LocateAnything- ││
	│  │ 3B (3B params)  ││
	│  └─────────────────┘│
	└────────┬────────────┘
	         │ raw text with <box> tokens
	         ▼
	┌─────────────────────┐
	│ Output Parser       │
	│ (src/parsing.py)    │
	│ Regex → BBox list   │
	└────────┬────────────┘
	         │ structured BBox objects
	         ▼
	┌─────────────────────┐
	│ Visualizer          │
	│ (src/visualization) │
	│ Draw boxes + labels │
	└────────┬────────────┘
	         │
	         ▼
	Annotated image + JSON metadata
	```
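The parsing stage in the diagram can be illustrated with a minimal sketch. This README does not show the exact `<box>` token format, so the code below assumes boxes arrive as `<box>x1,y1,x2,y2</box>`; the names `BBox`, `BOX_PATTERN`, and `parse_boxes` are hypothetical and may not match what `src/parsing.py` actually defines.

```python
import re
from dataclasses import dataclass

# ASSUMPTION: the model emits boxes as "<box>x1,y1,x2,y2</box>".
# The real token format handled by src/parsing.py may differ.
_NUM = r"(-?\d+(?:\.\d+)?)"
BOX_PATTERN = re.compile(
    r"<box>\s*" + r"\s*,\s*".join([_NUM] * 4) + r"\s*</box>"
)


@dataclass(frozen=True)
class BBox:
    """Axis-aligned box; (x1, y1) is top-left, (x2, y2) is bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float


def parse_boxes(raw_text: str) -> list[BBox]:
    """Extract every well-formed box from the raw generated text."""
    boxes = []
    for match in BOX_PATTERN.finditer(raw_text):
        x1, y1, x2, y2 = (float(v) for v in match.groups())
        # Normalize corner order so downstream drawing code can rely on it.
        boxes.append(BBox(min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)))
    return boxes
```

Text without any matching token yields an empty list, so the visualizer can treat "nothing found" and "malformed output" the same way.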

	## Setup

	### Prerequisites

	- Python 3.10+
	- CUDA-capable GPU (recommended) or CPU (slow)
	- ~8 GB GPU memory for bfloat16 inference

	### Local Installation

	```bash
	git clone https://github.com/YOUR_USERNAME/space-debris-localizer.git
	cd space-debris-localizer
	pip install -e ".[dev]"
	```

	### Run Locally

	```bash
	python app.py
	```

	The app launches at `http://localhost:7860`. The first run downloads the model (~6 GB).

	### Environment Variables

	| Variable | Default | Description |
	|----------|---------|-------------|
	| `MODEL_ID` | `nvidia/LocateAnything-3B` | Hugging Face model ID |
	| `DEVICE` | `cuda` | Device (`cuda` or `cpu`) |
	| `DTYPE` | `bfloat16` | Model precision |
	| `MAX_NEW_TOKENS` | `8192` | Max generation tokens |
	| `GENERATION_MODE` | `hybrid` | `fast`, `slow`, or `hybrid` |
	| `PORT` | `7860` | Gradio server port |
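As a rough illustration of how these variables might be consumed, the sketch below reads them with the defaults from the table. The `Settings` class and `load_settings` function are hypothetical names used only for this example; the project's own configuration code may be organized differently.

```python
import os
from dataclasses import dataclass

VALID_MODES = {"fast", "slow", "hybrid"}


@dataclass(frozen=True)
class Settings:
    model_id: str
    device: str
    dtype: str
    max_new_tokens: int
    generation_mode: str
    port: int


def load_settings(env=None) -> Settings:
    """Build settings from a mapping (defaults to os.environ).

    Defaults mirror the table above; names here are illustrative only.
    """
    env = os.environ if env is None else env
    mode = env.get("GENERATION_MODE", "hybrid")
    if mode not in VALID_MODES:
        raise ValueError(f"GENERATION_MODE must be one of {sorted(VALID_MODES)}")
    return Settings(
        model_id=env.get("MODEL_ID", "nvidia/LocateAnything-3B"),
        device=env.get("DEVICE", "cuda"),
        dtype=env.get("DTYPE", "bfloat16"),
        # Environment values are strings, so numeric options are converted.
        max_new_tokens=int(env.get("MAX_NEW_TOKENS", "8192")),
        generation_mode=mode,
        port=int(env.get("PORT", "7860")),
    )
```

Passing a plain dictionary instead of relying on `os.environ` keeps the loader easy to test, for example `load_settings({"DEVICE": "cpu"})`.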

	## Deployment to Hugging Face Spaces

	### Automatic Sync via GitHub Actions

	1. Create a Hugging Face Space at [huggingface.co/new-space](https://huggingface.co/new-space) (select the Gradio SDK).
	2. Set these GitHub repository secrets:
	   - `HF_TOKEN`: your Hugging Face [access token](https://huggingface.co/settings/tokens), with write access to the Space
	   - `HF_USERNAME`: your Hugging Face username
	   - `HF_SPACE_NAME`: your Space name
	3. Push to `main`. GitHub Actions will sync the repo to your HF Space automatically.
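The workflow file itself is not shown in this README. As one possible alternative to a git-based sync, the sketch below uploads a folder to the Space with `HfApi.upload_folder` from the `huggingface_hub` package, assuming the three secrets above are exposed as environment variables. The helper names `space_repo_id` and `sync_to_space` are hypothetical.

```python
import os


def space_repo_id(username: str, space_name: str) -> str:
    """Build the "<username>/<space>" identifier the Hub uses for a Space."""
    if not username or not space_name:
        raise ValueError("both username and space name are required")
    return f"{username}/{space_name}"


def sync_to_space(folder: str = ".") -> None:
    """Upload `folder` to the Space named by the HF_* environment variables."""
    # Imported lazily so space_repo_id stays usable without the package.
    from huggingface_hub import HfApi

    repo_id = space_repo_id(
        os.environ["HF_USERNAME"], os.environ["HF_SPACE_NAME"]
    )
    HfApi(token=os.environ["HF_TOKEN"]).upload_folder(
        folder_path=folder,
        repo_id=repo_id,
        repo_type="space",
        commit_message="Sync from GitHub",
    )
```

A CI step could call `sync_to_space(".")` after checkout; whether that is preferable to `git push` depends on how much commit history the Space should retain.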

	### Manual Push

	```bash
	# Clone your HF Space repo
	git clone https://huggingface.co/spaces/YOUR_USERNAME/space-debris-localizer
	cd space-debris-localizer
	# Copy project files
	cp -r /path/to/space-debris-localizer/* .
	git add . && git commit -m "deploy" && git push
	```

	## Example Prompts

	- `Locate all the instances that match the following description: space debris.`
	- `Locate all the instances that match the following description: solar panel.`
	- `Locate a single instance that matches the following description: spacecraft.`
	- `Locate all the instances that match the following description: antenna.`
	- `Locate all the instances that match the following description: rocket body.`
	- `Locate all the instances that match the following description: thermal blanket.`
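The prompts above follow two fixed templates, so they can be generated from a short object description. The helper below is a minimal sketch; `build_prompt` and the template constants are hypothetical names, and only the template wording is taken from the examples in this section.

```python
ALL_TEMPLATE = (
    "Locate all the instances that match the following description: "
    "{description}."
)
SINGLE_TEMPLATE = (
    "Locate a single instance that matches the following description: "
    "{description}."
)


def build_prompt(description: str, single: bool = False) -> str:
    """Wrap a free-text description in one of the two prompt templates."""
    # Strip whitespace and any trailing period so the template adds exactly one.
    cleaned = description.strip().rstrip(".")
    if not cleaned:
        raise ValueError("description must not be empty")
    template = SINGLE_TEMPLATE if single else ALL_TEMPLATE
    return template.format(description=cleaned)
```

For example, `build_prompt("rocket body")` reproduces the fifth prompt in the list, and `build_prompt("spacecraft", single=True)` reproduces the third.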

	## Known Limitations

	- Domain gap: The model was trained on general grounding data (COCO, LVIS, RefCOCO, etc.), not specifically on orbital imagery. Performance on space scenes is exploratory.
	- Small debris: Objects below a few pixels are unlikely to be grounded reliably.
	- Image quality: Detection depends heavily on image resolution and contrast.
	- No confidence calibration: The model does not output calibrated confidence scores; displayed confidence is a placeholder.
	- GPU strongly recommended: CPU inference works but is extremely slow because the model has 3B parameters.

	## Future Work

	- Fine-tune on orbital debris datasets (e.g., ESA's DISCOS, ESA Clean Space imagery)
	- Integrate with real satellite imagery APIs (e.g., ESA Copernicus, Planet Labs)
	- Add temporal tracking across image sequences
	- Support video input for debris tracking
	- Add point-based localization for centroid estimation
	- Deploy with quantized model for faster CPU inference

	## Tech Stack

	- Model: [nvidia/LocateAnything-3B](https://huggingface.co/nvidia/LocateAnything-3B)
	- Framework: Gradio 5.x, Hugging Face Transformers
	- Language: Python 3.10+
	- CI/CD: GitHub Actions
	- Deployment: Hugging Face Spaces

	## License

	MIT License. The underlying LocateAnything-3B model is subject to the [NVIDIA License](https://huggingface.co/nvidia/LocateAnything-3B/blob/main/LICENSE) (non-commercial research use).