# Visual Lineage: Agent Instructions
## What this is
Blend FLUX.2 klein instrument LoRAs, generate hybrid instrument images with full provenance tracking, and convert to 3D meshes for printing.
Built for the Build Small hackathon (June 2026). Track: Backyard AI.
## Published Components
### HF Space (Gradio UI)
- **URL:** https://build-small-hackathon-visual-lineage.hf.space
- **SDK:** Gradio 6.x
- **GPU:** zero-a10g (ZeroGPU, 60s duration); the `@spaces.GPU` decorator is required
- **App file:** `app/app.py`
- **API endpoint:** `POST /gradio_api/call/generate`
### Published LoRAs (HF Hub under build-small-hackathon org)
| LoRA | Trigger | HF Repo | Training |
|------|---------|---------|----------|
| Eritrean krar | ERTRN_KRAR | visual-lineage-eritrean_krar_v1 | 30 images, step 1500 |
| Korean gayageum | KR_GAYAGEUM | visual-lineage-korean_gayageum_v1 | 30 images, step 2000 |
| Berimbau (capoeira) | BR_BERIMBAU | visual-lineage-berimbau_v1 | 30 images, step 2000 |
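
The table above can be mirrored in a small lookup, sketched here. The org name, repo names, and trigger words come from the table; the dictionary layout and the helper names `repo_id` and `hybrid_prompt` are hypothetical and do not reflect the actual schema of `registry/loras.json`. Whether both trigger words must appear in the prompt is also an assumption.

```python
# Illustrative lookup built from the table above.
# The dict layout and helper names are hypothetical, not the registry schema.
HF_ORG = "build-small-hackathon"

LORAS = {
    "eritrean_krar_v1": {
        "trigger": "ERTRN_KRAR",
        "repo": "visual-lineage-eritrean_krar_v1",
    },
    "korean_gayageum_v1": {
        "trigger": "KR_GAYAGEUM",
        "repo": "visual-lineage-korean_gayageum_v1",
    },
    "berimbau_v1": {
        "trigger": "BR_BERIMBAU",
        "repo": "visual-lineage-berimbau_v1",
    },
}


def repo_id(lora_id: str) -> str:
    """Return the full Hub repo ID (org/name) for a LoRA ID."""
    return f"{HF_ORG}/{LORAS[lora_id]['repo']}"


def hybrid_prompt(lora_a: str, lora_b: str, description: str) -> str:
    """Prefix a description with both trigger words.

    Assumption: the triggers are expected in the prompt text.
    """
    triggers = [LORAS[lora_a]["trigger"], LORAS[lora_b]["trigger"]]
    return f"{' '.join(triggers)} {description}"
```

For example, `repo_id("berimbau_v1")` yields `build-small-hackathon/visual-lineage-berimbau_v1`.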
### 3D Conversion Workflows (ComfyUI)
Located in `comfy_workflows/`:
- `tripo_api_image_to_3d.json`: Tripo cloud API. LoadImage → TripoImageToModelNode → Preview3DAdvanced. **Bring your own Tripo API key.**
- `triposr_image_to_3d.json`: Local TripoSR (experimental). Requires the full TripoSR VAE (encoder + decoder). Not all hardware is supported.
### How to call the Gradio API
```
POST /gradio_api/call/generate
{"data": ["eritrean_krar_v1", "korean_gayageum_v1", 60, "prompt describing hybrid instrument", 42, 768]}
→ returns {"event_id": "..."}
→ poll with GET /gradio_api/call/generate/{event_id} for SSE result
```
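
A minimal Python sketch of the same two-step call, using only the standard library. It follows Gradio's usual pattern of a POST to `/gradio_api/call/generate` followed by a GET on `/gradio_api/call/generate/{event_id}` to read the server-sent events. The parameter names `blend`, `seed`, and `size` are guesses at what the third, fifth, and sixth positional values mean, since the source does not label them. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://build-small-hackathon-visual-lineage.hf.space"


def build_payload(lora_a, lora_b, blend, prompt, seed, size) -> bytes:
    """Encode the positional inputs as a Gradio request body.

    blend, seed, and size are assumed names for unlabeled values.
    """
    data = [lora_a, lora_b, blend, prompt, seed, size]
    return json.dumps({"data": data}).encode("utf-8")


def parse_sse(text: str) -> list:
    """Turn 'event:' / 'data:' line pairs into (event, data) tuples."""
    events = []
    event = None
    for line in text.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:") and event is not None:
            raw = line[len("data:"):].strip()
            try:
                events.append((event, json.loads(raw)))
            except ValueError:
                # Keep non-JSON payloads as raw text.
                events.append((event, raw))
            event = None
    return events


def generate(payload: bytes, timeout: float = 120.0) -> list:
    """Submit a job, then read the event stream until the server closes it."""
    request = urllib.request.Request(
        f"{BASE_URL}/gradio_api/call/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        event_id = json.load(response)["event_id"]
    stream_url = f"{BASE_URL}/gradio_api/call/generate/{event_id}"
    with urllib.request.urlopen(stream_url, timeout=timeout) as response:
        return parse_sse(response.read().decode("utf-8"))
```

Calling `generate(build_payload("eritrean_krar_v1", "korean_gayageum_v1", 60, "prompt describing hybrid instrument", 42, 768))` would return the list of events; the result, if any, is the data attached to the `complete` event.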
## Dataset Pipeline (for training new LoRAs)
1. Wikimedia harvest: `python harvest/wikimedia_harvest.py --config harvest/configs/{lora_id}.yaml --download`
2. Openverse supplement: `python harvest/openverse_harvest_generic.py --config harvest/configs/{lora_id}.yaml`
3. Build dataset: `python harvest/build_dataset.py --config harvest/configs/{lora_id}.yaml --target 30`
4. Train on Modal: `VL_LORA_ID={lora_id} VL_RUN_ID={lora_id} modal run -m train.modal_train`
5. Publish: `python train/publish_to_hf.py ...`
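
Steps 1 to 4 share a single `{lora_id}`, so they can be generated from one value. The sketch below only builds the command lines; it does not run them. The function names `pipeline_commands` and `render` are hypothetical, and step 5 is left out because its arguments are elided in the list above.

```python
import shlex


def pipeline_commands(lora_id: str, target: int = 30) -> list:
    """Return (env, argv) pairs for steps 1 to 4 of the dataset pipeline."""
    config = f"harvest/configs/{lora_id}.yaml"
    train_env = {"VL_LORA_ID": lora_id, "VL_RUN_ID": lora_id}
    return [
        ({}, ["python", "harvest/wikimedia_harvest.py",
              "--config", config, "--download"]),
        ({}, ["python", "harvest/openverse_harvest_generic.py",
              "--config", config]),
        ({}, ["python", "harvest/build_dataset.py",
              "--config", config, "--target", str(target)]),
        (train_env, ["modal", "run", "-m", "train.modal_train"]),
    ]


def render(env: dict, argv: list) -> str:
    """Format one (env, argv) pair as a shell command line."""
    prefix = [f"{key}={shlex.quote(value)}" for key, value in env.items()]
    return " ".join(prefix + [shlex.quote(arg) for arg in argv])


if __name__ == "__main__":
    for env, argv in pipeline_commands("berimbau_v1"):
        print(render(env, argv))
```

To execute a step rather than print it, each pair could be passed to `subprocess.run(argv, env={**os.environ, **env}, check=True)`.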
## Project Structure
```
visual-lineage/
├── app/app.py              # Gradio UI with lineage panel
├── compose/
│   ├── merge.py            # FLUX + LoRA inference (24 steps, guidance 2.0)
│   └── provenance.py       # Provenance builder with recursive ancestry
├── registry/loras.json     # Published LoRA metadata
├── comfy_workflows/        # 3D conversion workflows
│   ├── tripo_api_image_to_3d.json
│   ├── triposr_image_to_3d.json
│   └── README.md
├── harvest/                # Dataset curation
│   ├── wikimedia_harvest.py
│   ├── openverse_harvest_generic.py
│   ├── build_dataset.py
│   └── configs/
├── train/
│   ├── modal_train.py      # Modal A10G training
│   ├── publish_to_hf.py    # HF Hub publishing
│   └── configs/
├── requirements.txt
├── README.md
└── agents.md
```
## Requirements
- Python 3.11+
- Modal account + HF_TOKEN for training
- Openverse API credentials for expanded dataset sourcing
- Tripo API key for cloud-based 3D conversion
- ComfyUI Desktop for local 3D workflows
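
A small preflight check for the two requirements that can be verified from Python, sketched under assumptions: it covers only the Python version and `HF_TOKEN`, because the source does not name the environment variables for the Openverse or Tripo credentials. The function name `preflight` is hypothetical.

```python
import os
import sys


def preflight(env=None, version=None) -> list:
    """Return a list of unmet requirements (empty when all checks pass)."""
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else tuple(version)
    problems = []
    if version < (3, 11):
        found = ".".join(str(part) for part in version)
        problems.append(f"Python 3.11+ required, found {found}")
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set (needed for training)")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print(problem)
```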