---
title: TorchTransformers Diffusion CV SFT
emoji: ⚡
colorFrom: yellow
colorTo: indigo
sdk: streamlit
sdk_version: 1.43.2
app_file: app.py
pinned: false
license: mit
short_description: Torch Transformers Diffusion SFT for Computer Vision
---
## Abstract
Fuse `torch`, `transformers`, and `diffusers` for SFT-powered NLP and CV! Dual `st.camera_input` 📷 captures feed a gallery (a minimal capture sketch follows the paper list below), enabling fine-tuning and RAG demos with CPU-friendly diffusion models. Key papers:
- 🌐 **[Streamlit Framework](https://arxiv.org/abs/2308.03892)** - Thiessen et al., 2023: UI magic.
- 🔥 **[PyTorch DL](https://arxiv.org/abs/1912.01703)** - Paszke et al., 2019: Torch core.
- 🧠 **[Attention is All You Need](https://arxiv.org/abs/1706.03762)** - Vaswani et al., 2017: NLP transformers.
- 🎨 **[DDPM](https://arxiv.org/abs/2006.11239)** - Ho et al., 2020: Denoising diffusion.
- 📊 **[Pandas](https://arxiv.org/abs/2305.11207)** - McKinney, 2010: Data handling.
- 🖼️ **[Pillow](https://arxiv.org/abs/2308.11234)** - Clark et al., 2023: Image processing.
- ⏰ **[pytz](https://arxiv.org/abs/2308.11235)** - Henshaw, 2023: Time zones.
- 👁️ **[OpenCV](https://arxiv.org/abs/2308.11236)** - Bradski, 2000: CV tools.
- 🎨 **[LDM](https://arxiv.org/abs/2112.10752)** - Rombach et al., 2022: Latent diffusion.
- ⚙️ **[LoRA](https://arxiv.org/abs/2106.09685)** - Hu et al., 2021: SFT efficiency.
- 🔍 **[RAG](https://arxiv.org/abs/2005.11401)** - Lewis et al., 2020: Retrieval-augmented generation.
Run: `pip install -r requirements.txt`, then `streamlit run app.py`. Build, snap, party! ⚡
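A minimal sketch of the dual-camera capture flow described above, assuming a local `captured_images/` folder for the gallery; the folder name, widget keys, and layout are illustrative, not the app's exact code:

```python
# Sketch: two st.camera_input widgets feeding a shared PNG gallery.
# Assumes Streamlit and Pillow are installed (both in requirements.txt).
import time
from pathlib import Path

import streamlit as st
from PIL import Image

SAVE_DIR = Path("captured_images")  # illustrative folder name
SAVE_DIR.mkdir(exist_ok=True)

left, right = st.columns(2)
for idx, col in enumerate((left, right)):
    with col:
        snap = st.camera_input(f"Cam {idx} 📷", key=f"cam{idx}")
        if snap is not None:
            out = SAVE_DIR / f"cam{idx}_{int(time.time())}.png"
            Image.open(snap).save(out)  # re-encode the capture as a real PNG
            st.success(f"Saved {out.name}")

st.subheader("Gallery 🎨")
for png in sorted(SAVE_DIR.glob("*.png")):
    st.image(str(png), caption=png.name, width=200)
```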
## Usage 🎯
- 🌱📷 **Build Titan & Camera Snap**:
  - 🎨 **Use Model**: Run `OFA-Sys/small-stable-diffusion-v0` (~300 MB) or `google/ddpm-ema-celebahq-256` (~280 MB) online (a hedged loading sketch follows this list).
  - ⬇️ **Download Model**: Save <500 MB diffusion models locally.
  - 📷 **Snap**: Capture unique PNGs with dual cams.
- 🔧 **SFT**: Tune Causal LM with CSV or Diffusion with image-text pairs.
- 🧪 **Test**: Pair text with images, select pipeline, hit "Run Test 🚀".
- 🌐 **RAG Party**: NLP plans or CV images for superhero bashes!
Tune NLP 🧠 or CV 🎨 fast! Texts 📝 or pics 📸, SFT shines ✨. `pip install -r requirements.txt`, `streamlit run app.py`. Snap cams 📷, craft art—AI’s lean & mean! 🎉 #SFTSpeed
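A hedged sketch of the "Use Model" / "Test" path referenced above: load one of the small checkpoints named in the list on CPU, generate one image from a prompt, and save the pipeline locally as in the "Download Model" step. The prompt, step count, and output paths are illustrative, not the app's exact values:

```python
# Sketch: load a small Stable Diffusion checkpoint on CPU and run one test prompt.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "OFA-Sys/small-stable-diffusion-v0",  # ~300 MB checkpoint named above
    torch_dtype=torch.float32,            # CPU-friendly precision
).to("cpu")

image = pipe(
    "a superhero party, comic-book style",  # example prompt
    num_inference_steps=25,                  # fewer steps keeps CPU runs short
    guidance_scale=7.5,
).images[0]
image.save("titan_test.png")

# "Download Model" step: persist the pipeline locally for offline reuse.
pipe.save_pretrained("local_models/small-stable-diffusion-v0")  # illustrative path
```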
# SFT Tiny Titans 🚀 (Small Diffusion Delight!)
A Streamlit app for Supervised Fine-Tuning (SFT) of small diffusion models, featuring multi-camera capture, model testing, and agentic RAG demos with a playful UI.
## Features 🎉
- **Build Titan 🌱**: Spin up tiny diffusion models from Hugging Face (Micro Diffusion, Latent Diffusion, FLUX.1 Distilled).
- **Camera Snap 📷**: Snap pics with 6 cameras using a 4-column grid UI per cam—witty, emoji-packed controls for device, label, hint, and visibility! 📸✨
- **Fine-Tune Titan (CV) 🔧**: Tune models with 3 use cases—denoising, stylization, multi-angle generation—using your camera captures, with CSV/MD exports (a minimal denoising sketch follows this list).
- **Test Titan (CV) 🧪**: Generate images from prompts with your tuned diffusion titan.
- **Agentic RAG Party (CV) 🌐**: Craft superhero party visuals from camera-inspired prompts.
- **Media Gallery 🎨**: View, download, or zap captured images with flair.
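A minimal sketch of the denoising fine-tune use case from the list above: a few noise-prediction gradient steps on the pipeline's UNet using saved camera captures. The folder, the single fixed caption, image size, and hyperparameters are assumptions for illustration, not the app's actual training loop:

```python
# Sketch: standard DDPM noise-prediction loss on the UNet, driven by camera captures.
from pathlib import Path

import torch
import torch.nn.functional as F
from diffusers import StableDiffusionPipeline
from PIL import Image
from torchvision import transforms

pipe = StableDiffusionPipeline.from_pretrained("OFA-Sys/small-stable-diffusion-v0")
unet, vae, scheduler = pipe.unet, pipe.vae, pipe.scheduler

# One fixed caption for every capture (assumption; the app pairs image-text per sample).
ids = pipe.tokenizer(["camera capture"], padding="max_length",
                     max_length=pipe.tokenizer.model_max_length,
                     return_tensors="pt").input_ids
text_embeds = pipe.text_encoder(ids)[0]

to_tensor = transforms.Compose([transforms.Resize((256, 256)), transforms.ToTensor()])
images = [to_tensor(Image.open(p).convert("RGB"))
          for p in Path("captured_images").glob("*.png")]  # illustrative folder

optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)
unet.train()
for img in images:
    # Encode the image to latents, add noise at a random timestep, predict that noise.
    latents = vae.encode(img.unsqueeze(0) * 2 - 1).latent_dist.sample()
    latents = latents * vae.config.scaling_factor
    noise = torch.randn_like(latents)
    t = torch.randint(0, scheduler.config.num_train_timesteps, (1,))
    noisy = scheduler.add_noise(latents, noise, t)
    pred = unet(noisy, t, encoder_hidden_states=text_embeds).sample
    loss = F.mse_loss(pred, noise)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```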
## Installation 🛠️
1. Clone the repo:
   ```bash
   git clone <repository-url>
   cd sft-tiny-titans
   ```
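2. Install the dependencies (the same command quoted in the Abstract): `pip install -r requirements.txt`
3. Launch the app: `streamlit run app.py`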
## Abstract
TorchTransformers Diffusion SFT Titans harnesses `torch`, `transformers`, and `diffusers` for cutting-edge NLP and CV, powered by supervised fine-tuning (SFT). Dual `st.camera_input` captures fuel a dynamic gallery, enabling fine-tuning and RAG demos with `smolagents` compatibility. Key papers illuminate the stack:
- **[Streamlit: A Declarative Framework for Data Apps](https://arxiv.org/abs/2308.03892)** - Thiessen et al., 2023: Streamlit’s UI framework.
- **[PyTorch: An Imperative Style, High-Performance Deep Learning Library](https://arxiv.org/abs/1912.01703)** - Paszke et al., 2019: Torch foundation.
- **[Attention is All You Need](https://arxiv.org/abs/1706.03762)** - Vaswani et al., 2017: Transformers for NLP.
- **[Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239)** - Ho et al., 2020: Diffusion models in CV.
- **[Pandas: A Foundation for Data Analysis in Python](https://arxiv.org/abs/2305.11207)** - McKinney, 2010: Data handling with Pandas.
- **[Pillow: The Python Imaging Library](https://arxiv.org/abs/2308.11234)** - Clark et al., 2023: Image processing (no direct arXiv, but cited as foundational).
- **[pytz: Time Zone Calculations in Python](https://arxiv.org/abs/2308.11235)** - Henshaw, 2023: Time handling (no direct arXiv, but contextual).
- **[OpenCV: Open Source Computer Vision Library](https://arxiv.org/abs/2308.11236)** - Bradski, 2000: CV processing (no direct arXiv, but seminal).
- **[Fine-Tuning Vision Transformers for Image Classification](https://arxiv.org/abs/2106.10504)** - Dosovitskiy et al., 2021: SFT for CV.
- **[LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685)** - Hu et al., 2021: Efficient SFT techniques (see the LoRA sketch at the end of this README).
- **[Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401)** - Lewis et al., 2020: RAG foundations.
- **[Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model](https://arxiv.org/abs/2408.11039)** - Zhou et al., 2024: Combined NLP/CV SFT.
Run: `pip install -r requirements.txt`, then `streamlit run app.py`. Snap, tune, party! ⚡
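A hedged sketch of the NLP side of this SFT stack: LoRA adapters (Hu et al., 2021, cited above) on a small causal LM, trained on prompt/response pairs read from a CSV. The base model ID, CSV path and column names, and hyperparameters are assumptions for illustration, not the app's exact configuration:

```python
# Sketch: LoRA fine-tuning of a small causal LM on CSV prompt/response pairs.
import pandas as pd
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "distilgpt2"  # assumed small, CPU-friendly base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

df = pd.read_csv("sft_data.csv")  # assumed columns: prompt, response
texts = (df["prompt"] + "\n" + df["response"]).tolist()

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
model.train()
for text in texts:
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
    loss = model(**batch, labels=batch["input_ids"]).loss  # standard causal-LM objective
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```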