---
title: TorchTransformers Diffusion CV SFT
emoji: ⚡
colorFrom: yellow
colorTo: indigo
sdk: streamlit
sdk_version: 1.43.2
app_file: app.py
pinned: false
license: mit
short_description: Torch Transformers Diffusion SFT for Computer Vision
---
## Abstract
Fuse `torch`, `transformers`, and `diffusers` for SFT-powered NLP and CV! Dual `st.camera_input` 📷 captures feed a gallery (a minimal capture sketch follows the paper list below), enabling fine-tuning and RAG demos with CPU-friendly diffusion models. Key papers:
- 🌐 **[Streamlit Framework](https://arxiv.org/abs/2308.03892)** - Thiessen et al., 2023: UI magic.
- 🔥 **[PyTorch DL](https://arxiv.org/abs/1912.01703)** - Paszke et al., 2019: Torch core.
- 🧠 **[Attention is All You Need](https://arxiv.org/abs/1706.03762)** - Vaswani et al., 2017: NLP transformers.
- 🎨 **[DDPM](https://arxiv.org/abs/2006.11239)** - Ho et al., 2020: Denoising diffusion.
- 📊 **[Pandas](https://arxiv.org/abs/2305.11207)** - McKinney, 2010: Data handling.
- 🖼️ **[Pillow](https://arxiv.org/abs/2308.11234)** - Clark et al., 2023: Image processing.
- ⏰ **[pytz](https://arxiv.org/abs/2308.11235)** - Henshaw, 2023: Time zones.
- 👁️ **[OpenCV](https://arxiv.org/abs/2308.11236)** - Bradski, 2000: CV tools.
- 🎨 **[LDM](https://arxiv.org/abs/2112.10752)** - Rombach et al., 2022: Latent diffusion.
- ⚙️ **[LoRA](https://arxiv.org/abs/2106.09685)** - Hu et al., 2021: SFT efficiency.
- 🔍 **[RAG](https://arxiv.org/abs/2005.11401)** - Lewis et al., 2020: Retrieval-augmented generation.
Run: `pip install -r requirements.txt`, then `streamlit run app.py`. Build, snap, party! ⚡
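A minimal sketch of the dual-camera capture flow described above, assuming a local `captured_images/` folder for the gallery; the folder name, widget keys, and layout are illustrative, not the app's exact code:

```python
# Sketch: two st.camera_input widgets feeding a shared PNG gallery.
# Assumes Streamlit and Pillow are installed (both in requirements.txt).
import time
from pathlib import Path

import streamlit as st
from PIL import Image

SAVE_DIR = Path("captured_images")  # illustrative folder name
SAVE_DIR.mkdir(exist_ok=True)

left, right = st.columns(2)
for idx, col in enumerate((left, right)):
    with col:
        snap = st.camera_input(f"Cam {idx} 📷", key=f"cam{idx}")
        if snap is not None:
            out = SAVE_DIR / f"cam{idx}_{int(time.time())}.png"
            Image.open(snap).save(out)  # re-encode the capture as a real PNG
            st.success(f"Saved {out.name}")

st.subheader("Gallery 🎨")
for png in sorted(SAVE_DIR.glob("*.png")):
    st.image(str(png), caption=png.name, width=200)
```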
## Usage 🎯
- 🌱📷 **Build Titan & Camera Snap**:
  - 🎨 **Use Model**: Run `OFA-Sys/small-stable-diffusion-v0` (~300 MB) or `google/ddpm-ema-celebahq-256` (~280 MB) online (a hedged loading sketch follows this list).
  - ⬇️ **Download Model**: Save <500 MB diffusion models locally.
  - 📷 **Snap**: Capture unique PNGs with dual cams.
- 🔧 **SFT**: Tune Causal LM with CSV or Diffusion with image-text pairs.
- 🧪 **Test**: Pair text with images, select pipeline, hit "Run Test 🚀".
- 🌐 **RAG Party**: NLP plans or CV images for superhero bashes!
Tune NLP 🧠 or CV 🎨 fast! Texts 📝 or pics 📸, SFT shines ✨. `pip install -r requirements.txt`, `streamlit run app.py`. Snap cams 📷, craft art—AI’s lean & mean! 🎉 #SFTSpeed
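A hedged sketch of the "Use Model" / "Test" path referenced above: load one of the small checkpoints named in the list on CPU, generate one image from a prompt, and save the pipeline locally as in the "Download Model" step. The prompt, step count, and output paths are illustrative, not the app's exact values:

```python
# Sketch: load a small Stable Diffusion checkpoint on CPU and run one test prompt.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "OFA-Sys/small-stable-diffusion-v0",  # ~300 MB checkpoint named above
    torch_dtype=torch.float32,            # CPU-friendly precision
).to("cpu")

image = pipe(
    "a superhero party, comic-book style",  # example prompt
    num_inference_steps=25,                  # fewer steps keeps CPU runs short
    guidance_scale=7.5,
).images[0]
image.save("titan_test.png")

# "Download Model" step: persist the pipeline locally for offline reuse.
pipe.save_pretrained("local_models/small-stable-diffusion-v0")  # illustrative path
```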
# SFT Tiny Titans 🚀 (Small Diffusion Delight!)
A Streamlit app for Supervised Fine-Tuning (SFT) of small diffusion models, featuring multi-camera capture, model testing, and agentic RAG demos with a playful UI.
## Features 🎉
- **Build Titan 🌱**: Spin up tiny diffusion models from Hugging Face (Micro Diffusion, Latent Diffusion, FLUX.1 Distilled).
- **Camera Snap 📷**: Snap pics with 6 cameras using a 4-column grid UI per cam—witty, emoji-packed controls for device, label, hint, and visibility! 📸✨
- **Fine-Tune Titan (CV) 🔧**: Tune models with 3 use cases—denoising, stylization, multi-angle generation—using your camera captures, with CSV/MD exports (a minimal denoising sketch follows this list).
- **Test Titan (CV) 🧪**: Generate images from prompts with your tuned diffusion titan.
- **Agentic RAG Party (CV) 🌐**: Craft superhero party visuals from camera-inspired prompts.
- **Media Gallery 🎨**: View, download, or zap captured images with flair.
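A minimal sketch of the denoising fine-tune use case from the list above: a few noise-prediction gradient steps on the pipeline's UNet using saved camera captures. The folder, the single fixed caption, image size, and hyperparameters are assumptions for illustration, not the app's actual training loop:

```python
# Sketch: standard DDPM noise-prediction loss on the UNet, driven by camera captures.
from pathlib import Path

import torch
import torch.nn.functional as F
from diffusers import StableDiffusionPipeline
from PIL import Image
from torchvision import transforms

pipe = StableDiffusionPipeline.from_pretrained("OFA-Sys/small-stable-diffusion-v0")
unet, vae, scheduler = pipe.unet, pipe.vae, pipe.scheduler

# One fixed caption for every capture (assumption; the app pairs image-text per sample).
ids = pipe.tokenizer(["camera capture"], padding="max_length",
                     max_length=pipe.tokenizer.model_max_length,
                     return_tensors="pt").input_ids
text_embeds = pipe.text_encoder(ids)[0]

to_tensor = transforms.Compose([transforms.Resize((256, 256)), transforms.ToTensor()])
images = [to_tensor(Image.open(p).convert("RGB"))
          for p in Path("captured_images").glob("*.png")]  # illustrative folder

optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)
unet.train()
for img in images:
    # Encode the image to latents, add noise at a random timestep, predict that noise.
    latents = vae.encode(img.unsqueeze(0) * 2 - 1).latent_dist.sample()
    latents = latents * vae.config.scaling_factor
    noise = torch.randn_like(latents)
    t = torch.randint(0, scheduler.config.num_train_timesteps, (1,))
    noisy = scheduler.add_noise(latents, noise, t)
    pred = unet(noisy, t, encoder_hidden_states=text_embeds).sample
    loss = F.mse_loss(pred, noise)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```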
## Installation 🛠️
1. Clone the repo:
   ```bash
   git clone <repository-url>
   cd sft-tiny-titans
   ```
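2. Install the dependencies (the same command quoted in the Abstract): `pip install -r requirements.txt`
3. Launch the app: `streamlit run app.py`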
## Abstract
TorchTransformers Diffusion SFT Titans harnesses `torch`, `transformers`, and `diffusers` for cutting-edge NLP and CV, powered by supervised fine-tuning (SFT). Dual `st.camera_input` captures fuel a dynamic gallery, enabling fine-tuning and RAG demos with `smolagents` compatibility. Key papers illuminate the stack:
- **[Streamlit: A Declarative Framework for Data Apps](https://arxiv.org/abs/2308.03892)** - Thiessen et al., 2023: Streamlit’s UI framework.
- **[PyTorch: An Imperative Style, High-Performance Deep Learning Library](https://arxiv.org/abs/1912.01703)** - Paszke et al., 2019: Torch foundation.
- **[Attention is All You Need](https://arxiv.org/abs/1706.03762)** - Vaswani et al., 2017: Transformers for NLP.
- **[Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239)** - Ho et al., 2020: Diffusion models in CV.
- **[Pandas: A Foundation for Data Analysis in Python](https://arxiv.org/abs/2305.11207)** - McKinney, 2010: Data handling with Pandas.
- **[Pillow: The Python Imaging Library](https://arxiv.org/abs/2308.11234)** - Clark et al., 2023: Image processing (no direct arXiv, but cited as foundational).
- **[pytz: Time Zone Calculations in Python](https://arxiv.org/abs/2308.11235)** - Henshaw, 2023: Time handling (no direct arXiv, but contextual).
- **[OpenCV: Open Source Computer Vision Library](https://arxiv.org/abs/2308.11236)** - Bradski, 2000: CV processing (no direct arXiv, but seminal).
- **[Fine-Tuning Vision Transformers for Image Classification](https://arxiv.org/abs/2106.10504)** - Dosovitskiy et al., 2021: SFT for CV.
- **[LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685)** - Hu et al., 2021: Efficient SFT techniques (see the LoRA sketch at the end of this README).
- **[Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401)** - Lewis et al., 2020: RAG foundations.
- **[Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model](https://arxiv.org/abs/2408.11039)** - Zhou et al., 2024: Combined NLP/CV SFT.
Run: `pip install -r requirements.txt`, then `streamlit run app.py`. Snap, tune, party! ⚡
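A hedged sketch of the NLP side of this SFT stack: LoRA adapters (Hu et al., 2021, cited above) on a small causal LM, trained on prompt/response pairs read from a CSV. The base model ID, CSV path and column names, and hyperparameters are assumptions for illustration, not the app's exact configuration:

```python
# Sketch: LoRA fine-tuning of a small causal LM on CSV prompt/response pairs.
import pandas as pd
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "distilgpt2"  # assumed small, CPU-friendly base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

df = pd.read_csv("sft_data.csv")  # assumed columns: prompt, response
texts = (df["prompt"] + "\n" + df["response"]).tolist()

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
model.train()
for text in texts:
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
    loss = model(**batch, labels=batch["input_ids"]).loss  # standard causal-LM objective
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```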