Campus-AI / docs /CAMPUS-AI-PROJECT-BRIEF.md
realruneett's picture
Final Release: CampusGen AI Pipeline & Compositor
a8aea21

A newer version of the Gradio SDK is available: 6.14.0

Upgrade

CAMPUS-AI: PROJECT BRIEF

Universal Event Poster Generator

Project: CampusGen AI Type: AI-powered multi-modal event poster generation Hardware: Intel Ultra 9 275HX + RTX 5070 Ti (16GB VRAM) Deployment: Hugging Face Spaces (ZeroGPU โ€” Free Tier) Total Cost: $0 Last Updated: February 21, 2026


EXECUTIVE SUMMARY

CampusGen AI generates professional event posters for ANY occasion in 10โ€“15 seconds using:

  • Stable Diffusion XL 1.0 (2.6B params) fine-tuned on 55,000+ diverse poster images via LoRA
  • 5 Generation Modes: Textโ†’Poster, Reference Image, Image Transform, Inpainting, HD Upscale
  • Llama 3.3 70B (Groq) for intelligent prompt engineering
  • Real-ESRGAN for 4x HD upscaling
  • IP-Adapter for reference image style transfer
  • GPU-accelerated pipeline end-to-end

WHY THIS WINS

Metric CampusGen AI Typical Projects
Dataset 55,000+ images, 55 categories 100-500 images, 1-2 categories
Generation Modes 5 modes (text, reference, transform, inpaint, upscale) 1 mode (text only)
Training LoRA on RTX 5070 Ti (bf16) Quantized on Colab
Intelligence LLM-powered prompt engineering (10 styles, 19 event types) Template-based
Speed 10-15 seconds/poster 30-60+ seconds
Upscaling Real-ESRGAN 4x HD output None
Style Transfer IP-Adapter reference image None
Cost $0 (smart free tier) $0-200
Deployment Professional 5-tab HF Space Local/unstable

TECHNOLOGY RATIONALE (Why These Models?)

Technology Why We Chose It What It Replaces
SDXL 1.0 (2.6B) The gold standard open-source framework for local training. It perfectly fits within a 12GB VRAM envelope allowing for rapid bf16 fine-tuning without destructive memory swapping. Midjourney V6 / DALL-E 3 (closed source, un-finetunable)
LoRA (Low-Rank Adaptation) Training a 2.6 Billion parameter model from scratch requires supercomputers. LoRA trains tiny adapter layers (~80M parameters) that sit on top of the frozen base model. This makes training possible in a few hours on a consumer RTX 5070 Ti (12GB) without catastrophic forgetting of the base model's knowledge. Full Fine-Tuning (Requires multiple A100s, huge memory)
Florence-2-large Microsoft's highly efficient Vision-Language Model. Instead of running 3 different models, Florence-2 does Detailed Visual Summaries + OCR (reading text) + Dense Region Capturing all in one pass. Clean, rich captions are the secret to teaching the SDXL model what a "poster" is. BLIP-2 / LLaVA (bulkier, less strict OCR formatting)
Llama 3.3 70B (via Groq) Users write lazy prompts like "a cybersec hackathon." We use Llama 3.3 to intercept that prompt and intelligently explode it into a highly detailed, cinematic description referencing our 10 trained visual styles and 19 event types. Running it through the Groq API makes this essentially instantaneous and free. Hardcoded prompt templates (rigid, boring)
IP-Adapter It allows users to upload a reference image (e.g., a cool poster they found online) and injects that structural/stylistic "vibe" into the generation pipeline natively, without needing a secondary text prompt. ControlNet (heavier, overkill for pure style transfer)
Real-ESRGAN A specialized upscaler neural network that reconstructs high-frequency details. Generating a 4K image directly in SDXL takes immense VRAM and time. It is faster to generate at 1024x1024 and run it through Real-ESRGAN to get a massive 4K HD output with perfectly crisp text in 2 seconds. Bicubic interpolation (blurry, pixelated)

TRAINING SPECIFICATIONS

Model Architecture

Component Specification
Base Model Stable Diffusion XL 1.0 (2.6B parameters) โ€” FROZEN
Fine-tuning LoRA (Low-Rank Adaptation)
LoRA Rank 32
LoRA Alpha 16
LoRA Dropout 0.05
Trainable Parameters ~80 million (0.6% of base model)
Precision bf16 (bfloat16)
LoRA File Size ~150-300 MB (.safetensors)
Trigger Word campus_ai_poster

How LoRA Works

Base model: SDXL 1.0 (2.6B params) โ†’ FROZEN, not modified
                    โ†“
LoRA injects small adapter matrices into attention layers:
  Original W (4096ร—4096) = 16M params  โ†’ FROZEN
  LoRA: A (4096ร—32) + B (32ร—4096) = 262K params  โ†’ TRAINED
                    โ†“
~250 attention layers ร— 262K = ~80M trainable params (3% of 2.6B)

Training Configuration

Parameter Value
Optimizer AdamW 8-bit (bitsandbytes)
Learning Rate 1e-4 (Phase 1) โ†’ 2e-5 (Phase 2) โ†’ 1e-5 (Phase 3)
Batch Size 1
Gradient Accumulation 4 steps
Effective Batch Size 4
Max Steps 4000 (P1) + 20000 (P2) + 6448 (P3)
Phase 3 Dataset 6,448 highly curated typography & layout templates
Checkpoint Interval Every 500 steps
Resolution 1024ร—1024
Noise Scheduler DDPM
EMA Decay 0.99
Gradient Checkpointing Enabled
Train UNet Yes
Train Text Encoder No
Dependencies bitsandbytes (critical for 8-bit), diffusers==0.32.1 (for torchao compat)
Estimated Time ~7.5 hours on RTX 5070 Ti

DATASET SPECIFICATIONS

Overview

Metric Value
Raw images scraped 1900 per theme ร— 55 themes = **104,500**
After quality filter 1300 per theme = **71,500**
Train split 1000 per theme = 55,000
Validation split 200 per theme = 11,000
Test split 100 per theme = 5,500

55 Categories (Hierarchical)

Group Subcategories
Tech Fest Hackathon, AI/ML, Robotics, Coding Competition, Cybersecurity, Web Dev, Startup, Data Science, IoT, Open Source, Game Dev
Cultural Fest Dance, Music, Drama, Art Exhibition, Poetry, Fashion Show, Photography
College Events Annual Day, Freshers Party, Farewell, Alumni Meet, Orientation, Graduation
Sports Cricket, Football, Basketball, Athletics, Chess, Badminton, Volleyball
Festivals Diwali, Holi, Navratri/Garba, Ganesh Chaturthi, Eid, Christmas, Onam, Pongal
Workshops Technical Seminar, Business Workshop, Creative Workshop, Leadership, Research
Social Blood Donation, Charity, Environmental, Awareness Campaign, NSS/NCC
Entertainment DJ Night, Concert, Standup Comedy, Movie Screening, Open Mic

Quality Filtering (GPU-Accelerated)

Check Threshold Method
Resolution โ‰ฅ512px shortest side CPU
Sharpness Laplacian variance โ‰ฅ50 GPU (PyTorch conv2d)
Aspect Ratio 0.4โ€“2.5 CPU
File Size 20KBโ€“50MB CPU
Color Variance std โ‰ฅ15 GPU (torch.std)
Deduplication pHash distance โ‰ค5 CPU

Captioning

Component Detail
Model Florence-2-large (microsoft)
Device GPU (float16)
Captions campus_ai_poster trigger + category prefix + Florence-2 description
Output Image + .txt pairs in data/final/

DEPLOYMENT APP โ€” 5-Tab Architecture

Files

File Purpose
app.py 5-tab Gradio UI (~500 lines)
pipelines.py Pipeline manager โ€” lazy loads SDXL/IP-Adapter/ESRGAN (~230 lines)
prompt_engine.py Groq LLM with 10 styles, 19 event types (~250 lines)
requirements.txt HF Space dependencies
README.md HF Space card

5 Generation Modes

Tab What It Does Key Tech
โœ๏ธ Text โ†’ Poster Describe event โ†’ get poster(s) SDXL + LoRA + Groq LLM
๐Ÿ–ผ๏ธ Reference Image Upload a poster โ†’ copy its style IP-Adapter
๐Ÿ”„ Image Transform Upload โ†’ restyle existing poster Img2Img pipeline
๐Ÿ–Œ๏ธ Inpaint / Edit Draw mask โ†’ regenerate region Inpainting pipeline
๐Ÿ” HD Upscale 2x/4x upscale any image Real-ESRGAN

Shared Features

  • 7 resolution presets (768ร—1152, 1024ร—1024, etc.)
  • 10 visual styles
  • Batch generation (1-4 variants)
  • Seed control
  • LoRA strength slider
  • Generation metadata display

VRAM Management

  • Only ONE pipeline active at a time (text2img OR img2img OR inpaint)
  • Model CPU offloading for 16GB GPU / HF ZeroGPU
  • IP-Adapter loads as lightweight adapter (~300MB) on top of base model
  • Real-ESRGAN uses tiled processing (512px tiles) for memory efficiency

GPU PIPELINE SUMMARY

Step Device Time
Scraping (Pinterest) ๐Ÿ–ฅ๏ธ CPU (network-bound) ~6-12h
Quality Filter ๐ŸŽฎ GPU (Laplacian + color) ~5 min
Captioning (Florence-2) ๐ŸŽฎ GPU (float16) ~6-12h
Dataset Split ๐Ÿ–ฅ๏ธ CPU (file copy) ~1 min
LoRA Training ๐ŸŽฎ GPU (bf16) ~7.5h
Upload to HF ๐Ÿ–ฅ๏ธ CPU ~5 min
Live Demo โ˜๏ธ Cloud GPU (ZeroGPU) Real-time

EXECUTION PIPELINE

# Phase 1: Data Collection
python scripts/pinterest_scraper.py          # ๐Ÿ–ฅ๏ธ CPU โ€” overnight

# Phase 2: Data Processing
python scripts/quality_filter.py             # ๐ŸŽฎ GPU โ€” ~5 min
python scripts/caption_generator.py          # ๐ŸŽฎ GPU โ€” overnight
python scripts/split_dataset.py              # ๐Ÿ–ฅ๏ธ CPU โ€” ~1 min

# Phase 3: Training (Dual-Phase)
python scripts/create_training_config.py     # ๐Ÿ–ฅ๏ธ CPU โ€” Setup
python ai-toolkit/run.py configs/train_sdxl_lora.yaml  # ๐ŸŽฎ GPU โ€” Phase 1 (3h)
python ai-toolkit/run.py configs/train_sdxl_lora_phase2.yaml  # ๐ŸŽฎ GPU โ€” Phase 2 (4.5h)

# Phase 4: Deploy
huggingface-cli upload YOUR_USERNAME/campus-ai-poster-sdxl models/sdxl/checkpoints/campus_ai_poster_sdxl/ .
# Push deployment/ to HF Space

FILE STRUCTURE

campus-ai/
โ”œโ”€โ”€ .gitignore                       # Explicitly ignores data/ & models/ for GitHub push
โ”œโ”€โ”€ configs/
โ”‚   โ”œโ”€โ”€ config.yaml                  # Master configuration (w/ hf_token)
โ”‚   โ”œโ”€โ”€ train_sdxl_lora.yaml         # ai-toolkit Phase 1 generator
โ”‚   โ””โ”€โ”€ train_sdxl_lora_phase3.yaml  # Phase 3 implicit layout tuner
โ”œโ”€โ”€ scripts/
โ”‚   โ”œโ”€โ”€ pinterest_scraper.py         # Image scraper (1900/theme)
โ”‚   โ”œโ”€โ”€ quality_filter.py            # GPU-accelerated quality filter
โ”‚   โ”œโ”€โ”€ caption_generator.py         # Florence-2 GPU captioning
โ”‚   โ”œโ”€โ”€ split_dataset.py             # Fixed 1000/200/100 split
โ”‚   โ”œโ”€โ”€ test_checkpoint.py           # LoRA inference testing
โ”‚   โ””โ”€โ”€ create_training_config.py    # ai-toolkit config generator
โ”œโ”€โ”€ deployment/
โ”‚   โ”œโ”€โ”€ app.py                       # 5-tab Gradio app
โ”‚   โ”œโ”€โ”€ pipelines.py                 # Pipeline manager
โ”‚   โ”œโ”€โ”€ prompt_engine.py             # Groq LLM prompt engine
โ”‚   โ”œโ”€โ”€ requirements.txt             # HF Space dependencies
โ”‚   โ””โ”€โ”€ README.md                    # HF Space card
โ”œโ”€โ”€ data/
โ”‚   โ”œโ”€โ”€ raw/                         # ~104K scraped images
โ”‚   โ”œโ”€โ”€ processed/                   # ~71K quality-filtered
โ”‚   โ”œโ”€โ”€ final/                       # Captioned pairs
โ”‚   โ”œโ”€โ”€ train/                       # 55K (1000/theme)
โ”‚   โ”œโ”€โ”€ val/                         # 11K (200/theme)
โ”‚   โ””โ”€โ”€ test/                        # 5.5K (100/theme)
โ”œโ”€โ”€ models/sdxl/checkpoints/         # Trained LoRA weights
โ”œโ”€โ”€ docs/
โ”‚   โ”œโ”€โ”€ CAMPUS-AI-PROJECT-BRIEF.md   # This file
โ”‚   โ”œโ”€โ”€ README.md                    # Project overview
โ”‚   โ”œโ”€โ”€ SETUP.md                     # Setup guide
โ”‚   โ””โ”€โ”€ PIPELINE.md                  # Execution pipeline
โ””โ”€โ”€ requirements.txt                 # Local dependencies

COMPETITION STRATEGY

What Judges Will See

  1. Live 5-tab demo on Hugging Face (not just slides)
  2. 55,000+ image dataset (10-100x larger than competitors)
  3. 5 generation modes (competitors have 1)
  4. GPU-accelerated pipeline (professional engineering)
  5. $0 deployment (smart architecture)

Key Talking Points

  • "Trained on 55,000+ event posters across 55 categories โ€” 10x larger than typical projects"
  • "5 generation modes: text, reference image, transform, inpaint, upscale"
  • "80 million trainable parameters via LoRA on 2.6 billion parameter SDXL model"
  • "GPU-accelerated pipeline: quality filter, captioning, and training all on GPU"
  • "Zero cost โ€” entire project runs on free tier services"

Tough Questions

Q: "Only 80M params? That seems small." A: "That's the power of LoRA โ€” we get the quality of a 2.6B model while only training 80M adapter parameters. The base model already knows how to generate images; our LoRA teaches it our specific poster style. Bigger โ‰  better โ€” efficiency is the innovation."

Q: "How is this different from MidJourney?" A: "MidJourney is generic. Ours is specialized โ€” trained on 55,000 Indian event posters. It understands rangoli patterns, tech fest aesthetics, and college event culture. Plus, 5 generation modes including reference image style transfer and inpainting."

Q: "Can judges try it live?" A: "Absolutely โ€” here's the HF Space link. Pick any event, any style. Generate in 15 seconds."


SUCCESS METRICS

Metric Target Status
Dataset 55K+ captioned images โœ… Complete
Training Loss <0.10, coherent samples โณ Pending
Generation <20 seconds, professional quality โณ Pending
Deployment Live 5-tab HF Space โณ Pending
Demo All 5 tabs working flawlessly โณ Pending

Version: 4.1 Last Updated: February 22, 2026 Status: Dataset captioned โœ… โ†’ Training LoRA on RTX 5070 Ti ๐Ÿ”„