---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- text-parsing
- scene-understanding
- physics-simulation
- smollm2
- lora
- fine-tuned
base_model: HuggingFaceTB/SmolLM2-135M-Instruct
pipeline_tag: text-generation
---

# PhysicsGIF-135M

**Natural Language to Scene Parser**: A fine-tuned 135M-parameter model that converts text descriptions into structured JSON scene specifications.

> ⚠️ **Note**: This model is the **text parsing component** of a larger physics-based GIF generation pipeline. It does NOT generate GIFs directly; it outputs structured JSON that is then processed by a separate physics engine and renderer.

![Training Loss](01_training_loss.png)

## What This Model Does

```
"a red ball bouncing to the right"
           │
           ▼
┌─────────────────────┐
│   PhysicsGIF-135M   │  ← THIS MODEL
│    (Text → JSON)    │
└──────────┬──────────┘
           │
           ▼
{
  "objects": [{"type": "ball", "color": "#FF0000"}],
  "motion": {"velocity": [3, 0], "gravity": 0.3, "bounce": 0.9},
  "canvas": {"size": 128, "frames": 40}
}
```

The JSON output is then processed by **separate Python code** (physics engine + renderer) to create the actual GIF.

## 🎬 Example Outputs

| Prompt | Generated GIF |
|--------|---------------|
| "two triangles colliding with each other and exploding" | ![Triangles Exploding](output.gif) |
| "a pink ball dropping slowly from up" | ![Pink Ball Falling](output_1.gif) |

## 📊 Training Results

| Metric | Value |
|--------|-------|
| Base Model | SmolLM2-135M-Instruct |
| Training Examples | 500 |
| Epochs | 20 |
| Final Loss | 0.092 |
| Loss Reduction | 95.9% |
| Training Time | 42 minutes |
| LoRA Rank | 16 |
| LoRA Alpha | 32 |
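
The two LoRA rows in the table can be read with a little arithmetic. The sketch below assumes the standard LoRA formulation, in which the low-rank update is scaled by `alpha / rank` and each adapted weight matrix gains two small matrices. The helper names and the 576 x 576 layer shape are hypothetical and used only for illustration; they are not values taken from this model card.

```python
# LoRA settings from the Training Results table.
LORA_RANK = 16
LORA_ALPHA = 32


def lora_scaling(rank: int, alpha: int) -> float:
    """Scaling factor applied to the low-rank update (alpha / rank)."""
    return alpha / rank


def lora_added_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in adapter matrices A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical 576 x 576 projection layer, chosen only for illustration.
scaling = lora_scaling(LORA_RANK, LORA_ALPHA)
added = lora_added_params(576, 576, LORA_RANK)
print(scaling)  # 2.0
print(added)    # 18432
```

With rank 16 and alpha 32, the adapter update is weighted by a factor of 2, and each adapted layer adds only a small number of trainable parameters compared with the full weight matrix.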
### 📈 Training Visualizations

#### Training Loss Curve

![Training Loss](01_training_loss.png)

#### Learning Rate Schedule

![Learning Rate](02_learning_rate.png)

#### Gradient Norms

![Gradient Norms](03_gradient_norms.png)

#### Per-Epoch Loss

![Epoch Losses](04_epoch_losses.png)

#### Dataset Distribution

![Object Distribution](05_object_distribution.png)

![Motion Distribution](06_motion_distribution.png)

#### Convergence Analysis

![Convergence](08_convergence_analysis.png)
## 🚀 Usage

### With the Full Pipeline (Recommended)

To generate actual GIFs, you need the complete pipeline code:

```bash
git clone https://github.com/vikramlingam/PhysicsGIF-135M
cd PhysicsGIF-135M
pip install torch transformers peft pillow numpy tqdm

# Interactive mode - generates real GIFs
python generate.py
```

```
🎬 PhysicsGIF Text-to-GIF Generator
Enter prompt: a red ball bouncing
Generating...
✓ Generated: output_1.gif
```

### Using This Model Directly (Text → JSON only)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("vikramlingam/PhysicsGIF-135M")
tokenizer = AutoTokenizer.from_pretrained("vikramlingam/PhysicsGIF-135M")

# ChatML-style prompt: system instruction, user request, open assistant turn
prompt = '''<|im_start|>system
You are a scene description parser. Convert text to JSON scene specification.<|im_end|>
<|im_start|>user
Convert to scene JSON: a red ball bouncing to the right<|im_end|>
<|im_start|>assistant
'''

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=False)

# Decode only the newly generated tokens so the prompt is not echoed back
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
result = tokenizer.decode(new_tokens, skip_special_tokens=True)

# Output: JSON scene specification
# You need physics.py and renderer.py to convert this to a GIF
```

## 🏗️ Full Pipeline Architecture

```
┌───────────────────────────────────────────────────────────┐
│             Complete GIF Generation Pipeline              │
├───────────────────────────────────────────────────────────┤
│                                                           │
│  User Input: "a red ball bouncing"                        │
│                             │                             │
│                             ▼                             │
│          ┌─────────────────────────────────────┐          │
│          │  PhysicsGIF-135M (THIS MODEL)       │          │
│          │  Fine-tuned LLM                     │          │
│          │  Converts text → JSON DSL           │          │
│          └──────────────────┬──────────────────┘          │
│                             │                             │
│                             ▼                             │
│          ┌─────────────────────────────────────┐          │
│          │  physics.py (Python code)           │          │
│          │  Newtonian physics simulation       │          │
│          │  Calculates positions per frame     │          │
│          └──────────────────┬──────────────────┘          │
│                             │                             │
│                             ▼                             │
│          ┌─────────────────────────────────────┐          │
│          │  renderer.py (Python code)          │          │
│          │  PIL-based frame rendering          │          │
│          │  Saves as animated GIF              │          │
│          └──────────────────┬──────────────────┘          │
│                             │                             │
│                             ▼                             │
│                         output.gif                        │
│                                                           │
└───────────────────────────────────────────────────────────┘
```

## 🎯 What This Model Understands

### Objects

`ball`, `square`, `triangle`

### Colors

`red`, `blue`, `green`, `yellow`, `orange`, `purple`, `pink`, `cyan`, `white`

### Motion Patterns

- `bouncing`: Gravity + elastic bounce
- `falling` / `dropping`: Falls from top
- `floating`: No gravity
- `colliding`: Objects collide
- `exploding`: Triggers particle effects

### Multi-Object

`two balls`, `three triangles`

## 📁 Required Files for GIF Generation

This model alone cannot generate GIFs. You need:

| File | Purpose |
|------|---------|
| `src/parser.py` | Integrates this model |
| `src/physics.py` | Physics simulation |
| `src/renderer.py` | GIF rendering |
| `src/pipeline.py` | Combines all components |
| `generate.py` | CLI interface |

## 🔬 Training Details

- **Method**: LoRA fine-tuning
- **Target Modules**: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
- **Hardware**: CPU only (MacBook Pro)
- **Dataset**: 500 text-to-JSON examples

## 📜 License

Apache 2.0
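
## 🧪 Illustrative Physics Sketch

To give a feel for what the physics stage of the pipeline does with the JSON, here is a minimal sketch that turns the example scene from "What This Model Does" into one position per frame. It is not the repository's `physics.py`. The function name `simulate`, the start position, the radius, and the interpretation of the fields (velocity in pixels per frame, gravity added to the vertical velocity each frame, `bounce` as the fraction of speed kept after hitting a wall) are all assumptions made for illustration.

```python
import json

# Example scene copied from the "What This Model Does" section.
SCENE_JSON = """
{
  "objects": [{"type": "ball", "color": "#FF0000"}],
  "motion": {"velocity": [3, 0], "gravity": 0.3, "bounce": 0.9},
  "canvas": {"size": 128, "frames": 40}
}
"""


def simulate(scene, start=(20.0, 20.0), radius=6.0):
    """Return one (x, y) position per frame for a single object.

    Assumed semantics (hypothetical, not taken from the repository):
    y grows downward, gravity is added to the vertical velocity every
    frame, and `bounce` scales the reflected velocity at a wall.
    """
    motion, canvas = scene["motion"], scene["canvas"]
    size, frames = canvas["size"], canvas["frames"]
    vx, vy = motion["velocity"]
    gravity, bounce = motion["gravity"], motion["bounce"]
    x, y = start
    low, high = radius, size - radius  # allowed range for the object's center

    positions = []
    for _ in range(frames):
        vy += gravity  # accelerate downward
        x += vx
        y += vy
        # Reflect off the left/right walls, losing some speed.
        if x < low:
            x, vx = low, -vx * bounce
        elif x > high:
            x, vx = high, -vx * bounce
        # Reflect off the ceiling/floor, losing some speed.
        if y < low:
            y, vy = low, -vy * bounce
        elif y > high:
            y, vy = high, -vy * bounce
        positions.append((x, y))
    return positions


scene = json.loads(SCENE_JSON)
positions = simulate(scene)
print(len(positions))  # one position per frame
```

A renderer would then draw the object at each of these positions and save the frames as an animated GIF.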