---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- text-parsing
- scene-understanding
- physics-simulation
- smollm2
- lora
- fine-tuned
base_model: HuggingFaceTB/SmolLM2-135M-Instruct
pipeline_tag: text-generation
---

# PhysicsGIF-135M

**Natural Language to Scene Parser**: A fine-tuned 135M-parameter model that converts text descriptions into structured JSON scene specifications.

> ⚠️ **Note**: This model is the **text parsing component** of a larger physics-based GIF generation pipeline. It does NOT generate GIFs directly; it outputs structured JSON that a separate physics engine and renderer then turn into a GIF.

![Training Loss](01_training_loss.png)

## What This Model Does

```
"a red ball bouncing to the right"
              │
              ▼
    ┌─────────────────────┐
    │  PhysicsGIF-135M    │  ← THIS MODEL
    │  (Text → JSON)      │
    └──────────┬──────────┘
               │
               ▼
{
  "objects": [{"type": "ball", "color": "#FF0000"}],
  "motion": {"velocity": [3, 0], "gravity": 0.3, "bounce": 0.9},
  "canvas": {"size": 128, "frames": 40}
}
```

The JSON output is then processed by **separate Python code** (physics engine + renderer) to create the actual GIF.

## 🎬 Example Outputs

| Prompt | Generated GIF |
|--------|---------------|
| "two triangles colliding with each other and exploding" | ![Triangles Exploding](output.gif) |
| "a pink ball dropping slowly from up" | ![Pink Ball Falling](output_1.gif) |

## 📊 Training Results

| Metric | Value |
|--------|-------|
| Base Model | SmolLM2-135M-Instruct |
| Training Examples | 500 |
| Epochs | 20 |
| Final Loss | 0.092 |
| Loss Reduction | 95.9% |
| Training Time | 42 minutes |
| LoRA Rank | 16 |
| LoRA Alpha | 32 |
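
The table reports the final loss and the relative reduction, but not the starting loss. As a rough consistency check, the starting value can be backed out from those two figures. This assumes "Loss Reduction" means the drop relative to the initial loss; the result is implied by rounded numbers and is not a logged measurement.

```python
# Back out the implied starting loss from the two reported (rounded) figures.
final_loss = 0.092        # "Final Loss" from the table
loss_reduction = 0.959    # "Loss Reduction" of 95.9%, as a fraction

# Assumed definition: reduction = (initial - final) / initial
# which rearranges to: initial = final / (1 - reduction)
implied_initial_loss = final_loss / (1 - loss_reduction)
print(f"Implied initial loss: {implied_initial_loss:.2f}")
```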

<details>
<summary>📈 Training Visualizations</summary>

#### Training Loss Curve
![Training Loss](01_training_loss.png)

#### Learning Rate Schedule
![Learning Rate](02_learning_rate.png)

#### Gradient Norms
![Gradient Norms](03_gradient_norms.png)

#### Per-Epoch Loss
![Epoch Losses](04_epoch_losses.png)

#### Dataset Distribution
![Object Distribution](05_object_distribution.png)
![Motion Distribution](06_motion_distribution.png)

#### Convergence Analysis
![Convergence](08_convergence_analysis.png)

</details>

## 🚀 Usage

### With the Full Pipeline (Recommended)

To generate actual GIFs, you need the complete pipeline code:

```bash
git clone https://github.com/vikramlingam/PhysicsGIF-135M
cd PhysicsGIF-135M
pip install torch transformers peft pillow numpy tqdm

# Interactive mode - generates real GIFs
python generate.py
```

```
🎬 PhysicsGIF Text-to-GIF Generator
Enter prompt: a red ball bouncing
Generating...
✓ Generated: output_1.gif
```

### Using This Model Directly (Text → JSON only)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("vikramlingam/PhysicsGIF-135M")
tokenizer = AutoTokenizer.from_pretrained("vikramlingam/PhysicsGIF-135M")

prompt = '''<|im_start|>system
You are a scene description parser. Convert text to JSON scene specification.<|im_end|>
<|im_start|>user
Convert to scene JSON: a red ball bouncing to the right<|im_end|>
<|im_start|>assistant
'''

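inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=False)

# Decode only the newly generated tokens so the prompt is not echoed back
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
result = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(result)

# Output: JSON scene specification
# You need physics.py and renderer.py to convert this to a GIF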
```
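
The decoded string may contain more than the JSON object itself (for example an echoed prompt or a trailing `<|im_end|>` token), so it usually needs a small cleanup step before it can be passed on to a physics engine. Below is a minimal sketch using only the standard library. `extract_scene_json` is a hypothetical helper, not part of the repository, and it assumes the model emits one well-formed JSON object in its assistant turn.

```python
import json

def extract_scene_json(decoded: str) -> dict:
    """Return the first JSON object found after the last assistant marker."""
    # Keep only the text after the final assistant turn, if the marker is present.
    marker = "<|im_start|>assistant"
    tail = decoded.rsplit(marker, 1)[-1]

    start = tail.find("{")
    if start == -1:
        raise ValueError("no JSON object found in model output")

    # raw_decode parses one JSON value and ignores whatever follows it.
    scene, _ = json.JSONDecoder().raw_decode(tail[start:])
    return scene

# Sample string shaped like the example output at the top of this card.
sample = (
    "<|im_start|>assistant\n"
    '{"objects": [{"type": "ball", "color": "#FF0000"}], '
    '"motion": {"velocity": [3, 0], "gravity": 0.3, "bounce": 0.9}, '
    '"canvas": {"size": 128, "frames": 40}}<|im_end|>'
)
scene = extract_scene_json(sample)
print(scene["motion"]["velocity"])  # [3, 0]
```

If the model produces malformed JSON, `raw_decode` raises `json.JSONDecodeError`, which a caller can catch to retry or fall back to a default scene.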

## 🏗️ Full Pipeline Architecture

```
┌───────────────────────────────────────────────────────────┐
│             Complete GIF Generation Pipeline              │
├───────────────────────────────────────────────────────────┤
│                                                           │
│  User Input: "a red ball bouncing"                        │
│                     │                                     │
│                     ▼                                     │
│  ┌─────────────────────────────────────┐                  │
│  │      PhysicsGIF-135M (THIS MODEL)   │                  │
│  │      Fine-tuned LLM                 │                  │
│  │      Converts text → JSON DSL       │                  │
│  └──────────────────┬──────────────────┘                  │
│                     │                                     │
│                     ▼                                     │
│  ┌─────────────────────────────────────┐                  │
│  │      physics.py (Python code)       │                  │
│  │      Newtonian physics simulation   │                  │
│  │      Calculates positions per frame │                  │
│  └──────────────────┬──────────────────┘                  │
│                     │                                     │
│                     ▼                                     │
│  ┌─────────────────────────────────────┐                  │
│  │      renderer.py (Python code)      │                  │
│  │      PIL-based frame rendering      │                  │
│  │      Saves as animated GIF          │                  │
│  └──────────────────┬──────────────────┘                  │
│                     │                                     │
│                     ▼                                     │
│                output.gif                                 │
│                                                           │
└───────────────────────────────────────────────────────────┘
```
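
To make the physics stage more concrete, here is a minimal sketch of how per-frame positions could be computed from the `motion` and `canvas` fields of the scene JSON. It is not the repository's `physics.py`. The function name `simulate`, the start position, the object radius, the explicit Euler update, and the convention that y grows downward in pixels are all assumptions made for illustration.

```python
def simulate(scene, start=(64.0, 20.0), radius=8.0):
    """Return one (x, y) position per frame for a single object."""
    vx, vy = scene["motion"]["velocity"]
    gravity = scene["motion"]["gravity"]
    bounce = scene["motion"]["bounce"]
    size = scene["canvas"]["size"]

    x, y = start
    positions = []
    for _ in range(scene["canvas"]["frames"]):
        vy += gravity  # gravity accelerates the object downward each frame
        x += vx
        y += vy

        # Reflect off the floor, scaling the speed by `bounce`.
        if y > size - radius:
            y = size - radius
            vy = -vy * bounce

        # Reflect off the side walls in the same way (no ceiling check here).
        if x < radius or x > size - radius:
            x = min(max(x, radius), size - radius)
            vx = -vx * bounce

        positions.append((x, y))
    return positions

scene = {
    "motion": {"velocity": [3, 0], "gravity": 0.3, "bounce": 0.9},
    "canvas": {"size": 128, "frames": 40},
}
frames = simulate(scene)
print(len(frames))  # one position per frame
```

A renderer would then draw the object at each returned position and append the frame to the GIF.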

## 🎯 What This Model Understands

### Objects
`ball`, `square`, `triangle`

### Colors
`red`, `blue`, `green`, `yellow`, `orange`, `purple`, `pink`, `cyan`, `white`

### Motion Patterns
- `bouncing` — Gravity + elastic bounce
- `falling` / `dropping` — Falls from top
- `floating` — No gravity
- `colliding` — Objects collide
- `exploding` — Triggers particle effects

### Multi-Object
`two balls`, `three triangles`
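
Because the supported vocabulary is small and fixed, it can help to check a prompt against the terms listed above before sending it to the model. The sketch below is a hypothetical pre-check: the helper name `find_supported_terms` and the plain word-matching approach are assumptions. It only reports which listed terms appear in the prompt and says nothing about how the model itself interprets it.

```python
import re

OBJECTS = {"ball", "square", "triangle"}
COLORS = {"red", "blue", "green", "yellow", "orange", "purple", "pink", "cyan", "white"}
MOTIONS = {"bouncing", "falling", "dropping", "floating", "colliding", "exploding"}

def find_supported_terms(prompt: str) -> dict:
    """Report which listed objects, colors, and motions occur in the prompt."""
    words = re.findall(r"[a-z]+", prompt.lower())
    # Strip a trailing "s" so that plurals such as "triangles" match "triangle".
    singular = {w[:-1] if w.endswith("s") else w for w in words}
    candidates = set(words) | singular
    return {
        "objects": sorted(OBJECTS & candidates),
        "colors": sorted(COLORS & candidates),
        "motions": sorted(MOTIONS & candidates),
    }

print(find_supported_terms("two triangles colliding with each other and exploding"))
# {'objects': ['triangle'], 'colors': [], 'motions': ['colliding', 'exploding']}
```

An empty `objects` or `motions` list is a hint that the prompt falls outside what the model was trained on.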

## 📁 Required Files for GIF Generation

This model alone cannot generate GIFs. You also need the following files from the pipeline repository:

| File | Purpose |
|------|---------|
| `src/parser.py` | Integrates this model |
| `src/physics.py` | Physics simulation |
| `src/renderer.py` | GIF rendering |
| `src/pipeline.py` | Combines all components |
| `generate.py` | CLI interface |

## 🔬 Training Details

- **Method**: LoRA fine-tuning
- **Target Modules**: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
- **Hardware**: CPU only (MacBook Pro)
- **Dataset**: 500 text-to-JSON examples
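
For reference, the reported hyperparameters map onto a LoRA configuration roughly as follows. This is a sketch rather than the actual training script: the rank, alpha, and target modules come from this card, while `task_type` and the use of `peft.LoraConfig` are assumptions. Values the card does not report (dropout, learning rate, batch size) are left out.

```python
# Hyperparameters reported in this card, collected as keyword arguments.
lora_kwargs = {
    "r": 16,           # LoRA rank
    "lora_alpha": 32,  # LoRA alpha
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",  # assumption: causal language modeling
}

# In standard LoRA the low-rank update is scaled by alpha / r.
scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]
print(scaling)  # 2.0

# With peft installed, the same arguments could be passed on directly:
#   from peft import LoraConfig
#   config = LoraConfig(**lora_kwargs)
```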

## 📜 License

Apache 2.0