PhysicsGIF-135M / README.md

Update README.md

0329ec3 verified 25 days ago

7.65 kB

	---
	license: apache-2.0
	language:
	- en
	library_name: transformers
	tags:
	- text-parsing
	- scene-understanding
	- physics-simulation
	- smollm2
	- lora
	- fine-tuned
	base_model: HuggingFaceTB/SmolLM2-135M-Instruct
	pipeline_tag: text-generation
	---

	# PhysicsGIF-135M

	� Natural Language to Scene Parser: A fine-tuned 135M parameter model that converts text descriptions into structured JSON scene specifications.

	> ⚠️ Note: This model is the text parsing component of a larger physics-based GIF generation pipeline. It does NOT generate GIFs directly, it outputs structured JSON that is then processed by a separate physics engine and renderer.

	![Training Loss](01_training_loss.png)

	## What This Model Does

	```
	"a red ball bouncing to the right"
	│
	▼
	┌─────────────────────┐
	│ PhysicsGIF-135M │ ← THIS MODEL
	│ (Text → JSON) │
	└──────────┬──────────┘
	│
	▼
	{
	"objects": [{"type": "ball", "color": "#FF0000"}],
	"motion": {"velocity": [3, 0], "gravity": 0.3, "bounce": 0.9},
	"canvas": {"size": 128, "frames": 40}
	}
	```

	The JSON output is then processed by separate Python code (physics engine + renderer) to create the actual GIF.

	## 🎬 Example Outputs

	\| Prompt \| Generated GIF \|
	\|--------\|---------------\|
	\| "two triangles colliding with each other and exploding" \| ![Triangles Exploding](output.gif) \|
	\| "a pink ball dropping slowly from up" \| ![Pink Ball Falling](output_1.gif) \|

	## 📊 Training Results

	\| Metric \| Value \|
	\|--------\|-------\|
	\| Base Model \| SmolLM2-135M-Instruct \|
	\| Training Examples \| 500 \|
	\| Epochs \| 20 \|
	\| Final Loss \| 0.092 \|
	\| Loss Reduction \| 95.9% \|
	\| Training Time \| 42 minutes \|
	\| LoRA Rank \| 16 \|
	\| LoRA Alpha \| 32 \|

	<details>
	<summary>📈 Training Visualizations</summary>

	#### Training Loss Curve
	![Training Loss](01_training_loss.png)

	#### Learning Rate Schedule
	![Learning Rate](02_learning_rate.png)

	#### Gradient Norms
	![Gradient Norms](03_gradient_norms.png)

	#### Per-Epoch Loss
	![Epoch Losses](04_epoch_losses.png)

	#### Dataset Distribution
	![Object Distribution](05_object_distribution.png)
	![Motion Distribution](06_motion_distribution.png)

	#### Convergence Analysis
	![Convergence](08_convergence_analysis.png)

	</details>

	## 🚀 Usage

	### With the Full Pipeline (Recommended)

	To generate actual GIFs, you need the complete pipeline code:

	```bash
	git clone https://github.com/vikramlingam/PhysicsGIF-135M
	cd PhysicsGIF-135M
	pip install torch transformers peft pillow numpy tqdm

	# Interactive mode - generates real GIFs
	python generate.py
	```

	```
	🎬 PhysicsGIF Text-to-GIF Generator
	Enter prompt: a red ball bouncing
	Generating...
	✓ Generated: output_1.gif
	```

	### Using This Model Directly (Text → JSON only)

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model = AutoModelForCausalLM.from_pretrained("vikramlingam/PhysicsGIF-135M")
	tokenizer = AutoTokenizer.from_pretrained("vikramlingam/PhysicsGIF-135M")

	prompt = '''<\|im_start\|>system
	You are a scene description parser. Convert text to JSON scene specification.<\|im_end\|>
	<\|im_start\|>user
	Convert to scene JSON: a red ball bouncing to the right<\|im_end\|>
	<\|im_start\|>assistant
	'''

	inputs = tokenizer(prompt, return_tensors="pt")
	outputs = model.generate(**inputs, max_new_tokens=200, do_sample=False)
	result = tokenizer.decode(outputs[0])

	# Output: JSON scene specification
	# You need physics.py and renderer.py to convert this to a GIF
	```

	## 🏗️ Full Pipeline Architecture

	```
	┌───────────────────────────────────────────────────────────┐
	│ Complete GIF Generation Pipeline │
	├───────────────────────────────────────────────────────────┤
	│ │
	│ User Input: "a red ball bouncing" │
	│ │ │
	│ ▼ │
	│ ┌─────────────────────────────────────┐ │
	│ │ PhysicsGIF-135M (THIS MODEL) │ │
	│ │ Fine-tuned LLM │ │
	│ │ Converts text → JSON DSL │ │
	│ └──────────────────┬──────────────────┘ │
	│ │ │
	│ ▼ │
	│ ┌─────────────────────────────────────┐ │
	│ │ physics.py (Python code) │ │
	│ │ Newtonian physics simulation │ │
	│ │ Calculates positions per frame │ │
	│ └──────────────────┬──────────────────┘ │
	│ │ │
	│ ▼ │
	│ ┌─────────────────────────────────────┐ │
	│ │ renderer.py (Python code) │ │
	│ │ PIL-based frame rendering │ │
	│ │ Saves as animated GIF │ │
	│ └──────────────────┬──────────────────┘ │
	│ │ │
	│ ▼ │
	│ output.gif │
	│ │
	└───────────────────────────────────────────────────────────┘
	```

	## 🎯 What This Model Understands

	### Objects
	`ball`, `square`, `triangle`

	### Colors
	`red`, `blue`, `green`, `yellow`, `orange`, `purple`, `pink`, `cyan`, `white`

	### Motion Patterns
	- `bouncing` — Gravity + elastic bounce
	- `falling` / `dropping` — Falls from top
	- `floating` — No gravity
	- `colliding` — Objects collide
	- `exploding` — Triggers particle effects

	### Multi-Object
	`two balls`, `three triangles`

	## 📁 Required Files for GIF Generation

	This model alone cannot generate GIFs. You need:

	\| File \| Purpose \|
	\|------\|---------\|
	\| `src/parser.py` \| Integrates this model \|
	\| `src/physics.py` \| Physics simulation \|
	\| `src/renderer.py` \| GIF rendering \|
	\| `src/pipeline.py` \| Combines all components \|
	\| `generate.py` \| CLI interface \|

	## 🔬 Training Details

	- Method: LoRA fine-tuning
	- Target Modules: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
	- Hardware: CPU only (MacBook Pro)
	- Dataset: 500 text-to-JSON examples

	## 📜 License

	Apache 2.0