skeleton-gif

Deterministic text → skeleton-GIF generator. Any prompt produces a 512×512 looping GIF of a skeleton performing the requested action, with an emotion visible in its face and body posture, in an optional scene backdrop.

Zero hallucination guarantee. No diffusion model is in the generation path. The only ML component is facebook/bart-large-mnli, used strictly for zero-shot text classification into a closed label set (92 actions × 10 emotions × 78 scenes). Rendering is pure procedural PIL code: we draw every bone, every joint, every backdrop ourselves.

Quick start

from skeleton_gif_model import SkeletonGif

model = SkeletonGif.from_pretrained("ocmannazirbriet/skeleton-gif")   # or local dir
out = model("a sad man reading a book in a bedroom")

print(out.action, out.emotion, out.scene)   # 'reading' 'sad' 'bedroom'
out.save("result.gif")

Install

pip install -r requirements.txt

Dependencies: pillow, transformers, torch, huggingface_hub.

On first call the underlying facebook/bart-large-mnli is downloaded (~1.6 GB) and cached. Prompts that match built-in keyword rules skip the classifier entirely.
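For reference, the zero-shot fallback is an off-the-shelf transformers pipeline over the closed label lists. A minimal sketch, using an abridged candidate set and illustrative routing code (this is not the package's internal implementation):

from transformers import pipeline

# facebook/bart-large-mnli is downloaded and cached on first use (~1.6 GB).
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Abridged candidate set; the canonical label lists live in config.json.
actions = ["walking", "dancing", "reading", "cooking", "standing_idle"]

result = classifier("a sad man reading a book in a bedroom", candidate_labels=actions)
top_label, top_score = result["labels"][0], result["scores"][0]

# Low-confidence calls drop to a safe default, as described under "How it works".
action = top_label if top_score >= 0.35 else "standing_idle"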

How it works

prompt  ──(keyword match / BART zero-shot)──▶  (action, emotion, scene)
        ──(procedural PIL render, 24 frames)──▶  .gif
  1. Prompt parsing. SkeletonGif.__call__ routes the prompt through a deterministic keyword matcher; if nothing unambiguous hits, it falls back to zero-shot classification over the closed label set. Confidence thresholds on each channel (action ≥ 0.35, emotion ≥ 0.55, scene ≥ 0.55) send low-confidence calls to safe defaults (standing_idle / neutral / none).
  2. Skeleton keyframes. Each of the 92 actions is a pure function action_X(t ∈ [0, 1)) -> {joint_id: (x, y)} that computes joint positions on a 15-joint, OpenPose-style rig at normalized time t (see the sketch after this list).
  3. Emotion transform. A post-processing pass re-shapes posture (slouch / lean / tremble / bounce) and sets face parameters (mouth and eye shapes) per emotion.
  4. PIL render. Each frame draws, in order: scene backdrop → bones → ribs + pelvis → joints → prop → skull w/ face.
  5. GIF export. 24 frames at 10 fps, looping.
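The core of steps 2-5 fits in a few dozen lines of plain PIL. The sketch below uses a hypothetical toy action (action_waving), a 5-joint rig instead of the real 15-joint one, and a simplified sad-slouch transform; it illustrates the mechanics rather than reproducing the engine's code:

import math
from PIL import Image, ImageDraw

def action_waving(t):
    """Toy keyframe function: joint (x, y) positions on a 512x512 canvas at normalized time t in [0, 1)."""
    swing = math.sin(2 * math.pi * t) * 40
    return {
        "head":   (256, 140),
        "neck":   (256, 180),
        "pelvis": (256, 300),
        "l_hand": (200, 260),
        "r_hand": (330 + 0.2 * swing, 170 - swing),   # the waving hand
    }

def apply_emotion(joints, emotion):
    """Posture pass: e.g. 'sad' slouches the head and neck downward."""
    if emotion == "sad":
        return {j: (x, y + (12 if j in ("head", "neck") else 0)) for j, (x, y) in joints.items()}
    return joints

BONES = [("head", "neck"), ("neck", "pelvis"), ("neck", "l_hand"), ("neck", "r_hand")]

frames = []
for i in range(24):                                    # 24 frames per loop
    joints = apply_emotion(action_waving(i / 24), "sad")
    img = Image.new("RGB", (512, 512), "black")
    draw = ImageDraw.Draw(img)
    for a, b in BONES:                                 # bones first...
        draw.line([joints[a], joints[b]], fill="white", width=4)
    for x, y in joints.values():                       # ...then joints on top
        draw.ellipse([x - 5, y - 5, x + 5, y + 5], fill="white")
    frames.append(img)

frames[0].save("wave.gif", save_all=True, append_images=frames[1:],
               duration=100, loop=0)                   # 10 fps, infinite loop

The real renderer adds the scene backdrop, ribs, pelvis, prop, and skull layers in the order listed in step 4.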

Outputs

The model returns a SkeletonGifOutput dataclass:

field      type   description
prompt     str    original input
action     str    one of 92 closed-set labels
emotion    str    one of 10 closed-set labels
scene      str    one of 78 closed-set labels (or "none")
gif_bytes  bytes  GIF payload; write directly or use .save(path)
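For example, gif_bytes can be written to disk directly; this reuses the model object from the Quick start, and the prompt is illustrative:

from pathlib import Path

out = model("an excited kid playing football at a stadium")
Path("football.gif").write_bytes(out.gif_bytes)   # same bytes out.save("football.gif") would write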

Label sets

See config.json for the full canonical label lists. A few highlights:

  • Actions: walking, dancing, reading, working, eating, vacuuming, cooking, football, cricket, basketball, climbing, swimming, playing_guitar, meditating, texting, shopping, bowing, hugging, … (92 total)
  • Emotions: happy, sad, angry, tired, excited, neutral, scared, surprised, bored, confused
  • Scenes: bedroom, kitchen, office, park, beach, church, space, stadium, museum, cemetery, castle, airport, casino, nightclub, farm, living_room, … (78 total, plus none)

Guarantees

  • Output is always a .gif: PIL writes the bytes directly.
  • Output is always a skeleton: procedural drawing; no diffusion model can drift.
  • Emotion is always visible, on the face and in body posture.
  • No hallucination. Classifications are chosen from a closed set; rendering is deterministic math. For any fixed (prompt, engine version) the output is bit-identical.
  • No image-generation API. BART-MNLI is text-only. Nothing ever hits a remote image service.
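A quick spot-check of the bit-identical claim, reusing the Quick start model object (prompt is illustrative):

import hashlib

gif_a = model("an angry chef cooking in a kitchen").gif_bytes
gif_b = model("an angry chef cooking in a kitchen").gif_bytes
assert hashlib.sha256(gif_a).digest() == hashlib.sha256(gif_b).digest()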

Limitations

  • Art style is intentionally minimal (stick-figure-plus-skull). Fluid / photographic motion is not an output mode.
  • Labels are a closed set. Unusual prompts snap to the nearest canonical label or to neutral defaults, not to novel concepts.
  • Scene backdrops are schematic, not photorealistic.

Citation

If you use this in academic work, cite the closed-set-procedural-rendering approach:

@software{skeleton_gif,
  title  = {skeleton-gif: deterministic text-to-GIF with zero hallucination},
  year   = {2026},
  url    = {https://huggingface.co/ocmannazirbriet/skeleton-gif}
}

License

MIT.
