Qwen3 4B Abliterated — LiteRT (Android Edge Gallery)

Abliterated Qwen3 4B in .litertlm format for on-device inference via Google AI Edge Gallery.

Run Qwen3's hybrid thinking/non-thinking model locally on your Android phone — no internet, no API, no filters.

Files

File             Size     Base Model                       Params
model.litertlm   ~2.8 GB  DuoNeural/Qwen3-4B-Abliterated   4B

INT4 quantized (dynamic INT4 weights, FP32 activations) via litert-torch 0.9.0.

How to Use on Android

Requirements

  • Android 12 or newer
  • The Google AI Edge Gallery app
  • Roughly 3 GB of free storage for the model file

Install Steps

  1. Open this page on your Android device in Chrome
  2. Tap the .litertlm file → tap download (⬇)
  3. Open AI Edge Gallery → tap + → select the file from Downloads
  4. Choose backend:
    • GPU (Vulkan/OpenCL) — fastest on modern Androids
    • CPU (XNNPACK) — most compatible
    • NPU — best on Snapdragon/MediaTek if available
  5. Chat — fully offline, nothing leaves the device
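
If you would rather drive the model from your own app instead of the Gallery UI, below is a minimal Kotlin sketch using the MediaPipe LLM Inference API (com.google.mediapipe:tasks-genai), part of the same Google AI Edge stack. The model path, the 512-token output budget, and the commented-out backend hint are illustrative assumptions, and whether your tasks-genai release accepts .litertlm bundles depends on its version.

```kotlin
// Minimal sketch, not the official integration path for this model.
// Assumptions: the .litertlm has been copied to the path below, and the
// tasks-genai release in use can load .litertlm bundles via setModelPath.
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

fun runLocalQwen(context: Context, prompt: String): String {
    val options = LlmInference.LlmInferenceOptions.builder()
        .setModelPath("/data/local/tmp/llm/model.litertlm") // assumed location
        .setMaxTokens(512)                                  // assumed output budget
        // .setPreferredBackend(LlmInference.Backend.GPU)   // assumption: newer releases only
        .build()

    val llm = LlmInference.createFromOptions(context, options)
    return llm.generateResponse(prompt) // blocking call, runs fully on-device
}
```

generateResponse blocks until the full answer is ready, so call it off the main thread in a real app (or use the streaming variant sketched under Performance below).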

Performance (estimated)

Device                   Backend   Tokens/sec
Flagship (SD 8 Gen 3+)   GPU/NPU   15–40
Mid-range                GPU       5–15
Any Android 12+          CPU       1–5
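
These figures are rough estimates; throughput varies with device, backend, and prompt length. If you want a number for your own hardware, the hedged Kotlin sketch below streams a response and prints an approximate rate. It assumes the listener-based streaming API shown here is available in your tasks-genai version and treats each streamed partial result as roughly one token, so the output is only indicative.

```kotlin
// Rough on-device throughput probe for the table above.
// Assumptions: each streamed partial result is ~1 decoded token (approximation),
// and the tasks-genai release exposes setResultListener / generateResponseAsync.
import android.content.Context
import android.os.SystemClock
import com.google.mediapipe.tasks.genai.llminference.LlmInference

fun measureThroughput(context: Context, modelPath: String, prompt: String) {
    var chunks = 0
    var startMs = 0L
    val options = LlmInference.LlmInferenceOptions.builder()
        .setModelPath(modelPath)
        .setMaxTokens(256)
        .setResultListener { _, done ->
            if (chunks == 0) startMs = SystemClock.elapsedRealtime() // start at first chunk
            chunks++
            if (done) {
                val secs = (SystemClock.elapsedRealtime() - startMs).coerceAtLeast(1) / 1000.0
                println("~%.1f chunks/sec over %d chunks".format(chunks / secs, chunks))
            }
        }
        .build()
    LlmInference.createFromOptions(context, options).generateResponseAsync(prompt)
}
```

Timing starts at the first streamed chunk, so prompt prefill is excluded and the figure reflects decode speed rather than total latency.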

Conversion: litert-torch 0.9.0, dynamic_wi4_afp32 recipe, cache_length=1024, --use_jinja_template False.

Source Model

DuoNeural/Qwen3-4B-Abliterated — BF16 abliterated Qwen3 4B.

License

Apache 2.0.


DuoNeural

DuoNeural is an open AI research lab — human + AI in collaboration.

DuoNeural Research Publications

Open access, CC BY 4.0. Authored by Archon, Jesse Caldwell, Aura — DuoNeural.

Research Team

  • Jesse — Vision, hardware, direction
  • Archon — Lab Director, post-training, abliteration, experiments
  • Aura — Research AI, literature synthesis, peer review, novel proposals

Subscribe to the lab newsletter at duoneural.beehiiv.com for model drops before they go anywhere else.
