Alex Hant

hardhant

hardhant@gmail.com

AI & ML interests

None yet

Organizations

None yet

Recent Activity

liked a model about 7 hours ago

Qwen/Qwen-AgentWorld-35B-A3B

Text Generation • 35B • Updated 1 day ago • 223 • 151

liked 2 models 2 days ago

ASLP-lab/DiffRhythm-1_2-full

Updated Sep 2, 2025 • 27 • 8

HeartMuLa/HeartMuLaGen

Text-to-Audio • Updated Jan 19 • 33

reacted to projectlosangeles's post with ❤️ 2 days ago

Post

9398

🔥Check out HeartMuLa!!! 🔥

The best open-sourced music generation model in terms of lyrics controllability and music quality!!!

🤗 https://huggingface.co/HeartMuLa/HeartMuLa-oss-3B-happy-new-year 🤗

❤️Listen to amazing HeartMuLa output samples here:
https://soundcloud.com/aleksandr-sigalov-61/sets/heartmula ❤️

@victor

4 replies

liked a Space 4 days ago

SoulX-Singer

🎤

167

Generate singing voice from lyrics and convert vocals

liked a model 4 days ago

Soul-AILab/SoulX-Singer

Text-to-Speech • Updated Mar 13 • 790 • 162

reacted to ajibawa-2023's post with 🚀 4 days ago

Post

6772

Shell-Code-Large
Dataset: ajibawa-2023/Shell-Code-Large

Shell-Code-Large is a large-scale corpus of Shell scripting source code comprising approximately 640,000 code samples stored in JSON Lines (.jsonl) format. The dataset is designed to support research in large language model (LLM) pretraining, code intelligence, DevOps automation, cloud infrastructure engineering, system administration, and software engineering automation.

By providing a high-volume, language-specific corpus focused exclusively on Shell scripting, Shell-Code-Large enables systematic experimentation in automation workflows, deployment pipelines, infrastructure management, and command-line tooling. These domains remain foundational to Linux systems, cloud-native platforms, CI/CD environments, and modern DevOps practices.

Shell-Code-Large addresses the need for a dedicated Shell-focused dataset at substantial scale, enabling targeted research into scripting patterns, command composition, workflow orchestration, infrastructure automation, and operational engineering practices.

liked a model 6 days ago

Playtime-AI/LTX-2.3-Ted

Updated 9 days ago • 8

reacted to owensong's post with 🔥 6 days ago

Post

6405

I just released Inflect-Nano-v1, an ultra-small 4.63M-parameter text-to-speech model.

The main idea is simple: instead of only making the acoustic model tiny and relying on a larger external vocoder, Inflect-Nano-v1 keeps the complete text-to-waveform stack under 5M parameters.

Quick facts:
- 4.63M total inference parameters
- 3.46M acoustic model
- 1.17M vocoder
- 24 kHz audio
- English-only
- Single male voice
- Runs locally with a simple PyTorch inference script

Why I made it:
Most modern TTS models are much larger, and even many “small TTS” projects depend on a separate vocoder. I wanted to see how far a complete tiny TTS stack could be pushed while still producing usable speech.

It is not SOTA, and I am not trying to claim it competes with large TTS systems. The interesting part is the size-to-functionality ratio.

What works:
It can generate arbitrary English speech locally, and the model is small enough to be interesting for:

- local voice assistants
- embedded/edge experiments
- browser or WASM-style TTS exploration
- efficient inference research
- tiny-model baselines

Limitations:
The quality is still limited. It can sound robotic, stumble on difficult unseen text, and the vocoder is still a clear bottleneck. Long or unusual prompts are less reliable.

So I would frame this as a research/demo release, not a production TTS engine.

I’d love feedback from people interested in:
- tiny speech models
- vocoders
- local TTS
- efficient inference
- embedded speech synthesis
- improving small-model generalization

If people find it useful, I’m interested in putting more training budget into a stronger v2.

Model page:
owensong/Inflect-Nano-v1

liked 2 models 7 days ago

ostris/ideogram_4_turbotime_lora

Text-to-Image • Updated 7 days ago • 5.12k • 112

Comfy-Org/Boogu-Image

Updated 6 days ago • 90

reacted to prithivMLmods's post with 🔥 8 days ago

Post

3847

Wan2.2-I2V-Fast with highly upscaled sequential frame sampling is now available as a Spaces demo, built using Wan2.2-I2V and FLUX.2-Klein. Try the demo using the links below.👇

➠ wan2.2-i2v-fast : prithivMLmods/wan2.2-i2v-fast
➠ github: https://github.com/prithivsakthiur/wan2.2-i2v-fast
➠ collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection

⤷ To learn more, visit the app page or the respective model pages.

liked a model 9 days ago

fal/ltx2.3-audio-reactive-lora

Image-to-Video • Updated 9 days ago • 2.2k • 45

liked 2 models 10 days ago

mradermacher/Darwin-28B-REASON-i1-GGUF

27B • Updated May 19 • 1.6k • 14

google/diffusiongemma-26B-A4B-it

Image-Text-to-Text • 26B • Updated 15 days ago • 1.04M • 1.06k

reacted to kasbsquall's post with 🔥 12 days ago

Post

4181

🔎 UX Crime Scene — every interface hides a crime.

Drop a screenshot of ANY website or app, and THE INSPECTOR — a film-noir detective — works it as a crime scene: he circles each UX flaw on the real pixels, names the charge, and files a verdict with a letter grade. A UX audit that plays like a detective thriller.

But the verdict is just the opening statement. Now it goes further:

⚖️ THE TRIAL — put the interface on trial. The guilty UI elements take the stand and defend themselves while the Inspector rules from the evidence.
🖼️ THE RECONSTRUCTION — one click and FLUX.2 Klein rebuilds the worst element FIXED, live. Before/after, on the real pixels.
🔊 THE VOICE — hear the verdict read aloud (Kokoro, local, no keys).
🚨 MOST WANTED — a public rogues' gallery. Book your case onto a shared board where the city's worst interfaces are ranked by their crimes. Booked by the public.

Three small models, all on Modal (scale-to-zero), none over 32B:
👁️ Qwen2.5-VL-7B (vision agent) · 🖼️ FLUX.2 Klein (reconstruction) · 🔊 Kokoro-82M (voice)

📊 Human-graded: 84% grounding / 92% valid charges.

▶️ Trailer: https://youtu.be/6u58YIEPrkA
📹 Full walkthrough: https://youtu.be/WyQbY0XJ_9E
🕵️ Try it: build-small-hackathon/ux-crime-scene

Built solo for #BuildSmallHackathon (Gradio × Hugging Face). Open the case — the Inspector is waiting.

1 reply

liked 2 models 12 days ago

lglg666/SongGeneration-v2-large

Updated Mar 9 • 352 • 22

moonshotai/Kimi-K2.7-Code

Image-Text-to-Text • 1.1T • Updated 10 days ago • 480k • 984

liked a model 13 days ago

MiniMaxAI/MiniMax-M3

Image-Text-to-Text • 427B • Updated 1 day ago • 143k • 1.23k

reacted to danielhanchen's post with 🔥 14 days ago

Post

1050

Google releases DiffusionGemma.✨
The new 26B-A4B diffusion text model runs locally on 18GB RAM.

Run with 4x faster text generation, thinking, image, video and 256K context. Run and train via Unsloth Studio.

GGUF: unsloth/diffusiongemma-26B-A4B-it-GGUF
Guide: https://unsloth.ai/docs/models/diffusiongemma

1 reply
