Activity Feed

AI & ML interests

Request to join this organization to beta-test notebooks on Hugging Face!

fffiloni
posted an update 15 days ago
fffiloni
posted an update 16 days ago
🚀 RB-Modulation is back on Hugging Face Spaces!

This is an older project that recently broke due to dependency changes, but it’s now fixed and running again ✅

👉 What’s fixed:
- GroundingDINO & LangSAM installation
- compatibility with recent environments
- GPU inference running smoothly again

👉 Try it here:
fffiloni/RB-Modulation

Feel free to give it another try; feedback welcome!
fffiloni
posted an update 28 days ago
✨ PASD Magnify is back on Hugging Face Spaces

fffiloni/PASD

PASD isn’t recent, but it still delivers strong results, so it was worth restoring rather than replacing.

Getting it to run again wasn’t a simple dependency issue.
It relied on parts of diffusers that no longer exist, while moving to Gradio 6 forced a much newer HF stack, and I couldn’t modify the original source directly.

Recreating the old environment wasn’t practical.
So I patched the downloaded code at runtime before import and made it compatible with today’s stack.

That ended up being the only approach that held without forking or freezing everything to outdated versions.
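
For readers curious what patching downloaded code at runtime before import can look like, here is a minimal sketch. It is not the actual code from the PASD Space: the file name `legacy_model.py`, the package `old_pkg.removed_module`, and the helper `patch_source` are all hypothetical stand-ins, and the real fix likely involved more than one string substitution.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def patch_source(path: Path, replacements: dict[str, str]) -> int:
    """Rewrite a source file in place; return how many patterns matched."""
    text = path.read_text(encoding="utf-8")
    applied = 0
    for old, new in replacements.items():
        if old in text:
            text = text.replace(old, new)
            applied += 1
    path.write_text(text, encoding="utf-8")
    return applied


# Hypothetical stand-in for downloaded code: it imports a name from a
# location that no longer exists in the current environment.
workdir = Path(tempfile.mkdtemp())
target = workdir / "legacy_model.py"
target.write_text(
    "from old_pkg.removed_module import sqrt\n"
    "def run(x):\n"
    "    return sqrt(x)\n",
    encoding="utf-8",
)

# Patch the file on disk BEFORE the first import, so Python only ever
# sees the compatible version.
count = patch_source(
    target,
    {"from old_pkg.removed_module import sqrt": "from math import sqrt"},
)

# Make the patched directory importable, then import as usual.
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()
legacy_model = importlib.import_module("legacy_model")
result = legacy_model.run(9.0)
```

The point of doing this on disk rather than monkeypatching after import is that a broken import fails at load time, before there is any object to patch. Counting the matched patterns gives a cheap way to notice when the upstream file changes and a patch silently stops applying.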

If you’ve used it before (or are curious), feel free to give it another try.
fffiloni
posted an update about 1 month ago
✅ Back up and running!

My TIGER app is now fully working again, with fixes and full compatibility with Gradio 6 🚀

It lets you:
- 🎙️ Separate multiple speakers from an audio file
- 🎬 Extract each speaker directly from a video
- 🎧 Split audio into dialog, music, and sound effects (DnR)
- 🎥 Apply DnR separation directly on videos

All powered by lightweight TIGER models for fast and efficient speech separation.

Try it here 👉 fffiloni/TIGER-audio-extraction
fffiloni
posted an update about 1 month ago
AniDoc is back 🎉

I’ve fixed the Space and brought it back to life:
- ✅ Working again after being broken for a while
- ✅ Updated to Gradio 6
- ✅ Compatible with ZeroGPU
- ✅ Output videos now preserve original resolution and FPS

I also added advanced controls so you can experiment more (tracking, seed, motion, sketch).

Try it here: fffiloni/AniDoc
fffiloni
posted an update about 2 months ago
I brought DALL·E mini back to life 🤖🎨

You can try it here:
fffiloni/dalle-mini-reboot

And I also built a batch version using Hugging Face Jobs (up to 50 images per prompt):
fffiloni/dalle-mini-via-jobs

The goal was to stay close to the original JAX/Flax pipeline, while integrating it with modern tooling (Gradio + Jobs).

It ended up being a fun way to revisit this model: still weird, still fun 😄
fffiloni
posted an update about 2 months ago
A clearer demo for TADA (now multilingual) 🔊🌍

I improved the public demo for TADA, a generative framework for speech modeling via text–acoustic dual alignment.

TADA models speech as a joint sequence of text tokens and acoustic tokens, using a transformer backbone to keep text and audio synchronized during generation.

The original demo already exposed these mechanisms, but the workflow made the pipeline hard to understand.

This updated demo makes the process clearer:

• load the model
• prepare a reference voice (optionally with transcript or Whisper auto-transcription)
• generate speech conditioned on that reference

It also adds multilingual support.

Presets are included for a few languages, but the model supports more:

English, French, Spanish, German, Arabic, Mandarin Chinese, Italian, Japanese, Polish, Portuguese

Feel free to try different voices, accents, or languages and see how the alignment behaves.

👉 fffiloni/tada-dual-alignment-tts-demo

Paper
TADA: A Generative Framework for Speech Modeling via Text-Acoustic Dual Alignment (2602.23068)
mrfakename
posted an update 5 months ago
Excited to share that I've joined the Hugging Face Fellows program! 🤗

Looking forward to contributing to & working more closely with the open-source ecosystem - huge thanks to everyone who's supported me on this journey! 🚀
mrfakename
posted an update 6 months ago
Trained a model for emotion-controllable TTS based on MiMo audio on LAION's dataset.

Still very early, and it does have an issue with hallucinating, but results seem pretty good so far given how early it is in the training run.

Will probably kick off a new run later with some settings tweaked.

Put up a demo here: https://huggingface.co/spaces/mrfakename/EmoAct-MiMo

(Turn 🔊 on to hear audio samples)
merve
posted an update 7 months ago
deepseek-ai/DeepSeek-OCR is out! 🔥 my take ⤵️
> pretty insane it can parse and re-render charts in HTML
> it uses CLIP and SAM features concatenated, so better grounding
> very efficient per vision tokens/performance ratio
> covers 100 languages
lbourdois
posted an update 7 months ago
merve
posted an update 8 months ago
large AI labs open-sourced a ton of models last week 🔥
here are a few picks, find even more here merve/sep-16-releases-68d13ea4c547f02f95842f05 🤝
> IBM released a new Docling model with 258M params based on Granite (A2.0) 📝 ibm-granite/granite-docling-258M
> Xiaomi released a 7B audio LM with base and instruct variants (MIT) XiaomiMiMo/mimo-audio-68cc7202692c27dae881cce0
> DecartAI released Lucy Edit, open Nano Banana 🍌 (NC) decart-ai/Lucy-Edit-Dev
> OpenGVLab released a family of agentic computer use models (3B/7B/32B) with the dataset 💻 OpenGVLab/scalecua-68c912cf56f7ff4c8e034003
> Meituan Longcat released a thinking version of LongCat-Flash 💭 meituan-longcat/LongCat-Flash-Thinking