laion/talent_plus_rl_groups_of_50_with_audiobox_scores Viewer • Updated about 19 hours ago • 3.38M • 152
Post 1558 Excited to share that I've joined the Hugging Face Fellows program! 🤗 Looking forward to contributing to & working more closely with the open-source ecosystem - huge thanks to everyone who's supported me on this journey! 🚀
ClimateGAN: Raising Climate Change Awareness by Generating Images of Floods Paper • 2110.02871 • Published Oct 6, 2021
MuPT: A Generative Symbolic Music Pretrained Transformer Paper • 2404.06393 • Published Apr 9, 2024 • 16
Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation Paper • 2211.06687 • Published Nov 12, 2022 • 4
BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks Paper • 2412.04626 • Published Dec 5, 2024 • 14
A Single Merging Suffices: Recovering Server-based Learning Performance in Decentralized Learning Paper • 2507.06542 • Published Jul 9
MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation Paper • 2406.07529 • Published Jun 11, 2024
Improving GUI Grounding with Explicit Position-to-Coordinate Mapping Paper • 2510.03230 • Published Oct 3 • 3
Chronological Thinking in Full-Duplex Spoken Dialogue Language Models Paper • 2510.05150 • Published Oct 2
Scope: Selective Cross-modal Orchestration of Visual Perception Experts Paper • 2510.12974 • Published Oct 14
InteractComp: Evaluating Search Agents With Ambiguous Queries Paper • 2510.24668 • Published Oct 28 • 97
Post 6021 Trained a model for emotion-controllable TTS based on MiMo audio on LAION's dataset. It is still very early and does have an issue with hallucinating, but results seem pretty good so far, given how early it is in the training run. Will probably kick off a new run later with some settings tweaked. Put up a demo here: https://huggingface.co/spaces/mrfakename/EmoAct-MiMo (Turn 🔊 on to hear audio samples)
VeritasFi: An Adaptable, Multi-tiered RAG Framework for Multi-modal Financial Question Answering Paper • 2510.10828 • Published Oct 12 • 1