All HF Hub posts

Shrijanagain 
posted an update 3 days ago
Surya-1.1T: Scaling Beyond Human-Level Reasoning via 146 Trillion Token Pre-training
Author: SKT AI LABS
Affiliation: SKT AI Labs / Project Surya
Model Architecture: Optimized Dense Transformer
Parameters: 1.1 Trillion
Training Tokens: 146 Trillion

Want to collaborate? Friends, let's start this journey together. We have collected 146 trillion tokens and completed pre-training, but we need to make the model more powerful.

Whitepaper - https://github.com/SHRIJANAGAIN/PROFF
cahlen 
posted an update 1 day ago
It’s wild to me how you can just make shit now.

You can take a weekend with a Raspberry Pi 5, a Pi Camera, a 3D printer, and a smidgen of custom fine-tuning (a wake word model, Whisper, TinyBERT, and Piper TTS), and you have a physical device that works as a talking personal assistant.
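The loop described above can be sketched as a simple pipeline. This is a hypothetical illustration, not the author's code: every component below is a stub standing in for the real model (a wake-word detector, Whisper for speech-to-text, TinyBERT for intent classification, Piper for speech output).

```python
# Minimal sketch of a wake-word -> STT -> intent -> reply assistant loop.
# All four component functions are stubs; a real build swaps in the models.

def detect_wakeword(audio: bytes) -> bool:
    """Stub: a real version runs a small always-on keyword model."""
    return audio == b"hey-pi"

def transcribe(audio: bytes) -> str:
    """Stub: a real version runs Whisper on the recorded audio."""
    return "what time is it"

def classify_intent(text: str) -> str:
    """Stub: a real version runs a fine-tuned TinyBERT classifier."""
    return "ask_time" if "time" in text else "unknown"

def respond(intent: str) -> str:
    """Stub: maps an intent to a reply string."""
    return {"ask_time": "It is noon."}.get(intent, "Sorry, I didn't catch that.")

def assistant_step(wake_audio: bytes, speech_audio: bytes):
    """One pass through the pipeline; returns None if the wake word is absent."""
    if not detect_wakeword(wake_audio):
        return None
    reply = respond(classify_intent(transcribe(speech_audio)))
    # A real device would hand `reply` to Piper TTS and play the audio here.
    return reply

print(assistant_step(b"hey-pi", b"..."))  # prints "It is noon."
```

The point of the structure is that each stage is swappable: any wake-word engine, any STT model, and any intent classifier fit behind the same four function boundaries.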

What a time to be alive.

Edge AI, physical AI, AI-augmented animatronics… tiny models. Tiny agents.

Going to be a wild year.
DedeProGames 
posted an update 3 days ago
Can small models program?

Even with reasoning capabilities, small models supposedly cannot produce extensive, high-quality code; at least, that's the common belief.

We present OrionLLM/NanoCoder-0.6b, a model with just 600 million parameters, based on qwen3-0.6b and trained on the nvidia/OpenCodeReasoning dataset.

While it still struggles with complex code, we observed a significant improvement in code generation (especially Python), demonstrating that, when trained correctly, small models can in fact program.
salma-remyx 
posted an update 1 day ago
Looking to execute on your next great idea? 💡

Search for relevant papers and find pre-built Docker images to interactively explore the code with Remyx!

Check out the new space 🔍
remyxai/remyx-explorer
AINovice2005 
posted an update 2 days ago
In celebration of the new storage graph feature on the Hub, here's mine 😊:


Post inspired by @ZennyKenny
prithivMLmods 
posted an update 1 day ago
Map-Anything v1 (Universal Feed-Forward Metric 3D Reconstruction) demo is now available on Hugging Face Spaces. Built with Gradio and integrated with Rerun, it performs multi-image and video-based 3D reconstruction, depth, normal map, and interactive measurements.

🤗 Demo: prithivMLmods/Map-Anything-v1
🤗 Model: facebook/map-anything-v1
🤗 Hf-Papers: MapAnything: Universal Feed-Forward Metric 3D Reconstruction (2509.13414)
aufklarer 
posted an update 1 day ago
We benchmarked https://github.com/soniqo/speech-swift, our open-source Swift library for on-device speech AI, against Whisper Large v3 (FP16) on LibriSpeech test-clean.

Three models beat it. Two architectural approaches:

Qwen3-ASR (LALM — Qwen3 LLM as ASR decoder, AuT encoder pretrained on ~40M hours) hits 2.35% WER at 1.7B 8-bit, running at 43x real-time on MLX. Greedy decoding matches beam search — the LLM decoder is strong enough that the greedy path is nearly always optimal.

Parakeet TDT (non-autoregressive transducer — FastConformer + TDT joint network) hits 2.74% WER in 634 MB as a CoreML INT8 model on the Neural Engine. No generative hallucination by design. Leaves GPU completely free.

Two findings worth flagging:
- 4-bit quantization is catastrophic for non-English: Korean 6.89% → 19.95% WER on FLEURS. Use 8-bit for multilingual.
- On CoreML, INT8 is 3.3x *faster* than INT4 — opposite of GPU behavior. Native ANE INT8 MACs vs INT4 lookup table indirection.
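For context on the WER figures quoted above (2.35%, 2.74%, 6.89% → 19.95%): word error rate is conventionally the word-level Levenshtein distance between reference and hypothesis, divided by the reference length. The sketch below implements that standard definition; it is not code from the speech-swift library.

```python
# Word error rate as typically reported for ASR benchmarks such as LibriSpeech:
# (substitutions + deletions + insertions) / reference word count,
# computed via dynamic-programming edit distance over words.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between first i reference words
    # and first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words
print(wer("the cat sat on the mat", "the cat sit on mat"))  # ≈ 0.333
```

Note this is why a jump like Korean 6.89% → 19.95% is so severe: it means nearly one in five reference words is wrong after 4-bit quantization.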

All numbers reproducible in 15 minutes.

Full article: https://blog.ivan.digital/we-beat-whisper-large-v3-with-a-600m-model-running-entirely-on-your-mac-20e6ce191174

Library: https://github.com/soniqo/speech-swift

Models: Qwen/Qwen3-ASR-0.6B, Qwen/Qwen3-ASR-1.7B, nvidia/parakeet-tdt-0.6b-v2
kanaria007 
posted an update 2 days ago
✅ Article highlight: *Long-Horizon Planning under SI-Core* (art-60-046, v0.1)

TL;DR:
Most discussions stop at the next Jump, the next rollout wave, or the next experiment. This article asks a harder question: how do you bind *30-second decisions* and *30-year plans* into the same structural story?

The answer here is *Plan Jumps*: long-horizon artifacts for infrastructure programs, policy trajectories, and institutional reforms, evaluated over scenario bundles, monitored with explicit replan triggers, and kept auditable through the same SIR / EVAL / SCover / SCI / CAS logic used at shorter horizons.

Read:
kanaria007/agi-structural-intelligence-protocols

Why it matters:
• turns plans themselves into first-class, traceable objects instead of PDF promises
• connects operational Jumps, tactical adjustments, and decade-scale plans in one runtime story
• treats uncertainty, scenario comparison, and replanning as built-in structure, not afterthoughts
• keeps politics and governance explicit instead of pretending models should “choose the future”

What’s inside:
• *Plan Jumps* for 5–30 year horizons
• *scenario bundles* and long-horizon world models
• *Plan-GCS*, SCover / SCI / CAS over decades
• *policy-level Genius Replay* for reusable historical plan structure
• *PoLB + EVAL* for shadow / pilot / staged rollout of sub-policies
• *policy-to-goal contracts*, budget envelopes, and governance review cycles
• *uncertainty propagation*, confidence bands, and robust plan selection
• *replan triggers* for scheduled, threshold, event-driven, and learning-based revision
• *intergenerational equity* and future citizens as explicit principals

Key idea:
SI-Core should not only explain what happened this minute. It should also help humans steer what happens over the next 10–30 years — with plans that are structured, replayable, revisable, and politically inspectable.
wassemgtk 
posted an update about 7 hours ago
Releasing Chuck Norris LLM — full SFT fine-tune with chain-of-thought reasoning.

Trained on 100k+ examples across math, logic, and code. Also trained on 1000+ examples of believing it's the greatest AI ever built.

Its training loss went to zero. The loss function was too afraid to report anything else.

wassemgtk/chuck-norris-llm
DedeProGames 
posted an update 1 day ago
Introducing GRM2, a powerful 3B-parameter model designed for long-term reasoning and high performance on complex tasks.

Even with only 3B parameters, it outperforms qwen3-32b on several benchmarks.

It can also generate large, complex programs of over 1,000 lines, use tools in a way comparable to large models, and is well suited to agentic tasks.

GRM2 is licensed under Apache 2.0, making it a great base for fine-tuning on other tasks.

OrionLLM/GRM2-3b