Experimental global target bits‑per‑weight quantization of ServiceNow-AI/Apriel-1.6-15b-Thinker and zai-org/GLM-4.6V-Flash
Unlike standard llama.cpp quantizations that rely on fixed type heuristics (e.g., Q4_K_M), the Target BPW approach allocates per-tensor precision where it matters most and produces high-quality models that hit a precise global file-size target.
Key Advantages:
- VRAM Maximization: can generate high-quality models sized exactly to fit hardware constraints (e.g., fitting the model into exactly 24GB of VRAM).
- Data-Driven Precision: the quantization mix is determined by measured weight-error sensitivity rather than hardcoded rules, often yielding better PPL/KLD-vs-size trade-offs.
Full benchmarks (PPL, KLD, ARC, MMLU, etc.) and methodology are in the model cards.
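For intuition, here is a hypothetical sketch of how a global bits-per-weight target can be met greedily; the names and the error model are illustrative, not the actual implementation:

```python
# Hypothetical sketch of the target-BPW idea (not the actual llama.cpp code):
# greedily spend a global bit budget where measured quantization error is worst.
from dataclasses import dataclass

@dataclass
class Tensor:
    name: str
    n_weights: int
    error_at_bpw: dict  # {bpw: measured error}; lower bpw -> higher error

def allocate(tensors: list[Tensor], target_bpw: float) -> dict:
    total_weights = sum(t.n_weights for t in tensors)
    budget = target_bpw * total_weights                       # total bits available
    choice = {t.name: min(t.error_at_bpw) for t in tensors}   # start at cheapest type
    budget -= sum(choice[t.name] * t.n_weights for t in tensors)
    while True:
        best = None
        for t in tensors:
            cur = choice[t.name]
            upgrades = [b for b in t.error_at_bpw if b > cur]
            if not upgrades:
                continue
            nxt = min(upgrades)
            extra_bits = (nxt - cur) * t.n_weights
            gain = t.error_at_bpw[cur] - t.error_at_bpw[nxt]
            if extra_bits <= budget:
                score = gain / extra_bits                     # error removed per bit spent
                if best is None or score > best[0]:
                    best = (score, t, nxt, extra_bits)
        if best is None:                                      # budget exhausted
            break
        _, t, nxt, extra_bits = best
        choice[t.name] = nxt
        budget -= extra_bits
    return choice                                             # tensor name -> chosen bpw
```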
Update: the TRELLIS.2 (text-to-3D, image-to-3D) Gradio demo with embedded Rerun and improved 3D model previewer visualization is now available on Hugging Face. Generate assets and view them in the 3D viewer, powered by Microsoft’s TRELLIS.2 and Tongyi-MAI’s Z-Image-Turbo models.
What we learned about memory in 2025: 8 comprehensive resources
If models forget everything, how can they be reliable? AI systems need to remember past interactions, update knowledge, stay consistent over time, and work beyond a single prompt. That's why memory in AI is coming up in more and more conversations. Here’s a useful set of studies and videos on where AI memory stands today:
1. Memory in the Age of AI Agents (2512.13564) A great survey that organizes agent memory research. It gives concrete taxonomies across memory form, function, and dynamics, and summarizes benchmarks, frameworks, and emerging directions for building systematic agent memory systems.
2. When Will We Give AI True Memory? A conversation with Edo Liberty, CEO and founder @ Pinecone -> https://youtu.be/ITbwVFZYepc?si=_lAbRHciC740dNz0 Edo Liberty discusses what real memory in LLMs requires beyond RAG - from scalable vector storage to reliable knowledge systems - and why storage, not compute, is becoming the key bottleneck for building dependable AI agents.
3. Why AI Intelligence is Nothing Without Visual Memory | Shawn Shen on the Future of Embodied AI -> https://youtu.be/3ccDi4ZczFg?si=SbJg487kwrkVXgUu Shawn Shen argues AI needs a separate, hippocampus-like memory to move beyond chatbots, enabling long-term visual memory, object permanence, and on-device intelligence for robots, wearables, and the physical world.
5. Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions -> https://arxiv.org/abs/2505.00675v2 Proposes a concrete taxonomy, core operations, and research directions to systematically organize and advance agent memory systems.
Summary: Single-agent “alignment” is the easy case. Real systems are *multi-owner* by default: cities, platforms, institutions, regulators, and users all carry distinct goal vectors—and the same action helps some while harming others.
This article sketches a *non-normative* extension: multi-agent *goal trade proposals* (structured, auditable “plea bargains” in goal-space) plus *semantic pricing* (treating information itself as a negotiable resource), with *PLB-M* as a nearline layer that learns stable cooperation patterns over time.
> Coordination isn’t vibes.
> It’s *contracts over goal deltas*, under governance.
---
Why It Matters:
• Turns “stakeholder conflict” into *explicit, bounded deals* instead of hidden politics
• Provides an accounting surface for *fairness, compensation, and reciprocity*
• Makes “information sharing” measurable: *how much does a semantic unit improve goals?*
• Keeps the whole negotiation layer *auditable and rollbackable*, avoiding “dark markets”
---
What’s Inside:
• Why multi-agent worlds force negotiation (cities, clouds, cross-org networks)
• *GCS as negotiable deltas*: per-agent impact vectors for joint actions
• A concrete schema: *Goal Trade Proposal (GTP)* as a first-class object
• “Semantic value” and *pricing meaning* (not money—accounting under policy)
• *PLB-M*: mining deal patterns + semantic flows → proposing safer templates
• Threat model: manipulation/collusion/DoS + governance guardrails
• Practical notes on clearing, complexity, stability (damping, circuit breakers)
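As a rough illustration of what a first-class GTP object could look like, here is a sketch; the field names are my assumptions, not a normative schema:

```python
# Illustrative sketch only: field names are assumptions, not the SI-Core spec.
from dataclasses import dataclass, field

@dataclass
class GoalTradeProposal:
    proposal_id: str
    proposer: str                              # agent/owner making the offer
    parties: list[str]                         # all agents whose goals are touched
    goal_deltas: dict[str, dict[str, float]]   # party -> {goal name: expected impact}
    compensation: dict[str, float]             # semantic-pricing credits offered per party
    expiry: str                                # ISO-8601 deadline for acceptance
    rollback_plan: str                         # reference to a pre-agreed rollback trace
    signatures: dict[str, str] = field(default_factory=dict)  # party -> approval token

    def is_pareto_acceptable(self) -> bool:
        """Every party's net (goal impact + compensation) must be non-negative."""
        return all(
            sum(self.goal_deltas.get(p, {}).values()) + self.compensation.get(p, 0.0) >= 0
            for p in self.parties
        )
```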
Introducing the Qwen-Image-Edit-2511-LoRAs-Fast demo, featuring image property comparison and contrast, built on top of Gradio and the combined Rerun SDK. It supports single- and multi-image edits with existing LoRAs, which are lazily loaded. (Note: this is still an experimental Space for Qwen-Image-Edit-2511.)
Summary: Most stacks “learn” by fine-tuning weights and redeploying — powerful, but opaque. SI-Core already produces *structured evidence* (jump logs, ethics traces, effect ledgers, goal vectors, rollback traces), so learning can be *structural* instead:
*Upgrade policies, compensators, SIL code, and goal structures — using runtime evidence.*
> Learning isn’t a model tweak.
> *It’s upgrading the structures that shape behavior.*
---
Why It Matters:
• Makes improvement *localized and explainable* (what changed, where, and why)
• Keeps “self-improvement” *governable* (versioned deltas + review + CI/CD)
• Turns incidents/metric drift into *actionable patches*, not postmortem PDFs
• Scales to real ops: ethics policies, rollback plans, semantic compression, goal estimators
---
What’s Inside:
• What “learning” means in SI-Core (and what changes vs. classic ML)
• The *Pattern-Learning-Bridge*: where it sits between runtime evidence and governed code
• Safety properties: PLB proposes *versioned deltas*, never edits production directly
• Validation pipeline: sandbox/simulation → conformance checks → golden diffs → rollout
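To make the “versioned deltas, never direct edits” property concrete, here is a minimal sketch under assumed names; it is not the actual SI-Core API:

```python
# A minimal sketch of the "PLB proposes, pipeline disposes" pattern;
# names are illustrative, not the actual SI-Core interfaces.
from dataclasses import dataclass

@dataclass(frozen=True)
class VersionedDelta:
    target: str          # e.g., "ethics_policy/v12" or "compensator/thermal"
    base_version: str    # version the patch was derived against
    patch: str           # serialized structural change (never applied in place)
    evidence: list[str]  # runtime trace IDs that motivated the change

def promote(delta: VersionedDelta, checks: dict) -> bool:
    """Run the gated pipeline; the delta only ships if every stage passes."""
    for stage in ("sandbox", "conformance", "golden_diff"):
        if not checks[stage](delta):
            return False   # rejected deltas remain as auditable proposals
    return True            # eligible for staged rollout, still reversible
```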
---
📖 Structured Intelligence Engineering Series
A non-normative, implementable design for “learning from failures” without sacrificing auditability.
Happy Holidays all! geofractal architectural expansions: timm is now a core component for experimentation. The system is growing rapidly in one direction, and timm brings a whole lot to the table in another, rapid-prototyping direction, so it is now a core component for ease of use.
BaseUtil is a new core component (src.geofractal.router.base_util). It inherits BaseComponent's behavior, so it should allow device movement for util operations, which will direct device-to-device utilization for the upcoming accelerate integration.
I'm trying to keep the base component structure as minimal as possible, but the need to chain components in specific orders presented a unique problem. By compartmentalizing utils into structures that can be delegated and moved, those structures can be repurposed, expanded autonomously, reduced autonomously, and more.
ChainComponent inherits a subsystem designed to organize multi-system, multi-device formulas for inception and synchronization. It enables distributed tasking across multiple devices in chained utilization, and it eases integration with nn.ModuleList, with a few caveats (still being ironed out) aimed at wide-distributed models.
FusionComponent is dedicated to the new fusion processing system meant for experimental expansion. This includes sub-module schedule control, Component and Tower functional control, and device movement, and it will be packaged under the "gfu.UtilType" standard naming convention:
"gfc.ComponentTypeName"
"gfr.RouterTypeName"
"gfu.UtilityTypeName"
"gft.TowerTypeName"
All of these are basically just import aliases, plus "gf.AnythingTopLevelPackaged", which includes the core.
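In code, the convention reads as plain import aliases; the module paths below are my guess at the layout, not confirmed package structure:

```python
# Module paths are assumptions about the geofractal layout.
import geofractal as gf               # gf.AnythingTopLevelPackaged, includes the core
import geofractal.components as gfc   # gfc.ComponentTypeName
import geofractal.router as gfr       # gfr.RouterTypeName
import geofractal.utils as gfu        # gfu.UtilityTypeName
import geofractal.towers as gft       # gft.TowerTypeName
```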
Better debugging for compilation: I'm in the prototyping phase of better debugging for compiled wide models and will prepare a baseline component readout structure by the end of the day today or tomorrow.
Just sharing a result of a homelab infrastructure experiment:
I've managed to set up a distributed inference infrastructure at home using a DGX Spark (128GB unified memory) and a Linux workstation with an RTX 6000 Pro (96GB GDDR7), connected via 100Gbps RoCEv2. The model I used (https://lnkd.in/gx6J7YuB) is about 140GB, so it could not fit on either GPU alone. Full setup and tutorial soon on devquasar.com.
Introducing Dhara-70M: A diffusion language model that achieves 3.8x higher throughput than autoregressive models!
Key findings from our research on optimal architectures for small language models:
→ Depth beats width: 32 layers outperform 12 layers at the same parameter count
→ Best-in-class factuality: 47.5% on TruthfulQA
→ 10x training efficiency using WSD (Warmup-Stable-Decay) conversion
→ Canon layers add only 0.13% of parameters but improve reasoning
We trained on 1B tokens using the optimal 50-30-20 dataset mix (PDFs + filtered web + educational content), then converted to diffusion with just 100M additional tokens.
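For readers unfamiliar with WSD, here is a minimal sketch of the schedule shape; the phase fractions and peak LR are placeholders, not Dhara-70M's actual config:

```python
# A minimal Warmup-Stable-Decay (WSD) learning-rate schedule sketch.
# Phase splits and peak LR below are placeholders, not the real training setup.
def wsd_lr(step: int, total: int, peak: float = 3e-4,
           warmup_frac: float = 0.05, decay_frac: float = 0.2) -> float:
    warmup = int(total * warmup_frac)
    decay_start = int(total * (1 - decay_frac))
    if step < warmup:                       # linear warmup
        return peak * step / max(warmup, 1)
    if step < decay_start:                  # long, stable plateau at peak LR
        return peak
    # final linear decay to ~0; branching from the stable phase is what makes
    # cheap "conversion" runs (like the 100M-token diffusion stage) practical
    return peak * (total - step) / max(total - decay_start, 1)
```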
What if an AI agent could be tricked into stealing your data, just by reading a tool's description? A new paper reports it's possible.
The "Attractive Metadata Attack" paper details this stealthy new threat. To measure the real-world impact of their attack, the researchers needed a source of sensitive data for the agent to leak. We're proud that the AI4Privacy corpus was used to create the synthetic user profiles containing standardized PII for their experiments.
This is a perfect win-win. Our open-source data helped researchers Kanghua Mo, 龙昱丞, and Zhihao Li from Guangzhou University and The Hong Kong Polytechnic University not just demonstrate a new attack, but also quantify its potential for harm. This data-driven evidence is what pushes the community to build better, execution-level defenses for AI agents.
🔗 Check out their paper to see how easily an agent's trust in tool metadata could be exploited: https://arxiv.org/pdf/2508.02110
- Supports all of AMD, Nvidia and Apple Silicon 🧑‍🧑‍🧒‍🧒
- Beautiful TUI with themes (who said monitoring should be boring?) 💅
- Shareable Rig Cards! Boast to friends, family and foes alike 🫨
Get it now! `uvx picomon`, or `pip install picomon` and then `picomon`.
Summary: Most “AI governance” advice still assumes you can bolt audits on after the fact. This note takes the opposite stance: **make auditability a runtime property**.
Regulators usually want two things:
* a **control plane** (“where do we push STOP / SAFE-MODE / MORE AUDIT?”)
* **evidence** (“what exactly happened, and can you prove it?”)
This article explains how **SI-Core invariants** turn those into *first-class* system surfaces—so an incident review becomes routine, not heroic.
---
Why It Matters:
• Moves “transparency” from PDFs to **cryptographically chained operational traces**
• Makes **policy enforcement inspectable** (which rule/version was applied, to which action)
• Treats rollback as a **governance primitive** (how far back can you put the world?)
• Shows how to balance **auditability + erasure** via GDPR-style ethical redaction patterns
---
What’s Inside:
**Audit invariants (regulator language):** observation gating, identity/origin, ethics overlay decisions, risk gating, append-only memory, rollback maturity levels
**Evidence model:** structured “what it knew / why it chose / what it did” histories (not token soup)
**Metrics auditors can actually ask for:** determinism/stability, ethics enforcement availability, audit completeness, rollback latency/integrity, contradiction rates
**Compliance bridges (illustrative):** how the same runtime hooks map across GDPR, sector rules, and ISO-style regimes
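As a generic illustration of what “cryptographically chained operational traces” can mean in practice, here is the common hash-chain pattern; this is not SI-Core's actual trace format:

```python
# Generic hash-chained audit log: each record commits to its predecessor,
# so any edit or deletion breaks the chain verifiably.
import hashlib, json, time

def append_record(log: list, event: dict) -> None:
    prev_hash = log[-1]["hash"] if log else "0" * 64
    body = {"ts": time.time(), "event": event, "prev": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    log.append({**body, "hash": digest})

def verify(log: list) -> bool:
    prev = "0" * 64
    for rec in log:
        body = {k: rec[k] for k in ("ts", "event", "prev")}
        if rec["prev"] != prev:
            return False      # a record was removed or reordered
        if hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() != rec["hash"]:
            return False      # a record was tampered with
        prev = rec["hash"]
    return True
```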
---
📖 Structured Intelligence Engineering Series
Not a new law. A runtime architecture for answering law-like questions with evidence.
NVIDIA’s Groq deal ... I think inference efficiency is becoming the main driver of profitability, and NVIDIA’s Groq deal is evidence the market is moving from “who can train biggest” to “who can serve cheapest and fastest at scale.” That points to a maturing phase of AI: not necessarily the end of a bubble, but definitely a correction in what “wins” long-term. What do you think?
We release the open-weight, early experimental Codeforce metatune-gpt20b, a fine-tuned version of OpenAI's gpt-oss-20b model. This is one of the first publicly released recursive self-improving AIs.
I recently tested LFM2-2.6B-Exp, an experimental language model developed by Liquid AI, to see how well it handles differential equations in a practical, step-by-step setting.
LFM2-2.6B-Exp is notable for how it was trained: it is an RL-first experimental checkpoint, built without supervised fine-tuning warm-up or distillation. Reinforcement learning was applied sequentially, starting with instruction following and later expanding to knowledge and math. This makes it a particularly interesting model to evaluate beyond benchmark scores.
In hands-on testing, the model performed surprisingly well for its size on standard undergraduate-level differential equations—first-order ODEs, second-order linear equations with constant coefficients, and nonhomogeneous problems using undetermined coefficients. It followed instructions closely and produced clear, structured solution steps.
However, the model showed limitations on more subtle methods, such as Laplace transforms with time shifting and variation of parameters, where maintaining mathematical invariants matters more than following a familiar template. In these cases, answers often looked correct structurally but failed under careful verification. This behavior is consistent with an RL-first training approach: strong at producing expected answer forms, but not always robust on deeper theoretical details.
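For anyone who wants to run a similar check, here is a minimal transformers sketch; the repo id is my assumption for the experimental checkpoint:

```python
# Minimal reproduction sketch with transformers; the repo id is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2-2.6B-Exp"  # assumed Hugging Face repo id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="bfloat16", device_map="auto"
)

# A resonance case: e^t solves the homogeneous equation, so undetermined
# coefficients must use t*e^t — a good probe for the failure mode described above.
messages = [{"role": "user",
             "content": "Solve y'' - 3y' + 2y = e^t with undetermined coefficients, step by step."}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
                                 return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=512)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```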
Liquid AI, the company behind this model, is strongly focused on edge AI, developing efficient models designed for deployment outside large data-center environments. Their model lineup starts with very small models (millions of parameters).
The Qwen Image Edit 2511 model was just published, and it is literally competing with Nano Banana Pro on image-editing tasks. With a whopping native 2560x2560-pixel output capability and only 12 steps, it is next level. With our installers and a specially made FP8-scaled quantized model, you can run this amazing beast even on GPUs with as little as 6 GB of VRAM. In this tutorial, I compare Qwen Image Edit 2511 with its predecessor, Qwen Image Edit 2509, across 12 unique and hard prompts and cases. Everything is explained and provided step by step.