Experimental global target bits‑per‑weight quantization of ServiceNow-AI/Apriel-1.6-15b-Thinker and zai-org/GLM-4.6V-Flash
Unlike standard llama.cpp quantizations that rely on fixed type heuristics (e.g., Q4_K_M), the Target BPW approach allocates per-tensor precision where it matters most and produces high-quality models that hit a precise global file-size target.
Key Advantages:
- VRAM Maximization: can generate high-quality models sized exactly to fit hardware constraints (e.g., fitting the model into exactly 24GB of VRAM).
- Data-Driven Precision: the quantization mix is determined by measured weight-error sensitivity rather than hardcoded rules, often yielding better PPL/KLD-vs-size trade-offs.
Full benchmarks (PPL, KLD, ARC, MMLU, etc.) and methodology are in the model cards.
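For intuition, here is a hypothetical sketch of how a global bits-per-weight target can be met greedily; the names and the error model are illustrative, not the actual implementation:

```python
# Hypothetical sketch of the target-BPW idea (not the actual llama.cpp code):
# greedily spend a global bit budget where measured quantization error is worst.
from dataclasses import dataclass

@dataclass
class Tensor:
    name: str
    n_weights: int
    error_at_bpw: dict  # {bpw: measured error}; lower bpw -> higher error

def allocate(tensors: list[Tensor], target_bpw: float) -> dict:
    total_weights = sum(t.n_weights for t in tensors)
    budget = target_bpw * total_weights                       # total bits available
    choice = {t.name: min(t.error_at_bpw) for t in tensors}   # start at cheapest type
    budget -= sum(choice[t.name] * t.n_weights for t in tensors)
    while True:
        best = None
        for t in tensors:
            cur = choice[t.name]
            upgrades = [b for b in t.error_at_bpw if b > cur]
            if not upgrades:
                continue
            nxt = min(upgrades)
            extra_bits = (nxt - cur) * t.n_weights
            gain = t.error_at_bpw[cur] - t.error_at_bpw[nxt]
            if extra_bits <= budget:
                score = gain / extra_bits                     # error removed per bit spent
                if best is None or score > best[0]:
                    best = (score, t, nxt, extra_bits)
        if best is None:                                      # budget exhausted
            break
        _, t, nxt, extra_bits = best
        choice[t.name] = nxt
        budget -= extra_bits
    return choice                                             # tensor name -> chosen bpw
```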
Update: the TRELLIS.2 (text-to-3D, image-to-3D) Gradio demo with embedded Rerun and improved 3D model previewer visualization is now available on Hugging Face. Generate assets and view them in the 3D viewer, powered by Microsoft’s TRELLIS.2 and Tongyi-MAI’s Z-Image-Turbo models.
What we learned about memory in 2025: 8 comprehensive resources
If models forget everything, how can they be reliable? AI systems need to remember past interactions, update knowledge, stay consistent over time, and work beyond a single prompt. That's why memory in AI is coming up in more and more conversations. Here’s a useful set of studies and videos on where AI memory stands today:
1. Memory in the Age of AI Agents (2512.13564) A great survey that organizes agent memory research. It gives concrete taxonomies across memory form, function, and dynamics, and summarizes benchmarks, frameworks, and emerging directions for building systematic agent memory systems.
2. When Will We Give AI True Memory? A conversation with Edo Liberty, CEO and founder @ Pinecone -> https://youtu.be/ITbwVFZYepc?si=_lAbRHciC740dNz0 Edo Liberty discusses what real memory in LLMs requires beyond RAG - from scalable vector storage to reliable knowledge systems - and why storage, not compute, is becoming the key bottleneck for building dependable AI agents.
3. Why AI Intelligence is Nothing Without Visual Memory | Shawn Shen on the Future of Embodied AI -> https://youtu.be/3ccDi4ZczFg?si=SbJg487kwrkVXgUu Shawn Shen argues AI needs a separate, hippocampus-like memory to move beyond chatbots, enabling long-term visual memory, object permanence, and on-device intelligence for robots, wearables, and the physical world.
5. Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions -> https://arxiv.org/abs/2505.00675v2 Proposes a concrete taxonomy, core operations, and research directions to systematically organize and advance agent memory systems.
Summary: Single-agent “alignment” is the easy case. Real systems are *multi-owner* by default: cities, platforms, institutions, regulators, and users all carry distinct goal vectors—and the same action helps some while harming others.
This article sketches a *non-normative* extension: multi-agent *goal trade proposals* (structured, auditable “plea bargains” in goal-space) plus *semantic pricing* (treating information itself as a negotiable resource), with *PLB-M* as a nearline layer that learns stable cooperation patterns over time.
> Coordination isn’t vibes.
> It’s *contracts over goal deltas*, under governance.
---
Why It Matters:
• Turns “stakeholder conflict” into *explicit, bounded deals* instead of hidden politics
• Provides an accounting surface for *fairness, compensation, and reciprocity*
• Makes “information sharing” measurable: *how much does a semantic unit improve goals?*
• Keeps the whole negotiation layer *auditable and rollbackable*, avoiding “dark markets”
---
What’s Inside:
• Why multi-agent worlds force negotiation (cities, clouds, cross-org networks)
• *GCS as negotiable deltas*: per-agent impact vectors for joint actions
• A concrete schema: *Goal Trade Proposal (GTP)* as a first-class object
• “Semantic value” and *pricing meaning* (not money—accounting under policy)
• *PLB-M*: mining deal patterns + semantic flows → proposing safer templates
• Threat model: manipulation/collusion/DoS + governance guardrails
• Practical notes on clearing, complexity, stability (damping, circuit breakers)
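As a rough illustration of what a first-class GTP object could look like, here is a sketch; the field names are my assumptions, not a normative schema:

```python
# Illustrative sketch only: field names are assumptions, not the SI-Core spec.
from dataclasses import dataclass, field

@dataclass
class GoalTradeProposal:
    proposal_id: str
    proposer: str                              # agent/owner making the offer
    parties: list[str]                         # all agents whose goals are touched
    goal_deltas: dict[str, dict[str, float]]   # party -> {goal name: expected impact}
    compensation: dict[str, float]             # semantic-pricing credits offered per party
    expiry: str                                # ISO-8601 deadline for acceptance
    rollback_plan: str                         # reference to a pre-agreed rollback trace
    signatures: dict[str, str] = field(default_factory=dict)  # party -> approval token

    def is_pareto_acceptable(self) -> bool:
        """Every party's net (goal impact + compensation) must be non-negative."""
        return all(
            sum(self.goal_deltas.get(p, {}).values()) + self.compensation.get(p, 0.0) >= 0
            for p in self.parties
        )
```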
Introducing the Qwen-Image-Edit-2511-LoRAs-Fast demo, featuring image property comparison and contrast, built on top of Gradio and the combined Rerun SDK. It supports single- and multi-image edits with existing LoRAs, which are lazily loaded. (Note: this is still an experimental Space for Qwen-Image-Edit-2511.)
Summary: Most stacks “learn” by fine-tuning weights and redeploying — powerful, but opaque. SI-Core already produces *structured evidence* (jump logs, ethics traces, effect ledgers, goal vectors, rollback traces), so learning can be *structural* instead:
*Upgrade policies, compensators, SIL code, and goal structures — using runtime evidence.*
> Learning isn’t a model tweak.
> *It’s upgrading the structures that shape behavior.*
---
Why It Matters:
• Makes improvement *localized and explainable* (what changed, where, and why)
• Keeps “self-improvement” *governable* (versioned deltas + review + CI/CD)
• Turns incidents/metric drift into *actionable patches*, not postmortem PDFs
• Scales to real ops: ethics policies, rollback plans, semantic compression, goal estimators
---
What’s Inside:
• What “learning” means in SI-Core (and what changes vs. classic ML)
• The *Pattern-Learning-Bridge*: where it sits between runtime evidence and governed code
• Safety properties: PLB proposes *versioned deltas*, never edits production directly
• Validation pipeline: sandbox/simulation → conformance checks → golden diffs → rollout
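To make the “versioned deltas, never direct edits” property concrete, here is a minimal sketch under assumed names; it is not the actual SI-Core API:

```python
# A minimal sketch of the "PLB proposes, pipeline disposes" pattern;
# names are illustrative, not the actual SI-Core interfaces.
from dataclasses import dataclass

@dataclass(frozen=True)
class VersionedDelta:
    target: str          # e.g., "ethics_policy/v12" or "compensator/thermal"
    base_version: str    # version the patch was derived against
    patch: str           # serialized structural change (never applied in place)
    evidence: list[str]  # runtime trace IDs that motivated the change

def promote(delta: VersionedDelta, checks: dict) -> bool:
    """Run the gated pipeline; the delta only ships if every stage passes."""
    for stage in ("sandbox", "conformance", "golden_diff"):
        if not checks[stage](delta):
            return False   # rejected deltas remain as auditable proposals
    return True            # eligible for staged rollout, still reversible
```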
---
📖 Structured Intelligence Engineering Series
A non-normative, implementable design for “learning from failures” without sacrificing auditability.
Happy Holidays all! geofractal architectural expansions: timm is now a core component for experimentation. The system is growing rapidly in one direction, and timm brings a whole lot to the table in another, rapid-prototyping direction, so it is now a core component for ease of use.
BaseUtil is a new core component (src.geofractal.router.base_util). It inherits BaseComponent's behavior, so it should allow device movement for util operations, which will direct device-to-device utilization for the upcoming accelerate integration.
I'm trying to keep the base component structure as minimal as possible, but the need to chain components in specific orders presented a unique problem. By compartmentalizing utils into structures that can be delegated and moved, those structures can be repurposed, expanded autonomously, reduced autonomously, and more.
ChainComponent inherits a subsystem designed to organize multi-system, multi-device formulas for inception and synchronization. It enables distributed tasking across multiple devices in chained utilization, and it eases integration with nn.ModuleList, with a few caveats (still being ironed out) aimed at wide-distributed models.
FusionComponent is dedicated to the new fusion processing system meant for experimental expansion. This includes sub-module schedule control, Component and Tower functional control, and device movement, and it will be packaged under the "gfu.UtilType" standard naming convention:
"gfc.ComponentTypeName"
"gfr.RouterTypeName"
"gfu.UtilityTypeName"
"gft.TowerTypeName"
All of these are basically just import aliases, plus "gf.AnythingTopLevelPackaged", which includes the core.
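In code, the convention reads as plain import aliases; the module paths below are my guess at the layout, not confirmed package structure:

```python
# Module paths are assumptions about the geofractal layout.
import geofractal as gf               # gf.AnythingTopLevelPackaged, includes the core
import geofractal.components as gfc   # gfc.ComponentTypeName
import geofractal.router as gfr       # gfr.RouterTypeName
import geofractal.utils as gfu        # gfu.UtilityTypeName
import geofractal.towers as gft       # gft.TowerTypeName
```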
Better debugging for compilation: I'm in the prototyping phase of better debugging for compiled wide models and will prepare a baseline component readout structure by the end of the day today or tomorrow.
Just sharing a result of a homelab infrastructure experiment:
I've managed to set up a distributed inference infrastructure at home using a DGX Spark (128GB unified memory) and a Linux workstation with an RTX 6000 Pro (96GB GDDR7), connected via 100Gbps RoCEv2. The model I used (https://lnkd.in/gx6J7YuB) is about 140GB, so it could not fit on either GPU alone. Full setup and tutorial soon on devquasar.com.
Introducing Dhara-70M: A diffusion language model that achieves 3.8x higher throughput than autoregressive models!
Key findings from our research on optimal architectures for small language models:
→ Depth beats width: 32 layers outperform 12 layers at the same parameter count
→ Best-in-class factuality: 47.5% on TruthfulQA
→ 10x training efficiency using WSD (Warmup-Stable-Decay) conversion
→ Canon layers add only 0.13% of parameters but improve reasoning
We trained on 1B tokens using the optimal 50-30-20 dataset mix (PDFs + filtered web + educational content), then converted to diffusion with just 100M additional tokens.
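For readers unfamiliar with WSD, here is a minimal sketch of the schedule shape; the phase fractions and peak LR are placeholders, not Dhara-70M's actual config:

```python
# A minimal Warmup-Stable-Decay (WSD) learning-rate schedule sketch.
# Phase splits and peak LR below are placeholders, not the real training setup.
def wsd_lr(step: int, total: int, peak: float = 3e-4,
           warmup_frac: float = 0.05, decay_frac: float = 0.2) -> float:
    warmup = int(total * warmup_frac)
    decay_start = int(total * (1 - decay_frac))
    if step < warmup:                       # linear warmup
        return peak * step / max(warmup, 1)
    if step < decay_start:                  # long, stable plateau at peak LR
        return peak
    # final linear decay to ~0; branching from the stable phase is what makes
    # cheap "conversion" runs (like the 100M-token diffusion stage) practical
    return peak * (total - step) / max(total - decay_start, 1)
```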
What if an AI agent could be tricked into stealing your data, just by reading a tool's description? A new paper reports it's possible.
The "Attractive Metadata Attack" paper details this stealthy new threat. To measure the real-world impact of their attack, the researchers needed a source of sensitive data for the agent to leak. We're proud that the AI4Privacy corpus was used to create the synthetic user profiles containing standardized PII for their experiments.
This is a perfect win-win. Our open-source data helped researchers Kanghua Mo, 龙昱丞, and Zhihao Li from Guangzhou University and The Hong Kong Polytechnic University not just demonstrate a new attack, but also quantify its potential for harm. This data-driven evidence is what pushes the community to build better, execution-level defenses for AI agents.
🔗 Check out their paper to see how easily an agent's trust in tool metadata could be exploited: https://arxiv.org/pdf/2508.02110
- Supports all of AMD, Nvidia and Apple Silicon 🧑‍🧑‍🧒‍🧒
- Beautiful TUI with themes (who said monitoring should be boring?) 💅
- Shareable Rig Cards! Boast to friends, family and foes alike 🫨
Get it now! `uvx picomon`, or `pip install picomon` and then `picomon`.
Summary: Most “AI governance” advice still assumes you can bolt audits on after the fact. This note takes the opposite stance: **make auditability a runtime property**.
Regulators usually want two things:
* a **control plane** (“where do we push STOP / SAFE-MODE / MORE AUDIT?”)
* **evidence** (“what exactly happened, and can you prove it?”)
This article explains how **SI-Core invariants** turn those into *first-class* system surfaces—so an incident review becomes routine, not heroic.
---
Why It Matters:
• Moves “transparency” from PDFs to **cryptographically chained operational traces**
• Makes **policy enforcement inspectable** (which rule/version was applied, to which action)
• Treats rollback as a **governance primitive** (how far back can you put the world?)
• Shows how to balance **auditability + erasure** via GDPR-style ethical redaction patterns
---
What’s Inside:
**Audit invariants (regulator language):** observation gating, identity/origin, ethics overlay decisions, risk gating, append-only memory, rollback maturity levels
**Evidence model:** structured “what it knew / why it chose / what it did” histories (not token soup)
**Metrics auditors can actually ask for:** determinism/stability, ethics enforcement availability, audit completeness, rollback latency/integrity, contradiction rates
**Compliance bridges (illustrative):** how the same runtime hooks map across GDPR, sector rules, and ISO-style regimes
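As a generic illustration of what “cryptographically chained operational traces” can mean in practice, here is the common hash-chain pattern; this is not SI-Core's actual trace format:

```python
# Generic hash-chained audit log: each record commits to its predecessor,
# so any edit or deletion breaks the chain verifiably.
import hashlib, json, time

def append_record(log: list, event: dict) -> None:
    prev_hash = log[-1]["hash"] if log else "0" * 64
    body = {"ts": time.time(), "event": event, "prev": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    log.append({**body, "hash": digest})

def verify(log: list) -> bool:
    prev = "0" * 64
    for rec in log:
        body = {k: rec[k] for k in ("ts", "event", "prev")}
        if rec["prev"] != prev:
            return False      # a record was removed or reordered
        if hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() != rec["hash"]:
            return False      # a record was tampered with
        prev = rec["hash"]
    return True
```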
---
📖 Structured Intelligence Engineering Series
Not a new law. A runtime architecture for answering law-like questions with evidence.
NVIDIA’s Groq deal ... I think inference efficiency is becoming the main driver of profitability, and NVIDIA’s Groq deal is evidence the market is moving from “who can train biggest” to “who can serve cheapest and fastest at scale.” That points to a maturing phase of AI: not necessarily the end of a bubble, but definitely a correction in what “wins” long-term. What do you think?
We release the open-weight, early experimental Codeforce metatune-gpt20b, a fine-tuned version of OpenAI's gpt-oss-20b model. This is one of the first publicly released recursive self-improving AIs.
I recently tested LFM2-2.6B-Exp, an experimental language model developed by Liquid AI, to see how well it handles differential equations in a practical, step-by-step setting.
LFM2-2.6B-Exp is notable for how it was trained: it is an RL-first experimental checkpoint, built without supervised fine-tuning warm-up or distillation. Reinforcement learning was applied sequentially, starting with instruction following and later expanding to knowledge and math. This makes it a particularly interesting model to evaluate beyond benchmark scores.
In hands-on testing, the model performed surprisingly well for its size on standard undergraduate-level differential equations—first-order ODEs, second-order linear equations with constant coefficients, and nonhomogeneous problems using undetermined coefficients. It followed instructions closely and produced clear, structured solution steps.
However, the model showed limitations on more subtle methods, such as Laplace transforms with time shifting and variation of parameters, where maintaining mathematical invariants matters more than following a familiar template. In these cases, answers often looked correct structurally but failed under careful verification. This behavior is consistent with an RL-first training approach: strong at producing expected answer forms, but not always robust on deeper theoretical details.
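For anyone who wants to run a similar check, here is a minimal transformers sketch; the repo id is my assumption for the experimental checkpoint:

```python
# Minimal reproduction sketch with transformers; the repo id is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2-2.6B-Exp"  # assumed Hugging Face repo id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="bfloat16", device_map="auto"
)

# A resonance case: e^t solves the homogeneous equation, so undetermined
# coefficients must use t*e^t — a good probe for the failure mode described above.
messages = [{"role": "user",
             "content": "Solve y'' - 3y' + 2y = e^t with undetermined coefficients, step by step."}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
                                 return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=512)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```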
Liquid AI, the company behind this model, is strongly focused on edge AI, developing efficient models designed for deployment outside large data-center environments. Their model lineup starts with very small models (millions of parameters).
The Qwen Image Edit 2511 model was just published, and it is literally competing with Nano Banana Pro on image-editing tasks. With a whopping native 2560x2560-pixel output capability and only 12 steps, it is next level. With our installers and a specially made FP8-scaled quantized model, you can run this amazing beast even on GPUs with as little as 6 GB of VRAM. In this tutorial, I compare Qwen Image Edit 2511 with its predecessor, Qwen Image Edit 2509, across 12 unique and hard prompts and cases. Everything is explained and provided step by step.