Our preprint is out! We model human teaching behaviors in agents, yielding a unified framework for adaptive, personalized learning experiences. LectūraAgents addresses prevailing limitations in current AI learning systems with three essential capabilities:
(1) A hierarchical multi-agent architecture modeled on academic standards. We observe that agents collaborating across hierarchies yield better personalized learning outcomes.
(2) An adaptive embodied teaching mechanism, in which the instructor agent executes visible, pedagogically motivated teaching actions (e.g. handwriting, highlighting, circling) on content in a teaching environment while speaking.
(3) A novel teaching action-speech alignment algorithm (TASA) that makes this possible by dynamically aligning speech with visual teaching actions. Specifically, TASA temporally splits speech segments into word-level tokens, performs salience heuristic analysis on the learning content (text, images, etc.), then identifies relevant regions in which to apply pedagogical teaching actions that guide attention and augment understanding.
We conducted several experiments to assess these capabilities: a pedagogical evaluation of the various components under frontier models, a comparative analysis with existing frameworks, and an efficacy study with real students.
Results show consistent gains over baseline systems on standard instructional metrics (curated by expert educators) spanning lecture content quality, embodied teaching quality, assessment, and personalization, positioning LectūraAgents as a pedagogically grounded framework for personalized learning at scale.
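For a rough feel of what aligning speech with teaching actions could look like, here is a minimal sketch. It is not the TASA implementation from the preprint: the `Region` and `TimedAction` types, the even spreading of time across words, and the keyword-overlap stand-in for salience analysis are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Region:
    label: str   # text shown in this region of the slide (hypothetical)
    action: str  # teaching action to apply there, e.g. "highlight"


@dataclass
class TimedAction:
    start: float  # seconds from the start of the speech segment
    word: str     # spoken word that triggers the action
    region: str   # label of the region the action is applied to
    action: str


def normalize(word):
    """Lowercase a word and strip surrounding punctuation."""
    return word.strip(".,;:!?()").lower()


def word_timings(segment, duration):
    """Split a speech segment into words and spread its duration evenly.

    Assumption: a real system would take word timestamps from the TTS engine.
    """
    words = segment.split()
    step = duration / len(words) if words else 0.0
    return [(word, i * step) for i, word in enumerate(words)]


def align(segment, duration, regions):
    """Schedule each region's action at the first spoken word that names it.

    Keyword overlap is a crude stand-in for salience analysis.
    """
    actions, used = [], set()
    for word, start in word_timings(segment, duration):
        key = normalize(word)
        for region in regions:
            tokens = {normalize(t) for t in region.label.split()}
            if key in tokens and region.label not in used:
                actions.append(TimedAction(start, key, region.label, region.action))
                used.add(region.label)
                break
    return actions


regions = [Region("gradient", "circle"), Region("step size", "highlight")]
plan = align("The gradient points uphill so we step downhill", 4.0, regions)
for item in plan:
    print(f"{item.start:.1f}s: {item.action} '{item.region}' on '{item.word}'")
```

The point of the sketch is the shape of the pipeline: word-level timing first, then a match between spoken words and on-screen regions, then a timed list of actions for the instructor agent to play back.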
Been wrapping my head around this theoretical banger. Bro just casually answered a fundamental question: why does weight decay work? In other words, why should penalizing weight magnitude help a model perform better on unseen data? He argues that the minimum weight norm a neural network needs to represent a target dataset is closely related to the Kolmogorov complexity (KC) of that dataset, i.e. smaller weight norms correspond to simpler solutions (lower KC), and simpler solutions tend to generalize better. This also explains why bigger models can generalize well on noisy data: there's enough room to reach a solution near the optimal KC. So the question hinges not on parameter count but on how much information is encoded in those parameters. And if norm is related to complexity, researchers can design regularizers that control complexity more directly, cool! It holds for fixed precision only though, and he explains clearly why.
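As a quick reminder of the mechanic being discussed, here is a minimal sketch of weight decay in plain gradient descent. It only shows that the penalty shrinks the weight norm of the fitted solution; it does not reproduce the Kolmogorov complexity argument. The toy data, learning rate, and decay value are made up for illustration.

```python
import math


def train(xs, ys, weight_decay, lr=0.05, steps=2000):
    """Fit a linear model by gradient descent on mean squared error.

    weight_decay adds the gradient of (weight_decay / 2) * ||w||^2,
    which pulls every weight toward zero at each step.
    """
    w = [0.0] * len(xs[0])
    n = len(xs)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / n
        w = [wi - lr * (g + weight_decay * wi) for wi, g in zip(w, grad)]
    return w


def norm(w):
    """Euclidean norm of a weight vector."""
    return math.sqrt(sum(wi * wi for wi in w))


# Toy data generated by the weights [1, 2] with no noise.
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ys = [1.0, 2.0, 3.0]

w_plain = train(xs, ys, weight_decay=0.0)
w_decay = train(xs, ys, weight_decay=0.1)
print("no decay:  ", w_plain, "norm", norm(w_plain))
print("with decay:", w_decay, "norm", norm(w_decay))
```

Without decay the fit recovers the generating weights; with decay it settles on a smaller-norm solution that fits the data slightly less exactly.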
Anthropic’s new read introduces a new autoencoder (NLA) that lets an LLM express what is going on in its activations (numbers) in natural language (words). They trained Claude (with NLA) to translate its activations into human-readable text. NLA has two parameterized models: an activation verbalizer that converts activations to text, and an activation reconstructor that tries to recreate the activations from that text. While this is cool, it took GRPO to get here lol, proving how cutting-edge we can get when research is open-sourced. Very useful for work on interpretability and alignment btw
Supercool! You can now easily train a JEPA world model (15M params) end-to-end on a single GPU, with planning done in under 1s 🤯.
- trained with classic prediction loss + SIGReg.
- plans purely in raw pixels.
- beats SOTA DINO-WM and PLDM.
- single hyper-parameter with no heuristics.
- fully open sourced!!
Kimi team dropped a major improvement to the transformer architecture and it quietly targets one of the most taken-for-granted components: residual connections.
For nearly a decade, ever since they were introduced, transformers have relied on residuals that simply add all previous layer outputs equally. It works but it’s also kind of… dumb.
Kimi’s new paper, “Attention Residuals (AttnRes)”, replaces that with something much more intelligent:
→ instead of blindly summing past layers,
→ it learns which layers matter,
→ and dynamically weights contributions across depth.
So attention is no longer just over tokens…it’s now also over layers (depth). This effectively turns depth into a dynamic memory system, phenomenal!
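To make the contrast concrete, here is a minimal sketch of plain residual summation next to a softmax attention over layer outputs. It is a generic illustration of attending over depth, not the AttnRes formulation from the paper: the `query` vector stands in for whatever learned, input-dependent query the real method uses, and the numbers are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def plain_residual(layer_outputs):
    """Standard residual stream: every earlier layer output is added with weight 1."""
    dim = len(layer_outputs[0])
    return [sum(h[d] for h in layer_outputs) for d in range(dim)]


def depth_attention(query, layer_outputs):
    """Mix earlier layer outputs with softmax weights from query-output dot products."""
    scores = [sum(q * x for q, x in zip(query, h)) for h in layer_outputs]
    weights = softmax(scores)
    dim = len(query)
    mixed = [sum(w * h[d] for w, h in zip(weights, layer_outputs)) for d in range(dim)]
    return mixed, weights


# Outputs of three earlier layers for one token (hypothetical values).
layer_outputs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [2.0, 0.0]  # hypothetical query for the current layer

summed = plain_residual(layer_outputs)
mixed, weights = depth_attention(query, layer_outputs)
print("plain sum:    ", summed)
print("depth weights:", [round(w, 3) for w in weights])
print("attended mix: ", [round(v, 3) for v in mixed])
```

Here the first layer's output lines up best with the query, so it receives the largest weight, whereas the plain sum treats all three layers identically.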