nicolay-r posted an update 24 days ago
I know this space is mostly for sharing work, but in this case I am open to work.

I know there are outstanding research labs and teams following this space.

I would genuinely love to contribute, learn from strong lab environments, and help shape ideas into working systems.

What I bring (https://nicolayr.com):
• Applied NLP & deployment of LLM-powered workflows with reasoning for IR (LangChain, LiteLLM)
• Architecture engineering: Transformers and/or backends (PyTorch, TensorFlow, flaxformer)
• End-to-end engineering: frontend (JS, ReactJS) → backend REST APIs (FastAPI) / Keycloak → Docker / NGINX → cloud / MLOps
• Domain-specific experience in healthcare (deploy & handle DICOM-SR/SEG and NIfTI; databases: ORTHANC; frontend: OHIF / Cornerstone)
• Passion for open-source NLP tooling for handling data (https://github.com/nicolay-r)

I would be happy to connect or hear any relevant suggestions on finding a team 🧩

Good luck!

Mixture of Inference (MoI): Summary

The Problem

Current AI systems handle multiple skills (writing, coding, research, etc.) by either injecting skill instructions as text into the prompt, or routing tasks to separate specialized AI agents. Both approaches are slow and expensive because skills are processed one at a time, never influencing each other during generation.

The Proposed Solution

MoI suggests encoding skills as lightweight LoRA adapters (small weight modifications) that run simultaneously inside a single forward pass through the model, rather than sequentially. Think of it like multiple specialists working together in real-time versus passing a document around one at a time.
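As a minimal sketch of the "simultaneous adapters" idea, the snippet below applies several LoRA-style low-rank deltas to one linear layer within a single forward pass. All names, sizes, and the random initialisation are illustrative assumptions, not the proposal's actual implementation (standard LoRA zero-initialises the up-projection; both factors are randomised here only so the deltas are visible).

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, rank, alpha = 64, 8, 16.0

# Frozen base weight of one linear layer.
W = rng.standard_normal((d_model, d_model)) * 0.02

# Two hypothetical skill adapters, each a low-rank update B @ A.
adapters = []
for _ in range(2):
    A = rng.standard_normal((rank, d_model)) * 0.02
    B = rng.standard_normal((d_model, rank)) * 0.02
    adapters.append((A, B))

def forward(x, active):
    """One linear layer with every active LoRA delta applied in the same pass."""
    h = x @ W.T
    for A, B in active:
        # Low-rank path: project down to `rank` dims, then back up.
        h = h + (alpha / rank) * (x @ A.T) @ B.T
    return h

x = rng.standard_normal((1, d_model))
y = forward(x, adapters)
```

Because each delta is additive, every skill contributes to the same hidden state in one pass, rather than one skill's output being fed to the next.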

How It Works

Six skill adapters are organized into three groups (Factual, Form, Technical) and merged in two stages as information flows through the model: first within groups (at 1/3 depth), then across groups (at 2/3 depth), producing one unified output.
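The two-stage merge schedule above can be sketched on a toy six-layer network. Everything here is assumed for illustration: the per-adapter weights, the averaging merge rule, and the choice of which weights the merged streams reuse; the proposal itself leaves the exact merge operator open.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_layers = 32, 6  # toy depth: merge after layer 2 (1/3) and layer 4 (2/3)

# Six hypothetical skill adapters in three groups (names from the summary).
groups = {"factual": [0, 1], "form": [2, 3], "technical": [4, 5]}

# Stand-in per-adapter layer weights; a real model would share a frozen
# base and differ only in small LoRA deltas.
weights = rng.standard_normal((6, n_layers, d, d)) * 0.05

def layer(h, W):
    return np.tanh(h @ W)

x = rng.standard_normal(d)

# Layers 0-1: each adapter's stream runs independently.
streams = [x.copy() for _ in range(6)]
for l in range(2):
    streams = [layer(h, weights[i, l]) for i, h in enumerate(streams)]

# 1/3 depth: merge within groups by averaging hidden states.
group_h = {g: np.mean([streams[i] for i in idx], axis=0)
           for g, idx in groups.items()}

# Layers 2-3: three group streams (reusing each group's first adapter weights).
for l in range(2, 4):
    group_h = {g: layer(group_h[g], weights[groups[g][0], l]) for g in groups}

# 2/3 depth: merge across groups into one unified stream.
h = np.mean(list(group_h.values()), axis=0)

# Layers 4-5: the single merged stream runs to the output.
for l in range(4, 6):
    h = layer(h, weights[0, l])
```

The funnel shape (6 streams, then 3, then 1) is what allows skills to influence each other mid-generation rather than only at the end.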

Key Claimed Benefits

  • Eliminates the "token tax" of putting skill instructions in every prompt
  • True parallel processing of skills instead of sequential handling
  • Skills influence each other during generation, not just at the end
  • Cost scales sublinearly as you add more skills
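The "token tax" claim can be made concrete with a toy cost model. All numbers below are assumed for illustration, not measured: a per-skill instruction length for prompt injection, and a small per-adapter compute overhead for the MoI approach.

```python
# Illustrative cost model (all numbers assumed, not measured): prompt-injected
# skill instructions pay a per-skill "token tax" on every call, while adapter
# deltas add only a small compute overhead per skill.

BASE_PROMPT_TOKENS = 200
TOKENS_PER_SKILL = 150    # assumed instruction length per injected skill
ADAPTER_OVERHEAD = 0.02   # assumed extra FLOPs fraction per LoRA adapter

def prompt_injection_cost(n_skills: int) -> float:
    """Cost proxy: tokens processed when skills are injected as prompt text."""
    return BASE_PROMPT_TOKENS + n_skills * TOKENS_PER_SKILL

def adapter_cost(n_skills: int) -> float:
    """Cost proxy: base tokens with a small multiplicative adapter overhead."""
    return BASE_PROMPT_TOKENS * (1 + ADAPTER_OVERHEAD * n_skills)

for n in (1, 3, 6):
    print(f"{n} skills: prompt-injection={prompt_injection_cost(n):.0f}, "
          f"adapters={adapter_cost(n):.0f}")
```

In this toy model the adapter cost still grows linearly, just with a far smaller slope; the stronger sublinear claim would additionally require shared computation across adapters, which is exactly what the empirical test needs to establish.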

Important Caveats

I would like to be transparent that this is an unproven proposal. The biggest open question is whether independently trained adapters can be merged without degrading output quality. I am actively seeking collaborators, compute resources, and funding to test it.

Bottom Line

It's a theoretically grounded, intellectually honest research proposal that could meaningfully reduce the cost and latency of multi-skill AI systems, provided the core assumptions hold up empirically. Would this be a project you would consider collaborating on? Email Sam at Samsmbn@aol.com.