nicolay-r posted an update 24 days ago
I know this space is mostly for sharing work, but in this case I am open to work.

I know there are outstanding research labs and teams following this space.

I would genuinely love to contribute, learn from strong lab environments, and help shape ideas into working systems.

What I bring (https://nicolayr.com):
• Applied NLP & deployment of LLM-powered workflows with reasoning for IR (LangChain, LiteLLM)
• Architecture engineering: Transformers and/or backends (PyTorch, TensorFlow, flaxformer)
• End-to-end engineering: frontend (JS, ReactJS) → backend REST APIs (FastAPI) / Keycloak → Docker / NGINX → cloud / MLOps
• Domain-specific experience in healthcare (deploy & handle DICOM-SR/SEG and NIfTI; databases: ORTHANC; frontend: OHIF / Cornerstone)
• Passion for open-source NLP tooling for handling data (https://github.com/nicolay-r)

I would be happy to connect or hear any relevant suggestions on finding a team 🧩

Good luck!

Mixture of Inference (MoI): Summary

The Problem

Current AI systems handle multiple skills (writing, coding, research, etc.) by either injecting skill instructions as text into the prompt, or routing tasks to separate specialized AI agents. Both approaches are slow and expensive because skills are processed one at a time, never influencing each other during generation.

The Proposed Solution

MoI suggests encoding skills as lightweight LoRA adapters (small weight modifications) that run simultaneously inside a single forward pass through the model, rather than sequentially. Think of it like multiple specialists working together in real-time versus passing a document around one at a time.
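As a minimal sketch of the "simultaneous adapters" idea, the snippet below applies several LoRA-style low-rank deltas to one linear layer within a single forward pass. All names, sizes, and the random initialisation are illustrative assumptions, not the proposal's actual implementation (standard LoRA zero-initialises the up-projection; both factors are randomised here only so the deltas are visible).

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, rank, alpha = 64, 8, 16.0

# Frozen base weight of one linear layer.
W = rng.standard_normal((d_model, d_model)) * 0.02

# Two hypothetical skill adapters, each a low-rank update B @ A.
adapters = []
for _ in range(2):
    A = rng.standard_normal((rank, d_model)) * 0.02
    B = rng.standard_normal((d_model, rank)) * 0.02
    adapters.append((A, B))

def forward(x, active):
    """One linear layer with every active LoRA delta applied in the same pass."""
    h = x @ W.T
    for A, B in active:
        # Low-rank path: project down to `rank` dims, then back up.
        h = h + (alpha / rank) * (x @ A.T) @ B.T
    return h

x = rng.standard_normal((1, d_model))
y = forward(x, adapters)
```

Because each delta is additive, every skill contributes to the same hidden state in one pass, rather than one skill's output being fed to the next.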

How It Works

Six skill adapters are organized into three groups (Factual, Form, Technical) and merged in two stages as information flows through the model: first within groups (at 1/3 depth), then across groups (at 2/3 depth), producing one unified output.
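The two-stage merge schedule above can be sketched on a toy six-layer network. Everything here is assumed for illustration: the per-adapter weights, the averaging merge rule, and the choice of which weights the merged streams reuse; the proposal itself leaves the exact merge operator open.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_layers = 32, 6  # toy depth: merge after layer 2 (1/3) and layer 4 (2/3)

# Six hypothetical skill adapters in three groups (names from the summary).
groups = {"factual": [0, 1], "form": [2, 3], "technical": [4, 5]}

# Stand-in per-adapter layer weights; a real model would share a frozen
# base and differ only in small LoRA deltas.
weights = rng.standard_normal((6, n_layers, d, d)) * 0.05

def layer(h, W):
    return np.tanh(h @ W)

x = rng.standard_normal(d)

# Layers 0-1: each adapter's stream runs independently.
streams = [x.copy() for _ in range(6)]
for l in range(2):
    streams = [layer(h, weights[i, l]) for i, h in enumerate(streams)]

# 1/3 depth: merge within groups by averaging hidden states.
group_h = {g: np.mean([streams[i] for i in idx], axis=0)
           for g, idx in groups.items()}

# Layers 2-3: three group streams (reusing each group's first adapter weights).
for l in range(2, 4):
    group_h = {g: layer(group_h[g], weights[groups[g][0], l]) for g in groups}

# 2/3 depth: merge across groups into one unified stream.
h = np.mean(list(group_h.values()), axis=0)

# Layers 4-5: the single merged stream runs to the output.
for l in range(4, 6):
    h = layer(h, weights[0, l])
```

The funnel shape (6 streams, then 3, then 1) is what allows skills to influence each other mid-generation rather than only at the end.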

Key Claimed Benefits

  • Eliminates the "token tax" of putting skill instructions in every prompt
  • True parallel processing of skills instead of sequential handling
  • Skills influence each other during generation, not just at the end
  • Cost scales sublinearly as you add more skills
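The "token tax" claim can be made concrete with a toy cost model. All numbers below are assumed for illustration, not measured: a per-skill instruction length for prompt injection, and a small per-adapter compute overhead for the MoI approach.

```python
# Illustrative cost model (all numbers assumed, not measured): prompt-injected
# skill instructions pay a per-skill "token tax" on every call, while adapter
# deltas add only a small compute overhead per skill.

BASE_PROMPT_TOKENS = 200
TOKENS_PER_SKILL = 150    # assumed instruction length per injected skill
ADAPTER_OVERHEAD = 0.02   # assumed extra FLOPs fraction per LoRA adapter

def prompt_injection_cost(n_skills: int) -> float:
    """Cost proxy: tokens processed when skills are injected as prompt text."""
    return BASE_PROMPT_TOKENS + n_skills * TOKENS_PER_SKILL

def adapter_cost(n_skills: int) -> float:
    """Cost proxy: base tokens with a small multiplicative adapter overhead."""
    return BASE_PROMPT_TOKENS * (1 + ADAPTER_OVERHEAD * n_skills)

for n in (1, 3, 6):
    print(f"{n} skills: prompt-injection={prompt_injection_cost(n):.0f}, "
          f"adapters={adapter_cost(n):.0f}")
```

In this toy model the adapter cost still grows linearly, just with a far smaller slope; the stronger sublinear claim would additionally require shared computation across adapters, which is exactly what the empirical test needs to establish.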

Important Caveats

I would like to be transparent that this is an unproven proposal. The biggest open question is whether independently trained adapters can be merged without degrading output quality. I am actively seeking collaborators, compute resources, and funding to test it.

Bottom Line

It's a theoretically grounded, intellectually honest research proposal that could meaningfully reduce the cost and latency of multi-skill AI systems, provided the core assumptions hold up empirically. Would this be a project you would consider collaborating on? Email Sam at Samsmbn@aol.com.