Just published a hands-on guide on building a Kubernetes cluster from scratch on AWS EC2 using kubeadm: no managed services, no shortcuts.
If you want to truly understand how the control plane and workers communicate, how pod networking works with Flannel, and how to lock down access with security groups, then this is the kind of exercise that makes it click.
The guide covers a full 3-node setup (1 control plane + 2 workers) on Amazon Linux 2023, from instance provisioning all the way to deploying your first workload.
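To give a flavor of the security-group step, here's a minimal boto3 sketch. It's not taken from the guide; the region, group ID, and port list are illustrative, following kubeadm's documented port requirements plus Flannel's VXLAN port.

```python
import boto3

# Hypothetical values: replace with your own region and security group ID.
REGION = "us-east-1"
SG_ID = "sg-0123456789abcdef0"

# Standard kubeadm ports, plus UDP 8472 for Flannel's VXLAN overlay.
RULES = [
    ("tcp", 6443, 6443),      # Kubernetes API server
    ("tcp", 2379, 2380),      # etcd server client API
    ("tcp", 10250, 10250),    # kubelet API
    ("tcp", 30000, 32767),    # NodePort services (workers)
    ("udp", 8472, 8472),      # Flannel VXLAN
]

ec2 = boto3.client("ec2", region_name=REGION)
ec2.authorize_security_group_ingress(
    GroupId=SG_ID,
    IpPermissions=[
        {
            "IpProtocol": proto,
            "FromPort": lo,
            "ToPort": hi,
            # Self-referencing rule: only instances in this same
            # security group can reach these ports (node-to-node only).
            "UserIdGroupPairs": [{"GroupId": SG_ID}],
        }
        for proto, lo, hi in RULES
    ],
)
```

Keeping cluster traffic on a self-referencing rule like this means only the SSH and API-server ports ever need to be opened to your own IP.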
The latest release of the Haystack OSS LLM framework adds a long-requested feature: image support!
📓 Notebooks below
This isn't just about passing images to an LLM. We built several features to enable practical multimodal use cases.
What's new?
🧠 Support for multiple LLM providers: OpenAI, Amazon Bedrock, Google Gemini, Mistral, NVIDIA, OpenRouter, Ollama, and more (Hugging Face API support coming 🔜)
🎛️ A prompt template language that handles structured inputs, including images
📄 PDF and image converters
🔍 Image embedders using CLIP-like models
🧾 An LLM-based extractor to pull text from images
🧩 Components to build multimodal RAG pipelines and Agents
I had the chance to lead this effort with @sjrhuschlee (great collab).
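Here's roughly what the new image support looks like in code. This is a minimal sketch assuming the ImageContent dataclass and its from_file_path helper from this release; "invoice.jpg" is a placeholder, and the notebooks have the canonical versions.

```python
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage, ImageContent

# Load a local image ("invoice.jpg" is a placeholder path);
# ImageContent handles base64 encoding of the file.
image = ImageContent.from_file_path("invoice.jpg")

# A user message can now mix text and image parts in one prompt.
message = ChatMessage.from_user(content_parts=["Summarize this document.", image])

generator = OpenAIChatGenerator(model="gpt-4o-mini")
result = generator.run(messages=[message])
print(result["replies"][0].text)
```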
Build something cool with Nano Banana (aka Gemini 2.5 Flash Image) AIO [All-in-One]. Draw and transform on canvas, edit images, and generate images, all in one place! 🍌
✦︎ Built with the Gemini API (GCP). Try it here: prithivMLmods/Nano-Banana-AIO (Space added Sep 18 '25)
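Under the hood, the generation step boils down to a single Gemini API call. Here's a minimal sketch using the google-genai Python SDK rather than the Space's actual code; the prompt and output filename are placeholders.

```python
from google import genai

client = genai.Client()  # reads the API key from the environment

# "Nano Banana" is the preview image model: prompts in, generated/edited images out.
response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=["A hand-drawn banana rocket launching over a pixel-art city"],
)

# The reply can interleave text and image parts; save any inline image bytes.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        with open("banana.png", "wb") as f:
            f.write(part.inline_data.data)
```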
Low-Rank Adaptation (LoRA) is the go-to method for efficient fine-tuning: instead of retraining the full model, it trains small low-rank matrices added on top of the frozen weights. The field isn't standing still, and new LoRA variants keep pushing the limits of efficiency, generalization, and personalization. So we're sharing 10 of the latest LoRA approaches you should know about (with a bare-bones sketch of a vanilla LoRA layer after the list for orientation):
4. aLoRA (Activated LoRA) → Activated LoRA: Fine-tuned LLMs for Intrinsics (2504.12397)
Activates the LoRA weights only on tokens generated after the adapter is invoked, so the model can reuse the base model's KV cache for the preceding context instead of recomputing the full turn's KV cache. Efficient in multi-turn conversations.
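For orientation, here is the vanilla LoRA update that these variants build on, as a minimal PyTorch sketch. It isn't from any of the papers above; the class name, init scale, and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer with a trainable low-rank update: y = Wx + (alpha/r) * B(Ax)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the adapter is trained

        # A starts random, B starts at zero, so training begins exactly at the base model.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

# Wrapping a 4096x4096 projection: r * (in + out) = 65K trainable params
# instead of ~16.8M, i.e. under 0.4% of the original layer.
layer = LoRALinear(nn.Linear(4096, 4096), r=8, alpha=16)
```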