How to use SearchingBinary/nolitai-2b with MLX:
```python
# Make sure mlx-lm is installed:
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("SearchingBinary/nolitai-2b")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
```
How to use SearchingBinary/nolitai-2b with Pi:
Start the MLX server
```bash
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "SearchingBinary/nolitai-2b"
```
Configure the model in Pi
```bash
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```

Add to `~/.pi/agent/models.json`:

```json
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "SearchingBinary/nolitai-2b" }
      ]
    }
  }
}
```

Run Pi
```bash
# Start Pi in your project directory:
pi
```
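Hand-editing `~/.pi/agent/models.json` can clobber providers that are already configured there. The following is a minimal Python sketch that merges the `mlx-lm` provider entry from the snippet above into an existing file instead. It assumes the file is a JSON object with a top-level `providers` map, as in that snippet; the helpers `merge_mlx_provider` and `update_models_file` are hypothetical names, not part of Pi.

```python
import json
from pathlib import Path

# Provider entry copied from the models.json snippet above
MLX_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "SearchingBinary/nolitai-2b"}],
}


def merge_mlx_provider(config: dict) -> dict:
    """Return a copy of a Pi models.json object with the "mlx-lm" provider set.

    Other providers are kept; an existing "mlx-lm" entry is replaced.
    The input dict is not modified.
    """
    merged = dict(config)
    providers = dict(merged.get("providers", {}))
    providers["mlx-lm"] = MLX_PROVIDER
    merged["providers"] = providers
    return merged


def update_models_file(path: Path) -> None:
    """Read the file (if present), merge the provider, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merge_mlx_provider(config), indent=2) + "\n")


if __name__ == "__main__":
    update_models_file(Path.home() / ".pi" / "agent" / "models.json")
```

Keeping the merge as a pure function makes it easy to preview the result (for example with `json.dumps(merge_mlx_provider(...), indent=2)`) before anything is written to disk.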
How to use SearchingBinary/nolitai-2b with Hermes Agent:
Start the MLX server
```bash
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "SearchingBinary/nolitai-2b"
```
Configure Hermes
```bash
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default SearchingBinary/nolitai-2b
```
Run Hermes
```bash
hermes
```
How to use SearchingBinary/nolitai-2b with MLX LM:
Generate or start a chat session
```bash
# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "SearchingBinary/nolitai-2b"
```
Run an OpenAI-compatible server
```bash
# Install MLX LM
uv tool install mlx-lm
# Start the server (listens on port 8080 by default)
mlx_lm.server --model "SearchingBinary/nolitai-2b"
# Call the OpenAI-compatible server with curl (from a second terminal)
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "SearchingBinary/nolitai-2b",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
```
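The same request can be sent from Python using only the standard library. This is a minimal sketch, assuming the server is listening at `http://localhost:8080/v1` (the address used in the Pi and Hermes configs) and returns the standard OpenAI chat-completions shape (`choices[0].message.content`). The helpers `build_chat_request` and `chat` are hypothetical, and the `max_tokens` cap of 500 simply mirrors the MLX usage example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # assumed mlx_lm.server address
MODEL_ID = "SearchingBinary/nolitai-2b"


def build_chat_request(user_message: str, max_tokens: int = 500) -> dict:
    """Build an OpenAI-style chat-completions payload for a single user turn."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }


def chat(user_message: str, base_url: str = BASE_URL, timeout: float = 120.0) -> str:
    """POST the payload to /chat/completions and return the assistant's reply text."""
    payload = json.dumps(build_chat_request(user_message)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Requires the server above to be running
    print(chat("Hello"))
```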
nolitai-2b — Meeting Intelligence Model (MLX 4-bit)
A fine-tuned Qwen3-1.7B model specialized for extracting structured meeting intelligence from transcripts. Optimized for Apple Silicon inference via MLX.
Model Details
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3-1.7B |
| Parameters | 1.7B (4-bit quantized, ~948 MB) |
| Training | QLoRA (rank=16, alpha=640, scale=40x) on q/k/v/o attention projections |
| Framework | MLX (Apple Silicon optimized) |
| Languages | English, Portuguese, Spanish, French, German |
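The `scale=40x` in the Training row is the usual LoRA scaling factor, alpha divided by rank (640 / 16 = 40). The pure-Python sketch below shows where that factor enters a LoRA-adapted projection, h = W x + (alpha / r) · B(A x). The toy matrices are made-up illustrations rather than the model's weights, and real implementations (for example the LoRA layers in MLX or PEFT) operate on tensors, not Python lists.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]

RANK = 16
ALPHA = 640
SCALE = ALPHA / RANK  # 40.0, the "scale=40x" reported for this model


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector, scale: float = SCALE) -> Vector:
    """Frozen projection W x plus the scaled low-rank update scale * B (A x).

    Shapes: W is d_out x d_in (frozen base weight), A is r x d_in and
    B is d_out x r (the trainable LoRA factors), x has length d_in.
    """
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    return [h + scale * u for h, u in zip(base, update)]


if __name__ == "__main__":
    # Toy rank-1 example with d_in = d_out = 2 (illustrative numbers only)
    w = [[1.0, 0.0], [0.0, 1.0]]
    a = [[1.0, 1.0]]
    b = [[0.01], [0.0]]
    print(lora_forward(w, a, b, [1.0, 2.0]))
```

Because B is conventionally initialized to zeros, the adapter starts as a no-op; a large factor such as 40 then amplifies whatever update B learns. In QLoRA the frozen W is stored in 4-bit NF4 and dequantized for the matrix product, but the adapter arithmetic is the same.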
Capabilities
Given a meeting transcript, nolitai-2b extracts:
- Action Items — Tasks with owners, deadlines, and priority
- Decisions — Key decisions made during the meeting
- Key Points — Important discussion topics
- Questions — Open questions raised but not resolved
- Summaries — Concise, specific meeting summaries (no filler phrases)
Example
Input:
```text
Extract insights from this meeting transcript:
[10:00] Sarah: We need to finalize the Q4 budget by Friday.
[10:02] Mike: I'll prepare the marketing numbers today.
[10:05] Sarah: Great. Let's also decide on the conference — I vote for Web Summit.
[10:07] Mike: Agreed. Web Summit it is.
```
Output:
```json
{
  "actionItems": [
    {"task": "Prepare marketing numbers for Q4 budget", "owner": "Mike", "deadline": "today", "priority": "high"}
  ],
  "decisions": [
    {"content": "Attending Web Summit conference", "madeBy": "Sarah, Mike"}
  ],
  "keyPoints": [
    {"content": "Q4 budget finalization deadline is Friday"}
  ],
  "questions": []
}
```
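Downstream code usually wants typed objects rather than raw JSON. The sketch below parses the output format above into dataclasses. The field names come from the example. The shape of `questions` entries is not shown (the list is empty), so it is an assumption that they look like `keyPoints` entries (`{"content": ...}`), with bare strings also accepted. The class and function names are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class ActionItem:
    task: str
    owner: Optional[str] = None
    deadline: Optional[str] = None
    priority: Optional[str] = None


@dataclass
class Decision:
    content: str
    made_by: Optional[str] = None  # "madeBy" in the JSON


@dataclass
class Insights:
    action_items: List[ActionItem] = field(default_factory=list)
    decisions: List[Decision] = field(default_factory=list)
    key_points: List[str] = field(default_factory=list)
    questions: List[str] = field(default_factory=list)


def _content(entry: Any) -> str:
    """Accept {"content": ...} objects or bare strings (the latter is an assumption)."""
    return entry["content"] if isinstance(entry, dict) else str(entry)


def parse_insights(raw: str) -> Insights:
    """Parse the model's JSON output into typed objects.

    Missing top-level lists default to empty; an entry without its required
    "task" or "content" key raises KeyError.
    """
    data = json.loads(raw)
    return Insights(
        action_items=[
            ActionItem(
                task=item["task"],
                owner=item.get("owner"),
                deadline=item.get("deadline"),
                priority=item.get("priority"),
            )
            for item in data.get("actionItems", [])
        ],
        decisions=[
            Decision(content=d["content"], made_by=d.get("madeBy"))
            for d in data.get("decisions", [])
        ],
        key_points=[_content(p) for p in data.get("keyPoints", [])],
        questions=[_content(q) for q in data.get("questions", [])],
    )
```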
Performance
Evaluated on a held-out validation set (97.4% overall):
| Task | Score |
|---|---|
| Insight Extraction (action items, decisions, questions) | 100% |
| Meeting Summaries | 94.1% |
| Overall | 97.4% |
Usage with MLX
```python
from mlx_lm import load, generate

# Download (on first use) and load the 4-bit MLX weights and tokenizer
model, tokenizer = load("SearchingBinary/nolitai-2b")

prompt = """Extract insights from this meeting transcript:
[10:00] Alice: The new API is ready for testing.
[10:02] Bob: I'll write the integration tests by Wednesday.
[10:05] Alice: Should we use the staging or production environment?
"""

# Wrap the request in the model's chat template and return it as a string
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate up to 500 new tokens; the reply is expected to be a JSON object of insights
response = generate(model, tokenizer, prompt=text, max_tokens=500)
print(response)
```
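`generate` returns raw text. Qwen3-based models can emit a `<think>...</think>` reasoning block before the answer, and small models sometimes wrap JSON in a Markdown code fence; the card does not say whether nolitai-2b does either, so treat this as a defensive sketch. The hypothetical helper `extract_json` strips a reasoning block and decodes the JSON object that starts at the first `{`, so it can be applied as `extract_json(response)` after the snippet above.

```python
import json
import re

# Matches Qwen3-style reasoning blocks, including newlines inside them
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def extract_json(text: str) -> dict:
    """Return the JSON object in a model response.

    Drops <think>...</think> blocks, then decodes starting at the first "{",
    so leading prose or a Markdown fence around the object does not break
    parsing. Raises ValueError if no object can be decoded.
    """
    cleaned = THINK_RE.sub("", text)
    start = cleaned.find("{")
    if start == -1:
        raise ValueError("no JSON object found in model output")
    obj, _end = json.JSONDecoder().raw_decode(cleaned, start)
    return obj
```

Using `raw_decode` rather than `json.loads` means trailing text after the object (a closing fence, a stray sentence) is ignored instead of raising an error.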
Usage with Swift (MLX Swift)
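```swift
import MLXLLM
import MLXLMCommon

// Download the weights from the Hugging Face Hub on first use, then load them
let context = try await LLMModelFactory.shared.load(
    configuration: ModelConfiguration(id: "SearchingBinary/nolitai-2b")
)
```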
Training Details
- Method: QLoRA (4-bit NF4 quantization + LoRA adapters)
- LoRA Config: rank=16, alpha=640 (scale=40x), dropout=0.05
- Target Modules: q_proj, k_proj, v_proj, o_proj
- Dataset: ~10K examples across 5 languages (en, pt, es, fr, de)
- Epochs: 2
- Learning Rate: 1e-5 (cosine scheduler, 5% warmup)
- Hardware: NVIDIA A40 48GB (RunPod)
- Training Time: ~85 minutes
- Final Eval Loss: 0.0178 (98.2% token accuracy)
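The learning-rate entry describes a linear warmup over the first 5% of steps followed by cosine decay. The sketch below implements that schedule, assuming the common form that ramps linearly from 0 to the 1e-5 peak and then decays to 0, matching Hugging Face Transformers' `get_cosine_schedule_with_warmup`. The card does not state the minimum learning rate or the total step count, so the `total_steps` value is up to the caller and `lr_at` is a hypothetical name.

```python
import math

PEAK_LR = 1e-5
WARMUP_FRACTION = 0.05


def lr_at(step: int, total_steps: int, peak_lr: float = PEAK_LR,
          warmup_fraction: float = WARMUP_FRACTION) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to 0."""
    warmup_steps = max(1, math.ceil(total_steps * warmup_fraction))
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr
        return peak_lr * step / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to [0, 1]
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))
```

With an illustrative `total_steps=1000`, the rate peaks at step 50, falls to half the peak at step 525, and reaches 0 at step 1000.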
Intended Use
This model is designed for:
- On-device meeting intelligence extraction
- Real-time meeting summarization on Apple Silicon Macs
- Multilingual meeting support (5 languages)
Limitations
- Optimized for meeting transcripts — may not generalize well to other text formats
- Best results with structured transcript input (timestamps + speaker labels)
- 4-bit quantization may slightly reduce quality vs full precision
- Requires Apple Silicon (M1/M2/M3/M4) for MLX inference
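Because the model does best with timestamped, speaker-labelled input, it can help to normalize raw utterances into the `[HH:MM] Speaker: text` layout used in the examples above before prompting. This is a minimal sketch; the `Utterance` type and the `format_transcript` and `build_prompt` helpers are hypothetical, and the instruction line is copied from the card's examples rather than from a documented prompt contract.

```python
from dataclasses import dataclass
from datetime import time
from typing import Iterable

# Instruction line taken from the card's examples
INSTRUCTION = "Extract insights from this meeting transcript:"


@dataclass
class Utterance:
    at: time      # wall-clock time the speaker started talking
    speaker: str
    text: str


def format_transcript(utterances: Iterable[Utterance]) -> str:
    """Render '[HH:MM] Speaker: text' lines, collapsing whitespace and skipping empty turns."""
    lines = []
    for u in utterances:
        text = " ".join(u.text.split())
        if text:
            lines.append(f"[{u.at:%H:%M}] {u.speaker.strip()}: {text}")
    return "\n".join(lines)


def build_prompt(utterances: Iterable[Utterance]) -> str:
    """Prefix the formatted transcript with the instruction line."""
    return f"{INSTRUCTION}\n{format_transcript(utterances)}\n"
```

The resulting string can be passed as the user message in the MLX usage example.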
Part of nolit.ai
This model powers nolit.ai — a native macOS meeting copilot that processes everything locally on your Mac. Not Lost in Translation — lit up by AI.
License
Apache 2.0