# nolitai-2b — Meeting Intelligence Model (MLX 4-bit)

A fine-tuned Qwen3-1.7B model specialized for extracting structured meeting intelligence from transcripts. Optimized for Apple Silicon inference via MLX.

## Model Details

| Property   | Value |
| ---------- | ----- |
| Base Model | Qwen/Qwen3-1.7B |
| Parameters | 1.7B (4-bit quantized, ~948 MB) |
| Training   | QLoRA (rank=16, alpha=640, scale=40x) on q/k/v/o attention projections |
| Framework  | MLX (Apple Silicon optimized) |
| Languages  | English, Portuguese, Spanish, French, German |
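The ~948 MB figure is roughly what 4-bit packing predicts. A back-of-envelope check (the group size and fp16 scale/bias metadata are assumptions based on MLX's quantization defaults, not stated in this card):

```python
# Rough size estimate for 1.7B parameters quantized to 4 bits.
params = 1.7e9

# Packed 4-bit weights: half a byte per parameter.
weights_mb = params * 0.5 / 1e6  # 850 MB

# Assumed metadata: one fp16 scale and one fp16 bias (2 + 2 bytes)
# per group of 64 weights -- MLX's default quantization group size.
group_size = 64
metadata_mb = params / group_size * (2 + 2) / 1e6  # ~106 MB

total_mb = weights_mb + metadata_mb
print(f"~{total_mb:.0f} MB")  # ~956 MB
```

That lands within about 1% of the reported 948 MB; a small gap is expected, since not every tensor in a real checkpoint is quantized the same way.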

## Capabilities

Given a meeting transcript, nolitai-2b extracts:

- **Action Items** — tasks with owners, deadlines, and priority
- **Decisions** — key decisions made during the meeting
- **Key Points** — important discussion topics
- **Questions** — open questions raised but not resolved
- **Summaries** — concise, specific meeting summaries (no filler phrases)

## Example

**Input:**

```text
Extract insights from this meeting transcript:

[10:00] Sarah: We need to finalize the Q4 budget by Friday.
[10:02] Mike: I'll prepare the marketing numbers today.
[10:05] Sarah: Great. Let's also decide on the conference — I vote for Web Summit.
[10:07] Mike: Agreed. Web Summit it is.
```

**Output:**

```json
{
  "actionItems": [
    {"task": "Prepare marketing numbers for Q4 budget", "owner": "Mike", "deadline": "today", "priority": "high"}
  ],
  "decisions": [
    {"content": "Attending Web Summit conference", "madeBy": "Sarah, Mike"}
  ],
  "keyPoints": [
    {"content": "Q4 budget finalization deadline is Friday"}
  ],
  "questions": []
}
```
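The output is plain JSON, so downstream code can consume it with the standard library alone. A minimal sketch using the example output above (it assumes the model emits exactly one JSON object with these field names):

```python
import json

# Example model output, copied from the card above.
raw = """
{
  "actionItems": [
    {"task": "Prepare marketing numbers for Q4 budget", "owner": "Mike",
     "deadline": "today", "priority": "high"}
  ],
  "decisions": [
    {"content": "Attending Web Summit conference", "madeBy": "Sarah, Mike"}
  ],
  "keyPoints": [
    {"content": "Q4 budget finalization deadline is Friday"}
  ],
  "questions": []
}
"""

insights = json.loads(raw)

# Each category is a list of objects; an empty list means "nothing found".
for item in insights["actionItems"]:
    print(f'{item["owner"]}: {item["task"]} (due {item["deadline"]}, {item["priority"]})')
# Mike: Prepare marketing numbers for Q4 budget (due today, high)
```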

## Performance

Evaluated on a held-out validation set:

| Task                                                    | Score |
| ------------------------------------------------------- | ----- |
| Insight Extraction (action items, decisions, questions) | 100%  |
| Meeting Summaries                                       | 94.1% |
| Overall                                                 | 97.4% |

## Usage with MLX

```python
from mlx_lm import load, generate

model, tokenizer = load("SearchingBinary/nolitai-2b")

prompt = """Extract insights from this meeting transcript:

[10:00] Alice: The new API is ready for testing.
[10:02] Bob: I'll write the integration tests by Wednesday.
[10:05] Alice: Should we use the staging or production environment?
"""

messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=text, max_tokens=500)
print(response)
```
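In practice a small instruction-tuned model can wrap the JSON payload in extra prose, so it is safer to locate the object before parsing than to call `json.loads` on the raw response. A defensive sketch (`extract_json` is a hypothetical helper, not part of `mlx_lm`; the scan counts braces naively, so a `{` or `}` inside a string value would confuse it — acceptable for this schema's short values, but not fully general):

```python
import json

def extract_json(response: str) -> dict:
    """Pull the first top-level JSON object out of a model response."""
    start = response.find("{")
    if start == -1:
        raise ValueError("no JSON object in response")
    depth = 0
    for i, ch in enumerate(response[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return json.loads(response[start : i + 1])
    raise ValueError("unbalanced JSON object in response")

# Example: tolerate stray text around the payload.
print(extract_json('Here you go: {"questions": []} -- done')["questions"])  # []
```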

## Usage with Swift (MLX Swift)

```swift
import MLXLLM
import MLXLMCommon

let container = try await LLMModelFactory.shared.loadContainer(
    configuration: ModelConfiguration(id: "SearchingBinary/nolitai-2b")
)
```

## Training Details

- **Method:** QLoRA (4-bit NF4 quantization + LoRA adapters)
- **LoRA Config:** rank=16, alpha=640 (scale=40x), dropout=0.05
- **Target Modules:** q_proj, k_proj, v_proj, o_proj
- **Dataset:** ~10K examples across 5 languages (en, pt, es, fr, de)
- **Epochs:** 2
- **Learning Rate:** 1e-5 (cosine scheduler, 5% warmup)
- **Hardware:** NVIDIA A40 48GB (RunPod)
- **Training Time:** ~85 minutes
- **Final Eval Loss:** 0.0178 (98.2% token accuracy)
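The LoRA numbers above are self-consistent: the effective scale applied to the adapter update is alpha / rank = 640 / 16 = 40. A toy sketch of how that scale enters the merged weight (the 2x2 shapes and values are illustrative only, not taken from the model):

```python
rank, alpha = 16, 640
scale = alpha / rank  # LoRA scaling factor applied to the adapter update
print(scale)  # 40.0

# LoRA merges as W' = W + scale * (B @ A). At initialization B is all
# zeros, so the merged weight starts exactly equal to the base weight.
W = [[1.0, 2.0], [3.0, 4.0]]
BA = [[0.0, 0.0], [0.0, 0.0]]  # B @ A with B initialized to zero

W_merged = [[W[i][j] + scale * BA[i][j] for j in range(2)] for i in range(2)]
assert W_merged == W  # training starts from the base model's behavior
```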

## Intended Use

This model is designed for:

- On-device meeting intelligence extraction
- Real-time meeting summarization on Apple Silicon Macs
- Multilingual meeting support (5 languages)

## Limitations

- Optimized for meeting transcripts — may not generalize well to other text formats
- Best results with structured transcript input (timestamps + speaker labels)
- 4-bit quantization may slightly reduce quality versus full precision
- Requires Apple Silicon (M1/M2/M3/M4) for MLX inference

## Part of nolit.ai

This model powers nolit.ai — a native macOS meeting copilot that processes everything locally on your Mac. Not Lost in Translation — lit up by AI.

## License

Apache 2.0
