# nolitai-2b — Meeting Intelligence Model (MLX 4-bit)

A fine-tuned Qwen3-1.7B model specialized for extracting structured meeting intelligence from transcripts. Optimized for Apple Silicon inference via MLX.

## Model Details

| Property   | Value |
| ---------- | ----- |
| Base Model | Qwen/Qwen3-1.7B |
| Parameters | 1.7B (4-bit quantized, ~948 MB) |
| Training   | QLoRA (rank=16, alpha=640, scale=40x) on q/k/v/o attention projections |
| Framework  | MLX (Apple Silicon optimized) |
| Languages  | English, Portuguese, Spanish, French, German |
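The ~948 MB figure is roughly what 4-bit packing predicts. A back-of-envelope check (the group size and fp16 scale/bias metadata are assumptions based on MLX's quantization defaults, not stated in this card):

```python
# Rough size estimate for 1.7B parameters quantized to 4 bits.
params = 1.7e9

# Packed 4-bit weights: half a byte per parameter.
weights_mb = params * 0.5 / 1e6  # 850 MB

# Assumed metadata: one fp16 scale and one fp16 bias (2 + 2 bytes)
# per group of 64 weights -- MLX's default quantization group size.
group_size = 64
metadata_mb = params / group_size * (2 + 2) / 1e6  # ~106 MB

total_mb = weights_mb + metadata_mb
print(f"~{total_mb:.0f} MB")  # ~956 MB
```

That lands within about 1% of the reported 948 MB; a small gap is expected, since not every tensor in a real checkpoint is quantized the same way.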

## Capabilities

Given a meeting transcript, nolitai-2b extracts:

- **Action Items** — tasks with owners, deadlines, and priority
- **Decisions** — key decisions made during the meeting
- **Key Points** — important discussion topics
- **Questions** — open questions raised but not resolved
- **Summaries** — concise, specific meeting summaries (no filler phrases)

## Example

**Input:**

```text
Extract insights from this meeting transcript:

[10:00] Sarah: We need to finalize the Q4 budget by Friday.
[10:02] Mike: I'll prepare the marketing numbers today.
[10:05] Sarah: Great. Let's also decide on the conference — I vote for Web Summit.
[10:07] Mike: Agreed. Web Summit it is.
```

**Output:**

```json
{
  "actionItems": [
    {"task": "Prepare marketing numbers for Q4 budget", "owner": "Mike", "deadline": "today", "priority": "high"}
  ],
  "decisions": [
    {"content": "Attending Web Summit conference", "madeBy": "Sarah, Mike"}
  ],
  "keyPoints": [
    {"content": "Q4 budget finalization deadline is Friday"}
  ],
  "questions": []
}
```
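The output is plain JSON, so downstream code can consume it with the standard library alone. A minimal sketch using the example output above (it assumes the model emits exactly one JSON object with these field names):

```python
import json

# Example model output, copied from the card above.
raw = """
{
  "actionItems": [
    {"task": "Prepare marketing numbers for Q4 budget", "owner": "Mike",
     "deadline": "today", "priority": "high"}
  ],
  "decisions": [
    {"content": "Attending Web Summit conference", "madeBy": "Sarah, Mike"}
  ],
  "keyPoints": [
    {"content": "Q4 budget finalization deadline is Friday"}
  ],
  "questions": []
}
"""

insights = json.loads(raw)

# Each category is a list of objects; an empty list means "nothing found".
for item in insights["actionItems"]:
    print(f'{item["owner"]}: {item["task"]} (due {item["deadline"]}, {item["priority"]})')
# Mike: Prepare marketing numbers for Q4 budget (due today, high)
```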

## Performance

Evaluated on a held-out validation set:

| Task                                                    | Score |
| ------------------------------------------------------- | ----- |
| Insight Extraction (action items, decisions, questions) | 100%  |
| Meeting Summaries                                       | 94.1% |
| Overall                                                 | 97.4% |

## Usage with MLX

```python
from mlx_lm import load, generate

model, tokenizer = load("SearchingBinary/nolitai-2b")

prompt = """Extract insights from this meeting transcript:

[10:00] Alice: The new API is ready for testing.
[10:02] Bob: I'll write the integration tests by Wednesday.
[10:05] Alice: Should we use the staging or production environment?
"""

messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=text, max_tokens=500)
print(response)
```
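In practice a small instruction-tuned model can wrap the JSON payload in extra prose, so it is safer to locate the object before parsing than to call `json.loads` on the raw response. A defensive sketch (`extract_json` is a hypothetical helper, not part of `mlx_lm`; the scan counts braces naively, so a `{` or `}` inside a string value would confuse it — acceptable for this schema's short values, but not fully general):

```python
import json

def extract_json(response: str) -> dict:
    """Pull the first top-level JSON object out of a model response."""
    start = response.find("{")
    if start == -1:
        raise ValueError("no JSON object in response")
    depth = 0
    for i, ch in enumerate(response[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return json.loads(response[start : i + 1])
    raise ValueError("unbalanced JSON object in response")

# Example: tolerate stray text around the payload.
print(extract_json('Here you go: {"questions": []} -- done')["questions"])  # []
```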

## Usage with Swift (MLX Swift)

```swift
import MLXLLM
import MLXLMCommon

let container = try await LLMModelFactory.shared.loadContainer(
    configuration: ModelConfiguration(id: "SearchingBinary/nolitai-2b")
)
```

## Training Details

- **Method:** QLoRA (4-bit NF4 quantization + LoRA adapters)
- **LoRA Config:** rank=16, alpha=640 (scale=40x), dropout=0.05
- **Target Modules:** q_proj, k_proj, v_proj, o_proj
- **Dataset:** ~10K examples across 5 languages (en, pt, es, fr, de)
- **Epochs:** 2
- **Learning Rate:** 1e-5 (cosine scheduler, 5% warmup)
- **Hardware:** NVIDIA A40 48GB (RunPod)
- **Training Time:** ~85 minutes
- **Final Eval Loss:** 0.0178 (98.2% token accuracy)
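The LoRA numbers above are self-consistent: the effective scale applied to the adapter update is alpha / rank = 640 / 16 = 40. A toy sketch of how that scale enters the merged weight (the 2x2 shapes and values are illustrative only, not taken from the model):

```python
rank, alpha = 16, 640
scale = alpha / rank  # LoRA scaling factor applied to the adapter update
print(scale)  # 40.0

# LoRA merges as W' = W + scale * (B @ A). At initialization B is all
# zeros, so the merged weight starts exactly equal to the base weight.
W = [[1.0, 2.0], [3.0, 4.0]]
BA = [[0.0, 0.0], [0.0, 0.0]]  # B @ A with B initialized to zero

W_merged = [[W[i][j] + scale * BA[i][j] for j in range(2)] for i in range(2)]
assert W_merged == W  # training starts from the base model's behavior
```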

## Intended Use

This model is designed for:

- On-device meeting intelligence extraction
- Real-time meeting summarization on Apple Silicon Macs
- Multilingual meeting support (5 languages)

## Limitations

- Optimized for meeting transcripts — may not generalize well to other text formats
- Best results with structured transcript input (timestamps + speaker labels)
- 4-bit quantization may slightly reduce quality versus full precision
- Requires Apple Silicon (M1/M2/M3/M4) for MLX inference

## Part of nolit.ai

This model powers nolit.ai — a native macOS meeting copilot that processes everything locally on your Mac. Not Lost in Translation — lit up by AI.

## License

Apache 2.0
