Amsi-fin-o1 MLX (8-bit Quantized)
Model Description
This is the 8-bit quantized MLX conversion of AITRADER/Amsi-fin-o1, a specialized financial vision-language model. Quantizing to 8 bits roughly halves memory usage relative to the bf16 conversion while retaining nearly all of its quality on financial analysis tasks.
Key Features
- Reduced Memory Footprint: ~4GB vs ~8GB for the bf16 version
- Faster Inference: Optimized for speed on Apple Silicon
- Financial Document Analysis: Extract and analyze financial statements and reports
- Chart Understanding: Interpret financial charts and visualizations
- Chain-of-Thought Reasoning: Advanced thinking for complex financial calculations
- Apple Silicon Native: Optimized for M1/M2/M3/M4 chips
Quantization Details
| Parameter | Value |
|---|---|
| Quantization Type | 8-bit Integer |
| Group Size | 64 |
| Memory Reduction | ~50% |
| Quality Retention | ~98%+ |
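To make the table concrete, here is a minimal sketch of what 8-bit group quantization does, using MLX's `mx.quantize` / `mx.dequantize` on a toy weight matrix (the matrix shape is illustrative, not taken from the model):

```python
import mlx.core as mx

# Toy float32 matrix standing in for one weight matrix of the model.
w = mx.random.normal((256, 512))

# 8-bit quantization where each group of 64 consecutive values shares
# one scale and bias -- the same settings listed in the table above.
w_q, scales, biases = mx.quantize(w, group_size=64, bits=8)

# Dequantize and check how much precision the round trip lost.
w_hat = mx.dequantize(w_q, scales, biases, group_size=64, bits=8)
print("max abs reconstruction error:", mx.abs(w - w_hat).max().item())
```

The small group size keeps per-group scales tight, which is why quality loss stays low at 8 bits.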
Model Architecture
| Component | Specification |
|---|---|
| Base Architecture | Qwen3-VL (4B parameters) |
| Text Model | 36 layers, 2560 hidden size |
| Vision Encoder | 24 layers, 1024 hidden size |
| Attention Heads | 32 (8 KV heads) |
| Context Length | Up to 131,072 tokens |
| Precision | 8-bit Quantized |
| Model Size | ~4GB |
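If you want to verify these numbers locally, you can inspect the converted model's config with `load_config` (used again in the Quick Start below). This is a sketch: it assumes `load_config` returns a plain dict and that the config uses typical Qwen-style key names, which may differ for this model:

```python
from mlx_vlm.utils import load_config

# Reads the model's config; the key names below are assumptions
# (typical Qwen-style fields) and may not match this model exactly.
config = load_config("AITRADER/Amsi-fin-o1-MLX-8bit")
for key in ("num_hidden_layers", "hidden_size",
            "num_attention_heads", "num_key_value_heads"):
    print(key, "=", config.get(key, "<not present>"))
```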
Installation
```bash
# Install mlx-vlm
pip install -U mlx-vlm
```
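After installing, an optional sanity check confirms the package resolves and that MLX sees the GPU:

```python
# Optional post-install sanity check.
from importlib.metadata import version
import mlx.core as mx

print("mlx-vlm version:", version("mlx-vlm"))
print("default device:", mx.default_device())  # Device(gpu, 0) on Apple Silicon
```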
Quick Start
Command Line
```bash
# Basic image analysis
python -m mlx_vlm.generate \
  --model AITRADER/Amsi-fin-o1-MLX-8bit \
  --max-tokens 512 \
  --temperature 0.7 \
  --prompt "Analyze this financial chart and explain the trends." \
  --image path/to/financial_chart.png
```
Python API
```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load model
model_path = "AITRADER/Amsi-fin-o1-MLX-8bit"
model, processor = load(model_path)
config = load_config(model_path)

# Prepare prompt
prompt = apply_chat_template(
    processor,
    config,
    "Analyze the financial performance shown in this quarterly report.",
    num_images=1,
)

# Generate response
output = generate(
    model,
    processor,
    prompt,
    image="path/to/report.png",
    max_tokens=512,
    temperature=0.7,
)
print(output)
```
Streaming Generation
```python
from mlx_vlm import load, stream_generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "AITRADER/Amsi-fin-o1-MLX-8bit"
model, processor = load(model_path)
config = load_config(model_path)

prompt = apply_chat_template(
    processor,
    config,
    "What are the key insights from this financial statement?",
    num_images=1,
)

# Stream the response token by token
for token in stream_generate(
    model,
    processor,
    prompt,
    image="financial_statement.png",
    max_tokens=512,
):
    print(token, end="", flush=True)
```
Use Cases
1. Financial Chart Analysis
prompt = """Analyze this stock chart and provide:
1. The overall trend (bullish/bearish/neutral)
2. Key support and resistance levels
3. Volume analysis
4. Trading recommendations"""
2. Financial Statement Review
prompt = """Review this income statement and:
1. Calculate key financial ratios
2. Compare YoY performance
3. Identify areas of concern
4. Highlight positive indicators"""
3. Document OCR and Extraction
prompt = """Extract all numerical data from this financial document
and organize it in a structured format."""
4. Investment Analysis
prompt = """Based on this quarterly report:
1. Summarize the company's financial health
2. Calculate growth metrics
3. Provide an investment thesis"""
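All four prompts plug into the Python API shown in the Quick Start. As a convenience, here is a small hypothetical wrapper (`analyze_document` is not part of mlx-vlm) around the same load/template/generate steps:

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

MODEL_PATH = "AITRADER/Amsi-fin-o1-MLX-8bit"
model, processor = load(MODEL_PATH)
config = load_config(MODEL_PATH)

def analyze_document(image_path, task_prompt, max_tokens=512):
    """Run one of the use-case prompts above against a single image."""
    prompt = apply_chat_template(processor, config, task_prompt, num_images=1)
    return generate(model, processor, prompt, image=image_path,
                    max_tokens=max_tokens)

# Example: use case 1 against a chart screenshot (path is illustrative).
print(analyze_document("path/to/financial_chart.png",
                       "Analyze this stock chart and provide the overall trend."))
```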
Performance Comparison
| Metric | 8-bit (This) | BF16 |
|---|---|---|
| Memory Usage | ~4GB | ~8GB |
| Inference Speed | Faster | Baseline |
| Quality | Very Good | Highest |
| Recommended For | Limited RAM / Speed | Maximum Quality |
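These numbers vary by machine, so the simplest honest comparison is to time the same request yourself. A coarse wall-clock sketch (it ignores prompt processing vs. per-token throughput):

```python
import time
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

def time_model(model_path, image_path):
    """Wall-clock time for one 256-token generation."""
    model, processor = load(model_path)
    config = load_config(model_path)
    prompt = apply_chat_template(processor, config,
                                 "Summarize this chart.", num_images=1)
    start = time.perf_counter()
    generate(model, processor, prompt, image=image_path, max_tokens=256)
    return time.perf_counter() - start

# Swap in the bf16 repo path to compare both variants on your hardware.
print(f"{time_model('AITRADER/Amsi-fin-o1-MLX-8bit', 'chart.png'):.1f} s")
```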
Hardware Requirements
| Apple Silicon | Performance |
|---|---|
| M1 (8GB) | Good |
| M1 Pro/Max (16GB+) | Very Good |
| M2/M2 Pro/Max | Excellent |
| M3/M3 Pro/Max | Excellent |
| M4/M4 Pro/Max | Best |
- Minimum: 8GB unified memory
- Recommended: 16GB+ for larger batch sizes
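To see where your machine falls in this table, you can ask MLX how much unified memory it reports. The `memory_size` key of `mx.metal.device_info()` is an assumption and may differ across MLX versions:

```python
import mlx.core as mx

# "memory_size" (bytes) is an assumed key of mx.metal.device_info().
info = mx.metal.device_info()
total_gb = info["memory_size"] / (1024 ** 3)
print(f"Unified memory: {total_gb:.0f} GB "
      f"({'meets' if total_gb >= 8 else 'below'} the 8 GB minimum)")
```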
Model Variants
| Variant | Precision | Size | Speed | Quality |
|---|---|---|---|---|
| bf16 | BFloat16 | ~8GB | Baseline | Highest |
| 8bit | 8-bit Quantized | ~4GB | Faster | Very Good |
When to Use This Model
Choose the 8-bit version if:
- You have limited RAM (8-16GB)
- You need faster inference
- You're running on battery power
- You want to run multiple models simultaneously
Choose the bf16 version if:
- You have 16GB+ RAM
- You need maximum accuracy
- You're doing fine-tuning or evaluation
- Quality is more important than speed
Training Data
This model was fine-tuned on specialized financial datasets:
- FinTrain: Comprehensive financial training data
- MultiFinBen-EnglishOCR: Multi-lingual financial document OCR
- SecureFinAI Contest: Financial security and analysis tasks
- ChartQA: Chart question answering
- NuminaMath-CoT: Mathematical reasoning with chain-of-thought
- FinCoT: Financial chain-of-thought reasoning
- COTA-LLaVA: Complex reasoning visual data
Limitations
- Primarily trained on English financial data
- Slight quality reduction compared to bf16 due to quantization
- May require prompt engineering for optimal results
- Performance varies with image quality
- Not a replacement for professional financial advice
Citation
```bibtex
@misc{amsi-fin-o1-mlx-8bit,
  title={Amsi-fin-o1 MLX 8-bit: Quantized Financial Vision-Language Model for Apple Silicon},
  author={AITRADER},
  year={2025},
  url={https://huggingface.co/AITRADER/Amsi-fin-o1-MLX-8bit}
}
```
Acknowledgments
- Original model: AITRADER/Amsi-fin-o1
- Base architecture: Qwen3-VL
- Conversion: mlx-vlm v0.3.9
- Framework: Apple MLX
License
This model is released under the Apache 2.0 License.
Optimized & Quantized for Apple Silicon with MLX
Made with Apple MLX Framework
Model tree for AITRADER/Amsi-fin-o1-MLX-8bit
- Base model: Qwen/Qwen3-VL-4B-Thinking
- Finetuned from: AITRADER/Amsi-fin-o1