# Amsi-fin-o1 MLX (8-bit Quantized)


*Quantized Financial Vision-Language Model for Apple Silicon*


## Model Description

This is the 8-bit quantized MLX conversion of AITRADER/Amsi-fin-o1, a specialized financial vision-language model. The 8-bit quantization reduces memory usage by ~50% while maintaining excellent performance for financial analysis tasks.

## Key Features

- **Reduced Memory Footprint**: ~4 GB, versus ~8 GB for the bf16 version
- **Faster Inference**: Smaller weights mean less memory traffic per token on Apple Silicon
- **Financial Document Analysis**: Extract and analyze financial statements and reports
- **Chart Understanding**: Interpret financial charts and visualizations
- **Chain-of-Thought Reasoning**: Step-by-step reasoning for complex financial calculations
- **Apple Silicon Native**: Optimized for M1/M2/M3/M4 chips

## Quantization Details

| Parameter | Value |
|---|---|
| Quantization type | 8-bit integer |
| Group size | 64 |
| Memory reduction | ~50% |
| Quality retention | ~98%+ |
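For intuition about what the group size in the table means, here is a minimal pure-Python sketch of group-wise affine 8-bit quantization with the same group size of 64. It is an illustrative analogue only, not MLX's actual quantization kernel:

```python
# Sketch of group-wise 8-bit affine quantization (group size 64): each group of
# 64 weights is stored as uint8 codes plus one scale and one offset, which is
# where the ~50% memory reduction vs. 2-byte bf16 weights comes from.
# Illustrative analogue only -- not MLX's actual kernel.

def quantize_group(weights):
    """Quantize one group of floats to 0..255 codes with an affine scale/offset."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0  # avoid div-by-zero for constant groups
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo

def dequantize_group(q, scale, lo):
    """Reconstruct approximate float weights from codes."""
    return [lo + qi * scale for qi in q]

GROUP_SIZE = 64
# Deterministic pseudo-weights in roughly [-0.5, 0.5), standing in for a layer.
weights = [((i * 7919) % 1000) / 1000 - 0.5 for i in range(256)]

recovered = []
for g in range(0, len(weights), GROUP_SIZE):
    group = weights[g:g + GROUP_SIZE]
    q, scale, lo = quantize_group(group)
    recovered.extend(dequantize_group(q, scale, lo))

max_err = max(abs(a - b) for a, b in zip(weights, recovered))
print(f"max reconstruction error: {max_err:.6f}")  # small vs. the ~1.0 value range
```

The per-group error is bounded by half the group's scale, which is why a smaller group size trades a little extra overhead for higher fidelity.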

## Model Architecture

| Component | Specification |
|---|---|
| Base architecture | Qwen3-VL (4B parameters) |
| Text model | 36 layers, hidden size 2560 |
| Vision encoder | 24 layers, hidden size 1024 |
| Attention heads | 32 (8 KV heads) |
| Context length | Up to 131,072 tokens |
| Precision | 8-bit quantized |
| Model size | ~4 GB |

## Installation

```bash
# Install mlx-vlm
pip install -U mlx-vlm
```

## Quick Start

### Command Line

```bash
# Basic image analysis
python -m mlx_vlm.generate \
    --model AITRADER/Amsi-fin-o1-MLX-8bit \
    --max-tokens 512 \
    --temperature 0.7 \
    --prompt "Analyze this financial chart and explain the trends." \
    --image path/to/financial_chart.png
```

### Python API

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load model
model_path = "AITRADER/Amsi-fin-o1-MLX-8bit"
model, processor = load(model_path)
config = load_config(model_path)

# Prepare prompt
prompt = apply_chat_template(
    processor,
    config,
    "Analyze the financial performance shown in this quarterly report.",
    num_images=1
)

# Generate response
output = generate(
    model,
    processor,
    prompt,
    image="path/to/report.png",
    max_tokens=512,
    temperature=0.7
)
print(output)
```

### Streaming Generation

```python
from mlx_vlm import load, stream_generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "AITRADER/Amsi-fin-o1-MLX-8bit"
model, processor = load(model_path)
config = load_config(model_path)

prompt = apply_chat_template(
    processor,
    config,
    "What are the key insights from this financial statement?",
    num_images=1
)

# Stream the response token by token
for token in stream_generate(
    model,
    processor,
    prompt,
    image="financial_statement.png",
    max_tokens=512
):
    print(token, end="", flush=True)
```

## Use Cases

### 1. Financial Chart Analysis

```python
prompt = """Analyze this stock chart and provide:
1. The overall trend (bullish/bearish/neutral)
2. Key support and resistance levels
3. Volume analysis
4. Trading recommendations"""
```

### 2. Financial Statement Review

```python
prompt = """Review this income statement and:
1. Calculate key financial ratios
2. Compare YoY performance
3. Identify areas of concern
4. Highlight positive indicators"""
```

### 3. Document OCR and Extraction

```python
prompt = """Extract all numerical data from this financial document
and organize it in a structured format."""
```

### 4. Investment Analysis

```python
prompt = """Based on this quarterly report:
1. Summarize the company's financial health
2. Calculate growth metrics
3. Provide an investment thesis"""
```
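The use-case prompts above can all be driven from one loop. A minimal sketch, where `USE_CASES`, `analyze`, and `run_model` are hypothetical helpers: `run_model` stands in for the `mlx_vlm.generate` call shown in Quick Start, so the pattern runs anywhere:

```python
# Drive several use-case prompts against one image with a single loop.
# `run_model` is a stand-in for the mlx_vlm `generate` call from Quick Start;
# swap in the real call when running on Apple Silicon.

USE_CASES = {
    "chart_analysis": "Analyze this stock chart and provide:\n"
                      "1. The overall trend (bullish/bearish/neutral)\n"
                      "2. Key support and resistance levels",
    "statement_review": "Review this income statement and:\n"
                        "1. Calculate key financial ratios\n"
                        "2. Compare YoY performance",
}

def analyze(image_path, run_model):
    """Apply every use-case prompt to one image via the supplied model function."""
    return {name: run_model(prompt, image_path) for name, prompt in USE_CASES.items()}

# Example with a stub model function that just echoes what it was asked:
results = analyze("report.png", lambda prompt, img: f"[{len(prompt)} chars -> {img}]")
for name, out in results.items():
    print(name, out)
```

Keeping prompts in a named dictionary also makes it easy to version and A/B-test them independently of the inference code.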

## Performance Comparison

| Metric | 8-bit (this model) | bf16 |
|---|---|---|
| Memory usage | ~4 GB | ~8 GB |
| Inference speed | Faster | Baseline |
| Quality | Very good | Highest |
| Recommended for | Limited RAM / speed | Maximum quality |
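The memory figures above follow from simple arithmetic. A back-of-the-envelope sketch, assuming 4B parameters and one fp16 scale plus one fp16 bias per 64-weight group (the exact per-group overhead is an assumption about the packing, not an MLX specification):

```python
# Back-of-the-envelope memory estimate behind the table above.
# Assumes 4e9 parameters; the fp16 scale + bias per 64-weight group is an
# assumed packing overhead, not an MLX specification.

PARAMS = 4e9
GROUP_SIZE = 64
GB = 1024 ** 3

bf16_bytes = PARAMS * 2                                   # 2 bytes per weight
q8_bytes = PARAMS * 1 + (PARAMS / GROUP_SIZE) * (2 + 2)   # 1 byte/weight + group scale & bias

print(f"bf16 : {bf16_bytes / GB:.1f} GB")      # ~7.5 GB
print(f"8-bit: {q8_bytes / GB:.1f} GB")        # ~4.0 GB
print(f"reduction: {1 - q8_bytes / bf16_bytes:.0%}")
```

The per-group metadata costs only a few percent, which is why the reduction lands just under the nominal 50%.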

## Hardware Requirements

| Apple Silicon | Performance |
|---|---|
| M1 (8GB) | Good |
| M1 Pro/Max (16GB+) | Very good |
| M2/M2 Pro/Max | Excellent |
| M3/M3 Pro/Max | Excellent |
| M4/M4 Pro/Max | Best |

**Minimum**: 8GB unified memory. **Recommended**: 16GB+ for larger batch sizes.

## Model Variants

| Variant | Precision | Size | Speed | Quality |
|---|---|---|---|---|
| bf16 | BFloat16 | ~8 GB | Baseline | Highest |
| 8bit | 8-bit quantized | ~4 GB | Faster | Very good |

## When to Use This Model

**Choose the 8-bit version if:**

- You have limited RAM (8-16GB)
- You need faster inference
- You're running on battery power
- You want to run multiple models simultaneously

**Choose the bf16 version if:**

- You have 16GB+ RAM
- You need maximum accuracy
- You're doing fine-tuning or evaluation
- Quality is more important than speed

## Training Data

This model was fine-tuned on specialized financial datasets:

- **FinTrain**: Comprehensive financial training data
- **MultiFinBen-EnglishOCR**: Financial document OCR
- **SecureFinAI Contest**: Financial security and analysis tasks
- **ChartQA**: Chart question answering
- **NuminaMath-CoT**: Mathematical reasoning with chain-of-thought
- **FinCoT**: Financial chain-of-thought reasoning
- **COTA-LLaVA**: Complex visual reasoning data

## Limitations

- Primarily trained on English financial data
- Slight quality reduction compared to bf16 due to quantization
- May require prompt engineering for optimal results
- Performance varies with image quality
- Not a replacement for professional financial advice

## Citation

```bibtex
@misc{amsi-fin-o1-mlx-8bit,
  title={Amsi-fin-o1 MLX 8-bit: Quantized Financial Vision-Language Model for Apple Silicon},
  author={AITRADER},
  year={2025},
  url={https://huggingface.co/AITRADER/Amsi-fin-o1-MLX-8bit}
}
```


## License

This model is released under the Apache 2.0 License.


Optimized & Quantized for Apple Silicon with MLX
Made with Apple MLX Framework