Amsi-fin-o1 MLX (8-bit Quantized)
Model Description
This is the 8-bit quantized MLX conversion of AITRADER/Amsi-fin-o1, a specialized financial vision-language model. Quantizing to 8 bits roughly halves memory usage relative to the bf16 conversion while retaining nearly all of its quality on financial analysis tasks.
Key Features
- Reduced Memory Footprint: ~4GB vs ~8GB for the bf16 version
- Faster Inference: Optimized for speed on Apple Silicon
- Financial Document Analysis: Extract and analyze financial statements and reports
- Chart Understanding: Interpret financial charts and visualizations
- Chain-of-Thought Reasoning: Advanced thinking for complex financial calculations
- Apple Silicon Native: Optimized for M1/M2/M3/M4 chips
Quantization Details
| Parameter | Value |
|---|---|
| Quantization Type | 8-bit Integer |
| Group Size | 64 |
| Memory Reduction | ~50% |
| Quality Retention | ~98%+ |
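To make the table concrete, here is a minimal sketch of what 8-bit group quantization does, using MLX's `mx.quantize` / `mx.dequantize` on a toy weight matrix (the matrix shape is illustrative, not taken from the model):

```python
import mlx.core as mx

# Toy float32 matrix standing in for one weight matrix of the model.
w = mx.random.normal((256, 512))

# 8-bit quantization where each group of 64 consecutive values shares
# one scale and bias -- the same settings listed in the table above.
w_q, scales, biases = mx.quantize(w, group_size=64, bits=8)

# Dequantize and check how much precision the round trip lost.
w_hat = mx.dequantize(w_q, scales, biases, group_size=64, bits=8)
print("max abs reconstruction error:", mx.abs(w - w_hat).max().item())
```

The small group size keeps per-group scales tight, which is why quality loss stays low at 8 bits.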
Model Architecture
| Component | Specification |
|---|---|
| Base Architecture | Qwen3-VL (4B parameters) |
| Text Model | 36 layers, 2560 hidden size |
| Vision Encoder | 24 layers, 1024 hidden size |
| Attention Heads | 32 (8 KV heads) |
| Context Length | Up to 131,072 tokens |
| Precision | 8-bit Quantized |
| Model Size | ~4GB |
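If you want to verify these numbers locally, you can inspect the converted model's config with `load_config` (used again in the Quick Start below). This is a sketch: it assumes `load_config` returns a plain dict and that the config uses typical Qwen-style key names, which may differ for this model:

```python
from mlx_vlm.utils import load_config

# Reads the model's config; the key names below are assumptions
# (typical Qwen-style fields) and may not match this model exactly.
config = load_config("AITRADER/Amsi-fin-o1-MLX-8bit")
for key in ("num_hidden_layers", "hidden_size",
            "num_attention_heads", "num_key_value_heads"):
    print(key, "=", config.get(key, "<not present>"))
```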
Installation
```bash
# Install mlx-vlm
pip install -U mlx-vlm
```
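After installing, an optional sanity check confirms the package resolves and that MLX sees the GPU:

```python
# Optional post-install sanity check.
from importlib.metadata import version
import mlx.core as mx

print("mlx-vlm version:", version("mlx-vlm"))
print("default device:", mx.default_device())  # Device(gpu, 0) on Apple Silicon
```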
Quick Start
Command Line
```bash
# Basic image analysis
python -m mlx_vlm.generate \
  --model AITRADER/Amsi-fin-o1-MLX-8bit \
  --max-tokens 512 \
  --temperature 0.7 \
  --prompt "Analyze this financial chart and explain the trends." \
  --image path/to/financial_chart.png
```
Python API
```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load model
model_path = "AITRADER/Amsi-fin-o1-MLX-8bit"
model, processor = load(model_path)
config = load_config(model_path)

# Prepare prompt
prompt = apply_chat_template(
    processor,
    config,
    "Analyze the financial performance shown in this quarterly report.",
    num_images=1,
)

# Generate response
output = generate(
    model,
    processor,
    prompt,
    image="path/to/report.png",
    max_tokens=512,
    temperature=0.7,
)
print(output)
```
Streaming Generation
```python
from mlx_vlm import load, stream_generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "AITRADER/Amsi-fin-o1-MLX-8bit"
model, processor = load(model_path)
config = load_config(model_path)

prompt = apply_chat_template(
    processor,
    config,
    "What are the key insights from this financial statement?",
    num_images=1,
)

# Stream the response token by token
for token in stream_generate(
    model,
    processor,
    prompt,
    image="financial_statement.png",
    max_tokens=512,
):
    print(token, end="", flush=True)
```
Use Cases
1. Financial Chart Analysis
prompt = """Analyze this stock chart and provide:
1. The overall trend (bullish/bearish/neutral)
2. Key support and resistance levels
3. Volume analysis
4. Trading recommendations"""
2. Financial Statement Review
prompt = """Review this income statement and:
1. Calculate key financial ratios
2. Compare YoY performance
3. Identify areas of concern
4. Highlight positive indicators"""
3. Document OCR and Extraction
prompt = """Extract all numerical data from this financial document
and organize it in a structured format."""
4. Investment Analysis
prompt = """Based on this quarterly report:
1. Summarize the company's financial health
2. Calculate growth metrics
3. Provide an investment thesis"""
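All four prompts plug into the Python API shown in the Quick Start. As a convenience, here is a small hypothetical wrapper (`analyze_document` is not part of mlx-vlm) around the same load/template/generate steps:

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

MODEL_PATH = "AITRADER/Amsi-fin-o1-MLX-8bit"
model, processor = load(MODEL_PATH)
config = load_config(MODEL_PATH)

def analyze_document(image_path, task_prompt, max_tokens=512):
    """Run one of the use-case prompts above against a single image."""
    prompt = apply_chat_template(processor, config, task_prompt, num_images=1)
    return generate(model, processor, prompt, image=image_path,
                    max_tokens=max_tokens)

# Example: use case 1 against a chart screenshot (path is illustrative).
print(analyze_document("path/to/financial_chart.png",
                       "Analyze this stock chart and provide the overall trend."))
```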
Performance Comparison
| Metric | 8-bit (This) | BF16 |
|---|---|---|
| Memory Usage | ~4GB | ~8GB |
| Inference Speed | Faster | Baseline |
| Quality | Very Good | Highest |
| Recommended For | Limited RAM / Speed | Maximum Quality |
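These numbers vary by machine, so the simplest honest comparison is to time the same request yourself. A coarse wall-clock sketch (it ignores prompt processing vs. per-token throughput):

```python
import time
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

def time_model(model_path, image_path):
    """Wall-clock time for one 256-token generation."""
    model, processor = load(model_path)
    config = load_config(model_path)
    prompt = apply_chat_template(processor, config,
                                 "Summarize this chart.", num_images=1)
    start = time.perf_counter()
    generate(model, processor, prompt, image=image_path, max_tokens=256)
    return time.perf_counter() - start

# Swap in the bf16 repo path to compare both variants on your hardware.
print(f"{time_model('AITRADER/Amsi-fin-o1-MLX-8bit', 'chart.png'):.1f} s")
```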
Hardware Requirements
| Apple Silicon | Performance |
|---|---|
| M1 (8GB) | Good |
| M1 Pro/Max (16GB+) | Very Good |
| M2/M2 Pro/Max | Excellent |
| M3/M3 Pro/Max | Excellent |
| M4/M4 Pro/Max | Best |
- Minimum: 8GB unified memory
- Recommended: 16GB+ for larger batch sizes
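To see where your machine falls in this table, you can ask MLX how much unified memory it reports. The `memory_size` key of `mx.metal.device_info()` is an assumption and may differ across MLX versions:

```python
import mlx.core as mx

# "memory_size" (bytes) is an assumed key of mx.metal.device_info().
info = mx.metal.device_info()
total_gb = info["memory_size"] / (1024 ** 3)
print(f"Unified memory: {total_gb:.0f} GB "
      f"({'meets' if total_gb >= 8 else 'below'} the 8 GB minimum)")
```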
Model Variants
| Variant | Precision | Size | Speed | Quality |
|---|---|---|---|---|
| bf16 | BFloat16 | ~8GB | Baseline | Highest |
| 8bit | 8-bit Quantized | ~4GB | Faster | Very Good |
When to Use This Model
Choose the 8-bit version if:
- You have limited RAM (8-16GB)
- You need faster inference
- You're running on battery power
- You want to run multiple models simultaneously
Choose the bf16 version if:
- You have 16GB+ RAM
- You need maximum accuracy
- You're doing fine-tuning or evaluation
- Quality is more important than speed
Training Data
This model was fine-tuned on specialized financial datasets:
- FinTrain: Comprehensive financial training data
- MultiFinBen-EnglishOCR: Multi-lingual financial document OCR
- SecureFinAI Contest: Financial security and analysis tasks
- ChartQA: Chart question answering
- NuminaMath-CoT: Mathematical reasoning with chain-of-thought
- FinCoT: Financial chain-of-thought reasoning
- COTA-LLaVA: Complex reasoning visual data
Limitations
- Primarily trained on English financial data
- Slight quality reduction compared to bf16 due to quantization
- May require prompt engineering for optimal results
- Performance varies with image quality
- Not a replacement for professional financial advice
Citation
```bibtex
@misc{amsi-fin-o1-mlx-8bit,
  title={Amsi-fin-o1 MLX 8-bit: Quantized Financial Vision-Language Model for Apple Silicon},
  author={AITRADER},
  year={2025},
  url={https://huggingface.co/AITRADER/Amsi-fin-o1-MLX-8bit}
}
```
Acknowledgments
- Original model: AITRADER/Amsi-fin-o1
- Base architecture: Qwen3-VL
- Conversion: mlx-vlm v0.3.9
- Framework: Apple MLX
License
This model is released under the Apache 2.0 License.
Optimized & Quantized for Apple Silicon with MLX
Made with Apple MLX Framework
Model tree for AITRADER/Amsi-fin-o1-MLX-8bit
- Base model: Qwen/Qwen3-VL-4B-Thinking
- Finetuned from: AITRADER/Amsi-fin-o1