finanalyst-qwen1.5b
A financial analyst LLM fine-tuned from Qwen2.5-1.5B-Instruct using QLoRA (4-bit NF4 quantisation + LoRA adapters) on custom instruction-tuning data generated from live stock market data. The model produces analyst-style responses for stock analysis, market overviews, and free-form financial Q&A.
It runs fully locally — no API keys required. On Apple Silicon it uses MPS acceleration; on a machine with a CUDA GPU it uses that automatically; otherwise it falls back to CPU.
Model Details
Model Description
finanalyst-qwen1.5b is a parameter-efficient fine-tune of Qwen/Qwen2.5-1.5B-Instruct trained to behave like a senior sell-side analyst. It was fine-tuned using QLoRA: the base model is loaded in 4-bit NF4 quantisation (keeping weights at ~1 GB) while small LoRA adapter matrices are injected into the attention projection layers and trained on financial instruction data. Only 0.14% of the total parameters (2.18M out of 1.54B) were updated during training.
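The trainable-parameter figure can be checked with a little arithmetic. The sketch below assumes LoRA adapters of rank 16 on q_proj and v_proj (as listed in the hyperparameter table) and assumes the attention shapes of the base model (hidden size 1536, 28 layers, 2 key/value heads of dimension 128). Those shapes come from the base model's published configuration, not from this card.

```python
# Assumed Qwen2.5-1.5B attention shapes (taken from the base model config,
# not stated in this card).
HIDDEN_SIZE = 1536
NUM_LAYERS = 28
NUM_KV_HEADS = 2
HEAD_DIM = 128

RANK = 16                       # LoRA rank from the hyperparameter table
TOTAL_PARAMS = 1_545_893_376    # total parameter count reported in this card


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """A LoRA adapter adds two matrices: A (rank x in) and B (out x rank)."""
    return rank * (in_features + out_features)


# q_proj maps hidden -> hidden; v_proj maps hidden -> (kv_heads * head_dim).
q_proj = lora_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
v_proj = lora_params(HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM, RANK)

trainable = NUM_LAYERS * (q_proj + v_proj)
share = f"{100 * trainable / TOTAL_PARAMS:.2f}%"

print(trainable)  # 2179072
print(share)      # 0.14%
```

Under these assumptions the result matches the 2.18M / 0.14% figures quoted above.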
The model is the generative core of the AI Stock Market Analyst CLI — a Bloomberg-style terminal that uses this model for all text generation (stock deep-dives, market overviews, watchlist digests, and natural-language Q&A).
- Developed by: Florian Braun (@iPwnds)
- Model type: Causal language model — instruction-tuned with QLoRA
- Language: English
- License: Apache 2.0
- Fine-tuned from: Qwen/Qwen2.5-1.5B-Instruct
Model Sources
- Repository: github.com/iPwnds/bloomberg-terminal
- Training notebook: notebooks/FinAnalyst_Generative.ipynb
- Companion sentiment model: iPwnds/finsentiment-distilbert
Uses
Direct Use
The model is designed to generate analyst-style financial commentary given a structured prompt containing live market data (price, fundamentals, news sentiment). It handles three task types out of the box:
- Stock analysis — given price history, fundamentals, and sentiment, writes a deep-dive covering valuation, momentum, catalysts, and risks.
- Market overview — given index performance, sector rotation, and top movers, writes a macro narrative.
- Analyst Q&A — answers free-form financial questions in plain English, referencing provided data where available.
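As an illustration of what a "structured prompt containing live market data" might look like, here is a minimal sketch. The function name, field labels, and all numbers are hypothetical placeholders (the ticker "XYZ" is made up); the exact prompt layout used by the CLI is not documented in this card.

```python
def build_stock_prompt(ticker, price, return_1m_pct, pe_ratio, rsi, headlines):
    """Return chat messages that embed the market data the model should reason over.

    Hypothetical helper: field names and layout are illustrative only.
    """
    data_lines = [
        f"Ticker: {ticker}",
        f"Price: {price:.2f}",
        f"1-month return: {return_1m_pct:+.1f}%",
        f"P/E: {pe_ratio:.1f}",
        f"RSI(14): {rsi:.0f}",
        "Headlines:",
        *[f"- {h}" for h in headlines],
    ]
    return [
        {"role": "system",
         "content": "You are a senior equity analyst. Be concise and data-driven."},
        {"role": "user",
         "content": "\n".join(data_lines) + "\n\nWrite a short stock analysis."},
    ]


# Placeholder values, not real market data.
messages = build_stock_prompt(
    "XYZ", 123.45, -4.2, 18.7, 41, ["XYZ beats quarterly revenue estimates"]
)
print(messages[1]["content"])
```

The resulting `messages` list can be passed to a text-generation pipeline in the same way as the example in "How to Get Started with the Model".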
Downstream Use
The model plugs directly into the AI Stock Market Analyst CLI via analysis/llm.py, which loads it with AutoPeftModelForCausalLM, merges the LoRA adapter into the base weights for faster inference, and exposes ask_llm() / ask_llm_json() helper functions used throughout the application.
It can also be used as a drop-in instruction-following LLM for any financial NLP pipeline that needs analyst-style prose generation.
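The helper functions mentioned above are not reproduced in this card. The following is only a sketch of how such wrappers could look; the signatures are assumptions and may differ from the real ask_llm() / ask_llm_json() in analysis/llm.py. The model call is injected as a `generate` callable so the sketch runs without downloading weights.

```python
import json
from typing import Callable, Dict, List

# Hypothetical type: takes chat messages, returns the assistant's reply text.
Generate = Callable[[List[Dict[str, str]]], str]


def ask_llm(generate: Generate, question: str,
            system: str = "You are a senior equity analyst.") -> str:
    """Send one question with a system prompt and return the reply text."""
    messages = [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
    return generate(messages).strip()


def ask_llm_json(generate: Generate, question: str) -> dict:
    """Ask for JSON and parse the first {...} span found in the reply."""
    raw = ask_llm(generate, question + "\nRespond with a single JSON object only.")
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return json.loads(raw[start:end + 1])


# Stand-in for the real model, used here only to demonstrate the parsing.
def fake_generate(messages: List[Dict[str, str]]) -> str:
    return 'Sure: {"rating": "hold"}'


parsed = ask_llm_json(fake_generate, "Rate XYZ.")
print(parsed)  # {'rating': 'hold'}
```

With a real pipeline, `generate` could be something like `lambda m: pipe(m, max_new_tokens=512)[0]["generated_text"][-1]["content"]`.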
Out-of-Scope Use
- Real-time trading decisions — the model does not have access to live data and does not produce structured buy/sell signals. Any output should be treated as educational commentary, not financial advice.
- Precise numerical forecasting — price targets or earnings estimates produced by the model are illustrative, not quantitative predictions.
- Non-English text — training data was English-only.
- Domains outside finance — the fine-tuning data is domain-specific; performance on general instruction-following tasks may be degraded compared to the base model.
Bias, Risks, and Limitations
- The training set contains only 68 examples across three task types, making the model susceptible to overfitting to the style and tickers present in the training data. Generalisation to less common or non-US equities may be weaker.
- Training data was generated from a single point-in-time snapshot of live market data. The model may reproduce market conditions or narratives from that period.
- The base model, Qwen2.5-1.5B-Instruct, may carry biases inherited from its pre-training corpus.
- At 1.5B parameters the model is relatively small. Responses are generally coherent but may occasionally hallucinate specific figures (e.g. exact P/E ratios or earnings dates) if not grounded by a data-rich prompt.
Recommendations
Always supply the model with current, factual market data in the prompt (price, fundamentals, news). Do not rely on the model's parametric knowledge for specific numerical claims. All output should be reviewed by a qualified professional before informing any financial decision.
How to Get Started with the Model
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer, pipeline
import torch

MODEL = "iPwnds/finanalyst-qwen1.5b"

# Detect device
if torch.backends.mps.is_available():
    device_map, dtype = {"": "mps"}, torch.float16
elif torch.cuda.is_available():
    device_map, dtype = "auto", torch.float16
else:
    device_map, dtype = {"": "cpu"}, torch.float32

# Load base model + LoRA adapter, then merge for faster inference
model = AutoPeftModelForCausalLM.from_pretrained(
    MODEL,
    torch_dtype=dtype,
    device_map=device_map,
).merge_and_unload()
tokenizer = AutoTokenizer.from_pretrained(MODEL)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

messages = [
    {"role": "system", "content": "You are a senior equity analyst. Be concise and data-driven."},
    {"role": "user", "content": "Is NVDA overbought at current levels given its AI growth story?"},
]

result = pipe(messages, max_new_tokens=512, temperature=0.3, do_sample=True)
# The pipeline returns the full conversation; the last message is the model's reply
print(result[0]["generated_text"][-1]["content"])
The model uses the ChatML chat template (<|im_start|> / <|im_end|>) inherited from Qwen2.5-1.5B-Instruct. Always pass messages as a list of role/content dicts rather than raw strings.
Training Details
Training Data
Training data was generated programmatically by scripts/generate_training_data.py, which:
- Fetches live fundamentals, price history, and news via yfinance for a curated list of 23 tickers (large-cap US equities across major sectors).
- Constructs structured prompts containing price returns, RSI, P/E, market cap, revenue, and news headlines.
- Calls the base Qwen2.5-1.5B-Instruct to generate reference analyst responses.
- Saves the prompt–response pairs as JSONL in the {"instruction", "input", "output", "task_type", "ticker"} format.
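To make the record format concrete, here is a minimal sketch of writing one example as a JSONL line with the five keys named above. The helper names and the sample values are hypothetical; the actual scripts/generate_training_data.py is not shown in this card and may be organised differently.

```python
import io
import json

FIELDS = ("instruction", "input", "output", "task_type", "ticker")
TASK_TYPES = {"stock_analysis", "ask", "market_overview"}


def make_record(instruction, input_text, output, task_type, ticker):
    """Build one training record with the five expected keys."""
    if task_type not in TASK_TYPES:
        raise ValueError(f"unknown task_type: {task_type!r}")
    return dict(zip(FIELDS, (instruction, input_text, output, task_type, ticker)))


def write_jsonl(records, fh):
    """Write one JSON object per line (the JSONL convention)."""
    for record in records:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


# Placeholder content; "XYZ" is a made-up ticker.
record = make_record(
    "You are a senior equity analyst.",
    "Ticker: XYZ\nP/E: 18.7\nRSI(14): 41",
    "XYZ trades at a moderate multiple...",
    "stock_analysis",
    "XYZ",
)

buffer = io.StringIO()  # stands in for an open file
write_jsonl([record], buffer)
lines = buffer.getvalue().splitlines()
print(lines[0])
```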
The final dataset contains 68 instruction pairs split across:
| Task type | Count |
|---|---|
| stock_analysis | 21 |
| ask (free-form Q&A) | 46 |
| market_overview | 1 |
The 90/10 train/test split gives 61 training examples and 7 evaluation examples.
Training Procedure
Fine-tuning was performed in Google Colab on a T4 GPU (15 GB VRAM) using SFTTrainer from the trl library.
Preprocessing
Each example was formatted into the Qwen2.5 ChatML template:
<|im_start|>system
{instruction}<|im_end|>
<|im_start|>user
{input}<|im_end|>
<|im_start|>assistant
{output}<|im_end|>
Sequences were truncated to a maximum of 512 tokens (set via tokenizer.model_max_length).
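The template above can be applied with plain string formatting. This is a sketch, assuming each example is a dict with the "instruction", "input", and "output" keys from the dataset format; the training notebook may instead rely on the tokenizer's built-in chat template, and details such as a trailing newline could differ.

```python
def to_chatml(example: dict) -> str:
    """Render one dataset record in the ChatML layout shown above."""
    return (
        f"<|im_start|>system\n{example['instruction']}<|im_end|>\n"
        f"<|im_start|>user\n{example['input']}<|im_end|>\n"
        f"<|im_start|>assistant\n{example['output']}<|im_end|>"
    )


# Placeholder record; "XYZ" is a made-up ticker.
sample = {
    "instruction": "You are a senior equity analyst.",
    "input": "Ticker: XYZ\nP/E: 18.7",
    "output": "XYZ trades at a moderate multiple.",
}
text = to_chatml(sample)
print(text)
```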
Training Hyperparameters
| Hyperparameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-1.5B-Instruct |
| Quantisation | 4-bit NF4 + double quantisation |
| Compute dtype | float16 |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA target modules | q_proj, v_proj |
| LoRA dropout | 0.05 |
| Trainable parameters | 2,179,072 / 1,545,893,376 (0.14%) |
| Epochs | 3 |
| Per-device batch size | 1 |
| Gradient accumulation steps | 8 (effective batch size: 8) |
| Learning rate | 2e-4 |
| Max sequence length | 512 tokens |
| Mixed precision | None (fp16=False, bf16=False) |
| Gradient checkpointing | Enabled |
| Optimizer | AdamW (default) |
Note on mixed precision: Qwen2.5's layer norms remain in BFloat16 regardless of dtype, which conflicts with PyTorch's fp16 AMP scaler. Setting fp16=False, bf16=False lets bitsandbytes handle precision internally and avoids the _amp_foreach_non_finite_check_and_unscale_cuda error on T4.
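For orientation, the quantisation and LoRA rows of the table map onto configuration objects roughly as follows. This is a sketch of how those values are typically expressed with transformers and peft, not a copy of the training notebook; `task_type="CAUSAL_LM"` is an assumption, and all other LoRA options are left at their library defaults.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantisation with double quantisation and float16 compute,
# as listed in the hyperparameter table.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA adapters on the query and value projections.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",  # assumption: not stated explicitly in this card
)
```

In a typical QLoRA setup, `bnb_config` would be passed as `quantization_config` when loading the base model and `lora_config` as `peft_config` to the trainer.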
Speeds, Sizes, Times
| Item | Value |
|---|---|
| Training time | ~4 minutes (T4 GPU, Google Colab) |
| Total steps | 24 |
| Adapter size (pushed to Hub) | ~8 MB |
| Base model size (downloaded at inference) | ~3 GB |
| Training loss (final) | 1.5266 |
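The split sizes and the step count follow from the dataset size and the batch settings. The arithmetic below is a sketch that assumes the test split is rounded up and that the trainer counts the final, partially filled gradient-accumulation window of each epoch as one optimizer step; both assumptions are consistent with the reported 61/7 split and 24 steps.

```python
import math

N_EXAMPLES = 68
TEST_FRACTION = 0.10
PER_DEVICE_BATCH = 1
GRAD_ACCUM_STEPS = 8
EPOCHS = 3

# Split: test size rounded up, the remainder goes to training.
n_eval = math.ceil(TEST_FRACTION * N_EXAMPLES)
n_train = N_EXAMPLES - n_eval

# One optimizer step per accumulation window (last window may be partial).
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS
steps_per_epoch = math.ceil(n_train / effective_batch)
total_steps = EPOCHS * steps_per_epoch

print(n_train, n_eval)               # 61 7
print(steps_per_epoch, total_steps)  # 8 24
```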
Evaluation
Testing Data
A random 10% split (7 examples, seed=42) held out from the same generated dataset. The small size means evaluation loss should be interpreted qualitatively — it indicates whether the model is learning the response style rather than serving as a rigorous benchmark.
Metrics
Evaluation loss (cross-entropy on held-out completions) was the primary metric tracked during training. No external benchmarks were run; the model is evaluated end-to-end within the CLI application by inspecting output quality on real analyst prompts.
Results
| Metric | Value |
|---|---|
| Final training loss | 1.5266 |
| Final evaluation loss | 1.4721 |
| Eval runtime | 3.78 s (7 samples) |
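The evaluation loss is slightly lower than the final training loss. With only 7 held-out examples this is weak evidence, but it suggests the model picked up the response style without simply memorising the training set. Part of the gap may also come from LoRA dropout, which is active when the training loss is computed and disabled during evaluation.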
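Cross-entropy losses are sometimes easier to compare as perplexities. The conversion below is a sketch that assumes the reported losses are mean per-token cross-entropy in nats, which is the usual convention for causal LM training but is not stated explicitly in this card.

```python
import math

TRAIN_LOSS = 1.5266  # final training loss from the results table
EVAL_LOSS = 1.4721   # final evaluation loss from the results table

# Perplexity is the exponential of the mean per-token cross-entropy.
train_ppl = math.exp(TRAIN_LOSS)
eval_ppl = math.exp(EVAL_LOSS)

print(f"train perplexity: {train_ppl:.2f}")  # about 4.60
print(f"eval perplexity:  {eval_ppl:.2f}")   # about 4.36
```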
Summary
The model successfully learns the instruction-following format and analyst prose style within 3 epochs on 61 examples. Responses are coherent, stay on-topic, and reference the data provided in the prompt. The primary limitation is dataset size — broader coverage of tickers, task types, and market conditions would improve robustness.
Environmental Impact
Training was performed on a Google Colab T4 GPU for approximately 4 minutes. Estimated carbon emissions are negligible at this scale.
- Hardware type: NVIDIA T4 (Google Colab)
- Hours used: ~0.07 hours
- Cloud provider: Google (Colab)
- Compute region: US (Colab default)
- Carbon emitted: < 1 g CO₂eq (estimated)
Technical Specifications
Model Architecture and Objective
- Base architecture: Qwen2.5-1.5B-Instruct (decoder-only transformer, 1.54B parameters)
- Fine-tuning method: QLoRA (4-bit NF4 weight quantisation via bitsandbytes, with low-rank adapter matrices (LoRA) injected into the q_proj and v_proj layers of each attention block)
- Objective: Next-token prediction (causal LM) on ChatML-formatted instruction–response pairs
- Chat format: ChatML (<|im_start|> / <|im_end|>)
Compute Infrastructure
Hardware
- Google Colab T4 GPU (15 GB VRAM) for training
- Apple Silicon (MPS), CUDA GPU, or CPU for inference
Software
| Package | Role |
|---|---|
| transformers | Model loading, tokenisation, pipeline |
| peft | LoRA adapter injection and AutoPeftModelForCausalLM |
| trl | SFTTrainer / SFTConfig for supervised fine-tuning |
| bitsandbytes ≥ 0.46.1 | 4-bit NF4 quantisation |
| accelerate | Device placement and distributed training |
| datasets | Dataset formatting and train/test split |
Model Card Authors
Florian Braun (@iPwnds)