Instructions for using iPwnds/finanalyst-qwen1.5b with libraries and local apps.

Libraries

How to use iPwnds/finanalyst-qwen1.5b with Transformers:

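```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="iPwnds/finanalyst-qwen1.5b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```
# Load model directly (with `peft` installed, transformers attaches the LoRA adapter automatically)
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("iPwnds/finanalyst-qwen1.5b")
model = AutoModelForCausalLM.from_pretrained("iPwnds/finanalyst-qwen1.5b", dtype="auto")
```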

PEFT
How to use iPwnds/finanalyst-qwen1.5b with PEFT:
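```
from peft import AutoPeftModelForCausalLM

# Downloads the base model (Qwen/Qwen2.5-1.5B-Instruct) and attaches this repo's LoRA adapter
model = AutoPeftModelForCausalLM.from_pretrained("iPwnds/finanalyst-qwen1.5b")
```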
Local Apps

vLLM

How to use iPwnds/finanalyst-qwen1.5b with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "iPwnds/finanalyst-qwen1.5b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "iPwnds/finanalyst-qwen1.5b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```


SGLang

How to use iPwnds/finanalyst-qwen1.5b with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "iPwnds/finanalyst-qwen1.5b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "iPwnds/finanalyst-qwen1.5b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "iPwnds/finanalyst-qwen1.5b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "iPwnds/finanalyst-qwen1.5b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use iPwnds/finanalyst-qwen1.5b with Docker Model Runner:
```
docker model run hf.co/iPwnds/finanalyst-qwen1.5b
```

finanalyst-qwen1.5b

A financial analyst LLM fine-tuned from Qwen2.5-1.5B-Instruct using QLoRA (4-bit NF4 quantisation + LoRA adapters) on custom instruction-tuning data generated from live stock market data. The model produces analyst-style responses for stock analysis, market overviews, and free-form financial Q&A.

It runs fully locally — no API keys required. On Apple Silicon it uses MPS acceleration; on a machine with a CUDA GPU it uses that automatically; otherwise it falls back to CPU.

Model Details

Model Description

finanalyst-qwen1.5b is a parameter-efficient fine-tune of Qwen/Qwen2.5-1.5B-Instruct trained to behave like a senior sell-side analyst. It was fine-tuned using QLoRA: the base model is loaded in 4-bit NF4 quantisation (keeping the base weights at roughly 1 GB in memory) while small LoRA adapter matrices are injected into the query and value projections (`q_proj`, `v_proj`) of each attention block and trained on financial instruction data. Only 0.14% of the total parameters (2.18M out of 1.54B) were updated during training.
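The 2.18M figure can be checked by hand. The sketch below assumes the published Qwen2.5-1.5B architecture (hidden size 1536, 28 decoder layers, 12 query heads and 2 key/value heads of dimension 128), which this card does not restate. A rank-r LoRA pair on a projection from d_in to d_out adds r·(d_in + d_out) weights.

```python
# Reproduce the LoRA trainable-parameter count for r=16 on q_proj and v_proj.
# Architecture values are assumed from the Qwen2.5-1.5B config, not stated in this card.
HIDDEN_SIZE = 1536
NUM_LAYERS = 28
NUM_Q_HEADS = 12
NUM_KV_HEADS = 2
HEAD_DIM = HIDDEN_SIZE // NUM_Q_HEADS  # 128
RANK = 16
TOTAL_PARAMS = 1_545_893_376  # total reported in the hyperparameter table


def lora_pair_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) beside a frozen d_in -> d_out projection."""
    return r * (d_in + d_out)


q_proj = lora_pair_params(HIDDEN_SIZE, NUM_Q_HEADS * HEAD_DIM, RANK)   # 16 * (1536 + 1536)
v_proj = lora_pair_params(HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM, RANK)  # 16 * (1536 + 256), grouped-query KV
trainable = NUM_LAYERS * (q_proj + v_proj)
share = 100 * trainable / TOTAL_PARAMS
adapter_mb = trainable * 4 / 1e6  # assumption: adapter weights stored as fp32 (4 bytes each)

print(f"trainable={trainable:,}  share={share:.2f}%  adapter~{adapter_mb:.1f} MB")
```

This prints 2,179,072 trainable parameters (0.14%). Assuming fp32 storage, that is about 8.7 MB, in the same range as the ~8 MB adapter size listed under Speeds, Sizes, Times.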

The model is the generative core of the AI Stock Market Analyst CLI — a Bloomberg-style terminal that uses this model for all text generation (stock deep-dives, market overviews, watchlist digests, and natural-language Q&A).

Developed by: Florian Braun (@iPwnds)
Model type: Causal language model — instruction-tuned with QLoRA
Language: English
License: Apache 2.0
Fine-tuned from: Qwen/Qwen2.5-1.5B-Instruct

Model Sources

Repository: github.com/iPwnds/bloomberg-terminal
Training notebook: notebooks/FinAnalyst_Generative.ipynb
Companion sentiment model: iPwnds/finsentiment-distilbert

Uses

Direct Use

The model is designed to generate analyst-style financial commentary given a structured prompt containing live market data (price, fundamentals, news sentiment). It handles three task types out of the box:

Stock analysis — given price history, fundamentals, and sentiment, writes a deep-dive covering valuation, momentum, catalysts, and risks.
Market overview — given index performance, sector rotation, and top movers, writes a macro narrative.
Analyst Q&A — answers free-form financial questions in plain English, referencing provided data where available.
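All three tasks follow the same pattern: market data goes into the prompt and the model writes prose around it. Below is a minimal sketch of packing data into chat messages for the stock-analysis task. The function name, field labels, and wording are hypothetical and are not the CLI's actual prompt format. The ticker and numbers are placeholders.

```python
from typing import Dict, List


def build_stock_analysis_messages(
    ticker: str,
    price: Dict[str, float],
    fundamentals: Dict[str, float],
    sentiment: str,
    headlines: List[str],
) -> List[Dict[str, str]]:
    """Hypothetical helper: pack market data into a system/user message pair."""
    data_lines = [f"Ticker: {ticker}"]
    data_lines += [f"{name}: {value}" for name, value in price.items()]
    data_lines += [f"{name}: {value}" for name, value in fundamentals.items()]
    data_lines.append(f"News sentiment: {sentiment}")
    data_lines += [f"Headline: {h}" for h in headlines]
    task = ("Write a stock deep-dive covering valuation, momentum, catalysts and risks. "
            "Use only the figures below.")
    return [
        {"role": "system", "content": "You are a senior equity analyst. Be concise and data-driven."},
        {"role": "user", "content": task + "\n\n" + "\n".join(data_lines)},
    ]


# Placeholder inputs for illustration only, not real market data
messages = build_stock_analysis_messages(
    "ACME",
    price={"Last close": 100.0, "1M return (%)": 4.2},
    fundamentals={"P/E": 25.0, "Market cap ($B)": 12.5},
    sentiment="positive",
    headlines=["ACME raises full-year guidance"],
)
print(messages[1]["content"])
```

The resulting list has the same role/content shape as `messages` in the getting-started example, so it can be passed to the same `pipe(...)` call.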

Downstream Use

The model plugs directly into the AI Stock Market Analyst CLI via analysis/llm.py, which loads it with AutoPeftModelForCausalLM, merges the LoRA adapter into the base weights for faster inference, and exposes ask_llm() / ask_llm_json() helper functions used throughout the application.
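This card does not describe how ask_llm_json() turns generated text into structured data. Small models often wrap JSON in prose, so one plausible approach is to scan the reply for the first parseable JSON value using the standard library's `json.JSONDecoder.raw_decode`. The sketch below takes that approach. `extract_json` is a hypothetical name and does not come from the application's code.

```python
import json
from typing import Any


def extract_json(text: str) -> Any:
    """Return the first JSON object or array embedded in model output (hypothetical helper)."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(text):
        if ch in "{[":
            try:
                value, _end = decoder.raw_decode(text, start)
                return value
            except json.JSONDecodeError:
                continue  # not valid JSON starting here; keep scanning
    raise ValueError("no JSON value found in model output")


reply = 'Sure, here is my view: {"rating": "hold", "confidence": 0.6} Let me know if you need more.'
print(extract_json(reply))  # {'rating': 'hold', 'confidence': 0.6}
```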

It can also serve as a drop-in instruction-following LLM in other financial NLP pipelines that need analyst-style prose generation.

Out-of-Scope Use

Real-time trading decisions — the model does not have access to live data and does not produce structured buy/sell signals. Any output should be treated as educational commentary, not financial advice.
Precise numerical forecasting — price targets or earnings estimates produced by the model are illustrative, not quantitative predictions.
Non-English text — training data was English-only.
Domains outside finance — the fine-tuning data is domain-specific; performance on general instruction-following tasks may be degraded compared to the base model.

Bias, Risks, and Limitations

The training set contains only 68 examples across three task types, making the model susceptible to overfitting to the style and tickers present in the training data. Generalisation to less common or non-US equities may be weaker.
Training data was generated from a single point-in-time snapshot of live market data. The model may reproduce market conditions or narratives from that period.
The base model, Qwen2.5-1.5B-Instruct, may carry biases inherited from its pre-training corpus.
At 1.5B parameters the model is relatively small. Responses are generally coherent but may occasionally hallucinate specific figures (e.g. exact P/E ratios or earnings dates) if not grounded by a data-rich prompt.

Recommendations

Always supply the model with current, factual market data in the prompt (price, fundamentals, news). Do not rely on the model's parametric knowledge for specific numerical claims. All output should be reviewed by a qualified professional before informing any financial decision.
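One cheap way to apply this advice is to flag any figure in a response that does not appear in the prompt that produced it. The sketch below is a heuristic only. `ungrounded_numbers` is hypothetical and is not part of the CLI. Because it relies on plain string matching, it will miss values that were reformatted (for example 0.35 vs 35%).

```python
import re
from typing import List

# Digits with optional thousands separators and an optional decimal part
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def ungrounded_numbers(response: str, prompt: str) -> List[str]:
    """List figures in the model's response that never appear in the prompt (heuristic sketch)."""
    def numbers(text: str) -> List[str]:
        return [m.group().replace(",", "") for m in NUMBER.finditer(text)]

    grounded = set(numbers(prompt))
    return [n for n in numbers(response) if n not in grounded]


# Placeholder strings for illustration only
prompt = "Ticker: ACME\nP/E: 35.2\nRevenue: $60,922M"
response = "At a P/E of 35.2 and revenue of $60,922M, earnings could grow 25% next year."
print(ungrounded_numbers(response, prompt))  # ['25']
```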

How to Get Started with the Model

```
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer, pipeline
import torch

MODEL = "iPwnds/finanalyst-qwen1.5b"

# Detect device
if torch.backends.mps.is_available():
    device_map, dtype = {"": "mps"}, torch.float16
elif torch.cuda.is_available():
    device_map, dtype = "auto", torch.float16
else:
    device_map, dtype = {"": "cpu"}, torch.float32

# Load base model + LoRA adapter, then merge for faster inference
model = AutoPeftModelForCausalLM.from_pretrained(
    MODEL,
    torch_dtype=dtype,
    device_map=device_map,
).merge_and_unload()

tokenizer = AutoTokenizer.from_pretrained(MODEL)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

messages = [
    {"role": "system", "content": "You are a senior equity analyst. Be concise and data-driven."},
    {"role": "user",   "content": "Is NVDA overbought at current levels given its AI growth story?"},
]

result = pipe(messages, max_new_tokens=512, temperature=0.3, do_sample=True)
print(result[0]["generated_text"][-1]["content"])
```

The model uses the ChatML chat template (<|im_start|> / <|im_end|>) inherited from Qwen2.5-1.5B-Instruct. Always pass messages as a list of role/content dicts rather than raw strings.

Training Details

Training Data

Training data was generated programmatically by scripts/generate_training_data.py, which:

Fetches live fundamentals, price history, and news via yfinance for a curated list of 23 tickers (large-cap US equities across major sectors).
Constructs structured prompts containing price returns, RSI, P/E, market cap, revenue, and news headlines.
Calls the base Qwen2.5-1.5B-Instruct to generate reference analyst responses.
Saves the prompt–response pairs as JSONL in the {"instruction", "input", "output", "task_type", "ticker"} format.
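The following sketches the feature-and-record step under stated assumptions. The card does not say which RSI variant the script uses, so this assumes the common 14-period RSI with Wilder smoothing. The prompt wording, placeholder prices, and placeholder output are all hypothetical. Only the five JSONL keys come from the card.

```python
import json
from typing import Dict, List


def wilder_rsi(closes: List[float], period: int = 14) -> float:
    """RSI with Wilder smoothing (assumed variant; the script's exact formula isn't stated)."""
    if len(closes) < period + 1:
        raise ValueError(f"need at least {period + 1} closes")
    deltas = [b - a for a, b in zip(closes, closes[1:])]
    gains = [max(d, 0.0) for d in deltas]
    losses = [max(-d, 0.0) for d in deltas]
    avg_gain = sum(gains[:period]) / period  # seed with a simple average
    avg_loss = sum(losses[:period]) / period
    for gain, loss in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + gain) / period
        avg_loss = (avg_loss * (period - 1) + loss) / period
    if avg_loss == 0:
        return 100.0  # no down moves in the window
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)


def make_record(ticker: str, closes: List[float], pe: float, output: str) -> Dict[str, str]:
    """Build one training pair in the card's JSONL schema (prompt wording is hypothetical)."""
    ret_pct = 100.0 * (closes[-1] / closes[0] - 1.0)
    prompt = (f"Ticker: {ticker}\nReturn over period: {ret_pct:.1f}%\n"
              f"RSI(14): {wilder_rsi(closes):.1f}\nP/E: {pe}")
    return {
        "instruction": "You are a senior equity analyst. Write a concise stock analysis.",
        "input": prompt,
        "output": output,
        "task_type": "stock_analysis",
        "ticker": ticker,
    }


closes = [100.0 + i for i in range(20)]  # placeholder prices, not real data
records = [make_record("ACME", closes, 25.0, "<reference response from the base model>")]
jsonl = "\n".join(json.dumps(r) for r in records)  # one JSON object per line
print(jsonl)
```

With steadily rising placeholder prices, the RSI comes out at 100 because the window contains no down moves.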

The final dataset contains 68 instruction pairs split across:

| Task type | Count |
|---|---|
| `stock_analysis` | 21 |
| `ask` (free-form Q&A) | 46 |
| `market_overview` | 1 |

The 90/10 train/test split gives 61 training examples and 7 evaluation examples.
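The 61/7 split comes from rounding the test share up: 10% of 68 is 6.8, which becomes 7. The minimal sketch below assumes this matches how Hugging Face `datasets` rounds a fractional `test_size`. It shuffles with Python's `random` module, so even with seed 42 it will not pick the same 7 examples as the original split.

```python
import math
import random
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def split_sizes(n: int, test_fraction: float = 0.1) -> Tuple[int, int]:
    """Round the test share up and give the remainder to training."""
    n_test = math.ceil(n * test_fraction)
    return n - n_test, n_test


def train_test_split(items: List[T], test_fraction: float = 0.1, seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle a copy with a fixed seed, then cut it at the training size."""
    n_train, _ = split_sizes(len(items), test_fraction)
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


train, test = train_test_split(list(range(68)))
print(len(train), len(test))  # 61 7
```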

Training Procedure

Fine-tuning was performed in Google Colab on a T4 GPU (15 GB VRAM) using SFTTrainer from the trl library.

Preprocessing

Each example was formatted into the Qwen2.5 ChatML template:

```
<|im_start|>system
{instruction}<|im_end|>
<|im_start|>user
{input}<|im_end|>
<|im_start|>assistant
{output}<|im_end|>
```

Sequences were truncated to a maximum of 512 tokens (set via tokenizer.model_max_length).
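A minimal sketch of this preprocessing step, mapping one JSONL record onto the ChatML template above. The helper names are hypothetical. The sketch leaves out details of the real Qwen2.5 chat template, such as the default system prompt it inserts when none is given, so `tokenizer.apply_chat_template` is the safer choice in practice.

```python
from typing import Dict, List


def to_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = False) -> str:
    """Render role/content messages in ChatML (sketch, not the tokenizer's own template)."""
    text = "".join(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages)
    if add_generation_prompt:
        text += "<|im_start|>assistant\n"  # open an assistant turn for the model to complete
    return text


def format_example(record: Dict[str, str]) -> str:
    """Map an {"instruction", "input", "output"} record onto system/user/assistant turns."""
    return to_chatml([
        {"role": "system", "content": record["instruction"]},
        {"role": "user", "content": record["input"]},
        {"role": "assistant", "content": record["output"]},
    ])


print(format_example({"instruction": "You are an analyst.", "input": "Summarise ACME.", "output": "ACME is..."}))
```

At inference time the same layout is used without the final assistant content, and `add_generation_prompt=True` leaves the assistant turn open for the model to fill.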

Training Hyperparameters

| Hyperparameter | Value |
|---|---|
| Base model | `Qwen/Qwen2.5-1.5B-Instruct` |
| Quantisation | 4-bit NF4 + double quantisation |
| Compute dtype | `float16` |
| LoRA rank (`r`) | 16 |
| LoRA alpha | 32 |
| LoRA target modules | `q_proj`, `v_proj` |
| LoRA dropout | 0.05 |
| Trainable parameters | 2,179,072 / 1,545,893,376 (0.14%) |
| Epochs | 3 |
| Per-device batch size | 1 |
| Gradient accumulation steps | 8 (effective batch size: 8) |
| Learning rate | 2e-4 |
| Max sequence length | 512 tokens |
| Mixed precision | None (`fp16=False, bf16=False`) |
| Gradient checkpointing | Enabled |
| Optimizer | AdamW (default) |

Note on mixed precision: Qwen2.5's layer norms remain in BFloat16 regardless of the requested dtype, which conflicts with PyTorch's fp16 AMP gradient scaler. Setting `fp16=False, bf16=False` lets bitsandbytes handle precision internally and avoids the `_amp_foreach_non_finite_check_and_unscale_cuda` error on T4.

Speeds, Sizes, Times

| Metric | Value |
|---|---|
| Training time | ~4 minutes (T4 GPU, Google Colab) |
| Total steps | 24 |
| Adapter size (pushed to Hub) | ~8 MB |
| Base model size (downloaded at inference) | ~3 GB |
| Training loss (final) | 1.5266 |
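The 24 optimizer steps follow from the hyperparameters. At batch size 1, the 61 training examples give 61 micro-batches per epoch. With 8 accumulation steps, counting the trailing partial window of 5 micro-batches as an update gives 8 updates per epoch, or 24 over 3 epochs. That matches the total reported here, while an implementation that dropped the partial window would report 21. The sketch below models the counting rule only and is not the Trainer's code.

```python
import math


def optimizer_steps(n_train: int, per_device_batch: int, grad_accum: int, epochs: int) -> int:
    """Optimizer updates when a trailing partial accumulation window still triggers a step."""
    batches_per_epoch = math.ceil(n_train / per_device_batch)
    updates_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    return updates_per_epoch * epochs


print(optimizer_steps(n_train=61, per_device_batch=1, grad_accum=8, epochs=3))  # 24
```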

Evaluation

Testing Data

A random 10% split (7 examples, seed=42) held out from the same generated dataset. The small size means evaluation loss should be interpreted qualitatively — it indicates whether the model is learning the response style rather than serving as a rigorous benchmark.

Metrics

Evaluation loss (cross-entropy on held-out completions) was the primary signal for monitoring training. No external benchmarks were run; the model is evaluated end-to-end within the CLI application by inspecting output quality on real analyst prompts.

Results

| Metric | Value |
|---|---|
| Final training loss | 1.5266 |
| Final evaluation loss | 1.4721 |
| Eval runtime | 3.78 s (7 samples) |
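Cross-entropy losses are easier to read as perplexity, exp(loss). The conversion below assumes both figures are mean per-token cross-entropy in nats, which is how `transformers` causal-LM models compute their loss.

```python
import math

# Mean per-token cross-entropy (natural log) -> perplexity
losses = {"train": 1.5266, "eval": 1.4721}
perplexity = {split: math.exp(loss) for split, loss in losses.items()}
for split, ppl in perplexity.items():
    print(f"{split}: loss={losses[split]:.4f}  perplexity={ppl:.2f}")
```

This gives about 4.60 for training and 4.36 for evaluation. On held-out completions, the model is on average about as uncertain as a uniform choice among four to five tokens.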

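The evaluation loss being slightly lower than the training loss suggests the model is not badly overfitting the training set. However, the held-out set has only 7 examples drawn from the same generated distribution (same tickers, same base-model responses), so the gap is noisy and is not strong evidence of generalisation beyond that data.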

Summary

The model successfully learns the instruction-following format and analyst prose style within 3 epochs on 61 examples. Responses are coherent, stay on-topic, and reference the data provided in the prompt. The primary limitation is dataset size — broader coverage of tickers, task types, and market conditions would improve robustness.

Environmental Impact

Training was performed on a Google Colab T4 GPU for approximately 4 minutes. Estimated carbon emissions are negligible at this scale.

Hardware type: NVIDIA T4 (Google Colab)
Hours used: ~0.07 hours
Cloud provider: Google (Colab)
Compute region: US (Colab default)
Carbon emitted: < 1 g CO₂eq (estimated)
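Here is a back-of-envelope version of this estimate. It assumes the GPU draws at most its 70 W TDP for the ~4-minute run and ignores host CPU, memory, and datacenter overhead. The grid intensities are illustrative rather than measured for the Colab region.

```python
def training_emissions_g(power_w: float, hours: float, grid_kg_per_kwh: float) -> float:
    """Grams of CO2eq = energy (kWh) x grid carbon intensity (kg/kWh) x 1000 g/kg."""
    energy_kwh = power_w * hours / 1000.0
    return energy_kwh * grid_kg_per_kwh * 1000.0


T4_BOARD_POWER_W = 70.0  # assumption: T4 TDP, an upper bound for the GPU alone
HOURS = 4 / 60           # ~4 minutes, from the card
for intensity in (0.2, 0.4):  # assumed grid intensities in kg CO2eq per kWh
    grams = training_emissions_g(T4_BOARD_POWER_W, HOURS, intensity)
    print(f"{intensity} kg/kWh -> {grams:.2f} g CO2eq")
```

The result scales linearly with each input. A cleaner grid keeps it under 1 g, while a more carbon-intensive grid puts it near 2 g. Both are negligible.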

Technical Specifications

Model Architecture and Objective

Base architecture: Qwen2.5-1.5B-Instruct (decoder-only transformer, 1.54B parameters)
Fine-tuning method: QLoRA — 4-bit NF4 weight quantisation via bitsandbytes, with low-rank adapter matrices (LoRA) injected into the q_proj and v_proj layers of each attention block
Objective: Next-token prediction (causal LM) on ChatML-formatted instruction–response pairs
Chat format: ChatML (<|im_start|> / <|im_end|>)

Compute Infrastructure

Hardware

Google Colab T4 GPU (15 GB VRAM) for training
Apple Silicon (MPS), CUDA GPU, or CPU for inference

Software

| Package | Role |
|---|---|
| `transformers` | Model loading, tokenisation, pipeline |
| `peft` | LoRA adapter injection and `AutoPeftModelForCausalLM` |
| `trl` | `SFTTrainer` / `SFTConfig` for supervised fine-tuning |
| `bitsandbytes` ≥ 0.46.1 | 4-bit NF4 quantisation |
| `accelerate` | Device placement and distributed training |
| `datasets` | Dataset formatting and train/test split |

Model Card Authors

Florian Braun (@iPwnds)

Model Card Contact

huggingface.co/iPwnds


Model tree for iPwnds/finanalyst-qwen1.5b

Base model: Qwen/Qwen2.5-1.5B
Fine-tuned: Qwen/Qwen2.5-1.5B-Instruct
Fine-tuned: this model