	---
	license: cc-by-2.0
	language:
	- en
	base_model:
	- distilbert/distilbert-base-uncased
	tags:
	- text-classification
	- finance
	- sentiment-analysis
	- distilbert
	pipeline_tag: text-classification
	---

	# FinesseBERT

	FinesseBERT is a DistilBERT-based sequence classification model, fine-tuned to predict the sentiment of stock market and crypto news articles from the headline and metadata alone, with no full article body required. It was built by [SentientMerchant](https://sentientmerchant.com/), a platform exploring AI-driven tools for financial intelligence.

	## Purpose

	Given the high volume and velocity of financial news, FinesseBERT enables fast, scalable sentiment analysis at the point of discovery — making it well-suited for real-time trading signals, news aggregators, and financial dashboards.

	The model classifies inputs into three sentiment categories:
	- `Positive-Outlook-On-Stock-News`
	- `Neutral-Outlook-On-Stock-News`
	- `Negative-Outlook-On-Stock-News`

	## Attribution Requirement

	This model is licensed under CC BY 2.0. If you use this model in your research, application, or product, you must provide attribution by linking back to [sentientmerchant.com](https://sentientmerchant.com/).

	## Training Data

	FinesseBERT was fine-tuned on a dataset of 2,900 labeled stock news examples. Each record has a `reference` URL, an integer `labels` value, and a `text` field: a single concatenated string that captures four fields sourced from the financial news article. A sample record:

	```json
	{
	  "reference": "https://finance.yahoo.com/...",
	  "text": "security: LULULEMON ATHLETICA INC (LULU); title: Lululemon (LULU) Dips More Than Broader Market: What You Should Know; description: In the closing of the recent trading day, Lululemon (LULU) stood at $138.16, denoting a -2.97% move from the preceding trading day.; author: yahoo.com",
	  "labels": 2
	}
	```

	| Field | Description |
	|---|---|
	| `security` | The full company name and ticker symbol |
	| `title` | The article headline |
	| `description` | A short article summary or lede sentence |
	| `author` | The publishing source |

	The `labels` field maps to: `0` → Positive, `1` → Neutral, `2` → Negative.
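The record layout above can be unpacked with a small parser. This is a minimal sketch, not the author's preprocessing code: the names `parse_text`, `FIELD_PATTERN`, and `LABEL_NAMES` are hypothetical, and the regular expression assumes the four field markers always appear in the order shown in the sample record.

```python
import re

# Hypothetical names for illustration; the label order follows the mapping above.
LABEL_NAMES = {0: "Positive", 1: "Neutral", 2: "Negative"}

# Anchor on the field markers instead of splitting on ";" or ":",
# because titles and descriptions can contain those characters themselves.
FIELD_PATTERN = re.compile(
    r"^security: (?P<security>.*?); title: (?P<title>.*?); "
    r"description: (?P<description>.*?); author: (?P<author>.*)$",
    re.DOTALL,
)


def parse_text(text: str) -> dict:
    """Split a concatenated `text` string into its four named fields."""
    match = FIELD_PATTERN.match(text)
    if match is None:
        raise ValueError("text does not follow the four-field format")
    return match.groupdict()


# The sample record from this section (the `reference` field is omitted).
record = {
    "text": (
        "security: LULULEMON ATHLETICA INC (LULU); "
        "title: Lululemon (LULU) Dips More Than Broader Market: What You Should Know; "
        "description: In the closing of the recent trading day, Lululemon (LULU) "
        "stood at $138.16, denoting a -2.97% move from the preceding trading day.; "
        "author: yahoo.com"
    ),
    "labels": 2,
}

fields = parse_text(record["text"])
print(fields["security"])
print(fields["title"])
print(LABEL_NAMES[record["labels"]])
```

Matching on the literal markers keeps the colon inside the headline ("Broader Market: What You Should Know") from being mistaken for a field boundary.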

	## Optimal Inference Format

	To get the best results from FinesseBERT, structure your input text to mirror the training data format exactly — a semicolon-delimited string with the four named fields in the same order:

	```
	security: <COMPANY NAME> (<TICKER>); title: <HEADLINE>; description: <DESCRIPTION>; author: <SOURCE>
	```

	Example:
	```
	security: LULULEMON ATHLETICA INC (LULU); title: Lululemon (LULU) Dips More Than Broader Market: What You Should Know; description: In the closing of the recent trading day, Lululemon (LULU) stood at $138.16, denoting a -2.97% move from the preceding trading day.; author: yahoo.com
	```

	Deviating from this format — such as passing a raw headline string alone — may degrade classification accuracy, as the model learned the sentiment signal from the full structured context it was trained on.
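One way to keep inputs consistent with this format is to assemble them with a small helper rather than by hand. The sketch below is an assumption about how a caller might do that: `build_input` and its parameter names are hypothetical and are not part of the model or the `transformers` library.

```python
def build_input(company: str, ticker: str, title: str, description: str, source: str) -> str:
    """Assemble the four fields into the semicolon-delimited layout shown above."""
    parts = [
        f"security: {company.strip()} ({ticker.strip()})",
        f"title: {title.strip()}",
        f"description: {description.strip()}",
        f"author: {source.strip()}",
    ]
    # Fields are joined with "; " in the fixed order security, title, description, author.
    return "; ".join(parts)


text = build_input(
    company="LULULEMON ATHLETICA INC",
    ticker="LULU",
    title="Lululemon (LULU) Dips More Than Broader Market: What You Should Know",
    description=(
        "In the closing of the recent trading day, Lululemon (LULU) stood at $138.16, "
        "denoting a -2.97% move from the preceding trading day."
    ),
    source="yahoo.com",
)
print(text)
```

The helper only trims surrounding whitespace. It does not change casing, since the sample shows an upper-case company name but this document does not state whether that holds for every training example.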

	## How to Use

	You can load and use this model directly from the Hugging Face Hub using the `transformers` library.

	### 1. Install Dependencies

	Make sure you have the required libraries installed:

	```bash
	pip install transformers torch
	```

	### 2. Loading the Model

	Use the `Auto` classes to load the model and tokenizer directly from the Hugging Face Hub.

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	import torch

	model_id = "MattELab/FinesseBERT"

	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForSequenceClassification.from_pretrained(model_id)
	```

	### 3. Running Inference

	Here is a quick example of how to pass text through the model to get predictions.

	```python
	# 1. Define your input text (see Optimal Inference Format above)
	text = "security: LULULEMON ATHLETICA INC (LULU); title: Lululemon (LULU) Dips More Than Broader Market: What You Should Know; description: In the closing of the recent trading day, Lululemon (LULU) stood at $138.16, denoting a -2.97% move from the preceding trading day.; author: yahoo.com"

	# 2. Tokenize the input
	# Note: DistilBERT has a maximum input length of 512 tokens. Inputs longer than
	# this will be silently truncated, which may degrade prediction quality.
	inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512, padding=True)

	# 3. Run the model (using torch.no_grad() for faster, memory-efficient inference)
	with torch.no_grad():
	    outputs = model(**inputs)

	# 4. Convert logits to probabilities
	probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)

	# 5. Get the predicted class
	predicted_class = torch.argmax(probabilities, dim=-1).item()

	# 6. Map the predicted class ID to a human-readable label
	label_map = {0: "Positive-Outlook-On-Stock-News", 1: "Neutral-Outlook-On-Stock-News", 2: "Negative-Outlook-On-Stock-News"}

	print(f"Probabilities: {probabilities}")
	print(f"Predicted Class ID: {predicted_class}")
	print(f"Predicted Sentiment: {label_map[predicted_class]}")
	```
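To make steps 4 to 6 concrete without loading the model, the same logits-to-label arithmetic can be written in plain Python. This is an illustrative sketch only: the helper names `softmax` and `classify` are hypothetical, and the logits are made-up numbers, not real FinesseBERT output.

```python
import math

# Class order follows the label mapping used in this README (0, 1, 2).
LABELS = [
    "Positive-Outlook-On-Stock-News",
    "Neutral-Outlook-On-Stock-News",
    "Negative-Outlook-On-Stock-News",
]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return the label with the highest probability, plus all probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])  # argmax
    return LABELS[best], probs


# Made-up logits for illustration; a real run would use outputs.logits[0].tolist().
label, probs = classify([-1.0, 0.5, 2.0])
print(label)
print([round(p, 3) for p in probs])
```

Because softmax preserves the ordering of the logits, the predicted class is the same whether the argmax is taken over logits or probabilities. The probabilities are only needed when a confidence value is reported alongside the label.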

	## Model Details

	* Architecture: DistilBERT (`AutoModelForSequenceClassification`)
	* Task: Text Classification
	* Classes: 3 (`Positive-Outlook-On-Stock-News`, `Neutral-Outlook-On-Stock-News`, `Negative-Outlook-On-Stock-News`)
	* Creator: MattELab

	## About SentientMerchant

	SentientMerchant provides real-time stock, crypto, and international market data to keep you up-to-date. Find top news headlines, individual and overall news sentiment across various timelines, build a watchlist, buy US & SG stocks, and create and manage your portfolio.