	---
	license: mit
	tags:
	- sentiment analysis
	- financial sentiment analysis
	- bert
	- text-classification
	- finance
	- finbert
	- financial
	---
	# Trading Hero Financial Sentiment Analysis

	Model Description: This model is a fine-tuned version of [FinBERT](https://huggingface.co/yiyanghkust/finbert-pretrain), a BERT model pre-trained on financial texts. It was fine-tuned for sentiment analysis of financial text and assigns each input one of three labels: neutral, positive, or negative.

	## Model Use

	Primary Users: Financial analysts, NLP researchers, and developers working on financial data.

	## Training Data

	Training Dataset: The model was fine-tuned on a custom dataset of financial communication texts, split into training, validation, and test sets:

	* Training Set: 10,918,272 tokens
	* Validation Set: 1,213,184 tokens
	* Test Set: 1,347,968 tokens
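
The split proportions follow directly from the token counts above. The short sketch below computes each split's share of the total; the variable names are illustrative. It also checks an observation the card does not state: each count is an exact multiple of 128, the default `max_length` in the Usage snippet's `preprocess` function, which may mean tokens were counted over sequences padded to 128 tokens. That reading is an assumption.

```python
# Token counts per split, copied from this model card.
splits = {
    "train": 10_918_272,
    "validation": 1_213_184,
    "test": 1_347_968,
}

total_tokens = sum(splits.values())

# Share of the fine-tuning data that falls in each split.
shares = {name: count / total_tokens for name, count in splits.items()}

# Assumption: 128 is the padded sequence length (the max_length default in
# the Usage snippet); the card does not say how tokens were counted.
SEQ_LEN = 128
sequences = {name: divmod(count, SEQ_LEN) for name, count in splits.items()}

for name in splits:
    n_seq, remainder = sequences[name]
    print(f"{name}: {shares[name]:.1%} of tokens, {n_seq} x {SEQ_LEN} + {remainder}")
```

The shares come out at roughly 81%, 9%, and 10% of about 13.5 million tokens.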

	Pre-training Dataset: FinBERT was pre-trained on a large financial corpus totaling 4.9 billion tokens, including:

	* Corporate Reports (10-K & 10-Q): 2.5 billion tokens
	* Earnings Call Transcripts: 1.3 billion tokens
	* Analyst Reports: 1.1 billion tokens

	## Evaluation

	* Test Accuracy = 0.908469
	* Test Precision = 0.927788
	* Test Recall = 0.908469
	* Test F1 = 0.913267
	* Labels: 0 -> Neutral; 1 -> Positive; 2 -> Negative
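
The card does not say how precision, recall, and F1 were averaged across the three classes. The reported recall equals the reported accuracy, which is what support-weighted averaging always produces for single-label classification, so weighted averaging is a plausible reading, though it remains an assumption. The sketch below computes the four metrics that way in plain Python; the function name and the toy labels are illustrative and are not the model's test data.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)    # number of true examples per class
    predicted = Counter(y_pred)  # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n  # each class counts in proportion to its support
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(correct.values()) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels for illustration only (0 = neutral, 1 = positive, 2 = negative).
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 0, 2]
metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```

On the toy data, weighted recall matches accuracy while weighted precision is higher, the same pattern as in the figures reported above.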


	## Usage

	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	tokenizer = AutoTokenizer.from_pretrained("fuchenru/Trading-Hero-LLM")
	model = AutoModelForSequenceClassification.from_pretrained("fuchenru/Trading-Hero-LLM")

	# Label ids used by this model
	label_map = {0: 'neutral', 1: 'positive', 2: 'negative'}

	# Tokenize the input text into fixed-length PyTorch tensors
	def preprocess(text, tokenizer, max_length=128):
	    return tokenizer(text, truncation=True, padding='max_length', max_length=max_length, return_tensors='pt')

	# Predict the sentiment label for a single text
	def predict_sentiment(input_text):
	    inputs = preprocess(input_text, tokenizer)

	    # Run inference without tracking gradients
	    with torch.no_grad():
	        outputs = model(**inputs)

	    # Take the highest-scoring class and map it to its label name
	    predicted_label = torch.argmax(outputs.logits, dim=1).item()
	    return label_map[predicted_label]

	stock_news = [
	    "Market analysts predict a stable outlook for the coming weeks.",
	    "The market remained relatively flat today, with minimal movement in stock prices.",
	    "Investor sentiment improved following news of a potential trade deal.",
	    # .......
	]

	for news in stock_news:
	    predicted_sentiment = predict_sentiment(news)
	    print("Predicted Sentiment:", predicted_sentiment)
	```
	```
	Predicted Sentiment: neutral
	Predicted Sentiment: neutral
	Predicted Sentiment: positive
	```
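
`predict_sentiment` returns only the top label. When a confidence score is also useful, the logits can be turned into probabilities with a softmax. The sketch below shows that step in plain Python so it runs without downloading the model; `softmax`, `label_scores`, and the example logits are hypothetical, and in practice the input would be `outputs.logits[0].tolist()` from the snippet above.

```python
import math

# Label ids as listed in this model card.
LABEL_MAP = {0: "neutral", 1: "positive", 2: "negative"}

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def label_scores(logits):
    """Pair each label name with its softmax probability."""
    return {LABEL_MAP[i]: p for i, p in enumerate(softmax(logits))}

# Hypothetical logits for one sentence; real values would come from
# outputs.logits[0].tolist() inside predict_sentiment.
example_logits = [2.0, 1.0, 0.1]
scores = label_scores(example_logits)
best_label = max(scores, key=scores.get)
print(best_label, round(scores[best_label], 3))
```

The label with the highest probability is the same one `torch.argmax` selects, since softmax preserves the ordering of the logits.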

	## Citation

	```bibtex
	@misc{yang2020finbert,
	    title={FinBERT: A Pretrained Language Model for Financial Communications},
	    author={Yi Yang and Mark Christopher Siy UY and Allen Huang},
	    year={2020},
	    eprint={2006.08097},
	    archivePrefix={arXiv},
	}
	```