	---
	language:
	- en
	pipeline_tag: text-classification
	tags:
	widget:
	- text: "And it was great to see how our Chinese team very much aware of that and of shifting all the resourcing to really tap into these opportunities."
	  example_title: "Exemplary Transformation Sentence"
	- text: "But we will continue to recruit even after that because we expect that the volumes are going to continue to grow."
	  example_title: "Exemplary Non-Transformation Sentence"
	- text: "So and again, we'll be disclosing the current taxes that are there in Guyana, along with that revenue adjustment."
	  example_title: "Exemplary Non-Transformation Sentence"

	---

	# TransformationTransformer

	TransformationTransformer is a fine-tuned [distilroberta](https://huggingface.co/distilroberta-base) model. It was trained and evaluated on 10,000 manually annotated sentences drawn from the Q&A sections of quarterly earnings conference calls. Specifically, it was trained on sentences spoken by firm executives to distinguish sentences that allude to business transformation from those that discuss other topics. More details about the training procedure can be found [below](#model-training).


	## Background

	Context on the project.


	## Usage

	The model is intended for sentence classification: it creates a contextual text representation of the input sentence and outputs a probability value. `LABEL_1` indicates that the sentence is predicted to contain transformation-related content, and `LABEL_0` that it is not. The query should consist of a single sentence.
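
To make the label convention concrete, here is a minimal sketch of how a prediction could be turned into a single transformation probability. It assumes the prediction for one sentence arrives as a list of `{"label": ..., "score": ...}` dictionaries (the shape returned by the `transformers` text-classification pipeline) and that the two label scores sum to one. The helper names and the `0.5` threshold are hypothetical and not part of the model.

```python
def transformation_probability(predictions):
    """Return the probability of LABEL_1 for one sentence.

    `predictions` is assumed to be a list of {"label": str, "score": float}
    dictionaries covering one or both labels.
    """
    scores = {p["label"]: p["score"] for p in predictions}
    if "LABEL_1" in scores:
        return scores["LABEL_1"]
    if "LABEL_0" in scores:
        # Assumption: binary classifier, so the two scores sum to one.
        return 1.0 - scores["LABEL_0"]
    raise ValueError("expected a LABEL_0 or LABEL_1 entry in the predictions")


def is_transformation(predictions, threshold=0.5):
    """Flag a sentence as transformation-related (threshold is illustrative)."""
    return transformation_probability(predictions) >= threshold
```

For instance, `is_transformation([{"label": "LABEL_0", "score": 0.8}])` evaluates to `False`, because the implied probability of `LABEL_1` is about 0.2.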


	## Usage (API)

	```python
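	import json
	import requests

	API_TOKEN = "<TOKEN>"  # replace with your Hugging Face API token

	headers = {"Authorization": f"Bearer {API_TOKEN}"}
	# Endpoint for this model (the original snippet pointed at a different repository)
	API_URL = "https://api-inference.huggingface.co/models/simonschoe/TransformationTransformer"

	def query(payload):
	    """Send the payload to the Inference API and return the decoded JSON response."""
	    data = json.dumps(payload)
	    response = requests.request("POST", API_URL, headers=headers, data=data)
	    return json.loads(response.content.decode("utf-8"))

	query({"inputs": "<insert-sentence-here>"})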
	```

	## Usage (transformers)

	```python
	from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

	tokenizer = AutoTokenizer.from_pretrained("simonschoe/TransformationTransformer")
	model = AutoModelForSequenceClassification.from_pretrained("simonschoe/TransformationTransformer")

	classifier = pipeline('text-classification', model=model, tokenizer=tokenizer)
	classifier('<insert-sentence-here>')
	```


	## Model Training

	The model has been trained on text data stemming from earnings call transcripts. The data is restricted to a call's question-and-answer (Q&A) section and the remarks by firm executives. The data has been segmented into individual sentences using [`spacy`](https://spacy.io/).
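
The sentence segmentation step might look roughly like the sketch below. The source only states that `spacy` was used, so the specific pipeline here (a blank English pipeline with the rule-based `sentencizer` component) is an assumption, and both helper names are hypothetical.

```python
def build_sentencizer():
    """Return a blank English spaCy pipeline with rule-based sentence splitting.

    Assumption: the pipeline actually used for the training data is not
    documented; this is one lightweight option that needs no model download.
    """
    import spacy  # imported lazily so split_into_sentences works with any pipeline

    nlp = spacy.blank("en")
    nlp.add_pipe("sentencizer")
    return nlp


def split_into_sentences(text, nlp):
    """Split one executive remark into stripped, non-empty sentences."""
    doc = nlp(text)
    return [sent.text.strip() for sent in doc.sents if sent.text.strip()]
```

Calling `split_into_sentences(remark, build_sentencizer())` on an executive's answer yields a list of single sentences, each of which can be passed to the classifier on its own.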

	Statistics of Training Data:
	- Labeled sentences: 10,000
	- Data distribution: xxx
	- Inter-coder agreement: xxx

	The following script presents the training pipeline:
	<link to script>