---
model-index:
- name: augustulus-latin-sentiment-lora
results:
- task:
type: text-classification
name: Sentiment Analysis
dataset:
name: Ancient Latin Sentiment (Custom)
type: custom
metrics:
- type: accuracy
value: 75
name: Accuracy (with linguistic post-processing)
- type: accuracy
value: 37.5
name: Raw Model Accuracy
license: llama3.1
language:
- la
base_model:
- meta-llama/Llama-3.1-8B-Instruct
tags:
- gguf
- quantized
- llama-cpp
---
# Augustulus Latin Sentiment Analysis LoRA
**Developed by Team Trojan Parse**, *University of Florida Senior Design Project*
A LoRA (Low-Rank Adaptation) adapter fine-tuned on **Llama-3.1-8B-Instruct** for fine-grained sentiment classification of Ancient Latin texts across seven emotional intensity levels.
## Project Information
- **Team Name:** Trojan Parse
- **Team Members:**
- Alex John
- Ryan Willson
- Byron Boatright
- Jake Marotta
- Duncan Fuller
- **Project Repository:** [GitHub: Trojan-Parse-Project](https://github.com/alxxjohn/Trojan-Parse-Project)
- **Advisor:** Eleni Bozia, Ph.D., Dr. phil. (Associate Professor of Classics and Digital Humanities)
- **Advisor Department:** Department of Classics, University of Florida
## Model Description
- **Model Name:** `augustulus-latin-sentiment-lora`
- **Model type:** LoRA Adapter for Ancient Language Sentiment Classification
- **Language:** Classical/Ancient Latin
- **Base model:** [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
- **License:** Llama 3.1 Community License
- **Purpose:** Academic research and historical text analysis
## Sentiment Categories
Our model classifies Ancient Latin texts into seven emotional intensity levels:
### Positive Sentiments
- **EXTREMELY POSITIVE (+3)**: *exsultatio, jubilum, beatitudo, summa felicitas*
- Examples: Triumphal declarations, ultimate joy, divine blessing
- **VERY POSITIVE (+2)**: *gaudium, laetitia, amor, gloria, victoria, laudare*
- Examples: Military victories, celebrations, expressions of love/honor
- **MODERATELY POSITIVE (+1)**: *felix, laetus, bonus, pulcher, spes*
- Examples: General contentment, hope, pleasant situations
### Neutral (0)
- Factual statements, descriptions without emotional valence
### Negative Sentiments
- **MODERATELY NEGATIVE (-1)**: *malus, tristis, anxius, timor*
- Examples: Minor concerns, sadness, mild fear
- **VERY NEGATIVE (-2)**: *dolor magnus, timor vehemens, ira, furor*
- Examples: Great pain, intense anger, serious threats
- **EXTREMELY NEGATIVE (-3)**: *desperatio, exitium, cruciatus, malum*
- Examples: Utter despair, destruction, torture, ultimate evil
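The seven labels map onto an integer intensity scale from +3 to -3. A small helper (names are illustrative, not taken from the project code) makes that mapping explicit:

```python
# Hypothetical mapping from the seven sentiment labels to intensity scores.
SENTIMENT_SCORES = {
    "EXTREMELY POSITIVE": 3,
    "VERY POSITIVE": 2,
    "MODERATELY POSITIVE": 1,
    "NEUTRAL": 0,
    "MODERATELY NEGATIVE": -1,
    "VERY NEGATIVE": -2,
    "EXTREMELY NEGATIVE": -3,
}

def label_to_score(label: str) -> int:
    """Convert a model label to its numeric intensity (raises KeyError on unknown labels)."""
    return SENTIMENT_SCORES[label.strip().upper()]
```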
## Performance
| Configuration | Accuracy | Notes |
|:---|:---:|:---|
| Base Llama 3.1 (zero-shot) | 43.8% | Unreliable, biased toward extremes |
| LoRA Adapter (raw predictions) | 37.5% | Systematic but conservative |
| **LoRA + Linguistic Rules** | **75.0%** | Production-ready |
### Category-Level Performance
- **Neutral Detection:** 100% accuracy (3/3 test cases)
- **Moderate Categories:** 100% accuracy (learned systematic patterns)
- **Extreme Categories:** 83.3% accuracy (with intensity calibration)
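The project's actual linguistic rules are not published, but the idea behind the "LoRA + Linguistic Rules" row can be sketched: nudge a raw model label toward an extreme when strong intensity markers (drawn from the category lexemes listed above) appear in the text. Everything below, including the marker sets, is illustrative:

```python
# Hypothetical sketch of rule-based intensity calibration. The marker sets are
# sampled from the category lexemes in this card, not from the project's rules.
EXTREME_POSITIVE_MARKERS = {"exsultatio", "iubilum", "beatitudo", "splendidissima"}
EXTREME_NEGATIVE_MARKERS = {"desperatio", "exitium", "cruciatus"}

def calibrate(label: str, text: str) -> str:
    """Promote a 'VERY' label to 'EXTREMELY' when an extreme-intensity lexeme appears."""
    words = {w.strip(".,!?;:").lower() for w in text.split()}
    if label == "VERY POSITIVE" and words & EXTREME_POSITIVE_MARKERS:
        return "EXTREMELY POSITIVE"
    if label == "VERY NEGATIVE" and words & EXTREME_NEGATIVE_MARKERS:
        return "EXTREMELY NEGATIVE"
    return label
```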
## Training Approach
Our training methodology combined multiple data sources and validation strategies:
### Data Pipeline (5-day development cycle)
**Phase 1: Initial Generation**
- Few-shot generation using base Llama 3.1
- Context-aware synthetic examples
- Balanced across all six sentiment categories
**Phase 2: Consensus Filtering**
- Trained multiple LoRA variants on hand-annotated data
- Consensus filtering: kept examples where ≥2 models agreed
- Reduced noise and improved training data quality
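The Phase 2 filter can be sketched as a simple majority-vote check: an example survives only when at least two model variants assign it the same label. Function and variable names here are illustrative:

```python
from collections import Counter

# Hypothetical sketch of the consensus filter: keep an example only when at
# least `min_agree` LoRA variants agree on its label. `predictions_per_model`
# is assumed to be one label list per model, aligned with `examples`.
def consensus_filter(examples, predictions_per_model, min_agree=2):
    kept = []
    for i, example in enumerate(examples):
        votes = Counter(preds[i] for preds in predictions_per_model)
        label, count = votes.most_common(1)[0]
        if count >= min_agree:
            kept.append((example, label))
    return kept
```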
**Phase 3: Corpus Mining**
- Mined authentic Ancient Latin texts from Perseus Digital Library
- Extracted high-confidence positive examples (previously underrepresented)
- Combined ~40,000 corpus examples with synthetic data
**Phase 4: Final Training & Iteration**
- Balanced dataset: 9,000 examples (1,500 per category)
- Distributed training with data-parallel strategy
- Multiple training runs to optimize hyperparameters
### Final Training Configuration
- **Training Examples:** 9,000 (balanced across 7 categories)
- **Training Epochs:** 15
- **Architecture:** LoRA adapter (rank: 128, alpha: 256)
- **Optimization:** 8-bit quantization for efficiency
- **Hardware:** High-performance GPU cluster
- **Framework:** PyTorch, HuggingFace Transformers, PEFT
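With PEFT, the adapter hyperparameters above (rank 128, alpha 256) would translate into a config roughly like the following; the dropout value and target modules are typical choices for Llama-style models, not confirmed by the project:

```python
from peft import LoraConfig, TaskType

# Hypothetical reconstruction of the adapter configuration from the numbers in
# this card. Dropout and target_modules are assumptions, not published values.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=128,                       # LoRA rank, as stated above
    lora_alpha=256,              # scaling alpha, as stated above
    lora_dropout=0.05,           # assumed value, not stated in the card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```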
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model and adapter
base_model = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)

# Load Team Trojan Parse's adapter
model = PeftModel.from_pretrained(model, "TronCodes/augustulus-latin-sentiment-lora")
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Classify sentiment
def classify_latin_sentiment(text):
    prompt = f'''Classify the sentiment of this Latin text as: EXTREMELY NEGATIVE, VERY NEGATIVE, MODERATELY NEGATIVE, NEUTRAL, MODERATELY POSITIVE, VERY POSITIVE, or EXTREMELY POSITIVE.
Latin text: {text}
Sentiment:'''
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=20,
        do_sample=False,  # greedy decoding for deterministic labels
    )
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response.split("Sentiment:")[-1].strip()

# Example: extreme positive (triumph)
text = "Victoria splendidissima! Dux gloriam aeternam meruit!"
print(classify_latin_sentiment(text))
# Output: EXTREMELY POSITIVE

# Example: very negative (suffering of war)
text = "Bellum crudele et longum populum afflixerat."
print(classify_latin_sentiment(text))
# Output: VERY NEGATIVE
```
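Because generation can trail extra tokens after the label, it can help to pin the raw response to one of the seven valid labels before scoring. This helper is illustrative and not part of the released code:

```python
# The seven valid labels; "EXTREMELY"/"MODERATELY" variants are checked before
# the shorter "VERY" and "NEUTRAL" forms.
LABELS = [
    "EXTREMELY POSITIVE", "EXTREMELY NEGATIVE",
    "MODERATELY POSITIVE", "MODERATELY NEGATIVE",
    "VERY POSITIVE", "VERY NEGATIVE",
    "NEUTRAL",
]

def extract_label(response: str):
    """Return the first label (in LABELS order) found in the response, else None."""
    upper = response.upper()
    for label in LABELS:
        if label in upper:
            return label
    return None
```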
---
## GGUF Model Download and Local Usage (Merged Fine-Tune)
The LoRA adapter has been merged with the base model and quantized to **Q8_0** (8-bit) precision for efficient deployment on CPU/GPU via tools like `llama.cpp` and `Ollama`.
### 💾 File Details
- **File Name:** `augustulus-latin-sentiment-8b-q8_0.gguf`
- **Size:** 8.0 GB
- **Quantization:** Q8_0 (Recommended for best balance of speed and accuracy)
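Besides `Ollama` (shown below), the GGUF file can be loaded directly with `llama-cpp-python`. This sketch assumes the file has been downloaded into the working directory; the prompt wording mirrors the Python usage example above:

```python
from llama_cpp import Llama

# Sketch of local inference with llama-cpp-python (assumes the GGUF file from
# this card has been downloaded locally).
llm = Llama(
    model_path="augustulus-latin-sentiment-8b-q8_0.gguf",
    n_ctx=2048,
    n_gpu_layers=-1,  # offload all layers to GPU if available; use 0 for CPU-only
)

prompt = (
    "Classify the sentiment of this Latin text as: EXTREMELY NEGATIVE, VERY NEGATIVE, "
    "MODERATELY NEGATIVE, NEUTRAL, MODERATELY POSITIVE, VERY POSITIVE, or EXTREMELY POSITIVE.\n"
    "Latin text: Gaudium magnum in urbe erat.\n"
    "Sentiment:"
)
out = llm(prompt, max_tokens=10, temperature=0.0)
print(out["choices"][0]["text"].strip())
```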
---
### Llama 3.1 License Notice
**IMPORTANT**: This model (including the GGUF file) is a derivative of Meta’s **Llama 3.1** model and is governed by the [**Meta Llama 3.1 Community License**](https://llama.meta.com/llama3_1/license).
- **Attribution**: If you redistribute or build products with this model, you must include the statement **“Built with Llama”** in a prominent location (e.g., README, UI footer, about page).
- **Commercial Use**: Allowed without additional permission as long as your product or service has **fewer than 700 million monthly active users**. Above that threshold, you need a separate commercial license from Meta.
See the full license text here: https://llama.meta.com/llama3_1/license
---
### Usage Example (with Ollama)
This workflow uses a custom **Modelfile** to set the strict sentiment task and gives the model a simple local name.
#### Create Modelfile
Save the following content as a file named `Modelfile`:
```text
# Modelfile for the Augustulus Latin Sentiment Model
FROM hf.co/TronCodes/augustulus-latin-sentiment-lora/augustulus-latin-sentiment-8b-q8_0.gguf
SYSTEM """
You are Augustulus, an expert in Classical Latin sentiment analysis. Your task is to respond ONLY with one of the following exact labels: EXTREMELY POSITIVE, VERY POSITIVE, MODERATELY POSITIVE, NEUTRAL, MODERATELY NEGATIVE, VERY NEGATIVE, or EXTREMELY NEGATIVE. Do not provide any conversational text or explanation.
"""
TEMPLATE """
{{ if .System }}<|start_header_id|>system<|end_header_id|>{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
"""
PARAMETER temperature 0.1
PARAMETER num_predict 20
PARAMETER stop "<|eot_id|>"
```
---
#### Create and Run
```bash
ollama create augustulus-latin -f Modelfile
ollama run augustulus-latin
```
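Once created, the model can also be queried one-shot from the shell (the prompt text here is illustrative):

```shell
# One-shot classification without an interactive session
ollama run augustulus-latin "Victoria splendidissima! Dux gloriam aeternam meruit!"
```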
## Acknowledgments
We gratefully acknowledge:
* Dr. Eleni Bozia (Ph.D., Dr. phil.) - Senior Project Advisor
* University of Florida Department of Humanities - Computing resources and support
* Perseus Digital Library - Access to Classical Latin corpus
* Meta AI - Llama 3.1 base model
* HuggingFace - PEFT library and model hosting infrastructure
## Citation
```bibtex
@misc{trojan_parse_latin_sentiment_2025,
  author       = {{Team Trojan Parse}},
  title        = {Augustulus Latin Sentiment Analysis LoRA},
  year         = {2025},
  publisher    = {University of Florida},
  journal      = {HuggingFace Model Hub},
  howpublished = {\url{https://huggingface.co/TronCodes/augustulus-latin-sentiment-lora}}
}
```