|
|
--- |
|
|
license: mit |
|
|
language: |
|
|
- en |
|
|
library_name: peft |
|
|
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct |
|
|
tags: |
|
|
- llama |
|
|
- lora |
|
|
- claim-extraction |
|
|
- fact-checking |
|
|
- news |
|
|
pipeline_tag: text-generation |
|
|
--- |
|
|
|
|
|
# NewsScope LoRA Adapter |
|
|
|
|
|
This repository contains a **LoRA adapter** fine-tuned for **schema-grounded claim extraction** from news articles. |
|
|
|
|
|
It produces structured JSON outputs with: |
|
|
- domain |
|
|
- headline |
|
|
- key_points |
|
|
- whos_involved |
|
|
- how_it_unfolded |
|
|
- claims (2-3 verifiable claims with evidence) |
|
|
|
|
|
## Key Result (Human Evaluation) |
|
|
- **NewsScope:** 89.4% accuracy |
|
|
- **GPT-4o-mini baseline:** 93.7% |
|
|
- Reported difference is not statistically significant (p=0.07) |
|
|
|
|
|
## Important: LLaMA License |
|
|
You must accept the **Meta LLaMA** license for the base model on Hugging Face: |
|
|
`meta-llama/Meta-Llama-3.1-8B-Instruct` |
|
|
|
|
|
Then either: |
|
|
- run `huggingface-cli login`, or |
|
|
- set `HF_TOKEN` in your environment. |
|
|
|
|
|
## Usage |
|
|
```python |
|
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
from peft import PeftModel |
|
|
import torch |
|
|
|
|
|
base = AutoModelForCausalLM.from_pretrained( |
|
|
"meta-llama/Meta-Llama-3.1-8B-Instruct", |
|
|
torch_dtype=torch.float16, |
|
|
device_map="auto", |
|
|
) |
|
|
|
|
|
model = PeftModel.from_pretrained(base_model, "nidhipandya/NewsScope-lora") |
|
|
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct") |
|
|
``` |
|
|
|
|
|
## Training Details |
|
|
- **Base model:** meta-llama/Meta-Llama-3.1-8B-Instruct |
|
|
- **LoRA rank:** 16 |
|
|
- **Training set size:** 315 articles (URLs + annotations; article text not publicly redistributed) |
|
|
- **Notes:** Training reproduction requires fetching article text from URLs due to copyright. |
|
|
|
|
|
## Links |
|
|
- **Code:** https://github.com/nidhip1611/NewsScope |
|
|
- **Benchmark:** GitHub Releases (benchmark.zip) |
|
|
- **Paper:** arXiv (TBD) |
|
|
|
|
|
## Citation |
|
|
```bibtex |
|
|
@article{pandyaNewsscope, |
|
|
title={NewsScope: Schema-Grounded Cross-Domain News Claim Extraction with Open Models}, |
|
|
author={Pandya, Nidhi}, |
|
|
journal={arXiv preprint arXiv:TBD}, |
|
|
year={TBD} |
|
|
} |
|
|
``` |
|
|
|