giuid/flan_t5_large_summarization_v2

Text Generation

text2text-generation


giuid committed on Jan 16, 2025

Commit 6594e64 · verified · 1 Parent(s): aa43926

Create README.md

Files changed (1)

README.md +54 -0

README.md ADDED

	@@ -0,0 +1,54 @@

+---
+language: en
+datasets:
+  - efra
+license: apache-2.0
+tags:
+  - summarization
+  - flan-t5
+  - legal
+  - food
+model_type: t5
+pipeline_tag: text2text-generation
+---
+# Flan-T5 Large Fine-Tuned on EFRA Dataset
+This is a fine-tuned version of [Flan-T5 Large](https://huggingface.co/google/flan-t5-large) on the **EFRA dataset** for summarizing legal documents related to food regulations and policies.
+## Model Description
+Flan-T5 is an instruction-tuned sequence-to-sequence (encoder-decoder) model that treats every task as text-to-text. This fine-tuned version targets summarization of legal text in the domain of food legislation, regulatory requirements, and compliance documents.
+### Fine-Tuning Details
+- **Base Model**: [google/flan-t5-large](https://huggingface.co/google/flan-t5-large)
+- **Dataset**: EFRA (a curated dataset of legal documents in the food domain)
+- **Objective**: Summarization of legal documents
+- **Framework**: Hugging Face Transformers
+## Applications
+This model is suitable for:
+- Summarizing legal texts in the food domain
+- Extracting key information from lengthy regulatory documents
+- Assisting legal professionals and food companies in understanding compliance requirements
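The usage example in this card truncates inputs to 512 tokens, so a lengthy regulatory document would lose everything past that point. One possible workaround, not described by the model author, is to split the token ids into overlapping windows and summarize each window separately. The sketch below shows only the splitting step on plain lists of token ids; the function name `chunk_token_ids` and the `overlap` default of 64 are hypothetical choices made for illustration.

```python
from typing import List


def chunk_token_ids(
    token_ids: List[int], window: int = 512, overlap: int = 64
) -> List[List[int]]:
    """Split token ids into overlapping windows of at most `window` ids.

    Hypothetical helper: `window` mirrors the 512-token limit used in the
    usage example; `overlap` is an assumed value, not taken from the card.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")

    step = window - overlap  # how far each new window advances
    chunks: List[List[int]] = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + window])
        # Stop once the current window already reaches the end of the input.
        if start + window >= len(token_ids):
            break
    return chunks


# Toy demonstration with integers standing in for real token ids.
demo_chunks = chunk_token_ids(list(range(10)), window=4, overlap=1)
print(demo_chunks)
```

Each window could then be decoded back to text and passed through the model, with the partial summaries concatenated or summarized again. Whether that preserves quality on EFRA-style documents is untested here.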
+## Example Usage
+```python
+from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
+# Load the model and tokenizer
+# Repository id as shown on this model's Hub page
+model = AutoModelForSeq2SeqLM.from_pretrained("giuid/flan_t5_large_summarization_v2")
+tokenizer = AutoTokenizer.from_pretrained("giuid/flan_t5_large_summarization_v2")
+# Input text
+input_text = "Your lengthy legal document text here..."
+# Tokenize and generate summary
+inputs = tokenizer(input_text, return_tensors="pt", max_length=512, truncation=True)
+outputs = model.generate(**inputs, max_length=150, num_beams=5, early_stopping=True)
+# Decode summary
+summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print(summary)