---
language: en
license: apache-2.0
tags:
- text-summarization
- meeting-summarization
- t5
- transformers
- qmsum
datasets:
- qmsum
metrics:
- rouge
pipeline_tag: summarization
---

# Meeting Summarizer

This model is a fine-tuned version of `t5-small` for abstractive meeting summarization.

## Model Details

- **Base Model**: t5-small
- **Task**: Abstractive Meeting Summarization
- **Training Data**: QMSum Dataset + Enhanced Training
- **Parameters**: ~60M (t5-small architecture)

## Training Configuration

- **Max Input Length**: 256 tokens
- **Max Output Length**: 64 tokens
- **Batch Size**: 16
- **Learning Rate**: 5e-05
- **Training Epochs**: 1
- **Training Samples**: N/A

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("CodeXRyu/meeting-summarizer")
model = AutoModelForSeq2SeqLM.from_pretrained("CodeXRyu/meeting-summarizer")


def generate_summary(meeting_text, max_length=150):
    # T5 expects a task prefix in front of the input text
    input_text = "summarize: " + meeting_text

    # Inputs longer than 512 tokens are truncated. Note that fine-tuning
    # used a 256-token input limit and a 64-token output limit.
    inputs = tokenizer(input_text, max_length=512, truncation=True, return_tensors="pt")

    # Generate the summary with beam search; passing **inputs forwards
    # both input_ids and attention_mask to the model
    summary_ids = model.generate(
        **inputs,
        max_length=max_length,
        num_beams=4,
        length_penalty=2.0,
        early_stopping=True,
    )

    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)


# Example usage
meeting_transcript = '''
John: Good morning team. Let's discuss our Q3 results.
Sarah: Our sales exceeded targets by 15%, reaching $2.1M in revenue.
Mike: The new marketing campaign was very effective.
John: Great work everyone. Let's plan for Q4.
'''

summary = generate_summary(meeting_transcript)
print(summary)
```

## Training Data

This model was trained on the QMSum dataset, which contains real meeting transcripts from multiple domains:

- Academic meetings
- Product development meetings
- Committee meetings

## Performance

The model achieves competitive ROUGE scores on meeting summarization benchmarks. Specific scores are not reported in this card.

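The card names ROUGE as its metric, so a small illustration of what that metric measures may help when comparing a generated summary against a reference. The sketch below is a simplified ROUGE-1 F1 (clipped unigram overlap) in plain Python. It is not the official ROUGE scorer: it applies no stemming and does not cover ROUGE-2 or ROUGE-L. The function name `rouge1_f1` and the sample sentences are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap, no stemming."""
    ref_tokens = re.findall(r"\w+", reference.lower())
    cand_tokens = re.findall(r"\w+", candidate.lower())
    if not ref_tokens or not cand_tokens:
        return 0.0

    # Counter intersection keeps the minimum count of each shared token,
    # so a word repeated in the candidate is not credited more than once
    # per occurrence in the reference.
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Illustrative strings only; these are not model outputs.
reference = "Sales exceeded Q3 targets by 15 percent."
candidate = "Q3 sales exceeded targets."
print(round(rouge1_f1(reference, candidate), 3))
```

For reportable numbers, an established ROUGE implementation is the safer choice, since tokenization and stemming details change the scores.
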
## Limitations

- Optimized for English meeting transcripts
- Performance may vary on long meetings: the usage example truncates input to 512 tokens, and the model was fine-tuned with a 256-token input limit, so later parts of a long transcript are dropped
- Best suited for structured meeting formats with speaker labels

## Citation

If you use this model, please cite:

```
@misc{meeting-summarizer-codexryu,
  author = {CodeXRyu},
  title = {Meeting Summarizer},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/CodeXRyu/meeting-summarizer}
}
```
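
## Handling Long Transcripts

The input-length limitation noted under Limitations means long meetings get cut off. One possible workaround, sketched here and not part of the original model card, is to split the transcript on speaker turns, summarize each chunk, and join the partial summaries. The helper names `chunk_by_turns` and `summarize_long` are hypothetical, the word count is only a rough proxy for the tokenizer's token count, and the default budget of 150 words is an illustrative guess, not a tuned value.

```python
from typing import Callable, List


def chunk_by_turns(transcript: str, max_words: int = 150) -> List[str]:
    """Group speaker turns (one per line) into chunks of at most max_words words.

    A single turn longer than max_words still becomes its own chunk,
    so downstream truncation can still apply to it.
    """
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for line in transcript.splitlines():
        line = line.strip()
        if not line:
            continue
        n = len(line.split())
        # Start a new chunk when adding this turn would exceed the budget.
        if current and count + n > max_words:
            chunks.append("\n".join(current))
            current, count = [], 0
        current.append(line)
        count += n
    if current:
        chunks.append("\n".join(current))
    return chunks


def summarize_long(
    transcript: str,
    summarize: Callable[[str], str],
    max_words: int = 150,
) -> str:
    """Summarize each chunk independently and join the partial summaries."""
    parts = (summarize(chunk) for chunk in chunk_by_turns(transcript, max_words))
    return " ".join(parts)
```

With the `generate_summary` function from the Usage section, a call would look like `summarize_long(meeting_transcript, generate_summary)`. Chunk-wise summaries lose cross-chunk context, so the joined result may repeat or miss points that span several chunks.
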