Mudasir692 committed a59b78a (verified; parent: 7212a6d)

Update README.md

Files changed (1): README.md (+63, −24)
# Model Card for Bart Urdu Summarizer

This model is designed to summarize Urdu text using the BART architecture, fine-tuned on a custom Urdu summarization dataset.

## Model Details

### Model Description

This model leverages the BART (Bidirectional and Auto-Regressive Transformers) architecture to perform Urdu text summarization. It was fine-tuned on a headline-based Urdu dataset to generate concise and meaningful summaries, and is well suited to tasks such as news summarization, article summarization, and extracting key points from long texts.

- **Developed by:** Mudasir692
- **Model type:** BART
- **Language(s) (NLP):** Urdu
- **License:** MIT
- **Finetuned from model:** facebook/bart-large

### Model Sources

- **Repository:** https://huggingface.co/Mudasir692/bart-urdu-summarizer
## Uses

### Direct Use

This model is intended for generating concise summaries of Urdu text directly from input data.

### Downstream Use

The model can be fine-tuned further for specific Urdu summarization tasks, or adapted for multilingual summarization.

### Out-of-Scope Use

The model may not perform well on highly specialized domains or technical documents without additional fine-tuning, and it is not suitable for summarizing text in languages other than Urdu.

## Bias, Risks, and Limitations

The model may inherit biases from its training data, particularly in topics and vocabulary frequently represented in the dataset. Its summaries may occasionally miss critical context or introduce ambiguities.

### Recommendations

Users should validate the summaries in sensitive applications and consider fine-tuning or additional post-processing for domain-specific requirements.

## How to Get Started with the Model

Use the following code snippet to load the model and tokenizer, pass in Urdu text, and generate a concise summary.
```python
import torch
from transformers import MBart50Tokenizer, MBartForConditionalGeneration

# Load the tokenizer and model
tokenizer = MBart50Tokenizer.from_pretrained("Mudasir692/bart-urdu-summarizer")
model = MBartForConditionalGeneration.from_pretrained("Mudasir692/bart-urdu-summarizer")

# Example input text (Urdu)
input_text = """
تعلیم ایک معاشرتی ترقی کا بنیادی عنصر ہے۔ حالیہ برسوں میں مختلف اداروں نے تعلیمی معیار کو بہتر بنانے اور زیادہ بچوں تک تعلیم کی رسائی ممکن بنانے کے لیے مختلف اقدامات کیے ہیں۔
ان اقدامات میں اسکولوں کی تعداد بڑھانا، اساتذہ کی تربیت میں اضافہ کرنا، اور تعلیمی مواد کی دستیابی کو یقینی بنانا شامل ہے۔ ماہرین کا خیال ہے کہ اگر یہ کوششیں مؤثر طریقے سے کی جائیں تو معاشرتی ترقی میں تیزی لائی جا سکتی ہے۔
"""

# Tokenize the input text
inputs = tokenizer(input_text, return_tensors="pt")

# Generate the summary without tracking gradients
with torch.no_grad():
    outputs = model.generate(**inputs)

# Decode the summary and print the result
summary_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Summary (Urdu):", summary_text)
```
## Training Details

### Training Data

The model was fine-tuned on a custom dataset of Urdu text paired with concise summaries, focusing on headline-based examples. The dataset covered a variety of topics to improve the model's generalization.

### Training Procedure

The model was fine-tuned using standard transformer training practices, including mixed precision, to optimize training efficiency and performance.

#### Training Hyperparameters

- **Training regime:** Mixed precision (fp16)
- **Maximum sequence length:** 512
- **Batch size:** 2
- **Gradient accumulation steps:** 8
- **Learning rate:** 3e-5
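With a per-device batch size of 2 and 8 gradient accumulation steps, each optimizer update effectively sees 16 examples. The update pattern can be sketched framework-agnostically; the function below and its toy scalar gradients are illustrative only, not the actual training script:

```python
def train_with_accumulation(grads, accumulation_steps=8, lr=3e-5):
    """Accumulate per-micro-batch gradients and apply one update every
    `accumulation_steps` micro-batches (batch_size=2 x 8 steps = 16
    effective examples per optimizer step)."""
    weight = 0.0
    accumulated = 0.0
    updates = 0
    for step, g in enumerate(grads, start=1):
        # Scale each micro-batch gradient, like dividing the loss
        # by accumulation_steps before backward()
        accumulated += g / accumulation_steps
        if step % accumulation_steps == 0:
            weight -= lr * accumulated  # optimizer.step()
            accumulated = 0.0           # optimizer.zero_grad()
            updates += 1
    return weight, updates
```

Only the scaling-and-deferred-step pattern carries over to the real fp16 training loop; in practice a gradient scaler also guards against underflow.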
## Evaluation

The model's performance was evaluated using ROUGE metrics, which showed strong alignment between the generated summaries and the reference summaries in the dataset.
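ROUGE scores count n-gram overlap between a generated summary and a reference summary. As a rough illustration of what ROUGE-1 measures, here is a simplified unigram F1 (not the exact evaluation script used for this model):

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap between two
    whitespace-tokenized strings, with clipped counts."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    overlap = sum((cand & ref).values())  # overlapping unigrams, clipped
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# A candidate that drops one word of the reference still scores highly
print(rouge1_f1("تعلیم ترقی کا بنیادی عنصر ہے", "تعلیم ترقی کا عنصر ہے"))
```

Production evaluations typically use a maintained implementation (e.g. the `rouge_score` package) and also report ROUGE-2 and ROUGE-L.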
## Citation

**BibTeX:**

```bibtex
@model{mudasir692_bart_urdu_summarizer,
  author = {Mudasir},
  year = {2024},
  url = {https://huggingface.co/Mudasir692/bart-urdu-summarizer}
}
```

**APA:** Mudasir. (2024). *Bart-Urdu-Summarizer*. Retrieved from https://huggingface.co/Mudasir692/bart-urdu-summarizer