Updated model with improvements

Files changed (6)

README.md +166 -73
adapter_model.safetensors +1 -1
special_tokens_map.json +7 -0
tokenizer.json +7 -2
tokenizer_config.json +2 -1
training_args.bin +3 -0

README.md CHANGED

@@ -1,114 +1,207 @@
 ---
-license: mit
-language:
-- ti
-metrics:
-- perplexity
-- accuracy
 pipeline_tag: text-generation
-library_name: transformers
 tags:
-- gpt2
-- tigriniya
 - lora
-- text-generation-inference
-- causal-lm
-- low_resource
 ---
-# GPT-2 Tigrinya (LoRA Fine-Tuned)
-## Model Details
-- **Developed by**: Abrhaley (Warsaw University of Technology, MSc student)
-- **Model type**: Causal Language Model (decoder-only Transformer, GPT-2 architecture)
-- **Languages**: Tigrinya (`ti`)
-- **License**: MIT
-- **Finetuned from model**: [gpt2](https://huggingface.co/gpt2)
-- **Framework**: [Transformers](https://huggingface.co/transformers), [PEFT](https://github.com/huggingface/peft)
----
-## Model Description
-This model is a **GPT-2 small** fine-tuned using **LoRA (Low-Rank Adaptation)** on a custom Tigrinya dataset.
-It is designed to generate coherent Tigrinya text for tasks such as dialogue, storytelling, and text continuation.
-- **Architecture**: GPT-2 (124M parameters, with LoRA adapters trained on attention layers)
-- **LoRA Config**: r=8, alpha=32, dropout=0.05
-- **Tokenizer**: GPT-2 tokenizer, extended with EOS as padding
----
-## Model Sources
-- **Repository**:  (https://huggingface.co/abrhaley/gpt2-tigrinya-lora)
-- **Training Script**: Hugging Face `Trainer` + PEFT
----
 ## Uses
 ### Direct Use
-- Text generation in Tigrinya
-- Chatbot / dialogue systems
-- Story and content generation
-### Downstream Use
-- Further fine-tuning for domain-specific Tigrinya applications (e.g., news, education, cultural storytelling)
 ### Out-of-Scope Use
-- Generating harmful, offensive, or misleading content
-- Using for critical decision-making without human supervision
----
 ## Bias, Risks, and Limitations
-- The dataset may not fully represent all dialects of Tigrinya.
-- Risk of generating biased, offensive, or incoherent outputs.
-- Not suitable for factual QA or tasks requiring truthfulness.
----
-## Recommendations
-Users should:
-- Verify outputs before real-world use.
-- Avoid sensitive or harmful applications.
----
 ## How to Get Started with the Model
-```python
-from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
-model_id = "abrhaley/gpt2-tigrinya-lora"
-tokenizer = AutoTokenizer.from_pretrained(model_id)
-model = AutoModelForCausalLM.from_pretrained(model_id)
-generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
-prompt = "ኣብ ኣዲስ ኣበባ"
-print(generator(prompt, max_length=100, do_sample=True))
-## Eval Results
-| Metric             | Value   |
-|--------------------|---------|
-| Training loss      | 1.74    |
-| Validation loss    | 1.61    |
-| Training PPL       | 5.73    |
-| Validation PPL     | 5.00    |
-| Runtime (1 epoch)  | ~5.5h   |
-| GPU                | Colab T4 |
-##Citation
-@misc{abrhaley2025gpt2tigrinya,
-  title   = {GPT-2 Tigrinya LoRA Fine-Tuned},
-  author  = {Abrhaley},
-  year    = {2025},
-  url     = {https://huggingface.co/abrhaley/gpt2-tigrinya-lora}
-}

 ---
+base_model: gpt2
+library_name: peft
 pipeline_tag: text-generation
 tags:
+- base_model:adapter:gpt2
 - lora
+- transformers
 ---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
 ## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 ### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
 ### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
 ## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
 ## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.17.1
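
The rewritten card declares `library_name: peft` and `base_model: gpt2`, but its "How to Get Started" section is still a placeholder, and the removed snippet loaded the repository directly with `AutoModelForCausalLM`. Below is a minimal sketch of loading the adapter on top of the base model. It assumes `transformers`, `peft`, and `torch` are installed and the Hub is reachable. The helper names `load_tigrinya_adapter` and `generate` and the generation settings are illustrative and do not come from the commit.

```python
BASE_MODEL_ID = "gpt2"                      # from `base_model: gpt2` in the new card
ADAPTER_ID = "abrhaley/gpt2-tigrinya-lora"  # repository named in the previous card


def load_tigrinya_adapter(base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Hypothetical helper: attach the LoRA adapter to the GPT-2 base model."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(adapter_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout, including LoRA dropout
    return tokenizer, model


def generate(prompt, max_new_tokens=50):
    """Sample a continuation for `prompt` (downloads weights on first call)."""
    tokenizer, model = load_tigrinya_adapter()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `print(generate("ኣብ ኣዲስ ኣበባ"))` reuses the prompt from the previous card. Output is sampled, so it will differ between runs.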

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:60a298bde169f28c779477a2466f32011c3914607ebfdebd48ac0a8b9d6b9aa5
 size 3253104

 version https://git-lfs.github.com/spec/v1
+oid sha256:130792c9a7496b3758e7a1e049ca51698a5776e59e1fc2911ff6c9d8ebd3fa05
 size 3253104
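
Only the `oid` changes in this pointer; the size stays at 3253104 bytes, which is consistent with retrained adapter weights of the same shape. The sketch below parses a Git LFS pointer and checks a downloaded file against it using only the standard library. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical, not part of any Git LFS tooling.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True if `data` (bytes) has the size and SHA-256 digest named in `pointer`."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


# Pointer for the new adapter weights, copied from the diff above.
NEW_POINTER = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:130792c9a7496b3758e7a1e049ca51698a5776e59e1fc2911ff6c9d8ebd3fa05
size 3253104
""")
```

To check a local download, read the file in binary mode and pass its bytes to `matches_pointer(data, NEW_POINTER)`.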

special_tokens_map.json CHANGED

@@ -1,5 +1,12 @@
 {
   "bos_token": "<|endoftext|>",
   "eos_token": "<|endoftext|>",
   "unk_token": "<|endoftext|>"
 }

 {
   "bos_token": "<|endoftext|>",
   "eos_token": "<|endoftext|>",
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
   "unk_token": "<|endoftext|>"
 }

tokenizer.json CHANGED

@@ -1,6 +1,11 @@
 {
   "version": "1.0",
-  "truncation": null,
   "padding": null,
   "added_tokens": [
     {
@@ -9,7 +14,7 @@
       "single_word": false,
       "lstrip": false,
       "rstrip": false,
-      "normalized": true,
       "special": true
     }
   ],

 {
   "version": "1.0",
+  "truncation": {
+    "direction": "Right",
+    "max_length": 128,
+    "strategy": "LongestFirst",
+    "stride": 0
+  },
   "padding": null,
   "added_tokens": [
     {
       "single_word": false,
       "lstrip": false,
       "rstrip": false,
+      "normalized": false,
       "special": true
     }
   ],
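
The new `truncation` block is now serialized into `tokenizer.json`, so it may apply by default when the file is loaded directly with the `tokenizers` library. Its `max_length` of 128 is also well below the `model_max_length` of 1024 in `tokenizer_config.json`. For a single sequence, `"direction": "Right"` with `"max_length": 128` amounts to keeping the first 128 token ids. The sketch below is a plain-Python illustration of that behaviour, not the library's implementation. It ignores overflow chunks and the sequence-pair logic of `LongestFirst`, and the name `truncate_right` is hypothetical.

```python
def truncate_right(token_ids, max_length=128):
    """Drop ids from the right end so at most `max_length` remain.

    Returns (kept, dropped). Illustrative only: mirrors the single-sequence
    effect of the serialized settings, not the tokenizers implementation.
    """
    if max_length < 0:
        raise ValueError("max_length must be non-negative")
    kept = list(token_ids[:max_length])
    dropped = list(token_ids[max_length:])
    return kept, dropped


# A 200-token input keeps ids 0..127 and drops ids 128..199.
kept, dropped = truncate_right(list(range(200)))
```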

tokenizer_config.json CHANGED

@@ -4,7 +4,7 @@
     "50256": {
       "content": "<|endoftext|>",
       "lstrip": false,
-      "normalized": true,
       "rstrip": false,
       "single_word": false,
       "special": true
@@ -15,6 +15,7 @@
   "eos_token": "<|endoftext|>",
   "extra_special_tokens": {},
   "model_max_length": 1024,
   "tokenizer_class": "GPT2Tokenizer",
   "unk_token": "<|endoftext|>"
 }

     "50256": {
       "content": "<|endoftext|>",
       "lstrip": false,
+      "normalized": false,
       "rstrip": false,
       "single_word": false,
       "special": true
   "eos_token": "<|endoftext|>",
   "extra_special_tokens": {},
   "model_max_length": 1024,
+  "pad_token": "<|endoftext|>",
   "tokenizer_class": "GPT2Tokenizer",
   "unk_token": "<|endoftext|>"
 }
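
With `pad_token` set to `<|endoftext|>`, padding and end-of-text share id 50256, so the token id alone cannot tell a genuine EOS from padding; the attention mask has to carry that distinction. The sketch below shows right-padding a batch in plain Python under that assumption. `pad_batch` is a hypothetical helper, not the tokenizer's own padding routine.

```python
PAD_TOKEN_ID = 50256  # id of "<|endoftext|>" per tokenizer_config.json


def pad_batch(sequences, pad_id=PAD_TOKEN_ID):
    """Right-pad token-id lists to equal length and build matching attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        # 1 marks a real token (including a genuine EOS), 0 marks padding.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


# The first sequence ends with a real EOS; the second is padded with the same id.
ids, mask = pad_batch([[1, 2, 50256], [3]])
```

In this batch the trailing 50256 in the first row has mask value 1, while the two 50256 entries appended to the second row have mask value 0.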

training_args.bin ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6d5c486ee6dd50d77356d027e9f5ee758356cb1cd998170456c16034147fb681
+size 5713