sovitrath committed on
Commit 47436f3 · verified · 1 Parent(s): 033f22e

Upload folder using huggingface_hub

Files changed (4):
  1. README.md +174 -158
  2. adapter_config.json +2 -2
  3. adapter_model.safetensors +1 -1
  4. training_args.bin +1 -1
README.md CHANGED
@@ -5,73 +5,97 @@ library_name: peft
  # Model Card for Model ID

- <!-- Provide a quick summary of what the model is/does. -->

  ## Model Details

- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

  ## How to Get Started with the Model

  Use the code below to get started with the model.

- [More Information Needed]

  ## Training Details

@@ -83,120 +107,112 @@ Use the code below to get started with the model.
  ### Training Procedure

- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]

  #### Training Hyperparameters

- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]

  ## Evaluation

- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]

  ## Technical Specifications [optional]

- ### Model Architecture and Objective
-
- [More Information Needed]
-
  ### Compute Infrastructure

- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]
-
- ## Model Card Contact
-
- [More Information Needed]

  ### Framework versions

- - PEFT 0.15.2

  # Model Card for Model ID

+ This is a fine-tuned Phi-3.5 Vision Instruct model for receipt OCR.
+
+ It was fine-tuned on the SROIEv2 dataset; the annotations were generated using Qwen2.5-VL 3B.
+
+ The dataset is **[available on Kaggle](https://www.kaggle.com/datasets/sovitrath/receipt-ocr-input)**.
  ## Model Details

+ - The base model is **[sovitrath/Phi-3.5-vision-instruct](https://huggingface.co/sovitrath/Phi-3.5-vision-instruct)**.
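+
+ This repository contains only the LoRA adapter weights. Besides loading the fine-tuned repo id directly (as in the How to Get Started snippet below), the adapter can also be attached to the base model explicitly with PEFT. A minimal sketch, assuming the base repo above:
+
+ ```python
+ import torch
+
+ from peft import PeftModel
+ from transformers import AutoModelForCausalLM
+
+ # Load the base Phi-3.5 Vision Instruct model first.
+ base_model = AutoModelForCausalLM.from_pretrained(
+     'sovitrath/Phi-3.5-vision-instruct',
+     device_map='auto',
+     torch_dtype=torch.bfloat16,
+     trust_remote_code=True,
+     _attn_implementation='eager',
+ )
+
+ # Attach the LoRA adapter weights from this repository.
+ model = PeftModel.from_pretrained(base_model, 'sovitrath/Phi-3.5-Vision-Instruct-OCR')
+ ```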
  ## How to Get Started with the Model

  Use the code below to get started with the model.
+ ```python
+ import torch
+ import matplotlib.pyplot as plt
+
+ from PIL import Image
+ from transformers import AutoModelForCausalLM, AutoProcessor
+
+ model_id = 'sovitrath/Phi-3.5-Vision-Instruct-OCR'
+
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     device_map='auto',
+     torch_dtype=torch.bfloat16,
+     trust_remote_code=True,
+     _attn_implementation='eager',  # Use `flash_attention_2` on Ampere GPUs and above, `eager` on older GPUs.
+ )
+
+ processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
+
+ test_image = Image.open('../inference_data/image_1.jpeg').convert('RGB')
+
+ plt.figure(figsize=(9, 7))
+ plt.imshow(test_image)
+ plt.show()
+
+ def test(model, processor, image, max_new_tokens=1024, device='cuda'):
+     # Phi-3.5 Vision expects numbered image placeholder tokens in the prompt.
+     placeholder = '<|image_1|>\n'
+     messages = [
+         {
+             'role': 'user',
+             'content': placeholder + 'OCR this image accurately'
+         },
+     ]
+
+     # Prepare the text input by applying the chat template.
+     text_input = processor.tokenizer.apply_chat_template(
+         messages,
+         add_generation_prompt=True,
+         tokenize=False
+     )
+
+     if image.mode != 'RGB':
+         image = image.convert('RGB')
+
+     # Prepare the inputs for the model.
+     model_inputs = processor(
+         text=text_input,
+         images=[image],
+         return_tensors='pt',
+     ).to(device)  # Move inputs to the specified device.
+
+     # Generate text with the model.
+     generated_ids = model.generate(**model_inputs, max_new_tokens=max_new_tokens)
+
+     # Trim the generated ids to remove the input ids.
+     trimmed_generated_ids = [
+         out_ids[len(in_ids):] for in_ids, out_ids in zip(model_inputs.input_ids, generated_ids)
+     ]
+
+     # Decode the output text.
+     output_text = processor.batch_decode(
+         trimmed_generated_ids,
+         skip_special_tokens=True,
+         clean_up_tokenization_spaces=False
+     )
+
+     return output_text[0]  # Return the first decoded output text.
+
+ output = test(model, processor, test_image)
+ print(output)
+ ```
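+
+ On a low-VRAM GPU, the model can optionally be loaded in 4-bit with bitsandbytes (listed under Framework versions). This is a minimal sketch, not the configuration used for training; the NF4 settings here are illustrative defaults:
+
+ ```python
+ from transformers import BitsAndBytesConfig
+
+ # 4-bit NF4 quantization to cut inference-time GPU memory usage.
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type='nf4',
+     bnb_4bit_compute_dtype=torch.bfloat16,
+ )
+
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     quantization_config=bnb_config,
+     device_map='auto',
+     trust_remote_code=True,
+     _attn_implementation='eager',
+ )
+ ```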

  ## Training Details

  ### Training Procedure

+ * The model was fine-tuned for 1200 steps; however, this checkpoint corresponds to the model saved at 400 steps, which gave the best loss.
+ * The text file annotations were generated using Qwen2.5-VL 3B (a minimal sketch follows this list).
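+
+ A minimal sketch of how such annotations could be generated with Qwen2.5-VL 3B. The prompt, generation settings, and file path are assumptions for illustration, not the exact pipeline used to build the dataset:
+
+ ```python
+ import torch
+
+ from PIL import Image
+ from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
+
+ qwen_id = 'Qwen/Qwen2.5-VL-3B-Instruct'
+ qwen = Qwen2_5_VLForConditionalGeneration.from_pretrained(
+     qwen_id, torch_dtype=torch.bfloat16, device_map='auto'
+ )
+ qwen_processor = AutoProcessor.from_pretrained(qwen_id)
+
+ image = Image.open('receipt.jpg').convert('RGB')  # hypothetical input image
+ messages = [{
+     'role': 'user',
+     'content': [
+         {'type': 'image'},
+         {'type': 'text', 'text': 'OCR this image accurately'},
+     ],
+ }]
+
+ # Build the prompt with the chat template and pass the image alongside it.
+ prompt = qwen_processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = qwen_processor(text=[prompt], images=[image], return_tensors='pt').to(qwen.device)
+
+ generated = qwen.generate(**inputs, max_new_tokens=1024)
+ trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated)]
+ annotation = qwen_processor.batch_decode(trimmed, skip_special_tokens=True)[0]
+ print(annotation)  # Save this text as the annotation file for the image.
+ ```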

  #### Training Hyperparameters

+ * It is a LoRA adapter, trained with DoRA enabled (`use_dora=True` below).
+
+ **LoRA configuration:**
+
+ ```python
+ from peft import LoraConfig, get_peft_model
+
+ # Configure LoRA
+ peft_config = LoraConfig(
+     r=8,
+     lora_alpha=16,
+     lora_dropout=0.0,
+     target_modules=['down_proj', 'o_proj', 'k_proj', 'q_proj', 'gate_proj', 'up_proj', 'v_proj'],
+     use_dora=True,
+     init_lora_weights='gaussian'
+ )
+
+ # Apply PEFT model adaptation
+ peft_model = get_peft_model(model, peft_config)
+
+ # Print trainable parameters
+ peft_model.print_trainable_parameters()
+ ```
+
+ **Trainer configuration:**
+
+ ```python
+ import transformers
+
+ # Configure the training arguments. `output_dir` is the checkpoint/log directory.
+ training_args = transformers.TrainingArguments(
+     output_dir=output_dir,
+     logging_dir=output_dir,
+     max_steps=1200,
+     per_device_train_batch_size=1,  # Batch size must be 1 for Phi-3.5 Vision Instruct fine-tuning.
+     per_device_eval_batch_size=1,   # Batch size must be 1 for Phi-3.5 Vision Instruct fine-tuning.
+     gradient_accumulation_steps=4,
+     warmup_steps=50,
+     learning_rate=1e-4,
+     weight_decay=0.01,
+     logging_steps=400,
+     eval_steps=400,
+     save_steps=400,
+     logging_strategy='steps',
+     eval_strategy='steps',
+     save_strategy='steps',
+     save_total_limit=2,
+     optim='adamw_torch_fused',
+     bf16=True,
+     report_to='wandb',
+     remove_unused_columns=False,
+     gradient_checkpointing=True,
+     dataloader_num_workers=4,
+     load_best_model_at_end=True,
+     save_safetensors=True,
+ )
+ ```
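+
+ A minimal sketch of wiring these pieces into a `transformers.Trainer`. The `train_dataset`, `eval_dataset`, and `collate_fn` names are assumptions, since the data pipeline is not shown in this card:
+
+ ```python
+ # Hypothetical wiring; supply your own datasets and a collator that
+ # builds Phi-3.5 Vision inputs (image tensors plus tokenized prompts).
+ trainer = transformers.Trainer(
+     model=peft_model,
+     args=training_args,
+     train_dataset=train_dataset,
+     eval_dataset=eval_dataset,
+     data_collator=collate_fn,
+ )
+ trainer.train()
+ ```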

  ## Evaluation

+ The current best validation loss is **0.377421**.
+
+ The character error rate (CER) on the test set is **0.355**, using the Qwen2.5-VL 3B test annotations as ground truth.
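+
+ A minimal sketch of computing the CER with `jiwer` (listed under Framework versions). The strings below are placeholders:
+
+ ```python
+ import jiwer
+
+ # Placeholder strings; in practice, references are the Qwen2.5-VL 3B test
+ # annotations and predictions come from test(model, processor, image).
+ references = ['TOTAL 12.90', 'INVOICE 0042']
+ predictions = ['TOTAL 12.90', 'INVO1CE 0042']
+
+ cer = jiwer.cer(references, predictions)  # character error rate over the set
+ print(f'CER: {cer:.3f}')
+ ```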

  ## Technical Specifications [optional]

  ### Compute Infrastructure

+ The model was trained on a system with an RTX 3080 GPU (10 GB VRAM), a 10th-generation Intel i7 CPU, and 32 GB of RAM.

  ### Framework versions

+ ```
+ torch==2.5.1
+ torchvision==0.20.1
+ torchaudio==2.5.1
+ flash-attn==2.7.2.post1
+ triton==3.1.0
+ transformers==4.51.3
+ accelerate==1.2.0
+ datasets==4.1.1
+ huggingface-hub==0.31.1
+ peft==0.15.2
+ trl==0.18.0
+ safetensors==0.4.5
+ sentencepiece==0.2.0
+ tiktoken==0.8.0
+ einops==0.8.0
+ opencv-python==4.10.0.84
+ pillow==10.2.0
+ numpy==2.2.0
+ scipy==1.14.1
+ tqdm==4.66.4
+ pandas==2.2.2
+ pyarrow==21.0.0
+ regex==2024.11.6
+ requests==2.32.3
+ python-dotenv==1.1.1
+ wandb==0.22.1
+ rich==13.9.4
+ jiwer==4.0.0
+ bitsandbytes==0.45.0
+ ```
adapter_config.json CHANGED
@@ -18,7 +18,7 @@
   "loftq_config": {},
   "lora_alpha": 16,
   "lora_bias": false,
- "lora_dropout": 0.1,
+ "lora_dropout": 0.0,
   "megatron_config": null,
   "megatron_core": "megatron.core",
   "modules_to_save": null,
@@ -30,8 +30,8 @@
   "v_proj",
   "k_proj",
   "down_proj",
- "gate_proj",
   "q_proj",
+ "gate_proj",
   "o_proj",
   "up_proj"
   ],
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:c17caf0f7db5ab1a99a01645dce61fe1aaf468e0737b8af86cbd0e49fca8ecb7
+ oid sha256:12d72b2be539aaa15a35d3467108ab4020c951d1a03fc33336725a86ad93ac3e
  size 23692472
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:0c3bdaee66245ab8ad4b054eb08da1033954d9be35bbe74700293192ed7f5a4c
+ oid sha256:354760f82ee21230895bce5a4846f7a7b7665f442926237e320afb2996b01373
  size 5304