sophiargh/MNLP_M3_mcqa_model_v2

@@ -4,22 +4,19 @@ license: apache-2.0
 base_model: Qwen/Qwen3-0.6B-Base
 tags:
 - generated_from_trainer
-metrics:
-- accuracy
 model-index:
-- name: MNLP_M3_mcqa_model_2
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-# MNLP_M3_mcqa_model_2
-This model is a fine-tuned version of [Qwen/Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.2953
-- Accuracy: 0.9001
 ## Model description
@@ -38,31 +35,24 @@ More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 1e-05
-- train_batch_size: 2
-- eval_batch_size: 2
 - seed: 42
 - gradient_accumulation_steps: 4
-- total_train_batch_size: 8
 - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
-- lr_scheduler_warmup_ratio: 0.01
 - num_epochs: 4
 ### Training results
-| Training Loss | Epoch  | Step  | Validation Loss | Accuracy |
-|:-------------:|:------:|:-----:|:---------------:|:--------:|
-| 0.2687        | 0.2278 | 1000  | 0.2625          | 0.8896   |
-| 0.2575        | 0.4555 | 2000  | 0.2582          | 0.8926   |
-| 0.2475        | 0.6833 | 3000  | 0.2482          | 0.8975   |
-| 0.2497        | 0.9111 | 4000  | 0.2461          | 0.8968   |
-| 0.1928        | 1.1387 | 5000  | 0.2594          | 0.8999   |
-| 0.1868        | 1.3665 | 6000  | 0.2640          | 0.8997   |
-| 0.1974        | 1.5942 | 7000  | 0.2714          | 0.9001   |
-| 0.1935        | 1.8220 | 8000  | 0.2666          | 0.9006   |
-| 0.171         | 2.0497 | 9000  | 0.2907          | 0.8998   |
-| 0.1628        | 2.2774 | 10000 | 0.2953          | 0.9001   |
 ### Framework versions

 base_model: Qwen/Qwen3-0.6B-Base
 tags:
 - generated_from_trainer
 model-index:
+- name: MNLP_M3_mcqa_model
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+# MNLP_M3_mcqa_model
+This model is a fine-tuned version of [Qwen/Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.0028
 ## Model description
 ### Training hyperparameters
 The following hyperparameters were used during training:
+- learning_rate: 5e-05
+- train_batch_size: 4
+- eval_batch_size: 4
 - seed: 42
 - gradient_accumulation_steps: 4
+- total_train_batch_size: 16
 - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.1
 - num_epochs: 4
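
The `total_train_batch_size` values on both sides of this diff are consistent with the per-device batch size multiplied by `gradient_accumulation_steps`. A minimal sketch of that arithmetic follows. The helper name `effective_batch_size` and the single-device default are assumptions for illustration, not something the card states.

```python
def effective_batch_size(per_device_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_devices: int = 1) -> int:
    """Number of examples contributing to one optimizer step.

    num_devices=1 is an assumption; the card does not list a device count,
    but the reported totals match a single-device run.
    """
    return per_device_batch_size * gradient_accumulation_steps * num_devices


# Values taken from the removed (-) and added (+) hyperparameter lines.
old_total = effective_batch_size(per_device_batch_size=2, gradient_accumulation_steps=4)
new_total = effective_batch_size(per_device_batch_size=4, gradient_accumulation_steps=4)

print(f"old total_train_batch_size: {old_total}")
print(f"new total_train_batch_size: {new_total}")
```

Under these assumptions the commit doubles the number of examples per optimizer step while also raising the learning rate from 1e-05 to 5e-05.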
 ### Training results
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:-----:|:----:|:---------------:|
+| 0.0019        | 1.0   | 2196 | 0.0018          |
+| 0.0011        | 2.0   | 4392 | 0.0018          |
+| 0.0003        | 3.0   | 6588 | 0.0028          |
 ### Framework versions

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:54fd0f1e955c305c66c7781a270405271ae9a2f36b28c95dabe6a7bf6e1a9e75
 size 1192135096

 version https://git-lfs.github.com/spec/v1
+oid sha256:8de2862e84ecf33c3d493b8e4001dbfbe454f101dda137b481388d60eb290120
 size 1192135096
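
The two pointer files above differ only in their `oid` line, which suggests the weights changed while the file size stayed the same. As a rough sketch, the pointer text can be parsed and compared as shown below. The function name `parse_lfs_pointer` is hypothetical, and the parser assumes the simple three-line `key value` layout seen in this diff.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its fields.

    Assumes each non-empty line is "<key> <value>", as in the pointers above.
    """
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the old (-) and new (+) sides of the diff.
OLD_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:54fd0f1e955c305c66c7781a270405271ae9a2f36b28c95dabe6a7bf6e1a9e75
size 1192135096
"""
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:8de2862e84ecf33c3d493b8e4001dbfbe454f101dda137b481388d60eb290120
size 1192135096
"""

old = parse_lfs_pointer(OLD_POINTER)
new = parse_lfs_pointer(NEW_POINTER)

weights_changed = old["digest"] != new["digest"]
same_size = old["size"] == new["size"]
print(f"weights changed: {weights_changed}, same size: {same_size}")
```

An unchanged size with a different digest is what one would expect when the same architecture is retrained and saved in the same dtype.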

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:08165811af69b25e105edc482088f618cb57f80eb53879d985df3b3265b7e0a6
 size 5304

 version https://git-lfs.github.com/spec/v1
+oid sha256:b81453047a38e1b725777fd605757ed38025f96709bcf86ca902845d154f4bb6
 size 5304