sujr committed
Commit 8a3746a · verified · 1 Parent(s): ba9f0b5

Upload folder using huggingface_hub

This view is limited to 50 files because it contains too many changes. See the raw diff for the full change set.

Files changed (50)
  1. checkpoint-1200/README.md +203 -0
  2. checkpoint-1200/adapter_config.json +380 -0
  3. checkpoint-1200/adapter_model.safetensors +3 -0
  4. checkpoint-1200/latest +1 -0
  5. checkpoint-1200/qwen.tiktoken +0 -0
  6. checkpoint-1200/rng_state_0.pth +3 -0
  7. checkpoint-1200/rng_state_1.pth +3 -0
  8. checkpoint-1200/rng_state_2.pth +3 -0
  9. checkpoint-1200/rng_state_3.pth +3 -0
  10. checkpoint-1200/rng_state_4.pth +3 -0
  11. checkpoint-1200/rng_state_5.pth +3 -0
  12. checkpoint-1200/rng_state_6.pth +3 -0
  13. checkpoint-1200/rng_state_7.pth +3 -0
  14. checkpoint-1200/scheduler.pt +3 -0
  15. checkpoint-1200/special_tokens_map.json +3 -0
  16. checkpoint-1200/tokenization_qwen.py +598 -0
  17. checkpoint-1200/tokenizer_config.json +14 -0
  18. checkpoint-1200/trainer_state.json +873 -0
  19. checkpoint-1200/training_args.bin +3 -0
  20. checkpoint-1200/zero_to_fp32.py +587 -0
  21. checkpoint-1600/README.md +203 -0
  22. checkpoint-1600/adapter_config.json +380 -0
  23. checkpoint-1600/adapter_model.safetensors +3 -0
  24. checkpoint-1600/latest +1 -0
  25. checkpoint-1600/qwen.tiktoken +0 -0
  26. checkpoint-1600/rng_state_0.pth +3 -0
  27. checkpoint-1600/rng_state_1.pth +3 -0
  28. checkpoint-1600/rng_state_2.pth +3 -0
  29. checkpoint-1600/rng_state_3.pth +3 -0
  30. checkpoint-1600/rng_state_4.pth +3 -0
  31. checkpoint-1600/rng_state_5.pth +3 -0
  32. checkpoint-1600/rng_state_6.pth +3 -0
  33. checkpoint-1600/rng_state_7.pth +3 -0
  34. checkpoint-1600/scheduler.pt +3 -0
  35. checkpoint-1600/special_tokens_map.json +3 -0
  36. checkpoint-1600/tokenization_qwen.py +598 -0
  37. checkpoint-1600/tokenizer_config.json +14 -0
  38. checkpoint-1600/trainer_state.json +1153 -0
  39. checkpoint-1600/training_args.bin +3 -0
  40. checkpoint-1600/zero_to_fp32.py +587 -0
  41. checkpoint-2000/README.md +203 -0
  42. checkpoint-2000/adapter_config.json +380 -0
  43. checkpoint-2000/adapter_model.safetensors +3 -0
  44. checkpoint-2000/latest +1 -0
  45. checkpoint-2000/qwen.tiktoken +0 -0
  46. checkpoint-2000/rng_state_0.pth +3 -0
  47. checkpoint-2000/rng_state_1.pth +3 -0
  48. checkpoint-2000/rng_state_2.pth +3 -0
  49. checkpoint-2000/rng_state_3.pth +3 -0
  50. checkpoint-2000/rng_state_4.pth +3 -0
checkpoint-1200/README.md ADDED
@@ -0,0 +1,203 @@
+ ---
+ library_name: peft
+ base_model: Qwen/Qwen-VL-Chat
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
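The card's quick-start section is still a placeholder. As a stopgap, here is a minimal, hedged sketch of attaching one of the uploaded adapter checkpoints to the base model with PEFT; the local directory name `checkpoint-1200` comes from this commit's folder layout, and the rest is standard `transformers`/`peft` usage rather than anything confirmed by the card:

```python
# Minimal sketch: load Qwen-VL-Chat and attach this LoRA adapter.
# Assumes the adapter directory has been downloaded locally as "checkpoint-1200".
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL-Chat", trust_remote_code=True, device_map="auto"
)
model = PeftModel.from_pretrained(base, "checkpoint-1200")
model.eval()
```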
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+
+ ### Framework versions
+
+ - PEFT 0.10.0
+ - PEFT 0.11.1
checkpoint-1200/adapter_config.json ADDED
@@ -0,0 +1,380 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "Qwen/Qwen-VL-Chat",
+   "bias": "none",
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 16,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "r": 64,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "transformer.h.16.mlp.w1",
+     "transformer.visual.transformer.resblocks.13.attn.out_proj",
+     "transformer.h.28.mlp.w1",
+     "transformer.h.16.attn.c_attn",
+     "transformer.h.3.mlp.w1",
+     "transformer.visual.transformer.resblocks.29.attn.in_proj",
+     "transformer.visual.transformer.resblocks.19.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.47.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.34.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.4.attn.out_proj",
+     "transformer.h.31.attn.c_attn",
+     "transformer.h.16.mlp.w2",
+     "transformer.visual.transformer.resblocks.5.attn.out_proj",
+     "transformer.h.2.mlp.w1",
+     "transformer.visual.transformer.resblocks.7.attn.in_proj",
+     "transformer.h.20.mlp.w2",
+     "transformer.h.19.mlp.w1",
+     "transformer.visual.transformer.resblocks.18.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.27.attn.out_proj",
+     "transformer.visual.transformer.resblocks.10.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.43.mlp.c_fc",
+     "transformer.h.5.mlp.w1",
+     "transformer.visual.transformer.resblocks.15.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.25.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.10.attn.out_proj",
+     "transformer.visual.transformer.resblocks.4.mlp.c_fc",
+     "transformer.h.31.mlp.w2",
+     "transformer.visual.transformer.resblocks.37.attn.out_proj",
+     "transformer.h.8.attn.c_proj",
+     "transformer.h.29.attn.c_attn",
+     "transformer.visual.transformer.resblocks.24.mlp.c_proj",
+     "transformer.h.19.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.11.attn.out_proj",
+     "transformer.h.13.mlp.c_proj",
+     "transformer.h.27.mlp.c_proj",
+     "transformer.h.31.mlp.w1",
+     "transformer.visual.transformer.resblocks.7.mlp.c_proj",
+     "transformer.h.28.mlp.w2",
+     "transformer.visual.transformer.resblocks.3.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.13.attn.in_proj",
+     "transformer.h.21.attn.c_attn",
+     "transformer.visual.transformer.resblocks.23.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.33.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.42.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.3.attn.in_proj",
+     "transformer.h.13.mlp.w1",
+     "transformer.visual.transformer.resblocks.22.attn.out_proj",
+     "transformer.visual.transformer.resblocks.20.mlp.c_fc",
+     "transformer.h.26.mlp.w2",
+     "transformer.h.14.attn.c_attn",
+     "transformer.h.16.attn.c_proj",
+     "transformer.h.1.mlp.w1",
+     "transformer.visual.transformer.resblocks.21.attn.out_proj",
+     "transformer.visual.transformer.resblocks.39.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.4.attn.in_proj",
+     "transformer.h.29.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.12.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.14.attn.in_proj",
+     "transformer.h.28.attn.c_proj",
+     "transformer.h.18.mlp.w1",
+     "transformer.h.27.mlp.w2",
+     "transformer.h.18.attn.c_attn",
+     "transformer.visual.transformer.resblocks.33.attn.out_proj",
+     "transformer.h.5.mlp.w2",
+     "transformer.visual.transformer.resblocks.37.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.2.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.42.attn.out_proj",
+     "transformer.visual.transformer.resblocks.15.attn.in_proj",
+     "transformer.visual.transformer.resblocks.6.mlp.c_fc",
+     "transformer.h.13.mlp.w2",
+     "transformer.h.23.attn.c_proj",
+     "transformer.h.20.mlp.c_proj",
+     "transformer.h.14.mlp.w2",
+     "transformer.visual.transformer.resblocks.9.attn.in_proj",
+     "transformer.visual.transformer.resblocks.46.attn.in_proj",
+     "transformer.h.9.attn.c_attn",
+     "transformer.visual.transformer.resblocks.36.mlp.c_proj",
+     "transformer.h.31.attn.c_proj",
+     "transformer.visual.transformer.resblocks.19.mlp.c_fc",
+     "transformer.h.17.mlp.w1",
+     "transformer.h.2.attn.c_proj",
+     "transformer.visual.transformer.resblocks.47.attn.in_proj",
+     "transformer.visual.transformer.resblocks.45.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.46.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.27.attn.in_proj",
+     "transformer.visual.transformer.resblocks.26.attn.out_proj",
+     "transformer.h.22.attn.c_proj",
+     "transformer.visual.transformer.resblocks.40.attn.out_proj",
+     "transformer.visual.transformer.resblocks.46.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.18.attn.out_proj",
+     "transformer.h.27.attn.c_proj",
+     "transformer.visual.transformer.resblocks.26.attn.in_proj",
+     "transformer.h.4.mlp.w1",
+     "transformer.h.10.attn.c_proj",
+     "transformer.h.6.attn.c_attn",
+     "transformer.h.2.attn.c_attn",
+     "transformer.h.22.mlp.w1",
+     "transformer.visual.transformer.resblocks.39.mlp.c_fc",
+     "transformer.h.8.mlp.w2",
+     "transformer.h.4.attn.c_attn",
+     "transformer.h.26.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.29.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.5.mlp.c_proj",
+     "transformer.h.11.mlp.c_proj",
+     "transformer.h.0.mlp.w2",
+     "transformer.visual.transformer.resblocks.36.attn.out_proj",
+     "transformer.h.29.mlp.w1",
+     "transformer.h.12.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.2.attn.in_proj",
+     "transformer.visual.transformer.resblocks.2.mlp.c_fc",
+     "transformer.h.25.attn.c_attn",
+     "transformer.visual.transformer.resblocks.19.attn.in_proj",
+     "transformer.visual.transformer.resblocks.43.attn.out_proj",
+     "transformer.visual.transformer.resblocks.35.attn.out_proj",
+     "transformer.h.22.attn.c_attn",
+     "transformer.h.0.mlp.w1",
+     "transformer.h.3.attn.c_attn",
+     "transformer.h.28.attn.c_attn",
+     "transformer.visual.transformer.resblocks.25.attn.in_proj",
+     "transformer.visual.transformer.resblocks.34.attn.out_proj",
+     "transformer.h.21.attn.c_proj",
+     "transformer.h.6.attn.c_proj",
+     "transformer.visual.transformer.resblocks.11.mlp.c_proj",
+     "transformer.h.13.attn.c_attn",
+     "transformer.visual.transformer.resblocks.38.attn.out_proj",
+     "transformer.h.3.attn.c_proj",
+     "transformer.visual.transformer.resblocks.17.mlp.c_fc",
+     "transformer.h.26.mlp.w1",
+     "transformer.visual.transformer.resblocks.36.mlp.c_fc",
+     "transformer.h.26.attn.c_attn",
+     "transformer.visual.transformer.resblocks.29.attn.out_proj",
+     "transformer.h.7.mlp.w1",
+     "transformer.visual.transformer.resblocks.40.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.9.attn.out_proj",
+     "transformer.h.3.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.26.mlp.c_fc",
+     "transformer.h.11.mlp.w2",
+     "transformer.visual.transformer.resblocks.33.attn.in_proj",
+     "transformer.visual.transformer.resblocks.42.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.32.attn.out_proj",
+     "transformer.h.4.attn.c_proj",
+     "transformer.visual.transformer.resblocks.27.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.11.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.25.attn.out_proj",
+     "transformer.visual.transformer.resblocks.23.attn.in_proj",
+     "transformer.h.5.attn.c_attn",
+     "transformer.h.16.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.14.mlp.c_proj",
+     "transformer.h.22.mlp.w2",
+     "transformer.h.25.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.10.mlp.c_fc",
+     "transformer.h.24.mlp.c_proj",
+     "transformer.h.19.mlp.w2",
+     "transformer.h.14.mlp.w1",
+     "transformer.visual.transformer.resblocks.40.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.28.attn.out_proj",
+     "transformer.visual.transformer.resblocks.24.mlp.c_fc",
+     "transformer.h.8.attn.c_attn",
+     "transformer.h.9.mlp.w1",
+     "transformer.h.6.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.19.attn.out_proj",
+     "transformer.visual.transformer.resblocks.32.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.7.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.44.attn.in_proj",
+     "transformer.visual.transformer.resblocks.34.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.9.mlp.c_fc",
+     "transformer.visual.conv1",
+     "transformer.visual.transformer.resblocks.8.attn.out_proj",
+     "transformer.h.23.mlp.w2",
+     "transformer.h.7.mlp.w2",
+     "transformer.h.24.attn.c_proj",
+     "transformer.h.30.attn.c_proj",
+     "transformer.h.29.attn.c_proj",
+     "transformer.visual.transformer.resblocks.9.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.35.attn.in_proj",
+     "transformer.visual.transformer.resblocks.21.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.41.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.38.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.13.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.41.attn.out_proj",
+     "transformer.visual.transformer.resblocks.16.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.45.attn.out_proj",
+     "transformer.h.11.mlp.w1",
+     "transformer.visual.transformer.resblocks.16.attn.in_proj",
+     "transformer.visual.transformer.resblocks.47.attn.out_proj",
+     "transformer.h.9.attn.c_proj",
+     "transformer.h.31.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.12.attn.in_proj",
+     "transformer.visual.transformer.resblocks.28.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.20.attn.out_proj",
+     "transformer.h.12.attn.c_attn",
+     "transformer.h.24.mlp.w1",
+     "transformer.visual.transformer.resblocks.21.attn.in_proj",
+     "transformer.visual.transformer.resblocks.41.attn.in_proj",
+     "transformer.h.10.mlp.w1",
+     "transformer.h.1.mlp.w2",
+     "transformer.h.0.mlp.c_proj",
+     "transformer.h.22.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.18.attn.in_proj",
+     "transformer.visual.transformer.resblocks.38.mlp.c_proj",
+     "transformer.h.12.mlp.w1",
+     "transformer.h.1.attn.c_attn",
+     "transformer.visual.transformer.resblocks.31.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.44.mlp.c_proj",
+     "transformer.h.15.mlp.c_proj",
+     "transformer.h.6.mlp.w1",
+     "transformer.visual.transformer.resblocks.16.mlp.c_proj",
+     "transformer.h.13.attn.c_proj",
+     "transformer.h.15.attn.c_attn",
+     "transformer.h.15.mlp.w1",
+     "transformer.h.17.mlp.w2",
+     "transformer.visual.transformer.resblocks.10.attn.in_proj",
+     "transformer.h.26.attn.c_proj",
+     "transformer.visual.transformer.resblocks.20.attn.in_proj",
+     "transformer.h.10.mlp.w2",
+     "transformer.h.24.attn.c_attn",
+     "transformer.h.8.mlp.w1",
+     "transformer.h.23.mlp.w1",
+     "transformer.visual.transformer.resblocks.1.mlp.c_proj",
+     "transformer.h.4.mlp.w2",
+     "transformer.visual.transformer.resblocks.38.attn.in_proj",
+     "transformer.h.12.mlp.w2",
+     "transformer.h.7.attn.c_proj",
+     "transformer.h.4.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.31.attn.out_proj",
+     "transformer.visual.transformer.resblocks.17.mlp.c_proj",
+     "transformer.h.21.mlp.w2",
+     "transformer.visual.transformer.resblocks.5.attn.in_proj",
+     "transformer.h.18.attn.c_proj",
+     "transformer.visual.transformer.resblocks.31.mlp.c_fc",
+     "transformer.h.18.mlp.w2",
+     "transformer.visual.transformer.resblocks.6.attn.out_proj",
+     "transformer.visual.transformer.resblocks.8.attn.in_proj",
+     "transformer.visual.transformer.resblocks.30.mlp.c_proj",
+     "transformer.h.30.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.30.attn.out_proj",
+     "transformer.visual.transformer.resblocks.16.attn.out_proj",
+     "transformer.visual.transformer.resblocks.14.attn.out_proj",
+     "transformer.h.25.mlp.w1",
+     "transformer.visual.transformer.resblocks.45.attn.in_proj",
+     "transformer.h.11.attn.c_proj",
+     "transformer.visual.transformer.resblocks.30.attn.in_proj",
+     "transformer.visual.transformer.resblocks.43.mlp.c_proj",
+     "transformer.h.10.mlp.c_proj",
+     "transformer.h.21.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.43.attn.in_proj",
+     "transformer.visual.transformer.resblocks.3.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.44.attn.out_proj",
+     "transformer.h.23.attn.c_attn",
+     "transformer.visual.transformer.resblocks.22.attn.in_proj",
+     "transformer.visual.transformer.resblocks.6.attn.in_proj",
+     "transformer.visual.transformer.resblocks.44.mlp.c_fc",
+     "transformer.h.17.attn.c_attn",
+     "transformer.h.7.attn.c_attn",
+     "transformer.visual.transformer.resblocks.42.attn.in_proj",
+     "transformer.visual.transformer.resblocks.20.mlp.c_proj",
+     "transformer.h.8.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.17.attn.out_proj",
+     "transformer.h.14.attn.c_proj",
+     "transformer.visual.transformer.resblocks.40.attn.in_proj",
+     "transformer.h.25.attn.c_proj",
+     "transformer.h.28.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.35.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.36.attn.in_proj",
+     "transformer.visual.transformer.resblocks.41.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.14.mlp.c_fc",
+     "transformer.h.30.mlp.w2",
+     "transformer.h.20.mlp.w1",
+     "transformer.visual.transformer.resblocks.33.mlp.c_fc",
+     "transformer.h.29.mlp.w2",
+     "transformer.visual.transformer.resblocks.47.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.30.mlp.c_fc",
+     "transformer.h.10.attn.c_attn",
+     "transformer.visual.transformer.resblocks.1.attn.in_proj",
+     "transformer.h.1.attn.c_proj",
+     "transformer.visual.transformer.resblocks.8.mlp.c_proj",
+     "transformer.h.19.attn.c_proj",
+     "transformer.visual.transformer.resblocks.37.attn.in_proj",
+     "transformer.h.15.attn.c_proj",
+     "transformer.h.5.attn.c_proj",
+     "transformer.visual.transformer.resblocks.32.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.3.attn.out_proj",
+     "transformer.visual.transformer.resblocks.32.attn.in_proj",
+     "transformer.h.21.mlp.w1",
+     "transformer.h.23.mlp.c_proj",
+     "transformer.h.30.mlp.w1",
+     "transformer.h.0.attn.c_attn",
+     "transformer.visual.transformer.resblocks.24.attn.out_proj",
+     "transformer.visual.transformer.resblocks.31.attn.in_proj",
+     "transformer.h.18.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.25.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.22.mlp.c_fc",
+     "transformer.h.30.attn.c_attn",
+     "transformer.visual.transformer.resblocks.13.mlp.c_fc",
+     "transformer.h.17.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.24.attn.in_proj",
+     "transformer.h.11.attn.c_attn",
+     "transformer.h.2.mlp.w2",
+     "transformer.visual.transformer.resblocks.8.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.0.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.2.attn.out_proj",
+     "transformer.visual.transformer.resblocks.35.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.39.attn.out_proj",
+     "transformer.h.12.attn.c_proj",
+     "transformer.visual.transformer.resblocks.28.attn.in_proj",
+     "transformer.visual.transformer.resblocks.29.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.0.attn.out_proj",
+     "transformer.visual.transformer.resblocks.23.mlp.c_proj",
+     "transformer.h.20.attn.c_attn",
+     "transformer.visual.transformer.resblocks.7.attn.out_proj",
+     "transformer.visual.transformer.resblocks.15.attn.out_proj",
+     "transformer.h.7.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.1.attn.out_proj",
+     "transformer.h.3.mlp.w2",
+     "transformer.h.9.mlp.w2",
+     "transformer.visual.transformer.resblocks.34.attn.in_proj",
+     "transformer.h.27.attn.c_attn",
+     "transformer.visual.transformer.resblocks.12.mlp.c_fc",
+     "transformer.h.6.mlp.w2",
+     "transformer.visual.transformer.resblocks.39.attn.in_proj",
+     "transformer.h.15.mlp.w2",
+     "transformer.visual.transformer.resblocks.18.mlp.c_proj",
+     "transformer.h.0.attn.c_proj",
+     "transformer.h.19.attn.c_attn",
+     "transformer.visual.transformer.resblocks.27.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.23.attn.out_proj",
+     "transformer.h.14.mlp.c_proj",
+     "transformer.h.9.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.12.attn.out_proj",
+     "transformer.visual.transformer.resblocks.0.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.5.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.28.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.6.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.22.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.37.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.17.attn.in_proj",
+     "transformer.visual.transformer.resblocks.46.attn.out_proj",
+     "transformer.h.24.mlp.w2",
+     "transformer.h.27.mlp.w1",
+     "transformer.visual.transformer.resblocks.11.attn.in_proj",
+     "transformer.visual.transformer.resblocks.4.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.21.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.26.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.15.mlp.c_fc",
+     "transformer.h.2.mlp.c_proj",
+     "transformer.h.1.mlp.c_proj",
+     "transformer.h.5.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.45.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.0.attn.in_proj",
+     "transformer.h.25.mlp.w2",
+     "transformer.h.20.attn.c_proj",
+     "transformer.h.17.attn.c_proj",
+     "transformer.visual.transformer.resblocks.1.mlp.c_fc"
+   ],
+   "task_type": "CAUSAL_LM",
+   "use_dora": false,
+   "use_rslora": false
+ }
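The config above describes a LoRA adapter (r=64, alpha=16, dropout 0.05, bias "none") over roughly 350 attention and MLP projections in both the language tower (`transformer.h.*`) and the vision tower (`transformer.visual.*`), plus the vision `conv1`. For orientation, a hedged sketch of expressing the same settings as a `peft.LoraConfig`; the module list is abbreviated here, and in practice you would pass the full list from this file:

```python
# Sketch only: mirror this adapter_config.json as a peft.LoraConfig.
# target_modules is truncated here; the real config enumerates ~350 entries.
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "transformer.h.16.mlp.w1",
        "transformer.visual.transformer.resblocks.13.attn.out_proj",
        # ... remaining modules exactly as listed above ...
    ],
)
```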
checkpoint-1200/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:25433b71645e2f2f21acef45e1cd3dd51471fc7d7d8cbcfa08984f46e78ae8ab
+ size 469105640
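`adapter_model.safetensors` is stored as a Git LFS pointer: the repository records only the spec version, the sha256 object id, and the byte size (~469 MB), while the blob itself lives in LFS storage. A hedged sketch of checking a downloaded copy against the recorded oid (the local path is an assumption):

```python
# Sketch: verify a downloaded LFS object against the pointer's sha256 oid.
import hashlib

# Assumed local path; adjust to wherever the file was downloaded.
with open("checkpoint-1200/adapter_model.safetensors", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

assert digest == "25433b71645e2f2f21acef45e1cd3dd51471fc7d7d8cbcfa08984f46e78ae8ab"
```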
checkpoint-1200/latest ADDED
@@ -0,0 +1 @@
+ global_step1200
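`latest` is DeepSpeed's checkpoint tag file: it names the subdirectory (`global_step1200`) holding the sharded ZeRO states, and the bundled `zero_to_fp32.py` reads it to consolidate those shards. A hedged sketch of recovering a single fp32 state dict with the script's helper (the working directory and checkpoint path are assumptions, and details vary with the DeepSpeed version that produced `zero_to_fp32.py`):

```python
# Sketch: consolidate the sharded ZeRO checkpoint into one fp32 state dict.
# Assumes you run from a directory where the bundled zero_to_fp32.py is importable;
# with tag=None the helper reads the `latest` file (here: global_step1200) itself.
from zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

state_dict = get_fp32_state_dict_from_zero_checkpoint("checkpoint-1200")
```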
checkpoint-1200/qwen.tiktoken ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1200/rng_state_0.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d0e05d703defebb48cb1ce8c7911952ccae578d1a7947d21425f3ff731f0503c
+ size 15920
checkpoint-1200/rng_state_1.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:84b1cb9d1609ea4d4950ef70e57b5a4c92bd381b97235bb8f28c84dd2d1c8b9f
+ size 15920
checkpoint-1200/rng_state_2.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:856080e41a8ab6aae185d671f94419379f1fa3fb0f0e7be7beacb1f897ff85b1
+ size 15920
checkpoint-1200/rng_state_3.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bf81d56a0cdea27b5e3e9186c6df18ce9e3f7be5271892df15accb3df0e0c218
+ size 15920
checkpoint-1200/rng_state_4.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:11b4622ea11d41a3e43b7c396b97a48c41e47f53cd9ee003472fe4ed7d8bcfd6
+ size 15920
checkpoint-1200/rng_state_5.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4ed6bae991517d1ab99fa861cfc1756d30b51a35dccf81c79c6476ebed2ddd93
+ size 15920
checkpoint-1200/rng_state_6.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:27c9309aec78b496fd1d73ec24a274926f4b1442325c3303730b620697588e2e
+ size 15920
checkpoint-1200/rng_state_7.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:82bc476f5997e3852636a20556416202397a1d429d441c40112a9011e79ef517
+ size 15920
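The eight `rng_state_{0..7}.pth` files are per-rank RNG snapshots saved by `transformers.Trainer` so that an 8-GPU run can resume with reproducible data ordering and dropout. A hedged sketch of inspecting one; the key names follow what recent Trainer versions store and may differ across versions:

```python
# Sketch: inspect one per-rank RNG snapshot saved by the Trainer.
import torch

state = torch.load("checkpoint-1200/rng_state_0.pth")
# Typically a dict along the lines of {'python': ..., 'numpy': ..., 'cpu': ..., 'cuda': ...}
print(sorted(state.keys()))
```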
checkpoint-1200/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f4585a0555f6a1312741348f75004d3499afabae4ab299739739d92b9544be0c
+ size 1064
checkpoint-1200/special_tokens_map.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "pad_token": "<|endoftext|>"
+ }
checkpoint-1200/tokenization_qwen.py ADDED
@@ -0,0 +1,598 @@
+ # Copyright (c) Alibaba Cloud.
+ #
+ # This source code is licensed under the license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """Tokenization classes for QWen."""
+
+ import base64
+ import logging
+ import os
+ import requests
+ import unicodedata
+ from typing import Collection, Dict, List, Set, Tuple, Union, Any, Callable, Optional
+
+ import tiktoken
+ import numpy as np
+ from PIL import Image
+ from PIL import ImageFont
+ from PIL import ImageDraw
+ from transformers import PreTrainedTokenizer, AddedToken
+ from transformers.utils import try_to_load_from_cache
+
+ import matplotlib.colors as mcolors
+ from matplotlib.font_manager import FontProperties
+
+ logger = logging.getLogger(__name__)
+
+
+ VOCAB_FILES_NAMES = {"vocab_file": "qwen.tiktoken", "ttf": "SimSun.ttf"}
+ FONT_PATH = try_to_load_from_cache("Qwen/Qwen-VL-Chat", "SimSun.ttf")
+ if FONT_PATH is None:
+     if not os.path.exists("SimSun.ttf"):
+         ttf = requests.get("https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/SimSun.ttf")
+         with open("SimSun.ttf", "wb") as f:
+             f.write(ttf.content)
+     FONT_PATH = "SimSun.ttf"
+
+ PAT_STR = r"""(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+"""
+ ENDOFTEXT = "<|endoftext|>"
+ IMSTART = "<|im_start|>"
+ IMEND = "<|im_end|>"
+ # as the default behavior is changed to allow special tokens in
+ # regular texts, the surface forms of special tokens need to be
+ # as different as possible to minimize the impact
+ EXTRAS = tuple((f"<|extra_{i}|>" for i in range(205)))
+ SPECIAL_TOKENS = (
+     ENDOFTEXT,
+     IMSTART,
+     IMEND,
+ ) + EXTRAS
+ IMG_TOKEN_SPAN = 256
+
+
+ def _load_tiktoken_bpe(tiktoken_bpe_file: str) -> Dict[bytes, int]:
+     with open(tiktoken_bpe_file, "rb") as f:
+         contents = f.read()
+     return {
+         base64.b64decode(token): int(rank)
+         for token, rank in (line.split() for line in contents.splitlines() if line)
+     }
+
+ def _list_find(
+     input_list: List[Any],
+     candidates: Tuple[Any],
+     start: int = 0,
+ ):
+     for i in range(start, len(input_list)):
+         if input_list[i] in candidates:
+             return i
+     return -1
+
+ def _replace_closed_tag(
+     input_tokens: List[Any],
+     start_tags: Union[Any, Tuple[Any]],
+     end_tags: Union[Any, Tuple[Any]],
+     inclusive_replace_func: Callable,
+     exclusive_replace_func: Callable = lambda x: x,
+ ):
+     if isinstance(start_tags, (str, int)):
+         start_tags = (start_tags,)
+     if isinstance(end_tags, (str, int)):
+         end_tags = (end_tags,)
+     assert len(start_tags) == len(end_tags)
+
+     output_tokens = []
+     end = 0
+     while True:
+         start = _list_find(input_tokens, start_tags, end)
+         if start == -1:
+             break
+         output_tokens.extend(exclusive_replace_func(input_tokens[end : start]))
+         tag_idx = start_tags.index(input_tokens[start])
+         end = _list_find(input_tokens, (end_tags[tag_idx],), start)
+         if end == -1:
+             raise ValueError("Unclosed image token")
+         output_tokens.extend(inclusive_replace_func(input_tokens[start : end + 1]))
+         end += 1
+     output_tokens.extend(exclusive_replace_func(input_tokens[end : ]))
+     return output_tokens
+
+ class QWenTokenizer(PreTrainedTokenizer):
+     """QWen tokenizer."""
+
+     vocab_files_names = VOCAB_FILES_NAMES
+
+     def __init__(
+         self,
+         vocab_file,
+         errors="replace",
+         image_start_tag='<img>',
+         image_end_tag='</img>',
+         image_pad_tag='<imgpad>',
+         ref_start_tag='<ref>',
+         ref_end_tag='</ref>',
+         box_start_tag='<box>',
+         box_end_tag='</box>',
+         quad_start_tag='<quad>',
+         quad_end_tag='</quad>',
+         **kwargs,
+     ):
+         self.image_start_tag = image_start_tag
+         self.image_end_tag = image_end_tag
+         self.image_pad_tag = image_pad_tag
+         self.ref_start_tag = ref_start_tag
+         self.ref_end_tag = ref_end_tag
+         self.box_start_tag = box_start_tag
+         self.box_end_tag = box_end_tag
+         self.quad_start_tag = quad_start_tag
+         self.quad_end_tag = quad_end_tag
+         self.IMAGE_ST = (
+             ref_start_tag, ref_end_tag,
+             box_start_tag, box_end_tag,
+             quad_start_tag, quad_end_tag,
+             image_start_tag, image_end_tag,
+             image_pad_tag
+         )
+         super().__init__(**kwargs)
+
+         self.errors = errors  # how to handle errors in decoding
+
+         self.mergeable_ranks = _load_tiktoken_bpe(vocab_file)  # type: dict[bytes, int]
+         self.special_tokens = {
+             token: index
+             for index, token in enumerate(
+                 SPECIAL_TOKENS + self.IMAGE_ST, start=len(self.mergeable_ranks)
+             )
+         }
+         self.img_start_id = self.special_tokens[self.image_start_tag]
+         self.img_end_id = self.special_tokens[self.image_end_tag]
+         self.img_pad_id = self.special_tokens[self.image_pad_tag]
+         self.ref_start_id = self.special_tokens[self.ref_start_tag]
+         self.ref_end_id = self.special_tokens[self.ref_end_tag]
+         self.box_start_id = self.special_tokens[self.box_start_tag]
+         self.box_end_id = self.special_tokens[self.box_end_tag]
+         self.quad_start_id = self.special_tokens[self.quad_start_tag]
+         self.quad_end_id = self.special_tokens[self.quad_end_tag]
+         self.image_special_tokens = set([
+             self.ref_start_id, self.ref_end_id, self.box_start_id, self.box_end_id,
+             self.quad_start_id, self.quad_end_id,
+         ])
+
+         enc = tiktoken.Encoding(
+             "Qwen",
+             pat_str=PAT_STR,
+             mergeable_ranks=self.mergeable_ranks,
+             special_tokens=self.special_tokens,
+         )
+         assert (
+             len(self.mergeable_ranks) + len(self.special_tokens) == enc.n_vocab
+         ), f"{len(self.mergeable_ranks) + len(self.special_tokens)} != {enc.n_vocab} in encoding"
+
+         self.decoder = {
+             v: k for k, v in self.mergeable_ranks.items()
+         }  # type: dict[int, bytes|str]
+         self.decoder.update({v: k for k, v in self.special_tokens.items()})
+
+         self.tokenizer = enc  # type: tiktoken.Encoding
+
+         self.eod_id = self.tokenizer.eot_token
+         self.im_start_id = self.special_tokens[IMSTART]
+         self.im_end_id = self.special_tokens[IMEND]
+
+     def __getstate__(self):
+         # for pickle lovers
+         state = self.__dict__.copy()
+         del state['tokenizer']
+         return state
+
+     def __setstate__(self, state):
+         # tokenizer is not python native; don't pass it; rebuild it
+         self.__dict__.update(state)
+         enc = tiktoken.Encoding(
+             "Qwen",
+             pat_str=PAT_STR,
+             mergeable_ranks=self.mergeable_ranks,
+             special_tokens=self.special_tokens,
+         )
+         self.tokenizer = enc
+
+
+     def __len__(self) -> int:
+         return self.tokenizer.n_vocab
+
+     def get_vocab(self) -> Dict[bytes, int]:
+         return self.mergeable_ranks
+
+     def convert_tokens_to_ids(
+         self, tokens: Union[bytes, str, List[Union[bytes, str]]]
+     ) -> List[int]:
+         ids = []
+         if isinstance(tokens, (str, bytes)):
+             if tokens in self.special_tokens:
+                 return self.special_tokens[tokens]
+             else:
+                 return self.mergeable_ranks.get(tokens)
+         for token in tokens:
+             if token in self.special_tokens:
+                 ids.append(self.special_tokens[token])
+             else:
+                 ids.append(self.mergeable_ranks.get(token))
+         return ids
+
+     def _add_tokens(self, new_tokens: Union[List[str], List[AddedToken]], special_tokens: bool = False) -> int:
+         if not special_tokens and new_tokens:
+             raise ValueError('Adding regular tokens is not supported')
+         for token in new_tokens:
+             surface_form = token.content if isinstance(token, AddedToken) else token
+             if surface_form not in SPECIAL_TOKENS + self.IMAGE_ST:
+                 raise ValueError('Adding unknown special tokens is not supported')
+         return 0
+
+     def save_vocabulary(self, save_directory: str, **kwargs) -> Tuple[str]:
+         """
+         Save only the vocabulary of the tokenizer (the BPE ranks).
+
+         Returns:
+             `Tuple(str)`: Paths to the files saved.
+         """
+         file_path = os.path.join(save_directory, "qwen.tiktoken")
+         with open(file_path, "w", encoding="utf8") as w:
+             for k, v in self.mergeable_ranks.items():
+                 line = base64.b64encode(k).decode("utf8") + " " + str(v) + "\n"
+                 w.write(line)
+         return (file_path,)
+
+     def tokenize(
+         self,
+         text: str,
+         allowed_special: Union[Set, str] = "all",
+         disallowed_special: Union[Collection, str] = (),
+         **kwargs,
+     ) -> List[Union[bytes, str]]:
+         """
+         Converts a string into a sequence of tokens.
+
+         Args:
+             text (`str`):
+                 The sequence to be encoded.
+             allowed_special (`Literal["all"]` or `set`):
+                 The surface forms of the tokens to be encoded as special tokens in regular texts.
+                 Default to "all".
+             disallowed_special (`Literal["all"]` or `Collection`):
+                 The surface forms of the tokens that should not be in regular texts and trigger errors.
+                 Default to an empty tuple.
+
+             kwargs (additional keyword arguments, *optional*):
+                 Will be passed to the underlying model specific encode method.
+
+         Returns:
+             `List[bytes|str]`: The list of tokens.
+         """
+         tokens = []
+         text = unicodedata.normalize("NFC", text)
+
+         # this implementation takes a detour: text -> token id -> token surface forms
+         for t in self.tokenizer.encode(
+             text, allowed_special=allowed_special, disallowed_special=disallowed_special
+         ):
+             tokens.append(self.decoder[t])
+
+         def _encode_imgurl(img_tokens):
+             assert img_tokens[0] == self.image_start_tag and img_tokens[-1] == self.image_end_tag
+             img_tokens = img_tokens[1:-1]
+             img_url = b''.join(img_tokens)
+             out_img_tokens = list(map(self.decoder.get, img_url))
+             if len(out_img_tokens) > IMG_TOKEN_SPAN:
+                 raise ValueError("The content in {}..{} is too long".format(
+                     self.image_start_tag, self.image_end_tag))
+             out_img_tokens.extend([self.image_pad_tag] * (IMG_TOKEN_SPAN - len(out_img_tokens)))
+             out_img_tokens = [self.image_start_tag] + out_img_tokens + [self.image_end_tag]
+             return out_img_tokens
+
+         return _replace_closed_tag(tokens, self.image_start_tag, self.image_end_tag, _encode_imgurl)
+
+     def convert_tokens_to_string(self, tokens: List[Union[bytes, str]]) -> str:
+         """
+         Converts a sequence of tokens into a single string.
+         """
+         text = ""
+         temp = b""
+         for t in tokens:
+             if isinstance(t, str):
+                 if temp:
+                     text += temp.decode("utf-8", errors=self.errors)
+                     temp = b""
+                 text += t
+             elif isinstance(t, bytes):
+                 temp += t
+             else:
+                 raise TypeError("token should only be of type bytes or str")
+         if temp:
+             text += temp.decode("utf-8", errors=self.errors)
+         return text
+
+     @property
+     def vocab_size(self):
+         return self.tokenizer.n_vocab
+
+     def _convert_id_to_token(self, index: int) -> Union[bytes, str]:
+         """Converts an id to a token, special tokens included"""
+         if index in self.decoder:
+             return self.decoder[index]
+         raise ValueError("unknown ids")
+
+     def _convert_token_to_id(self, token: Union[bytes, str]) -> int:
+         """Converts a token to an id using the vocab, special tokens included"""
+         if token in self.special_tokens:
+             return self.special_tokens[token]
+         if token in self.mergeable_ranks:
+             return self.mergeable_ranks[token]
+         raise ValueError("unknown token")
+
+     def _tokenize(self, text: str, **kwargs):
+         """
+         Converts a string into a sequence of tokens (string), using the tokenizer. Split in words for word-based
+         vocabulary or sub-words for sub-word-based vocabularies (BPE/SentencePieces/WordPieces).
+
+         Do NOT take care of added tokens.
+         """
+         raise NotImplementedError
+
+     def _decode(
+         self,
+         token_ids: Union[int, List[int]],
+         skip_special_tokens: bool = False,
+         errors: str = None,
+         **kwargs,
+     ) -> str:
+         if isinstance(token_ids, int):
+             token_ids = [token_ids]
+
+         def _decode_imgurl(img_token_ids):
+             assert img_token_ids[0] == self.img_start_id and img_token_ids[-1] == self.img_end_id
+             img_token_ids = img_token_ids[1:-1]
+             img_token_ids = img_token_ids[ : img_token_ids.index(self.img_pad_id)]
+             img_url = bytes(img_token_ids).decode('utf-8')
+             return [self.img_start_id] + self.tokenizer.encode(img_url) + [self.img_end_id]
+
+         token_ids = _replace_closed_tag(token_ids, self.img_start_id, self.img_end_id, _decode_imgurl)
+
+         if skip_special_tokens:
+             if kwargs.get('keep_image_special', False):
+                 token_ids = [i for i in token_ids if i < self.eod_id
+                              or i in self.image_special_tokens]
+             else:
+                 token_ids = [i for i in token_ids if i < self.eod_id]
+         return self.tokenizer.decode(token_ids, errors=errors or self.errors)
+
+     def to_list_format(self, text: str):
+         text = unicodedata.normalize("NFC", text)
+         token_ids = self.tokenizer.encode(
+             text, allowed_special=set(self.IMAGE_ST + (ENDOFTEXT,)))
+
+         def _encode_vl_info(tokens):
+             if len(tokens) == 0:
+                 return []
+             if tokens[0] == self.img_start_id and tokens[-1] == self.img_end_id:
+                 key = 'image'
+             elif tokens[0] == self.ref_start_id and tokens[-1] == self.ref_end_id:
+                 key = 'ref'
+             elif tokens[0] == self.box_start_id and tokens[-1] == self.box_end_id:
+                 key = 'box'
+             elif tokens[0] == self.quad_start_id and tokens[-1] == self.quad_end_id:
+                 key = 'quad'
+             else:
+                 _tobytes = lambda x: x.encode('utf-8') if isinstance(x, str) else x
+                 return [{'text': b''.join(map(_tobytes, map(self.decoder.get, tokens))).decode('utf-8')}]
+             _tobytes = lambda x: x.encode('utf-8') if isinstance(x, str) else x
+             val = b''.join(map(_tobytes, map(self.decoder.get, tokens[1:-1]))).decode('utf-8')
+             return [{key: val}]
+
+         return _replace_closed_tag(
+             token_ids,
+             (self.img_start_id, self.ref_start_id, self.box_start_id, self.quad_start_id),
+             (self.img_end_id, self.ref_end_id, self.box_end_id, self.quad_end_id),
+             _encode_vl_info,
+             _encode_vl_info,
+         )
+
+     def from_list_format(self, list_format: List[Dict]):
+         text = ''
+         num_images = 0
+         for ele in list_format:
+             if 'image' in ele:
+                 num_images += 1
+                 text += f'Picture {num_images}: '
+                 text += self.image_start_tag + ele['image'] + self.image_end_tag
+                 text += '\n'
+             elif 'text' in ele:
+                 text += ele['text']
+             elif 'box' in ele:
+                 if 'ref' in ele:
+                     text += self.ref_start_tag + ele['ref'] + self.ref_end_tag
+                 for box in ele['box']:
+                     text += self.box_start_tag + '(%d,%d),(%d,%d)' % (box[0], box[1], box[2], box[3]) + self.box_end_tag
+             else:
+                 raise ValueError("Unsupported element: " + str(ele))
+         return text
+
+     def _fetch_latest_picture(self, response, history):
+         if history is None:
+             history = []
+         _history = history + [(response, None)]
+         for q, r in _history[::-1]:
+             for ele in self.to_list_format(q)[::-1]:
+                 if 'image' in ele:
+                     return ele['image']
+         return None
+
+     def _fetch_all_box_with_ref(self, text):
+         list_format = self.to_list_format(text)
+         output = []
+         for i, ele in enumerate(list_format):
+             if 'box' in ele:
+                 bbox = tuple(map(int, ele['box'].replace('(', '').replace(')', '').split(',')))
+                 assert len(bbox) == 4
+                 output.append({'box': bbox})
+                 if i > 0 and 'ref' in list_format[i-1]:
+                     output[-1]['ref'] = list_format[i-1]['ref'].strip()
+         return output
+
+     def draw_bbox_on_latest_picture(
+         self,
+         response,
+         history=None,
+     ) -> Optional[Image.Image]:
+         image = self._fetch_latest_picture(response, history)
+         if image is None:
+             return None
+         if image.startswith("http://") or image.startswith("https://"):
+             image = Image.open(requests.get(image, stream=True).raw).convert("RGB")
+             h, w = image.height, image.width
+         else:
+             image = np.asarray(Image.open(image).convert("RGB"))
+             h, w = image.shape[0], image.shape[1]
+         visualizer = Visualizer(image)
+
+         boxes = self._fetch_all_box_with_ref(response)
+         if not boxes:
+             return None
+         color = random.choice([_ for _ in mcolors.TABLEAU_COLORS.keys()])  # init color
+         for box in boxes:
+             if 'ref' in box:  # random new color for new refexps
+                 color = random.choice([_ for _ in mcolors.TABLEAU_COLORS.keys()])
+             x1, y1, x2, y2 = box['box']
+             x1, y1, x2, y2 = (int(x1 / 1000 * w), int(y1 / 1000 * h), int(x2 / 1000 * w), int(y2 / 1000 * h))
+             visualizer.draw_box((x1, y1, x2, y2), alpha=1, edge_color=color)
+             if 'ref' in box:
+                 visualizer.draw_text(box['ref'], (x1, y1), color=color, horizontal_alignment="left")
+         return visualizer.output
+
+
+ # The imports below support the Visualizer helpers used by draw_bbox_on_latest_picture;
+ # they execute at module import time, so `random` above resolves correctly.
+ import colorsys
+ import logging
+ import math
+ import numpy as np
+ import matplotlib as mpl
+ import matplotlib.colors as mplc
+ import matplotlib.figure as mplfigure
+ import torch
+ from matplotlib.backends.backend_agg import FigureCanvasAgg
+ from PIL import Image
+ import random
+
+ logger = logging.getLogger(__name__)
+
+
+ class VisImage:
+     def __init__(self, img, scale=1.0):
+         self.img = img
+         self.scale = scale
+         self.width, self.height = img.shape[1], img.shape[0]
+         self._setup_figure(img)
+
+     def _setup_figure(self, img):
+         fig = mplfigure.Figure(frameon=False)
+         self.dpi = fig.get_dpi()
+         # add a small 1e-2 to avoid precision lost due to matplotlib's truncation
+         # (https://github.com/matplotlib/matplotlib/issues/15363)
+         fig.set_size_inches(
+             (self.width * self.scale + 1e-2) / self.dpi,
+             (self.height * self.scale + 1e-2) / self.dpi,
+         )
+         self.canvas = FigureCanvasAgg(fig)
+         # self.canvas = mpl.backends.backend_cairo.FigureCanvasCairo(fig)
+         ax = fig.add_axes([0.0, 0.0, 1.0, 1.0])
+         ax.axis("off")
+         self.fig = fig
+         self.ax = ax
+         self.reset_image(img)
+
+     def reset_image(self, img):
+         img = img.astype("uint8")
+         self.ax.imshow(img, extent=(0, self.width, self.height, 0), interpolation="nearest")
+
+     def save(self, filepath):
+         self.fig.savefig(filepath)
+
+     def get_image(self):
+         canvas = self.canvas
+         s, (width, height) = canvas.print_to_buffer()
+
+         buffer = np.frombuffer(s, dtype="uint8")
+
+         img_rgba = buffer.reshape(height, width, 4)
+         rgb, alpha = np.split(img_rgba, [3], axis=2)
+         return rgb.astype("uint8")
+
+
+ class Visualizer:
+     def __init__(self, img_rgb, metadata=None, scale=1.0):
+         self.img = np.asarray(img_rgb).clip(0, 255).astype(np.uint8)
+         self.font_path = FONT_PATH
+         self.output = VisImage(self.img, scale=scale)
+         self.cpu_device = torch.device("cpu")
+
+         # too small texts are useless, therefore clamp to a minimum font size
+         self._default_font_size = max(
+             np.sqrt(self.output.height * self.output.width) // 30, 15 // scale
+         )
+
+     def draw_text(
+         self,
+         text,
+         position,
+         *,
+         font_size=None,
+         color="g",
+         horizontal_alignment="center",
+         rotation=0,
+     ):
+         if not font_size:
+             font_size = self._default_font_size
+
+         # since the text background is dark, we don't want the text to be dark
+         color = np.maximum(list(mplc.to_rgb(color)), 0.2)
+         color[np.argmax(color)] = max(0.8, np.max(color))
+
+         x, y = position
+         self.output.ax.text(
+             x,
+             y,
+             text,
+             size=font_size * self.output.scale,
+             fontproperties=FontProperties(fname=self.font_path),
+             bbox={"facecolor": "black", "alpha": 0.8, "pad": 0.7, "edgecolor": "none"},
+             verticalalignment="top",
+             horizontalalignment=horizontal_alignment,
+             color=color,
+             zorder=10,
+             rotation=rotation,
+         )
+         return self.output
+
+     def draw_box(self, box_coord, alpha=0.5, edge_color="g", line_style="-"):
+
+         x0, y0, x1, y1 = box_coord
+         width = x1 - x0
+         height = y1 - y0
+
+         linewidth = max(self._default_font_size / 4, 1)
+
+         self.output.ax.add_patch(
+             mpl.patches.Rectangle(
+                 (x0, y0),
+                 width,
+                 height,
+                 fill=False,
+                 edgecolor=edge_color,
+                 linewidth=linewidth * self.output.scale,
+                 alpha=alpha,
+                 linestyle=line_style,
+             )
+         )
+         return self.output
+
+     def get_output(self):
+
+         return self.output
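For orientation, a hedged usage sketch of the multimodal helpers defined above; the image URL and prompt are placeholders, not content from this repository:

```python
# Sketch: exercise QWenTokenizer's list-format helpers.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)

# from_list_format interleaves <img>...</img> spans with plain text.
query = tokenizer.from_list_format([
    {"image": "https://example.com/demo.jpeg"},  # placeholder URL
    {"text": "What is in this picture?"},
])

# to_list_format parses such a string back into structured elements.
print(tokenizer.to_list_format(query))
```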
checkpoint-1200/tokenizer_config.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "added_tokens_decoder": {},
+   "auto_map": {
+     "AutoTokenizer": [
+       "Qwen/Qwen-VL-Chat--tokenization_qwen.QWenTokenizer",
+       null
+     ]
+   },
+   "clean_up_tokenization_spaces": true,
+   "model_max_length": 768,
+   "pad_token": "<|endoftext|>",
+   "padding_side": "right",
+   "tokenizer_class": "QWenTokenizer"
+ }
checkpoint-1200/trainer_state.json ADDED
@@ -0,0 +1,873 @@
+ {
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 0.07831364615284213,
+ "eval_steps": 500,
+ "global_step": 1200,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.0006526137179403511,
+ "grad_norm": 17.690582114691438,
+ "learning_rate": 1.948051948051948e-06,
+ "loss": 1.3559,
+ "step": 10
+ },
+ {
+ "epoch": 0.0013052274358807021,
+ "grad_norm": 7.768088366444893,
+ "learning_rate": 3.896103896103896e-06,
+ "loss": 1.2706,
+ "step": 20
+ },
+ {
+ "epoch": 0.001957841153821053,
+ "grad_norm": 7.705313536090087,
+ "learning_rate": 5.844155844155845e-06,
+ "loss": 1.3781,
+ "step": 30
+ },
+ {
+ "epoch": 0.0026104548717614043,
+ "grad_norm": 34.39078827766783,
+ "learning_rate": 7.792207792207792e-06,
+ "loss": 1.2749,
+ "step": 40
+ },
+ {
+ "epoch": 0.0032630685897017554,
+ "grad_norm": 68.28824334896528,
+ "learning_rate": 9.74025974025974e-06,
+ "loss": 1.2955,
+ "step": 50
+ },
+ {
+ "epoch": 0.003915682307642106,
+ "grad_norm": 14.220322607917241,
+ "learning_rate": 1.168831168831169e-05,
+ "loss": 1.2315,
+ "step": 60
+ },
+ {
+ "epoch": 0.0045682960255824575,
+ "grad_norm": 12.611848231734811,
+ "learning_rate": 1.3636363636363637e-05,
+ "loss": 1.0953,
+ "step": 70
+ },
+ {
+ "epoch": 0.0052209097435228086,
+ "grad_norm": 6.055664298727015,
+ "learning_rate": 1.5584415584415583e-05,
+ "loss": 1.105,
+ "step": 80
+ },
+ {
+ "epoch": 0.00587352346146316,
+ "grad_norm": 3.52269227801977,
+ "learning_rate": 1.753246753246753e-05,
+ "loss": 0.9563,
+ "step": 90
+ },
+ {
+ "epoch": 0.006526137179403511,
+ "grad_norm": 10.771884023354394,
+ "learning_rate": 1.948051948051948e-05,
+ "loss": 0.9523,
+ "step": 100
+ },
+ {
+ "epoch": 0.007178750897343862,
+ "grad_norm": 33.41476483216757,
+ "learning_rate": 2.1428571428571428e-05,
+ "loss": 0.832,
+ "step": 110
+ },
+ {
+ "epoch": 0.007831364615284213,
+ "grad_norm": 31.120240364617406,
+ "learning_rate": 2.337662337662338e-05,
+ "loss": 0.8376,
+ "step": 120
+ },
+ {
+ "epoch": 0.008483978333224564,
+ "grad_norm": 5.517231564060886,
+ "learning_rate": 2.5324675324675325e-05,
+ "loss": 0.8293,
+ "step": 130
+ },
+ {
+ "epoch": 0.009136592051164915,
+ "grad_norm": 4.311605388342058,
+ "learning_rate": 2.7272727272727273e-05,
+ "loss": 0.8295,
+ "step": 140
+ },
+ {
+ "epoch": 0.009789205769105266,
+ "grad_norm": 6.997724163121519,
+ "learning_rate": 2.922077922077922e-05,
+ "loss": 0.7662,
+ "step": 150
+ },
+ {
+ "epoch": 0.010441819487045617,
+ "grad_norm": 6.517836234400708,
+ "learning_rate": 2.999998841890695e-05,
+ "loss": 0.8158,
+ "step": 160
+ },
+ {
+ "epoch": 0.011094433204985968,
+ "grad_norm": 4.186989141019666,
+ "learning_rate": 2.99999176456253e-05,
+ "loss": 0.8037,
+ "step": 170
+ },
+ {
+ "epoch": 0.01174704692292632,
+ "grad_norm": 5.181546943355458,
+ "learning_rate": 2.9999782533305785e-05,
+ "loss": 0.7274,
+ "step": 180
+ },
+ {
+ "epoch": 0.01239966064086667,
+ "grad_norm": 3.767076521211455,
+ "learning_rate": 2.9999583082527935e-05,
+ "loss": 0.7474,
+ "step": 190
+ },
+ {
+ "epoch": 0.013052274358807021,
+ "grad_norm": 18.84416377940188,
+ "learning_rate": 2.999931929414726e-05,
+ "loss": 0.7708,
+ "step": 200
+ },
+ {
+ "epoch": 0.013704888076747372,
+ "grad_norm": 3.169160630444992,
+ "learning_rate": 2.999899116929522e-05,
+ "loss": 0.8279,
+ "step": 210
+ },
+ {
+ "epoch": 0.014357501794687724,
+ "grad_norm": 1.912782077307437,
+ "learning_rate": 2.999859870937924e-05,
+ "loss": 0.7407,
+ "step": 220
+ },
+ {
+ "epoch": 0.015010115512628075,
+ "grad_norm": 3.3906505952914974,
+ "learning_rate": 2.9998141916082696e-05,
+ "loss": 0.7732,
+ "step": 230
+ },
+ {
+ "epoch": 0.015662729230568426,
+ "grad_norm": 2.7144492322383584,
+ "learning_rate": 2.999762079136491e-05,
+ "loss": 0.7272,
+ "step": 240
+ },
+ {
+ "epoch": 0.01631534294850878,
+ "grad_norm": 7.109330196029837,
+ "learning_rate": 2.9997035337461135e-05,
+ "loss": 0.7748,
+ "step": 250
+ },
+ {
+ "epoch": 0.016967956666449128,
+ "grad_norm": 1.6054280593801813,
+ "learning_rate": 2.9996385556882555e-05,
+ "loss": 0.7676,
+ "step": 260
+ },
+ {
+ "epoch": 0.01762057038438948,
+ "grad_norm": 10.883212441614672,
+ "learning_rate": 2.9995671452416274e-05,
+ "loss": 0.735,
+ "step": 270
+ },
+ {
+ "epoch": 0.01827318410232983,
+ "grad_norm": 3.511064886507805,
+ "learning_rate": 2.999489302712529e-05,
+ "loss": 0.7741,
+ "step": 280
+ },
+ {
+ "epoch": 0.018925797820270183,
+ "grad_norm": 3.618603818375307,
+ "learning_rate": 2.9994050284348497e-05,
+ "loss": 0.749,
+ "step": 290
+ },
+ {
+ "epoch": 0.019578411538210532,
+ "grad_norm": 6.012944880342178,
+ "learning_rate": 2.9993143227700668e-05,
+ "loss": 0.7411,
+ "step": 300
+ },
+ {
+ "epoch": 0.020231025256150885,
+ "grad_norm": 2.348670372295822,
+ "learning_rate": 2.9992171861072428e-05,
+ "loss": 0.7394,
+ "step": 310
+ },
+ {
+ "epoch": 0.020883638974091234,
+ "grad_norm": 4.728309497649916,
+ "learning_rate": 2.9991136188630263e-05,
+ "loss": 0.8077,
+ "step": 320
+ },
+ {
+ "epoch": 0.021536252692031587,
+ "grad_norm": 15.611917863290122,
+ "learning_rate": 2.9990036214816467e-05,
+ "loss": 0.7209,
+ "step": 330
+ },
+ {
+ "epoch": 0.022188866409971936,
+ "grad_norm": 3.7315277354070817,
+ "learning_rate": 2.998887194434916e-05,
+ "loss": 0.7101,
+ "step": 340
+ },
+ {
+ "epoch": 0.02284148012791229,
+ "grad_norm": 6.618759094750745,
+ "learning_rate": 2.998764338222222e-05,
+ "loss": 0.7759,
+ "step": 350
+ },
+ {
+ "epoch": 0.02349409384585264,
+ "grad_norm": 6.770044306239603,
+ "learning_rate": 2.998635053370533e-05,
+ "loss": 0.7398,
+ "step": 360
+ },
+ {
+ "epoch": 0.02414670756379299,
+ "grad_norm": 12.471224202357552,
+ "learning_rate": 2.998499340434389e-05,
+ "loss": 0.7046,
+ "step": 370
+ },
+ {
+ "epoch": 0.02479932128173334,
+ "grad_norm": 4.147359416986547,
+ "learning_rate": 2.9983571999959013e-05,
+ "loss": 0.761,
+ "step": 380
+ },
+ {
+ "epoch": 0.025451934999673693,
+ "grad_norm": 34.84722866603778,
+ "learning_rate": 2.9982086326647533e-05,
+ "loss": 0.757,
+ "step": 390
+ },
+ {
+ "epoch": 0.026104548717614043,
+ "grad_norm": 5.245498180313093,
+ "learning_rate": 2.998053639078193e-05,
+ "loss": 0.7536,
+ "step": 400
+ },
+ {
+ "epoch": 0.026757162435554396,
+ "grad_norm": 36.55990241841121,
+ "learning_rate": 2.997892219901034e-05,
+ "loss": 0.7395,
+ "step": 410
+ },
+ {
+ "epoch": 0.027409776153494745,
+ "grad_norm": 5.03198653806696,
+ "learning_rate": 2.9977243758256494e-05,
+ "loss": 0.7208,
+ "step": 420
+ },
+ {
+ "epoch": 0.028062389871435098,
+ "grad_norm": 11.376914733036081,
+ "learning_rate": 2.997550107571972e-05,
+ "loss": 0.719,
+ "step": 430
+ },
+ {
+ "epoch": 0.028715003589375447,
+ "grad_norm": 2.958119684662306,
+ "learning_rate": 2.9973694158874898e-05,
+ "loss": 0.7271,
+ "step": 440
+ },
+ {
+ "epoch": 0.0293676173073158,
+ "grad_norm": 6.037096737490817,
+ "learning_rate": 2.9971823015472418e-05,
+ "loss": 0.7356,
+ "step": 450
+ },
+ {
+ "epoch": 0.03002023102525615,
+ "grad_norm": 5.3042973640363575,
+ "learning_rate": 2.9969887653538164e-05,
+ "loss": 0.7207,
+ "step": 460
+ },
+ {
+ "epoch": 0.030672844743196502,
+ "grad_norm": 2.4985603001745624,
+ "learning_rate": 2.996788808137347e-05,
+ "loss": 0.7769,
+ "step": 470
+ },
+ {
+ "epoch": 0.03132545846113685,
+ "grad_norm": 7.607065841315647,
+ "learning_rate": 2.9965824307555084e-05,
+ "loss": 0.7091,
+ "step": 480
+ },
+ {
+ "epoch": 0.03197807217907721,
+ "grad_norm": 4.322533035107957,
+ "learning_rate": 2.9963696340935144e-05,
+ "loss": 0.7114,
+ "step": 490
+ },
+ {
+ "epoch": 0.03263068589701756,
+ "grad_norm": 5.878565903250334,
+ "learning_rate": 2.9961504190641108e-05,
+ "loss": 0.7284,
+ "step": 500
+ },
+ {
+ "epoch": 0.033283299614957906,
+ "grad_norm": 5.0026507027119855,
+ "learning_rate": 2.9959247866075764e-05,
+ "loss": 0.6992,
+ "step": 510
+ },
+ {
+ "epoch": 0.033935913332898256,
+ "grad_norm": 7.12632150273901,
+ "learning_rate": 2.9956927376917137e-05,
+ "loss": 0.7285,
+ "step": 520
+ },
+ {
+ "epoch": 0.03458852705083861,
+ "grad_norm": 5.211123255860348,
+ "learning_rate": 2.9954542733118496e-05,
+ "loss": 0.7511,
+ "step": 530
+ },
+ {
+ "epoch": 0.03524114076877896,
+ "grad_norm": 9.925273547498618,
+ "learning_rate": 2.995209394490827e-05,
+ "loss": 0.7699,
+ "step": 540
+ },
+ {
+ "epoch": 0.03589375448671931,
+ "grad_norm": 7.418381681996765,
+ "learning_rate": 2.9949581022790025e-05,
+ "loss": 0.759,
+ "step": 550
+ },
+ {
+ "epoch": 0.03654636820465966,
+ "grad_norm": 4.352380973507467,
+ "learning_rate": 2.9947003977542423e-05,
+ "loss": 0.7537,
+ "step": 560
+ },
+ {
+ "epoch": 0.037198981922600016,
+ "grad_norm": 9.712842120769198,
+ "learning_rate": 2.9944362820219167e-05,
+ "loss": 0.7063,
+ "step": 570
+ },
+ {
+ "epoch": 0.037851595640540366,
+ "grad_norm": 5.757600819230482,
+ "learning_rate": 2.994165756214895e-05,
+ "loss": 0.7893,
+ "step": 580
+ },
+ {
+ "epoch": 0.038504209358480715,
+ "grad_norm": 5.529209601152462,
+ "learning_rate": 2.9938888214935426e-05,
+ "loss": 0.6771,
+ "step": 590
+ },
+ {
+ "epoch": 0.039156823076421064,
+ "grad_norm": 10.550479346499758,
+ "learning_rate": 2.9936054790457127e-05,
+ "loss": 0.737,
+ "step": 600
+ },
+ {
+ "epoch": 0.03980943679436142,
+ "grad_norm": 8.284279553451016,
+ "learning_rate": 2.9933157300867437e-05,
+ "loss": 0.7182,
+ "step": 610
+ },
+ {
+ "epoch": 0.04046205051230177,
+ "grad_norm": 8.18511648646326,
+ "learning_rate": 2.9930195758594542e-05,
+ "loss": 0.6901,
+ "step": 620
+ },
+ {
+ "epoch": 0.04111466423024212,
+ "grad_norm": 14.569754827631956,
+ "learning_rate": 2.9927170176341365e-05,
+ "loss": 0.7008,
+ "step": 630
+ },
+ {
+ "epoch": 0.04176727794818247,
+ "grad_norm": 4.214581273685441,
+ "learning_rate": 2.992408056708551e-05,
+ "loss": 0.7489,
+ "step": 640
+ },
+ {
+ "epoch": 0.042419891666122825,
+ "grad_norm": 10.038596627079452,
+ "learning_rate": 2.9920926944079224e-05,
+ "loss": 0.7649,
+ "step": 650
+ },
+ {
+ "epoch": 0.043072505384063174,
+ "grad_norm": 2.386544029221306,
+ "learning_rate": 2.9917709320849305e-05,
+ "loss": 0.7223,
+ "step": 660
+ },
+ {
+ "epoch": 0.043725119102003523,
+ "grad_norm": 8.286359254511249,
+ "learning_rate": 2.9914427711197096e-05,
+ "loss": 0.7089,
+ "step": 670
+ },
+ {
+ "epoch": 0.04437773281994387,
+ "grad_norm": 4.235819327444911,
+ "learning_rate": 2.9911082129198372e-05,
+ "loss": 0.7138,
+ "step": 680
+ },
+ {
+ "epoch": 0.04503034653788423,
+ "grad_norm": 5.187338033698449,
+ "learning_rate": 2.9907672589203316e-05,
+ "loss": 0.7192,
+ "step": 690
+ },
+ {
+ "epoch": 0.04568296025582458,
+ "grad_norm": 6.360475337181379,
+ "learning_rate": 2.9904199105836443e-05,
+ "loss": 0.7094,
+ "step": 700
+ },
+ {
+ "epoch": 0.04633557397376493,
+ "grad_norm": 4.906400836156689,
+ "learning_rate": 2.990066169399654e-05,
+ "loss": 0.654,
+ "step": 710
+ },
+ {
+ "epoch": 0.04698818769170528,
+ "grad_norm": 17.600495314130633,
+ "learning_rate": 2.9897060368856603e-05,
+ "loss": 0.7299,
+ "step": 720
+ },
+ {
+ "epoch": 0.04764080140964563,
+ "grad_norm": 7.765935941492389,
+ "learning_rate": 2.989339514586377e-05,
+ "loss": 0.7486,
+ "step": 730
+ },
+ {
+ "epoch": 0.04829341512758598,
+ "grad_norm": 7.30026395137639,
+ "learning_rate": 2.9889666040739252e-05,
+ "loss": 0.6941,
+ "step": 740
+ },
+ {
+ "epoch": 0.04894602884552633,
+ "grad_norm": 4.676985481218465,
+ "learning_rate": 2.9885873069478275e-05,
+ "loss": 0.7701,
+ "step": 750
+ },
+ {
+ "epoch": 0.04959864256346668,
+ "grad_norm": 42.50656974727186,
+ "learning_rate": 2.9882016248350006e-05,
+ "loss": 0.7428,
+ "step": 760
+ },
+ {
+ "epoch": 0.05025125628140704,
+ "grad_norm": 3.9893667031114766,
+ "learning_rate": 2.9878095593897474e-05,
+ "loss": 0.7204,
+ "step": 770
+ },
+ {
+ "epoch": 0.05090386999934739,
+ "grad_norm": 8.909028486553332,
+ "learning_rate": 2.9874111122937518e-05,
+ "loss": 0.7336,
+ "step": 780
+ },
+ {
+ "epoch": 0.051556483717287736,
+ "grad_norm": 5.256925284136456,
+ "learning_rate": 2.9870062852560698e-05,
+ "loss": 0.7674,
+ "step": 790
+ },
+ {
+ "epoch": 0.052209097435228086,
+ "grad_norm": 5.835535487534073,
+ "learning_rate": 2.986595080013123e-05,
+ "loss": 0.7547,
+ "step": 800
+ },
+ {
+ "epoch": 0.05286171115316844,
+ "grad_norm": 4.7337998648314565,
+ "learning_rate": 2.9861774983286913e-05,
+ "loss": 0.7412,
+ "step": 810
+ },
+ {
+ "epoch": 0.05351432487110879,
+ "grad_norm": 4.020304406250962,
+ "learning_rate": 2.9857535419939053e-05,
+ "loss": 0.7351,
+ "step": 820
+ },
+ {
+ "epoch": 0.05416693858904914,
+ "grad_norm": 7.005748568175158,
+ "learning_rate": 2.9853232128272367e-05,
+ "loss": 0.7146,
+ "step": 830
+ },
+ {
+ "epoch": 0.05481955230698949,
+ "grad_norm": 12.598315147497464,
+ "learning_rate": 2.984886512674494e-05,
+ "loss": 0.7066,
+ "step": 840
+ },
+ {
+ "epoch": 0.055472166024929846,
+ "grad_norm": 5.636755294839953,
+ "learning_rate": 2.9844434434088114e-05,
+ "loss": 0.8033,
+ "step": 850
+ },
+ {
+ "epoch": 0.056124779742870196,
+ "grad_norm": 2.5964949457129305,
+ "learning_rate": 2.9839940069306436e-05,
+ "loss": 0.718,
+ "step": 860
+ },
+ {
+ "epoch": 0.056777393460810545,
+ "grad_norm": 5.496060434333994,
+ "learning_rate": 2.9835382051677548e-05,
+ "loss": 0.7382,
+ "step": 870
+ },
+ {
+ "epoch": 0.057430007178750894,
+ "grad_norm": 3.367511777906771,
+ "learning_rate": 2.9830760400752117e-05,
+ "loss": 0.7049,
+ "step": 880
+ },
+ {
+ "epoch": 0.05808262089669125,
+ "grad_norm": 12.228282751386294,
+ "learning_rate": 2.9826075136353762e-05,
+ "loss": 0.7135,
+ "step": 890
+ },
+ {
+ "epoch": 0.0587352346146316,
+ "grad_norm": 7.426066867205744,
+ "learning_rate": 2.9821326278578955e-05,
+ "loss": 0.6966,
+ "step": 900
+ },
+ {
+ "epoch": 0.05938784833257195,
+ "grad_norm": 5.720080945169142,
+ "learning_rate": 2.981651384779693e-05,
+ "loss": 0.7325,
+ "step": 910
+ },
+ {
+ "epoch": 0.0600404620505123,
+ "grad_norm": 3.3362738196336275,
+ "learning_rate": 2.9811637864649622e-05,
+ "loss": 0.7013,
+ "step": 920
+ },
+ {
+ "epoch": 0.060693075768452655,
+ "grad_norm": 5.5481143050516675,
+ "learning_rate": 2.980669835005154e-05,
+ "loss": 0.7107,
+ "step": 930
+ },
+ {
+ "epoch": 0.061345689486393004,
+ "grad_norm": 2.7247889305754533,
+ "learning_rate": 2.980169532518971e-05,
+ "loss": 0.6839,
+ "step": 940
+ },
+ {
+ "epoch": 0.06199830320433335,
+ "grad_norm": 12.705144630158374,
+ "learning_rate": 2.9796628811523576e-05,
+ "loss": 0.7061,
+ "step": 950
+ },
+ {
+ "epoch": 0.0626509169222737,
+ "grad_norm": 3.1174966376805777,
+ "learning_rate": 2.9791498830784896e-05,
+ "loss": 0.706,
+ "step": 960
+ },
+ {
+ "epoch": 0.06330353064021406,
+ "grad_norm": 6.454819870022971,
+ "learning_rate": 2.9786305404977657e-05,
+ "loss": 0.6901,
+ "step": 970
+ },
+ {
+ "epoch": 0.06395614435815442,
+ "grad_norm": 8.62099817289566,
+ "learning_rate": 2.9781048556377982e-05,
+ "loss": 0.6737,
+ "step": 980
+ },
+ {
+ "epoch": 0.06460875807609476,
+ "grad_norm": 12.649532843245389,
+ "learning_rate": 2.977572830753404e-05,
+ "loss": 0.6777,
+ "step": 990
+ },
+ {
+ "epoch": 0.06526137179403511,
+ "grad_norm": 5.019508830810828,
+ "learning_rate": 2.9770344681265925e-05,
+ "loss": 0.7125,
+ "step": 1000
+ },
+ {
+ "epoch": 0.06591398551197546,
+ "grad_norm": 5.417114630539967,
+ "learning_rate": 2.9764897700665595e-05,
+ "loss": 0.7558,
+ "step": 1010
+ },
+ {
+ "epoch": 0.06656659922991581,
+ "grad_norm": 13.487574757960102,
+ "learning_rate": 2.975938738909674e-05,
+ "loss": 0.7305,
+ "step": 1020
+ },
+ {
+ "epoch": 0.06721921294785617,
+ "grad_norm": 4.115297871929447,
+ "learning_rate": 2.97538137701947e-05,
+ "loss": 0.7382,
+ "step": 1030
+ },
+ {
+ "epoch": 0.06787182666579651,
+ "grad_norm": 4.218133725965425,
+ "learning_rate": 2.974817686786636e-05,
+ "loss": 0.7131,
+ "step": 1040
+ },
+ {
+ "epoch": 0.06852444038373687,
+ "grad_norm": 23.754945260227526,
+ "learning_rate": 2.9742476706290044e-05,
+ "loss": 0.6854,
+ "step": 1050
+ },
+ {
+ "epoch": 0.06917705410167722,
+ "grad_norm": 9.992382581534882,
+ "learning_rate": 2.973671330991541e-05,
+ "loss": 0.7224,
+ "step": 1060
+ },
+ {
+ "epoch": 0.06982966781961757,
+ "grad_norm": 9.022842665053004,
+ "learning_rate": 2.973088670346336e-05,
+ "loss": 0.69,
+ "step": 1070
+ },
+ {
+ "epoch": 0.07048228153755792,
+ "grad_norm": 7.180693480173149,
+ "learning_rate": 2.97249969119259e-05,
+ "loss": 0.6752,
+ "step": 1080
+ },
+ {
+ "epoch": 0.07113489525549826,
+ "grad_norm": 4.631581340679664,
+ "learning_rate": 2.9719043960566088e-05,
+ "loss": 0.7078,
+ "step": 1090
+ },
+ {
+ "epoch": 0.07178750897343862,
+ "grad_norm": 3.8365551360021497,
+ "learning_rate": 2.9713027874917867e-05,
+ "loss": 0.7455,
+ "step": 1100
+ },
+ {
+ "epoch": 0.07244012269137898,
+ "grad_norm": 20.612721990589407,
+ "learning_rate": 2.9706948680785984e-05,
+ "loss": 0.7123,
+ "step": 1110
+ },
+ {
+ "epoch": 0.07309273640931932,
+ "grad_norm": 8.515913036269723,
+ "learning_rate": 2.9700806404245893e-05,
+ "loss": 0.6755,
+ "step": 1120
+ },
+ {
+ "epoch": 0.07374535012725968,
+ "grad_norm": 8.702591994450561,
+ "learning_rate": 2.9694601071643607e-05,
+ "loss": 0.743,
+ "step": 1130
+ },
+ {
+ "epoch": 0.07439796384520003,
+ "grad_norm": 20.204623397644042,
+ "learning_rate": 2.968833270959562e-05,
+ "loss": 0.6995,
+ "step": 1140
+ },
+ {
+ "epoch": 0.07505057756314037,
+ "grad_norm": 3.4150625200259563,
+ "learning_rate": 2.9682001344988768e-05,
+ "loss": 0.7245,
+ "step": 1150
+ },
+ {
+ "epoch": 0.07570319128108073,
+ "grad_norm": 4.827412673105033,
+ "learning_rate": 2.967560700498013e-05,
+ "loss": 0.6764,
+ "step": 1160
+ },
+ {
+ "epoch": 0.07635580499902107,
+ "grad_norm": 5.9778449783108965,
+ "learning_rate": 2.9669149716996897e-05,
+ "loss": 0.7094,
+ "step": 1170
+ },
+ {
+ "epoch": 0.07700841871696143,
+ "grad_norm": 4.626419468156439,
+ "learning_rate": 2.9662629508736278e-05,
+ "loss": 0.7139,
+ "step": 1180
+ },
+ {
+ "epoch": 0.07766103243490179,
+ "grad_norm": 8.23953369228554,
+ "learning_rate": 2.9656046408165344e-05,
+ "loss": 0.7132,
+ "step": 1190
+ },
+ {
+ "epoch": 0.07831364615284213,
+ "grad_norm": 5.755275462407804,
+ "learning_rate": 2.964940044352095e-05,
+ "loss": 0.6923,
+ "step": 1200
+ }
+ ],
+ "logging_steps": 10,
+ "max_steps": 15323,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 1,
+ "save_steps": 400,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 3.280284708293837e+18,
+ "train_batch_size": 8,
+ "trial_name": null,
+ "trial_params": null
+ }
checkpoint-1200/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a3a6a5052a9445cc570063f5939fdeea3ff8007e9c2718674bb335b9eea0bfff
+ size 6520
checkpoint-1200/zero_to_fp32.py ADDED
@@ -0,0 +1,587 @@
+ #!/usr/bin/env python
+
+ # Copyright (c) Microsoft Corporation.
+ # SPDX-License-Identifier: Apache-2.0
+
+ # DeepSpeed Team
+
+ # This script extracts fp32 consolidated weights from ZeRO 1, 2 and 3 DeepSpeed checkpoints. It gets
+ # copied into the top level checkpoint dir, so the user can easily do the conversion at any point in
+ # the future. Once extracted, the weights don't require DeepSpeed and can be used in any
+ # application.
+ #
+ # example: python zero_to_fp32.py . pytorch_model.bin
+
+ import argparse
+ import torch
+ import glob
+ import math
+ import os
+ import re
+ from collections import OrderedDict
+ from dataclasses import dataclass
+
+ # while this script doesn't use deepspeed to recover data, since the checkpoints are pickled with
+ # DeepSpeed data structures it has to be available in the current python environment.
+ from deepspeed.utils import logger
+ from deepspeed.checkpoint.constants import (DS_VERSION, OPTIMIZER_STATE_DICT, SINGLE_PARTITION_OF_FP32_GROUPS,
+                                             FP32_FLAT_GROUPS, ZERO_STAGE, PARTITION_COUNT, PARAM_SHAPES, BUFFER_NAMES,
+                                             FROZEN_PARAM_SHAPES, FROZEN_PARAM_FRAGMENTS)
+
+
+ @dataclass
+ class zero_model_state:
+     buffers: dict()
+     param_shapes: dict()
+     shared_params: list
+     ds_version: int
+     frozen_param_shapes: dict()
+     frozen_param_fragments: dict()
+
+
+ debug = 0
+
+ # load to cpu
+ device = torch.device('cpu')
+
+
+ def atoi(text):
+     return int(text) if text.isdigit() else text
+
+
+ def natural_keys(text):
+     '''
+     alist.sort(key=natural_keys) sorts in human order
+     http://nedbatchelder.com/blog/200712/human_sorting.html
+     (See Toothy's implementation in the comments)
+     '''
+     return [atoi(c) for c in re.split(r'(\d+)', text)]
+
+
+ def get_model_state_file(checkpoint_dir, zero_stage):
+     if not os.path.isdir(checkpoint_dir):
+         raise FileNotFoundError(f"Directory '{checkpoint_dir}' doesn't exist")
+
+     # there should be only one file
+     if zero_stage <= 2:
+         file = os.path.join(checkpoint_dir, "mp_rank_00_model_states.pt")
+     elif zero_stage == 3:
+         file = os.path.join(checkpoint_dir, "zero_pp_rank_0_mp_rank_00_model_states.pt")
+
+     if not os.path.exists(file):
+         raise FileNotFoundError(f"can't find model states file at '{file}'")
+
+     return file
+
+
+ def get_checkpoint_files(checkpoint_dir, glob_pattern):
+     # XXX: need to test that this simple glob rule works for multi-node setup too
+     ckpt_files = sorted(glob.glob(os.path.join(checkpoint_dir, glob_pattern)), key=natural_keys)
+
+     if len(ckpt_files) == 0:
+         raise FileNotFoundError(f"can't find {glob_pattern} files in directory '{checkpoint_dir}'")
+
+     return ckpt_files
+
+
+ def get_optim_files(checkpoint_dir):
+     return get_checkpoint_files(checkpoint_dir, "*_optim_states.pt")
+
+
+ def get_model_state_files(checkpoint_dir):
+     return get_checkpoint_files(checkpoint_dir, "*_model_states.pt")
+
+
+ def parse_model_states(files):
+     zero_model_states = []
+     for file in files:
+         state_dict = torch.load(file, map_location=device)
+
+         if BUFFER_NAMES not in state_dict:
+             raise ValueError(f"{file} is not a model state checkpoint")
+         buffer_names = state_dict[BUFFER_NAMES]
+         if debug:
+             print("Found buffers:", buffer_names)
+
+         # recover just the buffers while restoring them to fp32 if they were saved in fp16
+         buffers = {k: v.float() for k, v in state_dict["module"].items() if k in buffer_names}
+         param_shapes = state_dict[PARAM_SHAPES]
+
+         # collect parameters that are included in param_shapes
+         param_names = []
+         for s in param_shapes:
+             for name in s.keys():
+                 param_names.append(name)
+
+         # update with frozen parameters
+         frozen_param_shapes = state_dict.get(FROZEN_PARAM_SHAPES, None)
+         if frozen_param_shapes is not None:
+             if debug:
+                 print(f"Found frozen_param_shapes: {frozen_param_shapes}")
+             param_names += list(frozen_param_shapes.keys())
+
+         # handle shared params
+         shared_params = [[k, v] for k, v in state_dict["shared_params"].items()]
+
+         ds_version = state_dict.get(DS_VERSION, None)
+
+         frozen_param_fragments = state_dict.get(FROZEN_PARAM_FRAGMENTS, None)
+
+         z_model_state = zero_model_state(buffers=buffers,
+                                          param_shapes=param_shapes,
+                                          shared_params=shared_params,
+                                          ds_version=ds_version,
+                                          frozen_param_shapes=frozen_param_shapes,
+                                          frozen_param_fragments=frozen_param_fragments)
+         zero_model_states.append(z_model_state)
+
+     return zero_model_states
+
+
+ def parse_optim_states(files, ds_checkpoint_dir):
+
+     total_files = len(files)
+     state_dicts = []
+     for f in files:
+         state_dict = torch.load(f, map_location=device)
+         # immediately discard the potentially huge 2 optimizer states as we only care for fp32 master weights
+         # and also handle the case where it was already removed by another helper script
+         state_dict["optimizer_state_dict"].pop("optimizer_state_dict", None)
+         state_dicts.append(state_dict)
+
+     if not ZERO_STAGE in state_dicts[0][OPTIMIZER_STATE_DICT]:
+         raise ValueError(f"{files[0]} is not a zero checkpoint")
+     zero_stage = state_dicts[0][OPTIMIZER_STATE_DICT][ZERO_STAGE]
+     world_size = state_dicts[0][OPTIMIZER_STATE_DICT][PARTITION_COUNT]
+
+     # For ZeRO-2 each param group can have different partition_count as data parallelism for expert
+     # parameters can be different from data parallelism for non-expert parameters. So we can just
+     # use the max of the partition_count to get the dp world_size.
+
+     if type(world_size) is list:
+         world_size = max(world_size)
+
+     if world_size != total_files:
+         raise ValueError(
+             f"Expected {world_size} of '*_optim_states.pt' under '{ds_checkpoint_dir}' but found {total_files} files. "
+             "Possibly due to an overwrite of an old checkpoint, or a checkpoint didn't get saved by one or more processes."
+         )
+
+     # the groups are named differently in each stage
+     if zero_stage <= 2:
+         fp32_groups_key = SINGLE_PARTITION_OF_FP32_GROUPS
+     elif zero_stage == 3:
+         fp32_groups_key = FP32_FLAT_GROUPS
+     else:
+         raise ValueError(f"unknown zero stage {zero_stage}")
+
+     if zero_stage <= 2:
+         fp32_flat_groups = [state_dicts[i][OPTIMIZER_STATE_DICT][fp32_groups_key] for i in range(len(state_dicts))]
+     elif zero_stage == 3:
+         # if there is more than one param group, there will be multiple flattened tensors - one
+         # flattened tensor per group - for simplicity merge them into a single tensor
+         #
+         # XXX: could make the script more memory efficient for when there are multiple groups - it
+         # will require matching the sub-lists of param_shapes for each param group flattened tensor
+
+         fp32_flat_groups = [
+             torch.cat(state_dicts[i][OPTIMIZER_STATE_DICT][fp32_groups_key], 0) for i in range(len(state_dicts))
+         ]
+
+     return zero_stage, world_size, fp32_flat_groups
+
+
+ def _get_fp32_state_dict_from_zero_checkpoint(ds_checkpoint_dir):
+     """
+     Returns fp32 state_dict reconstructed from ds checkpoint
+
+     Args:
+         - ``ds_checkpoint_dir``: path to the deepspeed checkpoint folder (where the optimizer files are)
+
+     """
+     print(f"Processing zero checkpoint '{ds_checkpoint_dir}'")
+
+     optim_files = get_optim_files(ds_checkpoint_dir)
+     zero_stage, world_size, fp32_flat_groups = parse_optim_states(optim_files, ds_checkpoint_dir)
+     print(f"Detected checkpoint of type zero stage {zero_stage}, world_size: {world_size}")
+
+     model_files = get_model_state_files(ds_checkpoint_dir)
+
+     zero_model_states = parse_model_states(model_files)
+     print(f'Parsing checkpoint created by deepspeed=={zero_model_states[0].ds_version}')
+
+     if zero_stage <= 2:
+         return _get_fp32_state_dict_from_zero2_checkpoint(world_size, fp32_flat_groups, zero_model_states)
+     elif zero_stage == 3:
+         return _get_fp32_state_dict_from_zero3_checkpoint(world_size, fp32_flat_groups, zero_model_states)
+
+
+ def _zero2_merge_frozen_params(state_dict, zero_model_states):
+     if zero_model_states[0].frozen_param_shapes is None or len(zero_model_states[0].frozen_param_shapes) == 0:
+         return
+
+     frozen_param_shapes = zero_model_states[0].frozen_param_shapes
+     frozen_param_fragments = zero_model_states[0].frozen_param_fragments
+
+     if debug:
+         num_elem = sum(s.numel() for s in frozen_param_shapes.values())
+         print(f'rank 0: {FROZEN_PARAM_SHAPES}.numel = {num_elem}')
+
+     wanted_params = len(frozen_param_shapes)
+     wanted_numel = sum(s.numel() for s in frozen_param_shapes.values())
+     avail_numel = sum([p.numel() for p in frozen_param_fragments.values()])
+     print(f'Frozen params: Have {avail_numel} numels to process.')
+     print(f'Frozen params: Need {wanted_numel} numels in {wanted_params} params')
+
+     total_params = 0
+     total_numel = 0
+     for name, shape in frozen_param_shapes.items():
+         total_params += 1
+         unpartitioned_numel = shape.numel()
+         total_numel += unpartitioned_numel
+
+         state_dict[name] = frozen_param_fragments[name]
+
+         if debug:
+             print(f"{name} full shape: {shape} unpartitioned numel {unpartitioned_numel} ")
+
+     print(f"Reconstructed Frozen fp32 state dict with {total_params} params {total_numel} elements")
+
+
+ def _zero2_merge_trainable_params(state_dict, world_size, fp32_flat_groups, zero_model_states):
+     param_shapes = zero_model_states[0].param_shapes
+
+     # Reconstruction protocol:
+     #
+     # XXX: document this
+
+     if debug:
+         for i in range(world_size):
+             for j in range(len(fp32_flat_groups[0])):
+                 print(f"{FP32_FLAT_GROUPS}[{i}][{j}].shape={fp32_flat_groups[i][j].shape}")
+
+     # XXX: memory usage doubles here (zero2)
+     num_param_groups = len(fp32_flat_groups[0])
+     merged_single_partition_of_fp32_groups = []
+     for i in range(num_param_groups):
+         merged_partitions = [sd[i] for sd in fp32_flat_groups]
+         full_single_fp32_vector = torch.cat(merged_partitions, 0)
+         merged_single_partition_of_fp32_groups.append(full_single_fp32_vector)
+     avail_numel = sum(
+         [full_single_fp32_vector.numel() for full_single_fp32_vector in merged_single_partition_of_fp32_groups])
+
+     if debug:
+         wanted_params = sum([len(shapes) for shapes in param_shapes])
+         wanted_numel = sum([sum(shape.numel() for shape in shapes.values()) for shapes in param_shapes])
+         # not asserting if there is a mismatch due to possible padding
+         print(f"Have {avail_numel} numels to process.")
+         print(f"Need {wanted_numel} numels in {wanted_params} params.")
+
+     # params
+     # XXX: for huge models that can't fit into the host's RAM we will have to recode this to support
+     # out-of-core computing solution
+     total_numel = 0
+     total_params = 0
+     for shapes, full_single_fp32_vector in zip(param_shapes, merged_single_partition_of_fp32_groups):
+         offset = 0
+         avail_numel = full_single_fp32_vector.numel()
+         for name, shape in shapes.items():
+
+             unpartitioned_numel = shape.numel()
+             total_numel += unpartitioned_numel
+             total_params += 1
+
+             if debug:
+                 print(f"{name} full shape: {shape} unpartitioned numel {unpartitioned_numel} ")
+             state_dict[name] = full_single_fp32_vector.narrow(0, offset, unpartitioned_numel).view(shape)
+             offset += unpartitioned_numel
+
+         # Z2 started to align to 2*world_size to improve nccl performance. Therefore both offset and
+         # avail_numel can differ by anywhere between 0..2*world_size. Due to two unrelated complex
+         # paddings performed in the code it's almost impossible to predict the exact numbers w/o the
+         # live optimizer object, so we are checking that the numbers are within the right range
+         align_to = 2 * world_size
+
+         def zero2_align(x):
+             return align_to * math.ceil(x / align_to)
+
+         if debug:
+             print(f"original offset={offset}, avail_numel={avail_numel}")
+
+         offset = zero2_align(offset)
+         avail_numel = zero2_align(avail_numel)
+
+         if debug:
+             print(f"aligned offset={offset}, avail_numel={avail_numel}")
+
+         # Sanity check
+         if offset != avail_numel:
+             raise ValueError(f"consumed {offset} numels out of {avail_numel} - something is wrong")
+
+     print(f"Reconstructed fp32 state dict with {total_params} params {total_numel} elements")
+
+
+ def _get_fp32_state_dict_from_zero2_checkpoint(world_size, fp32_flat_groups, zero_model_states):
+     state_dict = OrderedDict()
+
+     # buffers
+     buffers = zero_model_states[0].buffers
+     state_dict.update(buffers)
+     if debug:
+         print(f"added {len(buffers)} buffers")
+
+     _zero2_merge_frozen_params(state_dict, zero_model_states)
+
+     _zero2_merge_trainable_params(state_dict, world_size, fp32_flat_groups, zero_model_states)
+
+     # recover shared parameters
+     for pair in zero_model_states[0].shared_params:
+         if pair[1] in state_dict:
+             state_dict[pair[0]] = state_dict[pair[1]]
+
+     return state_dict
+
+
+ def zero3_partitioned_param_info(unpartitioned_numel, world_size):
+     remainder = unpartitioned_numel % world_size
+     padding_numel = (world_size - remainder) if remainder else 0
+     partitioned_numel = math.ceil(unpartitioned_numel / world_size)
+     return partitioned_numel, padding_numel
+
+
+ def _zero3_merge_frozen_params(state_dict, world_size, zero_model_states):
+     if zero_model_states[0].frozen_param_shapes is None or len(zero_model_states[0].frozen_param_shapes) == 0:
+         return
+
+     if debug:
+         for i in range(world_size):
+             num_elem = sum(s.numel() for s in zero_model_states[i].frozen_param_fragments.values())
+             print(f'rank {i}: {FROZEN_PARAM_SHAPES}.numel = {num_elem}')
+
+     frozen_param_shapes = zero_model_states[0].frozen_param_shapes
+     wanted_params = len(frozen_param_shapes)
+     wanted_numel = sum(s.numel() for s in frozen_param_shapes.values())
+     avail_numel = sum([p.numel() for p in zero_model_states[0].frozen_param_fragments.values()]) * world_size
+     print(f'Frozen params: Have {avail_numel} numels to process.')
+     print(f'Frozen params: Need {wanted_numel} numels in {wanted_params} params')
+
+     total_params = 0
+     total_numel = 0
+     for name, shape in zero_model_states[0].frozen_param_shapes.items():
+         total_params += 1
+         unpartitioned_numel = shape.numel()
+         total_numel += unpartitioned_numel
+
+         param_frags = tuple(model_state.frozen_param_fragments[name] for model_state in zero_model_states)
+         state_dict[name] = torch.cat(param_frags, 0).narrow(0, 0, unpartitioned_numel).view(shape)
+
+         partitioned_numel, partitioned_padding_numel = zero3_partitioned_param_info(unpartitioned_numel, world_size)
+
+         if debug:
+             print(
+                 f"Frozen params: {total_params} {name} full shape: {shape} partition0 numel={partitioned_numel} partitioned_padding_numel={partitioned_padding_numel}"
+             )
+
+     print(f"Reconstructed Frozen fp32 state dict with {total_params} params {total_numel} elements")
+
+
+ def _zero3_merge_trainable_params(state_dict, world_size, fp32_flat_groups, zero_model_states):
+     param_shapes = zero_model_states[0].param_shapes
+     avail_numel = fp32_flat_groups[0].numel() * world_size
+     # Reconstruction protocol: For zero3 we need to zip the partitions together at boundary of each
+     # param, re-consolidating each param, while dealing with padding if any
+
+     # merge list of dicts, preserving order
+     param_shapes = {k: v for d in param_shapes for k, v in d.items()}
+
+     if debug:
+         for i in range(world_size):
+             print(f"{FP32_FLAT_GROUPS}[{i}].shape={fp32_flat_groups[i].shape}")
+
+     wanted_params = len(param_shapes)
+     wanted_numel = sum(shape.numel() for shape in param_shapes.values())
+     # not asserting if there is a mismatch due to possible padding
+     avail_numel = fp32_flat_groups[0].numel() * world_size
+     print(f"Trainable params: Have {avail_numel} numels to process.")
+     print(f"Trainable params: Need {wanted_numel} numels in {wanted_params} params.")
+
+     # params
+     # XXX: for huge models that can't fit into the host's RAM we will have to recode this to support
+     # out-of-core computing solution
+     offset = 0
+     total_numel = 0
+     total_params = 0
+     for name, shape in param_shapes.items():
+
+         unpartitioned_numel = shape.numel()
+         total_numel += unpartitioned_numel
+         total_params += 1
+
+         partitioned_numel, partitioned_padding_numel = zero3_partitioned_param_info(unpartitioned_numel, world_size)
+
+         if debug:
+             print(
+                 f"Trainable params: {total_params} {name} full shape: {shape} partition0 numel={partitioned_numel} partitioned_padding_numel={partitioned_padding_numel}"
+             )
+
+         # XXX: memory usage doubles here
+         state_dict[name] = torch.cat(
+             tuple(fp32_flat_groups[i].narrow(0, offset, partitioned_numel) for i in range(world_size)),
+             0).narrow(0, 0, unpartitioned_numel).view(shape)
+         offset += partitioned_numel
+
+     offset *= world_size
+
+     # Sanity check
+     if offset != avail_numel:
+         raise ValueError(f"consumed {offset} numels out of {avail_numel} - something is wrong")
+
+     print(f"Reconstructed Trainable fp32 state dict with {total_params} params {total_numel} elements")
+
+
+ def _get_fp32_state_dict_from_zero3_checkpoint(world_size, fp32_flat_groups, zero_model_states):
+     state_dict = OrderedDict()
+
+     # buffers
+     buffers = zero_model_states[0].buffers
+     state_dict.update(buffers)
+     if debug:
+         print(f"added {len(buffers)} buffers")
+
+     _zero3_merge_frozen_params(state_dict, world_size, zero_model_states)
+
+     _zero3_merge_trainable_params(state_dict, world_size, fp32_flat_groups, zero_model_states)
+
+     # recover shared parameters
+     for pair in zero_model_states[0].shared_params:
+         if pair[1] in state_dict:
+             state_dict[pair[0]] = state_dict[pair[1]]
+
+     return state_dict
+
+
+ def get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir, tag=None):
+     """
+     Convert ZeRO 2 or 3 checkpoint into a single fp32 consolidated state_dict that can be loaded with
+     ``load_state_dict()`` and used for training without DeepSpeed or shared with others, for example
+     via a model hub.
+
+     Args:
+         - ``checkpoint_dir``: path to the desired checkpoint folder
+         - ``tag``: checkpoint tag used as a unique identifier for checkpoint. If not provided will attempt to load tag in 'latest' file. e.g., ``global_step14``
+
+     Returns:
+         - pytorch ``state_dict``
+
+     Note: this approach may not work if your application doesn't have sufficient free CPU memory and
+     you may need to use the offline approach using the ``zero_to_fp32.py`` script that is saved with
+     the checkpoint.
+
+     A typical usage might be ::
+
+         from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint
+         # do the training and checkpoint saving
+         state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir) # already on cpu
+         model = model.cpu() # move to cpu
+         model.load_state_dict(state_dict)
+         # submit to model hub or save the model to share with others
+
+     In this example the ``model`` will no longer be usable in the deepspeed context of the same
+     application. i.e. you will need to re-initialize the deepspeed engine, since
+     ``model.load_state_dict(state_dict)`` will remove all the deepspeed magic from it.
+
+     If you want it all done for you, use ``load_state_dict_from_zero_checkpoint`` instead.
+
+     """
+     if tag is None:
+         latest_path = os.path.join(checkpoint_dir, 'latest')
+         if os.path.isfile(latest_path):
+             with open(latest_path, 'r') as fd:
+                 tag = fd.read().strip()
+         else:
+             raise ValueError(f"Unable to find 'latest' file at {latest_path}")
+
+     ds_checkpoint_dir = os.path.join(checkpoint_dir, tag)
+
+     if not os.path.isdir(ds_checkpoint_dir):
+         raise FileNotFoundError(f"Directory '{ds_checkpoint_dir}' doesn't exist")
+
+     return _get_fp32_state_dict_from_zero_checkpoint(ds_checkpoint_dir)
+
+
+ def convert_zero_checkpoint_to_fp32_state_dict(checkpoint_dir, output_file, tag=None):
+     """
+     Convert ZeRO 2 or 3 checkpoint into a single fp32 consolidated ``state_dict`` file that can be
+     loaded with ``torch.load(file)`` + ``load_state_dict()`` and used for training without DeepSpeed.
+
+     Args:
+         - ``checkpoint_dir``: path to the desired checkpoint folder. (one that contains the tag-folder, like ``global_step14``)
+         - ``output_file``: path to the pytorch fp32 state_dict output file (e.g. path/pytorch_model.bin)
+         - ``tag``: checkpoint tag used as a unique identifier for checkpoint. If not provided will attempt to load tag in the file named ``latest`` in the checkpoint folder, e.g., ``global_step14``
+     """
+
+     state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir, tag)
+     print(f"Saving fp32 state dict to {output_file}")
+     torch.save(state_dict, output_file)
+
+
+ def load_state_dict_from_zero_checkpoint(model, checkpoint_dir, tag=None):
+     """
+     1. Put the provided model to cpu
+     2. Convert ZeRO 2 or 3 checkpoint into a single fp32 consolidated ``state_dict``
+     3. Load it into the provided model
+
+     Args:
+         - ``model``: the model object to update
+         - ``checkpoint_dir``: path to the desired checkpoint folder. (one that contains the tag-folder, like ``global_step14``)
+         - ``tag``: checkpoint tag used as a unique identifier for checkpoint. If not provided will attempt to load tag in the file named ``latest`` in the checkpoint folder, e.g., ``global_step14``
+
+     Returns:
+         - ``model``: modified model
+
+     Make sure you have plenty of CPU memory available before you call this function. If you don't
+     have enough use the ``zero_to_fp32.py`` utility to do the conversion. You will find it
+     conveniently placed for you in the checkpoint folder.
+
+     A typical usage might be ::
+
+         from deepspeed.utils.zero_to_fp32 import load_state_dict_from_zero_checkpoint
+         model = load_state_dict_from_zero_checkpoint(trainer.model, checkpoint_dir)
+         # submit to model hub or save the model to share with others
+
+     Note, that once this was run, the ``model`` will no longer be usable in the deepspeed context
+     of the same application. i.e. you will need to re-initialize the deepspeed engine, since
+     ``model.load_state_dict(state_dict)`` will remove all the deepspeed magic from it.
+
+     """
+     logger.info(f"Extracting fp32 weights")
+     state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir, tag)
+
+     logger.info(f"Overwriting model with fp32 weights")
+     model = model.cpu()
+     model.load_state_dict(state_dict, strict=False)
+
+     return model
+
+
+ if __name__ == "__main__":
+
+     parser = argparse.ArgumentParser()
+     parser.add_argument("checkpoint_dir",
+                         type=str,
+                         help="path to the desired checkpoint folder, e.g., path/checkpoint-12")
+     parser.add_argument(
+         "output_file",
+         type=str,
+         help="path to the pytorch fp32 state_dict output file (e.g. path/checkpoint-12/pytorch_model.bin)")
+     parser.add_argument("-t",
+                         "--tag",
+                         type=str,
+                         default=None,
+                         help="checkpoint tag used as a unique identifier for checkpoint. e.g., global_step1")
+     parser.add_argument("-d", "--debug", action='store_true', help="enable debug")
+     args = parser.parse_args()
+
+     debug = args.debug
+
+     convert_zero_checkpoint_to_fp32_state_dict(args.checkpoint_dir, args.output_file, tag=args.tag)
checkpoint-1600/README.md ADDED
@@ -0,0 +1,203 @@
+ ---
+ library_name: peft
+ base_model: Qwen/Qwen-VL-Chat
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.10.0
+ - PEFT 0.11.1
checkpoint-1600/adapter_config.json ADDED
@@ -0,0 +1,380 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "Qwen/Qwen-VL-Chat",
+   "bias": "none",
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 16,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "r": 64,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "transformer.h.16.mlp.w1",
+     "transformer.visual.transformer.resblocks.13.attn.out_proj",
+     "transformer.h.28.mlp.w1",
+     "transformer.h.16.attn.c_attn",
+     "transformer.h.3.mlp.w1",
+     "transformer.visual.transformer.resblocks.29.attn.in_proj",
+     "transformer.visual.transformer.resblocks.19.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.47.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.34.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.4.attn.out_proj",
+     "transformer.h.31.attn.c_attn",
+     "transformer.h.16.mlp.w2",
+     "transformer.visual.transformer.resblocks.5.attn.out_proj",
+     "transformer.h.2.mlp.w1",
+     "transformer.visual.transformer.resblocks.7.attn.in_proj",
+     "transformer.h.20.mlp.w2",
+     "transformer.h.19.mlp.w1",
+     "transformer.visual.transformer.resblocks.18.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.27.attn.out_proj",
+     "transformer.visual.transformer.resblocks.10.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.43.mlp.c_fc",
+     "transformer.h.5.mlp.w1",
+     "transformer.visual.transformer.resblocks.15.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.25.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.10.attn.out_proj",
+     "transformer.visual.transformer.resblocks.4.mlp.c_fc",
+     "transformer.h.31.mlp.w2",
+     "transformer.visual.transformer.resblocks.37.attn.out_proj",
+     "transformer.h.8.attn.c_proj",
+     "transformer.h.29.attn.c_attn",
+     "transformer.visual.transformer.resblocks.24.mlp.c_proj",
+     "transformer.h.19.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.11.attn.out_proj",
+     "transformer.h.13.mlp.c_proj",
+     "transformer.h.27.mlp.c_proj",
+     "transformer.h.31.mlp.w1",
+     "transformer.visual.transformer.resblocks.7.mlp.c_proj",
+     "transformer.h.28.mlp.w2",
+     "transformer.visual.transformer.resblocks.3.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.13.attn.in_proj",
+     "transformer.h.21.attn.c_attn",
+     "transformer.visual.transformer.resblocks.23.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.33.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.42.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.3.attn.in_proj",
+     "transformer.h.13.mlp.w1",
+     "transformer.visual.transformer.resblocks.22.attn.out_proj",
+     "transformer.visual.transformer.resblocks.20.mlp.c_fc",
+     "transformer.h.26.mlp.w2",
+     "transformer.h.14.attn.c_attn",
+     "transformer.h.16.attn.c_proj",
+     "transformer.h.1.mlp.w1",
+     "transformer.visual.transformer.resblocks.21.attn.out_proj",
+     "transformer.visual.transformer.resblocks.39.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.4.attn.in_proj",
+     "transformer.h.29.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.12.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.14.attn.in_proj",
+     "transformer.h.28.attn.c_proj",
+     "transformer.h.18.mlp.w1",
+     "transformer.h.27.mlp.w2",
+     "transformer.h.18.attn.c_attn",
+     "transformer.visual.transformer.resblocks.33.attn.out_proj",
+     "transformer.h.5.mlp.w2",
+     "transformer.visual.transformer.resblocks.37.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.2.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.42.attn.out_proj",
+     "transformer.visual.transformer.resblocks.15.attn.in_proj",
+     "transformer.visual.transformer.resblocks.6.mlp.c_fc",
+     "transformer.h.13.mlp.w2",
+     "transformer.h.23.attn.c_proj",
+     "transformer.h.20.mlp.c_proj",
+     "transformer.h.14.mlp.w2",
+     "transformer.visual.transformer.resblocks.9.attn.in_proj",
+     "transformer.visual.transformer.resblocks.46.attn.in_proj",
+     "transformer.h.9.attn.c_attn",
+     "transformer.visual.transformer.resblocks.36.mlp.c_proj",
+     "transformer.h.31.attn.c_proj",
+     "transformer.visual.transformer.resblocks.19.mlp.c_fc",
+     "transformer.h.17.mlp.w1",
+     "transformer.h.2.attn.c_proj",
+     "transformer.visual.transformer.resblocks.47.attn.in_proj",
+     "transformer.visual.transformer.resblocks.45.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.46.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.27.attn.in_proj",
+     "transformer.visual.transformer.resblocks.26.attn.out_proj",
+     "transformer.h.22.attn.c_proj",
+     "transformer.visual.transformer.resblocks.40.attn.out_proj",
+     "transformer.visual.transformer.resblocks.46.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.18.attn.out_proj",
+     "transformer.h.27.attn.c_proj",
+     "transformer.visual.transformer.resblocks.26.attn.in_proj",
+     "transformer.h.4.mlp.w1",
+     "transformer.h.10.attn.c_proj",
+     "transformer.h.6.attn.c_attn",
+     "transformer.h.2.attn.c_attn",
+     "transformer.h.22.mlp.w1",
+     "transformer.visual.transformer.resblocks.39.mlp.c_fc",
+     "transformer.h.8.mlp.w2",
+     "transformer.h.4.attn.c_attn",
+     "transformer.h.26.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.29.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.5.mlp.c_proj",
+     "transformer.h.11.mlp.c_proj",
+     "transformer.h.0.mlp.w2",
+     "transformer.visual.transformer.resblocks.36.attn.out_proj",
+     "transformer.h.29.mlp.w1",
+     "transformer.h.12.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.2.attn.in_proj",
+     "transformer.visual.transformer.resblocks.2.mlp.c_fc",
+     "transformer.h.25.attn.c_attn",
+     "transformer.visual.transformer.resblocks.19.attn.in_proj",
+     "transformer.visual.transformer.resblocks.43.attn.out_proj",
+     "transformer.visual.transformer.resblocks.35.attn.out_proj",
+     "transformer.h.22.attn.c_attn",
+     "transformer.h.0.mlp.w1",
+     "transformer.h.3.attn.c_attn",
+     "transformer.h.28.attn.c_attn",
+     "transformer.visual.transformer.resblocks.25.attn.in_proj",
+     "transformer.visual.transformer.resblocks.34.attn.out_proj",
+     "transformer.h.21.attn.c_proj",
+     "transformer.h.6.attn.c_proj",
+     "transformer.visual.transformer.resblocks.11.mlp.c_proj",
+     "transformer.h.13.attn.c_attn",
+     "transformer.visual.transformer.resblocks.38.attn.out_proj",
+     "transformer.h.3.attn.c_proj",
+     "transformer.visual.transformer.resblocks.17.mlp.c_fc",
+     "transformer.h.26.mlp.w1",
+     "transformer.visual.transformer.resblocks.36.mlp.c_fc",
+     "transformer.h.26.attn.c_attn",
+     "transformer.visual.transformer.resblocks.29.attn.out_proj",
+     "transformer.h.7.mlp.w1",
+     "transformer.visual.transformer.resblocks.40.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.9.attn.out_proj",
+     "transformer.h.3.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.26.mlp.c_fc",
+     "transformer.h.11.mlp.w2",
+     "transformer.visual.transformer.resblocks.33.attn.in_proj",
+     "transformer.visual.transformer.resblocks.42.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.32.attn.out_proj",
+     "transformer.h.4.attn.c_proj",
+     "transformer.visual.transformer.resblocks.27.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.11.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.25.attn.out_proj",
+     "transformer.visual.transformer.resblocks.23.attn.in_proj",
+     "transformer.h.5.attn.c_attn",
+     "transformer.h.16.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.14.mlp.c_proj",
+     "transformer.h.22.mlp.w2",
+     "transformer.h.25.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.10.mlp.c_fc",
+     "transformer.h.24.mlp.c_proj",
+     "transformer.h.19.mlp.w2",
+     "transformer.h.14.mlp.w1",
+     "transformer.visual.transformer.resblocks.40.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.28.attn.out_proj",
+     "transformer.visual.transformer.resblocks.24.mlp.c_fc",
+     "transformer.h.8.attn.c_attn",
+     "transformer.h.9.mlp.w1",
+     "transformer.h.6.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.19.attn.out_proj",
+     "transformer.visual.transformer.resblocks.32.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.7.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.44.attn.in_proj",
+     "transformer.visual.transformer.resblocks.34.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.9.mlp.c_fc",
+     "transformer.visual.conv1",
+     "transformer.visual.transformer.resblocks.8.attn.out_proj",
+     "transformer.h.23.mlp.w2",
+     "transformer.h.7.mlp.w2",
+     "transformer.h.24.attn.c_proj",
+     "transformer.h.30.attn.c_proj",
+     "transformer.h.29.attn.c_proj",
+     "transformer.visual.transformer.resblocks.9.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.35.attn.in_proj",
+     "transformer.visual.transformer.resblocks.21.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.41.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.38.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.13.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.41.attn.out_proj",
+     "transformer.visual.transformer.resblocks.16.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.45.attn.out_proj",
+     "transformer.h.11.mlp.w1",
+     "transformer.visual.transformer.resblocks.16.attn.in_proj",
+     "transformer.visual.transformer.resblocks.47.attn.out_proj",
+     "transformer.h.9.attn.c_proj",
+     "transformer.h.31.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.12.attn.in_proj",
+     "transformer.visual.transformer.resblocks.28.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.20.attn.out_proj",
+     "transformer.h.12.attn.c_attn",
+     "transformer.h.24.mlp.w1",
+     "transformer.visual.transformer.resblocks.21.attn.in_proj",
+     "transformer.visual.transformer.resblocks.41.attn.in_proj",
+     "transformer.h.10.mlp.w1",
+     "transformer.h.1.mlp.w2",
+     "transformer.h.0.mlp.c_proj",
+     "transformer.h.22.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.18.attn.in_proj",
+     "transformer.visual.transformer.resblocks.38.mlp.c_proj",
+     "transformer.h.12.mlp.w1",
+     "transformer.h.1.attn.c_attn",
+     "transformer.visual.transformer.resblocks.31.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.44.mlp.c_proj",
+     "transformer.h.15.mlp.c_proj",
+     "transformer.h.6.mlp.w1",
+     "transformer.visual.transformer.resblocks.16.mlp.c_proj",
+     "transformer.h.13.attn.c_proj",
+     "transformer.h.15.attn.c_attn",
+     "transformer.h.15.mlp.w1",
+     "transformer.h.17.mlp.w2",
+     "transformer.visual.transformer.resblocks.10.attn.in_proj",
+     "transformer.h.26.attn.c_proj",
+     "transformer.visual.transformer.resblocks.20.attn.in_proj",
+     "transformer.h.10.mlp.w2",
+     "transformer.h.24.attn.c_attn",
+     "transformer.h.8.mlp.w1",
+     "transformer.h.23.mlp.w1",
+     "transformer.visual.transformer.resblocks.1.mlp.c_proj",
+     "transformer.h.4.mlp.w2",
+     "transformer.visual.transformer.resblocks.38.attn.in_proj",
+     "transformer.h.12.mlp.w2",
+     "transformer.h.7.attn.c_proj",
+     "transformer.h.4.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.31.attn.out_proj",
+     "transformer.visual.transformer.resblocks.17.mlp.c_proj",
+     "transformer.h.21.mlp.w2",
+     "transformer.visual.transformer.resblocks.5.attn.in_proj",
+     "transformer.h.18.attn.c_proj",
+     "transformer.visual.transformer.resblocks.31.mlp.c_fc",
+     "transformer.h.18.mlp.w2",
+     "transformer.visual.transformer.resblocks.6.attn.out_proj",
+     "transformer.visual.transformer.resblocks.8.attn.in_proj",
+     "transformer.visual.transformer.resblocks.30.mlp.c_proj",
+     "transformer.h.30.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.30.attn.out_proj",
+     "transformer.visual.transformer.resblocks.16.attn.out_proj",
+     "transformer.visual.transformer.resblocks.14.attn.out_proj",
+     "transformer.h.25.mlp.w1",
+     "transformer.visual.transformer.resblocks.45.attn.in_proj",
+     "transformer.h.11.attn.c_proj",
+     "transformer.visual.transformer.resblocks.30.attn.in_proj",
+     "transformer.visual.transformer.resblocks.43.mlp.c_proj",
+     "transformer.h.10.mlp.c_proj",
+     "transformer.h.21.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.43.attn.in_proj",
+     "transformer.visual.transformer.resblocks.3.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.44.attn.out_proj",
+     "transformer.h.23.attn.c_attn",
+     "transformer.visual.transformer.resblocks.22.attn.in_proj",
+     "transformer.visual.transformer.resblocks.6.attn.in_proj",
+     "transformer.visual.transformer.resblocks.44.mlp.c_fc",
+     "transformer.h.17.attn.c_attn",
+     "transformer.h.7.attn.c_attn",
+     "transformer.visual.transformer.resblocks.42.attn.in_proj",
+     "transformer.visual.transformer.resblocks.20.mlp.c_proj",
+     "transformer.h.8.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.17.attn.out_proj",
+     "transformer.h.14.attn.c_proj",
+     "transformer.visual.transformer.resblocks.40.attn.in_proj",
+     "transformer.h.25.attn.c_proj",
+     "transformer.h.28.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.35.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.36.attn.in_proj",
+     "transformer.visual.transformer.resblocks.41.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.14.mlp.c_fc",
+     "transformer.h.30.mlp.w2",
+     "transformer.h.20.mlp.w1",
+     "transformer.visual.transformer.resblocks.33.mlp.c_fc",
+     "transformer.h.29.mlp.w2",
+     "transformer.visual.transformer.resblocks.47.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.30.mlp.c_fc",
+     "transformer.h.10.attn.c_attn",
+     "transformer.visual.transformer.resblocks.1.attn.in_proj",
+     "transformer.h.1.attn.c_proj",
+     "transformer.visual.transformer.resblocks.8.mlp.c_proj",
+     "transformer.h.19.attn.c_proj",
+     "transformer.visual.transformer.resblocks.37.attn.in_proj",
+     "transformer.h.15.attn.c_proj",
+     "transformer.h.5.attn.c_proj",
+     "transformer.visual.transformer.resblocks.32.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.3.attn.out_proj",
+     "transformer.visual.transformer.resblocks.32.attn.in_proj",
+     "transformer.h.21.mlp.w1",
+     "transformer.h.23.mlp.c_proj",
+     "transformer.h.30.mlp.w1",
+     "transformer.h.0.attn.c_attn",
+     "transformer.visual.transformer.resblocks.24.attn.out_proj",
+     "transformer.visual.transformer.resblocks.31.attn.in_proj",
+     "transformer.h.18.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.25.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.22.mlp.c_fc",
+     "transformer.h.30.attn.c_attn",
+     "transformer.visual.transformer.resblocks.13.mlp.c_fc",
+     "transformer.h.17.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.24.attn.in_proj",
+     "transformer.h.11.attn.c_attn",
+     "transformer.h.2.mlp.w2",
+     "transformer.visual.transformer.resblocks.8.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.0.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.2.attn.out_proj",
+     "transformer.visual.transformer.resblocks.35.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.39.attn.out_proj",
+     "transformer.h.12.attn.c_proj",
+     "transformer.visual.transformer.resblocks.28.attn.in_proj",
+     "transformer.visual.transformer.resblocks.29.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.0.attn.out_proj",
+     "transformer.visual.transformer.resblocks.23.mlp.c_proj",
+     "transformer.h.20.attn.c_attn",
+     "transformer.visual.transformer.resblocks.7.attn.out_proj",
+     "transformer.visual.transformer.resblocks.15.attn.out_proj",
+     "transformer.h.7.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.1.attn.out_proj",
+     "transformer.h.3.mlp.w2",
+     "transformer.h.9.mlp.w2",
+     "transformer.visual.transformer.resblocks.34.attn.in_proj",
+     "transformer.h.27.attn.c_attn",
+     "transformer.visual.transformer.resblocks.12.mlp.c_fc",
+     "transformer.h.6.mlp.w2",
+     "transformer.visual.transformer.resblocks.39.attn.in_proj",
+     "transformer.h.15.mlp.w2",
+     "transformer.visual.transformer.resblocks.18.mlp.c_proj",
+     "transformer.h.0.attn.c_proj",
+     "transformer.h.19.attn.c_attn",
+     "transformer.visual.transformer.resblocks.27.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.23.attn.out_proj",
+     "transformer.h.14.mlp.c_proj",
+     "transformer.h.9.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.12.attn.out_proj",
+     "transformer.visual.transformer.resblocks.0.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.5.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.28.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.6.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.22.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.37.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.17.attn.in_proj",
+     "transformer.visual.transformer.resblocks.46.attn.out_proj",
+     "transformer.h.24.mlp.w2",
+     "transformer.h.27.mlp.w1",
+     "transformer.visual.transformer.resblocks.11.attn.in_proj",
+     "transformer.visual.transformer.resblocks.4.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.21.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.26.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.15.mlp.c_fc",
+     "transformer.h.2.mlp.c_proj",
+     "transformer.h.1.mlp.c_proj",
+     "transformer.h.5.mlp.c_proj",
+     "transformer.visual.transformer.resblocks.45.mlp.c_fc",
+     "transformer.visual.transformer.resblocks.0.attn.in_proj",
+     "transformer.h.25.mlp.w2",
+     "transformer.h.20.attn.c_proj",
+     "transformer.h.17.attn.c_proj",
+     "transformer.visual.transformer.resblocks.1.mlp.c_fc"
+   ],
+   "task_type": "CAUSAL_LM",
+   "use_dora": false,
+   "use_rslora": false
+ }
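For reference, a minimal sketch of how an adapter saved with this config could be attached to the base model through the peft library; the local checkpoint path is an assumption, not part of this upload:

from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the frozen base model named in adapter_config.json.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL-Chat", trust_remote_code=True, device_map="auto"
)
# Attach the LoRA weights from this checkpoint directory (hypothetical path).
model = PeftModel.from_pretrained(base, "checkpoint-1600")
model.eval()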
checkpoint-1600/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:80f99bd20b2a57ae180db378d4c2ad8777288d01fd71f21c7b258c2141ccd27c
+ size 469105640
checkpoint-1600/latest ADDED
@@ -0,0 +1 @@
+ global_step1600
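The latest file is DeepSpeed's pointer to the most recent optimizer-state directory (global_step1600); the bundled zero_to_fp32.py script reads it to consolidate the ZeRO shards into a single fp32 state dict. A minimal usage sketch, assuming it is run from inside the checkpoint directory:

python zero_to_fp32.py . pytorch_model.bin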
checkpoint-1600/qwen.tiktoken ADDED
The diff for this file is too large to render. See raw diff
checkpoint-1600/rng_state_0.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fa74b506d85700151c4e4c4f5c6adc63d055ed8ecb10bd6702453c61ca1d200b
+ size 15920
checkpoint-1600/rng_state_1.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dc196d48c3771157921ae2bef9abcc68219ad9aab60637928c27798c1a979dca
+ size 15920
checkpoint-1600/rng_state_2.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:94a45f5c84891190bab691174f3d23d0e4ba0525dd98afbfaa45c8a5faa2bb5e
+ size 15920
checkpoint-1600/rng_state_3.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8435d9f70042fb8d3d78a56558df657ae47801a72e408d1c47602693b6facda2
+ size 15920
checkpoint-1600/rng_state_4.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4521c9dde92d304631e1dcdcb52f6d8149f69ce405bd47f0cdd43efa2d2fb5bf
+ size 15920
checkpoint-1600/rng_state_5.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8b9e9e23c4202da095f27b94a494e52c7f529a7b81972744f1ee768dac1b8ca5
+ size 15920
checkpoint-1600/rng_state_6.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b81f03f8ec1fd50599c19d8224e60fe0ef15e2b9d856f9b1f5653703f7ad0408
+ size 15920
checkpoint-1600/rng_state_7.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7fd839ce13a82530b1a2d875e0a29bcf7ca4daa14fe5a49a2fc9f255a4be0688
+ size 15920
checkpoint-1600/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b629c1752a3080bde72cb93dc63770861076540b3c0bc6419645c02a824c238f
+ size 1064
checkpoint-1600/special_tokens_map.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "pad_token": "<|endoftext|>"
+ }
checkpoint-1600/tokenization_qwen.py ADDED
@@ -0,0 +1,598 @@
+ # Copyright (c) Alibaba Cloud.
+ #
+ # This source code is licensed under the license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """Tokenization classes for QWen."""
+
+ import base64
+ import logging
+ import os
+ import requests
+ import unicodedata
+ from typing import Collection, Dict, List, Set, Tuple, Union, Any, Callable, Optional
+
+ import tiktoken
+ import numpy as np
+ from PIL import Image
+ from PIL import ImageFont
+ from PIL import ImageDraw
+ from transformers import PreTrainedTokenizer, AddedToken
+ from transformers.utils import try_to_load_from_cache
+
+ import matplotlib.colors as mcolors
+ from matplotlib.font_manager import FontProperties
+
+ logger = logging.getLogger(__name__)
+
+
+ VOCAB_FILES_NAMES = {"vocab_file": "qwen.tiktoken", "ttf": "SimSun.ttf"}
+ FONT_PATH = try_to_load_from_cache("Qwen/Qwen-VL-Chat", "SimSun.ttf")
+ if FONT_PATH is None:
+     if not os.path.exists("SimSun.ttf"):
+         ttf = requests.get("https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/SimSun.ttf")
+         open("SimSun.ttf", "wb").write(ttf.content)
+     FONT_PATH = "SimSun.ttf"
+
+ PAT_STR = r"""(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+"""
+ ENDOFTEXT = "<|endoftext|>"
+ IMSTART = "<|im_start|>"
+ IMEND = "<|im_end|>"
+ # as the default behavior is changed to allow special tokens in
+ # regular texts, the surface forms of special tokens need to be
+ # as different as possible to minimize the impact
+ EXTRAS = tuple((f"<|extra_{i}|>" for i in range(205)))
+ SPECIAL_TOKENS = (
+     ENDOFTEXT,
+     IMSTART,
+     IMEND,
+ ) + EXTRAS
+ IMG_TOKEN_SPAN = 256
+
+
+ def _load_tiktoken_bpe(tiktoken_bpe_file: str) -> Dict[bytes, int]:
+     with open(tiktoken_bpe_file, "rb") as f:
+         contents = f.read()
+     return {
+         base64.b64decode(token): int(rank)
+         for token, rank in (line.split() for line in contents.splitlines() if line)
+     }
+
+ def _list_find(
+     input_list: List[Any],
+     candidates: Tuple[Any],
+     start: int = 0,
+ ):
+     for i in range(start, len(input_list)):
+         if input_list[i] in candidates:
+             return i
+     return -1
+
+ def _replace_closed_tag(
+     input_tokens: List[Any],
+     start_tags: Union[Any, Tuple[Any]],
+     end_tags: Union[Any, Tuple[Any]],
+     inclusive_replace_func: Callable,
+     exclusive_replace_func: Callable = lambda x: x,
+ ):
+     if isinstance(start_tags, (str, int)):
+         start_tags = (start_tags,)
+     if isinstance(end_tags, (str, int)):
+         end_tags = (end_tags,)
+     assert len(start_tags) == len(end_tags)
+
+     output_tokens = []
+     end = 0
+     while True:
+         start = _list_find(input_tokens, start_tags, end)
+         if start == -1:
+             break
+         output_tokens.extend(exclusive_replace_func(input_tokens[end : start]))
+         tag_idx = start_tags.index(input_tokens[start])
+         end = _list_find(input_tokens, (end_tags[tag_idx],), start)
+         if end == -1:
+             raise ValueError("Unclosed image token")
+         output_tokens.extend(inclusive_replace_func(input_tokens[start : end + 1]))
+         end += 1
+     output_tokens.extend(exclusive_replace_func(input_tokens[end : ]))
+     return output_tokens
+
+ class QWenTokenizer(PreTrainedTokenizer):
+     """QWen tokenizer."""
+
+     vocab_files_names = VOCAB_FILES_NAMES
+
+     def __init__(
+         self,
+         vocab_file,
+         errors="replace",
+         image_start_tag='<img>',
+         image_end_tag='</img>',
+         image_pad_tag='<imgpad>',
+         ref_start_tag='<ref>',
+         ref_end_tag='</ref>',
+         box_start_tag='<box>',
+         box_end_tag='</box>',
+         quad_start_tag='<quad>',
+         quad_end_tag='</quad>',
+         **kwargs,
+     ):
+         self.image_start_tag = image_start_tag
+         self.image_end_tag = image_end_tag
+         self.image_pad_tag = image_pad_tag
+         self.ref_start_tag = ref_start_tag
+         self.ref_end_tag = ref_end_tag
+         self.box_start_tag = box_start_tag
+         self.box_end_tag = box_end_tag
+         self.quad_start_tag = quad_start_tag
+         self.quad_end_tag = quad_end_tag
+         self.IMAGE_ST = (
+             ref_start_tag, ref_end_tag,
+             box_start_tag, box_end_tag,
+             quad_start_tag, quad_end_tag,
+             image_start_tag, image_end_tag,
+             image_pad_tag
+         )
+         super().__init__(**kwargs)
+
+         self.errors = errors  # how to handle errors in decoding
+
+         self.mergeable_ranks = _load_tiktoken_bpe(vocab_file)  # type: dict[bytes, int]
+         self.special_tokens = {
+             token: index
+             for index, token in enumerate(
+                 SPECIAL_TOKENS + self.IMAGE_ST, start=len(self.mergeable_ranks)
+             )
+         }
+         self.img_start_id = self.special_tokens[self.image_start_tag]
+         self.img_end_id = self.special_tokens[self.image_end_tag]
+         self.img_pad_id = self.special_tokens[self.image_pad_tag]
+         self.ref_start_id = self.special_tokens[self.ref_start_tag]
+         self.ref_end_id = self.special_tokens[self.ref_end_tag]
+         self.box_start_id = self.special_tokens[self.box_start_tag]
+         self.box_end_id = self.special_tokens[self.box_end_tag]
+         self.quad_start_id = self.special_tokens[self.quad_start_tag]
+         self.quad_end_id = self.special_tokens[self.quad_end_tag]
+         self.image_special_tokens = set([
+             self.ref_start_id, self.ref_end_id, self.box_start_id, self.box_end_id,
+             self.quad_start_id, self.quad_end_id,
+         ])
+
+         enc = tiktoken.Encoding(
+             "Qwen",
+             pat_str=PAT_STR,
+             mergeable_ranks=self.mergeable_ranks,
+             special_tokens=self.special_tokens,
+         )
+         assert (
+             len(self.mergeable_ranks) + len(self.special_tokens) == enc.n_vocab
+         ), f"{len(self.mergeable_ranks) + len(self.special_tokens)} != {enc.n_vocab} in encoding"
+
+         self.decoder = {
+             v: k for k, v in self.mergeable_ranks.items()
+         }  # type: dict[int, bytes|str]
+         self.decoder.update({v: k for k, v in self.special_tokens.items()})
+
+         self.tokenizer = enc  # type: tiktoken.Encoding
+
+         self.eod_id = self.tokenizer.eot_token
+         self.im_start_id = self.special_tokens[IMSTART]
+         self.im_end_id = self.special_tokens[IMEND]
+
+     def __getstate__(self):
+         # for pickle lovers
+         state = self.__dict__.copy()
+         del state['tokenizer']
+         return state
+
+     def __setstate__(self, state):
+         # tokenizer is not python native; don't pass it; rebuild it
+         self.__dict__.update(state)
+         enc = tiktoken.Encoding(
+             "Qwen",
+             pat_str=PAT_STR,
+             mergeable_ranks=self.mergeable_ranks,
+             special_tokens=self.special_tokens,
+         )
+         self.tokenizer = enc
+
+
+     def __len__(self) -> int:
+         return self.tokenizer.n_vocab
+
+     def get_vocab(self) -> Dict[bytes, int]:
+         return self.mergeable_ranks
+
+     def convert_tokens_to_ids(
+         self, tokens: Union[bytes, str, List[Union[bytes, str]]]
+     ) -> List[int]:
+         ids = []
+         if isinstance(tokens, (str, bytes)):
+             if tokens in self.special_tokens:
+                 return self.special_tokens[tokens]
+             else:
+                 return self.mergeable_ranks.get(tokens)
+         for token in tokens:
+             if token in self.special_tokens:
+                 ids.append(self.special_tokens[token])
+             else:
+                 ids.append(self.mergeable_ranks.get(token))
+         return ids
+
+     def _add_tokens(self, new_tokens: Union[List[str], List[AddedToken]], special_tokens: bool = False) -> int:
+         if not special_tokens and new_tokens:
+             raise ValueError('Adding regular tokens is not supported')
+         for token in new_tokens:
+             surface_form = token.content if isinstance(token, AddedToken) else token
+             if surface_form not in SPECIAL_TOKENS + self.IMAGE_ST:
+                 raise ValueError('Adding unknown special tokens is not supported')
+         return 0
+
+     def save_vocabulary(self, save_directory: str, **kwargs) -> Tuple[str]:
+         """
+         Save only the vocabulary of the tokenizer.
+
+         Returns:
+             `Tuple(str)`: Paths to the files saved.
+         """
+         file_path = os.path.join(save_directory, "qwen.tiktoken")
+         with open(file_path, "w", encoding="utf8") as w:
+             for k, v in self.mergeable_ranks.items():
+                 line = base64.b64encode(k).decode("utf8") + " " + str(v) + "\n"
+                 w.write(line)
+         return (file_path,)
+
+     def tokenize(
+         self,
+         text: str,
+         allowed_special: Union[Set, str] = "all",
+         disallowed_special: Union[Collection, str] = (),
+         **kwargs,
+     ) -> List[Union[bytes, str]]:
+         """
+         Converts a string into a sequence of tokens.
+
+         Args:
+             text (`str`):
+                 The sequence to be encoded.
+             allowed_special (`Literal["all"]` or `set`):
+                 The surface forms of the tokens to be encoded as special tokens in regular texts.
+                 Defaults to "all".
+             disallowed_special (`Literal["all"]` or `Collection`):
+                 The surface forms of the tokens that should not appear in regular texts and should trigger errors.
+                 Defaults to an empty tuple.
+
+             kwargs (additional keyword arguments, *optional*):
+                 Will be passed to the underlying model specific encode method.
+
+         Returns:
+             `List[bytes|str]`: The list of tokens.
+         """
+         tokens = []
+         text = unicodedata.normalize("NFC", text)
+
+         # this implementation takes a detour: text -> token id -> token surface forms
+         for t in self.tokenizer.encode(
+             text, allowed_special=allowed_special, disallowed_special=disallowed_special
+         ):
+             tokens.append(self.decoder[t])
+
+         def _encode_imgurl(img_tokens):
+             assert img_tokens[0] == self.image_start_tag and img_tokens[-1] == self.image_end_tag
+             img_tokens = img_tokens[1:-1]
+             img_url = b''.join(img_tokens)
+             out_img_tokens = list(map(self.decoder.get, img_url))
+             if len(out_img_tokens) > IMG_TOKEN_SPAN:
+                 raise ValueError("The content in {}..{} is too long".format(
+                     self.image_start_tag, self.image_end_tag))
+             out_img_tokens.extend([self.image_pad_tag] * (IMG_TOKEN_SPAN - len(out_img_tokens)))
+             out_img_tokens = [self.image_start_tag] + out_img_tokens + [self.image_end_tag]
+             return out_img_tokens
+
+         return _replace_closed_tag(tokens, self.image_start_tag, self.image_end_tag, _encode_imgurl)
+
+     def convert_tokens_to_string(self, tokens: List[Union[bytes, str]]) -> str:
+         """
+         Converts a sequence of tokens into a single string.
+         """
+         text = ""
+         temp = b""
+         for t in tokens:
+             if isinstance(t, str):
+                 if temp:
+                     text += temp.decode("utf-8", errors=self.errors)
+                     temp = b""
+                 text += t
+             elif isinstance(t, bytes):
+                 temp += t
+             else:
+                 raise TypeError("token should only be of type bytes or str")
+         if temp:
+             text += temp.decode("utf-8", errors=self.errors)
+         return text
+
+     @property
+     def vocab_size(self):
+         return self.tokenizer.n_vocab
+
+     def _convert_id_to_token(self, index: int) -> Union[bytes, str]:
+         """Converts an id to a token, special tokens included"""
+         if index in self.decoder:
+             return self.decoder[index]
+         raise ValueError("unknown ids")
+
+     def _convert_token_to_id(self, token: Union[bytes, str]) -> int:
+         """Converts a token to an id using the vocab, special tokens included"""
+         if token in self.special_tokens:
+             return self.special_tokens[token]
+         if token in self.mergeable_ranks:
+             return self.mergeable_ranks[token]
+         raise ValueError("unknown token")
+
+     def _tokenize(self, text: str, **kwargs):
+         """
+         Converts a string into a sequence of tokens (string), using the tokenizer. Split in words for word-based
+         vocabulary or sub-words for sub-word-based vocabularies (BPE/SentencePieces/WordPieces).
+
+         Do NOT take care of added tokens.
+         """
+         raise NotImplementedError
+
+     def _decode(
+         self,
+         token_ids: Union[int, List[int]],
+         skip_special_tokens: bool = False,
+         errors: str = None,
+         **kwargs,
+     ) -> str:
+         if isinstance(token_ids, int):
+             token_ids = [token_ids]
+
+         def _decode_imgurl(img_token_ids):
+             assert img_token_ids[0] == self.img_start_id and img_token_ids[-1] == self.img_end_id
+             img_token_ids = img_token_ids[1:-1]
+             img_token_ids = img_token_ids[ : img_token_ids.index(self.img_pad_id)]
+             img_url = bytes(img_token_ids).decode('utf-8')
+             return [self.img_start_id] + self.tokenizer.encode(img_url) + [self.img_end_id]
+
+         token_ids = _replace_closed_tag(token_ids, self.img_start_id, self.img_end_id, _decode_imgurl)
+
+         if skip_special_tokens:
+             if kwargs.get('keep_image_special', False):
+                 token_ids = [i for i in token_ids if i < self.eod_id
+                              or i in self.image_special_tokens]
+             else:
+                 token_ids = [i for i in token_ids if i < self.eod_id]
+         return self.tokenizer.decode(token_ids, errors=errors or self.errors)
+
+     def to_list_format(self, text: str):
+         text = unicodedata.normalize("NFC", text)
+         token_ids = self.tokenizer.encode(
+             text, allowed_special=set(self.IMAGE_ST + (ENDOFTEXT,)))
+
+         def _encode_vl_info(tokens):
+             if len(tokens) == 0:
+                 return []
+             if tokens[0] == self.img_start_id and tokens[-1] == self.img_end_id:
+                 key = 'image'
+             elif tokens[0] == self.ref_start_id and tokens[-1] == self.ref_end_id:
+                 key = 'ref'
+             elif tokens[0] == self.box_start_id and tokens[-1] == self.box_end_id:
+                 key = 'box'
+             elif tokens[0] == self.quad_start_id and tokens[-1] == self.quad_end_id:
+                 key = 'quad'
+             else:
+                 _tobytes = lambda x: x.encode('utf-8') if isinstance(x, str) else x
+                 return [{'text': b''.join(map(_tobytes, map(self.decoder.get, tokens))).decode('utf-8')}]
+             _tobytes = lambda x: x.encode('utf-8') if isinstance(x, str) else x
+             val = b''.join(map(_tobytes, map(self.decoder.get, tokens[1:-1]))).decode('utf-8')
+             return [{key: val}]
+
+         return _replace_closed_tag(
+             token_ids,
+             (self.img_start_id, self.ref_start_id, self.box_start_id, self.quad_start_id),
+             (self.img_end_id, self.ref_end_id, self.box_end_id, self.quad_end_id),
+             _encode_vl_info,
+             _encode_vl_info,
+         )
+
+     def from_list_format(self, list_format: List[Dict]):
+         text = ''
+         num_images = 0
+         for ele in list_format:
+             if 'image' in ele:
+                 num_images += 1
+                 text += f'Picture {num_images}: '
+                 text += self.image_start_tag + ele['image'] + self.image_end_tag
+                 text += '\n'
+             elif 'text' in ele:
+                 text += ele['text']
+             elif 'box' in ele:
+                 if 'ref' in ele:
+                     text += self.ref_start_tag + ele['ref'] + self.ref_end_tag
+                 for box in ele['box']:
+                     text += self.box_start_tag + '(%d,%d),(%d,%d)' % (box[0], box[1], box[2], box[3]) + self.box_end_tag
+             else:
+                 raise ValueError("Unsupported element: " + str(ele))
+         return text
+
+     def _fetch_latest_picture(self, response, history):
+         if history is None:
+             history = []
+         _history = history + [(response, None)]
+         for q, r in _history[::-1]:
+             for ele in self.to_list_format(q)[::-1]:
+                 if 'image' in ele:
+                     return ele['image']
+         return None
+
+     def _fetch_all_box_with_ref(self, text):
+         list_format = self.to_list_format(text)
+         output = []
+         for i, ele in enumerate(list_format):
+             if 'box' in ele:
+                 bbox = tuple(map(int, ele['box'].replace('(', '').replace(')', '').split(',')))
+                 assert len(bbox) == 4
+                 output.append({'box': bbox})
+                 if i > 0 and 'ref' in list_format[i-1]:
+                     output[-1]['ref'] = list_format[i-1]['ref'].strip()
+         return output
+
+     def draw_bbox_on_latest_picture(
+         self,
+         response,
+         history=None,
+     ) -> Optional[Image.Image]:
+         image = self._fetch_latest_picture(response, history)
+         if image is None:
+             return None
+         if image.startswith("http://") or image.startswith("https://"):
+             image = Image.open(requests.get(image, stream=True).raw).convert("RGB")
+             h, w = image.height, image.width
+         else:
+             image = np.asarray(Image.open(image).convert("RGB"))
+             h, w = image.shape[0], image.shape[1]
+         visualizer = Visualizer(image)
+
+         boxes = self._fetch_all_box_with_ref(response)
+         if not boxes:
+             return None
+         color = random.choice([_ for _ in mcolors.TABLEAU_COLORS.keys()])  # init color
+         for box in boxes:
+             if 'ref' in box:  # random new color for new refexps
+                 color = random.choice([_ for _ in mcolors.TABLEAU_COLORS.keys()])
+             x1, y1, x2, y2 = box['box']
+             x1, y1, x2, y2 = (int(x1 / 1000 * w), int(y1 / 1000 * h), int(x2 / 1000 * w), int(y2 / 1000 * h))
+             visualizer.draw_box((x1, y1, x2, y2), alpha=1, edge_color=color)
+             if 'ref' in box:
+                 visualizer.draw_text(box['ref'], (x1, y1), color=color, horizontal_alignment="left")
+         return visualizer.output
+
+
+ import colorsys
+ import logging
+ import math
+ import numpy as np
+ import matplotlib as mpl
+ import matplotlib.colors as mplc
+ import matplotlib.figure as mplfigure
+ import torch
+ from matplotlib.backends.backend_agg import FigureCanvasAgg
+ from PIL import Image
+ import random
+
+ logger = logging.getLogger(__name__)
+
+
+ class VisImage:
+     def __init__(self, img, scale=1.0):
+         self.img = img
+         self.scale = scale
+         self.width, self.height = img.shape[1], img.shape[0]
+         self._setup_figure(img)
+
+     def _setup_figure(self, img):
+         fig = mplfigure.Figure(frameon=False)
+         self.dpi = fig.get_dpi()
+         # add a small 1e-2 to avoid precision lost due to matplotlib's truncation
+         # (https://github.com/matplotlib/matplotlib/issues/15363)
+         fig.set_size_inches(
+             (self.width * self.scale + 1e-2) / self.dpi,
+             (self.height * self.scale + 1e-2) / self.dpi,
+         )
+         self.canvas = FigureCanvasAgg(fig)
+         # self.canvas = mpl.backends.backend_cairo.FigureCanvasCairo(fig)
+         ax = fig.add_axes([0.0, 0.0, 1.0, 1.0])
+         ax.axis("off")
+         self.fig = fig
+         self.ax = ax
+         self.reset_image(img)
+
+     def reset_image(self, img):
+         img = img.astype("uint8")
+         self.ax.imshow(img, extent=(0, self.width, self.height, 0), interpolation="nearest")
+
+     def save(self, filepath):
+         self.fig.savefig(filepath)
+
+     def get_image(self):
+         canvas = self.canvas
+         s, (width, height) = canvas.print_to_buffer()
+
+         buffer = np.frombuffer(s, dtype="uint8")
+
+         img_rgba = buffer.reshape(height, width, 4)
+         rgb, alpha = np.split(img_rgba, [3], axis=2)
+         return rgb.astype("uint8")
+
+
+ class Visualizer:
+     def __init__(self, img_rgb, metadata=None, scale=1.0):
+         self.img = np.asarray(img_rgb).clip(0, 255).astype(np.uint8)
+         self.font_path = FONT_PATH
+         self.output = VisImage(self.img, scale=scale)
+         self.cpu_device = torch.device("cpu")
+
+         # too small texts are useless, therefore clamp to a minimum
+         self._default_font_size = max(
+             np.sqrt(self.output.height * self.output.width) // 30, 15 // scale
+         )
+
+     def draw_text(
+         self,
+         text,
+         position,
+         *,
+         font_size=None,
+         color="g",
+         horizontal_alignment="center",
+         rotation=0,
+     ):
+         if not font_size:
+             font_size = self._default_font_size
+
+         # since the text background is dark, we don't want the text to be dark
+         color = np.maximum(list(mplc.to_rgb(color)), 0.2)
+         color[np.argmax(color)] = max(0.8, np.max(color))
+
+         x, y = position
+         self.output.ax.text(
+             x,
+             y,
+             text,
+             size=font_size * self.output.scale,
+             fontproperties=FontProperties(fname=self.font_path),
+             bbox={"facecolor": "black", "alpha": 0.8, "pad": 0.7, "edgecolor": "none"},
+             verticalalignment="top",
+             horizontalalignment=horizontal_alignment,
+             color=color,
+             zorder=10,
+             rotation=rotation,
+         )
+         return self.output
+
+     def draw_box(self, box_coord, alpha=0.5, edge_color="g", line_style="-"):
+
+         x0, y0, x1, y1 = box_coord
+         width = x1 - x0
+         height = y1 - y0
+
+         linewidth = max(self._default_font_size / 4, 1)
+
+         self.output.ax.add_patch(
+             mpl.patches.Rectangle(
+                 (x0, y0),
+                 width,
+                 height,
+                 fill=False,
+                 edgecolor=edge_color,
+                 linewidth=linewidth * self.output.scale,
+                 alpha=alpha,
+                 linestyle=line_style,
+             )
+         )
+         return self.output
+
+     def get_output(self):
+
+         return self.output
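A minimal usage sketch for the multimodal helpers defined above; the image path and prompt text are assumptions, not part of this upload:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
# from_list_format builds the <img>...</img> tagged prompt this tokenizer expects.
query = tokenizer.from_list_format([
    {"image": "demo.jpeg"},  # hypothetical local image path
    {"text": "Describe the picture."},
])
input_ids = tokenizer(query, return_tensors="pt").input_ids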
checkpoint-1600/tokenizer_config.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "added_tokens_decoder": {},
+   "auto_map": {
+     "AutoTokenizer": [
+       "Qwen/Qwen-VL-Chat--tokenization_qwen.QWenTokenizer",
+       null
+     ]
+   },
+   "clean_up_tokenization_spaces": true,
+   "model_max_length": 768,
+   "pad_token": "<|endoftext|>",
+   "padding_side": "right",
+   "tokenizer_class": "QWenTokenizer"
+ }
checkpoint-1600/trainer_state.json ADDED
@@ -0,0 +1,1153 @@
+ {
+   "best_metric": null,
+   "best_model_checkpoint": null,
+   "epoch": 0.10441819487045617,
+   "eval_steps": 500,
+   "global_step": 1600,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.0006526137179403511,
+       "grad_norm": 17.690582114691438,
+       "learning_rate": 1.948051948051948e-06,
+       "loss": 1.3559,
+       "step": 10
+     },
+     {
+       "epoch": 0.0013052274358807021,
+       "grad_norm": 7.768088366444893,
+       "learning_rate": 3.896103896103896e-06,
+       "loss": 1.2706,
+       "step": 20
+     },
+     {
+       "epoch": 0.001957841153821053,
+       "grad_norm": 7.705313536090087,
+       "learning_rate": 5.844155844155845e-06,
+       "loss": 1.3781,
+       "step": 30
+     },
+     {
+       "epoch": 0.0026104548717614043,
+       "grad_norm": 34.39078827766783,
+       "learning_rate": 7.792207792207792e-06,
+       "loss": 1.2749,
+       "step": 40
+     },
+     {
+       "epoch": 0.0032630685897017554,
+       "grad_norm": 68.28824334896528,
+       "learning_rate": 9.74025974025974e-06,
+       "loss": 1.2955,
+       "step": 50
+     },
+     {
+       "epoch": 0.003915682307642106,
+       "grad_norm": 14.220322607917241,
+       "learning_rate": 1.168831168831169e-05,
+       "loss": 1.2315,
+       "step": 60
+     },
+     {
+       "epoch": 0.0045682960255824575,
+       "grad_norm": 12.611848231734811,
+       "learning_rate": 1.3636363636363637e-05,
+       "loss": 1.0953,
+       "step": 70
+     },
+     {
+       "epoch": 0.0052209097435228086,
+       "grad_norm": 6.055664298727015,
+       "learning_rate": 1.5584415584415583e-05,
+       "loss": 1.105,
+       "step": 80
+     },
+     {
+       "epoch": 0.00587352346146316,
+       "grad_norm": 3.52269227801977,
+       "learning_rate": 1.753246753246753e-05,
+       "loss": 0.9563,
+       "step": 90
+     },
+     {
+       "epoch": 0.006526137179403511,
+       "grad_norm": 10.771884023354394,
+       "learning_rate": 1.948051948051948e-05,
+       "loss": 0.9523,
+       "step": 100
+     },
+     {
+       "epoch": 0.007178750897343862,
+       "grad_norm": 33.41476483216757,
+       "learning_rate": 2.1428571428571428e-05,
+       "loss": 0.832,
+       "step": 110
+     },
+     {
+       "epoch": 0.007831364615284213,
+       "grad_norm": 31.120240364617406,
+       "learning_rate": 2.337662337662338e-05,
+       "loss": 0.8376,
+       "step": 120
+     },
+     {
+       "epoch": 0.008483978333224564,
+       "grad_norm": 5.517231564060886,
+       "learning_rate": 2.5324675324675325e-05,
+       "loss": 0.8293,
+       "step": 130
+     },
+     {
+       "epoch": 0.009136592051164915,
+       "grad_norm": 4.311605388342058,
+       "learning_rate": 2.7272727272727273e-05,
+       "loss": 0.8295,
+       "step": 140
+     },
+     {
+       "epoch": 0.009789205769105266,
+       "grad_norm": 6.997724163121519,
+       "learning_rate": 2.922077922077922e-05,
+       "loss": 0.7662,
+       "step": 150
+     },
+     {
+       "epoch": 0.010441819487045617,
+       "grad_norm": 6.517836234400708,
+       "learning_rate": 2.999998841890695e-05,
+       "loss": 0.8158,
+       "step": 160
+     },
+     {
+       "epoch": 0.011094433204985968,
+       "grad_norm": 4.186989141019666,
+       "learning_rate": 2.99999176456253e-05,
+       "loss": 0.8037,
+       "step": 170
+     },
+     {
+       "epoch": 0.01174704692292632,
+       "grad_norm": 5.181546943355458,
+       "learning_rate": 2.9999782533305785e-05,
+       "loss": 0.7274,
+       "step": 180
+     },
+     {
+       "epoch": 0.01239966064086667,
+       "grad_norm": 3.767076521211455,
+       "learning_rate": 2.9999583082527935e-05,
+       "loss": 0.7474,
+       "step": 190
+     },
+     {
+       "epoch": 0.013052274358807021,
+       "grad_norm": 18.84416377940188,
+       "learning_rate": 2.999931929414726e-05,
+       "loss": 0.7708,
+       "step": 200
+     },
+     {
+       "epoch": 0.013704888076747372,
+       "grad_norm": 3.169160630444992,
+       "learning_rate": 2.999899116929522e-05,
+       "loss": 0.8279,
+       "step": 210
+     },
+     {
+       "epoch": 0.014357501794687724,
+       "grad_norm": 1.912782077307437,
+       "learning_rate": 2.999859870937924e-05,
+       "loss": 0.7407,
+       "step": 220
+     },
+     {
+       "epoch": 0.015010115512628075,
+       "grad_norm": 3.3906505952914974,
+       "learning_rate": 2.9998141916082696e-05,
+       "loss": 0.7732,
+       "step": 230
+     },
+     {
+       "epoch": 0.015662729230568426,
+       "grad_norm": 2.7144492322383584,
+       "learning_rate": 2.999762079136491e-05,
+       "loss": 0.7272,
+       "step": 240
+     },
+     {
+       "epoch": 0.01631534294850878,
+       "grad_norm": 7.109330196029837,
+       "learning_rate": 2.9997035337461135e-05,
+       "loss": 0.7748,
+       "step": 250
+     },
+     {
+       "epoch": 0.016967956666449128,
+       "grad_norm": 1.6054280593801813,
+       "learning_rate": 2.9996385556882555e-05,
+       "loss": 0.7676,
+       "step": 260
+     },
+     {
+       "epoch": 0.01762057038438948,
+       "grad_norm": 10.883212441614672,
+       "learning_rate": 2.9995671452416274e-05,
+       "loss": 0.735,
+       "step": 270
+     },
+     {
+       "epoch": 0.01827318410232983,
+       "grad_norm": 3.511064886507805,
+       "learning_rate": 2.999489302712529e-05,
+       "loss": 0.7741,
+       "step": 280
+     },
+     {
+       "epoch": 0.018925797820270183,
+       "grad_norm": 3.618603818375307,
+       "learning_rate": 2.9994050284348497e-05,
+       "loss": 0.749,
+       "step": 290
+     },
+     {
+       "epoch": 0.019578411538210532,
+       "grad_norm": 6.012944880342178,
+       "learning_rate": 2.9993143227700668e-05,
+       "loss": 0.7411,
+       "step": 300
+     },
+     {
+       "epoch": 0.020231025256150885,
+       "grad_norm": 2.348670372295822,
+       "learning_rate": 2.9992171861072428e-05,
+       "loss": 0.7394,
+       "step": 310
+     },
+     {
+       "epoch": 0.020883638974091234,
+       "grad_norm": 4.728309497649916,
+       "learning_rate": 2.9991136188630263e-05,
+       "loss": 0.8077,
+       "step": 320
+     },
+     {
+       "epoch": 0.021536252692031587,
+       "grad_norm": 15.611917863290122,
+       "learning_rate": 2.9990036214816467e-05,
+       "loss": 0.7209,
+       "step": 330
+     },
+     {
+       "epoch": 0.022188866409971936,
+       "grad_norm": 3.7315277354070817,
+       "learning_rate": 2.998887194434916e-05,
+       "loss": 0.7101,
+       "step": 340
+     },
+     {
+       "epoch": 0.02284148012791229,
+       "grad_norm": 6.618759094750745,
+       "learning_rate": 2.998764338222222e-05,
+       "loss": 0.7759,
+       "step": 350
+     },
+     {
+       "epoch": 0.02349409384585264,
+       "grad_norm": 6.770044306239603,
+       "learning_rate": 2.998635053370533e-05,
+       "loss": 0.7398,
+       "step": 360
+     },
+     {
+       "epoch": 0.02414670756379299,
+       "grad_norm": 12.471224202357552,
+       "learning_rate": 2.998499340434389e-05,
+       "loss": 0.7046,
+       "step": 370
+     },
+     {
+       "epoch": 0.02479932128173334,
+       "grad_norm": 4.147359416986547,
+       "learning_rate": 2.9983571999959013e-05,
+       "loss": 0.761,
+       "step": 380
+     },
+     {
+       "epoch": 0.025451934999673693,
+       "grad_norm": 34.84722866603778,
+       "learning_rate": 2.9982086326647533e-05,
+       "loss": 0.757,
+       "step": 390
+     },
+     {
+       "epoch": 0.026104548717614043,
+       "grad_norm": 5.245498180313093,
+       "learning_rate": 2.998053639078193e-05,
+       "loss": 0.7536,
+       "step": 400
+     },
+     {
+       "epoch": 0.026757162435554396,
+       "grad_norm": 36.55990241841121,
+       "learning_rate": 2.997892219901034e-05,
+       "loss": 0.7395,
+       "step": 410
+     },
+     {
+       "epoch": 0.027409776153494745,
+       "grad_norm": 5.03198653806696,
+       "learning_rate": 2.9977243758256494e-05,
+       "loss": 0.7208,
+       "step": 420
+     },
+     {
+       "epoch": 0.028062389871435098,
+       "grad_norm": 11.376914733036081,
+       "learning_rate": 2.997550107571972e-05,
+       "loss": 0.719,
+       "step": 430
+     },
+     {
+       "epoch": 0.028715003589375447,
+       "grad_norm": 2.958119684662306,
+       "learning_rate": 2.9973694158874898e-05,
+       "loss": 0.7271,
+       "step": 440
+     },
+     {
+       "epoch": 0.0293676173073158,
+       "grad_norm": 6.037096737490817,
+       "learning_rate": 2.9971823015472418e-05,
+       "loss": 0.7356,
+       "step": 450
+     },
+     {
+       "epoch": 0.03002023102525615,
+       "grad_norm": 5.3042973640363575,
+       "learning_rate": 2.9969887653538164e-05,
+       "loss": 0.7207,
+       "step": 460
+     },
+     {
+       "epoch": 0.030672844743196502,
+       "grad_norm": 2.4985603001745624,
+       "learning_rate": 2.996788808137347e-05,
+       "loss": 0.7769,
+       "step": 470
+     },
+     {
+       "epoch": 0.03132545846113685,
+       "grad_norm": 7.607065841315647,
+       "learning_rate": 2.9965824307555084e-05,
+       "loss": 0.7091,
+       "step": 480
+     },
+     {
+       "epoch": 0.03197807217907721,
+       "grad_norm": 4.322533035107957,
+       "learning_rate": 2.9963696340935144e-05,
+       "loss": 0.7114,
+       "step": 490
+     },
+     {
+       "epoch": 0.03263068589701756,
+       "grad_norm": 5.878565903250334,
+       "learning_rate": 2.9961504190641108e-05,
+       "loss": 0.7284,
+       "step": 500
+     },
+     {
+       "epoch": 0.033283299614957906,
+       "grad_norm": 5.0026507027119855,
+       "learning_rate": 2.9959247866075764e-05,
+       "loss": 0.6992,
+       "step": 510
+     },
+     {
+       "epoch": 0.033935913332898256,
+       "grad_norm": 7.12632150273901,
+       "learning_rate": 2.9956927376917137e-05,
+       "loss": 0.7285,
+       "step": 520
+     },
+     {
+       "epoch": 0.03458852705083861,
+       "grad_norm": 5.211123255860348,
+       "learning_rate": 2.9954542733118496e-05,
+       "loss": 0.7511,
+       "step": 530
+     },
+     {
+       "epoch": 0.03524114076877896,
+       "grad_norm": 9.925273547498618,
+       "learning_rate": 2.995209394490827e-05,
+       "loss": 0.7699,
+       "step": 540
+     },
+     {
+       "epoch": 0.03589375448671931,
+       "grad_norm": 7.418381681996765,
+       "learning_rate": 2.9949581022790025e-05,
+       "loss": 0.759,
+       "step": 550
+     },
+     {
+       "epoch": 0.03654636820465966,
+       "grad_norm": 4.352380973507467,
+       "learning_rate": 2.9947003977542423e-05,
+       "loss": 0.7537,
+       "step": 560
+     },
+     {
+       "epoch": 0.037198981922600016,
+       "grad_norm": 9.712842120769198,
+       "learning_rate": 2.9944362820219167e-05,
+       "loss": 0.7063,
+       "step": 570
+     },
+     {
+       "epoch": 0.037851595640540366,
+       "grad_norm": 5.757600819230482,
+       "learning_rate": 2.994165756214895e-05,
+       "loss": 0.7893,
+       "step": 580
+     },
+     {
+       "epoch": 0.038504209358480715,
+       "grad_norm": 5.529209601152462,
+       "learning_rate": 2.9938888214935426e-05,
+       "loss": 0.6771,
+       "step": 590
+     },
+     {
+       "epoch": 0.039156823076421064,
+       "grad_norm": 10.550479346499758,
+       "learning_rate": 2.9936054790457127e-05,
+       "loss": 0.737,
+       "step": 600
+     },
+     {
+       "epoch": 0.03980943679436142,
+       "grad_norm": 8.284279553451016,
+       "learning_rate": 2.9933157300867437e-05,
+       "loss": 0.7182,
+       "step": 610
+     },
+     {
+       "epoch": 0.04046205051230177,
+       "grad_norm": 8.18511648646326,
+       "learning_rate": 2.9930195758594542e-05,
+       "loss": 0.6901,
+       "step": 620
+     },
+     {
+       "epoch": 0.04111466423024212,
+       "grad_norm": 14.569754827631956,
+       "learning_rate": 2.9927170176341365e-05,
+       "loss": 0.7008,
+       "step": 630
+     },
+     {
+       "epoch": 0.04176727794818247,
+       "grad_norm": 4.214581273685441,
+       "learning_rate": 2.992408056708551e-05,
+       "loss": 0.7489,
+       "step": 640
+     },
+     {
+       "epoch": 0.042419891666122825,
+       "grad_norm": 10.038596627079452,
+       "learning_rate": 2.9920926944079224e-05,
+       "loss": 0.7649,
+       "step": 650
+     },
+     {
+       "epoch": 0.043072505384063174,
+       "grad_norm": 2.386544029221306,
+       "learning_rate": 2.9917709320849305e-05,
+       "loss": 0.7223,
+       "step": 660
+     },
+     {
+       "epoch": 0.043725119102003523,
+       "grad_norm": 8.286359254511249,
+       "learning_rate": 2.9914427711197096e-05,
+       "loss": 0.7089,
+       "step": 670
+     },
+     {
+       "epoch": 0.04437773281994387,
+       "grad_norm": 4.235819327444911,
+       "learning_rate": 2.9911082129198372e-05,
+       "loss": 0.7138,
+       "step": 680
+     },
+     {
+       "epoch": 0.04503034653788423,
+       "grad_norm": 5.187338033698449,
+       "learning_rate": 2.9907672589203316e-05,
+       "loss": 0.7192,
+       "step": 690
+     },
+     {
+       "epoch": 0.04568296025582458,
+       "grad_norm": 6.360475337181379,
+       "learning_rate": 2.9904199105836443e-05,
+       "loss": 0.7094,
+       "step": 700
+     },
+     {
+       "epoch": 0.04633557397376493,
+       "grad_norm": 4.906400836156689,
+       "learning_rate": 2.990066169399654e-05,
+       "loss": 0.654,
+       "step": 710
+     },
+     {
+       "epoch": 0.04698818769170528,
+       "grad_norm": 17.600495314130633,
+       "learning_rate": 2.9897060368856603e-05,
+       "loss": 0.7299,
+       "step": 720
+     },
+     {
+       "epoch": 0.04764080140964563,
+       "grad_norm": 7.765935941492389,
+       "learning_rate": 2.989339514586377e-05,
+       "loss": 0.7486,
+       "step": 730
+     },
+     {
+       "epoch": 0.04829341512758598,
+       "grad_norm": 7.30026395137639,
+       "learning_rate": 2.9889666040739252e-05,
+       "loss": 0.6941,
+       "step": 740
+     },
+     {
+       "epoch": 0.04894602884552633,
+       "grad_norm": 4.676985481218465,
+       "learning_rate": 2.9885873069478275e-05,
+       "loss": 0.7701,
+       "step": 750
+     },
+     {
+       "epoch": 0.04959864256346668,
+       "grad_norm": 42.50656974727186,
+       "learning_rate": 2.9882016248350006e-05,
+       "loss": 0.7428,
+       "step": 760
+     },
+     {
+       "epoch": 0.05025125628140704,
+       "grad_norm": 3.9893667031114766,
+       "learning_rate": 2.9878095593897474e-05,
+       "loss": 0.7204,
+       "step": 770
+     },
+     {
+       "epoch": 0.05090386999934739,
+       "grad_norm": 8.909028486553332,
+       "learning_rate": 2.9874111122937518e-05,
+       "loss": 0.7336,
+       "step": 780
+     },
+     {
+       "epoch": 0.051556483717287736,
+       "grad_norm": 5.256925284136456,
+       "learning_rate": 2.9870062852560698e-05,
+       "loss": 0.7674,
+       "step": 790
+     },
+     {
+       "epoch": 0.052209097435228086,
+       "grad_norm": 5.835535487534073,
+       "learning_rate": 2.986595080013123e-05,
+       "loss": 0.7547,
+       "step": 800
+     },
+     {
+       "epoch": 0.05286171115316844,
+       "grad_norm": 4.7337998648314565,
+       "learning_rate": 2.9861774983286913e-05,
+       "loss": 0.7412,
+       "step": 810
+     },
+     {
+       "epoch": 0.05351432487110879,
+       "grad_norm": 4.020304406250962,
+       "learning_rate": 2.9857535419939053e-05,
+       "loss": 0.7351,
+       "step": 820
+     },
+     {
+       "epoch": 0.05416693858904914,
+       "grad_norm": 7.005748568175158,
+       "learning_rate": 2.9853232128272367e-05,
+       "loss": 0.7146,
+       "step": 830
+     },
+     {
+       "epoch": 0.05481955230698949,
+       "grad_norm": 12.598315147497464,
+       "learning_rate": 2.984886512674494e-05,
+       "loss": 0.7066,
+       "step": 840
+     },
+     {
+       "epoch": 0.055472166024929846,
+       "grad_norm": 5.636755294839953,
+       "learning_rate": 2.9844434434088114e-05,
+       "loss": 0.8033,
+       "step": 850
+     },
+     {
+       "epoch": 0.056124779742870196,
+       "grad_norm": 2.5964949457129305,
+       "learning_rate": 2.9839940069306436e-05,
+       "loss": 0.718,
+       "step": 860
+     },
+     {
+       "epoch": 0.056777393460810545,
+       "grad_norm": 5.496060434333994,
+       "learning_rate": 2.9835382051677548e-05,
+       "loss": 0.7382,
+       "step": 870
+     },
+     {
+       "epoch": 0.057430007178750894,
+       "grad_norm": 3.367511777906771,
+       "learning_rate": 2.9830760400752117e-05,
+       "loss": 0.7049,
+       "step": 880
+     },
+     {
+       "epoch": 0.05808262089669125,
+       "grad_norm": 12.228282751386294,
+       "learning_rate": 2.9826075136353762e-05,
+       "loss": 0.7135,
+       "step": 890
+     },
+     {
+       "epoch": 0.0587352346146316,
+       "grad_norm": 7.426066867205744,
+       "learning_rate": 2.9821326278578955e-05,
+       "loss": 0.6966,
+       "step": 900
+     },
+     {
+       "epoch": 0.05938784833257195,
+       "grad_norm": 5.720080945169142,
+       "learning_rate": 2.981651384779693e-05,
+       "loss": 0.7325,
+       "step": 910
+     },
+     {
+       "epoch": 0.0600404620505123,
+       "grad_norm": 3.3362738196336275,
+       "learning_rate": 2.9811637864649622e-05,
+       "loss": 0.7013,
+       "step": 920
+     },
+     {
+       "epoch": 0.060693075768452655,
+       "grad_norm": 5.5481143050516675,
+       "learning_rate": 2.980669835005154e-05,
+       "loss": 0.7107,
+       "step": 930
+     },
+     {
+       "epoch": 0.061345689486393004,
+       "grad_norm": 2.7247889305754533,
+       "learning_rate": 2.980169532518971e-05,
+       "loss": 0.6839,
+       "step": 940
+     },
+     {
+       "epoch": 0.06199830320433335,
+       "grad_norm": 12.705144630158374,
+       "learning_rate": 2.9796628811523576e-05,
+       "loss": 0.7061,
+       "step": 950
+     },
+     {
+       "epoch": 0.0626509169222737,
+       "grad_norm": 3.1174966376805777,
+       "learning_rate": 2.9791498830784896e-05,
+       "loss": 0.706,
+       "step": 960
+     },
+     {
+       "epoch": 0.06330353064021406,
+       "grad_norm": 6.454819870022971,
+       "learning_rate": 2.9786305404977657e-05,
+       "loss": 0.6901,
688
+ "step": 970
689
+ },
690
+ {
691
+ "epoch": 0.06395614435815442,
692
+ "grad_norm": 8.62099817289566,
693
+ "learning_rate": 2.9781048556377982e-05,
694
+ "loss": 0.6737,
695
+ "step": 980
696
+ },
697
+ {
698
+ "epoch": 0.06460875807609476,
699
+ "grad_norm": 12.649532843245389,
700
+ "learning_rate": 2.977572830753404e-05,
701
+ "loss": 0.6777,
702
+ "step": 990
703
+ },
704
+ {
705
+ "epoch": 0.06526137179403511,
706
+ "grad_norm": 5.019508830810828,
707
+ "learning_rate": 2.9770344681265925e-05,
708
+ "loss": 0.7125,
709
+ "step": 1000
710
+ },
711
+ {
712
+ "epoch": 0.06591398551197546,
713
+ "grad_norm": 5.417114630539967,
714
+ "learning_rate": 2.9764897700665595e-05,
715
+ "loss": 0.7558,
716
+ "step": 1010
717
+ },
718
+ {
719
+ "epoch": 0.06656659922991581,
720
+ "grad_norm": 13.487574757960102,
721
+ "learning_rate": 2.975938738909674e-05,
722
+ "loss": 0.7305,
723
+ "step": 1020
724
+ },
725
+ {
726
+ "epoch": 0.06721921294785617,
727
+ "grad_norm": 4.115297871929447,
728
+ "learning_rate": 2.97538137701947e-05,
729
+ "loss": 0.7382,
730
+ "step": 1030
731
+ },
732
+ {
733
+ "epoch": 0.06787182666579651,
734
+ "grad_norm": 4.218133725965425,
735
+ "learning_rate": 2.974817686786636e-05,
736
+ "loss": 0.7131,
737
+ "step": 1040
738
+ },
739
+ {
740
+ "epoch": 0.06852444038373687,
741
+ "grad_norm": 23.754945260227526,
742
+ "learning_rate": 2.9742476706290044e-05,
743
+ "loss": 0.6854,
744
+ "step": 1050
745
+ },
746
+ {
747
+ "epoch": 0.06917705410167722,
748
+ "grad_norm": 9.992382581534882,
749
+ "learning_rate": 2.973671330991541e-05,
750
+ "loss": 0.7224,
751
+ "step": 1060
752
+ },
753
+ {
754
+ "epoch": 0.06982966781961757,
755
+ "grad_norm": 9.022842665053004,
756
+ "learning_rate": 2.973088670346336e-05,
757
+ "loss": 0.69,
758
+ "step": 1070
759
+ },
760
+ {
761
+ "epoch": 0.07048228153755792,
762
+ "grad_norm": 7.180693480173149,
763
+ "learning_rate": 2.97249969119259e-05,
764
+ "loss": 0.6752,
765
+ "step": 1080
766
+ },
767
+ {
768
+ "epoch": 0.07113489525549826,
769
+ "grad_norm": 4.631581340679664,
770
+ "learning_rate": 2.9719043960566088e-05,
771
+ "loss": 0.7078,
772
+ "step": 1090
773
+ },
774
+ {
775
+ "epoch": 0.07178750897343862,
776
+ "grad_norm": 3.8365551360021497,
777
+ "learning_rate": 2.9713027874917867e-05,
778
+ "loss": 0.7455,
779
+ "step": 1100
780
+ },
781
+ {
782
+ "epoch": 0.07244012269137898,
783
+ "grad_norm": 20.612721990589407,
784
+ "learning_rate": 2.9706948680785984e-05,
785
+ "loss": 0.7123,
786
+ "step": 1110
787
+ },
788
+ {
789
+ "epoch": 0.07309273640931932,
790
+ "grad_norm": 8.515913036269723,
791
+ "learning_rate": 2.9700806404245893e-05,
792
+ "loss": 0.6755,
793
+ "step": 1120
794
+ },
795
+ {
796
+ "epoch": 0.07374535012725968,
797
+ "grad_norm": 8.702591994450561,
798
+ "learning_rate": 2.9694601071643607e-05,
799
+ "loss": 0.743,
800
+ "step": 1130
801
+ },
802
+ {
803
+ "epoch": 0.07439796384520003,
804
+ "grad_norm": 20.204623397644042,
805
+ "learning_rate": 2.968833270959562e-05,
806
+ "loss": 0.6995,
807
+ "step": 1140
808
+ },
809
+ {
810
+ "epoch": 0.07505057756314037,
811
+ "grad_norm": 3.4150625200259563,
812
+ "learning_rate": 2.9682001344988768e-05,
813
+ "loss": 0.7245,
814
+ "step": 1150
815
+ },
816
+ {
817
+ "epoch": 0.07570319128108073,
818
+ "grad_norm": 4.827412673105033,
819
+ "learning_rate": 2.967560700498013e-05,
820
+ "loss": 0.6764,
821
+ "step": 1160
822
+ },
823
+ {
824
+ "epoch": 0.07635580499902107,
825
+ "grad_norm": 5.9778449783108965,
826
+ "learning_rate": 2.9669149716996897e-05,
827
+ "loss": 0.7094,
828
+ "step": 1170
829
+ },
830
+ {
831
+ "epoch": 0.07700841871696143,
832
+ "grad_norm": 4.626419468156439,
833
+ "learning_rate": 2.9662629508736278e-05,
834
+ "loss": 0.7139,
835
+ "step": 1180
836
+ },
837
+ {
838
+ "epoch": 0.07766103243490179,
839
+ "grad_norm": 8.23953369228554,
840
+ "learning_rate": 2.9656046408165344e-05,
841
+ "loss": 0.7132,
842
+ "step": 1190
843
+ },
844
+ {
845
+ "epoch": 0.07831364615284213,
846
+ "grad_norm": 5.755275462407804,
847
+ "learning_rate": 2.964940044352095e-05,
848
+ "loss": 0.6923,
849
+ "step": 1200
850
+ },
851
+ {
852
+ "epoch": 0.07896625987078248,
853
+ "grad_norm": 3.8396649246253816,
854
+ "learning_rate": 2.9642691643309572e-05,
855
+ "loss": 0.7082,
856
+ "step": 1210
857
+ },
858
+ {
859
+ "epoch": 0.07961887358872284,
860
+ "grad_norm": 5.7429454484886415,
861
+ "learning_rate": 2.963592003630723e-05,
862
+ "loss": 0.7095,
863
+ "step": 1220
864
+ },
865
+ {
866
+ "epoch": 0.08027148730666318,
867
+ "grad_norm": 17.628494673763004,
868
+ "learning_rate": 2.962908565155932e-05,
869
+ "loss": 0.7309,
870
+ "step": 1230
871
+ },
872
+ {
873
+ "epoch": 0.08092410102460354,
874
+ "grad_norm": 4.83400055237192,
875
+ "learning_rate": 2.9622188518380528e-05,
876
+ "loss": 0.6925,
877
+ "step": 1240
878
+ },
879
+ {
880
+ "epoch": 0.08157671474254388,
881
+ "grad_norm": 3.1535973307593905,
882
+ "learning_rate": 2.9615228666354667e-05,
883
+ "loss": 0.7441,
884
+ "step": 1250
885
+ },
886
+ {
887
+ "epoch": 0.08222932846048424,
888
+ "grad_norm": 4.085385929026401,
889
+ "learning_rate": 2.9608206125334586e-05,
890
+ "loss": 0.7137,
891
+ "step": 1260
892
+ },
893
+ {
894
+ "epoch": 0.0828819421784246,
895
+ "grad_norm": 4.299591870123697,
896
+ "learning_rate": 2.9601120925442016e-05,
897
+ "loss": 0.7515,
898
+ "step": 1270
899
+ },
900
+ {
901
+ "epoch": 0.08353455589636494,
902
+ "grad_norm": 12.873434323415678,
903
+ "learning_rate": 2.959397309706746e-05,
904
+ "loss": 0.6852,
905
+ "step": 1280
906
+ },
907
+ {
908
+ "epoch": 0.0841871696143053,
909
+ "grad_norm": 6.427088345402557,
910
+ "learning_rate": 2.958676267087004e-05,
911
+ "loss": 0.6499,
912
+ "step": 1290
913
+ },
914
+ {
915
+ "epoch": 0.08483978333224565,
916
+ "grad_norm": 4.70723263638176,
917
+ "learning_rate": 2.9579489677777387e-05,
918
+ "loss": 0.6803,
919
+ "step": 1300
920
+ },
921
+ {
922
+ "epoch": 0.08549239705018599,
923
+ "grad_norm": 4.819218491318424,
924
+ "learning_rate": 2.9572154148985495e-05,
925
+ "loss": 0.6798,
926
+ "step": 1310
927
+ },
928
+ {
929
+ "epoch": 0.08614501076812635,
930
+ "grad_norm": 3.0652661968089827,
931
+ "learning_rate": 2.9564756115958592e-05,
932
+ "loss": 0.6935,
933
+ "step": 1320
934
+ },
935
+ {
936
+ "epoch": 0.08679762448606669,
937
+ "grad_norm": 5.997224165634556,
938
+ "learning_rate": 2.9557295610429017e-05,
939
+ "loss": 0.7133,
940
+ "step": 1330
941
+ },
942
+ {
943
+ "epoch": 0.08745023820400705,
944
+ "grad_norm": 3.3593003375605717,
945
+ "learning_rate": 2.954977266439706e-05,
946
+ "loss": 0.7335,
947
+ "step": 1340
948
+ },
949
+ {
950
+ "epoch": 0.0881028519219474,
951
+ "grad_norm": 4.161242018302672,
952
+ "learning_rate": 2.954218731013083e-05,
953
+ "loss": 0.7054,
954
+ "step": 1350
955
+ },
956
+ {
957
+ "epoch": 0.08875546563988775,
958
+ "grad_norm": 5.827431481546491,
959
+ "learning_rate": 2.953453958016614e-05,
960
+ "loss": 0.6321,
961
+ "step": 1360
962
+ },
963
+ {
964
+ "epoch": 0.0894080793578281,
965
+ "grad_norm": 7.1039105888444904,
966
+ "learning_rate": 2.952682950730634e-05,
967
+ "loss": 0.6941,
968
+ "step": 1370
969
+ },
970
+ {
971
+ "epoch": 0.09006069307576846,
972
+ "grad_norm": 2.7616336275225892,
973
+ "learning_rate": 2.951905712462219e-05,
974
+ "loss": 0.6928,
975
+ "step": 1380
976
+ },
977
+ {
978
+ "epoch": 0.0907133067937088,
979
+ "grad_norm": 4.261061690296871,
980
+ "learning_rate": 2.9511222465451716e-05,
981
+ "loss": 0.7176,
982
+ "step": 1390
983
+ },
984
+ {
985
+ "epoch": 0.09136592051164916,
986
+ "grad_norm": 5.4134818862551395,
987
+ "learning_rate": 2.950332556340006e-05,
988
+ "loss": 0.7048,
989
+ "step": 1400
990
+ },
991
+ {
992
+ "epoch": 0.0920185342295895,
993
+ "grad_norm": 6.3477656240577085,
994
+ "learning_rate": 2.949536645233935e-05,
995
+ "loss": 0.6842,
996
+ "step": 1410
997
+ },
998
+ {
999
+ "epoch": 0.09267114794752986,
1000
+ "grad_norm": 63.477804314776044,
1001
+ "learning_rate": 2.9487345166408545e-05,
1002
+ "loss": 0.6876,
1003
+ "step": 1420
1004
+ },
1005
+ {
1006
+ "epoch": 0.09332376166547021,
1007
+ "grad_norm": 4.368664541213622,
1008
+ "learning_rate": 2.9479261740013286e-05,
1009
+ "loss": 0.6913,
1010
+ "step": 1430
1011
+ },
1012
+ {
1013
+ "epoch": 0.09397637538341055,
1014
+ "grad_norm": 9.476938465079238,
1015
+ "learning_rate": 2.9471116207825754e-05,
1016
+ "loss": 0.6891,
1017
+ "step": 1440
1018
+ },
1019
+ {
1020
+ "epoch": 0.09462898910135091,
1021
+ "grad_norm": 8.434794578560851,
1022
+ "learning_rate": 2.9462908604784523e-05,
1023
+ "loss": 0.6585,
1024
+ "step": 1450
1025
+ },
1026
+ {
1027
+ "epoch": 0.09528160281929127,
1028
+ "grad_norm": 4.798759761163433,
1029
+ "learning_rate": 2.945463896609441e-05,
1030
+ "loss": 0.6736,
1031
+ "step": 1460
1032
+ },
1033
+ {
1034
+ "epoch": 0.09593421653723161,
1035
+ "grad_norm": 9.782724872581115,
1036
+ "learning_rate": 2.9446307327226306e-05,
1037
+ "loss": 0.6659,
1038
+ "step": 1470
1039
+ },
1040
+ {
1041
+ "epoch": 0.09658683025517197,
1042
+ "grad_norm": 3.997516099278308,
1043
+ "learning_rate": 2.9437913723917058e-05,
1044
+ "loss": 0.6527,
1045
+ "step": 1480
1046
+ },
1047
+ {
1048
+ "epoch": 0.09723944397311232,
1049
+ "grad_norm": 4.623015725563099,
1050
+ "learning_rate": 2.942945819216928e-05,
1051
+ "loss": 0.7274,
1052
+ "step": 1490
1053
+ },
1054
+ {
1055
+ "epoch": 0.09789205769105266,
1056
+ "grad_norm": 3.2197835799755055,
1057
+ "learning_rate": 2.942094076825123e-05,
1058
+ "loss": 0.6966,
1059
+ "step": 1500
1060
+ },
1061
+ {
1062
+ "epoch": 0.09854467140899302,
1063
+ "grad_norm": 3.5107988249516984,
1064
+ "learning_rate": 2.9412361488696628e-05,
1065
+ "loss": 0.7235,
1066
+ "step": 1510
1067
+ },
1068
+ {
1069
+ "epoch": 0.09919728512693336,
1070
+ "grad_norm": 18.7865650951996,
1071
+ "learning_rate": 2.9403720390304518e-05,
1072
+ "loss": 0.7382,
1073
+ "step": 1520
1074
+ },
1075
+ {
1076
+ "epoch": 0.09984989884487372,
1077
+ "grad_norm": 3.85598692653545,
1078
+ "learning_rate": 2.93950175101391e-05,
1079
+ "loss": 0.7475,
1080
+ "step": 1530
1081
+ },
1082
+ {
1083
+ "epoch": 0.10050251256281408,
1084
+ "grad_norm": 20.459657003411998,
1085
+ "learning_rate": 2.938625288552957e-05,
1086
+ "loss": 0.6558,
1087
+ "step": 1540
1088
+ },
1089
+ {
1090
+ "epoch": 0.10115512628075442,
1091
+ "grad_norm": 6.416583997846208,
1092
+ "learning_rate": 2.9377426554069976e-05,
1093
+ "loss": 0.7205,
1094
+ "step": 1550
1095
+ },
1096
+ {
1097
+ "epoch": 0.10180773999869477,
1098
+ "grad_norm": 5.532087704430113,
1099
+ "learning_rate": 2.936853855361904e-05,
1100
+ "loss": 0.7189,
1101
+ "step": 1560
1102
+ },
1103
+ {
1104
+ "epoch": 0.10246035371663513,
1105
+ "grad_norm": 4.756518458886862,
1106
+ "learning_rate": 2.9359588922299986e-05,
1107
+ "loss": 0.7088,
1108
+ "step": 1570
1109
+ },
1110
+ {
1111
+ "epoch": 0.10311296743457547,
1112
+ "grad_norm": 5.775658785412931,
1113
+ "learning_rate": 2.9350577698500408e-05,
1114
+ "loss": 0.682,
1115
+ "step": 1580
1116
+ },
1117
+ {
1118
+ "epoch": 0.10376558115251583,
1119
+ "grad_norm": 7.714313915746094,
1120
+ "learning_rate": 2.9341504920872087e-05,
1121
+ "loss": 0.7393,
1122
+ "step": 1590
1123
+ },
1124
+ {
1125
+ "epoch": 0.10441819487045617,
1126
+ "grad_norm": 11.153510433173501,
1127
+ "learning_rate": 2.933237062833082e-05,
1128
+ "loss": 0.6616,
1129
+ "step": 1600
1130
+ }
1131
+ ],
1132
+ "logging_steps": 10,
1133
+ "max_steps": 15323,
1134
+ "num_input_tokens_seen": 0,
1135
+ "num_train_epochs": 1,
1136
+ "save_steps": 400,
1137
+ "stateful_callbacks": {
1138
+ "TrainerControl": {
1139
+ "args": {
1140
+ "should_epoch_stop": false,
1141
+ "should_evaluate": false,
1142
+ "should_log": false,
1143
+ "should_save": true,
1144
+ "should_training_stop": false
1145
+ },
1146
+ "attributes": {}
1147
+ }
1148
+ },
1149
+ "total_flos": 4.3737129443917824e+18,
1150
+ "train_batch_size": 8,
1151
+ "trial_name": null,
1152
+ "trial_params": null
1153
+ }
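The log above is machine-written by the Hugging Face Trainer. A minimal sketch for inspecting it offline, assuming the standard trainer_state.json layout with a top-level `log_history` list (the checkpoint path is illustrative):

```python
import json

# Load the trainer state saved alongside the checkpoint.
with open("checkpoint-1600/trainer_state.json") as f:
    state = json.load(f)

# Each logged training entry carries epoch, grad_norm, learning_rate, loss and step.
for entry in state["log_history"]:
    print(entry["step"], entry["loss"], entry["grad_norm"])
```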
checkpoint-1600/training_args.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a3a6a5052a9445cc570063f5939fdeea3ff8007e9c2718674bb335b9eea0bfff
size 6520
checkpoint-1600/zero_to_fp32.py ADDED
@@ -0,0 +1,587 @@
#!/usr/bin/env python

# Copyright (c) Microsoft Corporation.
# SPDX-License-Identifier: Apache-2.0

# DeepSpeed Team

# This script extracts fp32 consolidated weights from a zero 1, 2 and 3 DeepSpeed checkpoints. It gets
# copied into the top level checkpoint dir, so the user can easily do the conversion at any point in
# the future. Once extracted, the weights don't require DeepSpeed and can be used in any
# application.
#
# example: python zero_to_fp32.py . pytorch_model.bin

import argparse
import torch
import glob
import math
import os
import re
from collections import OrderedDict
from dataclasses import dataclass

# while this script doesn't use deepspeed to recover data, since the checkpoints are pickled with
# DeepSpeed data structures it has to be available in the current python environment.
from deepspeed.utils import logger
from deepspeed.checkpoint.constants import (DS_VERSION, OPTIMIZER_STATE_DICT, SINGLE_PARTITION_OF_FP32_GROUPS,
                                            FP32_FLAT_GROUPS, ZERO_STAGE, PARTITION_COUNT, PARAM_SHAPES, BUFFER_NAMES,
                                            FROZEN_PARAM_SHAPES, FROZEN_PARAM_FRAGMENTS)


@dataclass
class zero_model_state:
    buffers: dict()
    param_shapes: dict()
    shared_params: list
    ds_version: int
    frozen_param_shapes: dict()
    frozen_param_fragments: dict()


debug = 0

# load to cpu
device = torch.device('cpu')


def atoi(text):
    return int(text) if text.isdigit() else text


def natural_keys(text):
    '''
    alist.sort(key=natural_keys) sorts in human order
    http://nedbatchelder.com/blog/200712/human_sorting.html
    (See Toothy's implementation in the comments)
    '''
    return [atoi(c) for c in re.split(r'(\d+)', text)]


def get_model_state_file(checkpoint_dir, zero_stage):
    if not os.path.isdir(checkpoint_dir):
        raise FileNotFoundError(f"Directory '{checkpoint_dir}' doesn't exist")

    # there should be only one file
    if zero_stage <= 2:
        file = os.path.join(checkpoint_dir, "mp_rank_00_model_states.pt")
    elif zero_stage == 3:
        file = os.path.join(checkpoint_dir, "zero_pp_rank_0_mp_rank_00_model_states.pt")

    if not os.path.exists(file):
        raise FileNotFoundError(f"can't find model states file at '{file}'")

    return file


def get_checkpoint_files(checkpoint_dir, glob_pattern):
    # XXX: need to test that this simple glob rule works for multi-node setup too
    ckpt_files = sorted(glob.glob(os.path.join(checkpoint_dir, glob_pattern)), key=natural_keys)

    if len(ckpt_files) == 0:
        raise FileNotFoundError(f"can't find {glob_pattern} files in directory '{checkpoint_dir}'")

    return ckpt_files


def get_optim_files(checkpoint_dir):
    return get_checkpoint_files(checkpoint_dir, "*_optim_states.pt")


def get_model_state_files(checkpoint_dir):
    return get_checkpoint_files(checkpoint_dir, "*_model_states.pt")


def parse_model_states(files):
    zero_model_states = []
    for file in files:
        state_dict = torch.load(file, map_location=device)

        if BUFFER_NAMES not in state_dict:
            raise ValueError(f"{file} is not a model state checkpoint")
        buffer_names = state_dict[BUFFER_NAMES]
        if debug:
            print("Found buffers:", buffer_names)

        # recover just the buffers while restoring them to fp32 if they were saved in fp16
        buffers = {k: v.float() for k, v in state_dict["module"].items() if k in buffer_names}
        param_shapes = state_dict[PARAM_SHAPES]

        # collect parameters that are included in param_shapes
        param_names = []
        for s in param_shapes:
            for name in s.keys():
                param_names.append(name)

        # update with frozen parameters
        frozen_param_shapes = state_dict.get(FROZEN_PARAM_SHAPES, None)
        if frozen_param_shapes is not None:
            if debug:
                print(f"Found frozen_param_shapes: {frozen_param_shapes}")
            param_names += list(frozen_param_shapes.keys())

        # handle shared params
        shared_params = [[k, v] for k, v in state_dict["shared_params"].items()]

        ds_version = state_dict.get(DS_VERSION, None)

        frozen_param_fragments = state_dict.get(FROZEN_PARAM_FRAGMENTS, None)

        z_model_state = zero_model_state(buffers=buffers,
                                         param_shapes=param_shapes,
                                         shared_params=shared_params,
                                         ds_version=ds_version,
                                         frozen_param_shapes=frozen_param_shapes,
                                         frozen_param_fragments=frozen_param_fragments)
        zero_model_states.append(z_model_state)

    return zero_model_states


def parse_optim_states(files, ds_checkpoint_dir):

    total_files = len(files)
    state_dicts = []
    for f in files:
        state_dict = torch.load(f, map_location=device)
        # immediately discard the potentially huge 2 optimizer states as we only care for fp32 master weights
        # and also handle the case where it was already removed by another helper script
        state_dict["optimizer_state_dict"].pop("optimizer_state_dict", None)
        state_dicts.append(state_dict)

    if not ZERO_STAGE in state_dicts[0][OPTIMIZER_STATE_DICT]:
        raise ValueError(f"{files[0]} is not a zero checkpoint")
    zero_stage = state_dicts[0][OPTIMIZER_STATE_DICT][ZERO_STAGE]
    world_size = state_dicts[0][OPTIMIZER_STATE_DICT][PARTITION_COUNT]

    # For ZeRO-2 each param group can have different partition_count as data parallelism for expert
    # parameters can be different from data parallelism for non-expert parameters. So we can just
    # use the max of the partition_count to get the dp world_size.

    if type(world_size) is list:
        world_size = max(world_size)

    if world_size != total_files:
        raise ValueError(
            f"Expected {world_size} of '*_optim_states.pt' under '{ds_checkpoint_dir}' but found {total_files} files. "
            "Possibly due to an overwrite of an old checkpoint, or a checkpoint didn't get saved by one or more processes."
        )

    # the groups are named differently in each stage
    if zero_stage <= 2:
        fp32_groups_key = SINGLE_PARTITION_OF_FP32_GROUPS
    elif zero_stage == 3:
        fp32_groups_key = FP32_FLAT_GROUPS
    else:
        raise ValueError(f"unknown zero stage {zero_stage}")

    if zero_stage <= 2:
        fp32_flat_groups = [state_dicts[i][OPTIMIZER_STATE_DICT][fp32_groups_key] for i in range(len(state_dicts))]
    elif zero_stage == 3:
        # if there is more than one param group, there will be multiple flattened tensors - one
        # flattened tensor per group - for simplicity merge them into a single tensor
        #
        # XXX: could make the script more memory efficient for when there are multiple groups - it
        # will require matching the sub-lists of param_shapes for each param group flattened tensor

        fp32_flat_groups = [
            torch.cat(state_dicts[i][OPTIMIZER_STATE_DICT][fp32_groups_key], 0) for i in range(len(state_dicts))
        ]

    return zero_stage, world_size, fp32_flat_groups


def _get_fp32_state_dict_from_zero_checkpoint(ds_checkpoint_dir):
    """
    Returns fp32 state_dict reconstructed from ds checkpoint

    Args:
        - ``ds_checkpoint_dir``: path to the deepspeed checkpoint folder (where the optimizer files are)

    """
    print(f"Processing zero checkpoint '{ds_checkpoint_dir}'")

    optim_files = get_optim_files(ds_checkpoint_dir)
    zero_stage, world_size, fp32_flat_groups = parse_optim_states(optim_files, ds_checkpoint_dir)
    print(f"Detected checkpoint of type zero stage {zero_stage}, world_size: {world_size}")

    model_files = get_model_state_files(ds_checkpoint_dir)

    zero_model_states = parse_model_states(model_files)
    print(f'Parsing checkpoint created by deepspeed=={zero_model_states[0].ds_version}')

    if zero_stage <= 2:
        return _get_fp32_state_dict_from_zero2_checkpoint(world_size, fp32_flat_groups, zero_model_states)
    elif zero_stage == 3:
        return _get_fp32_state_dict_from_zero3_checkpoint(world_size, fp32_flat_groups, zero_model_states)


def _zero2_merge_frozen_params(state_dict, zero_model_states):
    if zero_model_states[0].frozen_param_shapes is None or len(zero_model_states[0].frozen_param_shapes) == 0:
        return

    frozen_param_shapes = zero_model_states[0].frozen_param_shapes
    frozen_param_fragments = zero_model_states[0].frozen_param_fragments

    if debug:
        num_elem = sum(s.numel() for s in frozen_param_shapes.values())
        print(f'rank 0: {FROZEN_PARAM_SHAPES}.numel = {num_elem}')

    wanted_params = len(frozen_param_shapes)
    wanted_numel = sum(s.numel() for s in frozen_param_shapes.values())
    avail_numel = sum([p.numel() for p in frozen_param_fragments.values()])
    print(f'Frozen params: Have {avail_numel} numels to process.')
    print(f'Frozen params: Need {wanted_numel} numels in {wanted_params} params')

    total_params = 0
    total_numel = 0
    for name, shape in frozen_param_shapes.items():
        total_params += 1
        unpartitioned_numel = shape.numel()
        total_numel += unpartitioned_numel

        state_dict[name] = frozen_param_fragments[name]

        if debug:
            print(f"{name} full shape: {shape} unpartitioned numel {unpartitioned_numel} ")

    print(f"Reconstructed Frozen fp32 state dict with {total_params} params {total_numel} elements")


def _zero2_merge_trainable_params(state_dict, world_size, fp32_flat_groups, zero_model_states):
    param_shapes = zero_model_states[0].param_shapes

    # Reconstruction protocol:
    #
    # XXX: document this

    if debug:
        for i in range(world_size):
            for j in range(len(fp32_flat_groups[0])):
                print(f"{FP32_FLAT_GROUPS}[{i}][{j}].shape={fp32_flat_groups[i][j].shape}")

    # XXX: memory usage doubles here (zero2)
    num_param_groups = len(fp32_flat_groups[0])
    merged_single_partition_of_fp32_groups = []
    for i in range(num_param_groups):
        merged_partitions = [sd[i] for sd in fp32_flat_groups]
        full_single_fp32_vector = torch.cat(merged_partitions, 0)
        merged_single_partition_of_fp32_groups.append(full_single_fp32_vector)
    avail_numel = sum(
        [full_single_fp32_vector.numel() for full_single_fp32_vector in merged_single_partition_of_fp32_groups])

    if debug:
        wanted_params = sum([len(shapes) for shapes in param_shapes])
        wanted_numel = sum([sum(shape.numel() for shape in shapes.values()) for shapes in param_shapes])
        # not asserting if there is a mismatch due to possible padding
        print(f"Have {avail_numel} numels to process.")
        print(f"Need {wanted_numel} numels in {wanted_params} params.")

    # params
    # XXX: for huge models that can't fit into the host's RAM we will have to recode this to support
    # out-of-core computing solution
    total_numel = 0
    total_params = 0
    for shapes, full_single_fp32_vector in zip(param_shapes, merged_single_partition_of_fp32_groups):
        offset = 0
        avail_numel = full_single_fp32_vector.numel()
        for name, shape in shapes.items():

            unpartitioned_numel = shape.numel()
            total_numel += unpartitioned_numel
            total_params += 1

            if debug:
                print(f"{name} full shape: {shape} unpartitioned numel {unpartitioned_numel} ")
            state_dict[name] = full_single_fp32_vector.narrow(0, offset, unpartitioned_numel).view(shape)
            offset += unpartitioned_numel

        # Z2 started to align to 2*world_size to improve nccl performance. Therefore both offset and
        # avail_numel can differ by anywhere between 0..2*world_size. Due to two unrelated complex
        # paddings performed in the code it's almost impossible to predict the exact numbers w/o the
        # live optimizer object, so we are checking that the numbers are within the right range
        align_to = 2 * world_size

        def zero2_align(x):
            return align_to * math.ceil(x / align_to)

        if debug:
            print(f"original offset={offset}, avail_numel={avail_numel}")

        offset = zero2_align(offset)
        avail_numel = zero2_align(avail_numel)

        if debug:
            print(f"aligned offset={offset}, avail_numel={avail_numel}")

        # Sanity check
        if offset != avail_numel:
            raise ValueError(f"consumed {offset} numels out of {avail_numel} - something is wrong")

    print(f"Reconstructed fp32 state dict with {total_params} params {total_numel} elements")


def _get_fp32_state_dict_from_zero2_checkpoint(world_size, fp32_flat_groups, zero_model_states):
    state_dict = OrderedDict()

    # buffers
    buffers = zero_model_states[0].buffers
    state_dict.update(buffers)
    if debug:
        print(f"added {len(buffers)} buffers")

    _zero2_merge_frozen_params(state_dict, zero_model_states)

    _zero2_merge_trainable_params(state_dict, world_size, fp32_flat_groups, zero_model_states)

    # recover shared parameters
    for pair in zero_model_states[0].shared_params:
        if pair[1] in state_dict:
            state_dict[pair[0]] = state_dict[pair[1]]

    return state_dict


def zero3_partitioned_param_info(unpartitioned_numel, world_size):
    remainder = unpartitioned_numel % world_size
    padding_numel = (world_size - remainder) if remainder else 0
    partitioned_numel = math.ceil(unpartitioned_numel / world_size)
    return partitioned_numel, padding_numel


def _zero3_merge_frozen_params(state_dict, world_size, zero_model_states):
    if zero_model_states[0].frozen_param_shapes is None or len(zero_model_states[0].frozen_param_shapes) == 0:
        return

    if debug:
        for i in range(world_size):
            num_elem = sum(s.numel() for s in zero_model_states[i].frozen_param_fragments.values())
            print(f'rank {i}: {FROZEN_PARAM_SHAPES}.numel = {num_elem}')

    frozen_param_shapes = zero_model_states[0].frozen_param_shapes
    wanted_params = len(frozen_param_shapes)
    wanted_numel = sum(s.numel() for s in frozen_param_shapes.values())
    avail_numel = sum([p.numel() for p in zero_model_states[0].frozen_param_fragments.values()]) * world_size
    print(f'Frozen params: Have {avail_numel} numels to process.')
    print(f'Frozen params: Need {wanted_numel} numels in {wanted_params} params')

    total_params = 0
    total_numel = 0
    for name, shape in zero_model_states[0].frozen_param_shapes.items():
        total_params += 1
        unpartitioned_numel = shape.numel()
        total_numel += unpartitioned_numel

        param_frags = tuple(model_state.frozen_param_fragments[name] for model_state in zero_model_states)
        state_dict[name] = torch.cat(param_frags, 0).narrow(0, 0, unpartitioned_numel).view(shape)

        partitioned_numel, partitioned_padding_numel = zero3_partitioned_param_info(unpartitioned_numel, world_size)

        if debug:
            print(
                f"Frozen params: {total_params} {name} full shape: {shape} partition0 numel={partitioned_numel} partitioned_padding_numel={partitioned_padding_numel}"
            )

    print(f"Reconstructed Frozen fp32 state dict with {total_params} params {total_numel} elements")


def _zero3_merge_trainable_params(state_dict, world_size, fp32_flat_groups, zero_model_states):
    param_shapes = zero_model_states[0].param_shapes
    avail_numel = fp32_flat_groups[0].numel() * world_size
    # Reconstruction protocol: For zero3 we need to zip the partitions together at boundary of each
    # param, re-consolidating each param, while dealing with padding if any

    # merge list of dicts, preserving order
    param_shapes = {k: v for d in param_shapes for k, v in d.items()}

    if debug:
        for i in range(world_size):
            print(f"{FP32_FLAT_GROUPS}[{i}].shape={fp32_flat_groups[i].shape}")

    wanted_params = len(param_shapes)
    wanted_numel = sum(shape.numel() for shape in param_shapes.values())
    # not asserting if there is a mismatch due to possible padding
    avail_numel = fp32_flat_groups[0].numel() * world_size
    print(f"Trainable params: Have {avail_numel} numels to process.")
    print(f"Trainable params: Need {wanted_numel} numels in {wanted_params} params.")

    # params
    # XXX: for huge models that can't fit into the host's RAM we will have to recode this to support
    # out-of-core computing solution
    offset = 0
    total_numel = 0
    total_params = 0
    for name, shape in param_shapes.items():

        unpartitioned_numel = shape.numel()
        total_numel += unpartitioned_numel
        total_params += 1

        partitioned_numel, partitioned_padding_numel = zero3_partitioned_param_info(unpartitioned_numel, world_size)

        if debug:
            print(
                f"Trainable params: {total_params} {name} full shape: {shape} partition0 numel={partitioned_numel} partitioned_padding_numel={partitioned_padding_numel}"
            )

        # XXX: memory usage doubles here
        state_dict[name] = torch.cat(
            tuple(fp32_flat_groups[i].narrow(0, offset, partitioned_numel) for i in range(world_size)),
            0).narrow(0, 0, unpartitioned_numel).view(shape)
        offset += partitioned_numel

    offset *= world_size

    # Sanity check
    if offset != avail_numel:
        raise ValueError(f"consumed {offset} numels out of {avail_numel} - something is wrong")

    print(f"Reconstructed Trainable fp32 state dict with {total_params} params {total_numel} elements")


def _get_fp32_state_dict_from_zero3_checkpoint(world_size, fp32_flat_groups, zero_model_states):
    state_dict = OrderedDict()

    # buffers
    buffers = zero_model_states[0].buffers
    state_dict.update(buffers)
    if debug:
        print(f"added {len(buffers)} buffers")

    _zero3_merge_frozen_params(state_dict, world_size, zero_model_states)

    _zero3_merge_trainable_params(state_dict, world_size, fp32_flat_groups, zero_model_states)

    # recover shared parameters
    for pair in zero_model_states[0].shared_params:
        if pair[1] in state_dict:
            state_dict[pair[0]] = state_dict[pair[1]]

    return state_dict


def get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir, tag=None):
    """
    Convert ZeRO 2 or 3 checkpoint into a single fp32 consolidated state_dict that can be loaded with
    ``load_state_dict()`` and used for training without DeepSpeed or shared with others, for example
    via a model hub.

    Args:
        - ``checkpoint_dir``: path to the desired checkpoint folder
        - ``tag``: checkpoint tag used as a unique identifier for checkpoint. If not provided will attempt to load tag in 'latest' file. e.g., ``global_step14``

    Returns:
        - pytorch ``state_dict``

    Note: this approach may not work if your application doesn't have sufficient free CPU memory and
    you may need to use the offline approach using the ``zero_to_fp32.py`` script that is saved with
    the checkpoint.

    A typical usage might be ::

        from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint
        # do the training and checkpoint saving
        state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir) # already on cpu
        model = model.cpu() # move to cpu
        model.load_state_dict(state_dict)
        # submit to model hub or save the model to share with others

    In this example the ``model`` will no longer be usable in the deepspeed context of the same
    application. i.e. you will need to re-initialize the deepspeed engine, since
    ``model.load_state_dict(state_dict)`` will remove all the deepspeed magic from it.

    If you want it all done for you, use ``load_state_dict_from_zero_checkpoint`` instead.

    """
    if tag is None:
        latest_path = os.path.join(checkpoint_dir, 'latest')
        if os.path.isfile(latest_path):
            with open(latest_path, 'r') as fd:
                tag = fd.read().strip()
        else:
            raise ValueError(f"Unable to find 'latest' file at {latest_path}")

    ds_checkpoint_dir = os.path.join(checkpoint_dir, tag)

    if not os.path.isdir(ds_checkpoint_dir):
        raise FileNotFoundError(f"Directory '{ds_checkpoint_dir}' doesn't exist")

    return _get_fp32_state_dict_from_zero_checkpoint(ds_checkpoint_dir)


def convert_zero_checkpoint_to_fp32_state_dict(checkpoint_dir, output_file, tag=None):
    """
    Convert ZeRO 2 or 3 checkpoint into a single fp32 consolidated ``state_dict`` file that can be
    loaded with ``torch.load(file)`` + ``load_state_dict()`` and used for training without DeepSpeed.

    Args:
        - ``checkpoint_dir``: path to the desired checkpoint folder. (one that contains the tag-folder, like ``global_step14``)
        - ``output_file``: path to the pytorch fp32 state_dict output file (e.g. path/pytorch_model.bin)
        - ``tag``: checkpoint tag used as a unique identifier for checkpoint. If not provided will attempt to load tag in the file named ``latest`` in the checkpoint folder, e.g., ``global_step14``
    """

    state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir, tag)
    print(f"Saving fp32 state dict to {output_file}")
    torch.save(state_dict, output_file)


def load_state_dict_from_zero_checkpoint(model, checkpoint_dir, tag=None):
    """
    1. Put the provided model to cpu
    2. Convert ZeRO 2 or 3 checkpoint into a single fp32 consolidated ``state_dict``
    3. Load it into the provided model

    Args:
        - ``model``: the model object to update
        - ``checkpoint_dir``: path to the desired checkpoint folder. (one that contains the tag-folder, like ``global_step14``)
        - ``tag``: checkpoint tag used as a unique identifier for checkpoint. If not provided will attempt to load tag in the file named ``latest`` in the checkpoint folder, e.g., ``global_step14``

    Returns:
        - ``model``: modified model

    Make sure you have plenty of CPU memory available before you call this function. If you don't
    have enough use the ``zero_to_fp32.py`` utility to do the conversion. You will find it
    conveniently placed for you in the checkpoint folder.

    A typical usage might be ::

        from deepspeed.utils.zero_to_fp32 import load_state_dict_from_zero_checkpoint
        model = load_state_dict_from_zero_checkpoint(trainer.model, checkpoint_dir)
        # submit to model hub or save the model to share with others

    Note, that once this was run, the ``model`` will no longer be usable in the deepspeed context
    of the same application. i.e. you will need to re-initialize the deepspeed engine, since
    ``model.load_state_dict(state_dict)`` will remove all the deepspeed magic from it.

    """
    logger.info(f"Extracting fp32 weights")
    state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir, tag)

    logger.info(f"Overwriting model with fp32 weights")
    model = model.cpu()
    model.load_state_dict(state_dict, strict=False)

    return model


if __name__ == "__main__":

    parser = argparse.ArgumentParser()
    parser.add_argument("checkpoint_dir",
                        type=str,
                        help="path to the desired checkpoint folder, e.g., path/checkpoint-12")
    parser.add_argument(
        "output_file",
        type=str,
        help="path to the pytorch fp32 state_dict output file (e.g. path/checkpoint-12/pytorch_model.bin)")
    parser.add_argument("-t",
                        "--tag",
                        type=str,
                        default=None,
                        help="checkpoint tag used as a unique identifier for checkpoint. e.g., global_step1")
    parser.add_argument("-d", "--debug", action='store_true', help="enable debug")
    args = parser.parse_args()

    debug = args.debug

    convert_zero_checkpoint_to_fp32_state_dict(args.checkpoint_dir, args.output_file, tag=args.tag)
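The script above is self-contained once DeepSpeed is installed. A minimal usage sketch, assuming the DeepSpeed shard folder named in checkpoint-1600/latest was uploaded alongside the checkpoint (the output filename is illustrative):

```python
# Consolidate the ZeRO-partitioned shards under checkpoint-1600 into one fp32 file.
from zero_to_fp32 import convert_zero_checkpoint_to_fp32_state_dict

convert_zero_checkpoint_to_fp32_state_dict("checkpoint-1600", "checkpoint-1600/pytorch_model_fp32.bin")
```

Equivalently, from the command line: `python checkpoint-1600/zero_to_fp32.py checkpoint-1600 pytorch_model_fp32.bin`.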
checkpoint-2000/README.md ADDED
@@ -0,0 +1,203 @@
---
library_name: peft
base_model: Qwen/Qwen-VL-Chat
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->



## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->



- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]


#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary



## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]
### Framework versions

- PEFT 0.10.0
- PEFT 0.11.1
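The model card's "How to Get Started" section above is still a stub. A minimal loading sketch for these adapters follows; it is not an official recipe from this repo, and it assumes Qwen/Qwen-VL-Chat's trust_remote_code requirement is acceptable and that an adapter directory such as checkpoint-2000 is used as saved:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the frozen base model, then attach the LoRA adapter from this repo.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
model = PeftModel.from_pretrained(base, "checkpoint-2000")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
```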
checkpoint-2000/adapter_config.json ADDED
@@ -0,0 +1,380 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "Qwen/Qwen-VL-Chat",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 16,
  "lora_dropout": 0.05,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 64,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "transformer.h.16.mlp.w1",
    "transformer.visual.transformer.resblocks.13.attn.out_proj",
    "transformer.h.28.mlp.w1",
    "transformer.h.16.attn.c_attn",
    "transformer.h.3.mlp.w1",
    "transformer.visual.transformer.resblocks.29.attn.in_proj",
    "transformer.visual.transformer.resblocks.19.mlp.c_proj",
    "transformer.visual.transformer.resblocks.47.mlp.c_fc",
    "transformer.visual.transformer.resblocks.34.mlp.c_fc",
    "transformer.visual.transformer.resblocks.4.attn.out_proj",
    "transformer.h.31.attn.c_attn",
    "transformer.h.16.mlp.w2",
    "transformer.visual.transformer.resblocks.5.attn.out_proj",
    "transformer.h.2.mlp.w1",
    "transformer.visual.transformer.resblocks.7.attn.in_proj",
    "transformer.h.20.mlp.w2",
    "transformer.h.19.mlp.w1",
    "transformer.visual.transformer.resblocks.18.mlp.c_fc",
    "transformer.visual.transformer.resblocks.27.attn.out_proj",
    "transformer.visual.transformer.resblocks.10.mlp.c_proj",
    "transformer.visual.transformer.resblocks.43.mlp.c_fc",
    "transformer.h.5.mlp.w1",
    "transformer.visual.transformer.resblocks.15.mlp.c_proj",
    "transformer.visual.transformer.resblocks.25.mlp.c_proj",
    "transformer.visual.transformer.resblocks.10.attn.out_proj",
    "transformer.visual.transformer.resblocks.4.mlp.c_fc",
    "transformer.h.31.mlp.w2",
    "transformer.visual.transformer.resblocks.37.attn.out_proj",
    "transformer.h.8.attn.c_proj",
    "transformer.h.29.attn.c_attn",
    "transformer.visual.transformer.resblocks.24.mlp.c_proj",
    "transformer.h.19.mlp.c_proj",
    "transformer.visual.transformer.resblocks.11.attn.out_proj",
    "transformer.h.13.mlp.c_proj",
    "transformer.h.27.mlp.c_proj",
    "transformer.h.31.mlp.w1",
    "transformer.visual.transformer.resblocks.7.mlp.c_proj",
    "transformer.h.28.mlp.w2",
    "transformer.visual.transformer.resblocks.3.mlp.c_proj",
    "transformer.visual.transformer.resblocks.13.attn.in_proj",
    "transformer.h.21.attn.c_attn",
    "transformer.visual.transformer.resblocks.23.mlp.c_fc",
    "transformer.visual.transformer.resblocks.33.mlp.c_proj",
    "transformer.visual.transformer.resblocks.42.mlp.c_fc",
    "transformer.visual.transformer.resblocks.3.attn.in_proj",
    "transformer.h.13.mlp.w1",
    "transformer.visual.transformer.resblocks.22.attn.out_proj",
    "transformer.visual.transformer.resblocks.20.mlp.c_fc",
    "transformer.h.26.mlp.w2",
    "transformer.h.14.attn.c_attn",
    "transformer.h.16.attn.c_proj",
    "transformer.h.1.mlp.w1",
    "transformer.visual.transformer.resblocks.21.attn.out_proj",
    "transformer.visual.transformer.resblocks.39.mlp.c_proj",
    "transformer.visual.transformer.resblocks.4.attn.in_proj",
    "transformer.h.29.mlp.c_proj",
    "transformer.visual.transformer.resblocks.12.mlp.c_proj",
    "transformer.visual.transformer.resblocks.14.attn.in_proj",
    "transformer.h.28.attn.c_proj",
    "transformer.h.18.mlp.w1",
    "transformer.h.27.mlp.w2",
    "transformer.h.18.attn.c_attn",
    "transformer.visual.transformer.resblocks.33.attn.out_proj",
    "transformer.h.5.mlp.w2",
    "transformer.visual.transformer.resblocks.37.mlp.c_fc",
    "transformer.visual.transformer.resblocks.2.mlp.c_proj",
    "transformer.visual.transformer.resblocks.42.attn.out_proj",
    "transformer.visual.transformer.resblocks.15.attn.in_proj",
    "transformer.visual.transformer.resblocks.6.mlp.c_fc",
    "transformer.h.13.mlp.w2",
    "transformer.h.23.attn.c_proj",
    "transformer.h.20.mlp.c_proj",
    "transformer.h.14.mlp.w2",
    "transformer.visual.transformer.resblocks.9.attn.in_proj",
    "transformer.visual.transformer.resblocks.46.attn.in_proj",
    "transformer.h.9.attn.c_attn",
    "transformer.visual.transformer.resblocks.36.mlp.c_proj",
    "transformer.h.31.attn.c_proj",
    "transformer.visual.transformer.resblocks.19.mlp.c_fc",
    "transformer.h.17.mlp.w1",
    "transformer.h.2.attn.c_proj",
    "transformer.visual.transformer.resblocks.47.attn.in_proj",
    "transformer.visual.transformer.resblocks.45.mlp.c_proj",
    "transformer.visual.transformer.resblocks.46.mlp.c_fc",
    "transformer.visual.transformer.resblocks.27.attn.in_proj",
    "transformer.visual.transformer.resblocks.26.attn.out_proj",
    "transformer.h.22.attn.c_proj",
    "transformer.visual.transformer.resblocks.40.attn.out_proj",
    "transformer.visual.transformer.resblocks.46.mlp.c_proj",
    "transformer.visual.transformer.resblocks.18.attn.out_proj",
    "transformer.h.27.attn.c_proj",
    "transformer.visual.transformer.resblocks.26.attn.in_proj",
    "transformer.h.4.mlp.w1",
    "transformer.h.10.attn.c_proj",
    "transformer.h.6.attn.c_attn",
    "transformer.h.2.attn.c_attn",
    "transformer.h.22.mlp.w1",
    "transformer.visual.transformer.resblocks.39.mlp.c_fc",
    "transformer.h.8.mlp.w2",
    "transformer.h.4.attn.c_attn",
    "transformer.h.26.mlp.c_proj",
    "transformer.visual.transformer.resblocks.29.mlp.c_proj",
    "transformer.visual.transformer.resblocks.5.mlp.c_proj",
    "transformer.h.11.mlp.c_proj",
    "transformer.h.0.mlp.w2",
    "transformer.visual.transformer.resblocks.36.attn.out_proj",
    "transformer.h.29.mlp.w1",
    "transformer.h.12.mlp.c_proj",
    "transformer.visual.transformer.resblocks.2.attn.in_proj",
    "transformer.visual.transformer.resblocks.2.mlp.c_fc",
    "transformer.h.25.attn.c_attn",
    "transformer.visual.transformer.resblocks.19.attn.in_proj",
    "transformer.visual.transformer.resblocks.43.attn.out_proj",
    "transformer.visual.transformer.resblocks.35.attn.out_proj",
    "transformer.h.22.attn.c_attn",
    "transformer.h.0.mlp.w1",
    "transformer.h.3.attn.c_attn",
    "transformer.h.28.attn.c_attn",
    "transformer.visual.transformer.resblocks.25.attn.in_proj",
    "transformer.visual.transformer.resblocks.34.attn.out_proj",
    "transformer.h.21.attn.c_proj",
    "transformer.h.6.attn.c_proj",
    "transformer.visual.transformer.resblocks.11.mlp.c_proj",
    "transformer.h.13.attn.c_attn",
    "transformer.visual.transformer.resblocks.38.attn.out_proj",
    "transformer.h.3.attn.c_proj",
    "transformer.visual.transformer.resblocks.17.mlp.c_fc",
    "transformer.h.26.mlp.w1",
    "transformer.visual.transformer.resblocks.36.mlp.c_fc",
    "transformer.h.26.attn.c_attn",
    "transformer.visual.transformer.resblocks.29.attn.out_proj",
    "transformer.h.7.mlp.w1",
    "transformer.visual.transformer.resblocks.40.mlp.c_fc",
    "transformer.visual.transformer.resblocks.9.attn.out_proj",
    "transformer.h.3.mlp.c_proj",
    "transformer.visual.transformer.resblocks.26.mlp.c_fc",
    "transformer.h.11.mlp.w2",
    "transformer.visual.transformer.resblocks.33.attn.in_proj",
    "transformer.visual.transformer.resblocks.42.mlp.c_proj",
    "transformer.visual.transformer.resblocks.32.attn.out_proj",
    "transformer.h.4.attn.c_proj",
    "transformer.visual.transformer.resblocks.27.mlp.c_fc",
    "transformer.visual.transformer.resblocks.11.mlp.c_fc",
    "transformer.visual.transformer.resblocks.25.attn.out_proj",
    "transformer.visual.transformer.resblocks.23.attn.in_proj",
    "transformer.h.5.attn.c_attn",
    "transformer.h.16.mlp.c_proj",
    "transformer.visual.transformer.resblocks.14.mlp.c_proj",
    "transformer.h.22.mlp.w2",
    "transformer.h.25.mlp.c_proj",
    "transformer.visual.transformer.resblocks.10.mlp.c_fc",
    "transformer.h.24.mlp.c_proj",
    "transformer.h.19.mlp.w2",
    "transformer.h.14.mlp.w1",
    "transformer.visual.transformer.resblocks.40.mlp.c_proj",
    "transformer.visual.transformer.resblocks.28.attn.out_proj",
    "transformer.visual.transformer.resblocks.24.mlp.c_fc",
    "transformer.h.8.attn.c_attn",
    "transformer.h.9.mlp.w1",
    "transformer.h.6.mlp.c_proj",
    "transformer.visual.transformer.resblocks.19.attn.out_proj",
    "transformer.visual.transformer.resblocks.32.mlp.c_fc",
    "transformer.visual.transformer.resblocks.7.mlp.c_fc",
    "transformer.visual.transformer.resblocks.44.attn.in_proj",
    "transformer.visual.transformer.resblocks.34.mlp.c_proj",
    "transformer.visual.transformer.resblocks.9.mlp.c_fc",
    "transformer.visual.conv1",
    "transformer.visual.transformer.resblocks.8.attn.out_proj",
    "transformer.h.23.mlp.w2",
    "transformer.h.7.mlp.w2",
    "transformer.h.24.attn.c_proj",
    "transformer.h.30.attn.c_proj",
    "transformer.h.29.attn.c_proj",
    "transformer.visual.transformer.resblocks.9.mlp.c_proj",
    "transformer.visual.transformer.resblocks.35.attn.in_proj",
    "transformer.visual.transformer.resblocks.21.mlp.c_fc",
    "transformer.visual.transformer.resblocks.41.mlp.c_proj",
    "transformer.visual.transformer.resblocks.38.mlp.c_fc",
    "transformer.visual.transformer.resblocks.13.mlp.c_proj",
    "transformer.visual.transformer.resblocks.41.attn.out_proj",
    "transformer.visual.transformer.resblocks.16.mlp.c_fc",
    "transformer.visual.transformer.resblocks.45.attn.out_proj",
    "transformer.h.11.mlp.w1",
    "transformer.visual.transformer.resblocks.16.attn.in_proj",
    "transformer.visual.transformer.resblocks.47.attn.out_proj",
    "transformer.h.9.attn.c_proj",
    "transformer.h.31.mlp.c_proj",
    "transformer.visual.transformer.resblocks.12.attn.in_proj",
    "transformer.visual.transformer.resblocks.28.mlp.c_proj",
    "transformer.visual.transformer.resblocks.20.attn.out_proj",
    "transformer.h.12.attn.c_attn",
    "transformer.h.24.mlp.w1",
    "transformer.visual.transformer.resblocks.21.attn.in_proj",
    "transformer.visual.transformer.resblocks.41.attn.in_proj",
    "transformer.h.10.mlp.w1",
    "transformer.h.1.mlp.w2",
    "transformer.h.0.mlp.c_proj",
    "transformer.h.22.mlp.c_proj",
    "transformer.visual.transformer.resblocks.18.attn.in_proj",
    "transformer.visual.transformer.resblocks.38.mlp.c_proj",
    "transformer.h.12.mlp.w1",
    "transformer.h.1.attn.c_attn",
    "transformer.visual.transformer.resblocks.31.mlp.c_proj",
    "transformer.visual.transformer.resblocks.44.mlp.c_proj",
    "transformer.h.15.mlp.c_proj",
    "transformer.h.6.mlp.w1",
    "transformer.visual.transformer.resblocks.16.mlp.c_proj",
    "transformer.h.13.attn.c_proj",
    "transformer.h.15.attn.c_attn",
    "transformer.h.15.mlp.w1",
    "transformer.h.17.mlp.w2",
    "transformer.visual.transformer.resblocks.10.attn.in_proj",
    "transformer.h.26.attn.c_proj",
    "transformer.visual.transformer.resblocks.20.attn.in_proj",
    "transformer.h.10.mlp.w2",
    "transformer.h.24.attn.c_attn",
    "transformer.h.8.mlp.w1",
240
+ "transformer.h.23.mlp.w1",
241
+ "transformer.visual.transformer.resblocks.1.mlp.c_proj",
242
+ "transformer.h.4.mlp.w2",
243
+ "transformer.visual.transformer.resblocks.38.attn.in_proj",
244
+ "transformer.h.12.mlp.w2",
245
+ "transformer.h.7.attn.c_proj",
246
+ "transformer.h.4.mlp.c_proj",
247
+ "transformer.visual.transformer.resblocks.31.attn.out_proj",
248
+ "transformer.visual.transformer.resblocks.17.mlp.c_proj",
249
+ "transformer.h.21.mlp.w2",
250
+ "transformer.visual.transformer.resblocks.5.attn.in_proj",
251
+ "transformer.h.18.attn.c_proj",
252
+ "transformer.visual.transformer.resblocks.31.mlp.c_fc",
253
+ "transformer.h.18.mlp.w2",
254
+ "transformer.visual.transformer.resblocks.6.attn.out_proj",
255
+ "transformer.visual.transformer.resblocks.8.attn.in_proj",
256
+ "transformer.visual.transformer.resblocks.30.mlp.c_proj",
257
+ "transformer.h.30.mlp.c_proj",
258
+ "transformer.visual.transformer.resblocks.30.attn.out_proj",
259
+ "transformer.visual.transformer.resblocks.16.attn.out_proj",
260
+ "transformer.visual.transformer.resblocks.14.attn.out_proj",
261
+ "transformer.h.25.mlp.w1",
262
+ "transformer.visual.transformer.resblocks.45.attn.in_proj",
263
+ "transformer.h.11.attn.c_proj",
264
+ "transformer.visual.transformer.resblocks.30.attn.in_proj",
265
+ "transformer.visual.transformer.resblocks.43.mlp.c_proj",
266
+ "transformer.h.10.mlp.c_proj",
267
+ "transformer.h.21.mlp.c_proj",
268
+ "transformer.visual.transformer.resblocks.43.attn.in_proj",
269
+ "transformer.visual.transformer.resblocks.3.mlp.c_fc",
270
+ "transformer.visual.transformer.resblocks.44.attn.out_proj",
271
+ "transformer.h.23.attn.c_attn",
272
+ "transformer.visual.transformer.resblocks.22.attn.in_proj",
273
+ "transformer.visual.transformer.resblocks.6.attn.in_proj",
274
+ "transformer.visual.transformer.resblocks.44.mlp.c_fc",
275
+ "transformer.h.17.attn.c_attn",
276
+ "transformer.h.7.attn.c_attn",
277
+ "transformer.visual.transformer.resblocks.42.attn.in_proj",
278
+ "transformer.visual.transformer.resblocks.20.mlp.c_proj",
279
+ "transformer.h.8.mlp.c_proj",
280
+ "transformer.visual.transformer.resblocks.17.attn.out_proj",
281
+ "transformer.h.14.attn.c_proj",
282
+ "transformer.visual.transformer.resblocks.40.attn.in_proj",
283
+ "transformer.h.25.attn.c_proj",
284
+ "transformer.h.28.mlp.c_proj",
285
+ "transformer.visual.transformer.resblocks.35.mlp.c_proj",
286
+ "transformer.visual.transformer.resblocks.36.attn.in_proj",
287
+ "transformer.visual.transformer.resblocks.41.mlp.c_fc",
288
+ "transformer.visual.transformer.resblocks.14.mlp.c_fc",
289
+ "transformer.h.30.mlp.w2",
290
+ "transformer.h.20.mlp.w1",
291
+ "transformer.visual.transformer.resblocks.33.mlp.c_fc",
292
+ "transformer.h.29.mlp.w2",
293
+ "transformer.visual.transformer.resblocks.47.mlp.c_proj",
294
+ "transformer.visual.transformer.resblocks.30.mlp.c_fc",
295
+ "transformer.h.10.attn.c_attn",
296
+ "transformer.visual.transformer.resblocks.1.attn.in_proj",
297
+ "transformer.h.1.attn.c_proj",
298
+ "transformer.visual.transformer.resblocks.8.mlp.c_proj",
299
+ "transformer.h.19.attn.c_proj",
300
+ "transformer.visual.transformer.resblocks.37.attn.in_proj",
301
+ "transformer.h.15.attn.c_proj",
302
+ "transformer.h.5.attn.c_proj",
303
+ "transformer.visual.transformer.resblocks.32.mlp.c_proj",
304
+ "transformer.visual.transformer.resblocks.3.attn.out_proj",
305
+ "transformer.visual.transformer.resblocks.32.attn.in_proj",
306
+ "transformer.h.21.mlp.w1",
307
+ "transformer.h.23.mlp.c_proj",
308
+ "transformer.h.30.mlp.w1",
309
+ "transformer.h.0.attn.c_attn",
310
+ "transformer.visual.transformer.resblocks.24.attn.out_proj",
311
+ "transformer.visual.transformer.resblocks.31.attn.in_proj",
312
+ "transformer.h.18.mlp.c_proj",
313
+ "transformer.visual.transformer.resblocks.25.mlp.c_fc",
314
+ "transformer.visual.transformer.resblocks.22.mlp.c_fc",
315
+ "transformer.h.30.attn.c_attn",
316
+ "transformer.visual.transformer.resblocks.13.mlp.c_fc",
317
+ "transformer.h.17.mlp.c_proj",
318
+ "transformer.visual.transformer.resblocks.24.attn.in_proj",
319
+ "transformer.h.11.attn.c_attn",
320
+ "transformer.h.2.mlp.w2",
321
+ "transformer.visual.transformer.resblocks.8.mlp.c_fc",
322
+ "transformer.visual.transformer.resblocks.0.mlp.c_fc",
323
+ "transformer.visual.transformer.resblocks.2.attn.out_proj",
324
+ "transformer.visual.transformer.resblocks.35.mlp.c_fc",
325
+ "transformer.visual.transformer.resblocks.39.attn.out_proj",
326
+ "transformer.h.12.attn.c_proj",
327
+ "transformer.visual.transformer.resblocks.28.attn.in_proj",
328
+ "transformer.visual.transformer.resblocks.29.mlp.c_fc",
329
+ "transformer.visual.transformer.resblocks.0.attn.out_proj",
330
+ "transformer.visual.transformer.resblocks.23.mlp.c_proj",
331
+ "transformer.h.20.attn.c_attn",
332
+ "transformer.visual.transformer.resblocks.7.attn.out_proj",
333
+ "transformer.visual.transformer.resblocks.15.attn.out_proj",
334
+ "transformer.h.7.mlp.c_proj",
335
+ "transformer.visual.transformer.resblocks.1.attn.out_proj",
336
+ "transformer.h.3.mlp.w2",
337
+ "transformer.h.9.mlp.w2",
338
+ "transformer.visual.transformer.resblocks.34.attn.in_proj",
339
+ "transformer.h.27.attn.c_attn",
340
+ "transformer.visual.transformer.resblocks.12.mlp.c_fc",
341
+ "transformer.h.6.mlp.w2",
342
+ "transformer.visual.transformer.resblocks.39.attn.in_proj",
343
+ "transformer.h.15.mlp.w2",
344
+ "transformer.visual.transformer.resblocks.18.mlp.c_proj",
345
+ "transformer.h.0.attn.c_proj",
346
+ "transformer.h.19.attn.c_attn",
347
+ "transformer.visual.transformer.resblocks.27.mlp.c_proj",
348
+ "transformer.visual.transformer.resblocks.23.attn.out_proj",
349
+ "transformer.h.14.mlp.c_proj",
350
+ "transformer.h.9.mlp.c_proj",
351
+ "transformer.visual.transformer.resblocks.12.attn.out_proj",
352
+ "transformer.visual.transformer.resblocks.0.mlp.c_proj",
353
+ "transformer.visual.transformer.resblocks.5.mlp.c_fc",
354
+ "transformer.visual.transformer.resblocks.28.mlp.c_fc",
355
+ "transformer.visual.transformer.resblocks.6.mlp.c_proj",
356
+ "transformer.visual.transformer.resblocks.22.mlp.c_proj",
357
+ "transformer.visual.transformer.resblocks.37.mlp.c_proj",
358
+ "transformer.visual.transformer.resblocks.17.attn.in_proj",
359
+ "transformer.visual.transformer.resblocks.46.attn.out_proj",
360
+ "transformer.h.24.mlp.w2",
361
+ "transformer.h.27.mlp.w1",
362
+ "transformer.visual.transformer.resblocks.11.attn.in_proj",
363
+ "transformer.visual.transformer.resblocks.4.mlp.c_proj",
364
+ "transformer.visual.transformer.resblocks.21.mlp.c_proj",
365
+ "transformer.visual.transformer.resblocks.26.mlp.c_proj",
366
+ "transformer.visual.transformer.resblocks.15.mlp.c_fc",
367
+ "transformer.h.2.mlp.c_proj",
368
+ "transformer.h.1.mlp.c_proj",
369
+ "transformer.h.5.mlp.c_proj",
370
+ "transformer.visual.transformer.resblocks.45.mlp.c_fc",
371
+ "transformer.visual.transformer.resblocks.0.attn.in_proj",
372
+ "transformer.h.25.mlp.w2",
373
+ "transformer.h.20.attn.c_proj",
374
+ "transformer.h.17.attn.c_proj",
375
+ "transformer.visual.transformer.resblocks.1.mlp.c_fc"
376
+ ],
377
+ "task_type": "CAUSAL_LM",
378
+ "use_dora": false,
379
+ "use_rslora": false
380
+ }
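The `target_modules` list above covers every trainable projection in Qwen-VL-Chat: the LLM blocks (`transformer.h.*` with `attn.c_attn`, `attn.c_proj`, `mlp.w1`, `mlp.w2`, `mlp.c_proj`), the ViT blocks of the vision tower (`transformer.visual.transformer.resblocks.*` with `attn.in_proj`, `attn.out_proj`, `mlp.c_fc`, `mlp.c_proj`), and the patch embedding `transformer.visual.conv1`. A minimal sketch of attaching one of these adapter checkpoints to the base model with `peft` (the local path and device settings are illustrative assumptions):

```python
# Minimal sketch: load the base model, then attach the LoRA adapter
# saved in one of these checkpoint directories. Paths are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL-Chat", trust_remote_code=True, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen-VL-Chat", trust_remote_code=True
)

# adapter_config.json + adapter_model.safetensors are read from here
model = PeftModel.from_pretrained(base, "checkpoint-2000")
model = model.merge_and_unload()  # optional: fold LoRA deltas into the base
```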
checkpoint-2000/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5b8112968ddb6d5c9cc45ec4d181a2563b8c368858d7518e99e7a2f245a9f0f9
+ size 469105640
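The `.safetensors` and `.pth` entries in this commit are Git LFS pointers rather than the binaries themselves: three text lines giving the spec version, the blob's SHA-256 `oid`, and its `size` in bytes (~469 MB here). A small sketch for parsing and verifying such a pointer (the helper names are mine, not a library API):

```python
# Sketch: parse a git-lfs pointer file and verify a downloaded blob
# against it. Pointer format per https://git-lfs.github.com/spec/v1.
import hashlib

def parse_lfs_pointer(path: str) -> dict:
    """Read the "key value" lines of an LFS pointer into a dict."""
    fields = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                key, _, value = line.strip().partition(" ")
                fields[key] = value
    return fields

def verify_blob(blob_path: str, pointer: dict) -> bool:
    """Compare the blob's streamed SHA-256 digest to the pointer's oid."""
    h = hashlib.sha256()
    with open(blob_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return pointer["oid"] == f"sha256:{h.hexdigest()}"
```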
checkpoint-2000/latest ADDED
@@ -0,0 +1 @@
+ global_step2000
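`latest` follows DeepSpeed's checkpoint convention: a one-line tag file naming the sub-directory (here `global_step2000`) that holds the sharded optimizer and model partitions, which resume logic and consolidation utilities such as DeepSpeed's `zero_to_fp32.py` locate first. A sketch of the lookup, assuming the standard layout:

```python
# Sketch: resolve DeepSpeed's `latest` tag to the sharded-state directory.
# The layout (<ckpt>/latest naming <ckpt>/global_stepNNNN/) is the usual
# DeepSpeed convention, assumed rather than guaranteed for this repo.
import os

ckpt_dir = "checkpoint-2000"
with open(os.path.join(ckpt_dir, "latest")) as f:
    tag = f.read().strip()               # "global_step2000"

state_dir = os.path.join(ckpt_dir, tag)  # checkpoint-2000/global_step2000
print(state_dir)
```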
checkpoint-2000/qwen.tiktoken ADDED
The diff for this file is too large to render. See raw diff
checkpoint-2000/rng_state_0.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:373e583e765629a92d3530782b1b5ca914f786284fe0f518884228570ac59903
+ size 15920
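Each `rng_state_<rank>.pth` file (one per data-parallel rank) snapshots that process's RNG state so a resumed run replays the same data order and dropout masks. A hedged sketch of restoring one, assuming the Hugging Face Trainer's usual dict layout with `python`/`numpy`/`cpu`/`cuda` keys (verify against your own checkpoint before relying on it):

```python
# Hedged sketch: restore a rank's RNG state from a Trainer-style
# checkpoint. The dict keys below are assumptions about the layout.
import random

import numpy as np
import torch

# weights_only=False because the file holds plain Python/NumPy state,
# not just tensors (newer torch defaults to weights_only=True).
state = torch.load("checkpoint-2000/rng_state_0.pth", weights_only=False)

random.setstate(state["python"])          # Python's builtin RNG
np.random.set_state(state["numpy"])       # NumPy's global RNG
torch.random.set_rng_state(state["cpu"])  # CPU generator
if torch.cuda.is_available() and "cuda" in state:
    torch.cuda.random.set_rng_state_all(state["cuda"])  # all local GPUs
```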
checkpoint-2000/rng_state_1.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:361a2793fa584b3154fa7c73ce06ed5ea5168d465509e86dc4cb35aaab2a8bc1
+ size 15920
checkpoint-2000/rng_state_2.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4375f3bebf15db7c1ff742b2f45104c74917bdd457bdcf7c4e871a438ef88a23
+ size 15920
checkpoint-2000/rng_state_3.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:53e9816ac4dacdc9737c33654b217a3d3423ce888a9165acc83b6c109118e8bb
+ size 15920
checkpoint-2000/rng_state_4.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9be6ba5b929abd1627d02881ae59d6445ddd79d165123739de9ec3f1ecf40134
+ size 15920