---
license: mit
language:
- en
- zh
base_model:
- deepseek-ai/Janus-Pro-1B
pipeline_tag: visual-question-answering
tags:
- DeepSeek
- Janus-Pro-1B
---

# Janus-Pro-1B-Int8

This version of Janus-Pro-1B has been converted to run on the Axera NPU using **w8a16** quantization.

This model has been optimized with the following LoRA:

Compatible with Pulsar2 version: 3.4

## Conversion tool links

If you are interested in model conversion, you can export the axmodel yourself, starting from the original repository:
https://huggingface.co/deepseek-ai/Janus-Pro-1B

- [Github for Janus-Pro-1B.axera](https://github.com/AXERA-TECH/Janus-Pro-1B.axera)
- [Pulsar2 Link, How to Convert LLM from Huggingface to axmodel](https://pulsar2-docs.readthedocs.io/en/latest/appendix/build_llm.html)

## Support Platform |
|
|
- AX650 |
|
|
- [M4N-Dock(η±θ―ζ΄ΎPro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html) |
|
|
|
|
|
|chips|image encoder 384 | ttft | w8a16 | |
|
|
|--|--|--|--| |
|
|
|AX650| 142.682 ms | 4560.214 ms | 11.43 tokens/sec| |
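
To give a feel for what these figures mean for a complete answer, the sketch below adds the time to first token (TTFT) to the decode time for the remaining tokens. It assumes the decode rate stays constant at the measured 11.43 tokens/sec and that no other overhead applies; the table does not state either, so treat the result as a rough estimate. The function name is illustrative and not part of this repository.

```python
# Benchmark figures copied from the AX650 row of the table.
TTFT_MS = 4560.214
DECODE_TOKENS_PER_SEC = 11.43


def estimate_response_seconds(num_tokens: int) -> float:
    """Estimate wall-clock seconds needed to produce `num_tokens` tokens.

    Assumption: the first token arrives at TTFT and every later token
    takes 1 / DECODE_TOKENS_PER_SEC seconds.
    """
    if num_tokens < 1:
        raise ValueError("num_tokens must be at least 1")
    return TTFT_MS / 1000.0 + (num_tokens - 1) / DECODE_TOKENS_PER_SEC


if __name__ == "__main__":
    for n in (1, 64, 128):
        print(f"{n:>4} tokens: ~{estimate_response_seconds(n):.1f} s")
```

Under these assumptions, short answers are dominated by the TTFT term, while longer answers are dominated by the decode rate.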

## How to use

Download all files from this repository to the device.

**Using AX650 Board**

```bash
root@ax650 ~/yongqiang/push_hugging_face/Janus-Pro-1B # tree -L 1
.
├── assets
├── config.json
├── embeds
├── img_gen_onnx
├── imgs
├── infer_axmodel_gen.py
├── infer_axmodel_und.py
├── janus_pro_1b_axmodel
├── janus_pro_1b_tokenizer
├── README.md
└── vit_axmodel

8 directories, 3 files
```
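
Before running inference, it can help to confirm that the download is complete. The following is a minimal sketch that checks for the top-level entries shown in the listing; the helper name `missing_entries` is hypothetical and not part of this repository, and the check only looks at names, not at file contents.

```python
from pathlib import Path
from typing import List

# Entries taken from the `tree -L 1` listing (README.md is not required to run).
EXPECTED_ENTRIES = [
    "assets",
    "config.json",
    "embeds",
    "img_gen_onnx",
    "imgs",
    "infer_axmodel_gen.py",
    "infer_axmodel_und.py",
    "janus_pro_1b_axmodel",
    "janus_pro_1b_tokenizer",
    "vit_axmodel",
]


def missing_entries(root: str = ".") -> List[str]:
    """Return the expected top-level entries that do not exist under `root`."""
    base = Path(root)
    return [name for name in EXPECTED_ENTRIES if not (base / name).exists()]


if __name__ == "__main__":
    missing = missing_entries(".")
    if missing:
        print("Missing entries:", ", ".join(missing))
    else:
        print("All expected entries are present.")
```

Run it from the directory that holds the downloaded files.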

#### Install Janus

```bash
$ git clone https://github.com/deepseek-ai/Janus
$ cd Janus
$ pip3 install -e .
```

#### Inference with AX650 Host, such as M4N-Dock(η±θ―ζ΄ΎPro) or AX650N DEMO Board |
|
|
|
|
|
**Multimodal Understanding** |
|
|
|
|
|
input text: |
|
|
|
|
|
``` |
|
|
Please describe the picture. |
|
|
``` |
|
|
|
|
|
- input image |
|
|
|
|
|
 |
|
|
|
|
|
log information: |
|
|
|
|
|
```bash
root@ax650 ~/yongqiang/push_hugging_face/Janus-Pro-1B # python3 infer_axmodel_und.py --tokenizer_dir janus_pro_1b_tokenizer --axmodel_path janus_pro_1b_axmodel --vit_axmodel_path vit_axmodel/janus_warp_vit.axmodel -i ./imgs/image.png
[INFO] Available providers:  ['AxEngineExecutionProvider']
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.11.0a
vit_output.shape is (1, 576, 2048), vit feature extract done!
Init InferenceSession: 100%|██████████████████████████████████████████████████████████| 24/24 [00:04<00:00, 4.94it/s]
model load done!
prefill done!
Decoder: 62%|██████████████████████████████████████████ | 634/1024 [00:00<00:00, 2505.28it/s]
Decoder: 72%|██████████████████████████████████████████████████ | 741/1024 [00:19<00:10, 27.69it/s]
hit eos!
Decoder: 74%|████████████████████████████████████████████████████ | 762/1024 [00:23<00:08, 31.84it/s]
Janus Answers: The image depicts three astronauts standing in a lush, green forest. They are wearing traditional white space suits with various patches and equipment attached. The suits have a reflective visor on their helmets, and they appear to be in a relaxed pose, with one astronaut raising his arms and the others standing or crouching. The forest is dense with tall trees and dense foliage, creating a serene and somewhat mysterious atmosphere.
```
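
To describe several images in a row, the same command can be driven from a small wrapper. The sketch below only reuses the flags visible in the invocation shown in the log; the helper names `build_und_command` and `describe_images` are hypothetical, and because each call starts a fresh process, the model is reloaded for every image.

```python
import subprocess
import sys
from typing import List


def build_und_command(image_path: str) -> List[str]:
    """Build the argument list for infer_axmodel_und.py, mirroring the logged command."""
    return [
        sys.executable,
        "infer_axmodel_und.py",
        "--tokenizer_dir", "janus_pro_1b_tokenizer",
        "--axmodel_path", "janus_pro_1b_axmodel",
        "--vit_axmodel_path", "vit_axmodel/janus_warp_vit.axmodel",
        "-i", image_path,
    ]


def describe_images(image_paths: List[str]) -> None:
    """Run the understanding script once per image, stopping on the first failure."""
    for path in image_paths:
        subprocess.run(build_und_command(path), check=True)


if __name__ == "__main__":
    # Image paths are taken from the command line; nothing runs if none are given.
    describe_images(sys.argv[1:])
```

Run the wrapper from the repository directory on the board so that the relative model paths resolve.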

**Text-to-Image Generation**

Input text:

```
"A close-up high-contrast photo of Sydney Opera House sitting next to Eiffel tower, under a blue night sky of roiling energy, exploding yellow stars, and radiating swirls of blue."
```

Log output:

```bash
root@ax650 ~/yongqiang/push_hugging_face/Janus-Pro-1B # python3 infer_axmodel_gen.py --tokenizer_dir janus_pro_1b_tokenizer/ --axmodel_path janus_pro_1b_axmodel/
[INFO] Available providers:  ['AxEngineExecutionProvider']
Init InferenceSession: 0%| | 0/24 [00:00<?, ?it/s]
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.11.0a
Init InferenceSession: 100%|██████████████████████████████████████████████████████████| 24/24 [00:14<00:00, 1.68it/s]
2025-04-14 15:55:23.408 | INFO | __main__:<module>:269 - model load done!
2025-04-14 15:55:33.104 | DEBUG | __main__:generate:158 - prefill completed!
ImageToken: 18%|████████████ | 104/575 [00:39<02:58, 2.64it/s]
ImageToken: 45%|███████████████████████████████ | 261/575 [01:39<01:58, 2.65it/s]
ImageToken: 73%|█████████████████████████████████████████████████ | 419/575 [02:39<00:58, 2.66it/s]
ImageToken: 100%|███████████████████████████████████████████████████████████████████| 575/575 [03:38<00:00, 2.63it/s]
```

Output image:

![](./assets/gen_out_img.jpg)