Upload README.md
README.md CHANGED
@@ -25,14 +25,14 @@ tags:

 [Paper](https://arxiv.org/abs/2410.17241) | [Home](https://github.com/ai4colonoscopy/IntelliScope)

-> These are the merged weights of [ColonGPT-v1-phi1.5-siglip-lora](https://drive.google.com/

 Our ColonGPT is a standard multimodal language model with four basic components: a language tokenizer, a visual encoder ([SigLIP-SO](https://huggingface.co/google/siglip-so400m-patch14-384)), a multimodal connector, and a language model ([Phi1.5](https://huggingface.co/microsoft/phi-1_5)). On this Hugging Face page, we provide a quick start for the convenience of new users. For further details about ColonGPT, we highly recommend visiting our [homepage](https://github.com/BAAI-DCAI/Bunny). There, you'll find comprehensive usage instructions for our model and the latest advancements in intelligent colonoscopy technology.


 # Quick start

-Here is a code snippet showing how to quickly try our ColonGPT model with transformers. For convenience, we manually combined some configuration and code files and merged the weights. Please note that this is only a quick-start snippet; we recommend installing [ColonGPT's source code](https://github.com/ai4colonoscopy/IntelliScope/blob/main/docs/guideline-for-ColonGPT.md) to explore more.

 - Before running the snippet, you only need to install the following minimum dependencies.
 ```shell
@@ -83,12 +83,12 @@ Here is a code snippet to show you how to quickly try-on our ColonGPT model with
 return True
 return False

-prompt = "
 text = f"USER: <image>\n{prompt} ASSISTANT:"
 text_chunks = [tokenizer(chunk).input_ids for chunk in text.split('<image>')]
 input_ids = torch.tensor(text_chunks[0] + [-200] + text_chunks[1], dtype=torch.long).unsqueeze(0).to(device)

-image = Image.open('cache/examples/example2.png')
 image_tensor = model.process_images([image], model.config).to(dtype=model.dtype, device=device)

 stop_str = "<|endoftext|>"

@@ -25,14 +25,14 @@ tags:

 [Paper](https://arxiv.org/abs/2410.17241) | [Home](https://github.com/ai4colonoscopy/IntelliScope)

+> These are the merged weights of [ColonGPT-v1-phi1.5-siglip-lora-stg2](https://drive.google.com/file/d/1xAAaVKu16czWO_jgnf-2jCgj2hf14BwM/view?usp=sharing), including the vision encoder (siglip), the language model (phi-1.5), and other weights fine-tuned on our ColonINST.

 Our ColonGPT is a standard multimodal language model with four basic components: a language tokenizer, a visual encoder ([SigLIP-SO](https://huggingface.co/google/siglip-so400m-patch14-384)), a multimodal connector, and a language model ([Phi1.5](https://huggingface.co/microsoft/phi-1_5)). On this Hugging Face page, we provide a quick start for the convenience of new users. For further details about ColonGPT, we highly recommend visiting our [homepage](https://github.com/BAAI-DCAI/Bunny). There, you'll find comprehensive usage instructions for our model and the latest advancements in intelligent colonoscopy technology.


 # Quick start

+Here is a code snippet showing how to quickly try our ColonGPT model with transformers. The model focuses on three downstream tasks: image classification (CLS), referring expression generation (REG), and referring expression comprehension (REC). If you need a caption generator, please refer to [ColonGPT-V1-stg1](https://huggingface.co/ai4colonoscopy/ColonGPT-v1-stg1). For convenience, we manually combined some configuration and code files and merged the weights. Please note that this is only a quick-start snippet; we recommend installing [ColonGPT's source code](https://github.com/ai4colonoscopy/IntelliScope/blob/main/docs/guideline-for-ColonGPT.md) to explore more.

 - Before running the snippet, you only need to install the following minimum dependencies.
 ```shell
@@ -83,12 +83,12 @@ Here is a code snippet to show you how to quickly try-on our ColonGPT model with
 return True
 return False

+prompt = "Categorize the object."
 text = f"USER: <image>\n{prompt} ASSISTANT:"
 text_chunks = [tokenizer(chunk).input_ids for chunk in text.split('<image>')]
 input_ids = torch.tensor(text_chunks[0] + [-200] + text_chunks[1], dtype=torch.long).unsqueeze(0).to(device)

+image = Image.open('/home/projects/u7248002/Project/ColonGPT-tmp/cache/examples/example2.png')
 image_tensor = model.process_images([image], model.config).to(dtype=model.dtype, device=device)

 stop_str = "<|endoftext|>"
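
The prompt assembly in the second hunk can be tried without downloading the model weights. Below is a minimal sketch, assuming a toy whitespace tokenizer (`toy_tokenize`, hypothetical) in place of the real `tokenizer`, and a helper name (`build_input_ids`) that does not appear in the README. It splits the chat template on `<image>` and splices the `-200` placeholder id between the two token chunks, which is what the snippet does before wrapping the list in a tensor.

```python
# Placeholder id for the image position, as used in the README snippet.
IMAGE_TOKEN_INDEX = -200


def toy_tokenize(chunk):
    """Hypothetical stand-in for tokenizer(chunk).input_ids.

    Maps each whitespace-separated word to a small non-negative id.
    """
    return [sum(ord(c) for c in word) % 1000 for word in chunk.split()]


def build_input_ids(prompt, tokenize=toy_tokenize):
    """Build the id list: text before <image>, the placeholder, text after."""
    text = f"USER: <image>\n{prompt} ASSISTANT:"
    # Assumes the prompt itself contains no "<image>" marker,
    # so the split yields exactly two chunks.
    before, after = (tokenize(chunk) for chunk in text.split("<image>"))
    return before + [IMAGE_TOKEN_INDEX] + after


ids = build_input_ids("Categorize the object.")
print(ids)
```

With the real tokenizer, the same list would be passed to `torch.tensor(..., dtype=torch.long).unsqueeze(0)` as in the hunk above. The model is then expected to substitute image features at the position marked by `-200`.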