ctranslate2-4you committed on
Commit b771ead · verified · 1 Parent(s): 5b2551d

Create README.md

Files changed (1)
  1. README.md +73 -0
README.md ADDED
@@ -0,0 +1,73 @@
+ ---
+ library_name: ctranslate2
+ base_model:
+ - Qwen/Qwen3-1.7B
+ base_model_relation: quantized
+ tags:
+ - ctranslate2
+ - chat
+ ---
+ Bfloat16 CTranslate2-compatible version of [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B).
+
+ ## VRAM Usage:
+
+ | Model | VRAM Usage |
+ |-------|------------|
+ | [Qwen3-32B-ct2-awq](https://huggingface.co/CTranslate2HQ/Qwen3-32B-ct2-AWQ) | ~18.3 GB |
+ | [Qwen3-14B-ct2-awq](https://huggingface.co/CTranslate2HQ/Qwen3-14B-ct2-AWQ) | ~9.5 GB |
+ | [Qwen3-8B-ct2-awq](https://huggingface.co/CTranslate2HQ/Qwen3-8B-ct2-AWQ) | ~5.8 GB |
+ | **👉 [Qwen3-1.7B-ct2-bfloat16](https://huggingface.co/CTranslate2HQ/Qwen3-1.7B-ct2-bfloat16)** | ~3.3 GB |
+ | [Qwen3-4B-ct2-awq](https://huggingface.co/CTranslate2HQ/Qwen3-4B-ct2-AWQ) | ~2.6 GB |
+ | [Qwen3-1.7B-ct2-awq](https://huggingface.co/CTranslate2HQ/Qwen3-1.7B-ct2-AWQ) | ~1.3 GB |
+ | [Qwen3-0.6B-ct2-awq](https://huggingface.co/CTranslate2HQ/Qwen3-0.6B-ct2-AWQ) | ~0.6 GB |
+
+ ## Example Usage:
+
+ ```python
+ import ctranslate2
+ from huggingface_hub import snapshot_download
+ from transformers import AutoTokenizer
+
+ MODEL_ID = "CTranslate2HQ/Qwen3-1.7B-ct2-bfloat16"
+
+ # Download the converted model from the Hugging Face Hub;
+ # ctranslate2.Generator expects a local model directory
+ model_path = snapshot_download(MODEL_ID)
+
+ # Load model and tokenizer. compute_type is set when creating the Generator;
+ # do NOT use the "compute_type" parameter with AWQ models
+ generator = ctranslate2.Generator(model_path, device="cuda", compute_type="bfloat16")
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
+
+ # Format prompt using chat template
+ messages = [
+     {"role": "system", "content": "You are a helpful AI assistant."},
+     {"role": "user", "content": "Write a short poem about a cat."}
+ ]
+
+ prompt = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True,
+     enable_thinking=False
+ )
+
+ # Tokenize and generate
+ tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
+
+ results = generator.generate_batch(
+     [tokens],
+     max_length=8192,
+     sampling_temperature=0.7,
+     sampling_topk=50,
+     include_prompt_in_result=False  # return only the newly generated tokens
+ )
+
+ # Decode and print response
+ output_ids = results[0].sequences_ids[0]
+ response = tokenizer.decode(output_ids, skip_special_tokens=True)
+ print(response)
+ ```
+
+ **Requirements:**
+ ```
+ ctranslate2
+ transformers
+ torch
+ huggingface_hub
+ ```
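
The README's note about not using `compute_type` with AWQ models could be captured in a small helper. This is a minimal sketch and not part of the committed README: `generator_kwargs` is a hypothetical function, and detecting AWQ models by an `-awq` suffix in the repository name is an assumption based on the naming used in the VRAM table.

```python
def generator_kwargs(model_id: str, device: str = "cuda") -> dict:
    """Build keyword arguments for ctranslate2.Generator (hypothetical helper)."""
    kwargs = {"device": device}
    # Assumption: AWQ repositories end in "-awq" (case-insensitive),
    # matching the repository names listed in the VRAM table.
    if not model_id.lower().endswith("-awq"):
        # Only non-AWQ models get an explicit compute type.
        kwargs["compute_type"] = "bfloat16"
    return kwargs


print(generator_kwargs("CTranslate2HQ/Qwen3-1.7B-ct2-bfloat16"))
print(generator_kwargs("CTranslate2HQ/Qwen3-1.7B-ct2-AWQ"))
```

The returned dictionary could then be unpacked when constructing the generator from a local model directory, for example `ctranslate2.Generator(local_model_dir, **generator_kwargs(MODEL_ID))`, where `local_model_dir` is a placeholder name.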