---
license: llama3.3
library_name: transformers
pipeline_tag: text-generation
base_model: meta-llama/Llama-3.3-70B-Instruct
tags:
- llama
- llama-3
- code
- instruct
- fine-tuned
language:
- en
---

# Phind-70B

Phind-70B is a fine-tuned version of [Llama 3.3 70B Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct), optimized for code generation, technical reasoning, and general instruction following.

## Model Details

| Attribute | Details |
|-----------|---------|
| **Base Model** | meta-llama/Llama-3.3-70B-Instruct |
| **Model Type** | Causal Language Model |
| **Parameters** | 70 billion |
| **Context Length** | 128K tokens |
| **Language** | English |
| **License** | Llama 3.3 Community License |

## Intended Use

Phind-70B is designed for:

- **Code generation** across multiple programming languages
- **Technical problem-solving** and debugging
- **General instruction following** and reasoning tasks
- **Multi-turn conversations** requiring context retention

## How to Use

### With Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Phind/Phind-70B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load in bfloat16 and shard across all available GPUs
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Phind, an intelligent assistant that helps with programming and technical questions."},
    {"role": "user", "content": "Write a Python function to find the longest palindromic substring."},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```

### With vLLM

```python
from vllm import LLM, SamplingParams

# tensor_parallel_size shards the model across 4 GPUs
llm = LLM(model="Phind/Phind-70B", tensor_parallel_size=4)
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=1024)

prompt = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are Phind, an intelligent assistant that helps with programming and technical questions.<|eot_id|><|start_header_id|>user<|end_header_id|>

Write a Python function to find the longest palindromic substring.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""

outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```

## Chat Template

This model uses the Llama 3 chat format:

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_message}<|eot_id|><|start_header_id|>user<|end_header_id|>

{user_message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{assistant_response}<|eot_id|>
```
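
As a self-contained illustration, the format above can be rendered in plain Python. Note that `format_llama3_prompt` is a hypothetical helper for this sketch, not part of the model repo or the tokenizer API; in practice, `tokenizer.apply_chat_template` handles this for you.

```python
def format_llama3_prompt(messages):
    """Render a list of {role, content} dicts into the Llama 3 chat format.

    Illustrative helper only; prefer tokenizer.apply_chat_template in real code.
    """
    prompt = "<|begin_of_text|>"
    for m in messages:
        prompt += (
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # Open an assistant turn so the model generates the response
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

messages = [
    {"role": "system", "content": "You are Phind."},
    {"role": "user", "content": "Hello!"},
]
print(format_llama3_prompt(messages))
```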

## Hardware Requirements

| Precision | VRAM Required |
|-----------|---------------|
| FP16/BF16 | ~140 GB |
| INT8 | ~70 GB |
| INT4 | ~35 GB |

For inference, we recommend using multiple GPUs with tensor parallelism or quantized versions for consumer hardware.
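
The figures in the table follow directly from parameters × bytes per parameter (a rough weight-only estimate that ignores activation and KV-cache overhead):

```python
def approx_vram_gb(n_params_billion, bits_per_param):
    """Weight memory in decimal GB: parameters x bytes per parameter.

    Rough estimate only; real usage adds activations and KV cache.
    """
    bytes_total = n_params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

for bits, label in [(16, "FP16/BF16"), (8, "INT8"), (4, "INT4")]:
    print(f"{label}: ~{approx_vram_gb(70, bits):.0f} GB")
```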

## Limitations

- May occasionally generate incorrect or misleading information
- Not suitable for production use without additional safety measures
- Performance may vary on tasks outside the training distribution
- Should not be used for generating harmful, illegal, or unethical content

## Acknowledgments

This model builds upon the excellent work by Meta on the Llama 3.3 model family. We are grateful for their contributions to open-source AI.