---
license: llama3.3
library_name: transformers
pipeline_tag: text-generation
base_model: meta-llama/Llama-3.3-70B-Instruct
tags:
  - llama
  - llama-3
  - code
  - instruct
  - fine-tuned
language:
  - en
---

# Phind-70B

Phind-70B is a fine-tuned version of [Llama 3.3 70B Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct), optimized for code generation, technical reasoning, and general instruction following.

## Model Details

| Attribute | Details |
|-----------|---------|
| **Base Model** | meta-llama/Llama-3.3-70B-Instruct |
| **Model Type** | Causal Language Model |
| **Parameters** | 70 Billion |
| **Context Length** | 128K tokens |
| **Language** | English |
| **License** | Llama 3.3 Community License |

## Intended Use

Phind-70B is designed for:

- **Code generation** across multiple programming languages
- **Technical problem-solving** and debugging
- **General instruction following** and reasoning tasks
- **Multi-turn conversations** requiring context retention

## How to Use

### With Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Phind/Phind-70B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Phind, an intelligent assistant that helps with programming and technical questions."},
    {"role": "user", "content": "Write a Python function to find the longest palindromic substring."},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```

## Chat Template

This model uses the Llama 3 chat format:

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_message}<|eot_id|><|start_header_id|>user<|end_header_id|>

{user_message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{assistant_response}<|eot_id|>
```
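As an illustration of the format, the helper below renders messages into this template by hand. Note that `format_llama3_prompt` is a hypothetical sketch for clarity only; in practice, `tokenizer.apply_chat_template` is the authoritative implementation.

```python
def format_llama3_prompt(messages, add_generation_prompt=True):
    """Render a list of chat messages into the Llama 3 prompt format.

    Illustrative sketch only -- prefer tokenizer.apply_chat_template,
    which applies the template shipped with the model.
    """
    prompt = "<|begin_of_text|>"
    for msg in messages:
        # Each turn: role header, blank line, content, end-of-turn token.
        prompt += f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
        prompt += f"{msg['content']}<|eot_id|>"
    if add_generation_prompt:
        # Open an assistant header so the model continues as the assistant.
        prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

example = format_llama3_prompt([
    {"role": "system", "content": "You are Phind."},
    {"role": "user", "content": "Hello!"},
])
print(example)
```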

## Hardware Requirements

| Precision | VRAM Required |
|-----------|---------------|
| FP16/BF16 | ~140 GB |
| INT8 | ~70 GB |
| INT4 | ~35 GB |
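These figures follow directly from the parameter count: 70 billion weights times the bytes per weight at each precision. (Real deployments need additional headroom for activations and the KV cache, which this back-of-the-envelope estimate omits.)

```python
PARAMS = 70e9  # 70 billion parameters

def weight_vram_gb(bytes_per_weight: float) -> float:
    """Approximate VRAM needed to hold the weights alone, in GB."""
    return PARAMS * bytes_per_weight / 1e9

# Matches the table above: ~140 GB, ~70 GB, ~35 GB.
for precision, bpw in [("FP16/BF16", 2.0), ("INT8", 1.0), ("INT4", 0.5)]:
    print(f"{precision}: ~{weight_vram_gb(bpw):.0f} GB")
```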

For full-precision inference, we recommend multiple GPUs with tensor parallelism; quantized versions make deployment on consumer hardware feasible.

## Limitations

- May occasionally generate incorrect or misleading information
- Not suitable for production use without additional safety measures
- Performance may vary on tasks outside the training distribution
- Should not be used for generating harmful, illegal, or unethical content

## Acknowledgments

This model builds upon the excellent work by Meta on the Llama 3.3 model family. We are grateful for their contributions to open-source AI.