---
license: llama3.3
library_name: transformers
pipeline_tag: text-generation
base_model: meta-llama/Llama-3.3-70B-Instruct
tags:
- llama
- llama-3
- code
- instruct
- fine-tuned
language:
- en
---
# Phind-70B
Phind-70B is a fine-tuned version of [Llama 3.3 70B Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct), optimized for code generation, technical reasoning, and general instruction following.
## Model Details
| Attribute | Details |
|-----------|---------|
| **Base Model** | meta-llama/Llama-3.3-70B-Instruct |
| **Model Type** | Causal Language Model |
| **Parameters** | 70 Billion |
| **Context Length** | 128K tokens |
| **Language** | English |
| **License** | Llama 3.3 Community License |
## Intended Use
Phind-70B is designed for:
- **Code generation** across multiple programming languages
- **Technical problem-solving** and debugging
- **General instruction following** and reasoning tasks
- **Multi-turn conversations** requiring context retention
## How to Use
### With Transformers
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Phind/Phind-70B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard weights across available GPUs
)

messages = [
    {"role": "system", "content": "You are Phind, an intelligent assistant that helps with programming and technical questions."},
    {"role": "user", "content": "Write a Python function to find the longest palindromic substring."},
]

# Format the conversation using the model's chat template
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```
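For single-GPU or consumer setups, 4-bit quantized loading via bitsandbytes is a common option. This is a sketch, assuming `bitsandbytes` is installed; `BitsAndBytesConfig` and its parameters are the standard `transformers` quantization API, not something specific to this model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Phind/Phind-70B"

# 4-bit NF4 quantization config; reduces weight memory to roughly 35 GB
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```

Generation then proceeds exactly as in the full-precision example above.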
## Chat Template
This model uses the Llama 3 chat format:
```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{system_message}<|eot_id|><|start_header_id|>user<|end_header_id|>
{user_message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
{assistant_response}<|eot_id|>
```
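As a sanity check, the format above can be reproduced by hand. This dependency-free sketch (with placeholder messages) builds the same prompt string that `tokenizer.apply_chat_template` produces for a system + user turn; in practice, always prefer `apply_chat_template`, which handles special tokens correctly:

```python
# Build a Llama 3-format prompt manually (illustrative only)
def build_llama3_prompt(system_message: str, user_message: str) -> str:
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_message}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_llama3_prompt("You are Phind.", "Hello!")
print(prompt)
```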
## Hardware Requirements
| Precision | VRAM Required |
|-----------|---------------|
| FP16/BF16 | ~140 GB |
| INT8 | ~70 GB |
| INT4 | ~35 GB |
For inference, we recommend multiple GPUs with tensor parallelism; on consumer hardware, use a quantized version.
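The table values follow from simple arithmetic: 70 billion parameters times the bytes per weight at each precision (2 for FP16/BF16, 1 for INT8, 0.5 for INT4), ignoring activation and KV-cache overhead. A quick back-of-the-envelope estimate:

```python
def estimate_weight_vram_gb(num_params: float, bytes_per_param: float) -> float:
    """Rough VRAM for model weights alone (excludes KV cache and activations)."""
    return num_params * bytes_per_param / 1e9

PARAMS = 70e9
for name, bpp in [("FP16/BF16", 2.0), ("INT8", 1.0), ("INT4", 0.5)]:
    print(f"{name}: ~{estimate_weight_vram_gb(PARAMS, bpp):.0f} GB")
```

Real-world usage will be somewhat higher, since the KV cache grows with context length and batch size.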
## Limitations
- May occasionally generate incorrect or misleading information
- Not suitable for production use without additional safety measures
- Performance may vary on tasks outside the training distribution
- Should not be used for generating harmful, illegal, or unethical content
## Acknowledgments
This model builds upon the excellent work by Meta on the Llama 3.3 model family. We are grateful for their contributions to open-source AI.