---
license: llama3.3
library_name: transformers
pipeline_tag: text-generation
base_model: meta-llama/Llama-3.3-70B-Instruct
tags:
- llama
- llama-3
- code
- instruct
- fine-tuned
language:
- en
---

# Phind-70B

Phind-70B is a fine-tuned version of [Llama 3.3 70B Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct), optimized for code generation, technical reasoning, and general instruction following.

## Model Details

| Attribute | Details |
|-----------|---------|
| **Base Model** | meta-llama/Llama-3.3-70B-Instruct |
| **Model Type** | Causal Language Model |
| **Parameters** | 70 billion |
| **Context Length** | 128K tokens |
| **Language** | English |
| **License** | Llama 3.3 Community License |

## Intended Use

Phind-70B is designed for:

- **Code generation** across multiple programming languages
- **Technical problem-solving** and debugging
- **General instruction following** and reasoning tasks
- **Multi-turn conversations** requiring context retention

## How to Use

### With Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Phind/Phind-70B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load in bfloat16 and shard across all available GPUs
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Phind, an intelligent assistant that helps with programming and technical questions."},
    {"role": "user", "content": "Write a Python function to find the longest palindromic substring."},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```

### With vLLM

```python
from vllm import LLM, SamplingParams

# tensor_parallel_size shards the model across 4 GPUs
llm = LLM(model="Phind/Phind-70B", tensor_parallel_size=4)
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=1024)

prompt = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are Phind, an intelligent assistant that helps with programming and technical questions.<|eot_id|><|start_header_id|>user<|end_header_id|>

Write a Python function to find the longest palindromic substring.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""

outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```

## Chat Template

This model uses the Llama 3 chat format:

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_message}<|eot_id|><|start_header_id|>user<|end_header_id|>

{user_message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{assistant_response}<|eot_id|>
```
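
As a self-contained illustration, the format above can be rendered in plain Python. Note that `format_llama3_prompt` is a hypothetical helper for this sketch, not part of the model repo or the tokenizer API; in practice, `tokenizer.apply_chat_template` handles this for you.

```python
def format_llama3_prompt(messages):
    """Render a list of {role, content} dicts into the Llama 3 chat format.

    Illustrative helper only; prefer tokenizer.apply_chat_template in real code.
    """
    prompt = "<|begin_of_text|>"
    for m in messages:
        prompt += (
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # Open an assistant turn so the model generates the response
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

messages = [
    {"role": "system", "content": "You are Phind."},
    {"role": "user", "content": "Hello!"},
]
print(format_llama3_prompt(messages))
```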

## Hardware Requirements

| Precision | VRAM Required |
|-----------|---------------|
| FP16/BF16 | ~140 GB |
| INT8 | ~70 GB |
| INT4 | ~35 GB |

For inference, we recommend using multiple GPUs with tensor parallelism or quantized versions for consumer hardware.
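
The figures in the table follow directly from parameters × bytes per parameter (a rough weight-only estimate that ignores activation and KV-cache overhead):

```python
def approx_vram_gb(n_params_billion, bits_per_param):
    """Weight memory in decimal GB: parameters x bytes per parameter.

    Rough estimate only; real usage adds activations and KV cache.
    """
    bytes_total = n_params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

for bits, label in [(16, "FP16/BF16"), (8, "INT8"), (4, "INT4")]:
    print(f"{label}: ~{approx_vram_gb(70, bits):.0f} GB")
```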

## Limitations

- May occasionally generate incorrect or misleading information
- Not suitable for production use without additional safety measures
- Performance may vary on tasks outside the training distribution
- Should not be used for generating harmful, illegal, or unethical content

## Acknowledgments

This model builds upon the excellent work by Meta on the Llama 3.3 model family. We are grateful for their contributions to open-source AI.