Note: some portions of the conversation have been redacted for public README purposes.

**You:** no tell me about it

**Sophie:** 😊 Sure thing! Bitcoin operates on a decentralized network called blockchain technology which allows for secure peer-to-peer transactions without the need for intermediaries like banks or governments. Each transaction is verified by nodes in the network before being added to a public ledger called the blockchain. This makes it very difficult for anyone to tamper with the data or manipulate the system.
## Hardware Requirements

The model has been tested with different precision settings to determine the minimum hardware requirements:

| Precision | VRAM Allocated | VRAM Reserved | Suitable GPUs | Notes |
|-----------|----------------|---------------|---------------|-------|
| FP32 (32-bit) | 14.24 GB | 14.24 GB | RTX 3090, 4090, A5000, A6000, etc. | Default loading mode |
| FP16 (16-bit) | 7.12 GB | 21.35 GB | RTX 3090, 4090, A5000, A6000, etc. | Recommended for most users |
| 8-bit Quantization | 5.68 GB | 6.14 GB | RTX 2060 12GB, 3060, 3070, etc. | Good balance of quality and efficiency |
| 4-bit Quantization | 2.27 GB | 2.30 GB | Most modern GPUs (GTX 1060+) | Lowest quality, runs on older hardware |

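The "VRAM Allocated" and "VRAM Reserved" columns above correspond to PyTorch's two CUDA memory counters. A small sketch for reproducing them on your own hardware after loading the model (the helper names here are ours, not part of the model card):

```python
def bytes_to_gb(n_bytes: float) -> float:
    # The table reports byte counts rounded to two decimals of a GiB.
    return round(n_bytes / 1024**3, 2)

def report_vram() -> None:
    # Call after loading the model to reproduce the table's two columns.
    import torch
    if torch.cuda.is_available():
        print(f"VRAM allocated: {bytes_to_gb(torch.cuda.memory_allocated())} GB")
        print(f"VRAM reserved:  {bytes_to_gb(torch.cuda.memory_reserved())} GB")
```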
### Recommended Loading Code

**For high-end GPUs (FP16):**

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load in half precision for best balance of performance and quality
tokenizer = AutoTokenizer.from_pretrained("magicsquares137/phi3-uncensored-chat")
model = AutoModelForCausalLM.from_pretrained(
    "magicsquares137/phi3-uncensored-chat",
    torch_dtype=torch.float16,
    device_map="auto"
)
```

**For mid-range GPUs (8-bit):**

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 8-bit quantization config
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0
)

# Load in 8-bit
tokenizer = AutoTokenizer.from_pretrained("magicsquares137/phi3-uncensored-chat")
model = AutoModelForCausalLM.from_pretrained(
    "magicsquares137/phi3-uncensored-chat",
    quantization_config=quantization_config,
    device_map="auto"
)
```

**For low-end GPUs (4-bit):**

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization config
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16
)

# Load in 4-bit
tokenizer = AutoTokenizer.from_pretrained("magicsquares137/phi3-uncensored-chat")
model = AutoModelForCausalLM.from_pretrained(
    "magicsquares137/phi3-uncensored-chat",
    quantization_config=quantization_config,
    device_map="auto"
)
```

**For CPU-only inference** (much slower but works on any system):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("magicsquares137/phi3-uncensored-chat")
model = AutoModelForCausalLM.from_pretrained(
    "magicsquares137/phi3-uncensored-chat",
    device_map="cpu"
)
```
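Whichever loading mode you pick, generation works the same way. A minimal sketch (the sampling settings here are illustrative defaults, not tuned values, and the helper name is ours; check the model card's transcript for the exact chat format):

```python
def generate_reply(model, tokenizer, prompt: str, max_new_tokens: int = 128) -> str:
    """Single generation pass; works with any of the loading modes above."""
    import torch
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.7,
        )
    # Decode only the newly generated tokens, not the echoed prompt
    new_tokens = output[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```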

Note: Lower precision (8-bit and 4-bit) may result in slightly reduced output quality, but the difference is often minimal for most use cases.

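The requirements table can also be folded into a small helper that picks a loading mode from the free VRAM on the current machine. A sketch: the thresholds mirror the table's "VRAM Allocated" column with some headroom, and are illustrative rather than measured limits.

```python
def pick_precision(free_vram_gb: float) -> str:
    """Map free VRAM (in GB) to the cheapest viable precision from the table.

    Thresholds are the table's 'VRAM Allocated' figures plus headroom;
    they are illustrative, not hard limits.
    """
    if free_vram_gb >= 16:
        return "fp32"   # 14.24 GB allocated
    if free_vram_gb >= 8:
        return "fp16"   # 7.12 GB allocated
    if free_vram_gb >= 6:
        return "int8"   # 5.68 GB allocated
    if free_vram_gb >= 3:
        return "int4"   # 2.27 GB allocated
    return "cpu"        # fall back to CPU-only inference
```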
## Model Description