5. **Canon Layers Help**: The depthwise causal convolutions (Canon layers) improve factuality and reasoning with only 0.13% parameter overhead.
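
The Canon layer described above can be sketched in a few lines. The version below is a hypothetical NumPy illustration (the function name `canon_layer` and the kernel shapes are assumptions, not the model's actual implementation): each channel gets its own small filter (depthwise), so the added parameter count is only `hidden_dim * k`, and the input is left-padded so position `t` never sees future positions (causal).

```python
import numpy as np

def canon_layer(x, kernels):
    """Hypothetical sketch of a depthwise causal convolution ("Canon") layer.

    x: (seq_len, hidden_dim) activations.
    kernels: (hidden_dim, k) -- one small filter per channel (depthwise),
    so the parameter overhead is just hidden_dim * k.
    """
    seq_len, dim = x.shape
    k = kernels.shape[1]
    # Causal: left-pad with k-1 zeros so position t sees only t-k+1 .. t.
    xp = np.vstack([np.zeros((k - 1, dim)), x])
    out = np.zeros_like(x)
    for t in range(seq_len):
        window = xp[t : t + k]  # (k, dim) window ending at position t
        # Per-channel dot product: out[t, d] = sum_j window[j, d] * kernels[d, j]
        out[t] = np.einsum("kd,dk->d", window, kernels)
    return out
```

Because the convolution is depthwise and the kernel is short, the sketch makes the sub-1% parameter overhead plausible: a hidden size of 4096 with `k = 4` adds only 16K parameters per layer.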

## When to Use Dhara

**Choose Dhara when:**

- Batch generation throughput matters
- Factual accuracy is critical
- You have an existing AR checkpoint to convert

**Choose AR models when:**

- Interactive latency is critical
- Sequential reasoning is important (math, coding)
- Memory is constrained

## Limitations

- Lower performance on sequential reasoning tasks (GSM8K: 0.00%)