Commit 5228340 (verified) by AdvRahul, parent b089476: Update README.md
# Dharma-DeepScaleR-1.5B-Preview-Q4_K_M

This model is a Q4_K_M quantized version of DeepScaleR-1.5B-Preview, originally developed by Agentica. Quantization reduces the model's size while retaining most of its performance, making it more efficient to deploy in resource-constrained environments.

## Model Description

Dharma-DeepScaleR-1.5B-Preview-Q4_K_M is a 4-bit quantized variant of the DeepScaleR-1.5B-Preview model, optimized using the Q4_K_M quantization method. This quantization approach significantly reduces the model's memory footprint and inference time while preserving its core capabilities.

### Key Features

- **Efficient Resource Usage**: The Q4_K_M quantization reduces memory requirements by approximately 75% compared to the original model
- **Faster Inference**: Reduced latency and higher throughput, particularly beneficial for production deployments
- **Maintained Performance**: Preserves most of the original model's capabilities with minimal degradation in quality
- **Indian Market Optimization**: Particularly suitable for deployment in the Indian market where computational resources may be constrained
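
The "approximately 75%" figure can be sanity-checked with back-of-the-envelope arithmetic. A minimal sketch, assuming Q4_K_M averages roughly 4.5 bits per weight against a 16-bit full-precision baseline (the exact file size depends on the tensor mix):

```python
# Rough memory estimate for a 1.5B-parameter model.
# Q4_K_M averages ~4.5 bits per weight (an approximation);
# the FP16 baseline uses 16 bits per weight.
PARAMS = 1.5e9

def model_size_gb(bits_per_weight: float, params: float = PARAMS) -> float:
    """Approximate weight storage in gigabytes."""
    return params * bits_per_weight / 8 / 1e9

fp16_gb = model_size_gb(16)    # ~3.00 GB
q4km_gb = model_size_gb(4.5)   # ~0.84 GB
reduction = 1 - q4km_gb / fp16_gb  # ~0.72, i.e. roughly a 72% saving

print(f"FP16: {fp16_gb:.2f} GB, Q4_K_M: {q4km_gb:.2f} GB, saved {reduction:.0%}")
```

So the true saving for the weights alone lands around 72%, consistent with the "~75%" claim above.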

### Quantization Details

The model was quantized using the Q4_K_M method, which:
- Represents most weights with 4-bit precision
- Groups weights into blocks that share quantization scales and minimum values (llama.cpp's k-quant scheme)
- Keeps selected sensitive tensors, such as parts of the attention and feed-forward layers, at higher precision
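
For reference, GGUF files in this format are typically produced with llama.cpp's conversion and quantization tools. The paths and filenames below are illustrative placeholders, not the exact commands used for this release:

```shell
# Convert the original HF checkpoint to a full-precision GGUF file,
# then quantize it to Q4_K_M (requires a local llama.cpp build).
python convert_hf_to_gguf.py ./DeepScaleR-1.5B-Preview --outfile deepscaler-1.5b-f16.gguf
./llama-quantize deepscaler-1.5b-f16.gguf deepscaler-1.5b-Q4_K_M.gguf Q4_K_M
```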

## Intended Use

This quantized model is ideal for:
- Mobile and edge device deployments
- Cost-effective cloud implementations
- Applications requiring real-time responses
- Scenarios with limited computational resources
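
Such deployments typically load the quantized GGUF file through a llama.cpp binding. A minimal sketch using llama-cpp-python; the repo id and filename pattern are assumptions, so substitute the actual values for this repository:

```python
from llama_cpp import Llama

# NOTE: repo_id and filename below are illustrative placeholders,
# not confirmed values for this release.
llm = Llama.from_pretrained(
    repo_id="AdvRahul/Dharma-DeepScaleR-1.5B-Preview-Q4_K_M",  # assumed
    filename="*Q4_K_M.gguf",  # glob pattern matched against the repo's files
    n_ctx=4096,               # context length; tune to your memory budget
)

output = llm("What is 12 * 17?", max_tokens=64)
print(output["choices"][0]["text"])
```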

## Limitations

- Slight reduction in performance compared to the full-precision model
- May show degradation in handling complex or nuanced language tasks
- Not recommended for tasks requiring extremely high precision

## Training and Quantization Process

This model is a quantized version of DeepScaleR-1.5B-Preview. The original model's architecture and training methodology remain unchanged. The quantization process focused on optimizing the model for deployment efficiency while minimizing performance loss.

## License

This model is licensed under the Apache 2.0 license, which allows for both commercial and non-commercial use with proper attribution.

## Base Model

```yaml
base_model: agentica-org/DeepScaleR-1.5B-Preview
```

## Additional Metadata

```yaml
license: apache-2.0
tags:
- quantized
- q4_k_m
- efficient-inference
- indian-market
- llm
```
