---
language:
- en
tags:
- deepseek
- paraphrase
- lora
- text-generation
license: mit
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
datasets:
- quora
model-index:
- name: Deepseek Paraphrase
results: []
---
# Deepseek Paraphrase
This model is a fine-tuned version of [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) that has been specialized for high-quality paraphrase generation. It was trained using LoRA (Low-Rank Adaptation) and then merged back into the base model for efficient inference.
## Model Details
- **Base Model**: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
- **Task**: Paraphrase Generation
- **Training Method**: LoRA fine-tuning with r=16, alpha=32
- **Training Data**: Multi-domain text from literary works, technical documentation, academic papers, and articles, plus the Quora paraphrase dataset
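Because the LoRA adapter was merged back into the base weights, inference pays no extra cost over the base model. A minimal sketch of what that merge does mathematically, using the r=16, alpha=32 configuration above (toy dimensions and variable names are illustrative, not the actual training code):

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in = 64, 64   # toy layer dimensions (illustrative only)
r, alpha = 16, 32      # LoRA rank and scaling used for this model

W = rng.standard_normal((d_out, d_in))        # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01     # LoRA down-projection
B = rng.standard_normal((d_out, r)) * 0.01    # LoRA up-projection (after training)

# During training the adapted layer computes: y = W x + (alpha / r) * B A x.
# Merging folds the low-rank update into W, so inference needs no adapter matmuls.
W_merged = W + (alpha / r) * (B @ A)

x = rng.standard_normal(d_in)
y_adapter = W @ x + (alpha / r) * (B @ (A @ x))
y_merged = W_merged @ x
print(np.allclose(y_adapter, y_merged))  # the merged layer is numerically equivalent
```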
## Performance
This model outperforms standard paraphrasing models like BART and T5 on key metrics:
- **Semantic Preservation** (BERTScore): 0.952 - Excellent
- **Lexical Diversity** (BLEU Diversity): 0.513 - Acceptable
- **Character-level Changes** (Edit Distance): 0.344 - Acceptable
- **Structural Variation** (Syntactic Diversity): 0.147 - Moderate
- **Overall Balance** (Harmonic Score): 0.468 - Acceptable
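To give a rough sense of how the character-level metric behaves, a normalized edit distance (Levenshtein distance divided by the longer string's length) can be computed directly. This is a generic sketch, not the exact evaluation script used for the numbers above:

```python
def normalized_edit_distance(a: str, b: str) -> float:
    """Levenshtein distance between a and b, divided by the longer length."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (0 cost if chars match)
            ))
        prev = curr
    return prev[-1] / max(len(a), len(b), 1)

src = "The quick brown fox jumps over the lazy dog."
para = "A fast brown fox leaps over the idle dog."
print(normalized_edit_distance(src, para))  # higher values = more surface change
```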
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "PeterAM4/deepseek-paraphrase"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
text = "Learn Once, Write Anywhere: We don't make assumptions about the rest of your technology stack, so you can develop new features in React without rewriting existing code."
prompt = f"<|begin▁of▁sentence|><|User|>Paraphrase the following text while preserving its meaning but changing the wording and structure: {text}<|Assistant|><think>\nLet me analyze this text and find ways to rephrase it while keeping the same meaning.\nI need to use different vocabulary and structure.\n</think>\n\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.95,
    do_sample=True,
)
# Decode only the newly generated tokens. Decoding the full sequence and then
# stripping the prompt string does not work, because skip_special_tokens removes
# the special tokens that appear in the prompt template.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
paraphrase = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
print(paraphrase)
```
## Limitations
- Very technical or domain-specific terminology may not be paraphrased optimally
- Always review paraphrases for factual accuracy and meaning preservation
## Citation
If you use this model in your research or applications, please cite:
```
@misc{deepseek-paraphrase,
  author       = {PeterAM4},
  title        = {DeepSeek Paraphrase: Fine-tuned DeepSeek model for high-quality paraphrasing},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/PeterAM4/deepseek-paraphrase}}
}
```