---
language:
- en
tags:
- deepseek
- paraphrase
- lora
- text-generation
license: mit
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
datasets:
- quora
model-index:
- name: DeepSeek Paraphrase
  results: []
---

# DeepSeek Paraphrase

This model is a fine-tuned version of [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) specialized for high-quality paraphrase generation. It was trained with LoRA (Low-Rank Adaptation) and the adapter was then merged back into the base model for efficient inference.

## Model Details

- **Base Model**: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
- **Task**: Paraphrase Generation
- **Training Method**: LoRA fine-tuning with r=16, alpha=32
- **Training Data**: Multi-domain text from literary works, technical documentation, academic papers, and articles, plus the Quora paraphrase dataset

## Performance

This model outperforms standard paraphrasing models such as BART and T5 on key paraphrase-quality metrics:

- **Semantic Preservation** (BERTScore): 0.952 - Excellent
- **Lexical Diversity** (BLEU Diversity): 0.513 - Acceptable
- **Character-level Changes** (Edit Distance): 0.344 - Acceptable
- **Structural Variation** (Syntactic Diversity): 0.147 - Moderate
- **Overall Balance** (Harmonic Score): 0.468 - Acceptable

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "PeterAM4/deepseek-paraphrase"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

text = "Learn Once, Write Anywhere: We don't make assumptions about the rest of your technology stack, so you can develop new features in React without rewriting existing code."

prompt = f"<|begin▁of▁sentence|><|User|>Paraphrase the following text while preserving its meaning but changing the wording and structure: {text}<|Assistant|>\nLet me analyze this text and find ways to rephrase it while keeping the same meaning.\nI need to use different vocabulary and structure.\n\n\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.95,
    do_sample=True
)

# Decode only the newly generated tokens so the prompt text is not echoed back
paraphrase = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True
)
print(paraphrase)
```

## Limitations

- Very technical or domain-specific terminology may not be paraphrased optimally
- Always review paraphrases for factual accuracy and meaning preservation

## Citation

If you use this model in your research or applications, please cite:

```
@misc{deepseek-paraphrase,
  author       = {PeterAM4},
  title        = {DeepSeek Paraphrase: Fine-tuned DeepSeek model for high-quality paraphrasing},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/PeterAM4/deepseek-paraphrase}}
}
```
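
## Training Setup (Sketch)

The exact training script is not published with this card. The sketch below shows what a comparable LoRA setup and merge step could look like with the `peft` library: only `r=16` and `lora_alpha=32` come from the details above, while the target modules, dropout, adapter path, and output path are illustrative assumptions.

```python
# Minimal sketch, not the actual training script for this model.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, PeftModel, get_peft_model

BASE = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"

# --- Fine-tuning stage: attach a LoRA adapter to the frozen base model ---
base = AutoModelForCausalLM.from_pretrained(BASE)
lora_config = LoraConfig(
    r=16,                       # rank stated in the card
    lora_alpha=32,              # alpha stated in the card
    lora_dropout=0.05,          # assumed value
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
# ... train `model` on the paraphrase data, then
# model.save_pretrained("paraphrase-lora-adapter") ...

# --- Merge stage: fold the trained adapter back into the base weights ---
# Reload a clean base model, apply the saved adapter, and merge so that
# inference needs no peft dependency (as described in the model details).
base = AutoModelForCausalLM.from_pretrained(BASE)
merged = PeftModel.from_pretrained(base, "paraphrase-lora-adapter").merge_and_unload()
merged.save_pretrained("deepseek-paraphrase-merged")
```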
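
## Evaluating Paraphrase Quality (Sketch)

The evaluation pipeline behind the numbers in the Performance section is not included here. As a rough illustration, the snippet below computes two of the listed quantities with the Hugging Face `evaluate` library: BERTScore for semantic preservation, and 1 - BLEU as a proxy for lexical diversity (the latter definition is an assumption, not taken from the card).

```python
# Rough sketch of metric computation; requires: pip install evaluate bert_score
import evaluate

sources = ["Learn once, write anywhere."]                              # original sentences
paraphrases = ["Master it a single time and apply it everywhere."]     # model outputs

# Semantic preservation: BERTScore F1 between paraphrases and their sources
bertscore = evaluate.load("bertscore")
bs = bertscore.compute(predictions=paraphrases, references=sources, lang="en")
semantic_preservation = sum(bs["f1"]) / len(bs["f1"])

# Lexical diversity: 1 - BLEU of the paraphrase against the source,
# so higher values mean more surface-level rewording (assumed definition)
bleu = evaluate.load("bleu")
b = bleu.compute(predictions=paraphrases, references=[[s] for s in sources])
lexical_diversity = 1.0 - b["bleu"]

print(f"Semantic preservation (BERTScore F1): {semantic_preservation:.3f}")
print(f"Lexical diversity (1 - BLEU): {lexical_diversity:.3f}")
```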