Spaces: George-API/qwen4bit

George-API committed on Mar 10

Commit cfcf792 (verified) · 1 Parent(s): 52e5371

Upload README.md with huggingface_hub

Files changed (1)

README.md +69 -17

@@ -1,28 +1,80 @@
-# Fine-tuned DeepSeek-R1-Distill-Qwen-14B
-This space hosts a fine-tuned version of the [unsloth/DeepSeek-R1-Distill-Qwen-14B-bnb-4bit](https://huggingface.co/unsloth/DeepSeek-R1-Distill-Qwen-14B-bnb-4bit) model.
-## Model Details
-- **Base Model**: `unsloth/DeepSeek-R1-Distill-Qwen-14B-bnb-4bit`
-- **Fine-tuned on**: `phi4-cognitive-dataset`
-- **Quantization**: Already 4-bit quantized (no additional quantization applied)
-## Current Status
-This space is currently being prepared. The fine-tuned model will be available soon.
-## Usage
-Once deployed, you can interact with the model through the Gradio interface or via API.
-## Training Process
-The model is being fine-tuned with the following specifications:
-- Training dataset processed in ascending order by `prompt_number`
-- Custom training parameters optimized for the L40S GPU
-- Mixed precision training for optimal performance
-## Contact
-For questions or issues, please reach out through the [Hugging Face community](https://huggingface.co/discussions).

+---
+title: Fine-tuning DeepSeek-R1-Distill-Qwen-14B (Research Training)
+emoji: 🧪
+colorFrom: blue
+colorTo: indigo
+sdk: gradio
+sdk_version: 4.13.0
+app_file: app.py
+pinned: false
+license: mit
+---
+# Model Fine-Tuning Project
+## Overview
+- **Goal**: Fine-tune `unsloth/DeepSeek-R1-Distill-Qwen-14B-bnb-4bit` using a pre-tokenized JSONL dataset
+- **Model**: `unsloth/DeepSeek-R1-Distill-Qwen-14B-bnb-4bit`
+  - **Important**: Already 4-bit quantized - do not quantize further
+- **Dataset**: `phi4-cognitive-dataset`
+⚠️ **RESEARCH TRAINING PHASE ONLY**: This space is being used for training purposes and does not provide interactive model outputs.
+### Dataset Specs
+- Entries under 2048 tokens
+- Fields: `prompt_number`, `article_id`, `conversations`
+- Process in ascending `prompt_number` order
+- Pre-tokenized dataset - no additional tokenization needed
+### Hardware
+- GPU: 1x L40S (48GB VRAM)
+- RAM: 62GB
+- CPU: 8 cores
+## Environment Variables (.env)
+- `HF_TOKEN`: Hugging Face API token
+- `HF_USERNAME`: Hugging Face username
+- `HF_SPACE_NAME`: Target space name
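A minimal sketch of how a script might read the three variables listed above and fail early when one is missing. The helper name `read_space_settings` and the derived `repo_id` key are hypothetical; only the variable names come from this README, and the `username/space-name` form follows the usual Hub repository ID convention.

```python
import os
from typing import Dict, Mapping

# Variable names taken from the "Environment Variables (.env)" section.
REQUIRED_VARS = ("HF_TOKEN", "HF_USERNAME", "HF_SPACE_NAME")


def read_space_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required settings, raising if any variable is unset or empty.

    Hypothetical helper: the project's scripts may load these differently.
    """
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    settings = {name: env[name] for name in REQUIRED_VARS}
    # Assumption: the target Space is addressed as "<username>/<space name>".
    settings["repo_id"] = f"{settings['HF_USERNAME']}/{settings['HF_SPACE_NAME']}"
    return settings
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching the real environment.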
+## Files
+### 1. `app.py`
+- Training status dashboard
+- No interactive model demo (research phase only)
+### 2. `transformers_config.json`
+- Configuration for Hugging Face Transformers
+- Contains: model parameters, hardware settings, optimizer details
+- Specifies pre-tokenized dataset handling
+### 3. `run_cloud_training.py`
+- Loads the pre-tokenized dataset, sorts it by `prompt_number`, and starts training:
+1. Load and sort the JSONL file by `prompt_number`
+2. Use the pre-tokenized `input_ids` directly (no tokenization)
+3. Initialize training with parameters from the config file
+4. Run training with metrics, checkpoints, and error handling
+- Uses Hugging Face's Trainer API with a custom data collator for pre-tokenized inputs
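The loading and collation steps described for `run_cloud_training.py` might look roughly like the sketch below. It is an illustration using only the standard library, not the script itself: the function names are hypothetical, the top-level `input_ids` field and the pad ID of `0` are assumptions (the README does not say where the token IDs sit inside each record), and a real collator for the Trainer API would return tensors rather than lists.

```python
import json
from typing import Any, Dict, Iterable, List


def load_sorted_records(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse JSONL lines and return records in ascending prompt_number order."""
    records = [json.loads(line) for line in lines if line.strip()]
    return sorted(records, key=lambda rec: rec["prompt_number"])


def collate_pretokenized(
    batch: List[Dict[str, Any]], pad_token_id: int = 0
) -> Dict[str, List[List[int]]]:
    """Pad already-tokenized examples to a common length; no tokenizer is called.

    Assumes each record carries a top-level `input_ids` list (hypothetical layout).
    """
    ids = [list(rec["input_ids"]) for rec in batch]
    width = max(len(seq) for seq in ids)

    input_ids = [seq + [pad_token_id] * (width - len(seq)) for seq in ids]
    attention_mask = [[1] * len(seq) + [0] * (width - len(seq)) for seq in ids]
    # -100 is the label value that PyTorch's cross-entropy loss ignores by default,
    # so padded positions do not contribute to the loss.
    labels = [seq + [-100] * (width - len(seq)) for seq in ids]
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


# Two made-up records, deliberately out of order.
sample_lines = [
    '{"prompt_number": 2, "article_id": "b", "input_ids": [5, 6, 7]}',
    '{"prompt_number": 1, "article_id": "a", "input_ids": [8, 9]}',
]
records = load_sorted_records(sample_lines)
batch = collate_pretokenized(records)
```

Sorting once at load time keeps the ascending `prompt_number` order independent of how the file was written; preserving that order during training would also require the data loader not to shuffle.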
+### 4. `requirements.txt`
+- Python dependencies: `transformers`, `datasets`, `torch`, etc.
+- Includes `unsloth` for optimized training
+### 5. `upload_to_space.py`
+- Updates the model and Space directly through the Hugging Face API
+## Implementation Notes
+### Best Practices
+- Dataset is pre-tokenized and sorted by `prompt_number`
+- Settings stored in config file, avoiding hardcoding
+- Training parameters tuned for the 1x L40S (48GB VRAM) hardware
+- Gradient checkpointing and mixed precision training
+- Complete logging for monitoring progress
+### Model Repository
+This space hosts a fine-tuned version of the [unsloth/DeepSeek-R1-Distill-Qwen-14B-bnb-4bit](https://huggingface.co/unsloth/DeepSeek-R1-Distill-Qwen-14B-bnb-4bit) model.
+Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference