Upload folder using huggingface_hub
- README.md +118 -0
- config.json +44 -0
- generation_config.json +6 -0
- model.safetensors +3 -0
- special_tokens_map.json +23 -0
- tokenizer.json +0 -0
- tokenizer_config.json +75 -0
- training_args.bin +3 -0
README.md
ADDED
@@ -0,0 +1,118 @@
---
language:
- en
- te
- sa
tags:
- text-generation
- structured-data
- multilingual
- deepseek
- no-domain
license: apache-2.0
datasets:
- custom
pipeline_tag: text-generation
---

# Fine-tuned SLM T2 - Structured Data Generation (No Domain)

This model is fine-tuned to generate natural language sentences from structured data **without domain labels**.

## Model Details

- **Base Model**: DeepSeek V3 Compact (~110M parameters)
- **Task**: Structured data to text generation
- **Languages**: English, Telugu, Sanskrit
- **Training Format**: `Generate a sentence from this data: {key: value, ...}` (see the sketch after this list)
- **Domains**: Sports, Weather, Travel, Movies, Products
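
The training prompt can be assembled mechanically from a flat record. Below is a minimal sketch of a hypothetical `build_prompt` helper (the helper name and the quoting rule, strings quoted and numbers bare, are assumptions inferred from the examples in this card, not something shipped with the repo):

```python
# Hypothetical helper (not part of this repo): build a prompt in the card's
# training format. Python's repr() happens to match the format shown here:
# strings quoted with single quotes, numbers left bare.
def build_prompt(record: dict) -> str:
    fields = ", ".join(f"{k}: {v!r}" for k, v in record.items())
    return f"Generate a sentence from this data: {{{fields}}}"

print(build_prompt({"Team1": "Lakers", "Score1": 108, "Team2": "Warriors", "Score2": 90}))
# -> Generate a sentence from this data: {Team1: 'Lakers', Score1: 108, Team2: 'Warriors', Score2: 90}
```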

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("asrith05/finetuned_slm_t2")
tokenizer = AutoTokenizer.from_pretrained("asrith05/finetuned_slm_t2")

# Example: sports data in the training prompt format
prompt = "Generate a sentence from this data: {Team1: 'Lakers', Score1: 108, Team2: 'Warriors', Score2: 90}"
inputs = tokenizer(prompt, return_tensors="pt")
# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.8)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
# Expected: "Generate a sentence from this data: {Team1: 'Lakers', Score1: 108, Team2: 'Warriors', Score2: 90} The Lakers won the game against the Warriors with a final score of 108 to 90."
```
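
The tokenizer shipped with this repo defines language tokens `<en>`, `<te>`, and `<sa>` (see `tokenizer_config.json` below). This card does not state how the output language is selected; the sketch below assumes prefixing the prompt with one of these tokens steers generation:

```python
# Assumption: a language-token prefix (<en>, <te>, <sa>) selects the output
# language. The tokenizer defines these tokens, but this card does not
# confirm that prompts were prefixed with them during training.
prompt = "<te> Generate a sentence from this data: {City: 'Hyderabad', Temperature: 32, Condition: 'sunny', Day: 'Monday'}"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```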

## Training Details

- **Dataset Split**: 24k train / 6k validation / 6k test
- **Epochs**: 1
- **Learning Rate**: 5e-5
- **Batch Size**: 4, with gradient accumulation
- **Format**: No domain labels, direct structured data to text
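
For orientation, a hedged sketch of `TrainingArguments` matching the numbers above; fields not listed in this card (such as `output_dir` and `gradient_accumulation_steps`) are assumptions, and the authoritative values are serialized in `training_args.bin`:

```python
from transformers import TrainingArguments

# Sketch only: values not stated in this card are assumptions; the actual
# configuration used for this run is serialized in training_args.bin.
args = TrainingArguments(
    output_dir="finetuned_slm_t2",   # assumed
    num_train_epochs=1,
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # "with gradient accumulation"; step count assumed
)
```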

## Supported Data Types

### Sports
```
Generate a sentence from this data: {Team1: 'Mumbai Indians', Score1: 185, Team2: 'Chennai Super Kings', Score2: 180}
```

### Weather
```
Generate a sentence from this data: {City: 'Hyderabad', Temperature: 32, Condition: 'sunny', Day: 'Monday'}
```

### Travel
```
Generate a sentence from this data: {Person: 'Priya', City: 'Bangalore', Transport: 'flight', Duration: 2}
```

### Movies
```
Generate a sentence from this data: {Movie: 'RRR', Genre: 'Action', Rating: 8.2, Year: 2022}
```

### Products
```
Generate a sentence from this data: {Product: 'iPhone', Brand: 'Apple', Price: 999, Rating: 4.5}
```
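
Since the input format is uniform across these types, one loop covers all of them. A sketch reusing the hypothetical `build_prompt` helper and the `model`/`tokenizer` loaded earlier in this card:

```python
# Sketch: generate one sentence per record across data types, reusing the
# hypothetical build_prompt helper defined earlier in this card.
records = [
    {"City": "Hyderabad", "Temperature": 32, "Condition": "sunny", "Day": "Monday"},
    {"Movie": "RRR", "Genre": "Action", "Rating": 8.2, "Year": 2022},
]
for record in records:
    inputs = tokenizer(build_prompt(record), return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.8)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```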

## Key Features

- **Domain-Agnostic**: No need to specify a domain in the input
- **Clean Format**: Simple structured data input
- **Multilingual**: Supports English, Telugu, Sanskrit
- **Versatile**: Works across multiple data types

## Model Performance

- Trained on diverse structured data examples
- Optimized for coherent natural language generation
- Validated on a held-out test set
- Supports temperature-based generation control
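
Lower temperatures give more deterministic phrasings, higher ones more varied wording; a quick illustrative comparison (temperature values are examples, not tuned recommendations):

```python
# Illustrative: the same prompt under different sampling temperatures.
prompt = "Generate a sentence from this data: {Product: 'iPhone', Brand: 'Apple', Price: 999, Rating: 4.5}"
inputs = tokenizer(prompt, return_tensors="pt")
for temperature in (0.3, 0.8, 1.2):
    outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=temperature)
    print(temperature, tokenizer.decode(outputs[0], skip_special_tokens=True))
```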

## Limitations

- Best performance on data similar to the training format
- May struggle with deeply nested structures
- Requires well-formatted input dictionaries
- Limited to domains seen during training

## Related Models

- [asrith05/finetuned_slm_t2_diverse](https://huggingface.co/asrith05/finetuned_slm_t2_diverse) - Multi-domain with labels
- [asrith05/slm](https://huggingface.co/asrith05/slm) - Entity extraction model
- [asrith05/deepseek_pretrain_90k](https://huggingface.co/asrith05/deepseek_pretrain_90k) - Pretrained base

## Citation

```bibtex
@misc{finetuned_slm_t2,
  title={Fine-tuned SLM T2: Structured Data Generation},
  author={Asrith},
  year={2024},
  url={https://huggingface.co/asrith05/finetuned_slm_t2}
}
```
config.json
ADDED
@@ -0,0 +1,44 @@
{
  "architectures": [
    "DeepseekV3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 0,
  "eos_token_id": 1,
  "first_k_dense_replace": 4,
  "head_dim": 64,
  "hidden_act": "silu",
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "kv_lora_rank": 512,
  "max_position_embeddings": 4096,
  "model_type": "deepseek_v3",
  "moe_intermediate_size": 2048,
  "n_group": 8,
  "n_routed_experts": 0,
  "n_shared_experts": 1,
  "norm_topk_prob": true,
  "num_attention_heads": 12,
  "num_experts_per_tok": 8,
  "num_hidden_layers": 4,
  "num_key_value_heads": 12,
  "pretraining_tp": 1,
  "q_lora_rank": 1536,
  "qk_head_dim": 192,
  "qk_nope_head_dim": 128,
  "qk_rope_head_dim": 64,
  "rms_norm_eps": 1e-06,
  "rope_interleave": true,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "routed_scaling_factor": 2.5,
  "tie_word_embeddings": false,
  "topk_group": 4,
  "torch_dtype": "float32",
  "transformers_version": "4.51.3",
  "use_cache": false,
  "v_head_dim": 128,
  "vocab_size": 32000
}
generation_config.json
ADDED
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 0,
  "eos_token_id": 1,
  "transformers_version": "4.51.3"
}
model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:de350ca565dbe4a0c2576dbd6c06fd9cd4077620d720cf5c4a2d957656b14008
size 436535952
special_tokens_map.json
ADDED
@@ -0,0 +1,23 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer_config.json
ADDED
@@ -0,0 +1,75 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<pad>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "4": {
      "content": "<en>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "5": {
      "content": "<te>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "6": {
      "content": "<sa>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "7": {
      "content": "<mask>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "</s>",
  "extra_special_tokens": {},
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "</s>",
  "tokenizer_class": "PreTrainedTokenizer"
}
training_args.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:02fa4890d43ae1dfef7a3e238d1c0055c35b415b279c0a199a0f53f6d22ce47e
size 5304