nameissakthi committed on
Commit
e3ef0ba
0 Parent(s):

Initial commit: PebbleLM-117M-Chat

Files changed (6)
  1. .gitattributes +1 -0
  2. README.md +264 -0
  3. config.json +21 -0
  4. model.pt +3 -0
  5. tokenizer.json +0 -0
  6. tokenizer_config.json +9 -0
.gitattributes ADDED
@@ -0,0 +1 @@
*.pt filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,264 @@
---
license: mit
language:
- en
tags:
- text-generation
- pytorch
- small-language-model
- edge-deployment
- conversational
- chat
base_model: nameissakthi/PebbleLM-117M
datasets:
- yahma/alpaca-cleaned
- databricks/databricks-dolly-15k
pipeline_tag: text-generation
---

# PebbleLM-117M-Chat

A 117.5M-parameter language model finetuned for conversational Q&A. Small but solid, designed for edge deployment.

**Base Model:** [PebbleLM-117M](https://huggingface.co/nameissakthi/PebbleLM-117M)

## Model Description

PebbleLM-117M-Chat is finetuned from PebbleLM-117M on focused Q&A datasets to provide direct, concise answers to questions.

| Property | Value |
|----------|-------|
| Parameters | 117.5M |
| Architecture | Decoder-only Transformer |
| Layers | 8 |
| Hidden Size | 1024 |
| Attention Heads | 16 |
| Context Length | 1024 tokens |
| Vocabulary | 16,384 BPE tokens |

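The parameter count in the table can be roughly verified from the architecture figures, assuming tied input/output embeddings and the 4096-wide MLP declared in `config.json` (this is a back-of-the-envelope sketch; it ignores norm and bias parameters):

```python
# Rough parameter count from the architecture table (tied embeddings assumed).
vocab, hidden, layers, intermediate = 16384, 1024, 8, 4096

embeddings = vocab * hidden            # token embeddings, shared with the LM head
attn = 4 * hidden * hidden             # Q, K, V, O projections per layer
mlp = 2 * hidden * intermediate        # up- and down-projection per layer
total = embeddings + layers * (attn + mlp)

print(f"{total / 1e6:.1f}M")  # 117.4M, matching the 117.5M in the table
```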
## Inference Examples

```
User: What is 2 + 2?
Assistant: 2 + 2 equals 4.

User: What is the capital of France?
Assistant: The capital of France is Paris.

User: What is the largest planet?
Assistant: Jupiter is the largest planet in our solar system.

User: Who wrote Romeo and Juliet?
Assistant: William Shakespeare wrote Romeo and Juliet.

User: How do you say hello in Spanish?
Assistant: Hello in Spanish is 'Hola'.

User: What is AI?
Assistant: AI (artificial intelligence) is the ability of machines to perform
tasks that normally require human intelligence, such as learning,
problem-solving, and decision making.

User: Thank you
Assistant: Goodbye! Have a great day!
```

## Training Data

### Pretraining (Base Model)
| Dataset | Samples | Link |
|---------|---------|------|
| Wikipedia | 488,906 | [wikipedia](https://huggingface.co/datasets/wikipedia) |
| OpenWebText | 500,000 | [openwebtext](https://huggingface.co/datasets/openwebtext) |
| TinyStories | 188,067 | [roneneldan/TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) |

### Finetuning (This Model)
| Dataset | Samples | Description | Link |
|---------|---------|-------------|------|
| Alpaca-cleaned | 20,000 | Instruction-response pairs | [yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned) |
| Databricks Dolly | 10,991 | Q&A pairs | [databricks/databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) |
| Simple Q&A | 1,500 | Hand-crafted basic facts | Custom |
| **Total** | **32,491** | | |

84
+
85
+ ```yaml
86
+ Base Checkpoint: PebbleLM-117M
87
+ Epochs: 5
88
+ Batch Size: 48
89
+ Gradient Accumulation: 2
90
+ Learning Rate: 5e-5
91
+ Final Training Loss: 1.55
92
+ Hardware: NVIDIA A100 80GB
93
+ Training Time: ~40 minutes
94
+ ```
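
The effective batch size and step count follow from the config above and the 32,491 finetuning samples (a sketch; assumes full batches with the remainder dropped):

```python
# Derived from the training config above.
samples, epochs = 32_491, 5
batch_size, grad_accum = 48, 2

effective_batch = batch_size * grad_accum        # 96 samples per optimizer step
steps_per_epoch = samples // effective_batch     # 338 full steps per epoch
total_steps = steps_per_epoch * epochs           # 1690 optimizer steps over 5 epochs

print(effective_batch, steps_per_epoch, total_steps)
```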

## Benchmark Results

| Benchmark | Base Model | Chat Model | Change |
|-----------|------------|------------|--------|
| HellaSwag | 32.20% | 31.80% | -0.4 pts |
| ARC-Easy | 35.80% | 40.00% | **+4.2 pts** |
| WinoGrande | 52.80% | 49.20% | -3.6 pts |
| PIQA | 58.20% | 56.00% | -2.2 pts |
| **Average** | **44.75%** | **44.25%** | -0.5 pts |

**Note:** A slight benchmark decrease is expected: the model is optimized for Q&A quality, not reasoning benchmarks. The real improvement is in conversational responses.

## Usage

### Installation

```bash
pip install torch tokenizers huggingface_hub

# Clone model architecture code
git clone https://github.com/nameissakthi/slm-qualcomm
cd slm-qualcomm
```

### Download Model

```python
from huggingface_hub import hf_hub_download

# Download model files
model_path = hf_hub_download(repo_id="nameissakthi/PebbleLM-117M-Chat", filename="model.pt")
tokenizer_path = hf_hub_download(repo_id="nameissakthi/PebbleLM-117M-Chat", filename="tokenizer.json")
```

### Load Model

```python
import torch
from tokenizers import Tokenizer
from src.model.transformer import SLMForCausalLM
from src.model.config import SLMConfig

# Load tokenizer
tokenizer = Tokenizer.from_file(tokenizer_path)

# Load model
config = SLMConfig(vocab_size=16384)
model = SLMForCausalLM(config)

# Training checkpoints may wrap the weights in a "model_state_dict" key
state_dict = torch.load(model_path, map_location="cpu")
if "model_state_dict" in state_dict:
    state_dict = state_dict["model_state_dict"]
model.load_state_dict(state_dict)
model.eval()
```

### Prompt Format

```
<|user|>
Your question here
<|assistant|>
```

### Generate Response

```python
def generate(prompt, max_tokens=128, temperature=0.3):
    formatted = f"<|user|>\n{prompt}\n<|assistant|>\n"
    input_ids = torch.tensor([tokenizer.encode(formatted).ids])

    with torch.no_grad():
        for _ in range(max_tokens):
            logits = model(input_ids).logits[:, -1, :]
            logits = logits / temperature
            probs = torch.softmax(logits, dim=-1)
            next_token = torch.multinomial(probs, 1)
            input_ids = torch.cat([input_ids, next_token], dim=-1)

            # Stop on EOS or user token
            if next_token.item() in [tokenizer.token_to_id("<|eos|>"),
                                     tokenizer.token_to_id("<|user|>")]:
                break

    response = tokenizer.decode(input_ids[0].tolist())
    return response.split("<|assistant|>")[-1].replace("<|eos|>", "").strip()

# Example
print(generate("What is the capital of France?"))
# Output: The capital of France is Paris.

print(generate("What is 2 + 2?"))
# Output: 2 + 2 equals 4.
```

### Recommended Settings

```python
temperature = 0.3        # Lower = more consistent
top_k = 50               # Limit token choices
top_p = 0.9              # Nucleus sampling
repetition_penalty = 1.2 # Reduce repetition
max_tokens = 128         # Keep responses short
```
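
The `generate` function above uses plain temperature sampling; a sketch of one decoding step that additionally applies top-k, top-p, and the repetition penalty follows. The function name and the penalty scheme are illustrative, not part of the repo's code:

```python
import torch

def sample_next_token(logits, generated_ids, temperature=0.3, top_k=50,
                      top_p=0.9, repetition_penalty=1.2):
    """One sampling step applying the recommended settings (illustrative sketch)."""
    logits = logits.clone()

    # Repetition penalty: push down logits of already-generated tokens.
    for t in set(generated_ids):
        logits[t] = logits[t] / repetition_penalty if logits[t] > 0 \
            else logits[t] * repetition_penalty

    logits = logits / temperature

    # Top-k: restrict to the k highest-scoring tokens.
    topk_vals, topk_idx = torch.topk(logits, top_k)
    probs = torch.softmax(topk_vals, dim=-1)

    # Top-p (nucleus): keep the smallest prefix whose mass stays within top_p.
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    keep = torch.cumsum(sorted_probs, dim=-1) <= top_p
    keep[0] = True  # always keep the most likely token
    kept_probs = sorted_probs[keep] / sorted_probs[keep].sum()

    choice = torch.multinomial(kept_probs, 1)
    return topk_idx[sorted_idx[keep][choice]].item()
```

In the generation loop, this would replace the `softmax`/`multinomial` pair, with `input_ids[0].tolist()` passed as `generated_ids`.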

## Intended Use

**Appropriate for:**
- Edge deployment demos
- Simple Q&A applications
- Educational purposes
- IoT/embedded device experiments

**Not recommended for:**
- Production chatbots
- Factual accuracy-critical applications
- Complex multi-turn conversations

## Limitations

- **~60% accuracy** on simple factual questions
- **Inconsistent** on complex or unusual questions
- **May hallucinate** incorrect facts
- **English only**
- **117M parameters** limit knowledge capacity

For production quality, consider 1B+ parameter models.

## Model Files

| File | Description |
|------|-------------|
| `model.pt` | PyTorch model weights |
| `config.json` | Model configuration |
| `tokenizer.json` | BPE tokenizer |
| `tokenizer_config.json` | Tokenizer configuration |

## Citation

```bibtex
@misc{pebblellmchat2026,
  author = {Sakthivel},
  title = {PebbleLM-117M-Chat: A Small Conversational Language Model},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/nameissakthi/PebbleLM-117M-Chat}}
}
```

## Acknowledgments

### Training Data
- [Wikipedia](https://huggingface.co/datasets/wikipedia) - Wikimedia Foundation
- [OpenWebText](https://huggingface.co/datasets/openwebtext) - Aaron Gokaslan and Vanya Cohen
- [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) - Ronen Eldan and Yuanzhi Li
- [Alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned) - yahma, a cleaned version of Stanford's Alpaca dataset
- [Databricks Dolly](https://huggingface.co/datasets/databricks/databricks-dolly-15k) - Databricks

### Infrastructure
- Google Cloud Platform (A100 GPU)
- Weights & Biases (experiment tracking)

### Frameworks
- PyTorch
- Hugging Face Tokenizers

## License

MIT License
config.json ADDED
@@ -0,0 +1,21 @@
{
  "model_type": "pebblellm",
  "architectures": ["PebbleLMForCausalLM"],
  "vocab_size": 16384,
  "hidden_size": 1024,
  "num_hidden_layers": 8,
  "num_attention_heads": 16,
  "head_dim": 64,
  "intermediate_size": 4096,
  "max_position_embeddings": 1024,
  "rope_theta": 10000.0,
  "rms_norm_eps": 1e-6,
  "tie_word_embeddings": true,
  "hidden_act": "gelu",
  "dropout": 0.0,
  "attention_dropout": 0.0,
  "torch_dtype": "float16",
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0
}
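
The architecture fields in this config are internally consistent; a quick sanity check (the JSON is inlined here so the snippet is self-contained):

```python
import json

# Verify internal consistency of the architecture configuration.
cfg = json.loads("""{
  "vocab_size": 16384, "hidden_size": 1024,
  "num_hidden_layers": 8, "num_attention_heads": 16,
  "head_dim": 64, "intermediate_size": 4096,
  "max_position_embeddings": 1024
}""")

# Each head covers hidden_size / num_attention_heads dimensions.
assert cfg["num_attention_heads"] * cfg["head_dim"] == cfg["hidden_size"]
# The MLP uses the classic 4x expansion factor.
assert cfg["intermediate_size"] == 4 * cfg["hidden_size"]
print("config is internally consistent")
```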
model.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b0c615f561cd6e88d06db3a8009a5e98c1b879ae97cde55d6863151b603cc5e4
size 469854989
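
The `size` field of this LFS pointer is consistent with ~117.4M parameters stored as float32 (4 bytes each) plus a small amount of checkpoint metadata, which suggests the checkpoint holds float32 weights even though `config.json` declares `torch_dtype: float16`. A quick arithmetic sketch:

```python
# model.pt size check: ~117.4M float32 parameters vs. the LFS pointer size.
params = 117_440_512            # parameter count derived from the architecture
expected = params * 4           # bytes if weights are stored as float32
pointer_size = 469_854_989      # "size" field in the LFS pointer above

print(expected, pointer_size - expected)  # difference is <0.1 MB of metadata
```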
tokenizer.json ADDED
The diff for this file is too large to render.
tokenizer_config.json ADDED
@@ -0,0 +1,9 @@
{
  "vocab_size": 16384,
  "pad_token": "<|pad|>",
  "bos_token": "<|bos|>",
  "eos_token": "<|eos|>",
  "unk_token": "<|unk|>",
  "user_token": "<|user|>",
  "assistant_token": "<|assistant|>"
}