---
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- llama
- causal-lm
- code-generation
- lightweight
- 3.08B
base_model:
- Qwen/Qwen2.5-Coder-3B-Instruct
---

<p align="center">
  <img alt="HOS-OSS-3.08B" src="https://huggingface.co/hydffgg/HOS-OSS-1.54B/resolve/main/HOS-OSS-270M.png">
</p>


# HOS-OSS-3.08B

HOS-OSS-3.08B is a lightweight 3.08B-parameter causal language model optimized for text and code generation tasks.  
It is designed for fast inference, low resource usage, and local deployment.

---

## 🚀 Overview

- **Model size:** ~3.08B parameters  
- **Architecture:** LLaMA-style decoder-only transformer  
- **Base model:** Qwen2.5-Coder-3B-Instruct (distilled / adapted)  
- **Framework:** 🤗 Transformers  
- **Use cases:**  
  - Code generation  
  - Instruction following  
  - Chat-style completion  
  - Lightweight local AI assistant  
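As a rough sizing guide, the memory taken by the weights follows from the parameter count multiplied by the bytes stored per parameter. The sketch below assumes the ~3.08B figure from this card and the standard storage widths for fp32, fp16/bf16, int8 and 4-bit formats. It counts weights only and ignores the KV cache, activations and framework overhead, so real usage will be higher.

```python
# Rough weight-memory estimate for a ~3.08B-parameter model.
# Assumptions: parameter count taken from this model card; weights only
# (no KV cache, activations, or CUDA/framework overhead are included).

PARAMS = 3.08e9

# Bytes used to store one parameter at common precisions
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}


def weight_memory_gib(params: float, bytes_per_param: float) -> float:
    """Return the raw weight size in GiB (1 GiB = 2**30 bytes)."""
    return params * bytes_per_param / 2**30


if __name__ == "__main__":
    for name, width in BYTES_PER_PARAM.items():
        print(f"{name:>10}: {weight_memory_gib(PARAMS, width):5.2f} GiB")
```

At 16-bit precision this works out to roughly 5.7 GiB of weights, which is why quantized formats are the usual choice when VRAM is tight.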

---

## ⚡ Features

- Fast inference on low-end GPUs
- Runs on Kaggle / Colab without large VRAM
- Suitable for edge deployment
- Clean instruction-response formatting
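The usage example prompts the model with a plain `User:` / `Assistant:` transcript. The helpers below are a hypothetical sketch: `build_prompt` and `extract_reply` are not part of Transformers or this repository. They assemble a prompt in that format, optionally with earlier turns, and pull the assistant's answer out of the decoded output. For a decoder-only model, `generate` returns the prompt tokens followed by the new ones. That this exact template matches the model's training format is an assumption. The base model, Qwen2.5-Coder-3B-Instruct, ships its own chat template that `tokenizer.apply_chat_template` can apply instead.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical plain-text turn markers, mirroring the usage example
USER_TAG = "User:"
ASSISTANT_TAG = "Assistant:"


def build_prompt(
    user_message: str,
    history: Optional[Sequence[Tuple[str, str]]] = None,
) -> str:
    """Build a prompt; `history` holds earlier (user, assistant) turns."""
    lines = []
    for user_turn, assistant_turn in history or []:
        lines.append(f"{USER_TAG} {user_turn}")
        lines.append(f"{ASSISTANT_TAG} {assistant_turn}")
    lines.append(f"{USER_TAG} {user_message}")
    lines.append(ASSISTANT_TAG)  # the model continues generating from here
    return "\n".join(lines)


def extract_reply(decoded: str, prompt: str) -> str:
    """Return the text after the prompt's last 'Assistant:' tag."""
    n_tags = prompt.count(ASSISTANT_TAG)
    # Split at most n_tags times so a reply that repeats the tag stays intact
    reply = decoded.split(ASSISTANT_TAG, n_tags)[-1]
    # The model may run on and invent the next user turn; drop it
    reply = reply.split(f"\n{USER_TAG}", 1)[0]
    return reply.strip()
```

In the usage example, `prompt = build_prompt("Write a Python Hello World")` would replace the hand-written string. `extract_reply(tokenizer.decode(outputs[0], skip_special_tokens=True), prompt)` would then print only the model's answer instead of the echoed prompt.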

---

## 🧠 Example Usage

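```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "hydffgg/HOS-OSS-3.08B"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# torch_dtype="auto" loads the weights in the checkpoint's dtype rather than the default float32
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto").to(device)

# "\n" keeps the two-line prompt inside a single valid string literal
prompt = "User: Write a Python Hello World\nAssistant:"

inputs = tokenizer(prompt, return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,  # temperature only takes effect when sampling is enabled
        temperature=0.7,
    )

# outputs[0] contains the prompt tokens followed by the generated tokens
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```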