hydffgg committed on
Commit 2ac8e38 · verified
1 Parent(s): 65f7325

Upload folder using huggingface_hub

Files changed (4)
  1. README.md +76 -0
  2. config.json +32 -0
  3. generation_config.json +9 -0
  4. model.safetensors +3 -0
README.md ADDED
@@ -0,0 +1,76 @@
+
+ ---
+ language:
+ - en
+ license: apache-2.0
+ library_name: transformers
+ pipeline_tag: text-generation
+ tags:
+ - llama
+ - causal-lm
+ - code-generation
+ - lightweight
+ - 3.08B
+ base_model:
+ - Qwen/Qwen2.5-Coder-3B-Instruct
+ ---
+
+ <p align="center">
+   <img alt="HOS-OSS-3.08B" src="https://huggingface.co/hydffgg/HOS-OSS-1.54B/resolve/main/HOS-OSS-270M.png">
+ </p>
+
+
+ # HOS-OSS-3.08B
+
+ HOS-OSS-3.08B is a lightweight 3.08B-parameter causal language model optimized for text and code generation tasks.
+ It is designed for fast inference, low resource usage, and local deployment.
+
+ ---
+
+ ## 🚀 Overview
+
+ - **Model size:** ~3.08B parameters
+ - **Architecture:** LLaMA-style decoder-only transformer
+ - **Base model:** Qwen2.5-Coder-3B-Instruct (distilled / adapted)
+ - **Framework:** 🤗 Transformers
+ - **Use cases:**
+   - Code generation
+   - Instruction following
+   - Chat-style completion
+   - Lightweight local AI assistant
+
+ ---
+
+ ## ⚡ Features
+
+ - Fast inference on low-end GPUs
+ - Runs on Kaggle / Colab without large VRAM
+ - Suitable for edge deployment
+ - Clean instruction-response formatting
+
+ ---
+
+ ## 🧠 Example Usage
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ import torch
+
+ model_name = "hydffgg/HOS-OSS-3.08B"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(model_name)
+
+ prompt = "User: Write a Python Hello World\nAssistant:"  # "\n" separates the two turns
+ inputs = tokenizer(prompt, return_tensors="pt")
+
+ with torch.no_grad():  # inference only, no gradients needed
+     outputs = model.generate(
+         **inputs,
+         max_new_tokens=512,
+         do_sample=True,  # temperature is only applied when sampling is enabled
+         temperature=0.7,
+     )
+
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
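The usage example passes `temperature=0.7` to `generate`. In Transformers this setting only has an effect when sampling is enabled (`do_sample=True`); with greedy decoding it is ignored. As a rough illustration of what the parameter does, the following standard-library sketch divides logits by the temperature before applying softmax. The helper names and the toy logits are hypothetical and are not part of this repository or of the Transformers API.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw logits into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical scores for a three-token vocabulary.
toy_logits = [2.0, 1.0, 0.1]
probs_default = softmax_with_temperature(toy_logits, 1.0)
probs_sharp = softmax_with_temperature(toy_logits, 0.7)
```

A temperature below 1 concentrates probability on the highest-scoring token, so `probs_sharp[0]` is larger than `probs_default[0]`; values above 1 flatten the distribution.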
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "architectures": [
+     "LlamaForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 1,
+   "dtype": "float32",
+   "eos_token_id": 2,
+   "head_dim": 128,
+   "hidden_act": "silu",
+   "hidden_size": 3072,
+   "initializer_range": 0.02,
+   "intermediate_size": 8192,
+   "max_position_embeddings": 2048,
+   "mlp_bias": false,
+   "model_type": "llama",
+   "num_attention_heads": 24,
+   "num_hidden_layers": 28,
+   "num_key_value_heads": 24,
+   "pad_token_id": null,
+   "pretraining_tp": 1,
+   "rms_norm_eps": 1e-06,
+   "rope_parameters": {
+     "rope_theta": 10000.0,
+     "rope_type": "default"
+   },
+   "tie_word_embeddings": false,
+   "transformers_version": "5.2.0",
+   "use_cache": true,
+   "vocab_size": 32000
+ }
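The fields in config.json are enough to estimate the parameter count by hand. The sketch below assumes the usual LLaMA layout: four attention projections and three MLP projections per layer, two RMSNorm weight vectors per layer plus one final norm, and no bias terms (consistent with `attention_bias` and `mlp_bias` being false). The function name is hypothetical, and the result is an estimate from the config, not a count read from the checkpoint.

```python
# Values copied from the config.json in this commit.
config = {
    "hidden_size": 3072,
    "intermediate_size": 8192,
    "num_hidden_layers": 28,
    "num_attention_heads": 24,
    "num_key_value_heads": 24,
    "head_dim": 128,
    "vocab_size": 32000,
    "tie_word_embeddings": False,
}


def count_llama_parameters(cfg):
    """Estimate parameters of a bias-free LLaMA-style decoder from its config."""
    h = cfg["hidden_size"]
    q_dim = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]
    # q_proj and o_proj map between h and q_dim; k_proj and v_proj map h to kv_dim.
    attention = 2 * h * q_dim + 2 * h * kv_dim
    # gate_proj, up_proj and down_proj.
    mlp = 3 * h * cfg["intermediate_size"]
    # Input and post-attention RMSNorm weights.
    norms = 2 * h
    per_layer = attention + mlp + norms
    embeddings = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * h
    final_norm = h
    return cfg["num_hidden_layers"] * per_layer + embeddings + lm_head + final_norm


total_params = count_llama_parameters(config)
float32_bytes = total_params * 4  # "dtype": "float32" means 4 bytes per weight
```

Under these assumptions the total comes to 3,367,676,928 parameters, about 3.37B, which is somewhat above the ~3.08B stated in the README. At 4 bytes per weight that is 13,470,707,712 bytes, within about 30 KB of the model.safetensors size recorded in this commit.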
generation_config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "output_attentions": false,
+   "output_hidden_states": false,
+   "transformers_version": "5.2.0",
+   "use_cache": true
+ }
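With `"use_cache": true`, generation keeps the key and value tensors of every layer in memory so earlier tokens are not recomputed. A back-of-the-envelope estimate of that cache follows, using the layer, head, and context values from config.json. The function name and the choice of element sizes are assumptions for illustration; real memory use also depends on the cache implementation and the dtype the model is actually loaded in.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=4):
    """Approximate size of a dense key/value cache in bytes."""
    # Factor 2: one tensor for keys and one for values in every layer.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


# 28 layers, 24 KV heads, head_dim 128, full 2048-token context.
cache_fp32 = kv_cache_bytes(28, 24, 128, 2048, bytes_per_value=4)
cache_fp16 = kv_cache_bytes(28, 24, 128, 2048, bytes_per_value=2)

cache_fp32_gib = cache_fp32 / 2**30
cache_fp16_gib = cache_fp16 / 2**30
```

Because `num_key_value_heads` equals `num_attention_heads` (24), the config uses no grouped-query attention, so the cache is not reduced relative to full multi-head attention. For one sequence at the full 2048-token context this estimate gives about 1.31 GiB in float32 and about 0.66 GiB in float16.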
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:36a783429a57fb5f06e205b47ddfcf68abfe893c773c9e757e57abca76993182
+ size 13470737168
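The three lines above are a Git LFS pointer, not the weights themselves: they record the spec version, the SHA-256 digest of the real file, and its size in bytes. A minimal sketch of reading such a pointer and checking a downloaded file against it is shown below. Both function names are hypothetical, and the verification helper is defined but not called because it needs a local copy of the file.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Split a Git LFS pointer into its version, hash algorithm, digest and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


def file_matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and SHA-256 the pointer records."""
    if os.path.getsize(path) != pointer["size_bytes"]:
        return False
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:36a783429a57fb5f06e205b47ddfcf68abfe893c773c9e757e57abca76993182
size 13470737168
"""

pointer = parse_lfs_pointer(pointer_text)
size_gib = pointer["size_bytes"] / 2**30
```

The recorded size works out to roughly 12.5 GiB, which is the amount of disk space a full float32 download of `model.safetensors` would need.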