Commit 31618c2 (verified) by Harley-ml · 1 parent: 65c1746

Update README.md

Files changed (1): README.md (+492, −3)

The previous README contained only the three-line `license: mit` front matter; this commit replaces it with the full model card below.
---
license: mit
language:
- en
tags:
- code
- markdown
- tiny
- small
- quick
- fast
- 28M
- mistral
- text-generation-inference
---

# **Mini-MD**

Mini-MD is a **~28M-parameter decoder-only transformer** trained on ~200k Markdown files from GitHub.

## Architecture

| Key | Value |
| :---: | :---: |
| `hidden_size` | 384 |
| `num_layers` | 8 |
| `num_heads` | 6 |
| `num_kv_heads` | 2 |
| `head_dim` | 64 |
| `intermediate_size` | 1536 |
| `vocab_size` | 14002 |
| `sliding_window` | 640 |
| `rope_theta` | 10000.0 |
| `tie_embeddings` | True |
| `total_params` | 28061568 |

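As a quick sanity check (not part of the original card), the reported `total_params` can be reproduced from the table above, assuming a standard Mistral-style block (grouped-query attention, SwiGLU MLP, RMSNorm, no biases) and counting the tied embedding matrix twice, once as the input embedding and once as the LM head:

```python
# Back-of-the-envelope parameter count from the config table above (assumed layer layout).
hidden, layers, heads, kv_heads, head_dim = 384, 8, 6, 2, 64
inter, vocab = 1536, 14002

attn = hidden * heads * head_dim            # q_proj
attn += 2 * hidden * kv_heads * head_dim    # k_proj + v_proj
attn += heads * head_dim * hidden           # o_proj
mlp = 3 * hidden * inter                    # gate_proj + up_proj + down_proj
norms = 2 * hidden                          # input + post-attention RMSNorm

per_layer = attn + mlp + norms              # 2,163,456
total = layers * per_layer + hidden + 2 * vocab * hidden  # + final norm, embeddings, LM head
print(total)  # 28061568
```
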
## Training

### Training Parameters

| Key | Value |
| :---: | :---: |
| `num_epochs` | 3 |
| `batch_size` | 5 |
| `stride` | 620 |
| `seq_len` | 640 |
| `val_split` | 0.09 |
| `learning_rate` | 2e-4 |

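The data-loading code is not included in this card; the sketch below is only a hypothetical illustration of what the `seq_len`/`stride` values above imply: 640-token windows taken every 620 tokens, so consecutive windows overlap by 20 tokens.

```python
# Hypothetical sketch of seq_len/stride windowing (not the actual training pipeline).
def make_windows(token_ids: list[int], seq_len: int = 640, stride: int = 620) -> list[list[int]]:
    # With stride < seq_len, consecutive windows share seq_len - stride = 20 tokens.
    windows = []
    for start in range(0, max(len(token_ids) - seq_len, 0) + 1, stride):
        windows.append(token_ids[start:start + seq_len])
    return windows
```

A trailing fragment shorter than a full window is simply dropped in this sketch; the actual pipeline may pad or pack instead.
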
### Training Results

| `train_loss` | `val_loss` | `step` | `epoch` |
| :---: | :---: | :---: | :---: |
| 6.8138 | 5.7706 | 1200 | 0.02 |
| 2.4274 | 2.5915 | 12000 | 0.24 |
| 2.1519 | 2.2091 | 30000 | 0.59 |
| 2.0411 | 2.0464 | 48000 | 0.95 |
| 1.7728 | 1.8912 | 84000 | 1.66 |
| 1.7304 | 1.8494 | 100800 | 1.99 |
| 1.6394 | 1.7599 | 132000 | 2.60 |
| 1.6794 | 1.7234 | 151200 | 2.98 |

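For reference (not stated in the original card), these cross-entropy losses convert to perplexity via exp(loss), so the final validation loss of 1.7234 corresponds to a validation perplexity of roughly 5.6:

```python
import math
print(math.exp(2.5915))  # ~13.4  validation perplexity at step 12000
print(math.exp(1.7234))  # ~5.6   final validation perplexity at step 151200
```
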
### Hardware

- GPU: a single NVIDIA RTX 2060 (6 GB VRAM, 14 GB shared system memory)
- CPU: AMD Ryzen 5 2600
- RAM: 16 GB

## Generations

Input:

```
# README
```

Output:

```
Cover Evolution

Official documentation for dis curated documentation: cosu:

**README.md** (Recommended for reference)
- **Repository**:
- **Bloomberg**:

**Welcome to Open Questions about Apache Dis curriculum**
- **Security:** All files are securely required through social media.
- **International (FONTLOG)**: Only include relevant code and resources when necessary to see new or better info.
```

---

Input:

```
# Getting Started with Rust
```

Output:

```
You need a Rust crate for your operating system or platform environment, but you can
install the Rust organization for Rust programs, and it's enough you'd really want it
to work with your operating system.

### Building Tools

#### FFI Compatibility

Rust's Rust has a compiler that can handle the following:
- Using a toolchain with the AVX-based kernels
- FFI Compatibility on ARM for iOS, iOS, Robot, etc
- FFI Support from Linux for XPC platforms
```

---

Input:

```
## Installation
```

Output:

```
in this project

For building the Heft and French benchmarks, there are two popular tools you can use to build
various benchmarks:

* [Jim]( and [iL2](
* [Lucid]( is supported, with [t5]( the
  [`LuxSrc32`]( CLI to get a better benchmark

## Usage in this project:

Npm install
[LuxSrc32]( It is recommended to use precompiled GopherLM files
```

---

Input:

```
The quick brown fox
```

Output:

```
ispepper is the skype of the games and you just want to generate them to your be.
To use this you will need a Galicu sensor that has 2x 3x2 pesus.

The other four mandaracters share it - you want the German light/highest food wey no
your browses for game. Geometric, like grove, grove, and bus are pencilumo.
```

---

Input:

```
Hello world
```

Output:

```
I can find a brown, I know it may need the full-fledip data into my phone
buying you...
my first time I went my email, we're going through it.
You should now see you.
Thank you the wonderful! Thanks! It's very cool when the dog would bed.
We can't go through the world until you've started it!
```

---

Input:

```
Once upon a time,
```

Output:

```
all data are moved to any place the user is aware of their operation and then removed.

When the following is present:

* The ability to move state through the same service, which allows
  the ability of clients to move state across a network interface.

During a [DaemonSet controller](/docs/concepts/cluster-administration/deployment/),
all webhooks of the DaemonSet controller may be invoked from a user through a
"Service Cluster" resource without any actions.
```

## Capabilities & Restrictions

### Capabilities

- Continues GitHub-style Markdown documents plausibly, particularly README sections, changelogs, installation guides, and API documentation
- Produces syntactically well-formed code blocks across multiple languages (Python, Rust, Go, C++, JavaScript)
- Sustains a single topic for several paragraphs when the prompt closely matches the training distribution (e.g. `# Getting Started with <common language>`)

### Restrictions

- Not an instruction-following model: it treats all input as a document prefix to continue, not a query to answer
- Out-of-distribution prompts (natural language, fiction, conversation) produce incoherent or nonsensical output
- Prone to topic drift over longer generations, gradually sliding into unrelated documentation
- Prone to repetition loops, particularly on short or ambiguous prompts
- Generates hallucinated URLs, package names, library names, and version numbers with no grounding
- Multilingual output may appear mid-generation, inherited from non-English READMEs in the training corpus; coherence in non-English output is lower than in English
- Not suitable for any production use

## Inference

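For a one-off generation, a minimal sketch (not part of the original card) using the same repo id and default sampling settings as the REPL script that follows:

```python
import torch
from transformers import AutoTokenizer, MistralForCausalLM

tok = AutoTokenizer.from_pretrained("Harley-ml/Mini-MD")
model = MistralForCausalLM.from_pretrained("Harley-ml/Mini-MD").eval()

inputs = tok("# Getting Started with Rust", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128, do_sample=True,
                         temperature=0.9, top_p=0.9,
                         pad_token_id=tok.pad_token_id if tok.pad_token_id is not None else tok.eos_token_id)
print(tok.decode(out[0], skip_special_tokens=True))
```

The full REPL script below adds token streaming and runtime-adjustable sampling parameters.
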
```python
#!/usr/bin/env python3
"""
Tiny Mistral REPL demo — streaming tokens (TextStreamer if available, else manual sampling).
Commands: :quit, :help, :show, :set <param> <value> (max_new_tokens, temperature, top_p, full_output, stream)
"""
from __future__ import annotations
import shlex
import time
import torch
from typing import Optional

from transformers import AutoTokenizer, MistralForCausalLM

# --------- CONFIG ----------
MODEL_DIR = "Harley-ml/Mini-MD"
TOKENIZER_DIR = MODEL_DIR
DEFAULT_MAX_NEW_TOKENS = 640
DEFAULT_TEMPERATURE = 0.9
DEFAULT_TOP_P = 0.9
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
PROMPT = ">>> "
# ---------------------------

def load_tokenizer(path: str):
    print("Loading tokenizer...", path)
    tok = AutoTokenizer.from_pretrained(path, use_fast=True, local_files_only=False)
    if tok.pad_token is None:
        if getattr(tok, "eos_token", None) is not None:
            tok.add_special_tokens({"pad_token": tok.eos_token})
        else:
            tok.add_special_tokens({"pad_token": "<pad>", "eos_token": "</s>"})
    print("Tokenizer ready. vocab_size=", getattr(tok, "vocab_size", "N/A"))
    return tok

def load_model(path: str, device: str):
    print("Loading model...", path)
    model = None
    try:
        desired_dtype = torch.float16 if device.startswith("cuda") else torch.float32
        model = MistralForCausalLM.from_pretrained(path, local_files_only=False, dtype=desired_dtype)
        print("Loaded with dtype arg.")
    except TypeError:
        model = MistralForCausalLM.from_pretrained(path, local_files_only=False)
        print("Loaded without dtype; will convert.")
    except Exception as e:
        print("Load warning, retrying without dtype:", e)
        model = MistralForCausalLM.from_pretrained(path, local_files_only=False)

    try:
        model.to(device)
        if device.startswith("cuda") and next(model.parameters()).dtype != torch.float16:
            model.half()
        if not device.startswith("cuda") and next(model.parameters()).dtype != torch.float32:
            model.to(torch.float32)
    except Exception as e:
        print("Model move/convert warning:", e)

    # Fall back to the EOS token as pad if the checkpoint does not define one.
    if getattr(model.config, "pad_token_id", None) is None:
        model.config.pad_token_id = getattr(model.config, "eos_token_id", None)
    model.eval()
    return model

# Simple nucleus/top-p filtering for a single logits vector
def top_p_filtering(logits: torch.Tensor, top_p: float, min_keep: int = 1) -> torch.Tensor:
    if top_p <= 0 or top_p >= 1.0:
        return logits
    sorted_logits, sorted_idx = torch.sort(logits, descending=True)
    probs = torch.softmax(sorted_logits, dim=-1)
    cumprobs = torch.cumsum(probs, dim=-1)
    cutoff = (cumprobs > top_p).nonzero(as_tuple=False)
    if cutoff.numel() > 0:
        idx = int(cutoff[0].item())
        cutoff_idx = max(idx + 1, min_keep)
    else:
        cutoff_idx = sorted_logits.size(-1)
    mask = torch.ones_like(sorted_logits, dtype=torch.bool)
    mask[cutoff_idx:] = False
    filtered = sorted_logits.masked_fill(~mask, -float("inf"))
    return torch.empty_like(filtered).scatter_(0, sorted_idx, filtered)

# Manual streaming generator (single-batch)
def manual_stream_generate(model, tokenizer, prompt: str, device: str,
                           max_new_tokens: int = 64, temperature: float = 1.0, top_p: float = 0.9,
                           eos_token_id: Optional[int] = None):
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
    input_ids = inputs["input_ids"].to(device)
    attention_mask = inputs.get("attention_mask", None)
    if attention_mask is not None:
        attention_mask = attention_mask.to(device)

    # Prefill: run the whole prompt once, keep the KV cache and the logits for the next token.
    with torch.no_grad():
        out = model(input_ids=input_ids, attention_mask=attention_mask, use_cache=True)
    past = getattr(out, "past_key_values", None)
    logits = out.logits[:, -1, :]  # (batch, vocab)

    # start sampling tokens
    for _ in range(max_new_tokens):
        if temperature != 1.0:
            logits = logits / max(temperature, 1e-8)

        filtered = top_p_filtering(logits[0].cpu(), top_p).to(device)
        probs = torch.nn.functional.softmax(filtered.unsqueeze(0), dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)
        token_id = int(next_token[0, 0].item())

        token_text = tokenizer.decode([token_id], clean_up_tokenization_spaces=False)
        yield token_id, token_text

        if eos_token_id is not None and token_id == eos_token_id:
            break

        # Feed only the newly sampled token back in, reusing the cached keys/values.
        next_input = torch.tensor([[token_id]], dtype=torch.long, device=device)
        with torch.no_grad():
            out = model(input_ids=next_input, past_key_values=past, use_cache=True)
        past = getattr(out, "past_key_values", past)
        logits = out.logits[:, -1, :]

def has_text_streamer():
    try:
        from transformers import TextStreamer  # type: ignore
        return True
    except Exception:
        return False

# tiny REPL state
class State:
    def __init__(self):
        self.max_new_tokens = DEFAULT_MAX_NEW_TOKENS
        self.temperature = DEFAULT_TEMPERATURE
        self.top_p = DEFAULT_TOP_P
        self.full_output = False
        self.stream = True

def handle_generation(model, tokenizer, prompt: str, device: str, state: State):
    eos = getattr(tokenizer, "eos_token_id", None)
    try:
        if has_text_streamer():
            from transformers import TextStreamer
            streamer = TextStreamer(tokenizer, skip_prompt=not state.full_output, skip_special_tokens=True)
            inputs = tokenizer(prompt, return_tensors="pt", truncation=True, add_special_tokens=False)
            inputs = {k: v.to(device) for k, v in inputs.items() if isinstance(v, torch.Tensor)}
            inputs.pop("token_type_ids", None)
            model.generate(**inputs,
                           max_new_tokens=state.max_new_tokens,
                           do_sample=True,
                           temperature=state.temperature,
                           top_p=state.top_p,
                           pad_token_id=tokenizer.pad_token_id,
                           eos_token_id=tokenizer.eos_token_id,
                           streamer=streamer)
            print("")  # newline after streamer
            return
        # fallback: manual streaming
        gen = manual_stream_generate(model, tokenizer, prompt, device,
                                     max_new_tokens=state.max_new_tokens,
                                     temperature=state.temperature,
                                     top_p=state.top_p,
                                     eos_token_id=eos)
        if state.full_output:
            print("PROMPT:", prompt)
        print("GENERATING:", end=" ", flush=True)

        count = 0
        t0 = time.time()
        for _tok_id, tok_text in gen:
            count += 1
            print(tok_text, end="", flush=True)
        print()
        print(f"(generated {count} tokens in {time.time()-t0:.2f}s)")
    except KeyboardInterrupt:
        print("\n[interrupted] Generation aborted by user.")
    except Exception as e:
        print("Generation error:", e)

def repl(model, tokenizer, device):
    state = State()
    help_text = (
        "Commands:\n"
        "  :quit\n"
        "  :help\n"
        "  :show\n"
        "  :set <param> <value>   # params: max_new_tokens, temperature, top_p, full_output, stream\n"
        "  (blank line repeats last prompt)\n"
    )
    print("Tiny Mistral REPL — device:", device)
    print(help_text)
    last = ""
    while True:
        try:
            raw = input(PROMPT).strip()
        except (EOFError, KeyboardInterrupt):
            print("\nExiting.")
            break
        if not raw:
            raw = last
        if not raw:
            continue

        if raw.startswith(":"):
            toks = shlex.split(raw)
            cmd = toks[0].lower()
            if cmd == ":quit":
                print("bye.")
                break
            if cmd == ":help":
                print(help_text); continue
            if cmd == ":show":
                print(f"max_new_tokens={state.max_new_tokens}, temperature={state.temperature}, top_p={state.top_p}, full_output={state.full_output}, stream={state.stream}")
                continue
            if cmd == ":set":
                if len(toks) < 3:
                    print("usage: :set <param> <value>"); continue
                k, v = toks[1], toks[2]
                try:
                    if k == "max_new_tokens":
                        state.max_new_tokens = int(v)
                    elif k == "temperature":
                        state.temperature = float(v)
                    elif k == "top_p":
                        state.top_p = float(v)
                    elif k in ("full_output", "full"):
                        state.full_output = v.lower() in ("1", "true", "yes", "y")
                    elif k == "stream":
                        state.stream = v.lower() in ("1", "true", "yes", "y")
                    else:
                        print("unknown param:", k)
                        continue
                    print("OK.")
                except Exception as e:
                    print("set error:", e)
                continue
            print("unknown command")
            continue

        last = raw
        if state.stream:
            handle_generation(model, tokenizer, raw, device, state)
        else:
            # non-streaming generate
            try:
                inputs = tokenizer(raw, return_tensors="pt", truncation=True, add_special_tokens=False)
                inputs = {k: v.to(device) for k, v in inputs.items() if isinstance(v, torch.Tensor)}
                inputs.pop("token_type_ids", None)
                out = model.generate(**inputs,
                                     max_new_tokens=state.max_new_tokens,
                                     do_sample=True,
                                     temperature=state.temperature,
                                     top_p=state.top_p,
                                     pad_token_id=tokenizer.pad_token_id,
                                     eos_token_id=tokenizer.eos_token_id)
                seq = out[0]
                input_len = inputs["input_ids"].shape[1] if "input_ids" in inputs else 0
                text = tokenizer.decode(seq if state.full_output else seq[input_len:], skip_special_tokens=True)
                print("\nOUTPUT\n", text)
            except Exception as e:
                print("Generation failed:", e)

def main():
    device = DEVICE
    tokenizer = load_tokenizer(TOKENIZER_DIR)
    model = load_model(MODEL_DIR, device)
    repl(model, tokenizer, device)

if __name__ == "__main__":
    main()
```