---
license: mit
language:
- en
tags:
- code
- markdown
- tiny
- small
- quick
- fast
- 28M
- mistral
- text-generation-inference
---

# **Mini-MD**

Mini-MD is a **~28M-parameter transformer decoder** trained on ~200k Markdown files from GitHub.

## Architecture

| Key | Value |
| :---: | :---: |
| `hidden_size` | 384 |
| `num_layers` | 8 |
| `num_heads` | 6 |
| `num_kv_heads` | 2 |
| `head_dim` | 64 |
| `intermediate_size` | 1536 |
| `vocab_size` | 14002 |
| `sliding_window` | 640 |
| `rope_theta` | 10000.0 |
| `tie_embeddings` | True |
| `total_params` | 28061568 |
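
For reference, the table above maps onto a Hugging Face `MistralConfig` roughly as follows. This is a sketch of the field correspondence, not the model's actual config file; `tie_embeddings` corresponds to `tie_word_embeddings` in `transformers`:

```python
from transformers import MistralConfig

# Architecture table expressed as a transformers Mistral config.
# num_key_value_heads < num_attention_heads means grouped-query attention
# (3 query heads share each KV head).
# head_dim = hidden_size // num_attention_heads = 384 // 6 = 64.
config = MistralConfig(
    hidden_size=384,
    num_hidden_layers=8,
    num_attention_heads=6,
    num_key_value_heads=2,
    intermediate_size=1536,
    vocab_size=14002,
    sliding_window=640,
    rope_theta=10000.0,
    tie_word_embeddings=True,
)
```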

## Training

### Training Parameters

| Key | Value |
| :---: | :---: |
| `num_epochs` | 3 |
| `batch_size` | 5 |
| `stride` | 620 |
| `seq_len` | 640 |
| `val_split` | 0.09 |
| `learning_rate` | 2e-4 |
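
The `seq_len`/`stride` pair implies overlapping-window chunking: 640-token windows advanced 620 tokens at a time, so consecutive chunks share 20 tokens. A minimal sketch of that chunking, assuming each file is tokenized independently (the function name is illustrative, not from the training code):

```python
def chunk_token_ids(token_ids: list[int], seq_len: int = 640, stride: int = 620) -> list[list[int]]:
    """Split one tokenized file into overlapping training windows.

    With seq_len=640 and stride=620, consecutive windows overlap by 20 tokens,
    so context at chunk boundaries is not completely lost. A trailing remainder
    shorter than seq_len is dropped in this sketch.
    """
    chunks = []
    for start in range(0, max(len(token_ids) - seq_len, 0) + 1, stride):
        chunks.append(token_ids[start:start + seq_len])
    return chunks
```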

### Training Results

| `train_loss` | `val_loss` | `step` | `epoch` |
| :---: | :---: | :---: | :---: |
| 6.8138 | 5.7706 | 1200 | 0.02 |
| 2.4274 | 2.5915 | 12000 | 0.24 |
| 2.1519 | 2.2091 | 30000 | 0.59 |
| 2.0411 | 2.0464 | 48000 | 0.95 |
| 1.7728 | 1.8912 | 84000 | 1.66 |
| 1.7304 | 1.8494 | 100800 | 1.99 |
| 1.6394 | 1.7599 | 132000 | 2.60 |
| 1.6794 | 1.7234 | 151200 | 2.98 |
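
Assuming these are mean token-level cross-entropy losses in nats, the final validation loss corresponds to a perplexity of roughly 5.6:

```python
import math

# Perplexity implied by the final val_loss
# (assumption: mean token-level cross-entropy in nats).
print(math.exp(1.7234))  # ~5.60
```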

### Hardware

- GPU: a single NVIDIA RTX 2060 6GB (14GB shared RAM)
- CPU: AMD Ryzen 5 2600
- RAM: 16GB

## Generations

Input:

```
# README
```

Output:

```
Cover Evolution

Official documentation for dis curated documentation: cosu:

**README.md** (Recommended for reference)
- **Repository**:
- **Bloomberg**:

**Welcome to Open Questions about Apache Dis curriculum**
- **Security:** All files are securely required through social media.
- **International (FONTLOG)**: Only include relevant code and resources when necessary to see new or better info.
```

---

Input:

```
# Getting Started with Rust
```

Output:

```
You need a Rust crate for your operating system or platform environment, but you can
install the Rust organization for Rust programs, and it's enough you'd really want it
to work with your operating system.

### Building Tools

#### FFI Compatibility

Rust's Rust has a compiler that can handle the following:
- Using a toolchain with the AVX-based kernels
- FFI Compatibility on ARM for iOS, iOS, Robot, etc
- FFI Support from Linux for XPC platforms
```

---

Input:

```
## Installation
```

Output:

```
in this project

For building the Heft and French benchmarks, there are two popular tools you can use to build
various benchmarks:

* [Jim]( and [iL2](
* [Lucid]( is supported, with [t5]( the
[`LuxSrc32`]( CLI to get a better benchmark

## Usage in this project:

Npm install
[LuxSrc32]( It is recommended to use precompiled GopherLM files
```

---

Input:

```
The quick brown fox
```

Output:

```
ispepper is the skype of the games and you just want to generate them to your be.
To use this you will need a Galicu sensor that has 2x 3x2 pesus.

The other four mandaracters share it - you want the German light/highest food wey no
your browses for game. Geometric, like grove, grove, and bus are pencilumo.
```

---

Input:

```
Hello world
```

Output:

```
I can find a brown, I know it may need the full-fledip data into my phone
buying you...
my first time I went my email, we're going through it.
You should now see you.
Thank you the wonderful! Thanks! It's very cool when the dog would bed.
We can't go through the world until you've started it!
```

---

Input:

```
Once upon a time,
```

Output:

```
all data are moved to any place the user is aware of their operation and then removed.

When the following is present:

* The ability to move state through the same service, which allows
the ability of clients to move state across a network interface.

During a [DaemonSet controller](/docs/concepts/cluster-administration/deployment/),
all webhooks of the DaemonSet controller may be invoked from a user through a
"Service Cluster" resource without any actions.
```

## Capabilities & Restrictions

### Capabilities

- Continues GitHub-style markdown documents plausibly, particularly README sections, changelogs, installation guides, and API documentation
- Produces syntactically well-formed code blocks across multiple languages (Python, Rust, Go, C++, JavaScript)
- Sustains a single topic for several paragraphs when the prompt closely matches the training distribution (e.g. `# Getting Started with <common language>`)

### Restrictions

- Not an instruction-following model — treats all input as a document prefix to continue, not a query to answer
- Out-of-distribution prompts (natural language, fiction, conversation) produce incoherent or nonsensical output
- Prone to topic drift over longer generations, gradually sliding into unrelated documentation
- Prone to repetition loops, particularly on short or ambiguous prompts
- Generates hallucinated URLs, package names, library names, and version numbers with no grounding
- Multilingual output may appear mid-generation, inherited from non-English READMEs in the training corpus; coherence in non-English output is lower than in English
- Not suitable for any production use
## Inference
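
A minimal non-interactive sketch first, using only the standard `transformers` generate API (the sampling values here are illustrative, not tuned recommendations):

```python
import torch
from transformers import AutoTokenizer, MistralForCausalLM

# Load Mini-MD and continue a markdown prefix.
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("Harley-ml/Mini-MD")
model = MistralForCausalLM.from_pretrained("Harley-ml/Mini-MD").to(device).eval()

inputs = tokenizer("# Getting Started with Rust", return_tensors="pt").to(device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128, do_sample=True,
                         temperature=0.9, top_p=0.9,
                         pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The full REPL script below streams tokens and lets you adjust the sampling parameters at runtime: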

```python
#!/usr/bin/env python3
"""
Tiny Mistral REPL demo — streaming tokens (TextStreamer if available, else manual sampling).
Commands: :quit, :help, :show, :set <param> <value> (max_new_tokens, temperature, top_p, full_output, stream)
"""
from __future__ import annotations
import shlex
import time
from typing import Optional

import torch
from transformers import AutoTokenizer, MistralForCausalLM

# --------- CONFIG ----------
MODEL_DIR = "Harley-ml/Mini-MD"
TOKENIZER_DIR = MODEL_DIR
DEFAULT_MAX_NEW_TOKENS = 640
DEFAULT_TEMPERATURE = 0.9
DEFAULT_TOP_P = 0.9  # nucleus sampling threshold; must lie in (0, 1)
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
PROMPT = ">>> "
# ---------------------------

def load_tokenizer(path: str):
    print("Loading tokenizer...", path)
    tok = AutoTokenizer.from_pretrained(path, use_fast=True, local_files_only=False)
    if tok.pad_token is None:
        if getattr(tok, "eos_token", None) is not None:
            tok.add_special_tokens({"pad_token": tok.eos_token})
        else:
            tok.add_special_tokens({"pad_token": "<pad>", "eos_token": "</s>"})
    print("Tokenizer ready. vocab_size=", getattr(tok, "vocab_size", "N/A"))
    return tok

def load_model(path: str, device: str):
    print("Loading model...", path)
    model = None
    try:
        desired_dtype = torch.float16 if device.startswith("cuda") else torch.float32
        model = MistralForCausalLM.from_pretrained(path, local_files_only=False, dtype=desired_dtype)
        print("Loaded with dtype arg.")
    except TypeError:
        # Older transformers versions don't accept the `dtype` keyword.
        model = MistralForCausalLM.from_pretrained(path, local_files_only=False)
        print("Loaded without dtype; will convert.")
    except Exception as e:
        print("Load warning, retrying without dtype:", e)
        model = MistralForCausalLM.from_pretrained(path, local_files_only=False)

    try:
        model.to(device)
        if device.startswith("cuda") and next(model.parameters()).dtype != torch.float16:
            model.half()
        if not device.startswith("cuda") and next(model.parameters()).dtype != torch.float32:
            model.to(torch.float32)
    except Exception as e:
        print("Model move/convert warning:", e)

    # Ensure a pad token id exists; generate() also receives it explicitly below.
    if getattr(model.config, "pad_token_id", None) is None:
        model.config.pad_token_id = getattr(model.config, "eos_token_id", None)
    model.eval()
    return model

# Simple nucleus/top-p filtering for a single logits vector
def top_p_filtering(logits: torch.Tensor, top_p: float, min_keep: int = 1) -> torch.Tensor:
    if top_p <= 0 or top_p >= 1.0:
        return logits
    sorted_logits, sorted_idx = torch.sort(logits, descending=True)
    probs = torch.softmax(sorted_logits, dim=-1)
    cumprobs = torch.cumsum(probs, dim=-1)
    cutoff = (cumprobs > top_p).nonzero(as_tuple=False)
    if cutoff.numel() > 0:
        idx = int(cutoff[0].item())
        cutoff_idx = max(idx + 1, min_keep)
    else:
        cutoff_idx = sorted_logits.size(-1)
    mask = torch.ones_like(sorted_logits, dtype=torch.bool)
    mask[cutoff_idx:] = False
    filtered = sorted_logits.masked_fill(~mask, -float("inf"))
    return torch.empty_like(filtered).scatter_(0, sorted_idx, filtered)

# Manual streaming generator (single-batch)
def manual_stream_generate(model, tokenizer, prompt: str, device: str,
                           max_new_tokens: int = 64, temperature: float = 1.0, top_p: float = 0.9,
                           eos_token_id: Optional[int] = None):
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
    input_ids = inputs["input_ids"].to(device)
    attention_mask = inputs.get("attention_mask", None)
    if attention_mask is not None:
        attention_mask = attention_mask.to(device)

    # Prefill: run the full prompt once, keeping the KV cache and the logits
    # that already predict the first new token.
    with torch.no_grad():
        out = model(input_ids=input_ids, attention_mask=attention_mask, use_cache=True)
    past = getattr(out, "past_key_values", None)
    logits = out.logits[:, -1, :]  # (batch, vocab)

    for _ in range(max_new_tokens):
        if temperature != 1.0:
            logits = logits / max(temperature, 1e-8)

        filtered = top_p_filtering(logits[0].cpu(), top_p).to(device)
        probs = torch.nn.functional.softmax(filtered.unsqueeze(0), dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)
        token_id = int(next_token[0, 0].item())

        token_text = tokenizer.decode([token_id], clean_up_tokenization_spaces=False)
        yield token_id, token_text

        if eos_token_id is not None and token_id == eos_token_id:
            break

        # Decode step: feed only the newly sampled token; the KV cache covers the rest.
        next_input = torch.tensor([[token_id]], dtype=torch.long, device=device)
        with torch.no_grad():
            out = model(input_ids=next_input, past_key_values=past, use_cache=True)
        past = getattr(out, "past_key_values", past)
        logits = out.logits[:, -1, :]

def has_text_streamer():
    try:
        from transformers import TextStreamer  # type: ignore
        return True
    except Exception:
        return False

# tiny REPL state
class State:
    def __init__(self):
        self.max_new_tokens = DEFAULT_MAX_NEW_TOKENS
        self.temperature = DEFAULT_TEMPERATURE
        self.top_p = DEFAULT_TOP_P
        self.full_output = False
        self.stream = True

def handle_generation(model, tokenizer, prompt: str, device: str, state: State):
    eos = getattr(tokenizer, "eos_token_id", None)
    try:
        if has_text_streamer():
            from transformers import TextStreamer
            streamer = TextStreamer(tokenizer, skip_prompt=not state.full_output, skip_special_tokens=True)
            inputs = tokenizer(prompt, return_tensors="pt", truncation=True, add_special_tokens=False)
            inputs = {k: v.to(device) for k, v in inputs.items() if isinstance(v, torch.Tensor)}
            inputs.pop("token_type_ids", None)
            model.generate(**inputs,
                           max_new_tokens=state.max_new_tokens,
                           do_sample=True,
                           temperature=state.temperature,
                           top_p=state.top_p,
                           pad_token_id=tokenizer.pad_token_id,
                           eos_token_id=tokenizer.eos_token_id,
                           streamer=streamer)
            print("")  # newline after streamer
            return
        # fallback: manual streaming
        gen = manual_stream_generate(model, tokenizer, prompt, device,
                                     max_new_tokens=state.max_new_tokens,
                                     temperature=state.temperature,
                                     top_p=state.top_p,
                                     eos_token_id=eos)
        if state.full_output:
            print("PROMPT:", prompt)
        print("GENERATING:", end=" ", flush=True)

        count = 0
        t0 = time.time()
        for _tok_id, tok_text in gen:
            count += 1
            print(tok_text, end="", flush=True)
        print()
        print(f"(generated {count} tokens in {time.time()-t0:.2f}s)")
    except KeyboardInterrupt:
        print("\n[interrupted] Generation aborted by user.")
    except Exception as e:
        print("Generation error:", e)

def repl(model, tokenizer, device):
    state = State()
    help_text = (
        "Commands:\n"
        "  :quit\n"
        "  :help\n"
        "  :show\n"
        "  :set <param> <value>   # params: max_new_tokens, temperature, top_p, full_output, stream\n"
        "  (blank line repeats last prompt)\n"
    )
    print("Tiny Mistral REPL — device:", device)
    print(help_text)
    last = ""
    while True:
        try:
            raw = input(PROMPT).strip()
        except (EOFError, KeyboardInterrupt):
            print("\nExiting.")
            break
        if not raw:
            raw = last
        if not raw:
            continue

        if raw.startswith(":"):
            toks = shlex.split(raw)
            cmd = toks[0].lower()
            if cmd == ":quit":
                print("bye.")
                break
            if cmd == ":help":
                print(help_text); continue
            if cmd == ":show":
                print(f"max_new_tokens={state.max_new_tokens}, temperature={state.temperature}, top_p={state.top_p}, full_output={state.full_output}, stream={state.stream}")
                continue
            if cmd == ":set":
                if len(toks) < 3:
                    print("usage: :set <param> <value>"); continue
                k, v = toks[1], toks[2]
                try:
                    if k == "max_new_tokens":
                        state.max_new_tokens = int(v)
                    elif k == "temperature":
                        state.temperature = float(v)
                    elif k == "top_p":
                        state.top_p = float(v)
                    elif k in ("full_output", "full"):
                        state.full_output = v.lower() in ("1", "true", "yes", "y")
                    elif k == "stream":
                        state.stream = v.lower() in ("1", "true", "yes", "y")
                    else:
                        print("unknown param:", k)
                        continue
                    print("OK.")
                except Exception as e:
                    print("set error:", e)
                continue
            print("unknown command")
            continue

        last = raw
        if state.stream:
            handle_generation(model, tokenizer, raw, device, state)
        else:
            # non-streaming generate
            try:
                inputs = tokenizer(raw, return_tensors="pt", truncation=True, add_special_tokens=False)
                inputs = {k: v.to(device) for k, v in inputs.items() if isinstance(v, torch.Tensor)}
                inputs.pop("token_type_ids", None)
                out = model.generate(**inputs,
                                     max_new_tokens=state.max_new_tokens,
                                     do_sample=True,
                                     temperature=state.temperature,
                                     top_p=state.top_p,
                                     pad_token_id=tokenizer.pad_token_id,
                                     eos_token_id=tokenizer.eos_token_id)
                seq = out[0]
                input_len = inputs["input_ids"].shape[1] if "input_ids" in inputs else 0
                text = tokenizer.decode(seq if state.full_output else seq[input_len:], skip_special_tokens=True)
                print("\nOUTPUT\n", text)
            except Exception as e:
                print("Generation failed:", e)

def main():
    device = DEVICE
    tokenizer = load_tokenizer(TOKENIZER_DIR)
    model = load_model(MODEL_DIR, device)
    repl(model, tokenizer, device)

if __name__ == "__main__":
    main()
```