Spaces: OpceanAI/Yuuki

Gogs committed 21 days ago

Commit 29256c1 (1 parent: 6f78033)

🌸 Professional Yuuki demo with full UI and stats

Files changed (3)

README.md +44 -10
app.py +287 -54
requirements.txt +4 -0

README.md CHANGED

@@ -1,15 +1,49 @@
 ---
-title: Yuuki
-emoji: 💬
-colorFrom: yellow
-colorTo: purple
 sdk: gradio
-sdk_version: 5.42.0
 app_file: app.py
-pinned: false
-hf_oauth: true
-hf_oauth_scopes:
-- inference-api
 ---
-An example chatbot using [Gradio](https://gradio.app), [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub/v0.22.2/en/index), and the [Hugging Face Inference API](https://huggingface.co/docs/api-inference/index).

 ---
+title: Yuuki - Mobile-Trained Code Generator
+emoji: 🌸
+colorFrom: purple
+colorTo: pink
 sdk: gradio
+sdk_version: 4.44.0
 app_file: app.py
+pinned: true
+license: apache-2.0
+tags:
+  - code-generation
+  - mobile-training
+  - zero-budget
+  - edge-ml
+  - experimental
 ---
+# 🌸 Yuuki - Mobile-Trained Code Generator
+**First LLM trained entirely on a mobile CPU with a $0 budget.**
+Try the live demo above!
+## Features
+- ✅ Agda code generation (best performance: 55/100)
+- ⚠️ Limited C, Assembly support
+- 📱 Trained on Snapdragon 685 CPU
+- 💰 Zero computational cost
+- 🔬 Fully documented research experiment
+## Model
+- **Base:** DistilGPT-2 (82M parameters)
+- **Training:** 2,000 steps (5.3% of v0.1)
+- **Hardware:** Mobile CPU only
+- **Time:** ~50 hours continuous
+- **Cost:** $0.00
+## Links
+- 🤗 [Model Card](https://huggingface.co/OpceanAI/Yuuki-best)
+- 📄 Paper (coming soon)
+- 💻 [Training Code](https://github.com/YuuKi-OS/yuuki-training)
+---
+*Proving the barrier to AI is mindset, not money* 🌸
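
The training figures quoted in this README can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, assuming the 37,500-step total and the ~86 sec/step speed shown in the stats box of `app.py` in this same commit. The variable names are illustrative and do not appear in the repository.

```python
# Figures quoted in this commit (README.md and the app.py stats box).
steps_done = 2_000
steps_total = 37_500     # full v0.1 run, per the stats box
seconds_per_step = 86    # approximate speed, per the stats box

# Share of the planned run completed so far.
fraction_done = steps_done / steps_total

# Wall-clock time implied by the step count and the per-step speed.
hours_elapsed = steps_done * seconds_per_step / 3600

print(f"{fraction_done:.1%} of v0.1")   # 5.3% of v0.1
print(f"{hours_elapsed:.1f} hours")     # 47.8 hours
```

The result of about 47.8 hours is consistent with the "~50 hours continuous" figure, given that the per-step speed is itself approximate.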

app.py CHANGED

@@ -1,70 +1,303 @@
 import gradio as gr
-from huggingface_hub import InferenceClient
-def respond(
-    message,
-    history: list[dict[str, str]],
-    system_message,
-    max_tokens,
-    temperature,
-    top_p,
-    hf_token: gr.OAuthToken,
-):
-    """
-    For more information on `huggingface_hub` Inference API support, please check the docs: https://huggingface.co/docs/huggingface_hub/v0.22.2/en/guides/inference
-    """
-    client = InferenceClient(token=hf_token.token, model="openai/gpt-oss-20b")
-    messages = [{"role": "system", "content": system_message}]
-    messages.extend(history)
-    messages.append({"role": "user", "content": message})
-    response = ""
-    for message in client.chat_completion(
-        messages,
-        max_tokens=max_tokens,
-        stream=True,
-        temperature=temperature,
-        top_p=top_p,
-    ):
-        choices = message.choices
-        token = ""
-        if len(choices) and choices[0].delta.content:
-            token = choices[0].delta.content
-        response += token
-        yield response
 """
-For information on how to customize the ChatInterface, peruse the gradio docs: https://www.gradio.app/docs/chatinterface
-"""
-chatbot = gr.ChatInterface(
-    respond,
-    type="messages",
-    additional_inputs=[
-        gr.Textbox(value="You are a friendly Chatbot.", label="System message"),
-        gr.Slider(minimum=1, maximum=2048, value=512, step=1, label="Max new tokens"),
-        gr.Slider(minimum=0.1, maximum=4.0, value=0.7, step=0.1, label="Temperature"),
-        gr.Slider(
-            minimum=0.1,
-            maximum=1.0,
-            value=0.95,
-            step=0.05,
-            label="Top-p (nucleus sampling)",
-        ),
-    ],
-)
-with gr.Blocks() as demo:
-    with gr.Sidebar():
-        gr.LoginButton()
-    chatbot.render()
 if __name__ == "__main__":
     demo.launch()

 import gradio as gr
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+# ============================================================================
+# 🌸 YUUKI - Mobile-Trained Code Generator
+# ============================================================================
+print("🌸 Loading Yuuki model...")
+print("This may take a minute on first load...")
+try:
+    model = AutoModelForCausalLM.from_pretrained(
+        "OpceanAI/Yuuki-best",
+        torch_dtype=torch.float32,
+        low_cpu_mem_usage=True
+    )
+    tokenizer = AutoTokenizer.from_pretrained("OpceanAI/Yuuki-best")
+    print("✅ Model loaded successfully!")
+except Exception as e:
+    print(f"❌ Error loading model: {e}")
+    model = None
+    tokenizer = None
+# ============================================================================
+# Generation Function
+# ============================================================================
+def generate_code(
+    prompt: str,
+    max_length: int = 100,
+    temperature: float = 0.7,
+    top_p: float = 0.9
+) -> str:
+    """Generate code completion using Yuuki."""
+    if model is None or tokenizer is None:
+        return "❌ Model failed to load. Please refresh the page."
+    if not prompt.strip():
+        return "⚠️ Please enter a code prompt."
+    try:
+        inputs = tokenizer(prompt, return_tensors="pt")
+        with torch.no_grad():
+            outputs = model.generate(
+                **inputs,
+                max_length=max_length,
+                temperature=temperature,
+                top_p=top_p,
+                do_sample=True,
+                pad_token_id=tokenizer.eos_token_id,
+                eos_token_id=tokenizer.eos_token_id
+            )
+        generated = tokenizer.decode(outputs[0], skip_special_tokens=True)
+        return generated
+    except Exception as e:
+        return f"❌ Generation error: {str(e)}"
+# ============================================================================
+# Examples
+# ============================================================================
+examples = [
+    # Agda (best language)
+    ["module Main where", 100, 0.7, 0.9],
+    ["open import Data.Nat", 80, 0.7, 0.9],
+    ["data Bool : Set where", 80, 0.7, 0.9],
+    # C (limited but improving)
+    ["int main() {", 80, 0.7, 0.9],
+    ["#include <stdio.h>", 60, 0.7, 0.9],
+    # Python (weak due to dataset ordering)
+    ["def hello():", 60, 0.8, 0.9],
+    ["import numpy as np", 60, 0.7, 0.9],
+]
+# ============================================================================
+# Custom CSS
+# ============================================================================
+custom_css = """
+#title {
+    text-align: center;
+    background: linear-gradient(90deg, #667eea 0%, #764ba2 100%);
+    -webkit-background-clip: text;
+    -webkit-text-fill-color: transparent;
+    font-size: 3em;
+    font-weight: bold;
+    margin-bottom: 0.5em;
+}
+#subtitle {
+    text-align: center;
+    font-size: 1.3em;
+    color: #666;
+    margin-bottom: 1em;
+}
+#warning-box {
+    background: linear-gradient(135deg, #fff3cd 0%, #ffe8a1 100%);
+    border-left: 4px solid #ffc107;
+    border-radius: 8px;
+    padding: 20px;
+    margin: 20px 0;
+    box-shadow: 0 2px 4px rgba(0,0,0,0.1);
+}
+#stats-box {
+    background: linear-gradient(135deg, #e7f3ff 0%, #cfe7ff 100%);
+    border-left: 4px solid #2196F3;
+    border-radius: 8px;
+    padding: 20px;
+    margin: 20px 0;
+    box-shadow: 0 2px 4px rgba(0,0,0,0.1);
+}
+#achievement-box {
+    background: linear-gradient(135deg, #f0e8ff 0%, #e1d4ff 100%);
+    border-left: 4px solid #9c27b0;
+    border-radius: 8px;
+    padding: 20px;
+    margin: 20px 0;
+    box-shadow: 0 2px 4px rgba(0,0,0,0.1);
+}
+.gr-button-primary {
+    background: linear-gradient(90deg, #667eea 0%, #764ba2 100%) !important;
+    border: none !important;
+    font-weight: bold !important;
+}
+footer {
+    margin-top: 40px;
+    padding-top: 20px;
+    border-top: 1px solid #ddd;
+}
 """
+# ============================================================================
+# Gradio Interface
+# ============================================================================
+with gr.Blocks(css=custom_css, title="🌸 Yuuki - Mobile-Trained Code Generator", theme=gr.themes.Soft()) as demo:
+    # Header
+    gr.Markdown("<h1 id='title'>🌸 Yuuki</h1>")
+    gr.Markdown("<p id='subtitle'>First LLM Trained Entirely on Mobile CPU | Zero-Budget ML Research</p>")
+    # Warning Box
+    gr.Markdown("""
+    <div id='warning-box'>
+    <h3 style='margin-top:0; color:#856404;'>⚠️ Experimental Research Model</h3>
+    <p style='margin-bottom:0;'>
+    Yuuki was trained on a <strong>smartphone CPU</strong> with <strong>$0 budget</strong>.
+    This is a <strong>proof-of-concept</strong> demonstrating mobile training feasibility,
+    not a production-ready code generator.
+    </p>
+    <br>
+    <p style='margin-bottom:0;'>
+    <strong>Best at:</strong> Agda (55/100) •
+    <strong>Limited:</strong> C (20/100), Assembly (15/100) •
+    <strong>Weak:</strong> Python (8/100)
+    </p>
+    </div>
+    """)
+    # Stats Box
+    gr.Markdown("""
+    <div id='stats-box'>
+    <h3 style='margin-top:0; color:#0d47a1;'>📊 Training Statistics</h3>
+    <p style='margin-bottom:5px;'><strong>Hardware:</strong> Snapdragon 685 (CPU only) | <strong>Steps:</strong> 2,000 / 37,500 (5.3%)</p>
+    <p style='margin-bottom:5px;'><strong>Training Time:</strong> ~50 hours continuous | <strong>Speed:</strong> ~86 sec/step</p>
+    <p style='margin-bottom:5px;'><strong>Loss:</strong> 1.94 | <strong>Cost:</strong> $0.00 | <strong>Quality:</strong> 24.6/100 average</p>
+    <p style='margin-bottom:0;'><strong>Status:</strong> Best checkpoint from early training | <strong>Full v0.1:</strong> Coming March 2026</p>
+    </div>
+    """)
+    # Achievement Box
+    gr.Markdown("""
+    <div id='achievement-box'>
+    <h3 style='margin-top:0; color:#6a1b9a;'>🏆 Community Validation</h3>
+    <p style='margin-bottom:5px;'>✅ <strong>Followed by Gradio team member</strong> - recognized for unique approach</p>
+    <p style='margin-bottom:5px;'>✅ <strong>Liked by mradermacher</strong> - quantization expert validated concept</p>
+    <p style='margin-bottom:0;'>✅ <strong>5+ downloads</strong> - early adopters supporting mobile ML training</p>
+    </div>
+    """)
+    # Main Interface
+    with gr.Row():
+        with gr.Column(scale=1):
+            prompt_input = gr.Textbox(
+                label="💻 Code Prompt",
+                placeholder="module Main where",
+                lines=3,
+                info="Try Agda for best results!"
+            )
+            with gr.Accordion("⚙️ Advanced Settings", open=False):
+                max_length = gr.Slider(
+                    minimum=20,
+                    maximum=200,
+                    value=100,
+                    step=10,
+                    label="Max Length",
+                    info="Maximum total length in tokens (prompt + generated)"
+                )
+                temperature = gr.Slider(
+                    minimum=0.1,
+                    maximum=1.5,
+                    value=0.7,
+                    step=0.1,
+                    label="Temperature",
+                    info="Higher = more creative, lower = more conservative"
+                )
+                top_p = gr.Slider(
+                    minimum=0.1,
+                    maximum=1.0,
+                    value=0.9,
+                    step=0.05,
+                    label="Top P",
+                    info="Nucleus sampling parameter"
+                )
+            generate_btn = gr.Button("🚀 Generate Code", variant="primary", size="lg")
+        with gr.Column(scale=1):
+            output = gr.Textbox(
+                label="📝 Generated Code",
+                lines=15,
+                show_copy_button=True
+            )
+    # Examples Section
+    gr.Markdown("### 💡 Try These Examples:")
+    gr.Examples(
+        examples=examples,
+        inputs=[prompt_input, max_length, temperature, top_p],
+        outputs=output,
+        fn=generate_code,
+        cache_examples=False,
+        label="Click any example to try it"
+    )
+    # Generate button action
+    generate_btn.click(
+        fn=generate_code,
+        inputs=[prompt_input, max_length, temperature, top_p],
+        outputs=output
+    )
+    # Footer
+    gr.Markdown("""
+    <footer>
+    ### 🌟 About This Project
+    **Yuuki proves that LLM training is accessible** even with zero budget and consumer hardware.
+    **Why this matters:**
+    - 🎓 **Students** without GPU access can experiment with ML training
+    - 🌍 **Democratizes** ML research globally - barriers are mindset, not money
+    - 📱 **Explores** edge ML training possibilities on mobile devices
+    - 🔬 **Documents** complete training journey including failures and recoveries
+    **Training Journey Highlights:**
+    - Step 1,292: Early peak (loss 1.70, quality 31/100)
+    - Step 1,600: Mode collapse (loss 2.41) 💀
+    - Step 1,900: Recovery begins (loss 1.76)
+    - **Step 2,000: Current best** (loss 1.94, quality 24.6/100) ⭐
+    - Steps 2,100-2,500: Bad data zone (<11/100 quality)
+    **Key Finding:** Dataset quality matters more than loss value. Some checkpoints with excellent
+    loss (1.71) had terrible quality (7/100) due to corrupted training data.
+    ---
+    ### 🔗 Links
+    - 🤗 [Yuuki-best Model](https://huggingface.co/OpceanAI/Yuuki-best) - This checkpoint (recommended)
+    - 📜 [Original Yuuki](https://huggingface.co/OpceanAI/Yuuki) - First upload (historical)
+    - ⏳ Yuuki v0.1 Complete - Coming March 2026 (2 full epochs)
+    - 📄 Research Paper - Coming soon
+    - 💻 [Training Code](https://github.com/YuuKi-OS/yuuki-training)
+    ---
+    <p align="center">
+      <i>Built with patience, a phone, and zero budget</i><br>
+      <b>🌸 Proving the barrier to AI is mindset, not money</b><br><br>
+      Made with ❤️ | Powered by <a href="https://gradio.app">Gradio</a> & <a href="https://huggingface.co">HuggingFace</a>
+    </p>
+    </footer>
+    """)
+# Launch
 if __name__ == "__main__":
     demo.launch()
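
In `model.generate`, `max_length` bounds the prompt plus the continuation, whereas `max_new_tokens` bounds only the continuation. Since `generate_code` forwards the slider value as `max_length`, a long prompt leaves less room for new tokens. The following is a minimal sketch of that budget in plain Python, with no model involved. `new_token_budget` is a hypothetical helper and the prompt lengths are made up for illustration.

```python
def new_token_budget(prompt_tokens: int, max_length: int) -> int:
    """Tokens left for generation when the limit also counts the prompt."""
    return max(0, max_length - prompt_tokens)


# Hypothetical prompt lengths against the slider default of 100.
for prompt_tokens in (5, 60, 120):
    budget = new_token_budget(prompt_tokens, max_length=100)
    print(f"prompt={prompt_tokens:>3} tokens -> up to {budget} new tokens")
```

With the short example prompts in this demo the difference is small, but a prompt longer than the limit leaves no room for generation at all.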

requirements.txt ADDED

@@ -0,0 +1,4 @@
+gradio==4.44.0
+transformers==4.36.0
+torch==2.1.0
+accelerate==0.25.0
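
The `sdk_version` in the README front matter and the `gradio` pin above both read 4.44.0. The following is a minimal sketch of a check that keeps the two in agreement, assuming both files are read as plain text. `parse_pins` and `front_matter_value` are hypothetical helpers, not part of this commit.

```python
from typing import Optional


def parse_pins(requirements_text: str) -> dict[str, str]:
    """Map package name to pinned version for lines of the form name==version."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def front_matter_value(readme_text: str, key: str) -> Optional[str]:
    """Return the value of `key:` inside the leading --- front matter block."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":  # end of the front matter block
            break
        if line.startswith(f"{key}:"):
            return line.split(":", 1)[1].strip()
    return None


# Inline copies of the relevant file contents from this commit.
requirements = "gradio==4.44.0\ntransformers==4.36.0\ntorch==2.1.0\naccelerate==0.25.0\n"
readme = "---\nsdk: gradio\nsdk_version: 4.44.0\napp_file: app.py\n---\n"

pins = parse_pins(requirements)
sdk_version = front_matter_value(readme, "sdk_version")
print(pins["gradio"] == sdk_version)  # True
```

In a real checkout the two strings would come from reading `requirements.txt` and `README.md` instead of the inline copies used here.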