PLTAT/Tools / PLTAT_Chat_Server_with__Cloud_v2.ipynb
{
"cells": [
{
"cell_type": "markdown",
"source": [
"\n",
"# Philippine Languages Translation and AI Training Community\n",
"\n",
"This organization is dedicated to the development of high-performance natural language processing (NLP) architectures for the major and regional languages of the Philippines. Our objective is to bridge the digital divide for low-resource languages through state-of-the-art model alignment, knowledge distillation, and the deployment of efficient, edge-ready AI models.\n",
"\n",
"\n",
"<details>\n",
"<summary>Click to view our Technical Roadmap </summary>\n",
"\n",
"## Technical Roadmap\n",
"\n",
"### Phase 1: Foundation Model Alignment and NMT Parity\n",
"**Objective:** Finetune large-scale transformer architectures (Llama 3.1/3.2 series) to achieve Neural Machine Translation (NMT) parity with commercial benchmarks for the eight major Philippine languages.\n",
"* **Technical Detail:** Implementation of Supervised Fine-Tuning (SFT) using high-quality parallel corpora and instruction-tuning datasets. This phase utilizes QLoRA and full-parameter tuning to optimize for Tagalog, Cebuano, Ilocano, Hiligaynon, Bicolano, Waray, Kapampangan, and Pangasinan.\n",
"* **Milestone:** Validated \"Teacher\" models capable of high-fidelity translation and complex instruction following, serving as the performance baseline for subsequent distillation.\n",
"\n",
"### Phase 2: Knowledge Distillation and Synthetic Corpus Generation\n",
"**Objective:** Utilize Phase 1 models as high-capacity Teacher models to generate high-density synthetic training data for low-resource linguistic variants.\n",
"* **Technical Detail:** Leveraging the Teacher models to perform Knowledge Distillation (KD) by generating synthetic instruction-response pairs and reasoning chains. This mitigates the scarcity of organic digital text in regional dialects and provides the required data density for training smaller student architectures without performance degradation.\n",
"* **Milestone:** A comprehensive multi-language synthetic dataset optimized for training sub-3B parameter models.\n",
"\n",
"### Phase 3: LFM 2.5 Implementation and Language Specialization\n",
"**Objective:** Train and specialize Liquid Foundation Model (LFM) 2.5 architectures to create lightweight, language-specific models.\n",
"* **Technical Detail:** Transitioning from standard Transformers to LFM 2.5 allows for linear scaling and reduced memory footprints. We use the distilled datasets from Phase 2 to train \"Student\" models that replicate the output distribution of the larger Llama models. Final optimization includes Direct Preference Optimization (DPO) to refine cultural and grammatical nuance for each specific language.\n",
"* **Milestone:** A suite of specialized, deployment-ready models (1.2B to 3B parameters) optimized for edge computing and local hardware integration.\n",
"\n",
"</details>\n",
"\n",
"---\n",
"\n",
"## Stakeholder Engagement and Collaboration\n",
"\n",
"<details>\n",
" \n",
"<summary>\n",
"The community is actively seeking institutional and technical stakeholders to assist in the scaling, adoption, and operationalization of these models.\n",
"</summary>\n",
" \n",
"### Call for Partners\n",
"* **Compute Provisioning:** We are seeking partners to provide GPU resources (A100/H100 clusters) required for the heavy compute cycles in Phase 1 and Phase 2.\n",
"* **Domain-Specific Finetuning:** We invite organizations to adopt and finetune our existing foundation models for specialized sectors, including legal, medical, and governmental services.\n",
"* **Validation and Evaluation:** We are looking for academic and linguistic experts to conduct rigorous human evaluation and Red Teaming to ensure model safety and linguistic accuracy across regional variants.\n",
"* **Deployment Integration:** We seek partners interested in integrating these lightweight models into mobile applications or environments with limited connectivity.\n",
"\n",
"Interested parties may reach out via the Hugging Face discussion board or review our current repository of model weights and datasets.\n",
"\n",
"</details>"
],
"metadata": {
"id": "-NMUgRUSwJd4"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "PDOL1O-ttc3d",
"collapsed": true
},
"outputs": [],
"source": [
"#@title ⚙️ Step 1: Install Ollama, Ngrok, and QR Code Generator\n",
"!sudo apt-get install -y zstd lshw\n",
"!curl -fsSL https://ollama.com/install.sh | sh\n",
"!pip install ollama pyngrok qrcode[pil]\n",
"\n",
"import os\n",
"import subprocess\n",
"import time\n",
"\n",
"# GPU settings: keep Ollama's experimental Intel GPU support off and expose the\n",
"# first CUDA device. The `ollama serve` process started in the next cell inherits these.\n",
"os.environ[\"OLLAMA_INTEL_GPU\"] = \"0\"\n",
"os.environ[\"CUDA_VISIBLE_DEVICES\"] = \"0\"\n",
"\n",
"# CRITICAL: Allow CORS so the React web app can communicate with Ollama\n",
"os.environ[\"OLLAMA_ORIGINS\"] = \"*\"\n",
"\n",
"print(\"✅ Ollama installed; GPU and CORS environment variables set. Run the next cell to start the server.\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "616695c3"
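},
"outputs": [],
"source": [
"#@title Run Ollama Server\n",
"import subprocess\n",
"import os\n",
"import time\n",
"import urllib.request\n",
"import ollama\n",
"\n",
"# Set environment variables to allow outside connections and CORS\n",
"os.environ[\"OLLAMA_ORIGINS\"] = \"*\"\n",
"os.environ[\"OLLAMA_HOST\"] = \"0.0.0.0\"\n",
"\n",
"# Start ollama serve as a detached background process\n",
"with open(\"ollama.log\", \"w\") as log_file:\n",
"    subprocess.Popen([\"ollama\", \"serve\"], stdout=log_file, stderr=log_file, preexec_fn=os.setpgrp)\n",
"\n",
"# Poll the API until the server answers (up to ~30 s) instead of sleeping a fixed time\n",
"print(\"⏳ Waiting for the Ollama server to initialize...\")\n",
"for _ in range(30):\n",
"    try:\n",
"        urllib.request.urlopen(\"http://127.0.0.1:11434/api/version\", timeout=2)\n",
"        print(\"✅ Ollama is now running!\")\n",
"        break\n",
"    except OSError:\n",
"        time.sleep(1)\n",
"else:\n",
"    print(\"❌ Ollama did not respond within ~30 s; check ollama.log\")\n"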
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "YK3NDlxHU106"
},
"outputs": [],
"source": [
"#@title Optional: Cloud models for GPU-less App testing only (Needs Ollama Account)\n",
"print(\"Ignore any error message about XDG: Colab runs in a browser, so the sign-in link cannot open automatically. Open it manually instead.\")\n",
"!ollama signin\n",
"input('Press Enter once you have finished signing in with the generated link.')\n",
"cloud_models = [\n",
"    \"nemotron-3-super:cloud\",\n",
"    \"gemini-3-flash-preview:cloud\",\n",
"    \"glm-4.6:cloud\",\n",
"    \"deepseek-v3.2:cloud\",\n",
"    \"glm-4.7:cloud\",\n",
"    \"kimi-k2.5:cloud\",\n",
"    \"minimax-m2.7:cloud\",\n",
"    \"gpt-oss:120b-cloud\",\n",
"    \"devstral-small-2:24b-cloud\",\n",
"    \"ministral-3:14b-cloud\",\n",
"    \"glm-5:cloud\",\n",
"    \"qwen3.5:397b-cloud\",\n",
"    \"minimax-m2:cloud\",\n",
"    \"minimax-m2.5:cloud\",\n",
"    \"qwen3-coder-next:cloud\",\n",
"    \"qwen2.5-coder:3b\"  # note: a small local model (full download), not a cloud model\n",
"]\n",
"\n",
"print(\"📥 Starting to pull cloud models...\")\n",
"for model in cloud_models:\n",
"    print(f\"Pulling {model}...\")\n",
"    try:\n",
"        ollama.pull(model)\n",
"        print(f\"✅ Success: {model}\")\n",
"    except Exception as e:\n",
"        print(f\"❌ Failed to pull {model}: {e}\")\n",
"\n",
"print(\"✅ All cloud models processed!\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "GyDo2pwktlmC",
"collapsed": true
},
"outputs": [],
"source": [
"# @title ⚙️ Model Selector & Downloader (Select only 1 for fastest inference)\n",
"import ollama\n",
"\n",
"def ensure_model(model_name):\n",
"    \"\"\"Checks if a model exists locally, pulls it if not.\"\"\"\n",
"    print(f\"🔍 Checking for model: {model_name}...\")\n",
"    try:\n",
"        ollama.show(model_name)\n",
"        print(f\"✅ Model '{model_name}' is already installed.\")\n",
"    except ollama.ResponseError:\n",
"        print(f\"📥 Model not found. Starting download for '{model_name}'...\")\n",
"        for progress in ollama.pull(model_name, stream=True):\n",
"            # Only layer-download events carry byte counts\n",
"            if 'completed' in progress and 'total' in progress:\n",
"                percent = (progress['completed'] / progress['total']) * 100\n",
"                print(f\"\\rDownloading: {percent:.1f}%\", end=\"\")\n",
"        print(f\"\\n✅ Successfully pulled {model_name}\")\n",
"\n",
"# --- PLTAT FT Models ---\n",
"# @markdown ### 🏷️ PLTAT FT Models\n",
"filipino = True # @param {type:\"boolean\"}\n",
"kapampangan = False # @param {type:\"boolean\"}\n",
"bicolano = False # @param {type:\"boolean\"}\n",
"pangasinan = False # @param {type:\"boolean\"}\n",
"cebuano = False # @param {type:\"boolean\"}\n",
"ilocano = False # @param {type:\"boolean\"}\n",
"hiligaynon = False # @param {type:\"boolean\"}\n",
"translate_gemma = False # @param {type:\"boolean\"}\n",
"\n",
"# Map each Ollama model name to the value of its checkbox\n",
"model_map = {\n",
"    \"welyjesch/filipino-llama-3:8b-q8_0\": filipino,\n",
"    \"welyjesch/kapampangan-llama-3:8b-q8_0\": kapampangan,\n",
"    \"welyjesch/bicolano-llama-3:8b-q8_0\": bicolano,\n",
"    \"welyjesch/pangasinan-llama-3:8b-q8_0\": pangasinan,\n",
"    \"welyjesch/cebuano-llama-3:8b-q8_0\": cebuano,\n",
"    \"welyjesch/ilocano-llama-3:8b-q8_0\": ilocano,\n",
"    \"welyjesch/hiligaynon-llama-3:8b-q8_0\": hiligaynon,\n",
"    \"translategemma:4b\": translate_gemma\n",
"}\n",
"\n",
"# Iterate and ensure each selected model is pulled\n",
"selected_count = 0\n",
"for model_name, is_selected in model_map.items():\n",
"    if is_selected:\n",
"        ensure_model(model_name)\n",
"        selected_count += 1\n",
"\n",
"if selected_count == 0:\n",
"    print(\"⚠️ No models selected. Please check at least one model in the form.\")\n",
"else:\n",
"    print(f\"\\n✅ Done! {selected_count} model(s) are ready to use.\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-f7eJhEwvhuS"
},
"source": [
"**IMPORTANT:** You need an ngrok authtoken to use the next cell. Sign up for a free account at ngrok.com, then copy your key from the Auth Token section of the ngrok dashboard."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "ZMesiuh7tpFm"
},
"outputs": [],
"source": [
"#@title ☁️ Step 3: Expose Server & Generate QR Code\n",
"### Cell 3: Expose via Ngrok & Generate QR Code\n",
"from pyngrok import ngrok\n",
"import qrcode\n",
"from IPython.display import display\n",
"\n",
"# @markdown Enter your ngrok authtoken (get it from https://dashboard.ngrok.com/get-started/your-authtoken)\n",
"ngrok_authtoken = \"\" # @param {type:\"string\"}\n",
"\n",
"if ngrok_authtoken:\n",
"    ngrok.set_auth_token(ngrok_authtoken)\n",
"else:\n",
"    print(\"☀️ WARNING: No ngrok authtoken provided. The tunnel might time out or fail to start.\")\n",
"    print(\"Get your free token at https://dashboard.ngrok.com/get-started/your-authtoken\")\n",
"\n",
"# Terminate any existing ngrok tunnels\n",
"ngrok.kill()\n",
"\n",
"# Open a tunnel to the Ollama port (11434)\n",
"public_url = ngrok.connect(11434).public_url\n",
"print(f\"✔️ Public URL: {public_url}\")\n",
"print(\"✒️ Scan the QR code below in the PLTAT PHILCHAT app (Settings → Provider Settings, then the QR button after the input box):\")\n",
"\n",
"# Generate QR code\n",
"qr = qrcode.QRCode(\n",
"    version=1,\n",
"    error_correction=qrcode.constants.ERROR_CORRECT_L,\n",
"    box_size=10,\n",
"    border=4,\n",
")\n",
"qr.add_data(public_url)\n",
"qr.make(fit=True)\n",
"\n",
"img = qr.make_image(fill_color=\"black\", back_color=\"white\")\n",
"display(img)\n",
"\n",
"# Create a test HTML file to verify the endpoint\n",
"html_content = f\"\"\"\n",
"<!DOCTYPE html>\n",
"<html>\n",
"<head><title>Ollama API Test</title></head>\n",
"<body>\n",
"  <h1>Ollama API Test</h1>\n",
"  <p>Target: <b>{public_url}</b></p>\n",
"  <button onclick=\"testConnection()\">Test Connection</button>\n",
"  <pre id=\"output\">Click the button to test...</pre>\n",
"  <script>\n",
"    async function testConnection() {{\n",
"      const output = document.getElementById('output');\n",
"      output.innerText = \"Sending request...\";\n",
"      try {{\n",
"        const response = await fetch('{public_url}/api/tags', {{\n",
"          headers: {{\n",
"            'ngrok-skip-browser-warning': 'true'\n",
"          }}\n",
"        }});\n",
"        const data = await response.json();\n",
"        output.innerText = \"Success! Found models:\\\\n\" + JSON.stringify(data, null, 2);\n",
"      }} catch (err) {{\n",
"        output.innerText = \"Error: \" + err.message + \"\\\\nCheck if CORS is enabled and ngrok is running.\";\n",
"      }}\n",
"    }}\n",
"  </script>\n",
"</body>\n",
"</html>\n",
"\"\"\"\n",
"\n",
"with open(\"test_ollama.html\", \"w\") as f:\n",
"    f.write(html_content)\n",
"\n",
"print(\"✅ Saved 'test_ollama.html' with the ngrok skip-warning header; open it in a browser to test the endpoint.\")"
]
},
{
"cell_type": "code",
"source": [
"#@title Optional Step 4: Chat with the server\n",
"import requests\n",
"import json\n",
"\n",
"# @markdown Toggle this to connect to the local server (127.0.0.1) if Ngrok is unavailable\n",
"use_local_server = True # @param {type:\"boolean\"}\n",
"# @markdown Note: locally hosted models such as the PLTAT models respond slowly, especially to the first message (1-3 mins).\n",
"\n",
"def start_chat():\n",
"    if use_local_server:\n",
"        base_url = \"http://127.0.0.1:11434\"\n",
"        print(f\"📡 Connecting to Local Server: {base_url}\")\n",
"    else:\n",
"        # Try to get the public_url from the global namespace if it was defined in Step 3\n",
"        try:\n",
"            base_url = public_url\n",
"            print(f\"📡 Connecting to Ngrok: {base_url}\")\n",
"        except NameError:\n",
"            base_url = input(\"🔗 Enter your Ngrok Public URL (e.g., https://...): \").strip()\n",
"\n",
"    if not base_url:\n",
"        print(\"❌ No URL provided. Please run the Ngrok cell first or use the local server.\")\n",
"        return\n",
"\n",
"    # 1. Fetch available models\n",
"    try:\n",
"        # Skips ngrok's browser-warning interstitial on free-tier tunnels\n",
"        headers = {'ngrok-skip-browser-warning': 'true'} if not use_local_server else {}\n",
"        response = requests.get(f\"{base_url}/api/tags\", headers=headers, timeout=30)\n",
"        response.raise_for_status()\n",
"        models = response.json().get('models', [])\n",
"\n",
"        if not models:\n",
"            print(\"⚠️ No models found on the server. Please pull a model first.\")\n",
"            return\n",
"\n",
"        print(\"\\n📦 Available Models:\")\n",
"        for i, m in enumerate(models):\n",
"            print(f\"[{i}] {m['name']}\")\n",
"\n",
"        choice = int(input(\"\\nSelect model index to use: \"))\n",
"        selected_model = models[choice]['name']\n",
"        print(f\"✅ Selected: {selected_model}\\n--- Type 'exit' or 'quit' to stop ---\")\n",
"\n",
"    except Exception as e:\n",
"        print(f\"❌ Connection Error: {e}\")\n",
"        return\n",
"\n",
"    # 2. Chat loop (the full history is resent each turn so the model keeps context)\n",
"    messages = []\n",
"    while True:\n",
"        user_input = input(\"👤 You: \")\n",
"        if user_input.lower() in ['exit', 'quit']:\n",
"            print(\"👋 Goodbye!\")\n",
"            break\n",
"\n",
"        messages.append({'role': 'user', 'content': user_input})\n",
"\n",
"        try:\n",
"            # Non-streaming request: the server returns the whole reply in one JSON object\n",
"            payload = {\"model\": selected_model, \"messages\": messages, \"stream\": False}\n",
"            resp = requests.post(f\"{base_url}/api/chat\", json=payload, headers=headers)\n",
"            resp.raise_for_status()\n",
"\n",
"            reply = resp.json()['message']['content']\n",
"            print(f\"🤖 Assistant: {reply}\")\n",
"            messages.append({'role': 'assistant', 'content': reply})\n",
"\n",
"        except Exception as e:\n",
"            print(f\"\\n❌ Chat error: {e}\")\n",
"            break\n",
"\n",
"if __name__ == \"__main__\":\n",
"    start_chat()"
],
"metadata": {
"cellView": "form",
"id": "ukQRNfjA4hFb"
},
"execution_count": null,
"outputs": []
}
],
"metadata": {
"colab": {
"provenance": [],
"gpuType": "T4"
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
},
"accelerator": "GPU"
},
"nbformat": 4,
"nbformat_minor": 0
}
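
Phase 3 of the roadmap trains "Student" models to replicate the output distribution of the larger Llama Teacher models. One common way to train for that is token-level knowledge distillation: the loss is the KL divergence between temperature-softened Teacher and Student distributions, scaled by T². The snippet below is a minimal, dependency-free sketch of that loss under stated assumptions, not the project's training code: the roadmap does not say which distillation loss it uses, the function names are hypothetical, and it assumes Teacher and Student share one vocabulary, which is unlikely to hold between Llama and LFM models. Across different tokenizers, the sequence-level route described in Phase 2 (training the Student on Teacher-generated text) is the usual alternative. A real trainer would compute this over tensors in a framework such as PyTorch.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled log-softmax, computed stably via the log-sum-exp trick."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return [z - log_norm for z in scaled]


def distillation_loss(teacher_logits: Sequence[float],
                      student_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """Soft-target KD loss for one token position: T^2 * KL(teacher || student).

    Hypothetical helper for illustration; assumes a shared vocabulary.
    """
    if len(teacher_logits) != len(student_logits):
        # Token-level KD only makes sense when both models score the same vocabulary
        raise ValueError("Teacher and Student logits must cover the same vocabulary")
    log_p = log_softmax(teacher_logits, temperature)  # softened Teacher distribution
    log_q = log_softmax(student_logits, temperature)  # softened Student distribution
    # KL(p || q) = sum_i p_i * (log p_i - log q_i)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    # Multiplying by T^2 keeps gradient magnitudes comparable across temperatures
    return temperature ** 2 * kl


# Example: one token position over a toy 3-word vocabulary
print(distillation_loss([3.0, 1.0, 0.0], [0.0, 1.0, 3.0]))
```

In practice this quantity is averaged over every token position of the training sequences and is often mixed with the ordinary cross-entropy loss on the reference tokens.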

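Phase 3 also names Direct Preference Optimization (DPO) as the final refinement step. For one preference pair, a preferred response y_w and a rejected response y_l to the same prompt x, the standard DPO loss is -log σ(β[(log π_θ(y_w|x) - log π_ref(y_w|x)) - (log π_θ(y_l|x) - log π_ref(y_l|x))]), where π_θ is the model being trained and π_ref is a frozen reference copy. The sketch below computes that scalar from summed sequence log-probabilities. The function name `dpo_loss` and the default β = 0.1 are illustrative assumptions, not values taken from the roadmap; a real run would batch this over tensors.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair (hypothetical helper for illustration).

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp        # log pi(y_w|x) - log pi_ref(y_w|x)
    rejected_margin = policy_rejected_logp - ref_rejected_logp  # log pi(y_l|x) - log pi_ref(y_l|x)
    x = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) equals softplus(-x); this form never overflows exp()
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# Policy already favours the preferred answer more than the reference does
print(dpo_loss(-5.0, -20.0, -10.0, -10.0))
```

Equal margins give exactly log 2 ≈ 0.693; the loss falls below that as the policy raises the preferred response's likelihood relative to the rejected one, measured against the reference model.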

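The Step 4 chat cell sends `"stream": False`, so nothing is printed until the whole reply has been generated. With `"stream": true`, Ollama's `/api/chat` instead returns newline-delimited JSON objects, each carrying a text fragment in `message.content`, with `"done": true` on the last one. The sketch below is a hedged alternative, not part of the notebook: `iter_chat_chunks` and `stream_reply` are hypothetical helper names, the `{"error": ...}` handling assumes the server reports failures in that shape, and retries are omitted.

```python
import json
from typing import Iterable, Iterator, Optional


def iter_chat_chunks(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield assistant text fragments from Ollama's NDJSON /api/chat stream."""
    for raw in lines:
        if not raw:  # skip keep-alive blank lines
            continue
        chunk = json.loads(raw)
        if "error" in chunk:  # assumed error shape: {"error": "..."}
            raise RuntimeError(chunk["error"])
        yield chunk.get("message", {}).get("content", "")
        if chunk.get("done"):  # final object of the stream
            break


def stream_reply(base_url: str, model: str, messages: list,
                 headers: Optional[dict] = None) -> str:
    """POST a streaming chat request, print fragments as they arrive, return the full reply."""
    import requests  # third-party; imported here so the parser above needs only the stdlib

    payload = {"model": model, "messages": messages, "stream": True}
    with requests.post(f"{base_url}/api/chat", json=payload,
                       headers=headers or {}, stream=True) as resp:
        resp.raise_for_status()
        parts = []
        for piece in iter_chat_chunks(resp.iter_lines()):
            print(piece, end="", flush=True)
            parts.append(piece)
    print()
    return "".join(parts)
```

Inside the Step 4 chat loop, `reply = stream_reply(base_url, selected_model, messages, headers)` could replace the `requests.post(...)` and `resp.json()['message']['content']` lines; the first-message load time stays the same, but tokens appear as soon as they are generated.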