File size: 1,897 Bytes
681909f
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
---
language:
- ko
library_name: pytorch
tags:
- sovyn
- korean
- conversational
- causal-lm
- from-scratch
pipeline_tag: text-generation
---

# SOVYN-300M-Cortex

SOVYN-300M-Cortex is a small Korean conversational model trained from scratch.
It is not a Transformers-compatible checkpoint yet; it uses the custom PyTorch
architecture in `src/sovyn`.

## Current Status

- Parameters: about 300M
- Format: custom PyTorch checkpoint
- Tokenizer: SentencePiece BPE
- Context length: 512 tokens
- Weight dtype in checkpoint: bfloat16
- Main checkpoint: `sovyn_300m_last.pt`

This is an early experimental checkpoint. It can handle short Korean dialogue
patterns, but it is not a broad knowledge model.

## Quick Start

```powershell
python -m venv .venv
.\.venv\Scripts\python.exe -m pip install torch sentencepiece pyyaml tqdm
.\.venv\Scripts\python.exe scripts\chat.py --checkpoint sovyn_300m_last.pt --tokenizer sovyn.model
```

Example:

```text
user: 나 오늘 피곤해
sovyn: 많이 지쳤겠다. 지금은 잠깐 쉬어도 괜찮아.
```

## Ollama-Compatible Local API

This repository includes an Ollama-compatible bridge, but the model is not a
native GGUF Ollama model yet.

```powershell
powershell -ExecutionPolicy Bypass -File scripts\start_ollama_bridge.ps1
```

Then call:

```text
POST http://127.0.0.1:11435/api/chat
```

## Files

- `sovyn_300m_last.pt`: model checkpoint
- `sovyn.model`, `sovyn.vocab`: SentencePiece tokenizer
- `config.yaml`: model and training config
- `src/sovyn`: custom PyTorch architecture and formatting/data helpers
- `scripts/chat.py`: local chat runner
- `scripts/ollama_bridge.py`: Ollama-compatible local API bridge

## Notes

SOVYN-300M-Cortex was trained for short, natural Korean replies. The next major
step is converting the architecture to a standard export format or writing a
GGUF converter so it can be registered as a native Ollama model.