Arthur Samuel Galego Panucci FIgueiredo committed on
Commit 7a8e531 · verified · 1 Parent(s): 3569347

Update README.md

Files changed (1)
  1. README.md +65 -3
README.md CHANGED
@@ -1,3 +1,65 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - pt
5
+ pipeline_tag: text-generation
6
+ ---
7
+
8
+ # MiniText-v1.0
9
+
10
+ MiniText-v1.0 is a minimal character-level language model trained from scratch
11
+ to learn basic Portuguese text patterns.
12
+
13
+ In the future, this project will explore how far language modeling can go
14
+ in three more areas, all with 10k parameters:
15
+ - reasoning
16
+ - math
17
+ - code
18
+
19
+ This project explores the lower limits of language modeling:
20
+ how small can a neural network be and still produce coherent text?
21
+
22
+ ## Model details
23
+
24
+ - Architecture: custom MiniText (character-level)
25
+ - Parameters: 10k (educational scale)
26
+ - Training data: synthetic Portuguese dataset
27
+ - Training objective: next-character prediction
28
+ - Language: Portuguese (basic)
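To make "character-level" and "next-character prediction" concrete, here is a minimal sketch of how text could be turned into training examples. It is an illustration only: the function names (`build_vocab`, `encode`, `decode`, `next_char_pairs`) and the `context_len` parameter are hypothetical and are not taken from the MiniText code, whose actual preprocessing is not shown in this README.

```python
# Hypothetical sketch: these names are illustrative, not the MiniText API.

def build_vocab(text):
    """Map each distinct character to an integer id (sorted for determinism)."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text, stoi):
    """Turn a string into a list of character ids."""
    return [stoi[ch] for ch in text]


def decode(ids, itos):
    """Turn a list of character ids back into a string."""
    return "".join(itos[i] for i in ids)


def next_char_pairs(ids, context_len):
    """Build (context, target) pairs: the model sees context_len ids
    and is trained to predict the id that comes right after them."""
    pairs = []
    for i in range(len(ids) - context_len):
        pairs.append((ids[i:i + context_len], ids[i + context_len]))
    return pairs


corpus = "o gato é um animal"
stoi, itos = build_vocab(corpus)
ids = encode(corpus, stoi)
pairs = next_char_pairs(ids, context_len=4)
```

With a character-level vocabulary the embedding and output layers stay tiny, which is one reason a budget of roughly 10k parameters is plausible at all.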
29
+
30
+
31
+ ## What this model can do
32
+
33
+ - Generate simple Portuguese words and sentences
34
+ - Learn grammatical structure
35
+ - Mix domains (language + math) as a base model
36
+
37
+ ## What this model is NOT
38
+
39
+ - Not a chatbot
40
+ - Not instruction-tuned
41
+ - Not reasoning-capable
42
+ - Not safe for production use
43
+
44
+ This is a **base model** intended for research, experimentation, and education.
45
+
46
+ ## Example output
47
+
48
+ Input:
49
+ o gato é
50
+
51
+ Output (example):
52
+ o gato é um animal
53
+
54
+ ## How to run inference
55
+
56
+ python infer.py
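The contents of `infer.py` are not shown here, so the following is only a sketch of the general idea behind base-model inference: extend a prompt one character at a time. A simple bigram count table stands in for the neural network; `fit_bigram` and `generate` are hypothetical names, and the real script may work quite differently.

```python
import random
from collections import Counter, defaultdict

# Hypothetical stand-in for the real model: a table counting which
# character follows which. This is NOT how MiniText itself is built.


def fit_bigram(text):
    """Count, for every character, the characters that follow it."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, prompt, max_new_chars, rng=None):
    """Extend a non-empty prompt one character at a time.

    With rng=None the most frequent next character is chosen (greedy);
    otherwise the next character is sampled in proportion to the counts.
    """
    out = prompt
    for _ in range(max_new_chars):
        options = counts.get(out[-1])
        if not options:  # last character was never seen with a successor
            break
        if rng is None:
            nxt = options.most_common(1)[0][0]
        else:
            chars, weights = zip(*options.items())
            nxt = rng.choices(chars, weights=weights, k=1)[0]
        out += nxt
    return out


counts = fit_bigram("o gato é um animal")
sample = generate(counts, "o gato é", max_new_chars=10, rng=random.Random(0))
```

A neural model replaces the count table with predicted probabilities over the vocabulary, but the generation loop keeps the same shape: predict, pick a character, append, repeat.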
57
+
58
+ ## License
59
+ MIT
60
+
61
+ ## Training Environment
62
+ - CPU: AMD Ryzen 5 5600G (32 GB RAM)
63
+ - Epochs: 12,000
64
+
65
+ Made by: Arthur Samuel (loboGOAT)