BrainboxAI committed
Commit 477dd13 (verified)
Parent(s): 67787bd

Professional README with structured system prompt

Files changed (1):
  README.md (+177 -82)

README.md CHANGED
@@ -3,139 +3,234 @@ language:
  - he
  - en
  license: apache-2.0
  tags:
- - gguf
- - gemma4
  - legal
- - hebrew
  - israel
- - fine-tuned
  - llama.cpp
  - unsloth
- base_model: google/gemma-4-E2B-it
  pipeline_tag: text-generation
  datasets:
  - BrainboxAI/legal-training-il
  ---

- # law-il-E2B

- **A Hebrew legal language model built by [BrainboxAI](https://brainboxai.io), fine-tuned on 17,613 Israeli legal documents.**

- law-il-E2B is a domain-specific model designed to understand and respond to questions about Israeli law. It was trained on real court rulings from the Israeli Supreme Court, family courts, and criminal courts, combined with thousands of citizens' rights articles and contract analysis examples.

- This model is part of BrainboxAI's effort to make legal knowledge more accessible through AI.

- ## What it can do

- - Answer questions about Israeli law in natural Hebrew
- - Analyze court rulings and identify key legal principles
- - Explain citizens' rights (labor, housing, insurance, disability, pensions)
- - Review contract clauses and flag legal implications
- - Reference specific Israeli statutes when relevant

- ## Semi-formal reasoning

- The model uses a structured reasoning approach via its system prompt. Instead of generating free-form text, it follows a fixed reasoning path for every legal question:

- 1. Identify the relevant law, section number, and year
- 2. Explain the provision in plain language
- 3. Give a practical example
- 4. Cite relevant case law if available
- 5. End with a commonly overlooked detail

- This semi-formal structure produces more consistent, useful answers compared to open-ended generation - especially important for a small (2B) model where unstructured output tends to drift.

- ## Quickstart

- ### Ollama

  ```bash
- ollama run hf.co/BrainboxAI/law-il-E2B
  ```

- ### llama.cpp

  ```bash
- llama-cli \
-   -m gemma-4-E2B-it.Q4_K_M.gguf \
-   -p "<start_of_turn>user\n诪讛 讛讝讻讜讬讜转 砖诇讬 讻砖讜讻专 讚讬专讛?<end_of_turn>\n<start_of_turn>model\n" \
-   --repeat-penalty 1.3 -n 512
  ```

- ### Python

- ```python
- from llama_cpp import Llama
-
- llm = Llama(model_path="gemma-4-E2B-it.Q4_K_M.gguf", n_ctx=2048)
-
- output = llm(
-     "<start_of_turn>user\n诪讛 讗讜诪专 讛讞讜拽 诇讙讘讬 驻讬爪讜讬讬 驻讬讟讜专讬诐?<end_of_turn>\n<start_of_turn>model\n",
-     max_tokens=512,
-     temperature=0.7,
-     repeat_penalty=1.3,
-     stop=["<end_of_turn>"],
- )
-
- print(output["choices"][0]["text"])
  ```

- ## Training details

- | | |
- |---|---|
- | Base model | [google/gemma-4-E2B-it](https://huggingface.co/google/gemma-4-E2B-it) (2B parameters) |
- | Method | QLoRA via [Unsloth](https://github.com/unslothai/unsloth) |
- | Dataset | [BrainboxAI/legal-training-il](https://huggingface.co/datasets/BrainboxAI/legal-training-il) |
- | Samples | 17,613 |
- | Epochs | 20 |
- | Steps | 500 |
- | LoRA rank | 64 |
- | Hardware | NVIDIA RTX 5090 |

- ## Training data

- The model was trained on a curated dataset covering multiple areas of Israeli law:

- | Source | Count | Description |
- |--------|-------|-------------|
- | Israeli court rulings | 7,960 | Supreme Court, family courts, criminal and civil courts |
- | Kol-Zchut (讻诇-讝讻讜转) | 2,353 | Citizens' rights across labor, housing, health, insurance, disability |
- | Israeli legislation | 300 | Laws from the Open Law Book (住驻专 讛讞讜拽讬诐 讛驻转讜讞) |
- | Contract clauses | 7,000 | 41 contract types with clause-level analysis |

- 60% of the training data is in Hebrew, 40% in English.

- ## Files

- | File | Size | Description |
- |------|------|-------------|
- | `gemma-4-E2B-it.Q4_K_M.gguf` | ~1.5 GB | 4-bit quantized, recommended for inference |
- | `gemma-4-E2B-it.BF16-mmproj.gguf` | ~987 MB | Vision projection weights |

- The full-precision safetensors version is available at [BrainboxAI/law-il-E2B-safetensors](https://huggingface.co/BrainboxAI/law-il-E2B-safetensors) for further fine-tuning or format conversion.

- ## Intended use

- This model is intended as a research and educational tool. It can help users understand their legal rights, explore relevant legislation, and get a starting point for legal research.

- It is **not** a substitute for professional legal advice from a licensed attorney. Legal questions with real consequences should always be reviewed by a qualified professional.

- ## Known limitations

- - 2B parameter model - may lack depth on complex, multi-layered legal questions
- - May generate inaccurate statute numbers or case references
- - Stronger on labor law and citizens' rights due to training data composition
- - Court ruling analysis tends toward summaries rather than deep legal reasoning
- - English contract analysis uses template-based outputs

  ## About BrainboxAI

- [BrainboxAI](https://brainboxai.io) is an AI agency based in Israel, building specialized AI solutions for businesses - including business intelligence tools, cybersecurity scanning, and domain-specific Hebrew language models.

- For questions, collaborations, or enterprise inquiries: **support@brainboxai.io**

- ## License

- This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0), subject to the [Gemma Terms of Use](https://ai.google.dev/gemma/terms).

  - he
  - en
  license: apache-2.0
+ base_model: unsloth/gemma-4-E2B-it
  tags:
  - legal
+ - law
  - israel
+ - hebrew
+ - court-rulings
+ - kol-zchut
+ - gguf
  - llama.cpp
  - unsloth
+ - gemma4
+ - vision-language-model
+ - conversational
  pipeline_tag: text-generation
  datasets:
  - BrainboxAI/legal-training-il
+ pretty_name: BrainboxAI Law IL E2B
+ ---
25
+
26
+ # BrainboxAI/law-il-E2B
27
+
28
+ ### Hebrew-First Israeli Legal AI Specialist (GGUF)
29
+
30
+ A Gemma 4 E2B model fine-tuned by **BrainboxAI** for Israeli legal Q&A, court ruling analysis, rights explanations (讻诇-讝讻讜转), and contract clause interpretation - bilingual Hebrew and English, optimized for local inference.
31
+
32
+ Built and maintained by **[BrainboxAI](https://huggingface.co/BrainboxAI)**, an Israeli AI agency founded by **Netanel Elyasi**, serving the Israeli market with privacy-first AI products.
33
+
34
  ---

+ ## Model Details

+ | Attribute | Value |
+ |-----------|-------|
+ | **Base Model** | [unsloth/gemma-4-E2B-it](https://huggingface.co/unsloth/gemma-4-E2B-it) (Gemma 4 Efficient 2B Instruct) |
+ | **Architecture** | Gemma4ForConditionalGeneration (text + vision + audio) |
+ | **Parameters** | ~2B |
+ | **Context Length** | 131,072 tokens |
+ | **Languages** | Hebrew, English |
+ | **Training Framework** | Unsloth (2x faster fine-tuning) |
+ | **Training Dataset** | [BrainboxAI/legal-training-il](https://huggingface.co/datasets/BrainboxAI/legal-training-il) |
+ | **License** | Apache 2.0 |

+ ---

+ ## Intended Use

+ ### Primary Tasks
+ - **Israeli court ruling analysis** - Supreme Court, Family, Criminal, Civil
+ - **Citizens' rights Q&A** (Kol-Zchut style) - labor law, housing, health, insurance, disability, pensions
+ - **Israeli legislation explanation** - consolidated laws via the Open Law Book
+ - **Contract clause interpretation** - 41 contract types, 28 clause categories (CUAD-based)
+ - **Hebrew legal drafting support**

+ ### Target Users
+ - Israeli law firms and solo practitioners
+ - Legal aid organizations
+ - HR departments needing Israeli labor law guidance
+ - Paralegal research workflows
+ - Citizens researching their rights

+ ---

+ ## Available Files

+ | File | Size | Use |
+ |------|------|-----|
+ | `gemma-4-E2B-it.Q4_K_M.gguf` | ~2 GB | Local inference (Ollama, llama.cpp, LM Studio) |
+ | `gemma-4-E2B-it.BF16-mmproj.gguf` | ~0.5 GB | Vision projector (multimodal tasks) |
+ | `Modelfile` | Small | Ollama configuration |

+ ---
78
 
79
+ ## Quick Start
80
 
81
+ ### With Ollama
82
 
83
  ```bash
84
+ ollama create brainbox-law -f ./Modelfile
85
+ ollama run brainbox-law
86
  ```
87
 
88
+ ### With llama.cpp
89
 
90
  ```bash
91
+ llama-cli -hf BrainboxAI/law-il-E2B --jinja
 
 
 
92
  ```
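For Python use, a minimal sketch with `llama-cpp-python` (the GGUF filename comes from the files table; `format_gemma_prompt` and `ask` are illustrative helper names, and the sampling values mirror the Python example this commit removed - treat this as a sketch, not a shipped API):

```python
def format_gemma_prompt(question: str) -> str:
    """Wrap a user question in Gemma-style turn markers."""
    return f"<start_of_turn>user\n{question}<end_of_turn>\n<start_of_turn>model\n"


def ask(model_path: str, question: str) -> str:
    """One-shot local completion; needs `pip install llama-cpp-python` and the downloaded GGUF."""
    from llama_cpp import Llama  # deferred so the prompt helper works without the package

    llm = Llama(model_path=model_path, n_ctx=2048)
    out = llm(
        format_gemma_prompt(question),
        max_tokens=512,
        temperature=0.7,
        repeat_penalty=1.3,
        stop=["<end_of_turn>"],
    )
    return out["choices"][0]["text"]


# Usage (assumes the file has been downloaded from this repo):
# print(ask("gemma-4-E2B-it.Q4_K_M.gguf", "诪讛 讛讝讻讜讬讜转 砖诇讬 讘谞讜砖讗 驻讬爪讜讬讬 驻讬讟讜专讬诐?"))
```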

+ ### Example prompts

+ ```
+ 诪讛 讛讝讻讜讬讜转 砖诇讬 讘谞讜砖讗 驻讬爪讜讬讬 驻讬讟讜专讬诐?
+ 谞转讞 讗转 驻住拽 讛讚讬谉 讛讘讗: [讟拽住讟 驻住拽 讛讚讬谉]
+ 讛住讘专 讗转 讞讜拽 讛讙谞转 讛驻专讟讬讜转 讘爪讜专讛 诪讜讘谞转.
+ What are the key legal implications of this clause? [clause text]
+ ```

+ ---

+ ## Recommended System Prompt

+ ```
+ DEFINITIONS:
+ role: BrainboxAI Legal Assistant - an AI specialist trained by BrainboxAI (founded by Netanel Elyasi) for Israeli law Q&A, court ruling analysis, citizens' rights, and contract interpretation. Bilingual Hebrew + English.
+ success: Provide accurate, source-grounded legal information in the user's language, with clear caveats that the output is informational and not a substitute for licensed legal counsel.
+ scope_in:
+ - Israeli law (civil, criminal, labor, family, administrative, constitutional)
+ - Citizens' rights under Israeli law
+ - Contract clause interpretation
+ - Court ruling analysis and summarization
+ - Cross-references between laws, regulations, and rulings
+ scope_out:
+ - Legal advice tied to specific real cases or persons
+ - Predictions of court outcomes
+ - Advice on foreign (non-Israeli) law unless explicitly asked
+ - Any content that facilitates illegal activity
+
+ PREMISES:
+ - Input may be a legal question, statute citation, court ruling text, or contract clause.
+ - Input language may be Hebrew, English, or mixed.
+ - Statute and ruling citations stay in original form (e.g. 注"讗 1234/20, 讞讜拽 讬住讜讚: 讻讘讜讚 讛讗讚诐 讜讞讬专讜转讜).
+ - Training cutoff: 2025. For newer rulings or legislation, rely on user-provided context.
+
+ REQUIREMENTS:
+ 1. Respond in the same primary language as the user's prompt.
+ 2. Cite statutes and court rulings using their canonical Israeli form.
+ 3. Every substantive claim should trace back to a specific statute, regulation, or ruling.
+ 4. Use plain language unless the user requests technical legal Hebrew.
+ 5. Add the disclaimer: "讝讛讜 诪讬讚注 讻诇诇讬 讜讗讬谞讜 诪讛讜讜讛 讬讬注讜抓 诪砖驻讟讬" (Hebrew) or "This is general information and not legal advice" (English) at the end of every substantive response.
+ 6. Never fabricate statute numbers, ruling citations, or case facts.
+ 7. For contract clauses, identify the clause type, the parties' obligations, and potential risks.
+ 8. For rights Q&A, structure the answer as: eligibility, how to claim, relevant authority, references.
+ 9. Decline out-of-scope requests and redirect to the nearest in-scope task.
+
+ EDGE_CASES:
+ - Empty or vague question -> Ask a clarifying question in the user's language.
+ - Request for legal advice on a specific real case -> Provide general principles only; add a strong disclaimer.
+ - Conflicting statutes or rulings -> Present both, note the hierarchy (constitutional > statute > regulation).
+ - Request in a third language -> Respond in English and note the fallback.
+ - Non-Israeli jurisdiction question -> Clarify scope and offer to answer from the Israeli perspective only.
+
+ OUTPUT_FORMAT:
+ format: Markdown. Bulleted lists for enumerations, numbered steps for procedures.
+ default_structure: |
+   **讛谞讜砖讗 / Topic:** <topic>
+   **转砖讜讘讛 / Answer:** <answer body>
+   **诪拽讜专讜转 / Sources:**
+   - <statute or ruling citation>
+   - <additional reference>
+   **讛注专讛:** 讝讛讜 诪讬讚注 讻诇诇讬 讜讗讬谞讜 诪讛讜讜讛 讬讬注讜抓 诪砖驻讟讬.
+ language: Match the user's input language.
+ length: Short questions 100-250 words / Analyses 300-700 words.
+
+ VERIFICATION:
+ - Is the response in the user's language?
+ - Are statute and ruling citations in canonical Israeli form?
+ - Is every substantive claim sourced?
+ - Is the legal-advice disclaimer present?
+ - No fabricated citations or case facts?
  ```
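The `Modelfile` referenced in Quick Start is where a system prompt like this gets wired in. A hypothetical sketch (the shipped Modelfile's actual contents are not reproduced here; the sampling values echo the examples elsewhere in this card):

```
# Hypothetical Ollama Modelfile sketch - the Modelfile shipped in this repo may differ.
FROM ./gemma-4-E2B-it.Q4_K_M.gguf

# Embed the recommended system prompt (abbreviated; paste the full block above).
SYSTEM """
DEFINITIONS:
role: BrainboxAI Legal Assistant ...
"""

# Illustrative sampling parameters.
PARAMETER temperature 0.7
PARAMETER repeat_penalty 1.3
```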

+ ---

+ ## Training Details

+ - **Method:** QLoRA (LoRA adapters with 4-bit quantized base)
+ - **Framework:** Unsloth
+ - **Dataset:** 17,613 bilingual legal instruction pairs
+ - **Composition:**
+   - 7,960 Israeli court rulings (Hebrew)
+   - 2,353 Kol-Zchut rights articles (Hebrew)
+   - 300 Open Law Book statutes (Hebrew)
+   - 7,000 CUAD-based contract clauses (English)
+ - **Language split:** ~60% Hebrew, ~40% English

+ Full training dataset: [BrainboxAI/legal-training-il](https://huggingface.co/datasets/BrainboxAI/legal-training-il)
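The recipe above can be sketched with Unsloth's `FastLanguageModel` API. Rank 64 and the base checkpoint come from this card's training tables; `lora_alpha`, the target modules, and the sequence length are illustrative assumptions, not the actual training configuration:

```python
# Sketch of the QLoRA setup described above, using Unsloth's documented API.
LORA_CONFIG = {
    "r": 64,                                                     # LoRA rank (from the card)
    "lora_alpha": 64,                                            # assumed
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
}


def build_qlora_model():
    """Requires `pip install unsloth` and a CUDA GPU; not runnable without them."""
    from unsloth import FastLanguageModel  # deferred: heavy, GPU-bound import

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/gemma-4-E2B-it",
        max_seq_length=2048,   # assumed
        load_in_4bit=True,     # the "Q" in QLoRA: 4-bit quantized base weights
    )
    model = FastLanguageModel.get_peft_model(model, **LORA_CONFIG)
    return model, tokenizer
```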
+ ---

+ ## Limitations & Ethical Considerations

+ - **Not a licensed lawyer.** This model provides general legal information, not advice. Always consult a licensed attorney for case-specific guidance.
+ - **Training cutoff.** Data coverage ends in 2025. Newer rulings or legislation may not be reflected.
+ - **Citation hygiene.** The model attempts to cite sources but may occasionally misquote; always verify with official sources (Nevo, the Supreme Court website, Kol-Zchut).
+ - **Hebrew variance.** Archaic legal Hebrew and regional dialect may occasionally degrade output quality.
+ - **Dual-use caution.** Legal information can be misused to manipulate or harm. Deployments should include acceptable-use policies.

+ ---

+ ## Sibling Repositories

+ | Repo | Purpose |
+ |------|---------|
+ | [BrainboxAI/law-il-E2B](https://huggingface.co/BrainboxAI/law-il-E2B) | **This repo** - GGUF for local inference |
+ | [BrainboxAI/law-il-E2B-safetensors](https://huggingface.co/BrainboxAI/law-il-E2B-safetensors) | Training-ready safetensors |
+ | [BrainboxAI/legal-training-il](https://huggingface.co/datasets/BrainboxAI/legal-training-il) | Training dataset (17,613 examples) |

+ ---

+ ## Citation

+ ```bibtex
+ @misc{brainboxai_law_il_e2b_2026,
+   author    = {Elyasi, Netanel and BrainboxAI},
+   title     = {BrainboxAI Law IL E2B: A Hebrew-First Israeli Legal LLM},
+   year      = {2026},
+   url       = {https://huggingface.co/BrainboxAI/law-il-E2B},
+   publisher = {Hugging Face}
+ }
+ ```

+ ---

  ## About BrainboxAI

+ **BrainboxAI** is an Israeli AI agency founded by **Netanel Elyasi**, specializing in:

+ - Custom LLM training (Hebrew-native and bilingual models)
+ - AI automation and agentic workflows
+ - Cybersecurity AI products (scanning, triage, reporting)
+ - Enterprise AI deployment (on-premise, privacy-first)

+ **Related models and datasets:**
+ - [BrainboxAI/cyber-analyst-4B](https://huggingface.co/BrainboxAI/cyber-analyst-4B) - Cyber analyst model (GGUF)
+ - [BrainboxAI/brainboxai_cyber_train](https://huggingface.co/datasets/BrainboxAI/brainboxai_cyber_train) - Cyber training dataset

+ Contact: via Hugging Face or [brainboxai.io](https://brainboxai.io).

+ ---

+ Trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth).