BrainboxAI committed
Commit 477dd13 (verified)
Parent(s): 67787bd

Professional README with structured system prompt

Files changed (1):
  README.md (+177 -82)

README.md CHANGED
@@ -3,139 +3,234 @@ language:
  - he
  - en
  license: apache-2.0
  tags:
- - gguf
- - gemma4
  - legal
- - hebrew
  - israel
- - fine-tuned
  - llama.cpp
  - unsloth
- base_model: google/gemma-4-E2B-it
  pipeline_tag: text-generation
  datasets:
  - BrainboxAI/legal-training-il
  ---

- # law-il-E2B

- **A Hebrew legal language model built by [BrainboxAI](https://brainboxai.io), fine-tuned on 17,613 Israeli legal documents.**

- law-il-E2B is a domain-specific model designed to understand and respond to questions about Israeli law. It was trained on real court rulings from the Israeli Supreme Court, family courts, and criminal courts, combined with thousands of citizens' rights articles and contract analysis examples.

- This model is part of BrainboxAI's effort to make legal knowledge more accessible through AI.

- ## What it can do

- - Answer questions about Israeli law in natural Hebrew
- - Analyze court rulings and identify key legal principles
- - Explain citizens' rights (labor, housing, insurance, disability, pensions)
- - Review contract clauses and flag legal implications
- - Reference specific Israeli statutes when relevant

- ## Semi-formal reasoning

- The model uses a structured reasoning approach via its system prompt. Instead of generating free-form text, it follows a fixed reasoning path for every legal question:

- 1. Identify the relevant law, section number, and year
- 2. Explain the provision in plain language
- 3. Give a practical example
- 4. Cite relevant case law if available
- 5. End with a commonly overlooked detail

- This semi-formal structure produces more consistent, useful answers compared to open-ended generation - especially important for a small (2B) model where unstructured output tends to drift.

- ## Quickstart

- ### Ollama

  ```bash
- ollama run hf.co/BrainboxAI/law-il-E2B
  ```

- ### llama.cpp

  ```bash
- llama-cli \
-   -m gemma-4-E2B-it.Q4_K_M.gguf \
-   -p "<start_of_turn>user\n诪讛 讛讝讻讜讬讜转 砖诇讬 讻砖讜讻专 讚讬专讛?<end_of_turn>\n<start_of_turn>model\n" \
-   --repeat-penalty 1.3 -n 512
  ```

- ### Python

- ```python
- from llama_cpp import Llama
-
- llm = Llama(model_path="gemma-4-E2B-it.Q4_K_M.gguf", n_ctx=2048)
-
- output = llm(
-     "<start_of_turn>user\n诪讛 讗讜诪专 讛讞讜拽 诇讙讘讬 驻讬爪讜讬讬 驻讬讟讜专讬诐?<end_of_turn>\n<start_of_turn>model\n",
-     max_tokens=512,
-     temperature=0.7,
-     repeat_penalty=1.3,
-     stop=["<end_of_turn>"],
- )
-
- print(output["choices"][0]["text"])
  ```

- ## Training details

- | | |
- |---|---|
- | Base model | [google/gemma-4-E2B-it](https://huggingface.co/google/gemma-4-E2B-it) (2B parameters) |
- | Method | QLoRA via [Unsloth](https://github.com/unslothai/unsloth) |
- | Dataset | [BrainboxAI/legal-training-il](https://huggingface.co/datasets/BrainboxAI/legal-training-il) |
- | Samples | 17,613 |
- | Epochs | 20 |
- | Steps | 500 |
- | LoRA rank | 64 |
- | Hardware | NVIDIA RTX 5090 |

- ## Training data

- The model was trained on a curated dataset covering multiple areas of Israeli law:

- | Source | Count | Description |
- |--------|-------|-------------|
- | Israeli court rulings | 7,960 | Supreme Court, family courts, criminal and civil courts |
- | Kol-Zchut (讻诇-讝讻讜转) | 2,353 | Citizens' rights across labor, housing, health, insurance, disability |
- | Israeli legislation | 300 | Laws from the Open Law Book (住驻专 讛讞讜拽讬诐 讛驻转讜讞) |
- | Contract clauses | 7,000 | 41 contract types with clause-level analysis |

- 60% of the training data is in Hebrew, 40% in English.

- ## Files

- | File | Size | Description |
- |------|------|-------------|
- | `gemma-4-E2B-it.Q4_K_M.gguf` | ~1.5 GB | 4-bit quantized, recommended for inference |
- | `gemma-4-E2B-it.BF16-mmproj.gguf` | ~987 MB | Vision projection weights |

- The full-precision safetensors version is available at [BrainboxAI/law-il-E2B-safetensors](https://huggingface.co/BrainboxAI/law-il-E2B-safetensors) for further fine-tuning or format conversion.

- ## Intended use

- This model is intended as a research and educational tool. It can help users understand their legal rights, explore relevant legislation, and get a starting point for legal research.

- It is **not** a substitute for professional legal advice from a licensed attorney. Legal questions with real consequences should always be reviewed by a qualified professional.

- ## Known limitations

- - 2B parameter model - may lack depth on complex, multi-layered legal questions
- - May generate inaccurate statute numbers or case references
- - Stronger on labor law and citizens' rights due to training data composition
- - Court ruling analysis tends toward summaries rather than deep legal reasoning
- - English contract analysis uses template-based outputs

  ## About BrainboxAI

- [BrainboxAI](https://brainboxai.io) is an AI agency based in Israel, building specialized AI solutions for businesses - including business intelligence tools, cybersecurity scanning, and domain-specific Hebrew language models.

- For questions, collaborations, or enterprise inquiries: **support@brainboxai.io**

- ## License

- This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0), subject to the [Gemma Terms of Use](https://ai.google.dev/gemma/terms).

  - he
  - en
  license: apache-2.0
+ base_model: unsloth/gemma-4-E2B-it
  tags:
  - legal
+ - law
  - israel
+ - hebrew
+ - court-rulings
+ - kol-zchut
+ - gguf
  - llama.cpp
  - unsloth
+ - gemma4
+ - vision-language-model
+ - conversational
  pipeline_tag: text-generation
  datasets:
  - BrainboxAI/legal-training-il
+ pretty_name: BrainboxAI Law IL E2B
+ ---
25
+
26
+ # BrainboxAI/law-il-E2B
27
+
28
+ ### Hebrew-First Israeli Legal AI Specialist (GGUF)
29
+
30
+ A Gemma 4 E2B model fine-tuned by **BrainboxAI** for Israeli legal Q&A, court ruling analysis, rights explanations (讻诇-讝讻讜转), and contract clause interpretation - bilingual Hebrew and English, optimized for local inference.
31
+
32
+ Built and maintained by **[BrainboxAI](https://huggingface.co/BrainboxAI)**, an Israeli AI agency founded by **Netanel Elyasi**, serving the Israeli market with privacy-first AI products.
33
+
34
  ---

+ ## Model Details

+ | Attribute | Value |
+ |-----------|-------|
+ | **Base Model** | [unsloth/gemma-4-E2B-it](https://huggingface.co/unsloth/gemma-4-E2B-it) (Gemma 4 Efficient 2B Instruct) |
+ | **Architecture** | Gemma4ForConditionalGeneration (text + vision + audio) |
+ | **Parameters** | ~2B |
+ | **Context Length** | 131,072 tokens |
+ | **Languages** | Hebrew, English |
+ | **Training Framework** | Unsloth (2x faster fine-tuning) |
+ | **Training Dataset** | [BrainboxAI/legal-training-il](https://huggingface.co/datasets/BrainboxAI/legal-training-il) |
+ | **License** | Apache 2.0 |

+ ---

+ ## Intended Use

+ ### Primary Tasks
+ - **Israeli court ruling analysis** - Supreme Court, Family, Criminal, Civil
+ - **Citizens' rights Q&A** (Kol-Zchut style) - labor law, housing, health, insurance, disability, pensions
+ - **Israeli legislation explanation** - consolidated laws via the Open Law Book
+ - **Contract clause interpretation** - 41 contract types, 28 clause categories (CUAD-based)
+ - **Hebrew legal drafting support**

+ ### Target Users
+ - Israeli law firms and solo practitioners
+ - Legal aid organizations
+ - HR departments needing Israeli labor law guidance
+ - Paralegal research workflows
+ - Citizens researching their rights

+ ---

+ ## Available Files

+ | File | Size | Use |
+ |------|------|-----|
+ | `gemma-4-E2B-it.Q4_K_M.gguf` | ~2 GB | Local inference (Ollama, llama.cpp, LM Studio) |
+ | `gemma-4-E2B-it.BF16-mmproj.gguf` | ~0.5 GB | Vision projector (multimodal tasks) |
+ | `Modelfile` | Small | Ollama configuration |

+ ---
78
 
79
+ ## Quick Start
80
 
81
+ ### With Ollama
82
 
83
  ```bash
84
+ ollama create brainbox-law -f ./Modelfile
85
+ ollama run brainbox-law
86
  ```
87
 
88
+ ### With llama.cpp
89
 
90
  ```bash
91
+ llama-cli -hf BrainboxAI/law-il-E2B --jinja
 
 
 
92
  ```
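For Python use, a minimal sketch with `llama-cpp-python` (the GGUF filename comes from the files table; `format_gemma_prompt` and `ask` are illustrative helper names, and the sampling values mirror the Python example this commit removed - treat this as a sketch, not a shipped API):

```python
def format_gemma_prompt(question: str) -> str:
    """Wrap a user question in Gemma-style turn markers."""
    return f"<start_of_turn>user\n{question}<end_of_turn>\n<start_of_turn>model\n"


def ask(model_path: str, question: str) -> str:
    """One-shot local completion; needs `pip install llama-cpp-python` and the downloaded GGUF."""
    from llama_cpp import Llama  # deferred so the prompt helper works without the package

    llm = Llama(model_path=model_path, n_ctx=2048)
    out = llm(
        format_gemma_prompt(question),
        max_tokens=512,
        temperature=0.7,
        repeat_penalty=1.3,
        stop=["<end_of_turn>"],
    )
    return out["choices"][0]["text"]


# Usage (assumes the file has been downloaded from this repo):
# print(ask("gemma-4-E2B-it.Q4_K_M.gguf", "诪讛 讛讝讻讜讬讜转 砖诇讬 讘谞讜砖讗 驻讬爪讜讬讬 驻讬讟讜专讬诐?"))
```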

+ ### Example prompts

+ ```
+ 诪讛 讛讝讻讜讬讜转 砖诇讬 讘谞讜砖讗 驻讬爪讜讬讬 驻讬讟讜专讬诐?
+ 谞转讞 讗转 驻住拽 讛讚讬谉 讛讘讗: [讟拽住讟 驻住拽 讛讚讬谉]
+ 讛住讘专 讗转 讞讜拽 讛讙谞转 讛驻专讟讬讜转 讘爪讜专讛 诪讜讘谞转.
+ What are the key legal implications of this clause? [clause text]
+ ```

+ ---

+ ## Recommended System Prompt

+ ```
+ DEFINITIONS:
+ role: BrainboxAI Legal Assistant - an AI specialist trained by BrainboxAI (founded by Netanel Elyasi) for Israeli law Q&A, court ruling analysis, citizens' rights, and contract interpretation. Bilingual Hebrew + English.
+ success: Provide accurate, source-grounded legal information in the user's language, with clear caveats that the output is informational and not a substitute for licensed legal counsel.
+ scope_in:
+ - Israeli law (civil, criminal, labor, family, administrative, constitutional)
+ - Citizens' rights under Israeli law
+ - Contract clause interpretation
+ - Court ruling analysis and summarization
+ - Cross-references between laws, regulations, and rulings
+ scope_out:
+ - Legal advice tied to specific real cases or persons
+ - Predictions of court outcomes
+ - Advice on foreign (non-Israeli) law unless explicitly asked
+ - Any content that facilitates illegal activity
+
+ PREMISES:
+ - Input may be a legal question, statute citation, court ruling text, or contract clause.
+ - Input language may be Hebrew, English, or mixed.
+ - Statute and ruling citations stay in original form (e.g. 注"讗 1234/20, 讞讜拽 讬住讜讚: 讻讘讜讚 讛讗讚诐 讜讞讬专讜转讜).
+ - Training cutoff: 2025. For newer rulings or legislation, rely on user-provided context.
+
+ REQUIREMENTS:
+ 1. Respond in the same primary language as the user's prompt.
+ 2. Cite statutes and court rulings using their canonical Israeli form.
+ 3. Every substantive claim should trace back to a specific statute, regulation, or ruling.
+ 4. Use plain language unless the user requests technical legal Hebrew.
+ 5. Add the disclaimer: "讝讛讜 诪讬讚注 讻诇诇讬 讜讗讬谞讜 诪讛讜讜讛 讬讬注讜抓 诪砖驻讟讬" (Hebrew) or "This is general information and not legal advice" (English) at the end of every substantive response.
+ 6. Never fabricate statute numbers, ruling citations, or case facts.
+ 7. For contract clauses, identify the clause type, the parties' obligations, and potential risks.
+ 8. For rights Q&A, structure the answer as: eligibility, how to claim, relevant authority, references.
+ 9. Decline out-of-scope requests and redirect to the nearest in-scope task.
+
+ EDGE_CASES:
+ - Empty or vague question -> Ask a clarifying question in the user's language.
+ - Request for legal advice on a specific real case -> Provide general principles only; add a strong disclaimer.
+ - Conflicting statutes or rulings -> Present both, note the hierarchy (constitutional > statute > regulation).
+ - Request in a third language -> Respond in English and note the fallback.
+ - Non-Israeli jurisdiction question -> Clarify scope and offer to answer from the Israeli perspective only.
+
+ OUTPUT_FORMAT:
+ format: Markdown. Bulleted lists for enumerations, numbered steps for procedures.
+ default_structure: |
+   **讛谞讜砖讗 / Topic:** <topic>
+   **转砖讜讘讛 / Answer:** <answer body>
+   **诪拽讜专讜转 / Sources:**
+   - <statute or ruling citation>
+   - <additional reference>
+   **讛注专讛:** 讝讛讜 诪讬讚注 讻诇诇讬 讜讗讬谞讜 诪讛讜讜讛 讬讬注讜抓 诪砖驻讟讬.
+ language: Match the user's input language.
+ length: Short questions 100-250 words / Analyses 300-700 words.
+
+ VERIFICATION:
+ - Is the response in the user's language?
+ - Are statute and ruling citations in canonical Israeli form?
+ - Is every substantive claim sourced?
+ - Is the legal-advice disclaimer present?
+ - No fabricated citations or case facts?
  ```
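The `Modelfile` referenced in Quick Start is where a system prompt like this gets wired in. A hypothetical sketch (the shipped Modelfile's actual contents are not reproduced here; the sampling values echo the examples elsewhere in this card):

```
# Hypothetical Ollama Modelfile sketch - the Modelfile shipped in this repo may differ.
FROM ./gemma-4-E2B-it.Q4_K_M.gguf

# Embed the recommended system prompt (abbreviated; paste the full block above).
SYSTEM """
DEFINITIONS:
role: BrainboxAI Legal Assistant ...
"""

# Illustrative sampling parameters.
PARAMETER temperature 0.7
PARAMETER repeat_penalty 1.3
```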

+ ---

+ ## Training Details

+ - **Method:** QLoRA (LoRA adapters with 4-bit quantized base)
+ - **Framework:** Unsloth
+ - **Dataset:** 17,613 bilingual legal instruction pairs
+ - **Composition:**
+   - 7,960 Israeli court rulings (Hebrew)
+   - 2,353 Kol-Zchut rights articles (Hebrew)
+   - 300 Open Law Book statutes (Hebrew)
+   - 7,000 CUAD-based contract clauses (English)
+ - **Language split:** ~60% Hebrew, ~40% English

+ Full training dataset: [BrainboxAI/legal-training-il](https://huggingface.co/datasets/BrainboxAI/legal-training-il)
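The recipe above can be sketched with Unsloth's `FastLanguageModel` API. Rank 64 and the base checkpoint come from this card's training tables; `lora_alpha`, the target modules, and the sequence length are illustrative assumptions, not the actual training configuration:

```python
# Sketch of the QLoRA setup described above, using Unsloth's documented API.
LORA_CONFIG = {
    "r": 64,                                                     # LoRA rank (from the card)
    "lora_alpha": 64,                                            # assumed
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
}


def build_qlora_model():
    """Requires `pip install unsloth` and a CUDA GPU; not runnable without them."""
    from unsloth import FastLanguageModel  # deferred: heavy, GPU-bound import

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/gemma-4-E2B-it",
        max_seq_length=2048,   # assumed
        load_in_4bit=True,     # the "Q" in QLoRA: 4-bit quantized base weights
    )
    model = FastLanguageModel.get_peft_model(model, **LORA_CONFIG)
    return model, tokenizer
```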
+ ---

+ ## Limitations & Ethical Considerations

+ - **Not a licensed lawyer.** This model provides general legal information, not advice. Always consult a licensed attorney for case-specific guidance.
+ - **Training cutoff.** Data coverage ends in 2025. Newer rulings or legislation may not be reflected.
+ - **Citation hygiene.** The model attempts to cite sources but may occasionally misquote; always verify with official sources (Nevo, the Supreme Court website, Kol-Zchut).
+ - **Hebrew variance.** Archaic legal Hebrew and regional dialect may occasionally degrade output quality.
+ - **Dual-use caution.** Legal information can be misused to manipulate or harm. Deployments should include acceptable-use policies.

+ ---

+ ## Sibling Repositories

+ | Repo | Purpose |
+ |------|---------|
+ | [BrainboxAI/law-il-E2B](https://huggingface.co/BrainboxAI/law-il-E2B) | **This repo** - GGUF for local inference |
+ | [BrainboxAI/law-il-E2B-safetensors](https://huggingface.co/BrainboxAI/law-il-E2B-safetensors) | Training-ready safetensors |
+ | [BrainboxAI/legal-training-il](https://huggingface.co/datasets/BrainboxAI/legal-training-il) | Training dataset (17,613 examples) |

+ ---

+ ## Citation

+ ```bibtex
+ @misc{brainboxai_law_il_e2b_2026,
+   author    = {Elyasi, Netanel and BrainboxAI},
+   title     = {BrainboxAI Law IL E2B: A Hebrew-First Israeli Legal LLM},
+   year      = {2026},
+   url       = {https://huggingface.co/BrainboxAI/law-il-E2B},
+   publisher = {Hugging Face}
+ }
+ ```

+ ---

  ## About BrainboxAI

+ **BrainboxAI** is an Israeli AI agency founded by **Netanel Elyasi**, specializing in:

+ - Custom LLM training (Hebrew-native and bilingual models)
+ - AI automation and agentic workflows
+ - Cybersecurity AI products (scanning, triage, reporting)
+ - Enterprise AI deployment (on-premise, privacy-first)

+ **Related models and datasets:**
+ - [BrainboxAI/cyber-analyst-4B](https://huggingface.co/BrainboxAI/cyber-analyst-4B) - Cyber analyst model (GGUF)
+ - [BrainboxAI/brainboxai_cyber_train](https://huggingface.co/datasets/BrainboxAI/brainboxai_cyber_train) - Cyber training dataset

+ Contact: via Hugging Face or [brainboxai.io](https://brainboxai.io).

+ ---

+ Trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth).