---
license: other
license_name: aigency-commercial
license_link: https://aigency.dev/license
language:
- tr
- en
library_name: aigency-api
pipeline_tag: text-generation
tags:
- turkish
- multimodal
- sovereign
- frontier-adjacent
- aigency
- ecloud
- production
inference: false
extra_gated_heading: AIGENCY V4 is offered via API
extra_gated_description: |
  Model weights are not distributed on HuggingFace. AIGENCY V4 is accessible
  via the eCloud production API at https://aigency.dev. This page is a
  reference card describing architecture, evaluation methodology, and
  benchmark results, and links to a live demo Space.
model-index:
- name: AIGENCY V4
  results:
  - task:
      type: text-generation
      name: Code generation
    dataset:
      type: openai_humaneval
      name: HumanEval (pass@1)
    metrics:
    - type: pass@1
      value: 84.15
      name: pass@1
      verified: false
  - task:
      type: text-generation
      name: Code generation extended
    dataset:
      type: humaneval-plus
      name: HumanEval+ (pass@1)
    metrics:
    - type: pass@1
      value: 79.88
      name: pass@1
      verified: false
  - task:
      type: text-generation
      name: Code generation
    dataset:
      type: mbpp
      name: MBPP (sanitized)
    metrics:
    - type: pass@1
      value: 84.82
      name: pass@1
      verified: false
  - task:
      type: text-generation
      name: Code generation extended
    dataset:
      type: mbpp-plus
      name: MBPP+
    metrics:
    - type: pass@1
      value: 78.04
      name: pass@1
      verified: false
  - task:
      type: text-generation
      name: Mathematical reasoning
    dataset:
      type: gsm8k
      name: GSM8K
    metrics:
    - type: accuracy
      value: 94.62
      name: accuracy
      verified: false
  - task:
      type: text-generation
      name: Multitask language understanding
    dataset:
      type: cais/mmlu
      name: MMLU (stratified n=1000)
    metrics:
    - type: accuracy
      value: 80.10
      name: accuracy
      verified: false
  - task:
      type: text-generation
      name: Multitask language understanding (Pro)
    dataset:
      type: TIGER-Lab/MMLU-Pro
      name: MMLU-Pro (n=1000)
    metrics:
    - type: accuracy
      value: 50.20
      name: accuracy
      verified: false
  - task:
      type: text-generation
      name: Scientific reasoning
    dataset:
      type: ai2_arc
      name: ARC-Challenge
    metrics:
    - type: accuracy
      value: 94.88
      name: accuracy
      verified: false
  - task:
      type: text-generation
      name: Graduate-level QA
    dataset:
      type: idavidrein/gpqa
      name: GPQA Diamond
    metrics:
    - type: accuracy
      value: 37.88
      name: accuracy
      verified: false
  - task:
      type: text-generation
      name: Truthfulness
    dataset:
      type: truthful_qa
      name: TruthfulQA MC1
    metrics:
    - type: accuracy
      value: 76.38
      name: accuracy
      verified: false
  - task:
      type: text-generation
      name: Instruction following
    dataset:
      type: google/IFEval
      name: IFEval (strict)
    metrics:
    - type: accuracy
      value: 80.22
      name: strict-prompt-level
      verified: false
  - task:
      type: text-generation
      name: Commonsense reasoning
    dataset:
      type: hellaswag
      name: HellaSwag (n=1000)
    metrics:
    - type: accuracy
      value: 88.60
      name: accuracy
      verified: false
  - task:
      type: text-generation
      name: Coreference reasoning
    dataset:
      type: winogrande
      name: WinoGrande XL
    metrics:
    - type: accuracy
      value: 74.66
      name: accuracy
      verified: false
  - task:
      type: text-generation
      name: Turkish reading comprehension
    dataset:
      type: facebook/belebele
      name: Belebele-TR (Turkish)
    metrics:
    - type: accuracy
      value: 87.33
      name: accuracy
      verified: false
  - task:
      type: text-generation
      name: Turkish extractive QA
    dataset:
      type: tquad
      name: TQuAD (F1 ≥ 0.5)
    metrics:
    - type: f1
      value: 82.40
      name: F1 ≥ 0.5
      verified: false
  - task:
      type: text-generation
      name: Turkish multitask understanding
    dataset:
      type: tr-mmlu
      name: TR-MMLU
    metrics:
    - type: accuracy
      value: 70.80
      name: accuracy
      verified: false
  - task:
      type: text-generation
      name: Turkish natural-language inference
    dataset:
      type: xnli
      name: XNLI-TR
    metrics:
    - type: accuracy
      value: 73.40
      name: accuracy
      verified: false
  - task:
      type: text-generation
      name: Turkish grammar
    dataset:
      type: tr-grammar-synthetic
      name: TR Grammar (synthetic 50/50)
    metrics:
    - type: accuracy
      value: 79.00
      name: accuracy
      verified: false
  - task:
      type: image-text-to-text
      name: Multimodal QA
    dataset:
      type: MMMU
      name: MMMU (val, n=30)
    metrics:
    - type: accuracy
      value: 53.33
      name: accuracy
      verified: false
  - task:
      type: image-text-to-text
      name: Chart QA
    dataset:
      type: HuggingFaceM4/ChartQA
      name: ChartQA (relaxed)
    metrics:
    - type: accuracy
      value: 67.68
      name: relaxed accuracy
      verified: false
  - task:
      type: image-text-to-text
      name: Document QA
    dataset:
      type: lmms-lab/DocVQA
      name: DocVQA (ANLS ≥ 0.5)
    metrics:
    - type: accuracy
      value: 79.17
      name: ANLS ≥ 0.5
      verified: false
  - task:
      type: image-text-to-text
      name: Visual mathematical reasoning
    dataset:
      type: AI4Math/MathVista
      name: MathVista (testmini)
    metrics:
    - type: accuracy
      value: 34.13
      name: accuracy
      verified: false
---

# AIGENCY V4

> **Sovereign, fully independent, multimodal — 128B parameters.**
> A globally competitive Turkish-first AI model: world-leading on Turkish
> reading comprehension and natural-language inference, frontier-level on
> grade-school math and scientific reasoning, and KVKK-resident.

[**🇹🇷 Türkçe README**](#türkçe) · [**🇬🇧 English README**](#english) · [**📄 Whitepaper (EN)**](https://github.com/ecloud-bh/aigency-v4-whitepaper/blob/main/AIGENCY-V4-Whitepaper-EN.pdf) · [**📄 Whitepaper (TR)**](https://github.com/ecloud-bh/aigency-v4-whitepaper/blob/main/AIGENCY-V4-Whitepaper-TR.pdf) · [**🌐 Try the demo**](https://huggingface.co/spaces/aigencydev/AIGENCY-V4-Demo) · [**🔗 API**](https://aigency.dev)

---

## English

### Model summary

**AIGENCY V4** is the multimodal successor to AIGENCY V3, developed by
**eCloud Yazılım Teknolojileri** and released to production in Q2 2026.
The model retains V3's four sovereignty principles — zero external parameter
dependency, sovereign data residency, transparent architectural documentation,
and Turkish morphological context fidelity — and adds a sovereign 8B-parameter
vision encoder for image, document, chart, and visual-math understanding.

| | |
|---|---|
| **Total parameters** | 128B (120B core + 8B vision encoder) |
| **Architecture** | Sovereign decoder-only transformer + side vision encoder |
| **Optimisations** | Adaptive LoRA+, Selective Layer Collapse, Localised MoE, 4-bit block quantization, chunked attention |
| **Context window** | 278K tokens (HBM 3-tier: STM 4k / ITM 64k / LTM 278k) |
| **Active inference memory** | ~6.5 GB GPU under 4-bit quantization |
| **Languages** | Turkish (primary), English |
| **Modalities** | Text, image (one image per request, 30 MB max, image/* MIME) |
| **Release version** | 1.0 production |
| **Release date** | April 2026 |
| **Licence** | API-only commercial — see https://aigency.dev/license |

### Distribution

**Weights are not distributed.** AIGENCY V4 is accessed exclusively through
the eCloud production API at `https://aigency.dev/api/v2`. This page provides
the architectural specification, the evaluation methodology, and the full
benchmark results. To try the model interactively, use the
[demo Space](https://huggingface.co/spaces/aigencydev/AIGENCY-V4-Demo). For
production access, see [aigency.dev](https://aigency.dev).
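
For quick experimentation against the API base URL above, here is a minimal request-builder sketch. Note that the `/chat` path, the `aigency-v4` model identifier, and the payload fields are illustrative assumptions, not the documented schema — consult [aigency.dev](https://aigency.dev) for the authoritative API reference.

```python
import json
import urllib.request

API_BASE = "https://aigency.dev/api/v2"  # production base URL from this card


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build a chat-style POST request.

    CAUTION: the endpoint path and payload schema below are hypothetical
    placeholders for illustration; the real schema is documented at
    https://aigency.dev.
    """
    payload = {
        "model": "aigency-v4",  # hypothetical model identifier
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }
    return urllib.request.Request(
        f"{API_BASE}/chat",  # hypothetical path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request is then a single `urllib.request.urlopen(build_request(...))` call; separating request construction from transport keeps the payload easy to inspect and unit-test offline.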

### Evaluation

A comprehensive single-session evaluation was run against the production API
on **27 April 2026**: **13,344 real API calls** across **22 distinct
benchmarks**. Every result is reported with a Wilson 95% confidence interval
and an open dataset identifier, using deterministic subsampling (seed=42)
wherever a benchmark was subsampled.
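
The exact harness lives in the benchmark repository linked below; as a minimal illustration of what seed-42 deterministic subsampling means in practice (the function name is ours, not taken from that repo):

```python
import random


def deterministic_subsample(items, n, seed=42):
    """Return a reproducible size-n subsample of `items`.

    Using an isolated random.Random(seed) instance (rather than the global
    RNG) means the same seed always selects the same items, so reported
    scores and confidence intervals can be re-derived exactly.
    """
    rng = random.Random(seed)  # isolated RNG; global state untouched
    return rng.sample(list(items), min(n, len(items)))


# The same call yields the same selection every time:
a = deterministic_subsample(range(10000), 1000)
b = deterministic_subsample(range(10000), 1000)
assert a == b and len(a) == 1000
```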

#### Tier 1 — Critical benchmarks (full set)

| Benchmark | Accuracy | Wilson 95% CI | n | Errors |
|---|---|---|---|---|
| HumanEval (pass@1) | **0.8415** | [0.778, 0.889] | 164/164 | 0 |
| IFEval (strict) | **0.8022** | [0.767, 0.834] | 541/541 | 1 |
| GPQA Diamond | 0.3788 | [0.314, 0.448] | 198/198 | 0 |
| Belebele-TR | **0.8733** | [0.850, 0.893] | 900/900 | 0 |
| ARC-Challenge | **0.9488** | [0.935, 0.960] | 1172/1172 | 0 |
| TruthfulQA MC1 | **0.7638** | [0.734, 0.792] | 817/817 | 0 |
| GSM8K | **0.9462** | [0.933, 0.957] | 1319/1319 | 0 |
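
The Wilson intervals in these tables can be re-derived from the raw counts. For example, a HumanEval pass@1 of 0.8415 over n=164 corresponds to 138 correct completions; the sketch below (standalone, not the harness code) reproduces that row's [0.778, 0.889]:

```python
import math


def wilson_ci(correct: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (95% at z=1.96)."""
    p = correct / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return centre - half, centre + half


# HumanEval row: 138/164 correct -> pass@1 = 0.8415
lo, hi = wilson_ci(138, 164)
print(round(lo, 3), round(hi, 3))  # 0.778 0.889
```

Unlike the naive normal approximation, the Wilson interval stays inside [0, 1] and behaves sensibly at small n, which matters for the n=24 and n=30 multimodal rows further down.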

#### Tier 2 — Mid-volume benchmarks

| Benchmark | Accuracy | Wilson 95% CI | n |
|---|---|---|---|
| MMLU (stratified) | **0.8010** | [0.775, 0.825] | 1000/1000 |
| MMLU-Pro | 0.5020 | [0.471, 0.533] | 1000/1000 |
| HellaSwag | **0.8860** | [0.865, 0.904] | 1000/1000 |
| WinoGrande XL | 0.7466 | [0.722, 0.770] | 1267/1267 |
| HumanEval+ (extended) | **0.7988** | [0.731, 0.853] | 164/164 |
| MBPP (sanitized) | **0.8482** | [0.799, 0.887] | 257/257 |
| MBPP+ | **0.7804** | [0.736, 0.819] | 378/378 |
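
The pass@1 figures above are single-sample scores: the fraction of tasks whose one completion passes all tests. For context, the standard unbiased pass@k estimator (shown below as a sketch; this card does not state which estimator the harness uses) reduces to exactly that fraction when one sample is drawn per task:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate.

    Probability that at least one of k samples drawn without replacement
    from n generations (c of which are correct) passes. With n == 1 this
    collapses to the plain success indicator.
    """
    if n - c < k:
        return 1.0  # fewer incorrect samples than draws: success guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)


# One sample per task: pass@1 is just the success fraction.
assert pass_at_k(1, 1, 1) == 1.0
assert pass_at_k(5, 2, 1) == 2 / 5
```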

#### Tier 3-A — Turkish (de facto global reference for Turkish)

| Benchmark | Accuracy | Wilson 95% CI | n |
|---|---|---|---|
| Belebele-TR | **0.8733** | [0.850, 0.893] | 900/900 |
| TQuAD (F1 ≥ 0.5) | **0.8240** | [0.788, 0.855] | 500/500 |
| TR-MMLU | **0.7080** | [0.667, 0.746] | 500/500 |
| XNLI-TR | **0.7340** | [0.694, 0.771] | 500/500 |
| TR Grammar (synthetic) | **0.7900** | [0.700, 0.858] | 100/100 |

> Frontier models do not consistently publish Turkish-specific scores;
> among the evaluations that are published, AIGENCY V4 stands as the
> **de facto Turkish reference**.
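
The TQuAD row counts an answer as correct when its token-level F1 against the gold span reaches 0.5. A minimal sketch of that metric (simple lowercase whitespace tokenisation; the actual harness may normalise punctuation and articles more aggressively):

```python
from collections import Counter


def token_f1(prediction: str, gold: str) -> float:
    """SQuAD-style token-overlap F1 between two answer strings."""
    pred_toks = prediction.lower().split()
    gold_toks = gold.lower().split()
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)


# Under the card's rule, an item scores as correct when token_f1 >= 0.5:
assert token_f1("mustafa kemal atatürk", "atatürk") >= 0.5
```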

#### Tier 3-B — Multimodal (first production release)

| Benchmark | Accuracy | Wilson 95% CI | n |
|---|---|---|---|
| MMMU (val) | 0.5333 | [0.361, 0.698] | 30/30 |
| ChartQA (relaxed) | 0.6768 | [0.634, 0.717] | 492/500 |
| DocVQA (ANLS ≥ 0.5) | 0.7917 | [0.595, 0.908] | 24 |
| MathVista (testmini) | 0.3413 | [0.280, 0.408] | 208 |
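
The DocVQA row uses ANLS with a 0.5 threshold: per-answer similarity is 1 minus the normalised Levenshtein distance, zeroed out when it falls below the threshold. A self-contained sketch (lowercasing and whitespace-stripping are our simplifying assumptions):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (two-row variant)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def anls(prediction: str, gold: str, tau: float = 0.5) -> float:
    """Normalised Levenshtein similarity for one answer pair.

    Returns 1 - NL(pred, gold), clipped to 0 when below threshold tau,
    as in the DocVQA ANLS metric.
    """
    p, g = prediction.strip().lower(), gold.strip().lower()
    if not p and not g:
        return 1.0
    score = 1.0 - levenshtein(p, g) / max(len(p), len(g))
    return score if score >= tau else 0.0
```

An exact match scores 1.0, a near-miss keeps partial credit, and anything with similarity below 0.5 contributes nothing — which is why the card reports the metric as "ANLS ≥ 0.5".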

### Comparison with frontier models (April 2026)

| Benchmark | AIGENCY V4 | GPT-5 | Claude 4.6/4.7 | Gemini 3 Pro |
|---|---|---|---|---|
| GSM8K | **94.62** | 96.8 | ~96 | ~94 |
| ARC-Challenge | **94.88** | ~96 | ~96 | ~95 |
| HumanEval | 84.15 | 94.0 | 95.0 | 89.7 |
| MMLU | 80.10 | 94.2 | 88-93 | 92.4 |
| MMLU-Pro | 50.20 | ~85 | ~84 | ~81 |
| GPQA Diamond | 37.88 | 88-94 | 91.3-94.2 | 91.9 |
| MMMU | 53.33 | 79.1 | 84.1 | — |

V4 is **at frontier level on grade-school math and scientific reasoning**,
**upper-mid frontier on code generation**, **lower-mid frontier on general
academic knowledge and instruction following**, and **in active development on
graduate-level expert knowledge and multimodal understanding**. The V4.1
roadmap (Q4 2026) targets MMLU-Pro 0.65, GPQA Diamond 0.55, and an average
latency of 4 s.

### Operational performance (single session, 27 April 2026)

- Total API calls: 13,344
- Persistent error rate: 0.3%
- Latency: average 9.55 s · p50 4.39 s · p95 32.77 s · p99 33.59 s
- V4.1 latency target: average ≤ 4 s · p95 ≤ 15 s

### Reproducibility

The full evaluation harness, raw responses, scored items, summary JSON, and
the deterministic subsample seed are available at:

- **Benchmark code**: https://github.com/ecloud-bh/aigency-benchmarks
- **Evaluation results dataset**: https://huggingface.co/datasets/aigencydev/aigency-v4-evaluation
- **Whitepaper (EN/TR)**: https://github.com/ecloud-bh/aigency-v4-whitepaper

### Intended use

**Primary deployment domains:**

1. Public-sector and government workloads requiring KVKK residency
2. Legal and legal-tech (statute search, contract analysis — Tural model integration)
3. Education and higher education (Turkish academic writing, exam preparation, course assistants)
4. Banking, finance, and insurance (Turkish-heavy KYC/AML)
5. Healthcare administrative workloads (KVKK-compliant document handling)
6. Media, publishing, and editorial (Turkish grammar precision)
7. Defence and critical infrastructure (sovereign architecture)
8. Software, R&D, and engineering (code generation, large-codebase analysis)

**Out of scope or not recommended:**

- Clinical diagnosis or medical advice (administrative use only)
- Autonomous critical decisions without human review
- Graduate-level scientific research where GPQA-Diamond-class accuracy is required (use a frontier model + V4 hybrid)
- High-fidelity multimodal reasoning where MMMU > 75 is required (await V4.1)

### Safety and compliance

- KVKK §5 / §12 (Turkish PDPA) compliant — KVKK-resident hosting (TR data centre)
- ISO/IEC 27001 — IT-ISMS with risk and control matrix
- NIST SP 800-207 (Zero Trust) — mTLS, least privilege, continuous monitoring
- EU AI Act — high-risk classification with model card
- Memory encryption: AES-256-XTS (RAM), ChaCha20-Poly1305 (LTM disk)
- Image cache: AES-256-GCM, 30 MB limit, 24 h TTL
- Pre-encoding visual safety filter plus post-encoding output check

### Known limitations

1. **GPQA Diamond / MMLU-Pro gap** — 35-50 pp behind frontier; graduate-level expert knowledge is a V4.1 target.
2. **First-generation multimodal** — the vision encoder is 8B; V4.1 plans to scale it to 16B.
3. **Latency 2-3× frontier** — vision-encoder overhead and the multimodal safety filter; V4.1 targets an average of ≤ 4 s.
4. **Small multimodal subsamples** — DocVQA n=24, MMMU n=30 (HF cache constraints); the confidence intervals are correspondingly wide.
5. **Non-Turkish multilingual evaluation not yet published** — the global-scale claim is currently Turkish-anchored.

### Citation

```bibtex
@techreport{aigency-v4-2026,
  title       = {AIGENCY V4: Sovereign, Fully Independent and Multimodal 128B-Parameter AI Architecture},
  author      = {{eCloud Yaz{\i}l{\i}m Teknolojileri}},
  year        = {2026},
  month       = apr,
  institution = {eCloud Yaz{\i}l{\i}m Teknolojileri},
  url         = {https://github.com/ecloud-bh/aigency-v4-whitepaper},
  note        = {Whitepaper v1.0, April 2026}
}
```

---

## Türkçe

### Model summary

**AIGENCY V4** is a 128-billion-parameter sovereign AI model developed by
eCloud Yazılım Teknolojileri as the multimodal successor to V3, moved to
production in Q2 2026. It preserves V3's four independence principles (zero
external parameters, local data sovereignty, transparent documentation, and
Turkish context fidelity) and extends them with an 8B-parameter sovereign
vision encoder that adds image understanding, document question answering,
chart interpretation, and visual mathematics.

| | |
|---|---|
| **Total parameters** | 128B (120B core + 8B vision encoder) |
| **Architecture** | Sovereign decoder-only transformer + side vision encoder |
| **Optimisations** | Adaptive LoRA+, Selective Layer Collapse, L-MoE, 4-bit block quantization, chunked attention |
| **Context window** | 278K tokens (HBM 3-tier: STM 4k / ITM 64k / LTM 278k) |
| **Active inference memory** | ~6.5 GB GPU under 4-bit quantization |
| **Languages** | Turkish (primary), English |
| **Modalities** | Text, image (one image per request, max 30 MB, image/* MIME) |
| **Version** | 1.0 production |
| **Release date** | April 2026 |
| **Licence** | API-only commercial — https://aigency.dev/license |

### Distribution

**Weights are not shared on HuggingFace.** Access to AIGENCY V4 is provided
exclusively through `https://aigency.dev/api/v2`. This page presents the
architectural specification, the evaluation methodology, and the full
benchmark results. To try the model interactively, use the
[demo Space](https://huggingface.co/spaces/aigencydev/AIGENCY-V4-Demo).
For production access: [aigency.dev](https://aigency.dev).

### Positioning in one sentence

AIGENCY V4 is a fully independent, KVKK-resident sovereign AI model that is
world-leading in Turkish reading comprehension and natural-language
inference, at global frontier level in scientific reasoning and grade-school
mathematics, in the upper-mid frontier segment in code generation, and in
active development on multimodal and graduate-level expert knowledge.

### Target deployment domains

1. Public sector and government institutions (KVKK requirement)
2. Legal and legal-tech (statute search, contract analysis)
3. Education and higher education (Turkish academic writing, exam preparation)
4. Banking, finance, and insurance (Turkish-heavy KYC/AML)
5. Healthcare administrative workloads (KVKK-compliant document handling)
6. Media, publishing, and editorial (Turkish grammar precision)
7. Defence and critical infrastructure (sovereign architecture)
8. Software, R&D, and engineering

### Known limitations

1. GPQA Diamond / MMLU-Pro are 35-50 pp behind frontier — a V4.1 target.
2. First production release of multimodal — a 16B vision encoder is planned for V4.1.
3. Latency is 2-3× frontier — V4.1 targets an average of ≤ 4 s.
4. Multimodal subsamples are small (DocVQA n=24, MMMU n=30); CIs are wide.
5. A non-Turkish multilingual profile has not been published — the global claim is currently TR-anchored.

### Citation

```bibtex
@techreport{aigency-v4-2026-tr,
  title       = {AIGENCY V4: Yerli, Tam Ba{\u g}{\i}ms{\i}z ve Multimodal 128B Parametreli Yapay Zek\^a Mimarisi},
  author      = {{eCloud Yaz{\i}l{\i}m Teknolojileri}},
  year        = {2026},
  month       = apr,
  institution = {eCloud Yaz{\i}l{\i}m Teknolojileri},
  url         = {https://github.com/ecloud-bh/aigency-v4-whitepaper}
}
```

---

## License

AIGENCY V4 is offered under the **eCloud AIGENCY Commercial Licence** (API-only).
Model weights are not redistributed. The accompanying whitepaper is licensed
under **CC BY-ND 4.0**, and the benchmark code is licensed under **MIT**.

For commercial use, partnership, or research collaboration:
**info@e-cloud.web.tr · ai@aigency.dev** · https://aigency.dev

© 2026 eCloud Yazılım Teknolojileri.
|