CaptAdel — Fly GACA

Status: in development. This repository is the future home of the GACAR retrieval embedding model that powers Captain Adel, Fly GACA's independent, educational AI flight instructor for Saudi civil aviation. Model weights are not published here yet.

What Captain Adel is

Captain Adel is a retrieval-augmented assistant: it answers GACAR questions from a curated corpus, cites the relevant Part, and refuses rather than guessing when it cannot ground an answer. Its answers come from retrieval over the source regulations, not from a model that has memorised them. This repo holds the retrieval piece: a bilingual (Arabic / English) embedding model fine-tuned to find the right regulation for a query.

It is a retrieval component, not a knowledge store. It does not "know" or generate regulations.
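The "refuse rather than guess" behaviour can be pictured as a similarity cutoff on the retrieval results. The following is a minimal sketch under stated assumptions, not the Fly GACA pipeline: the function name `retrieve_or_refuse`, the threshold of 0.5, and the toy 2-D vectors are all hypothetical, and embeddings are assumed to be L2-normalised.

```python
import numpy as np

# Hypothetical cutoff for illustration; a real value would need tuning on held-out queries.
REFUSAL_THRESHOLD = 0.5


def retrieve_or_refuse(query_emb, passage_embs, passages,
                       threshold=REFUSAL_THRESHOLD, top_k=3):
    """Return up to top_k (passage, score) pairs at or above threshold, or None to refuse."""
    # Embeddings are assumed L2-normalised, so the dot product is cosine similarity.
    scores = passage_embs @ query_emb
    order = np.argsort(-scores)[:top_k]  # indices of the best-scoring passages first
    hits = [(passages[i], float(scores[i])) for i in order if scores[i] >= threshold]
    return hits or None  # an empty list means nothing grounds the answer


# Toy 2-D unit vectors standing in for real embeddings.
passages = ["GACAR Part 61 ...", "GACAR Part 67 ..."]
passage_embs = np.array([[1.0, 0.0], [0.0, 1.0]])

grounded = retrieve_or_refuse(np.array([0.6, 0.8]), passage_embs, passages)
refused = retrieve_or_refuse(np.array([-1.0, 0.0]), passage_embs, passages)
```

Here `grounded` holds both passages ranked by score, while `refused` is `None` because no passage clears the cutoff. A downstream generator would answer only in the first case and cite the returned passages.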

Unofficial & educational

Fly GACA is independent and not affiliated with, endorsed by, or operated by the General Authority of Civil Aviation (GACA). Nothing here is official, legal advice, or for operational use. The authoritative source for any regulation is always GACA — https://gaca.gov.sa. Captain Adel does not provide real-time weather, NOTAMs, clearances, or airworthiness/medical decisions.

Intended use (once weights are published)

  • ✅ Embedding GACAR / aviation text and queries for semantic retrieval (EN/AR)
  • ✅ Powering the Fly GACA RAG pipeline

Out of scope:

  • ❌ Not a source of truth for regulations
  • ❌ Not official
  • ❌ Not for operational decisions

Usage

Note: this snippet will work once weights are published.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("flygaca/CaptAdel")

# If the base model requires prefixes (e.g. E5-style), keep them:
query = "query: What are the medical requirements for a Part 61 PPL?"
passages = [
    "passage: GACAR Part 67 sets out the medical standards ...",
    "passage: GACAR Part 61 covers certification of pilots ...",
]

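# Normalised embeddings make the dot product below equal to cosine similarity.
q_emb = model.encode(query, normalize_embeddings=True)
p_emb = model.encode(passages, normalize_embeddings=True)

# One score per passage; higher means closer to the query.
scores = q_emb @ p_emb.T
print(scores)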

Model specifications

| Property | Value |
| --- | --- |
| Base model | TBD |
| Embedding dimension | TBD |
| Max sequence length | TBD |
| Query/passage prefix required? | TBD (e.g. query: / passage:) |
| Similarity function | Cosine |
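As a small illustration of the cosine similarity function listed above: for L2-normalised vectors, cosine similarity reduces to a plain dot product, which is why normalising at encode time lets a matrix product serve as the scoring step. The helper names and the 2-D vectors below are illustrative assumptions only.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    return v / np.linalg.norm(v)


a = np.array([3.0, 4.0])
b = np.array([4.0, 3.0])

cos = cosine_similarity(a, b)
# After normalisation the dot product gives the same value.
dot_of_unit = float(np.dot(l2_normalize(a), l2_normalize(b)))
```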

Evaluation

The model is evaluated with the Fly GACA retrieval harness against a held-out bilingual query set. Metrics are reported per language, since cross-lingual retrieval quality is typically asymmetric.

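| Model | Lang | Recall@1 | Recall@5 | MRR@10 | nDCG@10 |
| --- | --- | --- | --- | --- | --- |
| CaptAdel | EN | TBD | TBD | TBD | TBD |
| CaptAdel | AR | TBD | TBD | TBD | TBD |
| Base model (baseline) | EN | TBD | TBD | TBD | TBD |
| Base model (baseline) | AR | TBD | TBD | TBD | TBD |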
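For readers unfamiliar with the metrics in the table, the sketch below shows one common way to compute Recall@k, MRR@10 and nDCG@10 with binary relevance, then average per language. It is not the Fly GACA retrieval harness: the function names, passage ids, and the three example runs are hypothetical, and each query is assumed to have at least one relevant passage.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant passages that appear in the top k results."""
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def mrr_at_k(ranked_ids, relevant_ids, k=10):
    """Reciprocal rank of the first relevant passage within the top k, else 0."""
    for rank, pid in enumerate(ranked_ids[:k], start=1):
        if pid in relevant_ids:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """nDCG with binary relevance: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, pid in enumerate(ranked_ids[:k], start=1)
              if pid in relevant_ids)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant_ids), k) + 1))
    return dcg / ideal


# Hypothetical runs: (language, ranked passage ids, set of relevant passage ids).
runs = [
    ("EN", ["p61", "p67", "p91"], {"p61"}),
    ("EN", ["p91", "p67", "p61"], {"p67"}),
    ("AR", ["p67", "p61", "p91"], {"p91"}),
]

# Group per-query MRR by language, then average within each language.
per_lang = {}
for lang, ranked, relevant in runs:
    per_lang.setdefault(lang, []).append(mrr_at_k(ranked, relevant))
mean_mrr = {lang: sum(vals) / len(vals) for lang, vals in per_lang.items()}
```

Keeping the averages separate per language, rather than pooling all queries, is what makes an EN/AR gap visible.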

Training data

Fine-tuned on query–passage pairs derived from the curated Fly GACA corpus (the 74 GACAR Parts, topical handbooks and the reference shelf). The source regulations remain GACA's; this model is trained only to retrieve over them.

  • Corpus scope: GACAR / AIP source regulations (EN + AR)
  • Pair mining: TBD (describe how query–passage pairs were generated)
  • Language balance: TBD
  • Chunking: TBD (chunk size / overlap)
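Since the chunk size and overlap are still TBD, the following is only a generic sketch of overlapping-window chunking, not the project's method. The function name `chunk_words` and its default sizes are hypothetical, and it counts whitespace-separated words, whereas a real pipeline would more likely count tokenizer tokens.

```python
def chunk_words(text, chunk_size=200, overlap=50):
    """Split text into windows of chunk_size words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


# Ten placeholder words, chunked into windows of 4 with 2 words of overlap.
sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, chunk_size=4, overlap=2)
```

The overlap keeps a sentence that straddles a boundary intact in at least one chunk, at the cost of some duplicated text in the index.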

Limitations & bias

  • May underperform on out-of-domain queries (non-GACAR aviation topics).
  • Arabic performance may vary between Modern Standard Arabic and dialectal phrasing.
  • Retrieval reflects the corpus snapshot it was trained on — it will not be aware of amended or superseded regulations published after that snapshot.
  • It retrieves; it does not verify currency. Always confirm against the latest official GACA publication.

Versioning

| Version | Corpus snapshot | Base model | Notes |
| --- | --- | --- | --- |
| TBD | TBD | TBD | Initial release |

Links


© Fly GACA · Independent of GACA · Made in the Kingdom · صُنع في السعودية
