Update Open Structure release with v8 instruction benchmark wall


Files changed (7)

README.md +14 -0
docs/AETHON_OPEN_STRUCTURE_HF_MODEL_CARD.md +14 -0
docs/AETHON_OPEN_STRUCTURE_RUNTIME.md +11 -5
examples/aethon_open_structure_python.py +22 -0
runtime/aethon/rfi_query.py +11 -0
runtime/aethon/rfi_runtime.py +185 -2
runtime/aethon/rfi_surface.py +4 -1

README.md

@@ -210,6 +210,7 @@ The intended public experience is model-like:
 - load the bundle
 - create a runtime object from the shipped release
 - call `ask(...)`
 - get natural text back
 ```python
@@ -222,6 +223,16 @@ try:
         "Where is the notebook now, and explain the reasoning clearly."
     )
     print(reply.text)
 finally:
     model.close()
 ```
@@ -280,6 +291,7 @@ People should be able to ask in their own words.
 | --- | --- | --- | --- |
 | `aethon_n1_benchmark_v6.jsonl` | `43 / 43` | `1.0` | `3.476s` |
 | `aethon_n1_benchmark_v7.jsonl` | `15 / 15` | `1.0` | `18.488s` |
 ### What This Wall Covers
@@ -291,6 +303,8 @@ People should be able to ask in their own words.
 - open-grounded answers on unseen prompts
 - religion transfer under fresh setup facts
 - instruction-sensitive prompt checks
 ## One-Shot Data

 - load the bundle
 - create a runtime object from the shipped release
 - call `ask(...)`
+- or call `ask_messages([...])` for system-guided instruction following
 - get natural text back
 ```python
         "Where is the notebook now, and explain the reasoning clearly."
     )
     print(reply.text)
+    instructed = model.ask_messages(
+        [
+            {"role": "system", "content": "Answer in exactly three sentences and keep each sentence grounded."},
+            {
+                "role": "user",
+                "content": "Take this carefully and answer each part in one flowing response: where is Amina, what does regional launch depend on, and what is your tokenizer?",
+            },
+        ]
+    )
+    print(instructed.text)
 finally:
     model.close()
 ```
 | --- | --- | --- | --- |
 | `aethon_n1_benchmark_v6.jsonl` | `43 / 43` | `1.0` | `3.476s` |
 | `aethon_n1_benchmark_v7.jsonl` | `15 / 15` | `1.0` | `18.488s` |
+| `aethon_n1_benchmark_v8.jsonl` | `10 / 10` | `1.0` | `89.170s` |
 ### What This Wall Covers
 - open-grounded answers on unseen prompts
 - religion transfer under fresh setup facts
 - instruction-sensitive prompt checks
+- native system-guided instruction following
+- long mixed prompts with exact sentence-shape pressure
 ## One-Shot Data

docs/AETHON_OPEN_STRUCTURE_HF_MODEL_CARD.md

@@ -210,6 +210,7 @@ The intended public experience is model-like:
 - load the bundle
 - create a runtime object from the shipped release
 - call `ask(...)`
 - get natural text back
 ```python
@@ -222,6 +223,16 @@ try:
         "Where is the notebook now, and explain the reasoning clearly."
     )
     print(reply.text)
 finally:
     model.close()
 ```
@@ -280,6 +291,7 @@ People should be able to ask in their own words.
 | --- | --- | --- | --- |
 | `aethon_n1_benchmark_v6.jsonl` | `43 / 43` | `1.0` | `3.476s` |
 | `aethon_n1_benchmark_v7.jsonl` | `15 / 15` | `1.0` | `18.488s` |
 ### What This Wall Covers
@@ -291,6 +303,8 @@ People should be able to ask in their own words.
 - open-grounded answers on unseen prompts
 - religion transfer under fresh setup facts
 - instruction-sensitive prompt checks
 ## One-Shot Data

 - load the bundle
 - create a runtime object from the shipped release
 - call `ask(...)`
+- or call `ask_messages([...])` for system-guided instruction following
 - get natural text back
 ```python
         "Where is the notebook now, and explain the reasoning clearly."
     )
     print(reply.text)
+    instructed = model.ask_messages(
+        [
+            {"role": "system", "content": "Answer in exactly three sentences and keep each sentence grounded."},
+            {
+                "role": "user",
+                "content": "Take this carefully and answer each part in one flowing response: where is Amina, what does regional launch depend on, and what is your tokenizer?",
+            },
+        ]
+    )
+    print(instructed.text)
 finally:
     model.close()
 ```
 | --- | --- | --- | --- |
 | `aethon_n1_benchmark_v6.jsonl` | `43 / 43` | `1.0` | `3.476s` |
 | `aethon_n1_benchmark_v7.jsonl` | `15 / 15` | `1.0` | `18.488s` |
+| `aethon_n1_benchmark_v8.jsonl` | `10 / 10` | `1.0` | `89.170s` |
 ### What This Wall Covers
 - open-grounded answers on unseen prompts
 - religion transfer under fresh setup facts
 - instruction-sensitive prompt checks
+- native system-guided instruction following
+- long mixed prompts with exact sentence-shape pressure
 ## One-Shot Data

docs/AETHON_OPEN_STRUCTURE_RUNTIME.md

@@ -53,17 +53,16 @@ The recommended public shape is:
 1. pull the bundle
 2. construct a runtime object from the shipped release
 3. call `ask(...)`
-4. receive natural text back
-Starter example in this repo:
 - `examples/aethon_open_structure_python.py`
 - `run_aethon.py`
 - `runtime/aethon/...`
-The release now ships a portable bundle-native runtime pack.
-That runtime hides storage details behind a model-facing interface so developers interact with Aethon as a model rather than as a data store.
 ## Minimum Read Path
@@ -101,6 +100,13 @@ model = AethonOpenStructureModel.from_hub("OkeyMetaLtd/Aethon-N1-Base-Open-Struc
 try:
     reply = model.ask("Tell me what changed about Amina's location and explain it clearly.")
     print(reply.text)
 finally:
     model.close()
 ```

 1. pull the bundle
 2. construct a runtime object from the shipped release
 3. call `ask(...)`
+4. or call `ask_messages([...])` when a runtime wants system-style guidance
+5. receive natural text back
+Examples in this repo:
 - `examples/aethon_open_structure_python.py`
 - `run_aethon.py`
 - `runtime/aethon/...`
+These entry points expose Aethon as a model-facing runtime instead of a storage-facing interface.
 ## Minimum Read Path
 try:
     reply = model.ask("Tell me what changed about Amina's location and explain it clearly.")
     print(reply.text)
+    instructed = model.ask_messages(
+        [
+            {"role": "system", "content": "Answer in exactly two sentences."},
+            {"role": "user", "content": "Where is Amina now, and what does regional launch depend on?"},
+        ]
+    )
+    print(instructed.text)
 finally:
     model.close()
 ```
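
The `ask_messages([...])` call documented above takes a chat-style list. Going by the runtime change later in this commit, `system` and `developer` messages are joined into one guidance string, only the last non-empty `user` message is kept as the query, and other roles are ignored. A minimal standalone sketch of that folding step follows; `fold_messages` is a hypothetical helper name used for illustration and is not part of the shipped runtime.

```python
def fold_messages(messages: list[dict[str, str]]) -> tuple[str, str]:
    """Reduce a chat-style message list to (guidance, user_query).

    Hypothetical helper mirroring the role handling shown in this commit.
    """
    system_parts: list[str] = []
    user_query = ""
    for message in messages:
        role = str(message.get("role", "")).strip().lower()
        content = str(message.get("content", "")).strip()
        if not content:
            continue  # empty messages are skipped entirely
        if role in {"system", "developer"}:
            system_parts.append(content)
        elif role == "user":
            user_query = content  # a later user turn overwrites an earlier one
        # any other role (for example "assistant") is ignored
    return " ".join(system_parts), user_query


guidance, query = fold_messages(
    [
        {"role": "system", "content": "Answer in exactly two sentences."},
        {"role": "user", "content": "Where is Amina now?"},
        {"role": "assistant", "content": "Amina is in Lagos."},
        {"role": "user", "content": "What does regional launch depend on?"},
    ]
)
print(guidance)  # Answer in exactly two sentences.
print(query)     # What does regional launch depend on?
```

One consequence worth keeping in mind: earlier user turns are not carried forward as context, so a multi-turn conversation is answered from the final user message alone.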

examples/aethon_open_structure_python.py

@@ -64,6 +64,17 @@ class AethonOpenStructureModel:
             mode=response.mode,
         )
     def learn(self, text: str) -> dict[str, object]:
         return self._runtime.learn(text)
@@ -88,5 +99,16 @@ if __name__ == "__main__":
                 for step in reply.reasoning:
                     print(f"  - {step}")
             print()
     finally:
         model.close()

             mode=response.mode,
         )
+    def ask_messages(self, messages: list[dict[str, str]]) -> AethonOpenStructureResponse:
+        response = self._runtime.ask_messages(messages)
+        return AethonOpenStructureResponse(
+            answer=response.answer,
+            text=response.text,
+            explanation=response.explanation,
+            proof=tuple(response.proof),
+            reasoning=tuple(response.reasoning),
+            mode=response.mode,
+        )
     def learn(self, text: str) -> dict[str, object]:
         return self._runtime.learn(text)
                 for step in reply.reasoning:
                     print(f"  - {step}")
             print()
+        instructed = model.ask_messages(
+            [
+                {"role": "system", "content": "Answer in exactly three sentences and keep each sentence grounded."},
+                {
+                    "role": "user",
+                    "content": "Take this carefully and answer each part in one flowing response: where is Amina, what does regional launch depend on, and what is your tokenizer?",
+                },
+            ]
+        )
+        print("Instruction-following example:")
+        print(instructed.text)
     finally:
         model.close()
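
The example above passes "exactly three sentences" as a system instruction. The sketch below shows, in isolation, how that kind of constraint could be read into a small profile and merged with anything stated inline in the user message. It follows the logic of `_sentence_target_from_prompt` and `_merge_instruction_profiles` from the `rfi_runtime.py` part of this commit, but `InstructionProfile`, `profile_from`, and `merge` are illustrative names, not the runtime's API.

```python
import re
from dataclasses import dataclass
from typing import Optional

_COUNTS = {"one": 1, "two": 2, "three": 3, "1": 1, "2": 2, "3": 3}


@dataclass(frozen=True)
class InstructionProfile:
    sentence_target: Optional[int] = None
    bullet_points: bool = False
    answer_only: bool = False


def profile_from(prompt: str) -> InstructionProfile:
    """Read formatting constraints out of free text (illustrative only)."""
    lowered = prompt.lower()
    match = re.search(r"(?:exactly\s+)?(one|two|three|1|2|3)\s+sentences?", lowered)
    return InstructionProfile(
        sentence_target=_COUNTS[match.group(1)] if match else None,
        bullet_points="bullet points" in lowered or "bullets" in lowered,
        answer_only=any(
            phrase in lowered
            for phrase in ("only the final answer", "answer only", "final answer only")
        ),
    )


def merge(system: InstructionProfile, inline: InstructionProfile) -> InstructionProfile:
    """Inline sentence counts override the system count; boolean flags are OR-ed."""
    return InstructionProfile(
        sentence_target=(
            inline.sentence_target
            if inline.sentence_target is not None
            else system.sentence_target
        ),
        bullet_points=system.bullet_points or inline.bullet_points,
        answer_only=system.answer_only or inline.answer_only,
    )


system = profile_from("Answer in exactly three sentences and keep each sentence grounded.")
inline = profile_from("Where is Amina now? Use bullet points.")
merged = merge(system, inline)
print(merged)
```

Only sentence counts from one to three are recognized, matching the pattern in the commit. When several flags end up set, the runtime's `_apply_instruction_profile` checks `answer_only` first, then `bullet_points`, then `sentence_target`, so the merged profile above would be rendered as bullets rather than as three sentences.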

runtime/aethon/rfi_query.py

@@ -211,6 +211,10 @@ class ProofQueryEngine:
             location = self._direct_or_abstract(parsed.subject, "located_in")
             if location is not None:
                 return location
             carried = self._infer_carried_object_location(parsed.subject)
             if carried is not None:
                 return carried
@@ -996,6 +1000,9 @@ class ProofQueryEngine:
             if lower_core in self._PROTECTED_QUERY_TOKENS:
                 corrected.append(token)
                 continue
             if lower_core in self.ontology.semantic_lexicon.typo_map:
                 replacement = self.ontology.semantic_lexicon.typo_map[lower_core]
                 if core[:1].isupper():
@@ -1071,6 +1078,10 @@ class ProofQueryEngine:
             "divided",
             "by",
         }
         for concept in self.graph.list_concepts():
             base_words.update(part for part in concept.split("_") if part)
             base_words.add(concept.replace("_", " "))

             location = self._direct_or_abstract(parsed.subject, "located_in")
             if location is not None:
                 return location
+            for relation in ("lives_in", "work_in", "study_in", "reached", "visited", "bought_in"):
+                direct_location = self._direct_or_abstract(parsed.subject, relation)
+                if direct_location is not None:
+                    return direct_location
             carried = self._infer_carried_object_location(parsed.subject)
             if carried is not None:
                 return carried
             if lower_core in self._PROTECTED_QUERY_TOKENS:
                 corrected.append(token)
                 continue
+            if lower_core in self.ontology.semantic_lexicon.alias_map:
+                corrected.append(token)
+                continue
             if lower_core in self.ontology.semantic_lexicon.typo_map:
                 replacement = self.ontology.semantic_lexicon.typo_map[lower_core]
                 if core[:1].isupper():
             "divided",
             "by",
         }
+        base_words.update(self.math._NUMBER_WORDS.keys())
+        base_words.update(self.ontology.semantic_lexicon.alias_map.keys())
+        for phrase in self.ontology.semantic_lexicon.phrase_alias_map.keys():
+            base_words.update(word for word in phrase.split() if word)
         for concept in self.graph.list_concepts():
             base_words.update(part for part in concept.split("_") if part)
             base_words.add(concept.replace("_", " "))
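
The location hunk above adds a fallback: when no `located_in` fact exists for a subject, the engine tries `lives_in`, `work_in`, `study_in`, `reached`, `visited`, and `bought_in` in that order before falling back to carried-object inference. A toy sketch of that ordering over a plain dictionary follows; the dictionary stands in for the real graph lookup (`_direct_or_abstract`), and `resolve_location` is a hypothetical name.

```python
from typing import Optional

# Order copied from the diff; the first relation with a stored fact wins.
FALLBACK_RELATIONS = ("lives_in", "work_in", "study_in", "reached", "visited", "bought_in")


def resolve_location(facts: dict[tuple[str, str], str], subject: str) -> Optional[str]:
    """Return a location for `subject`, preferring an explicit `located_in` fact."""
    location = facts.get((subject, "located_in"))
    if location is not None:
        return location
    for relation in FALLBACK_RELATIONS:
        candidate = facts.get((subject, relation))
        if candidate is not None:
            return candidate
    return None  # the real engine continues with carried-object inference here


facts = {
    ("amina", "visited"): "accra",
    ("amina", "lives_in"): "lagos",
    ("notebook", "located_in"): "office",
}
print(resolve_location(facts, "amina"))     # lagos
print(resolve_location(facts, "notebook"))  # office
print(resolve_location(facts, "tunde"))     # None
```

Because the tuple order decides the winner, a subject with both `lives_in` and `visited` facts resolves to the `lives_in` value regardless of which fact was learned more recently.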

runtime/aethon/rfi_runtime.py

@@ -30,6 +30,13 @@ class NativeResponse:
     mode: str
 class AethonNativeBase:
     """The first real no-weight Aethon base runtime."""
@@ -233,9 +240,34 @@ class AethonNativeBase:
         return {"rows": rows, "facts": facts}
     def ask(self, query: str) -> NativeResponse:
         parts = self._split_query_parts(query)
         if len(parts) > 1:
-            responses = [self.ask(part) for part in parts]
             return NativeResponse(
                 answer=" | ".join(response.answer for response in responses),
                 text=" ".join(response.text for response in responses if response.text),
@@ -257,6 +289,157 @@ class AethonNativeBase:
             )
         return self._render(query, result)
     def inspect(self, text: str) -> list[dict[str, object]]:
         return self.codec.export_tokens(text)
@@ -293,7 +476,7 @@ class AethonNativeBase:
     def _split_query_parts(query: str) -> list[str]:
         parts: list[str] = []
         for part in re.split(
-            r"(?:\?\s+|\?\s*$|(?:\s+and\s+also\s+)|(?:\s+also\s+)|(?:\s*;\s*)|(?:\s+then\s+)|(?:\r?\n+))",
             query,
         ):
             cleaned = part.strip()

     mode: str
+@dataclass(frozen=True)
+class NativeInstructionProfile:
+    sentence_target: int | None = None
+    bullet_points: bool = False
+    answer_only: bool = False
 class AethonNativeBase:
     """The first real no-weight Aethon base runtime."""
         return {"rows": rows, "facts": facts}
     def ask(self, query: str) -> NativeResponse:
+        profile, cleaned_query = self._extract_instruction_profile(query)
+        response = self._ask_core(cleaned_query)
+        return self._apply_instruction_profile(response, profile)
+    def ask_messages(self, messages: list[dict[str, str]]) -> NativeResponse:
+        system_parts: list[str] = []
+        user_query = ""
+        for message in messages:
+            role = str(message.get("role", "")).strip().lower()
+            content = str(message.get("content", "")).strip()
+            if not content:
+                continue
+            if role in {"system", "developer"}:
+                system_parts.append(content)
+            elif role == "user":
+                user_query = content
+        if not user_query:
+            return self.ask("")
+        system_profile, _ = self._extract_instruction_profile(" ".join(system_parts))
+        inline_profile, cleaned_query = self._extract_instruction_profile(user_query)
+        profile = self._merge_instruction_profiles(system_profile, inline_profile)
+        response = self._ask_core(cleaned_query)
+        return self._apply_instruction_profile(response, profile)
+    def _ask_core(self, query: str) -> NativeResponse:
         parts = self._split_query_parts(query)
         if len(parts) > 1:
+            responses = [self._ask_core(part) for part in parts]
             return NativeResponse(
                 answer=" | ".join(response.answer for response in responses),
                 text=" ".join(response.text for response in responses if response.text),
             )
         return self._render(query, result)
+    def _apply_instruction_profile(self, response: NativeResponse, profile: NativeInstructionProfile) -> NativeResponse:
+        if not profile.answer_only and not profile.bullet_points and profile.sentence_target is None:
+            return response
+        text = response.text
+        explanation = response.explanation
+        if profile.answer_only:
+            text = self._answer_only_text(response)
+            explanation = text
+        elif profile.bullet_points:
+            text = self._bullet_text(response)
+            explanation = text
+        elif profile.sentence_target is not None:
+            text = self._reshape_to_sentence_target(response, profile.sentence_target)
+            explanation = text
+        return NativeResponse(
+            answer=response.answer,
+            text=text,
+            explanation=explanation,
+            proof=response.proof,
+            reasoning=response.reasoning,
+            mode=response.mode,
+        )
+    def _extract_instruction_profile(self, prompt: str) -> tuple[NativeInstructionProfile, str]:
+        lowered = prompt.lower()
+        sentence_target = self._sentence_target_from_prompt(lowered)
+        bullet_points = "bullet points" in lowered or "bullets" in lowered
+        answer_only = (
+            "only the final answer" in lowered
+            or "answer only" in lowered
+            or "final answer only" in lowered
+        )
+        cleaned = prompt.strip()
+        patterns = [
+            r"^(?:please\s+)?(?:answer|respond|write|use|give|provide)\s+(?:in\s+)?(?:exactly\s+)?(?:one|two|three|1|2|3)\s+sentences?\s*(?:and\s+)?",
+            r"^(?:please\s+)?(?:use|write|give|provide)\s+bullet\s+points?\s*(?:and\s+)?",
+            r"^(?:please\s+)?(?:answer|respond|write|give|provide)\s+with\s+only\s+the\s+final\s+answer[.:]?\s*",
+            r"^(?:please\s+)?(?:give|provide)\s+only\s+the\s+final\s+answer[.:]?\s*",
+        ]
+        for pattern in patterns:
+            cleaned = re.sub(pattern, "", cleaned, flags=re.IGNORECASE)
+        if ":" in cleaned:
+            lead, tail = cleaned.split(":", 1)
+            if any(token in lead.lower() for token in ("answer each part", "flowing response", "carefully", "respond")) and tail.strip():
+                cleaned = tail.strip()
+        lowered_cleaned = cleaned.lower()
+        if lowered_cleaned.startswith("explain why "):
+            explanation_subject = cleaned[len("explain why ") :].strip()
+            for marker in (" equals ", "="):
+                if marker in explanation_subject:
+                    explanation_subject = explanation_subject.split(marker, 1)[0].strip()
+                    break
+            if explanation_subject:
+                cleaned = f"solve {explanation_subject}"
+        return NativeInstructionProfile(sentence_target=sentence_target, bullet_points=bullet_points, answer_only=answer_only), cleaned.strip() or prompt.strip()
+    @staticmethod
+    def _merge_instruction_profiles(left: NativeInstructionProfile, right: NativeInstructionProfile) -> NativeInstructionProfile:
+        return NativeInstructionProfile(
+            sentence_target=right.sentence_target if right.sentence_target is not None else left.sentence_target,
+            bullet_points=left.bullet_points or right.bullet_points,
+            answer_only=left.answer_only or right.answer_only,
+        )
+    @staticmethod
+    def _sentence_target_from_prompt(prompt: str) -> int | None:
+        match = re.search(r"(?:exactly\s+)?(one|two|three|1|2|3)\s+sentences?", prompt)
+        if match is None:
+            return None
+        token = match.group(1)
+        mapping = {"one": 1, "two": 2, "three": 3, "1": 1, "2": 2, "3": 3}
+        return mapping.get(token)
+    def _answer_only_text(self, response: NativeResponse) -> str:
+        parts = [part.strip() for part in response.answer.split("|") if part.strip()]
+        if not parts:
+            return response.text.strip()
+        humanized = [self.surface._humanize(part) for part in parts]
+        if len(humanized) == 1:
+            return humanized[0]
+        return "; ".join(humanized)
+    def _bullet_text(self, response: NativeResponse) -> str:
+        parts = [part.strip() for part in response.answer.split("|") if part.strip()]
+        if parts:
+            return "\n".join(f"- {self.surface._sentence(self.surface._humanize(part)).strip()}" for part in parts)
+        sentences = self._split_sentences(response.text)
+        if not sentences:
+            sentences = [response.text.strip()]
+        return "\n".join(f"- {self.surface._sentence(sentence).strip()}" for sentence in sentences if sentence.strip())
+    def _reshape_to_sentence_target(self, response: NativeResponse, target: int) -> str:
+        parts = [part.strip() for part in response.answer.split("|") if part.strip()]
+        if parts:
+            part_sentences = [self.surface._sentence(self.surface._humanize(part)).strip() for part in parts]
+            if target == 1:
+                return self.surface._sentence("; ".join(self.surface._humanize(part) for part in parts)).strip()
+            if len(part_sentences) >= target:
+                return " ".join(part_sentences[:target])
+        if response.mode == "story":
+            story_sentences = [
+                self.surface._sentence(sentence).strip()
+                for sentence in (
+                    self.surface._proof_line_to_sentence(step) for step in response.proof
+                )
+                if sentence
+            ]
+            if len(story_sentences) < target:
+                story_sentences.extend(self._split_sentences(response.text))
+            if len(story_sentences) >= target:
+                return " ".join(story_sentences[-target:])
+        if response.mode == "plan":
+            answer_sentence = self.surface._sentence(self.surface._humanize(response.answer)).strip()
+            support_candidates = [sentence for sentence in self._candidate_sentences(response) if self.surface._humanize(response.answer).lower() not in sentence.lower()]
+            if target == 1:
+                return answer_sentence
+            selected = [answer_sentence]
+            selected.extend(support_candidates[: max(target - 1, 0)])
+            while len(selected) < target:
+                selected.append(answer_sentence)
+            return " ".join(selected[:target])
+        candidates = self._candidate_sentences(response)
+        if not candidates:
+            return response.text.strip()
+        if target == 1:
+            return self.surface._sentence(" ".join(sentence.rstrip(".!?") for sentence in candidates[:2])).strip()
+        selected = candidates[:target]
+        while len(selected) < target:
+            selected.append(self.surface._sentence(self.surface._humanize(response.answer)).strip())
+        return " ".join(selected[:target])
+    def _candidate_sentences(self, response: NativeResponse) -> list[str]:
+        candidates: list[str] = []
+        seen: set[str] = set()
+        for text in [response.text, response.explanation, *response.reasoning]:
+            for sentence in self._split_sentences(text):
+                normalized = sentence.strip()
+                if normalized and normalized not in seen:
+                    candidates.append(self.surface._sentence(normalized).strip())
+                    seen.add(normalized)
+        if response.answer and response.answer != "<unknown>":
+            normalized_answer = self.surface._humanize(response.answer)
+            if normalized_answer not in seen:
+                candidates.append(self.surface._sentence(normalized_answer).strip())
+        return candidates
+    @staticmethod
+    def _split_sentences(text: str) -> list[str]:
+        return [piece.strip() for piece in re.split(r"(?<=[.!?])\s+", text.strip()) if piece.strip()]
     def inspect(self, text: str) -> list[dict[str, object]]:
         return self.codec.export_tokens(text)
     def _split_query_parts(query: str) -> list[str]:
         parts: list[str] = []
         for part in re.split(
+            r"(?:\?\s+|\?\s*$|(?:\s+and\s+also\s+)|(?:\s+also\s+)|(?:\s*;\s*)|(?:\s+then\s+)|(?:,\s+(?=what|where|who|how|why|which|solve))|(?:\s+and\s+(?=what|where|who|how|why|which|solve))|(?:\r?\n+))",
             query,
         ):
             cleaned = part.strip()
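
The `_split_query_parts` change adds two boundaries: a comma followed by a question word, and `and` followed by a question word. The sketch below runs the old and new patterns, copied from this diff, over the multi-part prompt used in the README example. `split_parts` is a hypothetical stand-in for the runtime method. It also trims a trailing comma left behind by the `, and what` boundary, which is an assumption of this sketch, since the rest of the runtime's cleanup is not visible in the hunk.

```python
import re

OLD_PATTERN = (
    r"(?:\?\s+|\?\s*$|(?:\s+and\s+also\s+)|(?:\s+also\s+)|(?:\s*;\s*)"
    r"|(?:\s+then\s+)|(?:\r?\n+))"
)
NEW_PATTERN = (
    r"(?:\?\s+|\?\s*$|(?:\s+and\s+also\s+)|(?:\s+also\s+)|(?:\s*;\s*)|(?:\s+then\s+)"
    r"|(?:,\s+(?=what|where|who|how|why|which|solve))"
    r"|(?:\s+and\s+(?=what|where|who|how|why|which|solve))"
    r"|(?:\r?\n+))"
)


def split_parts(query: str, pattern: str) -> list[str]:
    """Split a compound question into sub-questions (illustrative only)."""
    parts: list[str] = []
    for part in re.split(pattern, query):
        # Assumption: drop a dangling comma such as "depend on," before ", and what".
        cleaned = part.strip().rstrip(",").strip()
        if cleaned:
            parts.append(cleaned)
    return parts


query = "where is Amina, what does regional launch depend on, and what is your tokenizer?"
old_parts = split_parts(query, OLD_PATTERN)
new_parts = split_parts(query, NEW_PATTERN)
print(old_parts)  # one part: the whole prompt without the trailing "?"
print(new_parts)
# ['where is Amina', 'what does regional launch depend on', 'what is your tokenizer']
```

The lookaheads list lowercase question words and the split is called without `re.IGNORECASE`, so a capitalized continuation such as `", What is ..."` is not treated as a boundary by the pattern as written.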

runtime/aethon/rfi_surface.py

@@ -457,13 +457,16 @@ class GraphVerbalizer:
             return []
         concepts = self._unknown_query_concepts(query)
         query_kind = self._unknown_query_kind(query)
         supports: list[str] = []
         seen: set[str] = set()
         for concept in concepts[:2]:
             edges = [
                 edge
                 for edge in self.graph.iter_outgoing_edges(concept)
-                if edge.is_active and self._edge_matches_unknown_query_kind(edge.relation, query_kind)
             ]
             edges.sort(key=lambda edge: (0 if edge.source_kind != "derived" else 1, -edge.edge_id))
             for edge in edges[:2]:

             return []
         concepts = self._unknown_query_concepts(query)
         query_kind = self._unknown_query_kind(query)
+        lowered_query = query.lower()
         supports: list[str] = []
         seen: set[str] = set()
         for concept in concepts[:2]:
             edges = [
                 edge
                 for edge in self.graph.iter_outgoing_edges(concept)
+                if edge.is_active and (
+                    "what changed about " in lowered_query or self._edge_matches_unknown_query_kind(edge.relation, query_kind)
+                )
             ]
             edges.sort(key=lambda edge: (0 if edge.source_kind != "derived" else 1, -edge.edge_id))
             for edge in edges[:2]:
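
The filter change above means that a query containing "what changed about " keeps every active outgoing edge, while other queries still keep only edges whose relation matches the inferred query kind. A small sketch of the filter and the unchanged sort order follows. `Edge`, `support_edges`, and the `matches_kind` callback are simplified stand-ins for the graph types and for `_edge_matches_unknown_query_kind`, which are not shown in this diff.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Edge:
    edge_id: int
    relation: str
    source_kind: str
    is_active: bool = True


def support_edges(
    edges: list[Edge], query: str, matches_kind: Callable[[str], bool]
) -> list[Edge]:
    """Pick up to two supporting edges for a query (illustrative only)."""
    keep_all = "what changed about " in query.lower()
    selected = [
        edge for edge in edges
        if edge.is_active and (keep_all or matches_kind(edge.relation))
    ]
    # Non-derived edges first, then newest (highest edge_id) first.
    selected.sort(key=lambda edge: (0 if edge.source_kind != "derived" else 1, -edge.edge_id))
    return selected[:2]


edges = [
    Edge(1, "located_in", "stated"),
    Edge(2, "owns", "derived"),
    Edge(3, "depends_on", "stated"),
    Edge(4, "located_in", "stated", is_active=False),
]
is_location = lambda relation: relation == "located_in"

where_ids = [e.edge_id for e in support_edges(edges, "Where is Amina?", is_location)]
changed_ids = [e.edge_id for e in support_edges(edges, "What changed about Amina?", is_location)]
print(where_ids)    # [1]
print(changed_ids)  # [3, 1]
```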