Spaces:
Sleeping
Sleeping
File size: 12,594 Bytes
9cb16b8 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 | # CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project
Passive OSINT Control Panel β a Gradio-based Hugging Face Space for drift-aware,
passive-first OSINT enrichment. Inputs are validated, sanitised, normalised, and
HMAC-hashed before any logging or enrichment. External-target interaction is
gated behind explicit authorization.
- **Entry point:** `app.py` (Gradio `demo.launch()`)
- **Runtime:** Python 3.11+, Gradio 6.13.x, HF Space SDK `gradio`
- **Version:** `0.1.0` (`APP_VERSION` in `app.py`, `__version__` in `osint_core/__init__.py`)
- **License:** Apache-2.0
- **HF Space front matter** lives at the top of `README.md` β do not strip it.
## Repository layout
```
.
βββ app.py # Monolithic Gradio UI + full pipeline (current prod)
βββ osint_core/ # Modular refactor of app.py's logic (preferred for new code)
β βββ __init__.py # Public API: validate_indicator, create_orchestrator, ...
β βββ validators.py # Input validation + normalisation (pure, no I/O)
β βββ policy.py # Module authorization boundary + correction verb gate
β βββ orchestrator.py # Agent-based workflow coordination
β βββ drift.py # DRIFT LAYER β currently PSEUDOCODE, not runnable (see below)
βββ agent/ # Claude-powered OSINT expert agent (NEW)
β βββ __init__.py # Exports OSINTAgent
β βββ osint_agent.py # OSINTAgent class β claude-opus-4-7, adaptive thinking, prompt caching
β βββ cli.py # argparse CLI: --target, --type, --iocs, --explain, interactive mode
βββ tests/
β βββ test_policy.py # Passes against osint_core.policy
β βββ test_orchestrator.py # Passes against osint_core.orchestrator
β βββ test_drift.py # Contract tests β FAIL until drift.py is implemented
βββ data/sources.yaml # OSINT source registry (subset; app.py has its own inline copy)
βββ policy.yaml # Declarative policy snapshot (mirrors osint_core.policy)
βββ manifest.json # Artifact manifest skeleton
βββ golden_tests.json # Smoke-test fixtures for classification
βββ requirements.txt # Python deps (pinned ranges)
βββ packages.txt # HF Space apt packages (dnsutils, whois, libmagic1)
βββ README.md # User/operator docs + HF Space config
```
Two layers coexist:
- `app.py` is self-contained and ships the running Space today.
- `osint_core/` is the modular refactor targeted by the test suite. New logic
should land in `osint_core/` and be wired into `app.py` rather than expanded
in `app.py` directly.
## Orchestrator agent architecture
The `osint_core/orchestrator.py` module provides an agent-based workflow
coordination system. It separates concerns into:
- **Agent**: `OrchestratorAgent` coordinates the full enrichment workflow
- **Skills**: High-level capabilities (e.g., "Resolve DNS", "Fetch WHOIS")
- **Tools**: Atomic external actions (e.g., DNS query, HTTP request, subprocess)
- **Execution context**: Tracks state through validation β policy β enrichment β drift β audit
### Key data structures
- `Tool`: Atomic capability (subprocess, network, file, computation)
- `Skill`: Composed capability with required indicator types and tools
- `ExecutionContext`: Per-request state (run ID, normalized indicator, policy eval, errors)
- `SkillResult`: Result from executing a skill (status, data, error, duration)
- `EnrichmentWorkflow`: Complete workflow result with all execution details
### Workflow execution pattern
```python
from osint_core import create_orchestrator
agent = create_orchestrator()
workflow = agent.execute_workflow(
raw_indicator="example.com",
indicator_type_hint="Domain",
requested_modules=["resource_links", "dns_records"],
authorized_target=False,
passive_only=True,
)
# workflow.validation_result β ValidationResult
# workflow.policy_evaluation β PolicyEvaluation
# workflow.skill_results β list[SkillResult]
# workflow.drift_vector β dict[str, float]
# workflow.correction_verb β "ADAPT" | "CONSTRAIN" | "REVERT" | "OBSERVE"
```
### Adding new skills
1. Define a `Tool` with type, description, auth requirements, timeout
2. Create a `Skill` that references the tool(s) and required indicator types
3. Register the skill in `SKILLS_REGISTRY` with a canonical name
4. Add skill execution logic in `OrchestratorAgent._execute_skill`
5. Add corresponding test in `tests/test_orchestrator.py`
Do not add skills directly to `app.py`; add them to `orchestrator.py` and wire
them through the orchestrator API.
## AI Agent (Claude-powered)
`agent/osint_agent.py` β `OSINTAgent` wraps `claude-opus-4-7` with adaptive thinking
and prompt caching (the ~2000-token system prompt is cached via `cache_control: ephemeral`,
cutting input costs ~90% after the first turn).
```python
from agent import OSINTAgent
agent = OSINTAgent() # reads ANTHROPIC_API_KEY from env
result = agent.analyze_target("example.com", analysis_type="passive") # or: full|threat|footprint|breach|darkweb|socmint
report = agent.generate_ioc_report(["1.2.3.4", "evil.com"])
```
CLI: `python -m agent.cli --target example.com --type passive`
or interactive: `python -m agent.cli`
Conversation history preserves full `response.content` blocks (including thinking blocks)
for correct multi-turn context propagation.
## Known state / critical gotchas
1. **`osint_core/drift.py` is pseudocode, not Python.** It uses `DEFINE`,
`FUNCTION`, `RETURN`, `FOR β¦ IN` as bare keywords and will raise
`SyntaxError` on import. `tests/test_drift.py` imports
`DriftAssessment`, `DriftSignal`, `DriftType`, `DriftVector`,
`TelemetrySnapshot`, `aggregate_signals`, `assess_drift`,
`choose_dominant_drift_type`, `estimate_confidence`, and
`recommend_correction` β these are not yet implemented in Python. Treat
`drift.py` as a spec to implement against the tests, not as working code
to edit in-place.
2. **Two module registries.** `app.py` hard-codes `OSINT_LINKS`,
`PASSIVE_MODULES`, `AUTHORIZED_ONLY_MODULES`. `osint_core/policy.py` has
the canonical `MODULE_POLICIES` registry plus `ALIASES`. When adding a
module, update the `osint_core` registry first; mirror into `app.py` only
if the UI needs it.
3. **Two validators.** `app.py` has inline `sanitize_text` /
`classify_and_normalize` / `validate_as_type`. `osint_core/validators.py`
is the stricter, structured replacement (`ValidationResult`,
`ValidationErrorCode`). Prefer `osint_core` for new code paths.
4. **`OSINT_HASH_SALT` is required.** `app.py:get_hash_salt` raises on
startup without it. For local/dev only, set `ALLOW_DEV_SALT=true`. Never
commit a salt value.
5. **Correction verbs are a closed set:** `ADAPT`, `CONSTRAIN`, `REVERT`,
`OBSERVE`. Do not introduce new verbs; `policy.enforce_correction_verb`
rejects anything else.
6. **`policy.yaml` says `immutable: true`.** Policy changes require the
out-of-band gate in `policy.enforce_policy_mutation_gate`. Do not silently
broaden rules.
## Design invariants (must not be violated)
From `policy.yaml`, `manifest.json`, and `app.py:make_manifest`:
- Passive by default. No scanning, brute forcing, credential testing, or
exploitation β these are `risk="forbidden"` in `MODULE_POLICIES` and must
stay that way.
- Validation runs before anything else. Downstream code does not re-validate.
- Hash (HMAC-SHA256 with `OSINT_HASH_SALT`, lowercased input) before writing
to audit logs. Raw indicators never enter audit payloads β
`policy.enforce_audit_payload` rejects `raw_indicator`, `raw_input`,
`indicator`, `email`, `domain`, `username`, `url`, `ip` keys.
- Authorized-only modules (`http_headers`, `robots_txt`, `screenshot`) stay
blocked unless the caller asserts `authorized_target=True` AND
`passive_only=False`.
- Drift detection is pure β it does not mutate telemetry, baseline, manifest,
or policy input. See `test_assess_drift_is_pure_and_does_not_mutate_inputs`.
- Correction priority: **policy > structural > behavioral > adversarial >
operational > statistical.** Adversarial CONSTRAINs before the system
ADAPTs. Statistical drift may ADAPT only when nothing higher-priority fires.
## Development workflow
### Setup
```bash
pip install -r requirements.txt
# pytest and lint tooling are not in requirements.txt β install ad hoc:
pip install pytest ruff bandit pip-audit
export OSINT_HASH_SALT="$(python -c 'import secrets;print(secrets.token_hex(32))')"
# or for local-only:
export ALLOW_DEV_SALT=true
# for agent/ module:
export ANTHROPIC_API_KEY=<your-key>
```
### Run the app
```bash
python app.py
# Gradio binds to 127.0.0.1:7860 by default.
```
### Test
```bash
pytest # expect test_policy to pass; test_drift will fail
pytest tests/test_policy.py -v
pytest tests/test_policy.py::TestPolicyEnforcement::test_passive_only -v # single test
```
Before claiming drift work is done, `pytest tests/test_drift.py` must pass.
### Lint / security scan (per README)
```bash
ruff check .
bandit -r osint_core/
pip-audit
```
None of these are wired into CI yet β run locally.
## Conventions
- **Python:** type hints throughout, `from __future__ import annotations`,
`@dataclass(frozen=True)` for value objects, `Literal[...]` for closed
enums. Match the existing style in `osint_core/validators.py` and
`osint_core/policy.py`.
- **Errors:** structured exceptions with an error-code enum
(`ValidationErrorCode`, `PolicyErrorCode`). Prefer returning a result
dataclass (`ValidationResult`, `PolicyEvaluation`) over raising for
expected failure paths; raise only at enforcement boundaries
(`assert_valid_or_raise`, `enforce_*`).
- **Module naming:** UI labels are human (`"HTTP Headers"`); canonical names
are snake_case (`"http_headers"`). Route every UI input through
`canonicalize_module_name` before policy checks.
- **No new dependencies without reason.** `requirements.txt` uses pinned
ranges; preserve the lower/upper bounds when bumping.
- **Never log raw indicators.** Add a test in the style of
`test_audit_payload_blocks_raw_indicator_fields` when introducing new
audit sinks.
- **Docstrings:** module-level docstrings state design intent (see
`validators.py`, `policy.py`). Keep that pattern for new modules.
## Git workflow
- Default branch: `main`.
- Claude work branch (this environment): `claude/skills-agent-osint-4zbxP`.
Push only to the designated feature branch; never force-push `main`.
- GitHub repo scope for MCP tools: `canstralian/passiveosintcontrolpanel`
only. Other repos are denied.
- After pushing, open a **draft** PR if one does not already exist.
- Commit style from `git log`: short imperative titles
(`Create osint_core/policy.py`, `Update requirements.txt`).
## Where to make common changes
| Task | File(s) |
| --- | --- |
| Add a new OSINT source link | `app.py:OSINT_LINKS` and `data/sources.yaml` |
| Add / change a module's risk or auth requirement | `osint_core/policy.py:MODULE_POLICIES` + test in `tests/test_policy.py` + mirror in `policy.yaml` |
| Tighten input validation | `osint_core/validators.py` (regexes, `DANGEROUS_PATTERNS`, `PRIVATE_NETS`) |
| Implement drift detection | `osint_core/drift.py` β rewrite the pseudocode to satisfy `tests/test_drift.py` |
| Change correction verbs | Forbidden without out-of-band approval; touches `osint_core/policy.py:ALLOWED_CORRECTION_VERBS`, `app.py:CorrectionVerb`, and `policy.yaml` |
| Wire a new module into the UI | `app.py:PASSIVE_MODULES`, `run_enrichment`, and the Gradio `CheckboxGroup` |
| Change audit schema | `app.py:TelemetryEvent` + `write_audit` + any consumer in `export_audit_index` |
| Add a new orchestrator skill | `osint_core/orchestrator.py:SKILLS_REGISTRY` + tool definition + test in `tests/test_orchestrator.py` |
| Extend the AI agent's analysis types | `agent/osint_agent.py:_build_analysis_prompt` (add to the `prompts` dict) |
| Adjust the AI agent's OSINT persona / knowledge | `agent/osint_agent.py:OSINT_SYSTEM_PROMPT` (re-cache will happen automatically on next call) |
## Runtime artifacts (gitignored-ish)
`app.py` creates `runs/reports/` and `runs/audit/` on import and writes per-run
`.md` and `.json` files there. These directories are not tracked; do not
commit their contents.
|