new

Get trending papers in your email inbox!

Subscribe

Daily Papers

byAK and the research community

Jun 9

Overlaying Governance: A Compositional Authorization Framework for Delegation and Scope in Agentic AI

As AI systems evolve from passive models into autonomous active agents capable of initiating actions, collaborating, and delegating tasks, the traditional boundaries of software systems blur. Traditional authorization and delegation frameworks, built around fixed principals, explicit requests, and static scopes, are insufficient to govern agentic systems. Agentic AI demands richer authorization semantics: agents must inherit and delegate permissions, act under time-limited authority, and coordinate through shared protocols. Existing Identity and Access Management (IAM) systems fail to fully capture this notion of agency, lacking mechanisms for recursive delegation, contextual boundaries, and dynamic scoping as executable governance primitives. Unlike access delegation standards such as OAuth 2.0, we treat delegation as a contractual term rather than merely a static token-based consent credential. This paper proposes a compositional governance framework that introduces primitives indispensable for agentic AI. We define types of delegation and their permissions and accountability implications, and we introduce a notion of resource scope attenuation to bound agentic access envelopes. These concepts are expressed as general relational definitions that can be composed into existing authorization domains (e.g., financial systems). To operationalize this composition, we define a compositional operator that overlays new agentic semantics, such as recursive delegation chains, onto existing relational policies without rewriting them. We substantiate this framework through formal proofs and empirical evaluation, showing that it provides a formal yet practical foundation for accountable authorization in agentic AI systems.

  • 2 authors
·
Jun 1

ODS: A self-reporting system for radio telescopes to coexist with adaptive satellite constellations

Low Earth orbit (LEO) satellite constellations bring broadband internet and cellular service to the most remote locations on the planet. Unfortunately, many of these locations also host some of the world's best optical and radio astronomy (RA) observatories. With the number of LEO satellites expected to increase by an order of magnitude in the upcoming decade, satellite downlink radio frequency interference (RFI) is a growing concern in protected radio-quiet areas like the United States National Radio Quiet Zone. When these satellites transmit in the spectrum near protected RA bands, undesired out-of-band emission can leak into these protected bands and impact scientific observations. In this paper, we present a self-reporting system - Operational Data Sharing (ODS) - which enables mutual awareness by publishing radio telescopes' operational information to a protected database that is available to satellite operators through a representational state transfer application programming interface (REST API). Satellite operators can use the ODS data to adapt their downlink tasking algorithms in real time to avoid overwhelming sensitive RA facilities, particularly, through the novel Telescope Boresight Avoidance (TBA) technique. Preliminary results from recent experiments between the NRAO and the SpaceX Starlink teams demonstrate the effectiveness of the ODS and TBA in reducing downlink RFI in the Karl G. Jansky Very Large Array's observations in the 1990-1995 MHz and 10.7-12.7 GHz bands. This automated ODS system is beginning to be implemented by other RA facilities and could be utilized by other satellite operators in the near future.

  • 17 authors
·
Feb 20, 2025

Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation

Off-Policy Evaluation (OPE) aims to assess the effectiveness of counterfactual policies using only offline logged data and is often used to identify the top-k promising policies for deployment in online A/B tests. Existing evaluation metrics for OPE estimators primarily focus on the "accuracy" of OPE or that of downstream policy selection, neglecting risk-return tradeoff in the subsequent online policy deployment. To address this issue, we draw inspiration from portfolio evaluation in finance and develop a new metric, called SharpeRatio@k, which measures the risk-return tradeoff of policy portfolios formed by an OPE estimator under varying online evaluation budgets (k). We validate our metric in two example scenarios, demonstrating its ability to effectively distinguish between low-risk and high-risk estimators and to accurately identify the most efficient one. Efficiency of an estimator is characterized by its capability to form the most advantageous policy portfolios, maximizing returns while minimizing risks during online deployment, a nuance that existing metrics typically overlook. To facilitate a quick, accurate, and consistent evaluation of OPE via SharpeRatio@k, we have also integrated this metric into an open-source software, SCOPE-RL (https://github.com/hakuhodo-technologies/scope-rl). Employing SharpeRatio@k and SCOPE-RL, we conduct comprehensive benchmarking experiments on various estimators and RL tasks, focusing on their risk-return tradeoff. These experiments offer several interesting directions and suggestions for future OPE research.

  • 6 authors
·
Nov 29, 2023

SCOPE: Selective Conformal Optimized Pairwise LLM Judging

Large language models (LLMs) are increasingly used as judges to replace costly human preference labels in pairwise evaluation. Despite their practicality, LLM judges remain prone to miscalibration and systematic biases. This paper proposes SCOPE (Selective Conformal Optimized Pairwise Evaluation), a framework for selective pairwise judging with finite-sample statistical guarantees. Under exchangeability, SCOPE calibrates an acceptance threshold such that the error rate among non-abstained judgments is at most a user-specified level α. To provide SCOPE with a bias-neutral uncertainty signal, we introduce Bidirectional Preference Entropy (BPE), which queries the judge under both response positions, aggregates the implied preference probabilities to enforce invariance to response order, and converts the aggregated probability into an entropy-based uncertainty score. Across MT-Bench, RewardBench, and Chatbot Arena, BPE improves uncertainty quality over standard confidence proxies, providing a stronger selection signal that enables SCOPE to consistently meet the target risk level while retaining good coverage across judge scales. In particular, at α= 0.10, SCOPE consistently satisfies the risk bound across all benchmarks and judge scales (empirical risk approx 0.097 to 0.099), while retaining substantial coverage, reaching 0.89 on RewardBench with Qwen-14B and 0.98 on RewardBench with Qwen-32B. Compared to naïve baselines, SCOPE accepts up to 2.4times more judgments on MT-Bench with Qwen-7B under the same target risk constraint, demonstrating that BPE enables reliable and high-coverage LLM-based evaluation.

  • 3 authors
·
Feb 18