Title: How to Trust Your Diffusion Model: A Convex Optimization Approach to Conformal Risk Control
URL Source: https://arxiv.org/html/2302.03791
License: CC BY 4.0 — arXiv:2302.03791v3 [stat.ML], 27 Dec 2023

Jacopo Teneggi 1,2,3, Matthew Tivnan 4, J. Webster Stayman 4, Jeremias Sulam 1,4,3

Abstract
Score-based generative models, informally referred to as diffusion models, continue to grow in popularity across several important domains and tasks. While they provide high-quality and diverse samples from empirical distributions, important questions remain on the reliability and trustworthiness of these sampling procedures for their responsible use in critical scenarios. Conformal prediction is a modern tool to construct finite-sample, distribution-free uncertainty guarantees for any black-box predictor. In this work, we focus on image-to-image regression tasks and we present a generalization of the Risk-Controlling Prediction Sets (RCPS) procedure, which we term $K$-RCPS, that allows one to (i) provide entrywise calibrated intervals for future samples of any diffusion model, and (ii) control a certain notion of risk with respect to a ground truth image with minimal mean interval length. Unlike existing conformal risk control procedures, ours relies on a novel convex optimization approach that allows for multidimensional risk control while provably minimizing the mean interval length. We illustrate our approach on two real-world image denoising problems: on natural images of faces as well as on computed tomography (CT) scans of the abdomen, demonstrating state-of-the-art performance.
1 Introduction
Generative modeling is one of the longest standing tasks of classical and modern machine learning [12]. Recently, the foundational works by Song and Ermon [50], Song et al. [52], Pang et al. [43] on sampling via score-matching [27] and by Ho et al. [23] on denoising diffusion models [49] paved the way for a new class of score-based generative models, which solve a reverse-time stochastic differential equation (SDE) [53, 2]. These models have proven remarkably effective on both unconditional (i.e., starting from random noise) and conditional (e.g., inpainting, denoising, super-resolution, or class-conditional) sample generation across a variety of fields [61, 17]. For example, score-based generative models have been applied to inverse problems in general computer vision and medical imaging [29, 32, 31, 59, 55, 54], 3D shape generation [62, 60, 42], and even in protein design [25, 16, 58, 28].
These strong empirical results highlight the potential of score-based generative models. However, they currently lack precise statistical guarantees on the distribution of the generated samples, which hinders their safe deployment in high-stakes scenarios [26]. For example, consider a radiologist who is shown a computed tomography (CT) scan of the abdomen of a patient reconstructed via a score-based generative model. How confident should they be of the fine-grained details of the presented image? Should they trust that the model has not hallucinated some of the features (e.g., calcifications, blood vessels, or nodules) involved in the diagnostic process? Put differently, how different will future samples be from the presented image, and how far can we expect them to be from the ground truth image?
In this work we focus on image-to-image regression problems, where we are interested in recovering a high-quality ground truth image given a low-quality observation. While our approach is general, we focus on the problem of image denoising as a running example. We address the questions posed above on the reliability of score-based generative models (and, more generally, of any sampling procedure) through the lens of conformal prediction [44, 57, 38, 48, 3] and conformal risk control [10, 4, 5] which provide any black-box predictor with distribution-free, finite-sample uncertainty guarantees. In particular, the contribution of this paper is three-fold:
1. Given a fixed score network, a low-quality observation, and any sampling procedure, we show how to construct valid entrywise calibrated intervals that provide coverage of future samples, i.e., future samples (on the same observation) will fall within the intervals with high probability;
2. We introduce a novel high-dimensional conformal risk control procedure that minimizes the mean interval length directly, while guaranteeing that the number of pixels in the ground truth image that fall outside of these intervals is below a user-specified level on future, unseen low-quality observations;
3. We showcase our approach for denoising of natural face images as well as for computed tomography of the abdomen, achieving state-of-the-art results in mean interval length.
Going back to our example, providing such uncertainty intervals with provable statistical guarantees would improve the radiologist's trust in the sense that these intervals precisely characterize the type of tissue that could be reconstructed by the model. Lastly, even though our contributions are presented in the context of score-based generative modeling for regression problems (given their recent popularity [33, 61, 18]), our results are broadly applicable to any sampling procedure, and we will comment on potential direct extensions where appropriate.
1.1 Related Work: Image-to-Image Risk Control
Previous works have explored conformal risk control procedures for image-to-image regression tasks. In particular, Angelopoulos et al. [6] show how to construct set predictors from heuristic notions of uncertainty (e.g., quantile regression [34, 45]) for any image regressor, and how to calibrate the resulting intervals according to the original RCPS procedure of Bates et al. [10]. Kutiel et al. [35] move beyond set predictors and propose a mask-based conformal risk control procedure that allows for notions of distance between the ground truth and predicted images other than interval-based ones. Finally, and most closely related to this paper, Horwitz and Hoshen [26] sketch ideas of conformal risk control for diffusion models, with the intention of integrating quantile regression to produce heuristic sampling bounds without the need to sample several times. Horwitz and Hoshen [26] also use the original RCPS procedure to guarantee risk control. Although similar in spirit, the contribution of this paper focuses on a high-dimensional generalization of the original RCPS procedure that formally minimizes the mean interval length. Our proposed procedure is agnostic of the notion of uncertainty chosen to construct the necessary set predictors.
2 Background
First, we briefly introduce the necessary notation and general background information. Herein, we will refer to images as vectors in $\mathbb{R}^d$, such that $\mathcal{X} \subseteq \mathbb{R}^d$ and $\mathcal{Y} \subseteq \mathbb{R}^d$ indicate the space of high-quality ground truth images and low-quality observations, respectively. We assume both $\mathcal{X}$ and $\mathcal{Y}$ to be bounded. For a general image-to-image regression problem, given a pair $(x, y)$ drawn from an unknown distribution $D$ over $\mathcal{X} \times \mathcal{Y}$, the task is to retrieve $x \in \mathcal{X}$ given $y \in \mathcal{Y}$. This is usually carried out by means of a predictor $f : \mathcal{Y} \to \mathcal{X}$ that minimizes some notion of distance (e.g., MSE loss) between the ground truth images and reconstructed estimates on a set $\{(x_i, y_i)\}_{i=1}^n \sim D^n$ of $n$ pairs of high- and low-quality images. For example, in the classical denoising problem, one has $y = x + v_0$, where $v_0 \sim \mathcal{N}(0, \sigma_0^2 I)$ is random Gaussian noise with variance $\sigma_0^2$, and one wishes to learn a denoiser $f$ such that $f(y) \approx x$.
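The observation model above is easy to simulate; the following numpy sketch (with a hypothetical image size and noise level) draws a pair $(x, y)$ and checks that the error of the identity map is exactly the noise floor a learned denoiser would try to beat:

```python
import numpy as np

# A minimal sketch of the denoising observation model y = x + v_0 described
# above. The image size and noise level are hypothetical choices.
rng = np.random.default_rng(0)
d = 64 * 64            # number of pixels in a flattened image
sigma0 = 0.5           # noise standard deviation (so the variance is sigma0**2)

x = rng.uniform(0.0, 1.0, size=d)        # stand-in ground truth image in [0, 1]^d
y = x + rng.normal(0.0, sigma0, size=d)  # low-quality observation y = x + v_0

def mse(a, b):
    """Mean squared error, the distance a denoiser f would typically minimize."""
    return float(np.mean((a - b) ** 2))

# For the identity map, E[mse(x, y)] = sigma0**2, i.e., the error is pure noise;
# a learned denoiser f(y) ≈ x would drive this quantity down.
noise_level = mse(x, y)
```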
2.1 Score-based Conditional Sampling
Most image-to-image regression problems are ill-posed: there exist several ground truth images that could have generated the same low-quality observation. This is easy to see for the classical denoising problem described above. Instead of a point predictor $f$ (which could approximate a maximum-a-posteriori (MAP) estimate), one is often interested in devising a sampling procedure $F : \mathcal{Y} \to \mathcal{X}$ for the posterior $p(x \mid y)$, which precisely describes the distribution of possible ground truth images that generated the observation $y$. In real-world scenarios, however, the full joint distribution of $(x, y)$ is unknown, and one must resort to approximating $p(x \mid y)$ from finite data. It is known that for a general Itô process $\mathrm{d}x = h(x, t)\,\mathrm{d}t + g(t)\,\mathrm{d}w$ that perturbs an input $x$ into random noise [30], it suffices to know the Stein score $\nabla_x \log p_t(x)$ [2, 39] to sample from $p(x)$ via the reverse-time process

$\mathrm{d}x = [h(x, t) - g(t)^2 \nabla_x \log p_t(x)]\,\mathrm{d}t + g(t)\,\mathrm{d}\bar{w}$, (1)

where $h(x, t)$ and $g(t)$ are a drift and diffusion term, respectively, and $\mathrm{d}w$ and $\mathrm{d}\bar{w}$ are forward- and reverse-time standard Brownian motion. Furthermore, if the likelihood $p(y \mid x)$ is known, which is usually the case for image-to-image regression problems, it is possible to condition the sampling procedure on an observation $y$. Specifically, by Bayes' rule, it follows that $\nabla_x \log p_t(x \mid y) = \nabla_x \log p_t(y \mid x) + \nabla_x \log p_t(x)$, which can be plugged into the reverse-time SDE in Equation 1 to sample from $p(x \mid y)$.
Recent advances in generative modeling by Song and Ermon [50] and Song et al. [53] showed that one can efficiently train a time-conditional score network $s(\tilde{x}, t)$ to approximate the score $\nabla_x \log p_t(\tilde{x})$ via denoising score-matching [27]. In this way, given a forward-time SDE that models the observation process, a score network $s(\tilde{x}, t) \approx \nabla_x \log p_t(\tilde{x})$, and the likelihood term $p(y \mid \tilde{x})$, one can sample from $p(x \mid y)$ by solving the conditional reverse-time SDE with any discretization (e.g., Euler-Maruyama) or predictor-corrector scheme [53]. While these models perform remarkably well in practice, limited guarantees exist on the distributions that they sample from [37]. Instead, we will provide guarantees for diffusion models by leveraging ideas from conformal prediction and conformal risk control, which we now introduce.
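To make this recipe concrete, the following self-contained sketch integrates the reverse-time SDE of Equation 1 with an Euler-Maruyama discretization for a toy one-dimensional process whose score is available in closed form; the closed-form score is a stand-in for a trained network $s(\tilde{x}, t)$, and all constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy forward SDE: dx = dw (zero drift, unit diffusion). If p_0 = N(0, 1),
# then p_t = N(0, 1 + t), so the Stein score is known exactly. In practice
# this closed form would be replaced by a trained score network s(x, t).
def score(x, t):
    return -x / (1.0 + t)

def h(x, t):           # drift term of the forward SDE (zero here)
    return np.zeros_like(x)

def g(t):              # diffusion term of the forward SDE (constant here)
    return 1.0

def reverse_sample(n_particles, T=4.0, n_steps=400):
    """Euler-Maruyama integration of the reverse-time SDE from t = T down to 0."""
    dt = T / n_steps
    x = rng.normal(0.0, np.sqrt(1.0 + T), size=n_particles)  # draw from p_T
    for i in range(n_steps):
        t = T - i * dt
        drift = h(x, t) - g(t) ** 2 * score(x, t)
        x = x - drift * dt + g(t) * np.sqrt(dt) * rng.normal(size=n_particles)
    return x

# The integrated samples should approximately recover p_0 = N(0, 1).
samples = reverse_sample(50_000)
```

Conditional sampling would simply add the likelihood score $\nabla_x \log p_t(y \mid x)$ to `score` inside the loop, following the Bayes' rule decomposition above.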
2.2 Conformal Prediction
Conformal prediction has a rich history in mathematical statistics [57, 44, 56, 8, 9, 22]. It comprises various methodologies to construct finite-sample, statistically valid uncertainty guarantees for general predictors without making any assumption on the distribution of the response (i.e., they are distribution-free). In particular, these methods construct valid prediction sets that provide coverage, which we now define.
Definition 2.1 (Coverage [48]).
Let $z_1, \dots, z_n, z_{n+1}$ be $n + 1$ exchangeable random variables drawn from the same unknown distribution $Q$ over $\mathcal{Z}$. For a desired miscoverage level $\alpha \in [0, 1]$, a set $\mathcal{I} \in 2^{\mathcal{Z}}$ that only depends on $z_1, \dots, z_n$ provides coverage if

$\mathbb{P}[z_{n+1} \in \mathcal{I}] \geq 1 - \alpha$. (2)
We remark that the notion of coverage defined above was introduced in the context of classification problems, where one is interested in guaranteeing that the true, unseen label of a future sample will be in the prediction set $\mathcal{I}$ with high probability. It is immediate to see how conformal prediction conveys a very precise notion of uncertainty: the larger $\mathcal{I}$ has to be in order to guarantee coverage, the more uncertain the underlying predictor. We refer the interested reader to [48, 3] for classical examples of conformal prediction.
In many scenarios (e.g., regression), the natural notion of uncertainty may be different from miscoverage as described above (e.g., the $\ell_2$ norm). We now move on to presenting conformal risk control, which extends coverage guarantees to general notions of risk.
2.3 Conformal Risk Control
Let $\mathcal{T} : \mathcal{Y} \to \mathcal{X}'$ be a general set-valued predictor from $\mathcal{Y}$ into $\mathcal{X}' \subseteq 2^{\mathcal{X}}$. Consider a nonnegative loss $\ell : \mathcal{X} \times \mathcal{X}' \to \mathbb{R}$ measuring the discrepancy between a ground truth $x$ and the predicted intervals $\mathcal{T}(y)$. We might be interested in guaranteeing that this loss will be below a certain tolerance $\epsilon \geq 0$ with high probability on future, unseen samples $y$ for which we do not know the ground truth $x$. Conformal risk control [10, 4, 5] extends ideas of conformal prediction in order to select a specific predictor $\mathcal{T}$ that controls the risk $\mathbb{E}[\ell(x, \mathcal{T}(y))]$ in the following sense.
Definition 2.2 (Risk Controlling Prediction Sets).
Let $\mathcal{S}_{\mathrm{cal}} = \{(x_i, y_i)\}_{i=1}^n \sim D^n$ be a calibration set of $n$ i.i.d. samples from an unknown distribution $D$ over $\mathcal{X} \times \mathcal{Y}$. For a desired risk level $\epsilon \geq 0$ and a failure probability $\delta \in [0, 1]$, a random set-valued predictor $\mathcal{T} : \mathcal{Y} \to \mathcal{X}' \subseteq 2^{\mathcal{X}}$ is an $(\epsilon, \delta)$-RCPS w.r.t. a loss function $\ell : \mathcal{X} \times \mathcal{X}' \to \mathbb{R}$ if

$\mathbb{P}_{\mathcal{S}_{\mathrm{cal}}}\left[\mathbb{E}_{(x, y) \sim D}[\ell(x, \mathcal{T}(y))] \leq \epsilon\right] \geq 1 - \delta$. (3)
Bates et al. [10] introduced the first conformal risk control procedure for monotonically nonincreasing loss functions, i.e., those that satisfy, for a fixed $x$,

$\mathcal{T}(y) \subseteq \mathcal{T}'(y) \implies \ell(x, \mathcal{T}'(y)) \leq \ell(x, \mathcal{T}(y))$. (4)

In this way, increasing the size of the sets cannot increase the value of the loss. Furthermore, assume that for a fixed input $y$ the family of set predictors $\{\mathcal{T}_\lambda(y)\}_{\lambda \in \Lambda}$, indexed by $\lambda \in \Lambda$, $\Lambda \subseteq \bar{\mathbb{R}} \equiv \mathbb{R} \cup \{\pm\infty\}$, satisfies the following nesting property [22]:

$\lambda_1 < \lambda_2 \implies \mathcal{T}_{\lambda_1}(y) \subseteq \mathcal{T}_{\lambda_2}(y)$. (5)
Denote by $R(\lambda) = \mathbb{E}[\ell(x, \mathcal{T}_\lambda(y))]$ the risk of $\mathcal{T}_\lambda(y)$ and by $\hat{R}(\lambda)$ its empirical estimate over a calibration set $\mathcal{S}_{\mathrm{cal}} = \{(x_i, y_i)\}_{i=1}^n$. Finally, let $\hat{R}^+(\lambda)$ be a pointwise upper confidence bound (UCB) that covers the risk, that is,

$\mathbb{P}[R(\lambda) \leq \hat{R}^+(\lambda)] \geq 1 - \delta$ (6)

for each fixed value of $\lambda$; such a bound can be derived by means of concentration inequalities (e.g., Hoeffding's inequality [24], Bentkus' inequality [11], or their hybridization [10]). With these elements, Bates et al. [10] show that choosing

$\hat{\lambda} = \inf\{\lambda \in \Lambda : \hat{R}^+(\lambda') < \epsilon,\ \forall \lambda' \geq \lambda\}$ (7)

guarantees that $\mathcal{T}_{\hat{\lambda}}(y)$ is an $(\epsilon, \delta)$-RCPS according to Definition 2.2. In other words, choosing $\hat{\lambda}$ as the smallest $\lambda$ such that the UCB is below the desired level $\epsilon$ for all values of $\lambda' \geq \hat{\lambda}$ controls the risk at level $\epsilon$ with probability at least $1 - \delta$. For the sake of completeness, we include the original conformal risk control procedure in Algorithm 2 in Appendix B.
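As an illustration of the selection rule in Equation 7, the sketch below calibrates a scalar $\lambda$ with a plain Hoeffding UCB on a synthetic, monotonically nonincreasing loss; both the data and the choice of bound are illustrative stand-ins (the paper's experiments use a Hoeffding-Bentkus hybrid):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic calibration data: per-example "margins" such that the 0-1 style
# loss 1[margin > lambda] is monotonically nonincreasing in lambda.
n = 2000
margins = np.abs(rng.normal(0.0, 1.0, size=n))

def empirical_risk(lam):
    # fraction of calibration points miscovered by an interval of half-width lam
    return float(np.mean(margins > lam))

def hoeffding_ucb(risk_hat, n, delta):
    # pointwise (1 - delta) upper confidence bound for a loss bounded in [0, 1]
    return risk_hat + np.sqrt(np.log(1.0 / delta) / (2.0 * n))

epsilon, delta = 0.10, 0.05
grid = np.linspace(0.0, 5.0, 1001)

# Eq. (7): the smallest lambda whose UCB stays below epsilon for all larger
# lambdas; since the loss here is monotone in lambda, scanning the grid from
# the right and stopping at the first violation enforces the "for all" clause.
lam_hat = None
for lam in grid[::-1]:
    if hoeffding_ucb(empirical_risk(lam), n, delta) < epsilon:
        lam_hat = lam
    else:
        break
```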
Equipped with these general concepts, we now move on to presenting the contributions of this work.
3 How to Trust Your Diffusion Model
We now return to the main focus of this paper: solving image-to-image regression problems with diffusion models. Rather than a point predictor $f : \mathcal{Y} \to \mathcal{X}$, we assume access to a stochastic sampling procedure $F : \mathcal{Y} \to \mathcal{X}$ such that $F(y)$ is a random variable with unknown distribution $Q_y$, which hopefully approximates the posterior distribution of $x$ given $y$, i.e., $Q_y \approx p(x \mid y)$. However, we make no assumptions on the quality of this approximation for our results to hold. As described in Section 2.1, $F$ can be obtained by means of a time-conditional score network $s(\tilde{x}, t)$ and a reverse-time SDE. While our results are applicable to any sampling procedure, we present them in the context of diffusion models because of their remarkable empirical results and increasing use in critical applications [61, 17].
One can identify three separate sources of randomness in a general stochastic image-to-image regression problem: (i) the unknown prior $p(x)$ over the space of ground-truth images, as $x \sim p(x)$; (ii) the randomness in the observation process of $y$ (which can be modeled by a forward-time SDE over $x$); and finally (iii) the stochasticity in the sampling procedure $F(y)$. We will first provide conformal prediction guarantees for a fixed observation $y$, and then move on to conformal risk control for the ground truth image $x$.
3.1 Calibrated Quantiles for Future Samples
Given the same low-quality (e.g., noisy) observation $y$, where will future unseen samples from $F(y) \sim Q_y$ fall? How concentrated will they be? Denote by $\mathcal{I} : \mathcal{Y} \to \mathcal{X}'$ a (random) set-valued predictor from $\mathcal{Y} \subseteq \mathbb{R}^d$ into a space of sets $\mathcal{X}' \subseteq 2^{\mathcal{X}}$ over $\mathcal{X} \subseteq \mathbb{R}^d$ (e.g., $\mathcal{X} = [0, 1]^d$, $\mathcal{X}' \subseteq 2^{[0,1]^d}$). We extend the notion of coverage in Definition 2.1 to entrywise coverage, which we now make precise.
Definition 3.1 (Entrywise coverage).
Let $z_1, \dots, z_n, z_{n+1}$ be $n + 1$ exchangeable random vectors drawn from the same unknown distribution $Q$ over $\mathcal{X} \subseteq \mathbb{R}^d$. For a desired miscoverage level $\alpha \in [0, 1]$, a set $\mathcal{I} \in 2^{\mathcal{X}}$ that only depends on $z_1, \dots, z_n$ provides entrywise coverage if

$\mathbb{P}[(z_{n+1})_j \in \mathcal{I}_j] \geq 1 - \alpha$ (8)

for each $j \in [d] \equiv \{1, \dots, d\}$.
We stress that the definition above is different from notions of vector quantiles [13, 15] in the sense that coverage is not guaranteed over the entire new random vector $z_{n+1}$, but rather along each dimension independently. Ideas of vector quantile regression (VQR) are complementary to the contribution of the current work and the subject of ongoing research [21, 14, 47].
For a fixed observation $y$, we use conformal prediction to construct a set predictor that provides entrywise coverage.
Lemma 3.2 (Calibrated quantiles guarantee entrywise coverage).
Let $F : \mathcal{Y} \to \mathcal{X}$ be a stochastic sampling procedure from $\mathcal{Y} \subseteq \mathbb{R}^d$ into $\mathcal{X} \subseteq \mathbb{R}^d$. Given $y \in \mathcal{Y}$, let $F_1, \dots, F_m, F_{m+1}$ be $m + 1$ i.i.d. samples from $F(y)$. For a desired miscoverage level $\alpha \in [0, 1]$ and for each $j \in [d]$, let $\hat{l}_{j,\alpha}, \hat{u}_{j,\alpha}$ be the $\lfloor (m+1)\alpha/2 \rfloor / m$ and $\lceil (m+1)(1 - \alpha/2) \rceil / m$ entrywise calibrated empirical quantiles of $F_1, \dots, F_m$. Then,

$\mathcal{I}_\alpha(y)_j = [\hat{l}_{j,\alpha}, \hat{u}_{j,\alpha}]$ (9)

provides entrywise coverage.
The simple proof of this result is included in Section A.1. We remark that, analogously to previous works [6, 26], the intervals in $\mathcal{I}_\alpha(y)$ are feature-dependent and capture regions of the image where the sampling process $F(y)$ may have larger uncertainty. The intervals in $\mathcal{I}_\alpha(y)$ are statistically valid for any number of samples $m$ and any distribution $Q_y$, i.e., they are not a heuristic notion of uncertainty. If the sampling procedure $F$ is a diffusion model, constructing $\mathcal{I}_\alpha(y)$ is agnostic of the discretization scheme used to solve the reverse-time SDE [53], and it does not require retraining the underlying score network, which can be a time-consuming and delicate process, especially when the size of the images is considerable. On the other hand, constructing the intervals $\mathcal{I}_\alpha(y)$ requires sampling a large enough number of times from $F(y)$, which may seem cumbersome [26]. This is by construction and intention: diffusion models are indeed very useful in providing good (and varied) samples from the approximate posterior, and practitioners typically sample several realizations to obtain an empirical picture of this distribution. In these settings, constructing the intervals $\mathcal{I}_\alpha(y)$ does not involve any additional computational cost. Furthermore, sampling is completely parallelizable, so no extra wall-clock cost is incurred if a larger number of computing nodes is available.
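Concretely, constructing the intervals of Lemma 3.2 reduces to per-pixel empirical quantiles at finite-sample-corrected levels. The numpy sketch below uses synthetic per-pixel Gaussians in place of actual diffusion samples $F_1, \dots, F_m$; the sizes and distribution are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)

# m stand-in samples from F(y) for one fixed observation y, each with d pixels.
m, d, alpha = 128, 256, 0.10
samples = rng.normal(loc=0.5, scale=0.1, size=(m, d))

# Finite-sample-corrected quantile levels from Lemma 3.2: the floor/ceil
# terms make the plain empirical quantiles valid with only m samples.
lo_level = np.floor((m + 1) * alpha / 2) / m
hi_level = np.ceil((m + 1) * (1 - alpha / 2)) / m

l_hat = np.quantile(samples, lo_level, axis=0)   # entrywise lower endpoints
u_hat = np.quantile(samples, hi_level, axis=0)   # entrywise upper endpoints

# A fresh draw from the same distribution lands inside [l_hat, u_hat] in each
# pixel with probability at least 1 - alpha.
fresh = rng.normal(loc=0.5, scale=0.1, size=d)
covered = float(np.mean((fresh >= l_hat) & (fresh <= u_hat)))
```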
3.2 A Provable Approach to Optimal Risk Control
In this section, we will revisit the main ideas around conformal risk control introduced in Section 2.3 and generalize them into our proposed approach, $K$-RCPS. Naturally, one would like a good conformal risk control procedure to yield the shortest possible interval lengths. Assume pixel intensities are normalized to $[0, 1]$ and consider the loss function

$\ell_{01}(x, \mathcal{T}(y)) = \frac{1}{d} \sum_{j \in [d]} \mathbb{1}[x_j \notin \mathcal{T}(y)_j]$, (10)

which counts the (average) number of ground truth pixels that fall outside of their respective intervals in $\mathcal{T}(y)$. The constant set-valued predictor $\mathcal{U}(y) = [0, 1]^d$ would trivially control the risk, i.e., $R_{01}(\mathcal{U}) = \mathbb{E}[\ell_{01}(x, \mathcal{U}(y))] = 0$. Alas, such a predictor would be completely uninformative. Instead, let $\{\mathcal{T}_\lambda(y)\}_{\lambda \in \Lambda}$, $\Lambda \subseteq \bar{\mathbb{R}}$, be a family of predictors that satisfies the nesting property in Equation 5. In particular, we propose the following additive parametrization in $\lambda$:

$\mathcal{T}_\lambda(y)_j = [\hat{l}_j - \lambda, \hat{u}_j + \lambda]$ (11)

for some lower and upper endpoints $\hat{l}_j < \hat{u}_j$ that may depend on $y$. For this particular family of nested predictors, it follows that the mean interval length is

$\bar{I}(\lambda) = \frac{1}{d} \sum_{j \in [d]} (\hat{u}_j - \hat{l}_j) + 2\lambda$, (12)

a linear function of $\lambda$. Moreover, we can instantiate $\hat{l}_j$ and $\hat{u}_j$ to be the calibrated quantiles with entrywise coverage, i.e., $\mathcal{T}^\alpha_\lambda(y)_j = [\hat{l}_{j,\alpha} - \lambda, \hat{u}_{j,\alpha} + \lambda]$.
For such a class of predictors, since the $\ell_{01}$ loss is monotonically nonincreasing, the original RCPS procedure (see Equation 7) is equivalent to the following constrained optimization problem:

$\hat{\lambda} = \arg\min_{\lambda \in \Lambda} \bar{I}(\lambda)$ s.t. $\hat{R}^+_{01}(\lambda') < \epsilon,\ \forall \lambda' \geq \lambda$, (P1)

which naturally minimizes $\bar{I}(\lambda)$. However, optimizing the mean interval length over a single scalar parameter $\lambda$ is suboptimal in general, as shown in Figure 1. With abuse of notation (we do not generally refer to vectors with boldface), let $\{\mathcal{T}_\lambda(y)\}_{\lambda \in \Lambda^d}$ be a family of predictors indexed by a $d$-dimensional vector $\lambda = (\lambda_1, \dots, \lambda_d)$ that satisfies the nesting property in Equation 5 in an entrywise fashion. A natural extension of Equation 11 is then

$\mathcal{T}_\lambda(y)_j = [\hat{l}_j - \lambda_j, \hat{u}_j + \lambda_j]$, (13)

from which one can define an equivalent function $\bar{I}(\lambda)$. In particular, using the calibrated intervals as before, define

$\mathcal{T}^\alpha_\lambda(y)_j = [\hat{l}_{j,\alpha} - \lambda_j, \hat{u}_{j,\alpha} + \lambda_j]$. (14)

Note now that $\ell_{01}(x, \mathcal{T}_\lambda(y))$ is entrywise monotonically nonincreasing. For fixed vectors $\tilde{\lambda} \in \mathbb{R}^d$ and $v \in \mathbb{R}^d$ in the positive orthant (i.e., $v \geq 0$ entrywise), denote by $\tilde{\mathcal{V}} = \{\tilde{\lambda} + \beta v,\ \beta \in \Lambda\}$ the set of points on a line with offset $\tilde{\lambda}$ and direction $v$ parametrized by $\beta$. Then, the $d$-dimensional extension of (P1) becomes

$\hat{\lambda} = \arg\min_{\lambda \in \tilde{\mathcal{V}}} \sum_{j \in [d]} \lambda_j$ s.t. $\hat{R}^+_{01}(\lambda + \beta v) < \epsilon,\ \forall \beta \geq 0$. (Pd)

We include an explicit analytical expression for $\hat{R}^+_{01}(\lambda + \beta v)$ in Appendix C. Intuitively, $\hat{\lambda}$ minimizes the sum of its entries such that the UCB is smaller than $\epsilon$ for all points to its right along the direction $v$ parametrized by $\beta$. We now show a general high-dimensional risk control result that holds for any entrywise monotonically nonincreasing loss function $\ell$ (and not just $\ell_{01}$ as presented in (Pd)) with risk $R(\lambda)$, empirical estimate $\hat{R}(\lambda)$, and respective UCB $\hat{R}^+(\lambda)$.
Figure 1: Pictorial representation of the suboptimality of a single scalar parameter $\lambda$ w.r.t. the mean interval length. $\mathcal{S}_{\mathrm{cal}} \sim \mathcal{N}(\mu, \sigma^2)^n$, $n = 128$, and $(\mathcal{T}_\lambda)_j = [-1 - \lambda_j, 1 + \lambda_j]$, $\lambda = (\lambda_1, \lambda_2)$. (a) $\mu = [-1, 1]^\top$. (b) $\mu = [-2, 0.75]^\top$. For $\epsilon = \delta = 0.1$, $\hat{R}^+_{01}(\lambda)$ is obtained via Hoeffding-Bentkus hybridization. Green areas indicate regions where $\hat{R}^+_{01}(\lambda) \leq \epsilon$, and conversely for red regions. Panel (a) shows that when features are concentrated symmetrically around the intervals, minimizing with $\lambda_1 = \lambda_2 = \lambda$ (blue star) minimizes the mean interval length, while panel (b) shows that in the general case the optimal $\lambda$ (orange star) may have $\lambda_1 \neq \lambda_2$. $\Delta$ highlights the gain in mean interval length obtained by choosing the orange star instead of the blue one.

Theorem 3.3 (Optimal mean interval length risk control).
Let $\ell : \mathcal{X} \times \mathcal{X}' \to \mathbb{R}$, $\mathcal{X}' \subseteq 2^{\mathcal{X}}$, $\mathcal{X} \subseteq \mathbb{R}^d$, be an entrywise monotonically nonincreasing function and let $\{\mathcal{T}_\lambda(y)_j = [\hat{l}_j - \lambda_j, \hat{u}_j + \lambda_j]\}_{\lambda \in \Lambda^d}$ be a family of set-valued predictors $\mathcal{T} : \mathcal{Y} \to \mathcal{X}'$, $\mathcal{Y} \subseteq \mathbb{R}^d$, indexed by $\lambda \in \Lambda^d$, $\Lambda \subseteq \bar{\mathbb{R}}$, for some lower and upper endpoints $\hat{l}_j < \hat{u}_j$ that may depend on $y$. Given $\tilde{\mathcal{V}} = \{\tilde{\lambda} + \beta v,\ \beta \in \Lambda\}$ for fixed $\tilde{\lambda} \in \mathbb{R}^d$ and $v \in \mathbb{R}^d$, $v \geq 0$, if

$\hat{\lambda} = \arg\min_{\lambda \in \tilde{\mathcal{V}}} \sum_{j \in [d]} \lambda_j$ s.t. $\hat{R}^+(\lambda + \beta v) < \epsilon,\ \forall \beta \geq 0$, (15)

then $\mathcal{T}_{\hat{\lambda}}(y)$ is an $(\epsilon, \delta)$-RCPS and $\hat{\lambda}$ minimizes the mean interval length over $\tilde{\mathcal{V}}$.
The proof is included in Section A.2. Since $\ell_{01}$ is entrywise monotonically nonincreasing, it follows that the solution to (Pd) controls risk. The attentive reader will have noticed (as shown in Figure 1) that the constraint set $\hat{R}^+_{01}(\lambda) \leq \epsilon$ need not be convex. Furthermore, $\ell_{01}$ is not convex in $\lambda$. Hence, it is not possible to optimally solve (Pd) directly. Instead, we relax it to a convex optimization problem by means of the convex upper bound

$\ell_\gamma(x, \mathcal{T}_\lambda(y)) = \frac{1}{d} \sum_{j \in [d]} \left[ \frac{2(1 + \eta)}{I(\lambda)_j} |x_j - c_j| - \eta \right]_+$, (16)

where $\eta = \gamma / (1 - \gamma)$, $\gamma \in [0, 1)$, $I(\lambda)_j = \hat{u}_j - \hat{l}_j + 2\lambda_j$, $c_j = (\hat{u}_j + \hat{l}_j)/2$, and $[\cdot]_+ = \max(0, \cdot)$. As shown in Figure 2(a), the hyperparameter $\gamma$ controls the degree of relaxation by changing the portion of the intervals $[\hat{l}_j, \hat{u}_j]$ where the loss is 0. This way, $\gamma = 0$ retrieves the $\ell_1$ loss centered at $c_j$, while $\lim_{\gamma \to 1^-} \ell_\gamma = \infty$ if $\exists j \in [d] : x_j \notin [\hat{l}_j, \hat{u}_j]$ and $0$ otherwise.
Figure 2: Visualization of $\ell_{01}(x, \mathcal{T}_\lambda(y))$ and $\ell_\gamma(x, \mathcal{T}_\lambda(y))$ for $\mathcal{T}_\lambda(y) = [0.50 - \lambda, 1.50 + \lambda]$, $\gamma \in \{0, 0.5, 0.9\}$. (a) $\ell_{01}, \ell_\gamma$ as a function of $x$, with $\lambda = 0$. (b) $\ell_{01}, \ell_\gamma$ as a function of $\lambda$, with $x = 1.6$.
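To make the relaxation tangible, the sketch below implements both losses on synthetic interval endpoints and verifies numerically that $\ell_\gamma$ in Equation 16 upper-bounds $\ell_{01}$ for several values of $\gamma$; all endpoints and pixel values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

def loss_01(x, l_hat, u_hat, lam):
    # Eq. (10): average number of pixels outside [l_hat - lam, u_hat + lam]
    return float(np.mean((x < l_hat - lam) | (x > u_hat + lam)))

def loss_gamma(x, l_hat, u_hat, lam, gamma):
    # Eq. (16): convex surrogate; gamma widens the flat (zero-loss) region
    eta = gamma / (1.0 - gamma)
    interval = u_hat - l_hat + 2.0 * lam     # I(lambda)_j
    center = (u_hat + l_hat) / 2.0           # c_j
    per_pixel = 2.0 * (1.0 + eta) / interval * np.abs(x - center) - eta
    return float(np.mean(np.maximum(per_pixel, 0.0)))

# Synthetic endpoints l_hat < u_hat and ground truth pixels in [0, 1]
d = 1000
l_hat = rng.uniform(0.2, 0.4, size=d)
u_hat = l_hat + rng.uniform(0.2, 0.5, size=d)
x = rng.uniform(0.0, 1.0, size=d)

# Outside the interval the surrogate per-pixel term exceeds 1, so the convex
# loss dominates the 0-1 loss for every gamma in [0, 1).
for gamma in (0.0, 0.5, 0.9):
    assert loss_gamma(x, l_hat, u_hat, 0.05, gamma) >= loss_01(x, l_hat, u_hat, 0.05)
```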
While one could readily propose a convex alternative to (Pd) by means of this new loss, we instead propose a generalization of this idea in our final problem formulation

$\tilde{\lambda}_K = \arg\min_{\mu \in \Lambda^K} \sum_{k \in [K]} d_k \mu_k$ s.t. $\hat{R}_\gamma(M\mu) \leq \epsilon$, (PK)

for any user-defined $K$-partition of the $[d]$ features, which can be identified by a membership matrix $M \in \{0, 1\}^{d \times K}$ where each feature belongs to (only) one of the $K$ groups, with $d_k \equiv |\{j \in [d] : M_{jk} = 1\}|$ and $\sum_{k \in [K]} d_k = d$. As we will shortly see, it will be useful to define these groups via the empirical quantiles, i.e., to set $M$ as the matrix assigning each feature to its respective $k$th quantile of the entrywise empirical loss over the optimization set (which is a vector in $\mathbb{R}^d$). We remark that the constraint set in (PK) is defined on the empirical estimate of the risk of $\mathcal{T}_\lambda(y)$ and does not involve the computation of the UCB. Then, (PK) can be solved with any standard off-the-shelf convex optimization software (e.g., CVXPY [19, 1], MOSEK [7]).
Our novel conformal risk control procedure, $K$-RCPS, finds a vector $\hat{\lambda}_K$ that approximates a solution to the nonconvex optimization problem (Pd) via a two-step procedure:
1. First, obtain the optimal solution $\tilde{\lambda}_K$ to a user-defined (PK) problem; then
2. Choose $\hat{\beta} \in \Lambda$ such that

$\hat{\beta} = \inf\{\beta \in \Lambda : \hat{R}^+_{01}(M\tilde{\lambda}_K + \beta' \mathbb{1}) < \epsilon,\ \forall \beta' \geq \beta\}$

and return $\hat{\lambda}_K = M\tilde{\lambda}_K + \hat{\beta}\mathbb{1}$.
Intuitively, the $K$-RCPS algorithm is equivalent to performing the original RCPS procedure along the line $M\tilde{\lambda}_K + \beta\mathbb{1}$ parametrized by $\beta$. We remark that, as noted in Theorem 3.3, any choice of direction $v \geq 0$ provides a valid direction along which to perform the RCPS procedure; here, we choose $\mathbb{1}$ because it is precisely the gradient of the objective function. Future work entails devising more sophisticated algorithms to approximate the solution of (Pd).
Algorithm 1: $K$-RCPS
1: Input: risk level $\epsilon \geq 0$, failure probability $\delta \in [0, 1]$, calibration set $\mathcal{S}_{\mathrm{cal}} = \{(x_i, y_i)\}_{i=1}^n$ of $n$ i.i.d. samples with $n = n_{\mathrm{opt}} + n_{\mathrm{RCPS}}$, membership function $\mathcal{M}$, family of set-valued predictors $\{\mathcal{T}_\lambda(y)_j = [\hat{l}_j - \lambda_j, \hat{u}_j + \lambda_j]\}_{\lambda \in \Lambda^d}$, initial (large) value $\beta_{\max}$, stepsize $\mathrm{d}\beta > 0$.
2: Split $\mathcal{S}_{\mathrm{cal}}$ into $\mathcal{S}_{\mathrm{opt}}, \mathcal{S}_{\mathrm{RCPS}}$
3: $M \leftarrow \mathcal{M}(\mathcal{S}_{\mathrm{opt}})$
4: $\tilde{\lambda}_K \leftarrow \mathrm{SOLVE\text{-}PK}(\mathcal{S}_{\mathrm{opt}}, M)$
5: $\lambda \leftarrow M\tilde{\lambda}_K + \beta_{\max}\mathbb{1}$
6: $\hat{R}^+_{01}(\lambda) \leftarrow 0$
7: while $\hat{R}^+_{01}(\lambda) \leq \epsilon$ do
8:   $\lambda_{\mathrm{prev}} \leftarrow \lambda$
9:   $\lambda \leftarrow \lambda - (\mathrm{d}\beta)\mathbb{1}$
10:  $\lambda \leftarrow [\lambda]_+$
11:  $\hat{R}_{01}(\lambda) \leftarrow \frac{1}{n_{\mathrm{RCPS}}} \sum_{(x_i, y_i) \in \mathcal{S}_{\mathrm{RCPS}}} \ell_{01}(x_i, \mathcal{T}_\lambda(y_i))$
12:  $\hat{R}^+_{01}(\lambda) \leftarrow \mathrm{UCB}(n_{\mathrm{RCPS}}, \delta, \hat{R}_{01}(\lambda))$
13: end while
14: $\hat{\lambda}_K \leftarrow \lambda_{\mathrm{prev}}$
15: return $\hat{\lambda}_K$
Algorithm 1 implements the $K$-RCPS procedure for any calibration set $\mathcal{S}_{\mathrm{cal}} = \{(x_i, y_i)\}_{i=1}^n$, any general family of set-valued predictors of the form $\{\mathcal{T}_\lambda(y)_j = [\hat{l}_j - \lambda_j, \hat{u}_j + \lambda_j]\}_{\lambda \in \Lambda^d}$, any membership function $\mathcal{M} : \{\mathcal{X} \times \mathcal{Y}\}^n \to \{0, 1\}^{d \times K}$, and a general $\mathrm{UCB}(n, \delta, \hat{R}(\lambda))$ routine that accepts a number of samples $n$, a failure probability $\delta$, and an empirical risk $\hat{R}(\lambda)$, and returns a pointwise upper confidence bound $\hat{R}^+(\lambda)$ that satisfies Equation 6. We remark that, following the split fixed sequence testing framework introduced in Angelopoulos et al. [4] and applied in previous work [36], the membership matrix and its optimization problem (PK) are computed on a subset $\mathcal{S}_{\mathrm{opt}}$ of the calibration set $\mathcal{S}_{\mathrm{cal}}$, such that the direction $M\tilde{\lambda}_K + \beta\mathbb{1}$ along which to perform the RCPS procedure is chosen before looking at the data $\mathcal{S}_{\mathrm{RCPS}} = \mathcal{S}_{\mathrm{cal}} \setminus \mathcal{S}_{\mathrm{opt}}$. We note that $K$-RCPS allows some of the entries in $\hat{\lambda}_K$ to be set to 0, which preserves the original intervals such that, if they are obtained as described in Section 3.1, they still provide entrywise coverage of future samples at the desired level $\alpha$.
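The $\beta$ line search of Algorithm 1 (steps 5-14) can be sketched in a few lines of numpy. Here the per-pixel margins, the group structure, and the vector standing in for $M\tilde{\lambda}_K$ are all synthetic, and a plain Hoeffding bound replaces the generic UCB routine:

```python
import numpy as np

rng = np.random.default_rng(4)

def hoeffding_ucb(risk_hat, n, delta):
    return risk_hat + np.sqrt(np.log(1.0 / delta) / (2.0 * n))

def k_rcps_line_search(lam_tilde, margins, epsilon, delta, beta_max=1.0, d_beta=0.005):
    """Steps 5-14 of Algorithm 1: walk beta down from beta_max along
    lam_tilde + beta * 1 until the UCB on the empirical 0-1 risk exceeds
    epsilon, returning the last feasible lambda. `margins` has shape (n, d):
    a pixel is miscovered exactly when its margin exceeds its lambda."""
    n = margins.shape[0]
    beta, lam_prev = beta_max, None
    while beta >= 0.0:
        lam = np.maximum(lam_tilde + beta, 0.0)          # clip to [lambda]_+
        risk_hat = float(np.mean(margins > lam[None, :]))
        if hoeffding_ucb(risk_hat, n, delta) > epsilon:
            break
        lam_prev = lam
        beta -= d_beta
    return lam_prev

# Synthetic example: two groups of pixels with different noise scales, so a
# per-group offset (stand-in for M @ lam_tilde_K) beats a single scalar one.
n, d = 1000, 200
scales = np.where(np.arange(d) < d // 2, 0.05, 0.20)
margins = np.abs(rng.normal(0.0, scales, size=(n, d)))
lam_tilde = np.where(np.arange(d) < d // 2, 0.08, 0.35)

lam_hat = k_rcps_line_search(lam_tilde, margins, epsilon=0.10, delta=0.05)
```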
We now move on to showcasing the advantage of $K$-RCPS in terms of mean interval length on two real-world, high-dimensional denoising problems: one on natural images of faces, and one on CT scans of the abdomen.
4 Experiments
Figure 3: Calibrated quantiles $\mathcal{I}_\alpha(y)$ computed on 128 samples from $F(y)$ for noisy inputs $y$ with noise level $\sigma_0^2$. (a) CelebA dataset, $\sigma_0^2 = 1.0$, $\alpha = 0.10$. (b) AbdomenCT-1K dataset, $\sigma_0^2 = 0.4$, $\alpha = 0.20$. The difference $\hat{u}_\alpha - \hat{l}_\alpha$ represents interval sizes (i.e., larger intervals indicate larger uncertainty).
As a reminder, the methodological contribution of this paper is two-fold: (i) we propose to use the calibrated quantiles $\mathcal{I}_\alpha(y)$ as a statistically valid notion of uncertainty for diffusion models, and (ii) we introduce the $K$-RCPS procedure to guarantee high-dimensional risk control. Although it is natural to use the two in conjunction, we remark that the $K$-RCPS procedure is agnostic of the notion of uncertainty and can be applied to any nested family of set predictors $\{\mathcal{T}_\lambda(y)\}_{\lambda \in \Lambda^d}$ that satisfies the additive parametrization in Equation 13. Therefore, we compare $K$-RCPS with the original RCPS algorithm on several baseline notions of uncertainty: quantile regression [6], MC-Dropout [20], N-Conffusion [26], and naive (i.e., not calibrated) quantiles. We focus on denoising problems where $y = x + v_0$ with $v_0 \sim \mathcal{N}(0, \sigma_0^2)$, on two imaging datasets: the CelebA dataset [40] and the AbdomenCT-1K dataset [41]. In particular, for each dataset, we train:
• A time-conditional score network $s(\tilde{x}, t) \approx \nabla_x \log p_t(\tilde{x})$ following Song et al. [53] to sample from the posterior distribution $p(x \mid y)$ as described in Section 2.1, and

• a time-conditional image regressor $f: \mathcal{Y} \times \mathbb{R} \to \mathcal{X}^3$ following Angelopoulos et al. [6] such that $f(y, t) = (\hat{l}_{\alpha/2}, \hat{x}, \hat{u}_{1-\alpha/2})$, where $\hat{x} \approx \mathbb{E}[x \mid y]$ is trained by minimizing the MSE loss between its prediction for the noisy observation $y$ and the ground truth $x$, and $\hat{l}_{\alpha/2}, \hat{u}_{1-\alpha/2}$ are the $\alpha/2$ and $1 - \alpha/2$ quantile regressors of $x$, respectively [34, 45, 6].
Both models are composed of the same NCSN++ backbone [53] with dropout $p = 0.10$ for a fair comparison. We then fine-tune the original score network $s(\tilde{x}, t)$ according to the N-Conffusion algorithm proposed by Horwitz and Hoshen [26] such that, similarly to the image regressor $f$, the resulting time-conditional predictor $\tilde{s}(y, t) = (\hat{l}_{\alpha/2}, \hat{u}_{1-\alpha/2})$ estimates the $\alpha/2$ and $1 - \alpha/2$ quantile regressors of $x$. Finally, in order to compare with MC-Dropout, we activate the dropout layers in the image regressor $f$ at inference time, and estimate the mean $\bar{x}$ and standard deviation $\hat{\sigma}$ over 128 samples $\hat{x}_1, \ldots, \hat{x}_{128}$. To summarize, we compare $K$-RCPS and RCPS on the following families of nested set predictors:
Quantile Regression (QR)

$$\mathcal{T}_{\lambda,\text{QR}}(y)_j = [\hat{x}_j - \lambda_j (\hat{l}_{\alpha/2})_j,\; \hat{x}_j + \lambda_j (\hat{u}_{1-\alpha/2})_j], \quad (17)$$

where $f(y, t) = (\hat{l}_{\alpha/2}, \hat{x}, \hat{u}_{1-\alpha/2})$.

MC-Dropout

$$\mathcal{T}_{\lambda,\text{MC-Dropout}}(y)_j = [\bar{x}_j - \lambda_j \hat{\sigma}_j,\; \bar{x}_j + \lambda_j \hat{\sigma}_j], \quad (18)$$

where $\bar{x}, \hat{\sigma}$ are the sample mean and standard deviation over 128 samples $\hat{x}_1, \ldots, \hat{x}_{128}$ obtained by activating the dropout layers in the image regressor $f$.

N-Conffusion. We compare two different parametrizations, multiplicative and additive:

$$\mathcal{T}^{\text{multiplicative}}_{\lambda,\text{Conffusion}}(y)_j = [(\hat{l}_{\alpha/2})_j / \lambda_j,\; \lambda_j (\hat{u}_{1-\alpha/2})_j] \quad (19)$$

and

$$\mathcal{T}^{\text{additive}}_{\lambda,\text{Conffusion}}(y)_j = [(\hat{l}_{\alpha/2})_j - \lambda_j,\; (\hat{u}_{1-\alpha/2})_j + \lambda_j], \quad (20)$$

where $\tilde{s}(y, t) = (\hat{l}_{\alpha/2}, \hat{u}_{1-\alpha/2})$ is the score network fine-tuned by means of quantile regression on 1000 additional samples.

Naive quantiles

$$\mathcal{T}_{\lambda,\text{naive}}(y)_j = [\hat{l}_j - \lambda_j,\; \hat{u}_j + \lambda_j], \quad (21)$$

where $\hat{l}, \hat{u}$ are the naive (i.e., not calibrated) $\alpha/2$ and $1 - \alpha/2$ entrywise empirical quantiles computed on 128 samples from the diffusion model.

Calibrated quantiles

$$\mathcal{T}^\alpha_\lambda(y)_j = [\hat{l}_{j,\alpha} - \lambda_j,\; \hat{u}_{j,\alpha} + \lambda_j], \quad (22)$$

where $\hat{l}_\alpha, \hat{u}_\alpha$ are the entrywise calibrated quantiles computed on 128 samples from the diffusion model as described in Equation 9 (see Figure 3 for some examples).
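The additive families of Equations 21 and 22 share a simple structure: a base interval inflated entrywise by $\lambda$. A small sketch of this parametrization and of its nesting property (illustrative only; the toy values are assumptions):

```python
import numpy as np

def additive_intervals(l_hat, u_hat, lam):
    """Nested set predictor T_lam(y)_j = [l_hat_j - lam_j, u_hat_j + lam_j]."""
    return l_hat - lam, u_hat + lam

def mean_interval_length(lo, hi):
    """Mean interval length, the quantity K-RCPS minimizes."""
    return float(np.mean(hi - lo))

# Toy base intervals (e.g., entrywise quantiles of posterior samples).
l_hat = np.array([0.2, 0.4, 0.1])
u_hat = np.array([0.6, 0.9, 0.3])
lo1, hi1 = additive_intervals(l_hat, u_hat, np.full(3, 0.05))
lo2, hi2 = additive_intervals(l_hat, u_hat, np.full(3, 0.10))
```

A larger $\lambda$ yields entrywise larger intervals, which is exactly the nesting property the calibration procedures rely on.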
Figure 4: Example optimal $\hat{\lambda}_K$ for $K \in \{4, 8, 32\}$, $n_{\text{opt}} = 256$, and $d_{\text{opt}} = 100$, with respective conformalized uncertainty maps $\mathcal{T}^\alpha_{\hat{\lambda}}(y)_j = [\hat{l}_{j,\alpha} - (\hat{\lambda}_K)_j,\; \hat{u}_{j,\alpha} + (\hat{\lambda}_K)_j]$. With probability at least 90%, no more than an $\epsilon$ fraction of the ground-truth pixels will fall outside of $\mathcal{T}^\alpha_{\hat{\lambda}}$ on future, unseen samples. (a) CelebA dataset ($\alpha = 0.10$, $\epsilon = 0.10$). (b) AbdomenCT-1K dataset ($\alpha = 0.20$, $\epsilon = 0.05$).
We include further details on the datasets, the models, and the training and sampling procedures in Appendix D. The implementation of $K$-RCPS, with all code and data necessary to reproduce the experiments, is available at https://github.com/Sulam-Group/k-rcps.
We compare all models and calibration procedures on 20 random draws of calibration and validation sets $\mathcal{S}_{\text{cal}}, \mathcal{S}_{\text{val}}$ of size $n_{\text{cal}}$ and $n_{\text{val}}$, respectively. We remark that for the $K$-RCPS procedure, $n_{\text{opt}}$ samples from $\mathcal{S}_{\text{cal}}$ are used to solve the optimization problem (P$_K$). It follows that for a fixed $n_{\text{cal}}$, the concentration inequality used in the $K$-RCPS procedure is looser than the one in the RCPS algorithm. We will show that there remains a clear benefit to $K$-RCPS in terms of mean interval length given the same amount of calibration data (i.e., even though the concentration bound becomes looser). In these experiments, we construct the membership matrix $M$ by assigning each feature $j \in [d]$ to the $k$th, $k = 1, \ldots, K$, quantile of the entrywise empirical estimate of the risk on $\mathcal{S}_{\text{opt}}$. Furthermore, even though (P$_K$) is low-dimensional (i.e., $K \ll d$), the number of constraints grows as $d \cdot n_{\text{opt}}$, which quickly makes the computation of $\tilde{\lambda}_K$ inefficient and time-consuming (e.g., for the AbdomenCT-1K dataset, $d \cdot n_{\text{opt}} \sim 10^8$ when $n_{\text{opt}} = 128$, a mild number of samples to optimize over). In practice, we randomly subsample a small number of features $d_{\text{opt}} \ll d$ stratified by membership, which drastically speeds up computation. In the following experiments, we set $d_{\text{opt}} \in \{50, 100\}$ and use $K \in \{4, 8, 32\}$, which reduces the runtime of solving the optimization problem to less than a second for both datasets. Finally, we pick the $\gamma$ that minimizes the objective function over 16 values equally spaced in $[0.3, 0.7]$. These heuristics make the runtime of $K$-RCPS comparable to that of RCPS, with a small overhead to solve the reduced (P$_K$) problem (potentially multiple times to optimize $\gamma$).
Figure G.1 shows that all combinations of uncertainty notions and calibration procedures control the risk, as promised. In particular, we set $\delta = 0.10$ for both datasets, and $\epsilon = 0.10$ and $0.05$ for the CelebA and AbdomenCT-1K datasets, respectively. We repeat all calibration procedures over 20 random draws of $\mathcal{S}_{\text{cal}}, \mathcal{S}_{\text{val}}$, with $n_{\text{val}} = 128$, and $n_{\text{cal}} = 640$ or $n_{\text{cal}} = 512$ for the CelebA or AbdomenCT-1K dataset, respectively. Figure 4 showcases some example $\hat{\lambda}_K$'s obtained by running the $K$-RCPS procedure with $K = 4$, $8$, and $32$ quantiles, alongside their respective conformalized uncertainty maps from $\mathcal{S}_{\text{val}}$. We can appreciate how, for both datasets, $\hat{\lambda}_K$ captures information about the structure of the data distribution (e.g., eyes and lips for the CelebA dataset, and the position of the lungs and the heart for the AbdomenCT-1K dataset). Finally, we compare all baselines and calibration procedures in terms of the guarantees each provides and their mean interval length. In particular, we report whether each notion of uncertainty provides guarantees over a diffusion model or not. Note that naive and calibrated quantiles are the only notions of uncertainty that precisely provide guarantees on the samples from a diffusion model. Furthermore, calibrated quantiles are the only method that ensures entrywise coverage on future samples for the same noisy observation. For $K$-RCPS, we perform a grid search over $n_{\text{opt}} \in \{128, 256\}$, $d_{\text{opt}} \in \{50, 100\}$, and $K \in \{4, 8, 32\}$, and we report the optimal results in Table 1. For both datasets, $K$-RCPS provides the tightest intervals among methods that provide both entrywise coverage and risk control for diffusion models. When relaxing the constraint of entrywise coverage, $K$-RCPS still provides the tightest intervals. Across the uncertainty quantification methods that do not relate to a diffusion model, we found that $K$-RCPS with naive sampling provides better results on the CelebA dataset. For the AbdomenCT-1K dataset, N-Conffusion with multiplicative parametrization and RCPS provides slightly shorter intervals compared to N-Conffusion with additive parametrization and $K$-RCPS. However, we stress that the intervals provided by N-Conffusion are computed for a fine-tuned model that is different from the original diffusion model, and thus provide no guarantees on the samples of the original model.
We found that N-Conffusion with multiplicative parametrization underperforms on the CelebA dataset because the lower bounds do not decay fast enough, and the loss concentrates on features for which $(\hat{l}_{\alpha/2})_j / \lambda_j$ remains above $x_j$.
5 Conclusions

Table 1: Comparison of all notions of uncertainty with RCPS and $K$-RCPS in terms of guarantees provided and mean interval length over 20 independent draws of $\mathcal{S}_{\text{cal}}$. We refer the reader to Appendix E for a detailed discussion of the comparison with the Conffusion framework.

| Uncertainty | Diffusion model? | Entrywise coverage? | Risk control? | Calibration procedure | Mean interval length (CelebA) | Mean interval length (AbdomenCT-1K) |
|---|---|---|---|---|---|---|
| QR | ✗ | ✗ | ✓ | RCPS | 0.4843 ± 0.0121 | 0.2943 ± 0.0060 |
| MC-Dropout | ✗ | ✗ | ✓ | RCPS | 0.6314 ± 0.0109 | 0.2810 ± 0.0013 |
| N-Conffusion (multiplicative) | ✗ | ✗ | ✓ | RCPS | 0.6949 ± 0.0084 | 0.1126 ± 0.0020 |
| N-Conffusion (additive) | ✗ | ✗ | ✓ | RCPS | 0.3314 ± 0.0040 | 0.1164 ± 0.0024 |
| N-Conffusion (additive) | ✗ | ✗ | ✓ | $K$-RCPS | 0.3131 ± 0.0056 | 0.1136 ± 0.0019 |
| Naive quantiles | ✓ | ✗ | ✓ | RCPS | 0.2688 ± 0.0068 | 0.1518 ± 0.0016 |
| Naive quantiles | ✓ | ✗ | ✓ | $K$-RCPS | 0.2523 ± 0.0052 | 0.1374 ± 0.0019 |
| Calibrated quantiles | ✓ | ✓ | ✓ | RCPS | 0.2762 ± 0.0059 | 0.1506 ± 0.0014 |
| Calibrated quantiles | ✓ | ✓ | ✓ | $K$-RCPS | 0.2644 ± 0.0067 | 0.1369 ± 0.0016 |
Diffusion models hold great potential for sampling in inverse problems, yet how to devise precise uncertainty guarantees for them has remained open. We have (i) provided calibrated intervals that guarantee coverage of future samples generated by diffusion models, and (ii) shown how to extend RCPS to $K$-RCPS, allowing for greater flexibility by conformalizing in higher dimensions by means of a convex surrogate problem. Moreover, our results are general and hold for any data distribution and any sampling procedure, diffusion models or otherwise. When combined, these two contributions provide state-of-the-art uncertainty quantification by controlling risk with minimal mean interval length. Our contributions open the door to a variety of new problems. While we have focused on denoising problems, the application of these tools to other, more challenging restoration tasks is almost direct, since no distributional assumptions are employed. The variety of diffusion models for other conditional-sampling problems can readily be applied here too [61, 17]. Lastly, and differently from other works that explore controlling multiple risks [36], ours is the first approach to provide multidimensional control of one risk for conformal prediction, and improvements to our optimization schemes are likely possible. More generally, we envision our tools contributing to the responsible use of machine learning in modern settings.
Acknowledgments
The authors sincerely thank Anastasios Angelopoulos for insightful discussions in early stages of this work, as well as the anonymous reviewers who have helped improve our paper.
References

- Agrawal et al. [2018]: Akshay Agrawal, Robin Verschueren, Steven Diamond, and Stephen Boyd. A rewriting system for convex optimization problems. Journal of Control and Decision, 5(1):42–60, 2018.
- Anderson [1982]: Brian D. O. Anderson. Reverse-time diffusion equation models. Stochastic Processes and their Applications, 12(3):313–326, 1982.
- Angelopoulos and Bates [2021]: Anastasios N. Angelopoulos and Stephen Bates. A gentle introduction to conformal prediction and distribution-free uncertainty quantification. arXiv preprint arXiv:2107.07511, 2021.
- Angelopoulos et al. [2021]: Anastasios N. Angelopoulos, Stephen Bates, Emmanuel J. Candès, Michael I. Jordan, and Lihua Lei. Learn then test: Calibrating predictive algorithms to achieve risk control. arXiv preprint arXiv:2110.01052, 2021.
- Angelopoulos et al. [2022a]: Anastasios N. Angelopoulos, Stephen Bates, Adam Fisch, Lihua Lei, and Tal Schuster. Conformal risk control. arXiv preprint arXiv:2208.02814, 2022a.
- Angelopoulos et al. [2022b]: Anastasios N. Angelopoulos, Amit Pal Kohli, Stephen Bates, Michael Jordan, Jitendra Malik, Thayer Alshaabi, Srigokul Upadhyayula, and Yaniv Romano. Image-to-image regression with distribution-free uncertainty quantification and applications in imaging. In International Conference on Machine Learning, pages 717–730. PMLR, 2022b.
- ApS [2019]: MOSEK ApS. The MOSEK optimization toolbox for MATLAB manual, Version 9.0, 2019. URL http://docs.mosek.com/9.0/toolbox/index.html.
- Barber et al. [2021]: Rina Foygel Barber, Emmanuel J. Candès, Aaditya Ramdas, and Ryan J. Tibshirani. Predictive inference with the jackknife+. The Annals of Statistics, 49(1):486–507, 2021.
- Barber et al. [2022]: Rina Foygel Barber, Emmanuel J. Candès, Aaditya Ramdas, and Ryan J. Tibshirani. Conformal prediction beyond exchangeability. arXiv preprint arXiv:2202.13415, 2022.
- Bates et al. [2021]: Stephen Bates, Anastasios Angelopoulos, Lihua Lei, Jitendra Malik, and Michael Jordan. Distribution-free, risk-controlling prediction sets. Journal of the ACM (JACM), 68(6):1–34, 2021.
- Bentkus [2004]: Vidmantas Bentkus. On Hoeffding's inequalities. The Annals of Probability, 32(2):1650–1673, 2004.
- Bishop and Nasrabadi [2006]: Christopher M. Bishop and Nasser M. Nasrabadi. Pattern Recognition and Machine Learning, volume 4. Springer, 2006.
- Carlier et al. [2016]: Guillaume Carlier, Victor Chernozhukov, and Alfred Galichon. Vector quantile regression: an optimal transport approach. The Annals of Statistics, 44(3):1165–1192, 2016.
- Carlier et al. [2020]: Guillaume Carlier, Victor Chernozhukov, Gwendoline De Bie, and Alfred Galichon. Vector quantile regression and optimal transport, from theory to numerics. Empirical Economics, pages 1–28, 2020.
- Chernozhukov et al. [2017]: Victor Chernozhukov, Alfred Galichon, Marc Hallin, and Marc Henry. Monge–Kantorovich depth, quantiles, ranks and signs. The Annals of Statistics, 45(1):223–256, 2017.
- Corso et al. [2022]: Gabriele Corso, Hannes Stärk, Bowen Jing, Regina Barzilay, and Tommi Jaakkola. DiffDock: Diffusion steps, twists, and turns for molecular docking. arXiv preprint arXiv:2210.01776, 2022.
- Croitoru et al. [2022]: Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, and Mubarak Shah. Diffusion models in vision: A survey. arXiv preprint arXiv:2209.04747, 2022.
- Croitoru et al. [2023]: Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, and Mubarak Shah. Diffusion models in vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.
- Diamond and Boyd [2016]: Steven Diamond and Stephen Boyd. CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research, 17(83):1–5, 2016.
- Gal and Ghahramani [2016]: Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning, pages 1050–1059. PMLR, 2016.
- Genevay et al. [2016]: Aude Genevay, Marco Cuturi, Gabriel Peyré, and Francis Bach. Stochastic optimization for large-scale optimal transport. Advances in Neural Information Processing Systems, 29, 2016.
- Gupta et al. [2022]: Chirag Gupta, Arun K. Kuchibhotla, and Aaditya Ramdas. Nested conformal prediction and quantile out-of-bag ensemble methods. Pattern Recognition, 127:108496, 2022.
- Ho et al. [2020]: Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
- Hoeffding [1994]: Wassily Hoeffding. Probability inequalities for sums of bounded random variables. In The Collected Works of Wassily Hoeffding, pages 409–426. Springer, 1994.
- Hoogeboom et al. [2022]: Emiel Hoogeboom, Víctor Garcia Satorras, Clément Vignac, and Max Welling. Equivariant diffusion for molecule generation in 3D. In International Conference on Machine Learning, pages 8867–8887. PMLR, 2022.
- Horwitz and Hoshen [2022]: Eliahu Horwitz and Yedid Hoshen. Conffusion: Confidence intervals for diffusion models. arXiv preprint arXiv:2211.09795, 2022.
- Hyvärinen and Dayan [2005]: Aapo Hyvärinen and Peter Dayan. Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6(4), 2005.
- Ingraham et al. [2022]: John Ingraham, Max Baranov, Zak Costello, Vincent Frappier, Ahmed Ismail, Shan Tie, Wujie Wang, Vincent Xue, Fritz Obermeyer, Andrew Beam, et al. Illuminating protein space with a programmable generative model. bioRxiv, 2022.
- Kadkhodaie and Simoncelli [2021]: Zahra Kadkhodaie and Eero Simoncelli. Stochastic solutions for linear inverse problems using the prior implicit in a denoiser. Advances in Neural Information Processing Systems, 34:13242–13254, 2021.
- Karatzas et al. [1991]: Ioannis Karatzas and Steven E. Shreve. Brownian Motion and Stochastic Calculus, volume 113. Springer Science & Business Media, 1991.
- Kawar et al. [2021a]: Bahjat Kawar, Gregory Vaksman, and Michael Elad. SNIPS: Solving noisy inverse problems stochastically. Advances in Neural Information Processing Systems, 34:21757–21769, 2021a.
- Kawar et al. [2021b]: Bahjat Kawar, Gregory Vaksman, and Michael Elad. Stochastic image denoising by sampling from the posterior distribution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1866–1875, 2021b.
- Kazerouni et al. [2022]: Amirhossein Kazerouni, Ehsan Khodapanah Aghdam, Moein Heidari, Reza Azad, Mohsen Fayyaz, Ilker Hacihaliloglu, and Dorit Merhof. Diffusion models for medical image analysis: A comprehensive survey. arXiv preprint arXiv:2211.07804, 2022.
- Koenker and Bassett Jr [1978]: Roger Koenker and Gilbert Bassett Jr. Regression quantiles. Econometrica: Journal of the Econometric Society, pages 33–50, 1978.
- Kutiel et al. [2022]: Gilad Kutiel, Regev Cohen, Michael Elad, and Daniel Freedman. What's behind the mask: Estimating uncertainty in image-to-image problems. arXiv preprint arXiv:2211.15211, 2022.
- Laufer-Goldshtein et al. [2022]: Bracha Laufer-Goldshtein, Adam Fisch, Regina Barzilay, and Tommi Jaakkola. Efficiently controlling multiple risks with Pareto testing. arXiv preprint arXiv:2210.07913, 2022.
- Lee et al. [2022]: Holden Lee, Jianfeng Lu, and Yixin Tan. Convergence for score-based generative modeling with polynomial complexity. arXiv preprint arXiv:2206.06227, 2022.
- Lei and Wasserman [2014]: Jing Lei and Larry Wasserman. Distribution-free prediction bands for non-parametric regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(1):71–96, 2014.
- Liu et al. [2016]: Qiang Liu, Jason Lee, and Michael Jordan. A kernelized Stein discrepancy for goodness-of-fit tests. In International Conference on Machine Learning, pages 276–284. PMLR, 2016.
- Liu et al. [2018]: Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Large-scale CelebFaces Attributes (CelebA) dataset. Retrieved August, 15(2018):11, 2018.
- Ma et al. [2021]: Jun Ma, Yao Zhang, Song Gu, Cheng Zhu, Cheng Ge, Yichi Zhang, Xingle An, Congcong Wang, Qiyuan Wang, Xin Liu, et al. AbdomenCT-1K: Is abdominal organ segmentation a solved problem? IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
- Metzer et al. [2022]: Gal Metzer, Elad Richardson, Or Patashnik, Raja Giryes, and Daniel Cohen-Or. Latent-NeRF for shape-guided generation of 3D shapes and textures. arXiv preprint arXiv:2211.07600, 2022.
- Pang et al. [2020]: Tianyu Pang, Kun Xu, Chongxuan Li, Yang Song, Stefano Ermon, and Jun Zhu. Efficient learning of generative models via finite-difference score matching. Advances in Neural Information Processing Systems, 33:19175–19188, 2020.
- Papadopoulos et al. [2002]: Harris Papadopoulos, Kostas Proedrou, Volodya Vovk, and Alex Gammerman. Inductive confidence machines for regression. In European Conference on Machine Learning, pages 345–356. Springer, 2002.
- Romano et al. [2019]: Yaniv Romano, Evan Patterson, and Emmanuel Candès. Conformalized quantile regression. Advances in Neural Information Processing Systems, 32, 2019.
- Ronneberger et al. [2015]: Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.
- Rosenberg et al. [2022]: Aviv A. Rosenberg, Sanketh Vedula, Yaniv Romano, and Alex M. Bronstein. Fast nonlinear vector quantile regression. arXiv preprint arXiv:2205.14977, 2022.
- Shafer and Vovk [2008]: Glenn Shafer and Vladimir Vovk. A tutorial on conformal prediction. Journal of Machine Learning Research, 9(3), 2008.
- Sohl-Dickstein et al. [2015]: Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, pages 2256–2265. PMLR, 2015.
- Song and Ermon [2019]: Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. Advances in Neural Information Processing Systems, 32, 2019.
- Song and Ermon [2020]: Yang Song and Stefano Ermon. Improved techniques for training score-based generative models. Advances in Neural Information Processing Systems, 33:12438–12448, 2020.
- Song et al. [2020a]: Yang Song, Sahaj Garg, Jiaxin Shi, and Stefano Ermon. Sliced score matching: A scalable approach to density and score estimation. In Uncertainty in Artificial Intelligence, pages 574–584. PMLR, 2020a.
- Song et al. [2020b]: Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020b.
- Song et al. [2021]: Yang Song, Liyue Shen, Lei Xing, and Stefano Ermon. Solving inverse problems in medical imaging with score-based generative models. arXiv preprint arXiv:2111.08005, 2021.
- Torem et al. [2022]: Nadav Torem, Roi Ronen, Yoav Y. Schechner, and Michael Elad. Towards a most probable recovery in optical imaging. arXiv preprint arXiv:2212.03235, 2022.
- Vovk [2015]: Vladimir Vovk. Cross-conformal predictors. Annals of Mathematics and Artificial Intelligence, 74(1):9–28, 2015.
- Vovk et al. [2005]: Vladimir Vovk, Alexander Gammerman, and Glenn Shafer. Algorithmic Learning in a Random World. Springer Science & Business Media, 2005.
- Watson et al. [2022]: Joseph L. Watson, David Juergens, Nathaniel R. Bennett, Brian L. Trippe, Jason Yim, Helen E. Eisenach, Woody Ahern, Andrew J. Borst, Robert J. Ragotte, Lukas F. Milles, et al. Broadly applicable and accurate protein design by integrating structure prediction networks and diffusion generative models. bioRxiv, 2022.
- Xie and Li [2022]: Yutong Xie and Quanzheng Li. Measurement-conditioned denoising diffusion probabilistic model for under-sampled medical image reconstruction. arXiv preprint arXiv:2203.03623, 2022.
- Xu et al. [2022]: Jiale Xu, Xintao Wang, Weihao Cheng, Yan-Pei Cao, Ying Shan, Xiaohu Qie, and Shenghua Gao. Dream3D: Zero-shot text-to-3D synthesis using 3D shape prior and text-to-image diffusion models. arXiv preprint arXiv:2212.14704, 2022.
- Yang et al. [2022]: Ling Yang, Zhilong Zhang, Yang Song, Shenda Hong, Runsheng Xu, Yue Zhao, Yingxia Shao, Wentao Zhang, Bin Cui, and Ming-Hsuan Yang. Diffusion models: A comprehensive survey of methods and applications. arXiv preprint arXiv:2209.00796, 2022.
- Zeng et al. [2022]: Xiaohui Zeng, Arash Vahdat, Francis Williams, Zan Gojcic, Or Litany, Sanja Fidler, and Karsten Kreis. LION: Latent point diffusion models for 3D shape generation. arXiv preprint arXiv:2210.06978, 2022.

Appendix A Proofs
In this section, we include the proofs for the results presented in this paper. Herein, denote by $\mathcal{T}: \mathcal{Y} \to \mathcal{X}'$ a set-valued predictor from $\mathcal{Y} \subseteq \mathbb{R}^m$ into a space of subsets $\mathcal{X}' \subseteq 2^{\mathcal{X}}$ for $\mathcal{X} \subseteq \mathbb{R}^d$.
A.1 Proof of Lemma 3.2
Let $F: \mathcal{Y} \to \mathcal{X}$ be a stochastic sampling procedure from $\mathcal{Y}$ to $\mathcal{X}$ such that for a fixed $y \in \mathcal{Y}$, $F(y)$ is a random vector with unknown distribution $\mathcal{Q}_y$. We show that for a desired miscoverage level $\alpha \in [0, 1]$, the entrywise calibrated empirical quantiles $\mathcal{I}_\alpha(y)_j = [\hat{l}_{j,\alpha}, \hat{u}_{j,\alpha}]$ defined in Equation 9 provide entrywise coverage as in Definition 3.1. That is, for each $j \in [d] = \{1, \ldots, d\}$,

$$\mathbb{P}[F(y)_j \in \mathcal{I}_\alpha(y)_j] \geq 1 - \alpha. \quad (23)$$

Proof. The proof is a variation of the classical split conformal prediction coverage guarantee (see Angelopoulos and Bates [3], Theorem D.1). Let $F_1, \ldots, F_n, F_{n+1}$ be $n + 1$ i.i.d. samples from $F(y)$. For a desired miscoverage level $\alpha \in [0, 1]$ and for each $j \in [d]$, denote

$$\hat{l}_{j,\alpha} = \inf\left\{l : \frac{|\{i : (F_i)_j \leq l\}|}{n} \geq \frac{\lceil (n+1)\alpha/2 \rceil}{n}\right\} \quad (24)$$

and

$$\hat{u}_{j,\alpha} = \inf\left\{u : \frac{|\{i : (F_i)_j \leq u\}|}{n} \geq \frac{\lceil (n+1)(1-\alpha/2) \rceil}{n}\right\} \quad (25)$$

the $\lceil (n+1)\alpha/2 \rceil / n$ and $\lceil (n+1)(1-\alpha/2) \rceil / n$ entrywise calibrated empirical quantiles of $F_1, \ldots, F_n$. Assume that for each $j \in [d]$, the first $n$ samples are ordered in ascending order, i.e. $(F_1)_j < \cdots < (F_n)_j$, such that

$$\hat{l}_{j,\alpha} = (F_{\lceil (n+1)\alpha/2 \rceil})_j \quad \text{and} \quad \hat{u}_{j,\alpha} = (F_{\lceil (n+1)(1-\alpha/2) \rceil})_j. \quad (26)$$

Note that by symmetry of $(F_1)_j, \ldots, (F_n)_j$, it follows that $(F_{n+1})_j$ is equally likely to fall between any of the first $n$ samples. That is, for any two indices $k_1 < k_2$,

$$\mathbb{P}[(F_{n+1})_j \in [(F_{k_1})_j, (F_{k_2})_j]] = \frac{k_2 - k_1}{n + 1}. \quad (27)$$

Instantiating the above equality with $\mathcal{I}_\alpha(y)_j = [\hat{l}_{j,\alpha}, \hat{u}_{j,\alpha}]$ yields

$$\mathbb{P}[(F_{n+1})_j \in \mathcal{I}_\alpha(y)_j] = \mathbb{P}[(F_{n+1})_j \in [\hat{l}_{j,\alpha}, \hat{u}_{j,\alpha}]] \quad (28)$$

$$= \mathbb{P}[(F_{n+1})_j \in [(F_{\lceil (n+1)\alpha/2 \rceil})_j, (F_{\lceil (n+1)(1-\alpha/2) \rceil})_j]] \quad (29)$$

$$= \frac{\lceil (n+1)(1-\alpha/2) \rceil - \lceil (n+1)\alpha/2 \rceil}{n + 1} \quad (30)$$

$$\geq \frac{(n+1)(1-\alpha)}{n + 1} = 1 - \alpha, \quad (31)$$

which concludes the proof. ∎
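The exchangeability argument in (27)-(31) is easy to check empirically; a small Monte Carlo sanity check of the coverage bound (an illustration, not part of the paper's experiments):

```python
import numpy as np

# A fresh i.i.d. sample falls between order statistics k1 and k2 of n
# calibration draws with probability (k2 - k1) / (n + 1), per Equation 27.
rng = np.random.default_rng(0)
n, alpha, trials = 128, 0.10, 20_000
lo_k = int(np.ceil((n + 1) * alpha / 2))         # 7 for n = 128
hi_k = int(np.ceil((n + 1) * (1 - alpha / 2)))   # 123 for n = 128
hits = 0
for _ in range(trials):
    draws = rng.standard_normal(n + 1)
    cal, fresh = np.sort(draws[:n]), draws[n]
    hits += cal[lo_k - 1] <= fresh <= cal[hi_k - 1]
coverage = hits / trials
# Theory: coverage = (hi_k - lo_k) / (n + 1) = 116/129, i.e. >= 1 - alpha.
```

The empirical coverage concentrates around $116/129 \approx 0.899$, matching the bound.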
A.2 Proof of Theorem 3.3

Recall that for a calibration set $\mathcal{S}_{\text{cal}} = \{(x_i, y_i)\}_{i=1}^n$ of $n$ i.i.d. samples from an unknown distribution $\mathcal{P}$ over $\mathcal{X} \times \mathcal{Y}$, a loss function $\ell: \mathcal{X} \times \mathcal{X}' \to \mathbb{R}$, and a family $\{\mathcal{T}_\lambda(y)\}_{\lambda \in \Lambda^d}$ of set-valued predictors indexed by a $d$-dimensional vector $\lambda = (\lambda_1, \ldots, \lambda_d) \in \Lambda^d$, $\Lambda \subseteq \bar{\mathbb{R}} = \mathbb{R} \cup \{\pm\infty\}$,

$$R(\lambda) = \mathbb{E}_{(x,y)\sim\mathcal{P}}[\ell(x, \mathcal{T}_\lambda(y))] \quad \text{and} \quad \hat{R}(\lambda) = \frac{1}{n}\sum_{(x_i,y_i)\in\mathcal{S}_{\text{cal}}} \ell(x_i, \mathcal{T}_\lambda(y_i)) \quad (32)$$

denote the risk of $\mathcal{T}_\lambda(y)$ and its empirical estimate on the calibration set, respectively. Furthermore, let $\hat{R}^+(\lambda)$ be a pointwise upper confidence bound (UCB) such that for each fixed $\lambda \in \Lambda^d$ and $\forall \delta \in [0, 1]$,

$$\mathbb{P}[R(\lambda) \leq \hat{R}^+(\lambda)] \geq 1 - \delta, \quad (33)$$

as presented in Equation 6. Equivalently to Definition 2.2, for a risk level $\epsilon \geq 0$, we say that $\mathcal{T}_\lambda(y)$ is an $(\epsilon, \delta)$-RCPS if

$$\mathbb{P}_{\mathcal{S}_{\text{cal}}}[R(\lambda) \leq \epsilon] \geq 1 - \delta. \quad (34)$$

Given fixed vectors $\tilde{\lambda} \in \mathbb{R}^d$ and $\alpha \in \mathbb{R}^d$, $\alpha \geq 0$, denote $\tilde{\mathcal{V}} = \{\tilde{\lambda} + \beta\alpha,\ \beta \in \Lambda\}$ the line with offset $\tilde{\lambda}$ and direction $\alpha$ parametrized by $\beta$. We show that for entrywise monotonically nonincreasing loss functions and for the family of set-valued predictors of the form $\mathcal{T}_\lambda(y)_j = [\hat{l}_j - \lambda_j, \hat{u}_j + \lambda_j]$, for some lower and upper bounds $\hat{l}_j < \hat{u}_j$ that may depend on $y$, if

$$\hat{\lambda} = \arg\min_{\lambda \in \tilde{\mathcal{V}}} \sum_{j\in[d]} \lambda_j \quad \text{s.t.} \quad \hat{R}^+(\lambda + \beta\alpha) < \epsilon, \; \forall \beta \geq 0, \quad (35)$$

then $\mathcal{T}_{\hat{\lambda}}(y)$ is an $(\epsilon, \delta)$-RCPS. We start by recalling the following definitions.
Definition A.1 (Entrywise monotonically nonincreasing function). A loss function $\ell$ is entrywise monotonically nonincreasing if, for a fixed $x$, $\forall j \in [d]$:

$$\mathcal{T}(y)_j \subseteq \mathcal{T}'(y)_j \implies \ell(x, \mathcal{T}'(y)) \leq \ell(x, \mathcal{T}(y)). \quad (36)$$

Definition A.2 (Entrywise nesting property). A family of set predictors $\{\mathcal{T}_\lambda(y)\}_{\lambda\in\Lambda^d}$ is entrywise nested if, for a fixed $y$, $\forall j \in [d]$:

$$\lambda_{j,1} < \lambda_{j,2} \implies \mathcal{T}_{[\lambda_{j,1}, \lambda_{-j}]}(y)_j \subseteq \mathcal{T}_{[\lambda_{j,2}, \lambda_{-j}]}(y)_j, \quad (37)$$

where $[\lambda_j, \lambda_{-j}]$ is the vector that takes value $\lambda_j$ in its $j$th entry and $\lambda_{-j}$ in its complement $\forall i \in [d] \setminus \{j\}$.
Proof. The proof is a high-dimensional extension of the validity of the original RCPS calibration procedure. Note that the family of set-valued predictors $\{\mathcal{T}_\lambda(y)_j = [\hat{l}_j - \lambda_j, \hat{u}_j + \lambda_j]\}_{\lambda\in\Lambda^d}$ satisfies the entrywise nesting property in Definition A.2. For $\hat{\lambda}$ chosen as in Equation 35, denote $L: \Lambda \to \mathbb{R}$ the one-dimensional function such that

$$L(\beta) = R(\hat{\lambda} + \beta\alpha) \quad \text{and} \quad \hat{L}^+(\beta) = \hat{R}^+(\hat{\lambda} + \beta\alpha). \quad (38)$$

It follows that $L$ is monotonically nonincreasing because $\ell$ is entrywise monotonically nonincreasing by assumption, and $\alpha$ belongs to the positive orthant. Furthermore, $\mathbb{P}[L(\beta) \leq \hat{L}^+(\beta)] \geq 1 - \delta$ by definition of $\hat{R}^+(\lambda)$. Denote

$$\beta^* = \inf\{\beta \in \Lambda : L(\beta) \leq \epsilon\} \quad (39)$$

and suppose $R(\hat{\lambda}) = L(0) > \epsilon$. By monotonicity of $L$ it follows that $\beta^* > 0$, and by definition of $\hat{\lambda}$, $\hat{L}^+(\beta^*) = \hat{R}^+(\hat{\lambda} + \beta^*\alpha) < \epsilon$. However, since $L(\beta^*) = \epsilon$ and $\mathbb{P}[L(\beta^*) \leq \hat{L}^+(\beta^*)] \geq 1 - \delta$, we conclude that this event happens with probability at most $\delta$. Hence, $\mathbb{P}[R(\hat{\lambda}) \leq \epsilon] \geq 1 - \delta$ and $\mathcal{T}_{\hat{\lambda}}(y)$ is an $(\epsilon, \delta)$-RCPS.

Lastly, it is easy to see that $\hat{\lambda}$ minimizes the mean interval length $\bar{I}(\lambda)$ over $\tilde{\mathcal{V}}$. Note that

$$\bar{I}(\lambda) = \frac{1}{d}\left[\sum_{j\in[d]}(\hat{u}_j - \hat{l}_j) + 2\sum_{j\in[d]}\lambda_j\right], \quad (40)$$

and the statement follows by definition. ∎
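In practice, the selection rule of Equation 35 amounts to a one-dimensional search along the line $\tilde{\lambda} + \beta\alpha$. A minimal sketch under simplifying assumptions (01 loss, Hoeffding UCB, an ascending grid of $\beta$ values; the function and variable names are illustrative, not the authors' implementation):

```python
import numpy as np

def rcps_line_search(l_hat, u_hat, x_cal, lam_tilde, direction, eps, delta,
                     beta_grid):
    """Return the first point on the line lam_tilde + beta * direction whose
    Hoeffding UCB of the 01 risk is below eps. Since the 01 risk is
    nonincreasing along a nonnegative direction, feasibility at one beta
    implies feasibility for all larger beta (the constraint in Eq. 35)."""
    n = x_cal.shape[0]
    slack = np.sqrt(np.log(1 / delta) / (2 * n))     # Hoeffding slack
    for beta in beta_grid:                           # assumed sorted ascending
        lam = lam_tilde + beta * direction
        miss = (x_cal < l_hat - lam) | (x_cal > u_hat + lam)
        if miss.mean() + slack < eps:                # empirical 01 risk + slack
            return lam
    return None  # no feasible point on the grid

rng = np.random.default_rng(0)
n, d = 2000, 16
x_cal = rng.uniform(size=(n, d))                     # toy "ground truth" pixels
l_hat, u_hat = np.full(d, 0.1), np.full(d, 0.9)      # base intervals
lam_hat = rcps_line_search(l_hat, u_hat, x_cal, np.zeros(d), np.ones(d),
                           eps=0.10, delta=0.10,
                           beta_grid=np.linspace(0.0, 0.1, 11))
```

Because the 01 loss is bounded and monotone in $\beta$, the first feasible grid point also certifies all larger inflations, which is what makes the search valid.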
A.3 Proof that $\ell_\gamma$ is a convex upper bound to $\ell_{01}$ (see Equation 16)

Recall that for a family of set-valued predictors $\{\mathcal{T}_\lambda(y)_j = [\hat{l}_j - \lambda_j, \hat{u}_j + \lambda_j]\}_{\lambda\in\Lambda^d}$ indexed by a $d$-dimensional vector $\lambda = (\lambda_1, \ldots, \lambda_d) \in \Lambda^d$, $\Lambda \subseteq \bar{\mathbb{R}} = \mathbb{R} \cup \{\pm\infty\}$, for some general lower and upper bounds $\hat{l}_j < \hat{u}_j$ that may depend on $y$, we define $\ell_\gamma(x, \mathcal{T}_\lambda(y))$ to be

$$\ell_\gamma(x, \mathcal{T}_\lambda(y)) = \frac{1}{d}\sum_{j\in[d]}\left[\frac{2(1+\eta)}{I(\lambda)_j}|x_j - c_j| - \eta\right]_+, \quad (41)$$

where $\eta = \gamma/(1-\gamma)$, $\gamma \in [0, 1)$, $I(\lambda)_j = \hat{u}_j - \hat{l}_j + 2\lambda_j$, $c_j = (\hat{u}_j + \hat{l}_j)/2$, and $[u]_+ = \max(0, u)$. First, we show that $\ell_\gamma(x, \mathcal{T}_\lambda(y))$ is convex in $\lambda$ for $\lambda \geq 0$.
Proof. Note that $\ell_\gamma(x, \mathcal{T}_\lambda(y))$ is separable in $\lambda$. Hence, it suffices to show that $\ell_\gamma(x, \mathcal{T}_\lambda(y))$ is convex in each entry $\lambda_j$. That is, we want to show that

$$\ell_\gamma(x, \mathcal{T}_\lambda(y))_j = \left[\frac{2(1+\eta)}{\hat{u}_j - \hat{l}_j + 2\lambda_j}|x_j - c_j| - \eta\right]_+ \quad (42)$$

is convex in $\lambda_j$. Note that:

• The term $1/(\hat{u}_j - \hat{l}_j + 2\lambda_j)$ behaves like $1/\lambda_j$, hence it is convex for $\lambda_j \geq 0 > -(\hat{u}_j - \hat{l}_j)/2$;

• $C = 2(1+\eta)|x_j - c_j|$ is nonnegative, hence $C \cdot 1/(\hat{u}_j - \hat{l}_j + 2\lambda_j)$ is convex;

• $\eta$ does not depend on $\lambda_j$, hence $C \cdot 1/(\hat{u}_j - \hat{l}_j + 2\lambda_j) - \eta$ is still convex; and finally

• the positive part $[u]_+ = \max(0, u)$ is a convex and nondecreasing function of its argument, so its composition with a convex inner function is convex, hence $[C \cdot 1/(\hat{u}_j - \hat{l}_j + 2\lambda_j) - \eta]_+$ is convex.

We conclude that $\ell_\gamma(x, \mathcal{T}_\lambda(y))_j$ is convex in each entry $j \in [d]$ for $\lambda_j \geq 0$, hence $\ell_\gamma(x, \mathcal{T}_\lambda(y))$ is convex for $\lambda \geq 0$. ∎
Note that $\ell_\gamma$ is an upper bound to $\ell_{01}$ by construction. One can see that $\forall \lambda$, $\ell_\gamma(x, \mathcal{T}_\lambda(y)) \geq \ell_{01}(x, \mathcal{T}_\lambda(y))$.

Proof. We will now show that $\ell_\gamma(x, \mathcal{T}_\lambda(y))_j \geq \ell_{01}(x, \mathcal{T}_\lambda(y))_j$ entrywise. To this end, we will show that both functions attain the same value (i.e., 1) at the extremes of the intervals (i.e., $\hat{l} - \lambda$ and $\hat{u} + \lambda$), and use the fact that $\ell_\gamma(x, \mathcal{T}_\lambda(y))_j$ is convex and nonnegative.

First, note that $\forall j$ it holds that

$$\ell_\gamma(\hat{l} - \lambda, \mathcal{T}_\lambda(y))_j = \ell_\gamma(\hat{u} + \lambda, \mathcal{T}_\lambda(y))_j = \left[\frac{2(1+\eta)}{I(\lambda)_j}\left|\frac{I(\lambda)_j}{2}\right| - \eta\right]_+ = 1, \quad (43)$$

$$\ell_{01}(\hat{l} - \lambda, \mathcal{T}_\lambda(y))_j = \ell_{01}(\hat{u} + \lambda, \mathcal{T}_\lambda(y))_j = 1. \quad (44)$$

Furthermore,

$$\ell_{01}(x, \mathcal{T}_\lambda(y))_j = 0, \quad \forall x_j \in [\hat{l}_j - \lambda_j, \hat{u}_j + \lambda_j] \quad (45)$$

by definition. We conclude that $\ell_\gamma$ upper bounds $\ell_{01}$ because $\ell_\gamma$ is convex, nonnegative, it intersects $\ell_{01}$ at $\hat{l} - \lambda$ and $\hat{u} + \lambda$ (for which both losses are equal to 1), and $\ell_{01}$ is exactly 0 between $\hat{l}_j - \lambda_j$ and $\hat{u}_j + \lambda_j$. ∎
Appendix BRisk Controlling Prediction Sets (RCPS) [10]
In this section, we present the original RCPS procedure of Bates et al. [10]. Let $\ell : \mathcal{X} \times \mathcal{X}' \to \mathbb{R}$, $\mathcal{X}' \subset 2^{\mathcal{X}}$, be a monotonically nonincreasing loss function (see Equation 4) over $\mathcal{X} \subseteq \mathbb{R}^d$ and $A \subseteq \mathbb{R}^m$, and let $\{\mathcal{T}_\lambda(y)\}_{\lambda \in \Lambda}$, $\Lambda \subset \bar{\mathbb{R}}$, be a family of set-valued predictors $\mathcal{T} : A \to \mathcal{X}'$ that satisfies the nesting property in Equation 5.

Algorithm 2 summarizes the original conformal risk control procedure introduced in Bates et al. [10] for a general bounding function $\mathrm{UCB}(n, \delta, \hat{R}(\lambda))$. It accepts the number of samples $n$ in a calibration set $\mathcal{S}_{\text{cal}} = \{(x_i, y_i)\}_{i=1}^n$ of i.i.d. samples from an unknown distribution $P$ over $\mathcal{X} \times A$, a failure probability $\delta \in [0, 1]$, and the empirical estimate $\hat{R}(\lambda) = \frac{1}{n}\sum_{i=1}^n \ell(x_i, \mathcal{T}_\lambda(y_i))$ evaluated on $\mathcal{S}_{\text{cal}}$, and it returns a pointwise upper confidence bound $\hat{R}^+(\lambda)$ that satisfies

$$\mathbb{P}\left[R(\lambda) \le \hat{R}^+(\lambda)\right] \ge 1 - \delta \tag{46}$$
as presented in Equation 6. For example, for losses bounded above by 1, Hoeffding's inequality [24] yields

$$\hat{R}^+(\lambda) = \mathrm{UCB}(n, \delta, \hat{R}(\lambda)) = \hat{R}(\lambda) + \sqrt{\frac{1}{2n}\log\frac{1}{\delta}}. \tag{47}$$
In practice, tighter alternatives exist (see Bates et al. [10], Section 3.1 for a thorough discussion).
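As an illustration, the calibration loop of Algorithm 2 together with the Hoeffding bound of Equation 47 can be sketched in a few lines of Python. This is a schematic sketch, not the authors' released code; the `losses` callback (returning the per-sample losses on the calibration set for a given $\lambda$) is a hypothetical helper and must be monotonically nonincreasing in $\lambda$.

```python
import math

def hoeffding_ucb(n, delta, r_hat):
    """Pointwise upper confidence bound from Hoeffding's inequality (Eq. 47),
    valid for losses bounded in [0, 1]."""
    return r_hat + math.sqrt(math.log(1.0 / delta) / (2.0 * n))

def rcps(losses, eps, delta, lam_max, d_lam):
    """(eps, delta)-RCPS calibration loop (Algorithm 2).

    Starts from a large lambda and shrinks it until the upper confidence
    bound on the risk exceeds eps, then steps back once.
    """
    lam = lam_max
    r_hat_plus = 0.0
    while r_hat_plus <= eps:
        lam -= d_lam
        sample_losses = losses(lam)
        r_hat = sum(sample_losses) / len(sample_losses)
        r_hat_plus = hoeffding_ucb(len(sample_losses), delta, r_hat)
    return lam + d_lam  # last lambda whose risk bound was certified
```

With the toy loss $\ell(\lambda) = \max(0, 1 - \lambda)$ on 100 calibration points, the loop stops as soon as $\hat{R}(\lambda) + \sqrt{\log(1/\delta)/(2n)}$ crosses $\epsilon$.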
Algorithm 2 $(\epsilon, \delta)$-RCPS (see [6], Algorithm 2)

1: Input: risk level $\epsilon \ge 0$, failure probability $\delta \in [0, 1]$, calibration set $\mathcal{S}_{\text{cal}} = \{(x_i, y_i)\}_{i=1}^n$ of $n$ i.i.d. samples, monotonically nonincreasing loss function $\ell$, family of nested set-valued predictors $\{\mathcal{T}_\lambda\}_{\lambda \in \Lambda}$, $\Lambda \subset \bar{\mathbb{R}}$, initial (large) value $\lambda_{\max}$, stepsize $\mathrm{d}\lambda > 0$.
2: $\lambda \leftarrow \lambda_{\max}$
3: $\hat{R}^+(\lambda) \leftarrow 0$
4: while $\hat{R}^+(\lambda) \le \epsilon$ do
5:   $\lambda \leftarrow \lambda - \mathrm{d}\lambda$
6:   $\hat{R}(\lambda) \leftarrow \frac{1}{n}\sum_{(x_i, y_i) \in \mathcal{S}_{\text{cal}}} \ell(x_i, \mathcal{T}_\lambda(y_i))$
7:   $\hat{R}^+(\lambda) \leftarrow \mathrm{UCB}(n, \delta, \hat{R}(\lambda))$
8: end while
9: $\hat{\lambda} \leftarrow \lambda + \mathrm{d}\lambda$
10: return $\hat{\lambda}$

Appendix C Computing the UCB
In this section, we include a detailed analytical example of how one can compute the UCB $\hat{R}_{01}^+(\lambda + \alpha)$ by means of concentration inequalities. Recall that, for a family of nested set predictors $\{\mathcal{T}_\lambda(y)\}_{\lambda \in \Lambda^d}$ indexed by a vector $\lambda = (\lambda_1, \ldots, \lambda_d)$, the 01 loss $\ell_{01}(x, \mathcal{T}_\lambda(y))$ counts the number of pixels in $x$ that fall outside of their respective intervals in $\mathcal{T}_\lambda(y)$. For example, choosing $\mathcal{T}_\lambda(y)_j = [\hat{l}_j - \lambda_j, \hat{u}_j + \lambda_j]$ as proposed in Equation 11, a given vector $\alpha = (a_1, \ldots, a_d)$ yields

$$\ell_{01}(x, \mathcal{T}_{\lambda+\alpha}(y)) = \frac{1}{d}\sum_{j \in [d]} \mathbb{1}\left[x_j \notin [\hat{l}_j - (\lambda_j + a_j),\ \hat{u}_j + (\lambda_j + a_j)]\right]. \tag{48}$$
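As a concrete illustration, the loss in Equation 48 can be written as a small NumPy function. This is a sketch; the array-based signature is our own, not from the paper's code.

```python
import numpy as np

def loss_01(x, lower, upper, lam, alpha):
    """Fraction of pixels of x falling outside the widened intervals
    [lower - (lam + alpha), upper + (lam + alpha)] (Eq. 48).
    All arguments are arrays of shape (d,)."""
    lo = lower - (lam + alpha)
    hi = upper + (lam + alpha)
    outside = (x < lo) | (x > hi)
    return outside.mean()
```

Widening the intervals (increasing `lam` or `alpha`) can only decrease this loss, which is the entrywise monotonicity the calibration procedures rely on.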
Then, $R_{01}(\lambda + \alpha) = \mathbb{E}[\ell_{01}(x, \mathcal{T}_{\lambda+\alpha}(y))]$ denotes the risk of $\mathcal{T}_{\lambda+\alpha}$, and its empirical estimate over a calibration set $\mathcal{S}_{\text{cal}} = \{(x_i, y_i)\}_{i=1}^n$ of $n$ i.i.d. samples from an unknown distribution $P$ over $\mathcal{X} \times A$ is

$$\hat{R}_{01}(\lambda + \alpha) = \frac{1}{n}\sum_{i \in [n]} \ell_{01}(x_i, \mathcal{T}_{\lambda+\alpha}(y_i)). \tag{49}$$
In turn, for a user-specified failure probability $\delta \in [0, 1]$, the UCB $\hat{R}_{01}^+(\lambda + \alpha)$ can be obtained by means of common concentration inequalities. For example, Hoeffding's inequality implies

$$\hat{R}_{01}^+(\lambda + \alpha) = \hat{R}_{01}(\lambda + \alpha) + \sqrt{\frac{1}{2n}\log\frac{1}{\delta}}. \tag{50}$$

Appendix D Experimental Details
In this section, we include further experimental details. All experiments were performed on a private cluster with 8 NVIDIA RTX A5000 GPUs with 24 GB of memory each.
D.1 Datasets

Figure D.1: Example images. (a) CelebA dataset. (b) AbdomenCT-1K dataset.

D.1.1 CelebA dataset
The CelebA dataset [40] (available at http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html) contains more than $200 \times 10^3$ images of celebrity faces, each $178 \times 218$ pixels, with several landmark locations and binary attributes (e.g., eyeglasses, bangs, smiling). In this work, we center-crop all images to $128 \times 128$ pixels and normalize them to $[-1, 1]$.
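A minimal sketch of this preprocessing, assuming uint8 inputs and the common convention $x / 127.5 - 1$ for the $[-1, 1]$ normalization (the exact scaling used by the authors is not spelled out here):

```python
import numpy as np

def preprocess_celeba(img):
    """Center-crop a (218, 178) CelebA image to 128x128 and rescale
    uint8 intensities from [0, 255] to [-1, 1]."""
    h, w = img.shape[:2]
    top, left = (h - 128) // 2, (w - 128) // 2
    crop = img[top:top + 128, left:left + 128]
    return crop.astype(np.float32) / 127.5 - 1.0
```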
D.1.2 AbdomenCT-1K dataset
The AbdomenCT-1K dataset [41] (available at https://github.com/JunMa11/AbdomenCT-1K) comprises more than 1000 abdominal CT scans (for a total of more than $200 \times 10^3$ individual $512 \times 512$ pixel images) aggregated from 6 existing datasets. Scans are provided in NIfTI format, so we first convert them to their Hounsfield unit (HU) values and subsequently window them with the standard abdomen setting (WW = 400, WL = 40), such that pixel intensities are normalized to $[0, 1]$ and represent the same tissue across images.
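The windowing step maps HU values in $[\mathrm{WL} - \mathrm{WW}/2,\ \mathrm{WL} + \mathrm{WW}/2] = [-160, 240]$ linearly to $[0, 1]$ and clips everything outside. A sketch of the standard formula (the function name is ours):

```python
import numpy as np

def window_hu(hu, ww=400.0, wl=40.0):
    """Apply an intensity window (WW=400, WL=40, the standard abdomen
    setting) to Hounsfield-unit values, normalizing to [0, 1]."""
    lo = wl - ww / 2.0  # -160 HU for the abdomen window
    return np.clip((hu - lo) / ww, 0.0, 1.0)
```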
Figure D.1 shows some example images for both datasets.
D.2 Models
Recall that in this work, we train both:

• a score network $s(\tilde{x}, t) \approx \nabla_x \log p_t(\tilde{x})$ to sample from $p(x \mid y)$ as introduced in Section 2.1, and

• a modified time-conditional image regressor $f : A \times \mathbb{R} \to \mathcal{X}^3$ such that $f(y, t) = (\hat{q}_{\alpha/2}, \hat{x}, \hat{q}_{1-\alpha/2})$, where $\hat{x} \approx \mathbb{E}[x \mid y]$ and $\hat{q}_{\alpha/2}, \hat{q}_{1-\alpha/2}$ are the entrywise $\alpha/2$ and $1-\alpha/2$ quantile regressors of $x$ [34, 45, 6], respectively.
Both models are implemented with the same U-Net-like [46] backbone: NCSN++, introduced by Song et al. [53] (code available at https://github.com/yang-song/score_sde). For the CelebA dataset, we use the original NCSN++ configuration presented in Song et al. [53]; for the AbdomenCT-1K dataset, we use the configuration for the FFHQ dataset (available at https://github.com/NVlabs/ffhq-dataset), given the larger image size of $512 \times 512$. For the image regressor $f$, we use the original implementation of the quantile regression head used in Angelopoulos et al. [6] (available at https://github.com/aangelopoulos/im2im-uq) on top of the NCSN++ backbone. This allows us to maintain a time-conditional backbone and extend the original image regressor presented in Angelopoulos et al. [6] to all noise levels, as for the score network.
D.3 Training

D.3.1 Data Augmentation
For both datasets, we augment the training data by means of random horizontal and vertical flips, and random rotations in $[-\pi/2, \pi/2]$.
D.3.2 Forward SDE

Figure D.2: Example of perturbed images via the forward SDE. (a) CelebA dataset. (b) AbdomenCT-1K dataset. The final level of noise is $\sigma^2 = 1$.
Recall that in this work, we are interested in solving the classical denoising problem where $y = x + v_0$ is a noisy observation of a ground-truth image $x$ perturbed with random Gaussian noise of known variance $\sigma_0^2$. As done in previous works [50, 51, 54, 31], we model the observation process with a variance exploding (VE) forward-time SDE

$$\mathrm{d}x = \sqrt{\frac{\mathrm{d}[\sigma^2(t)]}{\mathrm{d}t}}\,\mathrm{d}w \quad \text{where} \quad \sigma(t) = \sigma_{\min}\left(\frac{\sigma_{\max}}{\sigma_{\min}}\right)^{t}, \quad t \in [0, 1] \tag{51}$$

such that $\sigma(0) = \sigma_{\min}$ and $\sigma(1) = \sigma_{\max}$. In particular, we set $\sigma_{\min} = 0$, and $\sigma_{\max} = 90$ for the CelebA dataset and $\sigma_{\max} = 1$ for the AbdomenCT-1K dataset. It has been shown [53] that the above forward-time SDE is equivalent to the following discrete Markov chain

$$x_t = x_{t-1} + \sqrt{\sigma_t^2 - \sigma_{t-1}^2}\,z, \quad z \sim \mathcal{N}(0, I), \quad \text{for } t = 1, \ldots, T \tag{52}$$

with noise levels $\{\sigma_i\}_{i=0}^{T}$. That is, $x_t = x + z_t$, where $z_t \sim \mathcal{N}(0, \sigma_t^2 I)$. Figure D.2 shows some example images from both datasets perturbed via the forward SDE described in this section.
D.3.3 Denoising Score-Matching
Here, we briefly describe the loss function used to train the time-conditional score network $s(\tilde{x}, t)$. Denote by $\theta \in \Theta$ the parametrization of $s$; then, following Song and Ermon [50] and Song et al. [53], we have $s(\tilde{x}, t) = s_{\theta^*}(\tilde{x}, t)$, where

$$\theta^* = \arg\min_{\theta \in \Theta}\ \mathbb{E}_{t \sim \mathcal{U}(0,1)}\left[\omega(t)\,\mathbb{E}_{x \sim p(x),\, x(t) \mid x}\left[\left\|s_\theta(x(t), t) - \nabla_x \log p_t(x(t) \mid x)\right\|^2\right]\right] \tag{53}$$

$$= \arg\min_{\theta \in \Theta}\ \mathbb{E}_{t \sim \mathcal{U}(0,1)}\left[\omega(t)\,\mathbb{E}_{x \sim p(x),\, x(t) \mid x}\left[\left\|s_\theta(x(t), t) + \frac{x(t) - x}{\sigma^2(t)}\right\|^2\right]\right], \tag{54}$$

where $\omega(t) \propto \sigma^2(t)$ and $\mathcal{U}(0, 1)$ is the uniform distribution over $[0, 1]$.
D.3.4 Quantile Regression

Figure D.3: Example of results with the modified image regressor. (a) CelebA dataset. (b) AbdomenCT-1K dataset.
Similarly to the above, we briefly describe the loss function used to train the modified time-conditional image regressor $f$. Denote by $\theta \in \Theta$ the parametrization of $f$, and recall that for a desired $\alpha$ quantile and its respective quantile regressor $\hat{q}_\alpha$, the quantile loss function [34, 45, 6] is

$$\ell_\alpha(x, \hat{q}_\alpha) = \alpha(x - \hat{q}_\alpha)\,\mathbb{1}[x > \hat{q}_\alpha] + (1 - \alpha)(\hat{q}_\alpha - x)\,\mathbb{1}[x \le \hat{q}_\alpha]. \tag{55}$$
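A sketch of the pinball loss of Equation 55, together with the three-term objective of Equation 56, in NumPy (the array-based signatures are our own, not the released implementation):

```python
import numpy as np

def quantile_loss(x, q_hat, alpha):
    """Pinball loss (Eq. 55): alpha * (x - q_hat) when x > q_hat,
    (1 - alpha) * (q_hat - x) when x <= q_hat."""
    return np.where(x > q_hat, alpha * (x - q_hat), (1.0 - alpha) * (q_hat - x))

def regressor_loss(x, q_lo, x_hat, q_hi, alpha=0.10):
    """Multi-loss objective (Eq. 56): lower-quantile pinball + squared
    error + upper-quantile pinball, averaged over pixels."""
    return np.mean(
        quantile_loss(x, q_lo, alpha / 2.0)
        + (x - x_hat) ** 2
        + quantile_loss(x, q_hi, 1.0 - alpha / 2.0)
    )
```

The asymmetric weighting is what pushes $\hat{q}_{\alpha/2}$ below and $\hat{q}_{1-\alpha/2}$ above most of the data: under-predicting the upper quantile is penalized by the large factor $1 - \alpha/2$.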
For the image regressor $f$, we set $\alpha = 0.10$ and use the original implementation of the quantile regression head from Angelopoulos et al. [6] (available at https://github.com/aangelopoulos/im2im-uq), which minimizes the multi-loss objective

$$\ell(x, f(y, t)) = \ell_{\alpha/2}(x, \hat{q}_{\alpha/2}) + (x - \hat{x})^2 + \ell_{1-\alpha/2}(x, \hat{q}_{1-\alpha/2}) \tag{56}$$

on top of the NCSN++ backbone. This allows us to maintain a time-conditional backbone and extend the original image regressor presented in Angelopoulos et al. [6] to all noise levels, as for the score network. Then, $f(y, t) = f_{\theta^*}(y, t)$, where

$$\theta^* = \arg\min_{\theta \in \Theta}\ \mathbb{E}_{t \sim \mathcal{U}(0,1)}\left[\ell(x, f_\theta(x(t), t))\right]. \tag{57}$$

D.4 Sampling

Figure D.4: Example of images sampled via Algorithm 3. (a) CelebA dataset. (b) AbdomenCT-1K dataset.
Here, we briefly describe the sampling procedure used in this work to solve the conditional reverse-time SDE

$$\mathrm{d}x = \left[h(x, t) - g(t)^2\,\nabla_x \log p_t(x \mid y)\right]\mathrm{d}t + g(t)\,\mathrm{d}\bar{w} \tag{58}$$

where $y$ is the initial noisy observation with known noise level $\sigma_0^2$, and $g(t) = \sqrt{\mathrm{d}[\sigma^2(t)]/\mathrm{d}t}$, $h(x, t) = 0$ as in Equation 51. Although several different discretization schemes exist [53], we use the classical Euler-Maruyama discretization [30], since the contribution of this work is not focused on improving existing diffusion models. Recall that for a general Itô process

$$\mathrm{d}x = h(x, t)\,\mathrm{d}t + g(t)\,\mathrm{d}w, \tag{59}$$

its Euler-Maruyama discretization with step $\Delta t$ is

$$x_{t+1} = x_t + h(x, t)\,\Delta t + g(t)\sqrt{\Delta t}\,z, \quad z \sim \mathcal{N}(0, I). \tag{60}$$
Furthermore, under the forward SDE described in Equation 51, and as shown by previous works [32, 31, 29], it holds that

$$\nabla_x \log p_t(x(t) \mid y) = \nabla_x \log p_t(x(t)) + \frac{y - x(t)}{\sigma_0^2 - \sigma^2(t)}. \tag{61}$$

We conclude that the reverse-time Euler-Maruyama discretization of the conditional SDE above is

$$x_{t-1} = x_t + g_t^2\left[\nabla_x \log p_t(x_t) + \frac{y - x_t}{\sigma_0^2 - \sigma_t^2}\right]\Delta t + g_t\sqrt{\Delta t}\,z, \quad z \sim \mathcal{N}(0, I). \tag{62}$$
Algorithm 3 implements the sampling procedure for a general score network ๐ โข ( ๐ฅ ~ , ๐ก ) .
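For concreteness, a NumPy sketch of this sampler is below. It follows the steps of Algorithm 3 under the assumptions $\sigma_{\min} > 0$ and $\sigma_{\min} < \sigma_0 < \sigma_{\max}$; `score` is a stand-in for the trained score network $s(\tilde{x}, t)$, and the function name is ours.

```python
import numpy as np

def denoise_reverse_sde(y, score, sigma_0, sigma_min, sigma_max, num_steps, rng=None):
    """Euler-Maruyama sketch of the denoising reverse-time SDE (Algorithm 3).

    score(x, t) approximates the unconditional score; the conditional
    score adds the likelihood term (y - x) / (sigma_0**2 - sigma_i**2)
    as in Eq. (61).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Start the reverse process at the time whose noise level matches sigma_0.
    t0 = (np.log(sigma_0) - np.log(sigma_min)) / (np.log(sigma_max) - np.log(sigma_min))
    dt = 1.0 / num_steps
    n = int(t0 / dt)
    x = y.copy()
    for i in range(n, 0, -1):
        z = rng.standard_normal(x.shape)
        t_i = i * dt
        sigma_i = sigma_min * (sigma_max / sigma_min) ** t_i
        g_i = sigma_i * np.sqrt(2.0 * np.log(sigma_max / sigma_min))
        drift = score(x, t_i) + (y - x) / (sigma_0 ** 2 - sigma_i ** 2)
        x = x + g_i ** 2 * drift * dt + g_i * np.sqrt(dt) * z
    return x
```

Note that the likelihood term requires $\sigma_i < \sigma_0$ at every visited step, which holds whenever $n\,\Delta t < t_0$ strictly.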
Algorithm 3 Denoising Reverse-Time SDE

1: Input: observation $y$, initial noise level $\sigma_0$, $\sigma_{\max}$, $\sigma_{\min}$, number of steps $N$
2: $t_0 \leftarrow (\log\sigma_0 - \log\sigma_{\min}) / (\log\sigma_{\max} - \log\sigma_{\min})$
3: $\Delta t \leftarrow 1/N$
4: $n \leftarrow \lfloor t_0 / \Delta t \rfloor$
5: $x_n \leftarrow y$
6: for $i = n, \ldots, 1$ do
7:   Draw $z_i \sim \mathcal{N}(0, I)$
8:   $t_i \leftarrow i\,\Delta t$
9:   $\sigma_i \leftarrow \sigma_{\min}(\sigma_{\max}/\sigma_{\min})^{t_i}$
10:   $g_i \leftarrow \sigma_i\sqrt{2\log(\sigma_{\max}/\sigma_{\min})}$
11:   $x_{i-1} \leftarrow x_i + g_i^2\left[s(x_i, t_i) + (y - x_i)/(\sigma_0^2 - \sigma_i^2)\right]\Delta t + g_i\sqrt{\Delta t}\,z_i$
12: end for
13: return $x_0$

Appendix E Comparison with Conffusion
In this section, we discuss the differences between the Conffusion framework proposed by Horwitz and Hoshen [26] and the contributions of this paper. Although similar in spirit and motivation, the two are fundamentally different and address separate and distinct questions. Conffusion deploys ideas from quantile regression [34] to fine-tune an existing score network and obtain an image regressor equipped with heuristic uncertainty intervals. Such intervals can then be conformalized to provide risk control. On the other hand, our $K$-RCPS is a novel high-dimensional calibration procedure for any stochastic sampler and any notion of uncertainty (i.e., it is agnostic to the notion of uncertainty), including diffusion models and quantile regression. In this paper, we present $K$-RCPS for diffusion models given their remarkable performance in solving inverse problems via conditional sampling.
In order to contextualize the choices of parametrization for the families of set predictors in Equations 19 and 20, we highlight a gap between the presentation of Conffusion in the arXiv paper available at https://arxiv.org/abs/2211.09795v1 and the official code release at https://github.com/eliahuhorwitz/Conffusion. In particular, Section 3.2 of Horwitz and Hoshen [26] introduces the following family of set predictors:

$$\mathcal{T}^{\text{arXiv}}_{\lambda,\,\text{Conffusion}}(y)_j = \left[\lambda_j(\hat{l}_{\alpha/2})_j,\ \lambda_j(\hat{u}_{1-\alpha/2})_j\right], \tag{63}$$
which does not satisfy the nesting property in Equation 5. It is easy to see that, without any further assumptions, as $\lambda_j$ increases, $\mathcal{T}^{\text{arXiv}}_{\lambda,\,\text{Conffusion}}(y)_j$ need not cover the interval $[0, 1]$, and the 01 loss in Equation 10 is not entrywise monotonically nonincreasing. This renders the RCPS procedure (and most calibration procedures) inapplicable as presented in the arXiv paper. We therefore resorted to the authors' GitHub release, where the parametrization is different (see https://github.com/eliahuhorwitz/Conffusion/blob/fffe5c946219cf9dead1a1c921a131111e31214e/inpainting_n_conffusion/core/calibration_masked.py#L28). More precisely, the GitHub implementation reflects the multiplicative parametrization presented in Equation 19, which does indeed satisfy the entrywise nesting property. Finally, we additionally propose the additive parametrization in Equation 20 to showcase that $K$-RCPS is agnostic to the notion of uncertainty and can be used in conjunction with Conffusion.
Appendix F Erratum of Theorem 3.3
In this section, we address the differences between the version of Theorem 3.3 presented in this manuscript and the published paper available at https://proceedings.mlr.press/v202/teneggi23a.html, which contains a small typo. We include the two theorems side-by-side with differences highlighted.
Theorem (This manuscript). Let $\ell : \mathcal{X} \times \mathcal{X}' \to \mathbb{R}$, $\mathcal{X}' \subset 2^{\mathcal{X}}$, $\mathcal{X} \subseteq \mathbb{R}^d$ be an entrywise monotonically nonincreasing function and let $\{\mathcal{T}_\lambda(y) = [\hat{l} - \lambda, \hat{u} + \lambda]\}_{\lambda \in \Lambda^d}$ be a family of set-valued predictors $\mathcal{T} : A \to \mathcal{X}'$, $A \subseteq \mathbb{R}^m$ indexed by $\lambda \in \Lambda^d$, $\Lambda \subset \bar{\mathbb{R}}$, for some lower and upper bounds $\hat{l}_j < \hat{u}_j$ that may depend on $y$. Given $\widetilde{\mathcal{W}} = \{\tilde{\lambda} + v\beta,\ \beta \in \Lambda\}$, for fixed $\tilde{\lambda} \in \mathbb{R}^d$ and $v \in \mathbb{R}^d$, $v \ge 0$, if

$$\hat{\lambda} = \arg\min_{\lambda \in \widetilde{\mathcal{W}}} \sum_{j \in [d]} \lambda_j \quad \text{s.t.} \quad \hat{R}^+(\lambda + \beta\boldsymbol{\alpha}) < \epsilon \quad \forall \beta \ge 0,$$

then $\mathcal{T}_{\hat{\lambda}}(y)$ is an $(\epsilon, \delta)$-RCPS and $\hat{\lambda}$ minimizes the mean interval length over $\widetilde{\mathcal{W}}$.
Theorem (Published paper). Let $\ell : \mathcal{X} \times \mathcal{X}' \to \mathbb{R}$, $\mathcal{X}' \subset 2^{\mathcal{X}}$, $\mathcal{X} \subseteq \mathbb{R}^d$ be an entrywise monotonically nonincreasing function and let $\{\mathcal{T}_\lambda(y) = [\hat{l}_j - \lambda_j, \hat{u}_j + \lambda_j]\}_{\lambda \in \Lambda^d}$ be a family of set-valued predictors $\mathcal{T} : A \to \mathcal{X}'$, $A \subseteq \mathbb{R}^m$ indexed by $\lambda \in \Lambda^d$, $\Lambda \subset \bar{\mathbb{R}}$, for some lower and upper bounds $\hat{l}_j < \hat{u}_j$ that may depend on $y$. For a fixed vector $v \in \mathbb{R}^d$, $v \ge 0$, if

$$\hat{\lambda} = \arg\min_{\lambda \in \Lambda^d} \sum_{j \in [d]} \lambda_j \quad \text{s.t.} \quad \hat{R}^+(\lambda + \beta\boldsymbol{\alpha}) < \epsilon \quad \forall \beta \ge 0,$$

then $\mathcal{T}_{\hat{\lambda}}(y)$ is an $(\epsilon, \delta)$-RCPS, and $\hat{\lambda}$ minimizes the mean interval length.
Reason for the differences
The domain of the optimization problem in the published paper, i.e. $\lambda \in \Lambda^d$, does not reflect the pointwise coverage guarantee provided by $\hat{R}^+$: the guarantee $\mathbb{P}[R(\lambda) \le \hat{R}^+(\lambda)] \ge 1 - \delta$ holds for each fixed $\lambda \in \Lambda^d$. To solve the optimization problem over all of $\Lambda^d$, $\hat{R}^+$ should instead provide uniform coverage, i.e. $\mathbb{P}[R(\lambda) \le \hat{R}^+(\lambda),\ \forall \lambda \in \Lambda^d] \ge 1 - \delta$.
Effect on results
We remark that the changes in Theorem 3.3 affect neither the $K$-RCPS procedure nor the results presented in the published version of the paper.
Appendix G Figures
This section contains supplementary figures.
Figure G.1: Empirical estimates of risk over 20 random draws of $\mathcal{S}_{\text{cal}}$. (a) CelebA dataset. (b) AbdomenCT-1K dataset. All combinations of notion of uncertainty and calibration procedure successfully control risk at level $\epsilon = 0.10$ and $\epsilon = 0.05$ for the CelebA and AbdomenCT-1K datasets, respectively, with probability at least $1 - \delta$, $\delta = 0.10$.