Ayushs799 committed
Commit d3ca6ca · 1 Parent(s): f7b62f3

Upload Introduction.md

Files changed (1):
  1. Introduction.md (+29 −2)

Introduction.md CHANGED
In decision-making, machine learning models need not only to make predictions but also to quantify their predictions' uncertainty. A point prediction might differ dramatically from the true value because of the high stochasticity of the real world. If, on the other hand, the model can estimate a range that is guaranteed to cover the true value with high probability, it can compute the best- and worst-case rewards and make more sensible decisions.

For example:
* When buying a house, the upper bound of a price prediction helps a buyer be certain whether they will be able to afford the house.
* When identifying an object, applying a threshold to the softmax predictions narrows down what the object could be.

Conformal prediction is a technique for quantifying such uncertainties in AI systems. In particular, given an input, conformal prediction estimates a prediction interval in regression problems and a set of classes in classification problems. Both the interval and the set are guaranteed to cover the true value with high probability.
#### Theory
### 1. Prediction Regions

Prediction regions are regions that contain the true value of the prediction with a specified confidence level. In regression, this is expressed as a prediction interval, which we denote $[a, b]$, where $a$ and $b$ are the lower and upper bounds, respectively. The miscoverage level is denoted by $\alpha$, so the target coverage is $1 - \alpha$. In classification, the prediction region is the set of classes whose scores exceed a certain threshold, and that threshold is calibrated from $\alpha$.
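Both kinds of prediction region can be sketched in a few lines. The numbers below are illustrative assumptions: the point prediction, the interval half-width, and the softmax threshold are taken as already calibrated.

```python
import numpy as np

# Hypothetical outputs of a calibration step (assumed, for illustration).
y_hat = 250_000.0       # point prediction, e.g. a house price
q_hat = 30_000.0        # calibrated half-width of the interval

# Regression: the prediction region is the interval [a, b].
a, b = y_hat - q_hat, y_hat + q_hat

# Classification: keep every class whose softmax score exceeds the threshold.
softmax = np.array([0.70, 0.20, 0.06, 0.04])
threshold = 0.10        # assumed to be calibrated from alpha
prediction_set = np.flatnonzero(softmax > threshold)

print([a, b])           # [220000.0, 280000.0]
print(prediction_set)   # [0 1]
```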
18
### 2. Validity

A conformal predictor is considered valid if, over repeated experiments, the true value falls within the predicted region at least as often as the specified confidence level. Formally, for a prediction interval $[a, b]$ and a true outcome $y$, the validity condition in regression is $P(y \in [a, b]) \geq 1 - \alpha$. In classification, the condition is $P(y \in \hat{C}) \geq 1 - \alpha$, where $\hat{C}$ is the predicted set of classes.
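The validity condition can be checked empirically by simulating repeated experiments. In the sketch below the data-generating distribution is a known Gaussian (an assumption made purely so the exact central $1 - \alpha$ interval is available), and we measure how often the true value lands inside it.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.1
n_trials = 10_000

# The "model" predicts the mean and uses the exact central 90% interval
# of N(mu, sigma), so the empirical coverage should be close to 1 - alpha.
mu, sigma = 0.0, 1.0
half_width = 1.6449 * sigma   # 95th percentile of N(0, 1)

y = rng.normal(mu, sigma, n_trials)   # repeated draws of the true outcome
covered = np.abs(y - mu) <= half_width
coverage = covered.mean()
print(round(coverage, 3))             # close to 0.90
```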
22
### 3. Inductive Conformal Prediction

Inductive (split) conformal prediction makes no assumptions about the underlying data distribution beyond exchangeability, which gives it flexibility across various types of problems. The algorithm is as follows:

1. Given a dataset $D = \{(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)\}$, split it into a training set $D_{train}$ and a calibration set $D_{cal}$.
2. Train a machine learning model $M$ on $D_{train}$.
3. For each sample $(x_i, y_i)$ in the calibration set $D_{cal}$, compute the prediction $\hat{y}_i = M(x_i)$ and the nonconformity score $e_i = |y_i - \hat{y}_i|$ (the absolute error, in the case of regression).
4. Sort the scores in ascending order and take the $\lceil (m + 1)(1 - \alpha) \rceil$-th smallest score as the threshold $\hat{q}$, where $m$ is the number of samples in $D_{cal}$.
5. For each sample $x$ in the test set $D_{test}$, output the prediction region $R = [M(x) - \hat{q}, M(x) + \hat{q}]$, which covers the true value with probability at least $1 - \alpha$.

Validity can then be checked empirically: count how often the true value falls within the prediction region over repeated experiments, and compare that frequency to $1 - \alpha$.
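The steps above can be sketched end to end in Python. The synthetic data, the least-squares model, and all variable names are illustrative assumptions, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(42)
alpha = 0.1  # target miscoverage level

# Synthetic data: y = 2x + Gaussian noise (assumption for illustration).
x = rng.uniform(-1, 1, 1000)
y = 2 * x + rng.normal(0, 0.3, 1000)

# 1. Split into training and calibration sets.
x_train, y_train = x[:500], y[:500]
x_cal, y_cal = x[500:], y[500:]

# 2. Train a simple model (a least-squares line via polyfit).
slope, intercept = np.polyfit(x_train, y_train, 1)

def predict(t):
    return slope * t + intercept

# 3. Nonconformity scores on the calibration set: absolute residuals.
scores = np.abs(y_cal - predict(x_cal))

# 4. Threshold: the ceil((m + 1) * (1 - alpha))-th smallest score.
m = len(scores)
k = int(np.ceil((m + 1) * (1 - alpha)))
q_hat = np.sort(scores)[k - 1]

# 5. Prediction interval for a new test point.
x_new = 0.25
lo, hi = predict(x_new) - q_hat, predict(x_new) + q_hat
print(lo, hi)  # covers the true y for x_new with probability >= 1 - alpha
```

On fresh data drawn from the same distribution, the fraction of true values falling inside these intervals should be close to $1 - \alpha = 0.9$.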