Abstract (eng)
The aim of the present work is to construct prediction intervals via a jackknife approach whose coverage probability conditional on the training data is close to its nominal level in finite samples and can be asymptotically valid in high dimensions. The main innovation is to generalize the results of Steinberger and Leeb (2023, The Annals of Statistics 51.1, 290–311) to non-continuous response distributions and to non-linear models. More specifically, the work is split into four parts: in the first part we link the prediction interval's coverage probability to the accuracy with which the distribution of the prediction error is estimated in different metrics. While the Kolmogorov distance is a suitable choice for continuous distributions, we introduce the epsilon-variational divergence to deal with the non-continuous case and discuss its advantages over the Kolmogorov distance, the Lp-norm and the Lévy metric. Moreover, the usefulness (i.e., the informativeness) of the epsilon-variational divergence extends to the estimation of other characteristics of the prediction error, such as the mean-squared prediction error or the mean-absolute prediction error. In the second part of the work, we define a jackknife-based approach for estimating the distribution of the prediction error conditional on the training data. In the third part, we present upper bounds on the distance between the conditional distribution of the prediction error and its estimate with respect to several distance measures; we state these results both for finite samples and asymptotically, covering the low-dimensional as well as the high-dimensional case. Moreover, we show that the prediction error's distribution can be estimated consistently if two conditions are fulfilled: the prediction error is bounded in probability and the prediction algorithm satisfies a stability condition. In the last part we show that, under mild assumptions, these two properties hold for the OLS estimator and the James-Stein estimator in a low-dimensional setting, for the minimum-norm interpolator in high dimensions, and for ridge regression regardless of the number of regressors. Finally, we present an example of binary classification in which the corresponding predictor also fulfills these properties.
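To make the jackknife construction referred to above concrete, the following display gives a minimal sketch in the spirit of Steinberger and Leeb (2023); the notation (the leave-one-out predictor $\hat{y}_{[i]}$, the empirical distribution $\hat{F}_n$ and its quantiles $\hat{q}_\beta$) is illustrative and not taken from the text above.
\[
  \hat{u}_i \;=\; y_i - \hat{y}_{[i]}(x_i), \qquad i = 1,\dots,n,
\]
\[
  \hat{F}_n(t) \;=\; \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{\hat{u}_i \le t\},
  \qquad
  \mathrm{PI}_{\alpha}(x_0) \;=\;
  \Bigl[\, \hat{y}(x_0) + \hat{q}_{\alpha_1},\; \hat{y}(x_0) + \hat{q}_{1-\alpha_2} \,\Bigr],
  \quad \alpha_1 + \alpha_2 = \alpha,
\]
where $\hat{y}_{[i]}$ denotes the predictor trained without the $i$-th observation, $\hat{q}_\beta$ is the $\beta$-quantile of $\hat{F}_n$, and $(x_0, y_0)$ is a new observation. If $\hat{F}_n$ is close to the conditional distribution of the prediction error $y_0 - \hat{y}(x_0)$ (for instance in Kolmogorov distance in the continuous case, or in the epsilon-variational divergence in the non-continuous case), then the coverage probability of $\mathrm{PI}_{\alpha}(x_0)$ conditional on the training data is correspondingly close to its nominal level $1-\alpha$.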