This article evaluates the reliability of sensitivity tests. Using Monte Carlo methods, we show, first, that the definition of robustness strongly influences which variables are judged robust. Second, and more importantly, inferences based on sensitivity tests are most likely to be valid if determinants and confounders are almost uncorrelated and if the variables included in the true model exert a strong influence on outcomes. Third, no definition of robustness reliably avoids both false positives and false negatives; for a wide variety of data-generating processes, rarely used definitions of robustness perform better than the frequently used model-averaging rule suggested by Sala-i-Martin. Fourth, Leamer’s extreme bounds analysis and Bayesian model averaging are extremely unlikely to generate false positives: if a variable is robust according to these inferential criteria, it almost certainly belongs in the empirical model. Fifth and finally, researchers should avoid drawing inferences from a lack of robustness.
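
The following is a minimal sketch of the kind of Monte Carlo experiment described above, assuming a simple linear data-generating process with one true determinant (x1) and a candidate confounder (x2) correlated with it. The variable names, sample size, correlation level, and the |t| > 2 significance cutoff are illustrative assumptions, and the robustness rule shown is a Leamer-style extreme-bounds criterion, not the authors' exact implementation.

```python
import itertools
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

def simulate(n=200, beta_true=0.5, rho=0.3):
    """Draw one dataset: y depends on x1 (true determinant); x2 and x3 are
    candidate regressors, with x2 correlated with x1 at rho (hypothetical DGP)."""
    x1 = rng.normal(size=n)
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    x3 = rng.normal(size=n)
    y = beta_true * x1 + rng.normal(size=n)
    return y, np.column_stack([x1, x2, x3])

def extreme_bounds_robust(y, X, focus=0):
    """Leamer-style check: the focus variable counts as 'robust' only if its
    coefficient keeps the same sign and stays significant (|t| > 2) across
    all models formed from subsets of the remaining regressors."""
    others = [j for j in range(X.shape[1]) if j != focus]
    signs, significant = [], []
    for k in range(len(others) + 1):
        for combo in itertools.combinations(others, k):
            cols = [focus] + list(combo)
            res = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
            coef, t = res.params[1], res.tvalues[1]  # focus variable
            signs.append(np.sign(coef))
            significant.append(abs(t) > 2)
    return len(set(signs)) == 1 and all(significant)

# Monte Carlo: how often is the true determinant x1 declared robust
# (i.e., how often does this definition avoid a false negative)?
hits = sum(extreme_bounds_robust(*simulate()) for _ in range(500))
print(f"x1 judged robust in {hits / 500:.1%} of simulated datasets")
```

Varying `rho` and `beta_true` in such a setup is one way to probe the second finding above: as the correlation between determinants and confounders grows, or as the true effect weakens, the share of correct robustness verdicts should fall.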