OK, this gets hairy. We don't have much structural data for mutations, so the backbone similarity used in this paper may not be the best way to extract information from the model.
However, there are lots of papers that look at how model outputs correlate with protein stability data.
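
For what it's worth, that kind of comparison usually boils down to a rank correlation between a per-mutation model score and a measured stability change. Here's a rough sketch of what that could look like, using Spearman's rho in plain Python. Everything specific here is an assumption on my part: the names `model_scores` and `measured_ddg`, the sign convention, and the numbers themselves, which are toy placeholders and not real measurements or outputs from any model.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy placeholder values, NOT real data. One entry per mutation.
# Assumed convention: a lower model score means the model likes the
# mutation less; a higher ddG means the mutation is more destabilizing.
model_scores = [-0.2, -1.5, -3.1, -0.9, -2.4]
measured_ddg = [0.1, 1.5, 2.2, 0.4, 3.0]

rho = spearman(model_scores, measured_ddg)
print(f"Spearman rho = {rho:.2f}")
```

Under that assumed sign convention, a strongly negative rho would mean the model's scores track stability reasonably well. Rank correlation is the usual choice here because the model score and the stability measurement aren't on the same scale, so only the ordering is compared.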