Title: Interpretability Using Compact Models
Speaker: Abhishek Ghose (IITM)
Details: Mon, 20 Feb 2023, 11:00 AM @ online meeting
Abstract: The increased use of machine learning in real-world systems has led to a corresponding need for models to be understandable, either by being interpretable or explainable. We focus on interpretability in our work, noting that models are preferably small in size to be considered interpretable; e.g., a decision tree of depth 5 is easier to interpret than one of depth 50. Smaller models, however, also tend to have high bias, which suggests an inherent trade-off between interpretability and accuracy. Our work addresses this by proposing techniques to create compact models: small-sized models that minimize the difference in accuracy relative to their larger counterparts.

We propose two model-agnostic techniques to construct such models. Both operate on the assumption that focusing on specific instances during model training can increase model accuracy. The task of identifying such instances and their relative influence is formulated as that of learning a sampling distribution. The distribution is represented as an infinite Beta mixture model, and its parameters are learned using an optimization procedure that maximizes held-out accuracy. The approaches differ in how the optimization search space is represented. The first, referred to as the density-tree-based approach, uses information about the proximity of instances to class boundaries, derived from the input-space partitioning produced by decision trees. The second, referred to as the oracle-based approach, uses the uncertainty in the predictions of an oracle model. We present a rigorous empirical validation of the proposed techniques using multiple real-world datasets and interpretable models with various notions of model size.
We observe statistically significant relative improvements in the F1-score, occasionally greater than 100%, between a model of a given size trained in a standard manner and a compacted version of the same size created by our techniques. As a corollary, we challenge the conventional wisdom that train and test data need to be drawn from the same distribution for optimal learning, showing instead that this does not hold when model sizes are small.
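To make the core idea concrete, the following is a minimal sketch (not the speaker's implementation) of the oracle-based variant: a large "oracle" model scores the uncertainty of each training instance, a Beta distribution over that uncertainty defines sampling weights, and the Beta shape parameters are tuned to maximize held-out F1 of a small decision tree. Several simplifications are assumed here: a single Beta distribution stands in for the infinite Beta mixture, a random forest stands in for the oracle, and crude random search replaces the paper's optimization procedure.

```python
# Hypothetical sketch of the oracle-based compact-model idea (simplified).
import numpy as np
from scipy.stats import beta
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1500, n_features=20, n_informative=5,
                           random_state=0)
X_fit, X_val, y_fit, y_val = train_test_split(X, y, test_size=0.3,
                                              random_state=0)

# Oracle: a large, accurate model whose predictive uncertainty scores instances.
oracle = RandomForestClassifier(n_estimators=100, random_state=0)
oracle.fit(X_fit, y_fit)
uncertainty = 1.0 - oracle.predict_proba(X_fit).max(axis=1)  # in [0, 0.5]
# Rescale onto the open interval (0, 1), the support of the Beta density.
u = np.clip(uncertainty / 0.5, 1e-6, 1.0 - 1e-6)

def compact_tree(a, b, depth=3):
    """Depth-limited tree trained on instances resampled by Beta(a, b) weights."""
    w = beta.pdf(u, a, b) + 1e-12
    idx = rng.choice(len(X_fit), size=len(X_fit), p=w / w.sum())
    return DecisionTreeClassifier(max_depth=depth, random_state=0).fit(
        X_fit[idx], y_fit[idx])

# Baseline: a same-size tree trained in the standard manner on all data.
baseline = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_fit, y_fit)
best_f1 = f1_score(y_val, baseline.predict(X_val))
best_ab = None
for _ in range(50):  # crude random search over the Beta shape parameters
    a, b = rng.uniform(0.5, 5.0, size=2)
    f1 = f1_score(y_val, compact_tree(a, b).predict(X_val))
    if f1 > best_f1:
        best_f1, best_ab = f1, (a, b)
```

In the actual work the learned distribution is an infinite Beta mixture and the search is a proper optimization over its parameters; the sketch only shows why biasing the training sample toward particular instances can help a size-constrained model.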