Title: Deep Networks Learn Features for Classification from Local Label Imbalances
Speaker: Prithaj Banerjee (IITM)
Details: Thu, 30 May 2024, 10:30 AM @ SSB-233
Abstract: Deep neural networks outperform kernel machines on several datasets because of 'feature learning'. A natural notion of a feature (for classification tasks) is a discontinuity of the true label function (the Bayes classifier) or of the model function. The main feature-learning hypothesis we make can be stated as: "label function discontinuities attract model function discontinuities during training". To test this hypothesis, we perform experiments on classification data where the true label function is given by an oblique decision tree, a setup that allows easy enumeration of label function discontinuities. We then design a novel deep architecture called the Deep Linearly Gated Network (DLGN). The discontinuities of a DLGN in the input space can be easily enumerated, unlike those of a ReLU network. In this setup, we provide supporting evidence that model function discontinuities move towards label function discontinuities over the course of training. We note that feature learning is essential for good performance in this setup: standard ReLU nets and DLGNs outperform static kernel methods (such as kernel SVMs) and tree methods. We then demonstrate the greater mechanistic interpretability of the DLGN by extracting the parameters of a decision tree from its parameters. We also show that the DLGN is competitive with ReLU networks and with tree-learning algorithms on several real-world tabular datasets.
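Since the abstract does not spell out the DLGN's internals, below is a minimal NumPy sketch of the kind of gated architecture it describes, assuming one common construction: a gating path that is a deep *linear* network whose (soft) sign pattern multiplicatively gates a separate value path. All names here (`dlgn_forward`, `gating_Ws`, `value_Ws`, `beta`) and the exact wiring are illustrative assumptions, not the speaker's implementation.

```python
import numpy as np

def dlgn_forward(x, gating_Ws, value_Ws, w_out, beta=4.0):
    """Sketch of a DLGN-style forward pass (illustrative, not the speaker's code).

    The gating path is purely linear, so every gating preactivation is a
    linear function of the original input x. With hard gates (g > 0), each
    gate flips exactly on a hyperplane in input space, so the model's
    discontinuities form a finite, enumerable set of hyperplanes; in a ReLU
    net, by contrast, the corresponding boundaries of deeper units are
    piecewise linear and hard to enumerate.
    """
    g = x.copy()  # gating path (no nonlinearity between layers)
    v = x.copy()  # value path (linear layers, modulated by the gates)
    for Wg, Wv in zip(gating_Ws, value_Ws):
        g = Wg @ g                               # still linear in x, by induction
        gate = 1.0 / (1.0 + np.exp(-beta * g))   # soft gate; hard gate would be (g > 0)
        v = gate * (Wv @ v)                      # gate the value features elementwise
    return w_out @ v                             # scalar score / logit

# Tiny usage example with random weights (input dimension d, width h).
rng = np.random.default_rng(0)
d, h = 5, 8
gating_Ws = [rng.standard_normal((h, d))] + [rng.standard_normal((h, h)) for _ in range(2)]
value_Ws = [rng.standard_normal((h, d))] + [rng.standard_normal((h, h)) for _ in range(2)]
w_out = rng.standard_normal(h)
print(dlgn_forward(rng.standard_normal(d), gating_Ws, value_Ws, w_out))
```

Under this assumed construction, the gate at unit i of layer k switches exactly where the i-th coordinate of (Wg_k ... Wg_1) x changes sign, so testing the abstract's hypothesis amounts to tracking how these gating hyperplanes move towards the oblique tree's split hyperplanes during training.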