Title: Towards exponentially cheap AI
Speaker: Aditya Desai (Doctoral Student @ Rice University)
Details: Tue, 13 Feb 2024, 11:00 AM @ SSB-233
Abstract: The recent advances in the capabilities of AI models have been extraordinary. However, training and deploying these models is prohibitively costly. The primary driver of these costs is the exponential growth in model sizes, which demands commensurate compute and memory resources. This high resource usage is the root of many problems: (1) only a few large corporations can afford to train these models, so the majority of the AI community cannot participate in or benefit from the growth opportunities they offer; (2) training them is financially draining and leaves an enormous carbon footprint; and (3) even once developed, these models cannot be deployed on the devices we interact with most, such as phones. How do we make AI more resource-efficient? Existing research on efficiency provides only constant-factor improvements. To combat the exponential demand for resources, we must therefore rethink the efficiency of AI more fundamentally. In this talk, I will discuss a new approach to making ML models efficient, drawing inspiration from probabilistic methods. Focusing on reducing a model's memory footprint, I will describe a model compression method that offers a better memory-quality tradeoff: it reduced the size of the Deep Learning Recommendation Model (DLRM) by four orders of magnitude, from 100GB to 10MB, without losing quality, while improving throughput by 3x and cutting training and inference costs by up to 25x.
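The abstract does not spell out the compression technique, but hashing-based parameter sharing is a common probabilistic approach to shrinking the embedding tables that dominate DLRM's memory. The sketch below is a minimal illustration of that general idea, not the speaker's method: the class name HashedEmbedding, the universal hash, and all sizes are assumptions chosen for the example.

```python
import numpy as np

class HashedEmbedding:
    """Illustrative sketch of hashing-based embedding compression.
    A huge conceptual table of shape (num_ids, dim) is backed by a
    small shared 1-D parameter array: each (id, coordinate) pair is
    hashed to an offset in that array, so memory is O(compressed_size)
    rather than O(num_ids * dim)."""

    def __init__(self, num_ids, dim, compressed_size, seed=0):
        rng = np.random.default_rng(seed)
        self.num_ids = num_ids
        self.dim = dim
        self.size = compressed_size
        # The only trainable storage: a small shared weight array.
        self.weights = rng.normal(0.0, 0.01, size=compressed_size)
        # Coefficients for a simple universal hash (illustrative choice).
        self.a = int(rng.integers(1, 2**31 - 1))
        self.b = int(rng.integers(0, 2**31 - 1))

    def lookup(self, ids):
        ids = np.asarray(ids, dtype=np.int64)[:, None]  # (batch, 1)
        coords = np.arange(self.dim, dtype=np.int64)    # (dim,)
        # Hash every (id, coordinate) pair into the shared array.
        flat = ids * self.dim + coords                  # (batch, dim)
        idx = (self.a * flat + self.b) % self.size
        return self.weights[idx]                        # (batch, dim)

# A table that would nominally hold 10M ids x 128 dims (~5 GB in fp32)
# is backed here by 1M shared floats (~8 MB in fp64).
emb = HashedEmbedding(num_ids=10_000_000, dim=128,
                      compressed_size=1_000_000)
vecs = emb.lookup([3, 1_234_567, 9_999_999])
print(vecs.shape)  # (3, 128)
```

Because collisions force distant rows to share parameters, quality degrades gracefully as the shared array shrinks; the memory-quality tradeoff the talk describes is about making that degradation as small as possible at extreme compression ratios.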