“Pooling CPU Memory for LLM Inference” was published by researchers at UC Berkeley.

Abstract: “The rapid growth of LLMs has ...