
Paper Lead: Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Event details
🎥 This event will be recorded.
Event description

NOTE: time change

In this talk, we dive into scaling Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and compute, using a technique called Infini-attention.

We'll explore how Infini-attention reworks the traditional attention mechanism by combining compressive memory, masked local attention, and long-term linear attention in a single Transformer block. This approach lets LLMs ranging from 1 billion to 8 billion parameters tackle long-context tasks, from long-context language modeling benchmarks to 1M-token passkey retrieval and 500K-token book summarization.
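For a concrete feel of the mechanism before the talk, here is a minimal single-head PyTorch sketch based on the paper's retrieval and update equations. The function name `infini_attention_segment`, the toy shapes, and the plain linear memory update (the paper also describes a delta-rule variant) are illustrative assumptions; a real implementation would be batched, multi-headed, and wrapped in learned Q/K/V projections.

```python
import torch
import torch.nn.functional as F

def infini_attention_segment(q, k, v, memory, z, beta):
    # Illustrative single-head sketch of one Infini-attention segment step.
    # q, k, v: (seg_len, d) projections for the current segment.
    # memory: (d, d) compressive memory carried over from earlier segments.
    # z: (d,) normalization term carried over from earlier segments.
    # beta: scalar gating parameter (learned in the real model).
    d = q.shape[-1]

    # 1) Masked (causal) local dot-product attention over the current segment.
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    causal = torch.triu(torch.ones_like(scores, dtype=torch.bool), diagonal=1)
    a_local = F.softmax(scores.masked_fill(causal, float("-inf")), dim=-1) @ v

    # 2) Long-term retrieval from the compressive memory via a linear-attention
    #    read with the ELU+1 feature map.
    sigma_q = F.elu(q) + 1.0
    a_mem = (sigma_q @ memory) / (sigma_q @ z).clamp(min=1e-6).unsqueeze(-1)

    # 3) Memory update: accumulate the current segment's key-value associations
    #    (simple linear update; the delta-rule variant is omitted here).
    sigma_k = F.elu(k) + 1.0
    memory = memory + sigma_k.transpose(-2, -1) @ v
    z = z + sigma_k.sum(dim=0)

    # 4) A learned gate mixes the long-term and local attention outputs.
    g = torch.sigmoid(beta)
    return g * a_mem + (1.0 - g) * a_local, memory, z

# Stream a long input segment by segment; the memory carries context across them.
seg_len, d = 16, 64
memory, z, beta = torch.zeros(d, d), torch.zeros(d), torch.tensor(0.0)
for _ in range(4):
    q, k, v = (torch.randn(seg_len, d) for _ in range(3))
    out, memory, z = infini_attention_segment(q, k, v, memory, z, beta)
```

The key point the sketch illustrates: the compressive memory has a fixed size regardless of how many segments are processed, which is what keeps the memory footprint bounded for arbitrarily long inputs.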

But it's not just about handling massive inputs. Infini-attention keeps the memory footprint bounded no matter how long the input grows, making it efficient and practical for real-world applications. Imagine the possibilities: fast streaming inference over book-length inputs, seamless integration with diverse tasks, and the ability to unlock new frontiers in natural language understanding.

Whether you're an AI researcher, developer, or simply an AI aficionado, join me as we explore the power of Infini-attention and how it paves the way for scalable, efficient, and boundary-breaking language models. Get ready to be inspired by the infinite potential of Transformer-based LLMs!

Paper: https://arxiv.org/abs/2404.07143