Paper Reading: Self-supervised Vision Transformers (Foundational Model)

Session led by Mandar Deshpande

DINO is a self-supervised learning approach for computer vision that trains vision transformers without any labeled data. DINO v2 improves on its predecessor in both performance and scalability, and reports state-of-the-art results on various benchmarks. Notably, it performs strongly on downstream tasks such as image classification and object detection.

We will walk through:

  • Basics of computer vision
  • Self-supervised learning
  • ViT (vision transformers)
  • DINO
  • DINO v2 model

Paper link - https://arxiv.org/abs/2304.07193
