Yayun Jin, Ph.D.
ML Engineer at Reddit | Ex-Microsoft & Workday | Mentoring 200+ Engineers into ML Roles

ML Algorithm Example: K-means

In this segment, we walk through a high-level structure for implementing k-means clustering from scratch, focusing on clarity, modularity, and reasoning aloud as we code. The emphasis is not on memorizing code but on understanding the logic well enough to build it confidently.

  • We start by defining an __init__ method to store core parameters like the number of clusters (k) and max iterations.
  • We initialize centroids by randomly selecting k data points; interviewers may also ask about smarter strategies like k-means++.
  • In the assign_clusters step, we calculate distances from each point to every centroid and assign points to the nearest one.
  • The update_centroids method recomputes each centroid as the mean of its assigned points; optionally, we may include a loss function, such as the sum of squared distances from each point to its assigned centroid, to track clustering quality.
  • The fit method initializes the centroids once, then alternates the assignment and update steps until convergence or until the maximum number of iterations is reached, while predict assigns new data to the nearest learned centroid after training.
  • Most importantly, we aim for clean, modular, and well-structured code, and narrate our logic during the interview to demonstrate deep understanding, not just syntax.
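
The steps above can be sketched as a small class. This is a minimal illustration in pure Python, not the code from the segment itself. Several details are assumptions made for the example: the tol convergence threshold, the seed argument, the _init_centroids helper name, the policy of keeping the old centroid when a cluster ends up empty, and the sample points. In an interview, the distance computation would more commonly be vectorized with NumPy; the standard library is used here only to keep the sketch self-contained.

```python
import math
import random


class KMeans:
    """Minimal k-means sketch; each point is a sequence of floats."""

    def __init__(self, k=3, max_iters=100, tol=1e-6, seed=None):
        self.k = k                      # number of clusters
        self.max_iters = max_iters      # upper bound on assign/update rounds
        self.tol = tol                  # assumed threshold on centroid movement
        self.rng = random.Random(seed)  # seeded so runs are reproducible
        self.centroids = []

    def _init_centroids(self, X):
        # Pick k data points at random (without replacement) as starting centroids.
        return [list(p) for p in self.rng.sample(list(X), self.k)]

    def assign_clusters(self, X):
        # For each point, return the index of the nearest centroid (Euclidean distance).
        return [
            min(range(len(self.centroids)),
                key=lambda j: math.dist(p, self.centroids[j]))
            for p in X
        ]

    def update_centroids(self, X, labels):
        new_centroids = []
        for j in range(self.k):
            members = [p for p, label in zip(X, labels) if label == j]
            if members:
                # Mean of the assigned points, one coordinate at a time.
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                # Assumed policy: an empty cluster keeps its previous centroid.
                new_centroids.append(self.centroids[j])
        return new_centroids

    def loss(self, X, labels):
        # Sum of squared distances from each point to its assigned centroid.
        return sum(math.dist(p, self.centroids[label]) ** 2
                   for p, label in zip(X, labels))

    def fit(self, X):
        self.centroids = self._init_centroids(X)  # initialization happens once
        for _ in range(self.max_iters):
            labels = self.assign_clusters(X)
            new_centroids = self.update_centroids(X, labels)
            # Largest distance any centroid moved in this round.
            shift = max(math.dist(a, b)
                        for a, b in zip(self.centroids, new_centroids))
            self.centroids = new_centroids
            if shift <= self.tol:  # converged: centroids stopped moving
                break
        return self

    def predict(self, X):
        # After training, new points go to the nearest learned centroid.
        return self.assign_clusters(X)


# Usage on two well-separated groups of hypothetical 2-D points.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
model = KMeans(k=2, seed=0).fit(points)
labels = model.predict(points)
```

Keeping each step in its own method mirrors the structure described above and makes it straightforward to swap in a different initialization, such as k-means++, by changing only _init_centroids.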
