In this segment, we walk through a high-level structure for implementing k-means clustering from scratch, focusing on clarity, modularity, and reasoning out loud as we code. The emphasis is not on memorizing code but on understanding the logic well enough to build it confidently.
- We start by defining an __init__ method to store core parameters like the number of clusters (k) and the maximum number of iterations.
- We initialize the centroids by randomly selecting k distinct data points; interviewers may also ask about smarter strategies such as k-means++, which spreads the initial centroids apart.
- In the assign_clusters step, we compute the distance (typically Euclidean) from each point to every centroid and assign each point to its nearest centroid.
- The update_centroids method recomputes each centroid as the mean of the points assigned to it; optionally, we can also track a loss such as the within-cluster sum of squared distances (inertia) to monitor clustering quality.
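- The fit method initializes the centroids once, then alternates between assignment and update until convergence (the centroids stop moving or the maximum number of iterations is reached), while predict assigns new data to the nearest learned centroid after training.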
- Most importantly, we aim for clean, modular, well-structured code, and we narrate our logic during the interview to demonstrate deep understanding, not just knowledge of syntax.
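The assignment and update steps can be read as alternately minimizing one objective, the within-cluster sum of squares, which is also a natural choice for the optional loss mentioned above. Writing x_i for the data points, c_i for their cluster labels, \mu_j for the centroids, and S_j for the set of points currently assigned to cluster j:

```latex
J(c, \mu) = \sum_{i=1}^{n} \lVert x_i - \mu_{c_i} \rVert^2

\text{assignment:}\quad c_i \leftarrow \arg\min_{j \in \{1,\dots,k\}} \lVert x_i - \mu_j \rVert^2

\text{update:}\quad \mu_j \leftarrow \frac{1}{|S_j|} \sum_{i \in S_j} x_i, \qquad S_j = \{\, i : c_i = j \,\}
```

Neither step can increase J, and there are only finitely many ways to partition n points, so the loop converges in a finite number of iterations, but only to a local minimum. This is why the choice of initialization (random versus k-means++) matters.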
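Putting the steps above together, here is a minimal from-scratch sketch in Python. It assumes points are equal-length sequences of floats, uses squared Euclidean distance, keeps the previous centroid if a cluster ends up empty, and stops once no centroid moves more than a tolerance. The parameter names (max_iters, tol, seed), the inertia helper, and the labels_ attribute are illustrative choices, not part of the original. It sticks to the standard library so it runs without NumPy; vectorizing the distance computation with NumPy is a natural follow-up.

```python
import math
import random
from typing import List, Optional, Sequence

Point = Sequence[float]


class KMeans:
    """Minimal k-means (Lloyd's algorithm) with squared Euclidean distance."""

    def __init__(self, k: int, max_iters: int = 100, tol: float = 1e-6,
                 seed: Optional[int] = None):
        # Core parameters: number of clusters, iteration cap, convergence tolerance.
        self.k = k
        self.max_iters = max_iters
        self.tol = tol
        self.rng = random.Random(seed)  # seeded RNG for reproducible initialization
        self.centroids: List[List[float]] = []
        self.labels_: List[int] = []

    @staticmethod
    def _sq_dist(a: Point, b: Point) -> float:
        # Squared Euclidean distance; same nearest centroid as Euclidean, no sqrt needed.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def initialize_centroids(self, X: Sequence[Point]) -> None:
        # Random initialization: k distinct data points become the starting centroids.
        self.centroids = [list(p) for p in self.rng.sample(list(X), self.k)]

    def assign_clusters(self, X: Sequence[Point]) -> List[int]:
        # Label each point with the index of its nearest centroid.
        return [min(range(self.k), key=lambda j: self._sq_dist(p, self.centroids[j]))
                for p in X]

    def update_centroids(self, X: Sequence[Point], labels: List[int]) -> List[List[float]]:
        # New centroid = mean of the points assigned to that cluster.
        dim = len(X[0])
        sums = [[0.0] * dim for _ in range(self.k)]
        counts = [0] * self.k
        for p, c in zip(X, labels):
            counts[c] += 1
            for d in range(dim):
                sums[c][d] += p[d]
        new_centroids = []
        for j in range(self.k):
            if counts[j] == 0:
                # Empty cluster: keep the old centroid (one common choice among several).
                new_centroids.append(list(self.centroids[j]))
            else:
                new_centroids.append([s / counts[j] for s in sums[j]])
        return new_centroids

    def inertia(self, X: Sequence[Point], labels: List[int]) -> float:
        # Optional loss: within-cluster sum of squared distances.
        return sum(self._sq_dist(p, self.centroids[c]) for p, c in zip(X, labels))

    def fit(self, X: Sequence[Point]) -> "KMeans":
        self.initialize_centroids(X)  # done once, outside the loop
        for _ in range(self.max_iters):
            labels = self.assign_clusters(X)
            new_centroids = self.update_centroids(X, labels)
            # Converged when no centroid moves farther than tol.
            shift = max(math.sqrt(self._sq_dist(old, new))
                        for old, new in zip(self.centroids, new_centroids))
            self.centroids = new_centroids
            if shift <= self.tol:
                break
        self.labels_ = self.assign_clusters(X)
        return self

    def predict(self, X: Sequence[Point]) -> List[int]:
        # New data: nearest learned centroid, with no further centroid updates.
        return self.assign_clusters(X)


if __name__ == "__main__":
    X = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
    model = KMeans(k=2, seed=0).fit(X)
    print(model.labels_, model.inertia(X, model.labels_))
    print(model.predict([(0.0, 0.0), (10.0, 10.0)]))
```

The label values themselves (0 or 1) depend on which points the random initialization picks, so only the grouping is meaningful. Because Lloyd's iterations can settle in a local minimum, a common practice is to run fit with several seeds and keep the model with the lowest inertia.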
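For the k-means++ follow-up question, here is a minimal sketch of the standard seeding rule: choose the first centroid uniformly at random, then choose each subsequent centroid with probability proportional to D(x)^2, the squared distance from a point to its nearest already-chosen centroid. The function name kmeans_plus_plus_init and its signature are hypothetical; its output could stand in for the uniform random sampling in an initialization step.

```python
import random
from typing import List, Optional, Sequence


def kmeans_plus_plus_init(X: Sequence[Sequence[float]], k: int,
                          seed: Optional[int] = None) -> List[List[float]]:
    """k-means++ seeding: far-away points are more likely to become centroids."""
    rng = random.Random(seed)
    points = [list(p) for p in X]
    # First centroid: a uniformly random data point.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # D(x)^2 for every point: squared distance to its nearest chosen centroid.
        d2 = [min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
              for p in points]
        if sum(d2) == 0:
            # Every point coincides with a chosen centroid; fall back to uniform choice.
            centroids.append(rng.choice(points))
            continue
        # Sample the next centroid with probability proportional to D(x)^2.
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return [list(c) for c in centroids]


if __name__ == "__main__":
    X = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans_plus_plus_init(X, 2, seed=1))
```

Because an already-chosen point has D(x)^2 = 0, it cannot be picked again (unless all remaining weight is zero), and points in a distant, still-uncovered group carry most of the probability mass.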
If you want to learn even more from Yayun: