Paper Reading Session: Simple Testing Can Prevent Most Critical Failures

In this session, we will discuss the paper "Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-Intensive Systems".

Large, production-quality distributed systems still fail periodically, and sometimes catastrophically, with most or all users experiencing an outage or data loss. This paper presents the results of a comprehensive study of 198 randomly selected, user-reported failures that occurred in Cassandra, HBase, Hadoop Distributed File System (HDFS), Hadoop MapReduce, and Redis, with the goal of understanding how one or more faults eventually evolve into a user-visible failure and how simple testing can prevent the majority of them.

Presented by Sudhi Awasthi, an engineer at Bloomberg. Connect with her on LinkedIn: https://www.linkedin.com/in/sudhiawasthi/