What are the best methods to debug and fix a bug in production?

Software Engineer at Government
15 days ago

Production bugs are tremendously tricky to deal with, so what are your best strategies for tracking down and fixing them?

Assume the bug only occurs in the highest-level prod environment.

Discussion

(5 comments)
  • 3
    Tech Lead/Manager at Meta, Pinterest, Kosei
    14 days ago

    How long does it take to deploy code, and what's the cost of the bug in production?

    If the deployment speed is very fast and the bug is not very severe, one option is to just make (educated) guesses with blind fixes, e.g. guard different method calls or add checks.
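
    As a rough illustration of what such a blind fix might look like, here is a minimal TypeScript sketch. The names `Profile`, `greeting`, and `guarded` are hypothetical and not taken from any real codebase; the idea is only to show an added null check and a guarded call that falls back instead of crashing.

    ```typescript
    // Hypothetical shape of data that might arrive incomplete in production.
    interface Profile {
      displayName?: string | null;
    }

    // Added check: tolerate a missing profile or name instead of throwing.
    function greeting(profile: Profile | null | undefined): string {
      const name = profile?.displayName?.trim();
      return name ? `Hello, ${name}` : "Hello";
    }

    // Guard a suspect call: on failure, note which guard fired and fall back.
    function guarded<T>(
      label: string,
      fn: () => T,
      fallback: T,
      onError: (label: string, err: unknown) => void = () => {},
    ): T {
      try {
        return fn();
      } catch (err) {
        onError(label, err);
        return fallback;
      }
    }

    // Usage: a parse that may fail on bad production data.
    const fired: string[] = [];
    const config = guarded<unknown>(
      "parse-config",
      () => JSON.parse("{not valid json"),
      null,
      (label) => fired.push(label),
    );
    ```

    Recording which guard fired matters: a blind fix that silently swallows the error can hide the bug without telling you whether the guess was right.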

    The more methodical approach is to add logging and instrumentation so you can see where the bug is coming from. Use something like PostHog or a log collection tool to figure out what the errors are.
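
    One possible form of that instrumentation is a bounded trail of recent events ("breadcrumbs") that gets attached to whatever error is eventually reported. The sketch below is an assumption-laden example in plain TypeScript: `Breadcrumb` and `BreadcrumbTrail` are hypothetical names, and the step that ships the trail to PostHog or a log collector is deliberately left out.

    ```typescript
    // One recorded event; the field names are illustrative assumptions.
    interface Breadcrumb {
      step: string;
      ts: number;
      data: Record<string, unknown>;
    }

    // Keeps only the most recent `capacity` events so memory stays bounded.
    class BreadcrumbTrail {
      private events: Breadcrumb[] = [];
      private capacity: number;

      constructor(capacity: number = 50) {
        this.capacity = capacity;
      }

      record(step: string, data: Record<string, unknown> = {}): void {
        this.events.push({ step, ts: Date.now(), data });
        if (this.events.length > this.capacity) {
          this.events.shift(); // drop the oldest event
        }
      }

      // Copy of the trail, oldest first, ready to attach to an error report.
      snapshot(): Breadcrumb[] {
        return [...this.events];
      }
    }

    // Usage: with capacity 3, only the last three steps survive.
    const trail = new BreadcrumbTrail(3);
    trail.record("app_start");
    trail.record("login_tapped", { provider: "google" });
    trail.record("token_requested");
    trail.record("token_failed", { code: "network" });
    const recentSteps = trail.snapshot().map((e) => e.step);
    // recentSteps: ["login_tapped", "token_requested", "token_failed"]
    ```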

    See this case study from Meta: https://www.jointaro.com/lesson/xAVw3j6fAB1GR9LUnq8n/meta-case-study-debugging-a-massive-production-issue/

  • 2
    Tech Lead @ Robinhood, Meta, Course Hero
    14 days ago

    Here's my overall process to fix any bug:

    1. Understand the end-to-end flow (break it down into steps)
    2. Go through each step and see if it works/breaks
    3. Once you find the breaking step, analyze the code and find the fix

    If you can't find the exact breaking step, you probably need to decompose the flow further into sub-steps.
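
    The three steps above can be sketched as a small harness that runs each step of a flow in order and names the first one that throws or fails its check. This is only an illustration: `Step`, `findBreakingStep`, `Session`, and the `loginFlow` steps are all hypothetical, and a real flow would have real work in each `run`.

    ```typescript
    // A step transforms the state and can verify its own result.
    interface Step<T> {
      name: string;
      run: (input: T) => T;
      check: (output: T) => boolean;
    }

    // Returns the name of the first step that throws or fails its check,
    // or null if every step passes.
    function findBreakingStep<T>(steps: Step<T>[], input: T): string | null {
      let value = input;
      for (const step of steps) {
        try {
          value = step.run(value);
        } catch {
          return step.name;
        }
        if (!step.check(value)) {
          return step.name;
        }
      }
      return null;
    }

    // Hypothetical end-to-end flow broken down into steps.
    interface Session {
      token: string;
      userId: number | null;
      screen: string;
    }

    const loginFlow: Step<Session>[] = [
      { name: "read token", run: (s) => ({ ...s, token: "abc" }), check: (s) => s.token.length > 0 },
      // Simulated bug: the user lookup comes back empty.
      { name: "load user", run: (s) => ({ ...s, userId: null }), check: (s) => s.userId !== null },
      { name: "render home", run: (s) => ({ ...s, screen: "home" }), check: (s) => s.screen === "home" },
    ];

    const broken = findBreakingStep(loginFlow, { token: "", userId: null, screen: "splash" });
    // broken: "load user"
    ```

    If the harness points at a step that is still too coarse to reason about, splitting that step into several smaller `Step` entries is the code-level version of decomposing into sub-steps.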

    In other words, follow the advice from here: [Masterclass] How To Become A Debugging Master And Fix Issues Faster

    The very tricky part is when you have limited observability, so it's hard to figure out if a particular step is breaking or even what the exact steps are. In that case, I recommend sharing more exact context when you ask for support in Taro, and we can creatively jam on some ideas 😊

    • 3
      Friendly Tarodactyl
      Taro Community
      14 days ago

      I can vouch for the limited observability part. Mobile app problems are sometimes much harder to fix than backend ones. We can't SSH into a user's phone, whereas we can easily get root access to a backend server. A mobile app also runs against different libraries on different phones, while we have 100% control over the backend.

    • 1
      Tech Lead @ Robinhood, Meta, Course Hero
      14 days ago

      Mobile issues can indeed be very gnarly, especially on Android. If you have a more global app (like Instagram where I spent the biggest chunk of my career), there are going to be a lot of users on janky old phones that are on an ancient version of the Android OS and have a weird screen size. Pain.

    • 0
      Software Engineer [OP]
      Government
      13 days ago

      This particular issue is:

      • Intermittent
      • On Mobile
      • Only testable when built to the phone (no F5 debugging)
      • Testing via the Google Play Store vs. building directly to the phone gives different failure rates

      So a very tricky issue.

      I might have fixed it now, but along the way I ended up implementing:

      • A logging service that sends errors and stack traces to a local API and displays toasts on the device
      • Code push to more quickly deploy the app to the Play Store
      • PostHog
      • Firebase Crashlytics (this was a game changer, as the error was occurring in an abstraction layer inside a Firebase Auth package where it couldn't easily be observed)
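
      For readers who want a picture of the first item, here is a minimal TypeScript sketch of such a logging service. It is not the implementation described above, only a guess at its general shape: `ErrorReport`, `toReport`, and `makeReporter` are hypothetical names, and `send` and `showToast` are injected placeholders standing in for the HTTP call to the local API and for the platform's toast function.

      ```typescript
      // Payload sent to the log endpoint; the field names are assumptions.
      interface ErrorReport {
        message: string;
        stack: string;
        context: string;
        ts: number;
      }

      // Normalize anything that was thrown into a report with a stack if available.
      function toReport(err: unknown, context: string): ErrorReport {
        if (err instanceof Error) {
          return { message: err.message, stack: err.stack ?? "", context, ts: Date.now() };
        }
        return { message: String(err), stack: "", context, ts: Date.now() };
      }

      // `send` and `showToast` are injected so the sketch runs without a device or network.
      function makeReporter(
        send: (report: ErrorReport) => void,
        showToast: (text: string) => void,
      ) {
        return function report(err: unknown, context: string): ErrorReport {
          const r = toReport(err, context);
          showToast(`${context}: ${r.message}`);
          try {
            send(r);
          } catch {
            // Reporting must never crash the app it is observing.
          }
          return r;
        };
      }

      // Usage with in-memory stand-ins for the API and the toast.
      const sent: ErrorReport[] = [];
      const toasts: string[] = [];
      const report = makeReporter((r) => sent.push(r), (t) => toasts.push(t));

      try {
        throw new Error("auth token missing");
      } catch (err) {
        report(err, "login");
      }
      // toasts: ["login: auth token missing"]
      ```

      Wrapping the suspect call sites with `report` makes an intermittent failure visible on the device at the moment it happens, which is useful when the bug cannot be reproduced under a debugger.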