How to best handle production errors?

Software Engineer at Startup, 2 months ago

I have a startup that I'm working on by myself and recently had what I would classify as a "SEV 1 Production Incident" from a tiny feature update that greets the user when they log in.

In the mobile app, when a user logged in the screen would instantly go grey and no further action could be performed, even if the user exited and reopened the app.

The cause was an exception thrown by a string function call.

Claude 3.7 wrote that quick greeting function and I didn't pick up on any issues, because in other languages (like JavaScript) that specific string function (substring) doesn't throw an exception.

But in Dart (the app is made in Flutter) it raises an out of bounds error.
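
To make the difference concrete, here is a minimal TypeScript sketch. The first line shows the real JavaScript behaviour (out-of-range indices are clamped). `strictSubstring` is a hypothetical helper, not anything from my app, that imitates the stricter range check Dart applies. The actual greeting function isn't reproduced here, so treat the inputs as illustrative only.

```typescript
// JavaScript/TypeScript: String.prototype.substring clamps out-of-range
// indices to the string length instead of throwing.
const lenient: string = "5".substring(0, 2); // "5"

// Hypothetical helper (assumption, for illustration): imitates a strict,
// Dart-like range check by throwing a RangeError for out-of-range indices.
function strictSubstring(s: string, start: number, end: number = s.length): string {
  if (start < 0 || start > s.length || end < start || end > s.length) {
    throw new RangeError(
      `Invalid range ${start}..${end} for a string of length ${s.length}`
    );
  }
  return s.substring(start, end);
}
```

With the strict version, `strictSubstring("15", 0, 2)` is fine but `strictSubstring("5", 0, 2)` throws, which is the kind of input a single-digit day produces.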

How I handled it
- Deployed a hotfix via Shorebird patching (code push for Flutter)
- Sent an email to all affected users about the fix (Shorebird requires users to exit and reopen the app to apply it)
- Updated the welcome email and website to alert people of the issue, and made a Threads post
- Submitted an immediate App Store update with the true fix

My takeaways
- Compartmentalise UI components so one breaking won't break the entire app
- Add Sentry to log errors so I get emails when this happens (current error logging just gets sent to the server, with no email or detailed info)
- Add a push notification service so users can be informed of any fixes/issues straight away
- E2E tests might be a good idea
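
The first two takeaways can be sketched together. This is only an illustration in TypeScript, not Flutter code: `renderOrFallback` and the `report` callback are hypothetical names, and `report` stands in for whatever error reporter gets wired up (Sentry or otherwise).

```typescript
// Hypothetical guard (assumption, for illustration): run a non-critical piece
// of UI logic and fall back to a safe default if it throws, handing the error
// to a reporter instead of letting it take down the whole screen.
function renderOrFallback<T>(
  build: () => T,
  fallback: T,
  report: (error: unknown) => void
): T {
  try {
    return build();
  } catch (error) {
    report(error);
    return fallback;
  }
}

// Usage: a greeting that fails is replaced by a plain default.
const reportedErrors: unknown[] = [];
const greetingText: string = renderOrFallback<string>(
  () => {
    throw new RangeError("simulated greeting failure");
  },
  "Welcome back!",
  (error) => {
    reportedErrors.push(error);
  }
);
```

The idea is that a cosmetic feature like a greeting degrades to a default string and still produces an error report, rather than blocking login.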

Luckily my app was only recently released so not many users were affected and the true fix got approved quickly.

But, what are some ways that you have addressed high severity prod bugs and could I have done anything differently?

Thanks!

Discussion

(5 comments)
  • 1
    Thoughtful Tarodactyl
    Taro Community
    2 months ago

    Might be worth looking into https://www.coderabbit.ai/ as an extra check when deploying code

  • 1
    Employee @ Robinhood
    2 months ago

    Did you manually test the change before merging & deploying the code?

    • 0
      Software Engineer [OP]
      Startup
      2 months ago

      Unfortunately it was only triggered when the day of the month was between 1 and 9!
      So even though I tested it, I didn't consider how it would handle that case.
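
A date-dependent case like this is hard to hit by hand, but a small test that sweeps every day of a month covers it. The sketch below is in TypeScript and uses a hypothetical `dayLabel` helper as a stand-in, since the original greeting function isn't shown in this thread.

```typescript
// Hypothetical stand-in for the date formatting inside the greeting
// (assumption: the real function is not shown in the thread).
function dayLabel(date: Date): string {
  const day = date.getDate();
  // Pad single-digit days so the label is always two characters wide.
  return day < 10 ? "0" + day : String(day);
}

// Build the label for every day of the given month, so days 1-9 are
// exercised no matter what today's date happens to be.
function labelsForMonth(year: number, monthIndex: number): string[] {
  // Day 0 of the next month is the last day of this month.
  const daysInMonth = new Date(year, monthIndex + 1, 0).getDate();
  const labels: string[] = [];
  for (let day = 1; day <= daysInMonth; day++) {
    labels.push(dayLabel(new Date(year, monthIndex, day)));
  }
  return labels;
}
```

Running the same sweep against the real function would have surfaced the failure on the single-digit days regardless of when the manual test happened.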

  • 0
    Helpful Tarodactyl
    Taro Community
    2 months ago

    The suggestions other Taro members have provided are all amazing!

    Would like to second Jonathan's point about testing the code before merging or deploying to staging/prod.

    I like how you spent time reflecting on the incident and thinking of ways to improve. Maybe you could write a document similar to a postmortem following up on the error?

    • 0
      Software Engineer [OP]
      Startup
      2 months ago

      I did test it, but as it was to do with a certain date, the exception was only triggered from the 1st to the 9th of any given month!

      It was an incredibly easy bug to fix: I just wasn't handling the substring correctly. Dart throws an exception when substring is given out-of-range indices, whereas the most common implementation of substring in JS doesn't.

      Because it was a tiny cosmetic feature I just let AI write the 10-line function and chucked it into prod when it looked like it was working haha.

      It has definitely been a learning experience and I had a full postmortem meeting (with myself).