Error 403 when opening app pages

Incident Report for codefortynine

Postmortem

Starting at 15:21 UTC, we updated our app servers using a changed deployment script. Unfortunately, the new script used the wrong configuration, which pointed our apps to the wrong database tables. As a result, the apps could not authenticate our users and returned a "403 error" instead. Fortunately, the error message asked users to contact support@codefortynine.com, and a few customers notified us there starting at 15:44 UTC.

We identified the cause very quickly because the deployment script change had just been rolled out. We started deploying the correct configuration at 16:03 UTC. The rollout finished at 16:15 UTC, when the last servers still using the wrong configuration shut down. No data was lost; only workflow triggers and other webhooks (e.g. Slack notifications) may have been affected.

We're very sorry that this happened, especially since we hadn't had a critical outage in more than a year. This was the first time we were able to communicate an outage on our status page, which helped us keep our customers informed quickly.

To prevent a similar outage in the future, we'll be more careful when rolling out deployment changes. We also want to be alerted earlier so that we don't rely on customers to tell us about a critical outage. We'll achieve this by implementing live tests that continuously run our apps and report any errors to us. We're already alerted when the number of 403 errors (and other 4xx errors) is unusually high. However, it probably would have taken another hour before we realized that a critical outage was ongoing.

We're committed to making high-quality apps for the Atlassian Marketplace and will keep working to improve the reliability of our apps.

Posted Mar 18, 2020 - 09:27 UTC

Resolved

This incident has been resolved.

Posted Mar 17, 2020 - 16:31 UTC

Monitoring

A fix has been implemented and we are monitoring the results.

Posted Mar 17, 2020 - 16:17 UTC

Identified

The issue has been identified and a fix is being implemented.

Posted Mar 17, 2020 - 16:08 UTC

Investigating

We are currently investigating this issue.

Posted Mar 17, 2020 - 16:07 UTC

This incident affected: Deep Clone for Jira App, Google Calendar for Confluence, Merge Agent for Jira, Quick Filters for Jira Dashboards, Slack for Confluence, Snipe-IT for Jira, and Version Sync for Jira.