Unexpected Outage Postmortem

At 3.55am AEST we started getting notifications of degraded performance across our network. Then, suddenly, everything stopped working: none of our servers were talking to one another.

As a startup, this is your worst nightmare: entire systems failing in the middle of the night without explanation. We are actually prepared for this scenario, with redundant failover systems in place.

But this was different: all our hardware was completely fine. For some reason, none of the servers were able to communicate with each other.

After some back and forth with the datacenter, we were notified that some upstream network hardware in the datacenter had failed, which completely killed the internal network. Our servers could accept information from the outside but could not transmit it internally (when you have 20+ servers, this is a fairly serious problem). We waited for them to replace the hardware and restore the network, which happened at 4.41am AEST.
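To illustrate the failure mode described above (hosts reachable from the outside but not from each other), here is a minimal sketch of an internal reachability check in Python. It is not the monitoring we actually run. The function names `can_reach` and `unreachable_peers` are made up for this example, and the peer list is assumed to be a list of internal host and port pairs that you supply.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable network, and timeouts.
        return False


def unreachable_peers(peers, timeout: float = 1.0):
    """Return the (host, port) pairs from peers that cannot be reached."""
    return [(host, port) for host, port in peers
            if not can_reach(host, port, timeout)]
```

Run from each server against the internal addresses of the others, a check like this would report every peer as unreachable during an internal network failure, even while the public-facing side still accepts connections.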

We're sincerely sorry for what happened this morning. Even one minute of downtime is unacceptable to us.

Backend UI Updates

We've added a few features to improve the backend UI experience, particularly when it comes to accessing or dealing with larger amounts of data.