Elevated 502 and 525 error rate.
Incident Report for Xeno
Resolved
Hello, I’m Rémi, and I manage our product and tech team here at Xeno.

I’m here to explain what happened to our service today.

I know this has been a very frustrating and trying time for you as a Xeno customer, and for that I apologize.

We failed to provide you with the service you deserve. I wish I could tell you this outage was unpredictable, or that it was entirely an external party’s fault, but it wasn’t. Here’s what happened:

Three months ago, we released a feature allowing our users to generate SSL certificates for their Public Knowledgebase custom domains. Those certificates are issued through the awesome Let’s Encrypt API, and we had to change our own servers’ configuration so that our web app could validate those domains using an ACME challenge.
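
For readers unfamiliar with the mechanism: assuming the HTTP-01 challenge type (the variant that requires web-server configuration), Let’s Encrypt fetches http://<domain>/.well-known/acme-challenge/<token> and expects the response body to be the "key authorization", i.e. the token, a dot, and the base64url-encoded SHA-256 thumbprint of the ACME account key (RFC 8555, RFC 7638). The Python sketch below computes both the URL and the expected body. It is a generic illustration, not Xeno’s code; it assumes the account key is available as a JWK dictionary, and the key values, token and domain in the example are placeholders.

```python
import base64
import hashlib
import json

# JWK members that RFC 7638 includes in the thumbprint, per key type.
REQUIRED_MEMBERS = {
    "EC": ("crv", "kty", "x", "y"),
    "RSA": ("e", "kty", "n"),
}


def b64url(data: bytes) -> str:
    """Base64url-encode without '=' padding, as ACME requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    """RFC 7638 thumbprint: SHA-256 over the required members, keys sorted, no whitespace."""
    members = {name: jwk[name] for name in REQUIRED_MEMBERS[jwk["kty"]]}
    canonical = json.dumps(members, sort_keys=True, separators=(",", ":"))
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def key_authorization(token: str, account_jwk: dict) -> str:
    """Body the web server must return for the HTTP-01 challenge."""
    return f"{token}.{jwk_thumbprint(account_jwk)}"


def challenge_url(domain: str, token: str) -> str:
    """URL the ACME server fetches to validate `domain` (plain HTTP)."""
    return f"http://{domain}/.well-known/acme-challenge/{token}"


if __name__ == "__main__":
    # Placeholder key material and token, for illustration only.
    # Extra members such as "kid" are ignored by the thumbprint.
    jwk = {"kty": "EC", "crv": "P-256", "x": "placeholder-x", "y": "placeholder-y", "kid": "ignored"}
    token = "example-token"
    print(challenge_url("help.example.com", token))
    print(key_authorization(token, jwk))
```

If the web server does not answer that exact path with that exact body for the domain being validated, the challenge fails and no certificate is issued or renewed.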

- Today at 04:47pm CEST, we were alerted that both our load-balancers and Cloudflare (our DNS and CDN provider) were unable to communicate with our production servers (serving both the Xeno x Slack API and the Chatbox API).
- Around 04:50pm CEST, we discovered that our internal SSL certificates (used to secure communication between our load-balancers and our servers) had expired, resulting in a very high number of 525 errors.
These certificates are normally renewed automatically by our servers themselves, but since we released the feature mentioned above, they had been unable to renew because of newly added domain restrictions intended to improve custom-domain security.
- At 04:58pm CEST, we rolled back our NGINX configuration to allow our servers to renew their own certificates again. A few seconds later, our load-balancers could once again communicate with our app servers, restoring the Xeno service.
- Service was fully restored at about 04:59pm CEST and all systems are still stable as of now.
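
The failure mode in the timeline above can be reduced to a small model. The sketch below is purely illustrative: `serves_acme_challenge`, `renewal_can_succeed` and all hostnames are hypothetical, and Xeno’s actual NGINX rules are not part of this report. It assumes that the servers renew their own certificates through the same HTTP-01 challenge path, and that the new restriction only answers challenge requests for registered customer domains. Under those assumptions, the CA’s request for a server’s own hostname is never answered, so renewal fails silently until the certificate expires.

```python
from typing import AbstractSet

ACME_PREFIX = "/.well-known/acme-challenge/"


def serves_acme_challenge(host: str, path: str, allowed_hosts: AbstractSet[str]) -> bool:
    """Hypothetical domain restriction: only answer challenge paths for allowed hosts.

    `allowed_hosts` is assumed to contain lowercase hostnames.
    """
    if not path.startswith(ACME_PREFIX):
        return False
    return host.lower() in allowed_hosts


def renewal_can_succeed(hostname: str, allowed_hosts: AbstractSet[str]) -> bool:
    """A renewal passes HTTP-01 only if the CA's fetch for `hostname` is served."""
    return serves_acme_challenge(hostname, ACME_PREFIX + "some-token", allowed_hosts)


if __name__ == "__main__":
    customer_domains = {"help.customer-a.example", "docs.customer-b.example"}
    internal_host = "app-1.internal.example"  # hypothetical internal hostname

    # Restriction covering customer domains only: the server's own renewal is blocked.
    print(renewal_can_succeed(internal_host, customer_domains))
    # Also allowing the servers' own hostnames lets renewal pass again.
    print(renewal_can_succeed(internal_host, customer_domains | {internal_host}))
```

In this model, both features can coexist as long as the allowlist covers every hostname that needs a certificate, including the servers’ own.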

We’re rewriting and thoroughly testing our NGINX configuration so that both features work as expected.

I know that’s probably a lot of information, but our support team is standing by to help. Thanks again for being an awesome Xeno customer.

Rémi Delhaye
Chief Technology Officer, Xeno
Posted Oct 08, 2019 - 15:49 UTC
Monitoring
Our tech team has fixed the issue, and we're monitoring both the Chat & Widget API and the Xeno.app web application.
Posted Oct 08, 2019 - 15:03 UTC
Identified
We've identified the source of those errors and are working on a fix.
Posted Oct 08, 2019 - 15:02 UTC
Investigating
We are currently investigating this issue.
Posted Oct 08, 2019 - 15:01 UTC
This incident affected: Chat & Widget API, Web app (xeno.app), and Corporate website.