Resolved -
The lag has cleared and the system is fully functional again.
We apologize for any inconvenience caused by this incident.
May 13, 07:03 UTC
Monitoring -
The cluster is responsive again and data ingestion has resumed. The app is responding better now.
We are still monitoring a couple of fixes we have pushed. We identified a query that was flooding the cluster and may have been the root cause.
May 12, 21:56 UTC
Update -
We have recovered a good part of the cluster, but we are still working to bring it back completely.
Performance may still be degraded. We think some problematic queries may have been the root cause, and we are still investigating.
May 12, 21:10 UTC
Update -
We are working to bring the cluster back.
The app may be completely unresponsive, and lag is expected during this time. We will provide an update as soon as possible.
May 12, 20:05 UTC
Investigating -
We have detected a partial outage in our ClickHouse cluster. It is affecting application response times and performance when loading insights.
We are investigating the root cause.
May 12, 18:41 UTC