US degraded performance

Incident Report for PostHog

Resolved

Lag has cleared and the system is fully functional again.

Sorry for any inconvenience caused by this incident.

Posted May 13, 2025 - 07:03 UTC

Monitoring

The cluster is now responsive and data ingestion has resumed. The app is also responding better.

We are still monitoring a couple of fixes we have pushed. We identified a query that was flooding the cluster and may have been the root cause of this incident.

Posted May 12, 2025 - 21:56 UTC

Update

We have recovered a large part of the cluster, but we are still working to restore it fully.

Performance may still be degraded. We believe some problematic queries may have been the root cause; we are still investigating.

Posted May 12, 2025 - 21:10 UTC

Update

We are working to bring the cluster back.

The app may be completely unresponsive, and lag is expected during this time. We'll provide an update as soon as possible.

Posted May 12, 2025 - 20:05 UTC

Investigating

We have detected a partial outage in our ClickHouse cluster. It is affecting application responsiveness and the performance of loading insights.

We are investigating the root cause.

Posted May 12, 2025 - 18:41 UTC

This incident affected: US Cloud 🇺🇸 (App, Event and Data Ingestion Lag).