All Systems Operational
PostHog.com: Operational (100.0% uptime over the past 90 days)

US Cloud 🇺🇸: Operational (99.96% uptime over the past 90 days)
  App: Operational (99.92% uptime)
  Event and Data Ingestion: Operational (100.0% uptime)
  Feature Flags and Experiments: Operational (99.97% uptime)

EU Cloud 🇪🇺: Operational (99.98% uptime over the past 90 days)
  App: Operational (99.95% uptime)
  Event and Data Ingestion: Operational (100.0% uptime)
  Feature Flags and Experiments: Operational (99.99% uptime)

Support APIs: Operational (100.0% uptime over the past 90 days)
  Update Service: Operational (100.0% uptime)
  License Server: Operational (100.0% uptime)

AWS US 🇺🇸: Operational
  AWS ec2-us-east-1: Operational
  AWS elb-us-east-1: Operational
  AWS rds-us-east-1: Operational
  AWS elasticache-us-east-1: Operational
  AWS kafka-us-east-1: Operational

AWS EU 🇪🇺: Operational
  AWS elb-eu-central-1: Operational
  AWS elasticache-eu-central-1: Operational
  AWS rds-eu-central-1: Operational
  AWS ec2-eu-central-1: Operational
  AWS kafka-eu-central-1: Operational
Response time metrics (live charts; values load dynamically):
  US Ingestion End to End Time
  US Decide Endpoint Response Time
  US App Response Time
  US Event/Data Ingestion Response Time
  EU Ingestion End to End Time
  EU App Response Time
  EU Decide Endpoint Response Time
  EU Event/Data Ingestion Endpoint Response Time
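The decide endpoint charted above is the call PostHog clients make to evaluate feature flags, so its response time is a reasonable proxy for feature-flag availability. The short Python sketch below is an unofficial way to time that call from your own network; the host, path, and payload shape are assumptions based on publicly documented PostHog client behaviour, and the API key is a hypothetical placeholder.

    # Minimal sketch: client-side timing of the PostHog /decide call,
    # roughly what the "Decide Endpoint Response Time" chart measures.
    # Host, path, and payload shape are assumptions; replace the key with your own.
    import time
    import requests

    HOST = "https://us.i.posthog.com"      # assumed US Cloud host; swap for the EU host if needed
    API_KEY = "phc_YOUR_PROJECT_API_KEY"   # hypothetical placeholder

    start = time.monotonic()
    resp = requests.post(
        f"{HOST}/decide/",
        json={"api_key": API_KEY, "distinct_id": "status-check"},
        timeout=10,
    )
    elapsed_ms = (time.monotonic() - start) * 1000
    print(f"HTTP {resp.status_code} in {elapsed_ms:.0f} ms")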
Past Incidents
Jul 27, 2024

No incidents reported today.

Jul 26, 2024

No incidents reported.

Jul 25, 2024
Resolved - This incident has been resolved.
Jul 25, 11:40 UTC
Update - We've spotted that something has gone wrong. We're currently investigating the issue, and will provide an update soon.
Jul 25, 10:35 UTC
Investigating - We have detected a connectivity issue between our ClickHouse cluster and ZooKeeper.

We are working to restore normal operation. In the meantime, you may see some delays in event ingestion or problems with queries.

Jul 25, 10:30 UTC
Resolved - Everything is back to normal operation.
Jul 25, 09:50 UTC
Monitoring - We identified a slow query that was slowing down our primary app database. We are monitoring the situation while we work out the root cause.
Jul 25, 08:49 UTC
Investigating - We're experiencing issues with our EU region that are causing the app to be unreachable and data processing to be delayed. We are investigating the cause.
Jul 25, 08:36 UTC
Jul 24, 2024

No incidents reported.

Jul 23, 2024
Resolved - Data is ingesting normally.
Jul 23, 20:57 UTC
Identified - Our data processing infrastructure is running behind, which is causing inaccuracies in the reporting tools. No data has been lost and the system should be caught up shortly.
Jul 23, 20:11 UTC
Resolved - We are fully caught up on all replicas and all events are accounted for. We've also identified the root cause of the per-partition lag and have mitigations in place to prevent this issue from recurring. We apologize for any inconvenience this may have caused.
Jul 23, 01:15 UTC
Update - We've recovered our backlog on all but one instance, which is recovering more slowly than anticipated. All events that have come in since Saturday are up to date, but a small percentage of events from Thursday and Friday of last week are still in flight. As long as events continue to be processed on this node at the current rate, we will be fully up to date by the end of the day.
Jul 22, 17:33 UTC
Update - We will be 100% caught up on all events by the end of day today. We'll send out another status update as soon as that backfill is complete.
Jul 21, 14:33 UTC
Monitoring - We've identified the root cause of the issue and have mitigated it. We have also kicked off a backfill that will run over the weekend. We are shooting to have all events back in order and up to date by Monday morning. Expect updates over the weekend on the progress of the backfill of missing events. Thank you all for your patience and we hope you enjoy the rest of your Friday and the weekend!
Jul 19, 22:53 UTC
Update - We are continuing to investigate and are close to understanding the reason behind the event ingestion problem. It seems the root cause is not in the Kafka table engine but in our write path to the distributed tables.

Event ingestion has resumed, but it is running slowly to avoid losing those events, so there will be ingestion lag for some hours. We are working on another patch to fix the lag.

After that is solved, we'll start the event backfill for the missing dates.

Jul 19, 11:36 UTC
Update - We are investigating an issue with our Kafka table engines and have deliberately induced lag in our pipeline. All events are safe and will show up once the investigation is complete, but for the moment event processing will fall behind and you will notice the last few hours missing from your reporting.
Jul 18, 16:51 UTC
Update - We have started event recovery.

Data may be missing since 2024-07-17 at 21:00 UTC. The missing events will eventually be available for querying.

We are now working on pushing a fix to avoid this happening again.

Jul 18, 13:29 UTC
Investigating - We've spotted that the volume of events ingested is lower than expected. We are identifying the root cause of the issue.

No data has been lost, and we are already drawing up a plan to recover it and identifying the impacted dates.

Jul 18, 12:34 UTC
Jul 22, 2024
Jul 21, 2024
Jul 20, 2024

No incidents reported.

Jul 19, 2024
Resolved - Ingestion lag has recovered.
Jul 19, 03:44 UTC
Monitoring - The issue has been fixed and the ingestion lag is recovering.
Jul 19, 02:06 UTC
Update - We've identified the issue and we are currently fixing it to enable ingestion again.
Jul 18, 22:36 UTC
Update - We're monitoring the recovery of Postgres now. Some tables are very large, so this might take several hours; we're investigating whether we can work some magic to speed this up.

Current impact is still that event ingestion is delayed.

Note that this means person updates aren't being processed, so any experiments or flags that rely on changes to person profiles won't see those changes until the event lag is resolved.

Jul 18, 18:19 UTC
Identified - We've spotted an issue with our Postgres infrastructure and we're working to resolve it right now.

You'll experience ingestion lag since we're delaying event ingestion to reduce load while we fix this. No events have been lost.

We'll update with an expected time to recovery as soon as we have one. Sorry for the interruption.

Jul 18, 17:25 UTC
Jul 18, 2024
Jul 17, 2024

No incidents reported.

Jul 16, 2024
Resolved - This incident has been resolved.
Jul 16, 14:07 UTC
Monitoring - Recovery is continuing well and we expect to be caught up within an hour.

Sorry for the interruption!

Jul 16, 13:02 UTC
Identified - We've restarted our recordings ingestion infrastructure and ingestion is recovering. Folks will be experiencing between 40 and 90 minutes of delay, but that's already recovering quickly.

We're still looking into the root cause.

Jul 16, 12:38 UTC
Investigating - We've spotted that recordings ingestion is delayed. We're investigating to identify why.
Jul 16, 12:21 UTC
Jul 15, 2024

No incidents reported.

Jul 14, 2024
Resolved - Workers are processing queries as they arrive. All systems nominal.
Jul 14, 19:53 UTC
Monitoring - We have restarted the failed workers; queries are back to normal now.
Jul 14, 17:28 UTC
Identified - Async queries are failing; we are restarting the workers now.
Jul 14, 17:15 UTC
Investigating - Queries are timing out on EU; we are looking into what’s going on.
Jul 14, 16:51 UTC
Jul 13, 2024

No incidents reported.