Resolved -
All data ingestion has caught up and our systems are back to normal. Thank you for your patience.
Jan 7, 14:28 UTC
Monitoring -
We've mitigated the underlying issue in our datastore and are now catching back up. We expect to be fully caught up in ~8 hours. No events were lost in this incident, and we have identified several steps we will take to prevent this from happening again. <3 PostHog
Jan 7, 05:20 UTC
Update -
We're still monitoring our application. Ingestion lag is currently at 4 hours, i.e. events appear inside PostHog 4 hours after they're sent, while we continue recovering some of the affected infrastructure. We'll update this once we have more information.
No data has been lost, as we're still collecting all events.
Jan 7, 03:36 UTC
Update -
To resolve the underlying issue, we've restarted some of our ClickHouse nodes. As a result, we're also seeing some delays in batch exports. All batch exports will complete once the nodes are back online.
Jan 6, 20:36 UTC
Investigating -
We're still seeing degraded performance when ingesting events and session replays, and we're continuing to tune our internal infrastructure to process the backlog.
There will be no data loss, as we're still storing all events for later processing.
Jan 6, 20:23 UTC
Update -
We're still monitoring our infrastructure. Ingestion delay has plateaued at around 45 minutes between an event being sent and it appearing inside our platform. Note that there is no data loss; all events are being captured correctly.
Jan 6, 18:24 UTC
Monitoring -
We've identified the likely culprit behind the increased ingestion time. We're monitoring it closely and expect to see a decrease in our ingestion lag soon.
Jan 6, 16:22 UTC
Investigating -
We're experiencing some delay in ingesting new events. We're investigating.
Jan 6, 15:56 UTC