Confluence Status
Operational
Last incident: 4/14/2026
Current Status
Overall Status: Operational
Last Incident: Disrupted Rovo availability for Automation rules
Incident Status: Resolved
Recent Incidents
Disrupted Rovo availability for Automation rules
4/14/2026, 12:25:17 PM
On April 14, 2026, affected users may have experienced some service disruption with automation rules that use Rovo agents. The issue has now been resolved, and the service is operating normally for all affected customers.
Affected Components:
View Content
iOS App
Create and Edit
Android App
Comments
Authentication and User Management
Search
Administration
Notifications
Marketplace Apps
Purchasing & Licensing
Signup
Confluence Automations
Cloud to Cloud Migrations - Copy Product Data
Server to Cloud Migrations - Copy Product Data
Users experiencing issues with login across Atlassian products
4/13/2026, 7:29:45 AM
### Summary
On April 13, 2026, between 05:49 and 06:29 UTC, customers experienced failures when attempting to log in, sign up, reset passwords, and complete multi-factor authentication flows across Atlassian cloud products. Approximately 90% of authentication requests failed during the peak impact window, affecting users in the US East and EU regions. The incident was mitigated within 40 minutes through manual intervention, and full service was restored by 06:29 UTC.
### **IMPACT**
* **Duration**: ~40 minutes (05:49–06:29 UTC, April 13, 2026)
* **Affected regions**: US East and EU (authentication infrastructure serves EU traffic from US East, with traffic primarily from EU at this time of day).
* **Affected products**: All Atlassian cloud products requiring authentication, including Jira, Confluence, Jira Service Management, and Trello.
* **Customer experience**: Users attempting to log in, sign up, reset passwords, or complete MFA flows received errors. Users already logged in with active sessions were unaffected.
### **ROOT CAUSE**
This incident had several contributing factors that combined to produce a failure that the system could not recover from without manual intervention.
**The primary cause** was a recently enabled change that caused our authentication infrastructure to retry requests to a downstream identity service when those requests were slow to respond. This retry behaviour was rolled out to 100% of traffic earlier the same day. Under normal conditions this would be benign, but it meant that any slowness in the downstream service was amplified. Since multiple upstream services were also independently retrying their own failed requests, the amplification compounded further into a retry storm.
**The trigger** was a burst of legitimate user traffic. A pattern of many parallel link preview requests for a single user caused a concentrated load spike on a downstream identity service, pushing its response times above the retry threshold. On its own, this kind of spike had occurred many times before and always recovered. With the retry amplification now in effect, the spike instead created a runaway feedback loop: slow responses caused retries, retries increased load, increased load caused slower responses, preventing recovery.
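To make the amplification concrete, here is a minimal sketch with made-up numbers (not figures from this incident): when several layers independently retry a slow call, the worst-case load multiplies per layer, so a single user request can fan out into many downstream requests.

```python
# Minimal sketch with illustrative numbers (not incident data): how
# independent retry-on-timeout policies at several layers multiply the
# load reaching a slow downstream service.

def worst_case_requests(layers: int, retries_per_layer: int) -> int:
    """Worst-case downstream requests from one user call when every layer
    retries a slow response `retries_per_layer` times."""
    return (1 + retries_per_layer) ** layers

# One retrying layer triples the load at two retries; three layers turn a
# single call into 27 downstream requests, enough to keep the downstream
# service saturated and slow, which in turn triggers further retries.
for layers in (1, 2, 3):
    print(layers, "layer(s) ->", worst_case_requests(layers, retries_per_layer=2), "requests")
```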
The incident was mitigated by manually scaling up the downstream identity service to provide sufficient capacity to absorb the amplified load. Once scaled, the service recovered immediately, bringing authentication error rates to zero within one minute.
### **REMEDIAL ACTIONS PLAN & NEXT STEPS**
We are taking the following actions designed to prevent recurrence and improve our resilience:
1. **Immediate**: The retry-on-timeout change has been disabled.
2. **Load shedding and self-healing**: We are adding load-shedding capabilities to our authentication services so that they can automatically shed excess load and recover on their own during traffic spikes, without requiring manual intervention while automatic scaling catches up (see the sketch after this list).
3. **Reducing request fan-out**: We are reviewing patterns where a single user action can generate many parallel downstream requests, and will introduce methods where possible to reduce the amplification potential.
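As an illustration of the load-shedding idea in item 2 (a minimal sketch assuming a simple concurrency cap, not a description of Atlassian's implementation), a service can reject requests beyond a fixed number in flight rather than queueing them, so latency stays bounded for the traffic it does accept:

```python
# Minimal load-shedding sketch: reject work beyond a fixed concurrency
# limit instead of queueing it. The limit and response shape are
# illustrative assumptions, not Atlassian's actual service behavior.
import threading

class LoadShedder:
    def __init__(self, max_in_flight: int = 100):
        self._slots = threading.Semaphore(max_in_flight)

    def handle(self, request, process):
        # Fail fast when saturated so a backlog cannot build into the
        # kind of feedback loop described in the root cause above.
        if not self._slots.acquire(blocking=False):
            return {"status": 503, "body": "shed: service at capacity"}
        try:
            return {"status": 200, "body": process(request)}
        finally:
            self._slots.release()
```

Clients receive a fast, explicit rejection they can back off from, instead of a slow timeout that invites another retry.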
We apologize to customers whose services were interrupted by this incident and we are taking immediate steps to improve the platform’s reliability.
Thanks,
Atlassian Customer Support
Affected Components:
View Content
iOS App
Create and Edit
Android App
Comments
Authentication and User Management
Search
Administration
Notifications
Marketplace Apps
Purchasing & Licensing
Signup
Confluence Automations
Cloud to Cloud Migrations - Copy Product Data
Server to Cloud Migrations - Copy Product Data
Multiple products impacted by search failures
4/8/2026, 5:41:18 AM
### Summary
On April 8, 2026, between 04:46 UTC and 12:09 UTC, search functionality was unavailable or degraded across several Atlassian Cloud products, including Jira, Confluence, Jira Service Management, Rovo, Rovo Dev, Loom, Guard Standard, Customer Service Management and Atlassian Administration.
A configuration change increased the resources reserved for a core system component that runs on nodes in our compute platform. On a subset of clusters configured for high-density workloads, the increased reservations exceeded available node capacity, interrupting search and related experiences for affected customers.
The root cause was identified and a rollback was merged at 05:42 UTC, with some systems seeing recovery by 07:33 UTC. Core search functionality was restored by approximately 08:55 UTC, and full downstream recovery completed by 12:09 UTC.
### **IMPACT**
During the impact period, some customers experienced outages or degradation in search across Jira, Confluence, Jira Service Management, Rovo, Rovo Dev, Loom, Guard Standard, Customer Service Management and Atlassian Administration. Other experiences that rely on search, such as quick find, navigation, AI assistants, and dashboards, were also intermittently affected during this period.
Impacted customers may have been unable to find pages or recordings, experienced degraded performance when searching for issues, received empty or delayed search results, or found that AI assistants and dashboards could not retrieve relevant context.
**Jira, Jira Service Management and Customer Service Management:** Search, and experiences that depend on it such as finding issues and agent responses in CSM, remained available but with degraded performance in fallback mode. By 12:09 UTC, search indexes and search performance were fully restored from fallback to full capacity across all regions.
**Guard Standard and Atlassian Administration:** Search functionality was unavailable for parts of the incident window. As a result, Domain Claims, usage tracking, and managed accounts were degraded for portions of the window. These services were restored to operational status by 07:33 UTC. Guard Premium was not impacted by this issue.
**Confluence:** Search functionality was unavailable for parts of the incident window. Recovery began at 07:30 UTC as backend search clusters were restored. Full recovery, including search index replay, completed at 11:37 UTC.
**Loom:** Search functionality and some experiences that rely on Confluence Search (such as sharing to spaces) were unavailable for portions of the window and fully restored at 11:37 UTC.
**Rovo and Rovo Dev:** Rovo agents remained responsive but experienced degraded functionality due to loss of search capabilities in underlying services. They were unable to reliably return context about work items or pages. Functionality was fully restored at 11:37 UTC.
### **ROOT CAUSE**
Atlassian products rely on OpenSearch clusters to power their search capabilities, including issue search, content search, and AI-powered search features.
An infrastructure configuration change increased resource reservations (CPU and memory) for a system component that runs across our compute platform. On a subset of clusters configured for high-density workloads, the increased reservations exceeded available node capacity. This caused search workloads to be evicted and, in some clusters, they could not be rescheduled onto any available nodes, impacting search functionality across affected products.
The change was deployed across multiple production clusters in a short time frame, limiting the opportunity to detect the capacity conflict in a smaller subset of clusters before it reached the wider fleet. Automated scaling systems attempted to recover by provisioning additional capacity, but in the worst-affected clusters this led to runaway node scaling and exhaustion of available network resources, prolonging recovery time.
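As a concrete illustration of the capacity conflict (hypothetical numbers only, not the actual cluster configuration), even a modest increase in the per-node system reservation can leave a high-density node unable to host the search pods already scheduled on it:

```python
# Hypothetical numbers only: how a larger per-node system reservation can
# leave high-density nodes with too little allocatable capacity for the
# search workloads already requested on them.

NODE_CPU_CORES = 16
SYSTEM_RESERVED_BEFORE = 1.0   # cores reserved for the system component
SYSTEM_RESERVED_AFTER = 3.0    # cores after the configuration change
SEARCH_PODS_PER_NODE = 30      # high-density cluster
CPU_REQUEST_PER_POD = 0.5      # cores requested by each search pod

def allocatable(reserved: float) -> float:
    return NODE_CPU_CORES - reserved

demand = SEARCH_PODS_PER_NODE * CPU_REQUEST_PER_POD
for label, reserved in [("before", SYSTEM_RESERVED_BEFORE), ("after", SYSTEM_RESERVED_AFTER)]:
    free = allocatable(reserved)
    print(f"{label}: allocatable={free:.1f} cores, demand={demand:.1f} cores, "
          f"{'fits' if demand <= free else 'evicted or unschedulable'}")
```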
### **REMEDIAL ACTIONS PLAN & NEXT STEPS**
We understand that service disruptions impact your productivity. In addition to our existing testing and preventative processes, Atlassian is prioritizing the following actions to help reduce the likelihood and impact of similar incidents in the future and to speed up recovery when issues occur:
* **Enforce smaller deployment cohorts and longer soak periods for critical platform changes on these cluster types**
Implement smaller deployment cohorts, mandatory soak periods between environments, and automated health gates so that changes are validated on a limited set of clusters before being promoted more broadly.
* **Strengthen automated pre‑deploy validation for resource changes**
Add validation checks to ensure resource changes for system components are compatible with node capacity and reserved headroom, preventing system workloads from crowding out customer workloads.
* **Improve post‑deploy verification and alerting**
Enhance monitoring and post‑deployment verification to detect patterns such as spikes in pending pods, runaway node scaling, and low pod‑IP headroom closely correlated with new configuration being rolled out.
* **Align autoscaling behavior with capacity and safety limits**
Align autoscaling capacity calculations with node reservations and introduce safeguards and circuit breakers to prevent runaway scaling and to enforce safe limits on node and pod IP counts (see the sketch after this list).
* **Enhance recovery automation**
Improve automation and runbooks so we can safely disable autoscaling, remove empty nodes in bulk, and restore normal operations faster across multiple clusters in parallel.
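As an illustration of the circuit-breaker idea above (a minimal sketch with assumed thresholds, not Atlassian's actual autoscaler), scale-up decisions can be counted over a sliding window and halted once growth exceeds a safe cap, so a feedback loop cannot add nodes without bound:

```python
# Minimal sketch of a scale-up circuit breaker: stop provisioning nodes
# once growth within a short window exceeds a cap. The window and cap are
# illustrative assumptions, not Atlassian's actual limits.
import time
from collections import deque

class ScaleUpCircuitBreaker:
    def __init__(self, max_new_nodes_per_window: int = 20, window_seconds: int = 300):
        self.max_new_nodes = max_new_nodes_per_window
        self.window = window_seconds
        self.events = deque()  # timestamps of recently approved scale-ups

    def allow_scale_up(self, now=None) -> bool:
        now = time.time() if now is None else now
        # Discard scale-up events that fell out of the sliding window.
        while self.events and now - self.events[0] > self.window:
            self.events.popleft()
        if len(self.events) >= self.max_new_nodes:
            return False  # breaker trips; further scale-ups need review
        self.events.append(now)
        return True
```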
We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve the platform’s performance and availability and to reduce the risk and impact of similar issues in future.
Thanks,
Atlassian Customer Support
Affected Components:
View Content
iOS App
Create and Edit
Android App
Comments
Authentication and User Management
Search
Administration
Notifications
Marketplace Apps
Purchasing & Licensing
Signup
Confluence Automations
Cloud to Cloud Migrations - Copy Product Data
Server to Cloud Migrations - Copy Product Data
Static macro rendering is failing in Confluence Cloud
3/12/2026, 11:42:39 AM
On March 12, 2026, affected Confluence Cloud users may have experienced some service disruption. The issue has now been resolved, and the service is operating normally for all affected customers.
Affected Components:
View Content
iOS App
Create and Edit
Android App
Comments
Authentication and User Management
Search
Administration
Notifications
Marketplace Apps
Purchasing & Licensing
Signup
Confluence Automations
Cloud to Cloud Migrations - Copy Product Data
Server to Cloud Migrations - Copy Product Data
Automation events delayed for some customers in the APAC region
3/10/2026, 12:15:49 AM
On March 9, 2026 (UTC), Automation users in the APAC region may have experienced performance degradation within Jira, Jira Product Discovery, Jira Service Management, Jira Work Management, and Confluence. The issue has now been resolved, and the service is operating normally for all affected customers.
Affected Components:
View Content
iOS App
Create and Edit
Android App
Comments
Authentication and User Management
Search
Administration
Notifications
Marketplace Apps
Purchasing & Licensing
Signup
Confluence Automations
Cloud to Cloud Migrations - Copy Product Data
Server to Cloud Migrations - Copy Product Data