ScaleMatrix Power Incident Update

** US-West-01 San Diego Site Outage ** Tertiary Update **

As of early this morning, all data center infrastructure is back online, including cooling and UPS (both A & B systems), and N+1 generator protection is enabled. The secondary generator will undergo emergency parts replacement tomorrow. A notice will be sent as soon as this unit is placed back into service.

Phone, email, and ScalePanel Client Portal services are online, and an incident tracking ticket has been opened containing the updates that were sent throughout the evening via email or other means. Please use this ticket for any inquiries or questions related to this event.

The service interruption at the data center caused connectivity and data storage issues within the hosted cloud and managed services platforms. Those solutions, albeit more slowly than preferred, are now coming back online and are being run through a post-incident QA process. Clients are encouraged to call or open a ticket should they continue to see any impact to their services.

ScaleMatrix has invested heavily in power infrastructure, with both UPS and generator systems deployed new within the last two years, each from a top-of-the-line manufacturer. Each component undergoes regular testing and maintenance, and has provided reliable protection against a variety of SDGE circuit outages since being brought online in 2019. We are investigating whether the severity of the high voltage outage had any bearing on the incident and its outcome.

While we provided some preliminary findings in the previous update, we will work through the weekend to identify the root cause and to decide on both restorative and preventative paths forward.

You will be contacted by your account manager within the next 2 to 3 business days, and our facilities and management team will be available to address any concerns which remain following the issuance of the RCA next week. Do not hesitate to reach out at your convenience, and please accept our most sincere apologies for any impact you may have experienced.

 

** US-West-01 San Diego Site Outage ** Secondary Update **

Utility service to the site has been restored. A + B power legs within the data center are online, and A side UPS protection is online with B side UPS remaining in bypass. Cooling infrastructure is back online. Due to the nature of the issue, extensive checks of all physical infrastructure were conducted prior to allowing customer access back on the data center floor. Access is now being granted.

Electrical teams and third-party vendors have identified a blown component in one of the two core backup generators. This component issue appears to have caused a failure of this generator approximately 20 minutes after the initial utility power loss at the site. Engineers are currently examining an ATS issue, related to an upcoming power upgrade project, which may have caused a communication issue with the secondary backup generator. This is extremely preliminary information, which will be fully vetted as part of the RCA effort.

While failover of our corporate website worked as intended, a database issue occurred within our ScalePanel Client Portal, which prevented that platform and other communication tools (phone service) from failing over to our secondary site in Dallas. These issues are being resolved at this time.

We apologize for the impact this outage has caused, and will provide further updates tomorrow with additional findings, while we aim to deliver a full incident RCA by Monday July 12th. Please call or open a ticket within ScalePanel so that our team members can assist with any service impact you may be experiencing.

 

** US-West-01 San Diego Site Outage ** Initial Update **

High voltage utility services in the area surrounding the US-West-01 San Diego Data Center experienced a major outage, involving both high voltage service and an adjacent water supply. Emergency backup systems at the San Diego Data Center did not function correctly, and critical power was lost. Emergency crews (SDGE, Fire, and Emergency) responded to the area due to the magnitude of the SDGE outage, and at this time, SDGE power remains offline. Attempts to reinitialize critical backup systems at the site took longer than expected; the cause of this is being investigated. Engineering and emergency teams are working to fully resolve the issue as rapidly as possible. Customer portal access is being restored. With power services being reinitialized, storage, cloud, and other supporting services will begin coming online shortly. Additional updates will be provided when all systems are running as designed, so that clients may validate the uptime of their specific services, sites, and platforms.

Davin Roos