Should Companies Be Held Liable For Software Flaws?

Following the CrowdStrike event two weeks ago, there has been an interesting exchange between Delta Airlines and CrowdStrike. In particular, Delta has threatened to sue CrowdStrike to pursue compensation for the estimated $500M of losses allegedly incurred during the outage. CrowdStrike has recently hit back at Delta claiming the airline’s recovery efforts took far longer than their peers and other companies impacted by the outage. This entire exchange prompts some interesting questions about whether a technology company should be held liable for flaws in their software and where the liability should start and end.

Strategic Technology Trends

Software quality, including defects that lead to vulnerabilities, has been identified as a strategic imperative according to CISA and the Whitehouse in the 2023 National Cybersecurity Strategy. Specifically, the United States wants to “shift liability for software products and services to promote secure development practices” and it would seem the CrowdStrike event falls into this category of liability and secure software development practices.

In addition to strategic directives, I am also seeing companies prioritize speed to market over quality (and even security). In some respects it makes sense to prioritize speed, particularly when pushing updates for new detections. However, there is clearly a conflict in priorities when a company optimizes for speed over quality for a critical detection update that causes an impact larger than if the detection update had not been pushed at all. Modern cloud infrastructure and software development practices prioritize speed to market over all else. Hyperscale cloud providers have made a giant easy button that allows developers to consume storage, network and compute resources without consideration for the down stream consequences. Attempts by the rest of the business to introduce friction, gates or restrictions on these development processes are met with derision and usually follow accusations of slowing down the business or impeding sales. Security often falls in this category of “bad friction” because they are seen as the “department of no”, but as the CrowdStrike event clearly shows, there needs to be a balance between speed and quality in order to effectively manage risk to the business.

One last trend is the reliance on “the cloud” as the only BCP / DR plan. While cloud companies certainly market themselves as globally available services, they are not without their own issues. Cloud environments still need to follow IT operations best practices by completing a business impact analysis and implementing a BCP / DR plan. At the very least, cloud environments should have a rollback option in order to revert to the last known good state.

…as the CrowdStrike event clearly shows, there needs to be a balance between speed and quality in order to effectively manage risk to the business.

What Can Companies Do Differently?

Companies that push software updates, new services or new products to their customers need to adopt best practices for quality control and quality assurance. This means rigorously testing your products before they hit production to make sure they are as free of defects as possible. CrowdStrike clearly failed to properly test their update due to a claimed flaw in their testing platform. While it is nice to know why the defect made it into production, CrowdStrike still has a responsibility to make sure their products are free from defects and should have had additional testing and observability in place.

Second, for critical updates (like detections), there is an imperative by companies to push the update globally as quickly as possible. Instead, companies like CrowdStrike should prioritize customers in terms of industry risk. They should then create a phased rollout plan that stages their updates with a ramping schedule. By starting small, monitoring changes and then ramping up the rollout, CrowdStrike could have minimized the impact to a handful of customers and avoided a global event.

Lastly, companies need to implement better monitoring and BCP / DR for their business. In the case of CrowdStrike, they should have had monitoring in place that immediately detected their products going offline and they should have had the ability to roll back or revert to the last known good state. Going a step further they could even change the behavior of their software where instead of causing a kernel panic that crashes the system, the OS recovers gracefully and automatically rolls back to the last known good state. However, the reality is sophisticated logic like this costs money to develop and it is difficult for development teams to justify this investment unless the company has felt a financial penalty for their failures.

Cloud environments still need to follow IT operations best practices by completing a business impact analysis and implementing a BCP / DR plan.

Contracts & Liability

Speaking of financial penalties, the big question is whether or not CrowdStrike can be held liable for the global outage. My guess is this will depend on what it says in their contracts. Most contracts have a clause that limits liability for both sides and so CrowdStrike could certainly face damages within those limits (probably only a few million at most). It is more likely CrowdStrike will face losses for new customers and existing customers that are up for contract renewal. Some customers will terminate their contracts. Others will negotiate better terms or expect larger discounts on renewal to make up for the outage. At most this will hit CrowdStrike for the next 3 to 5 years (depending on contract length) and then the pricing and terms will bounce back. It will be difficult for customers to exit CrowdStrike en masse because it is already a sunk cost and companies wont want to spend the time or energy to deploy a new technology. Some of the largest customers may have the best terms and ability to extract concessions from CrowdStrike, but overall I don’t think this will impact them for very long and I don’t think they will be held legally liable in any material sense.

Delta Lags Industry Standard

If CrowdStrike isn’t going to be held legally liable, what happens to Delta and their claimed lost $500M? Let’s look at some facts. First, as CrowdStrike has rightfully pointed out, Delta lagged the world for recovering from this event. They took about 20 times longer to get back to normal operations than other airlines and large companies. This points to clear underinvestment in identifying critical points of failure (their crew scheduling application) and developing sufficient plans to backup and recover if critical parts of their operation failed.

Second, Delta clearly hasn’t designed their operations for ease of management or resiliency. They have also failed to perform an adequate Business Impact Analysis (BIA) or properly test their BCP / DR plans. I don’t know any specifics about their underlying IT operations, but a few recommendations come to mind such as implementing active / active instances for critical services and moving to thin clients or PXE boot for airport kiosks and terminals. Remove the need for a human to touch any of these systems physically, and instead implement processes to remotely identify, manage and recover these systems from a variety of different failure scenarios. Clearly Delta has a big gap in their IT Operations processes and their customers suffered as a result.

Wrapping Up

What the CrowdStrike event highlights is the need for companies to prioritize quality, resiliency and stability over speed to market. The National Cybersecurity Strategy has identified software defects as a strategic imperative because they lead to vulnerabilities, supply chain compromise and global outages. Companies with the size and reach of CrowdStrike can no longer afford to prioritize speed over all else and instead need to shift to a more mature and higher quality SDLC. In addition, companies that use popular software need to consider diversifying their supply chain, implementing IT operations best practices (like SRE) and implementing a mature BCP and DR plan on par with industry standards.

What the CrowdStrike event highlights is the need for companies to prioritize quality, resiliency and stability over speed to market.

When it comes to holding companies liable for global outages, like the one two weeks ago, I think it will be difficult for this to play out in the courts without resorting to a legal tit-for-tat that no one wins. Instead, the market and customers need to weigh in and hold these companies accountable through share prices, contractual negotiation or even switching to a competitor. Given the complexity of modern software, I don’t think companies should be held liable for software flaws because it is impossible to eliminate all flaws. Additionally, modern SDLCs and CI/CD pipelines are exceptionally complex and this complexity can often result in failure. This is why BCP/DR and SRE is so important, so you can recover quickly if needed. Yes, CrowdStrike could have done better, but clearly Delta wasn’t even meeting industry standards. Instead of questioning whether companies should be held liable for software flaws, a better question is: At what point does a company become so essential that they by default become critical infrastructure?

What’s Better – Complete Coverage With Multiple Tools Or Partial Coverage With One Tool?

The debate between complete coverage with multiple tools versus imperfect coverage with one tool regularly pops up in discussions between security professionals. What we are really talking about is attempting to choose between maximum functionality and simplicity. Having pursued both extremes over the course of my security career I offer this post to share my perspective on how CISOs can think about navigating this classic tradeoff.

In Support Of All The Things

Let’s start with why you may want to pursue complete coverage by using multiple technologies and tools.

Heavily Regulated And High Risk Industries

First, heavily regulated and high risk businesses may be required to demonstrate complete coverage of security requirements. These are industries like the financial sector or government and defense. (I would normally say healthcare here, but despite regulations like HIPAA the entire industry has lobbied against stronger security regulations and this has proven disastrous via major incidents like the Change Healthcare Ransomware Attack). The intent behind any regulation is to establish a minimum set of required security controls businesses need to meet in order to operate in that sector. It may not be possible to meet all of these regulatory requirements with a single technology and therefore, CISOs may need to evaluate and select multiple technologies to meet the requirements.

Defense In Depth

Another reason for selecting multiple tools is to provide defense in depth. The thought process is: multiple tools will provide overlap and small variances in how they meet various security controls. These minor differences can offer defenders an advantage because if one piece of technology is vulnerable to an exploit, another piece of technology may not be vulnerable. By layering these technologies throughout your organization you reduce the chances an attacker will be successful.

An example of this would be if your business is protected from the internet by a firewall made by Palo Alto. Behind this PA firewall is a DMZ and the DMZ is separated from your internal network by a firewall from Cisco. This layered defense will make it more difficult for attackers to get through the external firewall, DMZ, internal firewall and into the LAN. (See image below for a very simplistic visual)

Downside Of All The Things

All the things may sound great, but unless you are required to meet that level of security there can be a lot of downsides.

First, multiple technologies introduce complexity into an environment. This can make it more difficult to troubleshoot or detect issues (including security events). It can also make it more difficult to operationally support these technologies because they may have different interfaces, APIs, protocols, configurations, etc. It may not be possible to centrally manage these technologies, or it may require the introduction of an additional technology to manage everything.

Second, all of these technologies can increase the number of people required to support them. People time can really add up as a hidden cost and shouldn’t be thrown away lightly. People time starts the second you begin discussing the requirements for a new technology and can include the following:

  • Proof of Concepts (PoCs)
  • Tradeoff & Gap Analysis
  • Requests for Information (RFI)
  • Requests for Proposal (RFP)
  • Requests for Quotes (RFQ)
  • Contract Negotiation
  • Installation
  • Integration
  • Operation & Support

Finally, multiple technologies can cause performance impacts, increased costs and waste. Performance impacts can happen due to differences in technologies, complexity, configuration errors or over consumption of resources (such as agent sprawl). Waste can happen due to overlap and duplicated functionality because not all of the functionality may not get used despite the fact you are paying for it.

Advantages and Disadvantages Of A Single Tool

A single tool that covers the majority, but not all, of your requirements offers one advantage – simplicity. This may not sound like much, but after years of chasing perfection, technology simplicity can have benefits that may not be immediately obvious.

First, seeking out a single tool that meets the majority of requirements will force your security team to optimize their approach for the one that best manages risk while supporting the objectives of the business. Second, a single tool is easier to install, integrate, operate and support. There is also less demand on the rest of the business in terms of procurement, contract negotiation and vendor management. Lastly, a single tool requires less people to manage it and therefore you can run a smaller and more efficient organization.

The biggest disadvantage of a single tool is it doesn’t provide defense in depth. One other disadvantage is it won’t meet all of your security requirements and so the requirements that aren’t met should fall within the risk tolerance of the business or somehow get satisfied with other compensating controls.

A single tool that covers the majority, but not all, of your requirements offers one advantage – simplicity.

Wrapping Up

There are a lot of advantages to meeting all of your requirements with multiple tools, but these advantages come with a tradeoff in terms of complexity, operational overhead, duplicated functionality and increased personnel requirements. If you operate a security program in a highly regulated or highly secure environment you may not have a choice so it is important to be aware of these hidden costs. A single tool reduces complexity, operational overhead and personnel demands, but can leave additional risk unmet and fails to provide defense in depth. Generally, I favor simplicity where possible, but you should always balance the security controls against the risk tolerance and needs of the business.

Security Vendor Questionnaires: Too Much or Not Enough?

Over the past few years there has been an increasing trend for customers and partners to request security teams to fill out lengthy security questionnaires seeking specific details about the state of their security program. These requests often come as part of routine audits, regulatory requirements or contract negotiations. As someone who has both sent out questionnaires and been a recipient of questionnaires I’m wondering if the industry has gone too far down this trend or if it hasn’t gone far enough? Let me explain…

As a CSO, I want to discover and manage as much risk as possible. This includes conducting business with partners, customers and other companies. I want to understand my supply chain and limit my exposure to any of their security weaknesses that could be used to attack my company. However, I also want to limit the amount of information I disclose about my security program because once I disclose it, I no longer control that information and it could eventually make its way to an adversary if the company I disclosed it to has a breach.

How do we balance these differing requirements and are security questionnaires really the best mechanism for understanding your supply chain?

How Did We Get Here?

Let’s take a step back and consider how we collectively arrived at the need for security questionnaires. There have been several high profile breaches that have set us down this path. The first was the Target breach in 2013 where Target had their point of sale systems compromised as a result of a third party HVAC vendor. The magnitude of this breach along with the realization that Target was compromised via a third party placed a spotlight on supply chain security for the entire industry.

The second high profile breach was the Solar Winds attack in 2020. This attack infiltrated the software supply chain of Solar Winds and placed a backdoor in the product. Given that Solar Winds was used by a huge number of companies this effectively compromised the software supply chain of those companies as well. This attack increased the scrutiny on the supply chain with additional emphasis on software supply chain and even leading to some sectors (like the government) requiring disclosure of a Software Bill of Materials (SBOM).

Increased Regulatory Pressure

These notable attacks (among others) have lead to an increase in regulations that force companies to disclose details around security breaches, but also to invest appropriately in security programs. Despite these investments and disclosures, companies can still face steep fines and costly lawsuits for security breaches. New regulations such as the SEC Cyber Risk Management rules and recent White House Executive Orders on Improving Cybersecurity and establishing a National Cybersecurity Strategy have elevated awareness and focus on supply chain security to the national stage.

Cyber Insurance Isn’t Helping

As Cybersecurity insurance premiums become more and more expensive, companies will continue to look for ways to decrease the cost, while still maintaining coverage. One of the most effective ways to do this is to establish and document a mature security program that you review in detail with your insurer to explain your risks and how you are mitigating them with appropriate controls. Questionnaires are one part of a security program that can demonstrate how you are evaluating and managing supply chain risk and hopefully drive down your premiums (for now). The problem is this creates an incentive where it is everyone for themselves in an attempt to lower their own premiums.

Transparency Is Lacking

The biggest issue security questionnaires are attempting to address is lack of transparency into the details of the security programs. In general, large publicly traded companies (particular cloud companies) and security product companies tend to be more transparent about security because it is built into their brand as a selling point to attract customers. However, details about technologies, program structure, response times, etc. generally lack specificity (for good reason) and the security questionnaire is an attempt to uncover those details to understand what risks exist when entering into a relationship with another company.

You might argue that companies should simply be more transparent with details about their security program, but this is not the solution. Companies should cover high level details with some specificity to demonstrate they have a security program and how it is structured. However, giving specific details about processes, response times, technologies, etc. will reveal details that can be used by an adversary for an attack. Additionally, do we really know what is happening with all this data from security questionnaires? It may be protected under Non-Disclosure Agreements (NDAs) and confidentiality agreements, but that doesn’t prevent the data from being leaked via an unprotected S3 bucket. It is extremely difficult to change a security program quickly and so it may be in the best interest of the responding company to refuse to answer the questionnaire and instead have an undocumented conversation (depending on your level of paranoia).

Yet More Audit and Regulatory Pressure

On top of all these issues, there are still very real requirements to respond to audits, questions from regulators or to provide these responses to your customers and partners that operate in heavily regulated industries (finance, healthcare, government, etc.). Responding to the questionnaires still takes time and places a burden on your security team and still comes with the risk that the information could be involuntarily disclosed to an adversary.

What’s Really Going On Here?

Responding to regulatory and audit requirements aren’t new requirements for our industry and so answering security questionnaires has been the norm for quite some time. However, I think the use of the security questionnaire has been hijacked by the industry as a catch-all way to accomplish a few things with respect to security:

  1. Assert your security requirements over another company (may work for small companies if they can fund it, but generally doesn’t work for large companies with mature programs).
  2. Minimize risk of doing business with and potentially pass on liability to the recipient company. I.e. “We asked them if they did this thing, but we got breached because of them so they clearly didn’t do that thing so it is their fault.”
  3. Create negotiating points as part of contract negotiations for concessions.

The problem with these is they are attempting to impose a solution or liability on a program they don’t control. As a CSO I can’t agree to these things because being contractually obligated to a specific security solution or SLAs removes my decision making power for how to best manage that risk within my security program. Security programs change and are constantly adapting to stay ahead of threats and risks to the business. Being boxed into a solution contractually can actually create a risk, where there wouldn’t normally be one.

Fatigue Is Real

I genuinely struggle with this problem because security questionnaires have their uses, but are causing real fatigue across the industry. The questions fall into one of two categories: either they are largely the same across customers or they are completely bonkers and don’t justify a response. As both a sender and recipient of questionnaires I can definitely understand both sides of the issue. I want as much information about my supply chain customers, but want to minimize the specifics that I share outside of my control. I want lower cyber insurance premiums and I want to pass all my audits and regulatory inquiries. However, I think the industry has deviated from the original intent of the security questionnaire due to the real fear of being held liable for a failure in their security program, which includes their supply chain.

Vendors: If You Want To Reach CISOs Stop All The Noise

I had an interesting conversation with a bunch of CISO friends last week about the current problem with vendor communications and particularly how there is a high volume of noise in the industry right now. The current problem is predicated on the assumption that more volume of communication will lead to sales leads, which will lead to an actual deal. I’ve been both on the sales side and the corporate side and I can tell you that vendors view it as a numbers game. However, I’m here to tell you the numbers game isn’t helping you achieve your goals because you are annoying the very people you are trying to connect with.

Bad Behavior

Most CISOs and security professionals I know don’t start out having a contentious relationship with vendors, but once you achieve a specific level and title, the calls, texts and SPAM inevitably start. If you are a vendor, here are a few things guaranteed to get you blacklisted from the people you are trying to reach:

  • Claiming you were referred by someone else in the company (when you weren’t. Yes we check.)
  • Sending repeated SPAM (it is unsolicited for a reason and we have no obligation to respond so stop it.)
  • Calling our personal cell phones (at any time)
  • Reaching out to our boss and saying we aren’t responding to you (you’re kidding right?)
  • Sending unsolicited texts
  • Sending unsolicited gifts

I know the behavior above is referring to a small minority of vendors, but the problem is the volume of noise from these vendors drowns out any legitimate outreach from anyone else.

What Can CISOs Do To Cut Down On The Noise?

If you have a public profile on any social media platform it is inevitable vendors will find you. My suggestion is to use a custom email address for each platform so you know if the social media platform sells or leaks your information. You can do the same thing for conferences or if you fill in your email for webinars and other vendor activities. If you don’t have the ability to create custom email addresses you can also use throw away or temporary free email services. If the vendor won’t accept these temp email services then move on…trust me that white paper isn’t worth it.

Controlling your email is not the only way to cut down on the noise. If vendors, conferences or other activities require you to input your work address you can use the primary address field or secondary address field to list the name of the vendor. This way if they send you junk mail or gifts you know who it came from.

Aside from controlling your information, I know several CISOs that set specific rules for how to engage with them. These are typically rules like only going through a certain Value Added Reseller (VAR) and only coming to the office and presenting on certain days (like the last Friday of every month). These specific days are good ways for security groups and CISOs to learn about a lot of different technologies at once.

What Can Vendors Do To Reach CISOs?

The biggest thing vendors can do to reach CISOs is respect boundaries. Treat CISOs and their security teams the same way you want to be treated. Most importantly, no one wants to feel like a transaction. The security industry is small and if you truly want results, cultivate lasting relationships that can follow you wherever you go. If you abuse people’s time and attention you will get blacklisted. We all talk and word will get around quickly.

The most effective way for vendors to connect with CISOs is to go to vendor days or partner with VARs and get on their docket for innovation days where the VAR shows you off along with a list of other companies that are of interest to the CISO.

Wrapping Up

Being a CISO is stressful enough without getting harassed constantly by vendors. CISOs need to set rules for how vendors can engage with them so vendors know not to resort to bad behavior to get their attention. On the other hand, vendors need to respect boundaries and cultivate lasting relationships independent of the technology company they represent. If you resort to bad behavior you will be blacklisted and word will get around not to do business with you.