#risk – 370 Security Blog

Navigating The First 90-180 Days In A New CISO Role

Late one Friday afternoon a call comes in and you find out you landed your next CISO role. All the interview prep, research, networking and public speaking has paid off! Then it dawns on you that you could be walking into a very difficult situation over the next few months. Even though the interview answered a lot of questions, you won’t know the reality of the situation until you start. How will your expectations differ from reality? What can you do to minimize risk as you come up to speed? How should you navigate these first 90-180 days in your new role?

Prior To Starting

Let’s assume you have some time to wind down your current position and you are also going to take some time off before starting the new role. During this transition period I highly advise you reach out to your peers in the new role and start asking questions to get more detail about the top challenges and risks you need to address. Start with the rest of the C-Suite, but also get time with board members and other senior business leaders to get their perspectives. Focus on building rapport, but also gather information to build on what you learned during the interview process so you can hit the ground running.

You can also use this time to reach out to your CISO peers in your network who are in the same industry, vertical or company type to get their perspective on what they did when they first joined their company. Learn from their experience and try to accelerate your journey once you start. Keep the lines of communication open so if you run into a situation you are unsure of you can ask for advice.

Once You Start

Build Relationships

First and foremost, start building relationships as quickly as possible. Target senior leadership first, such as board members, the C-Suite and other senior leaders. Work your way down by identifying key influencers and decision makers throughout the org. Play the “new person card” and ask questions about anything and everything. Gain an understanding of the “operational tempo” of the business such as when key meetings take place (like board meetings). Understand the historical reasons why certain challenges exist. Understand the political reasons why challenges persist. Understand the OKRs, KPIs and other business objectives carried by your peers. Learn the near and long term strategy for the business. Start building out a picture of what the true situation is and how you want to begin prioritizing.

Plan For The Worst

Don’t be surprised if you take a new role and are immediately thrown into an incident or other significant situation. You may not have had time to review playbooks or processes, but you can still fall back on your prior experience to guide the team through this event and learn from it. Most importantly, you can use this experience to identify key talent and let them lead, while you observe and take notes. You can also use your observation of the incident to take notes on things that need to be improved such as interaction with non-security groups, when to inform the board, how to communicate with customers or how to improve coordination among your team.

Act With Urgency

Your first few months in the role are extremely vulnerable periods for both you and the company. During this period you won’t have a full picture of the risks to the business and you may not have fully developed your long term plan. Despite these challenges, you still need to act with urgency to gain an understanding of the business and the risk landscape as quickly as possible. Build on the existing program (if any) to document your assumptions, discoveries, controls and risks so you can begin to litigation proof your org. Map the maturity of security controls to an industry framework to help inform your view of the current state of risk at the company. Begin building out templates for communicating your findings, asks, etc. to both the board and your peers. Most importantly, the company will benefit from your fresh perspective so be candid about your findings and initial recommendations.

Evaluate The Security Org

In addition to the recommendations above, one of the first things I like to do is evaluate the org I have inherited. I try to talk to everyone and answer a few questions:

Is the current org structure best positioned to support the rest of the business?
How does the rest of the business perceive the security org?
Where do we have talent gaps in the org?
What improvements do we need to make to culture, diversity, processes, etc. to optimize the existing talent of the org?

Answering these questions may require you to work with your HR business partner to build out new role definitions and career paths for your org. You may also need to start a diversity campaign or a culture improvement campaign within the security org. Most importantly, evaluate the people in your org to see if you have the right people in the right places with the right skillsets.

A Plan Takes Shape

As you glide past the 90 day mark and start establishing your position as a trusted business partner, you should arrive at a point where a clear vision and strategy is starting to take shape. Use the information you have gathered from your peers, your program documentation and your observations to start building a comprehensive plan and strategy. I’ve documented this process in detail here. In addition to building your program plan you can also begin to more accurately communicate the state of your security program to senior leaders and the board. Show how much the existing program addresses business risk and where additional investment is needed. I’ve documented a suggested process here. Somewhere between your 90 and 180 day mark you should have a formalized plan for where you are over invested, under invested or need to make changes to optimize existing investment. This could include restructuring your org, buying a new technology, adjusting contractual terms or purchasing short term cyber insurance. It could even include outsourcing key functions of the security org for the short term, until you can get the rest of your program up to a certain standard. Most importantly, document how you arrived at key decisions and priorities.

Take Care Of Yourself

Lastly, on a personal note, make sure to take care of yourself. Starting a new role is hectic and exciting, but it is also a time when you can quickly overwork yourself. Remember, building and leading a successful security program is a marathon, not a sprint. The work is never done. Get your program to a comfortable position as quickly as possible by addressing key gaps so you can avoid burning yourself out. Try to establish a routine to allow for physical and mental health and communicate your goals to your business partners so they can support you.

During this time (or the first year) you may also want to minimize external commitments like dinners, conferences and speaking engagements. When you start a new role everyone will want your time and attention, but be cautious and protective of your time. While it is nice to get a free meal, these dinners can often take up a lot of time for little value on your end (you are the product after all). Most companies have an active marketing department that will ask you to engage with customers and the industry. Build a good relationship with your marketing peers to interweave customer commitments with industry events so you are appropriately balancing your time and attending the events that will be most impactful for the company, your network and your career.

Wrapping Up

Landing your next CISO role is exciting and definitely worth celebrating. However, the first 90-180 days are critical to gain an understanding of the business, key stakeholders and how you want to start prioritizing activities. Most importantly, build relationships, act with urgency and document everything so you can minimize the window of exposure as you are coming up to speed in your new role.

Should Companies Be Held Liable For Software Flaws?

Following the CrowdStrike event two weeks ago, there has been an interesting exchange between Delta Air Lines and CrowdStrike. In particular, Delta has threatened to sue CrowdStrike to pursue compensation for the estimated $500M of losses allegedly incurred during the outage. CrowdStrike has recently hit back at Delta, claiming the airline’s recovery efforts took far longer than those of its peers and other companies impacted by the outage. This entire exchange prompts some interesting questions about whether a technology company should be held liable for flaws in its software and where the liability should start and end.

Strategic Technology Trends

Software quality, including defects that lead to vulnerabilities, has been identified as a strategic imperative by CISA and by the White House in the 2023 National Cybersecurity Strategy. Specifically, the United States wants to “shift liability for software products and services to promote secure development practices” and it would seem the CrowdStrike event falls into this category of liability and secure software development practices.

In addition to strategic directives, I am also seeing companies prioritize speed to market over quality (and even security). In some respects it makes sense to prioritize speed, particularly when pushing updates for new detections. However, there is clearly a conflict in priorities when a company optimizes for speed over quality for a critical detection update that causes an impact larger than if the detection update had not been pushed at all. Modern cloud infrastructure and software development practices prioritize speed to market over all else. Hyperscale cloud providers have made a giant easy button that allows developers to consume storage, network and compute resources without consideration for the downstream consequences. Attempts by the rest of the business to introduce friction, gates or restrictions on these development processes are met with derision and are usually followed by accusations of slowing down the business or impeding sales. Security often falls in this category of “bad friction” because it is seen as the “department of no”, but as the CrowdStrike event clearly shows, there needs to be a balance between speed and quality in order to effectively manage risk to the business.

One last trend is the reliance on “the cloud” as the only BCP / DR plan. While cloud companies certainly market themselves as globally available services, they are not without their own issues. Cloud environments still need to follow IT operations best practices by completing a business impact analysis and implementing a BCP / DR plan. At the very least, cloud environments should have a rollback option in order to revert to the last known good state.

What Can Companies Do Differently?

Companies that push software updates, new services or new products to their customers need to adopt best practices for quality control and quality assurance. This means rigorously testing your products before they hit production to make sure they are as free of defects as possible. CrowdStrike clearly failed to properly test their update due to a claimed flaw in their testing platform. While it is nice to know why the defect made it into production, CrowdStrike still has a responsibility to make sure their products are free from defects and should have had additional testing and observability in place.

Second, for critical updates (like detections), there is an imperative by companies to push the update globally as quickly as possible. Instead, companies like CrowdStrike should prioritize customers in terms of industry risk. They should then create a phased rollout plan that stages their updates with a ramping schedule. By starting small, monitoring changes and then ramping up the rollout, CrowdStrike could have minimized the impact to a handful of customers and avoided a global event.
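The ramping schedule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not CrowdStrike's actual release process: the `Ring` structure, the ring names, the 1% failure threshold and the `deploy` / `is_healthy` callbacks are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Ring:
    """One stage of the rollout (hypothetical structure)."""
    name: str
    hosts: Sequence[str]


def staged_rollout(
    rings: Sequence[Ring],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.01,  # assumed threshold, tune per business
) -> tuple[bool, list[str]]:
    """Push an update ring by ring and halt as soon as a ring looks unhealthy.

    Returns (completed, updated_hosts) so the caller knows exactly which
    hosts need to be rolled back if the rollout was halted.
    """
    updated: list[str] = []
    for ring in rings:
        for host in ring.hosts:
            deploy(host)
            updated.append(host)
        # Monitor this ring before ramping up to the next, larger one.
        failures = sum(1 for host in ring.hosts if not is_healthy(host))
        if ring.hosts and failures / len(ring.hosts) > max_failure_rate:
            return False, updated
    return True, updated


# Simulated defective update: every host that receives it goes offline.
offline: set[str] = set()
rings = [
    Ring("canary", ["canary-1", "canary-2"]),
    Ring("early", ["early-1", "early-2", "early-3"]),
    Ring("general", ["gen-1", "gen-2", "gen-3", "gen-4"]),
]
completed, touched = staged_rollout(rings, offline.add, lambda h: h not in offline)
print(completed, touched)  # False ['canary-1', 'canary-2']
```

In this simulation the bad update stops at the two canary hosts and never reaches the other seven, which is the whole point of starting small and monitoring before ramping up.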

Lastly, companies need to implement better monitoring and BCP / DR for their business. In the case of CrowdStrike, they should have had monitoring in place that immediately detected their products going offline and they should have had the ability to roll back or revert to the last known good state. Going a step further they could even change the behavior of their software where instead of causing a kernel panic that crashes the system, the OS recovers gracefully and automatically rolls back to the last known good state. However, the reality is sophisticated logic like this costs money to develop and it is difficult for development teams to justify this investment unless the company has felt a financial penalty for their failures.

Contracts & Liability

Speaking of financial penalties, the big question is whether or not CrowdStrike can be held liable for the global outage. My guess is this will depend on what it says in their contracts. Most contracts have a clause that limits liability for both sides and so CrowdStrike could certainly face damages within those limits (probably only a few million at most). It is more likely CrowdStrike will face losses for new customers and existing customers that are up for contract renewal. Some customers will terminate their contracts. Others will negotiate better terms or expect larger discounts on renewal to make up for the outage. At most this will hit CrowdStrike for the next 3 to 5 years (depending on contract length) and then the pricing and terms will bounce back. It will be difficult for customers to exit CrowdStrike en masse because it is already a sunk cost and companies won’t want to spend the time or energy to deploy a new technology. Some of the largest customers may have the best terms and ability to extract concessions from CrowdStrike, but overall I don’t think this will impact them for very long and I don’t think they will be held legally liable in any material sense.

Delta Lags Industry Standard

If CrowdStrike isn’t going to be held legally liable, what happens to Delta and their claimed lost $500M? Let’s look at some facts. First, as CrowdStrike has rightfully pointed out, Delta lagged the world in recovering from this event. They took about 20 times longer to get back to normal operations than other airlines and large companies. This points to clear underinvestment in identifying critical points of failure (their crew scheduling application) and developing sufficient plans to back up and recover if critical parts of their operation failed.

Second, Delta clearly hasn’t designed their operations for ease of management or resiliency. They have also failed to perform an adequate Business Impact Analysis (BIA) or properly test their BCP / DR plans. I don’t know any specifics about their underlying IT operations, but a few recommendations come to mind such as implementing active / active instances for critical services and moving to thin clients or PXE boot for airport kiosks and terminals. Remove the need for a human to touch any of these systems physically, and instead implement processes to remotely identify, manage and recover these systems from a variety of different failure scenarios. Clearly Delta has a big gap in their IT Operations processes and their customers suffered as a result.

Wrapping Up

What the CrowdStrike event highlights is the need for companies to prioritize quality, resiliency and stability over speed to market. The National Cybersecurity Strategy has identified software defects as a strategic imperative because they lead to vulnerabilities, supply chain compromise and global outages. Companies with the size and reach of CrowdStrike can no longer afford to prioritize speed over all else and instead need to shift to a more mature and higher quality SDLC. In addition, companies that use popular software need to consider diversifying their supply chain, implementing IT operations best practices (like SRE) and implementing a mature BCP and DR plan on par with industry standards.

When it comes to holding companies liable for global outages, like the one two weeks ago, I think it will be difficult for this to play out in the courts without resorting to a legal tit-for-tat that no one wins. Instead, the market and customers need to weigh in and hold these companies accountable through share prices, contractual negotiation or even switching to a competitor. Given the complexity of modern software, I don’t think companies should be held liable for software flaws because it is impossible to eliminate all flaws. Additionally, modern SDLCs and CI/CD pipelines are exceptionally complex and this complexity can often result in failure. This is why BCP/DR and SRE is so important, so you can recover quickly if needed. Yes, CrowdStrike could have done better, but clearly Delta wasn’t even meeting industry standards. Instead of questioning whether companies should be held liable for software flaws, a better question is: At what point does a company become so essential that they by default become critical infrastructure?

A CISO’s Analysis Of the CrowdStrike Global Outage

Overnight from July 18 to July 19, 2024, Windows systems running CrowdStrike ceased functioning and displayed the blue screen of death (BSOD). As people woke up on the morning of July 19th they discovered a wide reaching global outage of the consumer services they rely on for their daily lives, such as healthcare, travel, fast food and even emergency services. The ramifications of this event will continue to be felt for at least the next week as businesses recover from the outage and investors react to the realization that global businesses are extremely fragile when it comes to technology and business operations.

Technical Details

An update by CrowdStrike (CS) to the C-00000291*.sys file, timestamped 04:09 UTC, was pushed to all customers running CS Falcon agents. This file was corrupt (reports indicate a null byte header issue) and when Windows attempted to load this file it crashed. Rebooting the impacted systems does not resolve the issue because of the way CS Falcon works. CS Falcon has access to the inner workings of the operating system (kernel) such as memory access, drivers, and registry entries that allow CS to detect malicious software and activity. The CS Falcon agent is designed to receive updates automatically in order to keep the agent up to date with the latest detections. In this case, the update file was not properly tested and somehow made it through Quality Assurance and Quality Control, before being pushed globally to all CS customers. Additionally, CrowdStrike customers are clearly running CS Falcon on production systems and do not have processes in place to stage updates to CS Falcon in order to minimize the impact of failed updates (more on this below).

Global Impact

This truly is a global outage and the list of impacted industries is far reaching, attesting to the success of CS but also to the risks that can impact your software supply chain. As of Monday, Delta Air Lines is still experiencing flight cancellations and delays as a result of impacts to their pilot scheduling system. The list of impacted companies can be found here, here and here, but I’ll provide a short list as follows:

Travel – United, Delta, American, major airports

Banking and Trading – VISA, stock exchanges

Emergency & Security Services – Some 911 services and ADT

Cloud Providers – AWS, Azure

Consumer – Starbucks, McDonalds, FedEx

Once the immediate global impact subsides, there will be plenty of finger pointing at CrowdStrike for failing to properly test an update, but what this event clearly shows is a lack of investment by some major global companies in site reliability engineering (SRE), business continuity planning (BCP), disaster recovery (DR), business impact analysis (BIA) and proper change control. If companies were truly investing in SRE, BCP, DR and BIA beyond a simple checkbox exercise, this failed update would have been a non-event. Businesses would have simply executed their BCP / DR plan and failed over, or immediately recovered their critical services to get back up and running (which some did). Or, if they are running proper change control along immutable infrastructure they could have immediately rolled back to the last good version with minimal impact. Clearly, more work needs to be done by all of these companies to improve their plans, processes and execution when a disruptive event occurs.

Are global companies really allowing live updates to mission critical software in production without going through proper testing? Better still, production systems should be immutable, so that no change reaches production without first being updated in the CI/CD pipeline and then re-deployed. Failed updates became an issue about two decades ago when Microsoft began Patch Tuesday. Companies quickly figured out they couldn’t trust the quality of the patches and instead would test the patches in staging, which runs a duplicate environment to production. While this may have created a short window of vulnerability, it came with the advantages of stability and uninterrupted business operations.

Modern day IT Operations (called Platform Engineering or Site Reliability Engineering) now design production environments to be immutable and somewhat self healing. All changes need to be updated in code and then re-pushed through dev, test and staging environments to make sure proper QA and QC is followed. This minimizes impact from failed code pushes and will also minimize disruption from failed patches and updates like this one. SRE also closely monitors production environments for latency thresholds, availability targets and other operational metrics. If the environment exceeds a specific threshold then it throws alerts and will attempt to self heal by allocating more resources, or by rolling back to the previous known good image.
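To make the monitor-and-roll-back idea concrete, here is a minimal sketch. It is an assumption-laden illustration rather than the behavior of any particular SRE platform: the `Service` class, the image tags, and the latency and availability thresholds are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """Tracks deployed image versions; the last entry is the one running now."""
    history: list[str] = field(default_factory=list)

    def deploy(self, image: str) -> None:
        self.history.append(image)

    @property
    def current(self) -> str:
        return self.history[-1]

    def roll_back(self) -> str:
        """Discard the current image and return to the previous known good one."""
        if len(self.history) < 2:
            raise RuntimeError("no earlier image to roll back to")
        self.history.pop()
        return self.current


def check_and_heal(
    service: Service,
    p99_latency_ms: float,
    availability: float,
    max_latency_ms: float = 500.0,    # assumed latency threshold
    min_availability: float = 0.999,  # assumed availability target
) -> str:
    """Compare observed metrics to thresholds and roll back on a breach."""
    breached = p99_latency_ms > max_latency_ms or availability < min_availability
    if not breached:
        return "ok"
    service.roll_back()
    return "rolled_back"


svc = Service()
svc.deploy("app:1.4.0")
svc.deploy("app:1.5.0")
healthy_result = check_and_heal(svc, p99_latency_ms=120.0, availability=0.9995)
failing_result = check_and_heal(svc, p99_latency_ms=120.0, availability=0.42)
print(healthy_result, failing_result, svc.current)  # ok rolled_back app:1.4.0
```

A real environment would also raise alerts and might try adding resources before rolling back; the sketch only shows the rollback path.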

Ramifications

Materiality

Setting aside maturity of business and IT operations, there are some clear ramifications for this event. First, this had a global impact to a wide variety of businesses and services. Some of the biggest impacts were felt by publicly traded companies and as a result these companies will need to make an 8-K filing with the SEC to report a material event to their business. Even though this wasn’t a cybersecurity attack, it was still an event that disrupted business operations and so companies will need to report the expected impact and loss accordingly. CrowdStrike in particular will need to make an 8-K filing, not only for loss of stock value, but for expected loss of revenue through lost customers, contractual concessions and other tangible impacts to their business. When I started this post on the Friday of the event, CS stock was down over 10% and by Monday morning it was down almost 20%. The stock has started to recover, but that is clearly a material event to investors.

Greater Investment In BCP / DR & BIA

Recent events, such as this one and the UHC Change Healthcare ransomware attack, have clearly shown that some businesses are not investing properly in BCP / DR. They may have plans on paper, but plans still need to be fully tested including rapidly identifying service degradation and implementing recovery operations as quickly as possible. The reality is this should have been a non-event and any business that was impacted longer than a few hours needs to consider additional investment in their BCP / DR plan to minimize the impact of future events. CISOs need to work with the rest of the C-Suite to review existing BCP / DR plans and update them accordingly based on the risk tolerance of the business and desired RTO and RPO.

Boards Need To Step Up

During an event like this one boards need to take a step back and remember their primary purpose is to represent and protect investors. In this case, the sub-committees that govern technology, cybersecurity and risk should be asking hard questions about how to minimize the impact of future events like this and consider if the existing investment in BCP / DR technology and processes is sufficient to offset a projected loss of business. This may include more frequent reports on when BCP / DR plans were last properly tested and whether those plans properly account for all of the possible scenarios that could impact the business such as ransomware, supply chain disruption or global events like this one. The board may also push the executive staff to accelerate plans to invest in and modernize IT operations to eliminate tech debt and adopt industry best practices such as immutable infra or SRE. The board may also insist on a detailed analysis of the risks of the supply chain, including plans to minimize single points of failure, while limiting the blast radius of future events.

Negative Outcomes

Unfortunately, this event is likely to cause a negative perception of cybersecurity in the short term for a few different reasons. First, the obvious business disruption is one people will be questioning. How is it that a global cybersecurity company is able to disrupt so much with a single update? Could this same process act as an attack vector for attackers? Reports are already indicating that malicious domains have been set up to look like the fix for this event, but instead push malware. There are also malicious domains that have been created for phishing purposes and the reality is any company impacted by this event may also be vulnerable to ransomware attacks, social engineering and other follow on attacks.

Second, this event may cause a negative perception of automatic updates within the IT operations groups. I personally believe this is the wrong reaction, but the reality is some businesses will turn off the auto-updates, which will leave them more vulnerable to malware and other attacks.

What CISOs Should Do

With all this in mind, what should CISOs do to help the board, the C-Suite and the rest of the business navigate this event? Here are my suggestions:

First, review your contractual terms with 3rd party providers to understand contractually defined SLAs, liability, restitution and other clauses that can help protect your business due to an event caused by a third party. This should also include a risk analysis of your entire supply chain to determine single points of failure and how to protect your business appropriately.

Second, insist on increased investment in your BIA, BCP and DR plans including designing for site reliability and random events (chaos monkey) to proactively identify and recover from disruption, including review of RTO and RPO. If your BCP / DR plan is not where it needs to be, it may require investment in a multi-year technology transformation plan including resolving legacy systems and tech debt. It may also require modernizing your SDLC to shift to CI/CD including dev, test, staging and prod environments that are tightly controlled. The ultimate goal will be to move to immutable infrastructure and IT operations best practices that allow your services to operate and recover without disruption. I’ve captured my thoughts on some of the best practices here.

Third, resist the temptation to overreact. The C-Suite and investors are going to ask some hard questions about your business and they will suggest a wide range of solutions such as turning off auto-patches, ripping out CS or even building your own solution. All of these suggestions have a clear tradeoff in terms of risk and operational investment. Making a poor, reactive decision immediately after this event can harm the business more than it can help.

Finally, for mission critical services consider shifting to a heterogeneous environment that statistically minimizes the impact of any one vendor. The concept is simple: if you need a security technology to protect your systems, consider purchasing from multiple vendors that all have similar capabilities, which will minimize the impact to your business operations if one of them has an issue. This obviously raises the complexity and operational cost of your environment and should only be used for mission critical or highly sensitive services that need to absolutely minimize any risk to operations. However, this event does highlight the risks of consolidating to a single vendor and you should conduct a risk analysis to determine the best course of action for your business and supply chain.

Wrapping Up

For some companies this was a non-event. Once they realized there was an outage they simply executed their recovery plans and were back online relatively quickly. For other companies, this event highlighted lack of investment in IT operations fundamentals like BCP / DR or supply chain risk management. On the positive side, this wasn’t a ransomware or other cybersecurity attack and so recovery is relatively straightforward for most businesses. On the negative side, this event can have negative consequences if businesses overreact and make poor decisions. As a CISO, I highly recommend you take advantage of this event to learn from your weaknesses and make plans to shore up aspects of your operations that were sub-standard.

How Should CISOs Think About Risk?

There are a lot of different ways for CISOs to think about and measure risk, which can be bucketed into two categories: qualitative measurement, which is a subjective measurement that follows an objective process, and quantitative measurement, which is an objective measurement grounded in dollar amounts. Quantitative risk measurement is what CISOs should strive to achieve for a few reasons. One, it grounds the risk measurement in objective numbers which removes people’s opinions from the calculation; two, it assesses risk in terms of dollar amounts, which is useful for communicating to the rest of the business; and three, it can highlight areas of immaturity across the business if they are unable to quantify how their division contributes to the overall bottom line of the company. In this post I want to explore how CISOs should think about quantitatively measuring risk and in particular, measuring mitigated, unmitigated and residual risk for the business.

Where should you start?

A good place to start is with an industry standard risk management framework like NIST 800-37, CIS RAM or ISO 31000 and for the purposes of this post I’ll stick with the NIST 800-37 to be consistent. In order for CISOs to obtain a quantitative risk assessment from the NIST 800-37 they need to add work to the categorize step: working with finance and the business owners to understand the P&L of the system(s) they are categorizing. The first step is to go through every business system and get a dollar amount (in terms of revenue) for how much the system(s) contribute to the overall bottom line of the business.

Internal and External Security Costs

After you get a revenue dollar amount for every set of systems, you now need to move to the assess stage of the NIST 800-37 RMF to determine which security controls are in place to protect the systems, how much they cost and ultimately what percentage the security controls cover. There are two categories of security controls and costs you will need to build a model for. The first category is internal costs, which includes:

Tooling and technology
Licenses
Training
Headcount (fully burdened cost)
Travel
R&D
Technology operating costs (like cloud costs directly attributable to security tooling, etc.)

The second category is external costs, which includes:

3rd party penetration tests
Audits
Managed Security Service Provider (MSSP) costs
Insurance

As you fill in the costs or annual budget for each of these items you can map the coverage of these internal and external costs to your business to determine the total cost of your security program and how much risk the program is able to cover (in terms of a percentage).
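
A minimal sketch of tallying these line items is below. Every dollar amount is a made-up placeholder rather than data from a real program, and the dictionary keys are simply my shorthand for the two lists above.

```python
# Hypothetical annual budgets in USD; replace with figures from finance.
internal_costs = {
    "tooling_and_technology": 4_000_000,
    "licenses": 1_500_000,
    "training": 250_000,
    "headcount_fully_burdened": 12_000_000,
    "travel": 100_000,
    "r_and_d": 750_000,
    "technology_operating_costs": 900_000,
}
external_costs = {
    "third_party_penetration_tests": 300_000,
    "audits": 400_000,
    "mssp": 1_200_000,
    "insurance": 2_000_000,
}


def program_cost(internal, external):
    """Return (internal total, external total, overall total) in USD."""
    internal_total = sum(internal.values())
    external_total = sum(external.values())
    return internal_total, external_total, internal_total + external_total


internal_total, external_total, total = program_cost(internal_costs, external_costs)
print(f"Internal: ${internal_total:,}  External: ${external_total:,}  Total: ${total:,}")
```

Keeping the two categories separate makes it easier to compare internal and external spend later on.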

Mapping Risk Coverage

Once you have all of these figures you can start to map risk coverage to determine if your security program is effectively protecting the business. Let’s say your business generates $1B in annual revenue. Your goal as a CISO is to maintain a security program that provides $1B of risk coverage of the business. Or, if you are unable to provide total coverage, then you need to communicate which parts of the business are not protected so the rest of the C-Suite and board can either accept the risk or approve additional funding.

As a simple example, let’s say you spend $1M / yr on a SIEM tool, which takes 6 people to operate and maintain. The total cost of the 6 people is approximately $6M / yr (including benefits, etc.). The SIEM and people provide 100% monitoring coverage for the business and the SIEM and people can be mapped to 20% of your security controls in NIST. I’m skipping a lot of details for simplicity, but for a $1B business this means your SIEM function costs $7M / yr, but protects $200M of revenue ($1B x 20%). As you map the other tools, processes, people, etc. back to the business you will get a complete picture of how much risk your security program is managing and can make informed recommendations about your program to the board.
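
The arithmetic in the SIEM example can be sketched as follows. The figures come from the example above; the function and variable names are my own, and treating control coverage as a straight percentage of revenue is the same simplification the example makes.

```python
def protected_revenue(annual_revenue, control_coverage_pct):
    """Revenue (USD) covered by a function mapped to a percentage of controls."""
    return annual_revenue * control_coverage_pct // 100


annual_revenue = 1_000_000_000   # $1B business
siem_tool_cost = 1_000_000       # $1M / yr for the SIEM tool
siem_staff_cost = 6_000_000      # 6 people, approximately $6M / yr
control_coverage_pct = 20        # SIEM function maps to 20% of controls

siem_cost = siem_tool_cost + siem_staff_cost
siem_protected = protected_revenue(annual_revenue, control_coverage_pct)
print(f"SIEM function costs ${siem_cost:,} / yr and protects ${siem_protected:,}")
```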

For example, you may find your security program costs $100M / year, but is only able to manage risk for $750M (75%) of the business. Your analysis should clearly articulate whether the remaining 25% of risk is residual (will never go away and is acceptable) or is unmanaged and needs attention.
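
As a rough sketch of that breakdown: the revenue and managed figures come from the example, while the amount accepted as residual is a hypothetical number I picked purely for illustration.

```python
annual_revenue = 1_000_000_000   # $1B business
managed_risk = 750_000_000       # revenue the program manages risk for

# Whatever the program does not cover, in dollars and as a percentage.
uncovered = annual_revenue - managed_risk
uncovered_pct = 100 * uncovered // annual_revenue

# Hypothetical split: the portion the business has accepted as residual.
# Anything left over is unmanaged and needs attention.
accepted_residual = 100_000_000
unmanaged = uncovered - accepted_residual

print(f"Uncovered: ${uncovered:,} ({uncovered_pct}%)")
print(f"Residual (accepted): ${accepted_residual:,}")
print(f"Unmanaged: ${unmanaged:,}")
```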

Complete The Picture

By mapping your security program costs to the percentage of controls they cover and then mapping those controls to the business, CISOs should be able to get an accurate picture of the effectiveness of their security program. By breaking out the security program costs into the internal and external categories I’ve listed above, they can also compare and contrast the costs to the total amount of risk to determine which investments yield the best value. These analyses can be extremely effective when having conversations with the rest of the C-Suite or board, who may be inclined to decline additional budget requests or subjectively recommend a solution. By informing these stakeholders of the cost per control and the risk value of that cost, you can help them support your recommendation for additional investment to help increase risk management coverage or to help increase the value of risk management provided by the security program.

The following chart is an example of what this analysis can yield.

Once you have this data and analysis you can start driving conversations with the rest of the C-Suite and the board to inform them of how much risk is being managed, how much is residual, how much risk is unmanaged and your recommendation for additional investment (or acceptance). These conversations can also benefit from further analysis such as the ratio of cost to managed risk to determine which investment is providing the best value and ultimately support your recommendation for how the company should manage this risk going forward (people or technology).
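
One way to sketch the cost to managed risk ratio is shown below. Both investments and all of their figures are hypothetical, and the `Investment` class is just a convenient container I made up for the comparison.

```python
from dataclasses import dataclass


@dataclass
class Investment:
    name: str
    kind: str           # "people" or "technology"
    annual_cost: int    # USD per year
    managed_risk: int   # USD of revenue whose risk this investment manages

    @property
    def cost_per_managed_dollar(self) -> float:
        """Lower is better: dollars spent per dollar of risk managed."""
        return self.annual_cost / self.managed_risk


# Hypothetical options for closing an unmanaged risk gap.
options = [
    Investment("additional analysts", "people", 3_000_000, 50_000_000),
    Investment("automation platform", "technology", 2_000_000, 80_000_000),
]

# Rank the options from best value to worst value.
ranked = sorted(options, key=lambda inv: inv.cost_per_managed_dollar)
for inv in ranked:
    print(f"{inv.name} ({inv.kind}): {inv.cost_per_managed_dollar:.3f}")
best = ranked[0]
```

A ratio like this is only one input to the recommendation, since it says nothing about how long an investment takes to deliver or how it fits the rest of the program.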

Wrapping Up

Managing P&L is a fundamental skill for all CISOs to master and can help drive conversations across the company for how risk is being managed. CISOs need to master skills in financial analysis and partner with other parts of the business like business operations or business owners to understand how the business operates and what percentage of the business is effectively covered by the existing security program. The results of this analysis will help CISOs shape the conversation around risk, investment and ultimately the strategic direction of the business.

Should CISOs Be Technical?

There are a lot of different paths to becoming a CISO and everyone’s journey is different. However, two of the most common paths are coming up through the technical ranks or transitioning over from the compliance function. Coming up through the technical ranks is common because cybersecurity is a technically heavy field, particularly when attempting to understand the complexities of how exploits work and the best way to defend against attackers. Coming up through the compliance ranks is also common because companies are often focused on getting a particular compliance certification in order to conduct business and interact with customers. Each of these paths offers advantages and disadvantages, but I will argue being technical is more challenging than some of the softer cybersecurity disciplines like compliance, which leads to a common question – do CISOs need to be technical?

Yes, but…

If you don’t want to read any further, the short answer is yes, CISOs need to be technical. The longer answer is that being technical is a necessary, but insufficient, characteristic of a well rounded CISO. Being technical is insufficient because for the past few years the CISO role at public companies has been transforming from a technical role to a business savvy executive role. CISOs are expected to report to the board, which requires speaking the language of business, risk and finance. I have seen CISOs quickly lose their audience in board meetings when they start talking about tooling, vulnerabilities and detailed technical aspects of their security program. CISOs need to be able to translate their security program into the language of risk and they need to be savvy enough to weave in financial and business terminology that the board and other C-Suite executives will understand.

Obtain (and maintain) A Technical Grounding

Even though being technical is no longer sufficient for a well rounded CISO, it is important for a CISO to obtain or maintain a technical grounding. A technical grounding will help the CISO translate technical concepts (like vulnerabilities and exploits) into higher level business language like strategy, risk or profit and loss (P&L). It is also important for a CISO to understand technical concepts so they can dig in when needed to make sure their program is on track or controls are operating effectively. Lastly, it is important to maintain technical credibility with other technical C-Suite stakeholders like the CTO and CIO. Speaking their language will help align these powerful C-Suite members with your security program, and they can then lend critical support when making asks of the rest of the C-Suite or board.

What other skills does a CISO need?

In addition to a technical grounding, there are a number of skills CISOs need to master in order to be effective in their role. The following is a short list of skills CISOs need to have in order to be successful at a public company:

Executive presence and public speaking skills with the ability to translate security concepts into business risk that resonates with senior executives and the board
Ability to lead and communicate during a crisis
Politically savvy, with ability to partner with and build alliances with other parts of the business
Ability to understand the core parts of the business, how they operate and what their strategy is
Ability to explain the “value” of your security program in business and financial terms
Strong understanding of financial concepts such as CAPEX, OPEX, P&L, budgeting and ability to understand balance sheets, earning results and SEC filings
Understand and navigate legal concepts (such as privilege), regulations and compliance activities with the ability to map these concepts back to your security program or testify in court (if needed)
Ability to interact with auditors (when needed) to satisfy compliance asks or guide responses
Ability to interact with customers to either reassure them about the maturity of your security program or act as an extension of the sales team to help acquire new customers
Interact with law enforcement and other government agencies, depending on the nature of the business

If this seems like a long list that doesn’t fit your concept of what a CISO does, then you may have some weaknesses you need to work on. This list also reflects the evolving nature of the CISO role, particularly with respect to board interaction and leadership at public companies. More importantly, a lot of these concepts are not covered in popular security certifications and you definitely won’t get all of this experience from start ups or non-public companies. That is ok, because recognizing and acknowledging your weaknesses is the first step to becoming a better CISO.

Navigating Hardware Supply Chain Security

Lately, I’ve been thinking a lot about hardware supply chain security and how the risks and controls differ from software supply chain security. As a CSO, one of your responsibilities is to ensure your supply chain is secure, yet the distributed nature of our global supply chain makes this a challenging endeavor. In this post I’ll explore how a CSO should think about the risks of hardware supply chain security, how they should think about governing this problem and some techniques for implementing security assurance within your hardware supply chain.

What Is Hardware Supply Chain?

Hardware supply chain relates to the manufacturing, assembly, distribution and logistics of physical systems. This includes the physical components and the underlying software that comes together to make a functioning system. A real world example could be something as complex as an entire server or something as simple as a USB drive. Your company can be at the start of the supply chain by sourcing and producing raw materials like copper and silicon, at the middle of the supply chain producing individual components like microchips, or at the end of the supply chain assembling and integrating components into an end product for customers.

What Are The Risks?

There are a lot of risks when it comes to the security of hardware supply chains. Hardware typically has longer lead times and longer shelf life than software. This means compromises can be harder to detect (due to all the stops along the way) and can persist for a long time (e.g. decades in cases like industrial control systems). It can be extremely difficult or impossible to mitigate a compromise in hardware without replacing the entire system (or requiring downtime), which is costly to a business or deadly to a mission critical system.

The risk of physical or logical compromise can happen in two ways – interdiction and seeding. Both involve physically tampering with a hardware device, but occur at different points in the supply chain. Seeding occurs during the physical manufacture of components and involves someone inserting something malicious (like a backdoor) into a design or component. Insertion early in the process means the compromise can persist for a long period of time if it is not detected before final assembly.

Interdiction happens later in the supply chain when the finished product is being shipped from the manufacturer to the end customer. During interdiction the product is intercepted en route, opened, altered and then sent to the end customer in an altered or compromised state. The hope is the recipient won’t detect the slight shipping delay or the compromised product, which will allow anything from GPS location data to full remote access.

Governance

CSOs should take a comprehensive approach to manage the risks associated with hardware supply chain security that includes policies, processes, contractual language and technology.

Policies

CSOs should establish and maintain policies specifying the security requirements at every step of the hardware supply chain. This starts at the requirements gathering phase and includes design, sourcing, manufacturing, assembly and shipping. These policies should align to the objectives and risks of the overall business with careful consideration for how to control risk at each step. An example policy could be your business requires independent validation and verification of your hardware design specification to make sure it doesn’t include malicious components or logic. Or, another example policy can require all personnel who physically manufacture components in your supply chain receive periodic background checks.

Processes

Designing and implementing secure processes can help manage the risks in your supply chain and CSOs should be involved in the design and review of these processes. Processes can help detect compromises in your supply chain and can create or reduce friction where needed (depending on risk). For example, if your company is involved in national security programs you may establish processes that perform verification and validation of components prior to assembly. You also may want to establish robust processes and security controls related to intellectual property (IP) and research and development (R&D). Controlling access to and dissemination of IP and R&D can make it more difficult to seed or interdict hardware components later on.

Contractual Language

One avenue CSOs should regularly review with their legal department is the set of contractual clauses your company uses with the companies and suppliers in your supply chain. Contractual language can extend your security requirements to these third parties and even allow your security team to audit and review their manufacturing processes to make sure they are secure.

Technology

The last piece of governance CSOs should invest in is technology. These are the specific technology controls to ensure physical and logical security of the manufacturing and assembly facilities that your company operates. Technology can include badging systems, cameras, RFID tracking, GPS tracking, anti-tamper controls and even technology to help assess the security assurance of components and products. The technologies a CSO selects should complement and augment their entire security program in addition to normal security controls like physical security, network security, insider threat, RBAC, etc.

Detecting Compromises

One aspect of hardware supply chain that is arguably more challenging than software supply chain is detection of compromise. With the proliferation of open source software and technologies like sandboxing, it is possible to review and understand how a software program behaves. Yet, it is much more difficult to do this at the hardware layer. There are some techniques that I have discovered while thinking about and researching this problem and they all relate back to how to detect if a hardware component has been compromised or is not performing as expected.

Basic Techniques

One of the simpler techniques for detecting if hardware has been modified is imaging. After the design and prototype are complete you can image the finished product and then compare all products produced against this image. This can tell you if the product has had any unauthorized components added or removed, but it won’t tell you if the internal logic has been compromised.

Another technique for detecting compromised components is similar to unit testing in software and is known as functional verification. In functional verification, individual components have their logic and sub-logic tested against known inputs and outputs to verify they are functioning properly. This may be impractical to do with every component if they are manufactured at scale, so statistical sampling may be needed to probabilistically ensure all of the components in a batch are good. The assumption here is if all of your components pass functional verification or statistical sampling then the overall system has the appropriate level of integrity.
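
To give a feel for how many components a sampling plan might need, here is a minimal sketch. It assumes a zero-failure acceptance plan with independent random sampling from a large batch (a binomial approximation), which is my assumption and not something prescribed by any particular standard. It also only addresses randomly distributed defects; an adversary who tampers with a handful of targeted units can easily slip past a random sample.

```python
import math


def sample_size(confidence, max_defect_rate):
    """Smallest n such that, if at least max_defect_rate of the batch is bad,
    a random sample of n components contains at least one bad component with
    probability >= confidence (binomial approximation, large batch)."""
    if not (0 < confidence < 1 and 0 < max_defect_rate < 1):
        raise ValueError("confidence and max_defect_rate must be between 0 and 1")
    # Solve (1 - max_defect_rate) ** n <= 1 - confidence for n.
    return math.ceil(math.log(1 - confidence) / math.log(1 - max_defect_rate))


# 95% confidence of catching a batch where 1% or more of components are bad.
print(sample_size(0.95, 0.01))
```

If every sampled component passes functional verification, you can make a probabilistic claim about the batch at the chosen confidence level, not a guarantee.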

To detect interdiction or logistics compromises, companies can implement logistics tracking such as unique serial numbers (down to the component level), tamper evident seals, anti-tamper technology (which either renders the system inoperable if tampered with or makes it difficult to tamper with something without destroying it) and even shipping thresholds to detect abnormal shipping delays.
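
A shipping threshold can be as simple as flagging any shipment whose transit time is well above the historical norm. The sketch below assumes you have historical transit times for a route; the serial numbers, the hours and the three standard deviation cutoff are all hypothetical choices for illustration.

```python
from statistics import mean, stdev


def delayed_shipments(history_hours, shipments, k=3.0):
    """Return serial numbers whose transit time exceeds mean + k * stdev
    of the historical transit times for the same route."""
    threshold = mean(history_hours) + k * stdev(history_hours)
    return [serial for serial, hours in shipments.items() if hours > threshold]


# Hypothetical transit times (hours) for one route.
history = [70, 72, 71, 69, 73, 70, 72, 71]
in_transit = {"SN-001": 72, "SN-002": 96, "SN-003": 74}

flagged = delayed_shipments(history, in_transit)
print(flagged)
```

A flagged shipment is not proof of interdiction. It is a prompt to inspect the seals and the product more closely before it goes into service.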

Advanced Techniques

More advanced techniques for detecting compromise include destructive testing. Destructive testing involves physically breaking apart a component to make sure nothing malicious has been inserted and, because the tested unit is destroyed, it is performed on a statistical sample rather than on every unit. Destructive testing makes sure the component was physically manufactured and assembled properly.

In addition to destructive testing, companies can create hardware signatures that include expected patterns of behavior for how a system should physically behave. This is a more advanced method of functional testing where multiple components or even finished products are analyzed together for known patterns of behavior to make sure they are functioning as designed and not compromised. Some hardware components that can assist with this validation are technologies like Trusted Platform Modules (TPM).

Continuing with functional operation, a more advanced method of security assurance for hardware components is function masking and isolation. Function masking attempts to mask a function so it is more difficult to reverse engineer the component. Isolation limits how components can behave with other components and usually has to be done at the design level, which effectively begins to sandbox components at the hardware level. Isolation could rely on TPM to limit functionality of components until the integrity of the system can be verified, or it could just limit functionality of one component with another.

Lastly, one of the most advanced techniques for detecting compromise is called 2nd order analysis and validation. 2nd order analysis looks at the byproduct of the component when it is operating by looking at things like power consumption, thermal signatures, electromagnetic emissions, acoustic properties and photonic (light) emissions. These 2nd order emissions can be analyzed to see if they are within expected limits and if not it could indicate the component is compromised.
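
Conceptually, the "within expected limits" check looks something like the sketch below. The metric names and limits are hypothetical placeholders; real limits would come from characterizing known-good components, and real analysis would look at waveforms over time rather than single readings.

```python
# Hypothetical expected operating envelope for a known-good component.
EXPECTED_LIMITS = {
    "power_watts": (4.5, 5.5),
    "temperature_c": (35.0, 60.0),
    "em_emission_dbuv": (20.0, 40.0),
}


def out_of_limits(measurements, limits=EXPECTED_LIMITS):
    """Return the metrics whose readings are missing or outside the
    expected (low, high) range, mapped to the observed value."""
    violations = {}
    for metric, (low, high) in limits.items():
        value = measurements.get(metric)
        # A missing reading is treated as a violation so it gets reviewed.
        if value is None or not (low <= value <= high):
            violations[metric] = value
    return violations


normal = {"power_watts": 5.0, "temperature_c": 45.0, "em_emission_dbuv": 30.0}
suspect = {"power_watts": 6.2, "temperature_c": 45.0, "em_emission_dbuv": 30.0}

print(out_of_limits(normal))
print(out_of_limits(suspect))
```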

Wrapping Up

Hardware supply chain security is a complex space given the distributed nature of hardware supply chains and the variety of attack vectors spanning physical and logical realms. A comprehensive security program needs to weigh the risks of supply chain compromise against the risks and objectives of the business. For companies that operate in highly secure environments, investing in advanced techniques ranging from individual component testing to logistics security is absolutely critical and can help ensure your security program is effectively managing the risks to your supply chain.

What’s Better – Complete Coverage With Multiple Tools Or Partial Coverage With One Tool?

The debate between complete coverage with multiple tools versus imperfect coverage with one tool regularly pops up in discussions between security professionals. What we are really talking about is attempting to choose between maximum functionality and simplicity. Having pursued both extremes over the course of my security career I offer this post to share my perspective on how CISOs can think about navigating this classic tradeoff.

In Support Of All The Things

Let’s start with why you may want to pursue complete coverage by using multiple technologies and tools.

Heavily Regulated And High Risk Industries

First, heavily regulated and high risk businesses may be required to demonstrate complete coverage of security requirements. These are industries like the financial sector or government and defense. (I would normally say healthcare here, but despite regulations like HIPAA the entire industry has lobbied against stronger security regulations and this has proven disastrous via major incidents like the Change Healthcare Ransomware Attack). The intent behind any regulation is to establish a minimum set of required security controls businesses need to meet in order to operate in that sector. It may not be possible to meet all of these regulatory requirements with a single technology and therefore, CISOs may need to evaluate and select multiple technologies to meet the requirements.

Defense In Depth

Another reason for selecting multiple tools is to provide defense in depth. The thought process is: multiple tools will provide overlap and small variances in how they meet various security controls. These minor differences can offer defenders an advantage because if one piece of technology is vulnerable to an exploit, another piece of technology may not be vulnerable. By layering these technologies throughout your organization you reduce the chances an attacker will be successful.

An example of this would be if your business is protected from the internet by a firewall made by Palo Alto. Behind this PA firewall is a DMZ and the DMZ is separated from your internal network by a firewall from Cisco. This layered defense will make it more difficult for attackers to get through the external firewall, DMZ, internal firewall and into the LAN. (See image below for a very simplistic visual)

Downside Of All The Things

All the things may sound great, but unless you are required to meet that level of security there can be a lot of downsides.

First, multiple technologies introduce complexity into an environment. This can make it more difficult to troubleshoot or detect issues (including security events). It can also make it more difficult to operationally support these technologies because they may have different interfaces, APIs, protocols, configurations, etc. It may not be possible to centrally manage these technologies, or it may require the introduction of an additional technology to manage everything.

Second, all of these technologies can increase the number of people required to support them. People time can really add up as a hidden cost and shouldn’t be thrown away lightly. People time starts the second you begin discussing the requirements for a new technology and can include the following:

Proof of Concepts (PoCs)
Tradeoff & Gap Analysis
Requests for Information (RFI)
Requests for Proposal (RFP)
Requests for Quotes (RFQ)
Contract Negotiation
Installation
Integration
Operation & Support

Finally, multiple technologies can cause performance impacts, increased costs and waste. Performance impacts can happen due to differences in technologies, complexity, configuration errors or over consumption of resources (such as agent sprawl). Waste can happen due to overlap and duplicated functionality because not all of the functionality may get used despite the fact you are paying for it.

Advantages and Disadvantages Of A Single Tool

A single tool that covers the majority, but not all, of your requirements offers one advantage – simplicity. This may not sound like much, but after years of chasing perfection, technology simplicity can have benefits that may not be immediately obvious.

First, seeking out a single tool that meets the majority of requirements will force your security team to optimize their approach for the one that best manages risk while supporting the objectives of the business. Second, a single tool is easier to install, integrate, operate and support. There is also less demand on the rest of the business in terms of procurement, contract negotiation and vendor management. Lastly, a single tool requires fewer people to manage it and therefore you can run a smaller and more efficient organization.

The biggest disadvantage of a single tool is it doesn’t provide defense in depth. One other disadvantage is it won’t meet all of your security requirements and so the requirements that aren’t met should fall within the risk tolerance of the business or somehow get satisfied with other compensating controls.

Wrapping Up

There are a lot of advantages to meeting all of your requirements with multiple tools, but these advantages come with a tradeoff in terms of complexity, operational overhead, duplicated functionality and increased personnel requirements. If you operate a security program in a highly regulated or highly secure environment you may not have a choice so it is important to be aware of these hidden costs. A single tool reduces complexity, operational overhead and personnel demands, but can leave additional risk unmet and fails to provide defense in depth. Generally, I favor simplicity where possible, but you should always balance the security controls against the risk tolerance and needs of the business.

Security Considerations For M&A and Divestitures

I’ve been speaking to security startups over the last few weeks and some of the discussions made me think about the non-technical aspects of security that CISOs need to worry about. Specifically, things like mergers, acquisitions and divestitures and the different risks you will run into when executing these activities. There are a number of security issues that can materialize when combining businesses or separating businesses and in this post I’ll share some of the things you need to think about from a security perspective that may not be obvious at first glance.

What’s Going On Here?

There are a number of reasons for mergers & acquisitions (M&A) or divestitures. For the past two decades, the tech industry has used M&A to acquire smaller startup companies as a way to collect intellectual property, acquire specific talent or gain a competitive advantage. Divestitures may be the result of changing business priorities, separating business functions for regulatory reasons, eliminating redundancies or a way to sell a part of the business to cover costs. Mergers, acquisitions and divestitures are similar because you will want to review the same things from a security perspective, but it is probably easiest to think of divestitures as the reverse of an M&A – you are separating a business instead of combining a business. Divestitures are definitely less common than M&A in the tech space, but they aren’t unheard of. There are also differences in terms of the security risks you need to think about depending on if you are acquiring a business or separating a business. My best advice is to work with the legal and finance teams performing the due diligence and have a set process (that you have contributed to) so you don’t forget anything. With that, let’s dive into a few different areas.

Physical Security

Physical security is something you will need to think about for both M&A and divestitures. For M&A you will want to perform a physical security assessment on the facilities you are acquiring to make sure they meet or exceed your standards. Reviewing physical security controls like badging systems, fencing, bollards, cameras, fire suppression, emergency lighting, tempest controls (if required), safes and door locks will all help make sure your new facilities are up to standard. If you aren’t sure how to perform this, hire a company that specializes in physical security assessments or physical red teaming.

While physical security for M&A may seem straightforward, there are a few gotchas when performing divestitures. The biggest gotcha is understanding and reviewing the existing access of the people that are part of the divestiture because you will now need to consider them outsiders. All of your standard off-boarding processes will apply here such as terminating accesses to make sure someone doesn’t retain access to a system they are no longer authorized to access (like HR, Finance, etc.).

Things can get complicated if parts of the business are divesting, but not fully. One example of this is when the business divests a smaller part, but allows the smaller part to co-locate in their existing facilities. This may complicate physical security requirements such as how to schedule or access common areas, how to schedule conference rooms, how to separate wifi and network access, etc. In the above example, the larger company may act like a service provider to the divested part of the business, but there still need to be effective security controls in place between the two parts.

Personnel Security

I touched on this a bit already, but personnel security is something to consider when performing M&A or divestitures. With M&A the biggest issue will be how to smash the two IAM systems and HR systems together without punching huge holes in your network. Typically what happens is the two parts operate separately for a while and then consolidate to a single system and the employees of the acquired business get new accounts and access.

For divestitures, particularly if they don’t result in a clean split, you will need to focus heavily on access control and insider threats. Think about how you will separate access to things like source code, financial systems, HR systems, etc. If the smaller company has physical access to your space then you need to build in proper physical and logical controls to limit what each business can do, particularly for confidentiality and competitive reasons.

What’s an example of where this can go wrong? Let’s say company A is going to divest a small part of its business (company B). The complete divestiture is going to take a while to finalize so company A agrees to allow company B to continue to access their existing office space, including conference rooms. However, the legal team didn’t realize the conference rooms are tied to company A’s SSO and calendaring system, so company B has no way to schedule the conference rooms without retaining access to company A’s IAM system, creating a major security risk. Whoops!

Contracts

Contracts may not seem like a typical security issue, but they should be part of your review, particularly when performing M&A. Why? You are acquiring a business that is worth something and that business will have existing contracts with customers. The contractual terms with those customers may not match the contractual terms of the acquiring company, which can cause a risk if there is a significant difference in contract terms. Smaller companies are more agile, but they also usually have less negotiating power compared to large companies and as a result are more likely to agree to non-standard contract terms. What are some terms you need to think about?

Vulnerability Remediation Times – How quickly did the new company promise to fix vulnerabilities for their customers?
Incident & Breach Disclosure Time Frames – How quickly did the new company promise to notify customers of a breach or incident? I have seen very small time frames suggested in contracts, which are impossible to meet, so I definitely recommend reviewing these.
Disclosure of Security Postures – Does the new company have contractual terms promising to provide SBOMs or other security posture assessments to their customers on a regular basis?
Compliance Requirements – Has the new company agreed to be contractually obligated to maintain compliance certifications such as PCI-DSS, SOC 2, ISO27001, etc.?
Penetration Testing & Audits – Has the new company contractually agreed to have their products or services penetration tested or have their security program audited? Have they agreed to provide these reports to their customers on a regular basis?
Privacy & Data Governance Terms – Is the new company required to comply with privacy regulations, such as allowing customers to have their data deleted, or with specific data governance requirements like DLP, encryption, data deletion, etc.?
BCP/DR and SLAs – Are there contractual uptime SLAs or response times and does the existing BCP/DR plan support these SLAs?
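As a rough illustration of that review, the sketch below compares an acquired company’s contract terms against the acquiring company’s standard terms and flags any that promise more than the standard. The `SecurityTerms` fields and all of the numbers are hypothetical and only cover three of the terms listed above; a real review would be driven by the actual contract language and your legal team.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurityTerms:
    """Hypothetical subset of security-related contract terms."""
    critical_vuln_fix_days: int      # days promised to remediate critical vulnerabilities
    breach_notification_hours: int   # hours promised to notify customers of a breach
    uptime_sla_percent: float        # contractual uptime commitment


def stricter_terms(standard: SecurityTerms, acquired: SecurityTerms) -> list[str]:
    """Return the names of terms where the acquired contract promises more than the standard."""
    findings = []
    # A shorter remediation or notification window is a stricter promise.
    if acquired.critical_vuln_fix_days < standard.critical_vuln_fix_days:
        findings.append("critical_vuln_fix_days")
    if acquired.breach_notification_hours < standard.breach_notification_hours:
        findings.append("breach_notification_hours")
    # A higher uptime percentage is a stricter promise.
    if acquired.uptime_sla_percent > standard.uptime_sla_percent:
        findings.append("uptime_sla_percent")
    return findings


# Illustrative values only, not taken from any real contract.
standard = SecurityTerms(critical_vuln_fix_days=30, breach_notification_hours=72, uptime_sla_percent=99.9)
acquired = SecurityTerms(critical_vuln_fix_days=7, breach_notification_hours=24, uptime_sla_percent=99.9)
print(stricter_terms(standard, acquired))
```

Anything this kind of comparison flags is a candidate for re-papering, or for a temporary carve-out in the security program until the contract is renegotiated.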

My advice is to set a timeline post acquisition to review and standardize all of your contracts to a single set of standard clauses covering the above topics. This is usually part of a security addendum that the legal team can help you create. The biggest challenge with contracts will be to “re-paper” all of your customers to hopefully get them on the same standardized contract terms so your security program doesn’t have a bunch of different requirements it has to try to meet.

Accuracy Of M&A’s

One of the biggest risks of performing M&A’s is trying to get an accurate picture of the existing security posture of the company being acquired. Why is this so difficult? The company being acquired is trying to look as good as possible so they get top dollar. They can’t hide things, but they aren’t going to tell you where all the skeletons are buried either. The acquiring company usually doesn’t get a full picture of the existing security posture until after the deal is done and you start trying to integrate the two parts of the business. If you have a chance to interview the existing security team before the M&A closes, definitely ask to see their latest audit reports, compliance certifications, penetration testing reports, etc. Consider working with legal to set conditions for how old these reports can be (e.g. no older than 6 months) to hopefully give you a more accurate picture, or require the acquired company to update them before the deal closes. Interview key members of the staff to ask how processes work, what their biggest pain points are, etc. Consider hiring an outside company to perform an assessment, or you can even consider talking to one of their largest customers to get their external viewpoint (if possible).

Wrapping Up

M&A and divestitures can be exciting and stressful at the same time. It is important for the security team to be integrated into both processes and to have documented steps to make sure risks are being assessed and addressed. I’ve listed a few key focus areas above, but most importantly standardizing your M&A security review can help avoid “buyer’s remorse” or creating unnecessary risk to the acquiring business. Finally, having a documented divestiture process and reviewing the divestiture with legal can help avoid security risks after the fact.

Security Theater Is The Worst

We have all been there…we’ve had moments in our lives where we have had to “comply” or “just do it” to meet a security requirement that doesn’t make sense. We see this throughout our lives when we travel, in our communities and in our everyday jobs. While some people may think security theater has merit because it “checks a box” or provides a deterrent, in my opinion security theater does more harm than good and should be eradicated from security programs.

What Is Security Theater?

The term security theater was coined by Bruce Schneier and refers to the practice of implementing security measures in the form of people, processes or technologies that give the illusion of improved security. In practical terms, this means there is something happening, but what that something is and how it actually provides any protection is questionable at best.

Examples Of Security Theater

Real-life examples of security theater can be seen all over the place, particularly when we travel. The biggest travel security theater is related to liquids. TSA has a requirement that you can’t bring liquids through security unless they are in containers of 3.4 ounces or smaller. However, you can bring a bottle of water through if it is fully frozen…what? Why does being frozen matter? What happens if I bring 100 separate 3.4-ounce shampoo bottles through security? I still end up with the same volume of liquid and security has done nothing to prevent me from bringing the liquid through. As for water, the only thing that makes sense for why they haven’t relaxed this requirement is to prop up the businesses in the terminal that want to sell overpriced bottles of water to passengers. Complete theater.

“Security theater is the practice of implementing security measures that give the illusion of improved security.”

Corporate security programs also have examples of security theater. This can come up if you have an auditor that is evaluating your security program against an audit requirement and they don’t understand the purpose of the requirement. For example, an auditor may insist you install antivirus on your systems to prevent viruses and malware, when your business model is to provide Software as a Service (SaaS). With SaaS your users are consuming software in a way that nothing is installed on their end user workstations and so there is little to no risk of malware spreading from your SaaS product to their workstations. Complete theater.

Another example of security theater is asking for an attestation that a team is meeting a security requirement instead of designing a process or security control that actually achieves the desired outcome. In this example, the attestation is nothing more than a facade designed to pass accountability from the security team, which should be designing and implementing effective controls, to the business team. It is masking ineffective processes and technologies. Complete theater.

Lastly, a classic example of security theater is security by obscurity. Otherwise known as hiding in plain sight. If your security program is relying on the hope that attackers won’t find something in your environment then prepare to be disappointed. Reconnaissance tools are highly effective and with enough time threat actors will find anything you are trying to hide. Hope is not a strategy. Complete theater.

What Is The Impact Of Security Theater?

Tangible And Intangible Costs

Everything we do in life has a cost and this is certainly true with security theater. In the examples above there is a real cost in terms of time and money. People who travel are advised to get to the airport at least two hours early. This cost results in lost productivity, lost time with family and decreased self care.

In addition to tangible costs like those above, there are also intangible costs. If people don’t understand the “why” for your security control, they won’t be philosophically aligned to support it. The end result is security theater will erode confidence and trust in your organization, which will undermine your authority. This is never a place you want to be as a CISO.

Some people may argue that security theater is a deterrent because the show of doing “security things” will deter bad people from doing bad things. This sounds more like a hope than reality. People are smart. They understand when things make sense and if you are implementing controls that don’t make sense they will find ways around them or worse, ignore you when something important comes up.

With any effective security program the cost of a security control should never outweigh the cost of the risk, but security theater does exactly that.

Real Risks

The biggest problem with security theater is it can give a false sense of security to the organization that implements it. The mere act of doing “all the things” can make the security team think they are mitigating a risk when in reality they are creating the perfect scenario for a false negative.

How To Avoid Security Theater?

The easiest way to avoid security theater is to have security controls that are grounded in sound requirements and to establish metrics to evaluate their effectiveness. Part of your evaluation should compare the cost of the control with the cost of the risk. If your control costs more than the risk then it doesn’t make sense and you shouldn’t do it.
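One common way to put numbers on that comparison is annualized loss expectancy: the expected cost of a single incident multiplied by how often you expect it to occur per year. The sketch below is a minimal illustration, assuming you can estimate those figures. It compares the control’s annual cost with the amount of expected loss the control removes, which is a slightly stricter test than comparing it with the total risk. The function names and dollar amounts are hypothetical.

```python
def annualized_loss(single_loss: float, occurrences_per_year: float) -> float:
    """Expected yearly loss: cost of one incident times expected incidents per year."""
    return single_loss * occurrences_per_year


def control_is_worthwhile(loss_before: float, loss_after: float, annual_control_cost: float) -> bool:
    """A control earns its keep only if it removes more expected loss than it costs."""
    return (loss_before - loss_after) > annual_control_cost


# Illustrative estimates only.
before = annualized_loss(single_loss=200_000, occurrences_per_year=0.5)  # no control in place
after = annualized_loss(single_loss=200_000, occurrences_per_year=0.1)   # with the control

print(control_is_worthwhile(before, after, annual_control_cost=50_000))   # cheaper than the risk it removes
print(control_is_worthwhile(before, after, annual_control_cost=120_000))  # costs more than the risk it removes
```

The estimates will always be rough, but writing them down forces the conversation about whether a control is reducing risk or just adding cost.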

The other way to avoid security theater is to exercise integrity. Don’t just “check the box” and don’t ask the business you support to check the box either. Take the time to understand requirements from laws, regulations and auditors to determine what the real risk is. Figure out what an effective control will be to manage that risk and document your reasoning and decision.

The biggest way to avoid security theater is to explain the “why” behind a particular security control. If you can’t link it back to a risk or business objective and explain it in a way people will understand then it is security theater.

Can we stop with all the theater?

Using Exceptions As A Discovery Tool

Security exceptions should be used sparingly and should be truly exceptional circumstances that are granted after the business accepts a risk. In mature security programs the security exceptions process is well defined and has clear criteria for what will and will not meet the exception criteria. In mature programs exceptions should be the exception, not the norm. However, in newer security programs exceptions can be a useful tool that provides discovery as well as risk acceptance.

Maturing A Security Program

One of the first things a new CISO will need to do is understand the business and how it functions. As part of this process the CISO will need to take an inventory of the current state of things so he or she can begin to form a strategy on how to best manage risk. As a new CISO your security program may not have well defined security policies and standards. As you begin to define your program and roll out these policies, the exception process can be a valuable tool that gives the perception of a choice, while allowing the security team to uncover areas of the business that need security improvement. Over time, as the business and security program mature, the CISO can gradually deny any requests to renew or extend these exceptions.

Rolling Out A New Security Process

Another area where an exceptions process is useful is the rollout of a new security process. For example, if you are rolling out a new process that will require teams to perform SAST and DAST scanning of their code and fix vulnerabilities before going into production, then allowing security exceptions during the initial rollout can give teams more time to adapt their development processes to incorporate the new security process. Allowing exceptions can foster good will with the development team and give the security function visibility into the behavior and culture of the rest of the business. This gives the security function and development team the opportunity to collaborate, with the ultimate goal of removing any exceptions and following the process to reduce risk to the business.

Tackling Security Tech Debt or Shadow IT

A common maturity evolution for companies is the elimination of shadow IT. The security function can assist with the elimination of shadow IT by creating an exception process and allowing an amnesty period where the business is allowed to continue to operate their shadow IT as long as it is declared. In reality you are giving the business the perception that they will be granted an exception when they are really giving the security function visibility into things they wouldn’t otherwise know about. This can be a useful tool to discover and eliminate policy exceptions as long as it is used sparingly and with good intent (not punitively).

Documentation Is Key

No matter how you choose to use exceptions within your security program there are a few best practices to follow.

Exceptions should be truly exceptional. If you do grant one for discovery purposes make sure there is a plan to close the exception. Exceptions shouldn’t be the rule and they shouldn’t be expected. Sometimes the rest of the business just needs someone to tell them no.
Time box the exception. Don’t just grant an exception without some sort of end date. The business needs to know an exception is temporary and there should be a well defined plan to make improvements and close the exception. The security team should grant a reasonable amount of time to execute that plan, but it shouldn’t be a never ending story.
Review often. Security exceptions should be reviewed often. Part of your security program should review the open exceptions, which ones are ending, if there are patterns where there are lots of similar exceptions and if there are teams who request a high volume of exceptions. Reviewing exceptions gives you insight into how well security processes and controls are working. It also gives you insight into which parts of the business need help.
Require the business owner to sign off. The reality of a well run security program is the business ultimately owns the decision if they want to accept a risk or not. The CISO makes a recommendation, but they don’t own the business systems or processes. As a result, the security exception process should require the business owner to sign off on any exception. This will ensure there is documentation that they were made aware of the risk, but this can also act as a visibility tool for the business owner into their own teams. I’ve often found a business leader is not always aware of what their teams are doing at the tactical level and the exceptions process can provide them the opportunity to check their team and correct behavior before it gets to the CISO.
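The practices above can be sketched as a small exception register. This is a minimal illustration rather than a prescribed design: the `SecurityException` record, its fields, and the sample entries are all hypothetical, and most teams would keep this data in a ticketing or GRC tool rather than in code.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class SecurityException:
    """Hypothetical record for one granted exception."""
    team: str
    control: str          # the requirement the team is excepted from
    business_owner: str   # the person who signed off on the risk
    expires: date         # every exception is time boxed


def expiring_soon(exceptions, today, window_days=30):
    """Exceptions that end within the review window and need a closure check."""
    cutoff = today + timedelta(days=window_days)
    return [e for e in exceptions if today <= e.expires <= cutoff]


def overdue(exceptions, today):
    """Exceptions past their end date that were never closed."""
    return [e for e in exceptions if e.expires < today]


def exceptions_per_team(exceptions):
    """Count exceptions by team to spot teams that may need help."""
    return Counter(e.team for e in exceptions)


# Illustrative entries only.
register = [
    SecurityException("payments", "SAST scan before release", "A. Owner", date(2024, 6, 15)),
    SecurityException("payments", "MFA on admin console", "A. Owner", date(2024, 12, 1)),
    SecurityException("marketing", "Approved SaaS tools only", "B. Owner", date(2024, 5, 1)),
]
today = date(2024, 6, 1)

print([e.control for e in expiring_soon(register, today)])
print([e.control for e in overdue(register, today)])
print(exceptions_per_team(register).most_common())
```

Making the end date and business owner required fields means an exception cannot be recorded without a time box and a sign-off, and the per-team counts give you the pattern view described in the review step.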

Wrapping Up

The exception process can be a valuable tool for discovery of hidden risk throughout the business. By offering an amnesty period and giving the perception of flexibility, the security team can foster good will with the business while gaining valuable visibility into areas that may be hidden. The exception process also is a valuable tool for the security program to document risk acceptance by the applicable business owner, but can also provide business owners visibility into how well their team is meeting security requirements. Lastly, as the security program matures, the security team can gradually require the business to close down the exceptions by improving their security posture.