Navigating The First 90-180 Days In A New CISO Role

Late one Friday afternoon a call comes in and you find out you landed your next CISO role. All the interview prep, research, networking and public speaking has paid off! Then it dawns on you that you could be walking into a very difficult situation over the next few months. Even though the interview answered a lot of questions, you won’t know the reality of the situation until you start. How will your expectations differ from reality? What can you do to minimize risk as you come up to speed? How should you navigate these first 90-180 days in your new role?

Prior To Starting

Let’s assume you have some time to wind down your current position and you are also going to take some time off before starting the new role. During this transition period I highly advise reaching out to your future peers at the new company and asking questions to get more detail about the top challenges and risks you will need to address. Start with the rest of the C-Suite, but also get time with board members and other senior business leaders to get their perspectives. Focus on building rapport, but also gather information that builds on what you learned during the interview process so you can hit the ground running.

You can also use this time to reach out to your CISO peers in your network who are in the same industry, vertical or company type to get their perspective on what they did when they first joined their company. Learn from their experience and try to accelerate your journey once you start. Keep the lines of communication open so if you run into a situation you are unsure of you can ask for advice.

Once You Start

Build Relationships

First and foremost, start building relationships as quickly as possible. Target senior leadership first, such as board members, the C-Suite and other senior leaders. Work your way down by identifying key influencers and decision makers throughout the org. Play the “new person card” and ask questions about anything and everything. Gain an understanding of the “operational tempo” of the business such as when key meetings take place (like board meetings). Understand the historical reasons why certain challenges exist. Understand the political reasons why challenges persist. Understand the OKRs, KPIs and other business objectives carried by your peers. Learn the near and long term strategy for the business. Start building out a picture of what the true situation is and how you want to begin prioritizing.

Understand the historical reasons why certain challenges exist. Understand the political reasons why challenges persist.

Plan For The Worst

Don’t be surprised if you take a new role and are immediately thrown into an incident or other significant situation. You may not have had time to review playbooks or processes, but you can still fall back on your prior experience to guide the team through this event and learn from it. Most importantly, you can use this experience to identify key talent and let them lead, while you observe and take notes. You can also use your observation of the incident to take notes on things that need to be improved such as interaction with non-security groups, when to inform the board, how to communicate with customers or how to improve coordination among your team.

Act With Urgency

Your first few months in the role are an extremely vulnerable period for both you and the company. During this period you won’t have a full picture of the risks to the business and you may not have fully developed your long-term plan. Despite these challenges, you still need to act with urgency to gain an understanding of the business and the risk landscape as quickly as possible. Build on the existing program (if any) to document your assumptions, discoveries, controls and risks so you can begin to litigation-proof your org. Map the maturity of your security controls to an industry framework to help inform your view of the current state of risk at the company. Begin building out templates for communicating your findings, asks, etc. to both the board and your peers. Most importantly, the company will benefit from your fresh perspective, so be candid about your findings and initial recommendations.
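To make the control-to-framework mapping concrete, here is a minimal sketch of the kind of maturity inventory I mean. The control names and the 0-4 scale are illustrative placeholders, not an official framework mapping:

    # Minimal sketch: score inherited controls against a framework to expose gaps.
    # Control names and the 0-4 maturity scale are illustrative, not official.
    from statistics import mean

    # 0 = missing, 1 = ad hoc, 2 = documented, 3 = measured, 4 = optimized
    controls = {
        "Identify: asset inventory": 1,
        "Protect: access control": 2,
        "Detect: security monitoring": 3,
        "Respond: incident response plan": 2,
        "Recover: backup and restore": 0,
    }

    for name, score in sorted(controls.items(), key=lambda kv: kv[1]):
        flag = "  <-- priority gap" if score <= 1 else ""
        print(f"{score}/4  {name}{flag}")

    print(f"Overall maturity: {mean(controls.values()):.1f}/4")

Even a rough inventory like this gives you a defensible, documented starting point for the prioritization conversations that follow.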

Evaluate The Security Org

In addition to the recommendations above, one of the first things I like to do is evaluate the org I have inherited. I try to talk to everyone and answer a few questions:

  1. Is the current org structure best positioned to support the rest of the business?
  2. How does the rest of the business perceive the security org?
  3. Where do we have talent gaps in the org?
  4. What improvements do we need to make to culture, diversity, processes, etc. to optimize the existing talent of the org?

Answering these questions may require you to work with your HR business partner to build out new role definitions and career paths for your org. You may also need to start a diversity campaign or a culture improvement campaign within the security org. Most importantly, evaluate the people in your org to see if you have the right people in the right places with the right skillsets.

A Plan Takes Shape

As you glide past the 90-day mark and start establishing your position as a trusted business partner, you should arrive at a point where a clear vision and strategy is starting to take shape. Use the information you have gathered from your peers, your program documentation and your observations to start building a comprehensive plan and strategy. I’ve documented this process in detail here. In addition to building your program plan you can also begin to more accurately communicate the state of your security program to senior leaders and the board. Show how much the existing program addresses business risk and where additional investment is needed. I’ve documented a suggested process here. Somewhere between your 90- and 180-day marks you should have a formalized plan for where you are over-invested, under-invested or need to make changes to optimize existing investment. This could include restructuring your org, buying a new technology, adjusting contractual terms or purchasing short-term cyber insurance. It could even include outsourcing key functions of the security org for the short term, until you can get the rest of your program up to a certain standard. Most importantly, document how you arrived at key decisions and priorities.

Take Care Of Yourself

Lastly, on a personal note, make sure to take care of yourself. Starting a new role is hectic and exciting, but it is also a time when you can quickly overwork yourself. Remember, building and leading a successful security program is a marathon, not a sprint. The work is never done. Get your program to a comfortable position as quickly as possible by addressing key gaps so you can avoid burning yourself out. Try to establish a routine that allows for physical and mental health and communicate your goals to your business partners so they can support you.

During this time (or the first year) you may also want to minimize external commitments like dinners, conferences and speaking engagements. When you start a new role everyone will want your time and attention, but be cautious and protective of your time. While it is nice to get a free meal, these dinners can often take up a lot of time for little value on your end (you are the product after all). Most companies have an active marketing department that will ask you to engage with customers and the industry. Build a good relationship with your marketing peers to interweave customer commitments with industry events so you are appropriately balancing your time and attending the events that will be most impactful for the company, your network and your career.

Wrapping Up

Landing your next CISO role is exciting and definitely worth celebrating. However, the first 90-180 days are critical to gain an understanding of the business, key stakeholders and how you want to start prioritizing activities. Most importantly, build relationships, act with urgency and document everything so you can minimize the window of exposure as you are coming up to speed in your new role.

Build A Proactive Security Program By Focusing On The Fundamentals

A common topic at security conferences, CISO dinners and networking events is: “how are you preparing your program for new and upcoming regulations?” For CISOs, this conversation is a way to exchange ideas, gather information and compare programs. Unfortunately, CISOs often express feeling underprepared for shifts in the regulatory landscape, leaving them scrambling to meet new requirements. I’m sure this feeling has existed since the first CISO role was created and has persisted through SOX, PCI-DSS, HIPAA, GDPR, DORA and CMMC. If you have ever felt your program could be better prepared for new challenges or are looking to be more proactive, then this post is for you. The goal is to prepare your security program so well that any new challenge is a non-event, and I fundamentally believe there are lots of things CISOs can do with their security programs to achieve this goal.

What Causes Programs To Be Reactive?

Underfunding

There are several issues that can cause a security program to be reactive, and understanding the problem is the first step to overcoming it. One of the most common issues with any security program is underfunding. Underfunding a security program can have ripple effects on staff, technology, risk management and compliance activities. Underfunding can be a conscious choice of the business, but more often it is the result of the CISO failing to articulate or demonstrate how the security program creates value for the business. If you can’t link your security program back to business objectives and risk, then your program is falling short. When a program is underfunded it can’t innovate or gain breathing room. As a result, the program will be in a perpetual state of reactivity, constantly responding to the next problem that comes up.

Poor Understanding Of Risk

“But wait!” you say. “My program is well funded. I have the staff and technology I need, but we are still reactive.” This can happen for a few other reasons, such as your program having a poor understanding of the business’s risk landscape. At a basic level this means documenting your program, controls, policies, exceptions and strategy so you are in lockstep with what the business is trying to accomplish. The culture of the security program should be “help me say yes to your security ask”, instead of always saying no.

Thoroughly understanding the risk landscape for the business, including where your security program effectively manages that risk and where the business can take on more risk, is critical to helping the business operate, expand and be successful. If you haven’t mapped your program to risk, then your program will always be reactive because you will have to re-evaluate changing business conditions from scratch each time, slowing down the business and pulling resources from other areas.

Shiny Thing Syndrome

One final reason your security program can be reactive is shiny thing syndrome. This is where someone in the org (it can be you, the CTO, the CEO, etc.) is constantly enamored with new technology, things they read in Harvard Business Review or whatever they think is “cool”. This means your program will constantly lurch from thing to thing without ever gaining momentum. It also means that instead of following a clear and well laid out strategy and roadmap, your program will hop around and never achieve success. The best way to counter shiny thing syndrome is a well-documented program with a clear understanding of where you are and where you are going.

Shifting To Become Proactive

So the big question is: how do you shift your program to become proactive? We can talk about a lot of ideas like automation, AI, processes, etc., but I truly believe the core of any security program should be the fundamentals, and by focusing on these fundamentals you can stop being reactive.

Don’t Practice During The Game

Here is an analogy that I like to use for what a proactive security program means. Imagine you are learning to play baseball. You could go out into the field, look around and hope the ball doesn’t get hit to you. Worse, you could have no idea which way to face, what to do with the glove or even how to win the game. You are just standing there… waiting to react to whatever happens and hoping to figure it out. This is a security program that hasn’t mastered the fundamentals.

However, hope is not a strategy, and you shouldn’t be practicing your skills during the game. You should practice the skills you need before the game, honing them over and over until they become instinctive, allowing you to proactively shift your strategy during the game. This is what a proactive security program can do. By focusing on the fundamentals, like knowing what you have, where it is and what its status is, you won’t have to scramble to figure these things out when a new regulation comes out or a new incident hits. By thoroughly documenting your program against an industry standard framework and continually measuring compliance and risk against that framework, you will eventually master the fundamentals and become proactive. Focusing on and mastering the fundamentals allows you to continually refine your program so you can anticipate where the business, industry and regulatory environment is going. In fact, any changes in the business, industry or regulatory environment should be a non-event because your program is so robust that you can help the business take on and manage whatever new risk comes up.

Wrapping Up

Next time you are faced with a challenging incident, new regulation, new compliance activity or are at odds with the business, ask yourself if your program has mastered the fundamentals. Do an honest assessment of your program, conduct a retrospective of past activities and assess where you need to improve. Find new ways to articulate the value of your program and link your program back to business risk so you can get the funding and support you need. By mastering the fundamentals you are mastering important skills when it doesn’t matter, so you can be proactive and anticipate events before they matter.

Should Security Be An Approver For IT and Business Requests?

Over the course of my career I have consistently seen security in the approval chain for various IT operations and business requests, such as identity, network and even customer contracts. Having security in the approval chain may seem logical at first glance, but it can actually mask or exacerbate underlying operations issues. Having a second set of eyes on requests can make sense to provide assurance, but the question is – does it make sense for security to be an approver?

Understand The Scope of Your Security Program

First and foremost, the scope of your security program will ultimately dictate how and when security should be included in an approval process. For example, if security owns networking or identity, then it will make sense to staff an operations team to support these areas and it will make sense to have security as an approver for requests related to these functions.

It may also make sense to include security in the approval chain as an evaluator of risk for functions security doesn’t own. For example, security won’t own the overall contract, finance or procurement processes, but they should be included as an approver to make sure contract terms and purchases align to security policies and are not opening up the business to unnecessary risk. They can also be included in large financial transfers as a second set of eyes to make sure the business isn’t being scammed out of money. In these examples, security is creating good friction to slow critical processes down in a healthy way to make sure they make sense and to use time as a defense mechanism.

Other Benefits Of Security As An Approver

Including security as an approver for general IT processes can have other benefits, but these need to be weighed carefully against the risks and overall function of the business. For example, security can help provide an audit trail for approving activities that may create risk for the company. This audit trail can be useful during incident investigations to determine root cause for an incident. It can also help avoid compliance gaps for things like FedRAMP, SOC, etc. where some overall business or IT changes need to be closely managed to maintain compliance. However, creating an audit trail is not unique to the security function and, if the process is properly designed, can be performed by other functions as well.

Another advantage of including security in the approval chain is separation of duties. For example, if the team that owns identity requests elevated privileges to something, it presents a conflict of interest if they approve their own access request. Instead, security often acts as a secondary reviewer and approver, providing separation of duties so the team that owns a resource isn’t approving its own requests.
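As a rough sketch of how this separation-of-duties rule might be encoded in an approval workflow (the field names, teams and requests here are hypothetical):

    # Hypothetical sketch of a separation-of-duties check: neither the requester
    # nor the team that owns the resource may approve the request.
    def can_approve(request: dict, approver: str, approver_team: str) -> bool:
        if approver == request["requester"]:
            return False  # no self-approval
        if approver_team == request["owning_team"]:
            return False  # owning team cannot approve its own elevation
        return True

    request = {"requester": "alice", "owning_team": "identity-eng",
               "action": "grant admin on the IdP"}
    print(can_approve(request, "bob", "identity-eng"))  # False: approver owns the resource
    print(can_approve(request, "carol", "security"))    # True: independent reviewer

The point is the rule itself, not who enforces it: a well-designed workflow can route the approval to any independent function, not just security.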

Where Including Security As An Approver Can Go Wrong

The biggest issue with having security in the approval chain for most things is that they typically are not the owner of those things. If approval processes are not designed properly (with other approvers besides security), then the processes can confuse ownership and give a false impression of security or compliance. For example, I typically see security as an approver for identity and access requests when security doesn’t own the identity function. At first glance, this seems to make sense because identity is a critical IT function that needs to be protected. However, if security doesn’t own the identity function (or the systems that need access approved), how do they know whether the request should be approved or not? Instead, what happens is almost all requests end up being approved (unless they are egregious) and the process serves no real purpose other than creating unnecessary friction and giving a false sense of security.

Another issue I have seen with including security in the approval chain is they effectively become “human software” where they are manually performing tasks that should be automated instead. Using security personnel as “middleware” masks the true pain and inefficiency of the process for the process owner. This takes critical human capital away from their intended purpose, is a costly solution to a problem and opens up the business to additional risk.

When Does It Make Sense For Security To Be An Approver?

I’ve listed a few examples where it makes sense for security to be an approver for things it doesn’t own – like large financial transactions, some procurement activities and security specific contract terms. However, I argue security shouldn’t be included as an approver in most IT operations processes unless security actually owns that process or thing that needs a specific security approval. Instead, the business owner of the thing should be the ultimate approver and processes should be designed to provide appropriate auditing and compliance, but without needing security personnel to perform those checks manually.

One of the few areas where it will always make sense to have security as an approver is security exceptions. First, exceptions should be truly exceptional and not used as a band-aid for broken or poorly designed processes. Second, exceptions should be grounded in business risk, while documenting the evaluation criteria, decisions, associated policies and duration. This is a core security activity because exceptions are ultimately about evaluating risk and deviation from policy. I’ve written other posts about how the exception process can have other benefits as well.
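As a sketch of what a well-formed exception record could capture (the fields and values are hypothetical, but note the mandatory duration, so exceptions always expire and get re-reviewed):

    # Hypothetical structure for a security exception record, capturing the
    # elements named above: criteria, decision, associated policy and duration.
    from dataclasses import dataclass
    from datetime import date, timedelta

    @dataclass
    class SecurityException:
        requester: str
        policy: str           # the policy being deviated from
        business_risk: str    # risk evaluation in business terms
        decision: str         # outcome, and who made the call
        granted: date
        duration_days: int    # exceptions must expire, never open-ended

        def expires(self) -> date:
            return self.granted + timedelta(days=self.duration_days)

    exc = SecurityException(
        requester="payments team",
        policy="TLS 1.2+ required for all internal services",
        business_risk="legacy partner endpoint; mitigated by an IP allowlist",
        decision="approved by CISO",
        granted=date(2024, 8, 1),
        duration_days=90,
    )
    print(f"Re-review due: {exc.expires()}")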

Wrapping Up

Don’t fall into the trap of using your security team as a reviewer and approver for IT operations requests if security doesn’t actually own the thing related to the request. This places the security team in an adversarial position, can be a costly waste of resources, masks process inefficiencies, gives a false sense of security and can open up the business to risk. Instead, be ruthlessly focused in how your security team is utilized to make sure when they are engaged it is to perform a function that is at the core of their mission – protecting the business and managing risk.

Should Companies Be Held Liable For Software Flaws?

Following the CrowdStrike event two weeks ago, there has been an interesting exchange between Delta Airlines and CrowdStrike. In particular, Delta has threatened to sue CrowdStrike to pursue compensation for the estimated $500M of losses allegedly incurred during the outage. CrowdStrike has recently hit back at Delta, claiming the airline’s recovery efforts took far longer than those of its peers and other companies impacted by the outage. This entire exchange prompts some interesting questions about whether a technology company should be held liable for flaws in their software and where that liability should start and end.

Strategic Technology Trends

Software quality, including defects that lead to vulnerabilities, has been identified as a strategic imperative by CISA and the White House in the 2023 National Cybersecurity Strategy. Specifically, the United States wants to “shift liability for software products and services to promote secure development practices” and it would seem the CrowdStrike event falls squarely into this category of liability and secure software development practices.

In addition to strategic directives, I am also seeing companies prioritize speed to market over quality (and even security). In some respects it makes sense to prioritize speed, particularly when pushing updates for new detections. However, there is clearly a conflict in priorities when a company optimizes for speed over quality and a critical detection update causes an impact larger than if the update had not been pushed at all. Modern cloud infrastructure and software development practices prioritize speed to market over all else. Hyperscale cloud providers have made a giant easy button that allows developers to consume storage, network and compute resources without consideration for the downstream consequences. Attempts by the rest of the business to introduce friction, gates or restrictions on these development processes are met with derision and usually followed by accusations of slowing down the business or impeding sales. Security often falls into this category of “bad friction” because they are seen as the “department of no”, but as the CrowdStrike event clearly shows, there needs to be a balance between speed and quality in order to effectively manage risk to the business.

One last trend is the reliance on “the cloud” as the only BCP / DR plan. While cloud companies certainly market themselves as globally available services, they are not without their own issues. Cloud environments still need to follow IT operations best practices by completing a business impact analysis and implementing a BCP / DR plan. At the very least, cloud environments should have a rollback option in order to revert to the last known good state.

…as the CrowdStrike event clearly shows, there needs to be a balance between speed and quality in order to effectively manage risk to the business.

What Can Companies Do Differently?

Companies that push software updates, new services or new products to their customers need to adopt best practices for quality control and quality assurance. This means rigorously testing your products before they hit production to make sure they are as free of defects as possible. CrowdStrike clearly failed to properly test their update due to a claimed flaw in their testing platform. While it is nice to know why the defect made it into production, CrowdStrike still has a responsibility to make sure their products are free from defects and should have had additional testing and observability in place.

Second, for critical updates (like detections), companies feel an imperative to push the update globally as quickly as possible. Instead, companies like CrowdStrike should prioritize customers in terms of industry risk. They should then create a phased rollout plan that stages their updates with a ramping schedule. By starting small, monitoring changes and then ramping up the rollout, CrowdStrike could have limited the impact to a handful of customers and avoided a global event.
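Here is a minimal sketch of what such a staged rollout with a failure gate could look like. The stage fractions, gate threshold and failure rates are illustrative assumptions, not CrowdStrike’s actual process:

    # Hypothetical staged rollout: push an update to progressively larger
    # cohorts, halting if the observed failure rate trips a gate.
    RAMP = [0.01, 0.05, 0.25, 1.00]   # fraction of the fleet per stage
    FAILURE_GATE = 0.001              # abort if >0.1% of hosts fail post-update

    def rollout(total_hosts: int, observed_failure_rates: list[float]) -> None:
        for stage, (fraction, failure_rate) in enumerate(
                zip(RAMP, observed_failure_rates), start=1):
            cohort = int(total_hosts * fraction)
            print(f"Stage {stage}: pushed to {cohort:,} hosts "
                  f"(failure rate {failure_rate:.3%})")
            if failure_rate > FAILURE_GATE:
                print("Gate tripped: halting rollout, rolling back this cohort.")
                return
        print("Rollout complete.")

    # A bad update is caught at stage 1, capping the blast radius at 1% of hosts.
    rollout(total_hosts=1_000_000, observed_failure_rates=[0.05, 0.0, 0.0, 0.0])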

Lastly, companies need to implement better monitoring and BCP / DR for their business. In the case of CrowdStrike, they should have had monitoring in place that immediately detected their products going offline and they should have had the ability to roll back or revert to the last known good state. Going a step further, they could even change the behavior of their software so that instead of causing a kernel panic that crashes the system, the OS recovers gracefully and automatically rolls back to the last known good state. However, the reality is sophisticated logic like this costs money to develop and it is difficult for development teams to justify this investment unless the company has felt a financial penalty for their failures.

Cloud environments still need to follow IT operations best practices by completing a business impact analysis and implementing a BCP / DR plan.

Contracts & Liability

Speaking of financial penalties, the big question is whether or not CrowdStrike can be held liable for the global outage. My guess is this will depend on what it says in their contracts. Most contracts have a clause that limits liability for both sides, so CrowdStrike could certainly face damages within those limits (probably only a few million at most). It is more likely CrowdStrike will face losses from new customers and existing customers that are up for contract renewal. Some customers will terminate their contracts. Others will negotiate better terms or expect larger discounts on renewal to make up for the outage. At most this will hit CrowdStrike for the next 3 to 5 years (depending on contract length) and then the pricing and terms will bounce back. It will be difficult for customers to exit CrowdStrike en masse because it is already a sunk cost and companies won’t want to spend the time or energy to deploy a new technology. Some of the largest customers may have the best terms and ability to extract concessions from CrowdStrike, but overall I don’t think this will impact them for very long and I don’t think they will be held legally liable in any material sense.

Delta Lags Industry Standard

If CrowdStrike isn’t going to be held legally liable, what happens to Delta and their claimed $500M in losses? Let’s look at some facts. First, as CrowdStrike has rightfully pointed out, Delta lagged the world in recovering from this event. They took about 20 times longer to get back to normal operations than other airlines and large companies. This points to clear underinvestment in identifying critical points of failure (their crew scheduling application) and developing sufficient plans to back up and recover if critical parts of their operation failed.

Second, Delta clearly hasn’t designed their operations for ease of management or resiliency. They have also failed to perform an adequate Business Impact Analysis (BIA) or properly test their BCP / DR plans. I don’t know any specifics about their underlying IT operations, but a few recommendations come to mind such as implementing active / active instances for critical services and moving to thin clients or PXE boot for airport kiosks and terminals. Remove the need for a human to touch any of these systems physically, and instead implement processes to remotely identify, manage and recover these systems from a variety of different failure scenarios. Clearly Delta has a big gap in their IT Operations processes and their customers suffered as a result.

Wrapping Up

What the CrowdStrike event highlights is the need for companies to prioritize quality, resiliency and stability over speed to market. The National Cybersecurity Strategy has identified software defects as a strategic imperative because they lead to vulnerabilities, supply chain compromise and global outages. Companies with the size and reach of CrowdStrike can no longer afford to prioritize speed over all else and instead need to shift to a more mature and higher quality SDLC. In addition, companies that use popular software need to consider diversifying their supply chain, adopting IT operations best practices (like SRE) and implementing mature BCP and DR plans on par with industry standards.

What the CrowdStrike event highlights is the need for companies to prioritize quality, resiliency and stability over speed to market.

When it comes to holding companies liable for global outages, like the one two weeks ago, I think it will be difficult for this to play out in the courts without resorting to a legal tit-for-tat that no one wins. Instead, the market and customers need to weigh in and hold these companies accountable through share prices, contractual negotiation or even switching to a competitor. Given the complexity of modern software, I don’t think companies should be held liable for software flaws because it is impossible to eliminate all flaws. Additionally, modern SDLCs and CI/CD pipelines are exceptionally complex and this complexity can often result in failure. This is why BCP / DR and SRE are so important, so you can recover quickly if needed. Yes, CrowdStrike could have done better, but clearly Delta wasn’t even meeting industry standards. Instead of questioning whether companies should be held liable for software flaws, a better question is: at what point does a company become so essential that it becomes, by default, critical infrastructure?

A CISO’s Analysis Of the CrowdStrike Global Outage

Overnight from July 18 to July 19, 2024, Windows systems running CrowdStrike ceased functioning and displayed the blue screen of death (BSOD). As people woke up on the morning of July 19th they discovered a wide-reaching global outage of the consumer services they rely on for their daily lives, such as healthcare, travel, fast food and even emergency services. The ramifications of this event will continue to be felt for at least the next week as businesses recover from the outage and investors react to the realization that global businesses are extremely fragile when it comes to technology and business operations.

Technical Details

An update by CrowdStrike (CS) to the C-00000291*.sys file, timestamped 0409 UTC, was pushed to all customers running CS Falcon agents. This file was corrupt (reports indicate a null byte header issue) and when Windows attempted to load it, the system crashed. Rebooting the impacted systems does not resolve the issue because of the way CS Falcon works. CS Falcon has access to the inner workings of the operating system (kernel), such as memory access, drivers and registry entries, that allow CS to detect malicious software and activity. The CS Falcon agent is designed to receive updates automatically in order to keep the agent up to date with the latest detections. In this case, the update file was not properly tested and somehow made it through Quality Assurance and Quality Control before being pushed globally to all CS customers. Additionally, CrowdStrike customers are clearly running CS Falcon on production systems and do not have processes in place to stage updates to CS Falcon in order to minimize the impact of failed updates (more on this below).

Global Impact

This truly is a global outage and the list of impacted industries is far-reaching, attesting to the success of CS but also to the risks that can impact your software supply chain. As of Monday, Delta Airlines is still experiencing flight cancellations and delays as a result of impacts to their pilot scheduling system. The list of impacted companies can be found here, here and here, but I’ll provide a short list as follows:

Travel – United, Delta, American, major airports

Banking and Trading – VISA, stock exchanges

Emergency & Security Services – Some 911 services and ADT

Cloud Providers – AWS, Azure

Consumer – Starbucks, McDonalds, FedEx

Once the immediate global impact subsides, there will be plenty of finger pointing at CrowdStrike for failing to properly test an update, but what this event clearly shows is a lack of investment by some major global companies in site reliability engineering (SRE), business continuity planning (BCP), disaster recovery (DR), business impact analysis (BIA) and proper change control. If companies were truly investing in SRE, BCP, DR and BIA beyond a simple checkbox exercise, this failed update would have been a non-event. Businesses would have simply executed their BCP / DR plan and failed over, or immediately recovered their critical services to get back up and running (which some did). Or, if they were running proper change control alongside immutable infrastructure, they could have immediately rolled back to the last good version with minimal impact. Clearly, more work needs to be done by all of these companies to improve their plans, processes and execution when a disruptive event occurs.

Are global companies really allowing live updates to mission critical software in production without proper testing? Better yet, production systems should be immutable, preventing any change to production that hasn’t been made in the CI/CD pipeline and then re-deployed. Failed updates became an issue almost two decades ago when Microsoft began Patch Tuesday. Companies quickly figured out they couldn’t trust the quality of the patches and instead would test the patches in staging, a duplicate of the production environment. While this may have created a short window of vulnerability, it came with the advantages of stability and uninterrupted business operations.

Modern day IT operations (called Platform Engineering or Site Reliability Engineering) now designs production environments to be immutable and somewhat self-healing. All changes need to be made in code and then re-pushed through dev, test and staging environments to make sure proper QA and QC is followed. This minimizes impact from failed code pushes and will also minimize disruption from failed patches and updates like this one. SRE also closely monitors production environments for latency thresholds, availability targets and other operational metrics. If the environment exceeds a specific threshold, it throws alerts and will attempt to self-heal by allocating more resources, or by rolling back to the previous known good image.
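A minimal sketch of that control loop, assuming hypothetical metric names and thresholds:

    # Hypothetical SRE reconciliation loop: compare operational metrics
    # against thresholds, then decide to scale out, roll back or do nothing.
    THRESHOLDS = {"error_rate": 0.01, "p99_latency_ms": 500}

    def reconcile(metrics: dict, last_good_image: str) -> str:
        if metrics["error_rate"] > THRESHOLDS["error_rate"]:
            return f"roll back to {last_good_image}"  # likely a failed push
        if metrics["p99_latency_ms"] > THRESHOLDS["p99_latency_ms"]:
            return "scale out: allocate more resources"
        return "healthy: no action"

    # A spike in errors right after a push triggers an automatic rollback.
    print(reconcile({"error_rate": 0.08, "p99_latency_ms": 120}, "img-2024-07-18"))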

Ramifications

Materiality

Setting aside the maturity of business and IT operations, there are some clear ramifications of this event. First, this had a global impact on a wide variety of businesses and services. Some of the biggest impacts were felt by publicly traded companies and as a result these companies will need to make an 8-K filing with the SEC to report a material event to their business. Even though this wasn’t a cybersecurity attack, it was still an event that disrupted business operations and so companies will need to report the expected impact and loss accordingly. CrowdStrike in particular will need to make an 8-K filing, not only for loss of stock value, but for expected loss of revenue through lost customers, contractual concessions and other tangible impacts to their business. When I started this post on the Friday of the event, CS stock was down over 10% and by Monday morning it was down almost 20%. The stock has started to recover, but that is clearly a material event to investors.

Greater Investment In BCP / DR & BIA

Recent events, such as this one and the UHC Change Healthcare ransomware attack, have clearly shown that some businesses are not investing properly in BCP / DR. They may have plans on paper, but plans still need to be fully tested, including rapidly identifying service degradation and implementing recovery operations as quickly as possible. The reality is this should have been a non-event, and any business that was impacted longer than a few hours needs to consider additional investment in their BCP / DR plan to minimize the impact of future events. CISOs need to work with the rest of the C-Suite to review existing BCP / DR plans and update them accordingly based on the risk tolerance of the business and the desired RTO and RPO.

Boards Need To Step Up

During an event like this one, boards need to take a step back and remember their primary purpose is to represent and protect investors. In this case, the sub-committees that govern technology, cybersecurity and risk should be asking hard questions about how to minimize the impact of future events like this and consider whether the existing investment in BCP / DR technology and processes is sufficient to offset a projected loss of business. This may include more frequent reports on when BCP / DR plans were last properly tested and whether those plans properly account for all of the scenarios that could impact the business, such as ransomware, supply chain disruption or global events like this one. The board may also push the executive staff to accelerate plans to invest in and modernize IT operations to eliminate tech debt and adopt industry best practices such as immutable infra or SRE. The board may also insist on a detailed analysis of the risks in the supply chain, including plans to minimize single points of failure while limiting the blast radius of future events.

Negative Outcomes

Unfortunately, this event is likely to cause a negative perception of cybersecurity in the short term for a few different reasons. First, people will be questioning the obvious business disruption. How is it that a global cybersecurity company is able to disrupt so much with a single update? Could this same process act as an attack vector for attackers? Reports are already indicating that malicious domains have been set up to look like the fix for this event, but instead push malware. There are also malicious domains that have been created for phishing purposes and the reality is any company impacted by this event may also be vulnerable to ransomware attacks, social engineering and other follow-on attacks.

Second, this event may cause a negative perception of automatic updates within the IT operations groups. I personally believe this is the wrong reaction, but the reality is some businesses will turn off the auto-updates, which will leave them more vulnerable to malware and other attacks.

The reality is this should have been a non-event and any business that was impacted longer than a few hours needs to consider additional investment in their BCP / DR plan to minimize the impact of future events.

What CISOs Should Do

With all this in mind, what should CISOs do to help the board, the C-Suite and the rest of the business navigate this event? Here are my suggestions:

First, review your contractual terms with 3rd party providers to understand contractually defined SLAs, liability, restitution and other clauses that can help protect your business due to an event caused by a third party. This should also include a risk analysis of your entire supply chain to determine single points of failure and how to protect your business appropriately.

Second, insist on increased investment in your BIA, BCP and DR plans including designing for site reliability and random events (chaos monkey) to proactively identify and recover from disruption, including review of RTO and RPO. If your BCP / DR plan is not where it needs to be, it may require investment in a multi-year technology transformation plan including resolving legacy systems and tech debt. It may also require modernizing your SDLC to shift to CI/CD including dev, test, staging and prod environments that are tightly controlled. The ultimate goal will be to move to immutable infrastructure and IT operations best practices that allow your services to operate and recover without disruption. I’ve captured my thoughts on some of the best practices here.

Third, resist the temptation to overreact. The C-Suite and investors are going to ask some hard questions about your business and they will suggest a wide range of solutions such as turning off auto-patches, ripping out CS or even building your own solution. All of these suggestions have a clear tradeoff in terms of risk and operational investment. Making a poor, reactive decision immediately after this event can harm the business more than it can help.

Finally, for mission critical services consider shifting to a heterogeneous environment that statistically minimizes the impact of any one vendor. The concept is simple: if you need a security technology to protect your systems, consider purchasing from multiple vendors with similar capabilities so that the impact on your business operations is minimized if one of them has an issue. This obviously raises the complexity and operational cost of your environment and should only be used for mission critical or highly sensitive services that need to absolutely minimize any risk to operations. However, this event does highlight the risks of consolidating to a single vendor and you should conduct a risk analysis to determine the best course of action for your business and supply chain.
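The math behind “statistically minimizes” is straightforward. Assuming a fleet split evenly across vendors whose failures are independent, the worst-case blast radius of any single vendor’s bad update shrinks proportionally:

    # Back-of-the-envelope: splitting a fleet across independent vendors caps
    # the blast radius of any single vendor's failed update.
    fleet = 10_000  # hypothetical endpoint count
    for vendors in (1, 2, 3):
        blast_radius = fleet / vendors  # hosts hit if one vendor pushes a bad update
        print(f"{vendors} vendor(s): worst-case single-vendor impact = "
              f"{blast_radius:,.0f} hosts ({1 / vendors:.0%} of fleet)")

The tradeoff, as noted above, is that you now run (and pay for) multiple overlapping toolchains, which is why this only makes sense for the most critical services.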

Wrapping Up

For some companies this was a non-event. Once they realized there was an outage, they simply executed their recovery plans and were back online relatively quickly. For other companies, this event highlighted a lack of investment in IT operations fundamentals like BCP / DR and supply chain risk management. On the positive side, this wasn’t ransomware or another cybersecurity attack, so recovery is relatively straightforward for most businesses. On the negative side, this event can have lasting consequences if businesses overreact and make poor decisions. As a CISO, I highly recommend you take advantage of this event to learn from your weaknesses and make plans to shore up aspects of your operations that were sub-standard.

How Should CISOs Think About Risk?

There are a lot of different ways for CISOs to think about and measure risk, and they can be bucketed into two categories: qualitative measurement, which is a subjective measurement that follows an objective process, and quantitative measurement, which is an objective measurement grounded in dollar amounts. Quantitative risk measurement is what CISOs should strive to achieve, for a few reasons. One, it grounds the risk measurement in objective numbers, which removes people’s opinions from the calculation; two, it assesses risk in terms of dollar amounts, which is useful for communicating to the rest of the business; and three, it can highlight areas of immaturity across the business if divisions are unable to quantify how they contribute to the overall bottom line of the company. In this post I want to explore how CISOs should think about quantitatively measuring risk and, in particular, measuring mitigated, unmitigated and residual risk for the business.

Where should you start?

A good place to start is with an industry standard risk management framework like NIST 800-37, CIS RAM or ISO 31000; for the purposes of this post I’ll stick with NIST 800-37 to be consistent. In order for CISOs to obtain a quantitative risk assessment from NIST 800-37, they need to add a step to the categorize phase by working with finance and the business owners to understand the P&L of the system(s) they are categorizing. The first step is to go through every business system and get a dollar amount (in terms of revenue) for how much the system(s) contribute to the overall bottom line of the business.

Internal and External Security Costs

After you get a revenue dollar amount for every set of systems, you need to move to the assess stage of the NIST 800-37 RMF to determine which security controls are in place to protect the systems, how much they cost and ultimately what percentage of coverage the security controls provide. There are two categories of security controls and costs you will need to build a model for. The first category is internal costs, which includes:

  • Tooling and technology
  • Licenses
  • Training
  • Headcount (fully burdened cost)
  • Travel
  • R&D
  • Technology operating costs (like cloud costs directly attributable to security tooling, etc.)

The second category is external costs, which includes:

  • 3rd party penetration tests
  • Audits
  • Managed Security Service Provider (MSSP) costs
  • Insurance

As you fill in the costs or annual budget for each of these items you can map the coverage of these internal and external costs to your business to determine the total cost of your security program and how much risk the program is able to cover (in terms of a percentage).

Mapping Risk Coverage

Once you have all of these figures you can start to map risk coverage to determine if your security program is effectively protecting the business. Let’s say your business generates $1B in annual revenue. Your goal as a CISO is to maintain a security program that provides $1B of risk coverage of the business. Or, if you are unable to provide total coverage, then you need to communicate which parts of the business are not protected so the rest of the C-Suite and board can either accept the risk or approve additional funding.

As a simple example, let’s say you spend $1M/year on a SIEM tool, which takes 6 people to operate and maintain. The total cost of the 6 people is approximately $6M/yr (including benefits, etc.). The SIEM and people provide 100% monitoring coverage for the business, and the SIEM and people can be mapped to 20% of your security controls in NIST. I’m skipping a lot of details for simplicity, but for a $1B business this means your SIEM function costs $7M/yr but protects $200M of revenue ($1B x 20%). As you map the other tools, processes, people, etc. back to the business, you will get a complete picture of how much risk your security program is managing and can make informed recommendations about your program to the board.
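Here is the same arithmetic as a quick sketch (the figures are the illustrative ones from the example above, not benchmarks):

    # Sketch of the risk-coverage arithmetic from the SIEM example above.
    annual_revenue = 1_000_000_000   # $1B business
    siem_tooling = 1_000_000         # $1M/yr SIEM licensing
    siem_staff = 6_000_000           # 6 people, fully burdened
    control_coverage = 0.20          # SIEM + staff map to 20% of controls

    function_cost = siem_tooling + siem_staff
    revenue_protected = annual_revenue * control_coverage

    print(f"SIEM function cost:  ${function_cost:,}/yr")
    print(f"Revenue protected:   ${revenue_protected:,.0f}")
    print(f"Cost per $ of coverage: ${function_cost / revenue_protected:.3f}")

Repeating this for each function gives you the cost-per-dollar-of-coverage comparisons that drive the investment conversations described below.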

For example, you may find your security program costs $100M/year, but is only able to manage risk for $750M (75%) of the business. Your analysis should clearly articulate whether this remaining 25% of risk is residual (will never go away and is acceptable) or is unmanaged and needs attention.

Complete The Picture

By mapping your security program costs to the percentage of controls they cover and then mapping those controls to the business, CISOs should be able to get an accurate picture of the effectiveness of their security program. By breaking out the security program costs into the internal and external categories I’ve listed above, they can also compare and contrast the costs to the total amount of risk to determine which investments yield the best value. These analyses can be extremely effective when having conversations with the rest of the C-Suite or board, who may be inclined to decline additional budget requests or subjectively recommend a solution. By informing these stakeholders of the cost per control and the risk value of that cost, you can help them support your recommendation for additional investment to help increase risk management coverage or to help increase the value of risk management provided by the security program.

The following chart is an example of what this analysis can yield.

Once you have this data and analysis you can start driving conversations with the rest of the C-Suite and the board to inform them of how much risk is being managed, how much is residual, how much risk is unmanaged and your recommendation for additional investment (or acceptance). These conversations can also benefit from further analysis such as the ratio of cost to managed risk to determine which investment is providing the best value and ultimately support your recommendation for how the company should manage this risk going forward (people or technology).

Wrapping Up

Managing P&L is a fundamental skill for all CISOs to master and can help drive conversations across the company for how risk is being managed. CISOs need to master skills in financial analysis and partner with other parts of the business like business operations or business owners to understand how the business operates and what percentage of the business is effectively covered by the existing security program. The results of this analysis will help CISOs shape the conversation around risk, investment and ultimately the strategic direction of the business.

Should CISOs Be Technical?

Don’t want to read this? Watch a video short of the topic here.

There are a lot of different paths to becoming a CISO and everyone’s journey is different; however, two of the most common paths are coming up through the technical ranks or transitioning over from the compliance function. Coming up through the technical ranks is common because cybersecurity is a technically heavy field, particularly when attempting to understand the complexities of how exploits work and the best way to defend against attackers. Coming up through the compliance ranks is also common because companies are often focused on getting a particular compliance certification in order to conduct business and interact with customers. Each of these paths offers advantages and disadvantages, but I will argue being technical is more challenging than some of the softer cybersecurity disciplines like compliance, which leads to a common question – do CISOs need to be technical?

Yes, but…

If you don’t want to read any further, the short answer is yes, CISOs need to be technical. The longer answer is that being technical is a necessary but insufficient characteristic of a well-rounded CISO. The reason being technical is insufficient is that for the past few years the CISO role at public companies has been transforming from a technical role to a business savvy executive role. CISOs are expected to report to the board, which requires speaking the language of business, risk and finance. I have seen CISOs quickly lose their audience in board meetings when they start talking about tooling, vulnerabilities and detailed technical aspects of their security program. CISOs need to be able to translate their security program into the language of risk and they need to be savvy enough to weave in financial and business terminology that the board and other C-Suite executives will understand.

Obtain (and maintain) A Technical Grounding

Even though being technical is no longer sufficient for a well-rounded CISO, it is important for a CISO to obtain or maintain a technical grounding. A technical grounding will help the CISO translate technical concepts (like vulnerabilities and exploits) into higher level business language like strategy, risk or profit and loss (P&L). It is also important for a CISO to understand technical concepts so they can dig in when needed to make sure their program is on track or controls are operating effectively. Lastly, it is important to maintain technical credibility with other technical C-Suite stakeholders like the CTO and CIO. Speaking their language will help align these powerful C-Suite members with your security program, and they can then lend critical support when you are making asks of the rest of the C-Suite or board.

What other skills does a CISO need?

In addition to a technical grounding, there are a number of skills CISOs need to master in order to be effective in their role. The following is a short list of skills CISOs need to have in order to be successful at a public company:

  • Executive presence and public speaking skills with the ability to translate security concepts into business risk that resonates with senior executives and the board
  • Ability to lead and communicate during a crisis
  • Politically savvy, with ability to partner with and build alliances with other parts of the business
  • Ability to understand the core parts of the business, how they operate and what their strategy is
  • Ability to explain the “value” of your security program in business and financial terms
  • Strong understanding of financial concepts such as CAPEX, OPEX, P&L, budgeting and ability to understand balance sheets, earning results and SEC filings
  • Understand and navigate legal concepts (such as privilege), regulations and compliance activities with the ability to map these concepts back to your security program or testify in court (if needed)
  • Ability to interact with auditors (when needed) to satisfy compliance asks or guide responses
  • Ability to interact with customers to either reassure them about the maturity of your security program or act as an extension of the sales team to help acquire new customers
  • Interact with law enforcement and other government agencies, depending on the nature of the business

If this seems like a long list that doesn’t fit your concept of what a CISO does, then you may have some weaknesses you need to work on. This list also reflects the evolving nature of the CISO role, particularly with respect to board interaction and leadership at public companies. More importantly, a lot of these concepts are not covered in popular security certifications and you definitely won’t get all of this experience from startups or non-public companies. That is ok, because recognizing and acknowledging your weaknesses is the first step to becoming a better CISO.

Navigating Hardware Supply Chain Security

Lately, I’ve been thinking a lot about hardware supply chain security and how the risks and controls differ from software supply chain security. As a CSO, one of your responsibilities is to ensure your supply chain is secure, yet the distributed nature of our global supply chain makes this a challenging endeavor. In this post I’ll explore how a CSO should think about the risks of hardware supply chain security, how they should think about governing this problem and some techniques for implementing security assurance within your hardware supply chain.

What Is Hardware Supply Chain?

The hardware supply chain covers the manufacturing, assembly, distribution and logistics of physical systems. This includes the physical components and the underlying software that come together to make a functioning system. A real world example could be something as complex as an entire server or something as simple as a USB drive. Your company can be at the start of the supply chain by sourcing and producing raw materials like copper and silicon, in the middle of the supply chain producing individual components like microchips, or at the end of the supply chain assembling and integrating components into an end product for customers.

What Are The Risks?

There are a lot of risks when it comes to the security of hardware supply chains. Hardware typically has longer lead times and a longer shelf life than software. This means compromises can be harder to detect (due to all the stops along the way) and can persist for a long time (e.g. decades in cases like industrial control systems). It can be extremely difficult or impossible to mitigate a compromise in hardware without replacing the entire system (or requiring downtime), which is costly to a business or deadly to a mission-critical system.

Physical or logical compromise can happen in two ways – interdiction and seeding. Both involve physically tampering with a hardware device, but they occur at different points in the supply chain. Seeding occurs during the physical manufacture of components and involves someone inserting something malicious (like a backdoor) into a design or component. Insertion early in the process means the compromise can persist for a long period of time if it is not detected before final assembly.

Interdiction happens later in the supply chain when the finished product is being shipped from the manufacturer to the end customer. During interdiction the product is intercepted en route, opened, altered and then sent to the end customer in an altered or compromised state. The hope is the recipient won’t detect the slight shipping delay or the compromised product, which will allow anything from GPS location data to full remote access.

Governance

CSOs should take a comprehensive approach to manage the risks associated with hardware supply chain security that includes policies, processes, contractual language and technology.

Policies

CSOs should establish and maintain policies specifying the security requirements at every step of the hardware supply chain. This starts at the requirements gathering phase and includes design, sourcing, manufacturing, assembly and shipping. These policies should align to the objectives and risks of the overall business with careful consideration for how to control risk at each step. One example policy could require independent validation and verification of your hardware design specification to make sure it doesn’t include malicious components or logic. Another could require that all personnel who physically manufacture components in your supply chain receive periodic background checks.

Processes

Designing and implementing secure processes can help manage the risks in your supply chain, and CSOs should be involved in the design and review of these processes. Processes can help detect compromises in your supply chain and can create or reduce friction where needed (depending on risk). For example, if your company is involved in national security programs you may establish processes that perform verification and validation of components prior to assembly. You may also want to establish robust processes and security controls around intellectual property (IP) and research and development (R&D); controlling access to and dissemination of IP and R&D can make it more difficult to seed or interdict hardware components later on.

Contractual Language

One area CSOs should regularly review with their legal department is the contractual language used with the companies and suppliers in your supply chain. Contractual clauses can extend your security requirements to these third parties and even allow your security team to audit and review their manufacturing processes to make sure they are secure.

Technology

The last piece of governance CSOs should invest in is technology: the specific technical controls that ensure the physical and logical security of the manufacturing and assembly facilities your company operates. This can include badging systems, cameras, RFID tracking, GPS tracking, anti-tamper controls and even technology to help assess the security assurance of components and products. The technologies a CSO selects should complement and augment the entire security program alongside standard controls like physical security, network security, insider threat programs and role-based access control (RBAC).

Detecting Compromises

One aspect of hardware supply chain security that is arguably more challenging than its software counterpart is detecting compromise. With the proliferation of open source software and technologies like sandboxing, it is possible to review and understand how a software program behaves; doing the same at the hardware layer is much harder. The techniques below, gathered while thinking about and researching this problem, all come back to one question: how do you detect that a hardware component has been compromised or is not performing as expected?

Basic Techniques

One of the simpler techniques for detecting whether hardware has been modified is imaging. After the design and prototype are complete, you image the finished product and then compare every unit produced against this golden image. This can tell you whether unauthorized components have been added or removed, but it won’t tell you whether the internal logic has been compromised.
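
To make the comparison concrete, here is a minimal sketch in Python. The image files are hypothetical and the metric is a naive mean pixel difference; real inspection systems (optical or X-ray) use far more sophisticated registration and comparison:

```python
# Toy golden-image comparison: flag units whose image deviates from the
# reference beyond a tolerance. File names are hypothetical.
import numpy as np
from PIL import Image

def load_grayscale(path: str) -> np.ndarray:
    return np.asarray(Image.open(path).convert("L"), dtype=np.float64)

def deviates(golden: np.ndarray, unit: np.ndarray, tolerance: float = 0.02) -> bool:
    """Return True if the mean absolute pixel difference exceeds tolerance."""
    if golden.shape != unit.shape:
        return True  # different dimensions are an immediate red flag
    diff = np.abs(golden - unit) / 255.0
    return float(diff.mean()) > tolerance

golden = load_grayscale("golden_board.png")
for unit_file in ["unit_001.png", "unit_002.png"]:
    status = "FAIL - deviates from golden" if deviates(golden, load_grayscale(unit_file)) else "pass"
    print(f"{unit_file}: {status}")
```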

Another technique for detecting compromised components is similar to unit testing in software and is known as functional verification. In functional verification, individual components have their logic and sub-logic tested against known inputs and outputs to verify they are functioning properly. Testing every component may be impractical at manufacturing scale, so statistical sampling may be needed to gain probabilistic confidence that all of the components in a batch are good. The assumption is that if your components pass functional verification or statistical sampling, the overall system has the appropriate level of integrity.
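
Here is a toy sketch of the idea, with components modeled as callables and a hypothetical test vector table; real functional verification runs on test benches against the component’s full logic:

```python
# Sketch of functional verification with statistical sampling.
# The component under test and its test vectors are hypothetical.
import random

# Known input -> expected output pairs for the component's logic
TEST_VECTORS = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]  # e.g. an XOR gate

def verify_component(component) -> bool:
    """Drive the component with known inputs and check every output."""
    return all(component(*inputs) == expected for inputs, expected in TEST_VECTORS)

def verify_batch(batch, sample_size: int = 30) -> bool:
    """Randomly sample a batch; if any sampled unit fails, reject the batch."""
    sample = random.sample(batch, min(sample_size, len(batch)))
    return all(verify_component(unit) for unit in sample)

# Example: a batch of 1,000 'components' modeled as callables
good_batch = [lambda a, b: a ^ b for _ in range(1000)]
print("batch accepted" if verify_batch(good_batch) else "batch rejected")
```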

To detect interdiction or other logistics compromises, companies can implement logistics controls such as unique serial numbers (down to the component level), tamper evident seals, anti-tamper technology (which renders the system inoperable if tampered with, or makes tampering impossible without destroying the device) and shipping time thresholds to detect delay abnormalities.
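
As a trivial illustration of the shipping-threshold idea, here is a sketch with made-up transit windows and dates; a real system would pull carrier telemetry and account for routes, customs and normal carrier variance:

```python
# Toy shipping-delay check: flag shipments whose transit time falls
# outside an expected window, one possible interdiction signal.
from datetime import datetime, timedelta

EXPECTED_TRANSIT = timedelta(days=5)
TOLERANCE = timedelta(days=1)

def flag_delay(shipped_at: datetime, received_at: datetime) -> bool:
    """Return True if actual transit time exceeds the expected window."""
    return (received_at - shipped_at) > EXPECTED_TRANSIT + TOLERANCE

shipped = datetime(2024, 3, 1, 9, 0)
received = datetime(2024, 3, 9, 14, 0)  # 8+ days in transit
if flag_delay(shipped, received):
    print("Shipment exceeded expected transit window - inspect before deployment")
```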

Advanced Techniques

More advanced techniques include destructive testing. Similar to statistical sampling, destructive testing involves physically breaking apart a sampled component to make sure nothing malicious has been inserted and that the component was manufactured and assembled properly.

In addition to destructive testing, companies can create hardware signatures that capture expected patterns of behavior for how a system should physically operate. This is a more advanced form of functional testing in which multiple components, or even finished products, are analyzed together against known patterns of behavior to make sure they are functioning as designed and have not been compromised. Hardware such as a Trusted Platform Module (TPM) can assist with this validation.
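
To illustrate the measurement idea, here is a sketch of TPM-style PCR extension, where each boot component is hashed into a running value and the final value is compared to a known-good reference (the component blobs and versions are hypothetical):

```python
# Sketch of a TPM-style measurement chain.
import hashlib

def extend(pcr: bytes, measurement: bytes) -> bytes:
    """PCR extend semantics: new = SHA-256(old || SHA-256(measurement))."""
    return hashlib.sha256(pcr + hashlib.sha256(measurement).digest()).digest()

def measure_chain(components: list[bytes]) -> bytes:
    pcr = bytes(32)  # PCRs start zeroed at reset
    for component in components:
        pcr = extend(pcr, component)
    return pcr

# Golden value recorded once from a trusted reference build
GOLDEN = measure_chain([b"bootloader-v1.2", b"firmware-v3.4", b"os-loader-v2.0"])

# A fielded unit replays its measurements; a tampered component
# changes every subsequent value in the chain.
fielded = measure_chain([b"bootloader-v1.2", b"firmware-EVIL", b"os-loader-v2.0"])
print("integrity verified" if fielded == GOLDEN else "measurement mismatch - possible compromise")
```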

Staying with functional behavior, a more advanced method of security assurance for hardware components is function masking and isolation. Function masking obscures a function so the component is more difficult to reverse engineer. Isolation limits how components can interact with other components and usually has to be designed in, effectively sandboxing components at the hardware level. Isolation could rely on a TPM to limit the functionality of components until the integrity of the system can be verified, or it could simply restrict how one component interacts with another.

Lastly, one of the most advanced techniques for detecting compromise is 2nd order analysis and validation. 2nd order analysis examines the byproducts of a component while it operates: power consumption, thermal signatures, electromagnetic emissions, acoustic properties and photonic (light) emissions. If these emissions fall outside expected limits, the component may be compromised.
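
Here is a toy example of the envelope idea using power draw; every number is made up, and real side-channel analysis uses specialized capture equipment and far richer statistics:

```python
# Toy 2nd order check: compare a unit's power-draw trace against a
# baseline envelope built from known-good units.
import statistics

# Power samples (watts) captured from known-good units running a fixed workload
baseline_traces = [
    [4.9, 5.1, 5.0, 5.2, 5.0],
    [5.0, 5.0, 5.1, 5.1, 4.9],
    [5.1, 4.9, 5.0, 5.0, 5.1],
]
means = [statistics.fmean(t) for t in baseline_traces]
mu, sigma = statistics.fmean(means), statistics.stdev(means)

def suspicious(trace: list[float], z_threshold: float = 3.0) -> bool:
    """Flag a unit whose mean draw is more than z_threshold sigmas off baseline."""
    return abs(statistics.fmean(trace) - mu) > z_threshold * max(sigma, 1e-9)

unit_trace = [5.0, 5.6, 5.8, 5.7, 5.9]  # e.g. an implant drawing extra power
print("flag for inspection" if suspicious(unit_trace) else "within expected envelope")
```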

Wrapping Up

Hardware supply chain security is a complex space given the distributed nature of hardware supply chains and the variety of attack vectors spanning physical and logical realms. A comprehensive security program needs to weigh the risks of supply chain compromise against the risks and objectives of the business. For companies that operate in highly secure environments, investing in advanced techniques ranging from individual component testing to logistics security is absolutely critical and can help ensure your security program is effectively managing the risks to your supply chain.

What’s Better – Complete Coverage With Multiple Tools Or Partial Coverage With One Tool?

The debate between complete coverage with multiple tools and partial coverage with one tool regularly pops up in discussions between security professionals. What we are really choosing between is maximum functionality and simplicity. Having pursued both extremes over the course of my security career, I offer this post to share my perspective on how CISOs can navigate this classic tradeoff.

In Support Of All The Things

Let’s start with why you may want to pursue complete coverage by using multiple technologies and tools.

Heavily Regulated And High Risk Industries

First, heavily regulated and high risk businesses may be required to demonstrate complete coverage of security requirements. These are industries like the financial sector or government and defense. (I would normally say healthcare here, but despite regulations like HIPAA the entire industry has lobbied against stronger security regulations, which has proven disastrous via major incidents like the Change Healthcare ransomware attack.) The intent behind any regulation is to establish a minimum set of security controls businesses must meet in order to operate in that sector. It may not be possible to meet all of these regulatory requirements with a single technology, so CISOs may need to evaluate and select multiple technologies to cover them.

Defense In Depth

Another reason for selecting multiple tools is to provide defense in depth. The thought process is: multiple tools provide overlap and small variances in how they implement various security controls. These minor differences can give defenders an advantage, because if one piece of technology is vulnerable to an exploit, another may not be. By layering these technologies throughout your organization you reduce the chances an attacker will succeed.

For example, suppose your business is protected from the internet by a firewall made by Palo Alto Networks. Behind this firewall sits a DMZ, and the DMZ is separated from your internal network by a firewall from Cisco. An attacker now has to get through the external firewall, the DMZ and the internal firewall before reaching the LAN, and an exploit that works against one vendor’s product may not work against the other’s.
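
As a back-of-the-envelope illustration of why layering helps, assume (unrealistically) that exploits against the two vendors’ firewalls fail independently:

```python
# Toy defense-in-depth arithmetic: if an exploit bypasses vendor A's
# firewall with probability p_a and vendor B's with probability p_b,
# and the failures are independent (a big assumption - shared bugs and
# misconfigurations break independence), both layers fail with p_a * p_b.
p_a = 0.05  # hypothetical chance a given exploit bypasses firewall A
p_b = 0.05  # hypothetical chance the same exploit bypasses firewall B

print(f"single layer bypassed: {p_a:.1%}")   # 5.0%
print(f"both layers bypassed:  {p_a * p_b:.2%}")  # 0.25%
```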

Downside Of All The Things

All the things may sound great, but unless you are required to meet that level of security there can be a lot of downsides.

First, multiple technologies introduce complexity into an environment. This can make it more difficult to troubleshoot or detect issues (including security events). It can also make it more difficult to operationally support these technologies because they may have different interfaces, APIs, protocols, configurations, etc. It may not be possible to centrally manage these technologies, or it may require the introduction of an additional technology to manage everything.

Second, all of these technologies increase the number of people required to support them. People time is a hidden cost that adds up quickly and shouldn’t be spent lightly. It starts the second you begin discussing the requirements for a new technology and can include all of the following (a rough cost model follows the list):

  • Proof of Concepts (PoCs)
  • Tradeoff & Gap Analysis
  • Requests for Information (RFI)
  • Requests for Proposal (RFP)
  • Requests for Quotes (RFQ)
  • Contract Negotiation
  • Installation
  • Integration
  • Operation & Support
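
To see how quickly this adds up, here is a rough, entirely hypothetical cost model; substitute your own estimates for each phase:

```python
# Rough people-time model for adopting additional security tools.
# Every number is a placeholder - plug in your own estimates.
PHASE_HOURS = {
    "PoC": 80,
    "Tradeoff & gap analysis": 40,
    "RFI/RFP/RFQ": 60,
    "Contract negotiation": 30,
    "Installation": 40,
    "Integration": 120,
    "Operation & support (per year)": 500,
}

def tool_cost_hours(num_tools: int) -> int:
    """Naive linear model: each extra tool repeats every phase."""
    return num_tools * sum(PHASE_HOURS.values())

for n in (1, 3, 5):
    print(f"{n} tool(s): ~{tool_cost_hours(n):,} person-hours in year one")
```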

Finally, multiple technologies can cause performance impacts, increased costs and waste. Performance impacts can happen due to differences between technologies, complexity, configuration errors or over-consumption of resources (such as agent sprawl). Waste happens through overlap and duplicated functionality: not all of the functionality will get used, despite the fact you are paying for it.

Advantages and Disadvantages Of A Single Tool

A single tool that covers the majority, but not all, of your requirements offers one advantage – simplicity. This may not sound like much, but after years of chasing perfection, technology simplicity can have benefits that may not be immediately obvious.

First, seeking out a single tool that meets the majority of requirements will force your security team to optimize, selecting the option that best manages risk while supporting the objectives of the business. Second, a single tool is easier to install, integrate, operate and support, and it places less demand on the rest of the business in terms of procurement, contract negotiation and vendor management. Lastly, a single tool requires fewer people to manage it, so you can run a smaller and more efficient organization.

The biggest disadvantage of a single tool is it doesn’t provide defense in depth. The other disadvantage is it won’t meet all of your security requirements, so the gaps must either fall within the risk tolerance of the business or be covered by compensating controls.

A single tool that covers the majority, but not all, of your requirements offers one advantage – simplicity.

Wrapping Up

There are a lot of advantages to meeting all of your requirements with multiple tools, but those advantages come with tradeoffs in complexity, operational overhead, duplicated functionality and increased personnel requirements. If you operate a security program in a highly regulated or highly secure environment you may not have a choice, so it is important to be aware of these hidden costs. A single tool reduces complexity, operational overhead and personnel demands, but can leave risk unaddressed and doesn’t provide defense in depth. Generally, I favor simplicity where possible, but you should always balance the security controls against the risk tolerance and needs of the business.

If Data Is Our Most Valuable Asset, Why Aren’t We Treating It That Way?

There have been several high profile data breaches and ransomware attacks in the news lately, and the common theme across all of them has been the disclosure (or threatened disclosure) of customer data. The after effects of a data breach or ransomware attack are far reaching and typically include loss of customer trust, refunds or credits to customer accounts, class action lawsuits, increased cyber insurance premiums, loss of cyber insurance coverage, increased regulatory oversight and fines. The total cost of these after effects far outweighs the cost of implementing proactive security controls like proper business continuity planning and disaster recovery (BCP/DR) and data governance, which raises the question – if data is our most valuable asset, why aren’t we treating it that way?

The Landscape Has Shifted

Over two decades ago, the rise of free consumer cloud services, like those provided by Google and Microsoft, ushered in the era of mass data collection in exchange for free services. Fast forward to today and both the volume and the value of that data have skyrocketed as companies have become digital first and mine data for advertising and other business insights. The proliferation of AI has ushered in a new data gold rush as companies strive to train their LLMs on bigger and bigger data sets. While the value of data has increased for companies, it has also become a lucrative target for threat actors in the form of data breaches and ransomware attacks.

The biggest problem with business models that monetize data is: security controls and data governance haven’t kept pace with the value of the data. If your company has been around for more than a few years, chances are you have a lot of data but data governance and data security have been an afterthought. The trouble with bolting on security controls and data governance after the fact is that it is hard to rein in Pandora’s box. This is compounded by the difficulty of putting a quantitative value on data, and by the fact that re-architecting data flows is seen as a cost with no visible return. The rest of the business may struggle to understand the need to re-architect their entire IT operation when there isn’t an immediate and tangible business benefit.

Finally, increased global regulation is changing how data can be collected and governed. Data collection is shifting from requiring consumers to opt out to requiring them to explicitly opt in. This means consumers and users (and their associated data) will no longer be the presumptive product of these free services without their explicit consent. Increased regulation also typically comes with specific requirements for data security, data governance and even data sovereignty. Companies that don’t have robust data security and data governance are already behind the curve.

False Sense Of Security

In addition to increased regulation and a shifting business landscape, the technology for protecting data hasn’t changed much in the past three decades, yet few companies implement even these controls effectively (as we continue to see in data breach notifications and ransomware attacks). The most common protections are encryption at rest and encryption in transit (TLS), but these technologies only protect data from physical theft and network snooping (man-in-the-middle attacks). Against everything else, both provide a false sense of security.

Furthermore, common regulatory compliance audits don’t sufficiently specify protection of data throughout the data lifecycle beyond encryption at rest, encryption in transit and access controls. Passing these compliance audits can give a company a false sense of security that they are sufficiently protecting their data, when the opposite is true.

Just because you passed your compliance audit, doesn’t mean you are good to go from a data security and governance perspective.

Embrace Best Practices

Businesses can get ahead of this problem, and make data breaches and ransomware attacks a non-event, by implementing effective data security controls and data governance, including BCP/DR. Here are some of my recommendations for protecting your most valuable asset:

Stop Storing and Working On Plain Text Data

Sounds simple, but this will require significant changes to business processes and technology. The premise is that the second data comes under your control it should be encrypted and never, ever, stored or processed in plain text again. This means the data stays protected even if an attacker accesses the data store, but it also means the business needs to figure out how to operate on encrypted data. Newer technologies such as homomorphic encryption have been introduced to solve this challenge, and even simpler approaches like tokenizing the data can be effective. Businesses can go one step further and create a unique cryptographic key for every customer, which also simplifies data governance tasks such as deletion: destroy the key and the data is unrecoverable.
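
As one sketch of what per-customer keys can look like, here is a minimal example using the Python cryptography package’s Fernet recipe; the in-memory key store is purely illustrative, and in practice keys would live in a KMS or HSM:

```python
# Sketch of per-customer encryption keys ("crypto-shredding").
from cryptography.fernet import Fernet

customer_keys: dict[str, bytes] = {}  # stand-in for a real key management service

def store_record(customer_id: str, plaintext: bytes) -> bytes:
    """Encrypt a record under the customer's own key, creating it on first use."""
    key = customer_keys.get(customer_id)
    if key is None:
        key = Fernet.generate_key()
        customer_keys[customer_id] = key
    return Fernet(key).encrypt(plaintext)

def forget_customer(customer_id: str) -> None:
    """Destroy the key: every record encrypted under it becomes unrecoverable."""
    customer_keys.pop(customer_id, None)

blob = store_record("cust-42", b"ssn=123-45-6789")
forget_customer("cust-42")  # "deletes" all of cust-42's data, wherever it is stored
```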

Be Ruthless With Data Governance

Storage is cheap and it is easy to collect data; as a result, companies have become digital data hoarders. To truly protect your business, however, you need to govern your data ruthlessly. Data governance policies need to be established and technically implemented before any production data touches the business. These policies need to be reviewed regularly, and data should be purged the second it is no longer needed. A comprehensive data inventory should be a fundamental part of your security and privacy program so you know where the data is, who owns it and where it sits in the data lifecycle.
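
A retention sweep can be as simple as the following sketch; the inventory schema and retention periods are hypothetical, and a real implementation would operate against your actual data stores:

```python
# Toy retention sweep: purge inventory records older than their class's
# retention period.
from datetime import datetime, timedelta, timezone

RETENTION = {
    "telemetry": timedelta(days=90),
    "billing": timedelta(days=365 * 7),   # e.g. a statutory requirement
    "marketing": timedelta(days=180),
}

inventory = [
    {"id": 1, "class": "telemetry", "created": datetime(2023, 1, 5, tzinfo=timezone.utc)},
    {"id": 2, "class": "billing",   "created": datetime(2020, 6, 1, tzinfo=timezone.utc)},
]

now = datetime.now(timezone.utc)
keep, purge = [], []
for record in inventory:
    limit = RETENTION[record["class"]]
    (purge if now - record["created"] > limit else keep).append(record)

print(f"purging {len(purge)} record(s), keeping {len(keep)}")
```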

The biggest problem with business models that monetize data is: security controls and data governance haven’t kept pace with the value of the data.

Ruthlessly governing data has a number of benefits for the business. First, it helps control data storage costs. Second, it bounds the impact of a data breach or ransomware attack to the specific time period for which you have kept data. Lastly, it can protect the business from liability and lawsuits by demonstrating that data is properly protected, governed and deleted. (You can’t disclose what doesn’t exist.)

Implement An Effective BCP/DR and BIA Program

Conducting a proper Business Impact Analysis (BIA) of your data should be table stakes for every business. Your BIA should capture what data you have, where it is and, most importantly, what would happen if the data weren’t available. Building on the BIA should be a comprehensive BCP/DR plan that appropriately tiers and backs up data to support your uptime objectives. Yet companies still rely on untested BCP/DR plans or, worse, depend on a single cloud region for data availability.

Every BCP/DR plan should include a write once, read many (WORM) backup of critical data that is encrypted at the object or data layer. Create WORM backups to support your recovery time and recovery point objectives (RTO/RPO) and manage the backups according to your data governance plan. A WORM backup prevents a ransomware attack from encrypting the data, and if the backup is ever exfiltrated the breach is meaningless because the data is already encrypted. BCP/DR plans should be regularly tested (up to full business failover), and security teams need to be involved in their creation to make sure the data will have confidentiality, integrity and availability when needed.
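
One way to get WORM semantics is object storage with a retention lock. Here is a sketch using S3 Object Lock via boto3; the bucket, key and retention window are hypothetical, the bucket must have Object Lock enabled at creation, and the archive should already be encrypted at the object layer before upload:

```python
# Sketch of a WORM backup upload using S3 Object Lock (COMPLIANCE mode).
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3")

with open("backup-2024-03-01.tar.enc", "rb") as f:  # pre-encrypted archive
    s3.put_object(
        Bucket="example-worm-backups",
        Key="db/backup-2024-03-01.tar.enc",
        Body=f,
        ObjectLockMode="COMPLIANCE",  # nobody, including root, can shorten this
        ObjectLockRetainUntilDate=datetime.now(timezone.utc) + timedelta(days=90),
    )
```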

Don’t Rely On Regulatory Compliance Activities As Your Sole Benchmark

My last recommendation for any business is – just because you passed your compliance audit, doesn’t mean you are good to go from a data security and governance perspective. Compliance audits exist as standards for specific industries to establish a minimum bar for security. Compliance standards can be watered down due to industry feedback, lobbying or legal challenges and a well designed security program should be more comprehensive than any compliance audit. Furthermore, compliance audits are typically tailored to specific products and services, have specific scopes and limited time frames. If you design your security program to properly manage the risks to the business, including data security and data governance, you should have no issues passing a compliance audit that assesses these aspects.

Wrapping Up

Every business needs to have proper data security and data governance as part of a comprehensive security program. Data should never be stored in plain text and it should be ruthlessly governed so it is deleted the second it is no longer needed. BCP/DR plans should be regularly tested to simulate data loss, ransomware attacks or other impacts to data and, while compliance audits are necessary, they should not be the sole benchmark for how you measure the effectiveness of your security program. Proper data protection and governance will make ransomware and data breaches a thing of the past, but this will only happen if businesses stop treating data as a commodity and start treating it as their most valuable asset.