Following Snowflake, Cloud Providers Need To Shift To Secure By Default

In May 2024, Snowflake experienced a data breach as a result of exposed credentials that allowed a threat actor to access customer accounts that weren’t secured with MFA. The fallout from this data breach ultimately impacted large Snowflake customers like Ticketmaster, AutoZone, Santander Bank and AT&T. Following the announcement of the breach, Snowflake implemented refined security measures to avoid similar incidents in the future. However, the question remains: why aren’t publicly accessible cloud companies secure by default?

A Pervasive Stigma Against Security

Before we can answer the question about why companies aren’t secure by default, we need to look at the underlying psychology and motivation for companies and in particular the arguments that are made against implementing security.

Startup Mentality

One of the most common (and quite frankly horrible) arguments against building in security by default is the “move fast and break things” mentality that is pervasive at startups. Startup life is a tough one and a good metaphor is that you are building your parachute as you are falling. Either you succeed and live, or you burn in and cease to exist. The problem with the startup mentality is that even when they succeed and live, most startups fail to shift from survival mode to maturity mode as the company grows and matures.

In maturity mode, companies need to resolve all of the debt they incurred just to survive. This can be operational debt, technical debt or security debt. Unfortunately, if the survival mentality persists, this debt continues to accrue and can kill the company because the cost to continue to operate exceeds the incoming revenue.

Security Is Bad For Productivity

Another argument that frequently pops up against implementing security is the perception that security is bad for productivity. I find this argument particularly ironic since employees seem willing to tolerate bad processes, bad experiences and other examples of bad friction, yet they complain the loudest about new security controls (like being required to change their password periodically). My own opinion about this perception is employees are largely indifferent to security (or in general they think it is a good thing). However, security often results in very visible changes to processes and ways of working and it is the change that employees don’t like. They associate security with change and since change is bad, security is bad.

This is similar to the argument that security increases friction and the assumption that all friction is bad. Not only is this assumption false, it also leads to the thought process that any friction in the customer experience will lead to lost customers and sales. The reality is some friction is good and acts as a safeguard to steer people towards a desired (secure) outcome.

Security As An Upsell

One last reason for failing to implement security by default is when companies choose to profit from security as an upsell (I’m looking at you Microsoft). By charging extra for the most useful or best features these companies are implicitly and explicitly placing a cost on adding in security, which is perpetuating the stigma that security is bad.

Changing Perception

Leading research on high performing cultures indicates that teams that are able to effectively prioritize and execute on all of their demands are the highest performing teams. In particular, teams that were able to incorporate security into their processes actually went faster and performed better than teams who struggled with or ignored security altogether. If you want to read more on this you can check out Accelerate by Nicole Forsgren, PhD.

One other thing we can do to change this negative perception of security is to stop allowing members of the security function to introduce bad friction. We have all experienced bad friction in the form of time wasters, security theater and the dreaded “no”. This behavior doesn’t help the mission of security and perpetuates the stigma against our profession.

Default Opted-In

Assuming companies can overcome the startup mentality, successfully incorporate security into their development processes and overcome the stigma of security as being bad, what should they be doing to make their products and services secure by default?

The first thing companies can do is discard the notion that increased security will inhibit sales or drive customers away. Instead, companies should use security as a selling point and configure their services to be secure by default, which means customers will need to go through some sort of initial security setup when they purchase the product or service. Customers that don’t want to do this will need to explicitly opt-out or seek alternate providers, firmly placing the liability for not meeting security best practices on their shoulders.

Enforce Security Best Practices

What security functionality should companies offer by default to their customers? Here is a short list:

Multi-Factor Authentication – including the option for OTP, secure tokens and passkeys.

Encryption – all data and transport protocols should be encrypted by default with the latest versions available.

Access Control and Detection – default deny for access to resources and make customers explicitly allow access. This includes making resources non-public by default until a customer specifies otherwise. Detect changes in the state of resources and notify customer contacts of abnormalities.

Easy Button For Fundamentals – make it easy for customers to pull a comprehensive asset inventory, control their instance or tenancy with a master account and offer simple reports for ways they can improve their security posture.
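To make the list above a little more concrete, here is a minimal sketch of what “secure by default” could look like in a tenant configuration. Every name in it (TenantSecurityConfig, OPT_OUT_CONTROLS, the individual settings) is hypothetical and not taken from any real provider. It only illustrates the idea that a new account starts with the secure settings turned on, that access is denied unless explicitly granted, and that turning a control off is an explicit, recorded customer decision.

```python
from dataclasses import dataclass, field

# Hypothetical names of the controls a customer is allowed to switch off.
OPT_OUT_CONTROLS = frozenset({"mfa_required", "encrypt_at_rest"})


@dataclass
class TenantSecurityConfig:
    """Hypothetical per-tenant settings where every default is the secure choice."""

    mfa_required: bool = True
    encrypt_at_rest: bool = True
    min_tls_version: str = "1.3"
    resources_public: bool = False
    # Explicit allow list of (principal, resource) pairs; empty means deny everything.
    allowed: set = field(default_factory=set)
    # Record of every control the customer chose to turn off, and who approved it.
    opt_outs: list = field(default_factory=list)

    def opt_out(self, control: str, acknowledged_by: str) -> None:
        """Turn a control off only with a named, recorded acknowledgement."""
        if control not in OPT_OUT_CONTROLS:
            raise ValueError(f"{control!r} cannot be switched off")
        if not acknowledged_by:
            raise ValueError("opting out requires a named acknowledgement")
        setattr(self, control, False)
        self.opt_outs.append((control, acknowledged_by))

    def is_allowed(self, principal: str, resource: str) -> bool:
        """Default deny: access exists only where it was explicitly granted."""
        return (principal, resource) in self.allowed


config = TenantSecurityConfig()                      # secure with no setup at all
config.allowed.add(("analyst", "sales_db"))          # customer explicitly grants access
config.opt_out("mfa_required", "cio@example.com")    # customer owns this decision
```

The design choice worth noticing is that the opt-out path is the one that takes effort and leaves a trail, which is how the liability for skipping a best practice ends up with the customer who chose to skip it.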

Wrapping Up

There are lots of reasons why security becomes an afterthought for companies. Often, it is because they fail to shift from survival mode to maturity mode. Other times, their culture perpetuates the notion that security is bad and inhibits the business. Some companies even upsell customers on security functionality, which limits the adoption of security controls. The reality is companies that practice secure by design and incorporate security into development cultures move faster and outpace their competition. Companies that offer publicly available software and services need to shift their mentality to make security a default setting that is turned on at the outset of the relationship, like any other core product feature. Until companies start making security default opt-in, we will continue to experience massive data breaches like the one from Snowflake.

Navigating The CISO Job Market

I had an interesting conversation with a friend over coffee last week and we were discussing how weird the CISO job market is right now. Even though the unemployment rates are favorable, the tech sector has actually seen slightly negative employment growth rates, which is not normal. This is largely due to a hangover effect from record hiring during COVID, but there are also other issues in the market right now that are making it challenging. The following is a review of all the things I am seeing in the tech job market right now, particularly with respect to hiring for CISO positions.

Macro Tech Environment

Let’s take a step back and look at the overall economy to understand some of the higher level factors influencing the CISO job market. First, let’s look at one end of the tech market starting with large companies. Overhiring and high compensation packages from COVID have made existing employees stay in place, so natural turnover at public companies is below average. In addition to this, fears of a recession and high interest rates have made large companies cautious about hiring new employees. When the cost to borrow money is higher, it slows growth and ultimately impacts hiring. As a result, companies are trying to get back to growth through layoffs and attrition. They are trying to artificially increase attrition by withholding bonuses, pay raises and promotions, or imposing new job requirements like return to office 4 or 5 days a week.

Second, at the other end of the market, higher interest rates impact Venture Capital (VC) and Private Equity (PE), which ultimately impacts funding for startups and subsequent job creation. With the smaller end of the market being squeezed (VC / PE) and the larger end of the market also being squeezed, there aren’t a lot of places for candidates to go. Compound this with record tech layoffs over the past year and an influx of new college grads to the job market and you create a highly competitive market.

Too Much Noise

The highly competitive job market is making job candidates seeking employment and existing CISOs seeking career growth (or a change) compete with each other. The competition is causing candidates to get desperate and apply to any job that sounds remotely interesting, regardless of whether or not they are qualified for the role. This is also compounded by unrealistic career expectations from past promotions, boot camps and college campuses that make people think they can qualify for the top spots, despite lacking meaningful experience. Add in how easy LinkedIn and other job sites have made it to apply for jobs and the net effect is to create tons of noise for recruiters and drown out qualified candidates.

I spoke to a recruiter a few weeks ago who had a job posting up for 24 hours and received thousands of applicants, of which only a handful were qualified and advanced to the interview process. Due to the volume of unqualified applicants, recruiters are only pushing through the first handful of qualified candidates and are passing on the rest of the backlog. Of all these applicants the only candidates who are getting to the first round interview phase are direct referrals.

In addition to too much applicant noise, recruiters are also finding a high number of candidates that are misrepresenting themselves. Recruiters and hiring managers aren’t stupid. They can read between the lines of your career history and discern what you were really doing. If you claim to be a CISO, yet have never held more than a manager level job, then you are misrepresenting yourself. The reality is, recruiters want to get paid on placing the top candidates. They are unwilling to put someone forward for a top spot who can’t back up their resume. Top candidates can not only defend their experience, but have lots of direct and indirect network connections that can vouch for them as referrals, if needed. The CISO community is a small one and people know who is the real deal and who is faking it. The sad reality is, people who misrepresent themselves are only hurting themselves by artificially placing themselves in a higher, more competitive tier than they are qualified for and as a result will never land that top spot.

Companies Are Being More Strict

High interest rates, tight budgets and a noisy applicant process mean companies are being more strict with their job requirements. More top CISO positions are requiring candidates to be on site at the corporate headquarters location at least 4 days a week. Companies are also searching globally, but hiring locally by giving preference to local candidates they don’t have to relocate and also preference to internal candidates that cost less than a retained search. CISO salaries have also slowed or stagnated with only the top spots paying top salaries. The rest are paying mid-range or low balling candidates in an attempt to get a qualified applicant at a lower price. On top of this, companies are also being more strict with degree requirements (usually a Masters for CISOs), years of experience and certifications. They are also filtering out candidates with lots of job hopping and short career stints because even though you may have carried the CISO title, it is highly unlikely you accomplished anything meaningful if you were there for less than 18 months.

Be Cautious

Lastly, there are a few other issues that are disrupting the job market. The first is fake job postings. There are more and more reports of fake job postings that entice applicants, but are really out to steal their personal information. Be cautious and use your network to validate the postings if you are interested in applying for a CISO role (this comes back to direct referrals also).

Second, companies are leaving zombie positions out there to give the impression they have open roles, when they really don’t. They are doing this because they want the market and their employees to think they are hiring and growing, even when budgets are tight and they are trying to cut headcount. If you see a job posting out there for more than a few days, it is highly likely it is a zombie posting.

The last issue I want to highlight is how job sites misrepresent numbers to entice companies to spend money with them, while hurting applicants. I’m specifically referring to how LinkedIn and other job sites show metrics on “number of applicants” for job postings, when in reality these are only the number of people that have viewed the posting, not applied. I mention this because I have seen a number of posts from people who have expressed interest in a role, but have been discouraged by the “number of applicants” and as a result didn’t apply.

Maximizing Your Opportunity

Now that you understand what is going on with the job market, let’s discuss what you can do to maximize the likelihood you will land that interview and get the job.

  1. Invest in yourself – take this time to get certifications, degrees, etc. that make you competitive and demonstrate constant learning and knowledge. Invest in yourself while looking for a new role.
  2. Invest in your network – do a deep dive on your network. LinkedIn makes it easy to download your list of connections and sort them by company, degree of connection, etc. Use this analysis to understand where you have connections and where you don’t. Look for people that can connect you to individuals that hire for positions you want at your targeted companies. Find ways to meet with these people. Do the same for recruiters. Build these connections before you need them because it is always better to be a live person than a random InMail on LinkedIn.
  3. Update your resume and LinkedIn – Seriously, if you don’t know how then ask someone or pay someone. First impressions matter.
  4. Practice interview questions – Write down key accomplishments and the details for how you achieved them. Think of your weaknesses and how you turn those into strengths. Ask your network for recent interview questions and develop answers. Preparation matters and will pay off during the interview process.
  5. Stop blasting your resume into the ether – If you see a role you want to apply for, poll your network to see if you know anyone at the company or if your network knows someone at the company. Get your resume directly into the hands of the recruiter or hiring manager. Direct referrals are the only reliable way to get an interview.
  6. Get focused – Have you been attending a lot of networking events lately in the hope of meeting someone who is hiring? Consider the value of all the “networking” activities you are doing. As a single person you can’t scale to attend every event that is out there so you need to be targeted. Consider the audience of who is attending and consider the value of the event. If you are attending events that are also attended by all of your competition then you probably aren’t going to land your next job there. Instead, consider all the events and networking groups in your area, work out which ones are most likely to put you in front of people that hire for your role and focus on maximizing the potential of those events.
  7. Stop directly asking people for jobs – there is no faster way to end a conversation or relationship than asking someone for a job they don’t have. Instead, if you have the opportunity to make an ask of someone, ask them to connect you with someone they know may be looking for someone with your background. Take the pressure off of them, keep the connection alive and expand your network at the same time.
  8. Consider staying put – the tech sector seems to lag what the overall economy is doing by a few years. If the tech sector is contracting it will eventually expand and get back to positive employment growth. Staying put can also give you time to build your credentials, while looking for the ideal next step.

Should Companies Be Held Liable For Software Flaws?

Following the CrowdStrike event two weeks ago, there has been an interesting exchange between Delta Airlines and CrowdStrike. In particular, Delta has threatened to sue CrowdStrike to pursue compensation for the estimated $500M of losses allegedly incurred during the outage. CrowdStrike has recently hit back at Delta claiming the airline’s recovery efforts took far longer than their peers and other companies impacted by the outage. This entire exchange prompts some interesting questions about whether a technology company should be held liable for flaws in their software and where the liability should start and end.

Strategic Technology Trends

Software quality, including defects that lead to vulnerabilities, has been identified as a strategic imperative according to CISA and the White House in the 2023 National Cybersecurity Strategy. Specifically, the United States wants to “shift liability for software products and services to promote secure development practices” and it would seem the CrowdStrike event falls into this category of liability and secure software development practices.

In addition to strategic directives, I am also seeing companies prioritize speed to market over quality (and even security). In some respects it makes sense to prioritize speed, particularly when pushing updates for new detections. However, there is clearly a conflict in priorities when a company optimizes for speed over quality for a critical detection update that causes an impact larger than if the detection update had not been pushed at all. Modern cloud infrastructure and software development practices prioritize speed to market over all else. Hyperscale cloud providers have made a giant easy button that allows developers to consume storage, network and compute resources without consideration for the downstream consequences. Attempts by the rest of the business to introduce friction, gates or restrictions on these development processes are met with derision and are usually followed by accusations of slowing down the business or impeding sales. Security often falls in this category of “bad friction” because it is seen as the “department of no”, but as the CrowdStrike event clearly shows, there needs to be a balance between speed and quality in order to effectively manage risk to the business.

One last trend is the reliance on “the cloud” as the only BCP / DR plan. While cloud companies certainly market themselves as globally available services, they are not without their own issues. Cloud environments still need to follow IT operations best practices by completing a business impact analysis and implementing a BCP / DR plan. At the very least, cloud environments should have a rollback option in order to revert to the last known good state.

What Can Companies Do Differently?

Companies that push software updates, new services or new products to their customers need to adopt best practices for quality control and quality assurance. This means rigorously testing your products before they hit production to make sure they are as free of defects as possible. CrowdStrike clearly failed to properly test their update due to a claimed flaw in their testing platform. While it is nice to know why the defect made it into production, CrowdStrike still has a responsibility to make sure their products are free from defects and should have had additional testing and observability in place.

Second, for critical updates (like detections), companies feel an imperative to push the update globally as quickly as possible. Instead, companies like CrowdStrike should prioritize customers in terms of industry risk. They should then create a phased rollout plan that stages their updates with a ramping schedule. By starting small, monitoring changes and then ramping up the rollout, CrowdStrike could have minimized the impact to a handful of customers and avoided a global event.
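Here is a minimal sketch of the phased rollout idea, written as an illustration and not as a description of how CrowdStrike or any other vendor actually ships updates. The function name staged_rollout, the ring names and the 1% failure threshold are all assumptions I made up for the example. The deploy and healthy callables stand in for whatever deployment and monitoring tooling a company already has.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def staged_rollout(
    rings: Sequence[Tuple[str, Sequence[str]]],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    max_failure_rate: float = 0.01,
) -> Tuple[List[str], Optional[str]]:
    """Push an update ring by ring, smallest and lowest-risk ring first.

    Returns the names of the rings that completed and the name of the ring
    where the rollout was halted (None if every ring succeeded).
    """
    completed: List[str] = []
    for name, hosts in rings:
        for host in hosts:
            deploy(host)
        # Monitor the ring before ramping up to the next, larger one.
        failures = sum(1 for host in hosts if not healthy(host))
        if hosts and failures / len(hosts) > max_failure_rate:
            return completed, name  # stop here; later rings are never touched
        completed.append(name)
    return completed, None


# Hypothetical rings, ordered from a small canary group to the broad fleet.
rings = [
    ("canary", ["host-1"]),
    ("early", ["host-2", "host-3"]),
    ("broad", ["host-4", "host-5", "host-6"]),
]
touched: List[str] = []
done, halted_at = staged_rollout(rings, touched.append, lambda h: h != "host-3")
print(done, halted_at, touched)
```

In this made-up run one host in the second ring reports unhealthy, so the rollout stops there and the largest ring never receives the update, which is exactly the blast-radius limit the ramping schedule is meant to provide.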

Lastly, companies need to implement better monitoring and BCP / DR for their business. In the case of CrowdStrike, they should have had monitoring in place that immediately detected their products going offline and they should have had the ability to roll back or revert to the last known good state. Going a step further they could even change the behavior of their software so that instead of causing a kernel panic that crashes the system, the OS recovers gracefully and automatically rolls back to the last known good state. However, the reality is sophisticated logic like this costs money to develop and it is difficult for development teams to justify this investment unless the company has felt a financial penalty for their failures.
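As a rough illustration of “revert to the last known good state”, here is a small sketch. The UpdateManager class and its health check are hypothetical, and real endpoint software running in the kernel is far more involved than this. The point is only the shape of the logic: activate the update, check health, and fall back automatically if the check fails or crashes.

```python
from typing import Callable


class UpdateManager:
    """Hypothetical holder for the active version and the last known good one."""

    def __init__(self, initial: str, health_check: Callable[[str], bool]):
        self._health_check = health_check
        self.active = initial
        self.last_known_good = initial

    def apply(self, update: str) -> bool:
        """Activate an update; revert if the health check fails or raises."""
        self.active = update
        try:
            healthy = self._health_check(update)
        except Exception:
            healthy = False  # a crash during the check counts as a failed update
        if healthy:
            self.last_known_good = update
            return True
        self.active = self.last_known_good  # automatic rollback
        return False


manager = UpdateManager("content-v1", lambda version: version != "content-v2-bad")
manager.apply("content-v2-bad")   # fails the check, so content-v1 stays active
manager.apply("content-v3")       # passes, becomes the new last known good
```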

Contracts & Liability

Speaking of financial penalties, the big question is whether or not CrowdStrike can be held liable for the global outage. My guess is this will depend on what it says in their contracts. Most contracts have a clause that limits liability for both sides and so CrowdStrike could certainly face damages within those limits (probably only a few million at most). It is more likely CrowdStrike will face losses for new customers and existing customers that are up for contract renewal. Some customers will terminate their contracts. Others will negotiate better terms or expect larger discounts on renewal to make up for the outage. At most this will hit CrowdStrike for the next 3 to 5 years (depending on contract length) and then the pricing and terms will bounce back. It will be difficult for customers to exit CrowdStrike en masse because it is already a sunk cost and companies won’t want to spend the time or energy to deploy a new technology. Some of the largest customers may have the best terms and ability to extract concessions from CrowdStrike, but overall I don’t think this will impact them for very long and I don’t think they will be held legally liable in any material sense.

Delta Lags Industry Standard

If CrowdStrike isn’t going to be held legally liable, what happens to Delta and their claimed lost $500M? Let’s look at some facts. First, as CrowdStrike has rightfully pointed out, Delta lagged the world in recovering from this event. They took about 20 times longer to get back to normal operations than other airlines and large companies. This points to clear underinvestment in identifying critical points of failure (their crew scheduling application) and developing sufficient plans to back up and recover if critical parts of their operation failed.

Second, Delta clearly hasn’t designed their operations for ease of management or resiliency. They have also failed to perform an adequate Business Impact Analysis (BIA) or properly test their BCP / DR plans. I don’t know any specifics about their underlying IT operations, but a few recommendations come to mind such as implementing active / active instances for critical services and moving to thin clients or PXE boot for airport kiosks and terminals. Remove the need for a human to touch any of these systems physically, and instead implement processes to remotely identify, manage and recover these systems from a variety of different failure scenarios. Clearly Delta has a big gap in their IT Operations processes and their customers suffered as a result.

Wrapping Up

What the CrowdStrike event highlights is the need for companies to prioritize quality, resiliency and stability over speed to market. The National Cybersecurity Strategy has identified software defects as a strategic imperative because they lead to vulnerabilities, supply chain compromise and global outages. Companies with the size and reach of CrowdStrike can no longer afford to prioritize speed over all else and instead need to shift to a more mature and higher quality SDLC. In addition, companies that use popular software need to consider diversifying their supply chain, implementing IT operations best practices (like SRE) and implementing a mature BCP and DR plan on par with industry standards.

When it comes to holding companies liable for global outages, like the one two weeks ago, I think it will be difficult for this to play out in the courts without resorting to a legal tit-for-tat that no one wins. Instead, the market and customers need to weigh in and hold these companies accountable through share prices, contractual negotiation or even switching to a competitor. Given the complexity of modern software, I don’t think companies should be held liable for software flaws because it is impossible to eliminate all flaws. Additionally, modern SDLCs and CI/CD pipelines are exceptionally complex and this complexity can often result in failure. This is why BCP/DR and SRE are so important, so you can recover quickly if needed. Yes, CrowdStrike could have done better, but clearly Delta wasn’t even meeting industry standards. Instead of questioning whether companies should be held liable for software flaws, a better question is: At what point does a company become so essential that it becomes critical infrastructure by default?