For those of you who follow me on social media, you will know that for the last week I’ve been publicly busy dealing with the fallout from Google’s discovery of the flaws in Intel’s CPU design, flaws that have kept many of us extremely busy.
The press coverage in the main is sensationalist but in a way useful. It tells a story of a major company (Intel) that never thought about the fact that separation in the kernel was a good thing.
More than that, it speaks volumes that many believe another major company, Google, could be viewed as having played for commercial advantage by seeking an embargo that ran from discovery in mid-2017 to the 9th of January. I’m not sure I agree entirely with that, given the investment Google has in kit powered by Intel across its Cloud and Waymo estate, plus the Chromebook market, but I understand why that would appear to some to be an advantage.
The issue is that the lead time did not include bringing in, in a timely way, the key players who would be able to generate mainstream fixes for business-as-usual compute needs. We can only speculate why the likes of Red Hat and Canonical (who are still at the RC kernel phase for some platforms) were not brought in until November, but it’s not a good look for the management of either Google or Intel.
In fact, it stinks. When you have a bug this massive that potentially affects the mainstream OSes, you talk to the vendors as soon as you have a clear idea. Not three months later. I’m going to tell you why, and you won’t see it coming.
Just so that this is very clear, I am writing this as me: a technology journalist who writes for other outlets, and a security guy who has invented and released stuff that protects big swathes of the world’s educational and commercial computing fabric. But I’m also writing this as a member of the Open Source community for two-plus decades and a former Red Hat security staffer, in my own words and without influence.
So the security industry I have chosen to work in since we wrote SmoothWall in August 2000 is made up of key players in software, hardware and services. The software side is the small part of the industry; the majority of key revenue is tin: companies buying major chunks of hardware for firewalls, DNS, load balancing, gateways, VPNs, specialist hardware for authentication and key encryption (more on that later) and lots more besides. Trade shows such as Infosec and RSA are kept afloat by the vendor stands packed full of racks of equipment.
All of this kit, from hundreds of vendors, has a lifecycle. Now here is a statement of fact.
If you look at the demographic of many of these security vendors they survive by evolution of kit and tech refresh demands of customers. If you delve deeper you see that the majority of their effort goes into sales and marketing of this kit, closely followed by customer services and maybe return to base support in that order. The development environments are tiny. The number of developers on staff at most security hardware vendors is small by comparison to the number of folk involved in the commercial side of the business. It’s not unusual to see a security vendor where more people work in HR there than work in the brains of the software business.
The software that runs on these devices is predominantly Linux, sometimes BSD, but normally Linux, and normally it’s a version of CentOS (although there are some Ubuntu rack-based devices out there). CentOS is a derivative of Red Hat’s distribution, one I am closely associated with, and it follows the Red Hat Enterprise Linux (RHEL) pathway, e.g. they have live 6.x and 7.x and EPEL release cycles. CentOS is not RHEL. CentOS and RHEL do not share a kernel. CentOS is used in hundreds of millions of deployments daily globally.
If CentOS did not exist, the likes of Facebook and eBay, CERN and GoDaddy would have a problem. You don’t see those organisations ponying up cash to Red Hat; like many others they shun RHEL to use CentOS, which they see as “like enough” to stand up mission-critical platforms. More importantly, they support themselves with intelligent, capable engineering staff who can stand up repositories and deal with day-to-day proactive support.
For everyone else, there is RHEL: world-class Linux supported by Red Hat, backed up by QE, backed up by amazing support staff and with a history of being best in class. CentOS maintainers have, as a rule, worked at Red Hat for a few years now, and we all respect each other hugely and count each other as friends. But let me repeat, CentOS is not RHEL, even if they do release the patches and RPMs that Red Hat releases once Red Hat has done the QE and the massive security patching that my former team gets out the door.
Now we’ve got that straight, let’s work out why this is a bad thing for the device market, and potentially for the entire security market, as we come to understand the long-term issues surrounding Meltdown.
We made the point already that many of the security vendors have small dev teams. Many security vendors making tin go bust; many get swallowed up by other vendors. The one common thing is that security kit in the field, whether the vendor has gone bust, been acquired or is still trading, is running a Linux derivative on an Intel chassis: some Xeon or Haswell chipset, or any of the thirty-plus derivatives of each going back seven-plus years.
Many (a lot) of these devices are still running platforms that started out in the development lab at the vendor as CentOS 4/5/6/7 development trees. For the later versions that’s fine and dandy; kernel and microcode patches are available because CentOS benefits from the hard work Red Hat did to get the patches out for a multitude of architectures. Hat tip to my amazing friend and brother-in-arms Cliff Perry for having lost a lot of Christmas dealing with this so capably, assisted by his team in Brno and the engineering team in Westford.
However, a lot of the devices are running versions 4 and 5 and have long since departed from being “standard builds”. And there’s a reason for this, and it’s not one you’ll notice straight away because it’s utterly non-obvious.
Many of the Linux-based tools out there from big-name vendors that run older versions of CentOS 4 and 5 also run older versions of Samba, the CIFS tool we developed to allow Linux to sit in heterogeneous Windows environments. The older 2.x version of Samba was licensed under GPL version 2. In later Samba 3 releases (3.2 onwards) the move to GPL version 3 meant that the licensing and patents issue reared its head. I’m not going to go into details, as most people reading this will immediately get it and understand why, for major vendors building tin that relies on Samba / Winbind / basic Active Directory authentication, it meant potential loss of IP. You can read more on the differences between GPL v2 and v3 here so that I don’t need to go into detail.
If you are a vendor deploying a Linux-based device using GPL code, you are supposed to have on your website somewhere, or even ship with your device as many vendors do, a copy of the GPL, and to make applicable modified sources available. TP-Link, D-Link, Netgear and other vendors understand that they rely entirely on the work of the Open Source community and Linux as a whole; they understand their lineage and do just that.
There are many vendors in the security space who harness large amounts of Linux as the base OS and base development environment who do not. That gripe will have to wait for another day, as it’s only a side effect. It’s not the gift that keeps on giving that will keep security folk on their toes for the next 3-5 years.
No, the bigger issue is that there are major vendors out there, with devices powering large chunks of the internet estate and cloud estate we rely on, whose deployed racked kit is running CentOS versions 4 and 5. That is either because they have dependencies on libraries and tools they’ve developed, or IP of their own compiled against those kernels, or because they need to run pre-GPL v3 Samba variants and supported dependencies. The list of those vendors includes many of the household names in hardware that you see at many security trade shows.
Traditionally those vendors would prefer that customers did not treat their devices with the same duty of care as they would, for instance, a RHEL/CentOS server in production running NoSQL or acting as a web server. No, they’d prefer you treated it as an appliance. The fact that both that production server and the appliance are racked 1U apart and connected to the same switch is not important to the vendor. They just want you to remember: they ship an appliance, not a server.
If you were to draw a chart of the updates applied to the server and to “the appliance” over time, counting only those that affected actual computing security needs (we are not talking spurious non-mission-critical updates with no dependency), it would be illuminating. Illuminating because you’d see the server would receive regular kernel updates and regular updates to OpenSSH, OpenSSL etc. If you looked at the appliance over the same period, the number of updates applied would be a lot smaller, appliance vendors traditionally being a lot weaker at this. Also, their staff, although supportive of kit still under lifecycle support, are tasked with writing the next-generation lifeblood for new kit under development.
So we have appliances and we have servers. And for some reason, we are supposed to treat them differently. Both have privileged users, both are based on Linux, but we’re supposed to treat one with care as a server as it’s performing a task that is mission critical and we’re supposed to treat the other as a bit of tin (that’s also performing a task that is mission critical).
Here is the problem. A huge chunk of our security estate is built out on non-supported, non-patchable variants of CentOS and other Linux variants. Those devices are authenticated on our networks, many have small to medium amounts of storage on board, and many of them you can get a shell on. Many of these devices are end of life and still in use in organisations that haven’t removed them at tech refresh because they still work and are the glue they require, and if it ain’t broke why fix it.
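If you can get a shell on one of these boxes, the first question is what base it actually started life as. What follows is only a rough sketch of that check in Python, not anybody’s official tooling. It assumes the appliance still carries an `/etc/redhat-release` style line such as `CentOS release 5.11 (Final)`, and the function names and the `UNSUPPORTED_MAJORS` set are mine, made up for illustration; the set simply reflects the versions 4 and 5 I keep pointing at in this post.

```python
import re

# Assumption for illustration: this post singles out CentOS 4 and 5 as the
# unsupported bases. Adjust the set to match real vendor lifecycle data.
UNSUPPORTED_MAJORS = {4, 5}


def centos_major(release_text):
    """Return the CentOS major version found in an /etc/redhat-release
    style string, or None if it does not look like a CentOS release line."""
    match = re.search(r"CentOS.*?release\s+(\d+)", release_text)
    return int(match.group(1)) if match else None


def is_unsupported_base(release_text):
    """True when the release line names a CentOS major version that is in
    the assumed-unsupported set above."""
    major = centos_major(release_text)
    return major is not None and major in UNSUPPORTED_MAJORS
```

Heavily modified vendor builds may have rewritten or removed that file altogether, so a `None` here tells you nothing either way. Treat it as a first pass over an estate, not as proof.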
All of them run Intel hardware.
Eighteen months ago, when I was still at Red Hat running security strategy, I built a plan to go and see the vendors to get them to stop using non-supported CentOS and to use Red Hat Enterprise Linux as their base, because it gave them seven to ten years’ support for shipping binaries so situations like this couldn’t happen. I left before I got a chance to do it. We had identified the gap and the scale of the issue, and it was enormous. I’m sorry I never got the chance to, but other opportunities came up, and after seven years at the helm I had a chance to exit and took it.
Meltdown and Spectre. I’m not overly interested in the patching of workstations and servers. For me, as a security guy, the interest is in the glue that holds this all together. The fact is that major chunks of the internet estate are now glued together using non-patchable kit.
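For kit that can take a patched kernel, there is at least a way to ask the box what it thinks of itself. Kernels carrying the Meltdown and Spectre fixes report their status as small text files under `/sys/devices/system/cpu/vulnerabilities`, with contents beginning `Not affected`, `Mitigation` or `Vulnerable`. The Python below is a minimal sketch that reads them; the function names are my own, and it assumes you have a shell and a Python interpreter on the device, which on a locked-down appliance you may well not.

```python
from pathlib import Path

# Directory exposed by kernels that carry the Meltdown/Spectre reporting
# patches. Older, unpatched kernels do not have it at all.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def classify(status):
    """Map the text of one vulnerabilities file to a coarse label."""
    text = status.strip()
    if text.startswith("Not affected"):
        return "not affected"
    if text.startswith("Mitigation"):
        return "mitigated"
    if text.startswith("Vulnerable"):
        return "vulnerable"
    return "unknown"


def report(vuln_dir=VULN_DIR):
    """Return {issue name: label} for every file in vuln_dir.

    An empty dict means the directory is missing, i.e. the kernel is too
    old to report anything, which is itself worth flagging."""
    if not vuln_dir.is_dir():
        return {}
    return {
        entry.name: classify(entry.read_text())
        for entry in sorted(vuln_dir.iterdir())
        if entry.is_file()
    }
```

On the old CentOS 4 and 5 based kit I’m worried about, I would expect `report()` to come back empty, because those kernels predate the reporting interface entirely.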
For those of us in security monitoring it’s manna from heaven. Corporates of all sizes are affected massively and now have to deal with it, as their vendors will not be able to release timely patches to secure the architecture. There is literally no alternative: if you’re made aware that your estate is at risk, you need monitoring at an enhanced level. If you choose not to take it, there’s the risk that when you are owned, fined and censured there will be literally nowhere to hide from a culpability perspective, and business owners will not want that reality.
Bear in mind that this includes key material appliances for spaces above and below classified, both air-gapped and non-air-gapped, and appliances on trading room floors and in banking environments, and you start to see the issue amplify. A high percentage of this kit will never receive a patch for this problem.
What does the industry need to do? The reality is it knows exactly what it needs to do, and that is to be better community bedfellows, to partake and contribute back, to be better open source citizens, and to think about how it develops, releases and supports.
This Intel issue, however badly managed it was, may just be a klaxon call. Let’s hope so.