Code scanning: what is it, and do you need it?

Colleen Barry sits down with Ketch Co-founder and Head of Product Maxwell Anderson to unpack the buzz around code scanning.

Summary

Code scanning has emerged as a buzzworthy approach to privacy compliance and data mapping, but a closer look reveals meaningful limitations. While the technique can be valuable for understanding internal back-end data processing, it falls short when applied to third-party SaaS tools — where organizations simply don't have access to the underlying code. The framing of "scanning code" sounds thorough, but the reality of how modern data collection works complicates that promise. The core issue is that most web data collection doesn't happen in readable source code at all — it happens through script tags and configurations housed within third-party vendor platforms. Once a script tag is placed on a page, a cascade of automated collection events follows that is invisible to code scanning. Browser emulation is offered as the more accurate alternative for capturing what is actually being collected on a website or in an app. For teams evaluating privacy software vendors, a practical due diligence test is proposed: ask the vendor to demonstrate exactly how a specific third-party tool — like Amplitude or Facebook — is tracked within their solution. That test quickly surfaces whether a scanning approach truly captures automated collection beyond what's written in the code, providing a concrete benchmark before any purchasing decision is made.

Transcript

**Colleen:** Code scanning is a white hot topic right now. How much should people really be thinking about adopting code-scanning-based solutions?

**Maxwell:** Yeah. Man, I mean, it's cool. If nothing else, we should have a moment of appreciation. It's a clever new approach to solving this problem. I'm personally a firm believer in using the right tool for the job. I don't think you can solve any one of these privacy problems with a single approach, paradigm, or tool. And similarly here, I think code scanning definitely has its place in the context of back-end data mapping with a unique tilt to the internal data processing that happens within an organization, less so the third-party SaaS stuff because you don't have access to that code more often than not.

And, certainly, as it relates to the white hot issue of what is actually being collected on a website or in an app, code scanning, I don't think, is the right solution. And the reason for that is the majority of the data collection that happens on a website isn't actually readable in the code. The majority of web data collection happens through script tags being placed on the website and configurations that exist in the context of the third-party SaaS tool that provides the service associated with that script tag. That's not code on the page. That's configurations on a third-party tool. And once that script tag gets put onto the page through the configurations, lots of data collection events happen automatically thereafter. If all you're looking at is the code that is on the page, you will miss that entire set of data collection. The better approach, in our opinion, to solving that problem is browser emulation. From a marketing standpoint, code scanning sounds cool. Like, it sounds very deep and comprehensive.

**Colleen:** Well, that's the scandal.

**Maxwell:** I don't know if it's scandalous. It's marketing. You can hate the player. You can hate the game. I think it's genius marketing, and it is an important and valuable way to go about this. But as comprehensive and thorough as it might sound to scan code, it's only when you understand and appreciate the nuances of how data collection happens on websites and in apps that you can actually realize and appreciate it is not the right approach to a comprehensive understanding of what's happening on a website.

**Colleen:** So what would you ask the vendor, a privacy software vendor, to make sure that they understand this comprehensively?

**Maxwell:** I would actually ask, more or less, how do you ensure that you're capturing all of the data that's being collected by my third-party SaaS tools? That would be the first question. And, of course, the answer will be something to the effect of, well, we see it all because it's in the code. And I would just do a test. I would say, okay, great. If I have a third-party SaaS application that I use, like Amplitude, for example, or Facebook, can you show me where in the code base all the data that gets collected by that particular tool exists? And there might be some examples where code is written to emit specific events, and that will happen, but there's quite a bit of additional data collection that happens automatically, by virtue of that script tag being on page, that code scanning won't show.

So there's some tests and ways that we can help our customers, and we've done this in the past, set up test harnesses and frameworks to do comparisons with. Happy to make those available to people if they're interested.
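The comparison test described above can be sketched roughly as follows. This is a minimal illustration, not Ketch's harness or any vendor's method: it assumes you already have the page source and a list of request URLs captured separately (for example, from a browser emulation run). The function names, the sample page, and the `.example` domains are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ScriptTagScanner(HTMLParser):
    """Collects the hosts of external <script src=...> tags in page source."""

    def __init__(self):
        super().__init__()
        self.script_hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                host = urlparse(src).netloc
                if host:
                    self.script_hosts.add(host)


def static_script_hosts(page_source):
    """What a source-only scan can see: hosts named in script tags."""
    scanner = ScriptTagScanner()
    scanner.feed(page_source)
    return scanner.script_hosts


def unseen_by_static_scan(page_source, observed_request_urls):
    """Hosts contacted at runtime that never appear in a script tag."""
    runtime_hosts = {urlparse(url).netloc for url in observed_request_urls}
    return runtime_hosts - static_script_hosts(page_source)


# Hypothetical page: a single vendor script tag, no explicit tracking code.
PAGE = """
<html><head>
<script src="https://cdn.analytics-vendor.example/sdk.js"></script>
</head><body><button id="buy">Buy</button></body></html>
"""

# Hypothetical requests, assumed to be captured by emulating a browser visit.
OBSERVED = [
    "https://cdn.analytics-vendor.example/sdk.js",
    "https://api.analytics-vendor.example/collect?event=page_view",
    "https://api.analytics-vendor.example/collect?event=click",
    "https://pixel.ad-partner.example/sync",
]

print(sorted(unseen_by_static_scan(PAGE, OBSERVED)))
# ['api.analytics-vendor.example', 'pixel.ad-partner.example']
```

In this made-up example the page source names only the vendor's script host, while the observed traffic includes collection endpoints that appear nowhere in the code on the page. Comparing hosts is a deliberately coarse check; a real comparison would likely also look at the payloads being sent.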
