New research preview: digital data collection

This week, Jonathan and Alysa discuss a recent Ketch study of data collection and opt-out practices across 170 websites. The findings show a significant increase in web trackers over the past decade, with one site running 250 tags. After testing opt-out mechanisms, the study found that 40% of the trackers that should have been switched off, particularly advertising-related ones, kept collecting data after the opt-out.

Summary

Ketch ran a study across 170 websites, spanning the 60 largest in the US plus small and midsize sites across industries, measuring both the volume of data collection and the effectiveness of opt-out mechanisms. On data volume: a decade ago, a typical page had roughly 10 trackers; today it’s 50 to 60, with one site in the study running 250 tags. After simulating browsing sessions and then triggering a Global Privacy Control opt-out, researchers found that 40 percent of tags that should have stopped continued firing. Half of all trackers observed were advertising-related, and half of those kept collecting data after opt-out.

Alysa was not surprised by the findings. Website hygiene lacks clear ownership within most organizations, and compliance with opt-out signals requires sustained engineering attention across a continuously shifting ecosystem of vendors. Many companies believe their consent management platform is handling everything when in fact it is misconfigured, stale after a site refresh, or simply not wired to respect GPC in addition to site-specific opt-outs. A small number of companies manage compliance through server-side suppression after a tag fires, which is technically compliant, but almost nobody actually does this at scale. The practical takeaway: compliance here is technically difficult, and the gap between stated intent and actual tag behavior is wide.

A notable secondary finding: cross-referencing advertising tech companies observed in the study against California’s data broker registry revealed that roughly a third of ad networks, exchanges, and demand-side platforms were also registered data brokers. This dual role, facilitating buy/sell transactions while collecting and onward-selling data generated by those transactions, drew comparisons to financial market self-dealing. Enforcement pressure is already arriving: Texas has conducted outreach sweeps prompting ad-tech companies to register. Even among registered data brokers, 20 percent continued firing post-opt-out. That is a lower rate than average, but it is concerning because data brokers onward-sell their data, meaning that 20 percent gap multiplies through the ecosystem.
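The headline number can be read as a simple ratio: of the tags that should have stopped after the opt-out, what share was still seen firing? The sketch below is only an illustration of that arithmetic, not Ketch's actual methodology or tooling. The function name and the tag names are hypothetical, and the sample sets are made up so that the result lands on the study's 40 percent figure.

```python
def persistence_rate(before, after, should_stop):
    """Share of tags expected to stop that were still observed after opt-out.

    before      -- tags seen firing before the opt-out signal was sent
    after       -- tags seen firing on the return visit, after the opt-out
    should_stop -- tags that the opt-out should have switched off
    """
    expected_to_stop = before & should_stop
    if not expected_to_stop:
        return 0.0  # nothing was expected to stop, so nothing persisted
    still_firing = expected_to_stop & after
    return len(still_firing) / len(expected_to_stop)


# Hypothetical observations for a single site (all names are invented).
before = {"ads-a", "ads-b", "ads-c", "ads-d", "ads-e", "analytics-x", "cdn-y"}
after = {"ads-a", "ads-b", "analytics-x", "cdn-y"}
should_stop = {"ads-a", "ads-b", "ads-c", "ads-d", "ads-e"}

rate = persistence_rate(before, after, should_stop)
print(f"{rate:.0%} of tags that should have stopped kept firing")
```

In this made-up example, two of the five tags that should have stopped are still present on the return visit. Deciding which tags belong in `should_stop` is the hard part in practice, and it is a judgment the sketch simply takes as given.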

Transcript

**Jonathan:** Hey, Alysa.

**Alysa:** Hi.

**Jonathan:** Hey. So I've got something kinda hot off the presses, but I wanted to chat with you in a totally, like, unprepared, we-haven't-talked-about-this-before kinda way, if that's okay with you.

**Alysa:** Sure. Now I'm very curious.

**Jonathan:** Okay. So we've been talking a lot about wiretapping cases and opt-outs and, I mean, how well businesses are responding to opt-outs and if they have the infrastructure to do it. So we ran a study at Ketch of a hundred and seventy different websites: the, you know, top sixty biggest websites in the country, but also some small and midsize websites across different industries. And I wanna give you some of the headlines, then just see what you think.

The first headline is the sheer amount of data collection on the web is kinda crazy. I'm kinda surprised by it, actually. So we did a study like this ten years ago at a different company, and to give you some idea of scale, there were ten trackers, ten cookies placed per page, and now it's, like, fifty or sixty. There was one website that had two hundred and fifty tags running. Anybody can do this, right? You go to tag explorer, and you can just see the things start to fire.

So, okay, that's the ecosystem that we're in. So what we did is we jumped on these websites. We clicked around. We left. We did a universal opt-out, GPC. We came back, and then we saw how many of those tags continued to fire. And, basically, the headline number is that forty percent of them, the ones that should have been turned off, kept firing. So half of the trackers that ran were advertising related, and of those, half just continued to collect data. And I know there's some nuance there. Like, you don't have to stop the tags, right? Like, if you're controlling it in some other way. But people aren't controlling it in some other way.
Like, if it's collecting data, it's... right?

**Alysa:** Well, yeah. I mean, I have to say I'm not super surprised by that statistic, just because, as a practical matter, I think website hygiene is not something where there's a whole lot of, like, accountability all in one person or in one team for a lot of companies. There was not a business reason for that. And so you have these legal reasons why somebody might now start to be more intentional on their website. But it's whack-a-mole, because finding who within the business is moving forward with putting on tags and different engagements with third parties, that can be tricky and complicated.

And then there's the challenge of, if you are going to use a CMP, a privacy vendor, to manage the choice, are you pressure testing whether it's configured and working accurately? And maybe you did, but maybe your website got refreshed, and a month later, it's not. And so I'm not super surprised. And then you have some companies who have configured it for opt-out, but they have not yet configured it for Global Privacy Control. Or they're using a geofence, and they're applying it only in certain states. And I think you did the testing maybe in California, so if that's the case, well, then I don't have an answer to that. But I think it's tough. I think website hygiene, I think education around just even digital advertising and how it works. So how can you address compliance if you don't really understand how the advertising tags work?

**Jonathan:** I was thinking through it a little. Like, I thought it was pretty bad. Forty percent was pretty bad. But, funny enough, I tested it with a couple of brands and they're like, "Oh, I thought it would be way more than that," you know, which is kinda interesting. Like, okay. So it was a tough problem. And there were some other studies that said it was way worse. So in a small way, it got better.
But I was thinking about, like, why is it so hard? And I think you're a hundred percent right. It's a moving feast. Who's on your site and who else is piggybacking and who they're ushering in, it's like a constant thing. I'm with you that it's like, okay, maybe we configured direct site opt-outs, and we didn't do GPC. Okay, kind of buy that. Oh, I did do a little, you know, unscientific, but just a little comparison to see. And some brands weren't turning off, you know, site-specific opt-outs either. We didn't do that in an automated way. And then, yes, it was in California. I think you're right that some folks think that the CMP is covering it for them. The consent manager's covering it for them, and it's not. So, yeah, all these reasons. I think I landed on this part where it's just technically hard, I think, for brands.

**Alysa:** Yeah. I think that's the driver. Around the edges, we can come up with other explanations: maybe some of those really are service providers, and if somebody opts out, then they limit the kind of processing they're doing to just a service provider activity. It's possible. But when you've got forty to sixty tags on your site, the likelihood that that's all in that bucket seems, maybe, well...

**Jonathan:** Yeah. I wondered about the service provider thing as well. But so remind me on this one, Alysa. With the purpose of cross-contextual advertising, can you be a service provider? Like, I know it's kind of tricky ground. Right.

**Alysa:** So you cannot enable cross-context behavioral advertising after somebody's opted out. But under the CCPA, certainly, if you share personal information with a signal that that person has opted out, and the recipient company is going to suppress and manage the opt-out, that's okay. You can also, maybe just for California, do something different on the recipient end for the California data.
So you might receive it, but maybe there's a no value, or you do something. You know, we've certainly got some options in the marketplace on how some companies, some big platforms, are trying to address that.

**Jonathan:** Yeah. Gotcha. Okay. And we saw that as well. It's like, "Hey, we're dealing with it server side. So you're gonna see the tags fire, but we're dealing with it later on." But the percentage of folks actually doing that was tiny. It's an engineering lift. It's easier to turn them off, and that's why we're pretty confident in the findings.

**Alysa:** You need market demand to really drive the amount of attention needed, and that demand either comes because there are business reasons or there are legal reasons. And on the legal side, you see, like, New York put out that guidance that we've talked about before. They're only doing that because they're seeing broad practices in the marketplace where there are issues. And you think about all of the state attorney general enforcement so far, and the top issue really being opt-outs. They are doing that because many companies' practices related to opt-outs, you know, have room to improve.

**Jonathan:** Gotcha. Yeah. All good. Well, I'm glad you agree, because I wanna look at it every which way. The other one, final point, was we looked at what type of data systems are collecting data, right? Ad networks and exchanges, demand-side platforms and whatnot, measurement companies. What was interesting about that is we cross-referenced those companies with the data broker register in California. And what we found is, with ad networks and exchanges and demand-side platforms, a third of those were also data brokers. And it's funny, right? Is that like, "Okay, everybody knew that, JJ"? Like, what, people knew that? I was a little alarmed by it.

**Alysa:** Maybe the simplified version: you're a data broker if you're selling data about an individual that you don't have a direct relationship with. So you're both collecting it and you are onward-selling it, and there's no direct relationship. It's pretty broad strokes, you know, to fall in the bucket. And then if you're only collecting pseudonymous information, because the definition is so broad, even if you're only collecting the pseudonymous or some narrow scope of data, that still can technically trigger you as a data broker, and there are consequences. If you are a data broker legally and you don't register, there are pretty severe consequences. So I think that has pushed a lot of companies, particularly in the ad tech space, to go ahead and register. And we can debate whether they trigger it or not, but that seemed to be the smarter move.

**Jonathan:** Gotcha. I mean, I thought about it in the context of, like, financial markets. I mean, it felt like self-dealing, that you're on either side of this trade. You're facilitating a buy/sell, but you're also kind of in the middle of it, selling the data you pick up while you run that trade. And I thought, I wonder if this would be legal if it was a financial system, like if it was the New York Stock Exchange, you know? And do those same rules apply to data, or is it just a very different thing in programmatic advertising? So you're not alarmed, but you're like, yeah, okay, that's how it works. You sign up for the data broker register because it looks like you might be one.

**Alysa:** Yeah. I mean, look, I don't think anybody is really enthused to be on the data broker list, but, I mean, at least a lot of the companies that I work with, they do wanna comply with these laws, and if they are subject to it, they're gonna follow through.
And we've already seen Texas, for example, do a bit of a sweep in reaching out to companies in the ad space who are not listed and prompting them: you need to go get listed. So I think it is an issue that companies are right to be thoughtful about.

**Jonathan:** Gotcha. And, look, they did do a little better, actually, at respecting opt-outs. You know, where on average around forty percent didn't respect the opt-outs, in the data broker community it was twenty percent. So it's better, but the reason to worry was that twenty percent multiplies through the ecosystem if they're onward-selling that data and pushing it around.

**Alysa:** Yeah. Right. So there's room for improvement everywhere.

**Jonathan:** Yeah. For sure. Okay. It's a journey. Good. Well, this is sort of like, here's where we're at. Alysa, thanks for letting me take you through that. I know it's super early findings for us. We're gonna wrap up this report and send it out.

**Alysa:** Interesting. Well, I look forward to reading the report. Should be cool.

**Jonathan:** Thanks, Alysa. Have a good one.
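One gap raised in the conversation is a site that honors its own opt-out but is not wired for Global Privacy Control. As a rough illustration of what respecting the signal on the server side could look like, the sketch below checks for the `Sec-GPC: 1` request header described in the GPC proposal and leaves advertising tags out of the page when it is present. The tag inventory, the `advertising` flag, and both function names are hypothetical; which tags actually need to be suppressed is a legal and configuration question this sketch does not answer.

```python
def has_gpc_signal(headers):
    """Return True when the request headers carry a GPC opt-out signal."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value.strip() for name, value in headers.items()}
    return normalised.get("sec-gpc") == "1"


def tags_to_load(tag_inventory, headers):
    """Names of the tags to render, dropping advertising tags under GPC."""
    if has_gpc_signal(headers):
        return [tag["name"] for tag in tag_inventory if not tag["advertising"]]
    return [tag["name"] for tag in tag_inventory]


# Hypothetical inventory; a real site would maintain and audit its own list.
TAG_INVENTORY = [
    {"name": "ad-exchange-pixel", "advertising": True},
    {"name": "retargeting-tag", "advertising": True},
    {"name": "error-monitoring", "advertising": False},
]

print(tags_to_load(TAG_INVENTORY, {"Sec-GPC": "1"}))  # visitor sent GPC
print(tags_to_load(TAG_INVENTORY, {}))                # no opt-out signal
```

A check like this only helps if the inventory stays current, which is the ownership problem Alysa describes: a site refresh or a new vendor can add tags that never pass through it.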
