Podcast: Understanding Interoperable Private Attribution (with Ben Savage)

My guest on this episode of the podcast is Ben Savage, who works at the intersection of advertising and privacy at Meta. Ben is Meta’s representative to the W3C in forums like the Private Advertising Technology Community Group and the Privacy Community Group. A transcript of our conversation is provided at the bottom of this post.
At Meta, Ben has been heavily involved with a framework called Interoperable Private Attribution, or IPA, which is a distributed attribution and aggregation protocol that exists as a joint proposal between Meta and Mozilla. IPA has gained a great deal of attention as a potential solution for privacy-safe advertising attribution, and it is the subject of this episode. Ben and I go deep on several topics related to IPA and digital advertising privacy more broadly, including:
- A high-level conceptual overview of Interoperable Private Attribution;
- The origin of IPA as a joint proposal from Mozilla and Meta;
- The importance of the ease of adoption by advertisers and publishers for any privacy-enhancing technology;
- The degree of buy-in required from consumers to advance privacy-safe advertising solutions;
- And the process of establishing standards within W3C working groups.
For more information about Interoperable Private Attribution and the broader subject of Multi-Party Computation, I suggest this YouTube series developed by Ben that is designed to explain these concepts to non-technical audiences.
Conversation Transcript (Machine Generated)
Eric Seufert: Ben Savage, it is very nice to have you here speaking with me today on the Mobile Dev Memo Podcast. How are you, sir?
Ben Savage: I’m good. Thanks for having me, Eric. It’s good to be here.
ES: Well, I appreciate you taking the time to speak to me on the podcast. We’ve interacted before, but I don’t believe we’ve ever met in person. Is that correct? And forgive me if I’m forgetting.
BS: That’s right. We’ve never met in person. We’ve had a phone call, I think, once before.
ES: That’s right. That’s right. With Graham back in the day. But this podcast, I think, there was a request for it on Threads. At one point someone had asked that we speak on this topic, and in any case, I think it’s a very interesting conversation for us to have, but I don’t think there’s ever been a case where the genesis of a podcast episode could be traced back to a request. So I think it potentially speaks to the success of the podcast or at least the importance of what you’re doing. So I will have just introduced you in the introductory segment of the podcast, but before we dive into the conversation, why don’t you introduce yourself to the audience in your own words.
BS: So I’m Ben Savage. I’m a software engineer. I’ve been working at Meta for over 10 years on a bunch of things, but all in sort of the ad space. For the last four years or so, I’ve been interacting with the W3C, the World Wide Web Consortium, where I represent Meta talking about subjects like ads and privacy. So we’re trying to figure out how to negotiate this transition to this more private web of the future where third-party cookies are gone and all these types of tracking are blocked, and how are we going to do advertising in that world? So that’s the space that I play in.
ES: 10 years is a long time and you seem like a very youthful person. But what were you doing before that?
BS: I was running a startup in the Bay Area that I founded out of college, ran that for about five years, and then I sold it to Meta, back then known as Facebook.
ES: I see. So long tenure in the ad space, deep connectivity to the advertising ecosystem. And you work in or on a project that is known as IPA. Maybe just I have some very pointed questions to ask you, but maybe just very briefly, talk about what IPA is and even just unpack that acronym and elucidate the audience as to what that aims to do?
BS: Sure. So maybe I’ll start with a little bit of context to help explain. So like I was saying a second ago, in the W3C we’re working together to figure out how we’re going to chart this course to a more private future. And specifically there’s this group called the Private Advertising Technology Community Group, which has stakeholders from a whole bunch of different parts of the digital ads ecosystem, from publishers, advertisers, user groups, but also the browsers, the web browsers. And we’re trying to figure out how we can have a future without any tracking and profiling, but where we don’t destroy the ads world, where it can still function. So people can still use free apps and websites. And the approach that we’re all pretty much aligned on is that we need to add new APIs, new functionality to web browsers and mobile operating systems that’s just baked in, that gives you the ability to do ads measurement in a new, standard, private way.
So what we’re disagreeing about, and sort of trying to hash out the details on, is what are the exact details of what that API should look like. So the first proposal for how to do this was from Apple, and then Google had a counter proposal, and then the IPA, or Interoperable Private Attribution, proposal is a co-product: here at Meta, we and Mozilla sort of posed this counter-counter proposal. And now more recently, Apple’s updated their sort of version two proposal, and that’s all great. This is exactly how standards bodies are supposed to work. You get a bunch of smart people together trying to solve a problem, bouncing ideas off of each other, and hopefully with each proposal we get one step closer to consensus, which is important. Eventually, this is like a consensus-driven organization. We have to all find some solution that we can all live with, that we can all be happy together with.
So our proposal, IPA, like I said, stands for Interoperable Private Attribution, and at a high level, the way it works is sort of like this: your web browser or your mobile phone makes up a random number and it never tells anybody. It just lives there on your laptop or on your phone and never leaves. It doesn’t even get shared back with the browser vendor. Then we just add one simple API to the web browser called get encrypted match key. And what that does is it takes this random number, it splits it up into three random pieces and then it encrypts pairs of those things and gives it back to you. And you can’t do much with these encrypted objects that come back. You can’t decrypt them, and they’re different every single time you call the API. So the only thing you can do is you can sort of save it together with other activity data that happened on your website, like what ad did you show somebody or how much did they spend in your online shop? And then you send all this information together to this multi-party computation, or MPC, and it performs attribution, sums things up and gives you the results. So hopefully you can still get aggregate advertising measurement about how effective your ads campaigns were, but without any tracking or profiling, because you can’t use these identifiers for tracking.
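The splitting step Ben describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the IPA specification: the 64-bit key width, the use of XOR to combine the pieces, and the function name `split_match_key` are all hypothetical choices made for this sketch, and the encryption of each pair to its helper is left out entirely.

```python
import secrets

BITS = 64  # assumed match-key width; not stated in the conversation


def split_match_key(match_key):
    """Split a match key into three random pieces and group them in pairs.

    Assumption: pieces are combined with XOR. Each pair would then be
    encrypted for a different helper (encryption is omitted here).
    """
    s1 = secrets.randbits(BITS)
    s2 = secrets.randbits(BITS)
    s3 = match_key ^ s1 ^ s2  # chosen so that s1 ^ s2 ^ s3 == match_key
    # Every pair is missing one piece, so a single pair reveals nothing
    # about the match key.
    return [(s1, s2), (s2, s3), (s3, s1)]


# The device's secret random number; it never leaves the device in the clear.
device_match_key = secrets.randbits(BITS)

pairs = split_match_key(device_match_key)
s1, s2 = pairs[0]
s3 = pairs[1][1]
assert s1 ^ s2 ^ s3 == device_match_key  # all three pieces recover the key
```

Because fresh random pieces are drawn on every call, two calls for the same device return different values, which matches the point that the returned objects are different every time and cannot serve as a stable identifier.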
ES: So you talked about IPA being a very specific, very novel form of essentially using encryption, if I’m understanding correctly. And by no means do I have any expertise in this space, so please correct me if at any point I get something wrong, which I’m bound to do. So you’ve got this IPA, this very sort of novel form of taking the data and splitting it up such that multiple parties have different portions of it that can’t really be united easily to trace the full stack of data back to any individual event that could be tied to any individual user. But you invoke this idea of MPC, right? So secure multi-party computation. Maybe we can take kind of a quick step back and you could talk about what that is. So it feels like your novel solution, which is IPA, sits on top of this fundamental concept of MPC.
Maybe you could talk about MPC first. What is MPC? How does that solve these problems, these and others, right? And what is the general idea behind MPC? And then how does IPA sort of invoke that to solve this very specific ads attribution problem where you want to make sure that in aggregate, the whole idea here is that in aggregate there’s multiple parties that exist in this transaction of showing ads, having ads be clicked, having ads result in purchases, any type of conversion event. And then having all of that be sort of unified in such a way that you can say, look, this was a performant ad campaign, such that we can’t actually draw direct lines to the people that participated in that, but we know that the campaign as its own sort of commercial entity was successful. It seems like this IPA sits on top of this MPC framework. So maybe quickly go into what MPC is and then talk about why IPA is the natural outgrowth of that for ads attribution.
BS: Sounds good. So MPC, or secure multi-party computation, at a high level, it just makes it possible to do any kind of data processing that you want while keeping the input data a secret. So for example, if you have a group of 10 people and you want to use MPC to calculate their average salary while keeping every individual person’s salary a secret, that would be a great example of how you could use MPC. So there’s tons of potential applications of this technology all over the place, everywhere that you have data that you want to keep private or is siloed. So for example, one application is in the healthcare industry where you might want to do confidential DNA sequencing. So there are probably really interesting, valuable research insights that we could gain if we could bring together health data and DNA data. But those are both super sensitive data sets and they can’t be shared.
But with MPC, you could potentially perform that data analysis and understand these correlations and trends without sharing any patient-level information, keeping it all secret. Another application could be for building a public transport network. So if you want to build a good public transportation network, you need some location data. You need to know specifically about the trips people are taking, when they get on, when they get off and at what times. And so you could use MPC to glean all those insights and figure out where you need to add a new rail or bus line to relieve congestion, but not need to actually track everybody’s physical location all over the place. So of course, digital advertising is another great place for MPC to be applied. And this is the idea with IPA. So you have actions on one website where you’re seeing an ad and you have actions on a different website where you’re making a purchase, say. And MPC allows us to sort of join those things together and understand this causal relationship between how many of those purchases are happening because of ads you saw somewhere else without needing to share people’s web browsing activity and keeping all of the individual web browsing activity private.
ES: That’s great. And let me just read that back to you and you can tell me if I’m understanding it correctly and correct as needed. So MPC essentially means that the multi-party qualifier is important, and that means that there’s multiple data sets, there’s multiple parties that have their own sort of first-party data sets that independently aren’t that valuable, but in combination are. And so they submit their data sets to some centralized party, I don’t know if you like the word centralized, but some other party that absorbs them, accepts them, ingests them, whatever, tries to match them, and then only leaks back, only transmits back to all the independent submitters of data, the aggregates. So there’s one party that gets the opportunity to do the matching. They do it in some way that obfuscates the source, and what’s shared back to all the participants, all the people that submitted data, are high-level aggregates that can’t really be disaggregated. Am I understanding that correctly?
BS: Yeah. Let me sort of clarify a little bit more. I didn’t explain much about what the whole multi-party part means. So the way that this works is you keep the input data a secret by what is called secret sharing it. So secret sharing is this really cool idea where you have a number and what you do is you split it up into a couple of randomly distributed pieces where if you add those things all together, you get back to the original. So simple example, let’s say we wanted to do a multi-party computation to compute the sum of our ages, a group of, say, 20 of us. The way that you would secret share your age would be, say you pick a random number between negative a billion and positive a billion. So I pick like negative 256 million, blah, blah, blah, blah, blah, some giant negative number.
And then that’s my first secret share. It’s just this random number. And for my second secret share, you just flip the sign and add 38. I’m 38 years old. So obviously if you only have one of those two numbers, it’s just garbage. It’s just this gigantic random number that means nothing to you. But if you have both of those two pieces and you add them together, they sum to my age. So that’s a pretty cool way of splitting data up into these random pieces where if you only have one of these pieces, you learn nothing. So the really cool way that this works in multi-party computation is you have a couple of helpers, like call ’em helper nodes. And so for the one I was just describing, you would need two. So you pick these two people. And all of us would go to the first person, we tell them our first random number, and we go to the second person, we tell ’em each our second random number, and then independently each of those people would just add up the numbers they were given, and they learn nothing by doing so because they’re literally adding a bunch of random numbers together.
And then when they’re both done adding, they add their two totals and that is the sum of everybody’s age or their secret number that you’re trying to add. It’s like, well, how did that happen? That was kind of magic. But if you think about it, it’s like it doesn’t matter what order you add numbers, you can add numbers together in any order and you get the same total. So if you add sort of column-wise and then row-wise, versus row-wise and then column-wise, it doesn’t matter, you’re getting the same total. That’s just a really, really simple example. But the general idea is you take information and you anonymize it by breaking it into these just randomly distributed numbers where if you’re given this random number, you learn nothing from it. And then they do this independent computation on these random data sets where at the end you actually get back this interesting aggregate that has some value, but as you say, can’t be connected back to a single person.
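The age example above can be written out as a minimal Python sketch. The names `share` and `helper_total` and the sample ages are made up for illustration. One caveat worth stating: real protocols typically do this arithmetic modulo a fixed number so that each share is uniformly distributed; the plain integers here just follow the spoken example.

```python
import secrets

BOUND = 1_000_000_000  # shares drawn from [-BOUND, BOUND], as in the spoken example


def share(value):
    """Split a secret number into two shares that add back up to it."""
    first = secrets.randbelow(2 * BOUND + 1) - BOUND  # a giant random number
    second = -first + value  # "flip the sign and add" the secret
    return first, second


def helper_total(received_shares):
    """Each helper only ever adds up the random-looking numbers it was given."""
    return sum(received_shares)


ages = [38, 39, 25, 52]  # hypothetical participants
all_shares = [share(age) for age in ages]

total_a = helper_total(s[0] for s in all_shares)  # helper 1 sees first shares only
total_b = helper_total(s[1] for s in all_shares)  # helper 2 sees second shares only

# Adding row-wise then column-wise, or the other way round, gives the same total.
assert total_a + total_b == sum(ages)
print(total_a + total_b)
```

Neither helper ever holds both shares of the same person, which is why a single compromised helper learns nothing about any individual age.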
ES: So the multi-party is not really related to the idea that there’s multiple parties sending data into this aggregator. It’s more that the data is broken out into different pieces. Any given piece of data is split up into different pieces that are shared with these helpers. And that’s what invokes the multi-party connotation there.
BS: That’s right. And the really cool thing about this and why this is so exciting is because it lets you build a system that doesn’t have any single point of failure. So we see these data clean rooms and things like this, which involve a single central party receiving all the data and joining it together. And that’s cool, but that type of system is a single point of failure. All the data is together in one place, and if something goes wrong, I don’t know, let’s say it’s compromised by a hacker or compelled by some government agency to hand everything over, they have the keys to the kingdom. And the cool thing with multi-party computation is you can devise a system where if one of these helper parties turns evil or goes rogue or gets compromised or compelled by a government, it doesn’t matter, user privacy is still totally preserved.
ES: Right. Data clean rooms to me have always felt like a, let’s call it an imperfect solution. To me it’s just like, okay, well let’s take all the data that we didn’t want to be joined and join it in one place and have that particular entity be responsible for that join. That doesn’t actually solve any problems. That just shifts them to a third party anyway. To my mind, data clean rooms are not really a progressive solution to this. Just to circle back to the point, so you said you’re 38? Yeah, I’m 39. I described myself the other day as being in my mid thirties and my wife excoriated me for that. And I was like, well, hey, I’m trying to obfuscate my age here. I’m trying to protect my privacy. What are you doing? Why are you blowing my cover? Okay, talk to me a little bit about, so IPA, you mentioned that it’s a joint proposal from Mozilla and Meta. Can you talk to me about how that came about?
BS: Yeah, for sure. So as I mentioned earlier, IPA is like a counter-counter proposal to this Apple and Google proposal that came before it. So I saw those proposals and I was not optimistic that they were trending towards consensus, like we weren’t going to reach a place where all the participants in the W3C come together. So together with some other folks at Meta, we sat down to see if we could come up with a proposal that seemed like it had a chance of reaching consensus. And so after we drafted up this idea, we first brought it to Mozilla to get their feedback because Mozilla is so well respected in the W3C, they’re these real thought leaders and everybody really respects their opinion. So we brought it to them and said, what do you think of this idea? And they looked at it and they’re like, Hey, that’s kind of interesting.
That might have legs. What do you want to do? And we said, yeah, would you like to put your name on it? And they said, maybe let’s spend some time working on it together and see if we can improve it. We said, yeah, totally. So we spent a couple months together with the Mozillians and they radically improved it because they’re super sharp and really, really smart. And so they found a whole bunch of problems and weaknesses with the proposal, but they helped us make it better and they proposed ways of improving it. And eventually after a couple months together, we got it to a point that they said, yeah, this is pretty decent now, we’re willing to put our name to this. And then we proposed it in the W3C and we shared it with everybody and got more feedback. So I’m super grateful to them for engaging in this constructive way to try to move everybody towards consensus.
ES: Got it. And can you talk to me a little bit more about the standards establishment process with the W3C? So first of all, I guess the first question I have is the W3C, what is the applicable domain there? Because my understanding is it’s really just the web browser domain. So you’re talking about the web domain. Does that apply to the app, to the mobile platforms? It feels like if it doesn’t, well, that’s where the ball game is being played. So if this is only the web, and in reality if we’re talking about the web, we’re not talking about the mobile web either because on iOS, every browser has to use WebKit, right? So there’s one sort of browser operator on iOS. So if we’re talking about the W3C’s applicable domain, is that just desktop browser? Just to clarify here, I think that’s right, but is that correct?
BS: That’s a really good question. It’s a little bit of a complicated answer, but I mean there’s work going on in the W3C that might extend a little bit beyond what you would consider the traditional boundaries of the web. So there’s decentralized identifiers, which is sort of interesting and different. And then there’s ActivityPub, which is this protocol that the decentralized federated web is running on. And that’s also kind of like a server-to-server protocol, which is a little outside of what you would maybe typically have considered to be the web. So it definitely works on a variety of things. But yeah, the web, as you sort of understand it as web browsers interacting with websites, is sort of its traditional domain, but I have repeatedly made the point in the W3C in the private advertising group that we can’t build a solution here that only works for websites.
We have to talk about apps and websites. And most people’s experience of the web is not that fragmented. They’re using apps that link to websites and websites that link back to apps and it’s all connected. It’s part of how people experience the internet, and advertising bridges that boundary. Most of the ads that Meta shows are shown in apps, but a lot of them take people to websites. So we definitely need to build a solution that works across all of these things. And I have endeavored to make sure that the charter of the private advertising technology group isn’t super constrained down to just the web, that it’s okay, we can go talk about solutions that are bigger and broader and maybe can continue to work across all those things. And IPA is definitely one of those. This IPA proposal could just as easily be applied to mobile operating systems and could even be interoperable between app and web. So you could see an ad on one and have a conversion event on the other, and they could all get joined together to produce this aggregate attribution report.
ES: Got it. And so IPA – MPC, the whole idea here is that data leaves the device. Please correct me if I’m wrong here. I feel like there’s two kinds of schools of thought with respect to digital privacy. One is that data leaves the device, it goes through some sort of mechanism that makes it private, and then that relays the relevant metadata or details back to the participants that submitted data. And that’s kind of what we’re talking about now. The other approach is that everything stays on the device, it just never leaves the device, right? Everything is computed on the device, like federated learning, everything stays on the device. And maybe there’s some mechanism that updates the model coefficients, those get sent back off the device, but really no relevant detail about any specific transaction or whatever conversion does. Talk to me about the differences between those two methodologies, right? So what are the downsides of entirely on-device solutions to privacy relative to a solution like IPA and what are the benefits?
BS: So I think the line is actually much blurrier than that. You can’t really build a system where no data ever leaves the device. Even in something like federated learning, you have to aggregate these model update vectors together somehow, somewhere. If you’re going to have a report leave and get sent back out, some statistic that you computed, you have to at least aggregate the contributions from a whole bunch of different devices. So necessarily you have to have something leaving the devices. So what leaves the device, I think, is the interesting question, and this is sort of the crux of the difference between Apple’s updated proposal, Google’s proposal, and this sort of IPA proposal: where does the attribution happen? You’re joining together this click and this purchase to do this attribution. Is that joining happening on the device, and then we’re using some system like MPC to just do aggregation? Or are the independent events of the click and the purchase being sent out, and then the attribution happens in the MPC? But it’s worth noting that both Apple’s updated v2 design and our IPA proposal use MPC.
The Apple one would just use it for aggregation, and IPA would use it for attribution and aggregation. So we have to have some data leaving the device. And MPC is a really, really good way to sort of control what happens after it leaves the device. So if you look at the trade-offs between those two designs, there’s a number of them. It’s a really important architectural decision and a whole bunch of things flow from it. So from the privacy standpoint, if you look at what Apple has said in the W3C and the reasons why they’re proposing this on-device attribution, one of the reasons is, well, what happens in the worst case scenario: the MPC falls apart and all the parties that are running it collude with one another to violate privacy? Which, of course, we design the whole thing for that to not happen.
You have a governance system in place and audits and whatever to make sure that doesn’t happen, but assuming it did, what’s the worst case scenario? And I think the answer is it’s pretty bad for both, but it’s slightly worse in the world where you’re doing attribution in MPC. The other one is about explainability. So the Apple engineers have said this is super important to them, that if you’re someone who has an iPhone and you want to know what data is leaving my device exactly, and how is it being used, they want to have some UI somewhere in the settings menu that can show you that information. What they can do if the attribution happens on device is you could say something like, okay, well you clicked this ad on website A and you made a purchase on website B, and an attribution happened and we contributed a value of seven to histogram bucket 159.
That might not be super understandable as the end user, but you can perhaps tell them how they affected the aggregate. It’s a little bit harder to tell that story if you do the attribution off device because you don’t exactly know which bucket got updated in this sort of aggregate histogram that you produce, but you can still probably tell them something. And now that we know this is a super important thing to Apple to be able to tell that story, we’re working on an update of what’s the best we could do in terms of explainability. On the utility side, I have some reasons why I prefer the off-device attribution. So I’m a little bit worried about this winner-take-all way where when a conversion happens, it gets attributed to some ad and then this information, it’s sort of deleted and removed from the phone, it’s gone.
So there was a period of about three to four years where I worked on Facebook’s Audience Network, our third-party ad network. So I got an opportunity to sort of see how this plays out in the open web ecosystem, and I’m really worried that you create this incentive to generate accidental clicks so that low-quality publishers steal credit from higher-quality publishers because the last click or whatever might get the attribution and then nobody else gets it. The interesting thing about IPA is you can run multiple queries. You could say, okay, let me try last click, and you could run a query and then you could say, okay, that’s interesting. What if I try this different attribution heuristic like equal credit, and you can run another query, and you could say, okay, what if I removed this publisher from the mix and then run another query?
So you can sort of do these counterfactuals and use that to try to triangulate the actual impact and effectiveness of various different surfaces where you’re buying ads. And I think that’s maybe also possible with on-device, but you can’t do it the same way. And it’s probably more difficult to do that. You’d have to slice the audience of the web into multiple pieces and run these experiments at the same time. You need to know in advance what you’re going to do. And then it’s a lot of work to set this up. You can’t just run a couple of queries at the end to triangulate things.
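To make the "multiple queries" idea concrete, here is a toy Python sketch that runs different attribution heuristics over the same events. Everything in it is hypothetical: the event tuples, the publisher names, and the `attribute` function are invented for illustration, and the logic runs in the clear, whereas in IPA the equivalent computation would happen inside the MPC with only aggregates coming out.

```python
from collections import defaultdict

# Hypothetical toy events: ad clicks as (user, time, publisher)
# and purchases as (user, time, value).
clicks = [("u1", 1, "pubA"), ("u1", 5, "pubB"), ("u2", 2, "pubA")]
purchases = [("u1", 9, 30.0), ("u2", 4, 10.0)]


def attribute(clicks, purchases, rule, exclude=()):
    """Return total purchase value credited to each publisher under one rule."""
    credit = defaultdict(float)
    for user, when, value in purchases:
        # Clicks by the same user before the purchase, minus excluded publishers.
        prior = [(t, pub) for u, t, pub in clicks
                 if u == user and t < when and pub not in exclude]
        if not prior:
            continue
        if rule == "last_click":
            credit[max(prior)[1]] += value  # most recent click takes all
        elif rule == "equal_credit":
            for _, pub in prior:
                credit[pub] += value / len(prior)  # split evenly
        else:
            raise ValueError(f"unknown rule: {rule}")
    return dict(credit)


# Three "queries" over the same data, as in the conversation.
print(attribute(clicks, purchases, "last_click"))
print(attribute(clicks, purchases, "equal_credit"))
print(attribute(clicks, purchases, "last_click", exclude=("pubB",)))
```

Comparing the three results is the kind of counterfactual triangulation described above: the underlying events are unchanged, and only the question asked of them differs.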
ES: I mean, that’s super important, right? Because I mean that’s essentially, that’s the job. The job of any media buying team is to continuously test those attribution windows, the attribution logic, the attribution methodology. And if that data is totally withheld, well then you can’t do that. Then actually you are just left with whatever the ingrained attribution logic is of the entire mechanism that does it, right? And so you have no agency with respect to how you optimize your own media buying plan. That’s the job. That’s the critically important piece of being able to do sort of optimized media buying, to be able to determine, okay, well how should we think about attribution, and going way beyond just sort of attribution windows or last click versus whatever. It’s all of those things. And it feels like ingesting all of that logic into a mechanism that is controlled potentially by the device operator, that feels like too much control to cede.
You can’t do that. You won’t be able to test the efficacy of any different sort of approach to buying media. You’re just stuck with what the device manufacturer allows you to utilize. That feels very, very sort of restricted. But one thing I want to circle back to, you talked about communicating this to the consumer. That to me is critically important and it’s just sort of manifest in the idea of agency, of my agency as a consumer, as a whatever, smartphone owner, over my data. How do you communicate that? And I guess the question I’d pose to you is how much buy-in for any of these approaches, not IPA, but anything else, how much buy-in is required from consumers in order to advance these solutions, right? So is there an inherent dilemma in approaching consumer privacy with technical solutions given the difficulty in communicating their underlying mechanics?
BS: That’s a great question. I have a couple of things I want to say about this. The first one is I think that we really have to work hard to try to do a much better job than we have in the past to explain to people how does this all work? And it is going to be difficult. We have to explain these complicated concepts like multi-party computation. But I think that we can do that. So I personally have put the time and work into this YouTube channel where I’ve been making these explainer videos to try to demystify multi-party computation.
ES: They’re very, very well done. They are very well produced. Kudos to your production team. It’s a fantastic explainer to the topic. Sorry to interrupt.
BS: No problem. Maybe you can share a link to them at the end. So I think that we can as an industry really put some time and energy into explainers to try to do at least as well as we’ve done on encryption. Most people don’t understand the technical details of how encryption works, and that’s fine. They have a high-level metaphor that works for them. I have my secret information, I put it in the box, I lock the box with a padlock and only the person with the key can open the box. That’s fine. That is a perfectly reasonable metaphor for how encryption works. I think what we need to do is develop similar kinds of metaphors for these other technologies like multi-party computation. So people have a working model without needing to understand all the details. But also with encryption, if you want to know the details, you can go search the web and find a whole bunch of great resources that can really take you down the rabbit hole as deep as you feel like going to learn about Diffie-Hellman and elliptic curve points and whatever you want to know.
So I think we need to create similar types of resources for MPC to let people go as deep as they want to go into understanding the details. So that’s sort of what I was trying to do with these YouTube videos. That’s sort of one answer. But on the other hand, I have another answer, which is, let’s take a metaphor. You don’t have to understand how an airplane’s autopilot works to trust it and to sit on that airplane and take that flight because you sort of have this chain of trust, like you trust in the brand of the airline and in the regulators of the airline industry, that smart technical people who understand this field have reviewed this thing and they’ve certified it to be safe. And so I think that’s a big part of why we’ve been doing this multi-stakeholder process through the W3C where all these people are involved representing a bunch of different stakeholders.
So the Center for Democracy and Technology and Mozilla and a bunch of other folks are there who are specifically analyzing this to say, from the consumer perspective, how are we protecting them? And also we’re engaging with regulators. So just recently we finished this pilot in the Singapore regulatory sandbox for privacy-enhancing technologies, where we did a little pilot of IPA to get them to understand in great detail exactly how does this thing work, what’s the mechanism used to protect privacy, and what’s your opinion about it? And they gave us feedback and analyzed it after we spent a long time going through all the details on a whiteboard, like this one behind me, explaining how the whole thing works. So I hope that at the end, if the W3C collectively produces this thing that’s been reviewed by academics, by stakeholders, and by regulators, people might have trust in that collective product based on all the involved parties.
ES: So the podcast is audio only, but I’ll make clear to the audience that there is a whiteboard behind Ben to which he was referring.
Okay. So we talked about consumer conceptualization of these concepts. I take your point: they don’t need to understand the nitty-gritty, they need to have trust in the broader system. So let me shift then to the ease of adoption by advertisers and publishers, right? Because that’s closer to the actual system than the consumer. So if you think about these systems, and just generally advertising, there’s really three parties. There’s the consumer; they’re important, they’re central to the entire enterprise. There’s the advertisers and the publishers; they’re very important to the enterprise. And then there’s the ad tech layer, call it the ad platforms, the ad networks, whatever, the whole ecosystem. So on the GitHub for the W3C working group, the Chrome team had published some feedback on IPA, which included this statement: “The client side implementation of IPA is simple, making it easier for independent implementations to be interoperable.” So how critical to the success of any privacy-centric attribution tool or framework is the ease of adoption by advertisers and publishers? Because they’re one third of that whole ecosystem, right? There’s three parties: there’s the consumer, there’s the ad platform or network, or whatever the ad tech component is, and then there’s the advertisers and publishers. How easy does this have to be for them in order to make it viable?
BS: Great question. I think it needs to be not too much work to migrate your current systems to this new reality. If it’s really a lot of work, I mean, these are all businesses. They’re busy, they have a lot of priorities. And so if you make it a ton of work and a really, really difficult transition, it’s going to be harder. So I think with IPA, we’ve done an okay job finding a middle ground where it’s different than how things work today, but hopefully not absurdly difficult to adopt. So today, where you would go to a website and you’d get a third-party cookie, and then you would log that third-party cookie together with the information about the ad that was shown or the purchase that was made, you’ve sort of just replaced that one for one with this encrypted identifier that you’re getting. Instead of logging this third-party cookie, you log this encrypted identifier.
So that part can stay the same. And the part that’s different is when you want to do a measurement, when you want to combine this click data and conversion data together and understand attribution. You can’t do it yourself. You have to send that data out to this multi-party consortium that does the computation for you, instead of you being able to run it in-house. But hopefully you can run the same type of queries as you were before. You can slice and dice it. You can take some subset of the data. You can decide what breakdown keys you want to use to look at various drill-downs of that data, and you have that same flexibility. So we hope it’s not terribly complicated. Some of the competing proposals from Apple and Google, they’d require, I think, a little bit more work on the advertiser and publisher side. For example, before you start running a campaign, you kind of need to know in advance exactly what queries you’re going to want to run in the future, where you’d have to sort of preload onto everybody’s devices all the logic for the various queries you want to run in the future.
That’s a lot more work and a little bit more tricky. It’s not impossible, but I think it does make adoption more difficult. So this is one of the rationales for why we think this sort of off-device attribution architecture might make adoption easier.
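To make the flow Ben describes a little more concrete, here is a minimal Python sketch under heavy assumptions. It is not IPA's actual protocol or API: the names `SourceEvent`, `TriggerEvent`, and `run_attribution_query` are hypothetical, a plain string stands in for the encrypted identifier, an ordinary function stands in for the multi-party consortium, and last-touch attribution is assumed purely for illustration. It only shows the shape of the idea: both sides log events against an opaque match key, and the join plus the per-breakdown-key totals are computed somewhere other than the advertiser's or publisher's own systems.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceEvent:
    """An ad click or impression, as logged on the publisher side."""
    match_key: str      # hypothetical stand-in for the encrypted identifier
    timestamp: int
    breakdown_key: str  # e.g. a campaign or creative label chosen at query time


@dataclass(frozen=True)
class TriggerEvent:
    """A conversion (for example a purchase), as logged on the advertiser side."""
    match_key: str
    timestamp: int
    value: int


def run_attribution_query(sources, triggers):
    """Stand-in for the consortium: join on match key, return totals only.

    Each conversion is credited to the most recent source event with the
    same match key that happened at or before it (last-touch, an assumption).
    The caller only ever sees one total per breakdown key, never the join.
    """
    sources_by_key = defaultdict(list)
    for source in sources:
        sources_by_key[source.match_key].append(source)

    totals = defaultdict(int)
    for trigger in triggers:
        candidates = [
            s for s in sources_by_key.get(trigger.match_key, [])
            if s.timestamp <= trigger.timestamp
        ]
        if not candidates:
            continue  # no earlier ad event: conversion stays unattributed
        last_touch = max(candidates, key=lambda s: s.timestamp)
        totals[last_touch.breakdown_key] += trigger.value
    return dict(totals)


if __name__ == "__main__":
    sources = [
        SourceEvent("u1", 10, "campaign_a"),
        SourceEvent("u1", 20, "campaign_b"),
        SourceEvent("u2", 15, "campaign_a"),
    ]
    triggers = [
        TriggerEvent("u1", 30, 50),  # credited to campaign_b (latest touch)
        TriggerEvent("u2", 12, 20),  # happened before any ad event
        TriggerEvent("u3", 40, 10),  # match key never saw an ad
        TriggerEvent("u2", 25, 5),   # credited to campaign_a
    ]
    print(run_attribution_query(sources, triggers))
```

Because the breakdown key is just a field on the logged event and the grouping happens at query time, the choice of drill-down does not have to be fixed before the campaign starts, which is the flexibility Ben contrasts with proposals that need the query logic loaded onto devices in advance.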
ES: Let’s talk about timelines. So you talk about IPA, you’re in this standards approval process. What could potentially be the timeline for this being rolled out to where it’s used by advertisers?
BS: I do not want to overpromise and underdeliver here. Standards move pretty slow. This is a really, really slow-moving part of the world. I have been engaging in the W3C for about four years now, and we’ve made a lot of progress. We’ve reached consensus on a number of points. I think we’ve pretty much reached consensus that there can be some server-side thing. It doesn’t have to all happen just on device. There could be at least some aggregation happening server side. We reached consensus that differential privacy is an acceptable way of dealing with output privacy from such a system. We’ve reached consensus on a number of points, and I feel like we’re getting closer and closer with the proposals, but it could still easily be years before you start seeing something like this in production.
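For readers unfamiliar with the term, differential privacy applied to output privacy roughly means adding calibrated random noise to the aggregate results before they are released. The sketch below uses the textbook Laplace mechanism in plain Python. It is an illustration only: the conversation does not say which noise mechanism or parameters IPA uses, and the names `laplace_noise` and `noisy_totals`, along with the `sensitivity` and `epsilon` values, are assumptions made for the example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential draws with rate 1/scale
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_totals(totals, sensitivity, epsilon, rng=None):
    """Add Laplace noise to each aggregate total before release.

    Assumptions: `sensitivity` is the most that one person can change the
    totals, summed across all breakdown keys, and `epsilon` is the privacy
    budget spent on this release. A smaller epsilon means more noise.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return {key: value + laplace_noise(scale, rng) for key, value in totals.items()}


if __name__ == "__main__":
    exact = {"campaign_a": 1200, "campaign_b": 860}
    # Hypothetical parameters: each person contributes at most 10 in total.
    print(noisy_totals(exact, sensitivity=10, epsilon=1.0))
```

The noise is small relative to totals built from many people but large relative to any one person's contribution, which is why it protects individuals while keeping campaign-level numbers usable.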
ES: Ben Savage, I appreciate you taking the time to chat with me today. We are just up against the clock here. How can people learn more about IPA? How can they learn more about the work that you’re doing with the W3C, and how can they reach you on the internet?
BS: Well, what you can do is you can get involved in the W3C. I think it’s great to have a diversity of voices there, representing all these different stakeholders, participating to make sure we come up with a really good outcome. And if you perhaps work on the advertiser buy side, I think there’s not as much representation in there as I’d love to see from the buy side of things, of actual advertisers with their specific use cases. So maybe consider getting involved. It’s a community group. Everybody can join. It’s open. If you want to learn more about IPA, we have a GitHub repo that has some information, but I also give various talks, and there are recordings in places like YouTube. And if you want to reach me, you can reach me on LinkedIn or on Threads.
ES: Got it. And out of curiosity, because you present these papers, which are very dense technical papers that I read through in preparation for the call, but you present at these conferences that are targeted at the crypto realm. And I’m just curious, how often is it that someone shows up expecting to see a bunch of Bitcoin content and they’re sorely disappointed that they’re hearing about ads attribution?
BS: Oh, you’re talking about the Real World Crypto conference. At that particular conference, there’s no Bitcoin, there’s no financial anything going on there. That’s actual cryptographers doing actual cryptography. I think the word crypto kind of got co-opted by, like, coin things. And I think it’s somewhat frustrating to the actual cryptography community. They’re like, no, this word had a meaning before it meant cryptocurrency. So at that particular conference, there was no one expecting us to talk about Bitcoin.
ES: I see. Okay. Okay. Well, Ben Savage, I appreciate your time. Thank you very much for taking the time out of your day to talk to the Mobile Dev Memo audience. I appreciate the work that you’re doing and I look forward to seeing more from you soon.
BS: Thanks for having me, Eric.
ES: Yep. Take care.