Overview of Differential Privacy Mechanisms
- Traditional differential privacy (DP) uses the Laplace mechanism, adding noise proportional to the L1 sensitivity of the function to protect individual data.
- When releasing multiple outputs (a vector of values), Laplace noise is added independently to each component, with scale set by the L1 sensitivity, i.e., the maximum L1-norm change of the output vector between neighboring datasets (a code sketch follows this list).
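As a concrete illustration (a minimal sketch, not from the source; the dataset and query are assumptions), the scalar Laplace mechanism might look like:

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng=None):
    """Release value + Lap(0, sensitivity/epsilon) noise: epsilon-DP
    when `sensitivity` is the L1 sensitivity of the query."""
    rng = rng or np.random.default_rng()
    return value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Hypothetical example: the mean of n binary responses has sensitivity 1/n.
x = np.array([0, 1, 1, 0, 1])
print(laplace_mechanism(x.mean(), sensitivity=1 / len(x), epsilon=0.5))
```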
Introduction to Approximate Differential Privacy (ε, δ-DP)
- Approximate DP relaxes the strict privacy guarantee by allowing a small probability δ where the privacy guarantee may not hold.
- Formally, for all neighboring datasets X and X' and all output subsets S, Pr[M(X) ∈ S] ≤ e^ε · Pr[M(X') ∈ S] + δ (restated symbolically below).
- When δ=0, this reduces to standard ε-DP.
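Restated symbolically (same definition as above):

```latex
% (epsilon, delta)-differential privacy
\Pr[M(X) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(X') \in S] + \delta
\qquad \text{for all neighboring } X, X' \text{ and all } S \subseteq \mathrm{Range}(M).
```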
Gaussian Mechanism for Approximate DP
- Instead of Laplace noise, Gaussian noise with zero mean and variance proportional to (L2 sensitivity)² · log(1/δ) / ε² is added (a code sketch follows this list).
- L2 sensitivity is the maximum Euclidean (L2) distance between the outputs on neighboring datasets.
- This mechanism achieves (ε, δ)-DP, providing a trade-off between privacy and utility.
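A minimal sketch of the Gaussian mechanism, using one standard calibration (σ = Δ₂ · √(2 ln(1.25/δ)) / ε, valid for ε ≤ 1); the function and variable names are illustrative:

```python
import numpy as np

def gaussian_mechanism(value, l2_sensitivity, epsilon, delta, rng=None):
    """Add N(0, sigma^2) noise with sigma = Delta_2 * sqrt(2 ln(1.25/delta)) / epsilon,
    a standard calibration that yields (epsilon, delta)-DP for epsilon <= 1."""
    rng = rng or np.random.default_rng()
    sigma = l2_sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    return value + rng.normal(loc=0.0, scale=sigma, size=np.shape(value))
```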
Advantages of Gaussian Mechanism
- For a D-dimensional output, the added noise scales with √D for the Gaussian mechanism, compared to linear scaling (D) for the Laplace mechanism (worked out below).
- This results in significantly less noise for high-dimensional outputs, improving utility.
- The logarithmic factor involving δ is typically small, making the noise reduction substantial.
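Concretely, under the assumption that each of the D components has per-component sensitivity 1/n:

```latex
\text{Laplace (per component): } \frac{\Delta_1}{\varepsilon} = \frac{D}{n\varepsilon},
\qquad
\text{Gaussian: } \sigma \approx \frac{\sqrt{D \ln(1/\delta)}}{n\varepsilon}
\quad \text{since } \Delta_2 = \frac{\sqrt{D}}{n}.
```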
Key Properties of Approximate Differential Privacy
1. Post-Processing Invariance
- Any function applied to the output of an (ε, δ)-DP mechanism remains (ε, δ)-DP; post-processing cannot degrade the privacy guarantee (example below).
- This ensures privacy is preserved even after further computations on the released data.
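For instance (a hypothetical illustration; `noisy_mean` stands for the output of any (ε, δ)-DP mechanism), thresholding a released value costs no additional privacy:

```python
def majority_flag(noisy_mean: float) -> bool:
    """Post-processing: any function of a DP output is itself DP,
    with the same (epsilon, delta) and no extra privacy cost."""
    return noisy_mean > 0.5
```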
2. Composition
- When combining K mechanisms each providing (ε, δ)-DP, the overall privacy loss is at most (Kε, Kδ) under basic composition.
- Advanced composition for Gaussian mechanisms improves this to roughly (√K · ε, Kδ + δ'), up to a √log(1/δ') factor, allowing more queries or iterations with less privacy loss (see the sketch after this list).
- This is crucial for iterative algorithms like gradient descent in machine learning.
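A small budget-accounting sketch of both bounds (the advanced bound here is the standard advanced-composition theorem, with a free parameter δ'; treat the exact form as an assumption):

```python
import math

def basic_composition(eps, delta, k):
    """k mechanisms, each (eps, delta)-DP  ->  (k*eps, k*delta)-DP."""
    return k * eps, k * delta

def advanced_composition(eps, delta, k, delta_prime):
    """Roughly sqrt(k)-growth in epsilon, paying an extra delta_prime."""
    eps_total = eps * math.sqrt(2 * k * math.log(1 / delta_prime)) \
                + k * eps * math.expm1(eps)
    return eps_total, k * delta + delta_prime

print(basic_composition(0.1, 1e-6, k=100))                       # (10.0, 1e-4)
print(advanced_composition(0.1, 1e-6, k=100, delta_prime=1e-6))  # eps_total ~ 6.3
```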
Implications for Privacy in Machine Learning
- Approximate DP and Gaussian noise enable private training of models with better utility due to reduced noise.
- The advanced composition property allows many iterations of private computations while controlling cumulative privacy loss.
Summary
- Approximate differential privacy introduces a small relaxation (δ) to enable better utility.
- Gaussian mechanism leverages L2 sensitivity and adds noise with variance tuned to ε and δ.
- Post-processing and composition properties ensure robust privacy guarantees in complex workflows.
- These concepts are foundational for developing privacy-preserving machine learning algorithms.
Transcript
Hello. So far we have seen a way to achieve epsilon-differential privacy in the trusted curator model: we calculate the sensitivity of the function that we want to make private, and we add Laplace noise whose spread (roughly, its variance) depends on the sensitivity of the function we are trying to protect.
Now, before we move ahead and talk about an approximate notion of privacy, I want to point out one thing: the same idea of computing the sensitivity and adding Laplace noise also works when the mechanism releases not just a single value but a bunch of values. What do I mean by that? Suppose you have a dataset X, and the function operating on X maps it not to a single value but to d different values. For example, in the survey example we have been looking at, we collect binary values from n different people; earlier we were interested in the average, but we might also be interested in other functions of those binary variables, say the median, or some quantile values, and so on. The point is that from the same dataset we compute several different functions and release a bunch of values. You can think of this as releasing not a single number but a vector in d dimensions: d different values computed from the same dataset and released to the public. Now, how can we make this private?
The idea is exactly the same. Instead of adding one-dimensional Laplace noise, we add Laplace noise with mean zero (a d-dimensional zero vector) and spread Δ/ε, where the sensitivity Δ is now defined slightly differently; I will call it Δ₁. Remember that f now returns a vector, so we are comparing two vectors: earlier we computed an absolute difference, and now we compute the L1 norm between the two outputs, so Δ₁ = max over neighboring X, X' of ||f(X) − f(X')||₁ = max over neighboring X, X' of Σᵢ₌₁ᵈ |fᵢ(X) − fᵢ(X')|. We then add Laplace noise with spread parameter Δ₁/ε to each of the components. If you want to think of it as a covariance matrix, you can multiply by the identity, but perhaps a better way to say it is this: we add Lap(0, Δ₁/ε) noise to all components, independently of each other. That means for each component we sample a noise value from a Laplace distribution with zero mean and spread Δ₁/ε, where Δ₁ is the L1 sensitivity of the function we are trying to compute. One can show that this is also epsilon-differentially private when you release a vector-valued f. So that is the first point I wanted to make clear: we are not restricted to releasing a single value; we might release multiple values, and if we do, information can leak in different ways, so we have to protect the components together. Naturally, the sensitivity is going to be larger because we are protecting all the different components. Okay, good.
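A sketch of the vector release just described (following the lecture's survey example: n binary responses, d statistics; the per-component 1/n sensitivity is assumed for illustration, as in the lecture's "let's say"):

```python
import numpy as np

def private_vector_release(stats, l1_sensitivity, epsilon, rng=None):
    """Add independent Lap(0, Delta_1/epsilon) noise to every component;
    this is epsilon-DP when Delta_1 is the L1 sensitivity of the vector."""
    rng = rng or np.random.default_rng()
    stats = np.asarray(stats, dtype=float)
    return stats + rng.laplace(scale=l1_sensitivity / epsilon, size=stats.shape)

x = np.array([0, 1, 1, 0, 1, 1, 0, 1])                  # n = 8 binary responses
stats = [x.mean(), np.median(x), np.quantile(x, 0.25)]  # d = 3 statistics
# Assuming each statistic is 1/n-sensitive, Delta_1 = d/n.
print(private_vector_release(stats, l1_sensitivity=3 / len(x), epsilon=1.0))
```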
Now let us talk about a slightly different variant of differential privacy, called approximate differential privacy (DP for differential privacy). Let me put down the definition and then we will talk about what it means. As usual, we have a mechanism M; it takes n values and maps them to some output, which could be zero or one, or a real number, depending on the problem at hand. We say M is (ε, δ)-approximately differentially private (sometimes the word "approximately" is dropped and we just say (ε, δ)-differentially private, which means we are talking about approximate differential privacy) if for all neighboring X and X' in Xⁿ and all subsets S of the output space Y, the following criterion is satisfied: Pr[M(X) ∈ S] ≤ e^ε · Pr[M(X') ∈ S] + δ. Up to that last term it is the same as before, but because it is an approximate notion, we add the +δ. So that is the definition: given an ε and a δ, a mechanism M is (ε, δ)-differentially private if for all neighboring datasets, the probability that M(X) lands in any output subset S is at most e^ε times the probability that the mechanism lands in the same subset on the neighboring dataset, plus δ. Of course, if δ = 0 this is exactly the definition of differential privacy that we already know. The δ can be thought of as giving us some leeway: there might be some low-probability events, some subsets S, where the epsilon-differential-privacy property does not hold, and we are saying that as long as that failure has probability at most δ, we are still safe. In other words, we are no longer enforcing the very strict requirement that no matter what the output subset S is, the ratio must be at most e^ε; we allow some legal room of δ for some subsets. That is why it is called approximate differential privacy: one can show that if this property holds, then you achieve epsilon-differential privacy for most subsets S, that is, with very high probability, so to speak.
Okay, so two questions. First, how do we achieve this? One obvious way would be to use the Laplace mechanism itself, because the Laplace mechanism satisfies the stronger notion of epsilon-differential privacy, which means that with δ = 0 you get (ε, 0)-differential privacy. But the whole idea is that, because we are relaxing the definition a little bit, maybe there is a different way to achieve it, and by achieving it that different way, maybe we gain something, say in terms of utility. That is the bigger question, the bigger hope that we have, and the answer is yes. Again, we will not go into the proofs in this course, but I would at least like to tell you how to achieve it and what that essentially gives us.
To achieve this, we use what I am going to call the Gaussian mechanism. Instead of the Laplace mechanism we can use a slightly relaxed mechanism, the Gaussian mechanism. How exactly? Again, I will give you the mechanism itself; we will not prove in this course that it is (ε, δ)-DP, although it is not too hard to prove. We simply add Gaussian noise instead of Laplace noise to the function we are trying to compute. Gaussian noise is typically denoted N, for the normal distribution (same as the Gaussian distribution), with mean zero and variance of the order ln(1/δ) · Δ² / ε². It is a slightly different noise that you are adding: if you are familiar with the Laplace PDF, the Gaussian is a more common, well-known PDF, and it falls off faster than the Laplace. So what are we doing here? If I give you a specific ε and δ, the noise you should add is zero-mean, and δ plays a role in the variance. Note there are two deltas around: δ is the one from (ε, δ)-differential privacy, and the other quantity is the sensitivity, which I will write as Δ to avoid confusion, so the variance is of the order ln(1/δ) · Δ² / ε². One can show that if you do this, you achieve (ε, δ)-differential privacy. Now, just like the Laplace case with multiple outputs, where we added Laplace noise to each component separately and the sensitivity was the L1 sensitivity (the L1-norm difference between the outputs on neighboring datasets), here, for high-dimensional outputs, the noise we add is zero-mean with variance of the order ln(1/δ) · Δ₂² / ε², where Δ₂ is the L2 sensitivity: instead of the L1 norm we look at the L2 norm. Formally, Δ₂ = max over neighboring X, X' of ||f(X) − f(X')||₂, and the L2 norm is √(Σᵢ₌₁ᵈ (fᵢ(X) − fᵢ(X'))²). This is the standard Euclidean norm, the way we measure distance between any two points in Euclidean geometry. So: L2 sensitivity squared, times ln(1/δ), divided by ε², is the noise variance you should add to achieve (ε, δ)-differential privacy. Now, why are we doing this? What do we gain by adding Gaussian noise?
Of course, what we lose is the strong notion of epsilon-differential privacy; we only have an approximate notion. There is some set S in the output space, of some small probability, for which our epsilon-differential-privacy guarantee may not hold; that is what the δ is allowing. There is some low-probability event which, if it happens, we cannot guarantee against; perhaps, by looking at the output when that event occurs, one could figure out whether X or X' generated it, because the ratio of probabilities is larger there. But such an output occurs with very, very low probability, and we are okay with that: most of the time, if you draw noise from a Gaussian distribution, you will have strict epsilon-differential privacy, and only for very rare events will you not. Nevertheless, we have lost something: if the rare event happens, we do not necessarily have epsilon-differential privacy. But what have we gained? We should have gained something, otherwise we would not do this. What we gain, and this should be easy to guess by now, is utility, and the way we will describe that is as follows.
Think about the sensitivity itself. In the Laplace case, to give an example, we were adding Lap(0, Δ₁/ε) noise. Suppose each component is 1/n-sensitive: you have n binary variables, so for neighboring datasets something like a mean cannot change by more than 1/n. If each of the d components has that 1/n kind of sensitivity, then the L1 norm adds up the maximum difference over the components: there are d different outputs, each 1/n-sensitive, so Δ₁ = Σᵢ₌₁ᵈ 1/n = d/n, and the total noise we end up adding is of the order d/(nε). That is the Laplace case. In the Gaussian case we gain something: doing the analogous calculation, the noise we end up adding is N(0, σ²) with σ of the order √(d · ln(1/δ))/(nε); here σ, the standard deviation, corresponds roughly to the spread. The key thing to note is that the nε in the denominator is the same in both cases, so we compare the amount of noise by looking at the numerator. For Laplace the spread is of order d: if you release d components to the public, the noise adds up linearly. For Gaussian it is only √d; of course there is an extra √(ln(1/δ)) factor you have to suffer, but that is just a logarithm. (Let me be consistent here: when I say log I typically mean ln, the logarithm to base e, but the base only changes a constant and the order remains the same.) The important thing is the √d: if we compare on the same scale, whether as variance or as standard deviation, the Laplace mechanism adds quadratically more noise than the Gaussian mechanism in terms of the number of components, the number of outputs you release. Instead of d it is √d; well, not exactly √d, slightly more, and that slight extra is the √(ln(1/δ)), but because it is a logarithm, even if δ is small, ln(1/δ) does not get too large. Roughly, you can think of it as some constant times √d versus d. So when the number of outputs d is large, you have to add quadratically more noise to maintain epsilon-differential privacy, but you can get away with much less noise if you are happy to settle for (ε, δ)-differential privacy. And because you add less noise in the Gaussian mechanism, you corrupt the output less before giving it to the user: the amount of corruption is the noise, so the actual value you really want to give the user, but cannot because of privacy, gets corrupted less, and you have higher utility. The Gaussian noise helps you achieve higher utility because the amount of noise you add is lower. One could work out the exact utility, which I am not doing here; the point I want to make is that with Gaussian noise you end up adding less noise, and the fundamental reason this works is the difference between L1 and L2 ways of measuring things. But let us not worry about that too much in this course. The main takeaway is that we are letting go of some amount of privacy for certain rare outputs (that is the δ part), and once you are okay with that, what you gain is that for everything else, where you do have epsilon-differential privacy, your utility is quadratically better than what you would have gotten had you just blindly applied the Laplace mechanism. The Laplace mechanism is (ε, 0)-differentially private, but if you allow a little bit of leeway, not caring about privacy for rare events, then your utility becomes quadratically better. That is the first advantage of approximate differential privacy.
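A quick numerical check of this comparison (illustrative numbers, assuming each of the d components is 1/n-sensitive):

```python
import math

n, d, eps, delta = 1000, 100, 1.0, 1e-5

laplace_scale = d / (n * eps)                                    # order d/(n*eps)
gaussian_sigma = math.sqrt(d * math.log(1 / delta)) / (n * eps)  # order sqrt(d ln(1/delta))/(n*eps)

print(laplace_scale)   # 0.1
print(gaussian_sigma)  # ~0.034: the d-versus-sqrt(d) gap in action
```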
We will also see some very interesting properties of approximate-differential-privacy mechanisms, which is why this notion is quite popular. In future videos we will see how to use these properties to make machine learning algorithms private as well; the reason we are talking about this is that eventually we will get to machine learning algorithms, make them private, and for that we will use the properties I am going to talk about now. So what are these properties? Let us see them. These are properties of ADP, approximate differential privacy; there are several, and we will talk about the important ones that are pretty much needed for the machine learning context later on. The first property is called post-processing.
What does this mean? Say we have a mechanism M that is (ε, δ)-differentially private (I am not saying "approximate" every time; the δ indicates that it is approximate). This means there is some input dataset X, I do some computation on it, I release something via the mechanism M, and I have added Gaussian noise appropriately so that M achieves (ε, δ)-differential privacy. Now the world, or an adversary sitting in the world, looks at the value that M releases, and we are guaranteed that M has the nice property that, just by looking at the output of M, one cannot figure out whether X or X' generated it, up to the δ probability. If I now do some post-processing on the output that M releases, it should not suddenly become less private. Say you added noise and gave me a vector, and I know this vector has the nice (ε, δ)-differential-privacy property; now I feed this vector into some other function, and it is post-processed to become something else. Merely by this post-processing I should not lose any privacy guarantee, which means that if there is some other function f, then the composition f ∘ M (applying f after I apply M) should also be (ε, δ)-differentially private. This is a natural thing, but it is important to state it, because it must not be the case that I am protecting only what I release but not functions of it. This property says that under approximate differential privacy you are protecting not only what you release but all functions of it as well: you cannot, say, take the vector I give you, add up its values, and suddenly learn more information about the input. That cannot happen. That is a good property to have, and it is indeed satisfied by (ε, δ)-differential privacy.
The second property is another interesting one, and it will become very critical later on for privacy in machine learning applications, especially when training with gradient descent: composition. There are two different variants of it, a basic variant and an advanced variant; we will look at the basic variant and I will make a note about the advanced one. Say we have K different mechanisms M₁, M₂, …, M_K, all (ε, δ)-differentially private: K mechanisms that operate on the data, each releasing its own output, and all of them (ε, δ)-differentially private. Essentially, you are leaking information, up to (ε, δ)-differential privacy, through each of these mechanisms. Now, if a user has access to the outputs of all K mechanisms, what is the combined privacy guarantee on the dataset? Picture it this way: I have a dataset, I give it to K different parties, each does its own thing and releases an (ε, δ)-differentially private quantity to the world, and there is an adversary sitting in the world with access to all K outputs from the K parties, each of which is (ε, δ)-differentially private. How private is my dataset now? Because you are leaking (ε, δ) worth of information from each of these mechanisms, it cannot be the case that the mechanism that puts all K outputs together has leaked only (ε, δ). The interesting thing is that the basic composition theorem says the combined mechanism, i.e., an adversary with access to the outputs of all these mechanisms, still has something like (Kε, Kδ)-differential privacy. This means privacy has become worse by a factor of K, which is a natural, straightforward thing to expect: each mechanism leaks something like ε, and there are K such mechanisms, so it is natural that the total leak is K · ε. Similarly, each one is δ-approximately differentially private, so there may be K different output subsets, each of probability δ, where you are failing; putting everything together, a weak union-bound argument says there are K portions of the output space where you are perhaps not epsilon-differentially private, each of probability δ, so you lose Kδ: the approximation becomes worse as well. That is the basic result, and for the curious participant watching this video, you can try to prove it yourself: assuming each of M₁, …, M_K is (ε, δ)-differentially private, it is actually not too hard to prove that the combined mechanism releasing all the outputs together is (Kε, Kδ)-differentially private.
There is an advanced version. Taking advantage of the fact that the noise is drawn from a Gaussian distribution (and this is important), one can prove a complicated but interesting result: the combined mechanism is actually something like (ε · √(2K ln(1/δ′)) + Kε(e^ε − 1), Kδ + δ′)-differentially private. It is a mouthful to read out, but the idea is that you get a better combined ε, a better combined privacy and approximation, if you do a more careful analysis. Just to point out the contrast: basic composition gave Kε, whereas this is ε times something like √K; of course there is another factor of √(ln(1/δ′)), and you can optimize over δ′, and so on, but the interesting thing is that the K has come down to √K. And why is this important? We will talk about this later when we discuss private machine learning.
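Written out cleanly (the standard advanced-composition bound; δ′ > 0 is a free parameter one can optimize):

```latex
\Bigl(\;\varepsilon\sqrt{2K\ln(1/\delta')} \;+\; K\varepsilon\,(e^{\varepsilon}-1),
\;\; K\delta + \delta' \;\Bigr)\text{-differential privacy}
```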
To give an example: think of gradient descent, which is a standard workhorse algorithm for machine learning, and think of releasing the gradients at every step while performing gradient descent. At every step you leak some information, and after K gradient steps you would have lost Kε in the usual way, say if you are adding Laplace noise. But if you add Gaussian noise, then what this result says is that you lose only something like √K · ε, which means you can stay private for more iterations, and the more iterations, the better the quality of the solution in a machine learning context. We will come back to this when we talk about privacy in machine learning, but just to give you the idea: think of this as a process where you keep releasing different outputs and you want to preserve privacy in a combined fashion. Because you want to release many outputs, that is, run the algorithm for many rounds (say, gradient descent), you want to make sure you do not lose too much at every round, and with Gaussian noise you can actually achieve √K · ε, which you cannot do with the Laplace mechanism, by the way: you have to use the Gaussian mechanism and settle for approximate differential privacy. But because you have settled for approximate differential privacy, you gain a quadratic factor in the privacy cost: √K · ε as opposed to Kε, which is the basic bound. Both bounds are true for the Gaussian mechanism (the basic one is easy to prove, the advanced one is hard to prove, but both are true), whereas only the basic one is true for the Laplace mechanism; the advanced one is simply not. So if we want better privacy over a longer period of time, we have to settle for the Gaussian mechanism, that is, for approximate differential privacy.
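A sketch of this idea for a single gradient step (this is the flavor of DP-SGD; the clipping bound, noise scale, and learning rate here are assumptions, not the lecture's exact recipe):

```python
import numpy as np

def noisy_gradient_step(w, grad, clip_norm, sigma, lr, rng=None):
    """One private step: clip the gradient to bound its L2 sensitivity,
    then add Gaussian noise before updating the weights. Over K steps,
    advanced composition charges roughly sqrt(K)*eps instead of K*eps."""
    rng = rng or np.random.default_rng()
    g = grad * min(1.0, clip_norm / (np.linalg.norm(grad) + 1e-12))  # bound L2 norm
    g = g + rng.normal(scale=sigma * clip_norm, size=g.shape)        # Gaussian noise
    return w - lr * g
```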
So, with that, we have concluded, broadly, the two interesting aspects of privacy: one is epsilon-differential privacy, and the other is what we just saw, approximate (ε, δ)-differential privacy. We will see applications of these in machine learning as we go along. The next thing we will see is how to apply privacy in a slightly different context, especially a buyer-seller kind of context; we will talk about that, and after that we will start talking about privacy in machine learning using all the techniques we have learned here. Thank you.