Overview of Differential Privacy Mechanisms
- Traditional differential privacy (DP) uses the Laplace mechanism, adding noise proportional to the L1 sensitivity of the function to protect individual data.
- When releasing multiple outputs (a vector of values), Laplace noise is added independently to each component, with scale set by the L1 sensitivity, i.e., the maximum L1-norm change of the output vector between neighboring datasets (a code sketch follows this list).
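As a concrete illustration (a minimal sketch, not from the source; the dataset and query are assumptions), the scalar Laplace mechanism might look like:

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng=None):
    """Release value + Lap(0, sensitivity/epsilon) noise: epsilon-DP
    when `sensitivity` is the L1 sensitivity of the query."""
    rng = rng or np.random.default_rng()
    return value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Hypothetical example: the mean of n binary responses has sensitivity 1/n.
x = np.array([0, 1, 1, 0, 1])
print(laplace_mechanism(x.mean(), sensitivity=1 / len(x), epsilon=0.5))
```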
Introduction to Approximate Differential Privacy (ε, δ-DP)
- Approximate DP relaxes the strict privacy guarantee by allowing a small probability δ where the privacy guarantee may not hold.
- Formally, for all neighboring datasets X and X' and all output subsets S, Pr[M(X) ∈ S] ≤ e^ε · Pr[M(X') ∈ S] + δ (restated symbolically below).
- When δ=0, this reduces to standard ε-DP.
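Restated symbolically (same definition as above):

```latex
% (epsilon, delta)-differential privacy
\Pr[M(X) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(X') \in S] + \delta
\qquad \text{for all neighboring } X, X' \text{ and all } S \subseteq \mathrm{Range}(M).
```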
Gaussian Mechanism for Approximate DP
- Instead of Laplace noise, Gaussian noise with zero mean and variance proportional to (L2 sensitivity)² · log(1/δ) / ε² is added (a code sketch follows this list).
- L2 sensitivity is the maximum Euclidean (L2) distance between the outputs on neighboring datasets.
- This mechanism achieves (ε, δ)-DP, providing a trade-off between privacy and utility.
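A minimal sketch of the Gaussian mechanism, using one standard calibration (σ = Δ₂ · √(2 ln(1.25/δ)) / ε, valid for ε ≤ 1); the function and variable names are illustrative:

```python
import numpy as np

def gaussian_mechanism(value, l2_sensitivity, epsilon, delta, rng=None):
    """Add N(0, sigma^2) noise with sigma = Delta_2 * sqrt(2 ln(1.25/delta)) / epsilon,
    a standard calibration that yields (epsilon, delta)-DP for epsilon <= 1."""
    rng = rng or np.random.default_rng()
    sigma = l2_sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    return value + rng.normal(loc=0.0, scale=sigma, size=np.shape(value))
```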
Advantages of Gaussian Mechanism
- For a D-dimensional output, the added noise scales with √D for the Gaussian mechanism, compared to linear scaling (D) for the Laplace mechanism (worked out below).
- This results in significantly less noise for high-dimensional outputs, improving utility.
- The logarithmic factor involving δ is typically small, making the noise reduction substantial.
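Concretely, under the assumption that each of the D components has per-component sensitivity 1/n:

```latex
\text{Laplace (per component): } \frac{\Delta_1}{\varepsilon} = \frac{D}{n\varepsilon},
\qquad
\text{Gaussian: } \sigma \approx \frac{\sqrt{D \ln(1/\delta)}}{n\varepsilon}
\quad \text{since } \Delta_2 = \frac{\sqrt{D}}{n}.
```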
Key Properties of Approximate Differential Privacy
1. Post-Processing Invariance
- Any function applied to the output of an (ε, δ)-DP mechanism remains (ε, δ)-DP; post-processing cannot degrade the privacy guarantee (example below).
- This ensures privacy is preserved even after further computations on the released data.
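For instance (a hypothetical illustration; `noisy_mean` stands for the output of any (ε, δ)-DP mechanism), thresholding a released value costs no additional privacy:

```python
def majority_flag(noisy_mean: float) -> bool:
    """Post-processing: any function of a DP output is itself DP,
    with the same (epsilon, delta) and no extra privacy cost."""
    return noisy_mean > 0.5
```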
2. Composition
- When combining K mechanisms each providing (ε, δ)-DP, the overall privacy loss is at most (Kε, Kδ) under basic composition.
- Advanced composition for Gaussian mechanisms improves this to roughly (√K · ε, Kδ + δ'), up to a √log(1/δ') factor, allowing more queries or iterations with less privacy loss (see the sketch after this list).
- This is crucial for iterative algorithms like gradient descent in machine learning.
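A small budget-accounting sketch of both bounds (the advanced bound here is the standard advanced-composition theorem, with a free parameter δ'; treat the exact form as an assumption):

```python
import math

def basic_composition(eps, delta, k):
    """k mechanisms, each (eps, delta)-DP  ->  (k*eps, k*delta)-DP."""
    return k * eps, k * delta

def advanced_composition(eps, delta, k, delta_prime):
    """Roughly sqrt(k)-growth in epsilon, paying an extra delta_prime."""
    eps_total = eps * math.sqrt(2 * k * math.log(1 / delta_prime)) \
                + k * eps * math.expm1(eps)
    return eps_total, k * delta + delta_prime

print(basic_composition(0.1, 1e-6, k=100))                       # (10.0, 1e-4)
print(advanced_composition(0.1, 1e-6, k=100, delta_prime=1e-6))  # eps_total ~ 6.3
```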
Implications for Privacy in Machine Learning
- Approximate DP and Gaussian noise enable private training of models with better utility due to reduced noise.
- The advanced composition property allows many iterations of private computations while controlling cumulative privacy loss.
Summary
- Approximate differential privacy introduces a small relaxation (δ) to enable better utility.
- Gaussian mechanism leverages L2 sensitivity and adds noise with variance tuned to ε and δ.
- Post-processing and composition properties ensure robust privacy guarantees in complex workflows.
- These concepts are foundational for developing privacy-preserving machine learning algorithms.
Transcript
Hello. So far we have seen a way to achieve epsilon-differential privacy in the trusted curator model: we calculate the sensitivity of the function that we want to make private, and we add Laplace noise whose spread (roughly, its variance) depends on the sensitivity of the function we are trying to protect.
Now, before we move ahead and talk about an approximate notion of privacy, I want to point out one thing: the same idea of computing the sensitivity and adding Laplace noise also works when the mechanism releases not just a single value but a bunch of values. What do I mean by that? Suppose you have a dataset X, and the function operating on X maps it not to a single value but to d different values. For example, in the survey example we have been looking at, we collect binary values from n different people; earlier we were interested in the average, but we might also be interested in other functions of those binary variables, say the median, or some quantile values, and so on. The point is that from the same dataset we compute several different functions and release a bunch of values. You can think of this as releasing not a single number but a vector in d dimensions: d different values computed from the same dataset and released to the public. Now, how can we make this private?
The idea is exactly the same. Instead of adding one-dimensional Laplace noise, we add Laplace noise with mean zero (a d-dimensional zero vector) and spread Δ/ε, where the sensitivity Δ is now defined slightly differently; I will call it Δ₁. Remember that f now returns a vector, so we are comparing two vectors: earlier we computed an absolute difference, and now we compute the L1 norm between the two outputs, so Δ₁ = max over neighboring X, X' of ||f(X) − f(X')||₁ = max over neighboring X, X' of Σᵢ₌₁ᵈ |fᵢ(X) − fᵢ(X')|. We then add Laplace noise with spread parameter Δ₁/ε to each of the components. If you want to think of it as a covariance matrix, you can multiply by the identity, but perhaps a better way to say it is this: we add Lap(0, Δ₁/ε) noise to all components, independently of each other. That means for each component we sample a noise value from a Laplace distribution with zero mean and spread Δ₁/ε, where Δ₁ is the L1 sensitivity of the function we are trying to compute. One can show that this is also epsilon-differentially private when you release a vector-valued f. So that is the first point I wanted to make clear: we are not restricted to releasing a single value; we might release multiple values, and if we do, information can leak in different ways, so we have to protect the components together. Naturally, the sensitivity is going to be larger because we are protecting all the different components. Okay, good.
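A sketch of the vector release just described (following the lecture's survey example: n binary responses, d statistics; the per-component 1/n sensitivity is assumed for illustration, as in the lecture's "let's say"):

```python
import numpy as np

def private_vector_release(stats, l1_sensitivity, epsilon, rng=None):
    """Add independent Lap(0, Delta_1/epsilon) noise to every component;
    this is epsilon-DP when Delta_1 is the L1 sensitivity of the vector."""
    rng = rng or np.random.default_rng()
    stats = np.asarray(stats, dtype=float)
    return stats + rng.laplace(scale=l1_sensitivity / epsilon, size=stats.shape)

x = np.array([0, 1, 1, 0, 1, 1, 0, 1])                  # n = 8 binary responses
stats = [x.mean(), np.median(x), np.quantile(x, 0.25)]  # d = 3 statistics
# Assuming each statistic is 1/n-sensitive, Delta_1 = d/n.
print(private_vector_release(stats, l1_sensitivity=3 / len(x), epsilon=1.0))
```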
Now let us talk about a slightly different variant of differential privacy, called approximate differential privacy (DP for differential privacy). Let me put down the definition and then we will talk about what it means. As usual, we have a mechanism M; it takes n values and maps them to some output, which could be zero or one, or a real number, depending on the problem at hand. We say M is (ε, δ)-approximately differentially private (sometimes the word "approximately" is dropped and we just say (ε, δ)-differentially private, which means we are talking about approximate differential privacy) if for all neighboring X and X' in Xⁿ and all subsets S of the output space Y, the following criterion is satisfied: Pr[M(X) ∈ S] ≤ e^ε · Pr[M(X') ∈ S] + δ. Up to that last term it is the same as before, but because it is an approximate notion, we add the +δ. So that is the definition: given an ε and a δ, a mechanism M is (ε, δ)-differentially private if for all neighboring datasets, the probability that M(X) lands in any output subset S is at most e^ε times the probability that the mechanism lands in the same subset on the neighboring dataset, plus δ. Of course, if δ = 0 this is exactly the definition of differential privacy that we already know. The δ can be thought of as giving us some leeway: there might be some low-probability events, some subsets S, where the epsilon-differential-privacy property does not hold, and we are saying that as long as that failure has probability at most δ, we are still safe. In other words, we are no longer enforcing the very strict requirement that no matter what the output subset S is, the ratio must be at most e^ε; we allow some legal room of δ for some subsets. That is why it is called approximate differential privacy: one can show that if this property holds, then you achieve epsilon-differential privacy for most subsets S, that is, with very high probability, so to speak.
Okay, so two questions. First, how do we achieve this? One obvious way would be to use the Laplace mechanism itself, because the Laplace mechanism satisfies the stronger notion of epsilon-differential privacy, which means that with δ = 0 you get (ε, 0)-differential privacy. But the whole idea is that, because we are relaxing the definition a little bit, maybe there is a different way to achieve it, and by achieving it that different way, maybe we gain something, say in terms of utility. That is the bigger question, the bigger hope that we have, and the answer is yes. Again, we will not go into the proofs in this course, but I would at least like to tell you how to achieve it and what that essentially gives us.
To achieve this, we use what I am going to call the Gaussian mechanism. Instead of the Laplace mechanism we can use a slightly relaxed mechanism, the Gaussian mechanism. How exactly? Again, I will give you the mechanism itself; we will not prove in this course that it is (ε, δ)-DP, although it is not too hard to prove. We simply add Gaussian noise instead of Laplace noise to the function we are trying to compute. Gaussian noise is typically denoted N, for the normal distribution (same as the Gaussian distribution), with mean zero and variance of the order ln(1/δ) · Δ² / ε². It is a slightly different noise that you are adding: if you are familiar with the Laplace PDF, the Gaussian is a more common, well-known PDF, and it falls off faster than the Laplace. So what are we doing here? If I give you a specific ε and δ, the noise you should add is zero-mean, and δ plays a role in the variance. Note there are two deltas around: δ is the one from (ε, δ)-differential privacy, and the other quantity is the sensitivity, which I will write as Δ to avoid confusion, so the variance is of the order ln(1/δ) · Δ² / ε². One can show that if you do this, you achieve (ε, δ)-differential privacy. Now, just like the Laplace case with multiple outputs, where we added Laplace noise to each component separately and the sensitivity was the L1 sensitivity (the L1-norm difference between the outputs on neighboring datasets), here, for high-dimensional outputs, the noise we add is zero-mean with variance of the order ln(1/δ) · Δ₂² / ε², where Δ₂ is the L2 sensitivity: instead of the L1 norm we look at the L2 norm. Formally, Δ₂ = max over neighboring X, X' of ||f(X) − f(X')||₂, and the L2 norm is √(Σᵢ₌₁ᵈ (fᵢ(X) − fᵢ(X'))²). This is the standard Euclidean norm, the way we measure distance between any two points in Euclidean geometry. So: L2 sensitivity squared, times ln(1/δ), divided by ε², is the noise variance you should add to achieve (ε, δ)-differential privacy. Now, why are we doing this? What do we gain by adding Gaussian noise?
Of course, what we lose is the strong notion of epsilon-differential privacy; we only have an approximate notion. There is some set S in the output space, of some small probability, for which our epsilon-differential-privacy guarantee may not hold; that is what the δ is allowing. There is some low-probability event which, if it happens, we cannot guarantee against; perhaps, by looking at the output when that event occurs, one could figure out whether X or X' generated it, because the ratio of probabilities is larger there. But such an output occurs with very, very low probability, and we are okay with that: most of the time, if you draw noise from a Gaussian distribution, you will have strict epsilon-differential privacy, and only for very rare events will you not. Nevertheless, we have lost something: if the rare event happens, we do not necessarily have epsilon-differential privacy. But what have we gained? We should have gained something, otherwise we would not do this. What we gain, and this should be easy to guess by now, is utility, and the way we will describe that is as follows.
Think about the sensitivity itself. In the Laplace case, to give an example, we were adding Lap(0, Δ₁/ε) noise. Suppose each component is 1/n-sensitive: you have n binary variables, so for neighboring datasets something like a mean cannot change by more than 1/n. If each of the d components has that 1/n kind of sensitivity, then the L1 norm adds up the maximum difference over the components: there are d different outputs, each 1/n-sensitive, so Δ₁ = Σᵢ₌₁ᵈ 1/n = d/n, and the total noise we end up adding is of the order d/(nε). That is the Laplace case. In the Gaussian case we gain something: doing the analogous calculation, the noise we end up adding is N(0, σ²) with σ of the order √(d · ln(1/δ))/(nε); here σ, the standard deviation, corresponds roughly to the spread. The key thing to note is that the nε in the denominator is the same in both cases, so we compare the amount of noise by looking at the numerator. For Laplace the spread is of order d: if you release d components to the public, the noise adds up linearly. For Gaussian it is only √d; of course there is an extra √(ln(1/δ)) factor you have to suffer, but that is just a logarithm. (Let me be consistent here: when I say log I typically mean ln, the logarithm to base e, but the base only changes a constant and the order remains the same.) The important thing is the √d: if we compare on the same scale, whether as variance or as standard deviation, the Laplace mechanism adds quadratically more noise than the Gaussian mechanism in terms of the number of components, the number of outputs you release. Instead of d it is √d; well, not exactly √d, slightly more, and that slight extra is the √(ln(1/δ)), but because it is a logarithm, even if δ is small, ln(1/δ) does not get too large. Roughly, you can think of it as some constant times √d versus d. So when the number of outputs d is large, you have to add quadratically more noise to maintain epsilon-differential privacy, but you can get away with much less noise if you are happy to settle for (ε, δ)-differential privacy. And because you add less noise in the Gaussian mechanism, you corrupt the output less before giving it to the user: the amount of corruption is the noise, so the actual value you really want to give the user, but cannot because of privacy, gets corrupted less, and you have higher utility. The Gaussian noise helps you achieve higher utility because the amount of noise you add is lower. One could work out the exact utility, which I am not doing here; the point I want to make is that with Gaussian noise you end up adding less noise, and the fundamental reason this works is the difference between L1 and L2 ways of measuring things. But let us not worry about that too much in this course. The main takeaway is that we are letting go of some amount of privacy for certain rare outputs (that is the δ part), and once you are okay with that, what you gain is that for everything else, where you do have epsilon-differential privacy, your utility is quadratically better than what you would have gotten had you just blindly applied the Laplace mechanism. The Laplace mechanism is (ε, 0)-differentially private, but if you allow a little bit of leeway, not caring about privacy for rare events, then your utility becomes quadratically better. That is the first advantage of approximate differential privacy.
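A quick numerical check of this comparison (illustrative numbers, assuming each of the d components is 1/n-sensitive):

```python
import math

n, d, eps, delta = 1000, 100, 1.0, 1e-5

laplace_scale = d / (n * eps)                                    # order d/(n*eps)
gaussian_sigma = math.sqrt(d * math.log(1 / delta)) / (n * eps)  # order sqrt(d ln(1/delta))/(n*eps)

print(laplace_scale)   # 0.1
print(gaussian_sigma)  # ~0.034: the d-versus-sqrt(d) gap in action
```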
We will also see some very interesting properties of approximate-differential-privacy mechanisms, which is why this notion is quite popular. In future videos we will see how to use these properties to make machine learning algorithms private as well; the reason we are talking about this is that eventually we will get to machine learning algorithms, make them private, and for that we will use the properties I am going to talk about now. So what are these properties? Let us see them. These are properties of ADP, approximate differential privacy; there are several, and we will talk about the important ones that are pretty much needed for the machine learning context later on. The first property is called post-processing.
What does this mean? Say we have a mechanism M that is (ε, δ)-differentially private (I am not saying "approximate" every time; the δ indicates that it is approximate). This means there is some input dataset X, I do some computation on it, I release something via the mechanism M, and I have added Gaussian noise appropriately so that M achieves (ε, δ)-differential privacy. Now the world, or an adversary sitting in the world, looks at the value that M releases, and we are guaranteed that M has the nice property that, just by looking at the output of M, one cannot figure out whether X or X' generated it, up to the δ probability. If I now do some post-processing on the output that M releases, it should not suddenly become less private. Say you added noise and gave me a vector, and I know this vector has the nice (ε, δ)-differential-privacy property; now I feed this vector into some other function, and it is post-processed to become something else. Merely by this post-processing I should not lose any privacy guarantee, which means that if there is some other function f, then the composition f ∘ M (applying f after I apply M) should also be (ε, δ)-differentially private. This is a natural thing, but it is important to state it, because it must not be the case that I am protecting only what I release but not functions of it. This property says that under approximate differential privacy you are protecting not only what you release but all functions of it as well: you cannot, say, take the vector I give you, add up its values, and suddenly learn more information about the input. That cannot happen. That is a good property to have, and it is indeed satisfied by (ε, δ)-differential privacy.
The second property is another interesting one, and it will become very critical later on for privacy in machine learning applications, especially when training with gradient descent: composition. There are two different variants of it, a basic variant and an advanced variant; we will look at the basic variant and I will make a note about the advanced one. Say we have K different mechanisms M₁, M₂, …, M_K, all (ε, δ)-differentially private: K mechanisms that operate on the data, each releasing its own output, and all of them (ε, δ)-differentially private. Essentially, you are leaking information, up to (ε, δ)-differential privacy, through each of these mechanisms. Now, if a user has access to the outputs of all K mechanisms, what is the combined privacy guarantee on the dataset? Picture it this way: I have a dataset, I give it to K different parties, each does its own thing and releases an (ε, δ)-differentially private quantity to the world, and there is an adversary sitting in the world with access to all K outputs from the K parties, each of which is (ε, δ)-differentially private. How private is my dataset now? Because you are leaking (ε, δ) worth of information from each of these mechanisms, it cannot be the case that the mechanism that puts all K outputs together has leaked only (ε, δ). The interesting thing is that the basic composition theorem says the combined mechanism, i.e., an adversary with access to the outputs of all these mechanisms, still has something like (Kε, Kδ)-differential privacy. This means privacy has become worse by a factor of K, which is a natural, straightforward thing to expect: each mechanism leaks something like ε, and there are K such mechanisms, so it is natural that the total leak is K · ε. Similarly, each one is δ-approximately differentially private, so there may be K different output subsets, each of probability δ, where you are failing; putting everything together, a weak union-bound argument says there are K portions of the output space where you are perhaps not epsilon-differentially private, each of probability δ, so you lose Kδ: the approximation becomes worse as well. That is the basic result, and for the curious participant watching this video, you can try to prove it yourself: assuming each of M₁, …, M_K is (ε, δ)-differentially private, it is actually not too hard to prove that the combined mechanism releasing all the outputs together is (Kε, Kδ)-differentially private.
There is an advanced version. Taking advantage of the fact that the noise is drawn from a Gaussian distribution (and this is important), one can prove a complicated but interesting result: the combined mechanism is actually something like (ε · √(2K ln(1/δ′)) + Kε(e^ε − 1), Kδ + δ′)-differentially private. It is a mouthful to read out, but the idea is that you get a better combined ε, a better combined privacy and approximation, if you do a more careful analysis. Just to point out the contrast: basic composition gave Kε, whereas this is ε times something like √K; of course there is another factor of √(ln(1/δ′)), and you can optimize over δ′, and so on, but the interesting thing is that the K has come down to √K. And why is this important? We will talk about this later when we discuss private machine learning.
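Written out cleanly (the standard advanced-composition bound; δ′ > 0 is a free parameter one can optimize):

```latex
\Bigl(\;\varepsilon\sqrt{2K\ln(1/\delta')} \;+\; K\varepsilon\,(e^{\varepsilon}-1),
\;\; K\delta + \delta' \;\Bigr)\text{-differential privacy}
```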
To give an example: think of gradient descent, which is a standard workhorse algorithm for machine learning, and think of releasing the gradients at every step while performing gradient descent. At every step you leak some information, and after K gradient steps you would have lost Kε in the usual way, say if you are adding Laplace noise. But if you add Gaussian noise, then what this result says is that you lose only something like √K · ε, which means you can stay private for more iterations, and the more iterations, the better the quality of the solution in a machine learning context. We will come back to this when we talk about privacy in machine learning, but just to give you the idea: think of this as a process where you keep releasing different outputs and you want to preserve privacy in a combined fashion. Because you want to release many outputs, that is, run the algorithm for many rounds (say, gradient descent), you want to make sure you do not lose too much at every round, and with Gaussian noise you can actually achieve √K · ε, which you cannot do with the Laplace mechanism, by the way: you have to use the Gaussian mechanism and settle for approximate differential privacy. But because you have settled for approximate differential privacy, you gain a quadratic factor in the privacy cost: √K · ε as opposed to Kε, which is the basic bound. Both bounds are true for the Gaussian mechanism (the basic one is easy to prove, the advanced one is hard to prove, but both are true), whereas only the basic one is true for the Laplace mechanism; the advanced one is simply not. So if we want better privacy over a longer period of time, we have to settle for the Gaussian mechanism, that is, for approximate differential privacy.
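A sketch of this idea for a single gradient step (this is the flavor of DP-SGD; the clipping bound, noise scale, and learning rate here are assumptions, not the lecture's exact recipe):

```python
import numpy as np

def noisy_gradient_step(w, grad, clip_norm, sigma, lr, rng=None):
    """One private step: clip the gradient to bound its L2 sensitivity,
    then add Gaussian noise before updating the weights. Over K steps,
    advanced composition charges roughly sqrt(K)*eps instead of K*eps."""
    rng = rng or np.random.default_rng()
    g = grad * min(1.0, clip_norm / (np.linalg.norm(grad) + 1e-12))  # bound L2 norm
    g = g + rng.normal(scale=sigma * clip_norm, size=g.shape)        # Gaussian noise
    return w - lr * g
```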
So, with that, we have concluded, broadly, the two interesting aspects of privacy: one is epsilon-differential privacy, and the other is what we just saw, approximate (ε, δ)-differential privacy. We will see applications of these in machine learning as we go along. The next thing we will see is how to apply privacy in a slightly different context, especially a buyer-seller kind of context; we will talk about that, and after that we will start talking about privacy in machine learning using all the techniques we have learned here. Thank you.