Malicious Machine Learnings? Privacy Preservation and Computational Correctness Across Parties. Note Quote/Didactics.


Multi-Party Computation deals with the following problem: There are n ≥ 2 parties P1, . . ., Pn where party Pi holds input ti, 1 ≤ i ≤ n, and they wish to compute together a functions = f (t1, . . . , tn) on their inputs. The goal is that each party will learn the output of the function, s, yet with the restriction that Pi will not learn any additional information about the input of the other parties aside from what can be deduced from the pair (ti, s). Clearly it is the secrecy restriction that adds complexity to the problem, as without it each party could announce its input to all other parties, and each party would locally compute the value of the function. Thus, the goal of Multi-Party Computation is to achieve the following two properties at the same time: correctness of the computation and privacy preservation of the inputs.

The following two generalizations are often useful:

(i) Probabilistic functions. Here the value of the function depends on some random string r chosen according to some distribution: s = f (t1, . . . , tn; r). An example of this is the coin-flipping functionality, which takes no inputs, and outputs an unbiased random bit. It is crucial that the value r is not controlled by any of the parties, but is somehow jointly generated during the computation.

(ii) Multioutput functions. It is not mandatory that there be a single output of the function. More generally there could be a unique output for each party, i.e., (s1, . . . , sn) = f(t1,…, tn). In this case, only party Pi learns the output si, and no other party learns any information about the other parties’ input and outputs aside from what can be derived from its own input and output.

One of the most interesting aspects of Multi-Party Computation is to reach the objective of computing the function value, but under the assumption that some of the parties may deviate from the protocol. In cryptography, the parties are usually divided into two types: honest and faulty. An honest party follows the protocol without any deviation. Otherwise, the party is considered to be faulty. The faulty behavior can exemplify itself in a wide range of possibilities. The most benign faulty behavior is where the parties follow the protocol, yet try to learn as much as possible about the inputs of the other parties. These parties are called honest-but-curious (or semihonest). At the other end of the spectrum, the parties may deviate from the prescribed protocol in any way that they desire, with the goal of either influencing the computed output value in some way, or of learning as much as possible about the inputs of the other parties. These parties are called malicious.

We envision an adversary A, who controls all the faulty parties and can coordinate their actions. Thus, in a sense we assume that the faulty parties are working together and can exert the most knowledge and influence over the computation out of this collusion. The adversary can corrupt any number of parties out of the n participating parties. Yet, in order to be able to achieve a solution to the problem, in many cases we would need to limit the number of corrupted parties. This limit is called the threshold k, indicating that the protocol remains secure as long as the number of corrupted parties is at most k.

Assume that there exists a trusted party who privately receives the inputs of all the participating parties, calculates the output value s, and then transmits this value to each one of the parties. This process clearly computes the correct output of f, and also does not enable the participating parties to learn any additional information about the inputs of others. We call this model the ideal model. The security of Multi-Party Computation then states that a protocol is secure if its execution satisfies the following: (1) the honest parties compute the same (correct) outputs as they would in the ideal model; and (2) the protocol does not expose more information than a comparable execution with the trusted party, in the ideal model.

Intuitively, the adversary’s interaction with the parties (on a vector of inputs) in the protocol generates a transcript. This transcript is a random variable that includes the outputs of all the honest parties, which is needed to ensure correctness, and the output of the adversary A. The latter output, without loss of generality, includes all the information that the adversary learned, including its inputs, private state, all the messages sent by the honest parties to A, and, depending on the model, maybe even include more information, such as public messages that the honest parties exchanged. If we show that exactly the same transcript distribution can be generated when interacting with the trusted party in the ideal model, then we are guaranteed that no information is leaked from the computation via the execution of the protocol, as we know that the ideal process does not expose any information about the inputs. More formally,

Let f be a function on n inputs and let π be a protocol that computes the function f. Given an adversary A, which controls some set of parties, we define REALA,π(t) to be the sequence of outputs of honest parties resulting from the execution of π on input vector t under the attack of A, in addition to the output of A. Similarly, given an adversary A′ which controls a set of parties, we define IDEALA′,f(t) to be the sequence of outputs of honest parties computed by the trusted party in the ideal model on input vector t, in addition to the output of A′. We say that π securely computes f if, for every adversary A as above, ∃ an adversary A′, which controls the same parties in the ideal model, such that, on any input vector t, we have that the distribution of REALA,π(t) is “indistinguishable” from the distribution of IDEALA′,f(t).

Intuitively, the task of the ideal adversary A′ is to generate (almost) the same output as A generates in the real execution or the real model. Thus, the attacker A′ is often called the simulator of A. The transcript value generated in the ideal model, IDEALA′,f(t), also includes the outputs of the honest parties (even though we do not give these outputs to A′), which we know were correctly computed by the trusted party. Thus, the real transcript REALA,π(t) should also include correct outputs of the honest parties in the real model.

We assumed that every party Pi has an input ti, which it enters into the computation. However, if Pi is faulty, nothing stops Pi from changing ti into some ti′. Thus, the notion of a “correct” input is defined only for honest parties. However, the “effective” input of a faulty party Pi could be defined as the value ti′ that the simulator A′ gives to the trusted party in the ideal model. Indeed, since the outputs of honest parties look the same in both models, for all effective purposes Pi must have “contributed” the same input ti′ in the real model.

Another possible misbehavior of Pi, even in the ideal model, might be a refusal to give any input at all to the trusted party. This can be handled in a variety of ways, ranging from aborting the entire computation to simply assigning ti some “default value.” For concreteness, we assume that the domain of f includes a special symbol ⊥ indicating this refusal to give the input, so that it is well defined how f should be computed on such missing inputs. What this requires is that in any real protocol we detect when a party does not enter its input and deal with it exactly in the same manner as if the party would input ⊥ in the ideal model.

As regards security, it is implicitly assumed that all honest parties receive the output of the computation. This is achieved by stating that IDEALA′,f(t) includes the outputs of all honest parties. We therefore say that our currency guarantees output delivery. A more relaxed property than output delivery is fairness. If fairness is achieved, then this means that if at least one (even faulty) party learns its outputs, then all (honest) parties eventually do too. A bit more formally, we allow the ideal model adversary A′ to instruct the trusted party not to compute any of the outputs. In this case, in the ideal model either all the parties learn the output, or none do. Since A’s transcript is indistinguishable from A′’s this guarantees that the same fairness guarantee must hold in the real model as well.

A further relaxation of the definition of security is to provide only correctness and privacy. This means that faulty parties can learn their outputs, and prevent the honest parties from learning theirs. Yet, at the same time the protocol will still guarantee that (1) if an honest party receives an output, then this is the correct value, and (2) the privacy of the inputs and outputs of the honest parties is preserved.

The basic security notions are universal and model-independent. However, specific implementations crucially depend on spelling out precisely the model where the computation will be carried out. In particular, the following issues must be specified:

  1. The faulty parties could be honest-but-curious or malicious, and there is usually an upper bound k on the number of parties that the adversary can corrupt.
  2. Distinguishing between the computational setting and the information theoretic setting, in the latter, the adversary is unlimited in its computing powers. Thus, the term “indistinguishable” is formalized by requiring the two transcript distributions to be either identical (so-called perfect security) or, at least, statistically close in their variation distance (so-called statistical security). On the other hand, in the computational, the power of the adversary (as well as that of the honest parties) is restricted. A bit more precisely, Multi-Party Computation problem is parameterized by the security parameter λ, in which case (a) all the computation and communication shall be done in time polynomial in λ; and (b) the misbehavior strategies of the faulty parties are also restricted to be run in time polynomial in λ. Furthermore, the term “indistinguishability” is formalized by computational indistinguishability: two distribution ensembles {Xλ}λ and {Yλ}λ are said to be computationally indistinguishable, if for any polynomial-time distinguisher D, the quantity ε, defined as |Pr[D(Xλ) = 1] − Pr[D(Yλ) = 1]|, is a “negligible” function of λ. This means that for any j > 0 and all sufficiently large λ, ε eventually becomes smaller than λ − j. This modeling helps us to build secure Multi-Party Computational protocols depending on plausible computational assumptions, such as the hardness of factoring large integers.
  3. The two common communication assumptions are the existence of a secure channel and the existence of a broadcast channel. Secure channels assume that every pair of parties Pi and Pj are connected via an authenticated, private channel. A broadcast channel is a channel with the following properties: if a party Pi (honest or faulty) broadcasts a message m, then m is correctly received by all the parties (who are also sure the message came from Pi). In particular, if an honest party receives m, then it knows that every other honest party also received m. A different communication assumption is the existence of envelopes. An envelope guarantees the following properties: a value m can be stored inside the envelope, it will be held without exposure for a given period of time, and then the value m will be revealed without modification. A ballot box is an enhancement of the envelope setting that also provides a random shuffling mechanism of the envelopes. These are, of course, idealized assumptions that allow for a clean description of a protocol, as they separate the communication issues from the computational ones. These idealized assumptions may be realized by a physical mechanisms, but in some settings such mechanisms may not be available. Then it is important to address the question if and under what circumstances we can remove a given communication assumption. For example, we know that the assumption of a secure channel can be substituted with a protocol, but under the introduction of a computational assumption and a public key infrastructure.