Friday, April 4, 2008

* Probabilistically Checkable Proofs (PCP)
* PCP and Inapproximability

PROBABILISTICALLY CHECKABLE PROOFS

Probabilistically checkable proofs are a refinement of interactive proof systems in which we explicitly account for two resources used by the verifier: queries and randomness.

Note that in the IP setting we could have assumed (by increasing the number of rounds) that the prover always responds with a single bit; we will assume this from now on. Furthermore, we will assume that the verifier includes in each of its queries the entire message exchange so far. The prover's answer can then be taken to depend only on the current query, so we can assume that the prover is nonadaptive: it behaves like a fixed oracle, i.e., a proof string pi whose bits the verifier reads. Note that the number of queries is simply the number of rounds.

PCP(r(n), q(n)) is the class of languages L for which there is a (P, V) system satisfying the following conditions:
(1) V is a probabilistic polynomial-time verifier.
(2) The number of random bits used by V is O(r(n)).
(3) The number of queries made by V is O(q(n)).
(4) w is in L => Pr[V accepts w when interacting with P] = 1.
(5) w is not in L => for every prover P', Pr[V accepts w when interacting with P'] < s.
Here the soundness error s < 1 is a constant. By running the verifier a constant number of times with independent randomness (which keeps the randomness O(r(n)) and the number of queries O(q(n))), s can be made an arbitrarily small constant.

MAIN PCP THEOREM: NP = PCP(log(n), 1).

That is, for any satisfiable SAT formula there is a way to write down a polynomial-size proof of its satisfiability that a polynomial-time verifier can check by reading only a constant number of bits of the proof: a correct proof is always accepted, while for an unsatisfiable formula every purported proof is rejected with high probability.

APPROXIMATION ALGORITHMS

An algorithm A is an a(n)-approximation for a minimization problem P (where a(n) >= 1) if for every instance I of P we have

  cost(A(I)) <= OPT(I) * a(|I|).

Similarly, for a maximization problem (where a(n) <= 1) we require

  cost(A(I)) >= OPT(I) * a(|I|).

Since NP-complete problems do not seem to be solvable in polynomial time, can we hope to solve their optimization versions approximately? This question has been a focus of attention over the last couple of decades; a number of problems have been resolved, but several more remain open.

Consider MAX-3SAT: given a 3CNF formula with m clauses, we would like to maximize the number of satisfied clauses. We assume that each clause contains exactly three literals on distinct variables.

Theorem: There exists a polynomial-time 7/8-approximation for MAX-3SAT.
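The algorithm behind this theorem, as the proof describes it (a uniformly random assignment, derandomized by the method of conditional expectations), can be sketched in Python. This is a minimal illustration, not code from the notes: the clause encoding (signed integers, +i for x_i and -i for NOT x_i) and every function name are hypothetical, and it assumes each clause has exactly three literals on distinct variables. Exact arithmetic with fractions.Fraction avoids rounding in the comparisons.

```python
from fractions import Fraction
from typing import Dict, List

Clause = List[int]  # hypothetical encoding: +i is x_i, -i is NOT x_i


def clause_sat_prob(clause: Clause, fixed: Dict[int, bool]) -> Fraction:
    """Pr[clause satisfied] when variables in `fixed` are set as given and every
    other variable is independently true with probability 1/2."""
    free = 0
    for lit in clause:
        var, want = abs(lit), lit > 0
        if var in fixed:
            if fixed[var] == want:
                return Fraction(1)  # a fixed literal already satisfies it
        else:
            free += 1
    # The clause fails only if all of its free literals come out false.
    return 1 - Fraction(1, 2 ** free)


def expected_satisfied(clauses: List[Clause], fixed: Dict[int, bool]) -> Fraction:
    # Linearity of expectation: sum of per-clause probabilities.
    return sum((clause_sat_prob(c, fixed) for c in clauses), Fraction(0))


def derandomized_max3sat(clauses: List[Clause], n: int) -> Dict[int, bool]:
    """Fix x_1, ..., x_n in order, each time keeping the value whose conditional
    expectation is larger, so the expectation never drops below 7m/8."""
    fixed: Dict[int, bool] = {}
    for i in range(1, n + 1):
        e_true = expected_satisfied(clauses, {**fixed, i: True})
        e_false = expected_satisfied(clauses, {**fixed, i: False})
        fixed[i] = e_true >= e_false
    return fixed


def count_satisfied(clauses: List[Clause], assignment: Dict[int, bool]) -> int:
    return sum(any(assignment[abs(lit)] == (lit > 0) for lit in c) for c in clauses)


# All 8 sign patterns on x_1, x_2, x_3: every assignment falsifies exactly one.
clauses = [[a * 1, b * 2, c * 3] for a in (1, -1) for b in (1, -1) for c in (1, -1)]
print(count_satisfied(clauses, derandomized_max3sat(clauses, 3)))  # 7
```

On this eight-clause example every assignment satisfies exactly 7 clauses, so the bound 7m/8 is met with equality. Once all n variables are fixed, the conditional expectation equals the actual number of satisfied clauses, which is why the final assignment inherits the 7m/8 guarantee.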
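Proof: Set each variable to true independently with probability 1/2. A clause with three literals on distinct variables is unsatisfied only if all three of its literals are false, which happens with probability 1/8, so it is satisfied with probability 7/8. By linearity of expectation, the expected number of satisfied clauses is 7m/8 >= (7/8)*OPT, since OPT <= m.

How do we derandomize this? Fix the variables one at a time. Suppose x_1, ..., x_{i-1} have already been fixed to b_1, ..., b_{i-1}. Compute

  E[# clauses satisfied | x_1 = b_1, ..., x_{i-1} = b_{i-1}, x_i = 1] and
  E[# clauses satisfied | x_1 = b_1, ..., x_{i-1} = b_{i-1}, x_i = 0].

Each is a sum over the clauses of the probability that the clause is satisfied, which is easy to compute in polynomial time. Their average is the current conditional expectation, which is at least 7m/8 by induction, so one of them is at least 7m/8. Fix x_i to that value and continue with x_{i+1}. After all n variables are fixed, the assignment is deterministic and satisfies at least 7m/8 clauses.
End Proof

Theorem: There is a PTAS for MAX-3SAT iff P = NP.

Proof: If P = NP, then one can solve MAX-3SAT optimally. For the other direction, suppose we have a PTAS for MAX-3SAT; we will show that P = NP.

Consider a language L in NP. Since NP = PCP(log(n), 1), there exists a prover-verifier system for L in which V uses c*log(n) random bits and makes k queries; we may assume its soundness error is 1/2, i.e., if x is not in L, then V accepts with probability at most 1/2 against any prover. Given a string x, we will construct in polynomial time a 3SAT formula f_x such that:
-- if x is in L, then f_x is satisfiable;
-- if x is not in L, then at most a (1 - eps) fraction of the clauses of f_x can be satisfied simultaneously, where eps = 1/((k-2)2^{k+1}).
If we had a PTAS for MAX-3SAT, then we could distinguish between these two cases in polynomial time, establishing that L is in P; since L was an arbitrary language in NP, this implies P = NP.

The PCP protocol determines a computation tree T with two kinds of branches: random branches and oracle branches. Since V uses c*log(n) random bits, there are only 2^{c log(n)} = n^c possible random strings. For each random string y, let T_y be the subtree of T obtained by fixing the random string to y. Thus T_y has only oracle branches, and since V makes k one-bit queries, T_y has 2^k leaves. Some of these leaves lead to acceptance and some to rejection.

We can write a DNF formula over the queried proof bits whose terms correspond to the accepting leaves. Equivalently, the acceptance condition for y is a CNF formula with one clause for each rejecting leaf, stating that the proof bits do not follow the path to that leaf; this gives at most 2^k clauses, each with at most k literals. Each clause of k literals can then be split into k-2 clauses of 3 literals using k-3 new auxiliary variables, so the 3SAT formula for y has at most (k-2)2^k clauses. Putting together these formulas for all n^c random strings gives a formula f_x with at most (k-2)2^k n^c clauses, and f_x can be computed in polynomial time because k is a constant.

If x is in L, then there exists a prover P (equivalently, a proof pi) such that V always accepts x. Set the proof-bit variables of f_x according to pi. For each random string y, the path followed in T_y then ends at an accepting leaf, so every CNF clause for y is satisfied, and the auxiliary variables for y can be set so that all of its 3-clauses are satisfied too. Since the same assignment to the proof bits works for every y simultaneously, f_x is satisfiable.

Suppose x is not in L.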
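Then, for any prover P', at most half of the random strings lead to acceptance, so at least half lead to rejection. Any assignment to the variables of f_x fixes the proof bits and hence defines such a prover P'. For each random string y on which V rejects P', the CNF formula for y is falsified, so no setting of the auxiliary variables for y satisfies all of its 3-clauses: at least one clause in the group for y is unsatisfied. Hence every assignment leaves at least n^c/2 clauses unsatisfied. Since f_x has at most (k-2)2^k n^c clauses, the fraction of unsatisfied clauses is at least (n^c/2)/((k-2)2^k n^c) = 1/((k-2)2^{k+1}) = eps.

Now run the PTAS with approximation ratio 1 - eps/2 on f_x, which has m clauses. If x is in L, then OPT = m, so the assignment found satisfies at least (1 - eps/2)m > (1 - eps)m clauses. If x is not in L, then no assignment satisfies more than (1 - eps)m clauses. Comparing the number of satisfied clauses with (1 - eps)m therefore decides whether x is in L, in polynomial time since eps is a constant.
End Proof

Theorem: For any constant 0 < alpha < 1, there is no polynomial-time alpha-approximation for MAX-CLIQUE unless P = NP.

Proof: If P = NP, then one can solve MAX-CLIQUE optimally. For the other direction, suppose we have an alpha-approximation for MAX-CLIQUE; we will show that P = NP.

Consider a language L in NP. Since NP = PCP(log(n), 1), and the soundness error can be made an arbitrarily small constant by repetition, there exists a prover-verifier system for L in which V uses c*log(n) random bits, makes O(1) queries, and, if x is not in L, accepts with probability less than alpha against every prover. Given a string x, we will construct a graph G such that:
-- if x is in L, then G has a clique of size n^c;
-- if x is not in L, then the largest clique of G has size < alpha*n^c.

Consider a random string y and a sequence of oracle answers a. Together these completely determine the sequence of queries q made by V, since each query depends only on y and on the earlier answers. Let the vertex set of G be {(y, q, a) : V accepts on the computation path given by y, q and a}; there are at most n^c 2^{O(1)} such triples, so G has polynomial size. We put an edge between two distinct vertices (y, q, a) and (y', q', a') if they are consistent: whenever the same proof position is queried in both q and q', the corresponding answers in a and a' agree.

Note that two distinct vertices (y, q, a) and (y, q', a') with the same random string are never consistent: the answer sequences first differ at some position, all earlier answers agree, so the query at that position is the same in q and q', yet the two answers to it differ. Hence a clique contains at most one vertex per random string. Further, note that the answers given by any fixed prover (i.e., any fixed proof pi) are always consistent.

If x is in L, there is a prover P (and hence a proof pi) such that V accepts for all random strings. For each random string y, exactly one pair (q, a) arises when V reads pi with randomness y, and it leads to acceptance. These n^c vertices are pairwise consistent, since all their answers come from pi, so they form a clique of size n^c.

If x is not in L, there is no prover P' (and hence no proof pi) that V accepts on an alpha fraction or more of the random strings. If there were a clique of size at least alpha*n^c, its vertices would involve distinct random strings with pairwise consistent answers; defining pi to agree with these answers (and arbitrarily elsewhere) would give a prover that V accepts with probability at least alpha, a contradiction. So every clique has size < alpha*n^c.

Finally, G can be built in polynomial time. If x is in L, the alpha-approximation returns a clique of size at least alpha*n^c; if x is not in L, every clique is smaller than alpha*n^c. This decides L in polynomial time, so P = NP.
End Proof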
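Two counting steps in the MAX-3SAT proof can be checked mechanically. The sketch below rests on stated assumptions: the notes do not say how a k-literal clause is turned into 3-literal clauses, so it uses the standard chain construction with k-3 auxiliary variables (our choice), encodes literals as signed integers, and uses hypothetical function names. check_split confirms by brute force the property the soundness argument needs (if the original clause is false, every setting of the auxiliary variables leaves at least one of the k-2 pieces unsatisfied), and gap_epsilon evaluates eps = 1/((k-2)2^{k+1}).

```python
from fractions import Fraction
from itertools import product
from typing import Dict, List

Clause = List[int]  # hypothetical encoding: +i is x_i, -i is NOT x_i


def satisfied(clause: Clause, assign: Dict[int, bool]) -> bool:
    return any(assign[abs(lit)] == (lit > 0) for lit in clause)


def split_clause(lits: Clause, next_aux: int) -> List[Clause]:
    """Split a clause of k >= 3 literals into k - 2 clauses of 3 literals,
    chained by auxiliary variables next_aux, ..., next_aux + k - 4."""
    k = len(lits)
    assert k >= 3, "sketch assumes at least three literals"
    if k == 3:
        return [list(lits)]
    aux = list(range(next_aux, next_aux + k - 3))
    pieces = [[lits[0], lits[1], aux[0]]]
    for j in range(1, k - 3):
        pieces.append([-aux[j - 1], lits[j + 1], aux[j]])
    pieces.append([-aux[-1], lits[-2], lits[-1]])
    return pieces


def check_split(lits: Clause, next_aux: int) -> bool:
    """Brute force: for every setting of the original variables, the clause is
    true iff SOME setting of the auxiliary variables satisfies every piece.
    Assumes distinct variables and next_aux larger than every original one."""
    pieces = split_clause(lits, next_aux)
    orig_vars = [abs(lit) for lit in lits]
    aux_vars = list(range(next_aux, next_aux + max(len(lits) - 3, 0)))
    for vals in product((False, True), repeat=len(orig_vars)):
        base = dict(zip(orig_vars, vals))
        some_aux_works = any(
            all(satisfied(p, {**base, **dict(zip(aux_vars, avals))}) for p in pieces)
            for avals in product((False, True), repeat=len(aux_vars))
        )
        if satisfied(lits, base) != some_aux_works:
            return False
    return True


def gap_epsilon(k: int) -> Fraction:
    """eps = 1/((k-2) 2^(k+1)): one unsatisfied clause for each rejecting random
    string (at least half of them), out of at most (k-2) 2^k clauses per string."""
    return Fraction(1, (k - 2) * 2 ** (k + 1))


print(split_clause([1, 2, 3, 4, 5], 6))  # [[1, 2, 6], [-6, 3, 7], [-7, 4, 5]]
print(check_split([1, -2, 3, -4, 5], 6), gap_epsilon(5))  # True 1/192
```

For k = 3 this gives eps = 1/16, and eps shrinks exponentially in k; it is still a constant, which is all the PTAS argument needs.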
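The graph in the MAX-CLIQUE proof can be built and checked by brute force on a toy verifier. This is a sketch under simplifying assumptions: the verifier is taken to be nonadaptive (its queries q depend only on the random string y), the toy verifier is hypothetical (3 proof bits; random string y in {0, 1, 2}; it reads positions y and (y+1) mod 3 and accepts iff the two bits differ), and all names are ours. It compares the largest clique with the largest number of random strings on which a single proof is accepted; the two arguments in the proof together say these numbers coincide.

```python
from itertools import combinations, product
from typing import Callable, List, Tuple

# A vertex (y, q, a): random string y, query positions q, answers a.
Vertex = Tuple[int, Tuple[int, ...], Tuple[int, ...]]
Queries = Callable[[int], Tuple[int, ...]]
Accepts = Callable[[int, Tuple[int, ...]], bool]


def fglss_vertices(num_random: int, queries: Queries, accepts: Accepts) -> List[Vertex]:
    """All accepting configurations (y, q, a) of a nonadaptive verifier."""
    verts: List[Vertex] = []
    for y in range(num_random):
        q = queries(y)
        for a in product((0, 1), repeat=len(q)):
            if accepts(y, a):
                verts.append((y, q, a))
    return verts


def consistent(u: Vertex, v: Vertex) -> bool:
    """True if every proof position queried by both u and v gets the same answer."""
    answers_u = dict(zip(u[1], u[2]))
    return all(answers_u.get(pos, ans) == ans for pos, ans in zip(v[1], v[2]))


def max_clique_size(verts: List[Vertex]) -> int:
    """Brute-force maximum clique; edges join distinct consistent vertices."""
    for size in range(len(verts), 0, -1):
        for cand in combinations(verts, size):
            if all(consistent(u, v) for u, v in combinations(cand, 2)):
                return size
    return 0


def max_accepted(num_random: int, proof_len: int, queries: Queries, accepts: Accepts) -> int:
    """Largest number of random strings on which a single proof pi is accepted."""
    best = 0
    for pi in product((0, 1), repeat=proof_len):
        hits = sum(accepts(y, tuple(pi[p] for p in queries(y))) for y in range(num_random))
        best = max(best, hits)
    return best


# Hypothetical toy verifier: 3 proof bits, random string y in {0, 1, 2}.
def toy_queries(y: int) -> Tuple[int, ...]:
    return (y, (y + 1) % 3)


def toy_accepts(y: int, a: Tuple[int, ...]) -> bool:
    return a[0] != a[1]


verts = fglss_vertices(3, toy_queries, toy_accepts)
print(max_clique_size(verts), max_accepted(3, 3, toy_queries, toy_accepts))  # 2 2
```

For the toy verifier no proof can make all three cyclically adjacent pairs of bits differ, so both quantities equal 2. The brute-force clique search is exponential and only meant for such tiny instances.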