CMSC 27200 Theory of Algorithms: Winter 2015

Material covered so far

$\newcommand{\cook}{\preccurlyeq_{\mathrm{Cook}}}$ $\newcommand{\karp}{\preccurlyeq_{\mathrm{Karp}}}$ $\newcommand{\PPP}{\mathsf P}$ $\newcommand{\NP}{\mathsf{NP}}$ $\newcommand{\NPC}{\mathsf{NPC}}$ $\newcommand{\coNP}{\mathsf{coNP}}$ $\newcommand{\coP}{\mathsf{coP}}$ $\newcommand{\factor}{\mathrm{FACTOR}}$ $\newcommand{\opt}{\mathrm{OPT}}$ $\newcommand{\dec}{\mathrm{DEC}}$ $\newcommand{\nnn}{\mathbb N}$ $\newcommand{\zzz}{\mathbb Z}$ $\newcommand{\rrr}{\mathbb R}$ $\newcommand{\ccc}{\mathbb C}$ $\newcommand{\fff}{\mathbb F}$ $\newcommand{\eee}{\mathrm e}$ $\newcommand{\vf}{\varphi}$ $\DeclareMathOperator{\Div}{\mathrm Div}$ $\DeclareMathOperator{\lcm}{\mathrm lcm}$ $\DeclareMathOperator{\var}{\mathrm Var}$ $\DeclareMathOperator{\cov}{\mathrm Cov}$ $\DeclareMathOperator{\period}{\mathrm per}$ $\DeclareMathOperator{\rank}{\mathrm rk}$ $\DeclareMathOperator{\trace}{\mathrm Tr}$ $\DeclareMathOperator{\span}{\mathrm Span}$ $\DeclareMathOperator{\sgn}{\mathrm sgn}$ $\DeclareMathOperator{\inv}{\mathrm Inv}$ $\DeclareMathOperator{\diam}{\mathrm diam}$

This synopsis is neither complete nor totally accurate. Please alert the instructors by email to any errors/omissions/discrepancies.

Note: the textbook uses “lg” to denote base-2 logarithms ($\log_2$). We did not follow this convention during the first two weeks of classes but will try to follow it from the third week on. The synopsis below follows this convention.

Wed 3/11Review of definitions of Cook and Karp reducibilities. PARTITION is NP-complete (reduction of SUBSET SUM to PARTITION). Note that PARTITION is a special case of SUBSET SUM--this is not unusual. Restrictions of NP-complete problems may be NP-complete.

Mon 3/9More NP-completeness. New topic: on-line algorithms. Competitive ratio. Ski rent vs. buy problem. Online algorithms for the 2-server problem on a line (see handout).

Fri 3/6Definition of approximation ratio. Definition of NP-hard. FPTAS (fully polynomial time approximation scheme) for KNAPSACK. (For any $\epsilon > 0$ there is a polynomial time algorithm that computes a value $S$ for KNAPSACK with $S \ge (1-\epsilon )OPT$ where $OPT$ is the optimal value of KNAPSACK. ) CLIQUE, VERTEX COLOR problems. Stated without proof: CLIQUE cannot be approximated within $n^{1-o(1)}$ unless P=NP. Probabilistic approximation algorithm for MAX 3-SAT: random assignment satisfies a clause with probability 7/8. If $s_i$ is the indicator variable for the i-th clause being satisfied, use linearity of expectation on the expectation of their sum.

Wed 3/4The HAMILTON CYCLE decision problem. Recap of matchings, and perfect matchings, augmenting paths, characterization of non-existence of perfect matchings (constricted sets.) 3-D MATCHING definition, stated that it is NP-complete. Discussion of the P vs NP boundary: 2-SAT, 2D matching, Eulerian Cycle in P; 3-SAT, 3D matching, Hamiltonian Cycle are NP-complete. Definition of SUBSET SUM, Karp reduction of 3D matching to SUBSET SUM. Definition of (INTEGER) KNAPSACK. Remider of the Dynamic Programming algorithm for Knapsack (pseudo-polynomial!).

Mon 3/2 Discussion of some Midterm-2 problems: 7(a,c) (the complexity class $\PPP$, Euclid's algorithm; 8(b,e) role of black vertices in Dijkstra's and Jarnik's algorithms; 9(configurations, configuration space, loop invariants). -- Review: 1-1 correspondence between laguages and decision problems. Karp reduction, reducibility, CNF and 3-CNF formulas, 3-SAT. Cook--Levin Theorem (1972) stated: 3-SAT $\NP$-complete. Outline of proof: Turing machine computation simulated by CNF formula. The CLIQUE decision problem: INPUT: $(G,k)$ ($G$: a graph, $k$: a positive integer); Question: Does $G$ contain $K_k$? -- Karp's Theorem (1973): a list of some $\NP$-complete problems, central to combinatorial optimization, operations research, various areas of statistics and AI such as clustering, pattern recognition stated: CLIQUE, 3-COL, HAMILTONICITY, Integer Knapsack. -- Proved: CLIQUE $\NP$-complete by proving that 3-SAT $\karp$ CLIQUE.

Fri 2/27 Review: $\Sigma$: finite alphabet, $\Sigma^*$: set of all finite strings over $\Sigma$; language: $L\subseteq\Sigma^*$. Decision problem $\leftrightarrow$ membership problem in a language. Formal definition of language classes $\PPP$, $\NP$, $\coNP$. Formal proof of $\PPP\subseteq\NP$. Observed: $\coP = \PPP$ and therefore $\PPP\subseteq \NP\cap\coNP$. Diagram of these complexity classes.

Conjectures: (1) $\PPP\neq\NP$ (2) $\NP\neq\coNP$ (3) $\PPP\neq\NP\cap\coNP$.

DEF: $L\subseteq\Sigma^*$ is a well characterized language if $L\in\NP\cap\coNP$. Example of good characterization: Kuratowski's Theorem (planarity is well characterized). Easier example: graph bipartite $\iff$ no odd cycle: good characterization of bipartiteness (2-colorability). Story about impatient boss.

But it turns out that $\{$bipartite graphs$\}\in \PPP$ (easy) and $\{$planar graphs$\}\in \PPP$ (difficult). So is there a candidate language to support Conj. (3)? Yes, the decision version of factoring.

DEF: A function $f:\Sigma_1^*\to\Sigma_2^*$ is Cook-reducible to a function $g:\Sigma_3^*\to\Sigma_4^*$ if there is a polynomial-time algorithm that computes $f$ given an oracle (black box) that evaluates $g$. Notation: $f \cook g$ (or $g \succcurlyeq_{\mathrm{Cook}} f$).

Examples: optimization problems $\opt$ such as Integer Knapsack have a corresponding decision problem $\dec$ and $\opt\cook\dec$ via the binary search trick discussed last time. The other direction, $\opt\cook\dec$, is obvious, so these two problems are

Cook equivalent. We look for a decision problem that is Cook equivalent to factoring (splitting an integer into prime factors). We define the language $\factor=\{(x,y) \mid x,y$ are positive integers such that $(\exists d)(d \mid x$ and $2\le d\le y)\}$. So a pair $(x,y)$ belongs to $\factor$ iff $x$ has a non-trivial divisor not greater than $y$. We showed that $\factor$ is Cook equivalent to factoring integers. Facts: $\factor\in\NP$ (witness: $d$) and, suprisingly, $\factor\in\coNP$ (Pratt 1972) (witness: prime factorization of $x$). For this witness to work, the Verifier needs to verify in polynomial time that the factors given are indeed prime. AKS Theorem (Agrawal - Kayal - Saxena, 2002) stated: $\{$primes$\}\in \PPP$. Conclusion: $\factor\in\NP\cap \coNP$ and $\factor$ is not believed to be in $\PPP$. If it is in $\PPP$ then RSA can be broken in polynomial time. So RSA depends on Conjecture (3).

DEF: Karp reduction (among decision problems/languages) -- reviewed: Let $L_1\subseteq \Sigma_1^*$, $L_2\subseteq \Sigma_2^*$ be languages. A function $f : \Sigma_1^*\to \Sigma_2^*$ is a Karp reduction of $L_1$ to $L_2$ if (i) $f$ is computable in polynomial time and (ii) $\forall x\in\Sigma_1^*)(x\in L_1 \iff\ f(x)\in L_2)$. We say that $L_1$ is Karp reducible to $L_2$ if there exists such a Karp reduction; notation: $L_1\karp L_2$.

DEF: The language $L\subseteq\Sigma^*$ is $\NP$-complete if (i) $L\in\NP$ and (ii) $(\forall L'\in\NP)(L'\karp L)$. So the $\NP$-complete problems are the “hardest” problems in $\NP$. Examples: 3-SAT, 3-COL, CLIQUE, Integer Knapsack, HAMILTONIAN graphs (stated, not proved). But FACTOR is not $\NP$-complete unless $\NP=\coNP$, i.e., Conj. (2) is false. (Stated but not proved, will be an exercise.)

Wed 2/25 Toward the definition of NP, second installment. Optimization problems versus decision problems, reduction via binary search. Role of witness.

Mon 2/23 More reductions. Definition of CNF, DNF. Fact: every Boolean function can be written as a DNF formula (AND of n literals gives you a unique assignment: OR all the ones where the function is 1). Corollary: can be written as a CNF formula (use De Morgan's Laws.) Definition of CNF SAT ("SAT") and 3-SAT. Boolean formula satisfiability reduces to 3-SAT (proof in Cormen). Same is true of Boolean circuit satisfiability.

Fri 2/20 Towards a definition of NP. Definitions: (finite) alphabet, string, language, characteristic function. Verification, witness string. Correctness requirement for verification function for $x \in L$ there is some witness, for $x \notin L$ no witness works. Note the asymmetry between "yes" and "no". Cormen terminology: "problem" "instance". We require the verification to be poly time in bit-lnegth of the input (so witness poly length). Examples: graph $k$-colorability, $k$-clique in graph, set partition, Boolean formula satisfiability. Definition of polynomial time (Karp) reducibility. Proof that 3-colorability reduces to Boolean formula satisfiability (idea: get variables denot the color of each vertex. For each vertex, write the formula that says that the vertex has exactly one color. For each edge, write the formula that endpoint vertices have different colors).

Wed 2/18 Fermat test (of compositeness) reviewed. Primality testing -- Miller-Rabin, Solovay-Strassen mentioned, not described. Idea: modified Fermat test. Outcome of probabilistic algorithm interpreted: NOT probability that a given number is prime (that probability is either zero or 100%) but the confidence in our judgment (probability that we make an error). This probability is zero if the number is prime and at most 1/2 i the number is composite. Repeat this $t$ times; the probability of making an error goes down to $1/2^t$. Note: $2^{500}$ is greater than the number of particles in known universe. But we shall still not have a proof of primality.

Minimum-cost spanning tree problem. Input: undirected, connected, edge-weighted graph, no source node; negative weights permitted. Greedy algorithm: variations. Boruvka (1928), Kruskal (1956) mentioned. Jarnik's (1930) algorithm (a.k.a. Prim's 1957) described in detail (grow tree from a source node). Pseudocode almost identical with Dijkstra's. -- Effect of adding the same number to each weight discussed: min-cost spanning tree problem does not change, min-cost path problem changes.

Graph coloring, chromatic number, greedy coloring algorithm.

Mon 2/16 RSA cryptosystem reviewed. Need efficient modular exponentiation. Theoretical benchmark of efficiency: polynomial time. Taking large powers: NOT polynomial-time. Modular exponentiation: method of repeated squaring. Elegant pseudocode. -- How to select large random primes? Prime Number Theorem (Jacques Hadamard and Pierre de la Vallée Poussin, 1896): $\pi(x)\sim x/\ln x$. Expected number of trials before we hit a prime. How to test primality without trying to find a factor (the latter being hopeless)? Fermat test: method to refute primality without finding a factor.

Wed 2/11 Least common multiple. Splitting a congruence modulo $pq$ where $p,q$ are relatively prime, into a congruence modulo $p$ and a congruence modulo $q$. Unique prime factorization (mentioned). Addition, subtraction, multiplication of congruences. Prime number (definition). Fermat's little Theorem (stated).

Cryptography. Information-based: one-time pad. Complexity-based: Public key cryptosystems (Diffie-Hellman). Implementation: The RSA cryptosystem. Proof that RSA encryption and decryption are inverses, based on FlT: If $p,q$ are distinct primes, $N=pq$, $K=(p-1)(q-1)$, and $ed\equiv 1 \pmod K$ then $(\forall x)(x^{ed}\equiv x \pmod N)$.

Mon 2/9 Division Theorem. Lemma (DO): $\Div(a,b)=\Div(a-b,b)$. Gcd: Euclid's algorithm. Pseudocode. Polynomially bounded function, polynomial-time algorithm. DO: prove that in every two rounds of Euclid's algorithm, the number of binary digits decreases by at least 1. Infer that Euclid's algorithm takes $\le 2n$ round where $n$ is the number of binary digits of the smaller of the two numbers. Infer that Euclid's algorithm runs in polynomial time. -- Multiplicative inverse of $a$ modulo $m$: solution to congruence $ax\equiv 1 \pmod m$. Notation: $a^{-1} \pmod m$. Exists if and only if $\gcd(a,m)=1$.

Fri 2/6 Amortized complexity. Fibonacci heap (Fredman-Tarjan 1987): statement of amortized cost of INSERT, EXTRACT-MIN, DECREASE-KEY. Total cost of Dijkstra with the PQ implemented by Fibonacci heap: $O(V \log V +E)$. Proof that this is optimal: for every pair of numbers $V$ and $E$ such that $V-1 \le E \le V(V-1)$ there is a weighted digraph with positive weights such that any Dijkstra implementation requires $\Omega(V \log V + E)$ operations (in fact, just comparisons of reals and scanning input data must occur at least this number of times).

Elements of number theory: divisibility, proper definition of gcd. Please check Chapter 4 of instructor's online Discrete Mathematics Lecture Notes.

Wed 2/4 Bellman-Ford algorithm for single-source min-cost paths in the presence of negative edges. Loop invariant to prove correctness and to detect the presence of a negative cycle.

Order statistics: selecting the median and the $k$-th smallest item in $O(n)$. Divide-and-conquer goal: finding lucky pivot (item of rank between $cn$ and $(1-c)n$ for some constant $c>0$). First algorithm: randomized. Analysis using linearity of expectation and the following fact: if a Bernoulli trial has probability $p$ of success and we repeat it until the first success occurs then the expected number of trials is $1/p$. Second algorithm: deterministic. Leads to recurrence $T(n)\le T(n/5) + T(7n/10)+O(n)$. HW: Show that this recurrent inequality implies $T(n) = O(n)$.

Mon 2/2 Completion of the proof of correctness of Dijkstra's algorithm: the main loop invariant that explains the meaning of the current cost for each vertex at the start of each cycle.

Fri 1/30 Configuration space. Loop invariants. Loop invariants for Dijkstra's algorithm. Proof of correctness of Dijkstra's algorithm (started).

Wed 1/28 Pre-midterm review. Homework problems discussed. The “alternating paths problem” (Exercise 2.16) and the “car race problem” (Ex. 3.13(b),(c),(d)); both involve the application of BFS to digraphs that are not part of the input but need to be constructed; constructing this digraph is the essence of the solution. In the case of the car race problem, the construction involves constant-time look-up of membership in the "race track;" a hashing trick is dicussed that is needed in the case $|R| < n$. Information theory lower bound for ternary queries (Ex. 3.8) and its application to lower bounds on the fake-coin problem (Ex. 3.10) discussed.

Diskjstra's algorithm reviewed; timing discussed in terms of the three data structure operations used (INSERT, EXTRACT-MIN, DECREASE-KEY).

Mon 1/26 Finite probability space: $(\Omega, P)$ -- sample space, probability distribution. Events, probability of events. Random variable. Expected value. DO: $\min X \le E(X) \le \max X$. Linearity of expectation. Indicator variable, $E(I_A)=P(A)$. Note: Instructor's Discrete Mathematics lecture notes posted. Study these chapters: Finite Probability Spaces, Graph/Digraph terminology, Asymptotic equality/inequality.

Weighted digraph $(V,E,w)$. Min-cost path problem. Single-source min-cost path algorithm for non-negative edge weights: Dijkstra's algorithm. Pseudocode. Algorithm requires priority queue.

Fri 1/23 Data structures. Priority queue; requests served: Insert, Extract-Min. Complete binary tree; height $=\lfloor \lg n\rfloor$. Links: parent, left child, right child. Heap. Cost of Insert, Extract-Min, Decrease-Key, Increase-Key. Bubbling up, bubbling down. Heapsort. Heapify in $O(n)$.

Wed 1/21 Divide and Conquer. The Karatsuba algorithm to multiply polynomials and to multiply integers. Evaluation of recurrent inequalities: the method of reverse inequalities. Merge-sort: asymptotically optimal number of comparisons ($\sim n\lg n$), and within a constant factor optimal total cost ($\Theta(n\lg n)$, including bookkeeping).

Fri 1/16 Asymptotic equality. Stirling's formula: $n! \sim (n/\eee)^n \sqrt{2\pi n}$ (stated without proof). Proof of the asymptotic equality $\log(n!) \sim n\log n$ (log to any base) without using Stirling's formula. Lemma: $(n/\eee)^n < n! \le n^n$. Information theory lower bound: to identify an object out of $N$ objects requires at least $\lg N$ binary queries. (lg refers to base-2 logarithms.) Comparison-based sorting requires $\ge \lg(n!) \sim n\lg n$ comparisons (no matter what the algorithm). Insertion sort (using binary search for insertions) takes at most $n+\lg(n!) \sim n\lg n$ comparisons. So the asymptotic complexity of comparison-based sorting is $\sim n\lg n$ comparisons. This calculation ignores the cost of bookkeeping but sets the target to find sorting algorithms where the total cost (including bookkeeping) is $O(n\lg n)$.

Wed 1/14 Depth-First Search (DFS). Discover time, finish time of each vertex. Parenthesis theorem. White-path theorem. Classification of edges: tree edge, back edge, forward edge, cross edge. Application: topological sort. Strong components.

Mon 1/12 Breadth-First Search (BFS). Precise algorithm, using adjacency list representation as input. Proof of linear time bound. Proof of correctness, using invariants on the queue maintained by the algorithm. Definition of topological sort.

Fri 1/9 Graphs: review of basic concepts and notation. Representing graphs in computers: edge lists and adjacency lists. Simple manipulations of representations, linear-time algorithms. Breadth-First search (BFS) (introduction)

Wed 1/7 Fundamental algorithmic techiques: (1) Dynamic Programming: the Knapsack Problem. (2) Binary search. Each algorithm described in elegant pseudocode.

Mon 1/5 Introduction. Computational task: input $\mapsto$ output function. Model of computation: steps, associated cost. Worst-case analysis of algorithms: performance guarantees: proof of correctness, estimating the cost. Randomized algorithm: analysis may involve estimated probability of error for “worst input” (i.e., for all inputs).

Communication complexity: input split between two processors, local computation is free, each bit of communication between the processors costs one unit. Randomized communication complexity of the identity function. Analysis using some number theory. Asymptotic equality. The Prime Number Theorem.