CLRS - Designing Algorithms
Algorithm
CLRS book, Ch 2
Algorithm Correctness
- For any input instance, it halts with the correct (expected) output
Look at the pseudocode of InsertionSort(A):
The numbers that we wish to sort are also known as the keys. Although conceptually we are sorting a sequence, the input comes to us in the form of an array with n elements.
`for j = 2 to A.length` (1-based indexing is used here)
C++ implementation of Insertion sort
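A minimal sketch, assuming the input is a `std::vector<int>` sorted in place; the function name `insertionSort` and the 0-based indexing (unlike the 1-based pseudocode above) are assumptions of this example.

```cpp
#include <cstddef>
#include <vector>

// Insertion sort: grow a sorted prefix a[0..j-1] and insert a[j] into it.
// 0-based indexing, unlike the 1-based pseudocode above.
void insertionSort(std::vector<int>& a) {
    for (std::size_t j = 1; j < a.size(); ++j) {
        int key = a[j];            // element to insert into the sorted prefix
        std::size_t i = j;
        // Shift elements of a[0..j-1] that are greater than key one slot right.
        while (i > 0 && a[i - 1] > key) {
            a[i] = a[i - 1];
            --i;
        }
        a[i] = key;                // drop key into its correct position
    }
}
```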
How do we prove algorithm correctness?
Loop invariants
One way to show correctness is via loop invariants:
We use loop invariants to help us understand why an algorithm is correct. We must show three things about a loop invariant:
- Initialization: it is true prior to the first iteration of the loop.
- Maintenance: if it is true before an iteration of the loop, it remains true before the next iteration (each iteration maintains the invariant).
- Termination: when the loop terminates, the invariant gives us a useful property that helps show that the algorithm is correct.
  - This is the step where we actually prove the correctness of the algorithm.
A loop invariant is an assertion that is placed at the top of a loop and that must hold true every time the computation returns to the top of the loop.
We use loop invariants to argue correctness: the invariant holds every time execution reaches the top of the loop, and when the loop terminates, combining the invariant with the condition that caused termination yields a property from which we can prove the algorithm correct.
Example: Verify Algorithm Correctness of Insertion Sort
`for j = 2 to A.length` (1-based indexing is used here)
Loop invariant in this case: at the start of each iteration of the for loop, the subarray A[1…j-1] consists of the elements originally in A[1…j-1], in sorted order.
Initialization: Before the first iteration, j = 2, so the subarray A[1…j-1] is just A[1…1]. It contains only one element, so it is trivially sorted, and the loop invariant holds.
Maintenance: We show that each iteration maintains the loop invariant. The body of the for loop moves A[j-1], A[j-2], A[j-3], and so on one position to the right until it finds the proper position for A[j] (the key), at which point it inserts the key. The subarray A[1…j] then consists of the elements originally in A[1…j], in sorted order, so the invariant holds before the next iteration.
Termination: Finally, we examine what happens when the loop terminates. The condition causing the for loop to terminate is that j > A.length = n. Because each loop iteration increases j by 1, we must have j = n + 1 at that time. Substituting n + 1 for j in the wording of the loop invariant, we have that the subarray A[1…n] consists of the elements originally in A[1…n], in sorted order. Observing that the subarray A[1…n] is the entire array, we conclude that the entire array is sorted. Hence, the algorithm is correct.
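To make the invariant argument concrete, here is a hedged C++ sketch (assumed, not from the book) that asserts the invariant at the top of every iteration and once more at termination; `insertionSortChecked` is an illustrative name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Insertion sort instrumented with a loop-invariant check:
// at the start of each iteration, the prefix a[0..j-1] must be sorted.
void insertionSortChecked(std::vector<int>& a) {
    for (std::size_t j = 1; j < a.size(); ++j) {
        // Loop invariant: the prefix a[0..j-1] is in sorted order.
        assert(std::is_sorted(a.begin(), a.begin() + j));
        int key = a[j];
        std::size_t i = j;
        while (i > 0 && a[i - 1] > key) {
            a[i] = a[i - 1];
            --i;
        }
        a[i] = key;
    }
    // Termination: j == a.size(), so the whole array must be sorted.
    assert(std::is_sorted(a.begin(), a.end()));
}
```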
Algorithm Efficiency
- Efficiency includes: Memory, Communication bandwidth, Computer hardware (e.g. disk)
- The most important measure of efficiency is computational time
Example efficiency of InsertionSort
For the moment, let us adopt the following view: a constant amount of time is required to execute each line of our pseudocode. One line may take a different amount of time than another line, but we shall assume that each execution of the $i$th line takes time $c_i$, where $c_i$ is a constant.
A.length = n
The running time of the algorithm is the sum of running times for each statement executed
In the best case, the array is already sorted. In that case $t_j = 1$ for $j = 2, 3, \dots, n$ (where $t_j$ denotes the number of times the while-loop test is executed for that value of $j$).
We can express this best-case running time as $an + b$ for constants $a$ and $b$ that depend on the statement costs $c_i$; it is thus a linear function of $n$.
In the worst case, the array is in reverse sorted order. In that case $t_j = j$ for $j = 2, 3, \dots, n$,
given that $\sum_{j=2}^{n} j = \frac{n(n+1)}{2} - 1$ and $\sum_{j=2}^{n} (j-1) = \frac{n(n-1)}{2}$.
We can express this worst-case running time as $an^2 + bn + c$ for constants $a$, $b$, and $c$ that depend on the statement costs $c_i$; it is thus a quadratic function of $n$.
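As an illustration of the two cases (assumed code, not from the book), the following C++ sketch counts how many times the inner while-loop body executes for an already-sorted input versus a reverse-sorted input; the name `insertionSortShiftCount` is made up for this example.

```cpp
#include <cstddef>
#include <cstdio>
#include <numeric>
#include <vector>

// Returns the number of times the inner while-loop body executes,
// which dominates the running time of insertion sort.
long long insertionSortShiftCount(std::vector<int> a) {
    long long shifts = 0;
    for (std::size_t j = 1; j < a.size(); ++j) {
        int key = a[j];
        std::size_t i = j;
        while (i > 0 && a[i - 1] > key) {
            a[i] = a[i - 1];
            --i;
            ++shifts;
        }
        a[i] = key;
    }
    return shifts;
}

int main() {
    const int n = 1000;
    std::vector<int> sorted(n), reversed(n);
    std::iota(sorted.begin(), sorted.end(), 0);       // 0, 1, ..., n-1 (best case)
    for (int i = 0; i < n; ++i) reversed[i] = n - i;  // n, n-1, ..., 1 (worst case)
    std::printf("best case shifts:  %lld\n", insertionSortShiftCount(sorted));   // 0: only linear work remains
    std::printf("worst case shifts: %lld\n", insertionSortShiftCount(reversed)); // n(n-1)/2: quadratic
}
```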
Challenges
- Detailed analysis requires:
- Define instructions in RAM model
- Calculate cost for each instruction
- Add up the costs of the instructions used in an algorithm
How can we indicate an algorithm’s resource requirements in a simple way, without getting into the details?
Algorithm Analysis
In analyzing algorithms we use two effective techniques
- Worst-case analysis
- Asymptotic notations
Worst-Case Analysis (WCA)
The worst-case running time of an algorithm gives us an upper bound on the running time for any input. Knowing it provides a guarantee that the algorithm will never take any longer. We need not make some educated guess about the running time and hope that it never gets much worse.
Worst-case running time: the longest running time over all inputs of size n
- Upper-bound for any input
- For some algorithms, the worst case occurs fairly often (e.g. searching in a database)
- Average case is often roughly as bad as the worst case
Asymptotic Notations
CLRS book, Ch 3
When we look at input sizes large enough to make only the order of growth of the running time relevant, we are studying the asymptotic efficiency of algorithms. That is, we are concerned with how the running time of an algorithm increases with the size of the input in the limit, as the size of the input increases without bound. Usually, an algorithm that is asymptotically more efficient will be the best choice for all but very small inputs.
- A concise way to describe a set of functions (running time, space, etc.)
- Why?
  - To describe the running time (or other characteristics) of algorithms expressed as functions of the input size $n$
- Which running time?
  - Sometimes worst-case
  - Often no matter what the input (this is where asymptotic notation is useful)
- For all notations, every member of the set is asymptotically non-negative ($f(n)$ is non-negative for sufficiently large $n$)
Asymptotic notation is useful because it allows us to compare the efficiency of different algorithms without getting bogged down in the details of their implementations. We can simply look at the asymptotic growth rates of their running times or space requirements to get a sense of how they will perform on large inputs.
Θ notation (Theta notation)
This notation is used to describe both upper and lower bounds on the growth rate of a function. For example, if we say that the running time of an algorithm is Theta(n), we mean that the running time grows linearly with the input size.
- $f(n)$ belongs to $\Theta(g(n))$ if, for sufficiently large $n$, it is sandwiched between $c_1 g(n)$ and $c_2 g(n)$
- Asymptotically tight bound
Big O notation
This notation is used to describe an upper bound on the growth rate of a function. For example, if we say that the running time of an algorithm is O(n), we mean that the running time grows no faster than linearly with the input size.
- An asymptotic upper-bound
In the literature, we sometimes find O-notation informally describing asymptotically tight bounds
Ω notation (Omega notation)
This notation is used to describe a lower bound on the growth rate of a function. For example, if we say that the running time of an algorithm is Omega(n), we mean that the running time grows at least as fast as linearly with the input size.
- An asymptotic lower-bound
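For reference, the three notations can be written as sets of functions, following the CLRS Ch 3 definitions (with positive constants $c$, $c_1$, $c_2$, and $n_0$):

```latex
\begin{align*}
\Theta(g(n)) &= \{\, f(n) : \exists\, c_1, c_2, n_0 > 0 \text{ such that }
  0 \le c_1 g(n) \le f(n) \le c_2 g(n) \text{ for all } n \ge n_0 \,\}\\
O(g(n)) &= \{\, f(n) : \exists\, c, n_0 > 0 \text{ such that }
  0 \le f(n) \le c\, g(n) \text{ for all } n \ge n_0 \,\}\\
\Omega(g(n)) &= \{\, f(n) : \exists\, c, n_0 > 0 \text{ such that }
  0 \le c\, g(n) \le f(n) \text{ for all } n \ge n_0 \,\}
\end{align*}
```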
Asymptotic Notations Relations
Theorem: for any two functions $f(n)$ and $g(n)$, we have:
$f(n) = \Theta(g(n))$ if and only if $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$
(i.e., $\Theta(g(n)) = O(g(n)) \cap \Omega(g(n))$)
Analogy with real numbers:
- $f(n) = O(g(n))$ is like $a \le b$
- $f(n) = \Omega(g(n))$ is like $a \ge b$
- $f(n) = \Theta(g(n))$ is like $a = b$
Example: Prove $\frac{1}{2}n^2 - 3n = \Theta(n^2)$
To do so, we must determine positive constants $c_1$, $c_2$, and $n_0$ such that
- $c_1 n^2 \le \frac{1}{2}n^2 - 3n \le c_2 n^2$ for all $n \ge n_0$. Dividing by $n^2$ yields $c_1 \le \frac{1}{2} - \frac{3}{n} \le c_2$.
- We can make the right-hand inequality hold for any value of $n \ge 1$ by choosing any constant $c_2 \ge \frac{1}{2}$. Likewise, we can make the left-hand inequality hold for any value of $n \ge 7$ by choosing any constant $c_1 \le \frac{1}{14}$.
- By choosing $c_1 = \frac{1}{14}$, $c_2 = \frac{1}{2}$, and $n_0 = 7$, we can verify that $\frac{1}{2}n^2 - 3n = \Theta(n^2)$.
- Certainly, other choices for the constants exist, but the important thing is that some choice exists.
- A different function would usually require different constants.
Designing Algorithms
Divide-And-Conquer
Recursive in structure
They break the problem into several subproblems that are similar to the original problem but smaller in size, solve the subproblems recursively, and then combine these solutions to create a solution to the original problem.
- Divide: Break the problem to similar subproblems (smaller in size)
- Conquer: straightforwardly solve the subproblems (typically in a recursive manner)
- Combine: Combine the solutions to find the solution for the whole problem
The merge sort algorithm closely follows the divide-and-conquer paradigm.
Take a divide-and-conquer approach for merge sort:
- Divide: Divide the n-element sequence to be sorted into two subsequences of $n/2$ elements each
- Conquer: Sort the two subsequences recursively using merge sort
- Combine: Merge the two sorted subsequences to produce the sorted answer
C++ implementation of Mergesort
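A minimal C++ sketch of merge sort following the divide/conquer/combine steps above; the names `mergeSort` and `merge` are illustrative, and CLRS's sentinels are replaced by explicit bounds checks.

```cpp
#include <cstddef>
#include <vector>

// Merge the two sorted halves a[p..q] and a[q+1..r] back into a[p..r].
static void merge(std::vector<int>& a, std::size_t p, std::size_t q, std::size_t r) {
    std::vector<int> L(a.begin() + p, a.begin() + q + 1);      // left half
    std::vector<int> R(a.begin() + q + 1, a.begin() + r + 1);  // right half
    std::size_t i = 0, j = 0;
    for (std::size_t k = p; k <= r; ++k) {
        // Invariant: a[p..k-1] holds the k-p smallest elements of L and R, sorted.
        if (j >= R.size() || (i < L.size() && L[i] <= R[j]))
            a[k] = L[i++];
        else
            a[k] = R[j++];
    }
}

// Sort a[p..r] by dividing, conquering recursively, and combining (merging).
void mergeSort(std::vector<int>& a, std::size_t p, std::size_t r) {
    if (p >= r) return;                 // zero or one element: already sorted
    std::size_t q = p + (r - p) / 2;    // divide: compute the middle
    mergeSort(a, p, q);                 // conquer: sort left half
    mergeSort(a, q + 1, r);             // conquer: sort right half
    merge(a, p, q, r);                  // combine: merge the sorted halves
}
```

Call it as `mergeSort(v, 0, v.size() - 1)` on a non-empty vector `v`.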
Merge Sort Correctness
- Loop invariant:
- At the start of the for loop for merging L and R, the subarray A[p…k-1] contains the k-p smallest elements of L[1…n1+1] and R[1…n2+1], in sorted order. Moreover, L[i] and R[j] are the smallest elements of their arrays that have not been copied back into A.
- Initialization: k=p
- Maintenance: show the invariant holds in both cases of the comparison, L[i] <= R[j] and L[i] > R[j]
- Termination: k=r+1
Analyzing recursive (divide-and-conquer) algorithms
When an algorithm contains a recursive call to itself, we can often describe its running time by a recurrence equation or recurrence, which describes the overall running time on a problem of size n in terms of the running time on smaller inputs. We can then use mathematical tools to solve the recurrence and provide bounds on the performance of the algorithm.
- Recurrence equation: overall running time on a problem of size in terms of the running time on smaller inputs
- Mathematical tools to solve the recurrence and find bounds on algorithm’s performance
A recurrence for the running time of a divide-and-conquer algorithm falls out from the three steps of the basic paradigm.
We let $T(n)$ be the running time on a problem of size $n$.
- If the problem size is small enough, say $n \le c$ for some constant $c$, the straightforward solution takes constant time, which we write as $\Theta(1)$.
- Suppose that our division of the problem yields $a$ subproblems, each of which is $1/b$ the size of the original. It takes $T(n/b)$ to solve one subproblem, so $aT(n/b)$ to solve all of them.
- We take $D(n)$ time to divide the problem into subproblems.
- We take $C(n)$ time to combine the solutions to the subproblems into the solution to the original problem.
- This gives the recurrence $T(n) = \Theta(1)$ if $n \le c$, and $T(n) = aT(n/b) + D(n) + C(n)$ otherwise.
For merge sort:
- Divide: The divide step just computes the middle of the subarray, which takes constant time. Thus, $D(n) = \Theta(1)$.
- Conquer: We recursively solve two subproblems, each of size $n/2$, which contributes $2T(n/2)$ to the running time.
- Combine: We have already noted that the MERGE procedure on an n-element subarray takes time $\Theta(n)$. Thus $C(n) = \Theta(n)$.
- The recurrence for merge sort is therefore $T(n) = \Theta(1)$ if $n = 1$, and $T(n) = 2T(n/2) + \Theta(n)$ if $n > 1$.
In the CLRS book, Ch 4, we shall see the “master theorem,” which we can use to show that $T(n)$ is $\Theta(n \lg n)$. Because the logarithm function grows more slowly than any linear function, for large enough inputs, merge sort, with its $\Theta(n \lg n)$ running time, outperforms insertion sort, whose running time is $\Theta(n^2)$, in the worst case.
Master Theorem
The Master Theorem is a mathematical formula used to analyze the time complexity of recursive algorithms. It provides a quick way to determine the time complexity of a recursive algorithm in terms of its input size and the complexity of its subproblems.
The master method provides bounds for recurrences of the form $T(n) = aT(n/b) + f(n)$,
- where:
  - $T(n)$ is the running time of the algorithm for an input size of $n$.
  - $a$ is the number of subproblems the algorithm generates at each level of recursion.
  - $n/b$ is the size of each subproblem.
  - $f(n)$ is the time complexity of the work done outside of the recursive calls.
  - $a \ge 1$ and $b > 1$ are constants, and $f(n)$ is asymptotically positive.
Theorem:
- Let $a \ge 1$ and $b > 1$ be constants, let $f(n)$ be a function, and let $T(n)$ be defined on the nonnegative integers by the recurrence $T(n) = aT(n/b) + f(n)$.
- This form characterizes a divide-and-conquer algorithm that creates $a$ subproblems, each of which is $1/b$ the size of the original problem, and in which the divide and combine steps together take $f(n)$ time.
The Master Theorem has 3 cases, depending on the relationship between the complexity of the subproblems and the complexity of the work done outside of the recursive calls:
Then $T(n)$ has the following asymptotic bounds:
- If $f(n) = O(n^{\log_b a - \epsilon})$ for some constant $\epsilon > 0$, then $T(n) = \Theta(n^{\log_b a})$.
- If $f(n) = \Theta(n^{\log_b a})$, then $T(n) = \Theta(n^{\log_b a} \lg n)$.
- If $f(n) = \Omega(n^{\log_b a + \epsilon})$ for some constant $\epsilon > 0$, and if $a f(n/b) \le c f(n)$ for some constant $c < 1$ and all sufficiently large $n$, then $T(n) = \Theta(f(n))$.
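As a quick check against the merge-sort recurrence derived earlier, $T(n) = 2T(n/2) + \Theta(n)$:

```latex
\begin{align*}
T(n) &= 2\,T(n/2) + \Theta(n), \qquad a = 2,\ b = 2,\ f(n) = \Theta(n)\\
n^{\log_b a} &= n^{\log_2 2} = n = \Theta(f(n))
  \;\Longrightarrow\; \text{case 2: } T(n) = \Theta(n^{\log_b a}\lg n) = \Theta(n \lg n)
\end{align*}
```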
Example questions for Master Theorem:
- A divide-and-conquer algorithm has the complexity $T(n) = 2T(n/4) + f(n)$. This algorithm is of:
a)
b)
c)
d)
$a = 2$, $b = 4$, so $n^{\log_b a} = n^{\log_4 2} = n^{1/2} = \sqrt{n}$.
Since $f(n)$ is clearly $\Omega(\sqrt{n})$, we need some constant $\epsilon > 0$ such that $f(n) = \Omega(n^{1/2 + \epsilon})$.
In that case, it is case 3: $f(n) = \Omega(n^{\log_b a + \epsilon})$.
Then $T(n) = \Theta(f(n))$.
Therefore the answer is a).
-
is of (select one)
$f(n) = \Theta(n^{\log_b a})$
In that case, it is case 2.
Then $T(n) = \Theta(n^{\log_b a} \lg n)$
Dynamic Programming
Dynamic programming, like the divide-and-conquer method, solves problems by combining the solutions to subproblems.
- How does the divide-and-conquer method solve a problem?
  - Divide-and-conquer algorithms partition the problem into disjoint subproblems, solve the subproblems recursively, and then combine their solutions to solve the original problem.
- What if subproblems overlap (share sub-subproblems)?
  - Dynamic programming (DP) saves the repeated solutions!
- Dynamic programming is typically applied to optimization problems:
- The problem has many solutions, each with a value; we want to find a solution with the optimal (maximum or minimum) value
Example: Fib(n)
- Dynamic programming (DP) saves the repeated solution
- Reusing the f(4) tree
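A minimal C++ sketch contrasting the naive recursion with a memoized (top-down DP) version; the function names `fibNaive`, `fibMemo`, and `fib` are illustrative.

```cpp
#include <cstdint>
#include <vector>

// Naive recursion: recomputes the same subproblems over and over,
// giving exponential running time.
std::uint64_t fibNaive(int n) {
    if (n < 2) return n;
    return fibNaive(n - 1) + fibNaive(n - 2);
}

// Top-down DP (memoization): each Fib(i) is computed once and then reused,
// giving linear running time.
std::uint64_t fibMemo(int n, std::vector<std::uint64_t>& memo) {
    if (n < 2) return n;
    if (memo[n] != 0) return memo[n];          // already solved: reuse it
    memo[n] = fibMemo(n - 1, memo) + fibMemo(n - 2, memo);
    return memo[n];
}

std::uint64_t fib(int n) {
    std::vector<std::uint64_t> memo(n + 1, 0); // 0 marks "not computed yet"
    return fibMemo(n, memo);
}
```

For example, `fib(50)` runs in linear time, whereas `fibNaive(50)` would make exponentially many recursive calls.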
When to use Dynamic Programming?
Optimal substructure
- Optimal solutions to a problem incorporate optimal solutions to related subproblems, which we may solve independently.
- Choice depends on solutions to subproblems
Overlapping subproblems
Steps of Dynamic Programming
- Show that a solution to the problem consists of making a choice leaving one or more subproblems to be solved
- Assume that for a given problem, you are given the choice that leads to an optimal solution
- Given this choice, determine which subproblems ensue and how to best characterize the resulting space of subproblems
- Show that the solutions to the subproblems used within an optimal solution to the problem must themselves be optimal by using a “cut-and-paste” technique.
Dynamic Programming Example
Activity-Selection Problem (ASP)
- A set $S = \{a_1, a_2, \dots, a_n\}$ of activities
- Each activity $a_i$ has a start time $s_i$ and a finish time $f_i$, with $0 \le s_i < f_i < \infty$, and takes place during the half-open interval $[s_i, f_i)$ (suppose activities are sorted in monotonically increasing order of finish time)
- $a_i$ and $a_j$ are mutually compatible if their intervals $[s_i, f_i)$ and $[s_j, f_j)$ do not overlap
- Select a maximum-size subset of mutually compatible activities
- Objective: Find the maximum number of non-conflicting jobs
Using Dynamic Programming to solve Activity-Selection Problem (ASP):
- Let $S_{ij}$ be the set of activities that start after activity $a_i$ finishes and finish before activity $a_j$ starts
- Let $A_{ij}$ be a maximum-size set of mutually compatible activities in $S_{ij}$
- Then for some activity in $A_{ij}$, say $a_k$, we have two subproblems: finding mutually compatible activities in $S_{ik}$ and in $S_{kj}$
- $A_{ik} = A_{ij} \cap S_{ik}$ and $A_{kj} = A_{ij} \cap S_{kj}$, so $A_{ij} = A_{ik} \cup \{a_k\} \cup A_{kj}$ and $|A_{ij}| = |A_{ik}| + 1 + |A_{kj}|$
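Letting $c[i, j]$ denote the size of an optimal solution for $S_{ij}$ (the notation used in CLRS Ch 16), the resulting DP recurrence is:

```latex
c[i,j] =
\begin{cases}
0 & \text{if } S_{ij} = \emptyset,\\
\max\limits_{a_k \in S_{ij}} \{\, c[i,k] + c[k,j] + 1 \,\} & \text{if } S_{ij} \neq \emptyset.
\end{cases}
```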
Greedy Approach
- Obtains an optimal solution to a problem by making a sequence of locally optimum choices
- Does it always produce an optimum solution?
- Two requirements:
- Greedy-choice property
- Optimal substructure
Key: if we have optimal solutions to the subproblems, we can use them to construct an optimal solution to the overall problem
Greedy Approach - Greedy-Choice Property
- Greedy-Choice Property (GCP):
- we can assemble a globally optimal solution by making locally optimal (greedy) choices
- Differing from dynamic-programming:
- Choice depends on solutions to subproblems (bottom-up)
- May depend on choices so far, but not on any future choices => making choice before solving subproblems! (top-down)
- GCP should be demonstrated for a problem (via a theorem and its proof)
- If input is preprocessed properly, the greedy approach is efficient!
Steps of Greedy Approach
- Cast the optimization problem as one in which we make a choice and are left with one subproblem to solve
- Prove that there is always an optimal solution to the original problem that makes the greedy choice, so that the greedy choice is always safe
- Demonstrate optimal substructure by showing that, having made the greedy choice, what remains is a subproblem with the property that if we combine an optimal solution to the subproblem with the greedy choice we have made, we arrive at an optimal solution to the original problem
Using the **Greedy Approach** to solve the Activity-Selection Problem (ASP):
- Consider any nonempty subproblem $S_k$, and let $a_m$ be an activity in $S_k$ with the earliest finish time. Then $a_m$ is included in some maximum-size subset of mutually compatible activities of $S_k$
To do it:
- Initially sort the activities in increasing order of their finish times and create a set to store the selected activities and initialize it with the first activity.
- If the first activity is an optimal (greedy) choice for a subproblem, then an optimal solution to the original problem consists of the first activity together with all the activities in an optimal solution to the remaining problem!
- Then, from the second activity onward, include an activity in the selected list if its start time is greater than or equal to the finish time of the last selected activity. Repeat this for each remaining activity.
Alternatively, working backwards, we can repeatedly select the compatible activity that starts the latest.
Demo:
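A minimal C++ sketch of the greedy selection described above, assuming an illustrative `Activity` struct with integer start and finish times:

```cpp
#include <algorithm>
#include <vector>

struct Activity {
    int start;
    int finish;
};

// Greedy activity selection: sort by finish time, then repeatedly pick the
// next activity whose start time is >= the finish time of the last one taken.
std::vector<Activity> selectActivities(std::vector<Activity> acts) {
    std::sort(acts.begin(), acts.end(),
              [](const Activity& a, const Activity& b) { return a.finish < b.finish; });
    std::vector<Activity> chosen;
    for (const Activity& a : acts) {
        if (chosen.empty() || a.start >= chosen.back().finish)
            chosen.push_back(a);   // compatible with everything chosen so far
    }
    return chosen;
}
```

Sorting dominates the cost, so the whole procedure runs in $O(n \lg n)$ time.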
Hard Problems
Complexity Classes
What is an NP-complete in computer science?
NP is the set of all decision problems (questions with a yes-or-no answer) for which the ‘yes’-answers can be verified in polynomial time ($O(n^k)$, where $n$ is the problem size and $k$ is a constant) by a deterministic Turing machine. Polynomial time is sometimes used as the definition of fast or quickly.
- NP: verifiable in polynomial time
  - i.e., given a solution, we can check in polynomial time that it is a correct solution to the problem
  - (example: vertex cover)
P is the set of all decision problems which can be solved in polynomial time by a deterministic Turing machine.
- P: solvable in polynomial time
  - i.e., in $O(n^k)$ for some constant $k$, where $n$ is the size of the input
  - (example: sorting, shortest path)
- Observation:
  - Since problems in P can be solved in polynomial time, they can also be verified in polynomial time. Therefore P is a subset of NP.
- NP-Complete: it is in NP and as “hard” as any other problem in NP
  - i.e., solving any NP-Complete problem in polynomial time means every NP-Complete problem can be solved in polynomial time!
NP − Completeness
A problem x that is in NP is also in NP-Complete if and only if every other problem in NP can be quickly (i.e., in polynomial time) transformed into x.
In other words:
- x is in NP, and
- Every problem in NP is reducible to x
So, what makes NP-Complete so interesting is that if any one of the NP-Complete problems could be solved quickly, then all NP problems could be solved quickly.
To show that a problem is NP-Complete:
- show that it is in NP, and
- perform a polynomial-time reduction from a known NP-Complete problem to it
Practice Questions
- For any real constants and
True
- For any value of , an algorithm with runtime runs faster than an algorithm with runtime
False
- This statement: "The running time of insertion sort is at least $O(n^2)$" is (select one):
- meaningless
- true for some cases
- true in all cases
- false
Meaningless. Big-O gives an upper bound, so saying a running time is "at least $O(n^2)$" does not bound it from below; a lower bound would be stated with $\Omega$.
- The function is of (select one):
- Which of the following improve the best-case running time of a sorting algorithm? (Select all)
  - Check if the array is already sorted; if so, do nothing
  - Check if only the first two elements are out of place; if so, swap them
  - Check if only the last two elements are out of place; if so, swap them
  All are correct, and they do improve the best-case running time.
- Suppose an implementation of insertion sort with runtime and an implementation of merge sort with runtime . Then (select one):
  - Merge-sort implementation is faster than that of the insertion sort
  - Insertion sort implementation is faster than that of the merge sort
  - Insertion sort is faster only for
  - Merge sort is faster only for
  Insertion sort is faster only for sufficiently small $n$.
- is of (select one)
A Master Theorem problem.
- Which problems have the suboptimal substructure? (select all)
Finding the shortest path between two nodes in an undirected graph
Finding the longest path between two nodes in an undirected graph
Finding the maximum number of mutually compatible activities within a period
All of the problems have the suboptimal substructure.