- Hash tables
- Heaps, Priority Queues
- Binomial Queues
- Sorting
- quicksort, partition
- counting sort
- radix sort
- bucket sort
- Graphs
- Adjacency List, Adjacency Matrix
- BFS, DFS
- Shortest Paths: Dijkstra's
- Connected Components
- Minimum Spanning Trees: Kruskal's
- Union-find (aka. Disjoint Sets)
- Backtracking
- Dynamic Programming
- Rod cutting
- DNA Sequence Alignment
- 0-1 Knapsack
- Greedy Algorithms
- Activity Selection
- Huffman Codes
- Fractional Knapsack
-
Dictionary/Map ADT:
get,put,remove,containsKey -
Direct-address tables
-
Prehashing: e.g. mapping strings to integers
-
Hash functions
-
division method (modulo)
h(k) = k mod m
where m is the table size
-
multiply-add-and-divide method
h(k) = ((a * k + b) mod p) mod mwhere p is a prime number larger than m and a,b are randomly chosen integers between 1 and p-1.
-
-
Collision handling by separate chaining
___16___
/ \
14 10
/ \ / \
/ \ / \
8 7 9 3
/ \ /
2 4 1
Def. A max heap is a heap in which for every node other than the root, A[i] ≤ A[parent(i)]
-
Operation: max_heapify(H, i) The tree rooted at i is not a max heap, but the subtrees left(i) and right(i) are max heaps. The resulting tree is a max heap. Percolate the 4 down to the right spot.
___16___ / \ *4* 10 / \ / \ / \ / \ 14 7 9 3 / \ / 2 8 1becomes
___16___ / \ 14 10 / \ / \ / \ / \ 8 7 9 3 / \ / 2 *4* 1code:
void max_heapify(int i, int length) { int l = left(i); int r = right(i); int largest = i; if (l < length && data[i] < data[l]) largest = l; if (r < length && data[largest] < data[r]) largest = r; if (largest != i) { swap(data, i, largest); max_heapify(largest, length); } } -
build_max_heap(A): turns an arbitrary array into a heap
___1____ / \ 2 3 / \ / \ / \ / \ 4 7 8 9 / \ / 10 14 16Max-Heapify the last parent: 7
___1____ / \ 2 3 / \ / \ / \ / \ 4 16 8 9 / \ / 10 14 7Max-Heapify the next parent: 4
___1____ / \ 2 3 / \ / \ / \ / \ 14 16 8 9 / \ / 10 4 7Max-Heapify the next parent: 3
___1____ / \ 2 9 / \ / \ / \ / \ 14 16 8 3 / \ / 10 4 7Max-Heapify the next parent: 2
___1____ / \ 16 9 / \ / \ / \ / \ 14 7 8 3 / \ / 10 4 2Max-Heapify the next parent: 1
___16___ / \ 14 9 / \ / \ / \ / \ 10 7 8 3 / \ / 1 4 2code:
void build_max_heap() { int last_parent = length / 2 - 1; for (int i = last_parent; i != -1; --i) { max_heapify(i); } }
-
Depth-first search is a graph search algorithm.
-
From a given starting node, it determines which other nodes are reachable from it.
-
Example:
A--->B--->C | ^ | V | | D--->E<---|Depth-first trees:
A----B----C | | | | D E----| A B----C | | | | D----E -
Code:
static <V> void DFS_visit(Graph<V> G, V u, Map<V,V> parent, Map<V,Boolean> visited) { visited.put(u, true); for (V v : G.adjacent(u)) if (! visited.get(v)) { parent.put(v, u); DFS_visit(G, v, parent, visited); } } -
Example with discover/finish times
1/10 2/7 3/6 A----B----C | | | | D E----| 8/9 4/5 1/10 4/7 5/6 A B----C | | | | D----E 2/9 3/8
-
Solves the single-source shortest-hops problem What is the shortest path from A to E, A to B? DFS doesn't guarantee that it's paths are the shortest.
-
Algorithm idea: expand tree one level at a time.
-
Example:
A--->B--->C | ^ | V | | D--->E<---|Breadth-first trees:
A----B----C | | D----EShow the queue at each step
-
Code:
static <V> void BFS(Graph<V> G, V start, Map<V,Boolean> visited, Map<V,V> parent) { for (V v : G.vertices()) visited.put(v, false); LinkedList<V> Q = new LinkedList<V>(); Q.add(start); parent.put(start, start); visited.put(start, true); while (Q.size() != 0) { V u = Q.remove(); for (V v : G.adjacent(u)) if (! visited.get(v)) { parent.put(v, u); Q.add(v); visited.put(v, true); } } }
-
Example problem:
Which people live in the same state?
- You're given assertions that two people live in the same state.
- You're asked, based on the assertions so far, whether two people definitely live in the same state.
example:
Universe: Mary, Katie, Jeremy, Luke, John {^Mary} {^Katie} {^Jeremy} {^Luke} {^John} Mary-Katie ! {^Mary, Katie} {^Jeremy} {^Luke} {^John} Jeremy-Luke ! {^Mary, Katie} {Jeremy, ^Luke} {^John} Katie-John ! {^Mary, Katie, John} {^Jeremy, Luke} Mary-John ? Yes Jeremy-John ? No Luke-John ! {^Mary, Katie, John,Jeremy, Luke} Jeremy-John ? Yes -
Basic implementation
idea: the set is represented as a forest, with one tree per partition. The root of the tree is the representative.
public class UnionFind<N> implements DisjointSets<N> { Map<N,N> parent; // key: an entity, value: the entity's parent public UnionFind(Map<N,N> p) { parent = p; } public void make_set(N x) { parent.put(x, x); } public N find(N x) { if (x == parent.get(x)) return x; else return find(parent.get(x)); } public N union(N x, N y) { N rx = find(x); N ry = find(y); parent.put(ry, rx); return rx; } } -
path compression
public N find(N x) { if (x == parent.get(x)) return x; else { N rep = find(parent.get(x)); parent.put(x, rep); return rep; } } -
union-by-rank
public void make_set(N x) { parent.put(x, x); rank.put(x, 0); } public N union(N x, N y) { N rx = find(x); N ry = find(y); if (rank.get(rx) > rank.get(ry)) { parent.put(ry, rx); return rx; } else { parent.put(rx, ry); if (rank.get(ry) == rank.get(rx)) rank.put(ry, rank.get(ry) + 1); return ry; } }
-
Kruskal's algorithm
-
Order all the edges by their weight
-
initialize a union-find data structure with every vertex by itself
-
For each edge in this ordering:
If the two vertices connected by the edge have different representatives, then add this edge to the tree and union the sets associated with the two vertices.
-
-
Student group work: apply Kruskal's to the following graph
A--2--B--3--C--1--D | | | | 3 1 2 5 | | | | E--4--F--3--G--3--H | | | | 4 2 4 3 | | | | I--3--J--3--K--1--LSolution: sorted edges:
C-D,B-F,K-L, A-B,C-G,F-J, B-C,A-E,F-G,G-H,H-L,I-J,J-K, E-F,E-I,G-K, D-HAn MST of weight 24:
A--2--B--3--C--1--D | | | 3 1 2 5 | | | E 4 F 3 G--3--H | | 4 2 4 3 | | I--3--J 3 K--1--L
A--1--B--1--C
| |
4 1
| |
D----2------E
What's the shortest path from
A to D? A-D (4)
A to C? A-B-C (2)
A to E? A-B-C-E (3)
What's the MST?
A-B-C-E-D (5)
Try to insert gaps (_) to cause the two sequences to match up the best.
Example:
CTGA CTGA_ score = -2 +2 -1 +2 -1 = 0
ATAC AT_AC
C_TGA_ sore = -1 -1 +2 -1 +2 -1 = 0
_AT_AC
Working back-to-front, what choices can be made to characterize all possible alignments? For the last column you can either
- take the last character of both strings to be aligned, or
- add a gap to s1 and take the last character of s2. This can be thought of as an "insertion".
- add a gap to s2 and take the last character of s1. This can be thought of as an "deletion".
The recursive brute-force solution, shown below, tries all three of the above options and picks the best one.
Recursive function for computing a best alignment:
int align(String s1, String s2, int i, int j) {
if (i == 0 && j == 0) {
return 0;
} else if (i == 0) {
return j * -1;
} else if (j == 0) {
return i * -1;
} else {
int M = align(s1, s2, i-1, j-1) + score(s1[i-1],s2[j-1]);
int I = align(s1, s2, i, j-1) - 1;
int D = align(s1, s2, i-1, j) - 1;
return max(M, I, D);
}
}
We can then turn this into a dynamic-programming solution with a table to record the results of subproblems and two loops to proceed in a bottom-up fashion.
T[0][0] = 0
for (int i = 1; i != m + 1; ++i) {
T[i][0] = i * -1
}
for (int j = 1; j != n+1; ++j) {
T[0][j] = j * -1;
}
for (int i = 1; i != m+1; ++i) {
for (int j = 1; j != n+1; ++j) {
int M = T[i-1][j-1] + s(s1[i-1], s2[j-1]);
int D = T[i-1][j] + -1;
int I = T[i][j-1] + -1;
T[(i, j)] = max(M, D, I);
}
}
Now let's try this on an example: align ACGT and ATAG
j
A T A G
0 I:-1 I:-2 I:-3 I:-4
A D:-1 M:2 I:1 I:0 I:-1
i C D:-2 D:1 M:0 I:-1 D:-2
G D:-3 D:0 I:-1 M:-2 M:1
T D:-4 D:-1 M:2 D:1 D:0
So an answer is
A C _ G T
A T A G _
2-2-1+2-1 = 0