COMS 482: unofficial class blog

Lecture 10: More Dynamic Programming

Posted in Class Notes by Elliott Back on February 14th, 2005.

Divide and Conquer v. Dynamic Programming:

Both break a problem into subproblems, then combine the results to get a solution. Divide and Conquer solves each subproblem by recursion, even one it has already solved, while Dynamic Programming stores subproblem solutions in a table and pays off when the same subproblems keep recurring.
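
To make the contrast concrete, here is a small sketch using Fibonacci numbers. This example is not from the lecture; it is just a familiar case where the same subproblems recur, and the function names are made up for illustration.

```python
def fib_recursive(n):
    """Plain recursion: recomputes the same subproblems many times."""
    if n < 2:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)


def fib_table(n):
    """Dynamic programming: each subproblem is solved once and stored."""
    table = [0, 1]
    for k in range(2, n + 1):
        table.append(table[k - 1] + table[k - 2])
    return table[n]


print(fib_recursive(10))  # 55
print(fib_table(50))      # 12586269025
```

The recursive version takes exponential time in n, while the table version does one addition per entry.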

Word Similarity Problem:

GIANT
GREEN
OGRE

Which words are most similar? Assign transformation costs:

  1. identical letters -> distance 0
  2. vowel replaced by a different vowel -> distance 1
  3. consonant replaced by a different consonant -> distance 1
  4. insert/delete a letter -> distance 2
  5. vowel <-> consonant -> distance 3
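
The cost rules above can be sketched as a small function. The names substitution_cost and INDEL_COST are hypothetical, and treating only A, E, I, O, U as vowels (so Y counts as a consonant) is an assumption the notes do not state.

```python
# Assumption: the vowels are A, E, I, O, U; every other letter is a consonant.
VOWELS = set("AEIOU")

# Rule 4: inserting or deleting a letter costs 2.
INDEL_COST = 2


def substitution_cost(a, b):
    """Cost of replacing letter a with letter b (rules 1, 2, 3 and 5)."""
    a, b = a.upper(), b.upper()
    if a == b:
        return 0  # identical letters
    if (a in VOWELS) == (b in VOWELS):
        return 1  # vowel for vowel, or consonant for consonant
    return 3      # vowel <-> consonant


print(substitution_cost("E", "A"))  # 1
print(substitution_cost("R", "I"))  # 3
```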

For example, GREEN -> OGRE and GREEN -> GIANT (cost of each step in parentheses):

GREEN, GREE (2), GRE (2), OGRE (2) = 6
GREEN, GIEEN (3), GIAEN (1), GIAN (2), GIANT (2) = 8

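The minimum total cost over all possible transformation sequences is called the edit distance. A hand-picked sequence only gives an upper bound: GREEN -> GIANT can also be done as GREEN, GEEN (2), GIEN (1), GIAN (1), GIANT (2) = 6.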

Protein Sequence Alignment:

There are 20 amino acids, divided into several groups (e.g. nonpolar/polar). You want to look for similar sequences to try to guess at protein structure. This is the same as the word-matching problem, except that the "words" are proteins containing 150-400 amino acids.

Conversion to Graph Problem:

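Word matching as a graph problem: node (i, j) means the first i letters of one word have been matched against the first j letters of the other. Horizontal and vertical edges are inserts/deletes, diagonal edges are letter substitutions.

  • Now we can use Dijkstra’s algorithm
  • Word size n, |V| = O(n^2), |E| = O(n^2)
  • Running time: O(n^2 log n)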
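
As a rough sketch of the graph approach, the following runs Dijkstra's algorithm over the grid of nodes (i, j) using Python's heapq module. The function names are hypothetical, and the vowel set A, E, I, O, U is an assumption.

```python
import heapq

VOWELS = set("AEIOU")  # assumption: Y is treated as a consonant


def substitution_cost(a, b):
    """Diagonal edge weight: 0 if identical, 1 if same class, else 3."""
    if a == b:
        return 0
    return 1 if (a in VOWELS) == (b in VOWELS) else 3


def dijkstra_edit_distance(s, t):
    """Shortest path from node (0, 0) to node (len(s), len(t))."""
    n, m = len(s), len(t)
    dist = {(0, 0): 0}
    heap = [(0, 0, 0)]  # entries are (distance so far, i, j)
    while heap:
        d, i, j = heapq.heappop(heap)
        if (i, j) == (n, m):
            return d
        if d > dist.get((i, j), float("inf")):
            continue  # stale heap entry, a shorter path was already found
        neighbors = []
        if i < n:
            neighbors.append((i + 1, j, 2))  # delete s[i]
        if j < m:
            neighbors.append((i, j + 1, 2))  # insert t[j]
        if i < n and j < m:
            neighbors.append((i + 1, j + 1, substitution_cost(s[i], t[j])))
        for ni, nj, w in neighbors:
            nd = d + w
            if nd < dist.get((ni, nj), float("inf")):
                dist[(ni, nj)] = nd
                heapq.heappush(heap, (nd, ni, nj))
    raise AssertionError("unreachable: the target node is always reachable")


print(dijkstra_edit_distance("GREEN", "OGRE"))  # 6
```

The heap operations are where the extra log n factor in the running time comes from.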

O(n^2) Time Algorithm:

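We want to find the cheapest path from (0,0) to (n,m), where n and m are the lengths of the two words (m = n when the words have equal length). The cost of that path is the difference between the words. First, create the table D, where D[i, j] is the cost of the diagonal edge into (i, j), i.e. the cost of substituting letter i of the first word with letter j of the second. The other edges need no table, since they always have fixed cost 2. Every edge points right, down, or diagonally, so the table can be filled row by row in constant time per entry, with no priority queue. Then, run this algorithm: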

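// n, m: lengths of the two words (m = n for equal-length words)
opt[0, 0] = 0
For i = 1 to n
    opt[i, 0] = 2 * i
EndFor
For j = 1 to m
    opt[0, j] = 2 * j
EndFor

For i = 1 to n
    For j = 1 to m
        opt[i, j] = min {(opt[i-1, j-1] + D[i, j]), (opt[i, j-1] + 2), (opt[i-1, j] + 2)}
    EndFor
EndFor

return opt[n, m]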
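
A minimal runnable version of the table-filling algorithm, as a sketch in Python. The function names and the vowel set A, E, I, O, U are assumptions. The substitution cost is computed on the fly instead of being stored in a separate table D.

```python
VOWELS = set("AEIOU")  # assumption: Y is treated as a consonant


def substitution_cost(a, b):
    """Plays the role of D[i, j]: 0 if identical, 1 if same class, else 3."""
    if a == b:
        return 0
    return 1 if (a in VOWELS) == (b in VOWELS) else 3


def edit_distance(s, t):
    """Fill the opt table row by row and return opt[n][m]."""
    n, m = len(s), len(t)
    opt = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        opt[i][0] = 2 * i  # delete the first i letters of s
    for j in range(1, m + 1):
        opt[0][j] = 2 * j  # insert the first j letters of t
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            opt[i][j] = min(
                opt[i - 1][j - 1] + substitution_cost(s[i - 1], t[j - 1]),
                opt[i][j - 1] + 2,  # insert t[j-1]
                opt[i - 1][j] + 2,  # delete s[i-1]
            )
    return opt[n][m]


print(edit_distance("GREEN", "OGRE"))   # 6
print(edit_distance("GREEN", "GIANT"))  # 6
```

The table always finds the cheapest sequence of edits, which can be cheaper than a sequence picked by hand.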

