Solutions of mid-term II

Problem 1

General: please note that the original CYK algorithm, as presented in class, stores in the [i,j] entry of the chart a set of non-terminal symbols. Not rules, nor trees. Since this is a set, each non-terminal can occur in a given entry at most once. Hence counting the number of occurences of 'S' in chart[0,n] is not a reasonable solution for question A, for example.

A

Change the data structure: each entry of the cart will be a set of non-terminal symbols, each associated with an integer denoting the number of trees this non-terminal roots. The invariance: if the non-teminal 'A' in the [i,j] entry of the chart is associated with the number 'n', then there are 'n' different trees whose root is 'A' and whose yield is w_i...w_j.

Initialization: the chart is empty. Then, for rules of the form A->w_j (line 3 of the algorithm), change line 4 as follows:
if <A,n> is in chart[j-1,j], change it to <A,n+1>;
otherwise, add <A,1> to chart[j-1,j].

Finally, assuming that <B,n1> is in chart[i,k] and <C,n2> is in chart[k,j], change line 10 of the algorithm as follows:
if <A,m> is in chart[i,j], replace it by <A,m+n1*n2>; otherwise, add <A,n1*n2> to chart[i,j].

The idea here is that for the rule A->B C, the number of trees whose root is 'A' is the product of the number of trees rooted in B and the number of trees rooted in C; and these trees are added to other trees rooted in 'A' which may have already been counted in chart[i,j]. Complexity remains cubic.

B

  1. The change in the data structure is as above. The change in the algorithm is very similar. Initialization: for a rule of the form A->w_j whose price is 'p',
    if <A,n> is in chart[j-1,j], change it to <A,max(n,p)>;
    otherwise, add <A,p> to chart[j-1,j].

    Then, assuming that <B,n1> is in chart[i,k] and <C,n2> is in chart[k,j], change lines 9-10 of the algorithm as follows:
    for every rule A->B C whose price is p,
    if <A,m> is in chart[i,j], let n be max(m,n1+n2+p).
    otherwise, let n be n1+n2+p.
    remove <A,m> from chart[i,j].
    add <A,n> to chart[i,j].

    The idea is that the price of a tree is the sum of the prices of its two subtrees, plus the price of the rule used to derive its root. We only record the price of the most expensive tree rooted in each non-terminal in each chart entry. Complexity remains cubic.

  2. In addition to the above change, each non-terminal in each entry of the chart will include also pointers to the chart entries used to contruct it. To print the most expensive trees, follow these links.

    Since the number of trees for a string of length 'n' may be exponential in 'n', the complexity of printing them is exponential (since all trees may be of the same price, you may have to print all the trees).

  3. If each rule is priced exactly zero, then each tree costs exactly zero and hence all trees cost the same. Hence the algorithm of the previous answer will necessarily print all the trees for a given string.


Shuly Wintner
Last modified: Thu Feb 5 10:50:12 IST 2004