Operators in XFST, FSA Utilities and FSM -- A synopsis

You will find this synopsis useful if you are working with one of these tools and do not remember what the right operator for your specific task is. Blanks in the table normally indicate the an operation for a given tool must be defined by more complex means. (This table is partly derived from a similar table by Jason Eisner.)


XFSTFSA UtilitiesAT&T FSM
lextools
Explanation
ManualManualFSM Manual
lextools Manual
Online Manuals
xfst
Web-based interface
fsa -tkn/aInteractive startup
??[<sigma>] (if defined)any alphabet symbol
 a..z
(or predicates)
superclass defn
in .sym file
(lexmakelab compiles
to .scl file for -S option)
symbol class
"*" (or %*)'*' (or escape(*))\*escaped literal symbol
foo bar
(whitespace separator)
foo<regex op>bar
(separated by regex op)
[foo][bar]long symbol names
 (predicates)[noun num=pl
gender=fem
complex symbol
abc
(Ambiguous: see
Explanation below
abc (no symbol separator)
a  b  c (symbol separator=32)
a+b+c (symbol separator=43)
(no notation for input string
use fsmintersect or fsmcompose)
the input string "abc"
 a,bc (symbol separator=44) the input
E1 E2 ... En[E1,E2,...,En], concat(E1,E2)E1 E2 (fsmconcat)concatenation
{abcd}[a,b,c,d] (no special notation)abcdcharacter concat
A*A*, kleene_star(A)A*Kleene star
A+A+, kleene_plus(A)A+Kleene plus
(A)A^, option(A), {A,[]}A?option
E1 | E2 | ... | En{E1,E2,...,En}, union(E1,E2)E1 | E2 (fsmunion)union
A & BA & B, intersect[ion](A,B)A & B (fsmintersect)intersection
~A~A, complement(A)(fsmdifference [<sigma>]* A)complement (negation)
\a~a & ?, ? - a
(expressed as `{a}
in graphic interface)
!a (fsmdifference [<sigma>] a)symbol complement
A - BA - B, difference(A,B)E1 - E2 (fsmdifference)relative complement
~$[], \?{}(echo -n | fsmcompile)the empty language
0, [ ][ ]0
[<epsilon>] (if defined)
(echo -n 0 | fsmcompile)
epsilon (the empty string)
A .x. B, A:BA x B, A:BA:Bcross product
A .o. BA o B, compose(A,B)E1 @ E2 (fsmcompose)(reverse) composition
R.1domain(R)(fsmproject -i)domain
R.2range(R)(fsmproject -o)range
AA, identity(A)Aidentity transducer
 efree(A)fsmrmepsilonepsilon-removal
 A!
{t,w,wt}_determinize(A)
fsmdeterminizedeterminization
A.iinvert(A)fsminvertinvert (switch direction of transduction)
$A$A, [? *,A,? *] contains
A/Bignore(A,B) ignore
A.rreverse(A)fsmreversereverse
.#.FSA Utilities Anchors[<bos>][<eos>]edge of string anchors
A => L1 _ R1
A => L1 _ R1, L2 _ R2
Restriction in FSA Utilities restriction
A -> (exhaustive
nondeterm.)
A (->) B (optional)
A @-> B (LR longest)
A @> B (LR shortest)
A ->@ B (RL longest)
A >@ B (RL shortest)
Reversse arrow swaps
lower, upper
A may have the form [.E.]
 (blocks repeat matches
 of epsilon in same
 place)
B may have form L ... R
 (inserts L, R around A)
  || L _ R (upper upper)
  // L _ R (lower upper)
  \\ L _ R (upper lower)
  \/ L _ R (lower lower)
Parallel replacement
  joined by ,
  or by ,, if restricted
Replacement in FSA Utilites lexrulecomp replacement
test equivalent/overlap/
sublangage/lower-bounded/
upper-universal/etc
Decidability Properties
in FSA Utilities
fsmequiv, fsmprops
Some complex expressions
in fsm
decidable properties:
equality, functionality
determinizability, etc
lexc lexcomplexCompilation of acyclic automata from word lists:
Comments on lexical compilation

Explanation of operators for FSA Utilities


XFST Input

The input "abc" to XFST is ambiguous. If multicharacter symbols are not used in the regex, then the string must be interpreted as the three symbol sequence: a b c. If multicharacter symbols are used, then the string is processed from left to right, using a longest-match strategy to find each symbol. For example, if the regex is:

    [a:x b:y c:z]

Then the input "abc" is transduced to "xyz" (using "apply downward"). If however, the regex is:

    [c:x ab:z] | [a:x b:y c:z]

Then the input "abc" (interpreted as: ab c) is no longer accepted.


Contextual restrictions in XFST

Left and right contextual restrictions in XFST may be checked either on the input side (upper) or on the output side (lower). Kaplan & Kay showed that || (upper upper) corresponds simultaneous application, // (lower upper) corresponds to right-to-left application and \\ (upper lower) corresponds to right-to-left application. The fourth possibility \/ (lower lower) corresponds to no intuitively obvious mode of application. XFST provides no way to check restrictions partly on the input side and partly on the output side. Such multi-level contexts are possible in Kimmo-style two-level rule compilers, although only for length-preserving (or ``same-length'') transductions.


Decidability Properties in FSM

FSM provides only the predicate: fsmequiv. It is a useful exercise, however, to define some predicates on top of fsm. Defining a few simple predicates may serve as a model for how more complex operations may be defined using fsm.

First, let {epsilon} be interpreted as "true" and {} be interpreted as false:

   echo -n 0 | fsmcompile > true.fsa

   echo -n | fsmcompile > false.fsa

More generally, one could perhaps allow any language that accepts epsilon to be interpreted as "true".

Now let conjunction, disjunction and negation be defined as concatenation, union and complementation, respectively. With this interpretation, it is possible to design small programs using fsm that implement these operators. For example, using C++, one can implement negation with the following program:

// not.cc: note that a pipe is used so that the argument 
// can occur either on the command line or come from 
// standard input
#include <iostream>
#include <string>
#include <pfstream.h>  // g++ library extension for pipes

int main(int argc, char* argv[]) {
  if (argc == 2) {
    string command = "cat ";
    command += argv[1];
    command += " | fsmrmepsilon | fsmdeterminize";
    command += " | fsmdifference true.fsa -";
    system(command.c_str());
  } else if (argc == 1) {
    opfstream 
      pipe("|fsmrmepsilon|fsmdeterminize|fsmdifference true.fsa -");
    char c;
    while (cin.get(c))
      pipe.put(c);
  }
}  

The program is a little complicated since complementation is not a basic operation in fsm. One has to define the complementation of A in terms of sigma*-A. If we allow the alphabet to be empty, then sigma* is the same as our automaton: true.fsa. A further complication arises since the second argument to fsmdifference is required to be epsion free and deterministic.

Once negation is defined, we can use this to define the predicate empty:

// empty.cc: for simplicity, assume only a command-line 
// argument
#include <iostream>
#include <string>

int main(int argc, char* argv[]) {
  string command = 
    "lexcompre -S symbols.scl -l symbols.lab -s ";
  command += "'[<sigma>]*:[<epsilon>]'";
  command += " | fsmcompose ";
  command += argv[1];
  command += " - | fsmproject -o | not";
  system(command.c_str());
}

Note that the central idea here is contained in the transducer turning sigma* into epsilon (= true.fsa). The only input that is not turned into epsilon is the empty language.

Once the predicate empty is defined, this can be used as the basis for defining other predicates such as totality, equality and subset.



Lexicon Compilation

Although general purpose finite-state tools may be used to construct a lexicon, it is generally preferable (easier or more efficient) to use a specialized tool. Both Xerox and AT&T provide such tools. Another set of tools for this purpose is the fsa package developed by Jan Daciuk. This package includes tools for lexicon construction, along with specialized tools for morphological analysis, spell checking, diacritic restoration, analysis of unknown words and perfect hashing.



XFST Intersection and Relative Complement

Intersection and relative complemnt (subtraction) are generally defined only for languages, not relations. XFST allows these operations also for the special case of equal-length relations.



The ANY Symbol in XFST

A question mark (?) in an XFST regular expression stands for any symbol. In an XFST network, however, the question mark stands for any unknown symbol. This distiction was also made in previous versions of the FSA Utilities. In the current version, the question mark always indicates any symbol (known or unknown).