January 8, 2008

* Administrivia
* Introduction and a bit of history
* Regular languages

===================

REGULAR LANGUAGES

Finite-state automaton: 5-tuple (Q, Sigma, d, q_0, F)
  Q     = finite set of states
  Sigma = alphabet (finite)
  d     = transition function : Q x Sigma -> Q
  q_0   = initial state (q_0 in Q)
  F     = set of accepting states (F a subset of Q)

An FSA M accepts a string w if its computation on w ends in an accepting state. The set of all strings accepted by M is the language accepted (or recognized) by M. A language is regular if it is recognized by some finite-state automaton.

Examples of regular languages:
-- Set of strings that contain a fixed finite pattern, e.g., contain 00110 as a substring.
-- {a^i b^j c^k | i, j, k >= 0}

Closure Properties

Theorem: Regular languages are closed under the union, concatenation, and star operations.

Nondeterminism:

Nondeterministic finite-state automata (NFAs) are identical to the FSAs above (DFAs) except that d is a relation (as opposed to a function) and that transitions can also be taken on the empty string eps:
  d: Q x (Sigma U {eps}) -> P(Q)
An NFA accepts w if at least one of its computations on w ends in an accepting state.

Theorem: NFAs are equivalent to DFAs; the languages they accept are precisely the regular languages.

Regular expressions -- start from an alphabet, then build expressions using concatenation, union, and star. The languages they describe are exactly the regular languages.

Nonregular languages:

Many languages are not regular. Consider the set of all palindromes. Since palindromes can be arbitrarily long, an automaton with finitely many states cannot remember enough of the first half of the input to check it against the second half.

Pumping lemma: If A is a regular language, there is a number p (the pumping length) such that every s in A of length >= p can be written as s = xyz, satisfying the following:
  1. for each i >= 0, x y^i z is in A,
  2. |y| > 0, and
  3. |xy| <= p.

Examples of nonregular languages:
-- {ww | w in {0,1}^*}
-- Palindromes
-- {0^n 1^n | n >= 0}
-- {a^i b^j c^k | i = j or i = k}

History:

Finite-state automata were introduced to study sequential switching circuits -- Huffman (1954), Mealy (1955), and Moore (1956). They were also defined earlier in the context of neural nets -- McCulloch and Pitts (1943).
NFA and subset construction (equivalence with DFA) due to Rabin and Scott (1959). Regular expressions and their equivalence to finite-state automata due to Kleene (1956). Usage in programming languages and operating systems introduced by Thompson in 1968.
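
Example (sketch): Returning to the 5-tuple definition in the REGULAR LANGUAGES section, the following is a minimal Python sketch of a DFA for the first example language, the strings over {0,1} that contain 00110 as a substring. It is one possible encoding, not the only one. The function names build_substring_dfa, accepts, and agrees_with_search are hypothetical and chosen for illustration. State k is assumed to mean "the longest prefix of the pattern that is a suffix of the input read so far has length k", and the transition function d is stored as a dictionary keyed by (state, symbol).

```python
from itertools import product


def build_substring_dfa(pattern, alphabet):
    """Return (Q, Sigma, d, q_0, F) for the language 'contains pattern'.

    Assumption for this sketch: state k records that the longest prefix of
    `pattern` matching a suffix of the input read so far has length k.
    """
    m = len(pattern)
    states = set(range(m + 1))
    delta = {}
    for q in range(m + 1):
        for c in alphabet:
            if q == m:
                # The pattern has already been seen: stay in the accepting state.
                delta[(q, c)] = m
                continue
            seen = pattern[:q] + c
            # Find the longest prefix of the pattern that is a suffix of `seen`.
            k = min(m, len(seen))
            while not seen.endswith(pattern[:k]):
                k -= 1
            delta[(q, c)] = k
    return states, set(alphabet), delta, 0, {m}


def accepts(dfa, w):
    """Run the DFA on w and report whether it ends in an accepting state."""
    _states, _alphabet, delta, q, accepting = dfa
    for symbol in w:
        q = delta[(q, symbol)]
    return q in accepting


def agrees_with_search(dfa, pattern, alphabet, max_len):
    """Brute-force check against Python's substring test on all short strings."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if accepts(dfa, w) != (pattern in w):
                return False
    return True


M = build_substring_dfa("00110", "01")
print(accepts(M, "1001101"))  # True: 00110 occurs starting at the second symbol
print(accepts(M, "0011"))     # False: only a proper prefix of the pattern was read
```

The automaton has six states (0 through 5) regardless of input length, which is the point of the definition: all the memory the machine needs is the current state.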