updated: 2022-01-23_12:32:31-05:00
Chomsky Normal Form
Guarantied to resolve a Push Down Automata in (2n-1)
steps
For some ambiguous grammar we can find unambiguous grammar that generates the same language (ie trompsky normal)
Some Context Free Languages can only be generated using ambiguous grammar. Such languages are called Inherently ambiguous and cannot be put in trompsky form
CNF:
A grammar is in CNF if every rule is of the form:
A->BC
A->a
Where a is any terminal and A B & C are any variable
also, s->e is permitted where s is the start variable
Four Steps to Converting:
-
Add a new start variable
and a new rule where S is the original start variable
This guarantees that the start variable is not on RHS -
Eliminate e rules
Remove an e rule e where A is not the start variable
If where U and V are strings of variables and terminals
we add the rule
We do this for every occurance of AEg: R -> uAvAw so we add...
R -> uvAw
R -> uAvw
R -> uvw -
Remove unit rules
Remove unit rule then whenever rule appears, add
where u is a string of variables and terminals
Unit rules is a variable going to a single variable (not a terminal) -
Convert all remaining rules to proper form
Replace where k 3
Until each is a variable or terminal symbol with rules:
...
We replace any terminal in the preceding rules with the new variable and add rule:
Example: converting to CNF
S -> A S A | a B
A -> B | S
B -> b | e
what's wrong?
B going to epsilon
S goes to ASA (more than 2)
S goes to aB (variable and terminal in same rule)
A goes to B | S (both unit rules)
.
.
- New start symbol
S0 -> S
S -> ASA | aB
A -> B | S
B -> b | e
.
.
2. Remove e rules B -> e
S0 -> S
S -> ASA | aB | a
A -> B | S | e
B -> b
(add e transitions)
Now we gotta finish, adding the possibilities for ASA (where A can be e)
S0 -> S
S -> ASA | AS | SA | S | aB | a
A -> B | S
B -> b
.
.
3. Unit Rules (remove s->s)
S0 -> S
S -> ASA | AS | SA | aB | a
A -> B | S
B -> b
(remove s0->S)
S0 -> ASA | AS | SA | aB | a
S -> ASA | AS | SA | aB | a
A -> B | S
B -> b
(remove A->B)
S0 -> ASA | AS | SA | aB | a
S -> ASA | AS | SA | aB | a
A -> b | S
B -> b
(remove A->S)
S0 -> ASA | AS | SA | aB | a
S -> ASA | AS | SA | aB | a
A -> b | ASA | AS | SA | aB | a
B -> b
At this point:
* marks non compliant bits
S0 -> *ASA | AS | SA | *aB | a
S -> *ASA | AS | SA | *aB | a
A -> b | *ASA | AS | SA | *aB | a
B -> b
Let's add some rules:
A1 -> AS
U -> a
This becomes:
S0 -> A1 A | AS | SA | UB | a
S -> A1 A | AS | SA | UB | a
A -> b | A1 A | AS | SA | UB | a
B -> b
Don't convert the single a's, because that would violate unit rules
CNF
(2n-1) steps for string of length n