Context-sensitive grammar
A context-sensitive grammar (CSG) is a
A
Chomsky introduced context-sensitive grammars as a way to describe the syntax of natural language where it is often the case that a word may or may not be appropriate in a certain place depending on the context. Walter Savitch has criticized the terminology "context-sensitive" as misleading and proposed "non-erasing" as better explaining the distinction between a CSG and an unrestricted grammar.[10]
Although it is well known that certain features of languages (e.g.
Formal definition
Formal grammar
Let us notate a formal grammar as , with a set of nonterminal symbols, a set of terminal symbols, a set of production rules, and the start symbol.
A string directly yields, or directly derives to, a string , denoted as , if v can be obtained from u by an application of some production rule in P, that is, if and , where is a production rule, and is the unaffected left and right part of the string, respectively. More generally, u is said to yield, or derive to, v, denoted as , if v can be obtained from u by repeated application of production rules, that is, if for some n ≥ 0 and some strings . In other words, the relation is the
The language of the grammar G is the set of all terminal-symbol strings derivable from its start symbol, formally: . Derivations that do not end in a string composed of terminal symbols only are possible, but do not contribute to L(G).
Context-sensitive grammar
A formal grammar is context-sensitive if each rule in P is either of the form where is the empty string, or of the form
- αAβ → αγβ
with A ∈ N,[note 1] ,[note 2] and .[note 3]
The name context-sensitive is explained by the α and β that form the context of A and determine whether A can be replaced with γ or not. By contrast, in a context-free grammar, no context is present: the left hand side of every production rule is just a nonterminal.
The string γ is not allowed to be empty. Without this restriction, the resulting grammars become equal in power to unrestricted grammars.[10]
(Weakly) equivalent definitions
A noncontracting grammar is a grammar in which for any production rule, of the form u → v, the length of u is less than or equal to the length of v.
Every context-sensitive grammar is noncontracting, while every noncontracting grammar can be converted into an equivalent context-sensitive grammar; the two classes are
Some authors use the term context-sensitive grammar to refer to noncontracting grammars in general.
The left-context- and right-context-sensitive grammars are defined by restricting the rules to just the form αA → αγ and to just Aβ → γβ, respectively. The languages generated by these grammars are also the full class of context-sensitive languages.
Examples
anbncn
The following context-sensitive grammar, with start symbol S, generates the canonical non-context-free language { anbncn | n ≥ 1 } :[citation needed]
1. | S | → | a | B | C | ||
2. | S | → | a | S | B | C | |
3. | C | B | → | C | Z | ||
4. | C | Z | → | W | Z | ||
5. | W | Z | → | W | C | ||
6. | W | C | → | B | C | ||
7. | a | B | → | a | b | ||
8. | b | B | → | b | b | ||
9. | b | C | → | b | c | ||
10. | c | C | → | c | c |
Rules 1 and 2 allow for blowing-up S to anBC(BC)n−1; rules 3 to 6 allow for successively exchanging each CB to BC (
- S
- →2 aSBC
- →2 aaSBCBC
- →1 aaaBCBCBC
- →3 aaaBCZCBC
- →4 aaaBWZCBC
- →5 aaaBWCCBC
- →6 aaaBBCCBC
- →3 aaaBBCCZC
- →4 aaaBBCWZC
- →5 aaaBBCWCC
- →6 aaaBBCBCC
- →3 aaaBBCZCC
- →4 aaaBBWZCC
- →5 aaaBBWCCC
- →6 aaaBBBCCC
- →7 aaabBBCCC
- →8 aaabbBCCC
- →8 aaabbbCCC
- →9 aaabbbcCC
- →10 aaabbbccC
- →10 aaabbbccc
anbncndn, etc.
More complicated grammars can be used to parse { anbncndn | n ≥ 1 }, and other languages with even more letters. Here we show a simpler approach using non-contracting grammars:[citation needed] Start with a kernel of regular productions generating the sentential forms and then include the non contracting productions , , , , , , , , , .
ambncmdn
A non contracting grammar (for which there is an equivalent CSG) for the language is defined by
- ,
- ,
- ,
- ,
- ,
- ,
- , and
- .
With these definitions, a derivation for is: .[citation needed]
a2i
A noncontracting grammar for the language { a2i | i ≥ 1 } is constructed in Example 9.5 (p. 224) of (Hopcroft, Ullman, 1979):[15]
Kuroda normal form
Every context-sensitive grammar which does not generate the empty string can be transformed into a
The Kuroda normal form is an actual normal form for non-contracting grammars.
Properties and uses
This section needs additional citations for verification. (August 2014) |
Equivalence to linear bounded automaton
A formal language can be described by a context-sensitive grammar if and only if it is accepted by some linear bounded automaton (LBA).[18] In some textbooks this result is attributed solely to Landweber and Kuroda.[7] Others call it the Myhill–Landweber–Kuroda theorem.[19] (Myhill introduced the concept of deterministic LBA in 1960. Peter S. Landweber published in 1963 that the language accepted by a deterministic LBA is context sensitive.[20] Kuroda introduced the notion of non-deterministic LBA and the equivalence between LBA and CSGs in 1964.[21][22])
As of 2010[update][needs update] it is still an open question whether every context-sensitive language can be accepted by a deterministic LBA.[23]
Closure properties
Context-sensitive languages are closed under complement. This 1988 result is known as the Immerman–Szelepcsényi theorem.[19] Moreover, they are closed under
Every
Computational problems
The decision problem that asks whether a certain string s belongs to the language of a given context-sensitive grammar G, is PSPACE-complete. Moreover, there are context-sensitive grammars whose languages are PSPACE-complete. In other words, there is a context-sensitive grammar G such that deciding whether a certain string s belongs to the language of G is PSPACE-complete (so G is fixed and only s is part of the input of the problem).[26]
The
As model of natural languages
Savitch has
It has been shown that nearly all
It was proven that some natural languages are not context-free, based on identifying so-called
Ongoing research on
More recently, the class
See also
- Chomsky hierarchy
- Growing context-sensitive grammar
- Definite clause grammar#Non-context-free grammars
- List of parser generators for context-sensitive grammars
Notes
- nonterminal
- terminals
- ^ i.e., γ is a nonempty string of nonterminals (except for the start symbol) and terminals
- ^ more formally: if L ⊆ Σ* is a context-sensitive language and f maps each a∈Σ to a context-sensitive language f(a), the f(L) is again a context-sensitive language
- ^ This also follows from (1) context-free languages being also context-sensitive, (2) context-sensitive language being closed under intersection, but (3) disjointness of context-free languages being undecidable.
References
- ^ (Hopcroft, Ullman, 1979); Sect.9.4, p.227
- ISBN 978-1-4496-1552-9.
- ISBN 978-1-85233-074-3.
- ISBN 978-0-08-050246-5.
- ISBN 9780073191461.
- ISBN 978-90-272-3250-2.
- ^ ISBN 978-0-08-050246-5.
- ^ Chomsky, N. (1963). "Formal properties of grammar". In Luce, R. D.; Bush, R. R.; Galanter, E. (eds.). Handbook of Mathematical Psychology. New York: Wiley. pp. 360–363.
- ISBN 978-90-272-3250-2.
- ^ ISBN 90-272-1556-1.
- ^ Zhang, Da-Qian, Kang Zhang, and Jiannong Cao. "A context-sensitive graph grammar formalism for the specification of visual languages." The Computer Journal 44.3 (2001): 186–200.
- ISBN 9780201029888.; p. 223–224; Exercise 9, p. 230. In the 2003 edition, the chapter on CSGs has been omitted.
- .
- ^ They obtained the grammar by systematic transformation of an unrestricted grammar, given in Exm. 9.4, viz.:
- ,
- ,
- ,
- ,
- ,
- ,
- ,
- .
<name-part>
in Backus–Naur form). The symbol names are chosen to resemble the unrestricted grammar. Likewise, rule groups in the context-sensitive grammar are numbered by the unrestricted-grammar rule they originated from. - .
- ISBN 3-540-61486-9., Here: Theorem 2.2, p. 190
- ^ (Hopcroft, Ullman, 1979); Theorem 9.5, 9.6, p. 225–226
- ^ a b Sutner, Klaus (Spring 2016). "Context Sensitive Grammars" (PDF). Carnegie Mellon University. Archived from the original (PDF) on 2017-02-03. Retrieved 2019-08-29.
- .
- ISBN 978-1-85233-074-3.
- ISBN 978-90-272-3250-2.
- ISBN 9780073191461.
- ^ (Hopcroft, Ullman, 1979); Exercise S9.10, p. 230–231
- ^ (Hopcroft, Ullman, 1979); Exercise S9.14, p. 230–232. h maps each symbol to itself or to the empty string.
- S2CID 18067130.
- ^ (Hopcroft, Ullman, 1979); Exercise S9.13, p. 230–231
- ^ Kallmeyer, Laura (2011). "Mildly Context-Sensitive Grammar Formalisms: Natural Languages are not Context-Free" (PDF). Archived (PDF) from the original on 2014-08-19.
- ^ Kallmeyer, Laura (2011). "Mildly Context-Sensitive Grammar Formalisms: Linear Context-Free Rewriting Systems" (PDF). Archived (PDF) from the original on 2014-08-19.
- ^ ISBN 978-3-642-14846-0.
Further reading
- ISBN 978-0-471-73655-4.