Context-sensitive grammar

Source: Wikipedia, the free encyclopedia.

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, in the sense that there are languages that can be described by a CSG but not by a context-free grammar. Context-sensitive grammars are less general (in the same sense) than unrestricted grammars. Thus, CSGs are positioned between context-free and unrestricted grammars in the Chomsky hierarchy.[1]

A formal language that can be described by a context-sensitive grammar, or, equivalently, by a noncontracting grammar or a linear bounded automaton, is called a context-sensitive language. Some textbooks actually define CSGs as noncontracting, although this is not how Noam Chomsky defined them in 1959. This choice of definition makes no difference in terms of the languages generated (i.e., the two definitions are weakly equivalent), but it does make a difference in terms of which grammars are structurally considered context-sensitive; the latter issue was analyzed by Chomsky in 1963.[8][9]

Chomsky introduced context-sensitive grammars as a way to describe the syntax of natural language where it is often the case that a word may or may not be appropriate in a certain place depending on the context. Walter Savitch has criticized the terminology "context-sensitive" as misleading and proposed "non-erasing" as better explaining the distinction between a CSG and an unrestricted grammar.[10]

Although it is well known that certain features of languages (e.g. cross-serial dependency) are not context-free, it is an open question how much of CSGs' expressive power is needed to capture the context sensitivity found in natural languages. Subsequent research in this area has focused on the more computationally tractable mildly context-sensitive languages. The syntaxes of some visual programming languages can be described by context-sensitive graph grammars.[11]

Formal definition

Formal grammar

Let us notate a formal grammar as G = (N, Σ, P, S), with N a set of nonterminal symbols, Σ a set of terminal symbols, P a set of production rules, and S the start symbol.

A string u directly yields, or directly derives to, a string v, denoted as u ⇒ v, if v can be obtained from u by an application of some production rule in P, that is, if u = w₁xw₂ and v = w₁yw₂, where x → y is a production rule in P, and w₁ and w₂ are the unaffected left and right parts of the string, respectively. More generally, u is said to yield, or derive to, v, denoted as u ⇒* v, if v can be obtained from u by repeated application of production rules, that is, if u = u₀ ⇒ u₁ ⇒ ... ⇒ uₙ = v for some n ≥ 0 and some strings u₁, ..., uₙ₋₁. In other words, the relation ⇒* is the reflexive transitive closure of the relation ⇒.

The language of the grammar G is the set of all terminal-symbol strings derivable from its start symbol, formally: L(G) = { w ∈ Σ* : S ⇒* w }. Derivations that do not end in a string composed of terminal symbols only are possible, but do not contribute to L(G).
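
For illustration only (this sketch is not part of the cited sources), the relations ⇒ and ⇒* can be computed directly for small grammars. In the following Python sketch, a grammar is encoded as a list of (left-hand side, right-hand side) string pairs, uppercase letters stand for nonterminals, and the function names are chosen purely for exposition:

from collections import deque

def directly_yields(u, rules):
    """Enumerate every string v with u => v, i.e. every result of applying
    one production rule lhs -> rhs at one position of u."""
    for lhs, rhs in rules:
        i = u.find(lhs)
        while i != -1:
            yield u[:i] + rhs + u[i + len(lhs):]
            i = u.find(lhs, i + 1)

def derives(u, v, rules, max_steps=20):
    """Return True if u =>* v using at most max_steps rule applications;
    =>* is the reflexive transitive closure of =>, so u =>* u always holds."""
    seen, frontier = {u}, deque([(u, 0)])
    while frontier:
        w, depth = frontier.popleft()
        if w == v:
            return True
        if depth < max_steps:
            for x in directly_yields(w, rules):
                if x not in seen:
                    seen.add(x)
                    frontier.append((x, depth + 1))
    return False

print(derives("S", "aabb", [("S", "aSb"), ("S", "ab")]))  # True: S => aSb => aabb

The final line checks S ⇒* aabb for a toy grammar with rules S → aSb and S → ab; the same helpers work unchanged for the context-sensitive examples given below.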

Context-sensitive grammar

A formal grammar G = (N, Σ, P, S) is context-sensitive if each rule in P is either of the form S → ε, where ε is the empty string, or of the form

αAβ → αγβ

with A ∈ N,[note 1] α, β ∈ (N ∪ Σ)*,[note 2] and γ ∈ (N ∪ Σ)+.[note 3]

The name context-sensitive is explained by the α and β that form the context of A and determine whether A can be replaced with γ or not. By contrast, in a context-free grammar, no context is present: the left-hand side of every production rule is just a nonterminal.

The string γ is not allowed to be empty. Without this restriction, the resulting grammars become equal in power to unrestricted grammars.[10]
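
As an illustration (not taken from the cited sources), whether a single rule u → v fits this scheme can be checked mechanically by trying every position of A in u. In the following Python sketch, symbols are single characters and the function name is chosen for exposition only:

def is_context_sensitive_rule(u, v, nonterminals, start="S"):
    """Check whether u -> v has the form alpha A beta -> alpha gamma beta with
    A a nonterminal and gamma nonempty, or is the permitted rule S -> epsilon."""
    if u == start and v == "":
        return True
    for i, symbol in enumerate(u):
        if symbol not in nonterminals:
            continue
        alpha, beta = u[:i], u[i + 1:]
        if v.startswith(alpha) and v.endswith(beta):
            gamma = v[len(alpha):len(v) - len(beta)]
            if gamma:
                return True
    return False

print(is_context_sensitive_rule("CB", "CZ", set("SBCZW")))  # True:  alpha = C, A = B, gamma = Z
print(is_context_sensitive_rule("CB", "BC", set("SBCZW")))  # False: no valid split exists

The second call shows that a length-preserving rule such as CB → BC is not itself of context-sensitive form, a point that matters for the first example grammar below.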

(Weakly) equivalent definitions

A noncontracting grammar is a grammar in which, for any production rule u → v, the length of u is less than or equal to the length of v.

Every context-sensitive grammar is noncontracting, while every noncontracting grammar can be converted into an equivalent context-sensitive grammar; the two classes are weakly equivalent.[12]

Some authors use the term context-sensitive grammar to refer to noncontracting grammars in general.

The left-context- and right-context-sensitive grammars are defined by restricting the rules to just the form αA → αγ and to just Aβ → γβ, respectively. The languages generated by these grammars are also the full class of context-sensitive languages.
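
The noncontracting condition itself is simple to check; a minimal Python sketch (illustrative only, not from the cited sources), using the same encoding of rules as string pairs as above:

def is_noncontracting(rules):
    """Check |u| <= |v| for every production rule u -> v."""
    return all(len(u) <= len(v) for u, v in rules)

print(is_noncontracting([("CB", "BC")]))  # True, although CB -> BC is not context-sensitive in form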

Examples

aⁿbⁿcⁿ

The following context-sensitive grammar, with start symbol S, generates the canonical non-context-free language { aⁿbⁿcⁿ | n ≥ 1 } :[citation needed]

1. S → a B C
2. S → a S B C
3. C B → C Z
4. C Z → W Z
5. W Z → W C
6. W C → B C
7. a B → a b
8. b B → b b
9. b C → b c
10. c C → c c

Rules 1 and 2 allow for blowing up S to aⁿBC(BC)ⁿ⁻¹; rules 3 to 6 allow for successively exchanging each CB to BC (four rules are needed for that since a rule CB → BC wouldn't fit into the scheme αAβ → αγβ); rules 7–10 allow replacing a non-terminal B or C with its corresponding terminal b or c, respectively, provided it is in the right place. A generation chain for aaabbbccc is:

S
⇒2 aSBC
⇒2 aaSBCBC
⇒1 aaaBCBCBC
⇒3 aaaBCZCBC
⇒4 aaaBWZCBC
⇒5 aaaBWCCBC
⇒6 aaaBBCCBC
⇒3 aaaBBCCZC
⇒4 aaaBBCWZC
⇒5 aaaBBCWCC
⇒6 aaaBBCBCC
⇒3 aaaBBCZCC
⇒4 aaaBBWZCC
⇒5 aaaBBWCCC
⇒6 aaaBBBCCC
⇒7 aaabBBCCC
⇒8 aaabbBCCC
⇒8 aaabbbCCC
⇒9 aaabbbcCC
⇒10 aaabbbccC
⇒10 aaabbbccc
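
This chain can be retraced mechanically. The following Python sketch is illustrative only (it is not part of the cited material); the rules are written without the spacing used in the table, the rule numbering follows the table, and applying each listed rule at the leftmost position where it matches reproduces exactly the chain above:

RULES = {
    1: ("S", "aBC"),   2: ("S", "aSBC"),
    3: ("CB", "CZ"),   4: ("CZ", "WZ"),
    5: ("WZ", "WC"),   6: ("WC", "BC"),
    7: ("aB", "ab"),   8: ("bB", "bb"),
    9: ("bC", "bc"),  10: ("cC", "cc"),
}

def apply_rule(w, number):
    """Apply rule `number` at the leftmost position where its left-hand side occurs."""
    lhs, rhs = RULES[number]
    i = w.index(lhs)               # raises ValueError if the rule is not applicable
    return w[:i] + rhs + w[i + len(lhs):]

w = "S"
for n in (2, 2, 1, 3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6, 7, 8, 8, 9, 10, 10):
    w = apply_rule(w, n)
print(w)                           # aaabbbccc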

aⁿbⁿcⁿdⁿ, etc.

More complicated grammars can be used to parse { aⁿbⁿcⁿdⁿ | n ≥ 1 }, and other languages with even more letters. Here we show a simpler approach using non-contracting grammars:[citation needed] start with a kernel of productions generating the sentential forms aⁿ(BCD)ⁿ, for instance S → a S B C D and S → a B C D, and then include the non-contracting productions C B → B C, D B → B D, D C → C D, which sort the nonterminals into the order BⁿCⁿDⁿ, together with a B → a b, b B → b b, b C → b c, c C → c c, c D → c d, d D → d d, which replace each nonterminal by its terminal only once it stands in the right place.
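
Because the grammar is non-contracting, no sentential form in a derivation is ever longer than the string it eventually derives, so the generated language can be enumerated up to any length bound by exhaustive search. The following Python sketch (an illustrative check of the productions above; the function name and the encoding of rules as string pairs are not from any cited source) confirms that the only generated strings of length at most 8 are abcd and aabbccdd:

def language_up_to(rules, start, terminals, max_len):
    """Return all terminal strings of length <= max_len generated by a
    non-contracting grammar, by exhaustive search over all sentential
    forms of length <= max_len (soundness relies on non-contraction)."""
    seen, frontier, result = {start}, [start], set()
    while frontier:
        w = frontier.pop()
        if all(ch in terminals for ch in w):
            result.add(w)
        for lhs, rhs in rules:
            i = w.find(lhs)
            while i != -1:
                x = w[:i] + rhs + w[i + len(lhs):]
                if len(x) <= max_len and x not in seen:
                    seen.add(x)
                    frontier.append(x)
                i = w.find(lhs, i + 1)
    return result

ABCD_RULES = [
    ("S", "aSBCD"), ("S", "aBCD"),             # kernel: a^n (BCD)^n
    ("CB", "BC"), ("DB", "BD"), ("DC", "CD"),  # sort the nonterminals
    ("aB", "ab"), ("bB", "bb"), ("bC", "bc"),  # replace nonterminals in place
    ("cC", "cc"), ("cD", "cd"), ("dD", "dd"),
]

print(sorted(language_up_to(ABCD_RULES, "S", set("abcd"), 8)))
# expected output: ['aabbccdd', 'abcd']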

aᵐbⁿcᵐdⁿ

A non contracting grammar (for which there is an equivalent CSG) for the language { aᵐbⁿcᵐdⁿ | m ≥ 1, n ≥ 1 } is defined by

S → a S C,
S → a T C,
T → B T D,
T → B D,
D C → C D,
a B → a b,
b B → b b,
b C → b c,
c C → c c,
c D → c d, and
d D → d d.

With these definitions, a derivation for abcd is: S ⇒ aTC ⇒ aBDC ⇒ aBCD ⇒ abCD ⇒ abcD ⇒ abcd.[citation needed]

a^(2^i)

A noncontracting grammar for the language { a^(2^i) | i ≥ 1 } is constructed in Example 9.5 (p. 224) of (Hopcroft, Ullman, 1979).[15]

Kuroda normal form

Every context-sensitive grammar which does not generate the empty string can be transformed into a weakly equivalent one in Kuroda normal form, in which every rule has one of the forms AB → CD, A → BC, A → B, or A → a (with A, B, C, D nonterminals and a a terminal). "Weakly equivalent" here means that the two grammars generate the same language. The normal form will not in general be context-sensitive, but will be a noncontracting grammar.[16][17]

The Kuroda normal form is an actual normal form for non-contracting grammars.

Properties and uses

Equivalence to linear bounded automaton

A formal language can be described by a context-sensitive grammar if and only if it is accepted by some linear bounded automaton (LBA).[18] In some textbooks this result is attributed solely to Landweber and Kuroda.[7] Others call it the Myhill–Landweber–Kuroda theorem.[19] (Myhill introduced the concept of deterministic LBA in 1960. Peter S. Landweber published in 1963 that the language accepted by a deterministic LBA is context sensitive.[20] Kuroda introduced the notion of non-deterministic LBA and the equivalence between LBA and CSGs in 1964.[21][22])

As of 2010[needs update] it is still an open question whether every context-sensitive language can be accepted by a deterministic LBA.[23]

Closure properties

Context-sensitive languages are closed under complement. This 1988 result is known as the Immerman–Szelepcsényi theorem.[19] Moreover, they are closed under union, intersection, concatenation, substitution,[note 4] inverse homomorphism, and Kleene plus.[24]

Every recursively enumerable language L can be written as h(L′) for some context-sensitive language L′ and some string homomorphism h.

Computational problems

The decision problem that asks whether a certain string s belongs to the language of a given context-sensitive grammar G is PSPACE-complete. Moreover, there are context-sensitive grammars whose languages are PSPACE-complete. In other words, there is a context-sensitive grammar G such that deciding whether a certain string s belongs to the language of G is PSPACE-complete (so G is fixed and only s is part of the input of the problem).[26]
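
Membership is nevertheless decidable, essentially because an equivalent noncontracting grammar can be used, and a noncontracting derivation of s only ever passes through sentential forms of length at most |s|, of which there are finitely many. The following brute-force Python sketch is illustrative only (it uses exponential time and space, whereas the theoretical bound is polynomial space), and the rule encoding matches the earlier sketches:

def member(s, rules, start="S"):
    """Decide s in L(G) for a noncontracting grammar G = (rules, start) by
    searching all sentential forms of length <= len(s)."""
    bound = max(len(s), len(start))
    seen, frontier = {start}, [start]
    while frontier:
        w = frontier.pop()
        if w == s:
            return True
        for lhs, rhs in rules:
            i = w.find(lhs)
            while i != -1:
                x = w[:i] + rhs + w[i + len(lhs):]
                if len(x) <= bound and x not in seen:
                    seen.add(x)
                    frontier.append(x)
                i = w.find(lhs, i + 1)
    return False

# the a^n b^n c^n grammar from the examples section, as (lhs, rhs) pairs
ABC_RULES = [("S", "aBC"), ("S", "aSBC"), ("CB", "CZ"), ("CZ", "WZ"),
             ("WZ", "WC"), ("WC", "BC"), ("aB", "ab"), ("bB", "bb"),
             ("bC", "bc"), ("cC", "cc")]

print(member("aabbcc", ABC_RULES), member("aabbc", ABC_RULES))  # True False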

The emptiness problem (does a given context-sensitive grammar G generate any string at all?) is undecidable.[27][note 5]

As model of natural languages

Savitch has proven the following theoretical result, on which he bases his criticism of CSGs as a basis for natural language: for any recursively enumerable set R, there exists a context-sensitive language/grammar G which can be used as a sort of proxy to test membership in R in the following way: given a string s, s is in R if and only if there exists a positive integer n for which scⁿ is in G, where c is an arbitrary symbol not part of R.[10]

It has been shown that nearly all natural languages may in general be characterized by context-sensitive grammars, but the whole class of CSGs seems to be much bigger than natural languages. Worse yet, since the aforementioned decision problem for CSGs is PSPACE-complete, that makes them unworkable for practical use, as a polynomial-time algorithm for a PSPACE-complete problem would imply P = NP.

It was proven that some natural languages are not context-free, based on identifying so-called cross-serial dependencies. However, this does not imply that the full power of CSGs is needed to capture them: linear context-free rewriting systems (LCFRSs) are strictly weaker than CSGs but can account for the phenomenon of cross-serial dependencies; one can, for example, write an LCFRS grammar for { aⁿbⁿcⁿdⁿ | n ≥ 1 }.[28][29][30]

Ongoing research on computational linguistics has focused on formulating other classes of languages that are "mildly context-sensitive" and whose decision problems are feasible, such as tree-adjoining grammars, combinatory categorial grammars, coupled context-free grammars, and linear context-free rewriting systems. The languages generated by these formalisms properly lie between the context-free and context-sensitive languages.

More recently, the class PTIME has been identified with range concatenation grammars, which are now considered to be the most expressive of the mildly context-sensitive language classes.[30]

See also

Notes

  1. ^ i.e., A is a single nonterminal
  2. ^ i.e., α and β are (possibly empty) strings of nonterminals and terminals
  3. ^ i.e., γ is a nonempty string of nonterminals (except for the start symbol) and terminals
  4. ^ more formally: if L ⊆ Σ* is a context-sensitive language and f maps each a∈Σ to a context-sensitive language f(a), then f(L) is again a context-sensitive language
  5. ^ This also follows from (1) context-free languages being also context-sensitive, (2) context-sensitive languages being closed under intersection, and (3) disjointness of context-free languages being undecidable.

References

  1. ^ (Hopcroft, Ullman, 1979); Sect.9.4, p.227
  2. .
  3. .
  4. .
  5. .
  6. .
  7. ^ .
  8. ^ Chomsky, N. (1963). "Formal properties of grammar". In Luce, R. D.; Bush, R. R.; Galanter, E. (eds.). Handbook of Mathematical Psychology. New York: Wiley. pp. 360–363.
  9. .
  10. ^ .
  11. ^ Zhang, Da-Qian, Kang Zhang, and Jiannong Cao. "A context-sensitive graph grammar formalism for the specification of visual languages." The Computer Journal 44.3 (2001): 186–200.
  12. .; p. 223–224; Exercise 9, p. 230. In the 2003 edition, the chapter on CSGs has been omitted.
  13. .
  14. ^ They obtained the grammar by systematic transformation of an unrestricted grammar, given in Exm. 9.4, viz.:
    1. S → ACaB,
    2. Ca → aaC,
    3. CB → DB,
    4. CB → E,
    5. aD → Da,
    6. AD → AC,
    7. aE → Ea,
    8. AE → ε.
    In the context-sensitive grammar, a string enclosed in square brackets, like [Ca], is considered a single symbol (similar to e.g. <name-part> in Backus–Naur form). The symbol names are chosen to resemble the unrestricted grammar. Likewise, rule groups in the context-sensitive grammar are numbered by the unrestricted-grammar rule they originated from.
  15. .
  16. ., Here: Theorem 2.2, p. 190
  17. ^ (Hopcroft, Ullman, 1979); Theorem 9.5, 9.6, p. 225–226
  18. ^ a b Sutner, Klaus (Spring 2016). "Context Sensitive Grammars" (PDF). Carnegie Mellon University. Archived from the original (PDF) on 2017-02-03. Retrieved 2019-08-29.
  19. .
  20. .
  21. .
  22. .
  23. ^ (Hopcroft, Ullman, 1979); Exercise S9.10, p. 230–231
  24. ^ (Hopcroft, Ullman, 1979); Exercise S9.14, p. 230–232. h maps each symbol to itself or to the empty string.
  25. S2CID 18067130
    .
  26. ^ (Hopcroft, Ullman, 1979); Exercise S9.13, p. 230–231
  27. ^ Kallmeyer, Laura (2011). "Mildly Context-Sensitive Grammar Formalisms: Natural Languages are not Context-Free" (PDF). Archived (PDF) from the original on 2014-08-19.
  28. ^ Kallmeyer, Laura (2011). "Mildly Context-Sensitive Grammar Formalisms: Linear Context-Free Rewriting Systems" (PDF). Archived (PDF) from the original on 2014-08-19.
  29. ^ .

Further reading

External links