Context-sensitive grammar

Source: Wikipedia, the free encyclopedia.

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, in the sense that there are languages that can be described by a CSG but not by a context-free grammar. Context-sensitive grammars are less general (in the same sense) than unrestricted grammars. Thus, CSGs are positioned between context-free and unrestricted grammars in the Chomsky hierarchy.[1]

A formal language that can be described by a context-sensitive grammar, or, equivalently, by a noncontracting grammar or a linear bounded automaton, is called a context-sensitive language. Some textbooks actually define CSGs as noncontracting, although this is not how Noam Chomsky defined them in 1959. This choice of definition makes no difference in terms of the languages generated (i.e., the two definitions are weakly equivalent), but it does make a difference in terms of which grammars are structurally considered context-sensitive; the latter issue was analyzed by Chomsky in 1963.[8][9]

Chomsky introduced context-sensitive grammars as a way to describe the syntax of natural language where it is often the case that a word may or may not be appropriate in a certain place depending on the context. Walter Savitch has criticized the terminology "context-sensitive" as misleading and proposed "non-erasing" as better explaining the distinction between a CSG and an unrestricted grammar.[10]

Although it is well known that certain features of languages (e.g. cross-serial dependency) are not context-free, it is an open question how much of CSGs' expressive power is needed to capture the context sensitivity found in natural languages. Subsequent research in this area has focused on the more computationally tractable mildly context-sensitive languages. The syntaxes of some visual programming languages can be described by context-sensitive graph grammars.[11]

Formal definition

Formal grammar

Let us notate a formal grammar as G = (N, Σ, P, S), with N a set of nonterminal symbols, Σ a set of terminal symbols, P a set of production rules, and S the start symbol.

A string u directly yields, or directly derives to, a string v, denoted as u ⇒ v, if v can be obtained from u by an application of some production rule in P, that is, if u = w₁xw₂ and v = w₁yw₂, where x → y is a production rule in P, and w₁ and w₂ are the unaffected left and right parts of the string, respectively. More generally, u is said to yield, or derive to, v, denoted as u ⇒* v, if v can be obtained from u by repeated application of production rules, that is, if u = u₀ ⇒ u₁ ⇒ ... ⇒ uₙ = v for some n ≥ 0 and some strings u₁, ..., uₙ₋₁. In other words, the relation ⇒* is the reflexive transitive closure of the relation ⇒.

The language of the grammar G is the set of all terminal-symbol strings derivable from its start symbol, formally: L(G) = { w ∈ Σ* : S ⇒* w }. Derivations that do not end in a string composed of terminal symbols only are possible, but do not contribute to L(G).
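
For illustration only (this sketch is not part of the cited sources), the relations ⇒ and ⇒* can be computed directly for small grammars. In the following Python sketch, a grammar is encoded as a list of (left-hand side, right-hand side) string pairs, uppercase letters stand for nonterminals, and the function names are chosen purely for exposition:

from collections import deque

def directly_yields(u, rules):
    """Enumerate every string v with u => v, i.e. every result of applying
    one production rule lhs -> rhs at one position of u."""
    for lhs, rhs in rules:
        i = u.find(lhs)
        while i != -1:
            yield u[:i] + rhs + u[i + len(lhs):]
            i = u.find(lhs, i + 1)

def derives(u, v, rules, max_steps=20):
    """Return True if u =>* v using at most max_steps rule applications;
    =>* is the reflexive transitive closure of =>, so u =>* u always holds."""
    seen, frontier = {u}, deque([(u, 0)])
    while frontier:
        w, depth = frontier.popleft()
        if w == v:
            return True
        if depth < max_steps:
            for x in directly_yields(w, rules):
                if x not in seen:
                    seen.add(x)
                    frontier.append((x, depth + 1))
    return False

print(derives("S", "aabb", [("S", "aSb"), ("S", "ab")]))  # True: S => aSb => aabb

The final line checks S ⇒* aabb for a toy grammar with rules S → aSb and S → ab; the same helpers work unchanged for the context-sensitive examples given below.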

Context-sensitive grammar

A formal grammar G = (N, Σ, P, S) is context-sensitive if each rule in P is either of the form S → ε, where ε is the empty string, or of the form

αAβ → αγβ

with A ∈ N,[note 1] α, β ∈ (N ∪ Σ)*,[note 2] and γ ∈ (N ∪ Σ)+.[note 3]

The name context-sensitive is explained by the α and β that form the context of A and determine whether A can be replaced with γ or not. By contrast, in a context-free grammar, no context is present: the left-hand side of every production rule is just a nonterminal.

The string γ is not allowed to be empty. Without this restriction, the resulting grammars become equal in power to unrestricted grammars.[10]
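
As an illustration (not taken from the cited sources), whether a single rule u → v fits this scheme can be checked mechanically by trying every position of A in u. In the following Python sketch, symbols are single characters and the function name is chosen for exposition only:

def is_context_sensitive_rule(u, v, nonterminals, start="S"):
    """Check whether u -> v has the form alpha A beta -> alpha gamma beta with
    A a nonterminal and gamma nonempty, or is the permitted rule S -> epsilon."""
    if u == start and v == "":
        return True
    for i, symbol in enumerate(u):
        if symbol not in nonterminals:
            continue
        alpha, beta = u[:i], u[i + 1:]
        if v.startswith(alpha) and v.endswith(beta):
            gamma = v[len(alpha):len(v) - len(beta)]
            if gamma:
                return True
    return False

print(is_context_sensitive_rule("CB", "CZ", set("SBCZW")))  # True:  alpha = C, A = B, gamma = Z
print(is_context_sensitive_rule("CB", "BC", set("SBCZW")))  # False: no valid split exists

The second call shows that a length-preserving rule such as CB → BC is not itself of context-sensitive form, a point that matters for the first example grammar below.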

(Weakly) equivalent definitions

A noncontracting grammar is a grammar in which, for any production rule u → v, the length of u is less than or equal to the length of v.

Every context-sensitive grammar is noncontracting, while every noncontracting grammar can be converted into an equivalent context-sensitive grammar; the two classes are weakly equivalent.[12]

Some authors use the term context-sensitive grammar to refer to noncontracting grammars in general.

The left-context- and right-context-sensitive grammars are defined by restricting the rules to just the form αA → αγ and to just Aβ → γβ, respectively. The languages generated by these grammars are also the full class of context-sensitive languages.
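
The noncontracting condition itself is simple to check; a minimal Python sketch (illustrative only, not from the cited sources), using the same encoding of rules as string pairs as above:

def is_noncontracting(rules):
    """Check |u| <= |v| for every production rule u -> v."""
    return all(len(u) <= len(v) for u, v in rules)

print(is_noncontracting([("CB", "BC")]))  # True, although CB -> BC is not context-sensitive in form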

Examples

aⁿbⁿcⁿ

The following context-sensitive grammar, with start symbol S, generates the canonical non-context-free language { aⁿbⁿcⁿ | n ≥ 1 } :[citation needed]

1. S → a B C
2. S → a S B C
3. C B → C Z
4. C Z → W Z
5. W Z → W C
6. W C → B C
7. a B → a b
8. b B → b b
9. b C → b c
10. c C → c c

Rules 1 and 2 allow for blowing up S to aⁿBC(BC)ⁿ⁻¹; rules 3 to 6 allow for successively exchanging each CB to BC (four rules are needed for that since a rule CB → BC wouldn't fit into the scheme αAβ → αγβ); rules 7–10 allow replacing a non-terminal B or C with its corresponding terminal b or c, respectively, provided it is in the right place. A generation chain for aaabbbccc is:

S
⇒2 aSBC
⇒2 aaSBCBC
⇒1 aaaBCBCBC
⇒3 aaaBCZCBC
⇒4 aaaBWZCBC
⇒5 aaaBWCCBC
⇒6 aaaBBCCBC
⇒3 aaaBBCCZC
⇒4 aaaBBCWZC
⇒5 aaaBBCWCC
⇒6 aaaBBCBCC
⇒3 aaaBBCZCC
⇒4 aaaBBWZCC
⇒5 aaaBBWCCC
⇒6 aaaBBBCCC
⇒7 aaabBBCCC
⇒8 aaabbBCCC
⇒8 aaabbbCCC
⇒9 aaabbbcCC
⇒10 aaabbbccC
⇒10 aaabbbccc
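
This chain can be retraced mechanically. The following Python sketch is illustrative only (it is not part of the cited material); the rules are written without the spacing used in the table, the rule numbering follows the table, and applying each listed rule at the leftmost position where it matches reproduces exactly the chain above:

RULES = {
    1: ("S", "aBC"),   2: ("S", "aSBC"),
    3: ("CB", "CZ"),   4: ("CZ", "WZ"),
    5: ("WZ", "WC"),   6: ("WC", "BC"),
    7: ("aB", "ab"),   8: ("bB", "bb"),
    9: ("bC", "bc"),  10: ("cC", "cc"),
}

def apply_rule(w, number):
    """Apply rule `number` at the leftmost position where its left-hand side occurs."""
    lhs, rhs = RULES[number]
    i = w.index(lhs)               # raises ValueError if the rule is not applicable
    return w[:i] + rhs + w[i + len(lhs):]

w = "S"
for n in (2, 2, 1, 3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6, 7, 8, 8, 9, 10, 10):
    w = apply_rule(w, n)
print(w)                           # aaabbbccc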

aⁿbⁿcⁿdⁿ, etc.

More complicated grammars can be used to parse { aⁿbⁿcⁿdⁿ | n ≥ 1 }, and other languages with even more letters. Here we show a simpler approach using non-contracting grammars:[citation needed] start with a kernel of productions generating the sentential forms aⁿ(BCD)ⁿ, for instance S → a S B C D and S → a B C D, and then include the non-contracting productions C B → B C, D B → B D, D C → C D, which sort the nonterminals into the order BⁿCⁿDⁿ, together with a B → a b, b B → b b, b C → b c, c C → c c, c D → c d, d D → d d, which replace each nonterminal by its terminal only once it stands in the right place.
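
Because the grammar is non-contracting, no sentential form in a derivation is ever longer than the string it eventually derives, so the generated language can be enumerated up to any length bound by exhaustive search. The following Python sketch (an illustrative check of the productions above; the function name and the encoding of rules as string pairs are not from any cited source) confirms that the only generated strings of length at most 8 are abcd and aabbccdd:

def language_up_to(rules, start, terminals, max_len):
    """Return all terminal strings of length <= max_len generated by a
    non-contracting grammar, by exhaustive search over all sentential
    forms of length <= max_len (soundness relies on non-contraction)."""
    seen, frontier, result = {start}, [start], set()
    while frontier:
        w = frontier.pop()
        if all(ch in terminals for ch in w):
            result.add(w)
        for lhs, rhs in rules:
            i = w.find(lhs)
            while i != -1:
                x = w[:i] + rhs + w[i + len(lhs):]
                if len(x) <= max_len and x not in seen:
                    seen.add(x)
                    frontier.append(x)
                i = w.find(lhs, i + 1)
    return result

ABCD_RULES = [
    ("S", "aSBCD"), ("S", "aBCD"),             # kernel: a^n (BCD)^n
    ("CB", "BC"), ("DB", "BD"), ("DC", "CD"),  # sort the nonterminals
    ("aB", "ab"), ("bB", "bb"), ("bC", "bc"),  # replace nonterminals in place
    ("cC", "cc"), ("cD", "cd"), ("dD", "dd"),
]

print(sorted(language_up_to(ABCD_RULES, "S", set("abcd"), 8)))
# expected output: ['aabbccdd', 'abcd']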

aᵐbⁿcᵐdⁿ

A non contracting grammar (for which there is an equivalent CSG) for the language { aᵐbⁿcᵐdⁿ | m ≥ 1, n ≥ 1 } is defined by

S → a S C,
S → a T C,
T → B T D,
T → B D,
D C → C D,
a B → a b,
b B → b b,
b C → b c,
c C → c c,
c D → c d, and
d D → d d.

With these definitions, a derivation for abcd is: S ⇒ aTC ⇒ aBDC ⇒ aBCD ⇒ abCD ⇒ abcD ⇒ abcd.[citation needed]

a^(2^i)

A noncontracting grammar for the language { a^(2^i) | i ≥ 1 } is constructed in Example 9.5 (p. 224) of (Hopcroft, Ullman, 1979).[15]

Kuroda normal form

Every context-sensitive grammar which does not generate the empty string can be transformed into a weakly equivalent one in Kuroda normal form, in which every rule has one of the forms AB → CD, A → BC, A → B, or A → a (with A, B, C, D nonterminals and a a terminal). "Weakly equivalent" here means that the two grammars generate the same language. The normal form will not in general be context-sensitive, but will be a noncontracting grammar.[16][17]

The Kuroda normal form is an actual normal form for non-contracting grammars.

Properties and uses

Equivalence to linear bounded automaton

A formal language can be described by a context-sensitive grammar if and only if it is accepted by some linear bounded automaton (LBA).[18] In some textbooks this result is attributed solely to Landweber and Kuroda.[7] Others call it the Myhill–Landweber–Kuroda theorem.[19] (Myhill introduced the concept of deterministic LBA in 1960. Peter S. Landweber published in 1963 that the language accepted by a deterministic LBA is context sensitive.[20] Kuroda introduced the notion of non-deterministic LBA and the equivalence between LBA and CSGs in 1964.[21][22])

As of 2010[needs update] it is still an open question whether every context-sensitive language can be accepted by a deterministic LBA.[23]

Closure properties

Context-sensitive languages are closed under complement. This 1988 result is known as the Immerman–Szelepcsényi theorem.[19] Moreover, they are closed under union, intersection, concatenation, substitution,[note 4] inverse homomorphism, and Kleene plus.[24]

Every recursively enumerable language L can be written as h(L′) for some context-sensitive language L′ and some string homomorphism h.

Computational problems

The decision problem that asks whether a certain string s belongs to the language of a given context-sensitive grammar G is PSPACE-complete. Moreover, there are context-sensitive grammars whose languages are PSPACE-complete. In other words, there is a context-sensitive grammar G such that deciding whether a certain string s belongs to the language of G is PSPACE-complete (so G is fixed and only s is part of the input of the problem).[26]
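
Membership is nevertheless decidable, essentially because an equivalent noncontracting grammar can be used, and a noncontracting derivation of s only ever passes through sentential forms of length at most |s|, of which there are finitely many. The following brute-force Python sketch is illustrative only (it uses exponential time and space, whereas the theoretical bound is polynomial space), and the rule encoding matches the earlier sketches:

def member(s, rules, start="S"):
    """Decide s in L(G) for a noncontracting grammar G = (rules, start) by
    searching all sentential forms of length <= len(s)."""
    bound = max(len(s), len(start))
    seen, frontier = {start}, [start]
    while frontier:
        w = frontier.pop()
        if w == s:
            return True
        for lhs, rhs in rules:
            i = w.find(lhs)
            while i != -1:
                x = w[:i] + rhs + w[i + len(lhs):]
                if len(x) <= bound and x not in seen:
                    seen.add(x)
                    frontier.append(x)
                i = w.find(lhs, i + 1)
    return False

# the a^n b^n c^n grammar from the examples section, as (lhs, rhs) pairs
ABC_RULES = [("S", "aBC"), ("S", "aSBC"), ("CB", "CZ"), ("CZ", "WZ"),
             ("WZ", "WC"), ("WC", "BC"), ("aB", "ab"), ("bB", "bb"),
             ("bC", "bc"), ("cC", "cc")]

print(member("aabbcc", ABC_RULES), member("aabbc", ABC_RULES))  # True False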

The emptiness problem (does a given context-sensitive grammar G generate any string at all?) is undecidable.[27][note 5]

As model of natural languages

Savitch has proven the following theoretical result, on which he bases his criticism of CSGs as a basis for natural language: for any recursively enumerable set R, there exists a context-sensitive language/grammar G which can be used as a sort of proxy to test membership in R in the following way: given a string s, s is in R if and only if there exists a positive integer n for which scⁿ is in G, where c is an arbitrary symbol not part of R.[10]

It has been shown that nearly all natural languages may in general be characterized by context-sensitive grammars, but the whole class of CSGs seems to be much bigger than natural languages. Worse yet, since the aforementioned decision problem for CSGs is PSPACE-complete, that makes them unworkable for practical use, as a polynomial-time algorithm for a PSPACE-complete problem would imply P = NP.

It was proven that some natural languages are not context-free, based on identifying so-called cross-serial dependencies. However, this does not imply that the full power of CSGs is needed to capture them: linear context-free rewriting systems (LCFRSs) are strictly weaker than CSGs but can account for the phenomenon of cross-serial dependencies; one can, for example, write an LCFRS grammar for { aⁿbⁿcⁿdⁿ | n ≥ 1 }.[28][29][30]

Ongoing research on computational linguistics has focused on formulating other classes of languages that are "mildly context-sensitive" and whose decision problems are feasible, such as tree-adjoining grammars, combinatory categorial grammars, coupled context-free grammars, and linear context-free rewriting systems. The languages generated by these formalisms properly lie between the context-free and context-sensitive languages.

More recently, the class PTIME has been identified with range concatenation grammars, which are now considered to be the most expressive of the mildly context-sensitive language classes.[30]

See also

Notes

  1. ^ i.e., A is a single nonterminal
  2. ^ i.e., α and β are (possibly empty) strings of nonterminals and terminals
  3. ^ i.e., γ is a nonempty string of nonterminals (except for the start symbol) and terminals
  4. ^ more formally: if L ⊆ Σ* is a context-sensitive language and f maps each a∈Σ to a context-sensitive language f(a), then f(L) is again a context-sensitive language
  5. ^ This also follows from (1) context-free languages being also context-sensitive, (2) context-sensitive languages being closed under intersection, and (3) disjointness of context-free languages being undecidable.

References

  1. ^ (Hopcroft, Ullman, 1979); Sect.9.4, p.227
  2. .
  3. .
  4. .
  5. .
  6. .
  7. ^ .
  8. ^ Chomsky, N. (1963). "Formal properties of grammar". In Luce, R. D.; Bush, R. R.; Galanter, E. (eds.). Handbook of Mathematical Psychology. New York: Wiley. pp. 360–363.
  9. .
  10. ^ .
  11. ^ Zhang, Da-Qian, Kang Zhang, and Jiannong Cao. "A context-sensitive graph grammar formalism for the specification of visual languages." The Computer Journal 44.3 (2001): 186–200.
  12. .; p. 223–224; Exercise 9, p. 230. In the 2003 edition, the chapter on CSGs has been omitted.
  13. .
  14. ^ They obtained the grammar by systematic transformation of an unrestricted grammar, given in Exm. 9.4, viz.:
    1. S → ACaB,
    2. Ca → aaC,
    3. CB → DB,
    4. CB → E,
    5. aD → Da,
    6. AD → AC,
    7. aE → Ea,
    8. AE → ε.
    In the context-sensitive grammar, a string enclosed in square brackets, like [Ca], is considered a single symbol (similar to e.g. <name-part> in Backus–Naur form). The symbol names are chosen to resemble the unrestricted grammar. Likewise, rule groups in the context-sensitive grammar are numbered by the unrestricted-grammar rule they originated from.
  15. .
  16. ., Here: Theorem 2.2, p. 190
  17. ^ (Hopcroft, Ullman, 1979); Theorem 9.5, 9.6, p. 225–226
  18. ^ a b Sutner, Klaus (Spring 2016). "Context Sensitive Grammars" (PDF). Carnegie Mellon University. Archived from the original (PDF) on 2017-02-03. Retrieved 2019-08-29.
  19. .
  20. .
  21. .
  22. .
  23. ^ (Hopcroft, Ullman, 1979); Exercise S9.10, p. 230–231
  24. ^ (Hopcroft, Ullman, 1979); Exercise S9.14, p. 230–232. h maps each symbol to itself or to the empty string.
  25. S2CID 18067130
    .
  26. ^ (Hopcroft, Ullman, 1979); Exercise S9.13, p. 230–231
  27. ^ Kallmeyer, Laura (2011). "Mildly Context-Sensitive Grammar Formalisms: Natural Languages are not Context-Free" (PDF). Archived (PDF) from the original on 2014-08-19.
  28. ^ Kallmeyer, Laura (2011). "Mildly Context-Sensitive Grammar Formalisms: Linear Context-Free Rewriting Systems" (PDF). Archived (PDF) from the original on 2014-08-19.
  29. ^ .

Further reading

External links