Link grammar
Link grammar (LG) is a theory of syntax by Davy Temperley and Daniel Sleator which builds relations between pairs of words, rather than constructing constituents in a phrase structure hierarchy. Link grammar is similar to dependency grammar, but dependency grammar includes a head-dependent relationship, whereas link grammar makes the head-dependent relationship optional (links need not indicate direction).[1] Colored Multiplanar Link Grammar (CMLG) is an extension of LG allowing crossing relations between pairs of words.[2] The relationship between words is indicated with link types, thus making the Link grammar closely related to certain categorial grammars.
For example, in a
In a
Overview
Link grammar connects the words in a sentence with links, similar in form to a
Link grammar also differs from traditional dependency grammars by allowing cyclic relations between words. Thus, for example, there can be links indicating both the head verb of a sentence, the head subject of the sentence, as well as a link between the subject and the verb. These three links thus form a cycle (a triangle, in this case). Cycles are useful in constraining what might otherwise be ambiguous parses; cycles help "tighten up" the set of allowable parses of a sentence.
For example, in the parse
+---->WV--->+ +--Wd--+-Ss-+--Pa--+ | | | | LEFT-WALL he runs fast
the LEFT-WALL indicates the start of the sentence, or the root node. The directional WV link (with arrows) points at the head verb of the sentence; it is the Wall-Verb link.
Parsing algorithm
Parsing is performed in analogy to assembling a
A given word may have dozens or even hundreds of allowed puzzle-shapes (termed "disjuncts"): for example, many verbs may be optionally transitive, thus making the O+ connector optional; such verbs might also take adverbial modifiers (E connectors) which are inherently optional. More complex verbs may have additional connectors for indirect objects, or for
Dependency
Connectors may also include head-dependent indicators h and d. In this case, a connector containing a head indicator is only allowed to connect to a connector containing the dependent indicator (or to a connector without any h-d indicators on it). When these indicators are used, the link is decorated with arrows to indicate the link direction.[9]
A recent extension simplifies the specification of connectors for languages that have little or no restrictions on word-order, such as Lithuanian. There are also extensions to make it easier to support languages with concatenative morphologies.
Planarity
The parsing algorithm also requires that the final graph is a planar graph, i.e. that no links cross.[9] This constraint is based on empirical psycho-linguistic evidence that, indeed, for most languages, in nearly all situations, dependency links really do not cross.[11][12] There are rare exceptions, e.g. in Finnish, and even in English; they can be parsed by link-grammar only by introducing more complex and selective connector types to capture these situations.
Costs and selection
Connectors can have an optional
The assignment of a log-likelihood to linkages allows link grammar to implement the semantic selection of predicate-argument relationships. That is, certain constructions, although syntactically valid, are extremely unlikely. In this way, link grammar embodies some of the ideas present in operator grammar.
Because the costs are additive, they behave like the logarithm of the probability (since log-likelihoods are additive), or equivalently, somewhat like the
Type theory
The link grammar link types can be understood to be the types in the sense of
whereas the corresponding disjuncts in link grammar would be
the: D+; bad: A+; boy: D- & A-;
The contraction rules (inference rules) of the Lambek calculus can be mapped to the connecting of connectors in link grammar. The + and - directional indicators correspond the forward and backward-slashes of the categorical grammar. Finally, the single-letter names A and D can be understood as labels or "easy-to-read" mnemonic names for the rather more verbose types NP/N, etc.
The primary distinction here is then that the categorical grammars have two type constructors, the forward and backward slashes, that can be used to create new types (such as NP/N) from base types (such as NP and N). Link-grammar omits the use of type constructors, opting instead to define a much larger set of base types having compact, easy-to-remember mnemonics.
Examples
Example 1
A basic rule file for an SVO language might look like:
<determiner> D+; <noun-subject> {D−} & S+; <noun-object> {D−} & O−; <verb> S− & {O+};
Thus the English sentence, "The boy painted a picture" would appear as:
+-----O-----+ +-D-+--S--+ +--D--+ | | | | | The boy painted a picture
Similar parses apply for Chinese.[20]
Example 2
Conversely, a rule file for a
<noun-subject> S+; <noun-object> O+; <verb> {O−} & {S−};
And a simple Persian sentence, man nAn xordam (من نان خوردم) 'I ate bread' would look like:[21][22][23]
+-----S-----+ | +--O--+ | | | man nAn xordam
VSO order can be likewise accommodated, such as for Arabic.[24]
Example 3 (morphology)
In many languages with a concatenative morphology, the stem plays no grammatical role; the grammar is determined by the suffixes. Thus, in Russian, the sentence 'вверху плыли редкие облачка' might have the parse:[25][26]
+------------Wd-----------+---------------SIp---------------+ | +-------EI------+ +--------Api-------+ | | +--LLCZD-+ +-LLAQZ+ +--LLCAO-+ | | | | | | | | LEFT-WALL вверху.e плы.= =ли.vnndpp ре.= =дкие.api облачк.= =а.ndnpi
The subscripts, such as '.vnndpp', are used to indicate the grammatical category. The primary links: Wd, EI, SIp and Api connect together the suffixes, as, in principle, other stems could appear here, without altering the structure of the sentence. The Api link indicates the adjective; SIp denotes subject-verb inversion; EI is a modifier. The Wd link is used to indicate the head noun; the head verb is not indicated in this sentence. The LLXXX links serve only to attach stems to suffixes.
Example 4 (phonology)
The link-grammar can also indicate phonological agreement between neighboring words. For example:
+---------Ost--------+ +------>WV------>+ +------Ds**x-----+ +----Wd---+-Ss*b-+ +--PHv-+----A----+ | | | | | | LEFT-WALL that.j-p is.v an abstract.a concept.n
Here, the connector 'PH' is used to constrain the determiners that can appear before the word 'abstract'. It effectively blocks (makes it costly) to use the determiner 'a' in this sentence, while the link to 'an' becomes cheap. The other links are roughly as in previous examples: S denoting subject, O denoting object, D denoting determiner. The 'WV' link indicates the head verb, and the 'W' link indicates the head noun. The lower-case letters following the upper-case link types serve to refine the type; so for example, Ds can only connect to a singular noun; Ss only to a singular subject, Os to a singular object. The lower-case v in PHv denotes 'vowel'; the lower-case d in Wd denotes a declarative sentence.
Example 5 (Vietnamese)
The Vietnamese language sentence "Bữa tiệc hôm qua là một thành công lớn" - "The party yesterday was a great success" may be parsed as follows:[27]
Implementations
Developer(s) | OpenCog |
---|---|
Initial release | October 1991[1] |
Stable release | 5.8.1
/ January 8, 2021[28] |
Repository | |
Written in | LGPLv2 |
Website | www |
The link grammar syntax
A current major undertaking is a project to learn the grammar and morphology of new languages, using unsupervised learning algorithms.[33][34]
The link-parser program along with rules and word lists for English may be found in standard Linux distributions, e.g., as a Debian package, although many of these are years out of date.[35]
Applications
AbiWord,[29] a free word processor, uses link grammar for on-the-fly grammar checking. Words that cannot be linked anywhere are underlined in green.
The semantic relationship extractor RelEx,
Link grammar has also been employed for information extraction of biomedical texts[39][40] and events described in news articles,[41] as well as experimental machine translation systems from English to German, Turkish, Indonesian.[42] and Persian.[43][44]
The link grammar link dictionary is used to generate and verify the syntactic correctness of three different natural language generation systems: NLGen,[45] NLGen2[46] and microplanner/surreal.[47] It is also used as a part of the NLP pipeline in the OpenCog AI project.
Notes
- ^ a b Daniel Sleator (September 8, 2004). "Link Grammar Bibliography". cmu.edu. Retrieved 2023-08-28.
- ^ Anssi Yli-Jyrä & Matti Nykänen (2004). "A Hierarchy of Mildly Context-Sensitive Dependency Grammars" (PDF). In G. P. Gerhard Jäger, Paola Monachesi and S. Wintner (ed.). Proceedings of the 9th conference on Formal Grammar 2004 "FGNancy". Pre-Proceedings. pp. 151–165.
- ^ Özlem İstek (2006). A Link Grammar for Turkish (PDF) (Master's thesis). Ankara, Turkey: Bilkent University. Retrieved 2023-08-23.
- ^ WV Link type
- ^ W link type
- ^ S link type
- ^ P link type
- arXiv:cmp-lg/9508004.
- ^ a b c d e An Introduction to the Link Grammar Parser
- ^ Dennis Grinberg; John Lafferty; Daniel Sleator (1995). A Robust Parsing Algorithm for Link Grammar (PDF). Proceedings of the Fourth International Workshop on Parsing Technologies, Prague. Retrieved 2023-08-28.
- ^ J. Havelka (2007). Beyond projectivity: multilingual evaluation of constraints and measures on non-projective structures. Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics. Prague, Czech Republic: Association for Computational Linguistics. pp. 608–615.
- ^ R. Ferrer i Cancho (2006). "Why do syntactic links not cross?". EPL. 76 (6): 1228–1234.
- ^ John Lafferty; Daniel Sleator; Davey Temperley (1992). Grammatical Trigrams: a Probabilistic Model of Link Grammar (PDF). Proceedings of the AAAI Conference on Probabilistic Approaches to Natural Language.
- arXiv:1304.4086.
- ^ D. Temperley (2008). "Dependency length minimization in natural and artificial languages". Journal of Quantitative Linguistics. 15 (3): 256–282.
- ^ E. Gibson (2000). "The dependency locality theory: A distance-based theory of linguistic complexity". In Marantz, A.; Miyashita, Y.; O'Neil, W. (eds.). Image, Language, Brain: Papers from the first Mind Articulation Project Symposium. Cambridge, MA: MIT Press.
- ^ Haitao Liu (2008). "Dependency distance as a metric of language comprehension difficulty" (PDF). Journal of Cognitive Science. 9 (2): 159–191.
- PMC 4547262.
- ^ Daniel Sleator; Davey Temperley (1993). Parsing English with a Link Grammar (PDF). Third International Workshop on Parsing Technologies. (See section 6 on categorial grammar).
- ^ Carol Liu (2001). "Towards A Link Grammar for Chinese". Computer Processing of Chinese and Oriental Languages. Chinese Language Computer Society.
- ^ John Dehdari; Deryle Lonsdale (2005). "A Link Grammar for Persian" (PDF). Ohio-state.edu. Archived from the original (PDF) on 2008-12-03.
- ^ Armin Sajadi; A. Abdollahzadeh (2006). "Farsi Syntactic Analysis using Link Grammar" (PDF). Letter of Research Center of Intelligent Signal Processing (in Persian). 1 (9): 25–37. Archived from the original (PDF) on 2014-04-01.
- ^ A. Sajadi; M. Homayounpour (2006). "Representation of Farsi Morphological Knowledge using Link Grammar". Letter of Research Center of Intelligent Signal Processing (in Persian). 1 (9): 41–55.
- ^ Warren Casbeer; Jon Dehdari; Deryle Lonsdale (March 2006). A Link Grammar parser for Arabic (PDF). Perspectives on Arabic Linguistics: Papers from the annual symposium on Arabic linguistics. Vol. XX. Kalamazoo, Michigan. Archived from the original (PDF) on 2014-05-12.
- ^ Документация по связям и по классам слов доступна.
- ^ Грамматика связей (Link Grammar)
- ^ Nguyễn Thị Thu Hương, Nguyễn Thúc Hải, Nguyễn Thanh Thủy "Parsing complex - compound sentences with an extension of Vietnamese link parser combined with discourse segmenter" Journal of Computer Science and Cybernetics, Vol 28, No 4 (2012)
- ^ www
.abisource .com /downloads /link-grammar / - ^ a b AbiWord — Link Grammar Parser
- ^ Lingua-LinkParser (Perl interfaces)
- ^ "Ruby Link Parser interfaces". Archived from the original on 2016-03-04. Retrieved 2019-02-01.
- ^ javaScript node.js library
- ^ OpenCog Language Learning
- ^ Learning Language from a Large (Unannotated) Corpus
- ^ Debian - Package Search Results - link-grammar
- ^ "RelEx Dependency Relationship Extractor". Archived from the original on 2009-07-28. Retrieved 2013-11-21.
- ^ The Stanford Parser: A statistical parser
- ^ The Penn Treebank Project Archived 2013-11-09 at the Wayback Machine
- ISBN 0-7695-2038-3. Archived from the original(PDF) on 2011-03-31. Retrieved 2023-08-27.
- ^ Sampo Pyysalo, Tapio Salakoski, Sophie Aubin and Adeline Nazarenko, "Lexical Adaptation of Link Grammar to the Biomedical Sublanguage: a Comparative Evaluation of Three Approaches", BMC Bioinformatics 7(Suppl 3):S2 (2006).
- .
- .
- ^ A.Sajadi and M.R Borujerdi, "Machine Translation Using Link Grammar", Submitted to the Journal of Computational Linguistics, MIT Press (Feb 2009)
- ^ Sajadi, A., Borujerdi, M. "Machine Translation Based on Unification Link Grammar" Journal of Artificial Intelligence Review. DOI=10.1007/s10462-011-9261-7, Pages 109-132, 2013.
- ^ Ruiting Lian, et al, "Sentence generation for artificial brains: a glocal similarity matching approach", Neurocomputing (Elsevier) (2009, submitted for publication).
- ^ Blake Lemoine, NLGen2: A Linguistically Plausible, General Purpose Natural Language Generation System (2009)
- ^ Microplanner and Surface Realization (SuReal)
External links
- The original Link Grammar homepage (which has been replaced by the current project.)
- Online English demonstration (for an older, out-of-date version; many bugs have been fixed since this version.)
- BioLG, a modification of the Link Grammar Parser adapted for the biomedical domain (many, but not all, BioLG enhancements have been folded back into the main link-grammar distribution).
- Parsing sentences with Link Grammar and Python by Jeff Elmore at PyCon 2012