Speech production
Speech production is the process by which thoughts are translated into speech. This includes the selection of words, the organization of relevant grammatical forms, and the articulation of the resulting sounds by the motor system using the vocal apparatus.
In ordinary fluent conversation, speakers select, assemble, and articulate words rapidly and with relatively few errors.
Normally, speech is created with pulmonary pressure provided by the lungs, which generates sound by phonation through the glottis in the larynx; that sound is then modified by the vocal tract into different vowels and consonants. However, speech production can occur without the use of the lungs and glottis, in alaryngeal speech, which uses the upper parts of the vocal tract. An example of such alaryngeal speech is "Donald Duck talk".[5]
The vocal production of speech may be associated with the production of hand gestures that act to enhance the comprehensibility of what is being said.
The development of speech production throughout an individual's life starts from an infant's first babble and is transformed into fully developed speech by the age of five.
Three stages
The production of spoken language involves three major levels of processing: conceptualization, formulation, and articulation.[1][8][9]
The first stage is conceptualization, in which the speaker determines the intended message: the ideas and intentions to be expressed are linked to the concepts that the eventual utterance will convey.
The second stage is formulation, in which the linguistic form required for the expression of the desired message is created. Formulation includes grammatical encoding, morpho-phonological encoding, and phonetic encoding.[10] Grammatical encoding is the process of selecting the appropriate syntactic word, or lemma. The selected lemma then activates the appropriate syntactic frame for the conceptualized message. Morpho-phonological encoding is the process of breaking words down into the syllables to be produced in overt speech. Syllabification depends on the preceding and following words, for instance: I-com-pre-hend vs. I-com-pre-hen-dit.[10] The final part of the formulation stage is phonetic encoding, which involves the activation of articulatory gestures for the syllables selected during morpho-phonological encoding, producing an articulatory score that specifies the sequence of vocal movements for the utterance.[10]
The third stage of speech production is articulation, which is the execution of the articulatory score by the lungs, glottis, larynx, tongue, lips, jaw, and the other parts of the vocal apparatus, resulting in audible speech.
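As an illustration only, the three levels above can be caricatured as a toy pipeline. The stage names follow the text, but the data structures and functions here are invented for this sketch and are not part of any published model:

```python
def conceptualize(intention):
    """Conceptualization: settle on the preverbal message to express."""
    return {"concept": intention}

def formulate(message):
    """Formulation: grammatical, morpho-phonological, and phonetic encoding."""
    lemma = message["concept"]                 # grammatical encoding (toy)
    syllables = lemma.split("-")               # morpho-phonological encoding (toy)
    return {"articulatory_score": syllables}   # phonetic encoding (toy)

def articulate(plan):
    """Articulation: execute the articulatory score (here, just join syllables)."""
    return "".join(plan["articulatory_score"])

def produce(intention):
    return articulate(formulate(conceptualize(intention)))

print(produce("com-pre-hend"))  # comprehend
```

The point of the sketch is the strict ordering: each stage consumes only the output of the previous one, which is how the staged models below are usually drawn.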
Neuroscience
The motor control of speech production depends largely on cortical areas of the left cerebral hemisphere, including regions classically associated with language such as Broca's area.
Disorders
Speech production can be affected by several disorders:
- Aphasia
- Apraxia of speech
- Dysarthria
- Stuttering
History of speech production research
Until the late 1960s, research on speech was focused on comprehension. As researchers collected greater volumes of speech error data, they began to investigate the psychological processes responsible for the production of speech sounds and to contemplate possible processes for fluent speech.[14] Findings from speech error research were soon incorporated into speech production models. Evidence from speech error data supports the following conclusions about speech production:
- Speech is planned in advance.[15]
- The lexicon is organized both semantically and phonologically.[15] That is, it is organized both by meaning and by the sound of the words.
- Morphologically complex words are assembled.[15] Words that contain multiple morphemes are put together during the speech production process. Morphemes are the smallest units of language that carry meaning; for example, the "-ed" suffix marking the past tense.
- Affixes and functors behave differently from context words in slips of the tongue.[15] This means the rules about the ways in which a word can be used are likely stored with them, which means generally when speech errors are made, the mistake words maintain their functions and make grammatical sense.
- Speech errors reflect rule knowledge.[15] Even in our mistakes, speech is not nonsensical. The words and sentences that are produced in speech errors are typically grammatical, and do not violate the rules of the language being spoken.
Aspects of speech production models
Models of speech production must contain specific elements to be viable. These include the elements from which speech is composed, listed below. The accepted models of speech production discussed in more detail below all incorporate these stages either explicitly or implicitly, and the ones that are now outdated or disputed have been criticized for overlooking one or more of the following stages.[16]
The attributes of accepted speech models are:
a) a conceptual stage where the speaker abstractly identifies what they wish to express.[16]
b) a syntactic stage where a frame is chosen into which words will be placed; this frame is usually a sentence structure.[16]
c) a lexical stage where a search for a word occurs based on meaning. Once the word is selected and retrieved, information about it becomes available to the speaker involving phonology and morphology.[16]
d) a phonological stage where the abstract information is converted into a speech-like form.[16]
e) a phonetic stage where the phonological information is translated into the articulatory movements needed for speech.[16]
Also, models must allow for forward planning mechanisms, a buffer, and a monitoring mechanism.
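A minimal sketch of this checklist, with all names and behaviors invented for illustration: the five stages become placeholder transformations, a small lookahead buffer stands in for forward planning, and a monitoring check vetoes malformed output.

```python
from collections import deque

# The five stages from the checklist, as placeholder (identity) transformations.
def conceptual(x):   return x          # a) identify what to express
def syntactic(x):    return x          # b) choose a sentence frame
def lexical(x):      return x          # c) retrieve the word
def phonological(x): return x          # d) convert to a speech-like form
def phonetic(x):     return x          # e) translate to articulatory movements

STAGES = [conceptual, syntactic, lexical, phonological, phonetic]

def monitor(form):
    """Monitoring mechanism: veto malformed output (toy check)."""
    return form != ""

def plan_utterance(words, buffer_size=3):
    """Forward planning: hold a few upcoming words in a buffer before output."""
    buffer = deque(maxlen=buffer_size)
    output = []
    for word in words:
        for stage in STAGES:           # each word passes through all stages
            word = stage(word)
        buffer.append(word)            # forward-planning window
        if len(buffer) == buffer_size:
            form = buffer.popleft()
            if monitor(form):
                output.append(form)
    output.extend(w for w in buffer if monitor(w))
    return output

print(plan_utterance(["the", "cat", "", "sat"]))  # ['the', 'cat', 'sat']
```

The buffer and the monitor are the two mechanisms the text says every viable model must allow for; real models differ mainly in what the stage transformations actually compute.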
Following are a few of the influential models of speech production that account for or incorporate the previously mentioned stages and include information discovered as a result of speech error studies and other disfluency data,[17] such as tip-of-the-tongue research.
Models
The Utterance Generator Model (1971)
The Utterance Generator Model was proposed by Fromkin (1971).[18] It is composed of six stages and was an attempt to account for the previous findings of speech error research. The stages of the model were based on possible changes in representations of a particular utterance. The first stage is where a person generates the meaning they wish to convey. The second stage involves the message being mapped onto a syntactic structure; here, the message is given an outline.[19] The third stage is where the message gains different stresses and intonations based on the meaning. The fourth stage is concerned with the selection of words from the lexicon. After the words have been selected in stage four, the message undergoes phonological specification.[20] The fifth stage applies rules of pronunciation and produces the syllables that are to be output. The sixth and final stage is the coordination of the motor commands necessary for speech: phonetic features of the message are sent to the relevant muscles of the vocal tract so that the intended message can be produced. Despite the ingenuity of Fromkin's model, researchers have criticized this interpretation of speech production; although the Utterance Generator Model accounts for many nuances found by speech error studies, researchers concluded it still had room for improvement.[21][22]
The Garrett model (1975)
A more recent (than Fromkin's) attempt to explain speech production was published by Garrett in 1975.[23] Garrett also created this model by compiling speech error data. There are many overlaps between this model and the Fromkin model on which it was based, but Garrett added a few things that filled some of the gaps pointed out by other researchers. The Garrett and Fromkin models both distinguish between three levels: a conceptual level, a sentence level, and a motor level. These three levels are common to contemporary understanding of speech production.[24]
Dell's model (1994)
In 1994, Dell proposed a model of the lexical network in which lexical access proceeds by activation spreading through interconnected layers of nodes: a conceptual layer, a word (lemma) layer, and a sound layer. An activated node passes some of its activation to its neighbors, and the most highly activated nodes at each layer are selected for production.
Levelt model (1999)
Levelt further refined the lexical network proposed by Dell. Through the use of speech error data, Levelt recreated the three levels in Dell's model. The conceptual stratum, the top and most abstract level, contains information a person has about ideas of particular concepts. Beneath it, the lemma stratum holds the syntactic properties of words. The lowest and final level is the form stratum which, similarly to the Dell model, contains syllabic information. From here, the information stored at the form stratum is sent to the motor cortex, where the vocal apparatus is coordinated to physically produce speech sounds.
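The layered networks described above lend themselves to a spreading-activation caricature. In this toy sketch (the network contents, weights, and decay value are invented, not taken from Dell or Levelt), activation flows from a concept node through lemma nodes to sound nodes, and the most active lemma is selected:

```python
# Weighted edges of a tiny lexical network: concept -> lemmas -> phonemes.
edges = {
    "CAT(concept)": {"cat": 1.0, "dog": 0.3},        # related lemma gets weaker link
    "cat": {"/k/": 1.0, "/ae/": 1.0, "/t/": 1.0},
    "dog": {"/d/": 1.0, "/o/": 1.0, "/g/": 1.0},
}

def spread(source, decay=0.5):
    """Propagate activation two steps outward from a source node."""
    act = {source: 1.0}
    for _ in range(2):                 # two layers: lemmas, then phonemes
        new = dict(act)
        for node, a in act.items():
            for target, weight in edges.get(node, {}).items():
                new[target] = new.get(target, 0.0) + a * weight * decay
        act = new
    return act

act = spread("CAT(concept)")
best_lemma = max(["cat", "dog"], key=act.get)
print(best_lemma)  # cat
```

Speech errors fall out of such models naturally: a semantically related competitor ("dog") also receives activation, so with noise it can occasionally win selection, which is one way these models account for substitution errors.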
Places of articulation
The physical structure of the human nose, throat, and vocal cords allows for the production of many unique sounds; these areas can be further broken down into places of articulation.
Articulation
Articulation, often associated with speech production, is how people physically produce speech sounds. For people who speak fluently, articulation is automatic and allows 15 speech sounds to be produced per second.[30]
Effective articulation of speech involves the following elements: fluency, complexity, accuracy, and comprehensibility.[31]
- Fluency: The ability to communicate an intended message, or to affect the listener in the way intended by the speaker. While accurate use of language is a component of this ability, over-attention to accuracy may actually inhibit the development of fluency. Fluency involves constructing coherent utterances and stretches of speech, responding and speaking without undue hesitation (limited use of fillers such as uh, er, eh, like, you know). It also involves the ability to use strategies such as simplification and gestures to aid communication, and the use of relevant information, appropriate vocabulary, and syntax.
- Complexity: The ability to communicate a message precisely, to adjust the message or negotiate control of the conversation according to the listener's responses, and to use subordination and clausal forms appropriate to the roles of and relationship between the speakers. It includes sociolinguistic knowledge: the skills required to communicate effectively across cultures, including norms and the knowledge of what is appropriate to say in which situations and to whom.
- Accuracy: The use of proper and advanced grammar – subject-verb agreement, word order, and word form (excited/exciting) – as well as appropriate word choice in spoken language. It also includes the ability to self-correct during discourse, clarifying or modifying spoken language for grammatical accuracy.
- Comprehensibility: The ability to be understood by others; it is related to the sound of the language. Three components influence one's comprehensibility: pronunciation – saying the sounds of words correctly; intonation – applying proper stress on words and syllables, using rising and falling pitch to indicate questions or statements, using the voice to indicate emotion or emphasis, and speaking with an appropriate rhythm; and enunciation – speaking clearly at an appropriate pace, with effective articulation of words and phrases and appropriate volume.
Development
Before even producing a sound, infants imitate facial expressions and movements.[32] Around 7 months of age, infants start to experiment with communicative sounds by trying to coordinate producing sound with opening and closing their mouths.
Until the first year of life, infants cannot produce coherent words; instead they produce a recurring babble.
The first stage of meaningful speech does not occur until around the age of one. This stage is the holophrastic phase,[33] in which infant speech consists of one word at a time (e.g., "papa").
The next stage is the telegraphic phase. In this stage infants can form short sentences (e.g., "Daddy sit" or "Mommy drink"). This typically occurs between the ages of one and a half and two and a half years old. This stage is particularly noteworthy because of the explosive growth of the child's vocabulary.
When they reach two and a half years, their speech production becomes increasingly complex, particularly in its semantic structure. With a more detailed semantic network, the infant learns to express a wider range of meanings, developing a complex conceptual system of lemmas.
Around the age of four or five, the child's lemmas are widely diverse; this helps them select the right lemma needed to produce correct speech.
See also
- FOXP2
- KE family
- Neurocomputational speech processing
- Psycholinguistics
- Silent speech interface
- Speech perception
- Speech science
References
- ^ S2CID 7939521.
- S2CID 144105729. Archived from the original (PDF) on 2011-10-08. Retrieved 2009-12-25.
- S2CID 9567809.
- PMID 11296722.[permanent dead link]
- ISSN 0001-4966.
- ISBN 978-0-226-51463-5.
- ^ a b c d e Harley, T.A. (2011), Psycholinguistics. (Volume 1). SAGE Publications.
- ^ ISBN 978-0-262-62089-5.
- S2CID 26291682.
- ^ a b c d e Levelt, W. (1999). "The neurocognition of language", p.87 -117. Oxford Press
- S2CID 12662702.
- PMID 17189619.
- ^ S2CID 23476697.
- ISBN 978-0155041066.
- ^ ISBN 978-0155041066.
- ^ ISBN 978-0415258906.
- ISBN 978-0155041066.
- ISBN 978-0155041066.
- ISBN 978-0155041066.
- ^ Fromkin, Victoria (1998). Utterance Generator Model of Speech Production in Psycho-linguistics (2 ed.). Harcourt. p. 330.
- ISBN 978-0155041066.
- ^ Butterworth (1982). Psycho-linguistics. Harcourt College. p. 331.
- ISBN 978-0155041066.
- ^ Garrett; Fromkin, V.A.; Ratner, N.B (1998). The Garrett Model in Psycho-linguistics. Harcourt College. p. 331.
- ^ "Psycholinguistics/Models of Speech Production - Wikiversity". en.wikiversity.org. Retrieved 2015-11-16.
- ^ S2CID 7939521.
- ISBN 978-1-4443-3526-2.
- ^ ISBN 9781608762132.
- ISBN 978-0415258906.
- ISBN 978-1-317-43299-9.
- ^ a b Redford, M. A. (2015). The handbook of speech production. Chichester, West Sussex; Malden, MA : John Wiley & Sons, Ltd, 2015.
- ^ a b Shaffer, D., Wood, E., & Willoughby, T. (2005). Developmental Psychology Childhood and Adolescence. (2nd Canadian Ed). Nelson.
- ^ Wolf, M. (2005). Proust and the squid:The story and science of the reading brain, New York, NY. Harper
Further reading
- Gow DW (June 2012). "The cortical organization of lexical knowledge: a dual lexicon model of spoken language processing". Brain Lang. 121 (3): 273–88. PMID 22498237.
- Hickok G (2012). "The cortical organization of speech processing: feedback control and predictive coding the context of a dual-stream model". J Commun Disord. 45 (6): 393–402. PMID 22766458.
- Hickok G, Houde J, Rong F (February 2011). "Sensorimotor integration in speech processing: computational basis and neural organization". Neuron. 69 (3): 407–22. PMID 21315253.
- Hickok G, S2CID 635860.
- PMID 23055482.
- Price CJ (August 2012). "A review and synthesis of the first 20 years of PET and fMRI studies of heard speech, spoken language and reading". NeuroImage. 62 (2): 816–47. PMID 22584224.
- Stout D, Chaminade T (January 2012). "Stone tools, language and the brain in human evolution". Philos. Trans. R. Soc. Lond. B Biol. Sci. 367 (1585): 75–87. PMID 22106428.
- Kroeger BJ, Stille C, Blouw P, Bekolay T, Stewart TC (November 2020). "Hierarchical sequencing and feedforward and feedback control mechanisms in speech production: A preliminary approach for modeling normal and disordered speech". Frontiers in Computational Neuroscience. 14: 99. doi:10.3389/fncom.2020.573554.