Katholieke Universiteit Nijmegen, Max Planck Institute for Psycholinguistics

LabPhon 7

Themes

Phonological encoding

Once speakers have decided what words to use, they will have to retrieve the corresponding phonological forms from their lexicons, build phonological representations of utterances, and pass this information on to the phonetic implementation module. In some cases, the phonological form of the word will have to be pulled together from different morphemes, as in the case of regular inflections in English, while for simplex words and irregular forms like "went" (GO-Past) no such assembly is needed. Where does the cut-off point lie here? For instance, are null-inflections assembled? Can speakers retrieve phonological forms "surgically", without interference from phonological forms of semantically related words? And what is the form of this phonological representation: are segments syllabified and syllables metrified, or is the full representation still to be assembled? Are the segments fully specified or will redundant features have to be supplied, and if so at what stage? And once a full surface representation has been constructed, complete with higher prosodic structure, what sort of chunks are passed on to the phonetic implementation? Are articulatory programmes built from scratch, starting from the feature, or can ready-made higher-level programmes be called on? And what happens when such ready-made programmes cross word-boundaries and disturb the word-based prosodic structures computed or retrieved earlier?

Phonological processing

The amount of information available to hearers in the acoustic speech signal is very large indeed. When and how do listeners use what in order to decide what they hear? A considerable research effort has been devoted to the question of how listeners decide where words begin, that is, when it is useful to entertain the hypothesis that a certain part of the speech stream constitutes the beginning of a word. Listener strategies appear to be language-specific: only if one's language has word-based vowel harmony does it make sense to take sequential vowel qualities into consideration when deciding where to begin a lexical search. Likewise, the extent to which listeners may provide strings of perceived segments with syllable structure may depend on the role which syllable structure plays in the phonology of the language. How do listeners cope with context-dependent segmental insertions, deletions and assimilations? What role is played by the prosodic and tonal structure? What kind of representation is available in the lexicon with which perceived phonetic or phonological features can be matched during a search? And when and how are various types of non-phonological information brought to bear on the recognition process?

Field work and phonological theory

Phonological theory aims to account for the shapes of the sound systems of the world's languages. What segments, what metrical, tonal and prosodic structures do languages have, how do they combine linearly and hierarchically, and why are these segments and structures statistically distributed the way they are, within and across languages? A prerequisite for serving this aim is the availability of reliable data. Although it may go too far to say that every new language provides at least one aspect which overturns our unspoken conceptions of what can exist, it is fair to say that new data continue to throw unexpected light on current conceptions of phonological structure, and we are far from feeling confident that we know what we need to know. The current threat of extinction that looms over the stock of spoken languages makes the crucial role of field work all the more conspicuous. Greater mobility, together with the availability of high-quality recording and analysis techniques, has widened our notion of "field" in the sense that the field may be the laboratory at the other end of the corridor from our offices, and that the crucial element in this theme is theoretical advances built on "primary data".

Speech technology and phonological theory

More so than in speech synthesis, advances in speech recognition have been possible without any appreciable contribution from phonological theory. This is because researchers have worked with pattern recognition techniques which are independent of the medium within which those patterns exist. Indeed, success in optical recognition has likewise been possible in the absence of a theory of visual perception. In many ways, however, the success of current speech recognition systems is limited. Personalised dictation systems as well as systems with unlimited numbers of speakers and limited sets of expressions to be recognised fall far short of the performance achieved by humans, who recognise unlimited sets of expressions spoken by unlimited sets of speakers. A breakthrough can perhaps be forced by considering the way humans identify linguistic units in a situation where their acoustic properties are highly variable, due to interactions with speaking style, speaker, and other nearby units. This would allow the current knowledge-shy recognition strategy to be replaced with one that makes non-trivial use of phonological representations. Possibly, too, progress in speech synthesis can be based on a careful implementation of phonological accounts, particularly in the area of prosody.

Phonology-phonetics interface

Phonological representations are conglomerates of discrete features and structures selected from a finite set. This is how we assume humans solve the onerous task of remembering the sound forms of the words they know. What speakers produce and hearers receive, however, are continuously varying acoustic patterns, whose shapes are determined by the ergonomics of vocal sound production and perception. It is evident that the nature of the set of features and structures is historically indebted to the ergonomics of speaking and perceiving, and that many changes that occur over time are at least in part determined by these ergonomics. But how direct are these relations? Does a speaker's phonology change with every change in the phonetics, and if not, how much leeway are speakers allowed in the phonetic implementation? Or is the notion that phonological representations and phonetic implementation are separate modules perhaps misguided? How phonetic are phonological representations, and to what extent does phonetic implementation aid and abet the signalling of phonological contrasts? Do features refer to articulatory states, to articulatory gestures, or to auditory effects, or perhaps to all of these at the same time?