Isolated speech

What are the network dynamics that underpin sound-to-meaning mapping in the human brain?

As speech unfolds, before we even recognise the word, we activate phonological and semantic representations of all words that match the speech input (the cohort). This multiple activation results in transient competition that is resolved quickly as more of the speech is heard. Models of speech recognition (e.g. the Cohort model, TRACE) suggest that the semantic representation of the target word is initially only weakly activated. However, as more speech input accumulates, the word can be identified, and thereafter this semantic activation is proposed to be boosted. The neural dynamics of this sound-to-meaning mapping remain unclear.
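The cohort dynamics described above can be illustrated with a toy sketch (the lexicon below is invented for illustration, not the experiment's stimuli): as more of the input is heard, the set of matching candidates shrinks until only the target remains.

```python
# Toy cohort illustration: as more of the word is heard, the set of
# candidate words (the cohort) shrinks and competition is resolved.
lexicon = ["hammer", "hamster", "hand", "lion", "line"]

def cohort(heard_so_far, words):
    """Words still consistent with the speech input heard so far."""
    return [w for w in words if w.startswith(heard_so_far)]

for t in range(1, len("hammer") + 1):
    prefix = "hammer"[:t]
    print(prefix, cohort(prefix, lexicon))
# "hamm" is the first prefix at which "hammer" is uniquely identified
```

In this toy lexicon the uniqueness point of "hammer" falls at its fourth segment, after which only the target remains in the cohort.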

In this E/MEG experiment, we explore this mapping by presenting participants with single spoken names of concrete objects (e.g. hammer, lion).

We used a novel MVPA method, spatiotemporal searchlight representational similarity analysis (ssRSA), to develop theoretical models that capture three key cognitive processes we assume to take place: phonological competition, semantic competition, and access to lexical semantics. ssRSA allows us to validate the representational geometry predicted by theoretical models against the representational geometry of brain activity patterns.
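A minimal sketch of how one such model RDM could be built (the feature vectors below are invented placeholders, not the actual model quantities): each trial is described by a hypothesized feature vector, and the model RDM holds the pairwise dissimilarities between trials.

```python
import numpy as np

# Hypothetical per-trial feature vectors (placeholders, not the real
# model quantities), one row per trial.
features = np.array([
    [3.0, 2.0, 1.0],   # trial 1
    [3.0, 3.0, 2.0],   # trial 2
    [1.0, 2.5, 0.5],   # trial 3
])

# Model RDM: 1 - Pearson correlation between trials as dissimilarity.
model_rdm = 1.0 - np.corrcoef(features)
print(model_rdm.shape)  # (3, 3); zeros on the diagonal
```

Any dissimilarity measure can stand in for 1 - correlation; the essential point is that the RDM abstracts away from the feature space itself, so model and brain can be compared in a common format.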

We tested the models against brain activity patterns extracted from a bilateral language mask. ssRSA involves constructing data and model representational dissimilarity matrices (RDMs) that encode the pairwise dissimilarities between trials. The model RDMs were correlated with the data RDMs across time and space.
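The correlation step can be sketched as follows, using random stand-ins for the RDMs (in the actual analysis the data RDMs come from searchlight activity patterns): the upper triangle of each RDM is vectorised, and the static model RDM is compared with the data RDM at every time point, here with a Spearman correlation.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_trials, n_timepoints = 20, 50

# Random stand-ins: one static model RDM, one data RDM per time point.
model_rdm = rng.random((n_trials, n_trials))
data_rdms = rng.random((n_timepoints, n_trials, n_trials))

# Only the upper triangle carries information (RDMs are symmetric with
# a zero diagonal), so vectorise it before correlating.
iu = np.triu_indices(n_trials, k=1)
fit = np.array([spearmanr(model_rdm[iu], data_rdms[t][iu]).correlation
                for t in range(n_timepoints)])
print(fit.shape)  # (50,)
```

Repeating this for every searchlight location yields a spatiotemporal map of model fit, which can then be thresholded by cluster-based permutation statistics.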

A. Language mask used to extract data.

B. Cartoon showing how data RDMs and model RDMs are correlated within each searchlight. Note that the data RDMs change at every time point, whereas the model RDMs are static.

Results of the ssRSA showing suprathreshold clusters at p=0.05. 

For each model tested, ssRSA revealed a distinct network of regions. We found early, parallel effects of competition. Starting from 300 ms before the word's recognition point, phonological competition (PhonComp) recruited a network consisting of LSMG and LSTG, followed by LMTG and LIFG.


Simultaneously, we found effects of semantic competition (SemComp) in LMTG, LAG and LIFG. Only after the word's recognition point did the model capturing unique semantic access (UniSem) show effects, recruiting bilateral AG and MTG, and RIFG.

We found that LIFG p.orb. was activated only for the SemComp model, suggesting it may undertake controlled retrieval of semantic representations, whereas phonological representations were activated directly by the speech input. Competition resolution appears to be carried out by LIFG p.tri., which was recruited for both competition models before the word's recognition point.

We found that phonological and semantic competition occur early and in parallel, confirming that lexical representations are active before the word is uniquely recognised. There were no effects of the UniSem model before the uniqueness point (UP), showing that the UP is a critical point in word recognition, only after which the target word's semantic representation is boosted.