Columbia neuroengineers have created a system that translates brain activity into recognizable speech: by monitoring neural signals, the technology can reconstruct with clarity the words a person hears. This scientific first harnesses the power of speech synthesizers and artificial intelligence, and it could lead to new ways for computers to communicate directly with the brain while laying the groundwork for helping people who cannot speak, such as patients recovering from stroke or living with ALS, regain the ability to communicate with the world around them.
When people speak, or even think about speaking, telltale patterns of activity appear in the brain; distinct signal patterns also emerge when they listen to someone speak or imagine listening. Scientists working to record and decode these neural patterns foresee a future in which thoughts need not remain hidden in the brain but can instead be translated into verbal speech at will.
Accomplishing this has proved challenging: early efforts to decode brain signals relied on simple computer models that analyzed spectrograms, visual representations of sound frequencies, but those attempts failed to produce anything resembling intelligible speech. This study instead turned to a vocoder, a computer algorithm that can synthesize speech after being trained on recordings of people talking; it is the same technology that gives Amazon's Echo and Apple's Siri the ability to give verbal responses to questions.
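To make the contrast with the earlier approach concrete, the sketch below shows how a spectrogram, the time-frequency representation those first decoding attempts tried to reconstruct, can be computed from an audio waveform. The sample rate, window lengths, and stand-in tone are illustrative assumptions, not values from the study.

```python
# Minimal sketch: computing a log-power spectrogram from audio,
# the representation earlier decoding efforts tried to recover
# directly from brain signals. Parameters are assumptions.
import numpy as np
from scipy.signal import spectrogram

fs = 16000                            # assumed audio sample rate (Hz)
t = np.arange(0, 1.0, 1 / fs)         # one second of audio
audio = np.sin(2 * np.pi * 220 * t)   # stand-in waveform (220 Hz tone)

# 25 ms windows with 10 ms hops -- common choices for speech analysis
freqs, times, Sxx = spectrogram(audio, fs=fs,
                                nperseg=int(0.025 * fs),
                                noverlap=int(0.015 * fs))
log_spec = 10 * np.log10(Sxx + 1e-10)  # log-power spectrogram
print(log_spec.shape)                  # (frequency bins, time frames)
```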
To train the vocoder to interpret brain activity, Dr. Mesgarani teamed up with Ashesh Dinesh Mehta, MD, PhD, who treats epilepsy patients, some of whom must undergo regular brain surgeries. Patients already undergoing surgery were asked to listen to sentences spoken by different people while their patterns of brain activity were measured; those neural patterns were used to train the vocoder. The same patients were then asked to listen to speakers reciting numbers while their brain signals were recorded and run through the vocoder; the sounds the vocoder produced in response to those signals were analyzed and cleaned up by neural networks, a type of artificial intelligence that mimics the structure of neurons in the biological brain.
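The sketch below is a schematic illustration of the two-stage idea described in that paragraph: learn a mapping from neural activity to vocoder speech parameters using the sentences the patients heard, then apply it to activity recorded during the digit-listening task. The arrays, feature sizes, and the simple ridge-regression model are assumptions for illustration only; the study's actual vocoder and deep-network architecture are not reproduced here.

```python
# Schematic two-stage decoding sketch (illustrative assumptions):
# (1) fit a mapping from neural features to vocoder parameters on
#     "training" sentences, (2) decode new neural activity with it.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: time frames of neural features
# (e.g., per-electrode activity) paired with vocoder parameters
# extracted from the sentences the patients listened to.
n_frames, n_electrodes, n_vocoder_params = 5000, 128, 32
neural_train = rng.normal(size=(n_frames, n_electrodes))
vocoder_train = rng.normal(size=(n_frames, n_vocoder_params))

# Stage 1: regularized linear map (ridge regression via the
# normal equations) from neural frames to vocoder parameters.
lam = 1.0
A = neural_train.T @ neural_train + lam * np.eye(n_electrodes)
W = np.linalg.solve(A, neural_train.T @ vocoder_train)

# Stage 2: decode new neural activity (digit-listening task) into
# vocoder parameters, which a vocoder would render as audio. The
# deep-network "clean-up" stage from the article is omitted.
neural_test = rng.normal(size=(200, n_electrodes))
decoded_params = neural_test @ W
print(decoded_params.shape)  # (test frames, vocoder parameters)
```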
The end result of this deep-learning process was a robotic-sounding voice reciting a sequence of numbers. To test the accuracy of the recordings, individuals were asked to listen to them and report what they heard; about 75% of the time, listeners could understand and repeat the sounds, a clear improvement in intelligibility over the earlier spectrogram-based attempts.
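For clarity, the intelligibility check amounts to scoring listener reports against the numbers actually presented. The sketch below shows that calculation with made-up responses; it is not the study's data.

```python
# Minimal sketch of the intelligibility measure: the fraction of
# listener reports that match the presented numbers. Example
# responses below are invented for illustration.
presented = ["seven", "two", "nine", "four", "one", "eight", "three", "five"]
reported  = ["seven", "two", "five", "four", "one", "eight", "three", "nine"]

correct = sum(p == r for p, r in zip(presented, reported))
accuracy = correct / len(presented)
print(f"intelligibility: {accuracy:.0%}")  # the study reported ~75%
```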
In future studies, the team plans to test the system on more complicated words and sentences, and to run the same tests on brain signals produced when a person speaks or imagines speaking. The ultimate goal is for the system to become part of an implant that translates the wearer's thoughts directly into words: a patient who cannot speak would merely think "I need a glass of water," and the implant would synthesize those brain signals into verbal speech. Such technology would be a welcome game changer, giving anyone who has lost the ability to speak through disease or injury a renewed chance to connect verbally with the world around them.