Apple plans to beef up iPhone’s text-to-speech abilities

dsellers

14 years ago

An Apple patent application (number 20110111805) for a "synthesized audio message over communication links" has appeared at the US Patent & Trademark Office. It suggests Apple is planning to beef up the text-to-speech and speech-to-text capabilities of the iPhone.

Per the filing, a communication device establishes an audio connection with a far-end user via a communication network. The device receives text input from a near-end user and converts that text into speech signals, which are transmitted to the far-end user over the established audio connection while the device mutes its microphone input. Other embodiments are also described and claimed. The inventors are Baptiste P. Paquier, Aram M. Lindahl and Phillip G. Tamchina.
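The filing describes behavior, not code, but the core idea of swapping microphone audio for synthesized speech on a live call fits in a few lines. The Python below is a hypothetical illustration, not Apple's implementation: `synthesize` is a made-up stand-in for a real text-to-speech engine that emits one placeholder "frame" per word, and audio is modeled as strings rather than samples. For each frame period it sends a queued speech frame and drops (mutes) the microphone frame. The summary doesn't say what happens once the typed message has been spoken; this sketch assumes the microphone simply passes through again.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List


def synthesize(text: str) -> List[str]:
    """Hypothetical TTS stand-in: one placeholder frame per word."""
    return [f"tts:{word}" for word in text.split()]


def uplink(mic_frames: Iterable[str], tts_queue: Deque[str]) -> Iterator[str]:
    """Pick the frame sent to the far end for each audio period.

    While synthesized speech is queued, the microphone frame is discarded
    (muted) and a speech frame is sent in its place. Once the queue is empty,
    microphone audio passes through again (an assumption of this sketch).
    """
    for mic_frame in mic_frames:
        if tts_queue:
            yield tts_queue.popleft()  # mic muted for this period
        else:
            yield mic_frame


if __name__ == "__main__":
    mic = ["noise1", "noise2", "noise3", "noise4"]
    queue = deque(synthesize("call you back"))
    print(list(uplink(mic, queue)))
    # ['tts:call', 'tts:you', 'tts:back', 'noise4']
```

Keeping the far end on a plain audio stream is the point of the design: the other party needs no texting support, which matches the landline scenario the filing describes.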

Here’s Apple’s background and summary of the invention: “A user of a communication device (e.g., a telephone) may sometimes have to make or answer a phone call in a noisy environment. Noise can interfere with a phone conversation to a degree that the conversation is no longer intelligible to either conversing party. A user in the noisy environment may try to scream into the phone over the noise, but the screaming and the noise may render the voice signal unintelligible at the other end.

“For example, a user may be talking on the phone in a busy restaurant. The user may not be able to shout loud enough into the phone to cover the noise in the restaurant. The user may not even be able to hear when the other end is talking. The noise may render the conversation unintelligible and may lead to a termination of the telephone conversation.

“In another scenario, it may be inconvenient for a user to talk on a phone. For example, the user may be in a meeting and does not want to draw attention to himself by speaking into the phone. The user may try to whisper into the phone, but the whispering may render the conversation unintelligible. The user may choose to send a text message to the other party, but the other party may be on a landline where texting is unavailable, or may not have a texting plan.

“It can be frustrating to conduct a telephone conversation when the environment is noisy or the circumstance is inappropriate for a user to speak.

“An embodiment of the invention is directed to a communication device, which establishes an audio connection with a far-end user via a communication network. The communication device receives text input from a near-end user, and converts the text input into speech signals. The speech signals are transmitted to the far-end user using the established audio connection while muting audio input to its audio receiving component.

“In one embodiment, the communication device detects the noise level at the near end. When the noise level is above a threshold, the communication device can automatically activate or prompt the near-end user to activate text-to-speech conversion at any point of a communication such as a phone call. Alternatively, the communication device may playback a pre-recorded message to inform the far-end user of the near-end user’s inability to speak due to the excessive noise at the near end.

“In another embodiment, the near-end user can activate text-to-speech conversion whenever necessary regardless of the detected noise level. The near-end user can enter a text message, which is converted into speech signals for transmission via the established audio connection to the far-end user.

“In yet another embodiment, the communication device can also perform speech-to-text conversion to convert the far-end user’s speech into text for display on the communication device. This feature can be used when the far-end communication device cannot, or is not enabled to, send or receive text messages. The speech-to-text conversion and the text-to-speech conversion can be activated at the same time, or can be activated independent of each other. The far-end communication device communicates with the near-end communication device in audio signals, regardless of whether the speech-to-text conversion or the text-to-speech conversion is activated.

“The communication device may be configured or programmed by its user, to support one or more of the above-described features.”
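The noise-triggered embodiment boils down to a level measurement and a threshold test. The sketch below is hypothetical: the filing doesn't say how noise is measured or where the threshold sits, so the -20 dBFS figure, the RMS measurement over one buffer of samples, and the policy names "auto", "prompt" and "notice" are illustrative choices, not details from Apple.

```python
import math
from enum import Enum
from typing import Sequence


class Action(Enum):
    KEEP_VOICE = "continue the normal voice call"
    AUTO_TTS = "switch to text-to-speech automatically"
    PROMPT_TTS = "ask the user whether to switch to text-to-speech"
    PLAY_NOTICE = "play a pre-recorded 'can't talk, too noisy' message"


NOISE_THRESHOLD_DBFS = -20.0  # hypothetical threshold, not from the filing


def noise_level_dbfs(samples: Sequence[float]) -> float:
    """RMS level of samples in [-1.0, 1.0], in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def choose_action(samples: Sequence[float], policy: str,
                  threshold: float = NOISE_THRESHOLD_DBFS) -> Action:
    """Map the near-end noise level and a user setting to an action.

    `policy` is one of the hypothetical settings "auto", "prompt" or
    "notice"; any other value raises KeyError.
    """
    if noise_level_dbfs(samples) <= threshold:
        return Action.KEEP_VOICE  # quiet enough: nothing changes
    return {
        "auto": Action.AUTO_TTS,
        "prompt": Action.PROMPT_TTS,
        "notice": Action.PLAY_NOTICE,
    }[policy]


if __name__ == "__main__":
    quiet = [0.01, -0.01] * 80   # about -40 dBFS
    loud = [0.5, -0.5] * 80      # about -6 dBFS
    print(choose_action(quiet, "prompt"))  # Action.KEEP_VOICE
    print(choose_action(loud, "prompt"))   # Action.PROMPT_TTS
```

The `policy` argument stands in for the user configuration the filing mentions: whether the phone switches on its own, asks first, or plays the pre-recorded notice.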

— Dennis Sellers
