Ultra-performance speech recognition technology for smartphones

Speech recognition seemed like a good idea when it was first applied to the desktop computer. For most people, however, it never became a substitute for the keyboard and mouse. Now voice technology is being used in a whole new environment: the mobile phone. Applying speech recognition to mobile phones is pushing the technology in a new direction, one it never took on the desktop.

IBM celebrates its 100th anniversary this year. Speech recognition first appeared in the 1950s as an early technology, mainly as a curiosity. In the early 1960s, IBM built an experimental speech recognition system called "Shoebox." The device recognized 16 spoken words and could answer simple arithmetic problems such as "3 + 4 = ?".

DragonDictate, launched by Dragon Systems for DOS computers in the early 1980s, was probably the first speech recognition application. It could recognize only isolated words, one word at a time. Over time, it evolved into a product called "Dragon NaturallySpeaking" (currently in its 11th version and owned by Nuance Communications), which can transcribe speech spoken at normal conversational speed.

Two constraints have held back speech recognition on desktop computers. First, to work with reasonable accuracy, the application must be trained to recognize the user's speech characteristics. Both the native speech-to-text technology in Windows Vista and Windows 7 and third-party products such as Dragon NaturallySpeaking still require a user training period.

The second constraint is the popularity of the keyboard. Most people are used to typing rather than speaking, so voice control faces the same adoption barrier as the Dvorak keyboard layout. Why learn to use a Dvorak keyboard when a plain old QWERTY keyboard is available and works well?

The Microsoft TellMe team is responsible for developing speech recognition technology for the multimedia environment. Abhi Rele, a senior product manager on the TellMe team, pointed out that in a desktop environment users already have convenient ways to communicate with the machine, such as the keyboard and mouse. As a result, voice is used there mainly by enthusiasts.

Wider adoption of voice-controlled computing requires two things: a better, more convenient application and a setting where voice is the primary mode of use. Mobile phones are such a setting, and one that has been growing for a long time.

Matt Revis, vice president of product management and marketing at Nuance, explains the difference between the desktop and mobile environments this way: the desktop is a fixed environment in which the user's attention is entirely on the computer. Desktop voice technology therefore mainly supports tasks such as office applications, web browsing, and communication. On the mobile side, voice is used more to support a variety of lifestyles: professionals on the move, outdoor leisure activities, hands-free calling, and more.

Gartner analyst Tuong Nguyen agrees that speech makes more sense in a mobile environment. From a usage standpoint, he said, voice recognition is more valuable on handheld devices, where it adds a user-friendly, convenient form of input.

Nguyen added that the value of speech recognition becomes apparent when the alternative to speaking a simple statement is to flip through many menus or type on a small on-screen keyboard. As touch screen devices (without physical keyboards) become more common, speech recognition will be used to enhance data input and output. Speech recognition also supports hands-free use, whether by preference or by legal requirement.

On mobile devices

Because mobile devices typically have only a fraction of the storage and processing power of a desktop computer, it took some time for voice processing to appear on phones, even in a basic form.

The Springer Handbook of Speech Processing describes the situation of mobile phones in the early 2000s. Despite the limitations of the time, a phone could, once programmed, recognize digit-by-digit spoken dialing and, to some extent, people's names. The main problem was memory, so most phones could recognize only 10 numbers or names at a time. The authors point out another problem, however: the feature was rarely used, probably because mobile phone manufacturers marketed it poorly.

As the memory and processing power of mobile phones have grown, so have the recognition capabilities of ordinary handsets. Samsung's $99 SCH-p-207 phone, released in 2005, added voice-to-text dictation alongside voice dialing. With hundreds of megabytes of memory and several gigabytes of storage, the current generation of smartphones is rarely constrained in this way.

Another key advance is network speed. The rising tide of faster wireless networks has lifted many boats, including the latest generation of voice processing technology. Faster networks make it possible to move voice processing tasks off the handset and onto remote servers.

Amir Mane, product manager for Google Voice Search, explains how faster networks help Google's voice applications. Because all the heavy processing is handled by Google's servers on the network, he said, the computing power required of the handheld device is reduced.

Current applications

Speech recognition on today's mobile phones is no longer limited to voice dialing, although voice dialing is still among the voice-enabled features. It was the first speech recognition feature to appear on mobile phones, and even many low-end phones now have it, although it can struggle with some of the less common names in the phone book.

Gartner analyst Nguyen pointed out that the newer generation of voice features is more open-ended. Instead of being programmed with specific voice commands that trigger certain functions, the application recognizes the speech and performs the appropriate action. Higher-end, more powerful devices make these applications more viable. In other words, in addition to dialing a phone number with the phrase "call 888-555-1212", the user can also say "call mom" or "call my mom."
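The difference between a fixed dialing command and a more open one can be illustrated with a minimal Python sketch. It assumes the speech has already been transcribed to text and shows only the step that follows recognition. The contact list, the filler-word set, and the function name `resolve_call` are hypothetical and are not taken from any product mentioned here.

```python
import re

# Hypothetical phone book; the names and numbers are illustrative only.
CONTACTS = {"mom": "888-555-1212", "office": "888-555-0100"}

# Filler words that may come before a contact name, as in "call my mom".
FILLERS = {"my", "the"}


def resolve_call(transcript, contacts=CONTACTS):
    """Return the number to dial for a transcribed "call ..." command, or None."""
    words = transcript.lower().split()
    if not words or words[0] != "call":
        return None  # not a dialing command
    # Drop filler words so "call mom" and "call my mom" resolve the same way.
    target = " ".join(w for w in words[1:] if w not in FILLERS)
    if not target:
        return None
    # Case 1: the user spoke the digits directly.
    if re.fullmatch(r"\d[\d\-\s]{6,}", target):
        return target
    # Case 2: look the spoken name up in the phone book.
    return contacts.get(target)
```

Under these assumptions, `resolve_call("call 888-555-1212")`, `resolve_call("call mom")`, and `resolve_call("Call my mom")` all resolve to the same number, while an unknown name returns `None`. A real system would need far more robust language handling than this word filter.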

Google Voice Search has fewer restrictions than earlier voice recognition technologies because all the heavy processing is done by servers on the network. This makes voice-driven applications such as Google Voice Search more feasible. For example, if you say "create war movie time", you will see a page of listings for your area or location. The app not only recognizes the meaning of the phrase, but also combines information from your phone (your current location) with information from the web (show times).

The app also handles English well and distinguishes between similar-sounding vocabulary without any training. If I say "Motley Crue", the app even uses the band's unique spelling in the search terms, although it omits the diacritics. Search for "Motley's Crew" and you get the comic strip instead.

That said, the limitations of Google's speech recognition become apparent the further you stray from mainstream English. Foreign names are handled poorly. Another problem for speech recognition applications is ambient noise, which affects mobile users more often than desktop users. According to Nuance's Revis, recognition accuracy is a problem in noisy outdoor environments.

Dictation has made great progress since Samsung's phone launched in 2005. The Dragon Dictation app for the iPhone, powered by Dragon NaturallySpeaking, lets users dictate everything from memos and emails to Twitter updates. The Dragon software for email provides similar functionality on BlackBerry devices.

For Android phones, Nuance offers FlexT9, which combines Dragon dictation with three types of touch screen input. There is also the Handcent SMS app, which integrates Android's native speech recognition so that you can send text messages by voice.

Text-to-text translation has been available for many years (e.g., through the well-known Babel Fish website). Simultaneous speech translation is not available yet, but the software is getting close. For example, the Jibbigo software for the iPhone translates words, phrases, and reasonably simple sentences, allowing the two parties to take turns speaking.
