General Practice, Solo & Small Firm Division Magazine
Volume 17, Number 4
SPEECH RECOGNITION TECHNOLOGY
BY Joseph L. Kashi and Daryl Teshima
Because most lawyers can talk faster than they can type, speech recognition software has long topped most attorneys' technology wish lists. Past programs, unfortunately, have been disappointing. The first generation of these programs used discrete speech, which required users to pause between words. These products also required expensive, cutting-edge hardware and software, not to mention countless hours of practice before a user could dictate to the computer effectively.
In the past couple of years, however, improvements to speech recognition software have occurred faster than the speed of sound. New versions, many customized for lawyers, appear every few months in a myriad of flavors. They use continuous speech (no more pauses), and basic versions are priced under $200. This improved software, coupled with the significant drop in computer hardware prices, has made speech recognition programs affordable to law firms big and small.
But is this next generation of speech recognition products really ready for prime-time law firm usage? To answer this question, we tested the latest versions of Dragon Systems' NaturallySpeaking Deluxe (Legal Suite 4.0) and IBM's ViaVoice Pro (Millennium Edition).
Hardware Requirements
Both programs were tested on quite fast desktop computers packed with memory, well over the minimum system requirements recommended by both IBM and Dragon Systems. (IBM ViaVoice Pro's minimum system requirements are a Windows-based Pentium 233 MHz computer with 48 MB of RAM. Dragon's requirements are a Windows-based Pentium MMX (or equivalent) 300 MHz with 128 MB of RAM.) The reason? As is true with most programs, a computer that just meets the minimum requirements will run these programs too slowly. They are also real resource hogs, and will consume the computing capacity of even the fastest, most powerful computer available today. A faster processor and, more important, at least 128 MB of RAM can make a significant difference in both programs' performance.
Between the two, Dragon's computer system requirements were higher. Accordingly, this program was tested on a state-of-the-art system with a 650 MHz AMD K-7 Athlon processor and a very fast IBM Ultra2-Wide SCSI hard disk, running Windows NT. At first, the system included 128 MB of memory, but it ran slowly in this configuration. An examination of the system showed that NaturallySpeaking 4.0 was using as much as 160 MB of memory when recognizing or correcting text. That memory requirement, with only Dragon running within WordPerfect 9, exceeded the amount then installed on the test computer (already a generous amount by law firm standards). Increasing memory to 256 MB seemed to help somewhat, although we found that Dragon still ran rather slowly.
In fact, Windows NT's diagnostic monitors showed that during speech recognition, all of the computer's resources were consumed for substantial blocks of time, indicating that the speech recognition process saturated this very fast system. Because Windows NT could have been partly to blame for the high system resource consumption, NaturallySpeaking was also tested on a similarly configured Windows 98 machine with similar results.
ViaVoice Pro was tested on a more modest system (Pentium III 600 MHz processor, 256 MB of RAM running Windows 98 Second Edition) and seemed to consume fewer resources than Dragon NaturallySpeaking. Nonetheless, the ViaVoice system did slow to a crawl during certain operations, such as recognizing speech and correcting text.
Each product includes an inexpensive noise-reducing microphone headset. NaturallySpeaking is bundled with VXI's Parrot Translator microphone, which includes an intermediate battery-powered amplifier. That amplifier not only helps match different sound cards to Dragon's software, but also boosts the critical signal-to-noise ratio. If the ratio of microphone input to system noise is too low, then your voice input is garbled by too much system "static" and is that much harder to recognize correctly. However, if the input is too loud, then the system will overload and also produce accuracy-degrading sound distortion. To help get the levels just right, you should run Dragon's Audio Setup Wizard each time you begin dictating to set the correct microphone volume for that session and to check for acceptable sound input quality.
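To make the signal-to-noise idea concrete, here is a minimal Python sketch. It is not how either product's Audio Setup Wizard works (the article does not describe their internals); the function names, the 0.99 clipping limit, and the sample values are all hypothetical, chosen only to illustrate the "too quiet" and "too loud" cases described above.

```python
import math

def rms(samples):
    """Root-mean-square level of samples scaled to the range [-1.0, 1.0]."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def snr_db(voice, noise):
    """Signal-to-noise ratio in decibels: 20 * log10(voice RMS / noise RMS)."""
    return 20 * math.log10(rms(voice) / rms(noise))

def is_clipping(samples, limit=0.99):
    """True if any sample reaches the (hypothetical) full-scale limit."""
    return any(abs(s) >= limit for s in samples)

# Made-up sample values, for illustration only.
voice = [0.5, -0.5, 0.5, -0.5]          # moderate speaking level
noise = [0.05, -0.05, 0.05, -0.05]      # background "static"
too_loud = [1.0, -1.0, 0.8, -0.9]       # input driven into overload

print(round(snr_db(voice, noise), 1), "dB")
print("voice clipping:", is_clipping(voice))
print("too_loud clipping:", is_clipping(too_loud))
```

A higher decibel figure means the voice stands out more clearly from the background, while a clipping input is distorted no matter how good the ratio looks.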
ViaVoice also contains an Audio Setup Wizard that maximizes sound levels and quality. This ViaVoice Wizard measures voice levels, as well as the background noises in your office environment. The Andrea headset bundled with ViaVoice does not contain an amplifier, but that omission did not seem to hinder performance directly.
Last but not least, sound card quality is critical. Older, low-end sound cards (especially those found in laptops) will result in poor recognition results, even after you've trained your system and switched to an amplified headset. Before purchasing a speech recognition program, visit the vendor's website to determine if the sound card you have is compatible with the application. If you're going to spend the time to train and use one of these systems, then the extra $100 to $200 for a topflight SoundBlaster (or compatible) sound card is well worth the money.
Installation and Training
Before installing either program (installation is straightforward for both), there are several important guidelines to keep in mind. Although users no longer need to pause between each word, they still need to enunciate clearly. The microphone must be properly positioned at all times, or accuracy will drop. More important, users must spend time training before they can get consistently reliable results.
ViaVoice requires that users train the program by reading up to six different passages that take anywhere from 10 to 20 minutes to read. Although only one passage is required, it is probably time well spent to work through as many of the passages as possible. During training, after users speak each sentence accurately, they are prompted with another sentence. ViaVoice then customizes itself with the user's individual speech and dictation style. Even if all passages are read, initial training can be completed in a couple of hours.
NaturallySpeaking likewise needs only minimal initial training to customize a user's speech files. The minimum process takes about 20 minutes, but for better training and ultimate accuracy, users can read several additional text passages.
Dictation Basics
Both programs come with word processors similar to Microsoft's WordPad, the basic word processor included in Windows. Neither of these stripped-down word processors rivals the feature set of either Word or WordPerfect, however, and their main purpose is to convert speech into text. After dictating text in the basic word processor, users cut (or copy) and paste the text into their regular word processors. With NaturallySpeaking, users can accomplish this task with voice commands. With ViaVoice, users press a Transfer button on the tool bar or save the text in Word, Rich Text Format (RTF), or text format.
The two programs also have the ability to dictate directly into Word 97/2000 and WordPerfect 8/9, though ViaVoice's WordPerfect integration is limited. ViaVoice users can dictate directly into WordPerfect (as well as other Windows applications), but they can't control menus and functions by voice as they can in Word. NaturallySpeaking lets users dictate and voice-control both Word and WordPerfect.
To begin dictating in either program, essentially you click a microphone icon and start speaking. What you say is what you get, with just a few exceptions. First, all punctuation must be added by name where necessary (e.g., "Four score and seven years ago COMMA…"). You create a new line by saying "New Paragraph" or "New Line." Second, for numbers, dates, and other specially formatted text, you can set options that will tell the program how to respond. For example, you can set a rule that will spell out all numbers under 10. In ViaVoice, there is also a unique dictation command called "Spell Out," which lets you letter-by-letter spell a particular name, abbreviation, or term. Third, formatting text requires familiarity with voice commands that turn attributes on and off. Although we found it difficult to voice-issue advanced formatting commands such as footnotes and font size, basic formatting commands were thankfully intuitive (e.g., "Bold This" or "Underline That"). In addition, both programs automatically capitalize the first letter of each sentence.
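A rule such as "spell out all numbers under 10" can be pictured as a simple text transformation. The Python sketch below is purely illustrative and assumes nothing about how ViaVoice or NaturallySpeaking implement their formatting options; the function name and the regular expression are our own.

```python
import re

SMALL_NUMBERS = ["zero", "one", "two", "three", "four",
                 "five", "six", "seven", "eight", "nine"]

def spell_out_small_numbers(text):
    """Replace standalone single digits (0-9) with words.

    Digits that are part of a longer number, a decimal such as 3.5,
    or a mixed token such as "4th" are left unchanged.
    """
    pattern = r"(?<![\w.])\d(?!\w|\.\d)"
    return re.sub(pattern, lambda m: SMALL_NUMBERS[int(m.group())], text)

print(spell_out_small_numbers("We filed 3 motions and 12 exhibits."))
print(spell_out_small_numbers("The rate was 3.5 percent; see page 9."))
```

Multi-digit numbers and decimals pass through untouched, which is the behavior most style rules of this kind call for.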
These basic formatting commands are OK if you dictate only short letters or passages. Unfortunately, most lawyers draft documents that contain Latin terms, court citations, and statutory references, all of which must conform to standard Bluebook citation formats. Due to these requirements, it pays to purchase the legal version of a particular program. Dragon Systems' Legal Suite 4.0 (street price $995) and IBM's ViaVoice Legal Dictionary ($149 add-in to ViaVoice Pro) incorporate a specialized legal vocabulary that recognizes court names, case reporters, and other common legal terms. These legal-specific modules won't let you dictate a perfect Bluebook Supreme Court brief out of the box, but they will give you better results than the retail versions of these products.
Finally, both programs also let you control and navigate the word processor program while dictating. This is a good idea in theory, but we found this feature difficult to master in practice. For example, ViaVoice had difficulty distinguishing between dictated text and program commands. ViaVoice goes into command mode if a user pauses during dictation. If the next utterance is not recognized as a command, the program is supposed to switch back to dictation mode and process the speech as text. Unfortunately, this feature never worked consistently in testing, resulting in command errors whenever we had the slightest pause during dictation. The process became so cumbersome that we were forced to configure ViaVoice to go into command mode only when the command was preceded by the word "Computer." Unless you can dictate flawless text without pausing, we suggest you do the same.
Accuracy and Editing
Although speech recognition programs have come a long way, we found that dictation still resulted in mistakes, even after the systems had been highly trained. Inaccuracies tended to be words whose sounds are quite similar. We initially tested each program by dictating Lincoln's Gettysburg Address. After each round of dictation, we corrected (and retrained) any recognition errors, and dictated the passage again. We did this until we obtained an acceptable dictated version (see "The Speech Recognition Address" on page 36).
Surprisingly, IBM's ViaVoice produced significantly better results than NaturallySpeaking, though there were still a few mistakes even after the third recitation. Nonetheless, it took ViaVoice only three tries to reach acceptable results. This was a far cry from the results we had with NaturallySpeaking, which contained more mistakes than ViaVoice even after the seventh try. As may be apparent, Dragon's product had problems recognizing similar-sounding words like "war" and "were"; "our" and "are"; and "for" and "four."
Even more significant are the initial dictation results, because most attorneys are unlikely to repeatedly dictate the same letter or brief. In NaturallySpeaking, the first attempt was practically unusable; the mistakes outnumbered the correctly recognized words. IBM's ViaVoice produced a better first draft, but there were still more than ten mistakes that needed to be corrected. This error count is quite high when you consider that the Gettysburg Address contains no legal citations, proper names, or cryptic Latin phrases, and has only 264 words.
These results were surprising considering that most reviews of these products have reported accuracy rates of over 95 percent. We did not achieve an accuracy level that came close to that figure. Your mileage (and speech enunciation) may vary, but if this Gettysburg Address test is a true indication, most lawyers will find the end product more trouble than it's worth.
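For readers who want to relate an error count to an accuracy percentage, the arithmetic is straightforward. The Python sketch below uses the 264-word passage length given above and assumes that each misrecognized word counts as exactly one error; the article does not report exact error counts, so the figures are illustrative only.

```python
def word_accuracy(total_words, errors):
    """Fraction of words recognized correctly."""
    return (total_words - errors) / total_words

def min_errors_below(total_words, target):
    """Smallest error count that pushes accuracy below the target fraction."""
    errors = 0
    while word_accuracy(total_words, errors) >= target:
        errors += 1
    return errors

PASSAGE_WORDS = 264  # word count given in the article

# Accuracy if a draft contained exactly ten errors (hypothetical count).
print(f"{word_accuracy(PASSAGE_WORDS, 10):.1%}")

# Number of errors at which accuracy drops under 95 percent.
print(min_errors_below(PASSAGE_WORDS, 0.95))
```

On a passage this short, each additional mistake costs roughly 0.4 percentage points, so a handful of extra errors is enough to separate a draft from the 95 percent figure cited in other reviews.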
It should be noted that these tests were conducted after initial voice training. And indeed, additional testing revealed that the accuracy of both programs did improve over time. One reason is the customizing of each program's vocabulary. Both products feature large dictionaries and the ability to add words and phrases based on a user's own particular needs and usage. ViaVoice starts with an active dictionary of 64,000 common words. Users can add up to 2 million words or phrases of their own. NaturallySpeaking features an active dictionary of 160,000 frequently used words plus a backup dictionary of 250,000 words. The program monitors word usage and moves words to and from the active dictionary. Users can add new words not found in either dictionary.
Customizing each program's vocabulary does take some effort. One way to customize is to have the program analyze documents that you have already created. This process scans the indicated text, adding unknown words to the vocabulary and analyzing your usage frequency and speech patterns. Incorporating this analysis into your speech file can greatly enhance word recognition.
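In rough terms, document analysis of this kind amounts to counting how often each word appears and flagging the words the program does not yet know. The Python sketch below illustrates the idea only; it is not the algorithm either product uses, and the tiny vocabulary, the sample sentence, and the function name are hypothetical.

```python
import re
from collections import Counter

def analyze_document(text, vocabulary):
    """Return (word frequencies, sorted list of words missing from vocabulary)."""
    words = re.findall(r"[a-z']+", text.lower())
    frequencies = Counter(words)
    unknown = sorted(set(words) - vocabulary)
    return frequencies, unknown

# Hypothetical starting vocabulary and sample text.
vocabulary = {"the", "court", "granted", "motion", "in"}
text = "The court granted the motion in limine."

frequencies, unknown = analyze_document(text, vocabulary)
print(frequencies.most_common(1))   # most frequently used word
print(unknown)                      # candidates to add to the vocabulary
```

A real product would also use the surrounding words as context, which is why feeding it your own briefs and letters tends to help more than adding isolated terms.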
The more common method is to correct recognition errors once dictation is completed. Correcting text is not the same as editing text, as editing does not necessarily mean that the program failed to recognize speech. When you correct an error, a speech recognition program can learn from the mistake. Accordingly, a key requirement for making the speech recognition more accurate and effective is to use the program's correction feature when correcting recognition errors. To make a correction in either program, users highlight the text containing the error and select "Correct Error."
Unfortunately, we found this process to be cumbersome, especially in ViaVoice. Although it is possible to correct errors using your voice, it becomes exceedingly frustrating to navigate the cursor and select the incorrect text. After much trial and error, we found that correcting mistakes in ViaVoice required a combination of mouse, keyboard, and voice. The time it took to correct errors erased any productivity gains we might have achieved by using speech recognition.
NaturallySpeaking's correction module was much easier to use, and often correction could be done without resorting to the keyboard or mouse. When you make a correction, NaturallySpeaking prompts you to say the correct word and the incorrect word, which helps the program learn how you pronounce both words. However, the greater number of errors still added substantial time to the document creation process. For most touch-typists, it would have been faster to type.
Final Say
ViaVoice and NaturallySpeaking demonstrate that science fiction continues to become science reality. With computer processing power and the price of these applications getting cheaper every day, these programs give lawyers, especially two-fingered typists and people with disabilities, an alternative way to create documents.
Despite the incredible technology involved in turning speech into text, this technology still isn't ready for every lawyer's desktop. Dictating a usable first draft requires a significant time investment, as lawyers must not only learn how to use these programs, but also train them to recognize their own speech patterns and the legal jargon used in their practice. Realize that with either product, 100 percent accuracy is still not feasible. Most lawyers will find themselves more productive if they type documents themselves, or have an assistant transcribe from a tape.
Joseph L. Kashi is a solo practitioner in Soldotna, Alaska, who also owns and operates a small personal computer store and legal technology consulting service. Daryl Teshima is a practice systems attorney at Gibson, Dunn & Crutcher, where he is responsible for implementing technology that lawyers and other legal professionals use.