CS474: Human Computer Interaction - Modalities - Voice Prompts
Activity Goals
The goals of this activity are:
- To identify alternative modalities for human-computer interaction
- To write a program that uses voice prompts for engagement
- To identify signifiers and affordances for a given application and modality
The Activity
Directions
Consider the activity models and answer the questions provided. First reflect briefly on each question on your own, then discuss and compare your thoughts with your group to prepare for our whole-class discussion. Appoint one member of your group to present your findings to the class; the rest of the group should help that member prepare their response. Report out on areas of disagreement or items for which your group identified alternative approaches, along with any questions you encountered along the way, for group discussion. After class, think about the questions in the reflective prompt and respond to those individually in your notebook.

Model 1: Voice Prompts
```python
# on linux: sudo apt install portaudio19-dev libespeak-dev libespeak1
# on mac: brew install portaudio
# on linux/windows: pip3 install git+https://github.com/BillJr99/pyttsx3.git
# on mac: pip3 install py3-tts (pip3 install pyobjc followed by pyttsx3 might also work)
# pip3 install pyaudio speechrecognition distutils setuptools
# alternatively: pip3 install pipwin pypiwin32 && python -m pipwin install pyaudio
# on windows, install Visual C++ Build Tools: https://visualstudio.microsoft.com/visual-cpp-build-tools/

import speech_recognition as sr
import pyttsx3
import sys
import time

tts = pyttsx3.init()  # pass 'dummy' to this constructor if this call fails due to a lack of voice drivers (but this will disable speech)


def speak(tts, text):
    tts.say(text)
    tts.runAndWait()


def main():
    # get audio from the microphone
    listener = sr.Recognizer()

    with sr.Microphone() as source:
        listener.adjust_for_ambient_noise(source)  # used to detect silence to stop listening after a phrase is spoken

        while True:
            print("Listening.")
            speak(tts, "listening")  # how do we prevent this from being spoken every time an exception is thrown?
            time.sleep(1)  # used to prevent hearing any spoken text; what else could we do?

            user_input = None
            sys.stdout.write(">")

            # record audio
            listener.pause_threshold = 0.5  # how long, in seconds, to observe silence before processing what was heard
            # timeout=N throws an OSError after N seconds if nothing is heard; can also call
            # listen_in_background(source, callback) with a callback function that accepts the
            # recognizer and the audio when data is heard via a thread
            audio = listener.listen(source, timeout=5)

            try:
                # convert audio to text
                # user_input = listener.recognize_sphinx(audio)  # requires PocketSphinx installation
                user_input = listener.recognize_google(audio, show_all=False)  # set show_all to True to get a dictionary of all possible transcriptions
                print(user_input)
                speak(tts, user_input)
            except sr.UnknownValueError:
                print("Could not understand audio")
            except sr.RequestError as e:
                print("Could not request results; {0}".format(e))
            except OSError:
                print("No speech detected")

            sys.stdout.write("\n")


if __name__ == "__main__":
    main()
```
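The comment on `recognize_google` above mentions `show_all=True`, which returns the recognizer's full hypothesis set rather than a single string. As a minimal sketch, assuming the raw response layout of the Google Web Speech API (a dictionary whose `"alternative"` list holds candidate transcripts; the exact layout may vary), you could surface several guesses instead of one:

```python
import speech_recognition as sr


def best_guesses(listener, audio, n=3):
    # show_all=True returns the raw recognizer response instead of one string;
    # its "alternative" list (layout assumed here) holds candidate transcripts
    result = listener.recognize_google(audio, show_all=True)
    if not result:  # an empty result means nothing was recognized
        return []
    return [alt["transcript"] for alt in result.get("alternative", [])[:n]]
```

Speaking these candidates back to the user is one way to recover gracefully when the top transcription is wrong.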
```python
# on linux: sudo apt install portaudio19-dev libespeak-dev libespeak1
# on mac: brew install portaudio
# on linux/windows: pip3 install git+https://github.com/BillJr99/pyttsx3.git
# on mac: pip3 install py3-tts (pip3 install pyobjc followed by pyttsx3 might also work)
# pip3 install pyaudio speechrecognition distutils setuptools
# alternatively: pip3 install pipwin pypiwin32 && python -m pipwin install pyaudio
# on windows, install Visual C++ Build Tools: https://visualstudio.microsoft.com/visual-cpp-build-tools/

import speech_recognition as sr
import pyttsx3
import sys
import time

tts = pyttsx3.init()  # pass 'dummy' to this constructor if this call fails due to a lack of voice drivers (but this will disable speech)


def process_speech(listener, audio):
    try:
        # convert audio to text
        # user_input = listener.recognize_sphinx(audio)  # requires PocketSphinx installation
        user_input = listener.recognize_google(audio, show_all=False)  # set show_all to True to get a dictionary of all possible transcriptions
        print(user_input)
        speak(tts, user_input)
        sys.stdout.write("\n")
        sys.stdout.write(">")
    except sr.UnknownValueError:
        print("Could not understand audio")
    except sr.RequestError as e:
        print("Could not request results; {0}".format(e))
    except OSError:
        print("No speech detected")


def speak(tts, text):
    tts.say(text)
    tts.runAndWait()


def main():
    # get audio from the microphone
    listener = sr.Recognizer()
    source = sr.Microphone()

    with source:
        listener.adjust_for_ambient_noise(source)  # used to detect silence to stop listening after a phrase is spoken

    speak(tts, "listening")
    print("Listening.")
    time.sleep(1)  # used to prevent hearing any spoken text; what else could we do?
    sys.stdout.write("\n")
    sys.stdout.write(">")

    # record audio on a background thread; the source must not still be open here,
    # since listen_in_background opens it itself on that thread
    stop_listening = listener.listen_in_background(source, process_speech)

    # sleep and stop
    time.sleep(60)
    stop_listening(wait_for_stop=False)


if __name__ == "__main__":
    main()
```
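Both versions speak with pyttsx3's defaults. When audio is the primary output channel, properties such as speaking rate, volume, and voice choice become signifiers in their own right. The following is a brief sketch of pyttsx3's property API; which properties and voices are actually honored varies by platform driver:

```python
import pyttsx3

tts = pyttsx3.init()

# Slow the speaking rate (words per minute) and raise the volume slightly;
# a slower, louder prompt can signal that the system now expects input.
tts.setProperty("rate", 150)
tts.setProperty("volume", 0.9)  # 0.0 to 1.0

# Available voices differ per platform/driver; pick one by id if any exist.
voices = tts.getProperty("voices")
if voices:
    tts.setProperty("voice", voices[0].id)

tts.say("Please say a command.")
tts.runAndWait()
```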
Questions
- What is the difference between the two versions of this program?
- How might you adapt this code for use in a text-based program you've written in the past?
- What challenges might you anticipate when using a voice approach, particularly with respect to accessibility, and how might you address them?
- What other modalities can you think of?
- How might you indicate to a user that it is time to input a certain value, and indicate what kinds of values are permissible?
- How do you enable the user to provide input and to understand output at the right time? (One possible starting point is sketched after this list.)
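As a starting point for the last two questions, here is a minimal sketch, reusing the same libraries as the model above, of a prompt that names its permissible values aloud and re-prompts until it hears one of them; the prompt wording and the five-second timeout are arbitrary choices:

```python
import speech_recognition as sr
import pyttsx3

tts = pyttsx3.init()


def speak(text):
    tts.say(text)
    tts.runAndWait()


def ask_choice(prompt, allowed):
    # Speak a prompt that names the permissible values (a spoken signifier),
    # then listen until the user says one of them.
    listener = sr.Recognizer()
    with sr.Microphone() as source:
        listener.adjust_for_ambient_noise(source)
        while True:
            speak(prompt + " Say one of: " + ", ".join(allowed) + ".")
            try:
                audio = listener.listen(source, timeout=5)
                heard = listener.recognize_google(audio).lower()
            except (sr.UnknownValueError, OSError):  # WaitTimeoutError is an OSError
                speak("Sorry, I didn't catch that.")
                continue
            except sr.RequestError:
                speak("The recognition service is unavailable.")
                return None
            if heard in allowed:
                return heard
            speak("I heard " + heard + ". That isn't one of the choices.")


if __name__ == "__main__":
    color = ask_choice("Pick a color.", ["red", "green", "blue"])
    print(color)
```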
Adapted from Dr. Alvin Grissom’s 2020 HCI course