<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><body><h1>Adding Text To Speech
</h1>
<h2>Introduction
</h2>
<p>Certainly one of the most popular Activities available is <strong>Speak</strong>, which takes the words you type in and speaks them out loud, at the same time displaying a cartoon face that seems to be speaking the words.&#160; You might be surprised to learn how little of the code in that Activity is used to get the words spoken.&#160;&#160; If your Activity could benefit from having words spoken out loud (the possibilities for educational Activities and games are definitely there) this chapter will teach you how to make it happen.
</p>
<h2><img alt="SpeakActivity.png" src="static/ActivitiesGuideSugar-SpeakActivity-en.png" height="450" width="600"/></h2>
<h2>We Have Ways To Make You Talk
</h2>
<p>A couple of ways, actually, and neither one is that painful.&#160; They are:
</p>
<ul><li>Running the <strong>espeak</strong> program directly</li>
  <li>Using the <strong>gstreamer espeak plugin</strong></li>
</ul><p>Both approaches have their advantages.&#160; The first one is the one used by Speak.&#160; (Technically, Speak uses the gstreamer plugin if it is available, and otherwise executes espeak directly.&#160; For what Speak is doing, using the gstreamer plugin isn't really needed.)&#160; Executing espeak is definitely the simplest method, and may be suitable for your own Activity.&#160; Its big advantage is that you do not need to have the gstreamer plugin installed.&#160; If your Activity needs to run on something other than the latest version of Sugar, this will be something to consider.
  <br/></p>
<p>The gstreamer plugin is what is used by <strong>Read Etexts</strong> to do text to speech with highlighting.&#160; For this application we needed to be able to do things that are not possible by just running <strong>espeak</strong>.&#160; For example:
</p>
<ul><li>We needed to be able to pause and resume speech, because the Activity needs to speak a whole page worth of text, not just simple phrases.</li>
  <li>We needed to highlight the words being spoken as they are spoken.</li>
</ul><p>You might think that you could achieve these objectives by running espeak on one word at a time.&#160; If you do, don't feel bad because I thought that too.&#160; On a fast computer it sounds really awful, like HAL 9000 developing a stutter towards the end of being deactivated.&#160; On the XO no sounds came out at all.
</p>
<p> Originally Read Etexts used <strong>speech-dispatcher</strong> to do what the gstreamer plugin does.&#160; The developers of that program were very helpful in getting the highlighting in Read Etexts working, but speech-dispatcher needed to be configured before you could use it, which was an issue for us.&#160; (There is more than one kind of text to speech software available and speech-dispatcher supports most of them.&#160; This makes configuration files inevitable.)&#160; Aleksey Lim of Sugar Labs came up with the idea of using a gstreamer plugin and was the one who wrote it.&#160; He also rewrote much of <strong>Read Etexts</strong> so it would use the plugin if it was available, use speech-dispatcher if not, and would not support speech if neither was available.
</p>
<h2> Running espeak Directly
</h2>
<p>You can run the <strong>espeak</strong> program from the terminal to try out its options.&#160; To see what options are available for espeak you can use the <strong>man</strong> command:
</p>
<pre>man espeak</pre>
<p>This will give you a manual page describing how to run the program and what options are available.&#160; The parts of the man page that are most interesting to us are these:
</p>
<pre><strong>NAME</strong>
       espeak - A multi-lingual software speech synthesizer.

<strong>SYNOPSIS</strong>
       espeak [options] [&lt;words&gt;]

<strong>DESCRIPTION</strong>
       espeak is a software speech synthesizer for English,
       and some other languages.

<strong>OPTIONS</strong>
       -p &lt;integer&gt;
              Pitch adjustment, 0 to 99, default is 50

       -s &lt;integer&gt;
              Speed in words per minute, default is 160

       -v &lt;voice name&gt;
              Use voice file of this name from
              espeak-data/voices

       --voices[=&lt;language code&gt;]
               Lists the available voices. If =&lt;language code&gt;
               is present then only those voices which are
               suitable for that language are listed.
</pre>
<p>Let's try out some of these options. First let's get a list of <strong>Voices</strong>:
</p>
<pre><strong>espeak --voices</strong>
Pty Language Age/Gender VoiceName       File        Other Langs
 5  af             M  afrikaans         af
 5  bs             M  bosnian           bs
 5  ca             M  catalan           ca
 5  cs             M  czech             cs
 5  cy             M  welsh-test        cy
 5  de             M  german            de
 5  el             M  greek             el
 5  en             M  default           default
 5  en-sc          M  en-scottish       en/en-sc    (en 4)
 2  en-uk          M  english           en/en       (en 2)
<em>... and many more ...</em>
</pre>
<p>Now that we know what the names of the voices are we can try them out. How about English with a French accent?
</p>
<pre>espeak "Your mother was a hamster and your father \
smelled of elderberries." -v fr
</pre>
<p>Let's try experimenting with rate and pitch:
</p>
<pre>espeak "I'm sorry, Dave. I'm afraid I can't \
do that." -s 120 -p 30
</pre>
<p>The next thing to do is to write some Python code to run espeak.&#160; Here is a short program adapted from the code in <strong>Speak</strong>:
</p>
<pre>import re
import subprocess

PITCH_MAX = 99
RATE_MAX = 99
PITCH_DEFAULT = PITCH_MAX/2
RATE_DEFAULT = RATE_MAX/3

def speak(text, rate=RATE_DEFAULT, pitch=PITCH_DEFAULT,
    voice="default"):

    # espeak uses 80 to 370
    rate = 80 + (370-80) * int(rate) / 100

    subprocess.call(["espeak", "-p", str(pitch),
            "-s", str(rate), "-v", voice,  text],
            stdout=subprocess.PIPE)

def voices():
    out = []
    result = subprocess.Popen(["espeak", "--voices"],
        stdout=subprocess.PIPE).communicate()[0]

    for line in result.split('\n'):
        m = re.match(
            r'\s*\d+\s+([\w-]+)\s+([MF])\s+([\w_-]+)\s+(.+)',
            line)
        if not m:
            continue
        language, gender, name, stuff = m.groups()
        if stuff.startswith('mb/') or \
            name in ('en-rhotic','english_rp',
                'english_wmids'):
            # these voices don't produce sound
            continue
        out.append((language, name))

    return out

def main():
    print voices()
    speak("I'm afraid I can't do that, Dave.")
    speak("Your mother was a hamster, and your father "
        + "smelled of elderberries!",  30,  60,  "fr")

if __name__ == "__main__":
    main()
</pre>
<p>In the Git repository, in the directory <strong>Adding_TTS</strong>, this file is named <strong>espeak.py</strong>.&#160; Load this file into <strong>Eric</strong> and do <strong>Run Script</strong> from the <strong>Start</strong> menu to run it.&#160; In addition to hearing speech you should see this text:
</p>
<p><em>[('af', 'afrikaans'), ('bs', 'bosnian'), ('ca', 'catalan'), ('cs', 'czech'), ('cy', 'welsh-test'), ('de', 'german'), ('el', 'greek'), ('en', 'default'), ('en-sc', 'en-scottish'), ('en-uk', 'english'), ('en-uk-north', 'lancashire'), ('en-us', 'english-us'), ('en-wi', 'en-westindies'), ('eo', 'esperanto'), ('es', 'spanish'), ('es-la', 'spanish-latin-american'), ('fi', 'finnish'), ('fr', 'french'), ('fr-be', 'french'), ('grc', 'greek-ancient'), ('hi', 'hindi-test'), ('hr', 'croatian'), ('hu', 'hungarian'), ('hy', 'armenian'), ('hy', 'armenian-west'), ('id', 'indonesian-test'), ('is', 'icelandic-test'), ('it', 'italian'), ('ku', 'kurdish'), ('la', 'latin'), ('lv', 'latvian'), ('mk', 'macedonian-test'), ('nl', 'dutch-test'), ('no', 'norwegian-test'), ('pl', 'polish'), ('pt', 'brazil'), ('pt-pt', 'portugal'), ('ro', 'romanian'), ('ru', 'russian_test'), ('sk', 'slovak'), ('sq', 'albanian'), ('sr', 'serbian'), ('sv', 'swedish'), ('sw', 'swahihi-test'), ('ta', 'tamil'), ('tr', 'turkish'), ('vi', 'vietnam-test'), ('zh', 'Mandarin'), ('zh-yue', 'cantonese-test')] </em>
</p>
<p>The <em>voices()</em> function returns a list of voices as one tuple per voice, and eliminates from the list any voices that espeak cannot produce on its own.&#160; This list of tuples can be used to populate a drop-down list.
  <br/></p>
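<p>For example, here is a quick sketch of how that list could fill a PyGTK combo box.&#160; This code is not part of the book's examples; the variable names are made up, and it assumes <strong>espeak.py</strong> is in the same directory so it can be imported:
</p>
<pre>import gtk
import espeak   # the espeak.py module shown above

# Fill a simple text combo box with one entry per voice name.
combo = gtk.combo_box_new_text()
for language, name in espeak.voices():
    combo.append_text(name)
combo.set_active(0)
</pre>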
<p>The <em>speak()</em> function adjusts the value of <strong>rate</strong> so you can input a value between 0 and 99 rather than between 80 and 370.&#160; <em>speak()</em> is more complex in the Speak Activity than what we have here, because in that Activity it monitors the spoken audio and generates mouth movements based on the amplitude of the voice.&#160; Making the face move is most of what the Speak Activity does, and since we aren't doing that we need very little code to make our Activity speak.
</p>
<p>You can use <strong>import espeak</strong> to include this file in your own Activities.
</p>
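<p>For instance, a couple of lines like these (just a sketch, with made-up phrases) would be enough to give an Activity a voice:
</p>
<pre>import espeak

# Say a short phrase using the default voice, rate, and pitch.
espeak.speak("Welcome to my Activity!")

# A slower, higher-pitched phrase with a French accent.
espeak.speak("Your mother was a hamster!",
    rate=30, pitch=60, voice="fr")
</pre>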
<h2>Using The gstreamer espeak Plugin
</h2>
<p>The gstreamer espeak plugin can be installed in <strong>Fedora 10</strong> or later using <strong>Add/Remove Software</strong>.
</p>
<p><img alt="Installing the plugin." src="static/ActivitiesGuideSugar-espeak-en.jpg" height="363" width="600"/></p>
<p> When you have this done, you should be able to download the <strong>Read Etexts</strong> Activity (the real one, not the simplified version we're using for the book) from ASLO and try out the <strong>Speech</strong> tab.&#160; You should do that now.&#160; It will look something like this:
</p>
<p><img alt="espeak2_1.jpg" src="static/ActivitiesGuideSugar-espeak2_1-en.jpg" height="415" width="600"/><br/></p>
<p>The book used in the earlier screenshots of this manual was <em>Pride and Prejudice</em> by Jane Austen.&#160; To balance things out the rest of the screenshots will be using <em>The Innocents Abroad</em> by Mark Twain.
</p>
<p><strong>Gstreamer</strong> is a framework for multimedia.&#160; If you've watched videos on the web you are familiar with the concept of streaming media.&#160; Instead of downloading a whole song or a whole movie clip and then playing it, streaming means the downloading and the playing happen at the same time, with the downloading just a bit behind the playing.&#160; There are many different kinds of media files: MP3s, DivX, WMV, Real Media, and so on.&#160; For every kind of media file Gstreamer has a plugin.
</p>
<p>Gstreamer makes use of a concept called <strong>pipelining</strong>.&#160; The idea is that the output of one program can become the input to another.&#160; One way to handle that situation is to put the output of the first program into a temporary file and have the second program read it.&#160; This would mean that the first program would have to finish running before the second one could begin.&#160; What if you could have both programs run at the same time and have the second program read the data as the first one wrote it out?&#160; It's possible, and the mechanism for getting data from one program to the other is called a <strong>pipe</strong>.&#160; A collection of programs joined together in this way is called a <strong>pipeline</strong>.&#160; The program that feeds data into a pipe is called a <strong>source</strong>, and the program that takes the data out of the pipe is called a <strong>sink</strong>.
  <br/></p>
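<p>Gstreamer pipelines can be built in Python with <em>gst.parse_launch()</em>, which takes a description of the pipeline as a string, with an exclamation point standing for each pipe.&#160; Here is a tiny sketch (not from any of the book's programs) that connects a test tone source to an audio sink, using the same gstreamer 0.10 Python bindings the rest of this chapter uses:
</p>
<pre>import time
import gst

# One source, one sink, one pipe between them.
pipe = gst.parse_launch('audiotestsrc ! autoaudiosink')
pipe.set_state(gst.STATE_PLAYING)   # start the data flowing
time.sleep(2)                       # listen to the tone for two seconds
pipe.set_state(gst.STATE_NULL)      # stop and release the pipeline
</pre>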
<p>The gstreamer espeak plugin uses a simple pipe: text goes into espeak at one end and sound comes out the other and is sent to your soundcard.&#160; You might think that doesn't sound much different from what we were doing before, but it is.&#160; When you just run espeak the program has to load itself into memory, speak the text you give it into the sound card, then unload itself.&#160; This is one of the reasons you can't just use espeak a word at a time to achieve speech with highlighted words.&#160; There is a short lag while the program is loading.&#160; It isn't that noticeable if you give espeak a complete phrase or sentence to speak, but if it happens for every word it is <em>very</em> noticeable.&#160; Using the gstreamer plugin we can have espeak loaded into memory all the time, just waiting for us to send some words into its input pipe.&#160; It will speak them and then wait for the next batch.
</p>
<p>Since we can control what goes into the pipe it is possible to pause and resume speech.
</p>
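<p>Here is a rough sketch of that idea, again not taken from the book's code: it builds the same espeak pipeline that <strong>speech.py</strong> will use, gives it some text, and then pauses and resumes it by changing the pipeline's state:
</p>
<pre>import gst

# The espeak element is the source, the sound card is the sink.
pipe = gst.parse_launch('espeak name=source ! autoaudiosink')
source = pipe.get_by_name('source')
source.props.text = 'It was the best of times, it was the worst of times.'

pipe.set_state(gst.STATE_PLAYING)   # start speaking
# ... some time later ...
pipe.set_state(gst.STATE_PAUSED)    # speech stops where it is
pipe.set_state(gst.STATE_PLAYING)   # and picks up where it left off
</pre>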
<p> The example we'll use here is a version of <strong>Read Etexts</strong> again, but instead of the Activity we're going to modify the standalone version.&#160; There is nothing special about the gstreamer plugin that makes it only work with Activities.&#160; Any Python program can use it.&#160; I'm only including Text to Speech as a topic in this manual because every Sugar installation includes espeak and many Activities could find it useful.
</p>
<p>There is a file in our Git repository named <strong>speech.py</strong> which looks like this:
</p>
<pre>import gst

voice = 'default'
pitch = 0

rate = -20
highlight_cb = None

def _create_pipe():
    pipeline = 'espeak name=source ! autoaudiosink'
    pipe = gst.parse_launch(pipeline)

    def stop_cb(bus, message):
        pipe.set_state(gst.STATE_NULL)

    def mark_cb(bus, message):
        if message.structure.get_name() == 'espeak-mark':
            mark = message.structure['mark']
            highlight_cb(int(mark))

    bus = pipe.get_bus()
    bus.add_signal_watch()
    bus.connect('message::eos', stop_cb)
    bus.connect('message::error', stop_cb)
    bus.connect('message::element', mark_cb)

    return (pipe.get_by_name('source'), pipe)

def _speech(source, pipe, words):
    source.props.pitch = pitch
    source.props.rate = rate
    source.props.voice = voice
    source.props.text = words
    pipe.set_state(gst.STATE_PLAYING)

info_source, info_pipe = _create_pipe()
play_source, play_pipe = _create_pipe()

# track for marks
play_source.props.track = 2

def voices():
    return info_source.props.voices

def say(words):
    _speech(info_source, info_pipe, words)
    print words

def play(words):
    _speech(play_source, play_pipe, words)

def is_stopped():
    for i in play_pipe.get_state():
        if isinstance(i, gst.State) and \
            i == gst.STATE_NULL:
            return True
    return False

def stop():
    play_pipe.set_state(gst.STATE_NULL)

def is_paused():
    for i in play_pipe.get_state():
        if isinstance(i, gst.State) and \
            i == gst.STATE_PAUSED:
            return True
    return False

def pause():
    play_pipe.set_state(gst.STATE_PAUSED)

def rate_up():
    global rate
    rate = min(99, rate + 10)

def rate_down():
    global rate
    rate = max(-99, rate - 10)

def pitch_up():
    global pitch
    pitch = min(99, pitch + 10)

def pitch_down():
    global pitch
    pitch = max(-99, pitch - 10)

def prepare_highlighting(label_text):
    i = 0
    j = 0
    word_begin = 0
    word_end = 0
    current_word = 0
    word_tuples = []
    omitted = [' ', '\n', u'\r', '_', '[', '{', ']',\
        '}', '|', '&lt;', '&gt;', '*', '+', '/', '\\' ]
    omitted_chars = set(omitted)
    while i &lt; len(label_text):
        if label_text[i] not in omitted_chars:
            word_begin = i
            j = i
            while j &lt; len(label_text) and \
                 label_text[j] not in omitted_chars:
                 j = j + 1
                 word_end = j
                 i = j
            word_t = (word_begin, word_end, \
                 label_text[word_begin: word_end].strip())
            if word_t[2] != u'\r':
                 word_tuples.append(word_t)
        i = i + 1
    return word_tuples

def add_word_marks(word_tuples):
    "Adds a mark between each word of text."
    i = 0
    marked_up_text = '&lt;speak&gt; '
    while i &lt; len(word_tuples):
        word_t = word_tuples[i]
        marked_up_text = marked_up_text + \
            '&lt;mark name="' + str(i) + '"/&gt;' + word_t[2]
        i = i + 1
    return marked_up_text + '&lt;/speak&gt;'</pre>
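<p>Before we look at the program that uses this module, here is a small standalone test you could try (it is not part of the repository, and the function name <em>show_mark</em> is made up).&#160; It feeds marked-up text to <em>play()</em> and runs a gobject main loop so the mark messages can reach our callback:
</p>
<pre>import gobject
import speech

def show_mark(word_number):
    # speech.py calls this back once for each mark espeak reaches.
    print 'speaking word', word_number

speech.highlight_cb = show_mark
words = speech.add_word_marks(
    speech.prepare_highlighting('Hello there, wide wide world'))
speech.play(words)

loop = gobject.MainLoop()
gobject.timeout_add(5000, loop.quit)   # give espeak five seconds to finish
loop.run()
</pre>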
<p>There is another file named <strong>ReadEtextsTTS.py</strong> which looks like this:
  <br/></p>
<pre>import sys
import os
import zipfile
import pygtk
import gtk
import getopt
import pango
import gobject
import time
import speech

speech_supported = True

try:
    import gst
    gst.element_factory_make('espeak')
    print 'speech supported!'
except Exception, e:
    speech_supported = False
    print 'speech not supported!'

page=0
PAGE_SIZE = 45

class ReadEtextsActivity():
    def __init__(self):
        "The entry point to the Activity"
        speech.highlight_cb = self.highlight_next_word
        # print speech.voices()

    def highlight_next_word(self, word_count):
        if word_count &lt; len(self.word_tuples):
            word_tuple = self.word_tuples[word_count]
            textbuffer = self.textview.get_buffer()
            tag = textbuffer.create_tag()
            tag.set_property('weight', pango.WEIGHT_BOLD)
            tag.set_property( 'foreground', "white")
            tag.set_property( 'background', "black")
            iterStart = \
                textbuffer.get_iter_at_offset(word_tuple[0])
            iterEnd = \
                textbuffer.get_iter_at_offset(word_tuple[1])
            bounds = textbuffer.get_bounds()
            textbuffer.remove_all_tags(bounds[0], bounds[1])
            textbuffer.apply_tag(tag, iterStart, iterEnd)
            v_adjustment = \
                self.scrolled_window.get_vadjustment()
            max = v_adjustment.upper - \
                v_adjustment.page_size
            max = max * word_count
            max = max / len(self.word_tuples)
            v_adjustment.value = max
        return True

    def keypress_cb(self, widget, event):
        "Respond when the user presses one of the arrow keys"
        global done
        global speech_supported
        keyname = gtk.gdk.keyval_name(event.keyval)
        if keyname == 'KP_End' and speech_supported:
            if speech.is_paused() or speech.is_stopped():
                speech.play(self.words_on_page)
            else:
                speech.pause()
            return True
        if keyname == 'plus':
            self.font_increase()
            return True
        if keyname == 'minus':
            self.font_decrease()
            return True
        if speech_supported and speech.is_stopped() == False \
            and speech.is_paused() == False:
            # If speech is in progress, ignore other keys.
            return True
        if keyname == '7':
            speech.pitch_down()
            speech.say('Pitch Adjusted')
            return True
        if keyname == '8':
            speech.pitch_up()
            speech.say('Pitch Adjusted')
            return True
        if keyname == '9':
            speech.rate_down()
            speech.say('Rate Adjusted')
            return True
        if keyname == '0':
            speech.rate_up()
            speech.say('Rate Adjusted')
            return True
        if keyname == 'KP_Right':
            self.page_next()
            return True
        if keyname == 'Page_Up' or keyname == 'KP_Up':
            self.page_previous()
            return True
        if keyname == 'KP_Left':
            self.page_previous()
            return True
        if keyname == 'Page_Down' or keyname == 'KP_Down':
            self.page_next()
            return True
        if keyname == 'Up':
            self.scroll_up()
            return True
        if keyname == 'Down':
            self.scroll_down()
            return True
        return False

    def page_previous(self):
        global page
        page=page-1
        if page &lt; 0: page=0
        self.show_page(page)
        v_adjustment = \
            self.scrolled_window.get_vadjustment()
        v_adjustment.value = v_adjustment.upper - \
            v_adjustment.page_size

    def page_next(self):
        global page
        page=page+1
        if page &gt;= len(self.page_index): page=0
        self.show_page(page)
        v_adjustment = \
            self.scrolled_window.get_vadjustment()
        v_adjustment.value = v_adjustment.lower

    def font_decrease(self):
        font_size = self.font_desc.get_size() / 1024
        font_size = font_size - 1
        if font_size &lt; 1:
            font_size = 1
        self.font_desc.set_size(font_size * 1024)
        self.textview.modify_font(self.font_desc)

    def font_increase(self):
        font_size = self.font_desc.get_size() / 1024
        font_size = font_size + 1
        self.font_desc.set_size(font_size * 1024)
        self.textview.modify_font(self.font_desc)

    def scroll_down(self):
        v_adjustment = \
            self.scrolled_window.get_vadjustment()
        if v_adjustment.value == v_adjustment.upper - \
            v_adjustment.page_size:
            self.page_next()
            return
        if v_adjustment.value &lt; v_adjustment.upper - \
            v_adjustment.page_size:
            new_value = v_adjustment.value + \
                v_adjustment.step_increment
            if new_value &gt; v_adjustment.upper - \
                v_adjustment.page_size:
                new_value = v_adjustment.upper - \
                    v_adjustment.page_size
            v_adjustment.value = new_value

    def scroll_up(self):
        v_adjustment = \
            self.scrolled_window.get_vadjustment()
        if v_adjustment.value == v_adjustment.lower:
            self.page_previous()
            return
        if v_adjustment.value &gt; v_adjustment.lower:
            new_value = v_adjustment.value - \
                v_adjustment.step_increment
            if new_value &lt; v_adjustment.lower:
                new_value = v_adjustment.lower
            v_adjustment.value = new_value

    def show_page(self, page_number):
        global PAGE_SIZE, current_word
        position = self.page_index[page_number]
        self.etext_file.seek(position)
        linecount = 0
        label_text = ''
        textbuffer = self.textview.get_buffer()
        while linecount &lt; PAGE_SIZE:
            line = self.etext_file.readline()
            label_text = label_text + \
                unicode(line, 'iso-8859-1')
            linecount = linecount + 1
        textbuffer.set_text(label_text)
        self.textview.set_buffer(textbuffer)
        self.word_tuples = \
            speech.prepare_highlighting(label_text)
        self.words_on_page = \
            speech.add_word_marks(self.word_tuples)

    def save_extracted_file(self, zipfile, filename):
        "Extract the file to a temp directory for viewing"
        filebytes = zipfile.read(filename)
        f = open("/tmp/" + filename, 'w')
        try:
            f.write(filebytes)
        finally:
            f.close()

    def read_file(self, filename):
        "Read the Etext file"
        global PAGE_SIZE

        if zipfile.is_zipfile(filename):
            self.zf = zipfile.ZipFile(filename, 'r')
            self.book_files = self.zf.namelist()
            self.save_extracted_file(self.zf, \
                self.book_files[0])
            currentFileName = "/tmp/" + self.book_files[0]
        else:
            currentFileName = filename

        self.etext_file = open(currentFileName,"r")
        self.page_index = [ 0 ]
        linecount = 0
        while self.etext_file:
            line = self.etext_file.readline()
            if not line:
                break
            linecount = linecount + 1
            if linecount &gt;= PAGE_SIZE:
                position = self.etext_file.tell()
                self.page_index.append(position)
                linecount = 0
        if filename.endswith(".zip"):
            os.remove(currentFileName)

    def delete_cb(self, widget, event, data=None):
        speech.stop()
        return False

    def destroy_cb(self, widget, data=None):
        speech.stop()
        gtk.main_quit()

    def main(self, file_path):
        self.window = gtk.Window(gtk.WINDOW_TOPLEVEL)
        self.window.connect("delete_event", self.delete_cb)
        self.window.connect("destroy", self.destroy_cb)
        self.window.set_title("Read Etexts Activity")
        self.window.set_size_request(800, 600)
        self.window.set_border_width(0)
        self.read_file(file_path)
        self.scrolled_window = gtk.ScrolledWindow(
            hadjustment=None, vadjustment=None)
        self.textview = gtk.TextView()
        self.textview.set_editable(False)
        self.textview.set_left_margin(50)
        self.textview.set_cursor_visible(False)
        self.textview.connect("key_press_event",
            self.keypress_cb)
        self.font_desc = pango.FontDescription("sans 12")
        self.textview.modify_font(self.font_desc)
        self.show_page(0)
        self.scrolled_window.add(self.textview)
        self.window.add(self.scrolled_window)
        self.textview.show()
        self.scrolled_window.show()
        self.window.show()
        gtk.main()

if __name__ == "__main__":
    try:
        opts, args = getopt.getopt(sys.argv[1:], "")
        ReadEtextsActivity().main(args[0])
    except getopt.error, msg:
        print msg
        print "This program has no options"
        sys.exit(2)
</pre>
<p>The program <strong>ReadEtextsTTS</strong> needs only a few changes to enable speech. The first one checks for the existence of the gstreamer plugin:
</p>
<pre>speech_supported = True

try:
    import gst
    gst.element_factory_make('espeak')
    print 'speech supported!'
except Exception, e:
    speech_supported = False
    print 'speech not supported!'
</pre>
<p> This code detects whether the plugin is installed by importing the Python library associated with it, named "gst", and then asking gstreamer to create an <strong>espeak</strong> element.&#160; If either step fails it throws an <strong>Exception</strong>, and we catch that Exception and use it to set a variable named <strong>speech_supported</strong> to <strong>False</strong>.&#160; We can check the value of this variable in other places in the program to make a program that works with Text To Speech if it is available and without it if it is not.&#160; Making a program work in different environments by doing these kinds of checks is called <em>degrading gracefully</em>.&#160; Catching exceptions on imports is a common technique in Python to achieve this.&#160; If you want your Activity to run on older versions of Sugar you may find yourself using it.
</p>
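<p>The same pattern works for any optional library.&#160; As a made-up illustration, an Activity that wants to use the newer toolbar classes but still run on old versions of Sugar might do something like this:
</p>
<pre>new_toolbars = True

try:
    # This import only succeeds on newer versions of Sugar.
    from sugar.graphics.toolbarbox import ToolbarBox
except ImportError:
    new_toolbars = False
</pre>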
<p>The next bit of code we're going to look at highlights a word in the textview and scrolls the textview to keep the highlighted word visible.
</p>
<pre>class ReadEtextsActivity():
    def __init__(self):
        "The entry point to the Activity"
        speech.highlight_cb = self.highlight_next_word
        # print speech.voices()

    def highlight_next_word(self, word_count):
        if word_count &lt; len(self.word_tuples):
            word_tuple = self.word_tuples[word_count]
            textbuffer = self.textview.get_buffer()
            tag = textbuffer.create_tag()
            tag.set_property('weight', pango.WEIGHT_BOLD)
            tag.set_property( 'foreground', "white")
            tag.set_property( 'background', "black")
            iterStart = \
                textbuffer.get_iter_at_offset(word_tuple[0])
            iterEnd = \
                textbuffer.get_iter_at_offset(word_tuple[1])
            bounds = textbuffer.get_bounds()
            textbuffer.remove_all_tags(bounds[0], bounds[1])
            textbuffer.apply_tag(tag, iterStart, iterEnd)
            v_adjustment = \
                self.scrolled_window.get_vadjustment()
            max = v_adjustment.upper - v_adjustment.page_size
            max = max * word_count
            max = max / len(self.word_tuples)
            v_adjustment.value = max
        return True
</pre>
<p>In the <em>__init__()</em> method we assign a method called <em>highlight_next_word()</em> to a variable called <em>highlight_cb</em> in <strong>speech.py</strong>.&#160; This gives <strong>speech.py</strong> a way to call that method every time a new word in the textview needs to be highlighted.
</p>
<p>The next line will print the list of tuples containing Voice names to the terminal if you uncomment it.&#160; We aren't letting the user change voices in this application but it would not be difficult to add that feature.
  <br/></p>
<p>The code for the method that highlights the words comes after that in the listing above.&#160; What it does is look in a list of tuples that contain the starting and ending offsets of every word in the textview's text buffer.&#160; The caller of this method passes in a word number (for instance the first word in the buffer is word 0, the second is word 1, and so on).&#160; The method looks up that entry in the list, gets its starting and ending offsets, removes any previous highlighting, then highlights the new text.&#160; In addition to that it figures out what fraction of the total number of words the current word is and scrolls the textview enough to make sure that word is visible.
</p>
<p>Of course this method works best on pages without many blank lines, which fortunately is most pages.&#160; It does not work so well on title pages.&#160; An experienced programmer could probably come up with a more elegant and reliable way of doing this scrolling.&#160; Let me know what you come up with.
</p>
<p>Further down we see the code that gets the keystrokes the user enters and does speech-related things with them:
</p>
<pre>    def keypress_cb(self, widget, event):
        "Respond when the user presses one of the arrow keys"
        global done
        global speech_supported
        keyname = gtk.gdk.keyval_name(event.keyval)
        if keyname == 'KP_End' and speech_supported:
            if speech.is_paused() or speech.is_stopped():
                speech.play(self.words_on_page)
            else:
                speech.pause()
            return True
        if speech_supported and speech.is_stopped() == False \
            and speech.is_paused() == False:
            # If speech is in progress, ignore other keys.
            return True
        if keyname == '7':
            speech.pitch_down()
            speech.say('Pitch Adjusted')
            return True
        if keyname == '8':
            speech.pitch_up()
            speech.say('Pitch Adjusted')
            return True
        if keyname == '9':
            speech.rate_down()
            speech.say('Rate Adjusted')
            return True
        if keyname == '0':
            speech.rate_up()
            speech.say('Rate Adjusted')
            return True
</pre>
<p>As you can see, the functions we're calling are all in the file <strong>speech.py</strong> that we imported.&#160; You don't have to fully understand how these functions work to make use of them in your own Activities.&#160; Notice that the code as written prevents the user from changing pitch or rate while speech is in progress.&#160; Notice also that there are two different methods in speech.py for doing speech.&#160; <strong>play()</strong> is the method for doing text to speech with word highlighting.&#160; <strong>say()</strong> is for saying short phrases produced by the user interface, in this case "Pitch Adjusted" and "Rate Adjusted".&#160; Of course if you put code like this in your Activity you would use the <strong>_()</strong> function so these phrases could be translated into other languages.
</p>
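<p>In an Activity that would look something like this sketch (assuming the usual gettext setup Sugar Activities use):
</p>
<pre>import speech
from gettext import gettext as _

# Wrapping the phrase in _() lets translators supply a localized version.
speech.say(_('Pitch Adjusted'))
</pre>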
<p>There is one more bit of code we need for text to speech with highlighting: we have to prepare the words that will be spoken so they can be highlighted in the textviewer.
</p>
<pre>    def show_page(self, page_number):
        global PAGE_SIZE, current_word
        position = self.page_index[page_number]
        self.etext_file.seek(position)
        linecount = 0
        label_text = ''
        textbuffer = self.textview.get_buffer()
        while linecount &lt; PAGE_SIZE:
            line = self.etext_file.readline()
            label_text = label_text + unicode(line, \
                'iso-8859-1')
            linecount = linecount + 1
        textbuffer.set_text(label_text)
        self.textview.set_buffer(textbuffer)
        self.word_tuples = \
            speech.prepare_highlighting(label_text)
        self.words_on_page = \
            speech.add_word_marks(self.word_tuples)
</pre>
<p>The beginning of this method reads a page's worth of text into a string called label_text and puts it into the textview's buffer.&#160; The last two lines split the text into words, leaving in punctuation, and put each word and its beginning and ending offsets into a tuple.&#160; The tuples are added to a List.
</p>
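<p>To make that concrete, here is what <strong>speech.prepare_highlighting()</strong> would return for a short made-up sentence (the offsets are the begin and end positions of each word within the string):
</p>
<pre>import speech

print speech.prepare_highlighting('The quick brown fox jumps.')
# [(0, 3, 'The'), (4, 9, 'quick'), (10, 15, 'brown'),
#  (16, 19, 'fox'), (20, 26, 'jumps.')]
</pre>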
<p> <strong>speech.add_word_marks()</strong> converts the words in the List to a document in <em>SSML</em> (<em>Speech Synthesis Markup Language</em>) format.&#160; SSML is a standard for adding tags (sort of like the tags used to make web pages) to text to tell speech software what to do with the text.&#160; We're just using a very small part of this standard to produce a marked up document with a mark between each word, like this:
</p>
<pre>&lt;speak&gt;
    &lt;mark name="0"/&gt;The&lt;mark name="1"/&gt;quick&lt;mark name-"2"/&gt;
    brown&lt;mark name="3"/&gt;fox&lt;mark name="4"/&gt;jumps
&lt;/speak&gt;</pre>
<p>When espeak reads this file it will do a <em>callback</em> into our program every time it reads one of the mark tags.&#160; The callback will contain the number of the word in the <strong>word_tuples</strong> List, which it will get from the <strong>name</strong> attribute of the <strong>mark</strong> tag.&#160; In this way the method being called will know which word to highlight.&#160; The advantage of using the mark name rather than just highlighting the next word in the textviewer is that if espeak should fail to do one of the callbacks the highlighting won't be thrown out of sync.&#160; This was a problem with speech-dispatcher.
</p>
<p>A callback is just what it sounds like.&#160; When one program calls another program it can pass in a function or method of its own that it wants the second program to call when something happens.
  <br/></p>
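<p>In our case the "something happens" is espeak reaching a mark.&#160; A sketch of the idea, with a made-up function name, looks like this:
</p>
<pre>import speech

def my_highlight(word_number):
    # speech.py will call this function once per mark as espeak
    # reaches each word on the page.
    print 'time to highlight word', word_number

# Pass the function itself (no parentheses) to the other module,
# which will call it back later.
speech.highlight_cb = my_highlight
</pre>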
<p>To try out the new program run
</p>
<pre>./ReadEtextsTTS.py <em>bookfile</em></pre>
<p>from the Terminal.&#160; You can adjust pitch and rate up and down using the keys <strong>7, 8, 9</strong>, and <strong>0</strong> on the top row of the keyboard.&#160; It should say "Pitch Adjusted" or "Rate Adjusted" when you do that.&#160; You can start, pause, and resume speech with highlighting by using the <strong>End</strong> key on the keypad.&#160; (On the XO laptop the "game" keys are mapped to what would be the numeric keypad on a normal keyboard.&#160; This makes these keys handy when the XO is folded into tablet mode and the keyboard is not available.)&#160; You cannot change pitch or rate while speech is in progress.&#160; Attempts to do that will be ignored.&#160; The program in action looks like this:
</p>
<p><img alt="espeak3.jpg" src="static/ActivitiesGuideSugar-espeak3-en.jpg" height="465" width="600"/></p>
<p>That brings us to the end of the topic of Text to Speech.&#160; If you'd like to see more, the Git repository for this book has a few more sample programs that use the gstreamer espeak plugin.&#160; These examples were created by the author of the plugin and demonstrate some other ways you can use it.&#160; There's even a "choir" program that demonstrates multiple voices speaking at the same time.
  <br/></p></body></html>