- collaboration
	- if performance is okay, should show multiple faces - one for each person
	- if not, then just share settings and let any person type

- eyes should look some z-distance towards the user
	- this should prevent the cross-eyed and mismatched y-coordinate problems
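	A rough sketch of the idea, not the activity's actual drawing code: place the gaze target z units in front of the screen and project each eye's unit gaze vector back onto the screen plane. The function name, its parameters, and the coordinates in the usage note are all made up for illustration.

```python
import math


def pupil_offset(eye_x, eye_y, target_x, target_y, z, max_offset):
    """Return the (dx, dy) pupil displacement for one eye.

    The target is assumed to float z units in front of the screen, so the
    gaze vector is (target - eye, z). Scaling its on-screen components by
    the full 3D length keeps the offsets of both eyes nearly parallel
    when z is large compared with the eye spacing.
    """
    gx = target_x - eye_x
    gy = target_y - eye_y
    length = math.sqrt(gx * gx + gy * gy + z * z)
    if length == 0:
        # Target sits exactly on the eye centre: leave the pupil centred.
        return (0.0, 0.0)
    return (max_offset * gx / length, max_offset * gy / length)
```

	With eyes at (100, 100) and (200, 100) and a target at (120, 130), z = 0 gives clearly different y offsets for the two pupils, while z = 300 makes them almost equal.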

- i18n

- speechd
	- get newer version with callbacks, list_voices, etc.
	- try to insert lots of index_marks
	- try to pipe audio back to Speak to get waveform

- mouth shape should be driven by phonemes
	- try C-API callbacks
		- we get callbacks for phonemes with really big numbers - not sure how to interpret them
		- could use a multi-step process: text -> human-readable phonemes, then add <mark> between each one, then speak
		- either way need to handle RETRIEVAL mode and route audio to the right place
	- try to wrap espeak API with SWIG
	- get per-phoneme callbacks from speechd?
		- can we send pre-phonemed [[...]] text to speechd?
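	A minimal sketch of the "add <mark> between each one" step, assuming the phonemes are already available as a list of mnemonic strings. Whether the synthesizer accepts [[...]] phoneme input inside SSML, and whether splitting the phonemes this way changes how they sound, is untested here. The function name and mark naming scheme are hypothetical.

```python
from xml.sax.saxutils import escape


def mark_phonemes(phonemes):
    """Build an SSML string with a numbered <mark> before each phoneme.

    Each phoneme is wrapped in [[...]] (assumed pre-phonemed input syntax)
    and XML-escaped so that mnemonics containing & or < stay well-formed.
    """
    parts = ["<speak>"]
    for index, phoneme in enumerate(phonemes):
        parts.append('<mark name="%d"/>' % index)
        parts.append("[[%s]]" % escape(phoneme))
    parts.append("</speak>")
    return "".join(parts)
```

	The mark name arriving in a callback would then index straight into the phoneme list, which could pick the mouth shape.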

- words/syllables should highlight as it speaks (karaoke-style)

- repackage face into a widget
- eyes should blink
- there should be a nose
- there should be a choice of Googly vs Normal eye motion (keep y-coords level)
- use XO colors
- mouth doesn't close all the way at the end sometimes?
	- especially when using fft and rate is very fast
- large numbers aren't spoken correctly
- eyes should track when dragging sliders in the toolbar

- adjusting rate, pitch, etc. should say something more informative (like "faster", "slower", etc.)

- read-a-story mode
	- list of stories to read
	- easy to add new ones
	- play/pause
	- remember where you left off
	- this sounds like maybe a different activity?

- predictive typing à la Stephen Hawking's talking computer
	- use a simple dictionary for letters, weighted by frequency of use
	- use a Markov chain for words, seeded with some pre-computed frequencies, but trained by use
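	The word-level part could look roughly like this: a first-order Markov chain kept as bigram counts, seeded from pre-computed counts and updated as sentences are spoken. The class name, method names, and seed format are assumptions for illustration only; the letter-level dictionary is not covered.

```python
from collections import Counter, defaultdict


class WordPredictor:
    """First-order Markov chain over words, stored as bigram counts."""

    def __init__(self, seed_counts=None):
        # seed_counts maps (previous_word, next_word) -> count.
        self.next_words = defaultdict(Counter)
        for (prev, word), count in (seed_counts or {}).items():
            self.next_words[prev][word] += count

    def train(self, sentence):
        """Update the counts from a sentence the user just entered."""
        words = sentence.lower().split()
        for prev, word in zip(words, words[1:]):
            self.next_words[prev][word] += 1

    def suggest(self, prev, limit=3):
        """Return up to `limit` likely next words, most frequent first."""
        counts = self.next_words.get(prev.lower(), Counter())
        return [word for word, _count in counts.most_common(limit)]
```

	Seeding with {("good", "morning"): 5, ("good", "night"): 2} suggests "morning" first after "good"; training on "good night" four times moves "night" to the top.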

- language translation
	I typed "open source machine translation" into Google and spent a couple of hours reading.
	Start here: http://events.ccc.de/congress/2006/Fahrplan/events/1701.en.html
	This one seems quite nice: http://www.statmt.org/moses/
	The language models + phrase tables are large (200-400 MB)
	An open web translation service would be ideal for space, but requires connectivity
	Could try: http://www.google.com/language_tools?hl=en
	http://www.google.com/support/contact/?translate=1
	http://groups.google.com/group/google-translate

[done] try speechd API
[done] fix mouth corners by using end caps or a closed shape
[done] eyes should track the text cursor when typing
[done] eyes should float back to center after a while
[done] up/down arrows should cycle through old sentences
[done] text should not disappear until after the sentence is over
[done] should save state to journal