From 48a6e2dff631c35081e592fdc8b3b36de0363281 Mon Sep 17 00:00:00 2001
From: Joshua Minor
Date: Sat, 15 Mar 2008 19:32:08 +0000
Subject: Added po files. Added TODO.txt Added archived versions of speak.xo

---
(limited to 'docs')

diff --git a/docs/TODO.txt b/docs/TODO.txt
new file mode 100644
index 0000000..d053206
--- /dev/null
+++ b/docs/TODO.txt
@@ -0,0 +1,66 @@
+- collaboration
+    - if performance is okay, should show multiple faces - one for each person
+    - if not, then just share settings and let any person type
+
+- eyes should look some z-distance towards the user
+    - this should prevent the cross-eyed and mismatched y-coordinate problems
+
+- i18n
+
+- speechd
+    - get newer version with callbacks, list_voices, etc.
+    - try to insert lots of index_marks
+    - try to pipe audio back to Speak to get waveform
+
+- mouth shape should be driven by phonemes
+    - try C-API callbacks
+    - we get callbacks for phonemes with really big numbers - not sure how to interpret them
+    - could use multi-step process: text->human readable phonemes, then add between each one, then speak
+    - either way need to handle RETRIEVAL mode and route audio to the right place
+    - try to wrap espeak API with SWIG
+    - get per-phoneme callbacks from speechd?
+    - can we send pre-phonemed [[...]] text to speechd?
+
+- words/syllables should highlight as it speaks (karaoke-style)
+
+- repackage face into a widget
+- eyes should blink
+- there should be a nose
+- there should be a Googly vs Normal eye motion (keep y-coords level)
+- use XO colors
+- mouth doesn't close all the way at the end sometimes?
+    - especially when using fft and rate is very fast
+- large numbers aren't spoken correctly
+- eyes should track when dragging sliders in the toolbar
+
+- adjusting rate, pitch, etc. should say something more informative (like "faster", "slower", etc.)
+
+- read-a-story mode
+    - list of stories to read
+    - easy to add new ones
+    - play/pause
+    - remember where you left off
+    - this sounds like maybe a different activity?
+
+- predictive typing ala Stephen Hawking's talking computer
+    - use a simple dictionary for letters, weighted by frequency of use
+    - use a markov chain for words, seeded with some pre-computed frequencies, but trained by use
+
+- language translation
+    I typed "open source machine translation" into Google and spent a couple of hours reading.
+    Start here: http://events.ccc.de/congress/2006/Fahrplan/events/1701.en.html
+    This one seems quite nice: http://www.statmt.org/moses/
+    The language models + phrase tables are large (200-400 MB)
+    An open web translation service would be ideal for space, but requires connectivity
+    Could try: http://www.google.com/language_tools?hl=en
+               http://www.google.com/support/contact/?translate=1
+               http://groups.google.com/group/google-translate
+
+[done] try speechd API
+[done] fix mouth corners by using end caps or a closed shape
+[done] eyes should track the text cursor when typing
+[done] eyes should float back to center after a while
+[done] up/down arrows should cycle through old sentences
+[done] text should not disappear until after the sentence is over
+[done] should save state to journal
+
-- 
cgit v0.9.1
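
Note on the "predictive typing" item above: a minimal sketch of the word-level markov chain idea, assuming a first-order (bigram) chain is enough. The class name WordPredictor, its methods, and the seed counts are hypothetical and not part of Speak; the seed dictionary stands in for the "pre-computed frequencies", and train() is the "trained by use" step. The letter dictionary from the first sub-item is only approximated here by prefix filtering.

```python
from collections import Counter, defaultdict


class WordPredictor:
    """First-order Markov (bigram) next-word predictor. Hypothetical helper."""

    def __init__(self, seed_counts=None):
        # counts[prev][nxt] = how many times `nxt` has followed `prev`
        self.counts = defaultdict(Counter)
        # Seed with pre-computed frequencies, given as {(prev, nxt): count}.
        for (prev, nxt), n in (seed_counts or {}).items():
            self.counts[prev][nxt] += n

    def train(self, sentence):
        """Update the chain from a sentence the user actually typed."""
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            self.counts[prev][nxt] += 1

    def suggest(self, prev_word, prefix="", k=3):
        """Return up to k likely followers of prev_word that start with prefix."""
        followers = self.counts.get(prev_word.lower(), Counter())
        ranked = [w for w, _ in followers.most_common() if w.startswith(prefix)]
        return ranked[:k]


if __name__ == "__main__":
    # Made-up seed counts, for illustration only.
    predictor = WordPredictor({("good", "morning"): 5, ("good", "night"): 3})
    print(predictor.suggest("good"))
    predictor.train("good night good night good night")
    print(predictor.suggest("good"))
    print(predictor.suggest("good", prefix="m"))
```

Keeping raw counts rather than probabilities means seeding and training are the same operation, so frequent use gradually outweighs the seed data without any separate re-normalisation step.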