author     Wade Brainerd <wadetb@gmail.com>  2008-05-23 22:59:37 (GMT)
committer  Wade Brainerd <wadetb@gmail.com>  2008-05-23 22:59:37 (GMT)
commit     9878512ab181ef56e82d91ed3e69ddbaa50520d0
tree       879e52bebdea44daa32afaaa8802c183fd9484ed /woip/README
parent     dd58bf72d6799438d8033cf7de6bc26a711734c3
Reorganization step 2.
Diffstat (limited to 'woip/README')
-rw-r--r--  woip/README  61
1 file changed, 61 insertions, 0 deletions
diff --git a/woip/README b/woip/README
new file mode 100644
index 0000000..4c80aac
--- /dev/null
+++ b/woip/README
@@ -0,0 +1,61 @@
+This is all one huge hack, with lots of debug code and hacks left in place.
+
+==Get started==
+
+cd c
+./configure
+make
+cd ..
+
+Get a Wikipedia XML dump (e.g. enwiki-20071018-pages-articles.xml.bz2) and
+place it in the root (wp) directory.
+
+cd sh
+./process ../<dump>
+cd ..
+
+The processing stage takes several hours (8 hours on a 2.16GHz MacBook Pro).
+(If someone wants to speed it up, implement xmlprocess.rb in C.) Once this is
+done, you can delete the original dump. If you get sick of waiting, use a dump
+of the Simple English Wikipedia, which is several orders of magnitude smaller
+than the standard English dumps.
+
+When done:
+
+To run the curses-based livesearch:
+cd sh
+./livesearch ../<dump>
+
+To run the web server:
+cd rb
+ruby -r server.rb -e 'WPServer.start_on(9000)' -- ../<dump>
+
+(Note that even though the above commands use the filename of the original dump,
+they don't actually depend on that file, only on the processed versions created
+by sh/process.)
+
+==Directory layout==
+
+rb/
+  bzipreader.rb (Ruby interface to c/bzipreader.c; supports streaming bz2 files)
+  index.rb (generate an article-to-block index using bzipreader.rb)
+  server.rb (Mongrel-based server for browsing WP dumps in a web browser)
+  xmlprocess.rb (generate a stripped, XML-less file from a vanilla WP dump)
+
+c/
+  bzipreader (locate, extract and decompress arbitrary blocks of bz2 files; sometimes
+    quite useful for purposes unrelated to this project)
+  lsearcher (use locate(1) search databases in interesting ways)
+  searcher (search a ternary search tree built with indexer)
+  indexer (generate a ternary search tree)
+  livesearch (use a curses-based interface to browse Wikipedia dumps)
+
+sh/
+  test (run outdated tests)
+  process (take a vanilla Wikipedia dump and create all the necessary support files)
+
+app/
+  the iPhone application itself
+
+c/bzipreader.c is based on bzip2recover.c, part of the bzip2 distribution.
+c/lsearcher.c is based on fastfind.c, part of the FreeBSD implementation of locate(1).
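The README describes c/bzipreader as locating and extracting arbitrary blocks of bz2 files, and notes that it is based on bzip2recover.c. The idea bzip2recover relies on is that every compressed bzip2 block starts with the 48-bit magic number 0x314159265359, but blocks are bit-aligned rather than byte-aligned, so finding them means sliding a 48-bit window over the file one bit at a time. The C sketch below shows only that scan. It is not the code in c/bzipreader.c, and the name find_block_bits is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* 48-bit magic (the BCD digits of pi) that starts every compressed bzip2 block. */
#define BZ_BLOCK_MAGIC 0x314159265359ULL
#define MASK48         0xFFFFFFFFFFFFULL

/* Hypothetical helper: scan buf for bzip2 block headers and record the bit
 * offset of each header's first bit in out (up to max_out entries).
 * Returns the total number of headers found, which may exceed max_out. */
size_t find_block_bits(const unsigned char *buf, size_t len,
                       uint64_t *out, size_t max_out)
{
    uint64_t window = 0;   /* the most recent 48 bits seen */
    size_t found = 0;

    for (uint64_t bit = 0; bit < (uint64_t)len * 8; bit++) {
        /* bzip2 packs bits most-significant-bit first within each byte. */
        unsigned b = (buf[bit / 8] >> (7 - bit % 8)) & 1u;
        window = ((window << 1) | b) & MASK48;

        if (bit >= 47 && window == BZ_BLOCK_MAGIC) {
            if (found < max_out)
                out[found] = bit - 47;
            found++;
        }
    }
    return found;
}
```

A match can in principle occur by chance inside compressed data, so a real tool should confirm a candidate offset by trying to decompress from it. bzip2recover also goes further and rewrites each block into its own small .bz2 file; that step is omitted here.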
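indexer and searcher are described as building and searching a ternary search tree. Below is a minimal in-memory sketch of that data structure in C, assuming NUL-terminated keys such as article titles. The node layout and the tst_* names are hypothetical, and the on-disk format the real indexer writes is not described in this README. tst_count_prefix hints at why the structure suits as-you-type search: every key that starts with a given prefix sits at or beneath a single node.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node layout for an in-memory ternary search tree. */
typedef struct tst_node {
    char ch;                        /* character stored at this node */
    int is_end;                     /* nonzero if a key ends here */
    struct tst_node *lo, *eq, *hi;  /* less-than, equal (next char), greater-than */
} tst_node;

/* Insert a non-empty NUL-terminated key and return the subtree root.
 * Sketch only: if allocation fails, the key is simply left out. */
tst_node *tst_insert(tst_node *n, const char *key)
{
    if (*key == '\0')
        return n;
    if (n == NULL) {
        n = calloc(1, sizeof *n);
        if (n == NULL)
            return NULL;
        n->ch = *key;
    }
    if (*key < n->ch)
        n->lo = tst_insert(n->lo, key);
    else if (*key > n->ch)
        n->hi = tst_insert(n->hi, key);
    else if (key[1] == '\0')
        n->is_end = 1;
    else
        n->eq = tst_insert(n->eq, key + 1);
    return n;
}

/* Walk to the node holding the last character of a non-empty s, or NULL. */
static const tst_node *tst_find(const tst_node *n, const char *s)
{
    while (n != NULL) {
        if (*s < n->ch)
            n = n->lo;
        else if (*s > n->ch)
            n = n->hi;
        else if (s[1] == '\0')
            return n;
        else {
            s++;
            n = n->eq;
        }
    }
    return NULL;
}

/* Exact-match lookup. */
int tst_contains(const tst_node *root, const char *key)
{
    const tst_node *n = (*key != '\0') ? tst_find(root, key) : NULL;
    return n != NULL && n->is_end;
}

/* Count every key stored in a subtree. */
static size_t tst_count(const tst_node *n)
{
    if (n == NULL)
        return 0;
    return (size_t)(n->is_end != 0)
         + tst_count(n->lo) + tst_count(n->eq) + tst_count(n->hi);
}

/* Count keys that start with prefix: they all live at or below one node. */
size_t tst_count_prefix(const tst_node *root, const char *prefix)
{
    if (*prefix == '\0')
        return tst_count(root);
    const tst_node *n = tst_find(root, prefix);
    return n ? (size_t)(n->is_end != 0) + tst_count(n->eq) : 0;
}

/* Release every node in the tree. */
void tst_free(tst_node *n)
{
    if (n == NULL)
        return;
    tst_free(n->lo);
    tst_free(n->eq);
    tst_free(n->hi);
    free(n);
}
```

For example, after inserting "apple", "apply" and "ape", tst_count_prefix(root, "app") counts the two keys beneath the second 'p' node without visiting "ape".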
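c/lsearcher.c is based on fastfind.c from FreeBSD's locate(1). The locate database that fastfind.c reads is front-compressed: paths are sorted, and each entry stores only how the shared-prefix length changes relative to the previous path, plus the new suffix. The sketch below shows that idea with a hypothetical one-byte-delta record layout. It is not locate's exact on-disk format (which also applies bigram coding), and front_encode, front_search and MAX_PATH_LEN are illustrative names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record layout, NOT locate's real on-disk format: one signed
 * byte holding the change in shared-prefix length versus the previous
 * entry, then the differing suffix, then '\n'. Assumes the paths are sorted,
 * contain no '\n', and are shorter than MAX_PATH_LEN bytes, so every delta
 * fits in a signed byte. */
#define MAX_PATH_LEN 128

/* Front-code n sorted paths into out; returns the number of bytes written. */
size_t front_encode(const char *const *paths, size_t n, char *out)
{
    const char *prev = "";
    int prev_shared = 0;
    size_t pos = 0;

    for (size_t i = 0; i < n; i++) {
        int shared = 0;
        while (prev[shared] != '\0' && prev[shared] == paths[i][shared])
            shared++;
        out[pos++] = (char)(signed char)(shared - prev_shared);
        size_t slen = strlen(paths[i] + shared);
        memcpy(out + pos, paths[i] + shared, slen);
        pos += slen;
        out[pos++] = '\n';
        prev_shared = shared;
        prev = paths[i];
    }
    return pos;
}

/* Decode the stream, rebuilding each path in place from the previous one,
 * and count the paths that contain needle (a naive locate-style scan). */
size_t front_search(const char *buf, size_t len, const char *needle)
{
    char path[MAX_PATH_LEN];
    int shared = 0;
    size_t pos = 0, hits = 0;

    while (pos < len) {
        shared += (signed char)buf[pos++];   /* new shared-prefix length */
        size_t k = (size_t)shared;           /* path[0..k) is reused as-is */
        while (pos < len && buf[pos] != '\n')
            path[k++] = buf[pos++];
        pos++;                               /* skip the '\n' terminator */
        path[k] = '\0';
        if (strstr(path, needle) != NULL)
            hits++;
    }
    return hits;
}
```

Because consecutive sorted paths usually share long prefixes, most entries shrink to one byte plus a short suffix, and a search can scan the whole database while holding only one decoded path in memory at a time.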