1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
|
The xapian index module can be used directly as follows
First clean up any old test data.
>>> index_home = "/tmp/xi"
>>> import os, sys, time, logging
>>> assert os.system('rm -rf %s' % index_home) == 0
# >>> logging.basicConfig(level=logging.DEBUG,
# ... format="%(asctime)-15s %(name)s %(levelname)s: %(message)s",
# ... stream=sys.stderr)
>>> from olpc.datastore.xapianindex import IndexManager
>>> from olpc.datastore import model
>>> im = IndexManager()
>>> im.connect(index_home)
Now add the file to the index.
>>> props = dict(title="PDF Document",
... mime_type="application/pdf")
>>> uid = im.index(props, "test.pdf")
Let the async indexer do its thing. We ask the indexer if it has work
left, when it has none we expect our content to be indexed and searchable.
>>> im.complete_indexing()
Searching on an property of the content works.
>>> def expect(r, count=None):
... if count: assert r[1] == count
... return list(r[0])
>>> def expect_single(r):
... assert r[1] == 1
... return r[0].next()
>>> def expect_none(r):
... assert r[1] == 0
... assert list(r[0]) == []
>>> assert expect_single(im.search("PDF")).id == uid
Searching into the binary content of the object works as well.
>>> assert expect_single(im.search("peek")).id == uid
Specifying a search that demands a document term be found only in the
title works as well.
>>> assert expect_single(im.search('title:PDF')).id == uid
>>> expect_none(im.search('title:peek'))
Searching for documents that are PDF works as expected here. Here we
use the dictionary form of the query where each field name is given
and creates a search.
>>> assert expect_single(im.search(dict(mime_type='application/pdf'))).id == uid
Punctuation is fine.
>>> assert expect_single(im.search("Don't peek")).id == uid
As well as quoted strings
>>> assert expect_single(im.search(r'''"Don't peek"''')).id == uid
We can also issue OR styled queries over a given field by submitting
a list of queries to a given field.
>>> assert expect_single(im.search(dict(mime_type=["text/plain",
... 'application/pdf']))).id == uid
But an OR query for missing values still return nothing.
>>> expect_none(im.search(dict(mime_type=["video/mpg",
... 'audio/ogg'])))
Partial search...
>>> assert expect_single(im.search(r'''pee*''')).id == uid
# We also support tagging of documents.
#
# >>> im.tag(uid, "foo bar")
# >>> assert expect_single(im.search('tags:foo')).id == uid
Cleanly shut down.
>>> im.stop()
>>> assert os.system('rm -rf %s' % index_home) == 0
|