tests/xapianindex.txt


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97

The xapian index module can be used directly as follows

First clean up any old test data.

>>> index_home = "/tmp/xi"
>>> import os, sys, time, logging
>>> assert os.system('rm -rf %s' % index_home) == 0

# >>> logging.basicConfig(level=logging.DEBUG,
# ...                    format="%(asctime)-15s %(name)s %(levelname)s: %(message)s",
# ...                    stream=sys.stderr)
     

>>> from olpc.datastore.xapianindex import IndexManager
>>> from olpc.datastore import model
>>> im = IndexManager()
>>> im.connect(index_home)


Now add the file to the index.

>>> props = dict(title="PDF Document",
...              mime_type="application/pdf")


>>> uid = im.index(props, "test.pdf")

Let the async indexer do its thing. We ask the indexer if it has work
left, when it has none we expect our content to be indexed and searchable.

>>> im.complete_indexing()


Searching on an property of the content works.
>>> def expect(r, count=None):
...    if count: assert r[1] == count
...    return list(r[0])
>>> def expect_single(r):
...    assert r[1] == 1
...    return r[0].next()
>>> def expect_none(r):
...    assert r[1] == 0
...    assert list(r[0]) == []


>>> assert expect_single(im.search("PDF")).id == uid

Searching into the binary content of the object works as well.
>>> assert expect_single(im.search("peek")).id == uid

Specifying a search that demands a document term be found only in the
title works as well.

>>> assert expect_single(im.search('title:PDF')).id == uid
>>> expect_none(im.search('title:peek'))

Searching for documents that are PDF works as expected here. Here we
use the dictionary form of the query where each field name is given
and creates a search.
>>> assert expect_single(im.search(dict(mime_type='application/pdf'))).id == uid

Punctuation is fine.

>>> assert expect_single(im.search("Don't peek")).id == uid

As well as quoted strings

>>> assert expect_single(im.search(r'''"Don't peek"''')).id == uid


We can also issue OR styled queries over a given field by submitting
a list of queries to a given field.

>>> assert expect_single(im.search(dict(mime_type=["text/plain", 
...    'application/pdf']))).id == uid


But an OR query for missing values still return nothing.

>>> expect_none(im.search(dict(mime_type=["video/mpg", 
...    'audio/ogg'])))


Partial search... 
>>> assert expect_single(im.search(r'''pee*''')).id == uid


We also support tagging of documents.

>>> im.tag(uid, "foo bar")
>>> assert expect_single(im.search('tags:foo')).id == uid

Cleanly shut down.
>>> im.stop()

>>> assert os.system('rm -rf %s' % index_home) == 0