Web   ·   Wiki   ·   Activities   ·   Blog   ·   Lists   ·   Chat   ·   Meeting   ·   Bugs   ·   Git   ·   Translate   ·   Archive   ·   People   ·   Donate
summaryrefslogtreecommitdiffstats
path: root/rainbow/permissions/manifest.txt
blob: cf0ff5453ce89b29324524c725e663cd4dd556aa (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
Verification System Data
========================

:Author: Michael Stone <michael@laptop.org>, Noah Kantrowitz <noah@laptop.org>
:Date: August 8, 2007

Our system for update verification manipulates four kinds of data:
envelopes, keys, credentials, and manifests.

Presently, we intend to use JSON as a transfer encoding because we only need to
represent tree-structured data and because we want an encoding that is easily
and safely parseable in pure Python and in other languages.

If we learn that we need a canonical encoding, a more space-efficient encoding,
or an encoding that can be more efficiently streamed, then we will consider
alternatives. 


Envelopes
---------

Envelopes are tuples of the form

  (version, type, data) :: (Integer, String, Value)
     ^       ^     ^
     |       |     \________ a JSON value in a format governed by (version, type)
     |       \______ a hint about how to interpret the data field that follows.
     \_________ an opaque version identifier; see below.

The version identifier is intended to specify a minimal dialect that must be
supported by compliant clients. Backwards-compatible extensions of this dialect
may be included in an enveloped value without changing the version identifier. 

  Note: Noah suggests make the version identifier transparent and declare it to
  be an integer with a a monotonically increasing value. I'm okay with this,
  but since we anticipate changing the identifer only when making
  backwards-incompatible protocol changes, I think it might as well be opaque.

Keys
----

Keys are envelopes with version 1 and type "key". A key's datum consists of a
tuple

  (algorithm, fingerprint, key)  :: (String, String, String)
       ^          ^         ^
       |          |         \______ an opaque string specifiying the actual key
       |          \______ an opaque string to use to refer to the key 
       \_________ an opaque algorithm identifier; 

In practice, we might see a key encoded as:
  [1, "key", ["rsa", "12508713460986140897", "AE1908579871250981230896109761"]]


Credentials
-----------

A credential is an envelope with version 1 and type "credential". Conceptually,
its datum consists of a relation whose tuples are signatures of the form:

  (signature-algorithm, key, message, signature)

However, in practice, the message is defined implicitly by context (i.e. as the
hash of the manifest with which this credential is paired) and we wish to use
the credential/manifest system to help detect accidental corruption as early as
possible, including corruption of the manifest. 

Therefore, we actually propose to represent the datum of a manifest credential
as a tuple of

  (hash-alg, sig-alg, expected-hash, key-identifier, signature)

We include the manifest hash as a convenient way to check that the manifest
was not accidentally damaged in transit (e.g. by a flash corruption bug). 

The key fingerprint identifies the key that should be used to verify the
signature.

The hash-algorithm and signature-algorithm are used to verify the signature
against the *computed* manifest-hash (not the one cached in the signature
message).

Finally, the signature itself is opaque.

In actual JSON, we might imagine that such a message would look like:
  
  [1, "credential", [ ["sha1", "rsa", "DEADBEEF", "123MYKEY", "0PAQU3S1G"], 
                      ["whirlpool", "ecc", "123095", "9912KEY", "8567A"] ] ]



Manifests
---------

Manifests are the most complicated piece of the system because they have
different conceptual and transfer-encoded forms.

Conceptually, a manifest's datum is a constraint-set. Implementations of this
constraint language are free to support checking as many or as few kinds of
constraint as they wish. Clearly, though, adequate verification of a file-system
against a manifest depends on both the presence of a constraint set that
sufficiently constrains the properties of the file-system being checked and a
correct and complete implementation of a checker for that constraint-set.

This model controls the implementation cost of both the verification machine
(and the manifest generation tool) while allowing great flexibility in the
underlying properties being verified and in the degree to which we compromise
our implementation goals in the face of time pressure.

Details
~~~~~~~

There are several basic constraints we wish to support. These include
constraints on file content, file type (normal, hardlink, symlink, device, ...),
permissions, ownership, and on the contents of directories.

Manifests are envelopes with version 1 and type "manifest". A manifest's datum
consists of a tuple of a dictionary of optimization hints and a dictionary of
constraints. Structurally, the datum looks like
  
  (Hint -> Value, Constraint -> (Query, Result))

Here "hints" are opaque strings that may be given some uninterpreted data as an
argument (e.g. "invert-filesystem" : true). These are provided because checking
some of the global constraints may rely on expensive-to-compute structures such
as an inverted index that maps inodes to the paths that point to them.

The constraint itself dictionary is also made slightly more complicated in the
name of speed and memory usage. While conceptually, we wish to store a ternary
relation with columns

  (constraint-algorithm, query, expected-result)

in practice, we expect that there will very few (~5) constraint algorithms and
very many (~100000) (query, expected-result) pairs.  Hence we suggest
optimizing for this case by grouping the relation's tuples by
constraint-algorithm, as shown above in the structural description.

A small example of a manifest might be:

  [1, "manifest", [
    {"invert-filesystem": ["/links"]},
    {"file-sha1" : [ ["/etc/passwd", "BLAH18AK"],
                     ["/bin/bash", "LA81985ED"]],
     "permissions" : [ ["/my/secret", 448] ],
     "same-inode" : [ [["/links/1", "/links/2", "/links/3"], null] ],
     "dir-contains" : [ ["/", ["bin", "sbin"]] ],
     "update-excludes": [ ["/security", null] ]
    }]]