Datastore
=========

An outline of the basic requirements and design decisions of the
Datastore used in Sugar.

Description
-----------

The Datastore (DS) is a content storage and indexing tool. It is
designed to provide efficient access to content while supplying both
browse and search models on top of this content. It does this without
demanding any notion of a content repository, location, or data API
from the consumer.

Additionally, the Datastore supports recovery of older versions of
content, as well as backups and remote storage.

Constraints
-----------

The system is designed to run in a hostile environment. Operation
runtime and space are both highly constrained, and only limited CPU
time/power is available to complete tasks.

1. Limited memory.
2. Limited processing power.
3. Limited storage.
4. Attempt to limit the number of writes to the medium.
5. All interactions through the public interface should return as soon
   as possible. There is a 10 second window available to DBus; failure
   to return in this time causes errors. Failure to return in under
   0.2 seconds can result in the UI blocking.

Point 5 seems a system design flaw to me. The need for atomic I/O bound
operations is at odds with a completely asynchronous model, and the UI
shouldn't call into such a system in a way that would make it block.

Versioning/Storage
------------------

The datastore needs to remain efficient in terms of both space and time
while delivering content in a reliable fashion. Additionally, it is
designed to function in an environment where we hope to minimize the
number of writes.

1. Recovery of previous versions of documents.
2. Efficient storage. Should work with both normal text and binary
   data.
3. Should allow the archival of old versions, removing the need to
   store the entire version history.
4. Should support synchronization with remote stores.

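The versioning requirements above can be illustrated with a small
in-memory sketch. It is not the Datastore's actual implementation:
``VersionedStore`` and its methods are hypothetical names, and hashing
content with SHA-256 to skip unchanged writes is an assumption chosen
only to show one way of limiting writes while keeping old versions
recoverable. Payloads are ``bytes``, so text and binary data are handled
alike.

```python
import hashlib


class VersionedStore:
    """Toy in-memory version store. All names are hypothetical."""

    def __init__(self):
        self._blobs = {}    # sha256 hex digest -> bytes payload
        self._history = {}  # uid -> list of digests, oldest first
        self._base = {}     # uid -> number of versions archived so far

    def put(self, uid, data):
        """Record a new version of `uid` and return its version number."""
        digest = hashlib.sha256(data).hexdigest()
        versions = self._history.setdefault(uid, [])
        base = self._base.setdefault(uid, 0)
        if versions and versions[-1] == digest:
            # Content is unchanged: skip the write entirely.
            return base + len(versions) - 1
        self._blobs.setdefault(digest, data)
        versions.append(digest)
        return base + len(versions) - 1

    def get(self, uid, version=None):
        """Return the latest version, or a specific retained version."""
        versions = self._history[uid]
        if version is None:
            return self._blobs[versions[-1]]
        index = version - self._base[uid]
        if index < 0 or index >= len(versions):
            raise KeyError("version %d of %r is not held locally" % (version, uid))
        return self._blobs[versions[index]]

    def archive(self, uid, keep=1):
        """Drop all but the newest `keep` versions from this store.

        Returns the removed payloads, oldest first, so the caller can
        migrate them to long term storage.
        """
        versions = self._history[uid]
        cut = max(len(versions) - keep, 0)
        old, kept = versions[:cut], versions[cut:]
        archived = [self._blobs[d] for d in old]
        self._history[uid] = kept
        self._base[uid] += cut
        # Free payloads that no remaining version refers to.
        live = {d for ds in self._history.values() for d in ds}
        for digest in old:
            if digest not in live:
                self._blobs.pop(digest, None)
        return archived
```

Version numbers stay stable after archival in this sketch, so a request
for an archived version fails with ``KeyError`` rather than silently
returning different content.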

Searching/Indexing
------------------

The DS provides searchable metadata and content indexing for the vast
majority of content on the system.

1. Searches should reflect operations immediately, even though the
   operations happen asynchronously.
2. Fulltext searching of content should be possible and accessible,
   even through historic versions.
3. Fulltext should support stemming of common terms in a given
   language.

Archival/Backups
----------------

The system should provide a model for long term storage and backup
support. Old versions of content should be migrated to long term
storage. The system should provide a way to identify content no longer
needed by the runtime when:

1. Connected to a remote store
2. Connected to the school server
3. Space is limited
4. System is idle

The system should identify content subject to backups, begin a remote
transaction with the storage repo, migrate the old versions over an SSH
connection, and then remove the old versions and their index
information from the local store.

Remote Repositories
-------------------

The DS is capable of mounting additional stores and having them
function as a single unit. This can extend from USB devices to remote
network based storage (through the use of SSH).

If the model is SSH based then the remote stores don't require an
active server runtime, but they will have increased latency for common
operations, as indexes must be loaded on a per-request basis. There are
several options for managing this: counting on the remote OS to cache
the indexes, starting a server with a TTL, or running a per-user or
per-machine server.
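
The idea of several mounted stores behaving as a single unit can be
sketched as a thin wrapper that fans each query out to every mount. This
is only an illustration under assumptions: ``DictStore``,
``MountPoints``, and their methods are hypothetical names rather than
the Datastore API, and each store here is a plain dictionary instead of
a USB device or an SSH remote.

```python
class DictStore:
    """Stand-in for one store (local disk, USB device, SSH remote)."""

    def __init__(self, entries=None):
        self.entries = dict(entries or {})  # uid -> metadata dict

    def find(self, **query):
        """Return uids whose metadata matches every key in `query`."""
        return [uid for uid, meta in self.entries.items()
                if all(meta.get(k) == v for k, v in query.items())]


class MountPoints:
    """Presents every mounted store as one searchable unit."""

    def __init__(self):
        self._mounts = {}  # mount id -> store, kept in mount order

    def mount(self, mount_id, store):
        self._mounts[mount_id] = store

    def unmount(self, mount_id):
        del self._mounts[mount_id]

    def find(self, **query):
        """Query each mounted store and return (mount_id, uid) pairs."""
        results = []
        for mount_id, store in self._mounts.items():
            results.extend((mount_id, uid) for uid in store.find(**query))
        return results


ds = MountPoints()
ds.mount("local", DictStore({"a": {"mime": "text/plain"}}))
ds.mount("usb", DictStore({"b": {"mime": "text/plain"},
                           "c": {"mime": "image/png"}}))
matches = ds.find(mime="text/plain")
```

Returning the mount id alongside each uid lets a caller tell which store
holds a result, which matters when a remote mount has higher latency
than the local one.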