== Problem: Internal Coupling, Interfaces, and User Experience == #???

The last few weeks of broader use of the rainbow codebase have revealed three problems that we need to fix in preparation for broader deployment.

Situation:

First, olpc-update is too strongly coupled to Rainbow. This is a problem because it makes it hard to modify either piece of software without unintentionally breaking the other.

Next, Rainbow does not have a workable user interface. Users of the software have universally been unable to follow or to capture rainbow traces and are frequently frustrated because rainbow fails to offer human-readable explanations of why it is failing. Two important facets of this issue are the violations of the Principle of Least Surprise and the Principle of Silence, and the fact that you can only interpret the traces if you know exactly what the trace is *supposed* to be. Also, the traces are not packaged in a form convenient for uploading to me or, better, for automatic aggregation.

Finally, Rainbow's internal interfaces are too strongly coupled. This hinders development because, without a test suite, console entry points, and cut-points (in the AOP sense), it's very hard to modify the software without understanding the whole code base. In particular, you can't really produce a known-good initial configuration and then iterate changes against that configuration.

Thoughts:

I intend to separate olpc-update out from rainbow by splitting rainbow up into a pipeline of scripts, probably organized along the current 'stage' boundaries, that more exhaustively verify their starting and ending assumptions. This decomposition will accomplish several things:

1) it will allow me to properly split olpc-update into a separate package and should reduce code breakage resulting from changes made to one component that violate the assumptions made by a different component.
2) it will give me a vehicle for automatically collecting trace data from third-party users - I will simply ask them to exercise the shell pipeline through a wrapper that records full traces, that nicely packages them, and that attempts to send them to me.

3) it will clarify the locations in the design in which outside data is processed, and it will document the protocol by which these interactions occur, e.g. by forcing these data to be passed through introspectable media like files, environment variables, and command-line options.

4) it will clarify the life-cycle of contained applications.

5) it will provide a better opportunity to exhaustively verify the assumptions of each stage before continuing.

6) it will make it easy and robust to start using a prototype process simply by adding a wrapper to the chain.

Downsides:

* It adds a lot of option-parsing complexity.
- The interfaces need to be documented anyway, and being able to pick up part-way through an interaction / single-step could be a big plus.

* It probably uses more resources than the current architecture.
- The current architecture is just not adequately transparent; therefore, it can be seen as a premature optimization.
- Also, if it means that we get prototypes, we probably win.

* It's not clear that it will actually be easier to give good user feedback.
- It will be easier to debug, though, because you can start part-way through and because you don't have to work *through* Sugar to try things.
- One could, for example, experiment with new pieces of isolation without dealing with Sugar and the Terminal.

* Do we have any data that we have to communicate backwards? If so, how are we going to accomplish this?
- Perhaps we could take the python-web-server idea of having a response object that you piece together and ultimately return.
- The chaining model is basically a linear graph of continuations. We could expand on this by putting together a more interesting graph, e.g. by adding failure continuations.

* Why not go with a web server?
- It still feels too monolithic and inadequately transparent to me. I really want to be able to examine the intermediate state of the system with find and grep.

Model:

The basic model being proposed here is to use the filesystem as an abstract store, to regard individual shell scripts as fragments of an overall program (written in continuation-passing style) with *well-defined interfaces*, and to thereby expose the overall computation being performed to analysis by normal unix filesystem tools.

Questions for the Model:

* Are the individual chunks supposed to execute atomically with respect to one another?

Questions for the Store:

* Is the store a naïve Scheme store, or is it like the C heap?
* Does the store preserve old bindings for inspection?
* Perhaps it records the sequence in which bindings are added?
* Can it record who made a binding, and why?
* Thread-local or global?
* Configurable location?
* File-system or environment-variable + tempfile?
* Typed or untyped? (Pickles, JSON, repr?)
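A minimal sketch of one possible answer to the store questions: a filesystem store whose `bind` helper preserves old bindings for inspection and keeps an append-only journal recording the sequence of bindings plus who made each one and why. The store path, the `bind` helper, and the journal layout are all illustrative assumptions, not existing rainbow interfaces.

```shell
# Hypothetical filesystem store with provenance; everything here is a sketch.
STORE=/tmp/rainbow-journal-demo
rm -rf "$STORE"; mkdir -p "$STORE"

bind() { # usage: bind NAME VALUE WHO WHY
    name=$1; value=$2; who=$3; why=$4
    seq=0
    [ -f "$STORE/journal" ] && seq=$(grep -c '' "$STORE/journal")
    # Preserve the old binding for later inspection instead of overwriting it.
    if [ -f "$STORE/$name" ]; then
        mv "$STORE/$name" "$STORE/$name.$seq.old"
    fi
    printf '%s\n' "$value" > "$STORE/$name"
    # The append-only journal records the sequence of bindings, who made
    # each one, and why -- answering three of the store questions at once.
    printf '%s\t%s\t%s\n' "$name" "$who" "$why" >> "$STORE/journal"
}

bind uid 10022 spawn-stage "allocated isolation uid"
bind uid 10023 spawn-stage "first uid already in use"
```

After running this, `cat`, `find`, and `grep` suffice to see the current binding (`$STORE/uid`), the superseded one (`$STORE/uid.1.old`), and the full history with provenance (`$STORE/journal`).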
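To make the model concrete, here is a hedged sketch of what one pipeline 'stage' might look like under this proposal: it reads its inputs from the filesystem store, exhaustively verifies its starting assumptions, does its work, and verifies its ending assumptions before handing off to the next stage. The store path, the binding names, and the toy uid derivation are invented for illustration and do not reflect actual rainbow code.

```shell
# Hypothetical single stage of the proposed pipeline; names are illustrative.
STORE=/tmp/rainbow-store-demo
rm -rf "$STORE"; mkdir -p "$STORE"

die() { echo "stage: $*" >&2; exit 1; }

run_stage() {
    # 1. Verify starting assumptions before doing any work.
    [ -s "$STORE/bundle-id" ] || die "missing or empty binding: bundle-id"

    # 2. Do this stage's work (here: derive a toy uid from the bundle id).
    bundle_id=$(cat "$STORE/bundle-id")
    uid=$(( $(printf '%s' "$bundle_id" | wc -c) + 10000 ))

    # 3. Record the result as a new binding, inspectable with cat/find/grep.
    printf '%s\n' "$uid" > "$STORE/uid"

    # 4. Verify ending assumptions before handing off to the next stage.
    [ -s "$STORE/uid" ] || die "postcondition failed: uid not bound"
}

# Demo: seed the store as an earlier stage would, then run this stage.
printf 'org.laptop.WebActivity' > "$STORE/bundle-id"
run_stage
```

Because every input and output passes through the store, a failed precondition produces a human-readable complaint naming the missing binding, and any intermediate state can be examined with normal unix tools between stages.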
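Similarly, the trace-collecting wrapper from point 2 might be sketched like this: it runs one pipeline command under `sh -x`, records the command line, its stdout, its xtrace/stderr, and its exit status, and packages the lot into a tarball that a user could mail (or a later wrapper could upload). The directory layout and the `with_trace` name are assumptions, not a real tool.

```shell
# Hypothetical trace wrapper; layout and names are illustrative assumptions.
with_trace() {
    trace_dir=$(mktemp -d /tmp/rainbow-trace.XXXXXX)
    # Record the exact command, its output, its xtrace, and its exit status.
    printf '%s\n' "$*" > "$trace_dir/command"
    sh -x -c "$*" > "$trace_dir/stdout" 2> "$trace_dir/xtrace"
    echo "$?" > "$trace_dir/status"
    # Package the trace for mailing or automatic aggregation.
    tar -C /tmp -czf "$trace_dir.tar.gz" "$(basename "$trace_dir")"
    echo "$trace_dir.tar.gz"
}

archive=$(with_trace 'echo hello from the pipeline')
```

A third-party user would then only need to hand back the single tarball named in `$archive`, instead of trying to capture and interpret a rainbow trace by hand.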