Pippin is roughly a database. Like databases, it should support simultaneous access from multiple parties or parts of a program.
It already does in a way: it is designed such that multiple instances accessing the same files should not clash (though they may not synchronise until the next load). But it should support in-memory parallel usage.
One option would be to put each access through a queueing system, but this makes each request slower.
Another would be to allow read access to multiple parties and copy-on-write with some synchronisation system. This makes each element insertion/deletion expensive and requires frequent synchronisation. It might be possible to reuse old copies of the map (after synchronisation) once they no longer have read locks; otherwise each batch of non-concurrent accesses requires a new copy.
Another would be to allow each user to make a copy of the current state and only allow modifications through a copy. Committing modifications may require a merge. The user should be able to check for external modifications while holding a copy. This may be useful for a form of transactions.
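A rough sketch of this copy-based approach (all names hypothetical; elements are simplified to string key/value pairs): each user takes a private copy of the current state, modifies it, then commits; a version counter detects external modifications, in which case a merge would be needed before retrying.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Shared state plus a version counter for detecting external changes.
struct Shared {
    version: u64,
    state: HashMap<String, String>,
}

#[derive(Clone)]
struct Repo {
    inner: Arc<Mutex<Shared>>,
}

// A user's private copy, remembering which version it was taken from.
struct WorkingCopy {
    base_version: u64,
    state: HashMap<String, String>,
}

impl Repo {
    fn new() -> Repo {
        Repo { inner: Arc::new(Mutex::new(Shared { version: 0, state: HashMap::new() })) }
    }
    // Take a private copy of the current state.
    fn checkout(&self) -> WorkingCopy {
        let s = self.inner.lock().unwrap();
        WorkingCopy { base_version: s.version, state: s.state.clone() }
    }
    // Check for external modifications while holding a copy.
    fn is_stale(&self, copy: &WorkingCopy) -> bool {
        self.inner.lock().unwrap().version != copy.base_version
    }
    // Commit if nothing changed meanwhile; otherwise hand the copy back
    // so the caller can merge and retry.
    fn commit(&self, copy: WorkingCopy) -> Result<(), WorkingCopy> {
        let mut s = self.inner.lock().unwrap();
        if s.version != copy.base_version {
            return Err(copy);
        }
        s.state = copy.state;
        s.version += 1;
        Ok(())
    }
}

fn main() {
    let repo = Repo::new();
    let mut c = repo.checkout();
    c.state.insert("k".to_string(), "v".to_string());
    assert!(!repo.is_stale(&c));
    assert!(repo.commit(c).is_ok());
    // A copy taken before that commit is now stale and cannot commit.
    let c2 = repo.checkout();
    assert_eq!(c2.state.get("k").map(|s| s.as_str()), Some("v"));
}
```

The optimistic version check is the simplest scheme; a real implementation would replace the whole-map clone with something cheaper.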
There should be an interface representing a single partition's data in memory;
call this Repository.
A Repository should be able to represent multiple states of its data; this is
needed for commit replay and commit creation, as well as for history browsing.
In particular, a Repository must represent at least two states (which may be
equal): the current state, and the last saved state.
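A minimal sketch of the two required states (names hypothetical; elements simplified to strings keyed by `u64`):

```rust
use std::collections::HashMap;

// A Repository holding the two required (possibly equal) states.
struct Repository {
    current: HashMap<u64, String>,
    last_saved: HashMap<u64, String>,
}

impl Repository {
    fn new() -> Repository {
        Repository { current: HashMap::new(), last_saved: HashMap::new() }
    }
    // True if there are unsaved changes.
    fn is_dirty(&self) -> bool {
        self.current != self.last_saved
    }
    // Mark the current state as saved, as after writing a commit log.
    fn mark_saved(&mut self) {
        self.last_saved = self.current.clone();
    }
}

fn main() {
    let mut repo = Repository::new();
    assert!(!repo.is_dirty());
    repo.current.insert(1, "elt".to_string());
    assert!(repo.is_dirty());
    repo.mark_saved();
    assert!(!repo.is_dirty());
}
```

For history browsing and commit replay more than two states would be held, presumably as a list of commits rather than full copies.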
Interfaces for creating a Repository:
- Create empty, with a name
- Load all snapshots and commit logs provided by some interface
- Ditto, but with restrictions (e.g. only latest state)
Interfaces for modifying a Repository:
- Load more snapshots/logs provided initially, optionally with restrictions
- Modify the current state, by:
  - inserting an element
  - replacing an element
  - deleting an element
- Create a new commit from the current state
- Write all changes to a commit log (automatically choosing whether or not to additionally create a new snapshot)
- Write a new snapshot
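The state-modification part of this interface might look roughly as follows (trait and method names are hypothetical; elements are simplified to strings keyed by `u64`):

```rust
use std::collections::HashMap;

// Hypothetical modification interface for the current state.
trait RepoWrite {
    fn insert_elt(&mut self, id: u64, elt: String) -> Result<(), String>;
    fn replace_elt(&mut self, id: u64, elt: String) -> Result<String, String>;
    fn delete_elt(&mut self, id: u64) -> Result<String, String>;
}

// Toy in-memory implementation for illustration.
struct MemRepo { state: HashMap<u64, String> }

impl RepoWrite for MemRepo {
    // Insert fails if the identifier is already in use.
    fn insert_elt(&mut self, id: u64, elt: String) -> Result<(), String> {
        if self.state.contains_key(&id) {
            return Err(format!("element {} already exists", id));
        }
        self.state.insert(id, elt);
        Ok(())
    }
    // Replace returns the old element; fails if absent.
    fn replace_elt(&mut self, id: u64, elt: String) -> Result<String, String> {
        match self.state.insert(id, elt) {
            Some(old) => Ok(old),
            None => {
                // Undo the accidental insert and report the error.
                self.state.remove(&id);
                Err(format!("no element {}", id))
            }
        }
    }
    // Delete returns the removed element; fails if absent.
    fn delete_elt(&mut self, id: u64) -> Result<String, String> {
        self.state.remove(&id).ok_or_else(|| format!("no element {}", id))
    }
}

fn main() {
    let mut r = MemRepo { state: HashMap::new() };
    assert!(r.insert_elt(1, "a".to_string()).is_ok());
    assert!(r.insert_elt(1, "b".to_string()).is_err());
    assert_eq!(r.replace_elt(1, "b".to_string()), Ok("a".to_string()));
    assert_eq!(r.delete_elt(1), Ok("b".to_string()));
}
```

Distinguishing insert from replace makes misuse (clobbering an existing element, or replacing a missing one) an error rather than a silent overwrite.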
Interfaces for reading data from a Repository:
- List element identifiers
- Iterate over elements, perhaps with filters
- Retrieve a specified element
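These read operations could be sketched like so (names hypothetical; elements simplified as above):

```rust
use std::collections::HashMap;

// Toy in-memory state for illustrating the read-side interface.
struct MemRepo { state: HashMap<u64, String> }

impl MemRepo {
    // List element identifiers (sorted here for determinism).
    fn list_ids(&self) -> Vec<u64> {
        let mut ids: Vec<u64> = self.state.keys().cloned().collect();
        ids.sort();
        ids
    }
    // Iterate over elements matching a caller-supplied filter.
    fn filtered<F: Fn(&String) -> bool>(&self, f: F) -> Vec<(u64, &String)> {
        let mut v: Vec<(u64, &String)> = self.state.iter()
            .filter(|&(_, e)| f(e))
            .map(|(k, e)| (*k, e))
            .collect();
        v.sort_by_key(|&(k, _)| k);
        v
    }
    // Retrieve a specified element.
    fn get_elt(&self, id: u64) -> Option<&String> {
        self.state.get(&id)
    }
}

fn main() {
    let mut state = HashMap::new();
    state.insert(1, "apple".to_string());
    state.insert(2, "banana".to_string());
    let repo = MemRepo { state };
    assert_eq!(repo.list_ids(), vec![1, 2]);
    assert_eq!(repo.filtered(|v| v.starts_with("b")).len(), 1);
    assert_eq!(repo.get_elt(1).map(|s| s.as_str()), Some("apple"));
}
```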
Note that this is incomplete: some mechanism is required to (a) provide snapshot and commit log data streams for loading, (b) provide a data stream for writing a new snapshot, and (c) provide a data stream for writing a commit log, as well as to remove obsolete commit logs.
There should be some interface for discovering repository snapshots and log files given a path to a snapshot file, either limited to the specified snapshot file plus its commit logs, or resolving all snapshots and commit logs for the repository.
Creation:
- Snapshot only, via path
- As above, plus finding the corresponding commit logs
- Extrapolate to all files for the repository
Interface: this should implement some trait used by Repository, allowing
retrieval of the latest snapshot, all snapshots in historical order, commit
logs for each snapshot (maybe via a sub-interface), creation of new snapshot
files (more accurately writable streams), and creation of new log files.
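The discovery part of such a trait might look like this (names and the file-naming scheme below are purely illustrative assumptions, e.g. `repo-ss1.pip` for snapshots and `repo-ss1-cl1.piplog` for commit logs; creation of writable streams is omitted, only path selection is shown):

```rust
// Hypothetical discovery trait used by Repository.
trait RepoFiles {
    // Path of the most recent snapshot, if any.
    fn latest_snapshot(&self) -> Option<String>;
    // All snapshot paths in historical order.
    fn snapshots(&self) -> Vec<String>;
    // Commit-log paths belonging to a given snapshot.
    fn logs_for(&self, snapshot: &str) -> Vec<String>;
    // Choose the path a new snapshot file would be written to.
    fn new_snapshot_path(&self) -> String;
}

// Mock discovery over a fixed file list instead of a real directory scan.
struct MockFiles { files: Vec<String> }

impl RepoFiles for MockFiles {
    fn latest_snapshot(&self) -> Option<String> {
        self.snapshots().pop()
    }
    fn snapshots(&self) -> Vec<String> {
        let mut v: Vec<String> = self.files.iter()
            .filter(|f| f.ends_with(".pip"))
            .cloned().collect();
        v.sort();
        v
    }
    fn logs_for(&self, snapshot: &str) -> Vec<String> {
        let stem = snapshot.trim_end_matches(".pip");
        let mut v: Vec<String> = self.files.iter()
            .filter(|f| f.starts_with(stem) && f.ends_with(".piplog"))
            .cloned().collect();
        v.sort();
        v
    }
    fn new_snapshot_path(&self) -> String {
        format!("repo-ss{}.pip", self.snapshots().len() + 1)
    }
}

fn main() {
    let fs = MockFiles { files: vec![
        "repo-ss1.pip".to_string(),
        "repo-ss1-cl1.piplog".to_string(),
        "repo-ss1-cl2.piplog".to_string(),
        "repo-ss2.pip".to_string(),
    ]};
    assert_eq!(fs.latest_snapshot(), Some("repo-ss2.pip".to_string()));
    assert_eq!(fs.logs_for("repo-ss1.pip").len(), 2);
    assert_eq!(fs.new_snapshot_path(), "repo-ss3.pip".to_string());
}
```

A real implementation would scan a directory and return writable streams (`std::io::Write`) for new snapshot and log files rather than bare paths.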
Push/pull/merge: push local modifications to a remote copy, pull remote modifications, merge changes (automatically where possible, otherwise starting a manual merge), etc.
Fix: if checksum errors are found, try to recover (e.g. check whether remote copies are also corrupted, try to localise the corruption, possibly ask the user, replay a series of patches and compare to a snapshot).
I don't know what might be needed here; possibly this should be combined with functionality described elsewhere...