diff --git a/docs/specifications/registries-2.md b/docs/specifications/registries-2.md new file mode 100644 index 0000000000..41a32886b9 --- /dev/null +++ b/docs/specifications/registries-2.md @@ -0,0 +1,557 @@ +# Registries: Take 2 (including Git Registries) + +Originally, the design of registries was decided upon and written up in the [Registries RFC](registries.md). +However, as we've gotten further into the design process of git registries and versioning, +and discussed the interaction of versioning with registries, +it's become clear that the existing design was lacking. +We need to have an on-disk port database that is not tied to the ports tree. + +This RFC is a new design for registries, that includes this registry database. +It also includes the design for git registries, +which are likely to be the predominant form of registries in the wild. +They are also what we will start to treat the default registry as, +to allow for updating ports without updating the vcpkg executable +(likely necessary for binary releases). + +## Design Considerations + +After internal discussions of the relationship between versioning and registries, +it was clear that the existing design of registries does not play well with versioning. +It was also clear that it was necessary to have metadata about ports in a separate place from the ports tree; +in fact, after discussion, it was clear that the ports tree should be considered an implementation detail; +a backing store for build process information (e.g., `portfile.cmake` and the patches) and the manifest. + +From this, it's clear that vcpkg needs to add a new set of metadata. +The versioning implementation has decided on `port_versions`, and thus that's what this RFC uses. + +Since we're replacing the existing ports directory with a new method of describing ports, +this means that the ports directory is no longer anything but a data store. +This also means that the existing rules around locations of ports is no longer required; +however, it will still keep getting followed for the main repository, +and it's recommended that other registries follow the same pattern to make contributing easier. + +## What does the registry database look like? + +We don't wish to have the same problem as we do right now, +where there are nearly 1500 entries in a single directory. +We solve this by placing each database entry into `port_versions/-/.json`. +For example, the database entry for 7zip is in `port_versions/7-/7zip.json`. + +Each of these database entries contains all of the versions of the port throughout history, +along with versioning and feature metadata, so that we do not have to check out old manifests or CONTROL files +to get at that information. + +Each database entry file must be a top-level array of port version objects, which contain the following entries: +* A version field: `"version-string"`, `"version"`, etc. Same as in the manifest. +* Optionally, `"port-version"`: Same as in the manifest. + +And also contain a description of where to find the build files for this port; the possibilities include: + +* `"git-tree"`: The [git object ID] of a tree object; this is only allowed for git registries. + Note that this ID must be an ID from the repository where the registry is located. +* `"path"`: A path describing where to find the build files. + The first entry in this path should be `$`, which means "this path starts at the root of the registry". + No other kinds of paths are allowed. + * For example: `$/foo/bar` gives you `foo/bar` underneath the folder containing the `port_versions` directory. + * `/foo/bar` and `foo/bar` are both disallowed. + +Using a `"git-tree"` as a backend in a non-git registry, and using a `"path"` in a git registry, +is not permitted. Future extensions may include things like remote archives or git repositories, +or may allow `"path"` in git registries. + +Note that a registry entry should _always_ be additive; +deleting existing entries is unsupported and may result in bad behavior. +The only modification to existing entries that is allowable is moving the backing store +for the build files, assuming that the new build files are equivalent to the old build files. +(For example, a filesystem registry might have a new way of laying out where ports are). + +Additionally, we'd like a new way of describing the set of ports that make up a "baseline". +This is currently done with the reference of the vcpkg git repository - +each reference has a set of versions that are tested against each other, +and this is a major feature of vcpkg. +We wish to have the same feature in the new versioning world, +and so we'll have a set of baseline versions in the registry database. + +Baselines act differently between git registries or the builtin registry, +and in filesystem registries. +In git registries and the builtin registry, +since there's a history that one can access, +a baseline is the `"default"` entry in the baseline at the reference specified. +In filesystem registries, since there is no accessible history, +the baseline identifiers are mapped directly to entries in the baseline file, +without translation; by default, the `"default"` entry is used. + +These baselines are placed in `port_versions/baseline.json`. +This is an object mapping baseline names to baseline objects, +where baseline objects map port names to version objects. +A version object contains `"baseline"`, which is un-schemed version, +and optionally `"port-version"`. + +[git object ID]: https://git-scm.com/book/en/v2/Git-Internals-Git-Objects + +### Example of a baseline file + +The following is a reasonable baseline.json for a filesystem registry that only has two ports: + +```json +{ + "default": { + "abseil": { "baseline": "2020-03-03" }, + "zlib": { "baseline": "1.2.11", "port-version": 9 } + }, + "old": { + "abseil": { "baseline": "2019-02-11" }, + "zlib": { "baseline": "1.2.11", "port-version": 3 } + }, + "really-old": { + "zlib": { "baseline": "1.2.9" } + } +} +``` + +### Example of a registry database entry file + +Note: This file assumes that the versions RFC has been implemented, +and thus that minimum versions are required; +the syntax may change in the time between now and finishing the implementation. + +This example is of `ogre`, since this port has both features and dependencies; +remember that this file would be `port_versions/o-/ogre.json`. + +```json +[ + { + "version-string": "1.12.7", + "git-tree": "466e96fd2e17dd2453aa31dc0bc61bdcf53e7f61", + }, + { + "version-string": "1.12.1", + "port-version": 1, + "git-tree": "0de81b4f7e0ec24966e929c2ea64e16c15e71d5e", + }, + ... +] +``` + +#### Filesystem Registry Databases + +Filesystem registries are the simplest possible registry; +they have a `port_versions` directory at the top-level, which contains the registry database. +It's expected that the filesystem registry would have a filesystem backing store: +something like the existing `ports` directory, except with separate versions. +There won't be a specific way to lay the ports tree out as mandated by the tool, +as we are treating the ports tree as an implementation detail of the registry; +it's simply a way to get the files for a port. +As an example, let's assume that the registry is laid out something like this: + +``` +/ + port_versions/ + baseline.json + a-/ + abseil.json + asmjit.json + o-/ + ogre.json + ports/ + a-/ + abseil/ + 2020-03-03_7/ + vcpkg.json + portfile.cmake + ... + 2020-03-03_8/ + vcpkg.json + portfile.cmake + ... + ... + asmjit/ + 2020-05-08/ + CONTROL + portfile.cmake + ... + 2020-07-22/ + vcpkg.json + portfile.cmake + ... + o-/ + ogre/ + 1.12.7/ + ... + 1.12.1/ + ... + ... + ... +``` + +Then, let's look at updating `asmjit` to latest. + +The current manifest file, in `asmjit/2020-07-22/vcpkg.json` looks like: + +```json +{ + "name": "asmjit", + "version-string": "2020-07-22", + "description": "Complete x86/x64 JIT and Remote Assembler for C++", + "homepage": "https://github.com/asmjit/asmjit", + "supports": "!arm" +} +``` + +while the current `port_versions/a-/asmjit.json` looks like: + +```json +[ + { + "version-string": "2020-07-22", + "path": "$/ports/a-/asmjit/2020-07-22" + }, + { + "version-string": "2020-05-08", + "path": "$/ports/a-/asmjit/2020-05-08" + } +] +``` + +with `port_versions/baseline.json` looking like: + +```json +{ + "default": { + ..., + "asmjit": { "baseline": "2020-07-22" }, + ... + } +} +``` + +and we'd like to update to `2020-10-08`. +We should first copy the existing implementation to a new folder: + +```sh +$ cp -r ports/a-/asmjit/2020-07-22 ports/a-/asmjit/2020-10-08 +``` + +then, we'll make the edits required to `ports/a-/asmjit/2020-10-08` to update to latest. +We should then update `port_versions/a-/asmjit.json`: + +```json +[ + { + "version-string": "2020-10-08", + "path": "$/ports/a-/asmjit/2020-10-08" + }, + { + "version-string": "2020-07-22", + "path": "$/ports/a-/asmjit/2020-07-22" + }, + { + "version-string": "2020-05-08", + "path": "$/ports/a-/asmjit/2020-05-08" + } +] +``` + +and update `port_versions/baseline.json`: + +```json +{ + "default": { + ..., + "asmjit": { "baseline": "2020-10-08" }, + ... + } +} +``` + +and we're done 😊. + +#### Git Registry Databases + +Git registries are not quite as simple as filesystem registries, +but they're still pretty simple, and are likely to be the most common: +the default registry is a git registry, for example. +There is not a specific way the tool requires one to lay out the backing store, +as long as it's possible to get an object hash that corresponds to a checked-in git tree +of the build information. +This allows, for example, the current vcpkg default registry way of laying out ports, +where the latest version of a port `

` is at `ports/

`, +and it also allows for any number of other designs. +One interesting design, for example, +is having an `old-ports` branch which is updated whenever someone want to backfill versions; +then, one could push the old version to the `old-ports` branch, +and then update the HEAD branch with the git tree of the old version in `port_versions/p-/

`. + +As above, we want to update `asmjit` to latest; let's assume we're working in the default vcpkg registry +(the repository): + +The current manifest file for `asmjit` looks like: + +```json +{ + "name": "asmjit", + "version-string": "2020-07-22", + "description": "Complete x86/x64 JIT and Remote Assembler for C++", + "homepage": "https://github.com/asmjit/asmjit", + "supports": "!arm" +} +``` + +and the current `port_versions/a-/asmjit.json` looks like: + +```json +[ + { + "version-string": "2020-07-22", + "git-tree": "fa0c36ba15b48959ab5a2df3463299e1d2473b6f" + } +] +``` + +Now, let's update it to the latest version: + +```json +{ + "name": "asmjit", + "version-string": "2020-10-08", + "description": "Complete x86/x64 JIT and Remote Assembler for C++", + "homepage": "https://github.com/asmjit/asmjit", + "supports": "!arm" +} +``` + +and make the proper edits to the portfile.cmake. Then, let's commit the changes: + +```cmd +> git add ./ports/asmjit +> git commit -m "[asmjit] update asmjit to 2020-10-08" +``` + +In `git-tree` mode, one needs to commit the new version of the port to get the git tree hash; +we use `git rev-parse` to do so: + +```cmd +> git rev-parse HEAD:ports/asmjit +2bb51d8ec8b43bb9b21032185ca8123da10ecc6c +``` + +and then modify `port_versions/a-/asmjit.json` as follows: + +```json +[ + { + "version-string": "2020-10-08", + "git-tree": "2bb51d8ec8b43bb9b21032185ca8123da10ecc6c" + }, + { + "version-string": "2020-07-22", + "git-tree": "fa0c36ba15b48959ab5a2df3463299e1d2473b6f" + } +] +``` + +Then we can commit and push this new database with: + +```sh +$ git add port_versions +$ git commit --amend --no-edit +$ git push +``` + +## Consuming Registries + +The `vcpkg-configuration.json` file from the [first registries RFC](registries.md) +is still the same, except that the registries have a slightly different layout. +A `` is still an object with the following fields: +* Optionally, `"default-registry"`: A `` or `null` +* Optionally, `"registries"`: An array of ``s + +Additionally, `` is still the same; +a `` object, plus the following properties: +* Optionally, `"baseline"`: A named baseline. Defaults to `"default"`. +* Optionally, `"packages"`: An array of ``s + +however, ``s are now slightly different: +* ``: + * `"kind"`: The string `"builtin"` +* ``: + * `"kind"`: The string `"filesystem"` + * `"path"`: A path +* ``: + * `"kind"`: The string `"git"` + * `"repository"`: A URI + +The `"packages"` field of distinct registries must be disjoint, +and each `` must have at the `"packages"` property, +since otherwise there's no point. + +As an example, a package which uses a different default registry, and a different registry for boost, +might look like the following: + +```json +{ + "default-registry": { + "kind": "filesystem", + "path": "vcpkg-ports" + }, + "registries": [ + { + "kind": "builtin", + "packages": [ "cppitertools" ] + } + ] +} +``` + +This will install `fmt` from `/vcpkg-ports`, +and `cppitertools` and the `boost` ports from the registry that ships with vcpkg. +Notably, this does not replace behavior up the tree -- only the `vcpkg-configuration.json`s +for the current invocation do anything. + +### Filesystem Registries + +A filesystem registry takes on the form: + +* `"kind"`: The string `"filesystem"` +* `"path"`: The path to the filesystem registry's root, i.e. the directory containing the `port_versions` directory. + +```json +{ + "kind": "filesystem", + "path": "vcpkg-registry" +} +``` + +Unlike git registries, where there's quite a bit of interesting stuff going on, +there isn't much stuff to do with filesystem registries. +We simply use the registry database at `/port_versions` to get information about ports. + +### Git Registries + +A git registry takes on the form: + +* `"kind"`: The string `"git"` +* `"repository"`: The URL at which the git repository lives. May be any kind of URL that git understands + +```json +{ + "kind": "git", + "repository": "https://github.com/microsoft/vcpkg" +} +``` + +Whenever the first vcpkg command is run with a git registry, +vcpkg notes down the exact commit that HEAD points to at the time of the run in the `vcpkg-lock.json` file. +This will be used as the commit which vcpkg takes the `"default"` baseline from, +and vcpkg will only update that commit when `vcpkg update` is run. + +Since the `"versions"` field is strictly additive, we don't consider older refs than `HEAD`. +We update the repository on some reasonable clip. +Likely, whenever a command is run that will change the set of installed ports. + +#### `vcpkg-lock.json` + +This file will contain metadata that we need to save across runs, +to allow us to keep a "state-of-the-world" that doesn't change unless one explicitly asks for it to change. +This means that, even across different machines, the same registries will be used. +We will also be able to write down version resolution in this file as soon as that feature is added. + +It is recommended that one adds this `vcpkg-lock.json` to one's version control. +This file is machine generated, and it is not specified how it's layed out; +however, for purposes of this RFC, we will define how it relates to git registries. + +In `vcpkg-lock.json`, in the top level object, +there will be a `"registries"` property that is an object. +This object will contain a `"git"` field, which is an array of git-registry objects, +that contain: + +* `"repository"`: The `"repository"` field from the git registry object +* `"baseline"`: The name of the baseline that we've used +* `"baseline-ref"`: The ref which we've gotten the specific baseline from. + +For example, a `vcpkg-lock.json` might look like: + +```json +{ + "registries": { + "git": [ + { + "repository": "https://github.com/microsoft/vcpkg", + "baseline": "default", + "baseline-ref": "6185aa76504a5025f36754324abf307cc776f3da" + } + ] + } +} +``` + +#### `vcpkg update` + +You'll notice that once the repository is added the first time, +there is only one way to update the repository to the tag at a later date - deleting the lock file. +We additionally want to add support for the user updating the registry by themselves - +they will be able to do this via the `vcpkg update` command. +The `vcpkg update` command will, for each git registry, +update the registry and repoint the `"commit"` field in `vcpkg-lock.json` to the latest `HEAD`. + +There is no way to update only one git registry to a later date, since versions are strictly additive. + +## Git Registries: Implementation on Disk + +There are two implementations on disk to consider here: the implementation of the registry database, +and once we have the database entries for the ports, accessing the port data from the git tree object. + +Both of these implementations are placed in the vcpkg cache home (shared by binary caching archives). +On unix, this is located at `$XDG_CACHE_HOME/vcpkg` if the environment variable exists, +otherwise `$HOME/.cache/vcpkg`; on Windows, it's located at `%LOCALAPPDATA%\vcpkg`. +In this document, we use the variable `$CACHE_ROOT` to refer to this folder. +We will add a new folder, `$CACHE_ROOT/registries`, which will contain all the data we need. + +First, we'll discuss the registry database. + +### Registry Database + +At `$CACHE_ROOT/registries/git`, +we'll create a new git repository root which contains all information from all git registries, +since the hashes should be unique, and this allows for deduplication +across repositories which have the same commits (e.g., for mirrors). +In order to get the data from git registries, we simply `fetch` the URL of the git registry. + +In order to grab a specific database entry from a git registry, `git show` is used to grab the +file from the right commit: `git show -- port_versions/-/.json`. + +One unfortunate thing about having one directory being used for all vcpkg instances on a machine is +that it's possible to have an issue with concurrency - for example, after `fetch`ing the latest HEAD +of `https://github.com/microsoft/vcpkg`, another vcpkg process might fetch the latest HEAD of +`https://github.com/meow/vcpkg` before the first vcpkg process has the chance to `git rev-parse FETCH_HEAD`. +Since the first vcpkg process will run `git rev-parse` after the second fetch is done, +instead of getting the `HEAD` of `microsoft/vcpkg`, they instead get the `HEAD` of `meow/vcpkg`. +We will solve this by having a mutex file in `$CACHE_ROOT/registries/git` +that vcpkg locks before any fetches (and unlocks after `rev-parse`ing). + +### Accessing Port Data from `git-tree`s + +Once we've done version resolution and everything with the database, +we then need to access the port data from the git history. +We will add a new folder, `$CACHE_ROOT/registries/git-trees`, into which we'll check out the port data. + +In this `git-trees` directory, we will have all of the trees we check out, at their hashes. +For example, the asmjit port data from above will be located at +`git-trees/2bb51d8ec8b43bb9b21032185ca8123da10ecc6c`. +We will add a mutex file in this `git-trees` directory as well which is taken whenever +we are checking out a new git tree. +We wish to allow multiple vcpkg instances to read port data at a time, +and thus we do the check outs semi-atomically - if `git-trees/` exists, +then the `` must be completely checked out. +vcpkg does this by first checking out to a temporary directory, +and then renaming to the actual hash. + +## Future Extensions + +The way forward for this is to allow the `"builtin"` registry to be a git registry, +in order to support packaging and shipping vcpkg as a binary. +This is currently our plan, although it definitely is still a ways out. +Git registries _are_ an important step on that road, +but are also a good way to support both enterprise, +and experimentation by our users. +They allow us a lot more flexibility than we've had in the past. \ No newline at end of file