Registries: Take 2 (including Git Registries)
Originally, the design of registries was decided upon and written up in the Registries RFC. However, as we've gotten further into the design process of git registries and versioning, and discussed the interaction of versioning with registries, it's become clear that the existing design was lacking. We need to have an on-disk port database that is not tied to the ports tree.
This RFC is a new design for registries that includes this registry database. It also includes the design for git registries, which are likely to be the predominant form of registries in the wild. We will also start to treat the default registry as a git registry, which allows ports to be updated without updating the vcpkg executable (likely necessary for binary releases).
Design Considerations
After internal discussions of the relationship between versioning and registries,
it was clear that the existing design of registries does not play well with versioning.
It was also clear that it was necessary to have metadata about ports in a separate place from the ports tree;
in fact, after discussion, it was clear that the ports tree should be considered an implementation detail;
a backing store for build process information (e.g., portfile.cmake and the patches) and the manifest.
From this, it's clear that vcpkg needs to add a new set of metadata.
The versioning implementation has decided on port_versions, and thus that's what this RFC uses.
Since we're replacing the existing ports directory with a new method of describing ports, the ports directory is no longer anything but a data store. This also means that the existing rules around the locations of ports are no longer required; however, they will continue to be followed in the main repository, and it's recommended that other registries follow the same pattern to make contributing easier.
What does the registry database look like?
We don't wish to have the same problem as we do right now,
where there are nearly 1500 entries in a single directory.
We solve this by placing each database entry into port_versions/<first character of port name>-/<port name>.json.
For example, the database entry for 7zip is in port_versions/7-/7zip.json.
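The path rule above can be sketched in a few lines of Python. This is only an illustration; the function name database_entry_path is hypothetical and not part of vcpkg.

```python
from pathlib import PurePosixPath


def database_entry_path(port_name: str) -> PurePosixPath:
    """Return the registry-relative path of a port's database entry."""
    if not port_name:
        raise ValueError("port name must not be empty")
    # The subdirectory is the first character of the port name followed by '-'.
    return PurePosixPath("port_versions") / f"{port_name[0]}-" / f"{port_name}.json"


print(database_entry_path("7zip"))
```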
Each of these database entries contains all of the versions of the port throughout history, along with versioning and feature metadata, so that we do not have to check out old manifests or CONTROL files to get at that information.
Each database entry file must be a top-level array of port version objects, which contain the following entries:
- A version field: "version-string", "version", etc. Same as in the manifest.
- Optionally, "port-version": Same as in the manifest.
Each port version object must also contain a description of where to find the build files for this port; the possibilities include:
- "git-tree": The git object ID of a tree object; this is only allowed for git registries. Note that this ID must be an ID from the repository where the registry is located.
- "path": A path describing where to find the build files. The first entry in this path should be $, which means "this path starts at the root of the registry". No other kinds of paths are allowed.
  - For example: $/foo/bar gives you foo/bar underneath the folder containing the port_versions directory.
  - /foo/bar and foo/bar are both disallowed.
Using a "git-tree" as a backend in a non-git registry, and using a "path" in a git registry,
is not permitted. Future extensions may include things like remote archives or git repositories,
or may allow "path" in git registries.
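As a rough sketch of the "path" rule, the following Python function resolves a $-rooted path against a registry root and rejects every other form. The name resolve_registry_path is hypothetical, and PurePosixPath is used only so the example behaves the same on every platform.

```python
from pathlib import PurePosixPath


def resolve_registry_path(registry_root: PurePosixPath, path: str) -> PurePosixPath:
    """Resolve a "$/..." path relative to the directory containing port_versions."""
    parts = path.split("/")
    # The first entry must be exactly "$"; "/foo/bar" and "foo/bar" are rejected.
    if parts[0] != "$":
        raise ValueError(f"path must start with '$': {path!r}")
    # Skip empty components produced by repeated or trailing slashes.
    rest = [part for part in parts[1:] if part]
    return registry_root.joinpath(*rest)


print(resolve_registry_path(PurePosixPath("/registry"), "$/foo/bar"))
```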
Note that a registry entry should always be additive; deleting existing entries is unsupported and may result in bad behavior. The only modification to existing entries that is allowable is moving the backing store for the build files, assuming that the new build files are equivalent to the old build files. (For example, a filesystem registry might have a new way of laying out where ports are).
Additionally, we'd like a new way of describing the set of ports that make up a "baseline". This is currently done with the reference of the vcpkg git repository - each reference has a set of versions that are tested against each other, and this is a major feature of vcpkg. We wish to have the same feature in the new versioning world, and so we'll have a set of baseline versions in the registry database.
Baselines behave differently in git registries and the builtin registry
than they do in filesystem registries.
In git registries and the builtin registry,
since there's a history that one can access,
a baseline is the "default"
entry in the baseline at the reference specified.
In filesystem registries, since there is no accessible history,
the baseline identifiers are mapped directly to entries in the baseline file,
without translation; by default, the "default"
entry is used.
These baselines are placed in port_versions/baseline.json.
This is an object mapping baseline names to baseline objects,
where baseline objects map port names to version objects.
A version object contains "baseline", which is an un-schemed version,
and optionally "port-version".
Example of a baseline file
The following is a reasonable baseline.json for a filesystem registry that only has two ports:
{
"default": {
"abseil": { "baseline": "2020-03-03" },
"zlib": { "baseline": "1.2.11", "port-version": 9 }
},
"old": {
"abseil": { "baseline": "2019-02-11" },
"zlib": { "baseline": "1.2.11", "port-version": 3 }
},
"really-old": {
"zlib": { "baseline": "1.2.9" }
}
}
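To make the lookup concrete, here is a minimal Python sketch that reads the baseline file above and returns the version recorded for a port in a named baseline. The function name baseline_version is hypothetical, and treating a missing "port-version" as 0 is an assumption carried over from how manifests behave.

```python
import json

BASELINES = json.loads("""
{
  "default": {
    "abseil": { "baseline": "2020-03-03" },
    "zlib": { "baseline": "1.2.11", "port-version": 9 }
  },
  "old": {
    "abseil": { "baseline": "2019-02-11" },
    "zlib": { "baseline": "1.2.11", "port-version": 3 }
  },
  "really-old": {
    "zlib": { "baseline": "1.2.9" }
  }
}
""")


def baseline_version(baselines, port, baseline="default"):
    """Return (version, port_version) for a port, or None if the baseline omits it."""
    entry = baselines[baseline].get(port)
    if entry is None:
        return None
    # Assumption: an absent "port-version" means 0, as in the manifest.
    return entry["baseline"], entry.get("port-version", 0)


print(baseline_version(BASELINES, "zlib"))
print(baseline_version(BASELINES, "zlib", "really-old"))
```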
Example of a registry database entry file
Note: This file assumes that the versions RFC has been implemented, and thus that minimum versions are required; the syntax may change in the time between now and finishing the implementation.
This example is of ogre, since this port has both features and dependencies;
remember that this file would be port_versions/o-/ogre.json.
[
{
"version-string": "1.12.7",
"git-tree": "466e96fd2e17dd2453aa31dc0bc61bdcf53e7f61"
},
{
"version-string": "1.12.1",
"port-version": 1,
"git-tree": "0de81b4f7e0ec24966e929c2ea64e16c15e71d5e"
},
...
]
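One way to read such a file is sketched below in Python: find the entry for a requested version and return where its build files live, while enforcing that git registries use "git-tree" and filesystem registries use "path". The helper name find_build_files is hypothetical, only the two version fields named in this RFC are checked, and defaulting "port-version" to 0 is an assumption.

```python
# Only the version fields this RFC names explicitly; others may exist.
VERSION_FIELDS = ("version-string", "version")
BACKEND_FOR_KIND = {"git": "git-tree", "filesystem": "path"}


def find_build_files(entries, version, port_version=0, registry_kind="git"):
    """Return (backend, location) for a version, or None if it is not listed."""
    backend = BACKEND_FOR_KIND[registry_kind]
    for entry in entries:
        entry_version = next((entry[f] for f in VERSION_FIELDS if f in entry), None)
        # Assumption: a missing "port-version" is treated as 0.
        if entry_version == version and entry.get("port-version", 0) == port_version:
            if backend not in entry:
                raise ValueError(f"{registry_kind} registries require '{backend}'")
            return backend, entry[backend]
    return None


ogre = [
    {"version-string": "1.12.7",
     "git-tree": "466e96fd2e17dd2453aa31dc0bc61bdcf53e7f61"},
    {"version-string": "1.12.1", "port-version": 1,
     "git-tree": "0de81b4f7e0ec24966e929c2ea64e16c15e71d5e"},
]
print(find_build_files(ogre, "1.12.1", port_version=1))
```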
Filesystem Registry Databases
Filesystem registries are the simplest possible registry;
they have a port_versions
directory at the top-level, which contains the registry database.
It's expected that the filesystem registry would have a filesystem backing store:
something like the existing ports
directory, except with separate versions.
The tool does not mandate a specific layout for the ports tree,
as we are treating the ports tree as an implementation detail of the registry;
it's simply a way to get the files for a port.
As an example, let's assume that the registry is laid out something like this:
<registry>/
    port_versions/
        baseline.json
        a-/
            abseil.json
            asmjit.json
        o-/
            ogre.json
    ports/
        a-/
            abseil/
                2020-03-03_7/
                    vcpkg.json
                    portfile.cmake
                    ...
                2020-03-03_8/
                    vcpkg.json
                    portfile.cmake
                    ...
                ...
            asmjit/
                2020-05-08/
                    CONTROL
                    portfile.cmake
                    ...
                2020-07-22/
                    vcpkg.json
                    portfile.cmake
                    ...
        o-/
            ogre/
                1.12.7/
                    ...
                1.12.1/
                    ...
        ...
    ...
Then, let's look at updating asmjit
to latest.
The current manifest file, in asmjit/2020-07-22/vcpkg.json
looks like:
{
"name": "asmjit",
"version-string": "2020-07-22",
"description": "Complete x86/x64 JIT and Remote Assembler for C++",
"homepage": "https://github.com/asmjit/asmjit",
"supports": "!arm"
}
while the current port_versions/a-/asmjit.json
looks like:
[
{
"version-string": "2020-07-22",
"path": "$/ports/a-/asmjit/2020-07-22"
},
{
"version-string": "2020-05-08",
"path": "$/ports/a-/asmjit/2020-05-08"
}
]
with port_versions/baseline.json
looking like:
{
"default": {
...,
"asmjit": { "baseline": "2020-07-22" },
...
}
}
and we'd like to update to 2020-10-08.
We should first copy the existing implementation to a new folder:
$ cp -r ports/a-/asmjit/2020-07-22 ports/a-/asmjit/2020-10-08
then, we'll make the edits required to ports/a-/asmjit/2020-10-08
to update to latest.
We should then update port_versions/a-/asmjit.json:
[
{
"version-string": "2020-10-08",
"path": "$/ports/a-/asmjit/2020-10-08"
},
{
"version-string": "2020-07-22",
"path": "$/ports/a-/asmjit/2020-07-22"
},
{
"version-string": "2020-05-08",
"path": "$/ports/a-/asmjit/2020-05-08"
}
]
and update port_versions/baseline.json:
{
"default": {
...,
"asmjit": { "baseline": "2020-10-08" },
...
}
}
and we're done 😊.
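The two database edits in this walkthrough (prepending the new version and repointing the baseline) could be scripted roughly as follows. This is a sketch on in-memory data only; add_version is a hypothetical helper, it assumes "version-string" entries, and it leaves copying the port files and writing the JSON back to disk to the caller.

```python
import copy


def add_version(entries, baselines, port, version, path, baseline="default"):
    """Return updated copies of a database entry list and the baseline object."""
    if any(entry.get("version-string") == version for entry in entries):
        raise ValueError(f"{port} {version} is already in the database")
    # Entries are additive: the new version is prepended, nothing is removed.
    new_entries = [{"version-string": version, "path": path}] + copy.deepcopy(entries)
    new_baselines = copy.deepcopy(baselines)
    new_baselines.setdefault(baseline, {})[port] = {"baseline": version}
    return new_entries, new_baselines


entries = [
    {"version-string": "2020-07-22", "path": "$/ports/a-/asmjit/2020-07-22"},
    {"version-string": "2020-05-08", "path": "$/ports/a-/asmjit/2020-05-08"},
]
baselines = {"default": {"asmjit": {"baseline": "2020-07-22"}}}
new_entries, new_baselines = add_version(
    entries, baselines, "asmjit", "2020-10-08", "$/ports/a-/asmjit/2020-10-08")
print(new_entries[0], new_baselines["default"]["asmjit"])
```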
Git Registry Databases
Git registries are not quite as simple as filesystem registries,
but they're still pretty simple, and are likely to be the most common:
the default registry is a git registry, for example.
There is not a specific way the tool requires one to lay out the backing store,
as long as it's possible to get an object hash that corresponds to a checked-in git tree
of the build information.
This allows, for example, the current vcpkg default registry way of laying out ports,
where the latest version of a port <P> is at ports/<P>,
and it also allows for any number of other designs.
One interesting design, for example,
is having an old-ports branch which is updated whenever someone wants to backfill versions;
then, one could push the old version to the old-ports branch,
and then update the HEAD branch with the git tree of the old version in port_versions/p-/<P>.
As above, we want to update asmjit
to latest; let's assume we're working in the default vcpkg registry
(the https://github.com/microsoft/vcpkg repository):
The current manifest file for asmjit
looks like:
{
"name": "asmjit",
"version-string": "2020-07-22",
"description": "Complete x86/x64 JIT and Remote Assembler for C++",
"homepage": "https://github.com/asmjit/asmjit",
"supports": "!arm"
}
and the current port_versions/a-/asmjit.json
looks like:
[
{
"version-string": "2020-07-22",
"git-tree": "fa0c36ba15b48959ab5a2df3463299e1d2473b6f"
}
]
Now, let's update it to the latest version:
{
"name": "asmjit",
"version-string": "2020-10-08",
"description": "Complete x86/x64 JIT and Remote Assembler for C++",
"homepage": "https://github.com/asmjit/asmjit",
"supports": "!arm"
}
and make the proper edits to the portfile.cmake. Then, let's commit the changes:
> git add ./ports/asmjit
> git commit -m "[asmjit] update asmjit to 2020-10-08"
In git-tree
mode, one needs to commit the new version of the port to get the git tree hash;
we use git rev-parse
to do so:
> git rev-parse HEAD:ports/asmjit
2bb51d8ec8b43bb9b21032185ca8123da10ecc6c
and then modify port_versions/a-/asmjit.json
as follows:
[
{
"version-string": "2020-10-08",
"git-tree": "2bb51d8ec8b43bb9b21032185ca8123da10ecc6c"
},
{
"version-string": "2020-07-22",
"git-tree": "fa0c36ba15b48959ab5a2df3463299e1d2473b6f"
}
]
Then we can commit and push this new database with:
$ git add port_versions
$ git commit --amend --no-edit
$ git push
Consuming Registries
The vcpkg-configuration.json file from the first registries RFC
is still the same, except that the registries have a slightly different layout.
A <configuration> is still an object with the following fields:
- Optionally, "default-registry": A <registry-implementation> or null
- Optionally, "registries": An array of <registry>s
Additionally, <registry> is still the same;
a <registry-implementation> object, plus the following properties:
- Optionally, "baseline": A named baseline. Defaults to "default".
- Optionally, "packages": An array of <package-name>s
However, <registry-implementation>s are now slightly different:
- <registry-implementation.builtin>:
  - "kind": The string "builtin"
- <registry-implementation.filesystem>:
  - "kind": The string "filesystem"
  - "path": A path
- <registry-implementation.git>:
  - "kind": The string "git"
  - "repository": A URI
The "packages" fields of distinct registries must be disjoint,
and each <registry> must have the "packages" property,
since otherwise there's no point.
As an example, a package which uses a different default registry, and a different registry for cppitertools, might look like the following:
{
"default-registry": {
"kind": "filesystem",
"path": "vcpkg-ports"
},
"registries": [
{
"kind": "builtin",
"packages": [ "cppitertools" ]
}
]
}
This will install fmt from <directory-of-vcpkg-configuration.json>/vcpkg-ports,
and cppitertools from the registry that ships with vcpkg.
Notably, this does not replace behavior up the tree -- only the vcpkg-configuration.jsons
for the current invocation do anything.
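The selection rule can be sketched in Python as follows: a port comes from the registry whose "packages" list names it, and otherwise from the default registry. The names validate_registries and registry_for are hypothetical, and falling back to the builtin registry when "default-registry" is absent is an assumption not stated in this section.

```python
BUILTIN = {"kind": "builtin"}


def validate_registries(registries):
    """Check that every registry lists packages and that no package is listed twice."""
    seen = set()
    for registry in registries:
        packages = registry.get("packages")
        if not packages:
            raise ValueError("each registry must have a non-empty 'packages' array")
        overlap = seen.intersection(packages)
        if overlap:
            raise ValueError(f"packages listed in more than one registry: {sorted(overlap)}")
        seen.update(packages)


def registry_for(config, port):
    """Return the registry implementation that should provide the given port."""
    registries = config.get("registries", [])
    validate_registries(registries)
    for registry in registries:
        if port in registry["packages"]:
            return registry
    # Assumption: no "default-registry" key means the builtin registry;
    # an explicit null is returned as None (no default registry).
    return config.get("default-registry", BUILTIN)


config = {
    "default-registry": {"kind": "filesystem", "path": "vcpkg-ports"},
    "registries": [{"kind": "builtin", "packages": ["cppitertools"]}],
}
print(registry_for(config, "fmt")["kind"], registry_for(config, "cppitertools")["kind"])
```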
Filesystem Registries
A filesystem registry takes on the form:
- "kind": The string "filesystem"
- "path": The path to the filesystem registry's root, i.e. the directory containing the port_versions directory.
{
"kind": "filesystem",
"path": "vcpkg-registry"
}
Unlike git registries, where there's quite a bit of interesting stuff going on,
there isn't much stuff to do with filesystem registries.
We simply use the registry database at <registry root>/port_versions
to get information about ports.
Git Registries
A git registry takes on the form:
- "kind": The string "git"
- "repository": The URL at which the git repository lives. May be any kind of URL that git understands.
{
"kind": "git",
"repository": "https://github.com/microsoft/vcpkg"
}
Whenever the first vcpkg command is run with a git registry,
vcpkg notes down the exact commit that HEAD points to at the time of the run in the vcpkg-lock.json
file.
This will be used as the commit which vcpkg takes the "default"
baseline from,
and vcpkg will only update that commit when vcpkg update
is run.
Since the "versions" field is strictly additive, we don't consider older refs than HEAD.
We update the repository at some reasonable interval;
most likely, whenever a command is run that will change the set of installed ports.
vcpkg-lock.json
This file will contain metadata that we need to save across runs, to allow us to keep a "state-of-the-world" that doesn't change unless one explicitly asks for it to change. This means that, even across different machines, the same registries will be used. We will also be able to write down version resolution in this file as soon as that feature is added.
It is recommended that one adds this vcpkg-lock.json
to one's version control.
This file is machine generated, and it is not specified how it's laid out;
however, for purposes of this RFC, we will define how it relates to git registries.
In vcpkg-lock.json, in the top level object,
there will be a "registries" property that is an object.
This object will contain a "git" field, which is an array of git-registry objects,
that contain:
- "repository": The "repository" field from the git registry object
- "baseline": The name of the baseline that we've used
- "baseline-ref": The ref which we've gotten the specific baseline from.
For example, a vcpkg-lock.json
might look like:
{
"registries": {
"git": [
{
"repository": "https://github.com/microsoft/vcpkg",
"baseline": "default",
"baseline-ref": "6185aa76504a5025f36754324abf307cc776f3da"
}
]
}
}
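The "record once, then reuse" behavior of the lock file might be sketched like this in Python. The function locked_baseline_ref is hypothetical, matching entries on the pair of repository and baseline name is an assumption, and resolve_head stands in for whatever vcpkg uses to find the commit that HEAD currently points to.

```python
def locked_baseline_ref(lock, repository, baseline, resolve_head):
    """Return the locked ref for a git registry, recording it on first use."""
    git_entries = lock.setdefault("registries", {}).setdefault("git", [])
    for entry in git_entries:
        # Assumption: an entry is identified by its repository and baseline name.
        if entry["repository"] == repository and entry["baseline"] == baseline:
            return entry["baseline-ref"]
    # First run with this registry: note down the commit HEAD points to right now.
    ref = resolve_head(repository)
    git_entries.append(
        {"repository": repository, "baseline": baseline, "baseline-ref": ref})
    return ref


lock = {}
url = "https://github.com/microsoft/vcpkg"
first = locked_baseline_ref(
    lock, url, "default", lambda _: "6185aa76504a5025f36754324abf307cc776f3da")
# A later run sees a newer HEAD, but the locked ref is returned unchanged.
second = locked_baseline_ref(lock, url, "default", lambda _: "0" * 40)
print(first == second)
```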
vcpkg update
You'll notice that once the repository is added the first time,
there is only one way to move the registry to a later commit: deleting the lock file.
We additionally want to add support for the user updating the registry by themselves -
they will be able to do this via the vcpkg update
command.
The vcpkg update command will, for each git registry,
update the registry and repoint the "baseline-ref" field in vcpkg-lock.json to the latest HEAD.
There is no way to update only one git registry to a later date, since versions are strictly additive.
Git Registries: Implementation on Disk
There are two implementations on disk to consider here: the implementation of the registry database, and once we have the database entries for the ports, accessing the port data from the git tree object.
Both of these implementations are placed in the vcpkg cache home (shared by binary caching archives).
On unix, this is located at $XDG_CACHE_HOME/vcpkg if the environment variable exists,
otherwise $HOME/.cache/vcpkg; on Windows, it's located at %LOCALAPPDATA%\vcpkg.
In this document, we use the variable $CACHE_ROOT to refer to this folder.
We will add a new folder, $CACHE_ROOT/registries, which will contain all the data we need.
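A minimal Python sketch of the cache-root rule follows. The function cache_root is hypothetical; it takes the environment as a plain mapping so the example is deterministic, and it treats an empty XDG_CACHE_HOME as unset, which is an assumption borrowed from the XDG convention rather than something this RFC states.

```python
from pathlib import PurePosixPath, PureWindowsPath


def cache_root(env, windows):
    """Compute $CACHE_ROOT from an environment mapping."""
    if windows:
        return PureWindowsPath(env["LOCALAPPDATA"]) / "vcpkg"
    xdg = env.get("XDG_CACHE_HOME")
    if xdg:  # Assumption: an empty value counts as "does not exist".
        return PurePosixPath(xdg) / "vcpkg"
    return PurePosixPath(env["HOME"]) / ".cache" / "vcpkg"


root = cache_root({"HOME": "/home/user"}, windows=False)
print(root / "registries")
```

In real use one would pass os.environ and a platform check instead of the literal values shown here.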
First, we'll discuss the registry database.
Registry Database
At $CACHE_ROOT/registries/git,
we'll create a new git repository root which contains all information from all git registries,
since the hashes should be unique, and this allows for deduplication
across repositories which have the same commits (e.g., for mirrors).
In order to get the data from git registries, we simply fetch the URL of the git registry.
In order to grab a specific database entry from a git registry, git show is used to grab the
file from the right commit: git show <commit id>:port_versions/<first character>-/<portname>.json.
One unfortunate thing about having one directory being used for all vcpkg instances on a machine is
that it's possible to have an issue with concurrency - for example, after fetching the latest HEAD
of https://github.com/microsoft/vcpkg, another vcpkg process might fetch the latest HEAD of
https://github.com/meow/vcpkg before the first vcpkg process has the chance to git rev-parse FETCH_HEAD.
Since the first vcpkg process will run git rev-parse after the second fetch is done,
instead of getting the HEAD of microsoft/vcpkg, it gets the HEAD of meow/vcpkg.
We will solve this by having a mutex file in $CACHE_ROOT/registries/git
that vcpkg locks before any fetches (and unlocks after rev-parsing).
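A Unix-only sketch of this locking discipline, using an advisory file lock from Python's fcntl module, is shown below. The lock file name vcpkg.lock and the helpers registry_lock and fetch_and_resolve are hypothetical, and the fetch and rev_parse callables stand in for the actual git invocations; vcpkg's real mechanism may differ.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def registry_lock(git_dir):
    """Hold an exclusive advisory lock on a mutex file inside git_dir."""
    lock_path = os.path.join(git_dir, "vcpkg.lock")  # hypothetical file name
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Blocks until no other process holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def fetch_and_resolve(git_dir, fetch, rev_parse):
    """Run fetch and rev-parse as one critical section, so FETCH_HEAD cannot change in between."""
    with registry_lock(git_dir):
        fetch()
        return rev_parse()


calls = []
with tempfile.TemporaryDirectory() as git_dir:
    commit = fetch_and_resolve(
        git_dir,
        fetch=lambda: calls.append("fetch"),
        rev_parse=lambda: calls.append("rev-parse") or "abc123",
    )
print(commit, calls)
```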
Accessing Port Data from git-trees
Once we've done version resolution and everything with the database,
we then need to access the port data from the git history.
We will add a new folder, $CACHE_ROOT/registries/git-trees, into which we'll check out the port data.
In this git-trees directory, we will have all of the trees we check out, at their hashes.
For example, the asmjit port data from above will be located at
git-trees/2bb51d8ec8b43bb9b21032185ca8123da10ecc6c.
We will add a mutex file in this git-trees
directory as well which is taken whenever
we are checking out a new git tree.
We wish to allow multiple vcpkg instances to read port data at a time,
and thus we do the check outs semi-atomically - if git-trees/<hash> exists,
then the <hash> must be completely checked out.
vcpkg does this by first checking out to a temporary directory,
and then renaming to the actual hash.
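The check-out-then-rename pattern can be sketched in Python as below. The helper ensure_tree is hypothetical, and the populate callable stands in for the actual git checkout of the tree; the sketch also assumes the caller already holds the git-trees mutex, so no other writer races on the final rename.

```python
import os
import shutil
import tempfile


def ensure_tree(git_trees_dir, tree_hash, populate):
    """Make git-trees/<hash> exist, fully populated, and return its path."""
    final = os.path.join(git_trees_dir, tree_hash)
    if os.path.isdir(final):
        # The directory only ever appears via rename, so it is complete.
        return final
    # Check out into a temporary sibling directory on the same filesystem.
    tmp = tempfile.mkdtemp(prefix=tree_hash + ".tmp-", dir=git_trees_dir)
    try:
        populate(tmp)
        os.rename(tmp, final)
    except BaseException:
        shutil.rmtree(tmp, ignore_errors=True)
        raise
    return final


def fake_checkout(directory):
    # Stand-in for checking out the git tree's files.
    with open(os.path.join(directory, "portfile.cmake"), "w") as handle:
        handle.write("# port build script\n")


with tempfile.TemporaryDirectory() as trees:
    tree = "2bb51d8ec8b43bb9b21032185ca8123da10ecc6c"
    path = ensure_tree(trees, tree, fake_checkout)
    has_portfile = os.path.isfile(os.path.join(path, "portfile.cmake"))
    entries_after = sorted(os.listdir(trees))
print(has_portfile, entries_after)
```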
Future Extensions
The way forward for this is to allow the "builtin"
registry to be a git registry,
in order to support packaging and shipping vcpkg as a binary.
This is currently our plan, although it definitely is still a ways out.
Git registries are an important step on that road,
but are also a good way to support both enterprise,
and experimentation by our users.
They allow us a lot more flexibility than we've had in the past.