vcpkg/docs/specifications/registries.md
ras0219 75522bb1f2
[docs] Improve inter-document linking; add 'latest' links (#16502)
* [docs] Improve inter-document linking; add 'latest' links

* [docs] Apply suggestions from code review

Co-authored-by: nicole mazzuca <mazzucan@outlook.com>

Co-authored-by: Robert Schumacher <roschuma@microsoft.com>
2021-03-11 16:37:49 -08:00


Package Federation: Custom Registries

Note: this is the feature as it was initially specified and does not necessarily reflect the current behavior.

As it is now, vcpkg has over 1400 ports in the default registry (the /ports directory). For the majority of users, this repository of packages is enough. However, many enterprises need to more closely control their dependencies for one reason or another, and this document lays out a method which we will build into vcpkg for exactly that reason.

Background

A registry is simply a set of packages. In fact, there is already a registry in vcpkg: the default one. Package federation, implemented via custom registries, allows one to add new packages, edit existing packages, and have as much or as little control as one likes over the dependencies that one uses. It gives the control over dependencies that an enterprise requires.

How Does the Current Default Registry Work?

Of course, the existing vcpkg tool does have packages in the official, default registry. These packages are described in the ports tree: at the base of the vcpkg install directory, there is a directory named ports, which contains on the order of 1300 directories, one for each package. In each package directory, there are at least two files: a CONTROL or vcpkg.json file, which contains the name, version, description, and features of the package; and a portfile.cmake file, which contains the information on how to download and build the package. There may be other files in a package directory, like patches or usage instructions, but only those two files are needed.

Existing vcpkg Registry-like Features

There are some existing features in vcpkg that act somewhat like a custom registry. The most obvious is overlay ports: this feature allows you to specify any number of directories as "overlays", each of which either contains a package definition directly or contains some number of package directories. These overlays are used instead of the ports tree for packages that exist in both places, and are specified exclusively on the command line. Unfortunately, this means that if one installs a package from overlay ports that does not exist in the ports tree, one must pass these overlays to every vcpkg installation command.

There is also the less obvious "feature" which works by virtue of the ports tree being user-editable: one can always edit the ports tree on their own machine, and can even fork vcpkg and publish their own ports tree. Unfortunately, this then means that any updates to the source tree require merges, as opposed to being able to fast-forward to the newest sources.

Why Registries?

There are many reasons to want custom registries; however, the most important reasons are:

  • Legal requirements: a company like Microsoft or Google needs the ability to strictly control the code that goes into their products, making certain that they are following the licenses strictly.
    • There have been examples in the past where a library which is licensed under certain terms contains code which is not legally allowed to be licensed under those terms (see this example, where a person tried to merge Microsoft-owned, Apache-licensed code into the GPL-licensed libstdc++).
  • Technical requirements: a company may wish to run their own tests on the packages they ship, such as fuzzing.
  • Other requirements: an organization may wish to strictly control its dependencies for a myriad of other reasons.
  • Newer versions: vcpkg may not always be up to date for every library in our registry, and an organization may require a newer version than we ship; with a custom registry, they can very easily update this package and have the version that they want.
  • Port modifications: vcpkg has somewhat strict policies on port modifications, and an organization may wish to make different modifications than we do. A custom registry may allow that organization to make certain that the package works on triplets that our team does not test as extensively.
  • Testing: just like port modifications, if a team wants to do specific testing on triplets they care about, they can do so via their custom registry.

Then, there is the question of why vcpkg needs a new solution for custom registries, beyond the existing overlay ports feature. There are two big reasons: the first is to allow a project to define the registries that it uses for its dependencies, and the second is the clear advantage in the user experience of the vcpkg tool. If a project requires specific packages to come from specific registries, it can say so without worrying that a user accidentally omits the overlay ports part of a command. Additionally, beyond making overlay ports easier to use, custom registries allow for more complex and useful infrastructure around registries. In the initial custom registry implementation, we will allow overlay-ports-style paths as well as git repositories, which means that people can run and use custom registries without writing their own infrastructure for distributing that registry.

It is the intention of vcpkg to be the most user-friendly package manager for C++, and this allows us to fulfill that intention even further. Instead of having to write --overlay-ports=path/to/overlay for every command one runs, or setting the environment variable VCPKG_OVERLAY_PORTS, one can simply write vcpkg install and the registries will be taken care of. Instead of having to use git submodules, or custom registry code for every project, one can write and run the infrastructure in one place, and every project that uses that registry requires only a few lines of JSON.

Specification

We will be adding a new file that vcpkg understands - vcpkg-configuration.json. The way that vcpkg will find this file is different depending on what mode vcpkg is in: in classic mode, vcpkg finds this file alongside the vcpkg binary, in the root directory. In manifest mode, vcpkg finds this file alongside the manifest. For the initial implementation, this is all vcpkg will look for; however, in the future, vcpkg will walk the tree and include configuration all along the way: this allows for overriding defaults. The specific algorithm for applying this is not yet defined, since currently only one vcpkg-configuration.json is allowed.
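As a rough illustration of the lookup rule just described, here is a minimal Python sketch. It is not vcpkg's implementation; the function name `find_configuration` and its parameters are hypothetical, and it assumes the caller already knows the vcpkg root and, in manifest mode, the directory that holds vcpkg.json.

```python
from pathlib import Path
from typing import Optional

CONFIG_NAME = "vcpkg-configuration.json"


def find_configuration(vcpkg_root: Path, manifest_dir: Optional[Path]) -> Optional[Path]:
    """Return the one configuration file that would be consulted, or None.

    manifest_dir is the directory containing vcpkg.json in manifest mode,
    and None in classic mode (both parameters are illustrative assumptions).
    """
    # Manifest mode: next to the manifest. Classic mode: next to the binary.
    base = manifest_dir if manifest_dir is not None else vcpkg_root
    candidate = base / CONFIG_NAME
    # Only this single location is checked; no walking up the tree.
    return candidate if candidate.is_file() else None
```

Note that in manifest mode this sketch never falls back to the file in the vcpkg root, matching the statement that only one vcpkg-configuration.json is considered.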

The only thing allowed in a vcpkg-configuration.json is a <configuration> object.

A <configuration> is an object:

  • Optionally, "default-registry": A <registry-implementation> or null
  • Optionally, "registries": An array of <registry>s

Since this is the first RFC that adds anything to this file, these are, as of now, the only properties that can live in that object.

A <registry-implementation> is an object matching one of the following:

  • <registry-implementation.builtin>:
    • "kind": The string "builtin"
  • <registry-implementation.directory>:
    • "kind": The string "directory"
    • "path": A path
  • <registry-implementation.git>:
    • "kind": The string "git"
    • "repository": A URI
    • Optionally, "path": An absolute path into the git repository
    • Optionally, "ref": A git reference

A <registry> is a <registry-implementation> object, plus the following properties:

  • Optionally, "scopes": An array of <package-name>s
  • Optionally, "packages": An array of <package-name>s

The "packages" and "scopes" fields of distinct registries must be disjoint, and each <registry> must have at least one of the "scopes" and "packages" properties, since otherwise there's no point.
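The two constraints above could be checked along the following lines. This is a sketch under assumptions, not vcpkg's validator: the function name `validate_registries` is hypothetical, registries are assumed to be already-parsed JSON objects, and "disjoint" is read as "no package name and no scope may be claimed by two different registries".

```python
from typing import Any, Dict, List


def validate_registries(registries: List[Dict[str, Any]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []
    # For each field, remember which registry first claimed each name.
    owners: Dict[str, Dict[str, int]] = {"packages": {}, "scopes": {}}

    for index, registry in enumerate(registries):
        if "packages" not in registry and "scopes" not in registry:
            errors.append(f'registry {index} has neither "packages" nor "scopes"')
        for field in ("packages", "scopes"):
            for name in registry.get(field, []):
                first = owners[field].setdefault(name, index)
                if first != index:
                    errors.append(
                        f"{field} entry {name!r} is claimed by registries {first} and {index}"
                    )
    return errors
```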

As an example, a package which uses a different default registry, and a different registry for boost, might look like the following:

{
  "default-registry": {
    "kind": "directory",
    "path": "vcpkg-ports"
  },
  "registries": [
    {
      "kind": "git",
      "repository": "https://github.com/boostorg/vcpkg-ports",
      "ref": "v1.73.0",
      "scopes": [ "boost" ]
    },
    {
      "kind": "builtin",
      "packages": [ "cppitertools" ]
    }
  ]
}

Given this configuration, a package that matches no "packages" or "scopes" entry, such as fmt, will be installed from <directory-of-vcpkg.json>/vcpkg-ports; cppitertools will come from the registry that ships with vcpkg; and any boost dependencies will come from https://github.com/boostorg/vcpkg-ports. Notably, this does not replace behavior up the tree -- only the vcpkg-configuration.jsons for the current invocation do anything.

Behavior

When a vcpkg command requires the installation of dependencies, it will generate the initial list of dependencies from the package, and then run the following algorithm on each dependency:

  1. Figure out which registry the package should come from by doing the following:
    1. If there is a registry in the registry set which contains the dependency name in the "packages" array, then use that registry.
    2. For every scope, in order from most specific to least, if there is a registry in the registry set which contains that scope in the "scopes" array, then use that registry. (For example, for "cat.meow.cute", check first for "cat.meow.cute", then "cat.meow", then "cat").
    3. If the default registry is not null, use that registry.
    4. Else, error.
  2. Then, add that package's dependencies to the list of packages to find, and repeat for the next dependency.

vcpkg will also rerun this algorithm whenever an install is run with different configuration.
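The selection steps above can be sketched in Python as follows. This is an illustrative reading of the algorithm, not vcpkg's code: the names `scope_candidates`, `select_registry`, and `resolve_all` are hypothetical, scopes are assumed to be dot-separated as in the "cat.meow.cute" example, and `dependencies_of` stands in for whatever reads a port's dependency list.

```python
from typing import Any, Callable, Dict, List, Optional

Registry = Dict[str, Any]


def scope_candidates(name: str) -> List[str]:
    """Scopes to try for a package name, most specific first."""
    parts = name.split(".")
    return [".".join(parts[:n]) for n in range(len(parts), 0, -1)]


def select_registry(
    name: str, registries: List[Registry], default_registry: Optional[Registry]
) -> Registry:
    # Step 1.1: an exact match in some registry's "packages" array wins.
    for registry in registries:
        if name in registry.get("packages", []):
            return registry
    # Step 1.2: otherwise try scopes, from most specific to least.
    for scope in scope_candidates(name):
        for registry in registries:
            if scope in registry.get("scopes", []):
                return registry
    # Steps 1.3 and 1.4: fall back to the default registry, or fail.
    if default_registry is not None:
        return default_registry
    raise LookupError(f"no registry provides {name!r}")


def resolve_all(
    initial: List[str],
    registries: List[Registry],
    default_registry: Optional[Registry],
    dependencies_of: Callable[[str, Registry], List[str]],
) -> Dict[str, Registry]:
    """Step 2: a worklist over the dependency graph."""
    resolved: Dict[str, Registry] = {}
    pending = list(initial)
    while pending:
        name = pending.pop()
        if name in resolved:
            continue
        resolved[name] = select_registry(name, registries, default_registry)
        pending.extend(dependencies_of(name, resolved[name]))
    return resolved
```

With the example configuration from the Specification section, `select_registry("cppitertools", ...)` picks the builtin registry, a name under the "boost" scope picks the git registry, and anything else falls through to the default.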

How Registries are Laid Out

There are three kinds of registries, but they only differ in how the registry gets onto one's filesystem. Once the registry is there, package lookup runs the same, with each kind having its own way of defining its own root.

In order to find a port meow in a registry with root R, vcpkg first sees if R/meow exists; if it does, then the port root is R/meow. Otherwise, see if R/m- exists; if it does, then the port root is R/m-/meow. (note: this algorithm may be extended further in the future).

For example, given the following registry root:

R/
  abseil/...
  b-/
    boost/...
    boost-build/...
    banana/...
  banana/...

The port root for abseil is R/abseil; the port root for boost is R/b-/boost; the port root for banana is R/banana (although this duplication is not recommended).
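The lookup rule can be written as a short Python sketch. The function name `find_port_root` is hypothetical, and the sketch follows the text literally: if the letter directory exists, the port root is reported as R/m-/meow without further checks.

```python
from pathlib import Path
from typing import Optional


def find_port_root(registry_root: Path, port: str) -> Optional[Path]:
    """Locate the directory for `port` under `registry_root`, or None."""
    # First choice: R/<port> directly under the registry root.
    direct = registry_root / port
    if direct.is_dir():
        return direct
    # Second choice: R/<first letter>-/<port>. Port names never end in '-',
    # so the letter directory cannot collide with a real port.
    bucket = registry_root / f"{port[0]}-"
    if bucket.is_dir():
        return bucket / port
    return None
```

Applied to the tree above, this returns R/abseil for abseil, R/b-/boost for boost, and R/banana for banana, because the direct location is checked first.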

The reason we are making this change to allow more levels in the ports tree is that ~1300 ports are hard to look through in a tree view, and this allows us to see only the ports we're interested in. Additionally, no port name may end in a -, so this means that these port subdirectories will never intersect with actual ports. Additionally, since we use only ASCII for port names, we don't have to worry about graphemes vs. code units vs. code points -- in ASCII, they are equivalent.

Let's now look at how different registry kinds work:

<registry.builtin>

For a <registry.builtin>, there is no configuration required. The registry root is simply <vcpkg-root>/ports.

<registry.directory>

For a <registry.directory>, it is again fairly simple. Given $path, the value of the "path" property, the registry root is determined as follows:

  • If $path is absolute, then the registry root is $path.
  • If $path is drive-relative (only important on Windows), the registry root is (drive of vcpkg.json)/$path.
  • If $path is relative, the registry root is (directory of vcpkg.json)/$path.

Note that the path to vcpkg.json is not canonicalized; it is used exactly as it is seen by vcpkg.
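A minimal sketch of these three cases, using Python's pure path classes so that nothing touches the filesystem and nothing is canonicalized. The function name `directory_registry_root` is hypothetical, and "drive-relative" is taken to mean a Windows path with a root but no drive, such as \ports.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def directory_registry_root(manifest_dir: PurePath, path: str) -> PurePath:
    """Compute the registry root for a "directory" registry.

    manifest_dir is the directory holding vcpkg.json, exactly as seen by the
    caller; its concrete type decides whether Windows or POSIX rules apply.
    """
    flavour = type(manifest_dir)
    p = flavour(path)
    if p.is_absolute():
        return p
    if p.root and not p.drive:
        # Drive-relative (Windows only): keep the manifest's drive.
        return flavour(manifest_dir.drive) / p
    # Plain relative path: resolve against the manifest directory.
    # Pure paths do not collapse "..", so the result is not canonicalized.
    return manifest_dir / p


# Usage with an assumed manifest location of C:\proj\vcpkg.json:
print(directory_registry_root(PureWindowsPath("C:/proj"), "vcpkg-ports"))
print(directory_registry_root(PureWindowsPath("C:/proj"), "/ports"))
print(directory_registry_root(PurePosixPath("/home/me/proj"), "../ports"))
```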

<registry.git>

This registry is the most complex. We would like to cache existing registries, but we don't want to ignore new updates to the registry. It is the opinion of the author that we want to find more updates than not, so we will update the registry whenever the vcpkg.json or vcpkg-configuration.json is modified. We will do so by keeping a sha512 of the vcpkg.json and vcpkg-configuration.json inside the vcpkg-installed directory.
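One way to picture the update check is the sketch below. It is a guess at the mechanics rather than vcpkg's behavior: the text does not say how or where the hash is stored, so the stamp file name `registry-inputs.sha512`, the function names, and the choice to hash each file separately are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path

TRACKED_FILES = ("vcpkg.json", "vcpkg-configuration.json")
STAMP_NAME = "registry-inputs.sha512"  # hypothetical file name


def inputs_fingerprint(manifest_dir: Path) -> str:
    """One line per tracked file: its name and SHA-512, or '-' if absent."""
    lines = []
    for name in TRACKED_FILES:
        file = manifest_dir / name
        digest = hashlib.sha512(file.read_bytes()).hexdigest() if file.is_file() else "-"
        lines.append(f"{name} {digest}")
    return "\n".join(lines)


def registry_needs_update(manifest_dir: Path, installed_dir: Path) -> bool:
    """True if either tracked file changed since the last recorded run."""
    stamp = installed_dir / STAMP_NAME
    current = inputs_fingerprint(manifest_dir)
    if stamp.is_file() and stamp.read_text() == current:
        return False
    # Record the new fingerprint so the next unchanged run skips the update.
    installed_dir.mkdir(parents=True, exist_ok=True)
    stamp.write_text(current)
    return True
```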

We will download the specific ref of the repository to a central location (and update as needed), and the root will be either: <path to repository>, if the "path" property is not defined, or else <path to repository>/<path property> if it is defined. The "path" property must be absolute, without a drive, and will be treated as relative to the path to the repository. For example:

{
  "kind": "git",
  "repository": "https://github.com/microsoft/vcpkg",
  "path": "/ports"
}

is the correct way to refer to the registry built in to vcpkg, at the latest version.

The following are all incorrect:

{
  "$reason": "path can't be drive-absolute",
  "kind": "git",
  "repository": "https://github.com/microsoft/vcpkg",
  "path": "F:/ports"
}

{
  "$reason": "path can't be relative",
  "kind": "git",
  "repository": "https://github.com/microsoft/vcpkg",
  "path": "ports"
}

{
  "$reason": "path _really_ can't be relative like that",
  "kind": "git",
  "repository": "https://github.com/microsoft/vcpkg",
  "path": "../../meow/ports"
}
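The rule these examples illustrate can be sketched as a small Python check. The function name `git_registry_root` is hypothetical, the checkout location passed in is assumed to be known already, and only the constraints stated above (absolute, no drive) are enforced.

```python
from pathlib import PurePosixPath, PureWindowsPath
from typing import Optional


def git_registry_root(repository_dir: PurePosixPath, path: Optional[str] = None) -> PurePosixPath:
    """Map the optional "path" property onto a checked-out repository."""
    if path is None:
        return repository_dir
    # Reject anything that carries a Windows drive, such as "F:/ports".
    if PureWindowsPath(path).drive:
        raise ValueError("path can't be drive-absolute")
    # The property must be written as absolute, e.g. "/ports".
    if not path.startswith("/"):
        raise ValueError("path can't be relative")
    # Treat it as relative to the repository checkout.
    return repository_dir / path.lstrip("/")
```

With an assumed checkout at /cache/registries/vcpkg, the value "/ports" maps to /cache/registries/vcpkg/ports, while each of the three incorrect examples raises ValueError.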