Previously, all upstream DNS entries would be immediately re-resolved
on config reload. With a large number of upstreams, this creates
a spike of DNS resolution requests. These spikes can overwhelm the
DNS server or cause drops on the network.
This patch retains the TTL of previous resolutions across reloads
by copying each upstream's name's expiry time across configuration
cycles. As a result, no additional resolutions are needed.
After configuration is reloaded, it may take some time for the
re-resolvable upstream servers to resolve and become available
as peers. During this time, client requests might get dropped.
Such servers are now pre-resolved using the "cache" of already
resolved peers from the old shared memory zone.
Specifying the upstream server by a hostname together with the
"resolve" parameter will make the hostname to be periodically
resolved, and upstream servers added/removed as necessary.
This requires a "resolver" at the "http" configuration block.
The "resolver_timeout" parameter also affects when the failed
DNS requests will be attempted again. Responses with NXDOMAIN
will be attempted again in 10 seconds.
Upstream has a configuration generation number that is incremented each
time servers are added/removed to the primary/backup list. This number
is remembered by the peer.init method, and if peer.get detects a change
in configuration, it returns NGX_BUSY.
Each server has a reference counter. It is incremented by peer.get and
decremented by peer.free. When a server is removed, it is removed from
the list of servers and is marked as "zombie". The memory allocated by
a zombie peer is freed only when its reference count becomes zero.
Co-authored-by: Roman Arutyunyan <arut@nginx.com>
Co-authored-by: Sergey Kandaurov <pluknet@nginx.com>
Co-authored-by: Vladimir Homutov <vl@nginx.com>
The shared objects should generally be allocated from shared memory.
While peers->name and the data it points to allocated from cf->pool
happened to work on UNIX, it broke on Windows. On UNIX this worked
only because the shared memory zone for upstreams is re-created for
every new configuration.
But on Windows, a worker process does not inherit the address space
of the master process, so the peers->name pointed to data allocated
from cf->pool by the master process, and was invalid.