On nginx reload or binary upgrade, an attempt is made to inherit listen sockets
from the previous configuration. Previously, no check for socket type was made
and the inherited socket could have the wrong type. On binary upgrade, socket
type was not detected at all. Wrong socket type could lead to errors on that
socket due to different logic and unsupported syscalls. For example, a UDP
socket, inherited as TCP, lead to the following error after arrival of a
datagram: "accept() failed (102: Operation not supported on socket)".
Iterating through all connections takes a lot of CPU time, especially
with large number of worker connections configured. As a result
nginx processes used to consume CPU time during graceful shutdown.
To mitigate this we now only do a full scan for idle connections when
shutdown signal is received.
Transitions of connections to idle ones are now expected to be
avoided if the ngx_exiting flag is set. The upstream keepalive module
was modified to follow this.
If nginx was used under OpenVZ and a container with nginx was suspended
and resumed, configuration tests started to fail because of EADDRINUSE
returned from listen() instead of bind():
# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: [emerg] listen() to 0.0.0.0:80, backlog 511 failed (98: Address already in use)
nginx: configuration file /etc/nginx/nginx.conf test failed
With this change EADDRINUSE errors returned by listen() are handled
similarly to errors returned by bind(), and configuration tests work
fine in the same environment:
# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
More details about OpenVZ suspend/resume bug:
https://bugzilla.openvz.org/show_bug.cgi?id=2470
When configured, an individual listen socket on a given address is
created for each worker process. This allows to reduce in-kernel lock
contention on configurations with high accept rates, resulting in better
performance. As of now it works on Linux and DragonFly BSD.
Note that on Linux incoming connection requests are currently tied up
to a specific listen socket, and if some sockets are closed, connection
requests will be reset, see https://lwn.net/Articles/542629/. With
nginx, this may happen if the number of worker processes is reduced.
There is no such problem on DragonFly BSD.
Based on previous work by Sepherosa Ziehau and Yingqi Lu.
Previously, this function checked for connection local address existence
and returned error if it was missing. Now a new address is assigned in this
case making it possible to call this function not only for accepted connections.
In theory, this can provide a bit better distribution of latencies.
Also it simplifies the code, since ngx_queue_t is now used instead
of custom implementation.
This isn't really important as configuration testing shortly ends with
a process termination which will free all sockets, though Coverity
complains.
Prodded by Coverity (CID 400872).
Linux returns EOPNOTSUPP for non-TCP sockets and ENOPROTOOPT for TCP
sockets, because getsockopt(TCP_FASTOPEN) is not implemented so far.
While there, lower the log level from ALERT to NOTICE to match other
getsockopt() failures.
Signed-off-by: Piotr Sikora <piotr@cloudflare.com>
Backed out 05a56ebb084a, as it turns out that kernel can return connections
without any delay if syncookies are used. This basically means we can't
assume anything about connections returned with deferred accept set.
To solve original problem the 05a56ebb084a tried to solve, i.e. to don't
wait longer than needed if a connection was accepted after deferred accept
timeout, this patch changes a timeout set with setsockopt(TCP_DEFER_ACCEPT)
to 1 second, unconditionally. This is believed to be enough for speed
improvements, and doesn't imply major changes to timeouts used.
Note that before 2.6.32 connections were dropped after a timeout. Though
it is believed that 1s is still appropriate for kernels before 2.6.32,
as previously tcp_synack_retries controlled the actual timeout and 1s results
in more than 1 minute actual timeout by default.
Recent Linux versions started to return EOPNOTSUPP to getsockopt() calls
on unix sockets, resulting in log pollution on binary upgrade. Such errors
are silently ignored now.
Several warnings silenced, notably (ngx_socket_t) -1 is now checked
on socket operations instead of -1, as ngx_socket_t is unsigned on win32
and gcc complains on comparison.
With this patch, it's now possible to compile nginx using mingw gcc,
with options we normally compile on win32.
The call to ngx_sock_ntop() in ngx_connection_local_sockaddr() might be
performed with the uninitialized "len" variable. The fix is to initialize
variable to the size of corresponding socket address type.
The issue was introduced in commit 05ba5bce31e0.
On Linux, sockaddr length is required to process unix socket addresses properly
due to unnamed sockets (which don't have sun_path set at all) and abstract
namespace sockets.
On Win32 platforms 0 is used to indicate errors in file operations, so
comparing against -1 is not portable.
This was not much of an issue in patched code, since only ngx_fd_info() test
is actually reachable on Win32 and in worst case it might result in bogus
error log entry.
Patch by Piotr Sikora.
And corresponding variable $connections_waiting was added.
Previously, waiting connections were counted as the difference between
active connections and the sum of reading and writing connections.
That made it impossible to count more than one request in one connection
as reading or writing (as is the case for SPDY).
Also, we no longer count connections in handshake state as waiting.
The c->single_connection was intended to be used as lock mechanism
to serialize modifications of request object from several threads
working with client and upstream connections. The flag is redundant
since threads in nginx have never been used that way.
There is a general consensus that this change results in better
consistency between different operating systems and differently
tuned operating systems.
Note: this changes the width and meaning of the ipv6only field
of the ngx_listening_t structure. 3rd party modules that create
their own listening sockets might need fixing.
NetBSD 5.0+ has SO_ACCEPTFILTER support merged from FreeBSD, and having
accept filter check in FreeBSD-specific ngx_freebsd_config.h prevents it
from being used on NetBSD. Therefore move the check into configure (and
do the same for Linux-specific TCP_DEFER_ACCEPT, just to be in line).