On Windows, connect() errors are only reported via exceptfds descriptor set
from select(). Previously exceptfds was set to NULL, and connect() errors
were not detected at all, so connects to closed ports were waiting till
a timeout occurred.
Since ongoing connect() means that there will be a write event active,
except descriptor set is copied from the write one. While it is possible
to construct except descriptor set as a concatenation of both read and write
descriptor sets, this looks unneeded.
With this change, connect() errors are properly detected now when using
select(). Note well that it is not possible to detect connect() errors with
WSAPoll() (see https://daniel.haxx.se/blog/2012/10/10/wsapoll-is-broken/).
WSAPoll() is only available with Windows Vista and newer (and only
available during compilation if _WIN32_WINNT >= 0x0600). To make
sure the code works with Windows XP, we do not redefine _WIN32_WINNT,
but instead load WSAPoll() dynamically if it is not available during
compilation.
Also, sockets are not guaranteed to be small integers on Windows.
So an index array is used instead of NGX_USE_FD_EVENT to map
events to connections.
Previously, the code incorrectly assumed "ngx_event_t *" elements
instead of "struct pollfd".
This is mostly cosmetic change, as this code is never called now.
Previously, when using proxy_upload_rate and proxy_download_rate, the buffer
size for reading from a socket could be reduced as a result of rate limiting.
For connection-oriented protocols this behavior is normal since unread data will
normally be read at the next iteration. But for datagram-oriented protocols
this is not the case, and unread part of the datagram is lost.
Now buffer size is not limited for datagrams. Rate limiting still works in this
case by delaying the next reading event.
A shared connection does not own its file descriptor, which means that
ngx_handle_read_event/ngx_handle_write_event calls should do nothing for it.
Currently the c->shared flag is checked in several places in the stream proxy
module prior to calling these functions. However it was not done everywhere.
Missing checks could lead to calling
ngx_handle_read_event/ngx_handle_write_event on shared connections.
The problem manifested itself when using proxy_upload_rate and resulted in
either duplicate file descriptor error (e.g. with epoll) or incorrect further
udp packet processing (e.g. with kqueue).
The fix is to set and reset the event active flag in a way that prevents
ngx_handle_read_event/ngx_handle_write_event from scheduling socket events.
Previous interface of ngx_open_dir() assumed that passed directory name
has a room for NGX_DIR_MASK at the end (NGX_DIR_MASK_LEN bytes). While all
direct users of ngx_dir_open() followed this interface, this also implied
similar requirements for indirect uses - in particular, via ngx_walk_tree().
Currently none of ngx_walk_tree() uses provides appropriate space, and
fixing this does not look like a right way to go. Instead, ngx_dir_open()
interface was changed to not require any additional space and use
appropriate allocations instead.
If SSL_write_early_data() returned SSL_ERROR_WANT_WRITE, stop further reading
using a newly introduced c->ssl->write_blocked flag, as otherwise this would
result in SSL error "ssl3_write_bytes:bad length". Eventually, normal reading
will be restored by read event posted from successful SSL_write_early_data().
While here, place "SSL_write_early_data: want write" debug on the path.
Previously, if an SRV record was successfully resolved, but all of its A
records failed to resolve, NXDOMAIN was returned to the caller, which is
considered a successful resolve rather than an error. This could result in
losing the result of a previous successful resolve by the caller.
Now NXDOMAIN is only returned if at least one A resolve completed with this
code. Otherwise the error state of the first A resolve is returned.
Previously, unnamed regex captures matched in the parent request, were not
available in a cloned subrequest. Now 3 fields related to unnamed captures
are copied to a cloned subrequest: r->ncaptures, r->captures and
r->captures_data. Since r->captures cannot be changed by either request after
creating a clone, a new flag r->realloc_captures is introduced to force
reallocation of r->captures.
The issue was reported as a proxy_cache_background_update misbehavior in
http://mailman.nginx.org/pipermail/nginx/2018-December/057251.html.
In the past, there were several security issues which resulted in
worker process memory disclosure due to buffers with negative size.
It looks reasonable to check for such buffers in various places,
much like we already check for zero size buffers.
While here, removed "#if 1 / #endif" around zero size buffer checks.
It looks highly unlikely that we'll disable these checks anytime soon.
On 32-bit platforms mp4->buffer_pos might overflow when a large
enough (close to 4 gigabytes) atom is being skipped, resulting in
incorrect memory addesses being read further in the code. In most
cases this results in harmless errors being logged, though may also
result in a segmentation fault if hitting unmapped pages.
To address this, ngx_mp4_atom_next() now only increments mp4->buffer_pos
up to mp4->buffer_end. This ensures that overflow cannot happen.
Variables now do not depend on presence of the HTTP status code in response.
If the corresponding event occurred, variables contain time between request
creation and the event, and "-" otherwise.
Previously, intermediate value of the $upstream_response_time variable held
unix timestamp.
The directive allows to drop binding between a client and existing UDP stream
session after receiving a specified number of packets. First packet from the
same client address and port will start a new session. Old session continues
to exist and will terminate at moment defined by configuration: either after
receiving the expected number of responses, or after timeout, as specified by
the "proxy_responses" and/or "proxy_timeout" directives.
By default, proxy_requests is zero (disabled).
An attack that continuously switches HTTP/2 connection between
idle and active states can result in excessive CPU usage.
This is because when a connection switches to the idle state,
all of its memory pool caches are freed.
This change limits the maximum allowed number of idle state
switches to 10 * http2_max_requests (i.e., 10000 by default).
This limits possible CPU usage in one connection, and also
imposes a limit on the maximum lifetime of a connection.
Initially reported by Gal Goldshtein from F5 Networks.
Fixed uncontrolled memory growth in case peer is flooding us with
some frames (e.g., SETTINGS and PING) and doesn't read data. Fix
is to limit the number of allocated control frames.
Previously there was no validation for the size of a 64-bit atom
in an mp4 file. This could lead to a CPU hog when the size is 0,
or various other problems due to integer underflow when calculating
atom data size, including segmentation fault or worker process
memory disclosure.
Size of a shared memory zones must be at least two pages - one page
for slab allocator internal data, and another page for actual allocations.
Using 8192 instead is wrong, as there are systems with page sizes other
than 4096.
Note well that two pages is usually too low as well. In particular, cache
is likely to use two allocations of different sizes for global structures,
and at least four pages will be needed to properly allocate cache nodes.
Except in a few very special cases, with keys zone of just two pages nginx
won't be able to start. Other uses of shared memory impose a limit
of 8 pages, which provides some room for global allocations. This patch
doesn't try to address this though.
Inspired by ticket #1665.
With maximum version explicitly set, TLSv1.3 will not be unexpectedly
enabled if nginx compiled with OpenSSL 1.1.0 (without TLSv1.3 support)
will be run with OpenSSL 1.1.1 (with TLSv1.3 support).
In e3ba4026c02d (1.15.4) nginx own renegotiation checks were disabled
if SSL_OP_NO_RENEGOTIATION is available. But since SSL_OP_NO_RENEGOTIATION
is only set on a connection, not in an SSL context, SSL_clear_option()
removed it as long as a matching virtual server was found. This resulted
in a segmentation fault similar to the one fixed in a6902a941279 (1.9.8),
affecting nginx built with OpenSSL 1.1.0h or higher.
To fix this, SSL_OP_NO_RENEGOTIATION is now explicitly set in
ngx_http_ssl_servername() after adjusting options. Additionally, instead
of c->ssl->renegotiation we now check c->ssl->handshaked, which seems
to be a more correct flag to test, and will prevent the segmentation fault
from happening even if SSL_OP_NO_RENEGOTIATION is not working.
The "no suitable signature algorithm" errors are reported by OpenSSL 1.1.1
when using TLSv1.3 if there are no shared signature algorithms. In
particular, this can happen if the client limits available signature
algorithms to something we don't have a certificate for, or to an empty
list. For example, the following command:
openssl s_client -connect 127.0.0.1:8443 -sigalgs rsa_pkcs1_sha1
will always result in the "no suitable signature algorithm" error
as the "rsa_pkcs1_sha1" algorithm refers solely to signatures which
appear in certificates and not defined for use in TLS 1.3 handshake
messages.
The SSL_R_NO_COMMON_SIGNATURE_ALGORITHMS error is what BoringSSL returns
in the same situation.
The "no suitable key share" errors are reported by OpenSSL 1.1.1 when
using TLSv1.3 if there are no shared groups (that is, elliptic curves).
In particular, it is easy enough to trigger by using only a single
curve in ssl_ecdh_curve:
ssl_ecdh_curve secp384r1;
and using a different curve in the client:
openssl s_client -connect 127.0.0.1:443 -curves prime256v1
On the client side it is seen as "sslv3 alert handshake failure",
"SSL alert number 40":
0:error:14094410:SSL routines:ssl3_read_bytes:sslv3 alert handshake failure:ssl/record/rec_layer_s3.c:1528:SSL alert number 40
It can be also triggered with default ssl_ecdh_curve by using a curve
which is not in the default list (X25519, prime256v1, X448, secp521r1,
secp384r1):
openssl s_client -connect 127.0.0.1:8443 -curves brainpoolP512r1
Given that many clients hardcode prime256v1, these errors might become
a common problem with TLSv1.3 if ssl_ecdh_curve is redefined. Previously
this resulted in not using ECDH with such clients, but with TLSv1.3 it
is no longer possible and will result in a handshake failure.
The SSL_R_NO_SHARED_GROUP error is what BoringSSL returns in the same
situation.
Seen at:
https://serverfault.com/questions/932102/nginx-ssl-handshake-error-no-suitable-key-share
Previously, configurations with typo, for example
fastcgi_cache_valid 200301 302 5m;
successfully pass configuration test. Adding check for status
codes > 599, and such configurations are now properly rejected.
The bgcolor attribute overrides compatibility settings in browsers
and leads to undesirable behavior when the default font color is set
to white in the browser, since font-color is not also overridden.
Following 7319:dcab86115261, as long as SSL_OP_NO_RENEGOTIATION is
defined, it is OpenSSL library responsibility to prevent renegotiation,
so the checks are meaningless.
Additionally, with TLSv1.3 OpenSSL tends to report SSL_CB_HANDSHAKE_START
at various unexpected moments - notably, on KeyUpdate messages and
when sending tickets. This change prevents unexpected connection
close on KeyUpdate messages and when finishing handshake with upcoming
early data changes.
Trying to look into r->err_status in the "return" directive
makes it behave differently than real errors generated in other
parts of the code, and is an endless source of various problems.
This behaviour was introduced in 726:7b71936d5299 (0.4.4) with
the comment "fix: "return" always overrode "error_page" response code".
It is not clear if there were any real cases this was expected to fix,
but there are several cases which are broken due to this change, some
previously fixed (4147:7f64de1cc2c0).
In ticket #1634, the problem is that when r->err_status is set to
a non-special status code, it is not possible to return a response
by simply returning r->err_status. If this is the case, the only
option is to return script's e->status instead. An example
configuration:
location / {
error_page 404 =200 /err502;
return 404;
}
location = /err502 {
return 502;
}
After the change, such a configuration will properly return
standard 502 error, much like it happens when a 502 error is
generated by proxy_pass.
This also fixes the following configuration to properly close
connection as clearly requested by "return 444":
location / {
error_page 404 /close;
return 404;
}
location = /close {
return 444;
}
Previously, this required "error_page 404 = /close;" to work
as intended.
Socket leak was observed in the following configuration:
error_page 400 = /close;
location = /close {
return 444;
}
The problem is that "return 444" triggers termination of the request,
and due to error_page termination thinks that it needs to use a posted
request to clear stack. But at the early request processing where 400
errors are generated there are no ngx_http_run_posted_requests() calls,
so the request is only terminated after an external event.
Variants of the problem include "error_page 497" instead (ticket #695)
and various other errors generated during early request processing
(405, 414, 421, 494, 495, 496, 501, 505).
The same problem can be also triggered with "return 499" and "return 408"
as both codes trigger ngx_http_terminate_request(), much like "return 444".
To fix this, the patch adds ngx_http_run_posted_requests() calls to
ngx_http_process_request_line() and ngx_http_process_request_headers()
functions, and to ngx_http_v2_run_request() and ngx_http_v2_push_stream()
functions in HTTP/2.
Since the ngx_http_process_request() function is now only called via
other functions which call ngx_http_run_posted_requests(), the call
there is no longer needed and was removed.
It is possible that after SSL_read() will return SSL_ERROR_WANT_WRITE,
further calls will return SSL_ERROR_WANT_READ without reading any
application data. We have to call ngx_handle_write_event() and
switch back to normal write handling much like we do if there are some
application data, or the write there will be reported again and again.
Similarly, we have to switch back to normal read handling if there
is saved read handler and SSL_write() returns SSL_ERROR_WANT_WRITE.
While SSL_read() most likely to return SSL_ERROR_WANT_WRITE (and SSL_write()
accordingly SSL_ERROR_WANT_READ) during an SSL renegotiation, it is
not necessary mean that a renegotiation was started. In particular,
it can never happen during a renegotiation or can happen multiple times
during a renegotiation.
Because of the above, misleading "peer started SSL renegotiation" info
messages were replaced with "SSL_read: want write" and "SSL_write: want read"
debug ones.
Additionally, "SSL write handler" and "SSL read handler" are now logged
by the SSL write and read handlers, to make it easier to understand that
temporary SSL handlers are called instead of normal handlers.
The "do { c->recv() } while (c->read->ready)" form used in the
ngx_http_lingering_close_handler() is not really correct, as for
example with SSL c->read->ready may be still set when returning NGX_AGAIN
due to SSL_ERROR_WANT_WRITE. Therefore the above might be an infinite loop.
This doesn't really matter in lingering close, as we shutdown write side
of the socket anyway and also disable renegotiation (and even without shutdown
and with renegotiation it requires using very large certificate chain and
tuning socket buffers to trigger SSL_ERROR_WANT_WRITE). But for the sake of
correctness added an NGX_AGAIN check.
If sending request body was not completed (u->request_body_sent is not set),
the upstream keepalive module won't save such a connection. However, it
is theoretically possible (though highly unlikely) that sending of some
control frames can be blocked after the request body was sent. The
ctx->output_blocked flag introduced to disable keepalive in such cases.
The code is now able to parse additional control frames after
the response is received, and can send control frames as well.
This fixes keepalive problems as observed with grpc-c, which can
send window update and ping frames after the response, see
http://mailman.nginx.org/pipermail/nginx/2018-August/056620.html.
Previously the preread phase code ignored NGX_AGAIN value returned from
c->recv() and relied only on c->read->ready. But this flag is not reliable and
should only be checked for optimization purposes. For example, when using
SSL, c->read->ready may be set when no input is available. This can lead to
calling preread handler infinitely in a loop.
The problem does not manifest itself currently, because in case of
non-buffered reading, chain link created by u->create_request method
consists of a single element.
Found by PVS-Studio.
The directive configures maximum number of requests allowed on
a connection kept in the cache. Once a connection reaches the number
of requests configured, it is no longer saved to the cache.
The default is 100.
Much like keepalive_requests for client connections, this is mostly
a safeguard to make sure connections are closed periodically and the
memory allocated from the connection pool is freed.
The directive configures maximum time a connection can be kept in the
cache. By configuring a time which is smaller than the corresponding
timeout on the backend side one can avoid the race between closing
a connection by the backend and nginx trying to use the same connection
to send a request at the same time.
LibreSSL 2.8.0 "added const annotations to many existing APIs from OpenSSL,
making interoperability easier for downstream applications". This includes
the const change in the SSL_CTX_sess_set_get_cb() callback function (see
9dd43f4ef67e), which breaks compilation.
To fix this, added a condition on how we redefine OPENSSL_VERSION_NUMBER
when working with LibreSSL (see 382fc7069e3a). With LibreSSL 2.8.0,
we now set OPENSSL_VERSION_NUMBER to 0x1010000fL (OpenSSL 1.1.0), so the
appropriate conditions in the code will use "const" as it happens with
OpenSSL 1.1.0 and later versions.
There are clients which cannot handle HPACK's dynamic table size updates
as added in 12cadc4669a7 (1.13.6). Notably, old versions of OkHttp library
are known to fail on it (ticket #1397).
This change makes it possible to work with such clients by only sending
dynamic table size updates in response to SETTINGS_HEADER_TABLE_SIZE. As
a downside, clients which do not use SETTINGS_HEADER_TABLE_SIZE will
continue to maintain default 4k table.
Previously, a chunk of spaces larger than NGX_CONF_BUFFER (4096 bytes)
resulted in the "too long parameter" error during parsing such a
configuration. This was because the code only set start and start_line
on non-whitespace characters, and hence adjacent whitespace characters
were preserved when reading additional data from the configuration file.
Fix is to always move start and start_line if the last character was
a space.
Early data AKA 0-RTT mode is enabled as long as "ssl_early_data on" is
specified in the configuration (default is off).
The $ssl_early_data variable evaluates to "1" if the SSL handshake
isn't yet completed, and can be used to set the Early-Data header as
per draft-ietf-httpbis-replay-04.
BoringSSL currently requires SSL_CTX_set_max_proto_version(TLS1_3_VERSION)
to be able to enable TLS 1.3. This is because by default max protocol
version is set to TLS 1.2, and the SSL_OP_NO_* options are merely used
as a blacklist within the version range specified using the
SSL_CTX_set_min_proto_version() and SSL_CTX_set_max_proto_version()
functions.
With this change, we now call SSL_CTX_set_max_proto_version() with an
explicit maximum version set. This enables TLS 1.3 with BoringSSL.
As a side effect, this change also limits maximum protocol version to
the newest protocol we know about, TLS 1.3. This seems to be a good
change, as enabling unknown protocols might have unexpected results.
Additionally, we now explicitly call SSL_CTX_set_min_proto_version()
with 0. This is expected to help with Debian system-wide default
of MinProtocol set to TLSv1.2, see
http://mailman.nginx.org/pipermail/nginx-ru/2017-October/060411.html.
Note that there is no SSL_CTX_set_min_proto_version macro in BoringSSL,
so we call SSL_CTX_set_min_proto_version() and SSL_CTX_set_max_proto_version()
as long as the TLS1_3_VERSION macro is defined.
The behaviour is now in line with COPY of a directory with contents,
which preserves access masks on individual files, as well as the "cp"
command.
Requested by Roman Arutyunyan.
This fixes wrong permissions and file time after cross-device MOVE
in the DAV module (ticket #1577). Broken in 8101d9101ed8 (0.8.9) when
cross-device copying was introduced in ngx_ext_rename_file().
With this change, ngx_copy_file() always calls ngx_set_file_time(),
either with the time provided, or with the time from the original file.
This is considered acceptable given that copying the file is costly anyway,
and optimizing cases when we do not need to preserve time will require
interface changes.
Previously, ngx_open_file(NGX_FILE_CREATE_OR_OPEN) was used, resulting
in destination file being partially rewritten if exists. Notably,
this affected WebDAV COPY command (ticket #1576).
Previously, "%uA" was used, which corresponds to ngx_atomic_uint_t.
Size of ngx_atomic_uint_t can be easily different from uint64_t,
leading to undefined results.
In TLSv1.3, NewSessionTicket messages arrive after the handshake and
can come at any time. Therefore we use a callback to save the session
when we know about it. This approach works for < TLSv1.3 as well.
The callback function is set once per location on merge phase.
Since SSL_get_session() in BoringSSL returns an unresumable session for
TLSv1.3, peer save_session() methods have been updated as well to use a
session supplied within the callback. To preserve API, the session is
cached in c->ssl->session. It is preferably accessed in save_session()
methods by ngx_ssl_get_session() and ngx_ssl_get0_session() wrappers.
In OpenSSL 1.1.0 the SSL_CTRL_CLEAR_OPTIONS macro was removed, so
conditional compilation test on it results in SSL_clear_options()
and SSL_CTX_clear_options() not being used. Notably, this caused
"ssl_prefer_server_ciphers off" to not work in SNI-based virtual
servers if server preference was switched on in the default server.
It looks like the only possible fix is to test OPENSSL_VERSION_NUMBER
explicitly.
Starting with OpenSSL 1.1.0, SSL_R_UNSUPPORTED_PROTOCOL instead of
SSL_R_UNKNOWN_PROTOCOL is reported when a protocol is disabled via
an SSL_OP_NO_* option.
Additionally, SSL_R_VERSION_TOO_LOW is reported when using MinProtocol
or when seclevel checks (as set by @SECLEVEL=n in the cipher string)
rejects a protocol, and this is what happens with SSLv3 and @SECLEVEL=1,
which is the default.
There is also the SSL_R_VERSION_TOO_HIGH error code, but it looks like
it is not possible to trigger it.
There should be at least one worker connection for each listening socket,
plus an additional connection for channel between worker and master,
or starting worker processes will fail.
Previously, listenings sockets were not cloned if the worker_processes
directive was specified after "listen ... reuseport".
This also simplifies upcoming configuration check on the number
of worker connections, as it needs to know the number of listening
sockets before cloning.
The variable keeps the latest SSL protocol version supported by the client.
The variable has the same format as $ssl_protocol.
The version is read from the client_version field of ClientHello. If the
supported_versions extension is present in the ClientHello, then the version
is set to TLSv1.3.
Errors when sending UDP datagrams can happen, e.g., when local IP address
changes (see fa0e093b64d7), or an unavailable DNS server on the LAN can cause
send() to fail with EHOSTDOWN on BSD systems. If this happens during
initial query, retry sending immediately, to a different DNS server when
possible. If this is not enough, allow normal resend to happen by ignoring
the return code of the second ngx_resolver_send_query() call, much like we
do in ngx_resolver_resend().
The "http request" and "https proxy request" errors cannot happen
with HTTP due to pre-handshake checks in ngx_http_ssl_handshake(),
but can happen when SSL is used in stream and mail modules.
With gRPC it is possible that a request sending is blocked due to flow
control. Moreover, further sending might be only allowed once the
backend sees all the data we've already sent. With such a backend
it is required to clear the TCP_NOPUSH socket option to make sure all
the data we've sent are actually delivered to the backend.
As such, we now clear TCP_NOPUSH in ngx_http_upstream_send_request()
also on NGX_AGAIN if c->write->ready is set. This fixes a test (which
waits for all the 64k bytes as per initial window before allowing more
bytes) with sendfile enabled when the body was written to a file
in a different context.
Now tcp_nopush on peer connections is disabled if it is disabled on
the client connection, similar to how we handle c->sendfile. Previously,
tcp_nopush was always used on upstream connections, regardless of
the "tcp_nopush" directive.
We copy input buffers to our buffers, so various flags might be
unexpectedly set in buffers returned by ngx_chain_get_free_buf().
In particular, the b->in_file flag might be set when the body was
written to a file in a different context. With sendfile enabled this
in turn might result in protocol corruption if such a buffer was reused
for a control frame.
Make sure to clear buffers and set only fields we really need to be set.
The module implements random load-balancing algorithm with optional second
choice. In the latter case, the best of two servers is chosen, accounting
number of connections and server weight.
Example:
upstream u {
random [two [least_conn]];
server 127.0.0.1:8080;
server 127.0.0.1:8081;
server 127.0.0.1:8082;
server 127.0.0.1:8083;
}
Before 4a8c9139e579, ngx_resolver_create() didn't use configuration
pool, and allocations were done using malloc().
In 016352c19049, when resolver gained support of several servers,
new allocations were done from the pool.
With u->conf->preserve_output set the request body file might be used
after the response header is sent, so avoid cleaning it. (Normally
this is not a problem as u->conf->preserve_output is only set with
r->request_body_no_buffering, but the request body might be already
written to a file in a different context.)