In limit_req, auth_delay, and upstream code to check for broken
connections, tests for possible connection close by the client
did not work if the connection was already closed when relevant
event handler was set. This happened because there were no additional
events in case of edge-triggered event methods, and read events
were disabled in case of level-triggered ones.
Fix is to explicitly post a read event if the c->read->ready flag
is set.
For new data to be reported with eventport on Solaris,
ngx_handle_read_event() needs to be called after reading response
headers. To do so, ngx_http_upstream_process_non_buffered_upstream()
now called unconditionally if there are no prepread data. This
won't cause any read() syscalls as long as upstream connection
is not ready for reading (c->read->ready is not set), but will result
in proper handling of all events.
If we need to be notified about further events, ngx_handle_read_event()
needs to be called after a read event is processed. Without this,
an event can be removed from the kernel and won't be reported again,
notably when using oneshot event methods, such as eventport on Solaris.
While here, error handling is also added, similar to one present in
ngx_resolver_tcp_read(). This is not expected to make a difference
and mostly added for consistency.
If an attempt is made to delete an event which was already reported,
port_dissociate() returns an error. Fix is avoid doing anything if
ev->active is not set.
Possible alternative approach would be to avoid calling ngx_del_event()
at all if ev->active is not set. This approach, however, will require
something else to re-add the other event of the connection, since both
read and write events are dissociated if an event is reported on a file
descriptor. Currently ngx_eventport_del_event() re-associates write
event if called to delete read event, and vice versa.
If, at the start of an event loop iteration, there are any timers
in the past (including timers expiring now), the ngx_process_events()
function is called with zero timeout, and returns immediately even
if there are no events. But the following code only calls
ngx_event_expire_timers() if time actually changed, so this results
in nginx spinning in the event loop till current time changes.
While such timers are not expected to appear under normal conditions,
as all such timers should be removed on previous event loop iterations,
they still can appear due to bugs, zero timeouts set in the configuration
(if this is not explicitly handled by the code), or due to external
time changes on systems without clock_gettime(CLOCK_MONOTONIC).
Fix is to call ngx_event_expire_timers() unconditionally. Calling
it on each event loop iteration is not expected to be significant from
performance point of view, especially compared to a syscall in
ngx_process_events().
Without explicit handling, a zero timer was actually added, leading to
multiple unneeded syscalls. Further, sending GOAWAY frame early might
be beneficial for clients.
Reported by Sergey Kandaurov.
Unlike in 75e908236701, which added the logic to ngx_http_finalize_request(),
this change moves it to a more generic routine ngx_http_finalize_connection()
to cover cases when a request is finalized with NGX_DONE.
In particular, this fixes unwanted connection transition into the keepalive
state after receiving EOF while discarding request body. With edge-triggered
event methods that means the connection will last for extra seconds as set in
the keepalive_timeout directive.
The response size check introduced in 39501ce97e29 did not take into
account possible padding on DATA frames, resulting in incorrect
"upstream sent response body larger than indicated content length" errors
if upstream server used padding in responses with known length.
Fix is to check the actual size of response buffers produced by the code,
similarly to how it is done in other protocols, instead of checking
the size of DATA frames.
Reported at:
http://mailman.nginx.org/pipermail/nginx-devel/2021-March/013907.html
Activated with the "proxy_protocol" directive. Can be combined with
"listen ... proxy_protocol;" and "set_real_ip_from ...;" to pass
client address provided to nginx in the PROXY protocol header.
Activated with the "proxy_protocol" parameter of the "listen" directive.
Obtained information is passed to the auth_http script in Proxy-Protocol-Addr,
Proxy-Protocol-Port, Proxy-Protocol-Server-Addr, and Proxy-Protocol-Server-Port
headers.
Similarly to 40e8ce405859 in the stream module, this reduces the time
accept mutex is held. This also simplifies following changes to
introduce PROXY protocol support.
If we need to be notified about further events, ngx_handle_read_event()
needs to be called after a read event is processed. Without this,
an event can be removed from the kernel and won't be reported again,
notably when using oneshot event methods, such as eventport on Solaris.
For consistency, existing ngx_handle_read_event() call removed from
ngx_mail_read_command(), as this call only covers one of the code paths
where ngx_mail_read_command() returns NGX_AGAIN. Instead, appropriate
processing added to the callers, covering all code paths where NGX_AGAIN
is returned.
As long as a read event is blocked (ignored), ngx_handle_read_event()
needs to be called to make sure no further notifications will be
triggered when using level-triggered event methods, such as select() or
poll().
The "!rev->ready" test seems to be a typo, introduced in the original
commit (719:f30b1a75fd3b). The ngx_handle_write_event() code properly
tests for "rev->ready" instead.
Due to this typo, read events might be unexpectedly removed during
proxying after an event on the other part of the proxied connection.
Catched by mail proxying tests.
The strerrordesc_np() function, introduced in glibc 2.32, provides an
async-signal-safe way to obtain error messages. This makes it possible
to avoid copying error messages.
Previously, systems without sys_nerr (or _sys_nerr) were handled with an
assumption that errors start at 0 and continuous. This is, however, not
something POSIX requires, and not true on some platforms.
Notably, on Linux, where sys_nerr is no longer available for newly linked
binaries starting with glibc 2.32, there are gaps in error list, which
used to stop us from properly detecting maximum errno. Further, on
GNU/Hurd errors start at 0x40000001.
With this change, maximum errno detection is moved to the runtime code,
now able to ignore gaps, and also detects the first error if needed.
This fixes observed "Unknown error" messages as seen on Linux with
glibc 2.32 and on GNU/Hurd.
With this change, behaviour of HTTP/2 becomes even closer to HTTP/1.x,
and client_header_timeout instead of keepalive_timeout is used before
the first request is received.
This fixes HTTP/2 connections being closed even before the first request
if "keepalive_timeout 0;" was used in the configuration; the problem
appeared in f790816a0e87 (1.19.7).
Previously, PINGs and other frames extended possible keepalive time,
making it possible to keep an open HTTP/2 connection for a long time.
Now the connection is always closed as long as keepalive_timeout expires,
similarly to how it happens in HTTP/1.x.
Note that as a part of this change, incomplete frames are no longer
trigger a separate timeout, so http2_recv_timeout (replaced by
client_header_timeout in previous patches) is essentially cancelled.
The client_header_timeout is, however, used for SSL handshake and
while reading HEADERS frames.
Instead, keepalive_timeout and keepalive_requests are now used. This
is expected to simplify HTTP/2 code and usage. This also matches
directives used by upstream module for all protocols.
In case of default settings, this effectively changes maximum number
of requests per connection from 1000 to 100. This looks acceptable,
especially given that HTTP/2 code now properly supports lingering close.
Further, this changes default keepalive timeout in HTTP/2 from 300 seconds
to 75 seconds. This also looks acceptable, and larger than PING interval
used by Firefox (network.http.spdy.ping-threshold defaults to 58s),
the only browser to use PINGs.
Instead, the client_header_timeout is now used for HTTP/2 reading.
Further, the timeout is changed to be set once till no further data
left to read, similarly to how client_header_timeout is used in other
places.
New connections are marked reusable by ngx_http_init_connection() if there
are no data available for reading. As a result, if SSL is not used,
ngx_http_v2_init() might be called when the connection is marked reusable.
If a HEADERS frame is immediately available for reading, this resulted
in connection being preserved in reusable state with an active request,
and possibly closed later as if during worker shutdown (that is, after
all active requests were finalized).
Fix is to explicitly mark connections non-reusable in ngx_http_v2_init()
instead of (incorrectly) assuming they are already non-reusable.
Found by Sergey Kandaurov.
If ngx_drain_connections() fails to immediately reuse any connections
and there are no free connections, it now additionally tries to reuse
a connection again. This helps to provide at least one free connection
in case of HTTP/2 with lingering close, where merely trying to reuse
a connection once does not free it, but makes it reusable again,
waiting for lingering close.
This is particularly important in HTTP/2, where keepalive connections
are closed with lingering. Before the patch, reusing a keepalive HTTP/2
connection resulted in the connection waiting for lingering close to
remain in the reusable connections queue, preventing ngx_drain_connections()
from closing additional connections.
The patch fixes it by marking the connection reusable again, and so
moving it in the reusable connections queue. Further, it makes actually
possible to reuse such connections if needed.
This part somehow slipped away from c5840ca2063d.
While it is not expected to be needed in case of lingering close,
it is good to keep it for correctness (see 2b5528023f6b).
Keeping post_accept_timeout in ngx_listening_t is no longer needed since
we've switched to 1 second timeout for deferred accept in 5541:fdb67cfc957d.
Further, using it in HTTP code can result in client_header_timeout being
used from an incorrect server block, notably if address-specific virtual
servers are used along with a wildcard listening socket, or if we've switched
to a different server block based on SNI in SSL handshake.
The stub status module and ngx_http_send_response() (used by the empty gif
module and the "return" directive) incorrectly assumed that responding
to HEAD requests always results in r->header_only being set. This is not
true, and results in incorrect behaviour, for example, in the following
configuration:
location / {
image_filter size;
return 200 test;
}
Fix is to remove this incorrect micro-optimization from both stub status
module and ngx_http_send_response().
Reported by Chris Newton.
After 7675:9afa45068b8f and 7678:bffcc5af1d72 (1.19.1), during non-buffered
simple proxying, responses with extra data might result in zero size buffers
being generated and "zero size buf" alerts in writer. This bug is similar
to the one with FastCGI proxying fixed in 7689:da8d758aabeb.
In non-buffered mode, normally the filter function is not called if
u->length is already 0, since u->length is checked after each call of
the filter function. There is a case when this can happen though: if
the response length is 0, and there are pre-read response body data left
after reading response headers. As such, a check for u->length is needed
at the start of non-buffered filter functions, similar to the one
for p->length present in buffered filter functions.
Appropriate checks added to the existing non-buffered copy filters
in the upstream (used by scgi and uwsgi proxying) and proxy modules.
With introduction of open_file_cache in 1454:f497ed7682a7, opening a file
with ngx_open_cached_file() automatically adds a cleanup handler to close
the file. As such, calling ngx_close_file() directly for non-regular files
is no longer needed and will result in duplicate close() call.
In 1454:f497ed7682a7 ngx_close_file() call for non-regular files was removed
in the static module, but wasn't in the flv module. And the resulting
incorrect code was later copied to the mp4 module. Fix is to remove the
ngx_close_file() call from both modules.
Reported by Chris Newton.
The ngx_http_parse_complex_uri() function cannot make URI longer and does
not null-terminate URI, so there is no need to allocate an extra byte. This
allocation appears to be a leftover from changes in 461:a88a3e4e158f (0.1.5),
where null-termination of r->uri and many other strings was removed.
When the request line contains request-target in the absolute-URI form,
it can contain path-empty instead of a single slash (see RFC 7230, RFC 3986).
Previously, the ngx_http_parse_request_line() function only accepted empty
path when there was no query string.
With this change, non-empty query is also correctly handled. That is,
request line "GET http://example.com?foo HTTP/1.1" is accepted and results
in $uri "/" and $args "foo".
Note that $request_uri remains "?foo", similarly to how spaces in URIs
are handled. Providing "/?foo", similarly to how "/" is provided for
"GET http://example.com HTTP/1.1", requires allocation.
Previously, the number of next_upstream tries included servers marked
as "down", resulting in "no live upstreams" with the code 502 instead
of the code derived from an attempt to connect to the last tried "up"
server (ticket #2096).
Similarly to the problem fixed in 2096b21fcd10 (ticket #1792),
when a "trailer only" gRPC response (that is, a response with the
END_STREAM flag in the HEADERS frame) was immediately followed by
RST_STREAM(NO_ERROR) in the data preread along with the response
header, RST_STREAM wasn't properly skipped and caused "upstream
rejected request with error 0" errors.
Observed with "unknown service" gRPC errors returned by grpc-go.
Fix is to set ctx->done if we are going to parse additional data,
so the RST_STREAM(NO_ERROR) is properly skipped. Additionally, now
ngx_http_grpc_filter() will complain about frames sent for closed
stream if there are any.
When installing or running from a non-root user it is sometimes required to
override default, compiled in error log path. There was no way to do this
without rebuilding the binary (ticket #147).
This patch introduced "-e" command line option which allows one to override
compiled in error log path.
Before introduction of request body filter in 42d9beeb22db, the only
possible return code from the ngx_http_request_body_filter() call
without actual buffers was NGX_HTTP_INTERNAL_SERVER_ERROR, and
the code in ngx_http_read_client_request_body() hardcoded the only
possible error to simplify the code of initial call to set rb->rest.
This is no longer true after introduction of request body filters though,
as a request body filter might need to return other errors, such as 403.
Fix is to preserve the error code actually returned by the call
instead of assuming 500.
Added logging before returning NGX_HTTP_INTERNAL_SERVER_ERROR if there
are busy buffers after a request body flush. This should never happen
with current code, though bugs can be introduced by 3rd party modules.
Make sure debugging will be easy enough.
When doing lingering close, the socket was first shut down for writing,
so SSL shutdown initiated after lingering close was not able to send
the close_notify alerts (ticket #2056).
The fix is to call ngx_ssl_shutdown() before shutting down the socket.
Now "s", "V", and "v" format specifiers may be prefixed with "x" (lowercase)
or "X" (uppercase) to output corresponding data in hexadecimal format.
In collaboration with Maxim Dounin.
In some cases it might be needed to reject SSL handshake based on SNI
server name provided, for example, to make sure an invalid certificate
is not returned to clients trying to contact a name-based virtual server
without SSL configured. Previously, a "ssl_ciphers aNULL;" was used for
this. This workaround, however, is not compatible with TLSv1.3, in
particular, when using BoringSSL, where it is not possible to configure
TLSv1.3 ciphers at all.
With this change, the ssl_reject_handshake directive is introduced,
which instructs nginx to reject SSL handshakes with an "unrecognized_name"
alert in a particular server block.
For example, to reject handshake with names other than example.com,
one can use the following configuration:
server {
listen 443 ssl;
ssl_reject_handshake on;
}
server {
listen 443 ssl;
server_name example.com;
ssl_certificate example.com.crt;
ssl_certificate_key example.com.key;
}
The following configuration can be used to reject all SSL handshakes
without SNI server name provided:
server {
listen 443 ssl;
ssl_reject_handshake on;
}
server {
listen 443 ssl;
server_name ~^;
ssl_certificate example.crt;
ssl_certificate_key example.key;
}
Additionally, the ssl_reject_handshake directive makes configuring
certificates for the default server block optional. If no certificates
are configured in the default server for a given listening socket,
certificates must be defined in all non-default server blocks with
the listening socket in question.
Similarly to ssl_conf_command, proxy_ssl_conf_command can be used to
set arbitrary OpenSSL configuration parameters as long as nginx is
compiled with OpenSSL 1.0.2 or later, when connecting to upstream
servers with SSL. Full list of available configuration commands
can be found in the SSL_CONF_cmd manual page
(https://www.openssl.org/docs/man1.1.1/man3/SSL_CONF_cmd.html).
Similarly to ssl_conf_command, proxy_ssl_conf_command (grpc_ssl_conf_command,
uwsgi_ssl_conf_command) can be used to set arbitrary OpenSSL configuration
parameters as long as nginx is compiled with OpenSSL 1.0.2 or later,
when connecting to upstream servers with SSL. Full list of available
configuration commands can be found in the SSL_CONF_cmd manual page
(https://www.openssl.org/docs/man1.1.1/man3/SSL_CONF_cmd.html).
With the ssl_conf_command directive it is now possible to set
arbitrary OpenSSL configuration parameters as long as nginx is compiled
with OpenSSL 1.0.2 or later. Full list of available configuration
commands can be found in the SSL_CONF_cmd manual page
(https://www.openssl.org/docs/man1.1.1/man3/SSL_CONF_cmd.html).
In particular, this allows configuring PrioritizeChaCha option
(ticket #1445):
ssl_conf_command Options PrioritizeChaCha;
It can be also used to configure TLSv1.3 ciphers in OpenSSL,
which fails to configure them via the SSL_CTX_set_cipher_list()
interface (ticket #1529):
ssl_conf_command Ciphersuites TLS_CHACHA20_POLY1305_SHA256;
Configuration commands are applied after nginx own configuration
for SSL, so they can be used to override anything set by nginx.
Note though that configuring OpenSSL directly with ssl_conf_command
might result in a behaviour nginx does not expect, and should be
done with care.
With this change, it is now possible to use ngx_conf_merge_ptr_value()
to merge keyval arrays. This change actually follows much earlier
changes in ngx_conf_merge_ptr_value() and ngx_conf_set_str_array_slot()
in 1452:cd586e963db0 (0.6.10) and 1701:40d004d95d88 (0.6.22).
To preserve compatibility with existing 3rd party modules, both NULL
and NGX_CONF_UNSET_PTR are accepted for now.
Previously, if there were multiple limits configured, errors in
ngx_http_complex_value() during processing of a non-first limit
resulted in reference count leak in shared memory nodes of already
processed limits. Fix is to explicity unlock relevant nodes, much
like we do when rejecting requests.
The proxy_smtp_auth directive instructs nginx to authenticate users
on backend via the AUTH command (using the PLAIN SASL mechanism),
similar to what is normally done for IMAP and POP3.
If xclient is enabled along with proxy_smtp_auth, the XCLIENT command
won't try to send the LOGIN parameter.
In 7717:e3e8b8234f05, the 1st bit was incorrectly used. It shouldn't
be used for bitmask values, as it is used by NGX_CONF_BITMASK_SET.
Additionally, special value "off" added to make it possible to clear
inherited userid_flags value.
The "false" parameter of the proxy_redirect directive is deprecated.
Warning has been emitted since c2230102df6f (0.7.54).
The "off" parameter of the proxy_redirect, proxy_cookie_domain, and
proxy_cookie_path directives tells nginx not to inherit the
configuration from the previous configuration level.
Previously, after specifying the directive with the "off" parameter,
any other directives were ignored, and syntax checking was disabled.
The syntax was enforced to allow either one directive with the "off"
parameter, or several directives with other parameters.
Also, specifying "proxy_redirect default foo" no longer works like
"proxy_redirect default".
In rare cases, such as memory allocation failure, SSL_set_SSL_CTX() returns
NULL, which could mean that a different SSL configuration has not been set.
Note that this new behaviour seemingly originated in OpenSSL-1.1.0 release.
HTTP/2 code failed to run posted requests after calling the request body
handler, and this resulted in connection hang if a subrequest was created
in the body handler and no other actions were made.
If 400 errors were redirected to an upstream server using the error_page
directive, DATA frames from the client might cause segmentation fault
due to null pointer dereference. The bug had appeared in 6989:2c4dbcd6f2e4
(1.13.0).
Fix is to skip such frames in ngx_http_v2_state_read_data() (similarly
to 7561:9f1f9d6e056a). With the fix, behaviour of 400 errors in HTTP/2
is now similar to one in HTTP/1.x, that is, nginx doesn't try to read the
request body.
Note that proxying 400 errors, as well as other early stage errors, to
upstream servers might not be a good idea anyway. These errors imply
that reading and processing of the request (and the request headers)
wasn't complete, and proxying of such incomplete request might lead to
various errors.
Reported by Chenglong Zhang.
This fixes "SSL_shutdown() failed (SSL: ... bad write retry)" errors
as observed on the second SSL_shutdown() call after SSL shutdown fixes in
09fb2135a589 (1.19.2), notably when HTTP/2 connections are closed due
to read timeouts while there are incomplete writes.
This fixes "SSL_shutdown() failed (SSL: ... bad write retry)" errors
as observed on the second SSL_shutdown() call after SSL shutdown fixes in
09fb2135a589 (1.19.2), notably when sending fails in ngx_http_test_expect(),
similarly to ticket #1194.
Note that there are some places where c->error is misused to prevent
further output, such as ngx_http_v2_finalize_connection() if there
are pending streams, or in filter finalization. These places seem
to be extreme enough to don't care about missing shutdown though.
For example, filter finalization currently prevents keepalive from
being used.
The c->read->ready and c->write->ready flags need to be cleared to ensure
that appropriate read or write events will be reported by kernel. Without
this, SSL shutdown might wait till the timeout after blocking on writing
or reading even if there is a socket activity.
OpenSSL 1.1.1 fails to return SSL_ERROR_SYSCALL if an error happens
during SSL_write() after close_notify alert from the peer, and returns
SSL_ERROR_ZERO_RETURN instead. Broken by this commit, which removes
the "i == 0" check around the SSL_RECEIVED_SHUTDOWN one:
https://git.openssl.org/?p=openssl.git;a=commitdiff;h=8051ab2
In particular, if a client closed the connection without reading
the response but with properly sent close_notify alert, this resulted in
unexpected "SSL_write() failed while ..." critical log message instead
of correct "SSL_write() failed (32: Broken pipe)" at the info level.
Since SSL_ERROR_ZERO_RETURN cannot be legitimately returned after
SSL_write(), the fix is to convert all SSL_ERROR_ZERO_RETURN errors
after SSL_write() to SSL_ERROR_SYSCALL.
If the variant hash doesn't match one we used as a secondary cache key,
we switch back to the original key. In this case, c->body_start was kept
updated from an existing cache node overwriting the new response value.
After file cache update, it led to discrepancy between a cache node and
cache file seen as critical errors "file cache .. has too long header".
Previously, a variant not present in shared memory and stored on disk using a
secondary key was read using c->body_start from a variant stored with a main
key. This could result in critical errors "cache file .. has too long header".
Previously the stale-if-error extension of the Cache-Control upstream header
triggered the return of a stale response for all error conditions that can be
specified in the proxy_cache_use_stale directive. The list of these errors
includes both network/timeout/format errors, as well as some HTTP codes like
503, 504, 403, 429 etc. The latter prevented a cache entry from being updated
by a response with any of these HTTP codes during the stale-if-error period.
Now stale-if-error only works for network/timeout/format errors and ignores
the upstream HTTP code. The return of a stale response for certain HTTP codes
is still possible using the proxy_cache_use_stale directive.
This change also applies to the stale-while-revalidate extension of the
Cache-Control header, which triggers stale-if-error if it is missing.
Reported at
http://mailman.nginx.org/pipermail/nginx/2020-July/059723.html.
Reworked connections reuse, so closing connections is attempted in
advance, as long as number of free connections is less than 1/16 of
worker connections configured. This ensures that new connections can
be handled even if closing a reusable connection requires some time,
for example, for a lingering close (ticket #2017).
The 1/16 ratio is selected to be smaller than 1/8 used for disabling
accept when working with accept mutex, so nginx will try to balance
new connections to different workers first, and will start reusing
connections only if this won't help.
Previously, reusing connections happened silently and was only
visible in monitoring systems. This was shown to be not very user-friendly,
and administrators often didn't realize there were too few connections
available to withstand the load, and configured timeouts (keepalive_timeout
and http2_idle_timeout) were effectively reduced to keep things running.
To provide at least some information about this, a warning is now logged
(at most once per second, to avoid flooding the logs).
Sending shutdown when ngx_http_test_reading() detects the connection is
closed can result in "SSL_shutdown() failed (SSL: ... bad write retry)"
critical log messages if there are blocked writes.
Fix is to avoid sending shutdown via the c->ssl->no_send_shutdown flag,
similarly to how it is done in ngx_http_keepalive_handler() for kqueue
when pending EOF is detected.
Reported by Jan Prachař
(http://mailman.nginx.org/pipermail/nginx-devel/2018-December/011702.html).
Without the flag, SSL shutdown is attempted on such connections,
resulting in useless work and/or bogus "SSL_shutdown() failed
(SSL: ... bad write retry)" critical log messages if there are
blocked writes.
Previously, bidirectional shutdown never worked, due to two issues
in the code:
1. The code only tested SSL_ERROR_WANT_READ and SSL_ERROR_WANT_WRITE
when there was an error in the error queue, which cannot happen.
The bug was introduced in an attempt to fix unexpected error logging
as reported with OpenSSL 0.9.8g
(http://mailman.nginx.org/pipermail/nginx/2008-January/003084.html).
2. The code never called SSL_shutdown() for the second time to wait for
the peer's close_notify alert.
This change fixes both issues.
Note that after this change bidirectional shutdown is expected to work for
the first time, so c->ssl->no_wait_shutdown now makes a difference. This
is not a problem for HTTP code which always uses c->ssl->no_wait_shutdown,
but might be a problem for stream and mail code, as well as 3rd party
modules.
To minimize the effect of the change, the timeout, which was used to be 30
seconds and not configurable, though never actually used, is now set to
3 seconds. It is also expanded to apply to both SSL_ERROR_WANT_READ and
SSL_ERROR_WANT_WRITE, so timeout is properly set if writing to the socket
buffer is not possible.
If some additional data from a pipelined request happens to be
read into the body buffer, we copy it to r->header_in or allocate
an additional large client header buffer for it.
This ensures that copying won't write more than the buffer size
even if the buffer comes from hc->free and it is smaller than the large
client header buffer size in the virtual host configuration. This might
happen if size of large client header buffers is different in name-based
virtual hosts, similarly to the problem with number of buffers fixed
in 6926:e662cbf1b932.
After 05e42236e95b (1.19.1) responses with extra data might result in
zero size buffers being generated and "zero size buf" alerts in writer
(if f->rest happened to be 0 when processing additional stdout data).
Previously, the document generated by the xslt filter was always fully sent
to client even if a range was requested and response status was 206 with
appropriate Content-Range.
The xslt module is unable to serve a range because of suspending the header
filter chain. By the moment full response xml is buffered by the xslt filter,
range header filter is not called yet, but the range body filter has already
been called and did nothing.
The fix is to disable ranges by resetting the r->allow_ranges flag much like
the image filter that employs a similar technique.
The slice filter allows ranges for the response by setting the r->allow_ranges
flag, which enables the range filter. If the range was not requested, the
range filter adds an Accept-Ranges header to the response to signal the
support for ranges.
Previously, if an Accept-Ranges header was already present in the first slice
response, client received two copies of this header. Now, the slice filter
removes the Accept-Ranges header from the response prior to setting the
r->allow_ranges flag.
As long as the "Content-Length" header is given, we now make sure
it exactly matches the size of the response. If it doesn't,
the response is considered malformed and must not be forwarded
(https://tools.ietf.org/html/rfc7540#section-8.1.2.6). While it
is not really possible to "not forward" the response which is already
being forwarded, we generate an error instead, which is the closest
equivalent.
Previous behaviour was to pass everything to the client, but this
seems to be suboptimal and causes issues (ticket #1695). Also this
directly contradicts HTTP/2 specification requirements.
Note that the new behaviour for the gRPC proxy is more strict than that
applied in other variants of proxying. This is intentional, as HTTP/2
specification requires us to do so, while in other types of proxying
malformed responses from backends are well known and historically
tolerated.
Previous behaviour was to pass everything to the client, but this
seems to be suboptimal and causes issues (ticket #1695). Fix is to
drop extra data instead, as it naturally happens in most clients.
Additionally, we now also issue a warning if the response is too
short, and make sure the fact it is truncated is propagated to the
client. The u->error flag is introduced to make it possible to
propagate the error to the client in case of unbuffered proxying.
For responses to HEAD requests there is an exception: we do allow
both responses without body and responses with body matching the
Content-Length header.
Previous behaviour was to pass everything to the client, but this
seems to be suboptimal and causes issues (ticket #1695). Fix is to
drop extra data instead, as it naturally happens in most clients.
This change covers generic buffered and unbuffered filters as used
in the scgi and uwsgi modules. Appropriate input filter init
handlers are provided by the scgi and uwsgi modules to set corresponding
lengths.
Note that for responses to HEAD requests there is an exception:
we do allow any response length. This is because responses to HEAD
requests might be actual full responses, and it is up to nginx
to remove the response body. If caching is enabled, only full
responses matching the Content-Length header will be cached
(see b779728b180c).
Previously, additional data after final chunk was either ignored
(in the same buffer, or during unbuffered proxying) or sent to the
client (in the next buffer already if it was already read from the
socket). Now additional data are properly detected and ignored
in all cases. Additionally, a warning is now logged and keepalive
is disabled in the connection.
Previous behaviour was to pass everything to the client, but this
seems to be suboptimal and causes issues (ticket #1695). Fix is to
drop extra data instead, as it naturally happens in most clients.
If a memcached response was followed by a correct trailer, and then
the NUL character followed by some extra data - this was accepted by
the trailer checking code. This in turn resulted in ctx->rest underflow
and caused negative size buffer on the next reading from the upstream,
followed by the "negative size buf in writer" alert.
Fix is to always check for too long responses, so a correct trailer cannot
be followed by extra data.
After sending the GOAWAY frame, a connection is now closed using
the lingering close mechanism.
This allows for the reliable delivery of the GOAWAY frames, while
also fixing connection resets observed when http2_max_requests is
reached (ticket #1250), or with graceful shutdown (ticket #1544),
when some additional data from the client is received on a fully
closed connection.
For HTTP/2, the settings lingering_close, lingering_timeout, and
lingering_time are taken from the "server" level.
Using SSL_CTX_set_verify(SSL_VERIFY_PEER) implies that OpenSSL will
send a certificate request during an SSL handshake, leading to unexpected
certificate requests from browsers as long as there are any client
certificates installed. Given that ngx_ssl_trusted_certificate()
is called unconditionally by the ngx_http_ssl_module, this affected
all HTTPS servers. Broken by 699f6e55bbb4 (not released yet).
Fix is to set verify callback in the ngx_ssl_trusted_certificate() function
without changing the verify mode.
Clearing cache based on free space left on a file system is
expected to allow better disk utilization in some cases, notably
when disk space might be also used for something other than nginx
cache (including nginx own temporary files) and while loading
cache (when cache size might be inaccurate for a while, effectively
disabling max_size cache clearing).
Based on a patch by Adam Bambuch.
With XFS, using "allocsize=64m" mount option results in large preallocation
being reported in the st_blocks as returned by fstat() till the file is
closed. This in turn results in incorrect cache size calculations and
wrong clearing based on max_size.
To avoid too aggressive cache clearing on such volumes, st_blocks values
which result in sizes larger than st_size and eight blocks (an arbitrary
limit) are no longer trusted, and we use st_size instead.
The ngx_de_fs_size() counterpart is intentionally not modified, as
it is used on closed files and hence not affected by this problem.
NFS on Linux is known to report wsize as a block size (in both f_bsize
and f_frsize, both in statfs() and statvfs()). On the other hand,
typical file system block sizes on Linux (ext2/ext3/ext4, XFS) are limited
to pagesize. (With FAT, block sizes can be at least up to 512k in
extreme cases, but this doesn't really matter, see below.)
To avoid too aggressive cache clearing on NFS volumes on Linux, block
sizes larger than pagesize are now ignored.
Note that it is safe to ignore large block sizes. Since 3899:e7cd13b7f759
(1.0.1) cache size is calculated based on fstat() st_blocks, and rounding
to file system block size is preserved mostly for Windows.
Note well that on other OSes valid block sizes seen are at least up
to 65536. In particular, UFS on FreeBSD is known to work well with block
and fragment sizes set to 65536.
When validating second and further certificates, ssl callback could be called
twice to report the error. After the first call client connection is
terminated and its memory is released. Prior to the second call and in it
released connection memory is accessed.
Errors triggering this behavior:
- failure to create the request
- failure to start resolving OCSP responder name
- failure to start connecting to the OCSP responder
The fix is to rearrange the code to eliminate the second call.
The flush flag was not set when forwarding the request body to the uwsgi
server. When using uwsgi_pass suwsgi://..., this causes the uwsgi server
to wait indefinitely for the request body and eventually time out due to
SSL buffering.
This is essentially the same change as 4009:3183165283cc, which was made
to ngx_http_proxy_module.c.
This will fix the uwsgi bug https://github.com/unbit/uwsgi/issues/1490.
This ensures that certificate verification is properly logged to debug
log during upstream server certificate verification. This should help
with debugging various certificate issues.
Listening UNIX sockets were not removed on graceful shutdown, preventing
the next runs. The fix is to replace the custom socket closing code in
ngx_master_process_cycle() by the ngx_close_listening_sockets() call.
When changing binary, sending a SIGTERM to the new binary's master process
should not remove inherited UNIX sockets unless the old binary's master
process has exited.
Previously, invalid connection preface errors were only logged at debug
level, providing no visible feedback, in particular, when a plain text
HTTP/2 listening socket is erroneously used for HTTP/1.x connections.
Now these are explicitly logged at the info level, much like other
client-related errors.
When enabled, certificate status is stored in cache and is used to validate
the certificate in future requests.
New directive ssl_ocsp_cache is added to configure the cache.
OCSP validation for client certificates is enabled by the "ssl_ocsp" directive.
OCSP responder can be optionally specified by "ssl_ocsp_responder".
When session is reused, peer chain is not available for validation.
If the verified chain contains certificates from the peer chain not available
at the server, validation will fail.
Previously only the first responder address was used per each stapling update.
Now, in case of a network or parsing error, next address is used.
This also fixes the issue with unsupported responder address families
(ticket #1330).
As per https://tools.ietf.org/html/rfc7540#section-8.1,
: A server can send a complete response prior to the client
: sending an entire request if the response does not depend on
: any portion of the request that has not been sent and
: received. When this is true, a server MAY request that the
: client abort transmission of a request without error by
: sending a RST_STREAM with an error code of NO_ERROR after
: sending a complete response (i.e., a frame with the
: END_STREAM flag). Clients MUST NOT discard responses as a
: result of receiving such a RST_STREAM, though clients can
: always discard responses at their discretion for other
: reasons.
Previously, RST_STREAM(NO_ERROR) received from upstream after
a frame with the END_STREAM flag was incorrectly treated as an
error. Now, a single RST_STREAM(NO_ERROR) is properly handled.
This fixes problems observed with modern grpc-c [1], as well
as with the Go gRPC module.
[1] https://github.com/grpc/grpc/pull/1661
The request processing is delayed by a timer. Since nginx updates
internal time once at the start of each event loop iteration, this
normally ensures constant time delay, adding a mitigation from
time-based attacks.
A notable exception to this is the case when there are no additional
events before the timer expires. To ensure constant-time processing
in this case as well, we trigger an additional event loop iteration
by posting a dummy event for the next event loop iteration.
When "aio" or "aio threads" is used while processing the response body of an
in-memory background subrequest, the subrequest could be finalized with an aio
operation still in progress. Upon aio completion either parent request is
woken or the old r->write_event_handler is called again. The latter may result
in request errors. In either case post_subrequest handler is never called with
the full response body, which is typically expected when using in-memory
subrequests.
Currently in nginx background subrequests are created by the upstream module
and the mirror module. The issue does not manifest itself with these
subrequests because they are header-only. But it can manifest itself with
third-party modules which create in-memory background subrequests.
We used to have default error_page overwrite for 495, 496, and 497, so
a configuration like
error_page 495 /error;
will result in error 400, much like without any error_page configured.
The 494 status code was introduced later (in 3848:de59ad6bf557, nginx 0.9.4),
and relevant changes to ngx_http_core_error_page() were missed, resulting
in inconsistent behaviour of "error_page 494" - with error_page configured
it results in 494 being returned instead of 400.
Reported by Frank Liu,
http://mailman.nginx.org/pipermail/nginx/2020-February/058957.html.
In "co64" atom chunk start offset is a 64-bit unsigned integer. When trimming
the "mdat" atom, chunk offsets are casted to off_t values which are typically
64-bit signed integers. A specially crafted mp4 file with huge chunk offsets
may lead to off_t overflow and result in negative trim boundaries.
The consequences of the overflow are:
- Incorrect Content-Length header value in the response.
- Negative left boundary of the response file buffer holding the trimmed "mdat".
This leads to pread()/sendfile() errors followed by closing the client
connection.
On rare systems where off_t is a 32-bit integer, this scenario is also feasible
with the "stco" atom.
The fix is to add checks which make sure data chunks referenced by each track
are within the mp4 file boundaries. Additionally a few more checks are added to
ensure mp4 file consistency and log errors.
Duplicate "Host" headers were allowed in nginx 0.7.0 (revision b9de93d804ea)
as a workaround for some broken Motorola phones which used to generate
requests with two "Host" headers[1]. It is believed that this workaround
is no longer relevant.
[1] http://mailman.nginx.org/pipermail/nginx-ru/2008-May/017845.html
The "identity" transfer coding has been removed in RFC 7230. It is
believed that it is not used in real life, and at the same time it
provides a potential attack vector.
We anyway do not support more than one transfer encoding, so accepting
requests with multiple Transfer-Encoding headers doesn't make sense.
Further, we do not handle multiple headers, and ignore anything but
the first header.
Reported by Filippo Valsorda.
A connection could get stuck without timers if a client has partially sent
the HEADERS frame such that it was split on the individual header boundary.
In this case, it cannot be processed without the rest of the HEADERS frame.
The fix is to call ngx_http_v2_state_headers_save() in this case. Normally,
it would be called from the ngx_http_v2_state_header_block() handler on the
next iteration, when there is not enough data to continue processing. This
isn't the case if recv_buffer became empty and there's no more data to read.
With the recent change to prevent frames flood in d4448892a294,
nginx will finalize the connection with NGX_HTTP_V2_INTERNAL_ERROR
whenever flood is detected, causing nginx aborting or stopping if
the debug_points directive is used in nginx config.
Previous change 1ce3f01a4355 incorrectly introduced processing of the
ngx_posted_next_events queue at the end of operation, effectively making
posted next events a nop, since at the end of an event loop iteration
the queue is always empty. Correct approach is to move events to the
ngx_posted_events queue at an iteration start, as it was done previously.
Further, in some cases the c->read event might be already in the
ngx_posted_events queue, and calling ngx_post_event() with the
ngx_posted_next_events queue won't do anything. To make sure the event
will be correctly placed into the ngx_posted_next_events queue
we now check if it is already posted.
Introduced in 9d2ad2fb4423 available bytes handling in SSL relied
on connection read handler being overwritten to set the ready flag
and the amount of available bytes. This approach is, however, does
not work properly when connection read handler is changed, for example,
when switching to a next pipelined request, and can result in unexpected
connection timeouts, see here:
http://mailman.nginx.org/pipermail/nginx-devel/2019-December/012825.html
Fix is to introduce ngx_event_process_posted_next() instead, which
will set ready and available regardless of how event handler is set.
When ngx_http_v2_close_stream_handler() is used to retry stream close
after queued frames are sent, client timeouts on the stream can be
logged multiple times and/or in addition to already happened errors.
To resolve this, separate ngx_http_v2_retry_close_stream_handler()
was introduced, which does not try to log timeouts.
If a stream is closed with queued frames, it is possible that no further
write events will occur on the stream, leading to the socket leak.
To fix this, the stream's fake connection read handler is set to
ngx_http_v2_close_stream_handler(), to make sure that finalizing the
connection with ngx_http_v2_finalize_connection() will be able to
close the stream regardless of the current number of queued frames.
Additionally, the stream's fake connection fc->error flag is explicitly
set, so ngx_http_v2_handle_stream() will post a write event when queued
frames are finally sent even if stream flow control window is exhausted.
These checks were missed when chunked support was introduced. And also
added an explicit error message to ngx_http_dav_copy_move_handler()
(it was missed for some reason, in contrast to DELETE and MKCOL handlers).
While empty replacements were caught at run-time, parsing code
of the "rewrite" directive expects that a minimum length of the
"replacement" argument is 1.
If a rewritten URI has the null character, only a part of URI was
copied to a memory buffer allocated for path. In some setups this
could be exploited to expose uninitialized memory via the Location
header.
The "alias" directive cannot be used in the same location where URI
was rewritten. This has been detected in the "rewrite ... break"
case, but not when the standalone "break" directive was used.
This change also fixes proxy_pass with URI component in a similar
case:
location /aaa/ {
rewrite ^ /xxx/yyy;
break;
proxy_pass http://localhost:8080/bbb/;
}
Previously, the "/bbb/yyy" would be sent to a backend instead of
"/xxx/yyy". And if location's prefix was longer than the rewritten
URI, a segmentation fault might occur.
Previously, connections returned from keepalive cache had c->data
pointing to the keepalive cache item. While this shouldn't be a problem
for correct code, as c->data is not expected to be used before it is set,
explicitly clearing it might help to avoid confusion.
Previously only an rbtree was associated with a limit_conn. To make it
possible to associate more data with a limit_conn, shared context is introduced
similar to limit_req. Also, shared pool pointer is kept in a way similar to
limit_req.
Now a new structure ngx_proxy_protocol_t holds these fields. This allows
to add more PROXY protocol fields in the future without modifying the
connection structure.
With MinGW-w64, building 64-bit nginx binary with GCC 8 and above
results in warning due to cast of GetProcAddress() result to ngx_wsapoll_pt,
which GCC thinks is incorrect. Added intermediate cast to "void *" to
silence the warning.
FormatMessage() seems to return many errors which essentially indicate that
the language in question is not available. At least the following were
observed in the wild and during testing: ERROR_MUI_FILE_NOT_FOUND (15100)
(ticket #1868), ERROR_RESOURCE_TYPE_NOT_FOUND (1813). While documentation
says it should be ERROR_RESOURCE_LANG_NOT_FOUND (1815), this doesn't seem
to be the case.
As such, checking error code was removed, and as long as FormatMessage()
returns an error, we now always try the default language.
Added code to track number of bytes available in the socket.
This makes it possible to avoid looping for a long time while
working with fast enough peer when data are added to the socket buffer
faster than we are able to read and process data.
When kernel does not provide number of bytes available, it is
retrieved using ioctl(FIONREAD) as long as a buffer is filled by
SSL_read().
It is assumed that number of bytes returned by SSL_read() is close
to the number of bytes read from the socket, as we do not use
SSL compression. But even if it is not true for some reason, this
is not important, as we post an additional reading event anyway.
Note that data can be buffered at SSL layer, and it is not possible
to simply stop reading at some point and wait till the event will
be reported by the kernel again. This can be only done when there
are no data in SSL buffers, and there is no good way to find out if
it's the case.
Instead of trying to figure out if SSL buffers are empty, this patch
introduces events posted for the next event loop iteration - such
events will be processed only on the next event loop iteration,
after going into the kernel and retrieving additional events. This
seems to be simple and reliable approach.
This makes it possible to avoid looping for a long time while working
with a fast enough peer when data are added to the socket buffer faster
than we are able to read and process them (ticket #1431). This is
basically what we already do on FreeBSD with kqueue, where information
about the number of bytes in the socket buffer is returned by
the kevent() call.
With other event methods rev->available is now set to -1 when the socket
is ready for reading. Later in ngx_recv() and ngx_recv_chain(), if
full buffer is received, real number of bytes in the socket buffer is
retrieved using ioctl(FIONREAD). Reading more than this number of bytes
ensures that even with edge-triggered event methods the event will be
triggered again, so it is safe to stop processing of the socket and
switch to other connections.
Using ioctl(FIONREAD) only after reading a full buffer is an optimization.
With this approach we only call ioctl(FIONREAD) when there are at least
two recv()/readv() calls.
As long as there are data to read in the socket, yet the amount of data
is less than total size of the buffers in the chain, this saves one
unneeded read() syscall. Before this change, reading only stopped if
ngx_ssl_recv() returned no data, that is, two read() syscalls in a row
returned EAGAIN.
In SSL connections, data can be buffered by the SSL layer, and it is
wrong to avoid doing c->recv_chain() if c->read->available is 0 and
c->read->pending_eof is set. And tests show that the optimization in
question indeed can result in incorrect detection of premature connection
close if upstream closes the connection without sending a close notify
alert at the same time. Fix is to disable c->read->available optimization
for SSL connections.
This could happen when graceful shutdown configured by worker_shutdown_timeout
times out and is then followed by another timeout such as proxy_read_timeout.
In this case, the HEADERS frame is added to the output queue, but attempt to
send it fails (due to c->error forcibly set during graceful shutdown timeout).
This triggers request finalization which attempts to close the stream. But the
stream cannot be closed because there is a frame in the output queue, and the
connection cannot be finalized. This leaves the connection open without any
timer events leading to alert.
The fix is to post write event when sending output queue fails on c->error.
That will finalize the connection.
With this patch, all traffic over an HTTP/2 connection is counted in
the h2c->total_bytes field, and payload traffic is counted in
the h2c->payload_bytes field. As long as total traffic is many times
larger than payload traffic, we consider this to be a flood.
In 8df664ebe037, we've switched to maximizing stream window instead
of sending RST_STREAM. Since then handling of RST_STREAM with NO_ERROR
was fixed at least in Chrome, hence we switch back to using RST_STREAM.
This allows more effective rejecting of large bodies, and also minimizes
non-payload traffic to be accounted in the next patch.
Previously, if a response to the PTR request was cached, and ngx_resolver_dup()
failed to allocate memory for the resulting name, then the original node was
freed but left in expire_queue. A subsequent address resolving would end up
in a use-after-free memory access of the node either in ngx_resolver_expire()
or ngx_resolver_process_ptr(), when accessing it through expire_queue.
The fix is to leave the resolver node intact.
Don't waste server resources by sending RST_STREAM frames. Instead,
reject WINDOW_UPDATE frames with invalid zero increment by closing
connection with PROTOCOL_ERROR.
Don't waste server resources by sending RST_STREAM frames. Instead,
reject HEADERS and PRIORITY frames with self-dependency by closing
connection with PROTOCOL_ERROR.
When ngx_http_discard_request_body() call was added to ngx_http_send_response(),
there were no return codes other than NGX_OK and NGX_HTTP_INTERNAL_SERVER_ERROR.
Now it can also return NGX_HTTP_BAD_REQUEST, but ngx_http_send_response() still
incorrectly transforms it to NGX_HTTP_INTERNAL_SERVER_ERROR.
The fix is to propagate ngx_http_discard_request_body() errors.
As defined in HTTP/1.1, body chunks have the following ABNF:
chunk = chunk-size [ chunk-ext ] CRLF chunk-data CRLF
where chunk-data is a sequence of chunk-size octets.
With this change, chunk-data that doesn't end up with CRLF at chunk-size
offset will be treated as invalid, such as in the example provided below:
4
SEE-THIS-AND-
4
THAT
0
Previously, if unbuffered request body reading wasn't finished before
the request was redirected to a different location using error_page
or X-Accel-Redirect, and the request body is read again, this could
lead to disastrous effects, such as a duplicate post_handler call or
"http request count is zero" alert followed by a segmentation fault.
This happened in the following configuration (ticket #1819):
location / {
proxy_request_buffering off;
proxy_pass http://bad;
proxy_intercept_errors on;
error_page 502 = /error;
}
location /error {
proxy_pass http://backend;
}