The overflow can be used to circumvent the restriction on total size of
ranges introduced in c2a91088b0c0 (1.1.2). Additionally, overflow
allows producing ranges with negative start (such ranges can be created
by using a suffix, "bytes=-100"; normally this results in 200 due to
the total size check). These can result in the following errors in logs:
[crit] ... pread() ... failed (22: Invalid argument)
[alert] ... sendfile() failed (22: Invalid argument)
When using cache, it can be also used to reveal cache file header.
It is believed that there are no other negative effects, at least with
standard nginx modules.
In theory, this can also result in memory disclosure and/or segmentation
faults if multiple ranges are allowed, and the response is returned in a
single in-memory buffer. This never happens with standard nginx modules
though, as well as known 3rd party modules.
Fix is to properly protect from possible overflow when incrementing size.
It is safe because re-sending still works during graceful shutdown as
long as resolving takes place (and resolve tasks set their own timeouts
that are not cancelable).
Also, the new ctx->cancelable flag can be set to make resolve task's
timeout event cancelable.
Notably, on ppc64 with 64k pagesize, slab 0 (of size 8) requires
128 64-bit elements for bitmasks. The code bogusly assumed that
one uintptr_t is enough for bitmasks plus at least one free slot.
Resolving an SRV record includes resolving its host names in subrequests.
Previously, if memory allocation failed while reporting a subrequest result
after receiving a response from a DNS server, the SRV resolve handler was
called immediately with the NGX_ERROR state. However, if the SRV record
included another copy of the resolved name, it was reported once again.
This could trigger the use-after-free memory access after SRV resolve
handler freed the resolve context by calling ngx_resolve_name_done().
Now the SRV resolve handler is called only when all its subrequests are
completed.
Previously, each configured header was represented in one of two ways,
depending on whether or not its value included any variables.
If the value didn't include any variables, then it would be represented
as as a single script that contained complete header line with HTTP/1.1
delimiters, i.e.:
"Header: value\r\n"
But if the value included any variables, then it would be represented
as a series of three scripts: first contained header name and the ": "
delimiter, second evaluated to header value, and third contained only
"\r\n", i.e.:
"Header: "
"$value"
"\r\n"
This commit changes that, so that each configured header is represented
as a series of two scripts: first contains only header name, and second
contains (or evaluates to) only header value, i.e.:
"Header"
"$value"
or
"Header"
"value"
This not only makes things more consistent, but also allows header name
and value to be accessed separately.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
As per RFC 2616 / RFC 7233, any range request to an empty file
is expected to result in 416 Range Not Satisfiable response, as
there cannot be a "byte-range-spec whose first-byte-pos is less
than the current length of the entity-body". On the other hand,
this makes use of byte-range requests inconvenient in some cases,
as reported for the slice module here:
http://mailman.nginx.org/pipermail/nginx-devel/2017-June/010177.html
This commit changes range filter to instead return 200 if the file
is empty and the range requested starts at 0.
This change reworks 13a5f4765887 to only run posted requests once,
with nothing on stack. Running posted requests with other request
functions on stack may result in use-after-free in case of errors,
similar to the one reported in #788.
To only run posted request once, a separate function was introduced
to be used as ssl handshake handler in c->ssl->handler,
ngx_http_upstream_ssl_handshake_handler(). The ngx_http_run_posted_requests()
is only called in this function, and not in ngx_http_upstream_ssl_handshake()
which may be called directly on stack.
Additionaly, ngx_http_upstream_ssl_handshake_handler() now does appropriate
debug logging of the current subrequest, similar to what is done in other
event handlers.
Previously, the upstream resolve handler always called
ngx_http_run_posted_requests() to run posted requests after processing the
resolver response. However, if the handler was called directly from the
ngx_resolve_name() function (for example, if the resolver response was cached),
running posted requests from the handler could lead to the following errors:
- If the request was scheduled for termination, it could actually be terminated
in the resolve handler. Upper stack frames could reference the freed request
object in this case.
- If a significant number of requests were posted, and for each of them the
resolve handler was called directly from the ngx_resolve_name() function,
posted requests could be run recursively and lead to stack overflow.
Now ngx_http_run_posted_requests() is only called from asynchronously invoked
resolve handlers.
Trailers added using this directive are evaluated after response body
is processed by output filters (but before it's written to the wire),
so it's possible to use variables calculated from the response body
as the trailer value.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Example:
ngx_table_elt_t *h;
h = ngx_list_push(&r->headers_out.trailers);
if (h == NULL) {
return NGX_ERROR;
}
ngx_str_set(&h->key, "Fun");
ngx_str_set(&h->value, "with trailers");
h->hash = ngx_hash_key_lc(h->key.data, h->key.len);
The code above adds "Fun: with trailers" trailer to the response.
Modules that want to emit trailers must set r->expect_trailers = 1
in header filter, otherwise they might not be emitted for HTTP/1.1
responses that aren't already chunked.
This change also adds $sent_trailer_* variables.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
The current style in variable handlers returning NGX_OK is to either set
v->not_found to 1, or to initialize the entire ngx_http_variable_value_t
structure.
In theory, always setting v->valid = 1 for NGX_OK would be useful, which
would mean that the value was computed and is thus valid, including the
special case of v->not_found = 1. But currently that's not the case and
causes the (v->valid || v->not_found) check to access an uninitialized
v->valid value, which is safe only because its value doesn't matter when
v->not_found is set.
When evaluating a mapped $reset_uid variable in the userid filter,
if get_handler set to ngx_http_map_variable() returned an error,
this previously resulted in a NULL pointer dereference.
If memory allocation of a new r->uri.data storage failed, reset its length as
well. Request URI is used in ngx_http_finalize_request() for debug logging.
Previously, when using NGX_HTTP_SSI_ERROR, error was ignored in ssi processing,
thus timefmt could be accessed later in ngx_http_ssi_date_gmt_local_variable()
as part of "set" handler, or NULL format pointer could be passed to strftime().
Previously, SETTINGS ACK was sent immediately upon receipt of SETTINGS
frame, before already queued DATA frames created using old SETTINGS.
This incorrect behavior was source of interoperability issues, because
peers rely on the fact that new SETTINGS are in effect after receiving
SETTINGS ACK.
Reported by Feng Li.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Previously, new frames could be emitted in the middle of applying
new (and already acknowledged) SETTINGS params, which is illegal.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
If the main request was finalized while a background request performed an
asynchronous operation, the main request ended up in ngx_http_writer() and was
not finalized until a network event or a timeout. For example, cache
background update with aio enabled made nginx unable to process further client
requests or close the connection, keeping it open until client closes it.
Now regular finalization of the main request is not suspended because of an
asynchronous operation in another request.
If a background request was terminated while an asynchronous operation was in
progress, background request's write event handler was changed to
ngx_http_request_finalizer() and never called again.
Now, whenever a request is terminated while an asynchronous operation is in
progress, connection error flag is set to make further finalizations of any
request with this connection lead to termination.
These issues appeared in 1aeaae6e9446 (not yet released).
In http these checks were changed in a6d6d762c554, though mail module
was missed at that time. Since then, the stream module was introduced
based on mail, using "== NGX_ERROR" check.
With OpenSSL 1.1.0+, the workaround for handshake buffer size as introduced
in a720f0b0e083 (ticket #413) no longer works, as OpenSSL no longer exposes
handshake buffers, see https://github.com/openssl/openssl/commit/2e7dc7cd688.
Moreover, it is no longer possible to adjust handshake buffers at all now.
To avoid additional RTT if handshake uses more than 4k we now set TCP_NODELAY
on SSL connections before handshake. While this still results in sub-optimal
network utilization due to incomplete packets being sent, it seems to be
better than nothing.
Previously, cache background update might not work as expected, making client
wait for it to complete before receiving the final part of a stale response.
This could happen if the response could not be sent to the client socket in one
filter chain call.
Now background cache update is done in a background subrequest. This type of
subrequest does not block any other subrequests or the main request.
Previously, the read event of the accepted connection was marked ready, but not
available. This made EPOLLRDHUP-related code (for example, in ngx_unix_recv())
expect more data from the socket, leading to unexpected behavior.
For example, if SSL, PROXY protocol and deferred accept were enabled on a listen
socket, the client connection was aborted due to unexpected return value of
c->recv().
If allocation of cleanup handler in the HTTP/2 header filter failed, then
a stream might be freed with a HEADERS frame left in the output queue.
Now the HEADERS frame is accounted in the queue before trying to allocate
the cleanup handler.
Abnormally exited workers may leave locked cache entries, this can
result in the cache size on disk exceeding max_size and shared memory
exhaustion.
This change mitigates the issue by ignoring locked entries during forced
expire. It also increases the visibility of the problem by logging such
entries.
Previously, an allocation error resulted in uninitialized memory access
when evaluating $upstream_http_ variables.
On a related note, see r->headers_out.headers cleanup work in 0cdee26605f3.
In ac9b1df5b246 (1.13.0) we attempted to allow renegotiation in client mode,
but when using OpenSSL 1.0.2 or older versions it was additionally disabled
by SSL3_FLAGS_NO_RENEGOTIATE_CIPHERS.
If initialization of a header failed for some reason after ngx_list_push(),
leaving the header as is can result in uninitialized memory access by
the header filter or the log module. The fix is to clear partially
initialized headers in case of errors.
For the Cache-Control header, the fix is to postpone pushing
r->headers_out.cache_control until its value is completed.
Previously, ngx_http_sub_header_filter() could fail with a partially
initialized context, later accessed in ngx_http_sub_body_filter()
if called from the perl content handler.
The issue had appeared in 2c045e5b8291 (1.9.4).
A better fix would be to handle ngx_http_send_header() errors in
the perl module, though this doesn't seem to be easy enough.
The SSL_CTRL_SET_CURVES_LIST macro is removed in the OpenSSL master branch.
SSL_CTX_set1_curves_list is preserved as compatibility with previous versions.
CVE-2009-3555 is no longer relevant and mitigated by the renegotiation
info extension (secure renegotiation). On the other hand, unexpected
renegotiation still introduces potential security risks, and hence we do
not allow renegotiation on the server side, as we never request renegotiation.
On the client side the situation is different though. There are backends
which explicitly request renegotiation, and disabled renegotiation
introduces interoperability problems. This change allows renegotiation
on the client side, and fixes interoperability problems as observed with
such backends (ticket #872).
Additionally, with TLSv1.3 the SSL_CB_HANDSHAKE_START flag is currently set
by OpenSSL when receiving a NewSessionTicket message, and was detected by
nginx as a renegotiation attempt. This looks like a bug in OpenSSL, though
this change also allows better interoperability till the problem is fixed.
Previously, the source IP address of a response UDP datagram could differ from
the original datagram destination address. This could happen if the server UDP
socket is bound to a wildcard address and the network interface chosen to output
the response packet has a different default address than the destination address
of the original packet. For example, if two addresses from the same network are
configured on an interface.
Now source address is set explicitly if a response is sent for a server UDP
socket bound to a wildcard address.
This change adds "http_429" parameter to "proxy_next_upstream" for
retrying rate-limited requests, and to "proxy_cache_use_stale" for
serving stale cached responses after being rate-limited.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
This change adds reason phrase in status line and pretty response body
when "429" status code is used in "return", "limit_conn_status" and/or
"limit_req_status" directives.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
When a slice subrequest was redirected to a new location, its context was lost.
After its completion, a new slice subrequest for the same slice was created.
This could lead to infinite loop. Now the slice module makes sure each slice
subrequest starts output with the slice context available.
With post_action or subrequests, it is possible that the timer set for
wev->delayed will expire while the active subrequest write event handler
is not ready to handle this. This results in request hangs as observed
with limit_rate / sendfile_max_chunk and post_action (ticket #776) or
subrequests (ticket #1228).
Moving the handling to the connection event handler fixes the hangs observed,
and also slightly simplifies the code.
Since limit_req uses connection's write event to delay request processing,
it can conflict with timers in other subrequests. In particular, even
if applied to an active subrequest, it can break things if wev->delayed
is already set (due to limit_rate or sendfile_max_chunk), since after
limit_req finishes the wev->delayed flag will be set and no timer will be
active.
Fix is to use the wev->delayed flag in limit_req as well. This ensures that
wev->delayed won't be set after limit_req finishes, and also ensures that
limit_req's timers will be properly handled by other subrequests if the one
delayed by limit_req is not active.
All streams in connection must be finalized before the connection
itself can be finalized and all related memory is freed. That's
not always possible on the current event loop iteration.
Thus when the last stream is finalized, it sets the special read
event handler ngx_http_v2_handle_connection_handler() and posts
the event.
Previously, this handler didn't check the connection state and
could call the regular event handler on a connection that was
already in finalization stage. In the worst case that could
lead to a segmentation fault, since some data structures aren't
supposed to be used during connection finalization. Particularly,
the waiting queue can contain already freed streams, so the
WINDOW_UPDATE frame received by that moment could trigger
accessing to these freed streams.
Now, the connection error flag is explicitly checked in
ngx_http_v2_handle_connection_handler().
In order to finalize stream the error flag is set on fake connection and
either "write" or "read" event handler is called. The read events of fake
connections are always ready, but it's not the case with the write events.
When the ready flag isn't set, the error flag can be not checked in some
cases and as a result stream isn't finalized. Now the ready flag is
explicilty set on write events for proper finalization in all cases.
Previously, flow control didn't account for padding in DATA frames,
which meant that its view of the world could drift from peer's view
by up to 256 bytes per received padded DATA frame, which could lead
to a deadlock.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Previously, its value accounted for payloads of HEADERS, CONTINUATION
and DATA frames, as well as frame headers of HEADERS and DATA frames,
but it didn't account for frame headers of CONTINUATION frames.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Previously, connection write handler was called, resulting in wake up
of the active subrequest. This change makes it possible to read data
in non-active subrequests as well. For example, this allows SSI to
process instructions in non-active subrequests earlier and start
additional subrequests if needed, reducing overall response time.
If the subrequest is already finalized, the handler set with aio_write
may still be used by sendfile in threads when using range requests
(see also e4c1f5b32868, and the original note in 9fd738b85fad). Calling
already finalized subrequest's r->write_event_handler in practice
results in request hang in some cases.
Fix is to trigger connection event handler if the subrequest was already
finalized.
The ngx_linux_sendfile() function is now used for both normal sendfile()
and sendfile in threads. The ngx_linux_sendfile_thread() function was
modified to use the same interface as ngx_linux_sendfile(), and is simply
called from ngx_linux_sendfile() when threads are enabled.
Special return code NGX_DONE is used to indicate that a thread task was
posted and no further actions are needed.
If number of bytes sent is less that what we were sending, we now always
retry sending. This is needed for sendfile() in threads as the number
of bytes we are sending might have been changed since the thread task
was posted. And this is also needed for Linux 4.3+, as sendfile() might
be interrupted at any time and provides no indication if it was interrupted
or not (ticket #1174).
If of.err is 0, it means that there was a memory allocation error
and no further logging and/or processing is needed. The of.failed
string can be only accessed if of.err is not 0.
The directive configures a timeout to be used when gracefully shutting down
worker processes. When the timer expires, nginx will try to close all
the connections currently open to facilitate shutdown.
There is no need to cancel timers early if there are other timers blocking
shutdown anyway. Preserving such timers allows nginx to continue some
periodic work till the shutdown is actually possible.
With the new approach, timers with ev->cancelable are simply ignored when
checking if there are any timers left during shutdown.
The ev->timedout flag is set on first timer expiration, and never reset
after it. Due to this the code to stop the timer when the timer was
canceled never worked (except in a very specific time frame immediately
after start), and the timer was always armed again. This essentially
resulted in a buffer flush at the end of an event loop iteration.
This behaviour actually seems to be better than just stopping the flush
timer for the whole shutdown, so it is preserved as is instead of fixing
the code to actually remove the timer. It will be further improved by
upcoming changes to preserve cancelable timers if there are other timers
blocking shutdown.
Most notably, this fixes possible buffer overflows if number of large
client header buffers in a virtual server is different from the one in
the default server.
Reported by Daniil Bondarev.
Cloned subrequests should inherit r->content_handler. This way they will
be able to use the same location configuration as the original request
if there are "if" directives in the configuration.
Without r->content_handler inherited, the following configuration tries
to access a static file in the update request:
location / {
set $true 1;
if ($true) {
# nothing
}
proxy_pass http://backend;
proxy_cache one;
proxy_cache_use_stale updating;
proxy_cache_background_update on;
}
See http://mailman.nginx.org/pipermail/nginx/2017-February/053019.html for
initial report.
With "proxy_ignore_client_abort off" (the default), upstream module changes
r->read_event_handler to ngx_http_upstream_rd_check_broken_connection().
If the handler is not cleared during upstream finalization, it can be
triggered later, causing unexpected effects, if, for example, a request
was redirected to a different location using error_page or X-Accel-Redirect.
In particular, it makes "proxy_ignore_client_abort on" non-working after
a redirection in a configuration like this:
location = / {
error_page 502 = /error;
proxy_pass http://127.0.0.1:8082;
}
location /error {
proxy_pass http://127.0.0.1:8083;
proxy_ignore_client_abort on;
}
It is also known to cause segmentation faults with aio used, see
http://mailman.nginx.org/pipermail/nginx-ru/2015-August/056570.html.
Fix is to explicitly set r->read_event_handler to ngx_http_block_reading()
during upstream finalization, similar to how it is done in the request body
reading code and in the limit_req module.
This allows to store larger ETag values for proxy_cache_revalidate,
including ones generated as SHA256, and cache responses with longer
Vary (ticket #826).
In particular, this fixes caching of Amazon S3 responses with CORS
enabled, which now use "Vary: Origin, Access-Control-Request-Headers,
Access-Control-Request-Method".
Cache version bumped accordingly.
Previously, slice subrequest location was selected based on request URI.
If request is then redirected to a new location, its context array is cleared,
making the slice module loose current slice range information. This lead to
broken output. Now subrequests with the NGX_HTTP_SUBREQUEST_CLONE flag are
created for slices. Such subrequests stay in the same location as the parent
request and keep the right slice context.
Previously, there was no way to enable the proxy_cache_use_stale behavior by
reading the backend response. Now, stale-while-revalidate and stale-if-error
Cache-Control extensions (RFC 5861) are supported. They specify, how long a
stale response can be used when a cache entry is being updated, or in case of
an error.
The function may leave error in the error queue while returning success,
e.g., when taking a DSO reference to itself as of OpenSSL 1.1.0d:
https://git.openssl.org/?p=openssl.git;a=commit;h=4af9f7f
Notably, this fixes alert seen with statically linked OpenSSL on some platforms.
While here, check OPENSSL_init_ssl() return value.
Previously, buffer size was not changed from the one saved during
initial ngx_ssl_create_connection(), even if the buffer itself was not
yet created. Fix is to change c->ssl->buffer_size in the SNI callback.
Note that it should be also possible to update buffer size even in non-SNI
virtual hosts as long as the buffer is not yet allocated. This looks
like an overcomplication though.
The ngx_event_pipe() function wasn't called on write events with
wev->delayed set. As a result, threaded writing results weren't
properly collected in ngx_event_pipe_write_to_downstream() when a
write event was triggered for a completed write.
Further, this wasn't detected, as p->aio was reset by a thread completion
handler, and results were later collected in ngx_event_pipe_read_upstream()
instead of scheduling a new write of additional data. If this happened
on the last reading from an upstream, last part of the response was never
written to the cache file.
Similar problems might also happen in case of timeouts when writing to
client, as this also results in ngx_event_pipe() not being called on write
events. In this scenario socket leaks were observed.
Fix is to check if p->writing is set in ngx_event_pipe_read_upstream(), and
therefore collect results of previous write operations in case of read events
as well, similar to how we do so in ngx_event_pipe_write_downstream().
This is enough to fix the wev->delayed case. Additionally, we now call
ngx_event_pipe() from ngx_http_upstream_process_request() if there are
uncollected write operations (p->writing and !p->aio). This also fixes
the wev->timedout case.
The ngx_chain_coalesce_file() function may produce more bytes to send then
requested in the limit passed, as it aligns the last file position
to send to memory page boundary. As a result, (limit - send) may become
negative. This resulted in big positive number when converted to size_t
while calling ngx_output_chain_to_iovec().
Another part of the problem is in ngx_chain_coalesce_file(): it changes cl
to the next chain link even if the current buffer is only partially sent
due to limit.
Therefore, if a file buffer was not expected to be fully sent due to limit,
and was followed by a memory buffer, nginx called sendfile() with a part
of the file buffer, and the memory buffer in trailer. If there were enough
room in the socket buffer, this resulted in a part of the file buffer being
skipped, and corresponding part of the memory buffer sent instead.
The bug was introduced in 8e903522c17a (1.7.8). Configurations affected
are ones using limits, that is, limit_rate and/or sendfile_max_chunk, and
memory buffers after file ones (may happen when using subrequests or
with proxying with disk buffering).
Fix is to explicitly check if (send < limit) before constructing trailer
with ngx_output_chain_to_iovec(). Additionally, ngx_chain_coalesce_file()
was modified to preserve unfinished file buffers in cl.
Closing up to 32 connections might be too aggressive if worker_connections
is set to a comparable number (and/or there are only a small number of
reusable connections). If an occasional connection shorage happens in
such a configuration, it leads to closing all reusable connections instead
of gradually reducing keepalive timeout to a smaller value. To improve
granularity in such configurations we now close no more than 1/8 of all
reusable connections at once.
Suggested by Joel Cunningham.
A missing check could cause ngx_stream_ssl_handler() to be applied
to a non-ssl session, which resulted in a null pointer dereference
if ssl_verify_client is enabled.
The bug had appeared in 1.11.8 (41cb1b64561d).
Previously, an unavailable peer was considered recovered after a successful
proxy session to this peer. Until then, only a single client connection per
fail_timeout was allowed to be proxied to the peer.
Since stream sessions can be long, it may take indefinite time for a peer to
recover, limiting the ability of the peer to receive new connections.
Now, a peer is considered recovered after a successful TCP connection is
established to it. Balancers are notified of this event via the notify()
callback.
OpenSSL 1.1.0 now uses normal "nmake; nmake install" instead of using
custom "ms\do_ms.bat" script and "ms\nt.mak" makefile. And Configure
now requires --prefix to be absolute, and no longer derives --openssldir
from prefix (so it's specified explicitly). Generated libraries are now
called "libcrypto.lib" and "libssl.lib" instead of "libeay32.lib"
and "ssleay32.lib". Appropriate tests added to support both old and new
variants.
Additionally, openssl/lhash.h now triggers warning C4090 ('function' :
different 'const' qualifiers), so the warning was disabled.
There are lots of C4244 warnings (conversion from 'type1' to 'type2',
possible loss of data), so they were disabled.
The same applies to C4267 warnings (conversion from 'size_t' to 'type',
possible loss of data), most notably - conversion from ngx_str_t.len to
ngx_variable_value_t.len (which is unsigned:28). Additionally, there
is at least one case when it is not possible to fix the warning properly
without introducing win32-specific code: recv() on win32 uses "int len",
while POSIX defines "size_t len".
The ssize_t type now properly defined for 64-bit compilation with MSVC.
Caught by warning C4305 (truncation from '__int64' to 'ssize_t'), on
"cutoff = NGX_MAX_SIZE_T_VALUE / 10" in ngx_atosz()).
Several C4334 warnings (result of 32-bit shift implicitly converted to 64 bits)
were fixed by adding explicit conversions.
Several C4214 warnings (nonstandard extension used: bit field types other
than int) in ngx_http_script.h fixed by changing bit field types from
uintptr_t to unsigned.
Most notably, warning W8012 (comparing signed and unsigned values) reported
in multiple places where an unsigned value of small type (e.g., u_short) is
promoted to an int and compared to an unsigned value.
Warning W8072 (suspicious pointer arithmetic) disabled, it is reported
when we increment base pointer in ngx_shm_alloc().
These types are available with MSVC (at least since 2003, in stddef.h),
all variants of GCC (in stdint.h) and Watcom C. We need to define them
only for Borland C.
This implies ticket key size of 80 bytes instead of previously used 48,
as both HMAC and AES keys are 32 bytes now. When an old 48-byte ticket key
is provided, we fall back to using backward-compatible AES128 encryption.
OpenSSL switched to using AES256 in 1.1.0, and we are providing equivalent
security. While here, order of HMAC and AES keys was reverted to make
the implementation compatible with keys used by OpenSSL with
SSL_CTX_set_tlsext_ticket_keys().
Prodded by Christian Klinger.
The current version of HTTP/1.1 standard allows relative references in
redirects (https://tools.ietf.org/html/rfc7231#section-7.1.2).
Allow this form for redirects generated by nginx by introducing the new
directive absolute_redirect.
SSL version 3.0 can be specified by the client at the record level for
compatibility reasons. Previously, ssl_preread module rejected such
connections, presuming they don't have SNI. Now SSL 3.0 is allowed at
the record level.
The resolver handles SRV requests in two stages. In the first
stage it gets all SRV RRs, and in the second stage it resolves
the names from SRV RRs into addresses.
Previously, if a response to an SRV request was cached, the
queries to resolve names were not limited by a timeout. If a
response to any of these queries was not received, the SRV
request could never complete.
If a response to an SRV request was not cached, and some of the
queries to resolve names timed out, NGX_RESOLVE_TIMEDOUT was
returned instead of successfully resolved addresses.
To fix both issues, resolving of names is now always limited by
a timeout.
Changeset e7cb5deb951d breaks build on CentOS 5 with "dereferencing
type-punned pointer will break strict-aliasing rules" warning. It is
backed out.
Instead, to keep builds with BoringSSL happy, type of the "value"
variable changed to "char *", and an explicit cast added before calling
ngx_parse_http_time().
A bug was introduced by 82efcedb310b that could lead to timing out of
responses or segmentation fault, when accept_mutex was enabled.
The output queue in HTTP/2 can contain frames from different streams.
When the queue is sent, all related write handlers need to be called.
In order to do so, the streams were added to the h2c->posted queue
after handling sent frames. Then this queue was processed in
ngx_http_v2_write_handler().
If accept_mutex is enabled, the event's "ready" flag is set but its
handler is not called immediately. Instead, the event is added to
the ngx_posted_events queue. At the same time in this queue can be
events from upstream connections. Such events can result in sending
output queue before ngx_http_v2_write_handler() is triggered. And
at the time ngx_http_v2_write_handler() is called, the output queue
can be already empty with some streams added to h2c->posted.
But after 82efcedb310b, these streams weren't processed if all frames
have already been sent and the output queue was empty. This might lead
to a situation when a number of streams were get stuck in h2c->posted
queue for a long time. Eventually these streams might get closed by
the send timeout.
In the worst case this might also lead to a segmentation fault, if
already freed stream was left in the h2c->posted queue. This could
happen if one of the streams was terminated but wasn't closed, due to
the HEADERS frame or a partially sent DATA frame left in the output
queue. If this happened the ngx_http_v2_filter_cleanup() handler
removed the stream from the h2c->waiting or h2c->posted queue on
termination stage, before the frame has been sent, and the stream
was again added to the h2c->posted queue after the frame was sent.
In order to fix all these problems and simplify the code, write
events of fake stream connections are now added to ngx_posted_events
instead of using a custom h2c->posted queue.
By default, "map" creates cacheable variables [1]. With this
parameter it creates a non-cacheable variable.
An original idea was to deduce the cacheability of the "map"
variable by checking the cacheability of variables specified
in source and resulting values, but it turned to be too hard.
For example, a cacheable variable can be overridden with the
"set" directive or with the SSI "set" command. Also, keeping
"map" variables cacheable by default is good for performance
reasons. This required adding a new parameter.
[1] Before db699978a33f (1.11.0), the cacheability of the
"map" variable could vary depending on the cacheability of
variables specified in resulting values (ticket #1090).
This is believed to be a bug rather than a feature.
Removed code that would cause an endless loop, and removed condition
check that is always false. The first page in the slot list is
guaranteed to satisfy an allocation.
On exit environment allocated from a pool is no longer available, leading
to a segmentation fault if, for example, a library tries to use it from
an atexit() handler.
Fix is to allocate environment via ngx_alloc() instead, and explicitly
free it using a pool cleanup handler if it's no longer used (e.g., on
configuration reload).
In Perl 5.8.6 the default was switched to use putenv() when used as
embedded library unless "PL_use_safe_putenv = 0" is explicitly used
in the code. Therefore, for modern versions of Perl it is no longer
necessary to restore previous environment when calling perl_destruct().
For Perl compiled with threads, without PERL_SET_INTERP() the PL_curinterp
remains set to the first interpreter created (that is, one created at
original start). As a result after a reload Perl thinks that operations
are done withing a thread, and, most notably, denies to change environment.
For example, the following code properly works on original start,
but fails after a reload:
perl 'sub {
my $r = shift;
$r->send_http_header("text/plain");
$ENV{TZ} = "UTC";
$r->print("tz: " . $ENV{TZ} . " (localtime " . (localtime()) . ")\n");
$ENV{TZ} = "Europe/Moscow";
$r->print("tz: " . $ENV{TZ} . " (localtime " . (localtime()) . ")\n");
return OK;
}';
To fix this, PERL_SET_INTERP() added anywhere where PERL_SET_CONTEXT()
was previously used.
Note that PERL_SET_INTERP() doesn't seem to be documented anywhere.
Yet it is used in some other software, and also seems to be the only
solution possible.
Atom size is the sum of atom header size and atom data size. The
specification says that the first 4 bytes are set to one when
the atom size is greater than the maximum unsigned 32-bit value.
Which means atom header size should be considered when the
comparison takes place between atom data size and 0xffffffff.
The variable contains a list of curves as supported by the client.
Known curves are listed by their names, unknown ones are shown
in hex, e.g., "0x001d:prime256v1:secp521r1:secp384r1".
Note that OpenSSL uses session data for SSL_get1_curves(), and
it doesn't store full list of curves supported by the client when
serializing a session. As a result $ssl_curves is only available
for new sessions (and will be empty for reused ones).
The variable is only meaningful when using OpenSSL 1.0.2 and above.
With older versions the variable is empty.
The variable contains list of ciphers as supported by the client.
Known ciphers are listed by their names, unknown ones are shown
in hex, e.g., ""AES128-SHA:AES256-SHA:0x00ff".
The variable is fully supported only when using OpenSSL 1.0.2 and above.
With older version there is an attempt to provide some information
using SSL_get_shared_ciphers(). It only lists known ciphers though.
Moreover, as OpenSSL uses session data for SSL_get_shared_ciphers(),
and it doesn't store relevant data when serializing a session. As
a result $ssl_ciphers is only available for new sessions (and not
available for reused ones) when using OpenSSL older than 1.0.2.
Now in case of a verification failure $ssl_client_verify contains
"FAILED:<reason>", similar to Apache's SSL_CLIENT_VERIFY, e.g.,
"FAILED:certificate has expired".
Detailed description of possible errors can be found in the verify(1)
manual page as provided by OpenSSL.
Normally, the epoll module calls the read and write handlers depending
on whether EPOLLIN and EPOLLOUT are reported by epoll_wait(). No error
processing is done in the module, the handlers are expected to get an
error when doing I/O.
If an error event is reported without EPOLLIN and EPOLLOUT, the module
set both EPOLLIN and EPOLLOUT to ensure the error event is handled at
least in one active handler.
This works well unless the error is delivered along with only one of
EPOLLIN or EPOLLOUT, and the corresponding handler does not do any I/O.
For example, it happened when getting EPOLLERR|EPOLLOUT from
epoll_wait() upon receiving "ICMP port unreachable" while proxying UDP.
As the write handler had nothing to send it was not able to detect and
log an error, and did not switch to the next upstream.
The fix is to unconditionally set EPOLLIN and EPOLLOUT in case of an
error event. In the aforementioned case, this causes the read handler
to be called which does recv() and detects an error.
In addition to the epoll module, analogous changes were made in
devpoll/eventport/poll.
Previously, a request body bigger than "client_body_buffer_size" wasn't written
into a temporary file if it has been pre-read entirely. The preread buffer
is freed after processing, thus subsequent use of it might result in sending
corrupted body or cause a segfault.
On Linux, the rename syscall can be slow due to a global file system lock,
acquired for the entire rename operation, unless both old and new files are
in the same directory. To address this temporary files are now created
in the same directory as the expected resulting cache file when using the
"use_temp_path=off" parameter.
This change mostly reverts 99639bfdfa2a and 3281de8142f5, restoring the
behaviour as of a9138c35120d (with minor changes).
Holding a cache node lock doesn't make sense as we can't use caching
anyway, and results in "ignore long locked inactive cache entry" alerts
if a node is locked for a long time.
The same is done for unbuffered connections, as they can be alive for
a long time as well.
It configures a threshold in bytes, above which client range
requests are not cached. In such a case the client's Range
header is passed directly to a proxied server.
As the pointer to the first argument was tested instead of the argument
itself, array of arguments was always created, even if there were no
arguments. Fix is to test args[0] instead of args.
Found by Coverity (CID 1356862).
The only thing that default_port comparison did in the current
code is prevented implicit upstreams to the same address/port
from being aliased for http and https, e.g.:
proxy_pass http://10.0.0.1:12345;
proxy_pass https://10.0.0.1:12345;
This is inconsistent because it doesn't work for a similar case
with uswgi_pass:
uwsgi_pass uwsgi://10.0.0.1:12345;
uwsgi_pass suwsgi://10.0.0.1:12345;
or with an explicit upstream:
upstream u {
server 10.0.0.1:12345;
}
proxy_pass http://u;
proxy_pass https://u;
Before c9059bd5445b, default_port comparison was needed to
differentiate implicit upstreams in
proxy_pass http://example.com;
and
proxy_pass https://example.com;
as u->port was not set.
When an upstream{} block follows a proxy_pass reference to it,
such an upstream inherited port and default_port settings from
proxy_pass. This was different from when they came in another
order (see ticket #1059). Explicit upstreams should not have
port and default_port in any case.
This fixes the following case:
server { location / { proxy_pass http://u; } ... }
upstream u { server 127.0.0.1; }
server { location / { proxy_pass https://u; } ... }
but not the following:
server { location / { proxy_pass http://u; } ... }
server { location / { proxy_pass https://u; } ... }
upstream u { server 127.0.0.1; }
If proxy_pass (and friends) with variables evaluates an upstream
specified with literal address, nginx always created a per-request
upstream.
Now, if there's a matching upstream specified in the configuration
(either implicit or explicit), it will be used instead.
This fixes inconsistency in what is stored in the "host" field.
Normally it would contain the "host" part of the parsed URL
(e.g., proxy_pass with variables), but for the case of an
implicit upstream specified with literal address it contained
the text representation of the socket address (that is, host
including port for IP).
Now the "host" field always contains the "host" part of the URL,
while the text representation of the socket address is stored
in the newly added "name" field.
The ngx_http_upstream_create_round_robin_peer() function was
modified accordingly in a way to be compatible with the code
that does not know about the new "name" field.
The "stream" code was similarly modified except for not adding
compatibility in ngx_stream_upstream_create_round_robin_peer().
This change is also a prerequisite for the next change.
The new directive "http2_max_requests" is introduced. From users point of
view it works quite similar to "keepalive_requests" but has significantly
bigger default value that is more suitable for HTTP/2.
This allows to correctly parse "start" and "end" arguments without
null-termination (ticket #475), and also fixes rounding errors observed
with strtod() when using i387 instructions.
Originally, the variables kept a result of X509_NAME_oneline(),
which is, according to the official documentation, a legacy
function. It produces a non standard output form and has
various quirks and inconsistencies.
The RFC2253 compliant behavior is introduced for these variables.
The original variables are available through $ssl_client_s_dn_legacy
and $ssl_client_i_dn_legacy.
Previously, while shutting down gracefully, the HTTP/2 connections were
closed in transition to idle state after all active streams have been
processed. That might never happen if the client continued opening new
streams.
Now, nginx sends GOAWAY to all HTTP/2 connections and ignores further
attempts to open new streams. A worker process will quit as soon as
processing of already opened streams is finished.
BoringSSL changed SSL_set_tlsext_host_name() to be a real function
with a (const char *) argument, so it now triggers a warning due to
conversion from (u_char *). Added an explicit cast to silence the
warning.
Prodded by Piotr Sikora, Alessandro Ghedini.
This is needed to allow TLS client certificate auth to work. With
ssl_verify_client configured, the auth daemon can choose to allow the
connection to proceed based on the certificate data.
This has been tested with Thunderbird for IMAP only. I've not yet found a
client that will do client certificate auth for POP3 or SMTP, and the method is
not really documented anywhere that I can find. That said, its simple enough
that the way I've done is probably right.
When headers are set at the "http" level and not redefined in
a server block, we now preserve conf->headers into the "http"
section configuration to inherit it to all servers.
The same applies to conf->headers_cache, though it may not be effective
if no servers use cache at the "server" level as conf->headers_cache
is only initialized if cache is enabled on a given level.
Similar changes made in fastcgi/scgi/uwsgi to preserve conf->params
and conf->params_cache.
When headers to hide are set at the "http" level and not redefined in
a server block, we now preserve compiled headers hash into the "http"
section configuration to inherit this hash to all servers.
Dependency on cache settings existed prior to 2728c4e4a9ae (0.8.44)
as Set-Cookie header was automatically hidden from responses when
using cache. This is no longer the case, and hide_headers_hash can
be safely inherited regardless of cache settings.
With this change it is now possible to load modules compiled without
the "--with-http_ssl_module" configure option into nginx binary compiled
with it, and vice versa (if a module doesn't use ssl-specific functions),
assuming both use the "--with-compat" option.
With this change it is now possible to load modules compiled without
the "--with-file-aio" configure option into nginx binary compiled with it,
and vice versa, assuming both use the "--with-compat" option.
With this change it is now possible to load modules compiled without
the "--with-threads" configure option into nginx binary compiled with it,
and vice versa (if a module does not use thread-specific functions),
assuming both use the "--with-compat" option.
It is used at least by SOAP (M-POST method, defined by RFC 2774) and
by WebDAV versioning (VERSION-CONTROL and BASELINE-CONTROL methods,
defined by RFC 3253).
Previously, user access bits were always set to "rw" unconditionally,
even with "user:r" explicitly specified. With this change we only add
default user access bits (0600) if they weren't set explicitly.
Duplicate processing was possible if the address set by realip was
listed in set_realip_from, and there was an internal redirect so module
context was cleared. This resulted in exactly the same address being set,
so this wasn't a problem before the $realip_remote_addr variable was
introduced, though now results in incorrect $realip_remote_addr being
picked.
Fix is to use ngx_http_realip_get_module_ctx() to look up module context
even if it was cleared. Additionally, the order of checks was switched to
check the configuration first as it looks more effective.
The new parameters "manager_files", "manager_sleep"
and "manager_threshold" were added to proxy_cache_path
and friends.
Note that ngx_path_manager_pt was changed to return ngx_msec_t
instead of time_t (API change).
Explicit checks for OPENSSL_VERSION_NUMBER replaced with checks
for X509_CHECK_FLAG_ALWAYS_CHECK_SUBJECT, thus allowing X509_check_host()
to be used with other libraries. In particular, X509_check_host() was
introduced in LibreSSL 2.5.0.
When the last_buf flag is cleared for add_after_body to append more data from a
subrequest, other filters may still have buffered data, which should be flushed
at this point. For example, the sub_filter may have a partial match buffered,
which will only be flushed after the subrequest is done, ending up with
interleaved data in output.
Setting last_in_chain instead of last_buf flushes the data and fixes the order
of output buffers.
The last_buf flag should only be set in the last buffer of the main request.
Otherwise, several last_buf flags can appear in output. This can, for example,
break the chunked filter, which will include several final chunks in output.
The IPV6_V6ONLY macro is now checked only while parsing appropriate flag
and when using the macro.
The ipv6only field in listen structures is always initialized to 1,
even if not supported on a given platform. This is expected to prevent
a module compiled without IPV6_V6ONLY from accidentally creating dual
sockets if loaded into main binary with proper IPV6_V6ONLY support.
It is to be used as a bitmask with various bits set/reset when appropriate.
Any bit set means that the peer should not be used, that is, exactly what
current checks do, no additional changes required.
Previously flags passed by --with-ld-opt were not used when building perl
module, which meant hardening flags provided by package build systems were not
applied.
All the errors that prevent loading configuration must be printed on the "emerg"
log level. Previously, nginx might silently fail to load configuration in some
cases as the default log level is "error".
The ssl_preread module extracts information from the SSL Client Hello message
without terminating SSL. Currently, only $ssl_preread_server_name variable
is supported, which contains server name from the SNI extension.
In this phase, head of a stream is read and analysed before proceeding to the
content phase. Amount of data read is controlled by the module implementing
the phase, but not more than defined by the "preread_buffer_size" directive.
The time spent on processing preread is controlled by the "preread_timeout"
directive.
The typical preread phase module will parse the beginning of a stream and set
variable that may be used by the content phase, for example to make routing
decision.
Previously, it was not possible to use the stream context
inside ngx_stream_init_connection() handlers. Now, limit_conn,
access handlers, as well as those added later, can create
their own contexts.
Previously, it was possible that some system calls could be
invoked while holding the accept mutex. This is clearly
wrong as it prevents incoming connections from being accepted
as quickly as possible.
Keeps the full address of the upstream server. If several servers were
contacted during proxying, their addresses are separated by commas,
e.g. "192.168.1.1:80, 192.168.1.2:80".
The stream session status is one of the following:
200 - normal completion
403 - access forbidden
500 - internal server error
502 - bad gateway
503 - limit conn
This fixes a problem with aio threads and sendfile with aio_write switched
off, as observed with range requests after fc72784b1f52 (1.9.13). Potential
problems with sendfile in threads were previously described in 9fd738b85fad,
and this seems to be one of them.
The problem occurred as file's thread_handler was set to NULL by event pipe
code after a sendfile thread task was scheduled. As a result, no sendfile
completion code was executed, and the same buffer was additionally sent
using non-threaded sendfile. Fix is to avoid modifying file's thread_handler
if aio_write is switched off.
Note that with "aio_write on" it is still possible that sendfile will use
thread_handler as set by event pipe. This is believed to be safe though,
as handlers used are compatible.
When c->recv_chain() returns an error, it is possible that we already
have some data previously read, e.g., in preread buffer. And in some
cases it may be even a complete response. Changed c->recv_chain() error
handling to process the data, much like it is already done if kevent
reports about an error.
This change, in particular, fixes processing of small responses
when an upstream fails to properly close a connection with lingering and
therefore the connection is reset, but the response is already fully
obtained by nginx (see ticket #1037).
Previously, the realip module could be left with uninitialized context after an
error in the ngx_http_realip_set_addr() function. That context could be later
accessed by $realip_remote_addr and $realip_remote_port variable handlers.
This prevents theoretical resource leak, since those threads are never joined.
Found with ThreadSanitizer.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
If the range includes two or more /16 networks and does
not start at the /16 boundary, the last subrange was not
removed (see 91cff7f97a50 for details).
Return 1 in the SSL_CTX_set_tlsext_ticket_key_cb() callback function
to indicate that a new session ticket is created, as per documentation.
Until 1.1.0, OpenSSL didn't make a distinction between non-negative
return values.
See https://git.openssl.org/?p=openssl.git;a=commitdiff;h=5c753de for details.
The IP_BIND_ADDRESS_NO_PORT option is set on upstream sockets
if proxy_bind does not specify a port. The SO_REUSEADDR option
is set on UDP upstream sockets if proxy_bind specifies a port.
Due to checking of the wrong port, IP_BIND_ADDRESS_NO_PORT was
never set, and SO_REUSEPORT was always set.
Unlike $upstream_response_length that only counts the body size,
the new variable also counts the size of response header and data
received after switching protocols when proxying WebSockets.
The change in b91bcba29351 was not enough to fix random() seeding.
On Windows, the srand() seeds the PRNG only in the current thread,
and worse, is not inherited from the calling thread. Due to this,
worker threads were not properly seeded.
Reported by Marc Bevand.
If PCRE is disabled, captures were treated as normal variables in
ngx_http_script_compile(), while code calculating flushes array length in
ngx_http_compile_complex_value() did not account captures as variables.
This could lead to write outside of the array boundary when setting
last element to -1.
Found with AddressSanitizer.
It fixes potential connection leak if some unsent data was left in the SSL
buffer. Particularly, that could happen when a client canceled the stream
after the HEADERS frame has already been created. In this case no other
frames might be produced and the HEADERS frame alone didn't flush the buffer.
Checking for return value of c->send_chain() isn't sufficient since there
are data can be left in the SSL buffer. Now the wew->ready flag is used
instead.
In particular, this fixed a connection leak in cases when all streams were
closed, but there's still some data to be sent in the SSL buffer and the
client forgot about the connection.
Particularly this fixes alerts on OS X and NetBSD systems when HTTP/2 is
configured over plain TCP sockets.
On these systems calling writev() with no data leads to EINVAL errors
being logged as "writev() failed (22: Invalid argument) while processing
HTTP/2 connection".
Previously, if the worker process exited, GOAWAY was sent to connections in
idle state, but connections with active streams were closed without GOAWAY.
This flag appeared in Linux 4.5 and is useful for avoiding thundering herd
problem.
The current Linux kernel implementation walks the list of exclusive waiters,
and queues an event to each epfd, until it finds the first waiter that has
threads blocked on it via epoll_wait().
Now it is believed that the accept mutex brings more harm than benefits.
Especially in various benchmarks it often results in situation where only
one worker grabs all connections.
On non-aligned platforms, properly cast argument before left-shifting it in
ngx_http_v2_parse_uint32 that is used with u_char. Otherwise it propagates
to int to hold the value and can step over the sign bit. Usually, on known
compilers, this results in negation. Furthermore, a subsequent store into a
wider type, that is ngx_uint_t on 64-bit platforms, results in sign-extension.
In practice, this can be observed in debug log as a very large exclusive bit
value, when client sent PRIORITY frame with exclusive bit set:
: *14 http2 PRIORITY frame sid:5 on 1 excl:8589934591 weight:17
Found with UndefinedBehaviorSanitizer.
Previously, when a buffer was processed by the sub filter, its final bytes
could be buffered by the filter even if they don't match any pattern.
This happened because the Boyer-Moore algorithm, employed by the sub filter
since b9447fc457b4 (1.9.4), matches the last characters of patterns prior to
checking other characters. If the last character is out of scope, initial
bytes of a potential match are buffered until the last character is available.
Now, after receiving a flush or recycled buffer, the filter performs
additional checks to reduce the number of buffered bytes. The potential match
is checked against the initial parts of all patterns. Non-matching bytes are
not buffered. This improves processing of a chunked response from upstream
by sending the entire chunks without buffering unless a partial match is found
at the end of a chunk.
This reduces the number of moving parts in ABI compatibility checks.
Additionally, it also allows to use OpenSSL in FIPS mode while still
using md5 for non-security tasks.
The option is only set if the socket is bound to a specific port to allow
several such sockets coexist at the same time. This is required, for example,
when nginx acts as a transparent proxy and receives two datagrams from the same
client in a short time.
The feature is only implemented for Linux.
The following two types of bind addresses are supported in addition to
$remote_addr and address literals:
- $remote_addr:$remote_port
- [$remote_addr]:$remote_port
In both cases client remote address with port is used in upstream socket bind.