In ac9b1df5b246 (1.13.0) we attempted to allow renegotiation in client mode,
but when using OpenSSL 1.0.2 or older versions it was additionally disabled
by SSL3_FLAGS_NO_RENEGOTIATE_CIPHERS.
If initialization of a header fails for some reason after ngx_list_push(),
leaving the header as is can result in uninitialized memory access by
the header filter or the log module. The fix is to clear partially
initialized headers in case of errors.
For the Cache-Control header, the fix is to postpone pushing
r->headers_out.cache_control until its value is completed.
Previously, ngx_http_sub_header_filter() could fail with a partially
initialized context, later accessed in ngx_http_sub_body_filter()
if called from the perl content handler.
The issue had appeared in 2c045e5b8291 (1.9.4).
A better fix would be to handle ngx_http_send_header() errors in
the perl module, though this is not straightforward.
The SSL_CTRL_SET_CURVES_LIST macro is removed in the OpenSSL master branch.
SSL_CTX_set1_curves_list is preserved for compatibility with previous versions.
CVE-2009-3555 is no longer relevant, as it is mitigated by the renegotiation
info extension (secure renegotiation). On the other hand, unexpected
renegotiation still introduces potential security risks, and hence we do
not allow renegotiation on the server side, as we never request renegotiation.
On the client side the situation is different though. There are backends
which explicitly request renegotiation, and disabled renegotiation
introduces interoperability problems. This change allows renegotiation
on the client side, and fixes interoperability problems as observed with
such backends (ticket #872).
Additionally, with TLSv1.3 the SSL_CB_HANDSHAKE_START flag is currently set
by OpenSSL when receiving a NewSessionTicket message, and was detected by
nginx as a renegotiation attempt. This looks like a bug in OpenSSL, though
this change also allows better interoperability until the problem is fixed.
Previously, the source IP address of a response UDP datagram could differ
from the original datagram's destination address. This could happen if the
server UDP socket was bound to a wildcard address and the network interface
chosen to output the response packet had a different default address than
the destination address of the original packet, for example, if two
addresses from the same network were configured on an interface.
Now the source address is set explicitly if a response is sent for a server
UDP socket bound to a wildcard address.
This change adds "http_429" parameter to "proxy_next_upstream" for
retrying rate-limited requests, and to "proxy_cache_use_stale" for
serving stale cached responses after being rate-limited.
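A hypothetical configuration using the new parameter might look like this
(the upstream and cache zone names are made up for the example):

location / {
    proxy_pass http://backend;
    proxy_next_upstream error timeout http_429;
    proxy_cache one;
    proxy_cache_use_stale http_429;
}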
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
This change adds a reason phrase in the status line and a pretty response
body when the "429" status code is used in the "return", "limit_conn_status"
and/or "limit_req_status" directives.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
When a slice subrequest was redirected to a new location, its context was lost.
After its completion, a new slice subrequest for the same slice was created.
This could lead to an infinite loop. Now the slice module makes sure each slice
subrequest starts output with the slice context available.
With post_action or subrequests, it is possible that the timer set for
wev->delayed will expire while the active subrequest write event handler
is not ready to handle this. This results in request hangs as observed
with limit_rate / sendfile_max_chunk and post_action (ticket #776) or
subrequests (ticket #1228).
Moving the handling to the connection event handler fixes the hangs observed,
and also slightly simplifies the code.
Since limit_req uses connection's write event to delay request processing,
it can conflict with timers in other subrequests. In particular, even
if applied to an active subrequest, it can break things if wev->delayed
is already set (due to limit_rate or sendfile_max_chunk), since after
limit_req finishes the wev->delayed flag will be set and no timer will be
active.
Fix is to use the wev->delayed flag in limit_req as well. This ensures that
wev->delayed won't be set after limit_req finishes, and also ensures that
limit_req's timers will be properly handled by other subrequests if the one
delayed by limit_req is not active.
All streams in a connection must be finalized before the connection
itself can be finalized and all related memory is freed. That's
not always possible on the current event loop iteration.
Thus when the last stream is finalized, it sets the special read
event handler ngx_http_v2_handle_connection_handler() and posts
the event.
Previously, this handler didn't check the connection state and
could call the regular event handler on a connection that was
already in the finalization stage. In the worst case that could
lead to a segmentation fault, since some data structures aren't
supposed to be used during connection finalization. Particularly,
the waiting queue can contain already freed streams, so the
WINDOW_UPDATE frame received by that moment could trigger
access to these freed streams.
Now, the connection error flag is explicitly checked in
ngx_http_v2_handle_connection_handler().
In order to finalize a stream, the error flag is set on the fake connection
and either the "write" or the "read" event handler is called. The read
events of fake connections are always ready, but that's not the case with
the write events. When the ready flag isn't set, the error flag may not be
checked in some cases, and as a result the stream isn't finalized. Now the
ready flag is explicitly set on write events for proper finalization in all
cases.
Previously, flow control didn't account for padding in DATA frames,
which meant that its view of the world could drift from the peer's view
by up to 256 bytes per received padded DATA frame, which could lead
to a deadlock.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Previously, its value accounted for payloads of HEADERS, CONTINUATION
and DATA frames, as well as frame headers of HEADERS and DATA frames,
but it didn't account for frame headers of CONTINUATION frames.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Previously, the connection write handler was called, resulting in a wakeup
of the active subrequest. This change makes it possible to read data
in non-active subrequests as well. For example, this allows SSI to
process instructions in non-active subrequests earlier and start
additional subrequests if needed, reducing overall response time.
If the subrequest is already finalized, the handler set with aio_write
may still be used by sendfile in threads when using range requests
(see also e4c1f5b32868, and the original note in 9fd738b85fad). Calling
an already finalized subrequest's r->write_event_handler in practice
results in a request hang in some cases.
Fix is to trigger connection event handler if the subrequest was already
finalized.
The ngx_linux_sendfile() function is now used for both normal sendfile()
and sendfile in threads. The ngx_linux_sendfile_thread() function was
modified to use the same interface as ngx_linux_sendfile(), and is simply
called from ngx_linux_sendfile() when threads are enabled.
Special return code NGX_DONE is used to indicate that a thread task was
posted and no further actions are needed.
If the number of bytes sent is less than what we were sending, we now always
retry sending. This is needed for sendfile() in threads as the number
of bytes we are sending might have been changed since the thread task
was posted. And this is also needed for Linux 4.3+, as sendfile() might
be interrupted at any time and provides no indication if it was interrupted
or not (ticket #1174).
If of.err is 0, it means that there was a memory allocation error
and no further logging and/or processing is needed. The of.failed
string can only be accessed if of.err is not 0.
The directive configures a timeout to be used when gracefully shutting down
worker processes. When the timer expires, nginx will try to close all
the connections currently open to facilitate shutdown.
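For example, the following setting in the main configuration context gives
worker processes 30 seconds to finish gracefully before nginx starts closing
their remaining connections:

worker_shutdown_timeout 30s;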
There is no need to cancel timers early if there are other timers blocking
shutdown anyway. Preserving such timers allows nginx to continue some
periodic work till the shutdown is actually possible.
With the new approach, timers with ev->cancelable are simply ignored when
checking if there are any timers left during shutdown.
The ev->timedout flag is set on first timer expiration, and never reset
after it. Due to this the code to stop the timer when the timer was
canceled never worked (except in a very specific time frame immediately
after start), and the timer was always armed again. This essentially
resulted in a buffer flush at the end of an event loop iteration.
This behaviour actually seems to be better than just stopping the flush
timer for the whole shutdown, so it is preserved as is instead of fixing
the code to actually remove the timer. It will be further improved by
upcoming changes to preserve cancelable timers if there are other timers
blocking shutdown.
Most notably, this fixes possible buffer overflows if the number of large
client header buffers configured in a virtual server differs from the one in
the default server.
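A hypothetical configuration of the affected kind, with different
large_client_header_buffers settings in the default and a virtual server
(names and sizes are made up for the example):

server {
    listen 80 default_server;
    large_client_header_buffers 4 4k;
}

server {
    listen 80;
    server_name example.com;
    large_client_header_buffers 8 16k;
}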
Reported by Daniil Bondarev.
Cloned subrequests should inherit r->content_handler. This way they will
be able to use the same location configuration as the original request
if there are "if" directives in the configuration.
Without r->content_handler inherited, the following configuration tries
to access a static file in the update request:
location / {
    set $true 1;

    if ($true) {
        # nothing
    }

    proxy_pass http://backend;
    proxy_cache one;
    proxy_cache_use_stale updating;
    proxy_cache_background_update on;
}
See http://mailman.nginx.org/pipermail/nginx/2017-February/053019.html for
the initial report.
With "proxy_ignore_client_abort off" (the default), upstream module changes
r->read_event_handler to ngx_http_upstream_rd_check_broken_connection().
If the handler is not cleared during upstream finalization, it can be
triggered later, causing unexpected effects, if, for example, a request
was redirected to a different location using error_page or X-Accel-Redirect.
In particular, it makes "proxy_ignore_client_abort on" stop working after
a redirection in a configuration like this:
location = / {
    error_page 502 = /error;
    proxy_pass http://127.0.0.1:8082;
}

location /error {
    proxy_pass http://127.0.0.1:8083;
    proxy_ignore_client_abort on;
}
It is also known to cause segmentation faults with aio used, see
http://mailman.nginx.org/pipermail/nginx-ru/2015-August/056570.html.
Fix is to explicitly set r->read_event_handler to ngx_http_block_reading()
during upstream finalization, similar to how it is done in the request body
reading code and in the limit_req module.
This allows storing larger ETag values for proxy_cache_revalidate,
including ones generated as SHA256, and cache responses with longer
Vary (ticket #826).
In particular, this fixes caching of Amazon S3 responses with CORS
enabled, which now use "Vary: Origin, Access-Control-Request-Headers,
Access-Control-Request-Method".
Cache version bumped accordingly.
Previously, the slice subrequest location was selected based on the request
URI. If the request was then redirected to a new location, its context array
was cleared, making the slice module lose the current slice range
information. This led to broken output. Now subrequests with the
NGX_HTTP_SUBREQUEST_CLONE flag are
created for slices. Such subrequests stay in the same location as the parent
request and keep the right slice context.
Previously, there was no way to enable the proxy_cache_use_stale behavior by
reading the backend response. Now, stale-while-revalidate and stale-if-error
Cache-Control extensions (RFC 5861) are supported. They specify how long a
stale response can be used when a cache entry is being updated, or in case of
an error.
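For example, a backend response carrying the following header (the values
are illustrative) allows nginx to serve the stale cached response for up to
60 seconds while the cache entry is being updated, and for up to 600 seconds
in case of an error:

Cache-Control: max-age=10, stale-while-revalidate=60, stale-if-error=600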
The function may leave an error in the error queue while returning success,
e.g., when taking a DSO reference to itself as of OpenSSL 1.1.0d:
https://git.openssl.org/?p=openssl.git;a=commit;h=4af9f7f
Notably, this fixes an alert seen with statically linked OpenSSL on some
platforms.
While here, check OPENSSL_init_ssl() return value.
Previously, the buffer size was not changed from the one saved during
initial ngx_ssl_create_connection(), even if the buffer itself was not
yet created. Fix is to change c->ssl->buffer_size in the SNI callback.
Note that it should be also possible to update buffer size even in non-SNI
virtual hosts as long as the buffer is not yet allocated. This looks
like an overcomplication though.
The ngx_event_pipe() function wasn't called on write events with
wev->delayed set. As a result, threaded writing results weren't
properly collected in ngx_event_pipe_write_to_downstream() when a
write event was triggered for a completed write.
Further, this wasn't detected, as p->aio was reset by a thread completion
handler, and results were later collected in ngx_event_pipe_read_upstream()
instead of scheduling a new write of additional data. If this happened
on the last reading from an upstream, the last part of the response was never
written to the cache file.
Similar problems might also happen in case of timeouts when writing to
client, as this also results in ngx_event_pipe() not being called on write
events. In this scenario socket leaks were observed.
Fix is to check if p->writing is set in ngx_event_pipe_read_upstream(), and
therefore collect results of previous write operations in case of read events
as well, similar to how we do so in ngx_event_pipe_write_to_downstream().
This is enough to fix the wev->delayed case. Additionally, we now call
ngx_event_pipe() from ngx_http_upstream_process_request() if there are
uncollected write operations (p->writing and !p->aio). This also fixes
the wev->timedout case.
The ngx_chain_coalesce_file() function may produce more bytes to send than
requested in the limit passed, as it aligns the last file position to send
to a memory page boundary. As a result, (limit - send) may become negative.
This resulted in a big positive number when converted to size_t
while calling ngx_output_chain_to_iovec().
Another part of the problem is in ngx_chain_coalesce_file(): it changes cl
to the next chain link even if the current buffer is only partially sent
due to the limit.
Therefore, if a file buffer was not expected to be fully sent due to the limit,
and was followed by a memory buffer, nginx called sendfile() with a part
of the file buffer, and the memory buffer in the trailer. If there was
enough room in the socket buffer, this resulted in a part of the file
buffer being skipped, and the corresponding part of the memory buffer sent
instead.
The bug was introduced in 8e903522c17a (1.7.8). Configurations affected
are ones using limits, that is, limit_rate and/or sendfile_max_chunk, and
memory buffers after file ones (may happen when using subrequests or
with proxying with disk buffering).
Fix is to explicitly check if (send < limit) before constructing trailer
with ngx_output_chain_to_iovec(). Additionally, ngx_chain_coalesce_file()
was modified to preserve unfinished file buffers in cl.
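A hypothetical configuration of the affected kind, combining a sending
limit with subrequests that may place memory buffers after file ones (the
values are illustrative):

location / {
    sendfile on;
    sendfile_max_chunk 512k;
    ssi on;    # subrequests may add memory buffers after file buffers
}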
Closing up to 32 connections might be too aggressive if worker_connections
is set to a comparable number (and/or there are only a small number of
reusable connections). If an occasional connection shortage happens in
such a configuration, it leads to closing all reusable connections instead
of gradually reducing keepalive timeout to a smaller value. To improve
granularity in such configurations we now close no more than 1/8 of all
reusable connections at once.
Suggested by Joel Cunningham.
A missing check could cause ngx_stream_ssl_handler() to be applied
to a non-ssl session, which resulted in a null pointer dereference
if ssl_verify_client is enabled.
The bug had appeared in 1.11.8 (41cb1b64561d).
Previously, an unavailable peer was considered recovered after a successful
proxy session to this peer. Until then, only a single client connection per
fail_timeout was allowed to be proxied to the peer.
Since stream sessions can be long, it may take indefinite time for a peer to
recover, limiting the ability of the peer to receive new connections.
Now, a peer is considered recovered after a successful TCP connection is
established to it. Balancers are notified of this event via the notify()
callback.
OpenSSL 1.1.0 now uses normal "nmake; nmake install" instead of using
custom "ms\do_ms.bat" script and "ms\nt.mak" makefile. And Configure
now requires --prefix to be absolute, and no longer derives --openssldir
from prefix (so it's specified explicitly). Generated libraries are now
called "libcrypto.lib" and "libssl.lib" instead of "libeay32.lib"
and "ssleay32.lib". Appropriate tests added to support both old and new
variants.
Additionally, openssl/lhash.h now triggers warning C4090 ('function' :
different 'const' qualifiers), so the warning was disabled.
There are lots of C4244 warnings (conversion from 'type1' to 'type2',
possible loss of data), so they were disabled.
The same applies to C4267 warnings (conversion from 'size_t' to 'type',
possible loss of data), most notably - conversion from ngx_str_t.len to
ngx_variable_value_t.len (which is unsigned:28). Additionally, there
is at least one case when it is not possible to fix the warning properly
without introducing win32-specific code: recv() on win32 uses "int len",
while POSIX defines "size_t len".
The ssize_t type is now properly defined for 64-bit compilation with MSVC.
Caught by warning C4305 (truncation from '__int64' to 'ssize_t') on
"cutoff = NGX_MAX_SIZE_T_VALUE / 10" in ngx_atosz().
Several C4334 warnings (result of 32-bit shift implicitly converted to 64 bits)
were fixed by adding explicit conversions.
Several C4214 warnings (nonstandard extension used: bit field types other
than int) in ngx_http_script.h were fixed by changing bit field types from
uintptr_t to unsigned.
Most notably, warning W8012 (comparing signed and unsigned values) was
reported in multiple places where an unsigned value of a small type (e.g.,
u_short) is promoted to an int and compared to an unsigned value.
Warning W8072 (suspicious pointer arithmetic) was disabled; it is reported
when we increment the base pointer in ngx_shm_alloc().
These types are available with MSVC (at least since 2003, in stddef.h),
all variants of GCC (in stdint.h) and Watcom C. We need to define them
only for Borland C.
This implies ticket key size of 80 bytes instead of previously used 48,
as both HMAC and AES keys are 32 bytes now. When an old 48-byte ticket key
is provided, we fall back to using backward-compatible AES128 encryption.
OpenSSL switched to using AES256 in 1.1.0, and we are providing equivalent
security. While here, the order of the HMAC and AES keys was reversed to
make the implementation compatible with keys used by OpenSSL with
SSL_CTX_set_tlsext_ticket_keys().
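A minimal sketch of using an 80-byte key with the ssl_session_ticket_key
directive (the file path is made up for the example; such a key can be
generated with "openssl rand 80 > /etc/nginx/ticket.key"):

server {
    listen 443 ssl;
    ssl_session_ticket_key /etc/nginx/ticket.key;
}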
Prodded by Christian Klinger.