The NGX_DONE value returned from ngx_http_upstream_cache_send() indicates
that upstream was already finalized in ngx_http_upstream_process_headers().
It was treated as a generic error which resulted in duplicate finalization.
Handled NGX_HTTP_UPSTREAM_INVALID_HEADER from ngx_http_upstream_cache_send().
Previously, it could return within ngx_http_upstream_finalize_request(), and
since it's below NGX_HTTP_SPECIAL_RESPONSE, a client connection could stuck.
When parsing of headers in a cache file fails, already parsed headers
need to be cleared, and protocol state needs to be reinitialized. To do
so, u->request_sent is now set to ensure ngx_http_upstream_reinit() will
be called.
This change complements improvements in 46ddff109e72.
This slightly reduces cost of selecting a peer if all or almost all peers
failed, see ticket #1030. There should be no measureable difference with
other workloads.
While this may result in non-ideal distribution of requests if nginx
won't be able to select a server in a reasonable number of attempts,
this still looks better than severe performance degradation observed
if there is no limit and there are many points configured (ticket #1030).
This is also in line with what we do for other hash balancing methods.
Previously, unix sockets were treated as AF_INET ones, and this may
result in buffer overread on Linux, where unbound unix sockets have
2-byte addresses.
Note that it is not correct to use just sun_path as a binary representation
for unix sockets. This will result in an empty string for unbound unix
sockets, and thus behaviour of limit_req and limit_conn will change when
switching from $remote_addr to $binary_remote_addr. As such, normal text
representation is used.
Reported by Stephan Dollberg.
At least FreeBSD, macOS, NetBSD, and OpenBSD can return unix sockets
with non-null-terminated sun_path. Additionally, the address may become
non-null-terminated if it does not fit into the buffer provided and was
truncated (may happen on macOS, NetBSD, and Solaris, which allow unix socket
addresess larger than struct sockaddr_un). As such, ngx_sock_ntop() might
overread the sockaddr provided, as it used "%s" format and thus assumed
null-terminated string.
To fix this, the ngx_strnlen() function was introduced, and it is now used
to calculate correct length of sun_path.
Some OSes (notably macOS, NetBSD, and Solaris) allow unix socket addresses
larger than struct sockaddr_un. Moreover, some of them (macOS, Solaris)
return socklen of the socket address before it was truncated to fit the
buffer provided. As such, on these systems socklen must not be used without
additional check that it is within the buffer provided.
Appropriate checks added to ngx_event_accept() (after accept()),
ngx_event_recvmsg() (after recvmsg()), and ngx_set_inherited_sockets()
(after getsockname()).
We also obtain socket addresses via getsockname() in
ngx_connection_local_sockaddr(), but it does not need any checks as
it is only used for INET and INET6 sockets (as there can be no
wildcard unix sockets).
The sync flag of HTTP/2 request body buffer is used when the size of request
body is unknown or bigger than configured "client_body_buffer_size". In this
case the buffer points to body data inside the global receive buffer that is
used for reading all HTTP/2 connections in the worker process. Thus, when the
sync flag is set, the buffer must be flushed to a temporary file, otherwise
the request body data can be overwritten.
Previously, the sync buffer wasn't flushed to a temporary file if the whole
body was received in one DATA frame with the END_STREAM flag and wasn't
copied into the HTTP/2 body preread buffer. As a result, the request body
might be corrupted (ticket #1384).
Now, setting r->request_body_in_file_only enforces writing the sync buffer
to a temporary file in all cases.
When caching intercepted errors, previous behaviour was to use
proxy_cache_valid times specified, regardless of various cache control
headers present in the response. Fix is to check u->cacheable and
use u->cache->valid_sec as set by various cache control response headers,
similar to how we do this in the normal caching code path.
If cache file is truncated, it is possible that u->process_header()
will return NGX_AGAIN. Added appropriate handling of this case by
changing the error to NGX_HTTP_UPSTREAM_INVALID_HEADER.
Also, added appropriate logging of this and NGX_HTTP_UPSTREAM_INVALID_HEADER
cases at the "crit" level. Note that this will result in duplicate logging
in case of NGX_HTTP_UPSTREAM_INVALID_HEADER. While this is something better
to avoid, it is considered to be an overkill to implement cache-specific
error logging in u->process_header().
Additionally, u->buffer.start is now reset to be able to receive a new
response, and u->cache_status set to MISS to provide the value in the
$upstream_cache_status variable, much like it happens on other cache file
errors detected by ngx_http_file_cache_read(), instead of HIT, which is
believed to be misleading.
It is to be used as a bitmask with various bits set/reset when appropriate.
63b8b157b776 made a similar change to ngx_http_upstream_rr_peer_t.down and
ngx_stream_upstream_rr_peer_t.down.
Previously, "get indexed header" message was logged when in fact only
header name was obtained using an index, and "get indexed header name"
was logged when full header representation (name and value) was obtained
using an index. Fixed version logs "get indexed name" and "get indexed
header" respectively.
Previously, when the first UDP response packet was not received from the
proxied server within proxy_timeout, no error message was logged before
switching to the next upstream. Additionally, when one of succeeding response
packets was not received within the timeout, the timeout error had low severity
because it was logged as a client connection error as opposed to upstream
connection error.
Various buffers are allocated in an assumption that there would be
no more than 4 year digits. This might not be true on platforms
with 64-bit time_t, as 64-bit time_t is able to represent more than that.
Such dates with more than 4 year digits hardly make sense though, as
various date formats in use do not allow them anyway.
As such, all dates are now truncated by ngx_gmtime() to December 31, 9999.
This should have no effect on valid dates, though will prevent potential
buffer overflows on invalid ones.
In ngx_gmtime(), instead of casting to ngx_uint_t we now work with
time_t directly. This allows using dates after 2038 on 32-bit platforms
which use 64-bit time_t, notably NetBSD and OpenBSD.
As the code is not able to work with negative time_t values, argument
is now set to 0 for negative values. As a positive side effect, this
results in Epoch being used for such values instead of a date in distant
future.
This change lets NGINX talk to clients with SETTINGS_HEADER_TABLE_SIZE
smaller than the default 4KB. Previously, NGINX would ACK the SETTINGS
frame with a small dynamic table size, but it would never send dynamic
table size update, leading to a connection-level COMPRESSION_ERROR.
Also, it allows clients to release 4KB of memory per connection, since
NGINX doesn't use HPACK's dynamic table when encoding headers, however
clients had to maintain it, since NGINX never signaled that it doesn't
use it.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
When switching to a next upstream, some buffers could be stuck in the middle
of the filter chain. A condition existed that raised an error when this
happened. As it turned out, this condition prevented switching to a next
upstream if ssl preread was used with the TCP protocol (see the ticket).
In fact, the condition does not make sense for TCP, since after successful
connection to an upstream switching to another upstream never happens. As for
UDP, the issue with stuck buffers is unlikely to happen, but is still possible.
Specifically, if a filter delays sending data to upstream.
The condition can be relaxed to only check the "buffered" bitmask of the
upstream connection. The new condition is simpler and fixes the ticket issue
as well. Additionally, the upstream_out chain is now reset for UDP prior to
connecting to a new upstream to prevent repeating the client data twice.
When secure link checksum has length of 23 or 24 bytes, decoded base64 value
could occupy 17 or 18 bytes which is more than 16 bytes previously allocated
for it on stack. The buffer overflow does not have any security implications
since only one local variable was corrupted and this variable was not used in
this case.
The fix is to increase buffer size up to 18 bytes. Useless buffer size
initialization is removed as well.
This fixes at least the following cases, where no last_modified_time
(assuming caching is not enabled) resulted in incorrect behaviour:
- slice filter and If-Range requests (ticket #1357);
- If-Range requests with proxy_force_ranges;
- expires modified.
The $ssl_server_name variable used SSL_get_servername() result directly,
but this is not safe: it references a memory allocation in an SSL
session, and this memory might be freed at any time due to renegotiation.
Instead, copy the name to memory allocated from the pool.
This variable contains URL-encoded client SSL certificate. In contrast
to $ssl_client_cert, it doesn't depend on deprecated header continuation.
The NGX_ESCAPE_URI_COMPONENT variant of encoding is used, so the resulting
variable can be safely used not only in headers, but also as a request
argument.
The $ssl_client_cert variable should be considered deprecated now.
The $ssl_client_raw_cert variable will be eventually renambed back
to $ssl_client_cert.
Total length of a response with multiple ranges can be larger than a size_t
variable can hold, so type changed to off_t. Previously, an incorrect
Content-Length was returned when requesting more than 4G of ranges from
a large enough file on a 32-bit system.
An additional size_t variable introduced to calculate size of the boundary
header buffer, as off_t is not needed here and will require type casts on
win32.
Reported by Shuxin Yang,
http://mailman.nginx.org/pipermail/nginx/2017-July/054384.html.
The "fd" field should be after 3 pointers for ngx_event_ident() to use it.
This was broken by ccad84a174e0. While it does not seem to be currently used
for aio-related events, it should be a good idea to preserve the correct
layout nevertheless.
Pass NGX_FILE_OPEN to ngx_open_file() to fix "The parameter is incorrect"
error on win32 when using the ssl_session_ticket_key directive or loading
a binary geo base. On UNIX, this change is a no-op.
On Windows, a worker process does not call ngx_slab_init() from
ngx_init_zone_pool(), so ngx_slab_max_size, ngx_slab_exact_size,
and ngx_slab_exact_shift were left uninitialized.
The variable was considered non-existent in the absence of any
valid_referers directives.
Given the following config snippet,
location / {
return 200 $invalid_referer;
}
location /referer {
valid_referers server_names;
}
"location /" should work identically and independently on other
"location /referer".
The fix is to always add the $invalid_referer variable as long
as the module is compiled in, as is done by other modules.
The shared objects should generally be allocated from shared memory.
While peers->name and the data it points to allocated from cf->pool
happened to work on UNIX, it broke on Windows. On UNIX this worked
only because the shared memory zone for upstreams is re-created for
every new configuration.
But on Windows, a worker process does not inherit the address space
of the master process, so the peers->name pointed to data allocated
from cf->pool by the master process, and was invalid.
The phase is added instead of the try_files phase. Unlike the old phase, the
new one supports registering multiple handlers. The try_files implementation is
moved to a separate ngx_http_try_files_module, which now registers a precontent
phase handler.
The new request flag "preserve_body" indicates that the request body file should
not be removed by the upstream module because it may be used later by a
subrequest. The flag is set by the SSI (ticket #585), addition and slice
modules. Additionally, it is also set by the upstream module when a background
cache update subrequest is started to prevent the request body file removal
after an internal redirect. Only the main request is now allowed to remove the
file.
When closing a socket with SO_REUSEPORT, Linux drops all connections waiting
in this socket's listen queue. Previously, it was believed to only result
in connection resets when reconfiguring nginx to use smaller number of worker
processes. It also results in connection resets during configuration
testing though.
Workaround is to avoid using SO_REUSEPORT when testing configuration. It
should prevent listening sockets from being created if a conflicting socket
already exists, while still preserving detection of other possible errors.
It should also cover UDP sockets.
The only downside of this approach seems to be that a configuration testing
won't be able to properly report the case when nginx was compiled with
SO_REUSEPORT, but the kernel is not able to set it. Such errors will be
reported on a real start instead.
Suffix ranges no longer allowed to set negative start values, to prevent
ranges with negative start from appearing even if total size protection
will be removed.
The overflow can be used to circumvent the restriction on total size of
ranges introduced in c2a91088b0c0 (1.1.2). Additionally, overflow
allows producing ranges with negative start (such ranges can be created
by using a suffix, "bytes=-100"; normally this results in 200 due to
the total size check). These can result in the following errors in logs:
[crit] ... pread() ... failed (22: Invalid argument)
[alert] ... sendfile() failed (22: Invalid argument)
When using cache, it can be also used to reveal cache file header.
It is believed that there are no other negative effects, at least with
standard nginx modules.
In theory, this can also result in memory disclosure and/or segmentation
faults if multiple ranges are allowed, and the response is returned in a
single in-memory buffer. This never happens with standard nginx modules
though, as well as known 3rd party modules.
Fix is to properly protect from possible overflow when incrementing size.
It is safe because re-sending still works during graceful shutdown as
long as resolving takes place (and resolve tasks set their own timeouts
that are not cancelable).
Also, the new ctx->cancelable flag can be set to make resolve task's
timeout event cancelable.
Notably, on ppc64 with 64k pagesize, slab 0 (of size 8) requires
128 64-bit elements for bitmasks. The code bogusly assumed that
one uintptr_t is enough for bitmasks plus at least one free slot.
Resolving an SRV record includes resolving its host names in subrequests.
Previously, if memory allocation failed while reporting a subrequest result
after receiving a response from a DNS server, the SRV resolve handler was
called immediately with the NGX_ERROR state. However, if the SRV record
included another copy of the resolved name, it was reported once again.
This could trigger the use-after-free memory access after SRV resolve
handler freed the resolve context by calling ngx_resolve_name_done().
Now the SRV resolve handler is called only when all its subrequests are
completed.
Previously, each configured header was represented in one of two ways,
depending on whether or not its value included any variables.
If the value didn't include any variables, then it would be represented
as as a single script that contained complete header line with HTTP/1.1
delimiters, i.e.:
"Header: value\r\n"
But if the value included any variables, then it would be represented
as a series of three scripts: first contained header name and the ": "
delimiter, second evaluated to header value, and third contained only
"\r\n", i.e.:
"Header: "
"$value"
"\r\n"
This commit changes that, so that each configured header is represented
as a series of two scripts: first contains only header name, and second
contains (or evaluates to) only header value, i.e.:
"Header"
"$value"
or
"Header"
"value"
This not only makes things more consistent, but also allows header name
and value to be accessed separately.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
As per RFC 2616 / RFC 7233, any range request to an empty file
is expected to result in 416 Range Not Satisfiable response, as
there cannot be a "byte-range-spec whose first-byte-pos is less
than the current length of the entity-body". On the other hand,
this makes use of byte-range requests inconvenient in some cases,
as reported for the slice module here:
http://mailman.nginx.org/pipermail/nginx-devel/2017-June/010177.html
This commit changes range filter to instead return 200 if the file
is empty and the range requested starts at 0.
This change reworks 13a5f4765887 to only run posted requests once,
with nothing on stack. Running posted requests with other request
functions on stack may result in use-after-free in case of errors,
similar to the one reported in #788.
To only run posted request once, a separate function was introduced
to be used as ssl handshake handler in c->ssl->handler,
ngx_http_upstream_ssl_handshake_handler(). The ngx_http_run_posted_requests()
is only called in this function, and not in ngx_http_upstream_ssl_handshake()
which may be called directly on stack.
Additionaly, ngx_http_upstream_ssl_handshake_handler() now does appropriate
debug logging of the current subrequest, similar to what is done in other
event handlers.
Previously, the upstream resolve handler always called
ngx_http_run_posted_requests() to run posted requests after processing the
resolver response. However, if the handler was called directly from the
ngx_resolve_name() function (for example, if the resolver response was cached),
running posted requests from the handler could lead to the following errors:
- If the request was scheduled for termination, it could actually be terminated
in the resolve handler. Upper stack frames could reference the freed request
object in this case.
- If a significant number of requests were posted, and for each of them the
resolve handler was called directly from the ngx_resolve_name() function,
posted requests could be run recursively and lead to stack overflow.
Now ngx_http_run_posted_requests() is only called from asynchronously invoked
resolve handlers.
Trailers added using this directive are evaluated after response body
is processed by output filters (but before it's written to the wire),
so it's possible to use variables calculated from the response body
as the trailer value.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Example:
ngx_table_elt_t *h;
h = ngx_list_push(&r->headers_out.trailers);
if (h == NULL) {
return NGX_ERROR;
}
ngx_str_set(&h->key, "Fun");
ngx_str_set(&h->value, "with trailers");
h->hash = ngx_hash_key_lc(h->key.data, h->key.len);
The code above adds "Fun: with trailers" trailer to the response.
Modules that want to emit trailers must set r->expect_trailers = 1
in header filter, otherwise they might not be emitted for HTTP/1.1
responses that aren't already chunked.
This change also adds $sent_trailer_* variables.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
The current style in variable handlers returning NGX_OK is to either set
v->not_found to 1, or to initialize the entire ngx_http_variable_value_t
structure.
In theory, always setting v->valid = 1 for NGX_OK would be useful, which
would mean that the value was computed and is thus valid, including the
special case of v->not_found = 1. But currently that's not the case and
causes the (v->valid || v->not_found) check to access an uninitialized
v->valid value, which is safe only because its value doesn't matter when
v->not_found is set.
When evaluating a mapped $reset_uid variable in the userid filter,
if get_handler set to ngx_http_map_variable() returned an error,
this previously resulted in a NULL pointer dereference.
If memory allocation of a new r->uri.data storage failed, reset its length as
well. Request URI is used in ngx_http_finalize_request() for debug logging.
Previously, when using NGX_HTTP_SSI_ERROR, error was ignored in ssi processing,
thus timefmt could be accessed later in ngx_http_ssi_date_gmt_local_variable()
as part of "set" handler, or NULL format pointer could be passed to strftime().
Previously, SETTINGS ACK was sent immediately upon receipt of SETTINGS
frame, before already queued DATA frames created using old SETTINGS.
This incorrect behavior was source of interoperability issues, because
peers rely on the fact that new SETTINGS are in effect after receiving
SETTINGS ACK.
Reported by Feng Li.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Previously, new frames could be emitted in the middle of applying
new (and already acknowledged) SETTINGS params, which is illegal.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
If the main request was finalized while a background request performed an
asynchronous operation, the main request ended up in ngx_http_writer() and was
not finalized until a network event or a timeout. For example, cache
background update with aio enabled made nginx unable to process further client
requests or close the connection, keeping it open until client closes it.
Now regular finalization of the main request is not suspended because of an
asynchronous operation in another request.
If a background request was terminated while an asynchronous operation was in
progress, background request's write event handler was changed to
ngx_http_request_finalizer() and never called again.
Now, whenever a request is terminated while an asynchronous operation is in
progress, connection error flag is set to make further finalizations of any
request with this connection lead to termination.
These issues appeared in 1aeaae6e9446 (not yet released).
In http these checks were changed in a6d6d762c554, though mail module
was missed at that time. Since then, the stream module was introduced
based on mail, using "== NGX_ERROR" check.
With OpenSSL 1.1.0+, the workaround for handshake buffer size as introduced
in a720f0b0e083 (ticket #413) no longer works, as OpenSSL no longer exposes
handshake buffers, see https://github.com/openssl/openssl/commit/2e7dc7cd688.
Moreover, it is no longer possible to adjust handshake buffers at all now.
To avoid additional RTT if handshake uses more than 4k we now set TCP_NODELAY
on SSL connections before handshake. While this still results in sub-optimal
network utilization due to incomplete packets being sent, it seems to be
better than nothing.
Previously, cache background update might not work as expected, making client
wait for it to complete before receiving the final part of a stale response.
This could happen if the response could not be sent to the client socket in one
filter chain call.
Now background cache update is done in a background subrequest. This type of
subrequest does not block any other subrequests or the main request.
Previously, the read event of the accepted connection was marked ready, but not
available. This made EPOLLRDHUP-related code (for example, in ngx_unix_recv())
expect more data from the socket, leading to unexpected behavior.
For example, if SSL, PROXY protocol and deferred accept were enabled on a listen
socket, the client connection was aborted due to unexpected return value of
c->recv().
If allocation of cleanup handler in the HTTP/2 header filter failed, then
a stream might be freed with a HEADERS frame left in the output queue.
Now the HEADERS frame is accounted in the queue before trying to allocate
the cleanup handler.
Abnormally exited workers may leave locked cache entries, this can
result in the cache size on disk exceeding max_size and shared memory
exhaustion.
This change mitigates the issue by ignoring locked entries during forced
expire. It also increases the visibility of the problem by logging such
entries.
Previously, an allocation error resulted in uninitialized memory access
when evaluating $upstream_http_ variables.
On a related note, see r->headers_out.headers cleanup work in 0cdee26605f3.
In ac9b1df5b246 (1.13.0) we attempted to allow renegotiation in client mode,
but when using OpenSSL 1.0.2 or older versions it was additionally disabled
by SSL3_FLAGS_NO_RENEGOTIATE_CIPHERS.
If initialization of a header failed for some reason after ngx_list_push(),
leaving the header as is can result in uninitialized memory access by
the header filter or the log module. The fix is to clear partially
initialized headers in case of errors.
For the Cache-Control header, the fix is to postpone pushing
r->headers_out.cache_control until its value is completed.
Previously, ngx_http_sub_header_filter() could fail with a partially
initialized context, later accessed in ngx_http_sub_body_filter()
if called from the perl content handler.
The issue had appeared in 2c045e5b8291 (1.9.4).
A better fix would be to handle ngx_http_send_header() errors in
the perl module, though this doesn't seem to be easy enough.
The SSL_CTRL_SET_CURVES_LIST macro is removed in the OpenSSL master branch.
SSL_CTX_set1_curves_list is preserved as compatibility with previous versions.
CVE-2009-3555 is no longer relevant and mitigated by the renegotiation
info extension (secure renegotiation). On the other hand, unexpected
renegotiation still introduces potential security risks, and hence we do
not allow renegotiation on the server side, as we never request renegotiation.
On the client side the situation is different though. There are backends
which explicitly request renegotiation, and disabled renegotiation
introduces interoperability problems. This change allows renegotiation
on the client side, and fixes interoperability problems as observed with
such backends (ticket #872).
Additionally, with TLSv1.3 the SSL_CB_HANDSHAKE_START flag is currently set
by OpenSSL when receiving a NewSessionTicket message, and was detected by
nginx as a renegotiation attempt. This looks like a bug in OpenSSL, though
this change also allows better interoperability till the problem is fixed.
Previously, the source IP address of a response UDP datagram could differ from
the original datagram destination address. This could happen if the server UDP
socket is bound to a wildcard address and the network interface chosen to output
the response packet has a different default address than the destination address
of the original packet. For example, if two addresses from the same network are
configured on an interface.
Now source address is set explicitly if a response is sent for a server UDP
socket bound to a wildcard address.
This change adds "http_429" parameter to "proxy_next_upstream" for
retrying rate-limited requests, and to "proxy_cache_use_stale" for
serving stale cached responses after being rate-limited.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
This change adds reason phrase in status line and pretty response body
when "429" status code is used in "return", "limit_conn_status" and/or
"limit_req_status" directives.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
When a slice subrequest was redirected to a new location, its context was lost.
After its completion, a new slice subrequest for the same slice was created.
This could lead to infinite loop. Now the slice module makes sure each slice
subrequest starts output with the slice context available.
With post_action or subrequests, it is possible that the timer set for
wev->delayed will expire while the active subrequest write event handler
is not ready to handle this. This results in request hangs as observed
with limit_rate / sendfile_max_chunk and post_action (ticket #776) or
subrequests (ticket #1228).
Moving the handling to the connection event handler fixes the hangs observed,
and also slightly simplifies the code.
Since limit_req uses connection's write event to delay request processing,
it can conflict with timers in other subrequests. In particular, even
if applied to an active subrequest, it can break things if wev->delayed
is already set (due to limit_rate or sendfile_max_chunk), since after
limit_req finishes the wev->delayed flag will be set and no timer will be
active.
Fix is to use the wev->delayed flag in limit_req as well. This ensures that
wev->delayed won't be set after limit_req finishes, and also ensures that
limit_req's timers will be properly handled by other subrequests if the one
delayed by limit_req is not active.
All streams in connection must be finalized before the connection
itself can be finalized and all related memory is freed. That's
not always possible on the current event loop iteration.
Thus when the last stream is finalized, it sets the special read
event handler ngx_http_v2_handle_connection_handler() and posts
the event.
Previously, this handler didn't check the connection state and
could call the regular event handler on a connection that was
already in finalization stage. In the worst case that could
lead to a segmentation fault, since some data structures aren't
supposed to be used during connection finalization. Particularly,
the waiting queue can contain already freed streams, so the
WINDOW_UPDATE frame received by that moment could trigger
accessing to these freed streams.
Now, the connection error flag is explicitly checked in
ngx_http_v2_handle_connection_handler().
In order to finalize stream the error flag is set on fake connection and
either "write" or "read" event handler is called. The read events of fake
connections are always ready, but it's not the case with the write events.
When the ready flag isn't set, the error flag can be not checked in some
cases and as a result stream isn't finalized. Now the ready flag is
explicilty set on write events for proper finalization in all cases.
Previously, flow control didn't account for padding in DATA frames,
which meant that its view of the world could drift from peer's view
by up to 256 bytes per received padded DATA frame, which could lead
to a deadlock.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Previously, its value accounted for payloads of HEADERS, CONTINUATION
and DATA frames, as well as frame headers of HEADERS and DATA frames,
but it didn't account for frame headers of CONTINUATION frames.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Previously, connection write handler was called, resulting in wake up
of the active subrequest. This change makes it possible to read data
in non-active subrequests as well. For example, this allows SSI to
process instructions in non-active subrequests earlier and start
additional subrequests if needed, reducing overall response time.
If the subrequest is already finalized, the handler set with aio_write
may still be used by sendfile in threads when using range requests
(see also e4c1f5b32868, and the original note in 9fd738b85fad). Calling
already finalized subrequest's r->write_event_handler in practice
results in request hang in some cases.
Fix is to trigger connection event handler if the subrequest was already
finalized.
The ngx_linux_sendfile() function is now used for both normal sendfile()
and sendfile in threads. The ngx_linux_sendfile_thread() function was
modified to use the same interface as ngx_linux_sendfile(), and is simply
called from ngx_linux_sendfile() when threads are enabled.
Special return code NGX_DONE is used to indicate that a thread task was
posted and no further actions are needed.
If number of bytes sent is less that what we were sending, we now always
retry sending. This is needed for sendfile() in threads as the number
of bytes we are sending might have been changed since the thread task
was posted. And this is also needed for Linux 4.3+, as sendfile() might
be interrupted at any time and provides no indication if it was interrupted
or not (ticket #1174).
If of.err is 0, it means that there was a memory allocation error
and no further logging and/or processing is needed. The of.failed
string can be only accessed if of.err is not 0.
The directive configures a timeout to be used when gracefully shutting down
worker processes. When the timer expires, nginx will try to close all
the connections currently open to facilitate shutdown.
There is no need to cancel timers early if there are other timers blocking
shutdown anyway. Preserving such timers allows nginx to continue some
periodic work till the shutdown is actually possible.
With the new approach, timers with ev->cancelable are simply ignored when
checking if there are any timers left during shutdown.
The ev->timedout flag is set on first timer expiration, and never reset
after it. Due to this the code to stop the timer when the timer was
canceled never worked (except in a very specific time frame immediately
after start), and the timer was always armed again. This essentially
resulted in a buffer flush at the end of an event loop iteration.
This behaviour actually seems to be better than just stopping the flush
timer for the whole shutdown, so it is preserved as is instead of fixing
the code to actually remove the timer. It will be further improved by
upcoming changes to preserve cancelable timers if there are other timers
blocking shutdown.
Most notably, this fixes possible buffer overflows if number of large
client header buffers in a virtual server is different from the one in
the default server.
Reported by Daniil Bondarev.
Cloned subrequests should inherit r->content_handler. This way they will
be able to use the same location configuration as the original request
if there are "if" directives in the configuration.
Without r->content_handler inherited, the following configuration tries
to access a static file in the update request:
location / {
set $true 1;
if ($true) {
# nothing
}
proxy_pass http://backend;
proxy_cache one;
proxy_cache_use_stale updating;
proxy_cache_background_update on;
}
See http://mailman.nginx.org/pipermail/nginx/2017-February/053019.html for
initial report.
With "proxy_ignore_client_abort off" (the default), upstream module changes
r->read_event_handler to ngx_http_upstream_rd_check_broken_connection().
If the handler is not cleared during upstream finalization, it can be
triggered later, causing unexpected effects, if, for example, a request
was redirected to a different location using error_page or X-Accel-Redirect.
In particular, it makes "proxy_ignore_client_abort on" non-working after
a redirection in a configuration like this:
location = / {
error_page 502 = /error;
proxy_pass http://127.0.0.1:8082;
}
location /error {
proxy_pass http://127.0.0.1:8083;
proxy_ignore_client_abort on;
}
It is also known to cause segmentation faults with aio used, see
http://mailman.nginx.org/pipermail/nginx-ru/2015-August/056570.html.
Fix is to explicitly set r->read_event_handler to ngx_http_block_reading()
during upstream finalization, similar to how it is done in the request body
reading code and in the limit_req module.
This allows to store larger ETag values for proxy_cache_revalidate,
including ones generated as SHA256, and cache responses with longer
Vary (ticket #826).
In particular, this fixes caching of Amazon S3 responses with CORS
enabled, which now use "Vary: Origin, Access-Control-Request-Headers,
Access-Control-Request-Method".
Cache version bumped accordingly.
Previously, slice subrequest location was selected based on request URI.
If request is then redirected to a new location, its context array is cleared,
making the slice module loose current slice range information. This lead to
broken output. Now subrequests with the NGX_HTTP_SUBREQUEST_CLONE flag are
created for slices. Such subrequests stay in the same location as the parent
request and keep the right slice context.
Previously, there was no way to enable the proxy_cache_use_stale behavior by
reading the backend response. Now, stale-while-revalidate and stale-if-error
Cache-Control extensions (RFC 5861) are supported. They specify, how long a
stale response can be used when a cache entry is being updated, or in case of
an error.
The function may leave error in the error queue while returning success,
e.g., when taking a DSO reference to itself as of OpenSSL 1.1.0d:
https://git.openssl.org/?p=openssl.git;a=commit;h=4af9f7f
Notably, this fixes alert seen with statically linked OpenSSL on some platforms.
While here, check OPENSSL_init_ssl() return value.
Previously, buffer size was not changed from the one saved during
initial ngx_ssl_create_connection(), even if the buffer itself was not
yet created. Fix is to change c->ssl->buffer_size in the SNI callback.
Note that it should be also possible to update buffer size even in non-SNI
virtual hosts as long as the buffer is not yet allocated. This looks
like an overcomplication though.
The ngx_event_pipe() function wasn't called on write events with
wev->delayed set. As a result, threaded writing results weren't
properly collected in ngx_event_pipe_write_to_downstream() when a
write event was triggered for a completed write.
Further, this wasn't detected, as p->aio was reset by a thread completion
handler, and results were later collected in ngx_event_pipe_read_upstream()
instead of scheduling a new write of additional data. If this happened
on the last reading from an upstream, last part of the response was never
written to the cache file.
Similar problems might also happen in case of timeouts when writing to
client, as this also results in ngx_event_pipe() not being called on write
events. In this scenario socket leaks were observed.
Fix is to check if p->writing is set in ngx_event_pipe_read_upstream(), and
therefore collect results of previous write operations in case of read events
as well, similar to how we do so in ngx_event_pipe_write_downstream().
This is enough to fix the wev->delayed case. Additionally, we now call
ngx_event_pipe() from ngx_http_upstream_process_request() if there are
uncollected write operations (p->writing and !p->aio). This also fixes
the wev->timedout case.
The ngx_chain_coalesce_file() function may produce more bytes to send then
requested in the limit passed, as it aligns the last file position
to send to memory page boundary. As a result, (limit - send) may become
negative. This resulted in big positive number when converted to size_t
while calling ngx_output_chain_to_iovec().
Another part of the problem is in ngx_chain_coalesce_file(): it changes cl
to the next chain link even if the current buffer is only partially sent
due to limit.
Therefore, if a file buffer was not expected to be fully sent due to limit,
and was followed by a memory buffer, nginx called sendfile() with a part
of the file buffer, and the memory buffer in trailer. If there were enough
room in the socket buffer, this resulted in a part of the file buffer being
skipped, and corresponding part of the memory buffer sent instead.
The bug was introduced in 8e903522c17a (1.7.8). Configurations affected
are ones using limits, that is, limit_rate and/or sendfile_max_chunk, and
memory buffers after file ones (may happen when using subrequests or
with proxying with disk buffering).
Fix is to explicitly check if (send < limit) before constructing trailer
with ngx_output_chain_to_iovec(). Additionally, ngx_chain_coalesce_file()
was modified to preserve unfinished file buffers in cl.
Closing up to 32 connections might be too aggressive if worker_connections
is set to a comparable number (and/or there are only a small number of
reusable connections). If an occasional connection shorage happens in
such a configuration, it leads to closing all reusable connections instead
of gradually reducing keepalive timeout to a smaller value. To improve
granularity in such configurations we now close no more than 1/8 of all
reusable connections at once.
Suggested by Joel Cunningham.
A missing check could cause ngx_stream_ssl_handler() to be applied
to a non-ssl session, which resulted in a null pointer dereference
if ssl_verify_client is enabled.
The bug had appeared in 1.11.8 (41cb1b64561d).
Previously, an unavailable peer was considered recovered after a successful
proxy session to this peer. Until then, only a single client connection per
fail_timeout was allowed to be proxied to the peer.
Since stream sessions can be long, it may take indefinite time for a peer to
recover, limiting the ability of the peer to receive new connections.
Now, a peer is considered recovered after a successful TCP connection is
established to it. Balancers are notified of this event via the notify()
callback.
OpenSSL 1.1.0 now uses normal "nmake; nmake install" instead of using
custom "ms\do_ms.bat" script and "ms\nt.mak" makefile. And Configure
now requires --prefix to be absolute, and no longer derives --openssldir
from prefix (so it's specified explicitly). Generated libraries are now
called "libcrypto.lib" and "libssl.lib" instead of "libeay32.lib"
and "ssleay32.lib". Appropriate tests added to support both old and new
variants.
Additionally, openssl/lhash.h now triggers warning C4090 ('function' :
different 'const' qualifiers), so the warning was disabled.
There are lots of C4244 warnings (conversion from 'type1' to 'type2',
possible loss of data), so they were disabled.
The same applies to C4267 warnings (conversion from 'size_t' to 'type',
possible loss of data), most notably - conversion from ngx_str_t.len to
ngx_variable_value_t.len (which is unsigned:28). Additionally, there
is at least one case when it is not possible to fix the warning properly
without introducing win32-specific code: recv() on win32 uses "int len",
while POSIX defines "size_t len".
The ssize_t type now properly defined for 64-bit compilation with MSVC.
Caught by warning C4305 (truncation from '__int64' to 'ssize_t'), on
"cutoff = NGX_MAX_SIZE_T_VALUE / 10" in ngx_atosz()).
Several C4334 warnings (result of 32-bit shift implicitly converted to 64 bits)
were fixed by adding explicit conversions.
Several C4214 warnings (nonstandard extension used: bit field types other
than int) in ngx_http_script.h fixed by changing bit field types from
uintptr_t to unsigned.
Most notably, warning W8012 (comparing signed and unsigned values) reported
in multiple places where an unsigned value of small type (e.g., u_short) is
promoted to an int and compared to an unsigned value.
Warning W8072 (suspicious pointer arithmetic) disabled, it is reported
when we increment base pointer in ngx_shm_alloc().
These types are available with MSVC (at least since 2003, in stddef.h),
all variants of GCC (in stdint.h) and Watcom C. We need to define them
only for Borland C.
This implies ticket key size of 80 bytes instead of previously used 48,
as both HMAC and AES keys are 32 bytes now. When an old 48-byte ticket key
is provided, we fall back to using backward-compatible AES128 encryption.
OpenSSL switched to using AES256 in 1.1.0, and we are providing equivalent
security. While here, order of HMAC and AES keys was reverted to make
the implementation compatible with keys used by OpenSSL with
SSL_CTX_set_tlsext_ticket_keys().
Prodded by Christian Klinger.
The current version of HTTP/1.1 standard allows relative references in
redirects (https://tools.ietf.org/html/rfc7231#section-7.1.2).
Allow this form for redirects generated by nginx by introducing the new
directive absolute_redirect.
SSL version 3.0 can be specified by the client at the record level for
compatibility reasons. Previously, ssl_preread module rejected such
connections, presuming they don't have SNI. Now SSL 3.0 is allowed at
the record level.
The resolver handles SRV requests in two stages. In the first
stage it gets all SRV RRs, and in the second stage it resolves
the names from SRV RRs into addresses.
Previously, if a response to an SRV request was cached, the
queries to resolve names were not limited by a timeout. If a
response to any of these queries was not received, the SRV
request could never complete.
If a response to an SRV request was not cached, and some of the
queries to resolve names timed out, NGX_RESOLVE_TIMEDOUT was
returned instead of successfully resolved addresses.
To fix both issues, resolving of names is now always limited by
a timeout.
Changeset e7cb5deb951d breaks build on CentOS 5 with "dereferencing
type-punned pointer will break strict-aliasing rules" warning. It is
backed out.
Instead, to keep builds with BoringSSL happy, type of the "value"
variable changed to "char *", and an explicit cast added before calling
ngx_parse_http_time().
A bug was introduced by 82efcedb310b that could lead to timing out of
responses or segmentation fault, when accept_mutex was enabled.
The output queue in HTTP/2 can contain frames from different streams.
When the queue is sent, all related write handlers need to be called.
In order to do so, the streams were added to the h2c->posted queue
after handling sent frames. Then this queue was processed in
ngx_http_v2_write_handler().
If accept_mutex is enabled, the event's "ready" flag is set but its
handler is not called immediately. Instead, the event is added to
the ngx_posted_events queue. At the same time in this queue can be
events from upstream connections. Such events can result in sending
output queue before ngx_http_v2_write_handler() is triggered. And
at the time ngx_http_v2_write_handler() is called, the output queue
can be already empty with some streams added to h2c->posted.
But after 82efcedb310b, these streams weren't processed if all frames
have already been sent and the output queue was empty. This might lead
to a situation when a number of streams were get stuck in h2c->posted
queue for a long time. Eventually these streams might get closed by
the send timeout.
In the worst case this might also lead to a segmentation fault, if
already freed stream was left in the h2c->posted queue. This could
happen if one of the streams was terminated but wasn't closed, due to
the HEADERS frame or a partially sent DATA frame left in the output
queue. If this happened the ngx_http_v2_filter_cleanup() handler
removed the stream from the h2c->waiting or h2c->posted queue on
termination stage, before the frame has been sent, and the stream
was again added to the h2c->posted queue after the frame was sent.
In order to fix all these problems and simplify the code, write
events of fake stream connections are now added to ngx_posted_events
instead of using a custom h2c->posted queue.
By default, "map" creates cacheable variables [1]. With this
parameter it creates a non-cacheable variable.
An original idea was to deduce the cacheability of the "map"
variable by checking the cacheability of variables specified
in source and resulting values, but it turned to be too hard.
For example, a cacheable variable can be overridden with the
"set" directive or with the SSI "set" command. Also, keeping
"map" variables cacheable by default is good for performance
reasons. This required adding a new parameter.
[1] Before db699978a33f (1.11.0), the cacheability of the
"map" variable could vary depending on the cacheability of
variables specified in resulting values (ticket #1090).
This is believed to be a bug rather than a feature.
Removed code that would cause an endless loop, and removed condition
check that is always false. The first page in the slot list is
guaranteed to satisfy an allocation.
On exit environment allocated from a pool is no longer available, leading
to a segmentation fault if, for example, a library tries to use it from
an atexit() handler.
Fix is to allocate environment via ngx_alloc() instead, and explicitly
free it using a pool cleanup handler if it's no longer used (e.g., on
configuration reload).
In Perl 5.8.6 the default was switched to use putenv() when used as
embedded library unless "PL_use_safe_putenv = 0" is explicitly used
in the code. Therefore, for modern versions of Perl it is no longer
necessary to restore previous environment when calling perl_destruct().
For Perl compiled with threads, without PERL_SET_INTERP() the PL_curinterp
remains set to the first interpreter created (that is, one created at
original start). As a result after a reload Perl thinks that operations
are done withing a thread, and, most notably, denies to change environment.
For example, the following code properly works on original start,
but fails after a reload:
perl 'sub {
my $r = shift;
$r->send_http_header("text/plain");
$ENV{TZ} = "UTC";
$r->print("tz: " . $ENV{TZ} . " (localtime " . (localtime()) . ")\n");
$ENV{TZ} = "Europe/Moscow";
$r->print("tz: " . $ENV{TZ} . " (localtime " . (localtime()) . ")\n");
return OK;
}';
To fix this, PERL_SET_INTERP() added anywhere where PERL_SET_CONTEXT()
was previously used.
Note that PERL_SET_INTERP() doesn't seem to be documented anywhere.
Yet it is used in some other software, and also seems to be the only
solution possible.
Atom size is the sum of atom header size and atom data size. The
specification says that the first 4 bytes are set to one when
the atom size is greater than the maximum unsigned 32-bit value.
Which means atom header size should be considered when the
comparison takes place between atom data size and 0xffffffff.
The variable contains a list of curves as supported by the client.
Known curves are listed by their names, unknown ones are shown
in hex, e.g., "0x001d:prime256v1:secp521r1:secp384r1".
Note that OpenSSL uses session data for SSL_get1_curves(), and
it doesn't store full list of curves supported by the client when
serializing a session. As a result $ssl_curves is only available
for new sessions (and will be empty for reused ones).
The variable is only meaningful when using OpenSSL 1.0.2 and above.
With older versions the variable is empty.
The variable contains list of ciphers as supported by the client.
Known ciphers are listed by their names, unknown ones are shown
in hex, e.g., ""AES128-SHA:AES256-SHA:0x00ff".
The variable is fully supported only when using OpenSSL 1.0.2 and above.
With older version there is an attempt to provide some information
using SSL_get_shared_ciphers(). It only lists known ciphers though.
Moreover, as OpenSSL uses session data for SSL_get_shared_ciphers(),
and it doesn't store relevant data when serializing a session. As
a result $ssl_ciphers is only available for new sessions (and not
available for reused ones) when using OpenSSL older than 1.0.2.
Now in case of a verification failure $ssl_client_verify contains
"FAILED:<reason>", similar to Apache's SSL_CLIENT_VERIFY, e.g.,
"FAILED:certificate has expired".
Detailed description of possible errors can be found in the verify(1)
manual page as provided by OpenSSL.
Normally, the epoll module calls the read and write handlers depending
on whether EPOLLIN and EPOLLOUT are reported by epoll_wait(). No error
processing is done in the module, the handlers are expected to get an
error when doing I/O.
If an error event is reported without EPOLLIN and EPOLLOUT, the module
set both EPOLLIN and EPOLLOUT to ensure the error event is handled at
least in one active handler.
This works well unless the error is delivered along with only one of
EPOLLIN or EPOLLOUT, and the corresponding handler does not do any I/O.
For example, it happened when getting EPOLLERR|EPOLLOUT from
epoll_wait() upon receiving "ICMP port unreachable" while proxying UDP.
As the write handler had nothing to send it was not able to detect and
log an error, and did not switch to the next upstream.
The fix is to unconditionally set EPOLLIN and EPOLLOUT in case of an
error event. In the aforementioned case, this causes the read handler
to be called which does recv() and detects an error.
In addition to the epoll module, analogous changes were made in
devpoll/eventport/poll.
Previously, a request body bigger than "client_body_buffer_size" wasn't written
into a temporary file if it has been pre-read entirely. The preread buffer
is freed after processing, thus subsequent use of it might result in sending
corrupted body or cause a segfault.
On Linux, the rename syscall can be slow due to a global file system lock,
acquired for the entire rename operation, unless both old and new files are
in the same directory. To address this temporary files are now created
in the same directory as the expected resulting cache file when using the
"use_temp_path=off" parameter.
This change mostly reverts 99639bfdfa2a and 3281de8142f5, restoring the
behaviour as of a9138c35120d (with minor changes).
Holding a cache node lock doesn't make sense as we can't use caching
anyway, and results in "ignore long locked inactive cache entry" alerts
if a node is locked for a long time.
The same is done for unbuffered connections, as they can be alive for
a long time as well.
It configures a threshold in bytes, above which client range
requests are not cached. In such a case the client's Range
header is passed directly to a proxied server.
As the pointer to the first argument was tested instead of the argument
itself, array of arguments was always created, even if there were no
arguments. Fix is to test args[0] instead of args.
Found by Coverity (CID 1356862).
The only thing that default_port comparison did in the current
code is prevented implicit upstreams to the same address/port
from being aliased for http and https, e.g.:
proxy_pass http://10.0.0.1:12345;
proxy_pass https://10.0.0.1:12345;
This is inconsistent because it doesn't work for a similar case
with uswgi_pass:
uwsgi_pass uwsgi://10.0.0.1:12345;
uwsgi_pass suwsgi://10.0.0.1:12345;
or with an explicit upstream:
upstream u {
server 10.0.0.1:12345;
}
proxy_pass http://u;
proxy_pass https://u;
Before c9059bd5445b, default_port comparison was needed to
differentiate implicit upstreams in
proxy_pass http://example.com;
and
proxy_pass https://example.com;
as u->port was not set.
When an upstream{} block follows a proxy_pass reference to it,
such an upstream inherited port and default_port settings from
proxy_pass. This was different from when they came in another
order (see ticket #1059). Explicit upstreams should not have
port and default_port in any case.
This fixes the following case:
server { location / { proxy_pass http://u; } ... }
upstream u { server 127.0.0.1; }
server { location / { proxy_pass https://u; } ... }
but not the following:
server { location / { proxy_pass http://u; } ... }
server { location / { proxy_pass https://u; } ... }
upstream u { server 127.0.0.1; }
If proxy_pass (and friends) with variables evaluates an upstream
specified with literal address, nginx always created a per-request
upstream.
Now, if there's a matching upstream specified in the configuration
(either implicit or explicit), it will be used instead.
This fixes inconsistency in what is stored in the "host" field.
Normally it would contain the "host" part of the parsed URL
(e.g., proxy_pass with variables), but for the case of an
implicit upstream specified with literal address it contained
the text representation of the socket address (that is, host
including port for IP).
Now the "host" field always contains the "host" part of the URL,
while the text representation of the socket address is stored
in the newly added "name" field.
The ngx_http_upstream_create_round_robin_peer() function was
modified accordingly in a way to be compatible with the code
that does not know about the new "name" field.
The "stream" code was similarly modified except for not adding
compatibility in ngx_stream_upstream_create_round_robin_peer().
This change is also a prerequisite for the next change.
The new directive "http2_max_requests" is introduced. From users point of
view it works quite similar to "keepalive_requests" but has significantly
bigger default value that is more suitable for HTTP/2.
This allows to correctly parse "start" and "end" arguments without
null-termination (ticket #475), and also fixes rounding errors observed
with strtod() when using i387 instructions.
Originally, the variables kept a result of X509_NAME_oneline(),
which is, according to the official documentation, a legacy
function. It produces a non standard output form and has
various quirks and inconsistencies.
The RFC2253 compliant behavior is introduced for these variables.
The original variables are available through $ssl_client_s_dn_legacy
and $ssl_client_i_dn_legacy.
Previously, while shutting down gracefully, the HTTP/2 connections were
closed in transition to idle state after all active streams have been
processed. That might never happen if the client continued opening new
streams.
Now, nginx sends GOAWAY to all HTTP/2 connections and ignores further
attempts to open new streams. A worker process will quit as soon as
processing of already opened streams is finished.
BoringSSL changed SSL_set_tlsext_host_name() to be a real function
with a (const char *) argument, so it now triggers a warning due to
conversion from (u_char *). Added an explicit cast to silence the
warning.
Prodded by Piotr Sikora, Alessandro Ghedini.
This is needed to allow TLS client certificate auth to work. With
ssl_verify_client configured, the auth daemon can choose to allow the
connection to proceed based on the certificate data.
This has been tested with Thunderbird for IMAP only. I've not yet found a
client that will do client certificate auth for POP3 or SMTP, and the method is
not really documented anywhere that I can find. That said, its simple enough
that the way I've done is probably right.
When headers are set at the "http" level and not redefined in
a server block, we now preserve conf->headers into the "http"
section configuration to inherit it to all servers.
The same applies to conf->headers_cache, though it may not be effective
if no servers use cache at the "server" level as conf->headers_cache
is only initialized if cache is enabled on a given level.
Similar changes made in fastcgi/scgi/uwsgi to preserve conf->params
and conf->params_cache.
When headers to hide are set at the "http" level and not redefined in
a server block, we now preserve compiled headers hash into the "http"
section configuration to inherit this hash to all servers.
Dependency on cache settings existed prior to 2728c4e4a9ae (0.8.44)
as Set-Cookie header was automatically hidden from responses when
using cache. This is no longer the case, and hide_headers_hash can
be safely inherited regardless of cache settings.
With this change it is now possible to load modules compiled without
the "--with-http_ssl_module" configure option into nginx binary compiled
with it, and vice versa (if a module doesn't use ssl-specific functions),
assuming both use the "--with-compat" option.
With this change it is now possible to load modules compiled without
the "--with-file-aio" configure option into nginx binary compiled with it,
and vice versa, assuming both use the "--with-compat" option.
With this change it is now possible to load modules compiled without
the "--with-threads" configure option into nginx binary compiled with it,
and vice versa (if a module does not use thread-specific functions),
assuming both use the "--with-compat" option.
It is used at least by SOAP (M-POST method, defined by RFC 2774) and
by WebDAV versioning (VERSION-CONTROL and BASELINE-CONTROL methods,
defined by RFC 3253).
Previously, user access bits were always set to "rw" unconditionally,
even with "user:r" explicitly specified. With this change we only add
default user access bits (0600) if they weren't set explicitly.
Duplicate processing was possible if the address set by realip was
listed in set_realip_from, and there was an internal redirect so module
context was cleared. This resulted in exactly the same address being set,
so this wasn't a problem before the $realip_remote_addr variable was
introduced, though now results in incorrect $realip_remote_addr being
picked.
Fix is to use ngx_http_realip_get_module_ctx() to look up module context
even if it was cleared. Additionally, the order of checks was switched to
check the configuration first as it looks more effective.
The new parameters "manager_files", "manager_sleep"
and "manager_threshold" were added to proxy_cache_path
and friends.
Note that ngx_path_manager_pt was changed to return ngx_msec_t
instead of time_t (API change).
Explicit checks for OPENSSL_VERSION_NUMBER replaced with checks
for X509_CHECK_FLAG_ALWAYS_CHECK_SUBJECT, thus allowing X509_check_host()
to be used with other libraries. In particular, X509_check_host() was
introduced in LibreSSL 2.5.0.
When the last_buf flag is cleared for add_after_body to append more data from a
subrequest, other filters may still have buffered data, which should be flushed
at this point. For example, the sub_filter may have a partial match buffered,
which will only be flushed after the subrequest is done, ending up with
interleaved data in output.
Setting last_in_chain instead of last_buf flushes the data and fixes the order
of output buffers.
The last_buf flag should only be set in the last buffer of the main request.
Otherwise, several last_buf flags can appear in output. This can, for example,
break the chunked filter, which will include several final chunks in output.
The IPV6_V6ONLY macro is now checked only while parsing appropriate flag
and when using the macro.
The ipv6only field in listen structures is always initialized to 1,
even if not supported on a given platform. This is expected to prevent
a module compiled without IPV6_V6ONLY from accidentally creating dual
sockets if loaded into main binary with proper IPV6_V6ONLY support.
It is to be used as a bitmask with various bits set/reset when appropriate.
Any bit set means that the peer should not be used, that is, exactly what
current checks do, no additional changes required.
Previously flags passed by --with-ld-opt were not used when building perl
module, which meant hardening flags provided by package build systems were not
applied.
All the errors that prevent loading configuration must be printed on the "emerg"
log level. Previously, nginx might silently fail to load configuration in some
cases as the default log level is "error".
The ssl_preread module extracts information from the SSL Client Hello message
without terminating SSL. Currently, only $ssl_preread_server_name variable
is supported, which contains server name from the SNI extension.
In this phase, head of a stream is read and analysed before proceeding to the
content phase. Amount of data read is controlled by the module implementing
the phase, but not more than defined by the "preread_buffer_size" directive.
The time spent on processing preread is controlled by the "preread_timeout"
directive.
The typical preread phase module will parse the beginning of a stream and set
variable that may be used by the content phase, for example to make routing
decision.
Previously, it was not possible to use the stream context
inside ngx_stream_init_connection() handlers. Now, limit_conn,
access handlers, as well as those added later, can create
their own contexts.
Previously, it was possible that some system calls could be
invoked while holding the accept mutex. This is clearly
wrong as it prevents incoming connections from being accepted
as quickly as possible.
Keeps the full address of the upstream server. If several servers were
contacted during proxying, their addresses are separated by commas,
e.g. "192.168.1.1:80, 192.168.1.2:80".
The stream session status is one of the following:
200 - normal completion
403 - access forbidden
500 - internal server error
502 - bad gateway
503 - limit conn
This fixes a problem with aio threads and sendfile with aio_write switched
off, as observed with range requests after fc72784b1f52 (1.9.13). Potential
problems with sendfile in threads were previously described in 9fd738b85fad,
and this seems to be one of them.
The problem occurred as file's thread_handler was set to NULL by event pipe
code after a sendfile thread task was scheduled. As a result, no sendfile
completion code was executed, and the same buffer was additionally sent
using non-threaded sendfile. Fix is to avoid modifying file's thread_handler
if aio_write is switched off.
Note that with "aio_write on" it is still possible that sendfile will use
thread_handler as set by event pipe. This is believed to be safe though,
as handlers used are compatible.
When c->recv_chain() returns an error, it is possible that we already
have some data previously read, e.g., in preread buffer. And in some
cases it may be even a complete response. Changed c->recv_chain() error
handling to process the data, much like it is already done if kevent
reports about an error.
This change, in particular, fixes processing of small responses
when an upstream fails to properly close a connection with lingering and
therefore the connection is reset, but the response is already fully
obtained by nginx (see ticket #1037).
Previously, the realip module could be left with uninitialized context after an
error in the ngx_http_realip_set_addr() function. That context could be later
accessed by $realip_remote_addr and $realip_remote_port variable handlers.
This prevents theoretical resource leak, since those threads are never joined.
Found with ThreadSanitizer.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
If the range includes two or more /16 networks and does
not start at the /16 boundary, the last subrange was not
removed (see 91cff7f97a50 for details).
Return 1 in the SSL_CTX_set_tlsext_ticket_key_cb() callback function
to indicate that a new session ticket is created, as per documentation.
Until 1.1.0, OpenSSL didn't make a distinction between non-negative
return values.
See https://git.openssl.org/?p=openssl.git;a=commitdiff;h=5c753de for details.
The IP_BIND_ADDRESS_NO_PORT option is set on upstream sockets
if proxy_bind does not specify a port. The SO_REUSEADDR option
is set on UDP upstream sockets if proxy_bind specifies a port.
Due to checking of the wrong port, IP_BIND_ADDRESS_NO_PORT was
never set, and SO_REUSEPORT was always set.
Unlike $upstream_response_length that only counts the body size,
the new variable also counts the size of response header and data
received after switching protocols when proxying WebSockets.
The change in b91bcba29351 was not enough to fix random() seeding.
On Windows, the srand() seeds the PRNG only in the current thread,
and worse, is not inherited from the calling thread. Due to this,
worker threads were not properly seeded.
Reported by Marc Bevand.
If PCRE is disabled, captures were treated as normal variables in
ngx_http_script_compile(), while code calculating flushes array length in
ngx_http_compile_complex_value() did not account captures as variables.
This could lead to write outside of the array boundary when setting
last element to -1.
Found with AddressSanitizer.
It fixes potential connection leak if some unsent data was left in the SSL
buffer. Particularly, that could happen when a client canceled the stream
after the HEADERS frame has already been created. In this case no other
frames might be produced and the HEADERS frame alone didn't flush the buffer.
Checking for return value of c->send_chain() isn't sufficient since there
are data can be left in the SSL buffer. Now the wew->ready flag is used
instead.
In particular, this fixed a connection leak in cases when all streams were
closed, but there's still some data to be sent in the SSL buffer and the
client forgot about the connection.
Particularly this fixes alerts on OS X and NetBSD systems when HTTP/2 is
configured over plain TCP sockets.
On these systems calling writev() with no data leads to EINVAL errors
being logged as "writev() failed (22: Invalid argument) while processing
HTTP/2 connection".
Previously, if the worker process exited, GOAWAY was sent to connections in
idle state, but connections with active streams were closed without GOAWAY.
This flag appeared in Linux 4.5 and is useful for avoiding thundering herd
problem.
The current Linux kernel implementation walks the list of exclusive waiters,
and queues an event to each epfd, until it finds the first waiter that has
threads blocked on it via epoll_wait().
Now it is believed that the accept mutex brings more harm than benefits.
Especially in various benchmarks it often results in situation where only
one worker grabs all connections.
On non-aligned platforms, properly cast argument before left-shifting it in
ngx_http_v2_parse_uint32 that is used with u_char. Otherwise it propagates
to int to hold the value and can step over the sign bit. Usually, on known
compilers, this results in negation. Furthermore, a subsequent store into a
wider type, that is ngx_uint_t on 64-bit platforms, results in sign-extension.
In practice, this can be observed in debug log as a very large exclusive bit
value, when client sent PRIORITY frame with exclusive bit set:
: *14 http2 PRIORITY frame sid:5 on 1 excl:8589934591 weight:17
Found with UndefinedBehaviorSanitizer.
Previously, when a buffer was processed by the sub filter, its final bytes
could be buffered by the filter even if they don't match any pattern.
This happened because the Boyer-Moore algorithm, employed by the sub filter
since b9447fc457b4 (1.9.4), matches the last characters of patterns prior to
checking other characters. If the last character is out of scope, initial
bytes of a potential match are buffered until the last character is available.
Now, after receiving a flush or recycled buffer, the filter performs
additional checks to reduce the number of buffered bytes. The potential match
is checked against the initial parts of all patterns. Non-matching bytes are
not buffered. This improves processing of a chunked response from upstream
by sending the entire chunks without buffering unless a partial match is found
at the end of a chunk.
This reduces the number of moving parts in ABI compatibility checks.
Additionally, it also allows to use OpenSSL in FIPS mode while still
using md5 for non-security tasks.
The option is only set if the socket is bound to a specific port to allow
several such sockets coexist at the same time. This is required, for example,
when nginx acts as a transparent proxy and receives two datagrams from the same
client in a short time.
The feature is only implemented for Linux.
The following two types of bind addresses are supported in addition to
$remote_addr and address literals:
- $remote_addr:$remote_port
- [$remote_addr]:$remote_port
In both cases client remote address with port is used in upstream socket bind.
This patch moves various OpenSSL-specific function calls into the
OpenSSL module and introduces ngx_ssl_ciphers() to make nginx more
crypto-library-agnostic.
When the stream is terminated the HEADERS frame can still wait in the output
queue. This frame can't be removed and must be sent to the client anyway,
since HTTP/2 uses stateful compression for headers. So in order to postpone
closing and freeing memory of such stream the special close stream handler
is set to the write event. After the HEADERS frame is sent the write event
is called and the stream will be finally closed.
Some events like receiving a RST_STREAM can trigger the read handler of such
stream in closing state and cause unexpected processing that can result in
another attempt to finalize the request. To prevent it the read handler is
now set to ngx_http_empty_handler.
Thanks to Amazon.
There is no reason to add the "Content-Length: 0" header to a proxied request
without body if the header isn't presented in the original request.
Thanks to Amazon.
According to RFC 7540, an endpoint should not send more than one RST_STREAM
frame for any stream.
Also, now all the data frames will be skipped while termination.
The ngx_http_v2_finalize_connection() closes current stream, but that is an
invalid operation while processing unbuffered upload. This results in access
to already freed memory, since the upstream module sets a cleanup handler that
also finalizes the request.
A special last buffer with cl->buf->pos set to NULL can be present in
a chain when writing request body if chunked encoding was used. This
resulted in a NULL pointer dereference if it happened to be the only
buffer left after a do...while loop iteration in ngx_write_chain_to_file().
The problem originally appeared in nginx 1.3.9 with chunked encoding
support. Additionally, rev. 3832b608dc8d (nginx 1.9.13) changed the
minimum number of buffers to trigger this from IOV_MAX (typically 1024)
to NGX_IOVS_PREALLOCATE (typically 64).
Fix is to skip such buffers in ngx_chain_to_iovec(), much like it is
done in other places.
Previously, the stream's window was kept zero in order to prevent a client
from sending the request body before it was requested (see 887cca40ba6a for
details). Until such initial window was acknowledged all requests with
data were rejected (see 0aa07850922f for details).
That approach revealed a number of problems:
1. Some clients (notably MS IE/Edge, Safari, iOS applications) show an error
or even crash if a stream is rejected;
2. This requires at least one RTT for every request with body before the
client receives window update and able to send data.
To overcome these problems the new directive "http2_body_preread_size" is
introduced. It sets the initial window and configures a special per stream
preread buffer that is used to save all incoming data before the body is
requested and processed.
If the directive's value is lower than the default initial window (65535),
as previously, all streams with data will be rejected until the new window
is acknowledged. Otherwise, no special processing is used and all requests
with data are welcome right from the connection start.
The default value is chosen to be 64k, which is bigger than the default
initial window. Setting it to zero is fully complaint to the previous
behavior.
Now, the module extracts optional port which may accompany an
IP address. This custom extension is introduced, among other
things, in order to facilitate logging of original client ports.
Addresses with ports are expected to be in the RFC 3986 format,
that is, with IPv6 addresses in square brackets. E.g.,
"X-Real-IP: [2001:0db8::1]:12345" sets client port ($remote_port)
to 12345.
Previously, when the client address was changed to the one from
the PROXY protocol header, the client port ($remote_port) was
reset to zero. Now the client port is also changed to the one
from the PROXY protocol header.
The 6f8254ae61b8 change inadvertently fixed the duplicate port
detection similar to how it was fixed for mail in b2920b517490.
It also revealed another issue: the socket type (tcp vs. udp)
wasn't taken into account.
Since 4fbef397c753 nginx rejects with the 400 error any attempts of
requesting different host over the same connection, if the relevant
virtual server requires verification of a client certificate.
While requesting hosts other than negotiated isn't something legal
in HTTP/1.x, the HTTP/2 specification explicitly permits such requests
for connection reuse and has introduced a special response code 421.
According to RFC 7540 Section 9.1.2 this code can be sent by a server
that is not configured to produce responses for the combination of
scheme and authority that are included in the request URI. And the
client may retry the request over a different connection.
Now this code is used for requests that aren't authorized in current
connection. After receiving the 421 response a client will be able
to open a new connection, provide the required certificate and retry
the request.
Unfortunately, not all clients currently are able to handle it well.
Notably Chrome just shows an error, while at least the latest version
of Firefox retries the request over a new connection.
Using the same DH parameters on multiple servers is believed to be subject
to precomputation attacks, see http://weakdh.org/. Additionally, 1024 bits
are not enough in the modern world as well. Let users provide their own
DH parameters with the ssl_dhparam directive if they want to use EDH ciphers.
Note that SSL_CTX_set_dh_auto() as provided by OpenSSL 1.1.0 uses fixed
DH parameters from RFC 5114 and RFC 3526, and therefore subject to the same
precomputation attacks. We avoid using it as well.
This change also fixes compilation with OpenSSL 1.1.0-pre5 (aka Beta 2),
as OpenSSL developers changed their policy after releasing Beta 1 and
broke API once again by making the DH struct opaque (see ticket #860).
OpenSSL 1.0.2+ allows configuring a curve list instead of a single curve
previously supported. This allows use of different curves depending on
what client supports (as available via the elliptic_curves extension),
and also allows use of different curves in an ECDHE key exchange and
in the ECDSA certificate.
The special value "auto" was introduced (now the default for ssl_ecdh_curve),
which means "use an internal list of curves as available in the OpenSSL
library used". For versions prior to OpenSSL 1.0.2 it maps to "prime256v1"
as previously used. The default in 1.0.2b+ prefers prime256v1 as well
(and X25519 in OpenSSL 1.1.0+).
As client vs. server preference of curves is controlled by the
same option as used for ciphers (SSL_OP_CIPHER_SERVER_PREFERENCE),
the ssl_prefer_server_ciphers directive now controls both.
The SSL_CTX_add0_chain_cert() function as introduced in OpenSSL 1.0.2 now
used instead of SSL_CTX_add_extra_chain_cert().
SSL_CTX_add_extra_chain_cert() adds extra certs for all certificates
in the context, while SSL_CTX_add0_chain_cert() only to a particular
certificate. There is no difference unless multiple certificates are used,
though it is important when using multiple certificates.
Additionally, SSL_CTX_select_current_cert() is now called before using
a chain to make sure correct chain will be returned.
A pointer to a previously configured certificate now stored in a certificate.
This makes it possible to iterate though all certificates configured in
the SSL context. This is now used to configure OCSP stapling for all
certificates, and in ngx_ssl_session_id_context().
As SSL_CTX_use_certificate() frees previously loaded certificate of the same
type, and we have no way to find out if it's the case, X509_free() calls
are now posponed till ngx_ssl_cleanup_ctx().
Note that in OpenSSL 1.0.2+ this can be done without storing things in exdata
using the SSL_CTX_set_current_cert() and SSL_CTX_get0_certificate() functions.
These are not yet available in all supported versions though, so it's easier
to continue to use exdata for now.
This makes it possible to properly return OCSP staple with multiple
certificates configured.
Note that it only works properly in OpenSSL 1.0.1d+, 1.0.0k, 0.9.8y+.
In older versions SSL_get_certificate() fails to return correct certificate
when the certificate status callback is called.
Both minor and major versions are now limited to 999 maximum. In case of
r->http_minor, this limit is already implied by the code. Major version,
r->http_major, in theory can be up to 65535 with current code, but such
values are very unlikely to become real (and, additionally, such values
are not allowed by RFC 7230), so the same test was used for r->http_major.
When it's known that the kernel supports EPOLLRDHUP, there is no need in
additional recv() call to get EOF or error when the flag is absent in the
event generated by the kernel. A special runtime test is done at startup
to detect if EPOLLRDHUP is actually supported by the kernel because
epoll_ctl() silently ignores unknown flags.
With this knowledge it's now possible to drop the "ready" flag for partial
read. Previously, the "ready" flag was kept until the recv() returned EOF
or error. In particular, this change allows the lingering close heuristics
(which relies on the "ready" flag state) to actually work on Linux, and not
wait for more data in most cases.
The "available" flag is now used in the read event with the semantics similar
to the corresponding counter in kqueue.
This parameter lets binding the proxy connection to a non-local address.
Upstream will see the connection as coming from that address.
When used with $remote_addr, upstream will accept the connection from real
client address.
Example:
proxy_bind $remote_addr transparent;
The WINDOW_UPDATE frame could be left in the output queue for an indefinite
period of time resulting in the request timeout.
This might happen if reading of the body was triggered by an event unrelated
to client connection, e.g. by the limit_req timer.
Particularly this prevents sending WINDOW_UPDATE with zero delta
which can result in PROTOCOL_ERROR.
Also removed surplus setting of no_flow_control to 0.
The ngx_thread_pool_done object isn't volatile, and at least some
compilers assume that it is permitted to reorder modifications of
volatile and non-volatile objects. Added appropriate ngx_memory_barrier()
calls to make sure all modifications will happen before the lock is released.
Reported by Mindaugas Rasiukevicius,
http://mailman.nginx.org/pipermail/nginx-devel/2016-April/008160.html.
Refusing streams is known to be incorrectly handled at least by IE, Edge
and Safari. Make sure to provide appropriate logging to simplify fixing
this in the affected browsers.
After the 92464ebace8e change, it has been discovered that not all
clients follow the RFC and handle RST_STREAM with NO_ERROR properly.
Notably, Chrome currently interprets it as INTERNAL_ERROR and discards
the response.
As a workaround, instead of RST_STREAM the maximum stream window update
will be sent, which will let client to send up to 2 GB of a request body
data before getting stuck on flow control. All the received data will
be silently discarded.
See for details:
http://mailman.nginx.org/pipermail/nginx-devel/2016-April/008143.htmlhttps://bugs.chromium.org/p/chromium/issues/detail?id=603182
A client is allowed to send requests before receiving and acknowledging
the SETTINGS frame. Such a client having a wrong idea about the stream's
could send the request body that nginx isn't ready to process.
The previous behavior was to send RST_STREAM with FLOW_CONTROL_ERROR in
such case, but it didn't allow retrying requests that have been rejected.
This prevents forming empty records out of such buffers. Particularly it fixes
double end-of-stream records with chunked transfer encoding, or when HTTP/2 is
used and the END_STREAM flag has been sent without data. In both cases there
is an empty buffer at the end of the request body chain with the "last_buf"
flag set.
The canonical libfcgi, as well as php implementation, tolerates such records,
while the HHVM parser is more strict and drops the connection (ticket #950).
There are two improvements:
1. Support for request body filters;
2. Receiving of request body is started only after
the ngx_http_read_client_request_body() call.
The last one fixes the problem when the client_max_body_size value might not be
respected from the right location if the location was changed either during the
process of receiving body or after the whole body had been received.
RFC 7540 states that "A server can send a complete response prior to the client
sending an entire request if the response does not depend on any portion of the
request that has not been sent and received. When this is true, a server MAY
request that the client abort transmission of a request without error by sending
a RST_STREAM with an error code of NO_ERROR after sending a complete response
(i.e., a frame with the END_STREAM flag)."
This should prevent a client from blocking on the stream window, since it isn't
maintained for closed streams. Currently, quite big initial stream windows are
used, so such blocking is very unlikly, but that will be changed in the further
patches.
SSLeay_version() and SSLeay() are no longer available if OPENSSL_API_COMPAT
is set to 0x10100000L. Switched to using OpenSSL_version() instead.
Additionally, we now compare version strings instead of version numbers,
and this correctly works for LibreSSL as well.
OPENSSL_config() deprecated in OpenSSL 1.1.0. Additionally,
SSL_library_init(), SSL_load_error_strings() and OpenSSL_add_all_algorithms()
are no longer available if OPENSSL_API_COMPAT is set to 0x10100000L.
The OPENSSL_init_ssl() function is now used instead with appropriate
arguments to trigger the same behaviour. The configure test changed to
use SSL_CTX_set_options().
Deinitialization now happens automatically in OPENSSL_cleanup() called
via atexit(3), so we no longer call EVP_cleanup() and ENGINE_cleanup()
directly.
LibreSSL defines OPENSSL_VERSION_NUMBER to 0x20000000L, but uses an old
API derived from OpenSSL at the time LibreSSL forked. As a result, every
version check we use to test for new API elements in newer OpenSSL versions
requires an explicit check for LibreSSL.
To reduce clutter, redefine OPENSSL_VERSION_NUMBER to 0x1000107fL if
LibreSSL is used. The same is done by FreeBSD port of LibreSSL.
Correct error code for NGX_EXDEV on Windows is ERROR_NOT_SAME_DEVICE,
"The system cannot move the file to a different disk drive".
Previously used ERROR_WRONG_DISK is about wrong diskette in the drive and
is not appropriate.
There is no real difference though, as MoveFile() is able to copy files
between disk drives, and will fail with ERROR_ACCESS_DENIED when asked
to copy directories. The ERROR_NOT_SAME_DEVICE error is only used
by MoveFileEx() when called without the MOVEFILE_COPY_ALLOWED flag.
On Windows there are two possible error codes which correspond to
the EEXIST error code: ERROR_FILE_EXISTS used by CreateFile(CREATE_NEW),
and ERROR_ALREADY_EXISTS used by CreateDirectory().
MoveFile() seems to use both: ERROR_ALREADY_EXISTS when moving within
one filesystem, and ERROR_FILE_EXISTS when copying a file to a different
drive.
By default, requests with non-idempotent methods (POST, LOCK, PATCH)
are no longer retried in case of errors if a request was already sent
to a backend. Previous behaviour can be restored by using
"proxy_next_upstream ... non_idempotent".
Much like normal connections, cached connections are now tested against
u->conf->next_upstream, and u->state->status is now always set.
This allows to disable additional tries even with upstream keepalive
by using "proxy_next_upstream off".
Fixes various aspects of --test-build-devpoll, --test-build-eventport, and
--test-build-epoll.
In particular, if --test-build-devpoll was used on Linux, then "devpoll"
event method would be preferred over "epoll". Also, wrong definitions of
event macros were chosen.
This fixes buffer over-read while using variables in the "proxy_pass",
"fastcgi_pass", "scgi_pass", and "uwsgi_pass" directives, where result
of string evaluation isn't null-terminated.
Found with MemorySanitizer.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
On nginx reload or binary upgrade, an attempt is made to inherit listen sockets
from the previous configuration. Previously, no check for socket type was made
and the inherited socket could have the wrong type. On binary upgrade, socket
type was not detected at all. Wrong socket type could lead to errors on that
socket due to different logic and unsupported syscalls. For example, a UDP
socket, inherited as TCP, lead to the following error after arrival of a
datagram: "accept() failed (102: Operation not supported on socket)".
It allows to turn off accumulation of small pool allocations into a big
preallocated chunk of memory. This is useful for debugging memory access
with sanitizer, since such accumulation can cover buffer overruns from
being detected.
This structure cannot be allocated as a large block anyway, otherwise that will
result in infinite recursion, since each large allocation requires to allocate
another ngx_pool_large_t.
The room for the structure is guaranteed by the NGX_MIN_POOL_SIZE constant.
When a keys_zone is full then each next request to the cache is
penalized. That is, the cache has to evict older files to get a
slot from the keys_zone synchronously. The patch introduces new
behavior in this scenario. Manager will try to maintain available
free slots in the keys_zone by cleaning old files in the background.
The "aio_write" directive is introduced, which enables use of aio
for writing. Currently it is meaningful only with "aio threads".
Note that aio operations can be done by both event pipe and output
chain, so proper mapping between r->aio and p->aio is provided when
calling ngx_event_pipe() and in output filter.
In collaboration with Valentin Bartenev.
The ngx_thread_write_chain_to_file() function introduced, which
uses ngx_file_t thread_handler, thread_ctx and thread_task fields.
The task context structure (ngx_thread_file_ctx_t) is the same for
both reading and writing, and can be safely shared as long as
operations are serialized.
The task->handler field is now always set (and not only when task is
allocated), as the same task can be used with different handlers.
The thread_write flag is introduced in the ngx_temp_file_t structure
to explicitly enable use of ngx_thread_write_chain_to_file() in
ngx_write_chain_to_temp_file() when supported by caller.
In collaboration with Valentin Bartenev.
This simplifies the interface of the ngx_thread_read() function.
Additionally, most of the thread operations now explicitly set
file->thread_task, file->thread_handler and file->thread_ctx,
to facilitate use of thread operations in other places.
(Potential problems remain with sendfile in threads though - it uses
file->thread_handler as set in ngx_output_chain(), and it should not
be overwritten to an incompatible one.)
In collaboration with Valentin Bartenev.
If a write event happens after sendfile() but before we've got the
sendfile results in the main thread, this write event will be ignored.
And if no more events will happen, the connection will hang.
Removing the events works in the simple cases, but not always, as
in some cases events are added back by an unrelated code. E.g.,
the upstream module adds write event in the ngx_http_upstream_init()
to track client aborts.
Fix is to use wev->complete instead. It is now set to 0 before
a sendfile() task is posted, and it is set to 1 once a write event
happens. If on completion of the sendfile() task wev->complete is 1,
we know that an event happened while we were executing sendfile(), and
the socket is still ready for writing even if sendfile() did not sent
all the data or returned EAGAIN.
While sendfilev() is documented to return -1 with EINVAL set
if the file was truncated, at least Solaris 11 silently returns 0,
and this results in CPU hog. Added a test to complain appropriately
if 0 is returned.
The main proxy function ngx_stream_proxy_process() can terminate the stream
session. The code, following it, should check its return code to make sure the
session still exists. This happens in client and upstream initialization
functions. Swapping ngx_stream_proxy_process() call with the code, that
follows it, leaves the same problem vice versa.
In future ngx_stream_proxy_process() will call ngx_stream_proxy_next_upstream()
making it too complicated to know if stream session still exists after this
call.
Now ngx_stream_proxy_process() is called from posted event handlers in both
places with no code following it. The posted event is automatically removed
once session is terminated.
It can now be set to "off" conditionally, e.g. using the map
directive.
An empty value will disable the emission of the Server: header
and the signature in error messages generated by nginx.
Any other value is treated as "on", meaning that full nginx
version is emitted in the Server: header and error messages
generated by nginx.
If proxy_cache is enabled, and proxy_no_cache tests true, it was previously
possible for the client connection to be closed after a 304. The fix is to
recheck r->header_only after the final cacheability is determined, and end the
request if no longer cacheable.
Example configuration:
proxy_cache foo;
proxy_cache_bypass 1;
proxy_no_cache 1;
If a client sends If-None-Match, and the upstream server returns 200 with a
matching ETag, no body should be returned to the client. At the start of
ngx_http_upstream_send_response proxy_no_cache is not yet tested, thus cacheable
is still 1 and downstream_error is set.
However, by the time the downstream_error check is done in process_request,
proxy_no_cache has been tested and cacheable is set to 0. The client connection
is then closed, regardless of keepalive.
If caching was used, "zero size buf in output" alerts might appear
in logs if a client prematurely closed connection. Alerts appeared
in the following situation:
- writing to client returned an error, so event pipe
drained all busy buffers leaving body output filters
in an invalid state;
- when upstream response was fully received,
ngx_http_upstream_finalize_request() tried to flush
all pending data.
Fix is to avoid flushing body if p->downstream_error is set.
Sendfile handlers (aio preload and thread handler) are called within
ctx->output_filter() in ngx_output_chain(), and hence ctx->aio cannot
be set directly in ngx_output_chain(). Meanwhile, it must be set to
make sure loop within ngx_output_chain() will be properly terminated.
There are no known cases that trigger the problem, though in theory
something like aio + sub filter (something that needs body in memory,
and can also free some memory buffers) + sendfile can result in
"task already active" and "second aio post" alerts.
The fix is to set ctx->aio in ngx_http_copy_aio_sendfile_preload()
and ngx_http_copy_thread_handler().
For consistency, ctx->aio is no longer set explicitly in
ngx_output_chain_copy_buf(), as it's now done in
ngx_http_copy_thread_handler().
If sendfile in threads is used, it is possible that multiple
subrequests will trigger multiple ngx_linux_sendfile_thread() calls,
as operations are only serialized in output chain based on r->aio,
that is, on subrequest level.
This resulted in "task #N already active" alerts, in particular, when
running proxy_store.t with "aio threads; sendfile on;".
Fix is to tolerate duplicate calls, with an additional safety check
that the file is the same as previously used.
The same problem also affects "aio on; sendfile on;" on FreeBSD
(previously known as "aio sendfile;"), where aio->preload_handler()
could be called multiple times due to similar reasons, resulting in
"second aio post" alerts. Fix is the same as well.
It is also believed that similar problems can arise if a filter
calls the next body filter multiple times for some reason. These are
mostly theoretical though.
Previously, there were only three timeouts used globally for the whole HTTP/2
connection:
1. Idle timeout for inactivity when there are no streams in processing
(the "http2_idle_timeout" directive);
2. Receive timeout for incomplete frames when there are no streams in
processing (the "http2_recv_timeout" directive);
3. Send timeout when there are frames waiting in the output queue
(the "send_timeout" directive on a server level).
Reaching one of these timeouts leads to HTTP/2 connection close.
This left a number of scenarios when a connection can get stuck without any
processing and timeouts:
1. A client has sent the headers block partially so nginx starts processing
a new stream but cannot continue without the rest of HEADERS and/or
CONTINUATION frames;
2. When nginx waits for the request body;
3. All streams are stuck on exhausted connection or stream windows.
The first idea that was rejected was to detect when the whole connection
gets stuck because of these situations and set the global receive timeout.
The disadvantage of such approach would be inconsistent behaviour in some
typical use cases. For example, if a user never replies to the browser's
question about where to save the downloaded file, the stream will be
eventually closed by a timeout. On the other hand, this will not happen
if there's some activity in other concurrent streams.
Now almost all the request timeouts work like in HTTP/1.x connections, so
the "client_header_timeout", "client_body_timeout", and "send_timeout" are
respected. These timeouts close the request.
The global timeouts work as before.
Previously, the c->write->delayed flag was abused to avoid setting timeouts on
stream events. Now, the "active" and "ready" flags are manipulated instead to
control the processing of individual streams.
This is required for implementing per request timeouts.
Previously, the temporary pool was used only during skipping of
headers and the request pool was used otherwise. That required
switching of pools if the request was closed while parsing.
It wasn't a problem since the request could be closed only after
the validation of the fully parsed header. With the per request
timeouts, the request can be closed at any moment, and switching
of pools in the middle of parsing header name or value becomes a
problem.
To overcome this, the temporary pool is now always created and
used. Special checks are added to keep it when either the stream
is being processed or until header block is fully parsed.
Since 667aaf61a778 (1.1.17) the ngx_http_parse_header_line() function can return
NGX_HTTP_PARSE_INVALID_HEADER when a header contains NUL character. In this
case the r->header_end pointer isn't properly initialized, but the log message
in ngx_http_process_request_headers() hasn't been adjusted. It used the pointer
in size calculation, which might result in up to 2k buffer over-read.
Found with afl-fuzz.
This fixes "called a function you should not call" and
"shutdown while in init" errors as observed with OpenSSL 1.0.2f
due to changes in how OpenSSL handles SSL_shutdown() during
SSL handshakes.
When the "pending" value is zero, the "buf" will be right shifted
by the width of its type, which results in undefined behavior.
Found by Coverity (CID 1352150).
Changes to NGX_MODULE_V1 and ngx_module_t in 85dea406e18f (1.9.11)
broke all modules written in C++, because ISO C++11 does not allow
conversion from string literal to char *.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
The auto/module script is extended to understand ngx_module_link=DYNAMIC.
When set, it links the module as a shared object rather than statically
into nginx binary. The module can later be loaded using the "load_module"
directive.
New auto/module parameter ngx_module_order allows to define module loading
order in complex cases. By default the order is set based on ngx_module_type.
3rd party modules can be compiled dynamically using the --add-dynamic-module
configure option, which will preset ngx_module_link to "DYNAMIC" before
calling the module config script.
Win32 support is rudimentary, and only works when using MinGW gcc (which
is able to handle exports/imports automatically).
In collaboration with Ruslan Ermilov.
Due to greater priority of the unary plus operator over the ternary operator
the expression didn't work as expected. That might result in one byte less
allocation than needed for the HEADERS frame buffer.
The previous code only parsed the first answer, without checking its
type, and required a compressed RR name.
The new code checks the RR type, supports responses with multiple
answers, and doesn't require the RR name to be compressed.
This has a side effect in limited support of CNAME. If a response
includes both CNAME and PTR RRs, like when recursion is enabled on
the server, PTR RR is handled.
Full CNAME support in PTR response is not implemented in this change.
Previously, a global server balancer was used to assign the next DNS server to
send a query to. That could lead to a non-uniform distribution of servers per
request. A request could be assigned to the same dead server several times in a
row and wait longer for a valid server or even time out without being processed.
Now each query is sent to all servers sequentially in a circle until a
response is received or timeout expires. Initial server for each request is
still globally balanced.
When several requests were waiting for a response, then after getting
a CNAME response only the last request's context had the name updated.
Contexts of other requests had the wrong name. This name was used by
ngx_resolve_name_done() to find the node to remove the request context
from. When the name was wrong, the request could not be properly
cancelled, its context was freed but stayed linked to the node's waiting
list. This happened e.g. when the first request was aborted or timed
out before the resolving completed. When it completed, this triggered
a use-after-free memory access by calling ctx->handler of already freed
request context. The bug manifests itself by
"could not cancel <name> resolving" alerts in error_log.
When a request was responded with a CNAME, the request context kept
the pointer to the original node's rn->u.cname. If the original node
expired before the resolving timed out or completed with an error,
this would trigger a use-after-free memory access via ctx->name in
ctx->handler().
The fix is to keep ctx->name unmodified. The name from context
is no longer used by ngx_resolve_name_done(). Instead, we now keep
the pointer to resolver node to which this request is linked.
Keeping the original name intact also improves logging.
When several requests were waiting for a response, then after getting
a CNAME response only the last request was properly processed, while
others were left waiting.
If one or more requests were waiting for a response, then after
getting a CNAME response, the timeout event on the first request
remained active, pointing to the wrong node with an empty
rn->waiting list, and that could cause either null pointer
dereference or use-after-free memory access if this timeout
expired.
If several requests were waiting for a response, and the first
request terminated (e.g., due to client closing a connection),
other requests were left without a timeout and could potentially
wait indefinitely.
This is fixed by introducing per-request independent timeouts.
This change also reverts 954867a2f0a6 and 5004210e8c78.
If enabled, workers are bound to available CPUs, each worker to once CPU
in order. If there are more workers than available CPUs, remaining are
bound in a loop, starting again from the first available CPU.
The optional mask parameter defines which CPUs are available for automatic
binding.
In collaboration with Vladimir Homutov.
With main request buffered, it's possible, that a slice subrequest will send
output before it. For example, while main request is waiting for aio read to
complete, a slice subrequest can start an aio operation as well. The order
in which aio callbacks are called is undetermined.
Skip SSL_CTX_set_tlsext_servername_callback in case of renegotiation.
Do nothing in SNI callback as in this case it will be supplied with
request in c->data which isn't expected and doesn't work this way.
This was broken by b40af2fd1c16 (1.9.6) with OpenSSL master branch and LibreSSL.
Splits a request into subrequests, each providing a specific range of response.
The variable "$slice_range" must be used to set subrequest range and proper
cache key. The directive "slice" sets slice size.
The following example splits requests into 1-megabyte cacheable subrequests.
server {
listen 8000;
location / {
slice 1m;
proxy_cache cache;
proxy_cache_key $uri$is_args$args$slice_range;
proxy_set_header Range $slice_range;
proxy_cache_valid 200 206 1h;
proxy_pass http://127.0.0.1:9000;
}
}
If an upstream with variables evaluated to address without a port,
then instead of a "no port in upstream" error an attempt was made
to connect() which failed with EADDRNOTAVAIL.
This fixes suboptimal behavior caused by surplus lseek() for sequential writes
on systems without pwrite(). A consecutive read after write might result in an
error on systems without pread() and pwrite().
Fortunately, at the moment there are no widely used systems without these
syscalls.
The HEADERS frame is always represented by more than one buffer since
b930e598a199, but the handling code hasn't been adjusted.
Only the first buffer of HEADERS frame was checked and if it had been
sent while others had not, the rest of the frame was dropped, resulting
in broken connection.
Before b930e598a199, the problem could only be seen in case of HEADERS
frame with CONTINUATION.
The r->invalid_header flag wasn't reset once an invalid header appeared in a
request, resulting in all subsequent headers in the request were also marked
as invalid.
The directive toggles conversion of HEAD to GET for cacheable proxy requests.
When disabled, $request_method must be added to cache key for consistency.
By default, HEAD is converted to GET as before.
OpenSSL doesn't check if the negotiated protocol has been announced.
As a result, the client might force using HTTP/2 even if it wasn't
enabled in configuration.
It caused inconsistency between setting "in_closed" flag and the moment when
the last DATA frame was actually read. As a result, the body buffer might not
be initialized properly in ngx_http_v2_init_request_body(), which led to a
segmentation fault in ngx_http_v2_state_read_data(). Also it might cause
start processing of incomplete body.
This issue could be triggered when the processing of a request was delayed,
e.g. in the limit_req or auth_request modules.
The code failed to ensure that "s" is within the buffer passed for
parsing when checking for "ms", and this resulted in unexpected errors when
parsing non-null-terminated strings with trailing "m". The bug manifested
itself when the expires directive was used with variables.
Found by Roman Arutyunyan.
Now it limits only the maximum length of literal string (either raw or
compressed) in HPACK request header fields. It's easier to understand
and to describe in the documentation.
Previous code has been based on assumption that the header block can only be
splitted at the borders of individual headers. That wasn't the case and might
result in emitting frames bigger than the frame size limit.
The current approach is to split header blocks by the frame size limit.
Previously, nginx worker would crash because of a double free
if client disconnected or timed out before sending all headers.
Found with afl-fuzz.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Previously, streams that were indirectly reprioritized (either because of
a new exclusive dependency on their parent or because of removal of their
parent from the dependency tree), didn't have their pointer to the parent
node updated.
This broke detection of circular dependencies and, as a result, nginx
worker would crash due to stack overflow whenever such dependency was
introduced.
Found with afl-fuzz.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Per RFC7540, a stream cannot depend on itself.
Previously, this requirement was enforced on PRIORITY frames, but not on
HEADERS frames and due to the implementation details nginx worker would
crash (stack overflow) while opening self-dependent stream.
Found with afl-fuzz.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
As setitimer() isn't available on Windows, time wasn't updated at all
if timer_resolution was used with the select event method. Fix is
to ignore timer_resolution in such cases.
This context is needed for shared sessions cache to work in configurations
with multiple virtual servers sharing the same port. Unfortunately, OpenSSL
does not provide an API to access the session context, thus storing it
separately.
In collaboration with Vladimir Homutov.
If no space left in buffer after adding formatting symbols, error message
could be left without terminating null. The fix is to output message using
actual length.
The code for displaying version info and configuration info seemed to be
cluttering up the main function. I was finding it hard to read main. This
extracts out all of the logic for displaying version and configuration info
into its own function, thus making main easier to read.
RAND_pseudo_bytes() is deprecated in the OpenSSL master branch, so the only
use was changed to RAND_bytes(). Access to internal structures is no longer
possible, so now we don't try to set SSL3_FLAGS_NO_RENEGOTIATE_CIPHERS even
if it's defined.
Since an output buffer can only be used for either reading or sending, small
amounts of data left from the previous operation (due to some limits) must be
sent before nginx will be able to read further into the buffer. Using only
one output buffer can result in suboptimal behavior that manifests itself in
forming and sending too small chunks of data. This is particularly painful
with SPDY (or HTTP/2) where each such chunk needs to be prefixed with some
header.
The default flow-control window in HTTP/2 is 64k minus one bytes. With one
32k output buffer this results is one byte left after exhausting the window.
With two 32k buffers the data will be read into the second free buffer before
sending, thus the minimum output is increased to 32k + 1 bytes which is much
better.
A configuration like
server { server_name .foo^@; }
server { server_name .foo; }
resulted in a segmentation fault during construction of server names hash.
Reported by Markus Linnala.
Found with afl-fuzz.
A configuration with a named location inside a zero-length prefix
or regex location used to trigger a segmentation fault, as
ngx_http_core_location() failed to properly detect if a nested location
was created. Example configuration to reproduce the problem:
location "" {
location @foo {}
}
Fix is to not rely on a parent location name length, but rather check
command type we are currently parsing.
Identical fix is also applied to ngx_http_rewrite_if(), which used to
incorrectly assume the "if" directive is on server{} level in such
locations.
Reported by Markus Linnala.
Found with afl-fuzz.
This prevents a potential attack that discloses cached data if an attacker
will be able to craft a hash collision between some cache key the attacker
is allowed to access and another cache key with protected data.
See http://mailman.nginx.org/pipermail/nginx-devel/2015-September/007288.html.
Thanks to Gena Makhomed and Sergey Brester.
The value of NGX_ERROR, returned from filter handlers, was treated as a generic
upstream error and changed to NGX_HTTP_INTERNAL_SERVER_ERROR before calling
ngx_http_finalize_request(). This resulted in "header already sent" alert
if header was already sent in filter handlers.
The problem appeared in 54e9b83d00f0 (1.7.5).
This overflow has become possible after the change in 06e850859a26,
since concurrent subrequests are not limited now and each of them is
counted in r->main->count.
Resolved warnings about declarations that hide previous local declarations.
Warnings about WSASocketA() being deprecated resolved by explicit use of
WSASocketW() instead of WSASocket(). When compiling without IPv6 support,
WinSock deprecated warnings are disabled to allow use of gethostbyname().
The following configuration with alias, nested location and try_files
resulted in wrong file being used. Request "/foo/test.gif" tried to
use "/tmp//foo/test.gif" instead of "/tmp/test.gif":
location /foo/ {
alias /tmp/;
location ~ gif {
try_files $uri =405;
}
}
Additionally, rev. c985d90a8d1f introduced a regression if
the "/tmp//foo/test.gif" file was found (ticket #768). Resulting URI
was set to "gif?/foo/test.gif", as the code used clcf->name of current
location ("location ~ gif") instead of parent one ("location /foo/").
Fix is to use r->uri instead of clcf->name in all cases in the
ngx_http_core_try_files_phase() function. It is expected to be
already matched and identical to the clcf->name of the right
location.
If alias was used in a location given by a regular expression,
nginx used to do wrong thing in try_files if a location name (i.e.,
regular expression) was an exact prefix of URI. The following
configuration triggered a segmentation fault on a request to "/mail":
location ~ /mail {
alias /path/to/directory;
try_files $uri =404;
}
Reported by Per Hansson.
Iterating through all connections takes a lot of CPU time, especially
with large number of worker connections configured. As a result
nginx processes used to consume CPU time during graceful shutdown.
To mitigate this we now only do a full scan for idle connections when
shutdown signal is received.
Transitions of connections to idle ones are now expected to be
avoided if the ngx_exiting flag is set. The upstream keepalive module
was modified to follow this.
If nginx was used under OpenVZ and a container with nginx was suspended
and resumed, configuration tests started to fail because of EADDRINUSE
returned from listen() instead of bind():
# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: [emerg] listen() to 0.0.0.0:80, backlog 511 failed (98: Address already in use)
nginx: configuration file /etc/nginx/nginx.conf test failed
With this change EADDRINUSE errors returned by listen() are handled
similarly to errors returned by bind(), and configuration tests work
fine in the same environment:
# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
More details about OpenVZ suspend/resume bug:
https://bugzilla.openvz.org/show_bug.cgi?id=2470
OCSP responses may contain no nextUpdate. As per RFC 6960, this means
that nextUpdate checks should be bypassed. Handle this gracefully by
using NGX_MAX_TIME_T_VALUE as "valid" in such a case.
The problem was introduced by 6893a1007a7c (1.9.2).
Reported by Matthew Baldwin.
Broken by 6893a1007a7c (1.9.2) during introduction of strict OCSP response
validity checks. As stapling file is expected to be returned unconditionally,
fix is to set its validity to the maximum supported time.
Reported by Faidon Liambotis.
Once upstream is connected, the upstream buffer is allocated. Previously, the
proxy module used the buffer allocation status to check if upstream is
connected. Now it's enough to check the flag.
If the -T option is passed, additionally to configuration test, configuration
files are output to stdout.
In the debug mode, configuration files are kept in memory and can be accessed
using a debugger.
The function is now called ngx_parse_http_time(), and can be used by
any code to parse HTTP-style date and time. In particular, it will be
used for OCSP stapling.
For compatibility, a macro to map ngx_http_parse_time() to the new name
provided for a while.
With this change it's no longer needed to pass -D_GNU_SOURCE manually,
and -D_FILE_OFFSET_BITS=64 is set to use 64-bit off_t.
Note that nginx currently fails to work properly with master process
enabled on GNU Hurd, as fcntl(F_SETOWN) returns EOPNOTSUPP for sockets
as of GNU Hurd 0.6. Additionally, our strerror() preloading doesn't
work well with GNU Hurd, as it uses large numbers for most errors.
When configured, an individual listen socket on a given address is
created for each worker process. This allows to reduce in-kernel lock
contention on configurations with high accept rates, resulting in better
performance. As of now it works on Linux and DragonFly BSD.
Note that on Linux incoming connection requests are currently tied up
to a specific listen socket, and if some sockets are closed, connection
requests will be reset, see https://lwn.net/Articles/542629/. With
nginx, this may happen if the number of worker processes is reduced.
There is no such problem on DragonFly BSD.
Based on previous work by Sepherosa Ziehau and Yingqi Lu.
There is no need to set "i" to 0, as it's expected to be 0 assuming
the bindings are properly sorted, and we already rely on this when
explicitly set hport->naddrs to 1. Remaining conditional code is
replaced with identical "hport->naddrs = i + 1".
Identical modifications are done in the mail and stream modules,
in the ngx_mail_optimize_servers() and ngx_stream_optimize_servers()
functions, respectively.
No functional changes.
This may happen if eventfd() returns ENOSYS, notably seen on CentOS 5.4.
Such a failure will now just disable the notification mechanism and let
the callers cope with it, instead of failing to start worker processes.
If thread pools are not configured, this can safely be ignored.
Two mechanisms are implemented to make it possible to store pointers
in shared memory on Windows, in particular on Windows Vista and later
versions with ASLR:
- The ngx_shm_remap() function added to allow remapping of a shared memory
zone to the address originally used for it in the master process. While
important, it doesn't solve the problem by itself as in many cases it's
not possible to use the address because of conflicts with other
allocations.
- We now create mappings at the same address in all processes by starting
mappings at predefined addresses normally unused by newborn processes.
These two mechanisms combined allow to use shared memory on Windows
almost without problems, including reloads.
Based on the patch by Sergey Brester:
http://mailman.nginx.org/pipermail/nginx-devel/2015-April/006836.html
It's now enough to specify proxy_protocol option in one listen directive to
enable it in all servers listening on the same address/port. Previously,
the setting from the first directive was always used.
When client or upstream connection is closed, level-triggered read event
remained active until the end of the session leading to cpu hog. Now the flag
NGX_CLOSE_EVENT is used to unschedule the event.
If a peer was initially skipped due to max_fails, there's no reason
not to try it again if enough time has passed, and the next_upstream
logic is in action.
This also reduces diffs with NGINX Plus.
Similar to ngx_http_file_cache_set_slot(), the last component of file->name
with a fixed length of 10 bytes, as generated in ngx_create_temp_path(), is
used as a source for the names of intermediate subdirectories with each one
taking its own part. Ensure that the sum of specified levels with slashes
fits into the length (ticket #731).
Missing call to X509_STORE_CTX_free when X509_STORE_CTX_init fails.
Missing call to OCSP_CERTID_free when OCSP_request_add0_id fails.
Possible leaks in vary particular scenariis of memory shortage.
This helps to avoid suboptimal behavior when a client waits for a control
frame or more data to increase window size, but the frames have been delayed
in the socket buffer.
The delays can be caused by bad interaction between Nagle's algorithm on
nginx side and delayed ACK on the client side or by TCP_CORK/TCP_NOPUSH
if SPDY was working without SSL and sendfile() was used.
The pushing code is now very similar to ngx_http_set_keepalive().
If any preread body bytes were sent in the first chain, chunk size was
incorrectly added before the whole chain, including header, resulting in
an invalid request sent to upstream. Fixed to properly add chunk size
after the header.
The r->request_body_no_buffering flag was introduced. It instructs
client request body reading code to avoid reading the whole body, and
to call post_handler early instead. The caller should use the
ngx_http_read_unbuffered_request_body() function to read remaining
parts of the body.
Upstream module is now able to use this mode, if configured with
the proxy_request_buffering directive.
If the last header evaluation resulted in an empty header, the e.skip flag
was set and was not reset when we've switched to evaluation of body_values.
This incorrectly resulted in body values being skipped instead of producing
some correct body as set by proxy_set_body. Fix is to properly reset
the e.skip flag.
As the problem only appeared if the last potentially non-empty header
happened to be empty, it only manifested itself if proxy_set_body was used
with proxy_cache.
The SSL_MODE_NO_AUTO_CHAIN mode prevents OpenSSL from automatically
building a certificate chain on the fly if there is no certificate chain
explicitly provided. Before this change, certificates provided via the
ssl_client_certificate and ssl_trusted_certificate directives were
used by OpenSSL to automatically build certificate chains, resulting
in unexpected (and in some cases unneeded) chains being sent to clients.
LibreSSL removed support for export ciphers and a call to
SSL_CTX_set_tmp_rsa_callback() results in an error left in the error
queue. This caused alerts "ignoring stale global SSL error (...called
a function you should not call) while SSL handshaking" on a first connection
in each worker process.
LibreSSL 2.1.1+ started to set SSL_OP_NO_SSLv3 option by default on
new contexts. This makes sure to clear it to make it possible to use SSLv3
with LibreSSL if enabled in nginx config.
Prodded by Kuramoto Eiji.
Example of usage:
error_log memory:16m debug;
This allows to configure debug logging with minimum impact on performance.
It's especially useful when rare crashes are experienced under high load.
The log can be extracted from a coredump using the following gdb script:
set $log = ngx_cycle->log
while $log->writer != ngx_log_memory_writer
set $log = $log->next
end
set $buf = (ngx_log_memory_buf_t *) $log->wdata
dump binary memory debug_log.txt $buf->start $buf->end
Keeping the ready flag in this case might results in missing notification of
broken connection until nginx tried to use it again.
While there, stale comment about stale event was removed since this function
is also can be called directly.
The code that calls sendfile() was cut into a separate function.
This simplifies EINTR processing, yet is needed for the following
changes that add threads support.
In case of filter finalization, r->upstream might be changed during
the ngx_event_pipe() call. Added an argument to preserve it while
calling the ngx_http_upstream_process_request() function.
A request may be already finalized when ngx_http_upstream_finalize_request()
is called, due to filter finalization: after filter finalization upstream
can be finalized via ngx_http_upstream_cleanup(), either from
ngx_http_terminate_request(), or because a new request was initiated
to an upstream. Then the upstream code will see an error returned from
the filter chain and will call the ngx_http_upstream_finalize_request()
function again.
To prevent corruption of various upstream data in this situation, make sure
to do nothing but merely call ngx_http_finalize_request().
Prodded by Yichun Zhang, for details see the thread at
http://nginx.org/pipermail/nginx-devel/2015-February/006539.html.
Previously, connection hung after calling ngx_http_ssl_handshake() with
rev->ready set and no bytes in socket to read. It's possible in at least the
following cases:
- when processing a connection with expired TCP_DEFER_ACCEPT on Linux
- after parsing PROXY protocol header if it arrived in a separate TCP packet
Thanks to James Hamlin.
When replacing a stale cache entry, its last_modified and etag could be
inherited from the old entry if the response code is not 200 or 206. Moreover,
etag could be inherited with any response code if it's missing in the new
response. As a result, the cache entry is left with invalid last_modified or
etag which could lead to broken revalidation.
For example, when a file is deleted from backend, its last_modified is copied to
the new 404 cache entry and is used later for revalidation. Once the old file
appears again with its original timestamp, revalidation succeeds and the cached
404 response is sent to client instead of the file.
The problem appeared with etags in 44b9ab7752e3 (1.7.3) and affected
last_modified in 1573fc7875fa (1.7.9).
Repeatedly calling ngx_http_upstream_add_chash_point() to create
the points array in sorted order, is O(n^2) to the total weight.
This can cause nginx startup and reconfigure to be substantially
delayed. For example, when total weight is 1000, startup takes
5s on a modern laptop.
Replace this with a linear insertion followed by QuickSort and
duplicates removal. Startup for total weight of 1000 reduces to 40ms.
Based on a patch by Wai Keen Woon.
Previously, the Auth-SSL-Verify header with the "NONE" value was always passed
to the auth_http script if verification of client certificates is disabled.
The "ssl_verify_client", "ssl_verify_depth", "ssl_client_certificate",
"ssl_trusted_certificate", and "ssl_crl" directives introduced to control
SSL client certificate verification in mail proxy module.
If there is a certificate, detail of the certificate are passed to
the auth_http script configured via Auth-SSL-Verify, Auth-SSL-Subject,
Auth-SSL-Issuer, Auth-SSL-Serial, Auth-SSL-Fingerprint headers. If
the auth_http_pass_client_cert directive is set, client certificate
in PEM format will be passed in the Auth-SSL-Cert header (urlencoded).
If there is no required certificate provided during an SSL handshake
or certificate verification fails then a protocol-specific error is
returned after the SSL handshake and the connection is closed.
Based on previous work by Sven Peter, Franck Levionnois and Filipe Da Silva.
Initial size as calculated from the number of elements may be bigger
than max_size. If this happens, make sure to set size to max_size.
Reported by Chris West.
Previously, this function checked for connection local address existence
and returned error if it was missing. Now a new address is assigned in this
case making it possible to call this function not only for accepted connections.
It appeared that the NGX_HAVE_AIO_SENDFILE macro was defined regardless of
the "--with-file-aio" configure option and the NGX_HAVE_FILE_AIO macro.
Now they are related.
Additionally, fixed one macro.
This reduces layering violation and simplifies the logic of AIO preread, since
it's now triggered by the send chain function itself without falling back to
the copy filter. The context of AIO operation is now stored per file buffer,
which makes it possible to properly handle cases when multiple buffers come
from different locations, each with its own configuration.
If fastcgi_pass (or any look-alike that doesn't imply a default
port) is specified as an IP literal (as opposed to a hostname),
port absence was not detected at configuration time and could
result in EADDRNOTAVAIL at run time.
Fixed this in such a way that configs like
http {
server {
location / {
fastcgi_pass 127.0.0.1;
}
}
upstream 127.0.0.1 {
server 10.0.0.1:12345;
}
}
still work. That is, port absence check is delayed until after
we make sure there's no explicit upstream with such a name.
The mtx->wait counter was not decremented if we were able to obtain the lock
right after incrementing it. This resulted in unneeded sem_post() calls,
eventually leading to EOVERFLOW errors being logged, "sem_post() failed
while wake shmtx (75: Value too large for defined data type)".
To close the race, mtx->wait is now decremented if we obtain the lock right
after incrementing it in ngx_shmtx_lock(). The result can become -1 if a
concurrent ngx_shmtx_unlock() decrements mtx->wait before the added code does.
However, that only leads to one extra iteration in the next call of
ngx_shmtx_lock().
The use_temp_path http cache feature is now implemented using a separate temp
hierarchy in cache directory. Prefix-based temp files are no longer needed.
If use_temp_path is set to off, a subdirectory "temp" is created in the cache
directory. It's used instead of proxy_temp_path and friends for caching
upstream response.
The trailer.count variable was not initialized if there was a header,
resulting in "sendfile() failed (22: Invalid argument)" alerts on OS X
if the "sendfile" directive was used. The bug was introduced
in 8e903522c17a (1.7.8).
Some parts of code related to handling variants of a resource moved into
a separate function that is called earlier. This allows to use cache file
name as a prefix for temporary file in the following patch.
The original check for NGX_AGAIN was surplus, since the function returns
only NGX_OK or NGX_ERROR. Now it looks similar to other places.
No functional changes.
The configuration handling code has changed to look similar to the proxy_store
directive and friends. This simplifies adding variable support in the following
patch.
No functional changes.
Currently, storing and caching mechanisms cannot work together, and a
configuration error is thrown when the proxy_store and proxy_cache
directives (as well as their friends) are configured on the same level.
But configurations like in the example below were allowed and could result
in critical errors in the error log:
proxy_store on;
location / {
proxy_cache one;
}
Only proxy_store worked in this case.
For more predictable and errorless behavior these directives now prevent
each other from being inherited from the previous level.
This changes internal API related to handling of the "store"
flag in ngx_http_upstream_conf_t. Previously, a non-null value
of "store_lengths" was enough to enable store functionality with
custom path. Now, the "store" flag is also required to be set.
No functional changes.
The proxy_store, fastcgi_store, scgi_store and uwsgi_store were inherited
incorrectly if a directive with variables was defined, and then redefined
to the "on" value, i.e. in configurations like:
proxy_store /data/www$upstream_http_x_store;
location / {
proxy_store on;
}
In the following configuration request was sent to a backend without
URI changed to '/' due to if:
location /proxy-pass-uri {
proxy_pass http://127.0.0.1:8080/;
set $true 1;
if ($true) {
# nothing
}
}
Fix is to inherit conf->location from the location where proxy_pass was
configured, much like it's done with conf->vars.
The proxy_pass directive and other handlers are not expected to be inherited
into nested locations, but there is a special code to inherit upstream
handlers into limit_except blocks, as well as a configuration into if{}
blocks. This caused incorrect behaviour in configurations with nested
locations and limit_except blocks, like this:
location / {
proxy_pass http://u;
location /inner/ {
# no proxy_pass here
limit_except GET {
# nothing
}
}
}
In such a configuration the limit_except block inside "location /inner/"
unexpectedly used proxy_pass defined in "location /", while it shouldn't.
Fix is to avoid inheritance of conf->upstream.upstream (and
conf->proxy_lengths) into locations which don't have noname flag.
Instead of independant inheritance of conf->upstream.upstream (proxy_pass
without variables) and conf->proxy_lengths (proxy_pass with variables)
we now test them both and inherit only if neither is set. Additionally,
SSL context is also inherited only in this case now.
Based on the patch by Alexey Radkov.
RFC7232 says:
The 304 (Not Modified) status code indicates that a conditional GET
or HEAD request has been received and would have resulted in a 200
(OK) response if it were not for the fact that the condition
evaluated to false.
which means that there is no reason to send requests with "If-None-Match"
and/or "If-Modified-Since" headers for responses cached with other status
codes.
Also, sending conditional requests for responses cached with other status
codes could result in a strange behavior, e.g. upstream server returning
304 Not Modified for cached 404 Not Found responses, etc.
Signed-off-by: Piotr Sikora <piotr@cloudflare.com>
In case of a cache lock timeout and in the aio handler we now call
r->write_event_handler() instead of a connection write handler,
to make sure to run appropriate subrequest. Previous code failed to run
inactive subrequests and hence resulted in suboptimal behaviour, see
report by Yichun Zhang:
http://mailman.nginx.org/pipermail/nginx-devel/2013-October/004435.html
(Infinite hang claimed in the report seems impossible without 3rd party
modules, as subrequests will be eventually woken up by the postpone filter.)
To ensure proper logging make sure to set current_request in all event
handlers, including resolve, ssl handshake, cache lock wait timer and
aio read handlers. A macro ngx_http_set_log_request() introduced to
simplify this.
The alert was introduced in 03ff14058272 (1.5.4), and was triggered on each
post_action invocation.
There is no real need to call header filters in case of post_action,
so return NGX_OK from ngx_http_send_header() if r->post_action is set.
This helps to avoid delays in sending the last chunk of data because
of bad interaction between Nagle's algorithm on nginx side and
delayed ACK on the client side.
Delays could also be caused by TCP_CORK/TCP_NOPUSH if SPDY was
working without SSL and sendfile() was used.
In 954867a2f0a6, we switched to using resolver node as the timer event data.
This broke debug event logging.
Replaced now unused ngx_resolver_ctx_t.ident with ngx_resolver_node_t.ident
so that ngx_event_ident() extracts something sensible when accessing
ngx_resolver_node_t as ngx_connection_t.
In 954867a2f0a6, we switched to using resolver node as the
timer event data, so make sure we do not free resolver node
memory until the corresponding timer is deleted.
There was no real problem since the amount of bytes can be sent is limited by
NGX_SENDFILE_MAXSIZE to less than 2G. But that can be changed in the future
Though ngx_solaris_sendfilev_chain() shouldn't suffer from the problem mentioned
in d1bde5c3c5d2 since currently IOV_MAX on Solaris is 16, but this follows the
change from 3d5717550371 in order to make the code look similar to other systems
and potentially eliminates the problem in the future.
The upstream modules remove and alter a number of client headers
before sending the request to upstream. This set of headers is
smaller or even empty when cache is disabled.
It's still possible that a request in a cache-enabled location is
uncached, for example, if cache entry counter is below min_uses.
In this case it's better to alter a smaller set of headers and
pass more client headers to backend unchanged. One of the benefits
is enabling server-side byte ranges in such requests.
Once this age is reached, the cache lock is discarded and another
request can acquire the lock. Requests which failed to acquire
the lock are not allowed to cache the response.
For further progress a new buffer must be at least two bytes larger than
the remaining unparsed data. One more byte is needed for null-termination
and another one for further progress. Otherwise inflate() fails with
Z_BUF_ERROR.
Instead of collecting a number of the possible SSL_CTX_use_PrivateKey_file()
error codes that becomes more and more difficult with the rising variety of
OpenSSL versions and its derivatives, just continue with the next password.
Multiple passwords in a single ssl_password_file feature was broken after
recent OpenSSL changes (commit 4aac102f75b517bdb56b1bcfd0a856052d559f6e).
Affected OpenSSL releases: 0.9.8zc, 1.0.0o, 1.0.1j and 1.0.2-beta3.
Reported by Piotr Sikora.
Previously, nginx would emit empty values in a header with multiple,
NULL-separated values.
This is forbidden by the SPDY specification, which requires headers to
have either a single (possibly empty) value or multiple, NULL-separated
non-empty values.
Signed-off-by: Piotr Sikora <piotr@cloudflare.com>
When got multiple upstream IP addresses using DNS resolving, the number of
upstreams tries and the maxinum time spent for these tries were not affected.
This patch fixed it.
Spaces in Accept-Charset, Accept-Encoding, and Accept-Language headers
are now ignored. As per syntax of these headers spaces can only appear
in places where they are optional.
If a variant stored can't be used to respond to a request, the variant
hash is used as a secondary key.
Additionally, if we previously switched to a secondary key, while storing
a response to cache we check if the variant hash still apply. If not, we
switch back to the original key, to handle cases when Vary changes.
To cache responses with Vary, we now calculate hash of headers listed
in Vary, and return the response from cache only if new request headers
match.
As of now, only one variant of the same resource can be stored in cache.
Previous code resulted in transfer stalls when client happened
to read all the data in buffers at once, while all gzip buffers
were exhausted (but ctx->nomem wasn't set). Make sure to call
next body filter at least once per call if there are busy buffers.
Additionally, handling of calls with NULL chain was changed to follow
the same logic, i.e., next body filter is only called with NULL chain
if there are busy buffers. This is expected to fix "output chain is empty"
alerts as reported by some users after c52a761a2029 (1.5.7).
GetQueuedCompletionStatus() document on MSDN says the
following signature:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa364986.aspx
BOOL WINAPI GetQueuedCompletionStatus(
_In_ HANDLE CompletionPort,
_Out_ LPDWORD lpNumberOfBytes,
_Out_ PULONG_PTR lpCompletionKey,
_Out_ LPOVERLAPPED *lpOverlapped,
_In_ DWORD dwMilliseconds
);
In the latest specification, the type of the third argument
(lpCompletionKey) is PULONG_PTR not LPDWORD.
Due to the u->headers_in.last_modified_time not being correctly initialized,
this variable was evaluated to "Thu, 01 Jan 1970 00:00:00 GMT" for responses
cached without the "Last-Modified" header which resulted in subsequent proxy
requests being sent with "If-Modified-Since: Thu, 01 Jan 1970 00:00:00 GMT"
header.
Signed-off-by: Piotr Sikora <piotr@cloudflare.com>
Previously, a value of the "send" variable wasn't properly adjusted
in a rare case when syscall was interrupted by a signal. As a result,
these functions could send less data than the limit allows.
The c->sent is reset to 0 on each request by server-side http code,
so do the same on client side. This allows to count number of bytes
sent in a particular request.
One intentional side effect of this change is that key is allowed only
in the first position. Previously, it was possible to specify the key
variable at any position, but that was never documented, and is contrary
with nginx configuration practice for positional parameters.
One intentional side effect of this change is that key is allowed only
in the first position. Previously, it was possible to specify the key
variable at any position, but that was never documented, and is contrary
to nginx configuration practice for positional parameters.
If a syslog daemon is restarted and the unix socket is used, further logging
might stop to work. In case of send error, socket is closed, forcing
a reconnection at the next logging attempt.
The ngx_cycle->log is used when sending the message. This allows to log syslog
send errors in another log.
Logging to syslog after its cleanup handler has been executed was prohibited.
Previously, this was possible from ngx_destroy_pool(), which resulted in error
messages caused by attempts to write into the closed socket.
The "processing" flag is renamed to "busy" to better match its semantics.
Previously, a file buffer start position was reset to the file start.
Now it's reset to the previous file buffer end. This fixes
reinitialization of requests having multiple successive parts of a
single file. Such requests are generated by fastcgi module.
This prevents inappropriate session reuse in unrelated server{}
blocks, while preserving ability to restore sessions on other servers
when using TLS Session Tickets.
Additionally, session context is now set even if there is no session cache
configured. This is needed as it's also used for TLS Session Tickets.
Thanks to Antoine Delignat-Lavaud and Piotr Sikora.
The new directives {proxy,fastcgi,scgi,uwsgi,memcached}_next_upstream_tries
and {proxy,fastcgi,scgi,uwsgi,memcached}_next_upstream_timeout limit
the number of upstreams tried and the maximum time spent for these tries
when searching for a valid upstream.
When memory allocation failed in ngx_http_upstream_cache(), the connection
would be terminated directly in ngx_http_upstream_init_request().
Return a INTERNAL_SERVER_ERROR response instead.
The ngx_init_setproctitle() function, as used on systems without
setproctitle(3), may fail due to memory allocation errors, and
therefore its return code needs to be checked.
Reported by Markus Linnala.
The etag->hash must be set to 0 to avoid an empty ETag header being
returned with the 500 Internal Server Error page after the memory
allocation failure.
Reported by Markus Linnala.
Some of the OpenSSL forks (read: BoringSSL) started removing unused,
no longer necessary and/or not really working bug workarounds along
with the SSL options and defines for them.
Instead of fixing nginx build after each removal, be proactive
and guard use of all SSL options for bug workarounds.
Signed-off-by: Piotr Sikora <piotr@cloudflare.com>