When the keys_zone is full, each subsequent request to the cache is
penalized: the cache has to synchronously evict older files to get a
slot from the keys_zone. The patch introduces new behavior in this
scenario. The cache manager will now try to maintain free slots in
the keys_zone by cleaning old files in the background.
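For reference, the keys_zone in question is the shared memory zone
declared by proxy_cache_path; a minimal sketch (the path, zone name
and sizes are illustrative):

    proxy_cache_path /var/cache/nginx levels=1:2
                     keys_zone=cache:10m max_size=10g;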
The "aio_write" directive is introduced, which enables use of aio
for writing. Currently it is meaningful only with "aio threads".
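A minimal configuration sketch enabling this (assuming nginx is built
with thread pool support; the location and backend address are
illustrative):

    location / {
        aio        threads;
        aio_write  on;
        proxy_pass http://127.0.0.1:9000;
    }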
Note that aio operations can be done by both the event pipe and the
output chain, so a proper mapping between r->aio and p->aio is
provided when calling ngx_event_pipe() and in the output filter.
In collaboration with Valentin Bartenev.
The ngx_thread_write_chain_to_file() function is introduced, which
uses the ngx_file_t thread_handler, thread_ctx and thread_task fields.
The task context structure (ngx_thread_file_ctx_t) is the same for
both reading and writing, and can be safely shared as long as
operations are serialized.
The task->handler field is now always set (and not only when the task
is allocated), as the same task can be used with different handlers.
The thread_write flag is introduced in the ngx_temp_file_t structure
to explicitly enable use of ngx_thread_write_chain_to_file() in
ngx_write_chain_to_temp_file() when supported by the caller.
In collaboration with Valentin Bartenev.
This simplifies the interface of the ngx_thread_read() function.
Additionally, most of the thread operations now explicitly set
file->thread_task, file->thread_handler and file->thread_ctx,
to facilitate use of thread operations in other places.
(Potential problems remain with sendfile in threads though - it uses
file->thread_handler as set in ngx_output_chain(), and it should not
be overwritten to an incompatible one.)
In collaboration with Valentin Bartenev.
If a write event happens after sendfile() but before we've got the
sendfile() results in the main thread, this write event will be
ignored. If no more events happen, the connection will hang.
Removing the events works in simple cases, but not always, as in
some cases events are added back by unrelated code. E.g., the
upstream module adds a write event in ngx_http_upstream_init()
to track client aborts.
The fix is to use wev->complete instead. It is now set to 0 before
a sendfile() task is posted, and set to 1 once a write event
happens. If on completion of the sendfile() task wev->complete is 1,
we know that an event happened while we were executing sendfile(), and
the socket is still ready for writing even if sendfile() did not send
all the data or returned EAGAIN.
While sendfilev() is documented to return -1 with EINVAL set
if the file was truncated, at least Solaris 11 silently returns 0,
and this results in a CPU hog. A test is added to complain
appropriately if 0 is returned.
The main proxy function ngx_stream_proxy_process() can terminate the
stream session. The code following it should check its return code to
make sure the session still exists. This happens in the client and
upstream initialization functions. Swapping the
ngx_stream_proxy_process() call with the code that follows it leaves
the same problem, just reversed.
In the future, ngx_stream_proxy_process() will call
ngx_stream_proxy_next_upstream(), making it too complicated to know
whether the stream session still exists after this call.
Now ngx_stream_proxy_process() is called from posted event handlers in
both places, with no code following it. The posted event is
automatically removed once the session is terminated.
The "server_tokens" directive can now be set to "off" conditionally,
e.g. using the map directive.
An empty value will disable emission of the Server: header and of
the signature in error messages generated by nginx.
Any other value is treated as "on", meaning that the full nginx
version is emitted in the Server: header and in error messages
generated by nginx.
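A sketch of such a conditional setup using map (the hostname and the
variable name are illustrative):

    map $host $tokens {
        default       on;
        example.com   off;
    }

    server {
        listen        8000;
        server_tokens $tokens;
    }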
If proxy_cache is enabled, and proxy_no_cache tests true, it was previously
possible for the client connection to be closed after a 304. The fix is to
recheck r->header_only after the final cacheability is determined, and end the
request if no longer cacheable.
Example configuration:

    proxy_cache foo;
    proxy_cache_bypass 1;
    proxy_no_cache 1;
If a client sends If-None-Match, and the upstream server returns 200
with a matching ETag, no body should be returned to the client. At
the start of ngx_http_upstream_send_response(), proxy_no_cache is not
yet tested, thus cacheable is still 1 and downstream_error is set.
However, by the time the downstream_error check is done in
process_request(), proxy_no_cache has been tested and cacheable is
set to 0. The client connection is then closed, regardless of
keepalive.
If caching was used, "zero size buf in output" alerts might appear
in logs if a client prematurely closed the connection. Alerts
appeared in the following situation:
- writing to the client returned an error, so the event pipe
drained all busy buffers, leaving body output filters
in an invalid state;
- when the upstream response was fully received,
ngx_http_upstream_finalize_request() tried to flush
all pending data.
The fix is to avoid flushing the body if p->downstream_error is set.
Sendfile handlers (the aio preload and thread handlers) are called
within ctx->output_filter() in ngx_output_chain(), and hence ctx->aio
cannot be set directly in ngx_output_chain(). Meanwhile, it must be
set to make sure the loop within ngx_output_chain() is properly
terminated.
There are no known cases that trigger the problem, though in theory
something like aio + sub filter (something that needs body in memory,
and can also free some memory buffers) + sendfile can result in
"task already active" and "second aio post" alerts.
The fix is to set ctx->aio in ngx_http_copy_aio_sendfile_preload()
and ngx_http_copy_thread_handler().
For consistency, ctx->aio is no longer set explicitly in
ngx_output_chain_copy_buf(), as it's now done in
ngx_http_copy_thread_handler().
If sendfile in threads is used, it is possible that multiple
subrequests will trigger multiple ngx_linux_sendfile_thread() calls,
as operations are only serialized in the output chain based on
r->aio, that is, at the subrequest level.
This resulted in "task #N already active" alerts, in particular, when
running proxy_store.t with "aio threads; sendfile on;".
The fix is to tolerate duplicate calls, with an additional safety
check that the file is the same as previously used.
The same problem also affects "aio on; sendfile on;" on FreeBSD
(previously known as "aio sendfile;"), where aio->preload_handler()
could be called multiple times for similar reasons, resulting in
"second aio post" alerts. The fix is the same as well.
It is also believed that similar problems can arise if a filter
calls the next body filter multiple times for some reason. These are
mostly theoretical though.
Previously, there were only three timeouts used globally for the whole HTTP/2
connection:
1. Idle timeout for inactivity when there are no streams in processing
(the "http2_idle_timeout" directive);
2. Receive timeout for incomplete frames when there are no streams in
processing (the "http2_recv_timeout" directive);
3. Send timeout when there are frames waiting in the output queue
(the "send_timeout" directive on a server level).
Reaching one of these timeouts closes the HTTP/2 connection.
This left a number of scenarios in which a connection can get stuck
without any processing or timeouts:
1. A client has sent the headers block partially, so nginx starts
processing a new stream but cannot continue without the rest of the
HEADERS and/or CONTINUATION frames;
2. When nginx waits for the request body;
3. All streams are stuck on exhausted connection or stream windows.
The first idea, which was rejected, was to detect when the whole
connection gets stuck in one of these situations and to set the
global receive timeout. The disadvantage of such an approach would
be inconsistent behaviour in some typical use cases. For example, if
a user never replies to the browser's question about where to save
the downloaded file, the stream will eventually be closed by a
timeout. On the other hand, this will not happen if there is some
activity in other concurrent streams.
Now almost all the request timeouts work like in HTTP/1.x connections, so
the "client_header_timeout", "client_body_timeout", and "send_timeout" are
respected. These timeouts close the request.
The global timeouts work as before.
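A sketch of a server block combining the global HTTP/2 timeouts with
the per-request timeouts that are now respected (all values are
illustrative):

    server {
        listen 8443 http2;

        http2_idle_timeout    3m;
        http2_recv_timeout    30s;

        client_header_timeout 10s;
        client_body_timeout   10s;
        send_timeout          10s;
    }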
Previously, the c->write->delayed flag was abused to avoid setting
timeouts on stream events. Now the "active" and "ready" flags are
manipulated instead to control the processing of individual streams.
This is required for implementing per-request timeouts.
Previously, the temporary pool was used only while skipping headers,
and the request pool was used otherwise. That required switching
pools if the request was closed while parsing. It wasn't a problem,
since the request could be closed only after validation of the fully
parsed header. With the per-request timeouts, the request can be
closed at any moment, and switching pools in the middle of parsing a
header name or value becomes a problem.
To overcome this, the temporary pool is now always created and used.
Special checks are added to keep it while the stream is being
processed or until the header block is fully parsed.
Since 667aaf61a778 (1.1.17), the ngx_http_parse_header_line()
function can return NGX_HTTP_PARSE_INVALID_HEADER when a header
contains a NUL character. In this case the r->header_end pointer
isn't properly initialized, but the log message in
ngx_http_process_request_headers() wasn't adjusted accordingly. It
used the pointer in the size calculation, which might result in a
buffer over-read of up to 2k.
Found with afl-fuzz.
This fixes "called a function you should not call" and
"shutdown while in init" errors as observed with OpenSSL 1.0.2f
due to changes in how OpenSSL handles SSL_shutdown() during
SSL handshakes.
When the "pending" value is zero, the "buf" will be right shifted
by the width of its type, which results in undefined behavior.
Found by Coverity (CID 1352150).
Changes to NGX_MODULE_V1 and ngx_module_t in 85dea406e18f (1.9.11)
broke all modules written in C++, because ISO C++11 does not allow
conversion from string literal to char *.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
The auto/module script is extended to understand
ngx_module_link=DYNAMIC. When set, it links the module as a shared
object rather than statically into the nginx binary. The module can
later be loaded using the "load_module" directive.
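For example, a module built this way can then be loaded from
nginx.conf (the module path is hypothetical):

    load_module modules/ngx_http_example_module.so;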
The new auto/module parameter ngx_module_order allows defining the
module load order in complex cases. By default, the order is set
based on ngx_module_type.
Third-party modules can be compiled dynamically using the
--add-dynamic-module configure option, which presets ngx_module_link
to "DYNAMIC" before calling the module config script.
Win32 support is rudimentary, and only works when using MinGW gcc (which
is able to handle exports/imports automatically).
In collaboration with Ruslan Ermilov.
Due to the greater priority of the unary plus operator over the
ternary operator, the expression didn't work as expected. That might
result in an allocation one byte smaller than needed for the HEADERS
frame buffer.
The previous code only parsed the first answer, without checking its
type, and required a compressed RR name.
The new code checks the RR type, supports responses with multiple
answers, and doesn't require the RR name to be compressed.
This has a side effect: limited support of CNAME. If a response
includes both CNAME and PTR RRs, as when recursion is enabled on
the server, the PTR RR is handled.
Full CNAME support in PTR responses is not implemented in this
change.
Previously, a global server balancer was used to assign the next DNS
server to send a query to. That could lead to a non-uniform
distribution of servers per request: a request could be assigned to
the same dead server several times in a row and wait longer for a
valid server, or even time out without being processed.
Now each query is sent to all servers sequentially, in a circle,
until a response is received or the timeout expires. The initial
server for each request is still globally balanced.
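A sketch of a resolver configured with several servers, which a
single query now tries in turn (addresses and timeout are
illustrative):

    resolver         192.0.2.1 192.0.2.2 192.0.2.3;
    resolver_timeout 5s;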
When several requests were waiting for a response, then after getting
a CNAME response only the last request's context had the name updated.
Contexts of other requests had the wrong name. This name was used by
ngx_resolve_name_done() to find the node from which to remove the
request context. When the name was wrong, the request could not be
properly cancelled: its context was freed but stayed linked to the
node's waiting list. This happened, e.g., when the first request was
aborted or timed out before the resolving completed. When resolving
completed, this triggered a use-after-free memory access by calling
ctx->handler of the already freed request context. The bug manifests
itself as "could not cancel <name> resolving" alerts in error_log.
When a request was responded with a CNAME, the request context kept
the pointer to the original node's rn->u.cname. If the original node
expired before the resolving timed out or completed with an error,
this would trigger a use-after-free memory access via ctx->name in
ctx->handler().
The fix is to keep ctx->name unmodified. The name from the context
is no longer used by ngx_resolve_name_done(). Instead, we now keep a
pointer to the resolver node to which this request is linked.
Keeping the original name intact also improves logging.
When several requests were waiting for a response, then after getting
a CNAME response only the last request was properly processed, while
others were left waiting.
If one or more requests were waiting for a response, then after
getting a CNAME response, the timeout event on the first request
remained active, pointing to the wrong node with an empty
rn->waiting list, and that could cause either a null pointer
dereference or a use-after-free memory access if this timeout
expired.
If several requests were waiting for a response, and the first
request terminated (e.g., due to client closing a connection),
other requests were left without a timeout and could potentially
wait indefinitely.
This is fixed by introducing per-request independent timeouts.
This change also reverts 954867a2f0a6 and 5004210e8c78.
If enabled, workers are bound to available CPUs, each worker to one
CPU, in order. If there are more workers than available CPUs, the
remaining workers are bound in a loop, starting again from the first
available CPU.
The optional mask parameter defines which CPUs are available for automatic
binding.
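A sketch of both forms (the mask value is illustrative):

    worker_processes    4;
    worker_cpu_affinity auto;

    # limit automatic binding to the first four CPUs:
    # worker_cpu_affinity auto 00001111;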
In collaboration with Vladimir Homutov.
With the main request buffered, it's possible that a slice
subrequest will send output before it. For example, while the main
request is waiting for an aio read to complete, a slice subrequest
can start an aio operation as well. The order in which the aio
callbacks are called is undetermined.
In case of renegotiation, the SNI callback set via
SSL_CTX_set_tlsext_servername_callback() is now skipped: it does
nothing, as in this case it would be supplied with the request in
c->data, which isn't expected and doesn't work this way.
This was broken by b40af2fd1c16 (1.9.6) with OpenSSL master branch and LibreSSL.
Splits a request into subrequests, each providing a specific range
of the response. The "$slice_range" variable must be used to set the
subrequest range and a proper cache key. The "slice" directive sets
the slice size.
The following example splits requests into 1-megabyte cacheable subrequests.
    server {
        listen 8000;

        location / {
            slice 1m;
            proxy_cache cache;
            proxy_cache_key $uri$is_args$args$slice_range;
            proxy_set_header Range $slice_range;
            proxy_cache_valid 200 206 1h;
            proxy_pass http://127.0.0.1:9000;
        }
    }
If an upstream with variables evaluated to an address without a
port, then instead of a "no port in upstream" error, an attempt was
made to connect(), which failed with EADDRNOTAVAIL.
This fixes suboptimal behavior caused by a surplus lseek() on
sequential writes on systems without pwrite(). A consecutive read
after a write might result in an error on systems without pread()
and pwrite().
Fortunately, at the moment there are no widely used systems without these
syscalls.