This makes it possible to avoid looping for a long time while working
with a fast enough peer when data are added to the socket buffer faster
than we are able to read and process them (ticket #1431). This is
basically what we already do on FreeBSD with kqueue, where information
about the number of bytes in the socket buffer is returned by
the kevent() call.
With other event methods rev->available is now set to -1 when the socket
is ready for reading. Later in ngx_recv() and ngx_recv_chain(), if
full buffer is received, real number of bytes in the socket buffer is
retrieved using ioctl(FIONREAD). Reading more than this number of bytes
ensures that even with edge-triggered event methods the event will be
triggered again, so it is safe to stop processing of the socket and
switch to other connections.
Using ioctl(FIONREAD) only after reading a full buffer is an optimization.
With this approach we only call ioctl(FIONREAD) when there are at least
two recv()/readv() calls.
As long as there are data to read in the socket, yet the amount of data
is less than total size of the buffers in the chain, this saves one
unneeded read() syscall. Before this change, reading only stopped if
ngx_ssl_recv() returned no data, that is, two read() syscalls in a row
returned EAGAIN.
In SSL connections, data can be buffered by the SSL layer, and it is
wrong to avoid doing c->recv_chain() if c->read->available is 0 and
c->read->pending_eof is set. And tests show that the optimization in
question indeed can result in incorrect detection of premature connection
close if upstream closes the connection without sending a close notify
alert at the same time. Fix is to disable c->read->available optimization
for SSL connections.
This could happen when graceful shutdown configured by worker_shutdown_timeout
times out and is then followed by another timeout such as proxy_read_timeout.
In this case, the HEADERS frame is added to the output queue, but attempt to
send it fails (due to c->error forcibly set during graceful shutdown timeout).
This triggers request finalization which attempts to close the stream. But the
stream cannot be closed because there is a frame in the output queue, and the
connection cannot be finalized. This leaves the connection open without any
timer events leading to alert.
The fix is to post write event when sending output queue fails on c->error.
That will finalize the connection.
With this patch, all traffic over an HTTP/2 connection is counted in
the h2c->total_bytes field, and payload traffic is counted in
the h2c->payload_bytes field. As long as total traffic is many times
larger than payload traffic, we consider this to be a flood.
In 8df664ebe037, we've switched to maximizing stream window instead
of sending RST_STREAM. Since then handling of RST_STREAM with NO_ERROR
was fixed at least in Chrome, hence we switch back to using RST_STREAM.
This allows more effective rejecting of large bodies, and also minimizes
non-payload traffic to be accounted in the next patch.
Previously, if a response to the PTR request was cached, and ngx_resolver_dup()
failed to allocate memory for the resulting name, then the original node was
freed but left in expire_queue. A subsequent address resolving would end up
in a use-after-free memory access of the node either in ngx_resolver_expire()
or ngx_resolver_process_ptr(), when accessing it through expire_queue.
The fix is to leave the resolver node intact.
Don't waste server resources by sending RST_STREAM frames. Instead,
reject WINDOW_UPDATE frames with invalid zero increment by closing
connection with PROTOCOL_ERROR.
Don't waste server resources by sending RST_STREAM frames. Instead,
reject HEADERS and PRIORITY frames with self-dependency by closing
connection with PROTOCOL_ERROR.
When ngx_http_discard_request_body() call was added to ngx_http_send_response(),
there were no return codes other than NGX_OK and NGX_HTTP_INTERNAL_SERVER_ERROR.
Now it can also return NGX_HTTP_BAD_REQUEST, but ngx_http_send_response() still
incorrectly transforms it to NGX_HTTP_INTERNAL_SERVER_ERROR.
The fix is to propagate ngx_http_discard_request_body() errors.
As defined in HTTP/1.1, body chunks have the following ABNF:
chunk = chunk-size [ chunk-ext ] CRLF chunk-data CRLF
where chunk-data is a sequence of chunk-size octets.
With this change, chunk-data that doesn't end up with CRLF at chunk-size
offset will be treated as invalid, such as in the example provided below:
4
SEE-THIS-AND-
4
THAT
0
Previously, if unbuffered request body reading wasn't finished before
the request was redirected to a different location using error_page
or X-Accel-Redirect, and the request body is read again, this could
lead to disastrous effects, such as a duplicate post_handler call or
"http request count is zero" alert followed by a segmentation fault.
This happened in the following configuration (ticket #1819):
location / {
proxy_request_buffering off;
proxy_pass http://bad;
proxy_intercept_errors on;
error_page 502 = /error;
}
location /error {
proxy_pass http://backend;
}
Fixed excessive memory growth and CPU usage if stream windows are
manipulated in a way that results in generating many small DATA frames.
Fix is to limit the number of simultaneously allocated DATA frames.
Fixed uncontrolled memory growth if peer sends a stream of
headers with a 0-length header name and 0-length header value.
Fix is to reject headers with zero name length.
When using SMTP with SSL and resolver, read events might be enabled
during address resolving, leading to duplicate ngx_mail_ssl_handshake_handler()
calls if something arrives from the client, and duplicate session
initialization - including starting another resolving. This can lead
to a segmentation fault if the session is closed after first resolving
finished. Fix is to block read events while resolving.
Reported by Robert Norris,
http://mailman.nginx.org/pipermail/nginx/2019-July/058204.html.
After ac5a741d39cf it is now possible that after zstream.avail_out
reaches 0 and we allocate additional buffer, there will be no more data
to put into this buffer, triggering "zero size buf" alert. Fix is to
reset b->temporary flag in this case.
Additionally, an optimization added to avoid allocating additional buffer
in this case, by checking if last deflate() call returned Z_STREAM_END.
Note that checking for Z_STREAM_END by itself is not enough to fix alerts,
as deflate() can return Z_STREAM_END without producing any output if the
buffer is smaller than gzip trailer.
Reported by Witold Filipczyk,
http://mailman.nginx.org/pipermail/nginx-devel/2019-July/012469.html.
Due to shortcomings of the ccv->zero flag implementation in complex value
interface, length of the resulting string from ngx_http_complex_value()
might either not include terminating null character or include it,
so the only safe way to work with the result is to use it as a
null-terminated string.
Reported by Patrick Wollgast.
When "-" follows a parameter of maximum length, a single byte buffer
overflow happens, since the error branch does not check parameter length.
Fix is to avoid saving "-" to the parameter key, and instead use an error
message with "-" explicitly written. The message is mostly identical to
one used in similar cases in the preequal state.
Reported by Patrick Wollgast.
With level-triggered event methods it is important to specify
the NGX_CLOSE_EVENT flag to ngx_handle_read_event(), otherwise
the event won't be removed, resulting in CPU hog.
Reported by Patrick Wollgast.
To save memory hash code uses u_short to store resulting bucket sizes,
so maximum bucket size is limited to 65536 minus ngx_cacheline_size (larger
values will be aligned to 65536 which will overflow u_short). However,
there were no checks to enforce this, and using larger bucket sizes
resulted in overflows and segmentation faults.
Appropriate safety checks to enforce this added to ngx_hash_init().
When nginx is used with zlib patched with [1], which provides
integration with the future IBM Z hardware deflate acceleration, it ends
up computing CRC32 twice: one time in hardware, which always does this,
and one time in software by explicitly calling crc32().
crc32() calls were added in changesets 133:b27548f540ad ("nginx-0.0.1-
2003-09-24-23:51:12 import") and 134:d57c6835225c ("nginx-0.0.1-
2003-09-26-09:45:21 import") as part of gzip wrapping feature - back
then zlib did not support it.
However, since then gzip wrapping was implemented in zlib v1.2.0.4,
and it's already being used by nginx for log compression.
This patch replaces hand-written gzip wrapping with the one provided by
zlib. It simplifies the code, and makes it avoid computing CRC32 twice
when using hardware acceleration.
[1] https://github.com/madler/zlib/pull/410
Similarly to the change in 5491:74bfa803a5aa (1.5.9), we should accept
properly escaped URIs and unescape them as needed, else it is not possible
to handle URIs with question marks.
As we now have ctx->header_sent flag, it is further used to prevent
duplicate $r->send_http_header() calls, prevent output before sending
header, and $r->internal_redirect() after sending header.
Further, $r->send_http_header() protected from calls after
$r->internal_redirect().
Returning NGX_HTTP_INTERNAL_SERVER_ERROR if a perl code died after
sending header will lead to a "header already sent" alert. To avoid
it, we now check if header was already sent, and return NGX_ERROR
instead if it was.
Previously, redirects scheduled with $r->internal_redirect() were followed
even if the code then died. Now these are ignored and nginx will return
an error instead.
Variable handlers are not expected to send anything to the client, cannot
sleep or read body, and are not expected to modify the request. Added
appropriate protection to prevent accidental foot shooting.
Duplicate $r->sleep() and/or $r->has_request_body() calls result
in undefined behaviour (in practice, connection leaks were observed).
To prevent this, croak() added in appropriate places.
Previously, allocation errors in nginx.xs were more or less ignored,
potentially resulting in incorrect code execution in specific low-memory
conditions. This is changed to use ctx->error bit and croak(), similarly
to how output errors are now handled.
Note that this is mostly a cosmetic change, as Perl itself exits on memory
allocation errors, and hence nginx with Perl is hardly usable in low-memory
conditions.
When an error happens, the ctx->error bit is now set, and croak()
is called to terminate further processing. The ctx->error bit is
checked in ngx_http_perl_call_handler() to cancel further processing,
and is also checked in various output functions - to make sure these won't
be called if croak() was handled by an eval{} in perl code.
In particular, this ensures that output chain won't be called after
errors, as filters might not expect this to happen. This fixes some
segmentation faults under low memory conditions. Also this stops
request processing after filter finalization or request body reading
errors.
For cases where an HTTP error status can be additionally returned (for
example, 416 (Requested Range Not Satisfiable) from the range filter),
the ctx->status field is also added.
This ensures that correct ctx is always available, including after
filter finalization. In particular, this fixes a segmentation fault
with the following configuration:
location / {
image_filter test;
perl 'sub {
my $r = shift;
$r->send_http_header();
$r->print("foo\n");
$r->print("bar\n");
}';
}
This also seems to be the only way to correctly handle filter finalization
in various complex cases, for example, when embedded perl is used both
in the original handler and in an error page called after filter
finalization.
The NGX_DONE test in ngx_http_perl_handle_request() was introduced
in 1702:86bb52e28ce0, which also modified ngx_http_perl_call_handler()
to return NGX_DONE with c->destroyed. The latter part was then
removed in 3050:f54b02dbb12b, so NGX_DONE test is no longer needed.
Embedded perl does not set any request fields needed for conditional
requests processing. Further, filter finalization in the not_modified
filter can cause segmentation faults due to cleared ctx as in
ticket #1786.
Before 5fb1e57c758a (1.7.3) the not_modified filter was implicitly disabled
for perl responses, as r->headers_out.last_modified_time was -1. This
change restores this behaviour by using the explicit r->disable_not_modified
flag.
Note that this patch doesn't try to address perl module robustness against
filter finalization and other errors returned from filter chains. It should
be eventually reworked to handle errors instead of ignoring them.
A new directive limit_req_dry_run allows enabling the dry run mode. In this
mode requests are neither rejected nor delayed, but reject/delay status is
logged as usual.
In case of filter finalization, essential request fields like r->uri,
r->args etc could be changed, which affected the cache update subrequest.
Also, after filter finalization r->cache could be set to NULL, leading to
null pointer dereference in ngx_http_upstream_cache_background_update().
The fix is to create background cache update subrequest before sending the
cached response.
Since initial introduction in 1aeaae6e9446 (1.11.10) background cache update
subrequest was created after sending the cached response because otherwise it
blocked the parent request output. In 9552758a786e (1.13.1) background
subrequests were introduced to eliminate the delay before sending the final
part of the cached response. This also made it possible to create the
background cache update subrequest before sending the response.
Note that creating the subrequest earlier does not change the fact that in case
of filter finalization the background cache update subrequest will likely not
have enough time to successfully update the cache entry. Filter finalization
leads to the main request termination as soon the current iteration of request
processing is complete.
In ngx_http_range_singlepart_body() special buffers where passed
unmodified, including ones after the end of the range. As such,
if the last buffer of a response was sent separately as a special
buffer, two buffers with b->last_buf set were present in the response.
In particular, this might result in a duplicate final chunk when using
chunked transfer encoding (normally range filter and chunked transfer
encoding are not used together, but this may happen if there are trailers
in the response). This also likely to cause problems in HTTP/2.
Fix is to skip all special buffers after we've sent the last part of
the range requested. These special buffers are not meaningful anyway,
since we set b->last_buf in the buffer with the last part of the range,
and everything is expected to be flushed due to it.
Additionally, ngx_http_next_body_filter() is now called even
if no buffers are to be passed to it. This ensures that various
write events are properly propagated through the filter chain. In
particular, this fixes test failures observed with the above change
and aio enabled.
Filters are not allowed to change incoming chain links, and should allocate
their own links if any modifications are needed. Nevertheless
ngx_http_range_singlepart_body() modified incoming chain links in some
cases, notably at the end of the requested range.
No problems caused by this are currently known, mostly because of
limited number of possible modifications and the position of the range
body filter in the filter chain. Though this behaviour is clearly incorrect
and tests demonstrate that it can at least cause some proxy buffers being
lost when using proxy_force_ranges, leading to less effective handling
of responses.
Fix is to always allocate new chain links in ngx_http_range_singlepart_body().
Links are explicitly freed to ensure constant memory usage with long-lived
requests.
If a complex value is expected to be of type size_t, and the compiled
value is constant, the constant size_t value is remembered at compile
time.
The value is accessed through ngx_http_complex_value_size() which
either returns the remembered constant or evaluates the expression
and parses it as size_t.
Previously, ngx_utf8_decode() was called from ngx_utf8_length() with
incorrect length, potentially resulting in out-of-bounds read when
handling invalid UTF-8 strings.
In practice out-of-bounds reads are not possible though, as autoindex, the
only user of ngx_utf8_length(), provides null-terminated strings, and
ngx_utf8_decode() anyway returns an errors when it sees a null in the
middle of an UTF-8 sequence.
Reported by Yunbin Liu.
If OCSP stapling was enabled with dynamic certificate loading, with some
OpenSSL versions (1.0.2o and older, 1.1.0h and older; fixed in 1.0.2p,
1.1.0i, 1.1.1) a segmentation fault might happen.
The reason is that during an abbreviated handshake the certificate
callback is not called, but the certificate status callback was called
(https://github.com/openssl/openssl/issues/1662), leading to NULL being
returned from SSL_get_certificate().
Fix is to explicitly check SSL_get_certificate() result.
If X509_get_issuer_name() or X509_get_subject_name() returned NULL,
this could lead to a certificate reference leak. It cannot happen
in practice though, since each function returns an internal pointer
to a mandatory subfield of the certificate successfully decoded by
d2i_X509() during certificate message processing (closes#1751).
Previously the ngx_inet_resolve_host() function sorted addresses in a way that
IPv4 addresses came before IPv6 addresses. This was implemented in eaf95350d75c
(1.3.10) along with the introduction of getaddrinfo() which could resolve host
names to IPv6 addresses. Since the "listen" directive only used the first
address, sorting allowed to preserve "listen" compatibility with the previous
behavior and with the behavior of nginx built without IPv6 support. Now
"listen" uses all resolved addresses which makes sorting pointless.
Previously only one address was used by the listen directive handler even if
host name resolved to multiple addresses. Now a separate listening socket is
created for each address.
This makes it possible to provide certificates directly via variables
in ssl_certificate / ssl_certificate_key directives, without using
intermediate files.
It was accidentally introduced in 77436d9951a1 (1.15.9). In MSVC 2015
and more recent MSVC versions it triggers warning C4456 (declaration of
'pkey' hides previous local declaration). Previously, all such warnings
were resolved in 2a621245f4cf.
Reported by Steve Stevenson.
Server name callback is always called by OpenSSL, even
if server_name extension is not present in ClientHello. As such,
checking c->ssl->handshaked before the SSL_get_servername() result
should help to more effectively prevent renegotiation in
OpenSSL 1.1.0 - 1.1.0g, where neither SSL3_FLAGS_NO_RENEGOTIATE_CIPHERS
nor SSL_OP_NO_RENEGOTIATION is available.
The SSL_OP_NO_CLIENT_RENEGOTIATION option was introduced in LibreSSL 2.5.1.
Unlike OpenSSL's SSL_OP_NO_RENEGOTIATION, it only disables client-initiated
renegotiation, and hence can be safely used on all SSL contexts.
If ngx_pool_cleanup_add() fails, we have to clean just created SSL context
manually, thus appropriate call added.
Additionally, ngx_pool_cleanup_add() moved closer to ngx_ssl_create() in
the ngx_http_ssl_module, to make sure there are no leaks due to intermediate
code.
Notably this affects various allocation errors, and should generally
improve things if an allocation error actually happens during a callback.
Depending on the OpenSSL version, returning an error can result in
either SSL_R_CALLBACK_FAILED or SSL_R_CLIENTHELLO_TLSEXT error from
SSL_do_handshake(), so both errors were switched to the "info" level.
OpenSSL 1.1.1 does not save server name to the session if server name
callback returns anything but SSL_TLSEXT_ERR_OK, thus breaking
the $ssl_server_name variable in resumed sessions.
Since $ssl_server_name can be used even if we've selected the default
server and there are no other servers, it looks like the only viable
solution is to always return SSL_TLSEXT_ERR_OK regardless of the actual
result.
To fix things in the stream module as well, added a dummy server name
callback which always returns SSL_TLSEXT_ERR_OK.
A virtual server may have no SSL context if it does not have certificates
defined, so we have to use config of the ngx_http_ssl_module from the
SSL context in the certificate callback. To do so, it is now passed as
the argument of the callback.
The stream module doesn't really need any changes, but was modified as
well to match http code.
Dynamic certificates re-introduce problem with incorrect session
reuse (AKA "virtual host confusion", CVE-2014-3616), since there are
no server certificates to generate session id context from.
To prevent this, session id context is now generated from ssl_certificate
directives as specified in the configuration. This approach prevents
incorrect session reuse in most cases, while still allowing sharing
sessions across multiple machines with ssl_session_ticket_key set as
long as configurations are identical.
Passwords have to be copied to the configuration pool to be used
at runtime. Also, to prevent blocking on stdin (with "daemon off;")
an empty password list is provided.
To make things simpler, password handling was modified to allow
an empty array (with 0 elements and elts set to NULL) as an equivalent
of an array with 1 empty password.
To evaluate variables, a request is created in the certificate callback,
and then freed. To do this without side effects on the stub_status
counters and connection state, an additional function was introduced,
ngx_http_alloc_request().
Only works with OpenSSL 1.0.2+, since there is no SSL_CTX_set_cert_cb()
in older versions.
This makes it possible to reuse certificate loading at runtime,
as introduced in the following patches.
Additionally, this improves error logging, so nginx will now log
human-friendly messages "cannot load certificate" instead of only
referring to sometimes cryptic names of OpenSSL functions.
The "(SSL:)" snippet currently appears in logs when nginx code uses
ngx_ssl_error() to log an error, but OpenSSL's error queue is empty.
This can happen either because the error wasn't in fact from OpenSSL,
or because OpenSSL did not indicate the error in the error queue
for some reason.
In particular, currently "(SSL:)" can be seen in errors at least in
the following cases:
- When SSL_write() fails due to a syscall error,
"[info] ... SSL_write() failed (SSL:) (32: Broken pipe)...".
- When loading a certificate with no data in it,
"[emerg] PEM_read_bio_X509_AUX(...) failed (SSL:)".
This can easily happen due to an additional empty line before
the end line, so all lines of the certificate are interpreted
as header lines.
- When trying to configure an unknown curve,
"[emerg] SSL_CTX_set1_curves_list("foo") failed (SSL:)".
Likely there are other cases as well.
With this change, "(SSL:...)" will be only added to the error message
if there is something in the error queue. This is expected to make
logs more readable in the above cases. Additionally, with this change
it is now possible to use ngx_ssl_error() to log errors when some
of the possible errors are not from OpenSSL and not expected to have
anything in the error queue.
Checking multiple errors at once is a bad practice, as in general
it is not guaranteed that an object can be used after the error.
In this particular case, checking errors after multiple allocations
can result in excessive errors being logged when there is no memory
available.
On Windows, connect() errors are only reported via exceptfds descriptor set
from select(). Previously exceptfds was set to NULL, and connect() errors
were not detected at all, so connects to closed ports were waiting till
a timeout occurred.
Since ongoing connect() means that there will be a write event active,
except descriptor set is copied from the write one. While it is possible
to construct except descriptor set as a concatenation of both read and write
descriptor sets, this looks unneeded.
With this change, connect() errors are properly detected now when using
select(). Note well that it is not possible to detect connect() errors with
WSAPoll() (see https://daniel.haxx.se/blog/2012/10/10/wsapoll-is-broken/).
WSAPoll() is only available with Windows Vista and newer (and only
available during compilation if _WIN32_WINNT >= 0x0600). To make
sure the code works with Windows XP, we do not redefine _WIN32_WINNT,
but instead load WSAPoll() dynamically if it is not available during
compilation.
Also, sockets are not guaranteed to be small integers on Windows.
So an index array is used instead of NGX_USE_FD_EVENT to map
events to connections.
Previously, the code incorrectly assumed "ngx_event_t *" elements
instead of "struct pollfd".
This is mostly cosmetic change, as this code is never called now.
Previously, when using proxy_upload_rate and proxy_download_rate, the buffer
size for reading from a socket could be reduced as a result of rate limiting.
For connection-oriented protocols this behavior is normal since unread data will
normally be read at the next iteration. But for datagram-oriented protocols
this is not the case, and unread part of the datagram is lost.
Now buffer size is not limited for datagrams. Rate limiting still works in this
case by delaying the next reading event.