Previously, unix sockets were treated as AF_INET ones, and this may
result in buffer overread on Linux, where unbound unix sockets have
2-byte addresses.
Note that it is not correct to use just sun_path as a binary representation
for unix sockets. This will result in an empty string for unbound unix
sockets, and thus behaviour of limit_req and limit_conn will change when
switching from $remote_addr to $binary_remote_addr. As such, normal text
representation is used.
Reported by Stephan Dollberg.
At least FreeBSD, macOS, NetBSD, and OpenBSD can return unix sockets
with non-null-terminated sun_path. Additionally, the address may become
non-null-terminated if it does not fit into the buffer provided and was
truncated (may happen on macOS, NetBSD, and Solaris, which allow unix socket
addresess larger than struct sockaddr_un). As such, ngx_sock_ntop() might
overread the sockaddr provided, as it used "%s" format and thus assumed
null-terminated string.
To fix this, the ngx_strnlen() function was introduced, and it is now used
to calculate correct length of sun_path.
Some OSes (notably macOS, NetBSD, and Solaris) allow unix socket addresses
larger than struct sockaddr_un. Moreover, some of them (macOS, Solaris)
return socklen of the socket address before it was truncated to fit the
buffer provided. As such, on these systems socklen must not be used without
additional check that it is within the buffer provided.
Appropriate checks added to ngx_event_accept() (after accept()),
ngx_event_recvmsg() (after recvmsg()), and ngx_set_inherited_sockets()
(after getsockname()).
We also obtain socket addresses via getsockname() in
ngx_connection_local_sockaddr(), but it does not need any checks as
it is only used for INET and INET6 sockets (as there can be no
wildcard unix sockets).
The sync flag of HTTP/2 request body buffer is used when the size of request
body is unknown or bigger than configured "client_body_buffer_size". In this
case the buffer points to body data inside the global receive buffer that is
used for reading all HTTP/2 connections in the worker process. Thus, when the
sync flag is set, the buffer must be flushed to a temporary file, otherwise
the request body data can be overwritten.
Previously, the sync buffer wasn't flushed to a temporary file if the whole
body was received in one DATA frame with the END_STREAM flag and wasn't
copied into the HTTP/2 body preread buffer. As a result, the request body
might be corrupted (ticket #1384).
Now, setting r->request_body_in_file_only enforces writing the sync buffer
to a temporary file in all cases.
When switching to a next upstream, some buffers could be stuck in the middle
of the filter chain. A condition existed that raised an error when this
happened. As it turned out, this condition prevented switching to a next
upstream if ssl preread was used with the TCP protocol (see the ticket).
In fact, the condition does not make sense for TCP, since after successful
connection to an upstream switching to another upstream never happens. As for
UDP, the issue with stuck buffers is unlikely to happen, but is still possible.
Specifically, if a filter delays sending data to upstream.
The condition can be relaxed to only check the "buffered" bitmask of the
upstream connection. The new condition is simpler and fixes the ticket issue
as well. Additionally, the upstream_out chain is now reset for UDP prior to
connecting to a new upstream to prevent repeating the client data twice.
Total length of a response with multiple ranges can be larger than a size_t
variable can hold, so type changed to off_t. Previously, an incorrect
Content-Length was returned when requesting more than 4G of ranges from
a large enough file on a 32-bit system.
An additional size_t variable introduced to calculate size of the boundary
header buffer, as off_t is not needed here and will require type casts on
win32.
Reported by Shuxin Yang,
http://mailman.nginx.org/pipermail/nginx/2017-July/054384.html.
When closing a socket with SO_REUSEPORT, Linux drops all connections waiting
in this socket's listen queue. Previously, it was believed to only result
in connection resets when reconfiguring nginx to use smaller number of worker
processes. It also results in connection resets during configuration
testing though.
Workaround is to avoid using SO_REUSEPORT when testing configuration. It
should prevent listening sockets from being created if a conflicting socket
already exists, while still preserving detection of other possible errors.
It should also cover UDP sockets.
The only downside of this approach seems to be that a configuration testing
won't be able to properly report the case when nginx was compiled with
SO_REUSEPORT, but the kernel is not able to set it. Such errors will be
reported on a real start instead.
Previously, the read event of the accepted connection was marked ready, but not
available. This made EPOLLRDHUP-related code (for example, in ngx_unix_recv())
expect more data from the socket, leading to unexpected behavior.
For example, if SSL, PROXY protocol and deferred accept were enabled on a listen
socket, the client connection was aborted due to unexpected return value of
c->recv().
The overflow can be used to circumvent the restriction on total size of
ranges introduced in c2a91088b0c0 (1.1.2). Additionally, overflow
allows producing ranges with negative start (such ranges can be created
by using a suffix, "bytes=-100"; normally this results in 200 due to
the total size check). These can result in the following errors in logs:
[crit] ... pread() ... failed (22: Invalid argument)
[alert] ... sendfile() failed (22: Invalid argument)
When using cache, it can be also used to reveal cache file header.
It is believed that there are no other negative effects, at least with
standard nginx modules.
In theory, this can also result in memory disclosure and/or segmentation
faults if multiple ranges are allowed, and the response is returned in a
single in-memory buffer. This never happens with standard nginx modules
though, as well as known 3rd party modules.
Fix is to properly protect from possible overflow when incrementing size.
This change adds "http_429" parameter to "proxy_next_upstream" for
retrying rate-limited requests, and to "proxy_cache_use_stale" for
serving stale cached responses after being rate-limited.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
This change adds reason phrase in status line and pretty response body
when "429" status code is used in "return", "limit_conn_status" and/or
"limit_req_status" directives.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
When a slice subrequest was redirected to a new location, its context was lost.
After its completion, a new slice subrequest for the same slice was created.
This could lead to infinite loop. Now the slice module makes sure each slice
subrequest starts output with the slice context available.
With post_action or subrequests, it is possible that the timer set for
wev->delayed will expire while the active subrequest write event handler
is not ready to handle this. This results in request hangs as observed
with limit_rate / sendfile_max_chunk and post_action (ticket #776) or
subrequests (ticket #1228).
Moving the handling to the connection event handler fixes the hangs observed,
and also slightly simplifies the code.
Since limit_req uses connection's write event to delay request processing,
it can conflict with timers in other subrequests. In particular, even
if applied to an active subrequest, it can break things if wev->delayed
is already set (due to limit_rate or sendfile_max_chunk), since after
limit_req finishes the wev->delayed flag will be set and no timer will be
active.
Fix is to use the wev->delayed flag in limit_req as well. This ensures that
wev->delayed won't be set after limit_req finishes, and also ensures that
limit_req's timers will be properly handled by other subrequests if the one
delayed by limit_req is not active.
All streams in connection must be finalized before the connection
itself can be finalized and all related memory is freed. That's
not always possible on the current event loop iteration.
Thus when the last stream is finalized, it sets the special read
event handler ngx_http_v2_handle_connection_handler() and posts
the event.
Previously, this handler didn't check the connection state and
could call the regular event handler on a connection that was
already in finalization stage. In the worst case that could
lead to a segmentation fault, since some data structures aren't
supposed to be used during connection finalization. Particularly,
the waiting queue can contain already freed streams, so the
WINDOW_UPDATE frame received by that moment could trigger
accessing to these freed streams.
Now, the connection error flag is explicitly checked in
ngx_http_v2_handle_connection_handler().
In order to finalize stream the error flag is set on fake connection and
either "write" or "read" event handler is called. The read events of fake
connections are always ready, but it's not the case with the write events.
When the ready flag isn't set, the error flag can be not checked in some
cases and as a result stream isn't finalized. Now the ready flag is
explicilty set on write events for proper finalization in all cases.
Previously, flow control didn't account for padding in DATA frames,
which meant that its view of the world could drift from peer's view
by up to 256 bytes per received padded DATA frame, which could lead
to a deadlock.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Previously, its value accounted for payloads of HEADERS, CONTINUATION
and DATA frames, as well as frame headers of HEADERS and DATA frames,
but it didn't account for frame headers of CONTINUATION frames.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Previously, connection write handler was called, resulting in wake up
of the active subrequest. This change makes it possible to read data
in non-active subrequests as well. For example, this allows SSI to
process instructions in non-active subrequests earlier and start
additional subrequests if needed, reducing overall response time.
If the subrequest is already finalized, the handler set with aio_write
may still be used by sendfile in threads when using range requests
(see also e4c1f5b32868, and the original note in 9fd738b85fad). Calling
already finalized subrequest's r->write_event_handler in practice
results in request hang in some cases.
Fix is to trigger connection event handler if the subrequest was already
finalized.
The ngx_linux_sendfile() function is now used for both normal sendfile()
and sendfile in threads. The ngx_linux_sendfile_thread() function was
modified to use the same interface as ngx_linux_sendfile(), and is simply
called from ngx_linux_sendfile() when threads are enabled.
Special return code NGX_DONE is used to indicate that a thread task was
posted and no further actions are needed.
If number of bytes sent is less that what we were sending, we now always
retry sending. This is needed for sendfile() in threads as the number
of bytes we are sending might have been changed since the thread task
was posted. And this is also needed for Linux 4.3+, as sendfile() might
be interrupted at any time and provides no indication if it was interrupted
or not (ticket #1174).
If of.err is 0, it means that there was a memory allocation error
and no further logging and/or processing is needed. The of.failed
string can be only accessed if of.err is not 0.