Since the introduction of FIONREAD usage in efd71d49bde0, rev->available might
become 0 when reading exactly rev->available bytes. On the next
ngx_readv_chain() call on systems with EPOLLRDHUP this resulted in a return
without any action, that is, with rev->ready still set, and this in turn
resulted in no timers being set in the event pipe, leading to socket leaks.
Fix is to reset rev->ready in ngx_readv_chain() when returning due to
rev->available being 0 with EPOLLRDHUP, much like it is already done in
ngx_unix_recv(). This ensures that if rev->available becomes 0, appropriate
EPOLLRDHUP-specific handling will happen on the next ngx_readv_chain() call
on systems with EPOLLRDHUP support.
While here, also synced ngx_readv_chain() to match ngx_unix_recv() and
reset rev->ready when returning due to rev->available being 0 with kqueue.
This is a mostly cosmetic change, as rev->ready is reset anyway when
rev->available is set to 0.
Some servers might emit a Content-Range header on 200 responses, and this
does not seem to contradict RFC 9110: as per RFC 9110, the Content-Range
header has no meaning for status codes other than 206 and 416. Previously
this resulted in duplicate Content-Range headers in nginx responses handled
by the range filter. Fix is to clear pre-existing headers.
Starting with OpenSSL 1.1.1, various additional errors can be reported
by OpenSSL in case of client-related issues, most notably during TLSv1.3
handshakes. In particular, SSL_R_BAD_KEY_SHARE ("bad key share"),
SSL_R_BAD_EXTENSION ("bad extension"), SSL_R_BAD_CIPHER ("bad cipher"),
SSL_R_BAD_ECPOINT ("bad ecpoint"). These are now logged at the "info"
level.
To ensure optimal use of memory, SSL contexts for proxying are now
inherited from previous levels as long as relevant proxy_ssl_* directives
are not redefined.
Further, when no proxy_ssl_* directives are redefined in a server block,
we now preserve plcf->upstream.ssl in the "http" section configuration
to inherit it to all servers.
Similar changes were made in the uwsgi, grpc, and stream proxy modules.
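A hedged configuration sketch of the new inheritance behaviour (hostnames and
values are placeholders, not from the change itself):

    http {
        proxy_ssl_protocols TLSv1.2 TLSv1.3;
        proxy_ssl_ciphers   DEFAULT;

        server {
            listen 8080;

            location / {
                # no proxy_ssl_* directives are redefined here, so the
                # SSL context created at the "http" level is reused
                proxy_pass https://backend.example.com;
            }
        }
    }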
Similar to 70e65bf8dfd7, the change is made to ensure that the ability to
cancel resolver tasks is fully controlled by the caller. As mentioned in the
referenced commit, it is safe to make this timer cancelable because resolve
tasks can have their own timeouts that are not cancelable.
The scenario where this may become a problem is a periodic background resolve
task (not tied to a specific request or a client connection) which receives a
response that has a short TTL and is large enough to warrant fallback to a TCP
query.
With each event loop wakeup, we either have a previously set write timer
instance or schedule a new one. The non-cancelable write timer can delay or
block graceful shutdown of a worker even if the ngx_resolver_ctx_t->cancelable
flag is set by the API user, and there are no other tasks or connections.
We use the resolver API in this way to maintain the list of upstream server
addresses specified with the 'resolve' parameter, and there could be third-party
modules implementing similar logic.
When we generate the last_buf buffer for a UDP upstream recv error, it does
not contain any data from the wire. ngx_stream_write_filter attempts to forward
it anyway, which is incorrect (e.g., a UDP upstream ECONNREFUSED will be
translated to an empty packet).
This happens because we mark the buffer as both 'flush' and 'last_buf', and
ngx_stream_write_filter has special handling for flush with certain types of
connections (see d127837c714f, 32b0ba4855a6). The flags are meant to be
mutually exclusive, so the fix is to ensure that flush and last_buf are not set
at the same time.
Reproduction:
    stream {
        upstream unreachable {
            server 127.0.0.1:8880;
        }

        server {
            listen 127.0.0.1:8998 udp;
            proxy_pass unreachable;
        }
    }
1 0.000000000 127.0.0.1 → 127.0.0.1 UDP 47 45588 → 8998 Len=5
2 0.000166300 127.0.0.1 → 127.0.0.1 UDP 47 51149 → 8880 Len=5
3 0.000172600 127.0.0.1 → 127.0.0.1 ICMP 75 Destination unreachable (Port
unreachable)
4 0.000202400 127.0.0.1 → 127.0.0.1 UDP 42 8998 → 45588 Len=0
Fixes d127837c714f.
Both "count" and "duration" variables are 32-bit, so their product might
potentially overflow. It is used to reduce 64-bit start_time variable,
and with very large start_time this can result in incorrect seeking.
Found by Coverity (CID 1499904).
Now, if the directive is given an empty string, such a configuration cancels
loading of certificates, in particular when they would otherwise be inherited
from the previous level. This restores the previous behaviour, before variables
support in certificates was introduced (3ab8e1e2f0f7).
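A minimal sketch, assuming the directive in question is one of the certificate
directives with variables support, such as proxy_ssl_certificate (paths are
placeholders):

    server {
        listen 8080;

        proxy_ssl_certificate     client.crt;
        proxy_ssl_certificate_key client.key;

        location /nocert/ {
            # empty strings cancel the certificates inherited from above
            proxy_ssl_certificate     "";
            proxy_ssl_certificate_key "";
            proxy_pass https://backend.example.com;
        }
    }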
Previously, if caching was disabled due to Expires in the past, nginx
failed to cache the response even if it was cacheable as per subsequently
parsed Cache-Control header (ticket #964).
Similarly, if caching was disabled due to Expires in the past,
"Cache-Control: no-cache" or "Cache-Control: max-age=0", caching was not
used if it was cacheable as per subsequently parsed X-Accel-Expires header.
Fix is to avoid disabling caching immediately after parsing Expires in
the past or Cache-Control, but rather set flags which are later checked by
ngx_http_upstream_process_headers() (and cleared by "Cache-Control: max-age"
and X-Accel-Expires).
Additionally, now X-Accel-Expires does not prevent parsing of cache control
extensions, notably stale-while-revalidate and stale-if-error. This
ensures that order of the X-Accel-Expires and Cache-Control headers is not
important.
Prodded by Vadim Fedorenko and Yugo Horie.
If a module adds multiple WWW-Authenticate headers (ticket #485) to the
response, linked in r->headers_out.www_authenticate, all headers are now
cleared if another module later allows access.
This change is a nop for standard modules, since the only access module which
can add multiple WWW-Authenticate headers is the auth request module, and
it is checked after other standard access modules. Though this might
affect some third party access modules.
Note that if a 3rd party module adds a single WWW-Authenticate header
and is not yet modified to set the header's next pointer to NULL, an attempt
to clear such a header with this change will result in a segmentation fault.
When using auth_request with an upstream server which returns 401
(Unauthorized), multiple WWW-Authenticate headers from the upstream server
response are now properly copied to the response.
When using proxy_intercept_errors and an error page for error 401
(Unauthorized), multiple WWW-Authenticate headers from the upstream server
response are now properly copied to the response.
Most of the known duplicate upstream response headers are now ignored
with a warning.
If syntax permits multiple headers, these are now properly linked to
the lists, notably Vary and WWW-Authenticate. This makes it possible
to further handle such lists where it makes sense.
With this change, duplicate Content-Length and Transfer-Encoding headers
are now rejected. Further, responses with invalid Content-Length or
Transfer-Encoding headers are now rejected, as well as responses with both
Content-Length and Transfer-Encoding.
The h->next pointer is now properly provided as NULL in all cases where known
output headers are added.
Note that there are 3rd party modules which might not do this, and it
might be risky to rely on this for arbitrary headers.
Since introduction of offset handling in ngx_http_upstream_copy_header_line()
in revision 573:58475592100c, the ngx_http_upstream_copy_content_encoding()
function is no longer needed, as its behaviour is exactly equivalent to
ngx_http_upstream_copy_header_line() with appropriate offset. As such,
the ngx_http_upstream_copy_content_encoding() function was removed.
Further, the u->headers_in.content_encoding field is not used anywhere,
so it was removed as well.
Further, Content-Encoding handling no longer depends on NGX_HTTP_GZIP,
as it can be used even without any gzip handling compiled in (for example,
in the charset filter).
As all known input headers are now linked lists, these are now handled
identically. In particular, this makes it possible to access properly
combined values of headers not specifically handled previously, such
as "Via" or "Connection".
The ngx_http_process_multi_header_lines() function is removed, as it is
exactly equivalent to ngx_http_process_header_line(). Similarly,
ngx_http_variable_header() is used instead of ngx_http_variable_headers().
Multi headers are now using linked lists instead of arrays. Notably,
the following fields were changed: r->headers_in.cookies (renamed
to r->headers_in.cookie), r->headers_in.x_forwarded_for,
r->headers_out.cache_control, r->headers_out.link, u->headers_in.cache_control,
and u->headers_in.cookies (renamed to u->headers_in.set_cookie).
The r->headers_in.cookies and u->headers_in.cookies fields were renamed
to r->headers_in.cookie and u->headers_in.set_cookie to match header names.
The ngx_http_parse_multi_header_lines() and ngx_http_parse_set_cookie_lines()
functions were changed accordingly.
With this change, multi headers are now essentially equivalent to normal
headers, and following changes will further make them equivalent.
Previously, $http_*, $sent_http_*, $sent_trailer_*, $upstream_http_*,
and $upstream_trailer_* variables returned only the first header (with
a few specially handled exceptions: $http_cookie, $http_x_forwarded_for,
$sent_http_cache_control, $sent_http_link).
With this change, all headers are returned, combined together. For
example, $http_foo variable will be "a, b" if there are "Foo: a" and
"Foo: b" headers in the request.
Note that $upstream_http_set_cookie will also return all "Set-Cookie"
headers (ticket #1843), though this might not be what one wants, since
the "Set-Cookie" header does not follow the list syntax (see RFC 7230,
section 3.2.2).
The uwsgi specification states that "The uwsgi block vars represent a
dictionary/hash". This implies that no duplicate headers are expected.
Further, provided headers are expected to follow CGI specification,
which also requires to combine headers (RFC 3875, section "4.1.18.
Protocol-Specific Meta-Variables"): "If multiple header fields with
the same field-name are received then the server MUST rewrite them
as a single value having the same semantics".
SCGI specification explicitly forbids headers with duplicate names
(section "3. Request Format"): "Duplicate names are not allowed in
the headers".
Further, provided headers are expected to follow CGI specification,
which also requires to combine headers (RFC 3875, section "4.1.18.
Protocol-Specific Meta-Variables"): "If multiple header fields with
the same field-name are received then the server MUST rewrite them
as a single value having the same semantics".
FastCGI responder is expected to receive CGI/1.1 environment variables
in the parameters (see section "6.2 Responder" of the FastCGI specification).
Obviously enough, there cannot be multiple environment variables with
the same name.
Further, CGI specification (RFC 3875, section "4.1.18. Protocol-Specific
Meta-Variables") explicitly requires to combine headers: "If multiple
header fields with the same field-name are received then the server MUST
rewrite them as a single value having the same semantics".
Previously, the r->header_in->connection pointer was never set despite
being present in ngx_http_headers_in, resulting in incorrect value returned
by $r->header_in("Connection") in embedded perl.
In 7583:efd71d49bde0 (nginx 1.17.5), along with the introduction of
ioctl(FIONREAD) support, proper handling of systems without EPOLLRDHUP
support in the kernel (but with EPOLLRDHUP defined in headers) was broken.
Before the change, rev->available was never set to 0 unless
ngx_use_epoll_rdhup was also set (that is, runtime test for EPOLLRDHUP
introduced in 6536:f7849bfb6d21 succeeded). After the change,
rev->available might reach 0 on systems without runtime EPOLLRDHUP
support, stopping further reading in ngx_readv_chain() and ngx_unix_recv().
And, if EOF happened to be already reported along with the last event,
it is not reported again by epoll_wait(), leading to connection hangs
and timeouts on such systems.
This affects Linux kernels before 2.6.17 if nginx was compiled
with newer headers, and, more importantly, emulation layers, such as
DigitalOcean's App Platform's / gVisor's epoll emulation layer.
Fix is to explicitly check ngx_use_epoll_rdhup before the corresponding
rev->pending_eof tests in ngx_readv_chain() and ngx_unix_recv().
Previously, QUIC used the existing UDP framework, which was created for UDP in
Stream. However, the way QUIC connections are created and looked up is different
from the way UDP connections in Stream are created and looked up. Now these
two implementations are decoupled.
Previously, the last buffer was tracked by keeping a pointer to the "next"
field of the previous chain link. When the previous buffer was split and then
removed, the pointer was no longer valid. Writing at this pointer resulted in
broken data chains.
Now the last buffer is tracked by keeping a direct pointer to it.
The object is used instead of an ngx_chain_t pointer for buffer operations like
ngx_quic_write_chain() and ngx_quic_read_chain(). These functions are renamed
to ngx_quic_write_buffer() and ngx_quic_read_buffer().
Such fatal errors are reported by OpenSSL 1.1.1, and similarly by BoringSSL,
if application data is encountered during SSL shutdown, which started to be
observed on the second SSL_shutdown() call after SSL shutdown fixes made in
09fb2135a589 (1.19.2). The error means that the client continues to send
application data after receiving the "close_notify" alert (ticket #2318).
Previously it was reported as an SSL_shutdown() error of type SSL_ERROR_SYSCALL.
Now ngx_quic_stream_t is decoupled from ngx_connection_t in a way that lets it
persist after the connection is closed by the application. During this period,
the server is expecting the stream's final size from the client for correct
flow control. Also, buffered output is sent to the client as more flow control
credit is granted.
As shown in RFC 8446, section 2.2, Figure 3, and further specified in
section 4.6.1, BoringSSL releases session tickets in Application Data
(along with Finished) early, based on a precalculated client Finished
transcript, once the client has signalled early data in extensions.
Initially, frames are generated and stored in ctx->frames.
Next, ngx_quic_output() collects frames to be sent in ctx->sending.
On failure, ngx_quic_revert_send() returns frames into ctx->frames.
On success, ngx_quic_commit_send() moves ack-eliciting frames into
ctx->sent and frees non-ack-eliciting frames.
This function also updates the in-flight bytes counter, so only actually sent
frames are accounted for.
The counter is decremented in the following cases:
- an acknowledgment is received
- a packet was declared lost
- the context is discarded completely
In each of these cases the frame is removed from the ctx->sent queue and the
in-flight counter is decremented accordingly.
The patch fixes the case of discarding context: removing frames from ctx->sent
must be followed by a decrement of the in-flight bytes counter, otherwise
cg->in_flight could experience type underflow.
The issue appeared in b1676cd64dc9.
Changeset cd8018bc81a5 fixed the unintended sending of non-padded initial
packets, but failed to restore the context properly: only processed contexts
need to be restored. As a consequence, a packet number could be restored
from an uninitialized value.
Previously, the flag could be reset after send_chain() with a limit, even
though there was room for more data. The application then started waiting for
a write event notification, which never happened.
Now the wev->ready flag is only reset when flow control is exhausted.
With large http2_max_concurrent_streams or http2_max_concurrent_pushes, more
than 255 ngx_http_v2_node_t structures might be allocated, eventually leading
to h2c->closed_nodes overflow when closing corresponding streams. This will
in turn result in additional allocations in ngx_http_v2_get_node_by_id().
While mostly harmless, it can result in excessive memory usage by an HTTP/2
connection, notably in configurations with many keepalive_requests allowed.
Fix is to use ngx_uint_t for h2c->closed_nodes instead of unsigned:8.
The switch happens when received byte counter reaches stream final size.
Previously, this state was skipped. The stream went from SIZE_KNOWN to
DATA_READ when all bytes were read by application.
The change prevents STOP_SENDING frames from being sent when all data is
received from client, but not yet fully read by application.
Previously, size was calculated based on the number of input bytes processed
by the function. Now only the copied bytes are considered. This prevents
overlapping buffers from contributing twice to the overall written size.
Response headers can be buffered in the SSL buffer. But the stream's fake
connection buffered flag did not reflect this, so any attempts to flush
the buffer without sending additional data were stopped by the write filter.
It does not seem possible to reflect this in fc->buffered though, as
we never know whether the main connection's c->buffered corresponds to the
particular stream or not. As such, fc->buffered might prevent request
finalization due to data being sent on some other stream.
Fix is to implement handling of flush buffers when the c->need_flush_buf
flag is set, similarly to the existing last buffer handling. The same
flag is now used for UDP sockets in the stream module instead of explicit
checking of c->type.
Previously, non-padded initial packet could be sent as a result of the
following situation:
- initial queue is not empty (so padding to 1200 is required)
- handshake queue is not empty (so padding is to be added after h/s packet)
- path is limited
If serializing a handshake packet would violate the path limit, such a packet
was omitted, and the non-padded initial packet was sent.
The fix is to avoid sending the packet at all in such case. This follows the
original intention introduced in c5155a0cb12f.
During configuration reload two cache managers might exist for a short
time. If both tried to delete the same cache node, the "ignore long locked
inactive cache entry" alert appeared in logs. Additionally,
ngx_http_file_cache_forced_expire() might be also called by worker
processes, with similar results.
Fix is to ignore cache nodes being deleted, similarly to how it is
done in ngx_http_file_cache_expire() since 3755:76e3a93821b1. This
was somehow missed in 7002:ab199f0eb8e8, when ignoring long locked
cache entries was introduced in ngx_http_file_cache_forced_expire().
- wording in log->action is adjusted to match function names.
- connection close steps are made obvious and start with "quic close" prefix:
*1 quic close initiated rc:-4
*1 quic close silent drain:0 timedout:1
*1 quic close resumed rc:-1
*1 quic close resumed rc:-1
*1 quic close resumed rc:-4
*1 quic close completed
this makes it easy to understand whether a particular "close" record is the
initial cause, a lasting process, or the final one.
- cases of close without a quic connection are now logged as "packet rejected":
*14 quic run
*14 quic packet rx long flags:ec version:1
*14 quic packet rx hs len:61
*14 quic packet rx dcid len:20 00000000000002c32f60e4aa2b90a64a39dc4228
*14 quic packet rx scid len:8 81190308612cd019
*14 quic expected initial, got handshake
*14 quic packet done rc:-1 level:hs decr:0 pn:0 perr:0
*14 quic packet rejected rc:-1, cleanup connection
*14 reusable connection: 0
this makes it easy to spot early packet rejection and avoid confusion with the
closing of a quic connection (which in this case was not even created).
- packet processing summary now uses same prefix "quic packet done rc:"
- added debug to places where packet was rejected without any reason logged
The NGX_DECLINED is replaced with NGX_DONE to more closely match the return
code of ngx_quic_handle_packet() and the rc argument of
ngx_quic_close_connection().
The ngx_quic_close_connection() rc code is used only when a quic connection
exists, thus anything goes if qc == NULL.
The ngx_quic_handle_datagram() does not return NGX_OK in cases when a quic
connection is not yet created.
Previously, closure detection for server-initiated uni streams was not properly
implemented. Instead, HTTP/3 code relied on QUIC code posting the read event
and setting rev->error when it needed to close the stream. Then, regular
uni stream read handler called c->recv() and received error, which closed the
stream. This was an ad-hoc solution. If, for whatever reason, the read
handler was called earlier, c->recv() would return 0, which would also close
the stream.
Now server-initiated uni streams have a separate read event handler for
tracking stream closure. The handler calls c->recv(), which normally returns
0, but may return error in case of closure.
Sending the instruction is delayed until the end of the current event cycle.
Delaying the instruction is allowed by quic-qpack-21, section 2.2.2.3.
The goal is to reduce the amount of data sent back to client by accumulating
several inserts in one instruction and sometimes not sending the instruction at
all, if Section Acknowledgement was sent just before it.
Operations like ngx_quic_open_stream(), ngx_http_quic_get_connection(),
ngx_http_v3_finalize_connection(), ngx_http_v3_shutdown_connection() used to
receive a QUIC stream connection. Now they can receive the main QUIC
connection as well. This is useful when calling them from a stream context.
This is to ease transition with oldish BoringSSL versions: the default
for SSL_set_quic_use_legacy_codepoint() was flipped in BoringSSL commit
a1d3bfb64fd7ef2cb178b5b515522ffd75d7b8c5.
Chrome only uses TLS session tickets once with TLS 1.3, likely following
RFC 8446 Appendix C.4 recommendation. With OpenSSL, this works fine with
built-in session tickets, since these are explicitly renewed in case of
TLS 1.3 on each session reuse, but results in only two connections being
reused after an initial handshake when using ssl_session_ticket_key.
Fix is to always renew TLS session tickets in case of TLS 1.3 when using
ssl_session_ticket_key, similarly to how it is done by OpenSSL internally.
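A sketch of a configuration affected by the fix (file names are placeholders);
with TLSv1.3, tickets issued under such keys are now renewed on each session
reuse:

    server {
        listen 443 ssl;

        ssl_certificate         example.crt;
        ssl_certificate_key     example.key;
        ssl_protocols           TLSv1.2 TLSv1.3;

        ssl_session_ticket_key  current.key;
        ssl_session_ticket_key  previous.key;
    }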
Previously, when input ended on a QUIC buffer boundary, input chain was not
advanced to the next buffer. As a result, ngx_quic_write_chain() returned
a chain with an empty buffer instead of NULL. This broke HTTP write filter,
preventing it from closing the HTTP request and eventually timing out.
Now input chain is always advanced to a buffer that has data, before checking
QUIC buffer boundary condition.
RFC 9000, 9.3. Responding to Connection Migration:
An endpoint only changes the address to which it sends packets in
response to the highest-numbered non-probing packet.
The patch extends this requirement to probing packets. Although it may
seem excessive, it helps with mitigation of replay attacks (when an off-path
attacker has copied a packet with PATH_CHALLENGE and uses different
addresses to exhaust available connection ids).
The quic connection now holds active, backup and probe paths instead
of sockets. The number of migration paths is now limited and cannot
be inflated by a bad client or an attacker.
The client id is now associated with a path rather than a socket. This makes
it possible to simplify processing of output and handling of connection ids.
A new migration abandons any previously started migrations. This makes it
possible to free consumed client ids, request new ones for use in future
migrations, and make progress in case the connection id limit is hit during
migration.
A path can now be revalidated without losing its state.
The patch also fixes various issues with NAT rebinding case handling:
- paths are now validated (previously, there was no validation
and paths were left in limited state)
- attempt to reuse id on different path is now again verified
(this was broken in 40445fc7c403)
- former path is now validated in case of apparent migration
The function splits a buffer at a given offset. It is now called from
ngx_quic_read_chain() and ngx_quic_write_chain(), which simplifies both
functions.
After 9ae239d2547d, ngx_quic_handle_write_event() no longer runs into
ngx_send_lowat() for QUIC connections, so the check became excessive.
It is assumed that external modules operating with SO_SNDLOWAT
(I'm not aware of any) should do this check on their own.
Previously, ngx_quic_write_chain() treated each input buffer as a memory
buffer, which is not always the case. Special buffers were not skipped, which
is especially important when hitting the input byte limit.
The issue manifested itself with ngx_quic_write_chain() returning a non-empty
chain consisting of a special last_buf buffer when called from QUIC stream
send_chain(). For this to happen, the input byte limit had to be equal to
the chain length, and the input chain had to end with an empty last_buf buffer.
An easy way to achieve this is the following:
    location /empty {
        return 200;
    }
When this non-empty chain was returned from send_chain(), it signalled to the
caller that input was blocked, while in fact it wasn't. This prevented HTTP
request from finalization, which prevented QUIC from sending STREAM FIN to
the client. The QUIC stream was then reset after a timeout.
Now special buffers are skipped and send_chain() returns NULL in the case
above, which signals to the caller a successful operation.
Also, original byte limit is now passed to ngx_quic_write_chain() from
send_chain() instead of actual chain length to make sure it's never zero.
Previously, when a STREAM FIN frame with no data bytes was received after all
prior stream data were already read by the application layer, the frame was
ignored and eof was not reported to the application.
When a worker process is shutting down, keepalive is not used: this is checked
before the ngx_http_set_keepalive() call in ngx_http_finalize_connection().
Yet the "Connection: keep-alive" header was still sent, even if we know that
the worker process is shutting down, potentially resulting in additional
requests being sent to the connection which is going to be closed anyway.
While clients are expected to be able to handle asynchronous close events
(see ticket #1022), it is certainly possible to send the "Connection: close"
header instead, informing the client that the connection is going to be closed
and potentially saving some unneeded work.
With this change, we additionally check for worker process shutdown just
before sending response headers, and disable keepalive accordingly.
Linux with EPOLLEXCLUSIVE usually notifies only the process which was first
to add the listening socket to the epoll instance. As a result most of the
connections are handled by the first worker process (ticket #2285). To fix
this, we re-add the socket periodically, so other workers will get a chance
to accept connections.
The SF_NOCACHE flag, introduced in FreeBSD 11 along with the new non-blocking
sendfile() implementation by glebius@, makes it possible to use sendfile()
along with the "directio" directive.
Starting with FreeBSD 11, there is no need to use AIO operations to preload
data into cache for sendfile(SF_NODISKIO) to work. Instead, sendfile()
handles non-blocking loading of data from disk by itself. It still can, however,
return EBUSY if a page is already being loaded (for example, by a different
process). If this happens, we now post an event for the next event loop
iteration, so sendfile() is retried "after a short period", as manpage
recommends.
The limit of the number of EBUSY tolerated without any progress is preserved,
but now it does not result in an alert, since on an idle system event loop
iteration might be very short and EBUSY can happen many times in a row.
Instead, SF_NODISKIO is simply disabled for one call once the limit is
reached.
With this change, sendfile(SF_NODISKIO) is now used automatically as long as
sendfile() is enabled, and no longer requires "aio on;".
It was mostly a copy of ngx_quic_listen(). Now ngx_quic_listen() no
longer generates a server id or increments seqnum. Instead, the server
id is generated when the socket is created.
The ngx_quic_alloc_socket() function is renamed to ngx_quic_create_socket().
Notably, NAXSI is known to misuse ngx_regex_compile() with rc.options set
to PCRE_CASELESS | PCRE_MULTILINE. With PCRE2 support, and notably binary
compatibility changes, it is no longer possible to set PCRE[2]_MULTILINE
option without using proper interface. To facilitate correct usage,
this change adds the NGX_REGEX_MULTILINE option.
With this change, dynamic modules using nginx regex interface can be used
regardless of the variant of the PCRE library nginx was compiled with.
If a module is compiled with a different PCRE library variant, it will report
a wrong function name in error messages in case of ngx_regex_exec() errors.
This is believed to be tolerable, given that fixing it would require interface
changes.
The PCRE2 library is now used by default if found, instead of the
original PCRE library. If needed for some reason, this can be disabled
with the --without-pcre2 configure option.
To make it possible to specify paths to the library and include files
via --with-cc-opt / --with-ld-opt, the library is first tested without
any additional paths and options. If this fails, the pcre2-config script
is used.
Similarly to the original PCRE library, it is now possible to build PCRE2
from sources with nginx configure, by using the --with-pcre= option.
It automatically detects if PCRE or PCRE2 sources are provided.
Note that compiling PCRE2 10.33 and later requires inttypes.h. When
compiling on Windows with MSVC, inttypes.h is only available starting
with MSVC 2013. In older versions some replacement needs to be provided
("echo '#include <stdint.h>' > pcre2-10.xx/src/inttypes.h" is good enough
for MSVC 2010).
The interface on nginx side remains unchanged.
If configuration parsing fails for some reason, ngx_regex_module_init()
is not called, and ngx_pcre_studies remains set despite the fact that
the pool it was allocated from is already freed. This might result in
a segmentation fault during runtime regular expression compilation, such
as in SSI, for example, in single process mode, or if a worker process
died and was respawned from a master process in such an inconsistent state.
Fix is to clear ngx_pcre_studies from the pool cleanup handler (which is
anyway used to free JIT-compiled patterns).
Previously, buffer lists were used to track used buffers. Now a reference
counter is used instead. The new implementation is simpler and faster with
many buffer clones.
They are replaced with ngx_quic_write_chain() and ngx_quic_read_chain().
These functions represent the API to data buffering.
The first function adds data of a given size at a given offset to the buffer.
It now returns the unwritten part of the chain, similar to c->send_chain().
The second function returns data of a given size from the beginning of the buffer.
Its second argument and return value are swapped compared to
ngx_quic_split_bufs() to better match ngx_quic_write_chain().
Added, returned and stored data are regular ngx_chain_t/ngx_buf_t chains.
Missing data is marked with b->sync flag.
The functions are now used in both send and recv data chains in QUIC streams.
Previously, when a few bytes were sent to a QUIC stream by the application, a
4K buffer was allocated for these bytes. Then a STREAM frame was created and
that entire buffer was used as data for that frame. The frame and the buffer
were in use up until the frame was acked by the client. Meanwhile, when more
bytes were sent to the stream, more buffers were allocated and assigned as
data to newer STREAM frames. In this scenario most buffer memory is unused.
Now the unused part of the stream output buffer is available for further
stream output while earlier parts of the buffer are waiting to be acked.
This is achieved by splitting the output buffer.
The output is always sent to the active path, which is stored in the
quic connection. There is no need to pass it in arguments.
When output has to be sent to a specific path (in rare cases, such as
path probing), a separate method exists (ngx_quic_frame_sendto()).
The path validation status and the anti-amplification limit status are actually
two different variables. It is possible that a path being validated should not
be limited (for example, when re-validating a former path).
Previously, a path was considered valid during an arbitrarily selected 10m
timeout since validation. This is not quite what RFC 9000 says; the relevant
part is:
An endpoint MAY skip validation of a peer address if that
address has been seen recently.
The patch considers a path to be 'recently seen' if packets were received
during the idle timeout. If a packet is received from a path that was seen
not so recently, such a path is considered new, and anti-amplification
restrictions apply.
After creation, a client stream is added to the qc->streams.uninitialized
queue. After initialization it is removed from the queue. If a stream is never
initialized, it is freed in ngx_quic_close_streams(). The stream initializer
is now set as the read event handler in the stream connection.
Previously qc->streams.uninitialized was used only for delayed stream
initialization.
The change makes it possible not to handle the case of a new stream separately
in stream-related frame handlers. It makes these handlers simpler, since new
streams and existing streams are now handled by the same code.
With sendfile() in threads ("aio threads; sendfile on;"), client connection
can block on writing, waiting for sendfile() to complete. In HTTP/2 this
might result in the request hang, since an attempt to continue processing
in thread event handler will call request's write event handler, which
is usually stopped by ngx_http_v2_send_chain(): it does nothing if there
are no additional data and stream->queued is set. Further, HTTP/2 resets
stream's c->write->ready to 0 if writing blocks, so just fixing
ngx_http_v2_send_chain() is not enough.
Can be reproduced with test suite on Linux with:
TEST_NGINX_GLOBALS_HTTP="aio threads; sendfile on;" prove h2*.t
The following tests currently fail: h2_keepalive.t, h2_priority.t,
h2_proxy_max_temp_file_size.t, h2.t, h2_trailers.t.
Similarly, sendfile() with AIO preloading on FreeBSD can block as well,
with similar results. This is, however, harder to reproduce, especially
on modern FreeBSD systems, since sendfile() usually does not return EBUSY.
Fix is to modify ngx_http_v2_send_chain() so it actually tries to send
data to the main connection when called, and to make sure that
c->write->ready is set by the relevant event handlers.
With sendfile in threads, "task already active" alerts might appear in logs
if a write event happens on the main HTTP/2 connection, triggering a sendfile
in threads while another thread operation is already running. Observed
with "aio threads; aio_write on; sendfile on;" and with thread event handlers
modified to post a write event to the main HTTP/2 connection (though can
happen without any modifications).
Similarly, sendfile() with AIO preloading on FreeBSD can trigger duplicate
aio operation, resulting in "second aio post" alerts. This is, however,
harder to reproduce, especially on modern FreeBSD systems, since sendfile()
usually does not return EBUSY.
Fix is to avoid starting a sendfile operation if other thread operation
is active by checking r->aio in the thread handler (and, similarly, in
aio preload handler). The added check also makes duplicate calls protection
redundant, so it is removed.
The SSL_OP_ENABLE_MIDDLEBOX_COMPAT option is provided by QuicTLS and enabled
by default in the newly created SSL contexts. SSL_set_quic_method() is used
to clear it, which is required for SSL handshake to work on QUIC connections.
Switching contexts in the ngx_http_ssl_servername() SNI callback overrides SSL
options from the new SSL context. This results in the option being set again.
Fix is to explicitly clear it when switching to another SSL context.
Initially reported here (in Russian):
http://mailman.nginx.org/pipermail/nginx-ru/2021-November/063989.html
ngx_http_v3_tables.h and ngx_http_v3_tables.c are renamed to
ngx_http_v3_table.h and ngx_http_v3_table.c to better match HTTP/2 code.
ngx_http_v3_streams.h and ngx_http_v3_streams.c are renamed to
ngx_http_v3_uni.h and ngx_http_v3_uni.c to better match their content.
Directives that set transport parameters are removed from the configuration.
Corresponding values are derived from the quic configuration or initialized
to default. Whenever possible, quic configuration parameters are taken from
higher-level protocol settings, i.e. HTTP/3.
A new variable $http3 is added. The variable equals "h3" for HTTP/3
connections, "hq" for hq connections, and is an empty string otherwise.
The variable $quic is eliminated.
The new variable is similar to $http2 variable.
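A hedged example of using the new variable in an access log (format name and
path are arbitrary):

    # logs "h3" for HTTP/3 connections, "hq" for hq, "" otherwise
    log_format h3 '$remote_addr "$request" $http3';
    access_log logs/access.log h3;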
RFC 9000 19.16
The sequence number specified in a RETIRE_CONNECTION_ID frame MUST NOT
refer to the Destination Connection ID field of the packet in which the
frame is contained.
Before the patch, the RETIRE_CONNECTION_ID frame was sent before switching
to the new client id. If the retired client id was currently in use, this led
to a violation of the spec.
The c->udp->dgram may be NULL only if the quic connection was just
created: ngx_event_udp_recvmsg() passes information about datagrams
to existing connections by providing it in c->udp.
In case of a new connection, c->udp is allocated by the QUIC code during
creation of the quic connection (it uses c->sockaddr to initialize qsock->path).
Thus the check for qsock->path is excessive and can be misread as implying
that other options are possible, leading to warnings from the clang static
analyzer.
Removed sending CONNECTION_CLOSE directly to avoid duplicate frames,
since it is sent again later in SSL_do_handshake() error handling.
Likewise, removed redundant settings of error fields set elsewhere.
While here, improved the debug message.
All open sockets are stored in a queue. There is no need to close some
of them separately. If it happens that active and backup point to the same
socket, a double close may happen (leading to a possible segfault).
RFC 9000 allows a packet with a known CID to arrive from an unknown path:
These requirements regarding connection ID reuse apply only to the
sending of packets, as unintentional changes in path without a change
in connection ID are possible. For example, after a period of
network inactivity, NAT rebinding might cause packets to be sent on a
new path when the client resumes sending.
Before the patch, such packets were rejected with an error in the
ngx_quic_check_migration() function. Removing the check makes the
separate function excessive: the remaining checks are the early migration
check and the "disable_active_migration" check. The latter is a transport
parameter sent to the client, and it should not be used by the server.
The server should send "disable_active_migration" "if the endpoint does
not support active connection migration" (18.2). The support status depends
on nginx configuration: to have migration working with multiple workers,
the bpf helper, available on recent Linux systems, is needed. The patch does
not set "disable_active_migration" automatically and leaves this to the
administrator. By default, active migration is enabled.
RFC 9000 says that it is ok to migrate if the peer violates the
"disable_active_migration" flag requirements:
If the peer violates this requirement,
the endpoint MUST either drop the incoming packets on that path without
generating a Stateless Reset
OR
proceed with path validation and allow the peer to migrate. Generating a
Stateless Reset or closing the connection would allow third parties in the
network to cause connections to close by spoofing or otherwise manipulating
observed traffic.
So, nginx adheres to the second option and proceeds to path validation.
Note:
The ngtcp2 may be used for testing both active migration and NAT rebinding:
ngtcp2/client --change-local-addr=200ms --delay-stream=500ms <ip> <port> <url>
ngtcp2/client --change-local-addr=200ms --delay-stream=500ms --nat-rebinding \
<ip> <port> <url>
A single UDP datagram may contain multiple QUIC packets. In order to
facilitate handling of such cases, the 'first' flag is introduced in the
ngx_quic_header_t structure.
Previously, a retired socket was not closed if it didn't match the active
or backup one.
New sockets could not be created (due to the count limit), since the retired
socket was not closed before calling ngx_quic_create_sockets().
When replacing a retired socket, a new socket is now only requested after
closing the old one, to avoid hitting the limit on the number of active
connection ids.
Together with the added restrictions, this fixes an issue when the current
socket could be closed during migration, recreated and erroneously reused,
leading to a null pointer dereference.
Previously the frame was not handled and the connection was closed with an
error. Now, after receiving this frame, global flow control is updated and
new flow control credit is sent to the client.
Previously, after receiving STREAM_DATA_BLOCKED, the current flow control limit
was sent to the client. Now, if the limit can be updated to the full window
size, it is updated and the new value is sent to the client; otherwise nothing
is sent.
The change lets the client update flow control credit on demand. Also, it
saves traffic by not sending MAX_STREAM_DATA with the same value twice.
The reasons why a stream may not be created by the server currently include
hitting the worker_connections limit and memory allocation errors. Previously
in these cases the entire QUIC connection was closed and all its streams were
shut down. Now the new stream is rejected and existing streams continue working.
To reject an HTTP/3 request stream, RESET_STREAM and STOP_SENDING with the
H3_REQUEST_REJECTED error code are sent to the client. HTTP/3 uni streams
and streams in the Stream module are not rejected.
The variable contains the negotiated curve used for the handshake key
exchange process. Known curves are listed by their names, unknown
ones are shown in hex.
Note that for resumed sessions in TLSv1.2 and older protocols,
$ssl_curve contains the curve used during the initial handshake,
while in TLSv1.3 it contains the curve used during the session
resumption (see the SSL_get_negotiated_group manual page for
details).
The variable is only meaningful when using OpenSSL 3.0 and above.
With older versions the variable is empty.
Without this change, aio used with HTTP/2 can result in connection hang,
as observed with "aio threads; aio_write on;" and proxying (ticket #2248).
The problem is that HTTP/2 updates buffers outside of the output filters
(notably, marks them as sent), and then posts a write event to call
output filters. If a filter does not call the next one for some reason
(for example, because of an AIO operation in progress), this might
result in a state when the owner of a buffer already called
ngx_chain_update_chains() and can reuse the buffer, while the same buffer
is still sitting in the busy chain of some other filter.
In the particular case a buffer was sitting in output chain's ctx->busy,
and was reused by event pipe. Output chain's ctx->busy was permanently
blocked by it, and this resulted in connection hang.
Fix is to change ngx_chain_update_chains() to skip buffers from other
modules unconditionally, without trying to wait for these buffers to
become empty.
The "sendfile_max_chunk" directive is important to prevent worker
monopolization by fast connections. The 2m value implies maximum 200ms
delay with 100 Mbps links, 20ms delay with 1 Gbps links, and 2ms on
10 Gbps links. It also seems to be a good value for disks.
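For reference, a sketch making the value discussed above explicit (the
location is an example):

    location /download/ {
        sendfile           on;
        sendfile_max_chunk 2m;
    }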
Previously, connections to upstream servers used sendfile() if it was
enabled, but never honored sendfile_max_chunk. This might result
in worker monopolization for a long time if large request bodies
are allowed.
On Linux starting with 2.6.16, sendfile() silently limits all operations
to MAX_RW_COUNT, defined as (INT_MAX & PAGE_MASK). This incorrectly
triggered the interrupt check, and resulted in 0-sized writev() on the
next loop iteration.
Fix is to make sure the limit is always checked, so we will return from
the loop if the limit is already reached even if number of bytes sent is
not exactly equal to the number of bytes we've tried to send.
Previously, it was checked that sendfile_max_chunk was enabled and
almost whole sendfile_max_chunk was sent (see e67ef50c3176), to avoid
delaying connections where sendfile_max_chunk wasn't reached (for example,
when sending responses smaller than sendfile_max_chunk). Now we instead
check if there are unsent data, and the connection is still ready for writing.
Additionally we also check c->write->delayed to ignore connections already
delayed by limit_rate.
This approach is believed to be more robust, and correctly handles
not only sendfile_max_chunk, but also internal limits of c->send_chain(),
such as sendfile() maximum supported length (ticket #1870).
Previously, 1 millisecond delay was used instead. In certain edge cases
this might result in noticeable performance degradation though, notably on
Linux with typical CONFIG_HZ=250 (so 1ms delay becomes 4ms),
sendfile_max_chunk 2m, and link speed above 2.5 Gbps.
Using posted next events removes the artificial delay and makes processing
fast in all cases.
The directive enables including all frames from the start time to the most
recent key frame in the result. Those frames are removed from the presentation
timeline
using mp4 edit lists.
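Assuming the directive in question is mp4_start_key_frame (the name is not
stated above), a minimal sketch:

    location /video/ {
        mp4;
        mp4_start_key_frame on;
    }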
Edit lists are currently supported by popular players and browsers such as
Chrome, Safari, QuickTime and ffmpeg. Among those not supporting them properly
is Firefox[1].
Based on a patch by Tracey Jaquith, Internet Archive.
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1735300
The function updates the duration field of mdhd atom. Previously it was
updated in ngx_http_mp4_read_mdhd_atom(). The change makes it possible to
alter track duration as a result of processing track frames.
As per quic-qpack-21:
When a stream is reset or reading is abandoned, the decoder emits a
Stream Cancellation instruction.
Previously the instruction was not sent. Now it is sent when closing a QUIC
stream connection if the dynamic table capacity is non-zero and eof has not
been received from the client. The latter condition means that a trailers
section may still be on its way from the client and the stream needs to be
cancelled.
A QUIC stream connection is treated as reusable until the first bytes of a
request arrive, which is also when the request object is now allocated.
A connection closed as a result of draining is reset with the error code
H3_REQUEST_REJECTED. Such behavior is allowed by quic-http-34:
Once a request stream has been opened, the request MAY be cancelled
by either endpoint. Clients cancel requests if the response is no
longer of interest; servers cancel requests if they are unable to or
choose not to respond.
When the server cancels a request without performing any application
processing, the request is considered "rejected." The server SHOULD
abort its response stream with the error code H3_REQUEST_REJECTED.
The client can treat requests rejected by the server as though they had
never been sent at all, thereby allowing them to be retried later.
When an HTTP/3 function returns an error in the context of a QUIC stream, it is
now this function's responsibility to finalize the entire QUIC connection
with the right code, if required. Previously, QUIC connection finalization
could be done both outside and inside such functions. The new rule follows
a similar rule for logging, leads to cleaner code, and makes it possible to
provide more details about the error.
While here, a few error cases are no longer treated as fatal and the QUIC
connection is no longer finalized in these cases. A few other cases now lead
to a stream reset instead of connection finalization.
The PATH_RESPONSE frame must be expanded to 1200 bytes, except when the
anti-amplification limit is in effect, i.e. on unvalidated paths.
Previously, the anti-amplification limit was always applied.
If client ID was never used, its refcount is zero. To keep things simple,
the ngx_quic_unref_client_id() function is now aware of such IDs.
If client ID was used, the ngx_quic_replace_retired_client_id() function
is supposed to find all users and unref the ID, thus ngx_quic_unref_client_id()
should not be called after it.
Previously, it was not enforced in the stream module.
Now, since b9e02e9b2f1d, it is possible to specify protocols.
Since ALPN is always required, the 'require_alpn' setting is now obsolete.
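A hedged sketch of the corresponding stream configuration, assuming
b9e02e9b2f1d refers to the ssl_alpn directive (certificate paths are
placeholders):

    stream {
        server {
            listen 443 ssl;

            ssl_certificate     example.crt;
            ssl_certificate_key example.key;

            # connections not offering one of the listed protocols are rejected
            ssl_alpn h2 http/1.1;

            proxy_pass 127.0.0.1:8443;
        }
    }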
The "min" and "max" arguments refer to UDP datagram size. Generating payload
requires to account properly for header size, which is variable and depends on
payload size and packet number.
After fe919fd63b0b, processing QUIC streams was postponed until after handshake
completion, which means that 0-RTT is effectively off. With ssl_ocsp enabled,
it could be further delayed. This differs from how OCSP validation works with
SSL_read_early_data(). With this change, processing QUIC streams is unlocked
when obtaining 0-RTT secret.
The sent queue is sorted by packet number. It is possible to avoid
traversing full queue while handling ack ranges. It makes sense to
start traversing from the queue head (i.e. check oldest packets first).
With this patch, all traffic over a QUIC connection is compared to traffic
over QUIC streams. As long as total traffic is many times larger than stream
traffic, we consider this to be a flood.
With this patch, all traffic over HTTP/3 bidi and uni streams is counted in
the h3c->total_bytes field, and payload traffic is counted in the
h3c->payload_bytes field. As long as total traffic is many times larger than
payload traffic, we consider this to be a flood.
Request header traffic is counted as if all fields are literal. Response
header traffic is counted as is.
Checking the reset after encryption avoids false positives. More importantly,
it avoids the check entirely in the usual case where decryption succeeds.
RFC 9000, 10.3.1 Detecting a Stateless Reset
Endpoints MAY skip this check if any packet from a datagram is
successfully processed.
As per RFC 9000:
An endpoint that receives a STOP_SENDING frame MUST send a RESET_STREAM
frame if the stream is in the "Ready" or "Send" state.
An endpoint SHOULD copy the error code from the STOP_SENDING frame to
the RESET_STREAM frame it sends, but it can use any application error code.
The flag indicates that the entire response was sent to the socket up to the
last_buf flag. The flag is only usable for protocol implementations that call
ngx_http_write_filter() from header filter, such as HTTP/1.x and HTTP/3.
Similar to the previous change, a segmentation fault occurred when evaluating
SSL certificates on a QUIC connection due to an uninitialized stream session.
The fix is to postpone initializing the QUIC part of a connection until after
it has session and variables initialized.
Similarly, this appends logging error context for QUIC connections:
- client 127.0.0.1:54749 connected to 127.0.0.1:8880 while handling frames
- quic client timed out (60: Operation timed out) while handling quic input
A QUIC connection doesn't have c->log->data and friends initialized to sensible
values. Yet, a request can be created in the certificate callback with such an
assumption, which leads to a segmentation fault due to a null pointer
dereference in ngx_http_free_request(). The fix is to adjust initialization
of the QUIC part of a connection such that it has all of that in place.
Further, this appends logging error context for unsuccessful QUIC handshakes:
- cannot load certificate .. while handling frames
- SSL_do_handshake() failed .. while sending frames
Previously the counter was not incremented for HTTP/3 streams, but still
decremented in ngx_http_close_connection(). There are two solutions here, one
is to increment the counter for HTTP/3 streams, and the other one is not to
decrement the counter for HTTP/3 streams. The latter solution looks
inconsistent with ngx_stat_reading/ngx_stat_writing, which are incremented on a
per-request basis. The change adds ngx_stat_active increment for HTTP/3
request and push streams.
Notably, it is to avoid setting the TCP_NODELAY flag for QUIC streams
in ngx_http_upstream_send_response(). It is an invalid operation on
inherently SOCK_DGRAM sockets, which leads to QUIC connection close.
The change reduces diff to the default branch in stream content phase.
This function was only referenced from ngx_http_v3_create_push_request() to
initialize push connection log. Now the log handler is copied from the parent
request connection.
The change reduces diff to the default branch.
The functions ngx_quic_handle_read_event() and ngx_quic_handle_write_event()
are added. Previously this code was a part of ngx_handle_read_event() and
ngx_handle_write_event().
The change simplifies ngx_handle_read_event() and ngx_handle_write_event()
by moving QUIC-related code to a QUIC source file.
Previously it had -1 as fd. This fixes proxying, which relies on downstream
connection having a real fd. Also, this reduces diff to the default branch for
ngx_close_connection().
The request body filter chain is no longer called after processing
a DATA frame. Instead, we now post a read event to do this. This
ensures that multiple small DATA frames read during the same event loop
iteration are coalesced together, resulting in much faster processing.
Since rb->buf can now contain unprocessed data, window update is no
longer sent in ngx_http_v2_state_read_data() in case of flow control
being used due to filter buffering. Instead, window will be updated
by ngx_http_v2_read_client_request_body_handler() in the posted read
event.
Following rb->filter_need_buffering changes, request body reading is
only finished after the filter chain is called and rb->last_saved is set.
As such, with r->request_body_no_buffering, timer on fc->read is no
longer removed when the last part of the body is received, potentially
resulting in incorrect behaviour.
The fix is to call ngx_http_v2_process_request_body() from the
ngx_http_v2_read_unbuffered_request_body() function instead of
directly calling ngx_http_v2_filter_request_body(), so the timer
is properly removed.
In the body read handler, the window was incorrectly calculated
based on the full buffer size instead of the amount of free space
in the buffer. If the request body is buffered by a filter, and
the buffer is not empty after the read event is generated by the
filter to resume request body processing, this could result in
"http2 negative window update" alerts.
Further, in the body read handler and in ngx_http_v2_state_read_data()
the buffer wasn't cleared when the data were already written to disk,
so the client might get stuck without window updates.
If a MAX_DATA frame was received before any stream was created, the worker
process would crash in ngx_quic_handle_max_data_frame() while traversing the
stream tree. The issue is solved by adding a check that makes sure the tree is
not empty.
If a filter wants to buffer the request body during reading (for
example, to check it with an external scanner), it can now do so. To make
it possible, the code now checks rb->last_saved (introduced in the
previous change) along with rb->rest == 0.
In HTTP/2 this requires flow control to avoid overflowing the
request body buffer, so filters which need buffering have to set
the rb->filter_need_buffering flag on the first filter call. (Note
that each filter is expected to call the next filter, so all filters
will be able to set the flag if needed.)
It indicates that the last buffer was received by the save filter,
and can be used to check this at higher levels. To be used in the
following changes.
If, due to an error, ngx_http_request_body_save_filter() is called
more than once with rb->rest == 0, this used to result in a segmentation
fault. An alert is added to catch such errors, just in case.
Previously, fully preread unbuffered requests larger than client body
buffer size were saved to disk, despite the fact that "unbuffered" is
expected to imply no disk buffering.
The save body filter saves the request body to disk once the buffer is full.
Yet in HTTP/2 this might happen even if there is no need to save anything
to disk, notably when content length is known and the END_STREAM flag is
sent in a separate empty DATA frame. Workaround is to provide additional
byte in the buffer, so saving the request body won't be triggered.
This fixes unexpected request body disk buffering in HTTP/2 observed after
the previous change when content length is known and the END_STREAM flag
is sent in a separate empty DATA frame.
In particular, now the code always uses a buffer limited by
client_body_buffer_size. At the cost of an additional copy it
ensures that small DATA frames are not directly mapped to small
write() syscalls, but rather buffered in memory before writing.
Further, requests without Content-Length are no longer forced
to use temporary files.
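For reference, the buffer in question is controlled by the directive below
(the value is an example):

    http {
        client_body_buffer_size 16k;
    }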
With SSL it is possible that an established connection is ready for
reading after the handshake. Further, events might be already disabled
in case of level-triggered event methods. If this happens and
ngx_http_upstream_send_request() blocks waiting for some data from
the upstream, such as flow control in case of gRPC, the connection
will time out due to no read events on the upstream connection.
Fix is to explicitly check the c->read->ready flag if sending the request
blocks, and post a read event if it is set.
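A minimal sketch of the fix, where c is the upstream connection:

    /* sending the request blocked: make sure data already received
       during the SSL handshake will still be processed */
    if (c->read->ready) {
        ngx_post_event(c->read, &ngx_posted_events);
    }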
Note that while it is possible to modify ngx_ssl_handshake() to keep
read events active, this won't completely resolve the issue, since
there can be data already received during the SSL handshake
(see 573bd30e46b4).
Hash initialization ignores elements with key.data set to NULL.
Nevertheless, the initial hash bucket size check didn't skip them,
resulting in unnecessary restrictions on, for example, variables with
long names and with the NGX_HTTP_VARIABLE_NOHASH flag.
Fix is to update the initial hash bucket size check to skip elements
with key.data set to NULL, similarly to how it is done in other parts
of the code.
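A minimal sketch of the updated check; NGX_HASH_ELT_SIZE() is the macro
used internally by ngx_hash_init():

    for (n = 0; n < nelts; n++) {
        if (names[n].key.data == NULL) {
            continue;  /* ignored by hash initialization as well */
        }

        if (hinit->bucket_size
            < NGX_HASH_ELT_SIZE(&names[n]) + sizeof(void *))
        {
            return NGX_ERROR;  /* an element does not fit into a bucket */
        }
    }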
Requires OpenSSL 3.0 compiled with "enable-ktls" option. Further, KTLS
needs to be enabled in kernel, and in OpenSSL, either via OpenSSL
configuration file or with "ssl_conf_command Options KTLS;" in nginx
configuration.
On FreeBSD, kernel TLS is available starting with FreeBSD 13.0, and
can be enabled with "sysctl kern.ipc.tls.enable=1" and "kldload ktls_ocf"
to load a software backend, see man ktls(4) for details.
On Linux, kernel TLS is available starting with kernel 4.13 (at least 5.2
is recommended), and needs kernel compiled with CONFIG_TLS=y (with
CONFIG_TLS=m, which is used at least on Ubuntu 21.04 by default,
the tls module needs to be loaded with "modprobe tls").
While clock_gettime(CLOCK_MONOTONIC_COARSE) is faster than
clock_gettime(CLOCK_MONOTONIC), the latter is fast enough on Linux for
practical usage, and the difference is negligible compared to other costs
at each event loop iteration. On the other hand, CLOCK_MONOTONIC_COARSE
causes various issues with typical CONFIG_HZ=250, notably very inaccurate
limit_rate handling in some edge cases (ticket #1678) and negative difference
between $request_time and $upstream_response_time (ticket #1965).
This behaviour is recommended by RFC 7301 and is useful
for mitigation of protocol confusion attacks [1].
To avoid possible negative effects, the list of supported protocols
was extended to include all possible HTTP protocol ALPN IDs
registered by IANA [2], i.e. "http/1.0" and "http/0.9".
[1] https://alpaca-attack.com/
[2] https://www.iana.org/assignments/tls-extensiontype-values/
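A minimal sketch of the ALPN selection callback behaviour; srv_protos is
a placeholder for the supported protocol list in wire format:

    if (SSL_select_next_proto((unsigned char **) out, outlen,
                              srv_protos, srv_protos_len, in, inlen)
        != OPENSSL_NPN_NEGOTIATED)
    {
        /* no mutually supported protocol: abort the handshake */
        return SSL_TLSEXT_ERR_ALPN_FATAL;
    }

    return SSL_TLSEXT_ERR_OK;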
In b87b7092cedb (nginx 1.21.1), logging level of "upstream sent invalid
header" errors was accidentally changed to "info". This change restores
the "error" level, which is a proper logging level for upstream-side
errors.
The u->keepalive flag is initialized early if the response has no body
(or an empty body), and needs to be reset if there are any extra data,
similarly to how it is done in ngx_http_proxy_copy_filter(). Missed
in 83c4622053b0.
The "proxy_half_close" directive enables handling of TCP half close. If
enabled, connection to proxied server is kept open until both read ends get
EOF. Write end shutdown is properly transmitted via proxy.
Do this only when the entire request body is empty and
r->request_body_in_file_only is set.
The issue manifested itself as a missing "a client request body is
buffered to a temporary file" warning when the entire rb->buf was full
and all buffers were delayed by a filter.
This adds new Auth-SSL-Protocol and Auth-SSL-Cipher headers to
the mail proxy auth protocol when SSL is enabled.
This can be useful for detecting users using older clients that
negotiate old ciphers when you want to upgrade to newer
TLS versions or remove support for old and insecure ciphers.
You can use your auth backend to notify these users before the
upgrade that they either need to upgrade their client software
or contact your support team to work out an upgrade path.
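For example, the auth request to the backend might include the following
additional headers (values are illustrative):

    Auth-SSL-Protocol: TLSv1.2
    Auth-SSL-Cipher: ECDHE-RSA-AES256-GCM-SHA384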
To load old/weak server or client certificates it might be needed to adjust
the security level, as introduced in OpenSSL 1.1.0. This change ensures that
ciphers are set before loading the certificates, so security level changes
via the cipher string apply to certificate loading.
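A minimal sketch of the resulting order; ctx, ciphers, and cert are
placeholders, and error handling is omitted:

    /* ciphers first: a security level given in the cipher string,
       e.g. "DEFAULT:@SECLEVEL=0", takes effect before ... */
    SSL_CTX_set_cipher_list(ctx, ciphers);

    /* ... old/weak certificates are loaded */
    SSL_CTX_use_certificate_chain_file(ctx, cert);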
Export ciphers are forbidden to negotiate in TLS 1.1 and later protocol modes.
They are disabled since OpenSSL 1.0.2g by default unless explicitly configured
with "enable-weak-ssl-ciphers", and completely removed in OpenSSL 1.1.0.
A new behaviour was introduced in OpenSSL 1.1.1e for the case when a peer
does not send close_notify before closing the connection. Previously, the
result was SSL_ERROR_SYSCALL with errno 0, known since at least OpenSSL
0.9.7, and handled gracefully in nginx. Now SSL_ERROR_SSL is returned with
a distinct reason SSL_R_UNEXPECTED_EOF_WHILE_READING ("unexpected eof
while reading"). This leads to critical errors seen in nginx within various
routines such as SSL_do_handshake(), SSL_read(), SSL_shutdown(). The old
behaviour was restored in OpenSSL 1.1.1f, but the new one is present in
OpenSSL 3.0 by default.
Use of the SSL_OP_IGNORE_UNEXPECTED_EOF option added in OpenSSL 3.0 makes
it possible to restore the compatible behaviour of returning
SSL_ERROR_ZERO_RETURN:
https://git.openssl.org/?p=openssl.git;a=commitdiff;h=09b90e0
See for additional details: https://github.com/openssl/openssl/issues/11381
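A minimal sketch of the guarded option; the macro is only defined by
OpenSSL versions that provide it:

    #ifdef SSL_OP_IGNORE_UNEXPECTED_EOF
    SSL_CTX_set_options(ctx, SSL_OP_IGNORE_UNEXPECTED_EOF);
    #endif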
The OPENSSL_SUPPRESS_DEPRECATED macro is used to suppress deprecation warnings.
This covers Session Tickets keys, SSL Engine, DH low level API for DHE ciphers.
Unlike OPENSSL_API_COMPAT, it works well with OpenSSL built with no-deprecated.
In particular, it doesn't unhide various macros in OpenSSL includes, which are
meant to be hidden under OPENSSL_NO_DEPRECATED.
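A minimal sketch of the usage; the macro has to be defined before any
OpenSSL headers are included:

    #define OPENSSL_SUPPRESS_DEPRECATED

    #include <openssl/ssl.h>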
The only consumer is a callback function for SSL_CTX_set_tmp_rsa_callback()
deprecated in OpenSSL 1.1.0. Now the function is conditionally compiled too.
The latest HTTP/1.1 draft describes Transfer-Encoding in HTTP/1.0 as having
potentially faulty message framing, as such messages could have been
forwarded without handling of the chunked encoding, and forbids processing
subsequent requests over that connection:
https://github.com/httpwg/http-core/issues/879.
While handling of such requests is permitted, the most secure approach seems
to reject them.
The c->read->ready and c->write->ready flags might be reset during
the handshake, and not set again if the handshake was finished on
the other event. At the same time, some data might be read from
the socket during the handshake, so missing c->read->ready flag might
result in a connection hang, for example, when waiting for an SMTP
greeting (which was already received during the handshake).
Found by Sergey Kandaurov.
Previously, when cleaning up a QUIC stream in shutdown mode,
ngx_quic_shutdown_quic() was called, which could close the QUIC connection
right away. This could be a problem if the connection was referenced up the
stack. For example, this could happen in ngx_quic_init_streams(),
ngx_quic_close_streams(), ngx_quic_create_client_stream() etc.
With a typical HTTP/3 client the issue is unlikely because of HTTP/3 uni
streams which need a posted event to close. In this case QUIC connection
cannot be closed right away.
Now the QUIC connection read event is posted instead, and it will shut down
the connection asynchronously.
After receiving GOAWAY, a client is not supposed to create new streams.
However, until the client reads this frame, we allow it to create new
streams, which are gracefully rejected. To prevent the client from abusing
this algorithm, a new limit is introduced. Upon reaching
keepalive_requests * 2, the server now closes the entire QUIC connection
claiming excessive load.
The "hq" mode is HTTP/0.9-1.1 over QUIC. The following limits are introduced:
- uni streams are not allowed
- keepalive_requests is enforced
- keepalive_time is enforced
In case of error, QUIC connection is finalized with 0x101 code. This code
corresponds to HTTP/3 General Protocol Error.
Previously, in-flight byte counter and congestion window were properly
maintained, but the limit was not properly implemented.
Now a new datagram is sent only if the in-flight byte counter is less than
the window. The limit is datagram-based, which means that a single datagram
may lead to exceeding the limit, but the next one will not be sent.
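A minimal sketch of the send loop check; cg and len are hypothetical names
for the congestion state and the datagram size:

    /* the check is datagram-based: a single datagram may overshoot
       the window, but the next one will not be sent */
    if (cg->in_flight >= cg->window) {
        break;  /* wait for acknowledgments to open the window */
    }

    /* send one datagram and account for it */
    cg->in_flight += len;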
Previously, the error was ignored, leading to unnecessary retransmits.
Now, unsent frames are returned into output queue, state is reset, and
timer is started for the next send attempt.
As per quic-http-34:
Endpoints SHOULD create the HTTP control stream as well as the
unidirectional streams required by mandatory extensions (such as the
QPACK encoder and decoder streams) first, and then create additional
streams as allowed by their peer.
Previously, a client could create and destroy additional uni streams an
unlimited number of times before creating the mandatory streams.
OpenSSL is known to provide read keys for an encryption level before the
level is active in TLS, following the old BoringSSL API. In BoringSSL,
it was then fixed to defer releasing read keys until QUIC may use them.
The directive enables usage of UDP segmentation offloading by QUIC.
By default, GSO is disabled, since it is not always operational when
detected (depends on interface configuration).
To improve output performance, UDP segmentation offloading is used
if available. If there is a significant amount of data in an output
queue and path is verified, QUIC packets are not sent one-by-one,
but instead are collected in a buffer, which is then passed to kernel
in a single sendmsg call, using UDP GSO. Such a method greatly decreases
the number of system calls and thus system load.
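A minimal sketch of how segmentation is requested on Linux, using
UDP_SEGMENT from <linux/udp.h>; gso_size is the size of a single QUIC
packet within the buffer, and the data iovec is omitted:

    u_char           msg_control[CMSG_SPACE(sizeof(uint16_t))];
    struct msghdr    msg;
    struct cmsghdr  *cmsg;

    ngx_memzero(&msg, sizeof(struct msghdr));
    /* msg.msg_iov points to the buffer with several packets */
    msg.msg_control = msg_control;
    msg.msg_controllen = sizeof(msg_control);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_UDP;
    cmsg->cmsg_type = UDP_SEGMENT;
    cmsg->cmsg_len = CMSG_LEN(sizeof(uint16_t));
    *(uint16_t *) CMSG_DATA(cmsg) = gso_size;

    /* one syscall; the kernel splits the buffer into packets */
    sendmsg(c->fd, &msg, 0);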
Additionally, the ngx_init_srcaddr_cmsg() function is introduced which
initializes control message with connection local address.
The NGX_HAVE_ADDRINFO_CMSG macro is defined when at least one of the
methods to deal with the corresponding control message is available.
The ngx_wsasend_chain() and ngx_wsarecv_chain() functions were
modified to use only preallocated memory, and the number of
preallocated wsabufs was increased to 64.
Sometimes, QUIC packets need to be of a certain (or minimal) size. This is
achieved by adding PADDING frames. It is possible that adding padding will
affect the header size, thus forcing us to recalculate the padding size
once more.
In d1bde5c3c5d2, the number of preallocated iovec's for ngx_readv_chain()
was increased. Still, in some setups, the function might allocate memory
for iovec's from a connection pool, which is only freed when closing the
connection.
The ngx_readv_chain() function was modified to use only preallocated
memory, similarly to the ngx_writev_chain() change in 8e903522c17a.
Renamed header -> field per quic-qpack naming convention, in particular:
- Header Field -> Field Line
- Header Block -> (Encoded) Field Section
- Without Name Reference -> With Literal Name
- Header Acknowledgement -> Section Acknowledgment
Control characters (0x00-0x1f, 0x7f) and space are not expected to appear
in the Host header. Requests with such characters in the Host header are
now unconditionally rejected.
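A minimal sketch of the per-byte check, where ch is a byte of the Host
header value and the return value is illustrative:

    if (ch <= 0x20 || ch == 0x7f) {
        /* control characters (0x00-0x1f, 0x7f) and space: reject */
        return NGX_DECLINED;
    }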
In 71edd9192f24 logging of invalid headers which were rejected with the
NGX_HTTP_PARSE_INVALID_HEADER error was restricted to just the "client
sent invalid header line" message, without any attempts to log the header
itself.
This patch restores logging of the header up to the invalid character and
the character itself. The r->header_end pointer is now properly set
in all cases to make logging possible.
The same logging is also introduced when parsing headers from upstream
servers.
Control characters (0x00-0x1f, 0x7f), space, and colon were never allowed in
header names. The only somewhat valid use is header continuation, which
nginx never supported and which is explicitly obsoleted by RFC 7230.
Previously, such headers were considered invalid and were ignored by default
(as per ignore_invalid_headers directive). With this change, such headers
are unconditionally rejected.
It is expected to make nginx more resilient to various attacks, in particular,
with ignore_invalid_headers switched off (which is inherently insecure, though
nevertheless sometimes used in the wild).
Control characters (0x00-0x1f, 0x7f) were never allowed in URIs, and must
be percent-encoded by clients. Further, these are not believed to appear
in practice. On the other hand, passing such characters might make various
attacks possible or easier, despite the fact that currently allowed control
characters are not significant for HTTP request parsing.
From now on, requests with spaces in URIs are immediately rejected rather
than allowed. Spaces were allowed in 31e9677b15a1 (0.8.41) to handle bad
clients. It is believed that now this behaviour causes more harm than
good.
Per RFC 3986 only the following characters are allowed in URIs unescaped:
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
And "%" can appear as a part of escaping itself. The following
characters are not allowed and need to be escaped: %00-%1F, %7F-%FF,
" ", """, "<", ">", "\", "^", "`", "{", "|", "}".
Not escaping ">" is known to cause problems at least with MS Exchange (see
http://nginx.org/pipermail/nginx-ru/2010-January/031261.html) and in
Tomcat (ticket #2191).
The patch adds escaping of the following chars in all URI parts: """, "<",
">", "\", "^", "`", "{", "|", "}". Note that comments are mostly preserved
to outline important characters being escaped.
HTTP clients are not allowed to generate such requests since Transfer-Encoding
introduction in RFC 2068, and they are not expected to appear in practice
except in attempts to perform a request smuggling attack. While handling of
such requests is strictly defined, the most secure approach seems to reject
them.
No valid CONNECT requests are expected to appear within nginx, since it
is not a forward proxy. Further, request line parsing will reject
proper CONNECT requests anyway, since we don't allow authority-form of
request-target. On the other hand, RFC 7230 specifies separate message
length rules for CONNECT which we don't support, so make sure to always
reject CONNECTs to avoid potential abuse.
Previously, TRACE requests were rejected before parsing Transfer-Encoding.
This is not important since keepalive is not enabled at this point anyway,
though rejecting such requests after properly parsing other headers is
less likely to cause issues in case of further code changes.
After 2096b21fcd10, a single RST_STREAM(NO_ERROR) may not result in an error.
This change removes several unnecessary ctx->type checks for such a case.
As per quic-http-34, these are the cases when this error should be generated:
If an endpoint receives a second SETTINGS frame
on the control stream, the endpoint MUST respond with a connection
error of type H3_FRAME_UNEXPECTED
SETTINGS frames MUST NOT be sent on any stream other than the control
stream. If an endpoint receives a SETTINGS frame on a different
stream, the endpoint MUST respond with a connection error of type
H3_FRAME_UNEXPECTED.
A client MUST NOT send a PUSH_PROMISE frame. A server MUST treat the
receipt of a PUSH_PROMISE frame as a connection error of type
H3_FRAME_UNEXPECTED; see Section 8.
The MAX_PUSH_ID frame is always sent on the control stream. Receipt
of a MAX_PUSH_ID frame on any other stream MUST be treated as a
connection error of type H3_FRAME_UNEXPECTED.
Receipt of an invalid sequence of frames MUST be treated as a
connection error of type H3_FRAME_UNEXPECTED; see Section 8. In
particular, a DATA frame before any HEADERS frame, or a HEADERS or
DATA frame after the trailing HEADERS frame, is considered invalid.
A CANCEL_PUSH frame is sent on the control stream. Receiving a
CANCEL_PUSH frame on a stream other than the control stream MUST be
treated as a connection error of type H3_FRAME_UNEXPECTED.
The GOAWAY frame is always sent on the control stream.
The quic-http-34 draft is ambiguous as to what error should be generated
for the first frame in the control stream:
Each side MUST initiate a single control stream at the beginning of
the connection and send its SETTINGS frame as the first frame on this
stream. If the first frame of the control stream is any other frame
type, this MUST be treated as a connection error of type
H3_MISSING_SETTINGS.
If a DATA frame is received on a control stream, the recipient MUST
respond with a connection error of type H3_FRAME_UNEXPECTED.
If a HEADERS frame is received on a control stream, the recipient MUST
respond with a connection error of type H3_FRAME_UNEXPECTED.
Previously, H3_FRAME_UNEXPECTED had priority, but now H3_MISSING_SETTINGS has.
The arguments in the spec sound more compelling for H3_MISSING_SETTINGS.
- The ngx_quic_control_flow() function is introduced. It handles both
MAX_DATA and MAX_STREAM_DATA flow control and is called from the STREAM
and RESET_STREAM frame handlers. Previously, flow control was only
accounted for STREAM frames, and MAX_DATA flow control was not accounted
at all.
- The ngx_quic_update_flow() function is introduced. It advances the flow
control windows and sends MAX_DATA/MAX_STREAM_DATA. It is called from the
RESET_STREAM frame handler, the stream cleanup handler, and the stream
recv() handler.
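A minimal sketch of the combined check, with hypothetical field names:

    /* stream-level limit (MAX_STREAM_DATA) */
    if (last > qs->recv_max_data) {
        qc->error = NGX_QUIC_ERR_FLOW_CONTROL_ERROR;
        return NGX_ERROR;
    }

    /* connection-level limit (MAX_DATA), previously not accounted */
    if (qc->streams.recv_last + len > qc->streams.recv_max_data) {
        qc->error = NGX_QUIC_ERR_FLOW_CONTROL_ERROR;
        return NGX_ERROR;
    }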
Recent fixes to SSL shutdown with lingering close (554c6ae25ffc, 1.19.5)
broke logging of SSL variables. To make sure logging of SSL variables
works properly, avoid freeing c->ssl when doing an SSL shutdown before
lingering close.
Reported by Reinis Rozitis
(http://mailman.nginx.org/pipermail/nginx/2021-May/060670.html).
Instead of calling SSL_free() at each return point, a single place where
cleanup happens was introduced. As a positive side effect, this fixes two
potential memory leaks on ngx_handle_read_event() and ngx_handle_write_event()
errors where there were no SSL_free() calls (though unlikely to matter in
practice, as errors there are only expected to happen due to bugs or kernel
issues).
When starting processing a new encoder instruction, the header state is not
memzero'ed because generally it's burdensome. If the header value is empty,
this resulted in inserting a stale value left from the previous instruction.
Based on a patch by Zhiyong Sun.
On Linux, SO_REUSEADDR allows completely duplicate UDP sockets, so using
SO_REUSEADDR when testing configuration results in packets being dropped
if there is existing traffic on the sockets being tested (ticket #2187).
While dropped packets are expected with UDP, it is better to avoid this
when possible.
With this change, SO_REUSEADDR is no longer set on datagram sockets when
testing configuration.
Since we anyway do not set SO_REUSEPORT when testing configuration
(see ecb5cd305b06), trying to open additional sockets does not make much
sense, as all these additional sockets are expected to result in EADDRINUSE
errors from bind(). On the other hand, there are reports that trying
to open these sockets takes significant time under load: total configuration
testing time greater than 15s was observed in ticket #2188, compared to less
than 1s without load.
With this change, no additional sockets are opened during testing
configuration.
As per quic-transport 34, FINAL_SIZE_ERROR is generated if an endpoint received
a STREAM frame or a RESET_STREAM frame containing a final size that was lower
than the size of stream data that was already received.
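A minimal sketch of the check, with hypothetical field names:

    if (f->final_size < qs->recv_last) {
        /* final size is lower than stream data already received */
        qc->error = NGX_QUIC_ERR_FINAL_SIZE_ERROR;
        return NGX_ERROR;
    }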
Since nginx always uses exactly one entry in the question section of
a DNS query, and never uses compression pointers in this entry, parsing
of a DNS response in ngx_resolver_process_response() does not expect
compression pointers to appear in the question section of the DNS
response. Indeed, compression pointers in the first name of a DNS response
hardly make sense, do not seem to be allowed by RFC 1035 (which says
"a pointer to a prior occurance of the same name", note "prior"), and
were never observed in practice.
Added an explicit check to ngx_resolver_process_response()'s parsing
of the question section to properly report an error if compression pointers
nevertheless appear in the question section.
Instead of checking on each label whether a dot needs to be placed,
the code now always adds a dot after a label, and reduces the resulting
length afterwards.
Previously, anything with any of the two high bits set were interpreted
as compression pointers. This is incorrect, as RFC 1035 clearly states
that "The 10 and 01 combinations are reserved for future use". Further,
the 01 combination is actually allocated for EDNS extended label type
(see RFC 2671 and RFC 6891), not really used though.
Fix is to reject unrecognized label types rather than misinterpreting
them as compression pointers.
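A minimal sketch of the label type dispatch, where src points at the
length byte of a label:

    n = *src;

    if ((n & 0xc0) == 0xc0) {
        /* the 11 prefix: a compression pointer */

    } else if (n & 0xc0) {
        /* the 10 and 01 prefixes: unrecognized label types, reject */
        goto invalid;

    } else {
        /* an ordinary label of length n */
    }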
It is believed to be harmless, and in the worst case it uses some
uninitialized memory as a part of the compression pointer length,
eventually leading to the "name is out of DNS response" error.
Generic function ngx_quic_order_bufs() is introduced. This function creates
and maintains a chain of buffers with holes. Holes are marked with b->sync
flag. Several buffers and holes in this chain may share the same underlying
memory buffer.
When processing STREAM frames with this function, frame data is copied only
once, to the right place in the stream input chain. Previously, data could
be copied twice: first when buffering out-of-order frame data, and then
when filling the stream buffer from the ordered frame queue. Now there is
only one data chain for both tasks.
When variables are used in ssl_certificate or ssl_certificate_key, a request
is created in the certificate callback to evaluate the variables, and then
freed. Freeing it, however, updates c->log->action to "closing request",
resulting in confusing error messages like "client timed out ... while
closing request" when a client times out during the SSL handshake.
Fix is to restore c->log->action after calling ngx_http_free_request().
According to profiling, those two are among most frequently called,
so inlining is generally useful, and unrolling should help with it.
Further, this fixes undefined behaviour seen with invalid values.
Inspired by Yu Liu.
Similarly to smtpd_hard_error_limit in Postfix and smtp_max_unknown_commands
in Exim, specifies the number of errors after which the connection is closed.
The change is mostly the same as the SMTP one (04e43d03e153 and 3f5d0af4e40a),
and ensures that nginx is able to properly handle or reject multiple IMAP
commands. The s->cmd field is not really used and set for consistency.
Non-synchronizing literals handling in invalid/unknown commands is limited,
so when a non-synchronizing literal is detected at the end of a discarded
line, the connection is closed.
Only "A-Za-z0-9-._" characters now allowed (which is stricter than what
RFC 3501 requires, but expected to be enough for all known clients),
and tags shouldn't be longer than 32 characters.
Previously, s->backslash was set if any of the arguments was a quoted
string with a backslash character. After successful command parsing
this resulted in all arguments being filtered to remove backslashes.
This is, however, incorrect, as backslashes should not be removed from
IMAP literals. For example:
S: * OK IMAP4 ready
C: a01 login {9}
S: + OK
C: user\name "pass\"word"
S: * BAD internal server error
resulted in "Auth-User: username" instead of "Auth-User: user\name"
as it should.
Fix is to apply backslash filtering on per-argument basis during parsing.
As discussed in the previous change, s->arg_start handling in the "done"
labels of ngx_mail_pop3_parse_command(), ngx_mail_imap_parse_command(),
and ngx_mail_smtp_parse_command() is wrong: s->arg_start cannot be
set there, as it is handled and cleared on all code paths where the
"done" labels are reached. The relevant code is dead and now removed.
Previously, s->arg_start was left intact after invalid IMAP commands,
and this might result in an argument incorrectly added to the following
command. Similarly, s->backslash was left intact as well, leading
to unneeded backslash removal.
For example (LFs from the client are explicitly shown as "<LF>"):
S: * OK IMAP4 ready
C: a01 login "\<LF>
S: a01 BAD invalid command
C: a0000000000\2 authenticate <LF>
S: a00000000002 BAD invalid command
The backslash followed by LF generates invalid command with s->arg_start
and s->backslash set, the following command incorrectly treats anything
from the old s->arg_start to the space after the command as an argument,
and removes the backslash from the tag. If there is no space, s->arg_end
will be NULL.
Both things seem to be harmless though. In particular:
- This can be used to provide an incorrect argument to a command without
arguments. The only command which seems to look at the single argument
is AUTHENTICATE, and it checks the argument length before trying to
access it.
- Backslash removal uses the "end" pointer, and stops due to "src < end"
condition instead of scanning all the process memory if s->arg_end is
NULL (and arg[0].len is huge).
- There should be no backslashes in unquoted strings.
An obvious fix is to clear s->arg_start and s->backslash on invalid commands,
similarly to how it is done in POP3 parsing (added in 810:e3aa8f305d21) and
SMTP parsing.
This, however, makes it clear that s->arg_start handling in the "done"
label is wrong: s->arg_start cannot be legitimately set there, as it
is expected to be cleared in all possible cases when the "done" label is
reached. The relevant code is dead and will be removed by the following
change.
The change is mostly the same as the SMTP one (04e43d03e153 and 3f5d0af4e40a),
and ensures that nginx is able to properly handle or reject multiple POP3
commands, as required by the PIPELINING capability (RFC 2449). The s->cmd
field is not really used and set for consistency.
There is no need to scan the buffer from s->buffer->pos, as we already
scanned the buffer till "p" and were unable to find an LF.
There is no real need for this change in SMTP, since it is at most a
microoptimization of a non-common code path. Similar code in IMAP, however,
will have to start scanning from "p" to be correct, since there can be
newlines in IMAP literals.
Previously, if an invalid SMTP command was split between reads, nginx failed
to wait for LF before returning an error, and interpreted the rest of the
command received later as a separate command.
The sw_invalid state in ngx_mail_smtp_parse_command(), introduced in
04e43d03e153, did not work, since ngx_mail_smtp_auth_state() clears
s->state when returning an error due to NGX_MAIL_PARSE_INVALID_COMMAND.
And not clearing s->state would introduce another problem: the rest
of the command would trigger a duplicate error when received.
Fix is to return NGX_AGAIN from ngx_mail_smtp_parse_command() until full
command is received.
Previously, if there were some pipelined SMTP data in the buffer when
a proxied connection with the backend was established, nginx called
ngx_mail_proxy_handler() to send these data, and did not try to send the
response to the last command. In most cases, this response was later sent
along with the response to the pipelined command, but if for some reason
the client decided to wait for the response before finishing the next
command, this might result in a connection hang.
Fix is to always call ngx_mail_proxy_handler() to send the response, and
additionally post an event to send the pipelined data if needed.
When using server push, a segfault occurred because
ngx_http_v3_create_push_request() accessed the ngx_http_v3_session_t object
the old way. Prior to 9ec3e71f8a61, the HTTP/3 session was stored directly
in c->data. Now it's referenced by the v3_session field of
ngx_http_connection_t.
With this change, it is now possible to use ngx_conf_merge_ptr_value()
to merge complex values. This change follows much earlier changes in
ngx_conf_merge_ptr_value() and ngx_conf_set_str_array_slot()
in 1452:cd586e963db0 (0.6.10) and 1701:40d004d95d88 (0.6.22), and the
change in ngx_conf_set_keyval_slot() (7728:485dba3e2a01, 1.19.4).
To preserve compatibility with existing 3rd party modules, both NULL
and NGX_CONF_UNSET_PTR are accepted for now.
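A minimal sketch of the intended usage in a module's configuration
callbacks; conf->values is illustrative:

    /* in create_loc_conf: */
    conf->values = NGX_CONF_UNSET_PTR;

    /* in merge_loc_conf: inherit from the previous level,
       defaulting to NULL */
    ngx_conf_merge_ptr_value(conf->values, prev->values, NULL);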
Client IDs cannot be reused on different paths. This change allows reusing
a client ID previously seen on the same path (but with a different DCID)
when no unused client IDs are available.
According to quic-transport, 9.1:
PATH_CHALLENGE, PATH_RESPONSE, NEW_CONNECTION_ID, and PADDING frames
are "probing frames", and all other frames are "non-probing frames".
Previously it was parsed in ngx_http_v3_streams.c, while the streams were
parsed in ngx_http_v3_parse.c. Now all parsing is done in one file. This
simplifies parsing API and cleans up ngx_http_v3_streams.c.
Previously, an ngx_http_v3_connection_t object was created for HTTP/3 and
then assigned to c->data instead of the generic ngx_http_connection_t object.
Now a direct reference is added to ngx_http_connection_t, which is less
confusing and does not require a flag for http3.
It's used instead of accessing c->quic->parent->data directly. Apart from
being simpler, it makes it possible to change the way the session is stored
in the future by changing only the macro.
Now ngx_http_v3_ack_header() and ngx_http_v3_inc_insert_count() always
generate a decoder error. Our implementation does not use dynamic tables
and does not expect the client to send Section Acknowledgement or Insert
Count Increment. Stream Cancellation, on the other hand, is allowed to be
sent anyway. This is why ngx_http_v3_cancel_stream() does not return an
error.
The patch adds proper transitions between multiple networking addresses that
can be used by a single QUIC connection. New networking paths are validated
using PATH_CHALLENGE/PATH_RESPONSE frames.
Due to structure's alignment, some uninitialized memory contents may have
been passed between processes.
Zeroing was removed in 0215ec9aaa8a.
Reported by Johnny Wang.
7.2.1:
If a DATA frame is received on a control stream, the recipient MUST
respond with a connection error of type H3_FRAME_UNEXPECTED;
7.2.2:
If a HEADERS frame is received on a control stream, the recipient MUST
respond with a connection error (Section 8) of type H3_FRAME_UNEXPECTED.
With SMTP pipelining, ngx_mail_read_command() can be called with s->buffer
without any space available, to parse additional commands received to the
buffer on previous calls. Previously, this resulted in recv() being called
with zero length and returning zero, which was interpreted as a connection
close by the client, so nginx silently closed the connection.
Fix is to avoid calling c->recv() if there is no free space in the buffer,
but continue parsing of the already received commands.
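A minimal sketch of the fix, with names as in the mail module:

    /* read only if there is free space in the buffer; otherwise
       parse the commands that were already received */
    if (s->buffer->last < s->buffer->end) {

        n = c->recv(c, s->buffer->last,
                    s->buffer->end - s->buffer->last);

        /* ... handle n as before ... */
    }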
Currently both names are used which is confusing. Historically these were
different objects, but now it's the same one. The name qs (quic stream) makes
more sense than sn (stream node).
PATH_RESPONSE was explicitly forbidden in 0-RTT since at least draft-22, but
the Frame Types table was not updated until recently while in IESG evaluation.
Stop including QUIC headers with no user-serviceable parts inside.
This allows providing a much cleaner QUIC interface. To cope with that,
ngx_quic_derive_key() is now explicitly exported for the v3 and quic modules.
Additionally, this completely hides the ngx_quic_keys_t internal type.
The "ngx_event_quic.h" header file now contains only public definitions,
used by modules. All internal definitions are moved into
the "ngx_event_quic_connection.h" header file.
The function correctly cleans up resources in case of failure to create
an initial server id: it removes the previously created udp node for odcid
from the listening rbtree.
It turns out no browsers implement HTTP/2 GOAWAY handling properly, and
a large enough number of resources on a page results in failures to load
some resources. In particular, Chrome seems to experience errors if
loading of all resources requires more than 1 connection (while it
is usually able to retry requests at least once, even with 2 connections
there are occasional failures for some reason), Safari if loading requires
more than 3 connections, and Firefox if loading requires more than 10
connections (can be configured with network.http.request.max-attempts,
defaults to 10).
It does not seem to be possible to resolve this on nginx side: even strict
limiting of maximum concurrency does not help, and loading issues seem to
be triggered by the mere queueing of a request for a particular connection.
The only available mitigation seems to be using a higher keepalive_requests
value.
The new default is 1000 and matches previously used default for
http2_max_requests. It is expected to be enough for 99.98% of the pages
(https://httparchive.org/reports/state-of-the-web?start=latest#reqTotal)
even in Chrome.
Similar to lingering_time, it limits total connection lifetime before
keepalive is switched off. The default is 1 hour, which is close to
the total maximum connection lifetime possible with default
keepalive_requests and keepalive_timeout.
Firefox uses several idle streams for PRIORITY frames[1], and
"http2_max_concurrent_streams 1;" results in "client sent too many
PRIORITY frames" errors when a connection is established by Firefox.
Fix is to relax the PRIORITY frames limit to use at least 100 as
the initial value (which is the minimum limit on the number of concurrent
streams recommended by the HTTP/2 protocol, so it is not unreasonable
for clients to assume that a similar number of idle streams can be used
for prioritization).
[1] https://hg.mozilla.org/mozilla-central/file/32a9e6e145d6e3071c3993a20bb603a2f388722b/netwerk/protocol/http/Http2Stream.cpp#l1270
In current versions (all versions based on zlib 1.2.11, at least
since 2018) it no longer uses 64K hash and does not force window
bits to 13 if it is less than 13. That is, it needs just 16 bytes
more memory than normal zlib, so these bytes are simply added to
the normal size calculation.
In limit_req, auth_delay, and upstream code to check for broken
connections, tests for possible connection close by the client
did not work if the connection was already closed when relevant
event handler was set. This happened because there were no additional
events in case of edge-triggered event methods, and read events
were disabled in case of level-triggered ones.
Fix is to explicitly post a read event if the c->read->ready flag
is set.
For new data to be reported with eventport on Solaris,
ngx_handle_read_event() needs to be called after reading response
headers. To do so, ngx_http_upstream_process_non_buffered_upstream() is
now called unconditionally if there are no preread data. This
won't cause any read() syscalls as long as the upstream connection
is not ready for reading (c->read->ready is not set), but will result
in proper handling of all events.
If we need to be notified about further events, ngx_handle_read_event()
needs to be called after a read event is processed. Without this,
an event can be removed from the kernel and won't be reported again,
notably when using oneshot event methods, such as eventport on Solaris.
While here, error handling is also added, similar to one present in
ngx_resolver_tcp_read(). This is not expected to make a difference
and mostly added for consistency.
If an attempt is made to delete an event which was already reported,
port_dissociate() returns an error. Fix is to avoid doing anything if
ev->active is not set.
Possible alternative approach would be to avoid calling ngx_del_event()
at all if ev->active is not set. This approach, however, will require
something else to re-add the other event of the connection, since both
read and write events are dissociated if an event is reported on a file
descriptor. Currently ngx_eventport_del_event() re-associates write
event if called to delete read event, and vice versa.
If, at the start of an event loop iteration, there are any timers
in the past (including timers expiring now), the ngx_process_events()
function is called with zero timeout, and returns immediately even
if there are no events. But the following code only calls
ngx_event_expire_timers() if time actually changed, so this results
in nginx spinning in the event loop till current time changes.
While such timers are not expected to appear under normal conditions,
as all such timers should be removed on previous event loop iterations,
they still can appear due to bugs, zero timeouts set in the configuration
(if this is not explicitly handled by the code), or due to external
time changes on systems without clock_gettime(CLOCK_MONOTONIC).
Fix is to call ngx_event_expire_timers() unconditionally. Calling
it on each event loop iteration is not expected to be significant from
a performance point of view, especially compared to a syscall in
ngx_process_events().