After fe919fd63b0b, processing QUIC streams was postponed until after handshake
completion, which means that 0-RTT is effectively off. With ssl_ocsp enabled,
it could be further delayed. This differs from how OCSP validation works with
SSL_read_early_data(). With this change, processing QUIC streams is unlocked
when obtaining 0-RTT secret.
The sent queue is sorted by packet number. It is possible to avoid
traversing full queue while handling ack ranges. It makes sense to
start traversing from the queue head (i.e. check oldest packets first).
With this patch, all traffic over a QUIC connection is compared to traffic
over QUIC streams. As long as total traffic is many times larger than stream
traffic, we consider this to be a flood.
With this patch, all traffic over HTTP/3 bidi and uni streams is counted in
the h3c->total_bytes field, and payload traffic is counted in the
h3c->payload_bytes field. As long as total traffic is many times larger than
payload traffic, we consider this to be a flood.
Request header traffic is counted as if all fields are literal. Response
header traffic is counted as is.
Checking the reset after encryption avoids false positives. More importantly,
it avoids the check entirely in the usual case where decryption succeeds.
RFC 9000, 10.3.1 Detecting a Stateless Reset
Endpoints MAY skip this check if any packet from a datagram is
successfully processed.
As per RFC 9000:
An endpoint that receives a STOP_SENDING frame MUST send a RESET_STREAM
frame if the stream is in the "Ready" or "Send" state.
An endpoint SHOULD copy the error code from the STOP_SENDING frame to
the RESET_STREAM frame it sends, but it can use any application error code.
The flag indicates that the entire response was sent to the socket up to the
last_buf flag. The flag is only usable for protocol implementations that call
ngx_http_write_filter() from header filter, such as HTTP/1.x and HTTP/3.
Similar to the previous change, a segmentation fault occurres when evaluating
SSL certificates on a QUIC connection due to an uninitialized stream session.
The fix is to adjust initializing the QUIC part of a connection until after
it has session and variables initialized.
Similarly, this appends logging error context for QUIC connections:
- client 127.0.0.1:54749 connected to 127.0.0.1:8880 while handling frames
- quic client timed out (60: Operation timed out) while handling quic input
A QUIC connection doesn't have c->log->data and friends initialized to sensible
values. Yet, a request can be created in the certificate callback with such an
assumption, which leads to a segmentation fault due to null pointer dereference
in ngx_http_free_request(). The fix is to adjust initializing the QUIC part of
a connection such that it has all of that in place.
Further, this appends logging error context for unsuccessful QUIC handshakes:
- cannot load certificate .. while handling frames
- SSL_do_handshake() failed .. while sending frames
Previously the counter was not incremented for HTTP/3 streams, but still
decremented in ngx_http_close_connection(). There are two solutions here, one
is to increment the counter for HTTP/3 streams, and the other one is not to
decrement the counter for HTTP/3 streams. The latter solution looks
inconsistent with ngx_stat_reading/ngx_stat_writing, which are incremented on a
per-request basis. The change adds ngx_stat_active increment for HTTP/3
request and push streams.
Notably, it is to avoid setting the TCP_NODELAY flag for QUIC streams
in ngx_http_upstream_send_response(). It is an invalid operation on
inherently SOCK_DGRAM sockets, which leads to QUIC connection close.
The change reduces diff to the default branch in stream content phase.
This function was only referenced from ngx_http_v3_create_push_request() to
initialize push connection log. Now the log handler is copied from the parent
request connection.
The change reduces diff to the default branch.
The functions ngx_quic_handle_read_event() and ngx_quic_handle_write_event()
are added. Previously this code was a part of ngx_handle_read_event() and
ngx_handle_write_event().
The change simplifies ngx_handle_read_event() and ngx_handle_write_event()
by moving QUIC-related code to a QUIC source file.
Previously it had -1 as fd. This fixes proxying, which relies on downstream
connection having a real fd. Also, this reduces diff to the default branch for
ngx_close_connection().
The request body filter chain is no longer called after processing
a DATA frame. Instead, we now post a read event to do this. This
ensures that multiple small DATA frames read during the same event loop
iteration are coalesced together, resulting in much faster processing.
Since rb->buf can now contain unprocessed data, window update is no
longer sent in ngx_http_v2_state_read_data() in case of flow control
being used due to filter buffering. Instead, window will be updated
by ngx_http_v2_read_client_request_body_handler() in the posted read
event.
Following rb->filter_need_buffering changes, request body reading is
only finished after the filter chain is called and rb->last_saved is set.
As such, with r->request_body_no_buffering, timer on fc->read is no
longer removed when the last part of the body is received, potentially
resulting in incorrect behaviour.
The fix is to call ngx_http_v2_process_request_body() from the
ngx_http_v2_read_unbuffered_request_body() function instead of
directly calling ngx_http_v2_filter_request_body(), so the timer
is properly removed.
In the body read handler, the window was incorrectly calculated
based on the full buffer size instead of the amount of free space
in the buffer. If the request body is buffered by a filter, and
the buffer is not empty after the read event is generated by the
filter to resume request body processing, this could result in
"http2 negative window update" alerts.
Further, in the body ready handler and in ngx_http_v2_state_read_data()
the buffer wasn't cleared when the data were already written to disk,
so the client might stuck without window updates.
If a MAX_DATA frame was received before any stream was created, then the worker
process would crash in nginx_quic_handle_max_data_frame() while traversing the
stream tree. The issue is solved by adding a check that makes sure the tree is
not empty.
If a filter wants to buffer the request body during reading (for
example, to check an external scanner), it can now do so. To make
it possible, the code now checks rb->last_saved (introduced in the
previous change) along with rb->rest == 0.
Since in HTTP/2 this requires flow control to avoid overflowing the
request body buffer, so filters which need buffering have to set
the rb->filter_need_buffering flag on the first filter call. (Note
that each filter is expected to call the next filter, so all filters
will be able set the flag if needed.)
It indicates that the last buffer was received by the save filter,
and can be used to check this at higher levels. To be used in the
following changes.
If due to an error ngx_http_request_body_save_filter() is called
more than once with rb->rest == 0, this used to result in a segmentation
fault. Added an alert to catch such errors, just in case.
Previously, fully preread unbuffered requests larger than client body
buffer size were saved to disk, despite the fact that "unbuffered" is
expected to imply no disk buffering.
The save body filter saves the request body to disk once the buffer is full.
Yet in HTTP/2 this might happen even if there is no need to save anything
to disk, notably when content length is known and the END_STREAM flag is
sent in a separate empty DATA frame. Workaround is to provide additional
byte in the buffer, so saving the request body won't be triggered.
This fixes unexpected request body disk buffering in HTTP/2 observed after
the previous change when content length is known and the END_STREAM flag
is sent in a separate empty DATA frame.
In particular, now the code always uses a buffer limited by
client_body_buffer_size. At the cost of an additional copy it
ensures that small DATA frames are not directly mapped to small
write() syscalls, but rather buffered in memory before writing.
Further, requests without Content-Length are no longer forced
to use temporary files.
With SSL it is possible that an established connection is ready for
reading after the handshake. Further, events might be already disabled
in case of level-triggered event methods. If this happens and
ngx_http_upstream_send_request() blocks waiting for some data from
the upstream, such as flow control in case of gRPC, the connection
will time out due to no read events on the upstream connection.
Fix is to explicitly check the c->read->ready flag if sending request
blocks and post a read event if it is set.
Note that while it is possible to modify ngx_ssl_handshake() to keep
read events active, this won't completely resolve the issue, since
there can be data already received during the SSL handshake
(see 573bd30e46b4).
Hash initialization ignores elements with key.data set to NULL.
Nevertheless, the initial hash bucket size check didn't skip them,
resulting in unnecessary restrictions on, for example, variables with
long names and with the NGX_HTTP_VARIABLE_NOHASH flag.
Fix is to update the initial hash bucket size check to skip elements
with key.data set to NULL, similarly to how it is done in other parts
of the code.
Requires OpenSSL 3.0 compiled with "enable-ktls" option. Further, KTLS
needs to be enabled in kernel, and in OpenSSL, either via OpenSSL
configuration file or with "ssl_conf_command Options KTLS;" in nginx
configuration.
On FreeBSD, kernel TLS is available starting with FreeBSD 13.0, and
can be enabled with "sysctl kern.ipc.tls.enable=1" and "kldload ktls_ocf"
to load a software backend, see man ktls(4) for details.
On Linux, kernel TLS is available starting with kernel 4.13 (at least 5.2
is recommended), and needs kernel compiled with CONFIG_TLS=y (with
CONFIG_TLS=m, which is used at least on Ubuntu 21.04 by default,
the tls module needs to be loaded with "modprobe tls").
While clock_gettime(CLOCK_MONOTONIC_COARSE) is faster than
clock_gettime(CLOCK_MONOTONIC), the latter is fast enough on Linux for
practical usage, and the difference is negligible compared to other costs
at each event loop iteration. On the other hand, CLOCK_MONOTONIC_COARSE
causes various issues with typical CONFIG_HZ=250, notably very inaccurate
limit_rate handling in some edge cases (ticket #1678) and negative difference
between $request_time and $upstream_response_time (ticket #1965).
This is a recommended behavior by RFC 7301 and is useful
for mitigation of protocol confusion attacks [1].
To avoid possible negative effects, list of supported protocols
was extended to include all possible HTTP protocol ALPN IDs
registered by IANA [2], i.e. "http/1.0" and "http/0.9".
[1] https://alpaca-attack.com/
[2] https://www.iana.org/assignments/tls-extensiontype-values/
In b87b7092cedb (nginx 1.21.1), logging level of "upstream sent invalid
header" errors was accidentally changed to "info". This change restores
the "error" level, which is a proper logging level for upstream-side
errors.
The u->keepalive flag is initialized early if the response has no body
(or an empty body), and needs to be reset if there are any extra data,
similarly to how it is done in ngx_http_proxy_copy_filter(). Missed
in 83c4622053b0.
The "proxy_half_close" directive enables handling of TCP half close. If
enabled, connection to proxied server is kept open until both read ends get
EOF. Write end shutdown is properly transmitted via proxy.
Do this only when the entire request body is empty and
r->request_body_in_file_only is set.
The issue manifested itself with missing warning "a client request body is
buffered to a temporary file" when the entire rb->buf is full and all buffers
are delayed by a filter.
This adds new Auth-SSL-Protocol and Auth-SSL-Cipher headers to
the mail proxy auth protocol when SSL is enabled.
This can be useful for detecting users using older clients that
negotiate old ciphers when you want to upgrade to newer
TLS versions of remove suppport for old and insecure ciphers.
You can use your auth backend to notify these users before the
upgrade that they either need to upgrade their client software
or contact your support team to work out an upgrade path.
To load old/weak server or client certificates it might be needed to adjust
the security level, as introduced in OpenSSL 1.1.0. This change ensures that
ciphers are set before loading the certificates, so security level changes
via the cipher string apply to certificate loading.
Export ciphers are forbidden to negotiate in TLS 1.1 and later protocol modes.
They are disabled since OpenSSL 1.0.2g by default unless explicitly configured
with "enable-weak-ssl-ciphers", and completely removed in OpenSSL 1.1.0.
A new behaviour was introduced in OpenSSL 1.1.1e, when a peer does not send
close_notify before closing the connection. Previously, it was to return
SSL_ERROR_SYSCALL with errno 0, known since at least OpenSSL 0.9.7, and is
handled gracefully in nginx. Now it returns SSL_ERROR_SSL with a distinct
reason SSL_R_UNEXPECTED_EOF_WHILE_READING ("unexpected eof while reading").
This leads to critical errors seen in nginx within various routines such as
SSL_do_handshake(), SSL_read(), SSL_shutdown(). The behaviour was restored
in OpenSSL 1.1.1f, but presents in OpenSSL 3.0 by default.
Use of the SSL_OP_IGNORE_UNEXPECTED_EOF option added in OpenSSL 3.0 allows
to set a compatible behaviour to return SSL_ERROR_ZERO_RETURN:
https://git.openssl.org/?p=openssl.git;a=commitdiff;h=09b90e0
See for additional details: https://github.com/openssl/openssl/issues/11381
The OPENSSL_SUPPRESS_DEPRECATED macro is used to suppress deprecation warnings.
This covers Session Tickets keys, SSL Engine, DH low level API for DHE ciphers.
Unlike OPENSSL_API_COMPAT, it works well with OpenSSL built with no-deprecated.
In particular, it doesn't unhide various macros in OpenSSL includes, which are
meant to be hidden under OPENSSL_NO_DEPRECATED.
The only consumer is a callback function for SSL_CTX_set_tmp_rsa_callback()
deprecated in OpenSSL 1.1.0. Now the function is conditionally compiled too.
The latest HTTP/1.1 draft describes Transfer-Encoding in HTTP/1.0 as having
potentially faulty message framing as that could have been forwarded without
handling of the chunked encoding, and forbids processing subsequest requests
over that connection: https://github.com/httpwg/http-core/issues/879.
While handling of such requests is permitted, the most secure approach seems
to reject them.
The c->read->ready and c->write->ready flags might be reset during
the handshake, and not set again if the handshake was finished on
the other event. At the same time, some data might be read from
the socket during the handshake, so missing c->read->ready flag might
result in a connection hang, for example, when waiting for an SMTP
greeting (which was already received during the handshake).
Found by Sergey Kandaurov.
Previously, when cleaning up a QUIC stream in shutdown mode,
ngx_quic_shutdown_quic() was called, which could close the QUIC connection
right away. This could be a problem if the connection was referenced up the
stack. For example, this could happen in ngx_quic_init_streams(),
ngx_quic_close_streams(), ngx_quic_create_client_stream() etc.
With a typical HTTP/3 client the issue is unlikely because of HTTP/3 uni
streams which need a posted event to close. In this case QUIC connection
cannot be closed right away.
Now QUIC connection read event is posted and it will shut down the connection
asynchronously.
After receiving GOAWAY, client is not supposed to create new streams. However,
until client reads this frame, we allow it to create new streams, which are
gracefully rejected. To prevent client from abusing this algorithm, a new
limit is introduced. Upon reaching keepalive_requests * 2, server now closes
the entire QUIC connection claiming excessive load.
The "hq" mode is HTTP/0.9-1.1 over QUIC. The following limits are introduced:
- uni streams are not allowed
- keepalive_requests is enforced
- keepalive_time is enforced
In case of error, QUIC connection is finalized with 0x101 code. This code
corresponds to HTTP/3 General Protocol Error.
Previously, in-flight byte counter and congestion window were properly
maintained, but the limit was not properly implemented.
Now a new datagram is sent only if in-flight byte counter is less than window.
The limit is datagram-based, which means that a single datagram may lead to
exceeding the limit, but the next one will not be sent.
Previously, the error was ignored leading to unnecessary retransmits.
Now, unsent frames are returned into output queue, state is reset, and
timer is started for the next send attempt.
As per quic-http-34:
Endpoints SHOULD create the HTTP control stream as well as the
unidirectional streams required by mandatory extensions (such as the
QPACK encoder and decoder streams) first, and then create additional
streams as allowed by their peer.
Previously, client could create and destroy additional uni streams unlimited
number of times before creating mandatory streams.
OpenSSL is known to provide read keys for an encryption level before the
level is active in TLS, following the old BoringSSL API. In BoringSSL,
it was then fixed to defer releasing read keys until QUIC may use them.
The directive enables usage of UDP segmentation offloading by quic.
By default, gso is disabled since it is not always operational when
detected (depends on interface configuration).
To improve output performance, UDP segmentation offloading is used
if available. If there is a significant amount of data in an output
queue and path is verified, QUIC packets are not sent one-by-one,
but instead are collected in a buffer, which is then passed to kernel
in a single sendmsg call, using UDP GSO. Such method greatly decreases
number of system calls and thus system load.
Additionally, the ngx_init_srcaddr_cmsg() function is introduced which
initializes control message with connection local address.
The NGX_HAVE_ADDRINFO_CMSG macro is defined when at least one of methods
to deal with corresponding control message is available.
The ngx_wsasend_chain() and ngx_wsarecv_chain() functions were
modified to use only preallocated memory, and the number of
preallocated wsabufs was increased to 64.
Sometimes, QUIC packets need to be of certain (or minimal) size. This is
achieved by adding PADDING frames. It is possible, that adding padding will
affect header size, thus forcing us to recalculate padding size once more.
In d1bde5c3c5d2, the number of preallocated iovec's for ngx_readv_chain()
was increased. Still, in some setups, the function might allocate memory
for iovec's from a connection pool, which is only freed when closing the
connection.
The ngx_readv_chain() function was modified to use only preallocated
memory, similarly to the ngx_writev_chain() change in 8e903522c17a.
Renamed header -> field per quic-qpack naming convention, in particular:
- Header Field -> Field Line
- Header Block -> (Encoded) Field Section
- Without Name Reference -> With Literal Name
- Header Acknowledgement -> Section Acknowledgment
Control characters (0x00-0x1f, 0x7f) and space are not expected to appear
in the Host header. Requests with such characters in the Host header are
now unconditionally rejected.
In 71edd9192f24 logging of invalid headers which were rejected with the
NGX_HTTP_PARSE_INVALID_HEADER error was restricted to just the "client
sent invalid header line" message, without any attempts to log the header
itself.
This patch returns logging of the header up to the invalid character and
the character itself. The r->header_end pointer is now properly set
in all cases to make logging possible.
The same logging is also introduced when parsing headers from upstream
servers.
Control characters (0x00-0x1f, 0x7f), space, and colon were never allowed in
header names. The only somewhat valid use is header continuation which nginx
never supported and which is explicitly obsolete by RFC 7230.
Previously, such headers were considered invalid and were ignored by default
(as per ignore_invalid_headers directive). With this change, such headers
are unconditionally rejected.
It is expected to make nginx more resilient to various attacks, in particular,
with ignore_invalid_headers switched off (which is inherently unsecure, though
nevertheless sometimes used in the wild).
Control characters (0x00-0x1f, 0x7f) were never allowed in URIs, and must
be percent-encoded by clients. Further, these are not believed to appear
in practice. On the other hand, passing such characters might make various
attacks possible or easier, despite the fact that currently allowed control
characters are not significant for HTTP request parsing.
From now on, requests with spaces in URIs are immediately rejected rather
than allowed. Spaces were allowed in 31e9677b15a1 (0.8.41) to handle bad
clients. It is believed that now this behaviour causes more harm than
good.
Per RFC 3986 only the following characters are allowed in URIs unescaped:
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
And "%" can appear as a part of escaping itself. The following
characters are not allowed and need to be escaped: %00-%1F, %7F-%FF,
" ", """, "<", ">", "\", "^", "`", "{", "|", "}".
Not escaping ">" is known to cause problems at least with MS Exchange (see
http://nginx.org/pipermail/nginx-ru/2010-January/031261.html) and in
Tomcat (ticket #2191).
The patch adds escaping of the following chars in all URI parts: """, "<",
">", "\", "^", "`", "{", "|", "}". Note that comments are mostly preserved
to outline important characters being escaped.
HTTP clients are not allowed to generate such requests since Transfer-Encoding
introduction in RFC 2068, and they are not expected to appear in practice
except in attempts to perform a request smuggling attack. While handling of
such requests is strictly defined, the most secure approach seems to reject
them.
No valid CONNECT requests are expected to appear within nginx, since it
is not a forward proxy. Further, request line parsing will reject
proper CONNECT requests anyway, since we don't allow authority-form of
request-target. On the other hand, RFC 7230 specifies separate message
length rules for CONNECT which we don't support, so make sure to always
reject CONNECTs to avoid potential abuse.
Previously, TRACE requests were rejected before parsing Transfer-Encoding.
This is not important since keepalive is not enabled at this point anyway,
though rejecting such requests after properly parsing other headers is
less likely to cause issues in case of further code changes.
After 2096b21fcd10, a single RST_STREAM(NO_ERROR) may not result in an error.
This change removes several unnecessary ctx->type checks for such a case.
As per quic-http-34, these are the cases when this error should be generated:
If an endpoint receives a second SETTINGS frame
on the control stream, the endpoint MUST respond with a connection
error of type H3_FRAME_UNEXPECTED
SETTINGS frames MUST NOT be sent on any stream other than the control
stream. If an endpoint receives a SETTINGS frame on a different
stream, the endpoint MUST respond with a connection error of type
H3_FRAME_UNEXPECTED.
A client MUST NOT send a PUSH_PROMISE frame. A server MUST treat the
receipt of a PUSH_PROMISE frame as a connection error of type
H3_FRAME_UNEXPECTED; see Section 8.
The MAX_PUSH_ID frame is always sent on the control stream. Receipt
of a MAX_PUSH_ID frame on any other stream MUST be treated as a
connection error of type H3_FRAME_UNEXPECTED.
Receipt of an invalid sequence of frames MUST be treated as a
connection error of type H3_FRAME_UNEXPECTED; see Section 8. In
particular, a DATA frame before any HEADERS frame, or a HEADERS or
DATA frame after the trailing HEADERS frame, is considered invalid.
A CANCEL_PUSH frame is sent on the control stream. Receiving a
CANCEL_PUSH frame on a stream other than the control stream MUST be
treated as a connection error of type H3_FRAME_UNEXPECTED.
The GOAWAY frame is always sent on the control stream.
The quic-http-34 is ambiguous as to what error should be generated for the
first frame in control stream:
Each side MUST initiate a single control stream at the beginning of
the connection and send its SETTINGS frame as the first frame on this
stream. If the first frame of the control stream is any other frame
type, this MUST be treated as a connection error of type
H3_MISSING_SETTINGS.
If a DATA frame is received on a control stream, the recipient MUST
respond with a connection error of type H3_FRAME_UNEXPECTED.
If a HEADERS frame is received on a control stream, the recipient MUST
respond with a connection error of type H3_FRAME_UNEXPECTED.
Previously, H3_FRAME_UNEXPECTED had priority, but now H3_MISSING_SETTINGS has.
The arguments in the spec sound more compelling for H3_MISSING_SETTINGS.
- Function ngx_quic_control_flow() is introduced. This functions does
both MAX_DATA and MAX_STREAM_DATA flow controls. The function is called
from STREAM and RESET_STREAM frame handlers. Previously, flow control
was only accounted for STREAM. Also, MAX_DATA flow control was not accounted
at all.
- Function ngx_quic_update_flow() is introduced. This function advances flow
control windows and sends MAX_DATA/MAX_STREAM_DATA. The function is called
from RESET_STREAM frame handler, stream cleanup handler and stream recv()
handler.
Recent fixes to SSL shutdown with lingering close (554c6ae25ffc, 1.19.5)
broke logging of SSL variables. To make sure logging of SSL variables
works properly, avoid freeing c->ssl when doing an SSL shutdown before
lingering close.
Reported by Reinis Rozitis
(http://mailman.nginx.org/pipermail/nginx/2021-May/060670.html).
Instead of calling SSL_free() with each return point, introduced a single
place where cleanup happens. As a positive side effect, this fixes two
potential memory leaks on ngx_handle_read_event() and ngx_handle_write_event()
errors where there were no SSL_free() calls (though unlikely practical,
as errors there are only expected to happen due to bugs or kernel issues).
When starting processing a new encoder instruction, the header state is not
memzero'ed because generally it's burdensome. If the header value is empty,
this resulted in inserting a stale value left from the previous instruction.
Based on a patch by Zhiyong Sun.
On Linux, SO_REUSEADDR allows completely duplicate UDP sockets, so using
SO_REUSEADDR when testing configuration results in packets being dropped
if there is an existing traffic on the sockets being tested (ticket #2187).
While dropped packets are expected with UDP, it is better to avoid this
when possible.
With this change, SO_REUSEADDR is no longer set on datagram sockets when
testing configuration.
Since we anyway do not set SO_REUSEPORT when testing configuration
(see ecb5cd305b06), trying to open additional sockets does not make much
sense, as all these additional sockets are expected to result in EADDRINUSE
errors from bind(). On the other hand, there are reports that trying
to open these sockets takes significant time under load: total configuration
testing time greater than 15s was observed in ticket #2188, compared to less
than 1s without load.
With this change, no additional sockets are opened during testing
configuration.
As per quic-transport 34, FINAL_SIZE_ERROR is generated if an endpoint received
a STREAM frame or a RESET_STREAM frame containing a final size that was lower
than the size of stream data that was already received.
Since nginx always uses exactly one entry in the question section of
a DNS query, and never uses compression pointers in this entry, parsing
of a DNS response in ngx_resolver_process_response() does not expect
compression pointers to appear in the question section of the DNS
response. Indeed, compression pointers in the first name of a DNS response
hardly make sense, do not seem to be allowed by RFC 1035 (which says
"a pointer to a prior occurance of the same name", note "prior"), and
were never observed in practice.
Added an explicit check to ngx_resolver_process_response()'s parsing
of the question section to properly report an error if compression pointers
nevertheless appear in the question section.
Instead of checking on each label if we need to place a dot or not,
now it always adds a dot after a label, and reduces the resulting
length afterwards.
Previously, anything with any of the two high bits set were interpreted
as compression pointers. This is incorrect, as RFC 1035 clearly states
that "The 10 and 01 combinations are reserved for future use". Further,
the 01 combination is actually allocated for EDNS extended label type
(see RFC 2671 and RFC 6891), not really used though.
Fix is to reject unrecognized label types rather than misinterpreting
them as compression pointers.
It is believed to be harmless, and in the worst case it uses some
uninitialized memory as a part of the compression pointer length,
eventually leading to the "name is out of DNS response" error.
Generic function ngx_quic_order_bufs() is introduced. This function creates
and maintains a chain of buffers with holes. Holes are marked with b->sync
flag. Several buffers and holes in this chain may share the same underlying
memory buffer.
When processing STREAM frames with this function, frame data is copied only
once to the right place in the stream input chain. Previously data could
be copied twice. First when buffering an out-of-order frame data, and then
when filling stream buffer from ordered frame queue. Now there's only one
data chain for both tasks.
When variables are used in ssl_certificate or ssl_certificate_key, a request
is created in the certificate callback to evaluate the variables, and then
freed. Freeing it, however, updates c->log->action to "closing request",
resulting in confusing error messages like "client timed out ... while
closing request" when a client times out during the SSL handshake.
Fix is to restore c->log->action after calling ngx_http_free_request().
According to profiling, those two are among most frequently called,
so inlining is generally useful, and unrolling should help with it.
Further, this fixes undefined behaviour seen with invalid values.
Inspired by Yu Liu.
Similarly to smtpd_hard_error_limit in Postfix and smtp_max_unknown_commands
in Exim, specifies the number of errors after which the connection is closed.
The change is mostly the same as the SMTP one (04e43d03e153 and 3f5d0af4e40a),
and ensures that nginx is able to properly handle or reject multiple IMAP
commands. The s->cmd field is not really used and set for consistency.
Non-synchronizing literals handling in invalid/unknown commands is limited,
so when a non-synchronizing literal is detected at the end of a discarded
line, the connection is closed.
Only "A-Za-z0-9-._" characters now allowed (which is stricter than what
RFC 3501 requires, but expected to be enough for all known clients),
and tags shouldn't be longer than 32 characters.
Previously, s->backslash was set if any of the arguments was a quoted
string with a backslash character. After successful command parsing
this resulted in all arguments being filtered to remove backslashes.
This is, however, incorrect, as backslashes should not be removed from
IMAP literals. For example:
S: * OK IMAP4 ready
C: a01 login {9}
S: + OK
C: user\name "pass\"word"
S: * BAD internal server error
resulted in "Auth-User: username" instead of "Auth-User: user\name"
as it should.
Fix is to apply backslash filtering on per-argument basis during parsing.
As discussed in the previous change, s->arg_start handling in the "done"
labels of ngx_mail_pop3_parse_command(), ngx_mail_imap_parse_command(),
and ngx_mail_smtp_parse_command() is wrong: s->arg_start cannot be
set there, as it is handled and cleared on all code paths where the
"done" labels are reached. The relevant code is dead and now removed.
Previously, s->arg_start was left intact after invalid IMAP commands,
and this might result in an argument incorrectly added to the following
command. Similarly, s->backslash was left intact as well, leading
to unneeded backslash removal.
For example (LFs from the client are explicitly shown as "<LF>"):
S: * OK IMAP4 ready
C: a01 login "\<LF>
S: a01 BAD invalid command
C: a0000000000\2 authenticate <LF>
S: a00000000002 aBAD invalid command
The backslash followed by LF generates invalid command with s->arg_start
and s->backslash set, the following command incorrectly treats anything
from the old s->arg_start to the space after the command as an argument,
and removes the backslash from the tag. If there is no space, s->arg_end
will be NULL.
Both things seem to be harmless though. In particular:
- This can be used to provide an incorrect argument to a command without
arguments. The only command which seems to look at the single argument
is AUTHENTICATE, and it checks the argument length before trying to
access it.
- Backslash removal uses the "end" pointer, and stops due to "src < end"
condition instead of scanning all the process memory if s->arg_end is
NULL (and arg[0].len is huge).
- There should be no backslashes in unquoted strings.
An obvious fix is to clear s->arg_start and s->backslash on invalid commands,
similarly to how it is done in POP3 parsing (added in 810:e3aa8f305d21) and
SMTP parsing.
This, however, makes it clear that s->arg_start handling in the "done"
label is wrong: s->arg_start cannot be legitimately set there, as it
is expected to be cleared in all possible cases when the "done" label is
reached. The relevant code is dead and will be removed by the following
change.
The change is mostly the same as the SMTP one (04e43d03e153 and 3f5d0af4e40a),
and ensures that nginx is able to properly handle or reject multiple POP3
commands, as required by the PIPELINING capability (RFC 2449). The s->cmd
field is not really used and set for consistency.
There is no need to scan buffer from s->buffer->pos, as we already scanned
the buffer till "p" and wasn't able to find an LF.
There is no real need for this change in SMTP, since it is at most a
microoptimization of a non-common code path. Similar code in IMAP, however,
will have to start scanning from "p" to be correct, since there can be
newlines in IMAP literals.
Previously, if an invalid SMTP command was split between reads, nginx failed
to wait for LF before returning an error, and interpreted the rest of the
command received later as a separate command.
The sw_invalid state in ngx_mail_smtp_parse_command(), introduced in
04e43d03e153, did not work, since ngx_mail_smtp_auth_state() clears
s->state when returning an error due to NGX_MAIL_PARSE_INVALID_COMMAND.
And not clearing s->state will introduce another problem: the rest
of the command would trigger duplicate error when rest of the command is
received.
Fix is to return NGX_AGAIN from ngx_mail_smtp_parse_command() until full
command is received.
Previously, if there were some pipelined SMTP data in the buffer when
a proxied connection with the backend was established, nginx called
ngx_mail_proxy_handler() to send these data, and not tried to send the
response to the last command. In most cases, this response was later sent
along with the response to the pipelined command, but if for some reason
client decides to wait for the response before finishing the next command
this might result in a connection hang.
Fix is to always call ngx_mail_proxy_handler() to send the response, and
additionally post an event to send the pipelined data if needed.
When using server push, a segfault occured because
ngx_http_v3_create_push_request() accessed ngx_http_v3_session_t object the old
way. Prior to 9ec3e71f8a61, HTTP/3 session was stored directly in c->data.
Now it's referenced by the v3_session field of ngx_http_connection_t.
With this change, it is now possible to use ngx_conf_merge_ptr_value()
to merge complex values. This change follows much earlier changes in
ngx_conf_merge_ptr_value() and ngx_conf_set_str_array_slot()
in 1452:cd586e963db0 (0.6.10) and 1701:40d004d95d88 (0.6.22), and the
change in ngx_conf_set_keyval_slot() (7728:485dba3e2a01, 1.19.4).
To preserve compatibility with existing 3rd party modules, both NULL
and NGX_CONF_UNSET_PTR are accepted for now.
Client IDs cannot be reused on different paths. This change allows to reuse
client id previosly seen on the same path (but with different dcid) in case
when no unused client IDs are available.
According to quic-transport, 9.1:
PATH_CHALLENGE, PATH_RESPONSE, NEW_CONNECTION_ID, and PADDING frames
are "probing frames", and all other frames are "non-probing frames".
Previously it was parsed in ngx_http_v3_streams.c, while the streams were
parsed in ngx_http_v3_parse.c. Now all parsing is done in one file. This
simplifies parsing API and cleans up ngx_http_v3_streams.c.
Previously, an ngx_http_v3_connection_t object was created for HTTP/3 and
then assinged to c->data instead of the generic ngx_http_connection_t object.
Now a direct reference is added to ngx_http_connection_t, which is less
confusing and does not require a flag for http3.
It's used instead of accessing c->quic->parent->data directly. Apart from being
simpler, it allows to change the way session is stored in the future by changing
the macro.
Now ngx_http_v3_ack_header() and ngx_http_v3_inc_insert_count() always generate
decoder error. Our implementation does not use dynamic tables and does not
expect client to send Section Acknowledgement or Insert Count Increment.
Stream Cancellation, on the other hand, is allowed to be sent anyway. This is
why ngx_http_v3_cancel_stream() does not return an error.
The patch adds proper transitions between multiple networking addresses that
can be used by a single quic connection. New networking paths are validated
using PATH_CHALLENGE/PATH_RESPONSE frames.
Due to structure's alignment, some uninitialized memory contents may have
been passed between processes.
Zeroing was removed in 0215ec9aaa8a.
Reported by Johnny Wang.
7.2.1:
If a DATA frame is received on a control stream, the recipient MUST
respond with a connection error of type H3_FRAME_UNEXPECTED;
7.2.2:
If a HEADERS frame is received on a control stream, the recipient MUST
respond with a connection error (Section 8) of type H3_FRAME_UNEXPECTED.
With SMTP pipelining, ngx_mail_read_command() can be called with s->buffer
without any space available, to parse additional commands received to the
buffer on previous calls. Previously, this resulted in recv() being called
with zero length, resulting in zero being returned, which was interpreted
as a connection close by the client, so nginx silently closed connection.
Fix is to avoid calling c->recv() if there is no free space in the buffer,
but continue parsing of the already received commands.
Currently both names are used which is confusing. Historically these were
different objects, but now it's the same one. The name qs (quic stream) makes
more sense than sn (stream node).
PATH_RESPONSE was explicitly forbidden in 0-RTT since at least draft-22, but
the Frame Types table was not updated until recently while in IESG evaluation.
Stop including QUIC headers with no user-serviceable parts inside.
This allows to provide a much cleaner QUIC interface. To cope with that,
ngx_quic_derive_key() is now explicitly exported for v3 and quic modules.
Additionally, this completely hides the ngx_quic_keys_t internal type.
The "ngx_event_quic.h" header file now contains only public definitions,
used by modules. All internal definitions are moved into
the "ngx_event_quic_connection.h" header file.
The function correctly cleans up resources in case of failure to create
initial server id: it removes previously created udp node for odcid from
listening rbtree.
It turns out no browsers implement HTTP/2 GOAWAY handling properly, and
large enough number of resources on a page results in failures to load
some resources. In particular, Chrome seems to experience errors if
loading of all resources requires more than 1 connection (while it
is usually able to retry requests at least once, even with 2 connections
there are occasional failures for some reason), Safari if loading requires
more than 3 connections, and Firefox if loading requires more than 10
connections (can be configured with network.http.request.max-attempts,
defaults to 10).
It does not seem to be possible to resolve this on nginx side, even strict
limiting of maximum concurrency does not help, and loading issues seems to
be triggered by merely queueing of a request for a particular connection.
The only available mitigation seems to use higher keepalive_requests value.
The new default is 1000 and matches previously used default for
http2_max_requests. It is expected to be enough for 99.98% of the pages
(https://httparchive.org/reports/state-of-the-web?start=latest#reqTotal)
even in Chrome.
Similar to lingering_time, it limits total connection lifetime before
keepalive is switched off. The default is 1 hour, which is close to
the total maximum connection lifetime possible with default
keepalive_requests and keepalive_timeout.
Firefox uses several idle streams for PRIORITY frames[1], and
"http2_max_concurrent_streams 1;" results in "client sent too many
PRIORITY frames" errors when a connection is established by Firefox.
Fix is to relax the PRIORITY frames limit to use at least 100 as
the initial value (which is the recommended by the HTTP/2 protocol
minimum limit on the number of concurrent streams, so it is not
unreasonable for clients to assume that similar number of idle streams
can be used for prioritization).
[1] https://hg.mozilla.org/mozilla-central/file/32a9e6e145d6e3071c3993a20bb603a2f388722b/netwerk/protocol/http/Http2Stream.cpp#l1270
In current versions (all versions based on zlib 1.2.11, at least
since 2018) it no longer uses 64K hash and does not force window
bits to 13 if it is less than 13. That is, it needs just 16 bytes
more memory than normal zlib, so these bytes are simply added to
the normal size calculation.
In limit_req, auth_delay, and upstream code to check for broken
connections, tests for possible connection close by the client
did not work if the connection was already closed when relevant
event handler was set. This happened because there were no additional
events in case of edge-triggered event methods, and read events
were disabled in case of level-triggered ones.
Fix is to explicitly post a read event if the c->read->ready flag
is set.
For new data to be reported with eventport on Solaris,
ngx_handle_read_event() needs to be called after reading response
headers. To do so, ngx_http_upstream_process_non_buffered_upstream()
now called unconditionally if there are no prepread data. This
won't cause any read() syscalls as long as upstream connection
is not ready for reading (c->read->ready is not set), but will result
in proper handling of all events.
If we need to be notified about further events, ngx_handle_read_event()
needs to be called after a read event is processed. Without this,
an event can be removed from the kernel and won't be reported again,
notably when using oneshot event methods, such as eventport on Solaris.
While here, error handling is also added, similar to one present in
ngx_resolver_tcp_read(). This is not expected to make a difference
and mostly added for consistency.
If an attempt is made to delete an event which was already reported,
port_dissociate() returns an error. Fix is avoid doing anything if
ev->active is not set.
Possible alternative approach would be to avoid calling ngx_del_event()
at all if ev->active is not set. This approach, however, will require
something else to re-add the other event of the connection, since both
read and write events are dissociated if an event is reported on a file
descriptor. Currently ngx_eventport_del_event() re-associates write
event if called to delete read event, and vice versa.
If, at the start of an event loop iteration, there are any timers
in the past (including timers expiring now), the ngx_process_events()
function is called with zero timeout, and returns immediately even
if there are no events. But the following code only calls
ngx_event_expire_timers() if time actually changed, so this results
in nginx spinning in the event loop till current time changes.
While such timers are not expected to appear under normal conditions,
as all such timers should be removed on previous event loop iterations,
they still can appear due to bugs, zero timeouts set in the configuration
(if this is not explicitly handled by the code), or due to external
time changes on systems without clock_gettime(CLOCK_MONOTONIC).
Fix is to call ngx_event_expire_timers() unconditionally. Calling
it on each event loop iteration is not expected to be significant from
performance point of view, especially compared to a syscall in
ngx_process_events().
Without explicit handling, a zero timer was actually added, leading to
multiple unneeded syscalls. Further, sending GOAWAY frame early might
be beneficial for clients.
Reported by Sergey Kandaurov.
Unlike in 75e908236701, which added the logic to ngx_http_finalize_request(),
this change moves it to a more generic routine ngx_http_finalize_connection()
to cover cases when a request is finalized with NGX_DONE.
In particular, this fixes unwanted connection transition into the keepalive
state after receiving EOF while discarding request body. With edge-triggered
event methods that means the connection will last for extra seconds as set in
the keepalive_timeout directive.
The response size check introduced in 39501ce97e29 did not take into
account possible padding on DATA frames, resulting in incorrect
"upstream sent response body larger than indicated content length" errors
if upstream server used padding in responses with known length.
Fix is to check the actual size of response buffers produced by the code,
similarly to how it is done in other protocols, instead of checking
the size of DATA frames.
Reported at:
http://mailman.nginx.org/pipermail/nginx-devel/2021-March/013907.html
Currently listener contains rbtree with multiple nodes for single QUIC
connection: each corresponding to specific server id. Each udp node points
to same ngx_connection_t, which points to QUIC connection via c->udp field.
Thus when an event handler is called, it only gets ngx_connection_t with
c->udp pointing to QUIC connection. This makes it hard to obtain actual
node which was used to dispatch packet (it requires to repeat DCID lookup).
Additionally, ngx_quic_connection_t->udp field is only needed to keep a
pointer in c->udp. The node is not added into the tree and does not carry
useful information.
Sometimes it is required to process datagram properties at higher level (i.e.
QUIC is interested in source address which may change and IP options). The
patch adds ngx_udp_dgram_t structure used to pass packet-related information
in c->udp.
Previously, when a new datagram arrived, data were copied from the UDP layer
to the QUIC layer via c->recv() interface. Now UDP buffer is accessed
directly.
OpenSSL 3.0 started to require HKDF-Extract output PRK length pointer
used to represent the amount of data written to contain the length of
the key buffer before the call. EVP_PKEY_derive() documents this.
See HKDF_Extract() internal implementation update in this change:
https://github.com/openssl/openssl/commit/5a285ad
The function ngx_quic_shutdown_connection() waits until all non-cancelable
streams are closed, and then closes the connection. In HTTP/3 cancelable
streams are all unidirectional streams except push streams.
The function is called from HTTP/3 when client reaches keepalive_requests.
In case of long header packets, dcid length was not read correctly.
While there, macros to parse uint64 was fixed as well as format specifiers
to print it in debug mode.
Thanks to Gao Yan <gaoyan09@baidu.com>.
Activated with the "proxy_protocol" directive. Can be combined with
"listen ... proxy_protocol;" and "set_real_ip_from ...;" to pass
client address provided to nginx in the PROXY protocol header.
Activated with the "proxy_protocol" parameter of the "listen" directive.
Obtained information is passed to the auth_http script in Proxy-Protocol-Addr,
Proxy-Protocol-Port, Proxy-Protocol-Server-Addr, and Proxy-Protocol-Server-Port
headers.
Similarly to 40e8ce405859 in the stream module, this reduces the time
accept mutex is held. This also simplifies following changes to
introduce PROXY protocol support.
If we need to be notified about further events, ngx_handle_read_event()
needs to be called after a read event is processed. Without this,
an event can be removed from the kernel and won't be reported again,
notably when using oneshot event methods, such as eventport on Solaris.
For consistency, existing ngx_handle_read_event() call removed from
ngx_mail_read_command(), as this call only covers one of the code paths
where ngx_mail_read_command() returns NGX_AGAIN. Instead, appropriate
processing added to the callers, covering all code paths where NGX_AGAIN
is returned.
As long as a read event is blocked (ignored), ngx_handle_read_event()
needs to be called to make sure no further notifications will be
triggered when using level-triggered event methods, such as select() or
poll().
The "!rev->ready" test seems to be a typo, introduced in the original
commit (719:f30b1a75fd3b). The ngx_handle_write_event() code properly
tests for "rev->ready" instead.
Due to this typo, read events might be unexpectedly removed during
proxying after an event on the other part of the proxied connection.
Catched by mail proxying tests.
The strerrordesc_np() function, introduced in glibc 2.32, provides an
async-signal-safe way to obtain error messages. This makes it possible
to avoid copying error messages.
Previously, systems without sys_nerr (or _sys_nerr) were handled with an
assumption that errors start at 0 and continuous. This is, however, not
something POSIX requires, and not true on some platforms.
Notably, on Linux, where sys_nerr is no longer available for newly linked
binaries starting with glibc 2.32, there are gaps in error list, which
used to stop us from properly detecting maximum errno. Further, on
GNU/Hurd errors start at 0x40000001.
With this change, maximum errno detection is moved to the runtime code,
now able to ignore gaps, and also detects the first error if needed.
This fixes observed "Unknown error" messages as seen on Linux with
glibc 2.32 and on GNU/Hurd.
With this change, behaviour of HTTP/2 becomes even closer to HTTP/1.x,
and client_header_timeout instead of keepalive_timeout is used before
the first request is received.
This fixes HTTP/2 connections being closed even before the first request
if "keepalive_timeout 0;" was used in the configuration; the problem
appeared in f790816a0e87 (1.19.7).
As per quic-transport-34:
An endpoint also restarts its idle timer when sending an ack-eliciting
packet if no other ack-eliciting packets have been sent since last receiving
and processing a packet.
Previously, the timer was set for any packet.
The structure is used to parse an HTTP/3 request. An object of this type is
added to ngx_http_request_t instead of h3_parse generic pointer.
Also, the new field is located outside of the request ephemeral zone to keep it
safe after request headers are parsed.
Previously, PINGs and other frames extended possible keepalive time,
making it possible to keep an open HTTP/2 connection for a long time.
Now the connection is always closed as long as keepalive_timeout expires,
similarly to how it happens in HTTP/1.x.
Note that as a part of this change, incomplete frames are no longer
trigger a separate timeout, so http2_recv_timeout (replaced by
client_header_timeout in previous patches) is essentially cancelled.
The client_header_timeout is, however, used for SSL handshake and
while reading HEADERS frames.
Instead, keepalive_timeout and keepalive_requests are now used. This
is expected to simplify HTTP/2 code and usage. This also matches
directives used by upstream module for all protocols.
In case of default settings, this effectively changes maximum number
of requests per connection from 1000 to 100. This looks acceptable,
especially given that HTTP/2 code now properly supports lingering close.
Further, this changes default keepalive timeout in HTTP/2 from 300 seconds
to 75 seconds. This also looks acceptable, and larger than PING interval
used by Firefox (network.http.spdy.ping-threshold defaults to 58s),
the only browser to use PINGs.
Instead, the client_header_timeout is now used for HTTP/2 reading.
Further, the timeout is changed to be set once till no further data
left to read, similarly to how client_header_timeout is used in other
places.
New connections are marked reusable by ngx_http_init_connection() if there
are no data available for reading. As a result, if SSL is not used,
ngx_http_v2_init() might be called when the connection is marked reusable.
If a HEADERS frame is immediately available for reading, this resulted
in connection being preserved in reusable state with an active request,
and possibly closed later as if during worker shutdown (that is, after
all active requests were finalized).
Fix is to explicitly mark connections non-reusable in ngx_http_v2_init()
instead of (incorrectly) assuming they are already non-reusable.
Found by Sergey Kandaurov.
If ngx_drain_connections() fails to immediately reuse any connections
and there are no free connections, it now additionally tries to reuse
a connection again. This helps to provide at least one free connection
in case of HTTP/2 with lingering close, where merely trying to reuse
a connection once does not free it, but makes it reusable again,
waiting for lingering close.
This is particularly important in HTTP/2, where keepalive connections
are closed with lingering. Before the patch, reusing a keepalive HTTP/2
connection resulted in the connection waiting for lingering close to
remain in the reusable connections queue, preventing ngx_drain_connections()
from closing additional connections.
The patch fixes it by marking the connection reusable again, and so
moving it in the reusable connections queue. Further, it makes actually
possible to reuse such connections if needed.
The "max_ack_delay", "ack_delay_exponent", and "max_udp_payload_size"
transport parameters were not communicated to client.
The "disable_active_migration" and "active_connection_id_limit"
parameters were not saved into zero-rtt context.
18.1. Reserved Transport Parameters
Transport parameters with an identifier of the form "31 * N + 27" for
integer values of N are reserved to exercise the requirement that
unknown transport parameters be ignored. These transport parameters
have no semantics, and can carry arbitrary values.
Setting the timer is brought into compliance with quic-recovery-34. Now it's
set from a single function ngx_quic_set_lost_timer() that takes into account
both loss detection and PTO. The following issues are fixed with this change:
- when in loss detection mode, discarding a context could turn off the
timer forever after switching to the PTO mode
- when in loss detection mode, sending a packet resulted in rescheduling the
timer as if it's always in the PTO mode
As per quic-transport-33:
An endpoint MUST acknowledge all ack-eliciting Initial and Handshake
packets immediately
If a packet carrying Initial or Handshake ACK was lost, a non-immediate ACK
should not be sent later. Instead, client is expected to send a new packet
to acknowledge.
Sending non-immediate ACKs for Initial packets can cause the client to
generate an inflated RTT sample.
The token generation in QUIC is reworked. Single host key is used to generate
all required keys of needed sizes using HKDF.
The "quic_stateless_reset_token_key" directive is removed. Instead, the
"quic_host_key" directive is used, which reads key from file, or sets it
to random bytes if not specified.
The flag was introduced to create type-aware CONNECTION_CLOSE frames,
and now is replaced with frame type information, directly accessible.
Notably, this fixes type logging for received frames in b3d9e57d0f62.
The flag is used in ngx_http_finalize_connection() to switch client connection
to the keepalive mode. Since eaea7dac3292 this code is not executed for HTTP/3
which allows us to revert the change and get back to the default branch code.
This part somehow slipped away from c5840ca2063d.
While it is not expected to be needed in case of lingering close,
it is good to keep it for correctness (see 2b5528023f6b).
The change reduces diff to the default branch for
src/http/ngx_http_request_body.c.
Also, client Content-Length, if present, is now checked against the real body
size sent by client.
- split ngx_quic_process_packet() in two functions with the second one called
ngx_quic_process_payload() in charge of decrypring and handling the payload
- renamed ngx_quic_payload_handler() to ngx_quic_handle_frames()
- moved error cleanup from ngx_quic_input() to ngx_quic_process_payload()
- moved handling closed connection from ngx_quic_handle_frames() to
ngx_quic_process_payload()
- minor fixes
Previously, quic connection object was created when Retry packet was sent.
This is neither necessary nor convenient, and contradicts the idea of retry:
protecting from bad clients and saving server resources.
Now, the connection is not created, token is verified cryptographically
instead of holding it in connection.
This function should be called at the end of an event handler to prepare the
event for the next handler call. Particularly, the "active" flag is set or
cleared depending on data availability.
With this call missing in one code path, read handler was not called again
after handling the initial part of the client request, if the request was too
big to fit into a single STREAM frame.
Now ngx_handle_read_event() is called in this code path. Also, read timer is
restarted.
Keeping post_accept_timeout in ngx_listening_t is no longer needed since
we've switched to 1 second timeout for deferred accept in 5541:fdb67cfc957d.
Further, using it in HTTP code can result in client_header_timeout being
used from an incorrect server block, notably if address-specific virtual
servers are used along with a wildcard listening socket, or if we've switched
to a different server block based on SNI in SSL handshake.
The stub status module and ngx_http_send_response() (used by the empty gif
module and the "return" directive) incorrectly assumed that responding
to HEAD requests always results in r->header_only being set. This is not
true, and results in incorrect behaviour, for example, in the following
configuration:
location / {
image_filter size;
return 200 test;
}
Fix is to remove this incorrect micro-optimization from both stub status
module and ngx_http_send_response().
Reported by Chris Newton.
After 7675:9afa45068b8f and 7678:bffcc5af1d72 (1.19.1), during non-buffered
simple proxying, responses with extra data might result in zero size buffers
being generated and "zero size buf" alerts in writer. This bug is similar
to the one with FastCGI proxying fixed in 7689:da8d758aabeb.
In non-buffered mode, normally the filter function is not called if
u->length is already 0, since u->length is checked after each call of
the filter function. There is a case when this can happen though: if
the response length is 0, and there are pre-read response body data left
after reading response headers. As such, a check for u->length is needed
at the start of non-buffered filter functions, similar to the one
for p->length present in buffered filter functions.
Appropriate checks added to the existing non-buffered copy filters
in the upstream (used by scgi and uwsgi proxying) and proxy modules.
A header with the name containing null, CR, LF, colon or uppercase characters,
is now considered an error. A header with the value containing null, CR or LF,
is also considered an error.
Also, header is considered invalid unless its name only contains lowercase
characters, digits, minus and optionally underscore. Such header can be
optionally ignored.
- :method, :path and :scheme are expected exactly once and not empty
- :method and :scheme character validation is added
- :authority cannot appear more than once
Notably, the version negotiation table is updated to reject draft-33/QUICv1
(which requires a new TLS codepoint) unless explicitly asked to built with.
The quic kernel bpf helper inspects packet payload for DCID, extracts key
and routes the packet into socket matching the key.
Due to reuseport feature, each worker owns a personal socket, which is
identified by the same key, used to create DCID.
BPF objects are locked in RAM and are subject to RLIMIT_MEMLOCK.
The "ulimit -l" command may be used to setup proper limits, if maps
cannot be created with EPERM or updated with ETOOLONG.
With introduction of open_file_cache in 1454:f497ed7682a7, opening a file
with ngx_open_cached_file() automatically adds a cleanup handler to close
the file. As such, calling ngx_close_file() directly for non-regular files
is no longer needed and will result in duplicate close() call.
In 1454:f497ed7682a7 ngx_close_file() call for non-regular files was removed
in the static module, but wasn't in the flv module. And the resulting
incorrect code was later copied to the mp4 module. Fix is to remove the
ngx_close_file() call from both modules.
Reported by Chris Newton.
The ngx_http_parse_complex_uri() function cannot make URI longer and does
not null-terminate URI, so there is no need to allocate an extra byte. This
allocation appears to be a leftover from changes in 461:a88a3e4e158f (0.1.5),
where null-termination of r->uri and many other strings was removed.
When the request line contains request-target in the absolute-URI form,
it can contain path-empty instead of a single slash (see RFC 7230, RFC 3986).
Previously, the ngx_http_parse_request_line() function only accepted empty
path when there was no query string.
With this change, non-empty query is also correctly handled. That is,
request line "GET http://example.com?foo HTTP/1.1" is accepted and results
in $uri "/" and $args "foo".
Note that $request_uri remains "?foo", similarly to how spaces in URIs
are handled. Providing "/?foo", similarly to how "/" is provided for
"GET http://example.com HTTP/1.1", requires allocation.
Previously, when processing client ACK, rtt could be calculated for a packet
different than the largest if it was missing in the sent chain. Even though
this is an unlikely situation, rtt based on a different packet could be larger
than needed leading to bigger pto timeout and performance degradation.
Previously, this only worked for Application level because before
quic-transport-30, there were the following constraints:
Because the receiver doesn't use the ACK Delay for Initial and Handshake
packets, a sender SHOULD send a value of 0.
When adjusting an RTT sample using peer-reported acknowledgement delays, an
endpoint ... MUST ignore the ACK Delay field of the ACK frame for packets
sent in the Initial and Handshake packet number space.
Now initial output packet is not padded anymore if followed by a handshake
packet. If the datagram is still not big enough to satisfy minimum size
requirements, handshake packet is padded.
The patch replaces c->send() occurences with c->send_chain(), because the
latter accounts for the local address, which may be different if the wildcard
listener is used.
Previously, server sent response to client using address different from
one client connected to.
Notably, this fixes an issue with Chrome that can emit a "certificate_unknown"
alert during the SSL handshake where c->ssl->no_wait_shutdown is not yet set.
The filter is responsible for creating HTTP/3 response header and body.
The change removes differences to the default branch for
ngx_http_chunked_filter_module and ngx_http_header_filter_module.
Instead, appropriate format specifier for hexadecimal is used
in ngx_log_debug().
The STREAM frame "data" debug is moved into ngx_quic_log_frame(), similar
to all other frame fields debug.
Previously, the number of next_upstream tries included servers marked
as "down", resulting in "no live upstreams" with the code 502 instead
of the code derived from an attempt to connect to the last tried "up"
server (ticket #2096).
Similarly to the problem fixed in 2096b21fcd10 (ticket #1792),
when a "trailer only" gRPC response (that is, a response with the
END_STREAM flag in the HEADERS frame) was immediately followed by
RST_STREAM(NO_ERROR) in the data preread along with the response
header, RST_STREAM wasn't properly skipped and caused "upstream
rejected request with error 0" errors.
Observed with "unknown service" gRPC errors returned by grpc-go.
Fix is to set ctx->done if we are going to parse additional data,
so the RST_STREAM(NO_ERROR) is properly skipped. Additionally, now
ngx_http_grpc_filter() will complain about frames sent for closed
stream if there are any.
When installing or running from a non-root user it is sometimes required to
override default, compiled in error log path. There was no way to do this
without rebuilding the binary (ticket #147).
This patch introduced "-e" command line option which allows one to override
compiled in error log path.
Header value returned from the HTTP parser is expected to be null-terminated or
have a spare byte after the value bytes. When an empty header value was passed
by client in a literal header representation, neither was true. This could
result in segfault. The fix is to assign a literal empty null-terminated
string in this case.
Thanks to Andrey Kolyshkin.
The largely duplicate type-specific functions ngx_quic_parse_initial_header(),
ngx_quic_parse_handshake_header(), and a missing one for 0-RTT, were merged.
The new order of functions listed in ngx_event_quic_transport.c reflects this.
|_ ngx_quic_parse_long_header - version-invariant long header fields
\_ ngx_quic_supported_version - a helper to decide we can go further
\_ ngx_quic_parse_long_header_v1 - QUICv1-specific long header fields
0-RTT packets previously appeared as Handshake are now logged as appropriate:
*1 quic packet rx long flags:db version:ff00001d
*1 quic packet rx early len:870
Logging SCID/DCID is no longer duplicated as were seen with Initial packets.
If trailers were missing and a chain carrying the last_buf flag had no data
in it, then last HTTP/1 chunk was broken. The problem was introduced while
implementing HTTP/3 response body generation.
The change fixes the issue and reduces diff to the mainline nginx.
Previously, if quic_stateless_reset_token_key was empty or unspecified,
initial stateless reset token was not generated. However subsequent tokens
were generated with empty key, which resulted in error with certain SSL
libraries, for example OpenSSL.
Now a random 32-byte stateless reset token key is generated if none is
specified in the configuration. As a result, stateless reset tokens are now
generated for all server ids.
Previously new dcid was generated in the same memory that was allocated for
qc->dcid when creating the QUIC connection. However this memory was also
referenced by initial_source_connection_id and retry_source_connection_id
transport parameters. As a result these parameters changed their values after
retry which broke the protocol.
Before introduction of request body filter in 42d9beeb22db, the only
possible return code from the ngx_http_request_body_filter() call
without actual buffers was NGX_HTTP_INTERNAL_SERVER_ERROR, and
the code in ngx_http_read_client_request_body() hardcoded the only
possible error to simplify the code of initial call to set rb->rest.
This is no longer true after introduction of request body filters though,
as a request body filter might need to return other errors, such as 403.
Fix is to preserve the error code actually returned by the call
instead of assuming 500.
Added logging before returning NGX_HTTP_INTERNAL_SERVER_ERROR if there
are busy buffers after a request body flush. This should never happen
with current code, though bugs can be introduced by 3rd party modules.
Make sure debugging will be easy enough.
A negotiated version is decoupled from NGX_QUIC_VERSION and, if supported,
now stored in c->quic->version after packets processing. It is then used
to create long header packets. Otherwise, the list of supported versions
(which may be many now) is sent in the Version Negotiation packet.
All packets in the connection are expected to have the same version.
Incoming packets with mismatched version are now rejected.
When doing lingering close, the socket was first shut down for writing,
so SSL shutdown initiated after lingering close was not able to send
the close_notify alerts (ticket #2056).
The fix is to call ngx_ssl_shutdown() before shutting down the socket.
A number of unsigned variables has a special value, usually -1 or some maximum,
which produces huge numeric value in logs and makes them hard to read.
In order to distinguish such values in log, they are casted to the signed type
and printed as literal '-1'.
The client address validation didn't complete with a valid token,
which was broken after packet processing refactoring in d0d3fc0697a0.
An invalid or expired token was treated as a connection error.
Now we proceed as outlined in draft-ietf-quic-transport-32,
section 8.1.3 "Address Validation for Future Connections" below,
which is unlike validating the client address using Retry packets.
When a server receives an Initial packet with an address validation
token, it MUST attempt to validate the token, unless it has already
completed address validation. If the token is invalid then the
server SHOULD proceed as if the client did not have a validated
address, including potentially sending a Retry.
The connection is now closed in this case on internal errors only.
All key handling functionality is moved into ngx_quic_protection.c.
Public structures from ngx_quic_protection.h are now private and new
methods are available to manipulate keys.
A negotiated cipher is cached in QUIC connection from the set secret callback
to avoid calling SSL_get_current_cipher() on each encrypt/decrypt operation.
This also reduces the number of unwanted c->ssl->connection occurrences.
Now "s", "V", and "v" format specifiers may be prefixed with "x" (lowercase)
or "X" (uppercase) to output corresponding data in hexadecimal format.
In collaboration with Maxim Dounin.
Previously, tx ACK frames held ranges in an array of ngx_quic_ack_range_t,
while rx ACK frames held ranges in the serialized format. Now serialized format
is used for both types of frames.
The patch resets ctx->frames queue, which may contain frames. It was possible
that congestion or amplification limits prevented all frames to be sent.
Retransmitted frames could be accounted twice as inflight: first time in
ngx_quic_congestion_lost() called from ngx_quic_resend_frames(), and later
from ngx_quic_discard_ctx().
This accounts for the following change:
* Require expansion of datagrams to ensure that a path supports at
least 1200 bytes:
- During the handshake ack-eliciting Initial packets from the
server need to be expanded
All values are prefixed with name and separated from it using colon.
Multiple values are listed without commas in between.
Rationale: this greatly simplifies log parsing for analysis.
For application level packets, only every second packet is now acknowledged,
respecting max ack delay.
13.2.1 Sending ACK Frames
In order to assist loss detection at the sender, an endpoint SHOULD
generate and send an ACK frame without delay when it receives an ack-
eliciting packet either:
* when the received packet has a packet number less than another
ack-eliciting packet that has been received, or
* when the packet has a packet number larger than the highest-
numbered ack-eliciting packet that has been received and there are
missing packets between that packet and this packet.
13.2.2. Acknowledgement Frequency
A receiver SHOULD send an ACK frame after receiving at least two
ack-eliciting packets.
In some cases it might be needed to reject SSL handshake based on SNI
server name provided, for example, to make sure an invalid certificate
is not returned to clients trying to contact a name-based virtual server
without SSL configured. Previously, a "ssl_ciphers aNULL;" was used for
this. This workaround, however, is not compatible with TLSv1.3, in
particular, when using BoringSSL, where it is not possible to configure
TLSv1.3 ciphers at all.
With this change, the ssl_reject_handshake directive is introduced,
which instructs nginx to reject SSL handshakes with an "unrecognized_name"
alert in a particular server block.
For example, to reject handshake with names other than example.com,
one can use the following configuration:
server {
listen 443 ssl;
ssl_reject_handshake on;
}
server {
listen 443 ssl;
server_name example.com;
ssl_certificate example.com.crt;
ssl_certificate_key example.com.key;
}
The following configuration can be used to reject all SSL handshakes
without SNI server name provided:
server {
listen 443 ssl;
ssl_reject_handshake on;
}
server {
listen 443 ssl;
server_name ~^;
ssl_certificate example.crt;
ssl_certificate_key example.key;
}
Additionally, the ssl_reject_handshake directive makes configuring
certificates for the default server block optional. If no certificates
are configured in the default server for a given listening socket,
certificates must be defined in all non-default server blocks with
the listening socket in question.
Similarly to ssl_conf_command, proxy_ssl_conf_command can be used to
set arbitrary OpenSSL configuration parameters as long as nginx is
compiled with OpenSSL 1.0.2 or later, when connecting to upstream
servers with SSL. Full list of available configuration commands
can be found in the SSL_CONF_cmd manual page
(https://www.openssl.org/docs/man1.1.1/man3/SSL_CONF_cmd.html).
Similarly to ssl_conf_command, proxy_ssl_conf_command (grpc_ssl_conf_command,
uwsgi_ssl_conf_command) can be used to set arbitrary OpenSSL configuration
parameters as long as nginx is compiled with OpenSSL 1.0.2 or later,
when connecting to upstream servers with SSL. Full list of available
configuration commands can be found in the SSL_CONF_cmd manual page
(https://www.openssl.org/docs/man1.1.1/man3/SSL_CONF_cmd.html).
With the ssl_conf_command directive it is now possible to set
arbitrary OpenSSL configuration parameters as long as nginx is compiled
with OpenSSL 1.0.2 or later. Full list of available configuration
commands can be found in the SSL_CONF_cmd manual page
(https://www.openssl.org/docs/man1.1.1/man3/SSL_CONF_cmd.html).
In particular, this allows configuring PrioritizeChaCha option
(ticket #1445):
ssl_conf_command Options PrioritizeChaCha;
It can be also used to configure TLSv1.3 ciphers in OpenSSL,
which fails to configure them via the SSL_CTX_set_cipher_list()
interface (ticket #1529):
ssl_conf_command Ciphersuites TLS_CHACHA20_POLY1305_SHA256;
Configuration commands are applied after nginx own configuration
for SSL, so they can be used to override anything set by nginx.
Note though that configuring OpenSSL directly with ssl_conf_command
might result in a behaviour nginx does not expect, and should be
done with care.
With this change, it is now possible to use ngx_conf_merge_ptr_value()
to merge keyval arrays. This change actually follows much earlier
changes in ngx_conf_merge_ptr_value() and ngx_conf_set_str_array_slot()
in 1452:cd586e963db0 (0.6.10) and 1701:40d004d95d88 (0.6.22).
To preserve compatibility with existing 3rd party modules, both NULL
and NGX_CONF_UNSET_PTR are accepted for now.
13.2.4. Limiting Ranges by Tracking ACK Frames
When a packet containing an ACK frame is sent, the largest
acknowledged in that frame may be saved. When a packet containing an
ACK frame is acknowledged, the receiver can stop acknowledging
packets less than or equal to the largest acknowledged in the sent
ACK frame.
The history of acknowledged packet is kept in send context as ranges.
Up to NGX_QUIC_MAX_RANGES ranges is stored.
As a result, instead of separate ack frames, single frame with ranges
is sent.
Per draft-ietf-quic-transport-32 on the topic:
: Similarly, a server MUST expand the payload of all UDP datagrams carrying
: ack-eliciting Initial packets to at least the smallest allowed maximum
: datagram size of 1200 bytes.
The history of acknowledged packet is kept in send context as ranges.
Up to NGX_QUIC_MAX_RANGES ranges is stored.
As a result, instead of separate ack frames, single frame with ranges
is sent.
Previously, if there were multiple limits configured, errors in
ngx_http_complex_value() during processing of a non-first limit
resulted in reference count leak in shared memory nodes of already
processed limits. Fix is to explicity unlock relevant nodes, much
like we do when rejecting requests.
The proxy_smtp_auth directive instructs nginx to authenticate users
on backend via the AUTH command (using the PLAIN SASL mechanism),
similar to what is normally done for IMAP and POP3.
If xclient is enabled along with proxy_smtp_auth, the XCLIENT command
won't try to send the LOGIN parameter.
In 7717:e3e8b8234f05, the 1st bit was incorrectly used. It shouldn't
be used for bitmask values, as it is used by NGX_CONF_BITMASK_SET.
Additionally, special value "off" added to make it possible to clear
inherited userid_flags value.
The "false" parameter of the proxy_redirect directive is deprecated.
Warning has been emitted since c2230102df6f (0.7.54).
The "off" parameter of the proxy_redirect, proxy_cookie_domain, and
proxy_cookie_path directives tells nginx not to inherit the
configuration from the previous configuration level.
Previously, after specifying the directive with the "off" parameter,
any other directives were ignored, and syntax checking was disabled.
The syntax was enforced to allow either one directive with the "off"
parameter, or several directives with other parameters.
Also, specifying "proxy_redirect default foo" no longer works like
"proxy_redirect default".
Previously, this field was not set while creating a QUIC stream connection.
As a result, calling ngx_connection_local_sockaddr() led to getsockname()
bad descriptor error.
If client acknowledged an Initial packet with CRYPTO frame and then
sent another Initial packet containing duplicate CRYPTO again, this
could result in resending frames off the empty send queue.
The new "quic_stateless_reset_token_key" directive is added. It sets the
endpoint key used to generate stateless reset tokens and enables feature.
If the endpoint receives short-header packet that can't be matched to
existing connection, a stateless reset packet is generated with
a proper token.
If a valid stateless reset token is found in the incoming packet,
the connection is closed.
Example configuration:
http {
quic_stateless_reset_token_key "foo";
...
}
The flag is tied to the initial secret creation. The presence of c->quic
pointer is sufficient to enable execution of ngx_quic_close_quic().
The ngx_quic_new_connection() function now returns the allocated quic
connection object and the c->quic pointer is set by the caller.
If an early error occurs before secrets initialization (i.e. in cases
of invalid retry token or nginx exiting), it is still possible to
generate an error response by trying to initialize secrets directly
in the ngx_quic_send_cc() function.
Before the change such early errors failed to send proper connection close
message and logged an error.
An auxilliary ngx_quic_init_secrets() function is introduced to avoid
verbose call to ngx_quic_set_initial_secret() requiring local variable.
All packet header parsing is now performed by ngx_quic_parse_packet()
function, located in the ngx_quic_transport.c file.
The packet processing is centralized in the ngx_quic_process_packet()
function which decides if the packet should be accepted, ignored or
connection should be closed, depending on the connection state.
As a result of refactoring, behavior has changed in some places:
- minimal size of Initial packet is now always tested
- connection IDs are always tested in existing connections
- old keys are discarded on encryption level switch
Now flags are processed in ngx_quic_input(), and raw->pos points to the first
byte after the flags. Redundant checks from ngx_quic_parse_short_header() and
ngx_quic_parse_long_header() are removed.
Previously, when a packet was declared lost, another packet was sent with the
same frames. Now lost frames are moved to the output frame queue and push
event is posted. This has the advantage of forming packets with more frames
than before.
Also, the start argument is removed from the ngx_quic_resend_frames()
function as excess information.
Previously the default server configuration context was used until the
:authority or host header was parsed. This led to using the configuration
parameters like client_header_buffer_size or request_pool_size from the default
server rather than from the server selected by SNI.
Also, the switch to the right server log is implemented. This issue manifested
itself as QUIC stream being logged to the default server log until :authority
or host is parsed.
Initially, client certificate verification didn't work due to the missing
hc->ssl on a QUIC stream, which is started to be set in 7738:7f0981be07c4.
Then it was lost in 7999:0d2b2664b41c introducing "quic" listen parameter.
This change re-adds hc->ssl back for all QUIC connections, similar to SSL.
As per HTTP/3 draft 30, section 7.2.8:
Frame types that were used in HTTP/2 where there is no corresponding
HTTP/3 frame have also been reserved (Section 11.2.1). These frame
types MUST NOT be sent, and their receipt MUST be treated as a
connection error of type H3_FRAME_UNEXPECTED.
In rare cases, such as memory allocation failure, SSL_set_SSL_CTX() returns
NULL, which could mean that a different SSL configuration has not been set.
Note that this new behaviour seemingly originated in OpenSSL-1.1.0 release.
HTTP/2 code failed to run posted requests after calling the request body
handler, and this resulted in connection hang if a subrequest was created
in the body handler and no other actions were made.
If 400 errors were redirected to an upstream server using the error_page
directive, DATA frames from the client might cause segmentation fault
due to null pointer dereference. The bug had appeared in 6989:2c4dbcd6f2e4
(1.13.0).
Fix is to skip such frames in ngx_http_v2_state_read_data() (similarly
to 7561:9f1f9d6e056a). With the fix, behaviour of 400 errors in HTTP/2
is now similar to one in HTTP/1.x, that is, nginx doesn't try to read the
request body.
Note that proxying 400 errors, as well as other early stage errors, to
upstream servers might not be a good idea anyway. These errors imply
that reading and processing of the request (and the request headers)
wasn't complete, and proxying of such incomplete request might lead to
various errors.
Reported by Chenglong Zhang.
This fixes "SSL_shutdown() failed (SSL: ... bad write retry)" errors
as observed on the second SSL_shutdown() call after SSL shutdown fixes in
09fb2135a589 (1.19.2), notably when HTTP/2 connections are closed due
to read timeouts while there are incomplete writes.
This fixes "SSL_shutdown() failed (SSL: ... bad write retry)" errors
as observed on the second SSL_shutdown() call after SSL shutdown fixes in
09fb2135a589 (1.19.2), notably when sending fails in ngx_http_test_expect(),
similarly to ticket #1194.
Note that there are some places where c->error is misused to prevent
further output, such as ngx_http_v2_finalize_connection() if there
are pending streams, or in filter finalization. These places seem
to be extreme enough to don't care about missing shutdown though.
For example, filter finalization currently prevents keepalive from
being used.
The c->read->ready and c->write->ready flags need to be cleared to ensure
that appropriate read or write events will be reported by kernel. Without
this, SSL shutdown might wait till the timeout after blocking on writing
or reading even if there is a socket activity.
OpenSSL 1.1.1 fails to return SSL_ERROR_SYSCALL if an error happens
during SSL_write() after close_notify alert from the peer, and returns
SSL_ERROR_ZERO_RETURN instead. Broken by this commit, which removes
the "i == 0" check around the SSL_RECEIVED_SHUTDOWN one:
https://git.openssl.org/?p=openssl.git;a=commitdiff;h=8051ab2
In particular, if a client closed the connection without reading
the response but with properly sent close_notify alert, this resulted in
unexpected "SSL_write() failed while ..." critical log message instead
of correct "SSL_write() failed (32: Broken pipe)" at the info level.
Since SSL_ERROR_ZERO_RETURN cannot be legitimately returned after
SSL_write(), the fix is to convert all SSL_ERROR_ZERO_RETURN errors
after SSL_write() to SSL_ERROR_SYSCALL.
If the variant hash doesn't match one we used as a secondary cache key,
we switch back to the original key. In this case, c->body_start was kept
updated from an existing cache node overwriting the new response value.
After file cache update, it led to discrepancy between a cache node and
cache file seen as critical errors "file cache .. has too long header".
As per HTTP/3 draft 29, section 4.1:
Frames of unknown types (Section 9), including reserved frames
(Section 7.2.8) MAY be sent on a request or push stream before,
after, or interleaved with other frames described in this section.
Also, trailers frame is now used as an indication of the request body end.
While for HTTP/1 unexpected eof always means an error, for HTTP/3 an eof right
after a DATA frame end means the end of the request body. For this reason,
since adding HTTP/3 support, eof no longer produced an error right after recv()
but was passed to filters which would make a decision. This decision was made
in ngx_http_parse_chunked() and ngx_http_v3_parse_request_body() based on the
b->last_buf flag.
Now that since 0f7f1a509113 (1.19.2) rb->chunked->length is a lower threshold
for the expected number of bytes, it can be set to zero to indicate that more
bytes may or may not follow. Now it's possible to move the check for eof from
parser functions to ngx_http_request_body_chunked_filter() and clean up the
parsing code.
Also, in the default branch, in case of eof, the following three things
happened, which were replaced with returning NGX_ERROR while implementing
HTTP/3:
- "client prematurely closed connection" message was logged
- c->error flag was set
- NGX_HTTP_BAD_REQUEST was returned
The change brings back this behavior for HTTP/1 as well as HTTP/3.
If a packet sent in response to an initial client packet was lost, then
successive client initial packets were dropped by nginx with the unexpected
dcid message logged. This was because the new DCID generated by the server was
not available to the client.
The check tested the total size of a packet header and unprotected packet
payload, which doesn't include the packet number length and expansion of
the packet protection AEAD. If the packet was corrupted, it could cause
false triggering of the condition due to unsigned type underflow leading
to a connection error.
Existing checks for the QUIC header and protected packet payload lengths
should be enough.
From quic-tls draft, section 5.4.2:
An endpoint MUST discard packets that are not long enough to contain
a complete sample.
The check includes the Packet Number field assumed to be 4 bytes long.
During long packet header parsing, pkt->len is updated with the Length
field value that is used to find next coalesced packets in a datagram.
For short packets it still contained the whole QUIC packet size.
This change uniforms packet length handling to always contain the total
length of the packet number and protected packet payload in pkt->len.
Previously STOP_SENDING was sent to client upon stream closure if rev->eof and
rev->error were not set. This was an indirect indication that no RESET_STREAM
or STREAM fin has arrived. But it is indeed possible that rev->eof is not set,
but STREAM fin has already been received, just not read out by the application.
In this case sending STOP_SENDING does not make sense and can be misleading for
some clients.
The peer may issue additional connection IDs up to the limit defined by
transport parameter "active_connection_id_limit", using NEW_CONNECTION_ID
frames, and retire such IDs using RETIRE_CONNECTION_ID frame.
It is required to distinguish internal errors from corrupted packets and
perform actions accordingly: drop the packet or close the connection.
While there, made processing of ngx_quic_decrypt() erorrs similar and
removed couple of protocol violation errors.
quic-transport
5.2:
Packets that are matched to an existing connection are discarded if
the packets are inconsistent with the state of that connection.
5.2.2:
Servers MUST drop incoming packets under all other circumstances.
The removal of QUIC packet protection depends on the largest packet number
received. When a garbage packet was received, the decoder still updated the
largest packet number from that packet. This could affect removing protection
from subsequent QUIC packets.
As per HTTP/3 draft 29, section 4.1:
When the server does not need to receive the remainder of the request,
it MAY abort reading the request stream, send a complete response, and
cleanly close the sending part of the stream.
On QUIC connections, SSL_shutdown() is used to call the send_alert callback
to send a CONNECTION_CLOSE frame. The reverse side is handled by other means.
At least BoringSSL doesn't differentiate whether this is a QUIC SSL method,
so waiting for the peer's close_notify alert should be explicitly disabled.
The logical quic connection state is tested by handler functions that
process corresponding types of packets (initial/handshake/application).
The packet is declined if state is incorrect.
No timeout is required for the input queue.
If a client attemtps to start a new connection with unsupported version,
a version negotiation packet is sent that contains a list of supported
versions (currently this is a single version, selected at compile time).
The function ngx_http_upstream_check_broken_connection() terminates the HTTP/1
request if client sends eof. For QUIC (including HTTP/3) the c->write->error
flag is now checked instead. This flag is set when the entire QUIC connection
is closed or STOP_SENDING was received from client.
Previously the request body DATA frame header was read by one byte because
filters were called only when the requested number of bytes were read. Now,
after 08ff2e10ae92 (1.19.2), filters are called after each read. More bytes
can be read at once, which simplifies and optimizes the code.
This also reduces diff with the default branch.
Previously, such packets weren't handled as the resulting zero remaining time
prevented setting the loss detection timer, which, instead, could be disarmed.
For implementation details, see quic-recovery draft 29, appendix A.10.
The PTO handler is split into separate PTO and loss detection handlers
that operate interchangeably depending on which timer should be set.
The present ngx_quic_lost_handler is now only used for packet loss detection.
It replaces ngx_quic_pto_handler if there are packets preceeding largest_ack.
Once there is no more such packets, ngx_quic_pto_handler is installed again.
Probes carry unacknowledged data previously sent in the oldest packet number,
one per each packet number space. That is, it could be up to two probes.
PTO backoff is now increased before scheduling next probes.
In particular, this prevents declaring packet number 0 as lost if
there aren't yet any acknowledgements in this packet number space.
For example, only Initial packets were acknowledged in handshake.
Previously a single STREAM frame was created for each buffer in stream output
chain which is wasteful with respect to memory. The following changes were
made in the stream send code:
- ngx_quic_stream_send_chain() no longer calls ngx_quic_stream_send() and got
a separate implementation that coalesces neighbouring buffers into a single
frame
- the new ngx_quic_stream_send_chain() respects the limit argument, which fixes
sendfile_max_chunk and limit_rate
- ngx_quic_stream_send() is reimplemented to call ngx_quic_stream_send_chain()
- stream frame size limit is moved out to a separate function
ngx_quic_max_stream_frame()
- flow control is moved out to a separate function ngx_quic_max_stream_flow()
- ngx_quic_stream_send_chain() is relocated next to ngx_quic_stream_send()
Reworked connections reuse, so closing connections is attempted in
advance, as long as number of free connections is less than 1/16 of
worker connections configured. This ensures that new connections can
be handled even if closing a reusable connection requires some time,
for example, for a lingering close (ticket #2017).
The 1/16 ratio is selected to be smaller than 1/8 used for disabling
accept when working with accept mutex, so nginx will try to balance
new connections to different workers first, and will start reusing
connections only if this won't help.
Previously, reusing connections happened silently and was only
visible in monitoring systems. This was shown to be not very user-friendly,
and administrators often didn't realize there were too few connections
available to withstand the load, and configured timeouts (keepalive_timeout
and http2_idle_timeout) were effectively reduced to keep things running.
To provide at least some information about this, a warning is now logged
(at most once per second, to avoid flooding the logs).
Sending shutdown when ngx_http_test_reading() detects the connection is
closed can result in "SSL_shutdown() failed (SSL: ... bad write retry)"
critical log messages if there are blocked writes.
Fix is to avoid sending shutdown via the c->ssl->no_send_shutdown flag,
similarly to how it is done in ngx_http_keepalive_handler() for kqueue
when pending EOF is detected.
Reported by Jan Prachař
(http://mailman.nginx.org/pipermail/nginx-devel/2018-December/011702.html).
Without the flag, SSL shutdown is attempted on such connections,
resulting in useless work and/or bogus "SSL_shutdown() failed
(SSL: ... bad write retry)" critical log messages if there are
blocked writes.
Previously, bidirectional shutdown never worked, due to two issues
in the code:
1. The code only tested SSL_ERROR_WANT_READ and SSL_ERROR_WANT_WRITE
when there was an error in the error queue, which cannot happen.
The bug was introduced in an attempt to fix unexpected error logging
as reported with OpenSSL 0.9.8g
(http://mailman.nginx.org/pipermail/nginx/2008-January/003084.html).
2. The code never called SSL_shutdown() for the second time to wait for
the peer's close_notify alert.
This change fixes both issues.
Note that after this change bidirectional shutdown is expected to work for
the first time, so c->ssl->no_wait_shutdown now makes a difference. This
is not a problem for HTTP code which always uses c->ssl->no_wait_shutdown,
but might be a problem for stream and mail code, as well as 3rd party
modules.
To minimize the effect of the change, the timeout, which was used to be 30
seconds and not configurable, though never actually used, is now set to
3 seconds. It is also expanded to apply to both SSL_ERROR_WANT_READ and
SSL_ERROR_WANT_WRITE, so timeout is properly set if writing to the socket
buffer is not possible.
If some additional data from a pipelined request happens to be
read into the body buffer, we copy it to r->header_in or allocate
an additional large client header buffer for it.
This ensures that copying won't write more than the buffer size
even if the buffer comes from hc->free and it is smaller than the large
client header buffer size in the virtual host configuration. This might
happen if size of large client header buffers is different in name-based
virtual hosts, similarly to the problem with number of buffers fixed
in 6926:e662cbf1b932.
Creating client-initiated streams is moved from ngx_quic_handle_stream_frame()
to a separate function ngx_quic_create_client_stream(). This function is
responsible for creating streams with lower ids as well.
Also, simplified and fixed initial data buffering in
ngx_quic_handle_stream_frame(). It is now done before calling the initial
handler as the handler can destroy the stream.
Previously this function generated an error trying to figure out if client shut
down the write end of the connection. The reason for this error was that a
QUIC stream has no socket descriptor. However checking for eof is not the
right thing to do for an HTTP/3 QUIC stream since HTTP/3 clients are expected
to shut down the write end of the stream after sending the request.
Now the function handles QUIC streams separately. It checks if c->read->error
is set. The error flags for c->read and c->write are now set for all streams
when closing the QUIC connection instead of setting the pending_eof flag.
According to quic-transport draft 29, section 19.3.1:
The value of the Gap field establishes the largest packet number
value for the subsequent ACK Range using the following formula:
largest = previous_smallest - gap - 2
Thus, given a largest packet number for the range, the smallest value
is determined by the formula:
smallest = largest - ack_range
While here, changed min/max to uint64_t for consistency.
A QUIC stream could be destroyed by handler while in ngx_quic_stream_input().
To detect this, ngx_quic_find_stream() is used to check that it still exists.
Previously, a stream id was passed to this routine off the frame structure.
In case of stream cleanup, it is freed along with other frames belonging to
the stream on cleanup. Then, a cleanup handler reuses last frames to update
MAX_STREAMS and serve other purpose. Thus, ngx_quic_find_stream() is passed
a reused frame with zeroed out part pointed by stream_id. If a stream with
id 0x0 still exists, this leads to use-after-free.
After 05e42236e95b (1.19.1) responses with extra data might result in
zero size buffers being generated and "zero size buf" alerts in writer
(if f->rest happened to be 0 when processing additional stdout data).
The limits on active bidi and uni client streams are maintained at their
initial values initial_max_streams_bidi and initial_max_streams_uni by sending
a MAX_STREAMS frame upon each client stream closure.
Also, the following is changed for data arriving to non-existing streams:
- if a stream was already closed, such data is ignored
- when creating a new stream, all streams of the same type with lower ids are
created too
Previously, the document generated by the xslt filter was always fully sent
to client even if a range was requested and response status was 206 with
appropriate Content-Range.
The xslt module is unable to serve a range because of suspending the header
filter chain. By the moment full response xml is buffered by the xslt filter,
range header filter is not called yet, but the range body filter has already
been called and did nothing.
The fix is to disable ranges by resetting the r->allow_ranges flag much like
the image filter that employs a similar technique.
The ngx_http_perl_module module doesn't have a notion of including additional
search paths through --with-cc-opt, which results in compile error incomplete
type 'enum ssl_encryption_level_t' when building nginx without QUIC support.
The enum is visible from quic event headers and eventually pollutes ngx_core.h.
The fix is to limit including headers to compile units that are real consumers.
According to quic-transport draft 29, section 21.12.1.1:
Prior to validation, endpoints are limited in what they are able to
send. During the handshake, a server cannot send more than three
times the data it receives; clients that initiate new connections or
migrate to a new network path are limited.
The ngx_quic_queue_frame() functions puts a frame into send queue and
schedules a push timer to actually send data.
The patch adds tracking for data amount in the queue and sends data
immediately if amount of data exceeds limit.
Instead of timer-based retransmissions with constant packet lifetime,
this patch implements ack-based loss detection and probe timeout
for the cases, when no ack is received, according to the quic-recovery
draft 29.
The c->quic->retransmit timer is now called "pto".
The ngx_quic_retransmit() function is renamed to "ngx_quic_detect_lost()".
This is a preparation for the following patches.
According to the quic-recovery 29, Section 5: Estimating the Round-Trip Time.
Currently, integer arithmetics is used, which loses sub-millisecond accuracy.
The slice filter allows ranges for the response by setting the r->allow_ranges
flag, which enables the range filter. If the range was not requested, the
range filter adds an Accept-Ranges header to the response to signal the
support for ranges.
Previously, if an Accept-Ranges header was already present in the first slice
response, client received two copies of this header. Now, the slice filter
removes the Accept-Ranges header from the response prior to setting the
r->allow_ranges flag.
As long as the "Content-Length" header is given, we now make sure
it exactly matches the size of the response. If it doesn't,
the response is considered malformed and must not be forwarded
(https://tools.ietf.org/html/rfc7540#section-8.1.2.6). While it
is not really possible to "not forward" the response which is already
being forwarded, we generate an error instead, which is the closest
equivalent.
Previous behaviour was to pass everything to the client, but this
seems to be suboptimal and causes issues (ticket #1695). Also this
directly contradicts HTTP/2 specification requirements.
Note that the new behaviour for the gRPC proxy is more strict than that
applied in other variants of proxying. This is intentional, as HTTP/2
specification requires us to do so, while in other types of proxying
malformed responses from backends are well known and historically
tolerated.
Previous behaviour was to pass everything to the client, but this
seems to be suboptimal and causes issues (ticket #1695). Fix is to
drop extra data instead, as it naturally happens in most clients.
Additionally, we now also issue a warning if the response is too
short, and make sure the fact it is truncated is propagated to the
client. The u->error flag is introduced to make it possible to
propagate the error to the client in case of unbuffered proxying.
For responses to HEAD requests there is an exception: we do allow
both responses without body and responses with body matching the
Content-Length header.
Previous behaviour was to pass everything to the client, but this
seems to be suboptimal and causes issues (ticket #1695). Fix is to
drop extra data instead, as it naturally happens in most clients.
This change covers generic buffered and unbuffered filters as used
in the scgi and uwsgi modules. Appropriate input filter init
handlers are provided by the scgi and uwsgi modules to set corresponding
lengths.
Note that for responses to HEAD requests there is an exception:
we do allow any response length. This is because responses to HEAD
requests might be actual full responses, and it is up to nginx
to remove the response body. If caching is enabled, only full
responses matching the Content-Length header will be cached
(see b779728b180c).
Previously, additional data after final chunk was either ignored
(in the same buffer, or during unbuffered proxying) or sent to the
client (in the next buffer already if it was already read from the
socket). Now additional data are properly detected and ignored
in all cases. Additionally, a warning is now logged and keepalive
is disabled in the connection.
Previous behaviour was to pass everything to the client, but this
seems to be suboptimal and causes issues (ticket #1695). Fix is to
drop extra data instead, as it naturally happens in most clients.
If a memcached response was followed by a correct trailer, and then
the NUL character followed by some extra data - this was accepted by
the trailer checking code. This in turn resulted in ctx->rest underflow
and caused negative size buffer on the next reading from the upstream,
followed by the "negative size buf in writer" alert.
Fix is to always check for too long responses, so a correct trailer cannot
be followed by extra data.
After sending the GOAWAY frame, a connection is now closed using
the lingering close mechanism.
This allows for the reliable delivery of the GOAWAY frames, while
also fixing connection resets observed when http2_max_requests is
reached (ticket #1250), or with graceful shutdown (ticket #1544),
when some additional data from the client is received on a fully
closed connection.
For HTTP/2, the settings lingering_close, lingering_timeout, and
lingering_time are taken from the "server" level.
Previously, the expression (ch & 0x7f) was promoted to a signed integer.
Depending on the platform, the size of this integer could be less than 8 bytes,
leading to overflow when handling the higher bits of the result. Also, sign
bit of this integer could be replicated when adding to the 64-bit st->value.
Previously errors led only to closing streams.
To simplify closing QUIC connection from a QUIC stream context, new macro
ngx_http_v3_finalize_connection() is introduced. It calls
ngx_quic_finalize_connection() for the parent connection.
The function finalizes QUIC connection with an application protocol error
code and sends a CONNECTION_CLOSE frame with type=0x1d.
Also, renamed NGX_QUIC_FT_CONNECTION_CLOSE2 to NGX_QUIC_FT_CONNECTION_CLOSE_APP.
Previously dynamic table was not functional because of zero limit on its size
set by default. Now the following changes enable it:
- new directives to set SETTINGS_QPACK_MAX_TABLE_CAPACITY and
SETTINGS_QPACK_BLOCKED_STREAMS
- send settings with SETTINGS_QPACK_MAX_TABLE_CAPACITY and
SETTINGS_QPACK_BLOCKED_STREAMS to the client
- send Insert Count Increment to the client
- send Header Acknowledgement to the client
- evict old dynamic table entries on overflow
- decode Required Insert Count from client
- block stream if Required Insert Count is not reached
Using SSL_CTX_set_verify(SSL_VERIFY_PEER) implies that OpenSSL will
send a certificate request during an SSL handshake, leading to unexpected
certificate requests from browsers as long as there are any client
certificates installed. Given that ngx_ssl_trusted_certificate()
is called unconditionally by the ngx_http_ssl_module, this affected
all HTTPS servers. Broken by 699f6e55bbb4 (not released yet).
Fix is to set verify callback in the ngx_ssl_trusted_certificate() function
without changing the verify mode.
Client streams may send literal strings which are now limited in size by the
new directive. The default value is 4096.
The directive is similar to HTTP/2 directive http2_max_field_size.
So that connections are protected from failing from on-path attacks.
Decryption failure of long packets used during handshake still leads
to connection close since it barely makes sense to handle them there.
A previously used undefined error code is now replaced with the generic one.
Note that quic-transport prescribes keeping connection intact, discarding such
QUIC packets individually, in the sense that coalesced packets could be there.
This is selectively handled in the next change.
The patch removes remnants of the old state tracking mechanism, which did
not take into account assimetry of read/write states and was not very
useful.
The encryption state now is entirely tracked using SSL_quic_read/write_level().
quic-transport draft 29:
section 7:
* authenticated negotiation of an application protocol (TLS uses
ALPN [RFC7301] for this purpose)
...
Endpoints MUST explicitly negotiate an application protocol. This
avoids situations where there is a disagreement about the protocol
that is in use.
section 8.1:
When using ALPN, endpoints MUST immediately close a connection (see
Section 10.3 of [QUIC-TRANSPORT]) with a no_application_protocol TLS
alert (QUIC error code 0x178; see Section 4.10) if an application
protocol is not negotiated.
Changes in ngx_quic_close_quic() function are required to avoid attempts
to generated and send packets without proper keys, what happens in case
of failed ALPN check.
quic-transport draft 29, section 14:
QUIC depends upon a minimum IP packet size of at least 1280 bytes.
This is the IPv6 minimum size [RFC8200] and is also supported by most
modern IPv4 networks. Assuming the minimum IP header size, this
results in a QUIC maximum packet size of 1232 bytes for IPv6 and 1252
bytes for IPv4.
Since the packet size can change during connection lifetime, the
ngx_quic_max_udp_payload() function is introduced that currently
returns minimal allowed size, depending on address family.
quic-tls, 8.2:
The quic_transport_parameters extension is carried in the ClientHello
and the EncryptedExtensions messages during the handshake. Endpoints
MUST send the quic_transport_parameters extension; endpoints that
receive ClientHello or EncryptedExtensions messages without the
quic_transport_parameters extension MUST close the connection with an
error of type 0x16d (equivalent to a fatal TLS missing_extension
alert, see Section 4.10).