Previously it was parsed in ngx_http_v3_streams.c, while the streams were
parsed in ngx_http_v3_parse.c. Now all parsing is done in one file. This
simplifies the parsing API and cleans up ngx_http_v3_streams.c.
Previously, an ngx_http_v3_connection_t object was created for HTTP/3 and
then assigned to c->data instead of the generic ngx_http_connection_t object.
Now a direct reference is added to ngx_http_connection_t, which is less
confusing and does not require a flag for http3.
It's used instead of accessing c->quic->parent->data directly. Apart from
being simpler, it allows changing the way the session is stored in the future
by only changing the macro.
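For illustration, such a macro could be as simple as the following sketch
(the macro and type names here are assumptions, not necessarily the ones
used in the patch):

    #define ngx_http_v3_get_session(c)                                       \
        ((ngx_http_v3_session_t *) (c)->quic->parent->data)

With all callers going through the macro, changing where the session lives
only requires updating this one definition.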
Now ngx_http_v3_ack_header() and ngx_http_v3_inc_insert_count() always
generate a decoder error. Our implementation does not use dynamic tables and
does not expect the client to send Section Acknowledgement or Insert Count
Increment.
Stream Cancellation, on the other hand, is allowed to be sent anyway. This is
why ngx_http_v3_cancel_stream() does not return an error.
The patch adds proper transitions between multiple networking addresses that
can be used by a single quic connection. New networking paths are validated
using PATH_CHALLENGE/PATH_RESPONSE frames.
Due to the structure's alignment, some uninitialized memory contents may have
been passed between processes.
Zeroing was removed in 0215ec9aaa8a.
Reported by Johnny Wang.
7.2.1:
If a DATA frame is received on a control stream, the recipient MUST
respond with a connection error of type H3_FRAME_UNEXPECTED;
7.2.2:
If a HEADERS frame is received on a control stream, the recipient MUST
respond with a connection error (Section 8) of type H3_FRAME_UNEXPECTED.
With SMTP pipelining, ngx_mail_read_command() can be called with no space
available in s->buffer, to parse additional commands received into the buffer
on previous calls. Previously, this resulted in recv() being called with
zero length and returning zero, which nginx interpreted as a connection
close by the client, so nginx silently closed the connection.
Fix is to avoid calling c->recv() if there is no free space in the buffer,
but continue parsing of the already received commands.
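A minimal sketch of the resulting logic, assuming the usual mail module
structures (not the literal patch):

    /* call c->recv() only when s->buffer has free space; otherwise fall
     * through and parse the commands already received into the buffer
     */
    if (s->buffer->last < s->buffer->end) {
        n = s->connection->recv(s->connection, s->buffer->last,
                                s->buffer->end - s->buffer->last);
    }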
Currently both names are used, which is confusing. Historically these were
different objects, but now it's the same one. The name qs (quic stream) makes
more sense than sn (stream node).
PATH_RESPONSE was explicitly forbidden in 0-RTT since at least draft-22, but
the Frame Types table was not updated until recently while in IESG evaluation.
Stop including QUIC headers with no user-serviceable parts inside.
This allows providing a much cleaner QUIC interface. To cope with that,
ngx_quic_derive_key() is now explicitly exported for v3 and quic modules.
Additionally, this completely hides the ngx_quic_keys_t internal type.
The "ngx_event_quic.h" header file now contains only public definitions,
used by modules. All internal definitions are moved into
the "ngx_event_quic_connection.h" header file.
The function correctly cleans up resources in case of failure to create the
initial server id: it removes the previously created udp node for odcid from
the listening rbtree.
It turns out no browsers implement HTTP/2 GOAWAY handling properly, and
a large enough number of resources on a page results in failures to load
some resources. In particular, Chrome seems to experience errors if
loading of all resources requires more than 1 connection (while it
is usually able to retry requests at least once, even with 2 connections
there are occasional failures for some reason), Safari if loading requires
more than 3 connections, and Firefox if loading requires more than 10
connections (can be configured with network.http.request.max-attempts,
defaults to 10).
It does not seem to be possible to resolve this on the nginx side; even
strict limiting of maximum concurrency does not help, and loading issues
seem to be triggered by merely queueing a request for a particular
connection. The only available mitigation seems to be using a higher
keepalive_requests value.
The new default is 1000 and matches previously used default for
http2_max_requests. It is expected to be enough for 99.98% of the pages
(https://httparchive.org/reports/state-of-the-web?start=latest#reqTotal)
even in Chrome.
Similar to lingering_time, it limits total connection lifetime before
keepalive is switched off. The default is 1 hour, which is close to
the total maximum connection lifetime possible with default
keepalive_requests and keepalive_timeout.
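Assuming this describes the keepalive_time directive, a configuration
sketch:

    http {
        keepalive_timeout   75s;
        keepalive_requests  1000;
        keepalive_time      1h;
    }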
Firefox uses several idle streams for PRIORITY frames[1], and
"http2_max_concurrent_streams 1;" results in "client sent too many
PRIORITY frames" errors when a connection is established by Firefox.
Fix is to relax the PRIORITY frames limit to use at least 100 as
the initial value (which is the minimum limit on the number of concurrent
streams recommended by the HTTP/2 protocol, so it is not unreasonable
for clients to assume that a similar number of idle streams can be used
for prioritization).
[1] https://hg.mozilla.org/mozilla-central/file/32a9e6e145d6e3071c3993a20bb603a2f388722b/netwerk/protocol/http/Http2Stream.cpp#l1270
In current versions (all versions based on zlib 1.2.11, at least
since 2018) it no longer uses 64K hash and does not force window
bits to 13 if it is less than 13. That is, it needs just 16 bytes
more memory than normal zlib, so these bytes are simply added to
the normal size calculation.
In limit_req, auth_delay, and upstream code to check for broken
connections, tests for possible connection close by the client
did not work if the connection was already closed when the relevant
event handler was set. This happened because there were no additional
events in case of edge-triggered event methods, and read events
were disabled in case of level-triggered ones.
Fix is to explicitly post a read event if the c->read->ready flag
is set.
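A sketch of the fix, using the standard nginx event posting primitives:

    /* the connection may already be closed, so no further events will
     * arrive from edge-triggered methods; post the read event manually
     */
    if (c->read->ready) {
        ngx_post_event(c->read, &ngx_posted_events);
    }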
For new data to be reported with eventport on Solaris,
ngx_handle_read_event() needs to be called after reading response
headers. To do so, ngx_http_upstream_process_non_buffered_upstream() is
now called unconditionally if there is no preread data. This
won't cause any read() syscalls as long as upstream connection
is not ready for reading (c->read->ready is not set), but will result
in proper handling of all events.
If we need to be notified about further events, ngx_handle_read_event()
needs to be called after a read event is processed. Without this,
an event can be removed from the kernel and won't be reported again,
notably when using oneshot event methods, such as eventport on Solaris.
While here, error handling is also added, similar to the one present in
ngx_resolver_tcp_read(). This is not expected to make a difference
and mostly added for consistency.
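The pattern at the end of a read event handler looks roughly like this
(a sketch):

    /* re-arm the event so that oneshot methods such as eventport
     * report it again; a failure here is treated as fatal
     */
    if (ngx_handle_read_event(c->read, 0) != NGX_OK) {
        /* e.g. close the connection */
    }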
If an attempt is made to delete an event which was already reported,
port_dissociate() returns an error. Fix is to avoid doing anything if
ev->active is not set.
Possible alternative approach would be to avoid calling ngx_del_event()
at all if ev->active is not set. This approach, however, will require
something else to re-add the other event of the connection, since both
read and write events are dissociated if an event is reported on a file
descriptor. Currently ngx_eventport_del_event() re-associates write
event if called to delete read event, and vice versa.
If, at the start of an event loop iteration, there are any timers
in the past (including timers expiring now), the ngx_process_events()
function is called with zero timeout, and returns immediately even
if there are no events. But the following code only calls
ngx_event_expire_timers() if time actually changed, so this results
in nginx spinning in the event loop till current time changes.
While such timers are not expected to appear under normal conditions,
as all such timers should be removed on previous event loop iterations,
they still can appear due to bugs, zero timeouts set in the configuration
(if this is not explicitly handled by the code), or due to external
time changes on systems without clock_gettime(CLOCK_MONOTONIC).
Fix is to call ngx_event_expire_timers() unconditionally. Calling
it on each event loop iteration is not expected to be significant from
performance point of view, especially compared to a syscall in
ngx_process_events().
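A sketch of the resulting event loop fragment (compare
ngx_process_events_and_timers()):

    delta = ngx_current_msec;

    (void) ngx_process_events(cycle, timer, flags);

    delta = ngx_current_msec - delta;

    /* previously: if (delta) { ngx_event_expire_timers(); } */
    ngx_event_expire_timers();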
Without explicit handling, a zero timer was actually added, leading to
multiple unneeded syscalls. Further, sending GOAWAY frame early might
be beneficial for clients.
Reported by Sergey Kandaurov.
Unlike in 75e908236701, which added the logic to ngx_http_finalize_request(),
this change moves it to a more generic routine ngx_http_finalize_connection()
to cover cases when a request is finalized with NGX_DONE.
In particular, this fixes unwanted connection transition into the keepalive
state after receiving EOF while discarding request body. With edge-triggered
event methods that means the connection will last for extra seconds as set in
the keepalive_timeout directive.
The response size check introduced in 39501ce97e29 did not take into
account possible padding on DATA frames, resulting in incorrect
"upstream sent response body larger than indicated content length" errors
if upstream server used padding in responses with known length.
Fix is to check the actual size of response buffers produced by the code,
similarly to how it is done in other protocols, instead of checking
the size of DATA frames.
Reported at:
http://mailman.nginx.org/pipermail/nginx-devel/2021-March/013907.html
Currently the listener contains an rbtree with multiple nodes for a single
QUIC connection, each corresponding to a specific server id. Each udp node
points to the same ngx_connection_t, which points to the QUIC connection via
the c->udp field. Thus when an event handler is called, it only gets an
ngx_connection_t with c->udp pointing to the QUIC connection. This makes it
hard to obtain the actual node which was used to dispatch the packet (it
requires repeating the DCID lookup).
Additionally, ngx_quic_connection_t->udp field is only needed to keep a
pointer in c->udp. The node is not added into the tree and does not carry
useful information.
Sometimes it is required to process datagram properties at a higher level
(i.e. QUIC is interested in the source address, which may change, and in IP
options). The
patch adds ngx_udp_dgram_t structure used to pass packet-related information
in c->udp.
Previously, when a new datagram arrived, data were copied from the UDP layer
to the QUIC layer via c->recv() interface. Now UDP buffer is accessed
directly.
OpenSSL 3.0 started to require that the HKDF-Extract output PRK length
pointer, used to return the amount of data written, contain the length of
the key buffer before the call. EVP_PKEY_derive() documents this.
See HKDF_Extract() internal implementation update in this change:
https://github.com/openssl/openssl/commit/5a285ad
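An illustration of the requirement, assuming pctx is an EVP_PKEY_CTX set up
for HKDF in extract-only mode (buffer size is illustrative):

    unsigned char  prk[EVP_MAX_MD_SIZE];
    size_t         prk_len;

    /* OpenSSL 3.0: *keylen must contain the size of the output buffer
     * before the call; on success it is updated with the bytes written
     */
    prk_len = sizeof(prk);

    if (EVP_PKEY_derive(pctx, prk, &prk_len) <= 0) {
        return NGX_ERROR;
    }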
The function ngx_quic_shutdown_connection() waits until all non-cancelable
streams are closed, and then closes the connection. In HTTP/3 cancelable
streams are all unidirectional streams except push streams.
The function is called from HTTP/3 when client reaches keepalive_requests.
In case of long header packets, the dcid length was not read correctly.
While there, the macro to parse uint64 values was fixed, as well as the
format specifiers used to print them in debug mode.
Thanks to Gao Yan <gaoyan09@baidu.com>.
Activated with the "proxy_protocol" directive. Can be combined with
"listen ... proxy_protocol;" and "set_real_ip_from ...;" to pass
client address provided to nginx in the PROXY protocol header.
Activated with the "proxy_protocol" parameter of the "listen" directive.
Obtained information is passed to the auth_http script in Proxy-Protocol-Addr,
Proxy-Protocol-Port, Proxy-Protocol-Server-Addr, and Proxy-Protocol-Server-Port
headers.
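A configuration sketch combining the directives from the two changes above
(addresses are illustrative):

    mail {
        server {
            listen            25 proxy_protocol;
            protocol          smtp;

            # trust PROXY protocol addresses from the load balancer
            set_real_ip_from  10.0.0.0/8;

            # send the PROXY protocol header to the backend as well
            proxy_protocol    on;
        }
    }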
Similarly to 40e8ce405859 in the stream module, this reduces the time
accept mutex is held. This also simplifies following changes to
introduce PROXY protocol support.
If we need to be notified about further events, ngx_handle_read_event()
needs to be called after a read event is processed. Without this,
an event can be removed from the kernel and won't be reported again,
notably when using oneshot event methods, such as eventport on Solaris.
For consistency, the existing ngx_handle_read_event() call was removed from
ngx_mail_read_command(), as it only covered one of the code paths where
ngx_mail_read_command() returns NGX_AGAIN. Instead, appropriate processing
was added to the callers, covering all code paths where NGX_AGAIN is
returned.
As long as a read event is blocked (ignored), ngx_handle_read_event()
needs to be called to make sure no further notifications will be
triggered when using level-triggered event methods, such as select() or
poll().
The "!rev->ready" test seems to be a typo, introduced in the original
commit (719:f30b1a75fd3b). The ngx_handle_write_event() code properly
tests for "rev->ready" instead.
Due to this typo, read events might be unexpectedly removed during
proxying after an event on the other part of the proxied connection.
Caught by mail proxying tests.
The strerrordesc_np() function, introduced in glibc 2.32, provides an
async-signal-safe way to obtain error messages. This makes it possible
to avoid copying error messages.
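For example (a standalone sketch; strerrordesc_np() returns NULL for
unknown error numbers):

    #define _GNU_SOURCE
    #include <string.h>
    #include <stdio.h>

    int
    main(void)
    {
        /* returns a pointer to a static string: async-signal-safe,
         * no copying involved
         */
        const char  *msg = strerrordesc_np(2);

        puts(msg ? msg : "Unknown error");
        return 0;
    }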
Previously, systems without sys_nerr (or _sys_nerr) were handled with an
assumption that errors start at 0 and are continuous. This is, however, not
something POSIX requires, and not true on some platforms.
Notably, on Linux, where sys_nerr is no longer available for newly linked
binaries starting with glibc 2.32, there are gaps in the error list, which
used to prevent us from properly detecting the maximum errno. Further, on
GNU/Hurd errors start at 0x40000001.
With this change, maximum errno detection is moved to the runtime code,
now able to ignore gaps, and also detects the first error if needed.
This fixes observed "Unknown error" messages as seen on Linux with
glibc 2.32 and on GNU/Hurd.
With this change, behaviour of HTTP/2 becomes even closer to HTTP/1.x,
and client_header_timeout instead of keepalive_timeout is used before
the first request is received.
This fixes HTTP/2 connections being closed even before the first request
if "keepalive_timeout 0;" was used in the configuration; the problem
appeared in f790816a0e87 (1.19.7).
As per quic-transport-34:
An endpoint also restarts its idle timer when sending an ack-eliciting
packet if no other ack-eliciting packets have been sent since last receiving
and processing a packet.
Previously, the timer was set for any packet.
The structure is used to parse an HTTP/3 request. An object of this type is
added to ngx_http_request_t instead of the generic h3_parse pointer.
Also, the new field is located outside of the request ephemeral zone to keep it
safe after request headers are parsed.
Previously, PINGs and other frames extended possible keepalive time,
making it possible to keep an open HTTP/2 connection for a long time.
Now the connection is always closed as soon as keepalive_timeout expires,
similarly to how it happens in HTTP/1.x.
Note that as part of this change, incomplete frames no longer trigger
a separate timeout, so http2_recv_timeout (replaced by
client_header_timeout in previous patches) is essentially cancelled.
The client_header_timeout is, however, used for SSL handshake and
while reading HEADERS frames.
Instead, keepalive_timeout and keepalive_requests are now used. This
is expected to simplify HTTP/2 code and usage. This also matches
directives used by upstream module for all protocols.
In case of default settings, this effectively changes maximum number
of requests per connection from 1000 to 100. This looks acceptable,
especially given that HTTP/2 code now properly supports lingering close.
Further, this changes the default keepalive timeout in HTTP/2 from 300
seconds to 75 seconds. This also looks acceptable, and is larger than the
PING interval used by Firefox (network.http.spdy.ping-threshold defaults
to 58s),
the only browser to use PINGs.
Instead, the client_header_timeout is now used for HTTP/2 reading.
Further, the timeout is changed to be set once until there is no further
data left to read, similarly to how client_header_timeout is used in other
places.
New connections are marked reusable by ngx_http_init_connection() if there
are no data available for reading. As a result, if SSL is not used,
ngx_http_v2_init() might be called when the connection is marked reusable.
If a HEADERS frame is immediately available for reading, this resulted
in connection being preserved in reusable state with an active request,
and possibly closed later as if during worker shutdown (that is, after
all active requests were finalized).
Fix is to explicitly mark connections non-reusable in ngx_http_v2_init()
instead of (incorrectly) assuming they are already non-reusable.
Found by Sergey Kandaurov.
If ngx_drain_connections() fails to immediately reuse any connections
and there are no free connections, it now additionally tries to reuse
a connection again. This helps to provide at least one free connection
in case of HTTP/2 with lingering close, where merely trying to reuse
a connection once does not free it, but makes it reusable again,
waiting for lingering close.
This is particularly important in HTTP/2, where keepalive connections
are closed with lingering. Before the patch, reusing a keepalive HTTP/2
connection resulted in the connection waiting for lingering close to
remain in the reusable connections queue, preventing ngx_drain_connections()
from closing additional connections.
The patch fixes it by marking the connection reusable again, and so
moving it in the reusable connections queue. Further, it makes actually
possible to reuse such connections if needed.
The "max_ack_delay", "ack_delay_exponent", and "max_udp_payload_size"
transport parameters were not communicated to client.
The "disable_active_migration" and "active_connection_id_limit"
parameters were not saved into zero-rtt context.
18.1. Reserved Transport Parameters
Transport parameters with an identifier of the form "31 * N + 27" for
integer values of N are reserved to exercise the requirement that
unknown transport parameters be ignored. These transport parameters
have no semantics, and can carry arbitrary values.
Setting the timer is brought into compliance with quic-recovery-34. Now it's
set from a single function ngx_quic_set_lost_timer() that takes into account
both loss detection and PTO. The following issues are fixed with this change:
- when in loss detection mode, discarding a context could turn off the
timer forever after switching to the PTO mode
- when in loss detection mode, sending a packet resulted in rescheduling the
timer as if it's always in the PTO mode
As per quic-transport-33:
An endpoint MUST acknowledge all ack-eliciting Initial and Handshake
packets immediately
If a packet carrying Initial or Handshake ACK was lost, a non-immediate ACK
should not be sent later. Instead, client is expected to send a new packet
to acknowledge.
Sending non-immediate ACKs for Initial packets can cause the client to
generate an inflated RTT sample.
The token generation in QUIC is reworked. A single host key is used to
generate all required keys of needed sizes using HKDF.
The "quic_stateless_reset_token_key" directive is removed. Instead, the
"quic_host_key" directive is used, which reads the key from a file, or sets
it to random bytes if not specified.
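A configuration sketch (the file path is illustrative):

    http {
        # replaces quic_stateless_reset_token_key; if omitted,
        # a random host key is generated at startup
        quic_host_key  /etc/nginx/quic_host.key;
    }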
The flag was introduced to create type-aware CONNECTION_CLOSE frames,
and now is replaced with frame type information, directly accessible.
Notably, this fixes type logging for received frames in b3d9e57d0f62.
The flag is used in ngx_http_finalize_connection() to switch client connection
to the keepalive mode. Since eaea7dac3292 this code is not executed for HTTP/3
which allows us to revert the change and get back to the default branch code.
This part somehow slipped away from c5840ca2063d.
While it is not expected to be needed in case of lingering close,
it is good to keep it for correctness (see 2b5528023f6b).
The change reduces diff to the default branch for
src/http/ngx_http_request_body.c.
Also, the client's Content-Length, if present, is now checked against the
real body size sent by the client.
- split ngx_quic_process_packet() into two functions with the second one
called ngx_quic_process_payload() in charge of decrypting and handling the
payload
- renamed ngx_quic_payload_handler() to ngx_quic_handle_frames()
- moved error cleanup from ngx_quic_input() to ngx_quic_process_payload()
- moved handling closed connection from ngx_quic_handle_frames() to
ngx_quic_process_payload()
- minor fixes
Previously, a quic connection object was created when a Retry packet was
sent. This is neither necessary nor convenient, and contradicts the idea of
retry: protecting from bad clients and saving server resources.
Now the connection is not created, and the token is verified
cryptographically instead of being held in the connection.
This function should be called at the end of an event handler to prepare the
event for the next handler call. In particular, the "active" flag is set or
cleared depending on data availability.
With this call missing in one code path, the read handler was not called
again after handling the initial part of the client request, if the request
was too big to fit into a single STREAM frame.
Now ngx_handle_read_event() is called in this code path. Also, read timer is
restarted.
Keeping post_accept_timeout in ngx_listening_t is no longer needed since
we've switched to 1 second timeout for deferred accept in 5541:fdb67cfc957d.
Further, using it in HTTP code can result in client_header_timeout being
used from an incorrect server block, notably if address-specific virtual
servers are used along with a wildcard listening socket, or if we've switched
to a different server block based on SNI in SSL handshake.
The stub status module and ngx_http_send_response() (used by the empty gif
module and the "return" directive) incorrectly assumed that responding
to HEAD requests always results in r->header_only being set. This is not
true, and results in incorrect behaviour, for example, in the following
configuration:
location / {
image_filter size;
return 200 test;
}
Fix is to remove this incorrect micro-optimization from both stub status
module and ngx_http_send_response().
Reported by Chris Newton.
After 7675:9afa45068b8f and 7678:bffcc5af1d72 (1.19.1), during non-buffered
simple proxying, responses with extra data might result in zero size buffers
being generated and "zero size buf" alerts in writer. This bug is similar
to the one with FastCGI proxying fixed in 7689:da8d758aabeb.
In non-buffered mode, normally the filter function is not called if
u->length is already 0, since u->length is checked after each call of
the filter function. There is a case when this can happen though: if
the response length is 0, and there are pre-read response body data left
after reading response headers. As such, a check for u->length is needed
at the start of non-buffered filter functions, similar to the one
for p->length present in buffered filter functions.
Appropriate checks added to the existing non-buffered copy filters
in the upstream (used by scgi and uwsgi proxying) and proxy modules.
A header with a name containing null, CR, LF, colon or uppercase characters
is now considered an error. A header with a value containing null, CR or LF
is also considered an error.
Also, a header is considered invalid unless its name contains only lowercase
characters, digits, minus and, optionally, underscore. Such invalid headers
can optionally be ignored.
- :method, :path and :scheme are expected exactly once and not empty
- :method and :scheme character validation is added
- :authority cannot appear more than once
Notably, the version negotiation table is updated to reject draft-33/QUICv1
(which requires a new TLS codepoint) unless nginx is explicitly built with it.
The quic kernel bpf helper inspects the packet payload for a DCID, extracts
the key and routes the packet into the socket matching the key.
Due to the reuseport feature, each worker owns a personal socket, which is
identified by the same key that is used to create the DCID.
BPF objects are locked in RAM and are subject to RLIMIT_MEMLOCK.
The "ulimit -l" command may be used to setup proper limits, if maps
cannot be created with EPERM or updated with ETOOLONG.
With introduction of open_file_cache in 1454:f497ed7682a7, opening a file
with ngx_open_cached_file() automatically adds a cleanup handler to close
the file. As such, calling ngx_close_file() directly for non-regular files
is no longer needed and will result in duplicate close() call.
In 1454:f497ed7682a7 the ngx_close_file() call for non-regular files was
removed in the static module, but not in the flv module. And the resulting
incorrect code was later copied to the mp4 module. Fix is to remove the
ngx_close_file() call from both modules.
Reported by Chris Newton.
The ngx_http_parse_complex_uri() function cannot make URI longer and does
not null-terminate URI, so there is no need to allocate an extra byte. This
allocation appears to be a leftover from changes in 461:a88a3e4e158f (0.1.5),
where null-termination of r->uri and many other strings was removed.
When the request line contains request-target in the absolute-URI form,
it can contain path-empty instead of a single slash (see RFC 7230, RFC 3986).
Previously, the ngx_http_parse_request_line() function only accepted empty
path when there was no query string.
With this change, non-empty query is also correctly handled. That is,
request line "GET http://example.com?foo HTTP/1.1" is accepted and results
in $uri "/" and $args "foo".
Note that $request_uri remains "?foo", similarly to how spaces in URIs
are handled. Providing "/?foo", similarly to how "/" is provided for
"GET http://example.com HTTP/1.1", requires allocation.
Previously, when processing a client ACK, rtt could be calculated for a
packet different from the largest if the largest was missing in the sent
chain. Even though this is an unlikely situation, rtt based on a different
packet could be larger than needed, leading to a bigger pto timeout and
performance degradation.
Previously, this only worked for Application level because before
quic-transport-30, there were the following constraints:
Because the receiver doesn't use the ACK Delay for Initial and Handshake
packets, a sender SHOULD send a value of 0.
When adjusting an RTT sample using peer-reported acknowledgement delays, an
endpoint ... MUST ignore the ACK Delay field of the ACK frame for packets
sent in the Initial and Handshake packet number space.
Now the initial output packet is no longer padded if followed by a handshake
packet. If the datagram is still not big enough to satisfy minimum size
requirements, the handshake packet is padded.
The patch replaces c->send() occurrences with c->send_chain(), because the
latter accounts for the local address, which may be different if the wildcard
listener is used.
Previously, the server sent responses to the client using an address
different from the one the client connected to.
Notably, this fixes an issue with Chrome that can emit a "certificate_unknown"
alert during the SSL handshake where c->ssl->no_wait_shutdown is not yet set.
The filter is responsible for creating HTTP/3 response header and body.
The change removes differences to the default branch for
ngx_http_chunked_filter_module and ngx_http_header_filter_module.
Instead, appropriate format specifier for hexadecimal is used
in ngx_log_debug().
The STREAM frame "data" debug is moved into ngx_quic_log_frame(), similar
to all other frame fields debug.
Previously, the number of next_upstream tries included servers marked
as "down", resulting in "no live upstreams" with the code 502 instead
of the code derived from an attempt to connect to the last tried "up"
server (ticket #2096).
Similarly to the problem fixed in 2096b21fcd10 (ticket #1792),
when a "trailer only" gRPC response (that is, a response with the
END_STREAM flag in the HEADERS frame) was immediately followed by
RST_STREAM(NO_ERROR) in the data preread along with the response
header, RST_STREAM wasn't properly skipped and caused "upstream
rejected request with error 0" errors.
Observed with "unknown service" gRPC errors returned by grpc-go.
Fix is to set ctx->done if we are going to parse additional data,
so the RST_STREAM(NO_ERROR) is properly skipped. Additionally, now
ngx_http_grpc_filter() will complain about frames sent for closed
stream if there are any.
When installing or running from a non-root user it is sometimes required to
override the default, compiled-in error log path. There was no way to do
this without rebuilding the binary (ticket #147).
This patch introduces the "-e" command line option, which allows one to
override the compiled-in error log path.
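For example, to run with an alternative error log path (the path is
illustrative):

    nginx -e /tmp/nginx-error.log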
Header value returned from the HTTP parser is expected to be null-terminated or
have a spare byte after the value bytes. When an empty header value was
passed by the client in a literal header representation, neither was true.
This could result in a segfault. The fix is to assign a literal empty
null-terminated string in this case.
Thanks to Andrey Kolyshkin.
The largely duplicate type-specific functions ngx_quic_parse_initial_header(),
ngx_quic_parse_handshake_header(), and a missing one for 0-RTT, were merged.
The new order of functions listed in ngx_event_quic_transport.c reflects this.
|_ ngx_quic_parse_long_header - version-invariant long header fields
\_ ngx_quic_supported_version - a helper to decide we can go further
\_ ngx_quic_parse_long_header_v1 - QUICv1-specific long header fields
0-RTT packets that previously appeared as Handshake are now logged
appropriately:
*1 quic packet rx long flags:db version:ff00001d
*1 quic packet rx early len:870
Logging SCID/DCID is no longer duplicated, as was seen with Initial packets.
If trailers were missing and a chain carrying the last_buf flag had no data
in it, then the last HTTP/1 chunk was broken. The problem was introduced
while
implementing HTTP/3 response body generation.
The change fixes the issue and reduces diff to the mainline nginx.
Previously, if quic_stateless_reset_token_key was empty or unspecified, the
initial stateless reset token was not generated. However, subsequent tokens
were generated with an empty key, which resulted in an error with certain
SSL libraries, for example OpenSSL.
Now a random 32-byte stateless reset token key is generated if none is
specified in the configuration. As a result, stateless reset tokens are now
generated for all server ids.
Previously, a new dcid was generated in the same memory that was allocated
for qc->dcid when creating the QUIC connection. However, this memory was also
referenced by the initial_source_connection_id and retry_source_connection_id
transport parameters. As a result, these parameters changed their values
after retry, which broke the protocol.
Before introduction of request body filter in 42d9beeb22db, the only
possible return code from the ngx_http_request_body_filter() call
without actual buffers was NGX_HTTP_INTERNAL_SERVER_ERROR, and
the code in ngx_http_read_client_request_body() hardcoded the only
possible error to simplify the code of initial call to set rb->rest.
This is no longer true after introduction of request body filters though,
as a request body filter might need to return other errors, such as 403.
Fix is to preserve the error code actually returned by the call
instead of assuming 500.
Added logging before returning NGX_HTTP_INTERNAL_SERVER_ERROR if there
are busy buffers after a request body flush. This should never happen
with current code, though bugs can be introduced by 3rd party modules.
Make sure debugging will be easy enough.
A negotiated version is decoupled from NGX_QUIC_VERSION and, if supported,
now stored in c->quic->version after packets processing. It is then used
to create long header packets. Otherwise, the list of supported versions
(which may be many now) is sent in the Version Negotiation packet.
All packets in the connection are expected to have the same version.
Incoming packets with mismatched version are now rejected.
When doing lingering close, the socket was first shut down for writing,
so SSL shutdown initiated after lingering close was not able to send
the close_notify alerts (ticket #2056).
The fix is to call ngx_ssl_shutdown() before shutting down the socket.
A number of unsigned variables have a special value, usually -1 or some
maximum, which produces huge numeric values in logs and makes them hard
to read.
In order to distinguish such values in logs, they are cast to the signed
type and printed as a literal '-1'.
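For example (a sketch; the variable is hypothetical, and %L is the nginx
conversion for int64_t):

    /* largest_ack is (uint64_t) -1 until the first ACK is received;
     * cast to the signed type so the log shows "-1", not 2^64-1
     */
    ngx_log_debug1(NGX_LOG_DEBUG_EVENT, c->log, 0,
                   "quic largest ack:%L", (int64_t) ctx->largest_ack);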
The client address validation didn't complete with a valid token,
which was broken after packet processing refactoring in d0d3fc0697a0.
An invalid or expired token was treated as a connection error.
Now we proceed as outlined in draft-ietf-quic-transport-32,
section 8.1.3 "Address Validation for Future Connections" below,
which is unlike validating the client address using Retry packets.
When a server receives an Initial packet with an address validation
token, it MUST attempt to validate the token, unless it has already
completed address validation. If the token is invalid then the
server SHOULD proceed as if the client did not have a validated
address, including potentially sending a Retry.
The connection is now closed in this case on internal errors only.
All key handling functionality is moved into ngx_quic_protection.c.
Public structures from ngx_quic_protection.h are now private and new
methods are available to manipulate keys.
A negotiated cipher is cached in QUIC connection from the set secret callback
to avoid calling SSL_get_current_cipher() on each encrypt/decrypt operation.
This also reduces the number of unwanted c->ssl->connection occurrences.
Now "s", "V", and "v" format specifiers may be prefixed with "x" (lowercase)
or "X" (uppercase) to output corresponding data in hexadecimal format.
In collaboration with Maxim Dounin.
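For example, assuming token is an ngx_str_t in scope (a sketch):

    /* "%xV" dumps the bytes of an ngx_str_t in lowercase hex;
     * "%XV" would use uppercase
     */
    ngx_log_debug1(NGX_LOG_DEBUG_EVENT, c->log, 0,
                   "quic reset token:%xV", &token);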
Previously, tx ACK frames held ranges in an array of ngx_quic_ack_range_t,
while rx ACK frames held ranges in the serialized format. Now the serialized
format is used for both types of frames.
The patch resets the ctx->frames queue, which may contain frames. It was
possible that congestion or amplification limits prevented all frames from
being sent.
Retransmitted frames could be accounted twice as inflight: first time in
ngx_quic_congestion_lost() called from ngx_quic_resend_frames(), and later
from ngx_quic_discard_ctx().
This accounts for the following change:
* Require expansion of datagrams to ensure that a path supports at
least 1200 bytes:
- During the handshake ack-eliciting Initial packets from the
server need to be expanded
All values are prefixed with a name and separated from it by a colon.
Multiple values are listed without commas in between.
Rationale: this greatly simplifies log parsing for analysis.
For application level packets, only every second packet is now acknowledged,
respecting max ack delay.
13.2.1 Sending ACK Frames
In order to assist loss detection at the sender, an endpoint SHOULD
generate and send an ACK frame without delay when it receives an ack-
eliciting packet either:
* when the received packet has a packet number less than another
ack-eliciting packet that has been received, or
* when the packet has a packet number larger than the highest-
numbered ack-eliciting packet that has been received and there are
missing packets between that packet and this packet.
13.2.2. Acknowledgement Frequency
A receiver SHOULD send an ACK frame after receiving at least two
ack-eliciting packets.
In some cases it might be needed to reject SSL handshake based on SNI
server name provided, for example, to make sure an invalid certificate
is not returned to clients trying to contact a name-based virtual server
without SSL configured. Previously, a "ssl_ciphers aNULL;" was used for
this. This workaround, however, is not compatible with TLSv1.3, in
particular, when using BoringSSL, where it is not possible to configure
TLSv1.3 ciphers at all.
With this change, the ssl_reject_handshake directive is introduced,
which instructs nginx to reject SSL handshakes with an "unrecognized_name"
alert in a particular server block.
For example, to reject handshake with names other than example.com,
one can use the following configuration:
server {
listen 443 ssl;
ssl_reject_handshake on;
}
server {
listen 443 ssl;
server_name example.com;
ssl_certificate example.com.crt;
ssl_certificate_key example.com.key;
}
The following configuration can be used to reject all SSL handshakes
without SNI server name provided:
server {
listen 443 ssl;
ssl_reject_handshake on;
}
server {
listen 443 ssl;
server_name ~^;
ssl_certificate example.crt;
ssl_certificate_key example.key;
}
Additionally, the ssl_reject_handshake directive makes configuring
certificates for the default server block optional. If no certificates
are configured in the default server for a given listening socket,
certificates must be defined in all non-default server blocks with
the listening socket in question.
Similarly to ssl_conf_command, proxy_ssl_conf_command can be used to
set arbitrary OpenSSL configuration parameters as long as nginx is
compiled with OpenSSL 1.0.2 or later, when connecting to upstream
servers with SSL. Full list of available configuration commands
can be found in the SSL_CONF_cmd manual page
(https://www.openssl.org/docs/man1.1.1/man3/SSL_CONF_cmd.html).
Similarly to ssl_conf_command, proxy_ssl_conf_command (grpc_ssl_conf_command,
uwsgi_ssl_conf_command) can be used to set arbitrary OpenSSL configuration
parameters as long as nginx is compiled with OpenSSL 1.0.2 or later,
when connecting to upstream servers with SSL. Full list of available
configuration commands can be found in the SSL_CONF_cmd manual page
(https://www.openssl.org/docs/man1.1.1/man3/SSL_CONF_cmd.html).
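For example (a sketch; the upstream name is illustrative):

    location / {
        proxy_pass  https://backend;

        # prefer ChaCha20 when the client lists it first
        proxy_ssl_conf_command  Options PrioritizeChaCha;
    }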
With the ssl_conf_command directive it is now possible to set
arbitrary OpenSSL configuration parameters as long as nginx is compiled
with OpenSSL 1.0.2 or later. Full list of available configuration
commands can be found in the SSL_CONF_cmd manual page
(https://www.openssl.org/docs/man1.1.1/man3/SSL_CONF_cmd.html).
In particular, this allows configuring PrioritizeChaCha option
(ticket #1445):
ssl_conf_command Options PrioritizeChaCha;
It can be also used to configure TLSv1.3 ciphers in OpenSSL,
which fails to configure them via the SSL_CTX_set_cipher_list()
interface (ticket #1529):
ssl_conf_command Ciphersuites TLS_CHACHA20_POLY1305_SHA256;
Configuration commands are applied after nginx own configuration
for SSL, so they can be used to override anything set by nginx.
Note though that configuring OpenSSL directly with ssl_conf_command
might result in a behaviour nginx does not expect, and should be
done with care.
With this change, it is now possible to use ngx_conf_merge_ptr_value()
to merge keyval arrays. This change actually follows much earlier
changes in ngx_conf_merge_ptr_value() and ngx_conf_set_str_array_slot()
in 1452:cd586e963db0 (0.6.10) and 1701:40d004d95d88 (0.6.22).
To preserve compatibility with existing 3rd party modules, both NULL
and NGX_CONF_UNSET_PTR are accepted for now.
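A merge callback can now simply write (the module and field names are
hypothetical):

    static char *
    ngx_http_foo_merge_loc_conf(ngx_conf_t *cf, void *parent, void *child)
    {
        ngx_http_foo_loc_conf_t  *prev = parent;
        ngx_http_foo_loc_conf_t  *conf = child;

        /* inherit the keyval array from the previous level; both NULL
         * and NGX_CONF_UNSET_PTR are treated as "not set here"
         */
        ngx_conf_merge_ptr_value(conf->headers, prev->headers, NULL);

        return NGX_CONF_OK;
    }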
13.2.4. Limiting Ranges by Tracking ACK Frames
When a packet containing an ACK frame is sent, the largest
acknowledged in that frame may be saved. When a packet containing an
ACK frame is acknowledged, the receiver can stop acknowledging
packets less than or equal to the largest acknowledged in the sent
ACK frame.
The history of acknowledged packets is kept in the send context as ranges.
Up to NGX_QUIC_MAX_RANGES ranges are stored.
As a result, instead of separate ack frames, a single frame with ranges
is sent.
Per draft-ietf-quic-transport-32 on the topic:
: Similarly, a server MUST expand the payload of all UDP datagrams carrying
: ack-eliciting Initial packets to at least the smallest allowed maximum
: datagram size of 1200 bytes.
Previously, if there were multiple limits configured, errors in
ngx_http_complex_value() during processing of a non-first limit
resulted in a reference count leak in shared memory nodes of already
processed limits. Fix is to explicitly unlock the relevant nodes, much
like we do when rejecting requests.
The proxy_smtp_auth directive instructs nginx to authenticate users
on backend via the AUTH command (using the PLAIN SASL mechanism),
similar to what is normally done for IMAP and POP3.
If xclient is enabled along with proxy_smtp_auth, the XCLIENT command
won't try to send the LOGIN parameter.
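Configuration sketch:

    mail {
        server {
            listen           587;
            protocol         smtp;

            # authenticate on the backend via AUTH PLAIN
            proxy_smtp_auth  on;

            # XCLIENT is still sent, but without the LOGIN parameter
            xclient          on;
        }
    }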
In 7717:e3e8b8234f05, the 1st bit was incorrectly used. It shouldn't
be used for bitmask values, as it is used by NGX_CONF_BITMASK_SET.
Additionally, special value "off" added to make it possible to clear
inherited userid_flags value.
The "false" parameter of the proxy_redirect directive is deprecated.
A warning has been emitted since c2230102df6f (0.7.54).
The "off" parameter of the proxy_redirect, proxy_cookie_domain, and
proxy_cookie_path directives tells nginx not to inherit the
configuration from the previous configuration level.
Previously, after specifying the directive with the "off" parameter,
any other directives were ignored, and syntax checking was disabled.
The syntax was enforced to allow either one directive with the "off"
parameter, or several directives with other parameters.
Also, specifying "proxy_redirect default foo" no longer works like
"proxy_redirect default".
Previously, this field was not set while creating a QUIC stream connection.
As a result, calling ngx_connection_local_sockaddr() led to a getsockname()
bad descriptor error.
If client acknowledged an Initial packet with CRYPTO frame and then
sent another Initial packet containing duplicate CRYPTO again, this
could result in resending frames off the empty send queue.
The new "quic_stateless_reset_token_key" directive is added. It sets the
endpoint key used to generate stateless reset tokens and enables feature.
If the endpoint receives short-header packet that can't be matched to
existing connection, a stateless reset packet is generated with
a proper token.
If a valid stateless reset token is found in the incoming packet,
the connection is closed.
Example configuration:
http {
quic_stateless_reset_token_key "foo";
...
}
The flag is tied to the initial secret creation. The presence of c->quic
pointer is sufficient to enable execution of ngx_quic_close_quic().
The ngx_quic_new_connection() function now returns the allocated quic
connection object and the c->quic pointer is set by the caller.
If an early error occurs before secrets initialization (i.e. in cases
of invalid retry token or nginx exiting), it is still possible to
generate an error response by trying to initialize secrets directly
in the ngx_quic_send_cc() function.
Before the change such early errors failed to send proper connection close
message and logged an error.
An auxiliary ngx_quic_init_secrets() function is introduced to avoid a
verbose call to ngx_quic_set_initial_secret() requiring a local variable.
All packet header parsing is now performed by ngx_quic_parse_packet()
function, located in the ngx_quic_transport.c file.
The packet processing is centralized in the ngx_quic_process_packet()
function which decides if the packet should be accepted, ignored or
connection should be closed, depending on the connection state.
As a result of refactoring, behavior has changed in some places:
- minimal size of Initial packet is now always tested
- connection IDs are always tested in existing connections
- old keys are discarded on encryption level switch
Now flags are processed in ngx_quic_input(), and raw->pos points to the first
byte after the flags. Redundant checks from ngx_quic_parse_short_header() and
ngx_quic_parse_long_header() are removed.
Previously, when a packet was declared lost, another packet was sent with the
same frames. Now lost frames are moved to the output frame queue and push
event is posted. This has the advantage of forming packets with more frames
than before.
Also, the start argument is removed from the ngx_quic_resend_frames()
function as redundant.
Previously the default server configuration context was used until the
:authority or host header was parsed. This led to using the configuration
parameters like client_header_buffer_size or request_pool_size from the default
server rather than from the server selected by SNI.
Also, the switch to the right server log is implemented. This issue manifested
itself as QUIC stream being logged to the default server log until :authority
or host is parsed.
Initially, client certificate verification didn't work due to the missing
hc->ssl on a QUIC stream, which started to be set in 7738:7f0981be07c4.
Then it was lost in 7999:0d2b2664b41c introducing "quic" listen parameter.
This change re-adds hc->ssl back for all QUIC connections, similar to SSL.
As per HTTP/3 draft 30, section 7.2.8:
Frame types that were used in HTTP/2 where there is no corresponding
HTTP/3 frame have also been reserved (Section 11.2.1). These frame
types MUST NOT be sent, and their receipt MUST be treated as a
connection error of type H3_FRAME_UNEXPECTED.
In rare cases, such as memory allocation failure, SSL_set_SSL_CTX() returns
NULL, which could mean that a different SSL configuration has not been set.
Note that this new behaviour seemingly originated in OpenSSL-1.1.0 release.
HTTP/2 code failed to run posted requests after calling the request body
handler, and this resulted in connection hang if a subrequest was created
in the body handler and no other actions were made.
If 400 errors were redirected to an upstream server using the error_page
directive, DATA frames from the client might cause segmentation fault
due to null pointer dereference. The bug had appeared in 6989:2c4dbcd6f2e4
(1.13.0).
Fix is to skip such frames in ngx_http_v2_state_read_data() (similarly
to 7561:9f1f9d6e056a). With the fix, behaviour of 400 errors in HTTP/2
is now similar to one in HTTP/1.x, that is, nginx doesn't try to read the
request body.
Note that proxying 400 errors, as well as other early stage errors, to
upstream servers might not be a good idea anyway. These errors imply
that reading and processing of the request (and the request headers)
wasn't complete, and proxying of such incomplete request might lead to
various errors.
Reported by Chenglong Zhang.
This fixes "SSL_shutdown() failed (SSL: ... bad write retry)" errors
as observed on the second SSL_shutdown() call after SSL shutdown fixes in
09fb2135a589 (1.19.2), notably when HTTP/2 connections are closed due
to read timeouts while there are incomplete writes.
This fixes "SSL_shutdown() failed (SSL: ... bad write retry)" errors
as observed on the second SSL_shutdown() call after SSL shutdown fixes in
09fb2135a589 (1.19.2), notably when sending fails in ngx_http_test_expect(),
similarly to ticket #1194.
Note that there are some places where c->error is misused to prevent
further output, such as ngx_http_v2_finalize_connection() if there
are pending streams, or in filter finalization. These places seem
to be extreme enough to not care about the missing shutdown though.
For example, filter finalization currently prevents keepalive from
being used.
The c->read->ready and c->write->ready flags need to be cleared to ensure
that appropriate read or write events will be reported by kernel. Without
this, SSL shutdown might wait till the timeout after blocking on writing
or reading even if there is socket activity.
OpenSSL 1.1.1 fails to return SSL_ERROR_SYSCALL if an error happens
during SSL_write() after close_notify alert from the peer, and returns
SSL_ERROR_ZERO_RETURN instead. Broken by this commit, which removes
the "i == 0" check around the SSL_RECEIVED_SHUTDOWN one:
https://git.openssl.org/?p=openssl.git;a=commitdiff;h=8051ab2
In particular, if a client closed the connection without reading
the response but with properly sent close_notify alert, this resulted in
unexpected "SSL_write() failed while ..." critical log message instead
of correct "SSL_write() failed (32: Broken pipe)" at the info level.
Since SSL_ERROR_ZERO_RETURN cannot be legitimately returned after
SSL_write(), the fix is to convert all SSL_ERROR_ZERO_RETURN errors
after SSL_write() to SSL_ERROR_SYSCALL.
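The fix then reduces to something like the following (a sketch):

    sslerr = SSL_get_error(c->ssl->connection, n);

    if (sslerr == SSL_ERROR_ZERO_RETURN) {
        /* SSL_ERROR_ZERO_RETURN cannot legitimately happen after
         * SSL_write(); OpenSSL 1.1.1 returns it instead of
         * SSL_ERROR_SYSCALL after close_notify from the peer
         */
        sslerr = SSL_ERROR_SYSCALL;
    }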
If the variant hash doesn't match the one we used as a secondary cache key,
we switch back to the original key. In this case, c->body_start was kept
updated from an existing cache node, overwriting the new response value.
After a file cache update, this led to a discrepancy between the cache node
and the cache file, seen as critical errors "file cache .. has too long
header".
As per HTTP/3 draft 29, section 4.1:
Frames of unknown types (Section 9), including reserved frames
(Section 7.2.8) MAY be sent on a request or push stream before,
after, or interleaved with other frames described in this section.
Also, trailers frame is now used as an indication of the request body end.
While for HTTP/1 unexpected eof always means an error, for HTTP/3 an eof right
after a DATA frame end means the end of the request body. For this reason,
since adding HTTP/3 support, eof no longer produced an error right after recv()
but was passed to filters which would make a decision. This decision was made
in ngx_http_parse_chunked() and ngx_http_v3_parse_request_body() based on the
b->last_buf flag.
Now that since 0f7f1a509113 (1.19.2) rb->chunked->length is a lower threshold
for the expected number of bytes, it can be set to zero to indicate that more
bytes may or may not follow. Now it's possible to move the check for eof from
parser functions to ngx_http_request_body_chunked_filter() and clean up the
parsing code.
Also, in the default branch, in case of eof, the following three things
happened, which were replaced with returning NGX_ERROR while implementing
HTTP/3:
- "client prematurely closed connection" message was logged
- c->error flag was set
- NGX_HTTP_BAD_REQUEST was returned
The change brings back this behavior for HTTP/1 as well as HTTP/3.
If a packet sent in response to an initial client packet was lost, then
successive client initial packets were dropped by nginx with the unexpected
dcid message logged. This was because the new DCID generated by the server was
not available to the client.
The check tested the total size of a packet header and unprotected packet
payload, which doesn't include the packet number length and expansion of
the packet protection AEAD. If the packet was corrupted, it could cause
false triggering of the condition due to unsigned type underflow leading
to a connection error.
Existing checks for the QUIC header and protected packet payload lengths
should be enough.
From quic-tls draft, section 5.4.2:
An endpoint MUST discard packets that are not long enough to contain
a complete sample.
The check includes the Packet Number field assumed to be 4 bytes long.
During long packet header parsing, pkt->len is updated with the Length
field value that is used to find next coalesced packets in a datagram.
For short packets it still contained the whole QUIC packet size.
This change makes packet length handling uniform: pkt->len now always
contains the total length of the packet number and protected packet payload.
Previously, STOP_SENDING was sent to the client upon stream closure if rev->eof and
rev->error were not set. This was an indirect indication that no RESET_STREAM
or STREAM fin has arrived. But it is indeed possible that rev->eof is not set,
but STREAM fin has already been received, just not read out by the application.
In this case sending STOP_SENDING does not make sense and can be misleading for
some clients.
The peer may issue additional connection IDs up to the limit defined by
transport parameter "active_connection_id_limit", using NEW_CONNECTION_ID
frames, and retire such IDs using RETIRE_CONNECTION_ID frame.
It is required to distinguish internal errors from corrupted packets and
perform actions accordingly: drop the packet or close the connection.
While there, processing of ngx_quic_decrypt() errors was made similar and
a couple of protocol violation errors were removed.
quic-transport
5.2:
Packets that are matched to an existing connection are discarded if
the packets are inconsistent with the state of that connection.
5.2.2:
Servers MUST drop incoming packets under all other circumstances.
The removal of QUIC packet protection depends on the largest packet number
received. When a garbage packet was received, the decoder still updated the
largest packet number from that packet. This could affect removing protection
from subsequent QUIC packets.
As per HTTP/3 draft 29, section 4.1:
When the server does not need to receive the remainder of the request,
it MAY abort reading the request stream, send a complete response, and
cleanly close the sending part of the stream.
On QUIC connections, SSL_shutdown() is used to call the send_alert callback
to send a CONNECTION_CLOSE frame. The reverse side is handled by other means.
At least BoringSSL doesn't differentiate whether this is a QUIC SSL method,
so waiting for the peer's close_notify alert should be explicitly disabled.
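One way to disable the wait is the existing nginx-level flag (a sketch, not
necessarily the exact mechanism used in the patch):

    /* send our side of the shutdown (which, on QUIC, becomes a
     * CONNECTION_CLOSE via the send_alert callback), but do not
     * wait for the peer's close_notify
     */
    c->ssl->no_wait_shutdown = 1;

    (void) ngx_ssl_shutdown(c);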
The logical quic connection state is tested by handler functions that
process corresponding types of packets (initial/handshake/application).
The packet is declined if state is incorrect.
No timeout is required for the input queue.
If a client attempts to start a new connection with an unsupported version,
a version negotiation packet is sent that contains a list of supported
versions (currently this is a single version, selected at compile time).
The function ngx_http_upstream_check_broken_connection() terminates the HTTP/1
request if client sends eof. For QUIC (including HTTP/3) the c->write->error
flag is now checked instead. This flag is set when the entire QUIC connection
is closed or STOP_SENDING was received from client.
Previously the request body DATA frame header was read one byte at a time,
because filters were called only when the requested number of bytes had
been read. Now,
after 08ff2e10ae92 (1.19.2), filters are called after each read. More bytes
can be read at once, which simplifies and optimizes the code.
This also reduces diff with the default branch.
Previously, such packets weren't handled as the resulting zero remaining time
prevented setting the loss detection timer, which, instead, could be disarmed.
For implementation details, see quic-recovery draft 29, appendix A.10.
The PTO handler is split into separate PTO and loss detection handlers
that operate interchangeably depending on which timer should be set.
The present ngx_quic_lost_handler is now only used for packet loss detection.
It replaces ngx_quic_pto_handler if there are packets preceding largest_ack.
Once there is no more such packets, ngx_quic_pto_handler is installed again.
Probes carry unacknowledged data previously sent in the oldest packet number,
one per packet number space. That is, there could be up to two probes.
PTO backoff is now increased before scheduling next probes.
In particular, this prevents declaring packet number 0 as lost if
there aren't yet any acknowledgements in this packet number space.
For example, only Initial packets were acknowledged in handshake.
Previously a single STREAM frame was created for each buffer in the stream
output chain, which is wasteful with respect to memory. The following
changes were
made in the stream send code:
- ngx_quic_stream_send_chain() no longer calls ngx_quic_stream_send() and got
a separate implementation that coalesces neighbouring buffers into a single
frame
- the new ngx_quic_stream_send_chain() respects the limit argument, which fixes
sendfile_max_chunk and limit_rate
- ngx_quic_stream_send() is reimplemented to call ngx_quic_stream_send_chain()
- stream frame size limit is moved out to a separate function
ngx_quic_max_stream_frame()
- flow control is moved out to a separate function ngx_quic_max_stream_flow()
- ngx_quic_stream_send_chain() is relocated next to ngx_quic_stream_send()
Reworked connections reuse, so closing connections is attempted in
advance, as long as number of free connections is less than 1/16 of
worker connections configured. This ensures that new connections can
be handled even if closing a reusable connection requires some time,
for example, for a lingering close (ticket #2017).
The 1/16 ratio is selected to be smaller than 1/8 used for disabling
accept when working with accept mutex, so nginx will try to balance
new connections to different workers first, and will start reusing
connections only if this does not help.
Previously, reusing connections happened silently and was only
visible in monitoring systems. This was shown to be not very user-friendly,
and administrators often didn't realize there were too few connections
available to withstand the load, and configured timeouts (keepalive_timeout
and http2_idle_timeout) were effectively reduced to keep things running.
To provide at least some information about this, a warning is now logged
(at most once per second, to avoid flooding the logs).
Sending shutdown when ngx_http_test_reading() detects the connection is
closed can result in "SSL_shutdown() failed (SSL: ... bad write retry)"
critical log messages if there are blocked writes.
Fix is to avoid sending shutdown via the c->ssl->no_send_shutdown flag,
similarly to how it is done in ngx_http_keepalive_handler() for kqueue
when pending EOF is detected.
Reported by Jan Prachař
(http://mailman.nginx.org/pipermail/nginx-devel/2018-December/011702.html).
Without the flag, SSL shutdown is attempted on such connections,
resulting in useless work and/or bogus "SSL_shutdown() failed
(SSL: ... bad write retry)" critical log messages if there are
blocked writes.
Previously, bidirectional shutdown never worked, due to two issues
in the code:
1. The code only tested SSL_ERROR_WANT_READ and SSL_ERROR_WANT_WRITE
when there was an error in the error queue, which cannot happen.
The bug was introduced in an attempt to fix unexpected error logging
as reported with OpenSSL 0.9.8g
(http://mailman.nginx.org/pipermail/nginx/2008-January/003084.html).
2. The code never called SSL_shutdown() for the second time to wait for
the peer's close_notify alert.
This change fixes both issues.
Note that after this change bidirectional shutdown is expected to work for
the first time, so c->ssl->no_wait_shutdown now makes a difference. This
is not a problem for HTTP code which always uses c->ssl->no_wait_shutdown,
but might be a problem for stream and mail code, as well as 3rd party
modules.
To minimize the effect of the change, the timeout, which used to be 30
seconds and not configurable, though never actually used, is now set to
3 seconds. It is also expanded to apply to both SSL_ERROR_WANT_READ and
SSL_ERROR_WANT_WRITE, so timeout is properly set if writing to the socket
buffer is not possible.
If some additional data from a pipelined request happens to be
read into the body buffer, we copy it to r->header_in or allocate
an additional large client header buffer for it.
This ensures that copying won't write more than the buffer size
even if the buffer comes from hc->free and it is smaller than the large
client header buffer size in the virtual host configuration. This might
happen if size of large client header buffers is different in name-based
virtual hosts, similarly to the problem with number of buffers fixed
in 6926:e662cbf1b932.
Creating client-initiated streams is moved from ngx_quic_handle_stream_frame()
to a separate function ngx_quic_create_client_stream(). This function is
responsible for creating streams with lower ids as well.
Also, simplified and fixed initial data buffering in
ngx_quic_handle_stream_frame(). It is now done before calling the initial
handler as the handler can destroy the stream.
Previously this function generated an error trying to figure out if the client
shut down the write end of the connection. The reason for this error was that
a QUIC stream has no socket descriptor. However, checking for eof is not the
right thing to do for an HTTP/3 QUIC stream, since HTTP/3 clients are expected
to shut down the write end of the stream after sending the request.
Now the function handles QUIC streams separately. It checks if c->read->error
is set. The error flags for c->read and c->write are now set for all streams
when closing the QUIC connection instead of setting the pending_eof flag.
According to quic-transport draft 29, section 19.3.1:
The value of the Gap field establishes the largest packet number
value for the subsequent ACK Range using the following formula:
largest = previous_smallest - gap - 2
Thus, given a largest packet number for the range, the smallest value
is determined by the formula:
smallest = largest - ack_range
While here, changed min/max to uint64_t for consistency.
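A small sketch of the two formulas above, in unsigned 64-bit arithmetic
(names are illustrative):

    /* Hedged sketch of ACK Range decoding per quic-transport draft 29,
     * section 19.3.1. */

    #include <stdint.h>

    static void
    next_ack_range(uint64_t prev_smallest, uint64_t gap, uint64_t ack_range,
        uint64_t *largest, uint64_t *smallest)
    {
        *largest = prev_smallest - gap - 2;    /* largest for this range */
        *smallest = *largest - ack_range;      /* smallest for this range */
    }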
A QUIC stream could be destroyed by its handler while in ngx_quic_stream_input().
To detect this, ngx_quic_find_stream() is used to check that it still exists.
Previously, a stream id was passed to this routine from the frame structure.
On stream cleanup, that frame is freed along with the other frames belonging
to the stream. The cleanup handler then reuses the last frames to update
MAX_STREAMS and serve other purposes. Thus, ngx_quic_find_stream() was passed
a reused frame with the part pointed to by stream_id zeroed out. If a stream
with id 0x0 still exists, this leads to a use-after-free.
After 05e42236e95b (1.19.1) responses with extra data might result in
zero size buffers being generated and "zero size buf" alerts in writer
(if f->rest happened to be 0 when processing additional stdout data).
The limits on active bidi and uni client streams are maintained at their
initial values initial_max_streams_bidi and initial_max_streams_uni by sending
a MAX_STREAMS frame upon each client stream closure.
Also, the following is changed for data arriving to non-existing streams:
- if a stream was already closed, such data is ignored
- when creating a new stream, all streams of the same type with lower ids are
created too
Previously, the document generated by the xslt filter was always fully sent
to client even if a range was requested and response status was 206 with
appropriate Content-Range.
The xslt module is unable to serve a range because of suspending the header
filter chain. By the moment full response xml is buffered by the xslt filter,
range header filter is not called yet, but the range body filter has already
been called and did nothing.
The fix is to disable ranges by resetting the r->allow_ranges flag much like
the image filter that employs a similar technique.
The ngx_http_perl_module module doesn't have a notion of including additional
search paths through --with-cc-opt, which results in the compile error
"incomplete type 'enum ssl_encryption_level_t'" when building nginx without
QUIC support. The enum is visible from the QUIC event headers and eventually
pollutes ngx_core.h.
The fix is to limit including these headers to the compilation units that
really consume them.
According to quic-transport draft 29, section 21.12.1.1:
Prior to validation, endpoints are limited in what they are able to
send. During the handshake, a server cannot send more than three
times the data it receives; clients that initiate new connections or
migrate to a new network path are limited.
The ngx_quic_queue_frame() function puts a frame into the send queue and
schedules a push timer to actually send the data.
The patch adds tracking of the amount of data in the queue and sends data
immediately if that amount exceeds the limit.
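A rough sketch of that rule, with illustrative names and an assumed threshold:

    /* Hedged sketch: account queued bytes and tell the caller to flush
     * immediately once the amount exceeds a limit, instead of waiting
     * for the push timer. */

    #include <stddef.h>

    #define QUEUED_BYTES_LIMIT  4096        /* assumed threshold */

    typedef struct {
        size_t  queued_bytes;               /* data in the send queue */
    } quic_sketch_t;

    static int
    queue_frame(quic_sketch_t *qc, size_t frame_len)
    {
        qc->queued_bytes += frame_len;

        return qc->queued_bytes > QUEUED_BYTES_LIMIT;  /* 1: send now */
    }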
Instead of timer-based retransmissions with a constant packet lifetime,
this patch implements ack-based loss detection and probe timeout
for the cases when no ack is received, according to the quic-recovery
draft 29.
The c->quic->retransmit timer is now called "pto".
The ngx_quic_retransmit() function is renamed to "ngx_quic_detect_lost()".
This is a preparation for the following patches.
According to the quic-recovery 29, Section 5: Estimating the Round-Trip Time.
Currently, integer arithmetic is used, which loses sub-millisecond accuracy.
The slice filter allows ranges for the response by setting the r->allow_ranges
flag, which enables the range filter. If the range was not requested, the
range filter adds an Accept-Ranges header to the response to signal the
support for ranges.
Previously, if an Accept-Ranges header was already present in the first slice
response, client received two copies of this header. Now, the slice filter
removes the Accept-Ranges header from the response prior to setting the
r->allow_ranges flag.
As long as the "Content-Length" header is given, we now make sure
it exactly matches the size of the response. If it doesn't,
the response is considered malformed and must not be forwarded
(https://tools.ietf.org/html/rfc7540#section-8.1.2.6). While it
is not really possible to "not forward" the response which is already
being forwarded, we generate an error instead, which is the closest
equivalent.
Previous behaviour was to pass everything to the client, but this
seems to be suboptimal and causes issues (ticket #1695). Also this
directly contradicts HTTP/2 specification requirements.
Note that the new behaviour for the gRPC proxy is more strict than that
applied in other variants of proxying. This is intentional, as HTTP/2
specification requires us to do so, while in other types of proxying
malformed responses from backends are well known and historically
tolerated.
Previous behaviour was to pass everything to the client, but this
seems to be suboptimal and causes issues (ticket #1695). Fix is to
drop extra data instead, as it naturally happens in most clients.
Additionally, we now also issue a warning if the response is too
short, and make sure the fact it is truncated is propagated to the
client. The u->error flag is introduced to make it possible to
propagate the error to the client in case of unbuffered proxying.
For responses to HEAD requests there is an exception: we do allow
both responses without body and responses with body matching the
Content-Length header.
Previous behaviour was to pass everything to the client, but this
seems to be suboptimal and causes issues (ticket #1695). Fix is to
drop extra data instead, as it naturally happens in most clients.
This change covers generic buffered and unbuffered filters as used
in the scgi and uwsgi modules. Appropriate input filter init
handlers are provided by the scgi and uwsgi modules to set corresponding
lengths.
Note that for responses to HEAD requests there is an exception:
we do allow any response length. This is because responses to HEAD
requests might be actual full responses, and it is up to nginx
to remove the response body. If caching is enabled, only full
responses matching the Content-Length header will be cached
(see b779728b180c).
Previously, additional data after final chunk was either ignored
(in the same buffer, or during unbuffered proxying) or sent to the
client (in the next buffer already if it was already read from the
socket). Now additional data are properly detected and ignored
in all cases. Additionally, a warning is now logged and keepalive
is disabled in the connection.
Previous behaviour was to pass everything to the client, but this
seems to be suboptimal and causes issues (ticket #1695). Fix is to
drop extra data instead, as it naturally happens in most clients.
If a memcached response was followed by a correct trailer, and then
the NUL character followed by some extra data - this was accepted by
the trailer checking code. This in turn resulted in ctx->rest underflow
and caused negative size buffer on the next reading from the upstream,
followed by the "negative size buf in writer" alert.
Fix is to always check for too long responses, so a correct trailer cannot
be followed by extra data.
After sending the GOAWAY frame, a connection is now closed using
the lingering close mechanism.
This allows for the reliable delivery of the GOAWAY frames, while
also fixing connection resets observed when http2_max_requests is
reached (ticket #1250), or with graceful shutdown (ticket #1544),
when some additional data from the client is received on a fully
closed connection.
For HTTP/2, the settings lingering_close, lingering_timeout, and
lingering_time are taken from the "server" level.
Previously, the expression (ch & 0x7f) was promoted to a signed integer.
Depending on the platform, the size of this integer could be less than 8 bytes,
leading to overflow when handling the higher bits of the result. Also, the
sign bit of this integer could be replicated when adding to the 64-bit
st->value.
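A sketch of the kind of fix this implies: widen to unsigned 64-bit before
shifting and accumulating (names are illustrative):

    /* Hedged sketch: accumulate 7-bit groups into a 64-bit value; the
     * cast keeps (p[i] & 0x7f) from being promoted to a possibly
     * 32-bit signed int before the shift and addition. */

    #include <stdint.h>
    #include <stddef.h>

    static uint64_t
    parse_prefixed_int(const unsigned char *p, size_t n)
    {
        uint64_t  value = 0;
        size_t    i;

        for (i = 0; i < n; i++) {
            value = (value << 7) + (uint64_t) (p[i] & 0x7f);
        }

        return value;
    }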
Previously errors led only to closing streams.
To simplify closing QUIC connection from a QUIC stream context, new macro
ngx_http_v3_finalize_connection() is introduced. It calls
ngx_quic_finalize_connection() for the parent connection.
The function finalizes QUIC connection with an application protocol error
code and sends a CONNECTION_CLOSE frame with type=0x1d.
Also, renamed NGX_QUIC_FT_CONNECTION_CLOSE2 to NGX_QUIC_FT_CONNECTION_CLOSE_APP.
Previously dynamic table was not functional because of zero limit on its size
set by default. Now the following changes enable it:
- new directives to set SETTINGS_QPACK_MAX_TABLE_CAPACITY and
SETTINGS_QPACK_BLOCKED_STREAMS
- send settings with SETTINGS_QPACK_MAX_TABLE_CAPACITY and
SETTINGS_QPACK_BLOCKED_STREAMS to the client
- send Insert Count Increment to the client
- send Header Acknowledgement to the client
- evict old dynamic table entries on overflow
- decode Required Insert Count from client
- block stream if Required Insert Count is not reached
Using SSL_CTX_set_verify(SSL_VERIFY_PEER) implies that OpenSSL will
send a certificate request during an SSL handshake, leading to unexpected
certificate requests from browsers as long as there are any client
certificates installed. Given that ngx_ssl_trusted_certificate()
is called unconditionally by the ngx_http_ssl_module, this affected
all HTTPS servers. Broken by 699f6e55bbb4 (not released yet).
Fix is to set verify callback in the ngx_ssl_trusted_certificate() function
without changing the verify mode.
Client streams may send literal strings which are now limited in size by the
new directive. The default value is 4096.
The directive is similar to HTTP/2 directive http2_max_field_size.
So that connections are protected from failures caused by on-path attacks.
Decryption failure of long packets used during handshake still leads
to connection close since it barely makes sense to handle them there.
A previously used undefined error code is now replaced with the generic one.
Note that quic-transport prescribes keeping the connection intact and
discarding such QUIC packets individually, since coalesced packets could be
present. This is selectively handled in the next change.
The patch removes remnants of the old state tracking mechanism, which did
not take into account the asymmetry of read/write states and was not very
useful.
The encryption state now is entirely tracked using SSL_quic_read/write_level().
quic-transport draft 29:
section 7:
* authenticated negotiation of an application protocol (TLS uses
ALPN [RFC7301] for this purpose)
...
Endpoints MUST explicitly negotiate an application protocol. This
avoids situations where there is a disagreement about the protocol
that is in use.
section 8.1:
When using ALPN, endpoints MUST immediately close a connection (see
Section 10.3 of [QUIC-TRANSPORT]) with a no_application_protocol TLS
alert (QUIC error code 0x178; see Section 4.10) if an application
protocol is not negotiated.
Changes in the ngx_quic_close_quic() function are required to avoid attempts
to generate and send packets without proper keys, which happens in case
of a failed ALPN check.
quic-transport draft 29, section 14:
QUIC depends upon a minimum IP packet size of at least 1280 bytes.
This is the IPv6 minimum size [RFC8200] and is also supported by most
modern IPv4 networks. Assuming the minimum IP header size, this
results in a QUIC maximum packet size of 1232 bytes for IPv6 and 1252
bytes for IPv4.
Since the packet size can change during the connection lifetime, the
ngx_quic_max_udp_payload() function is introduced; it currently
returns the minimal allowed size, depending on the address family.
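A tiny sketch of such a per-family minimum, using the draft's 1232/1252
figures (names are illustrative):

    /* Hedged sketch: minimal allowed QUIC payload per address family,
     * i.e. 1280 minus the minimal IP header (40/20) and the 8-byte
     * UDP header. */

    #include <stddef.h>
    #include <sys/socket.h>

    static size_t
    max_udp_payload(int family)
    {
        return (family == AF_INET6) ? 1232 : 1252;
    }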
quic-tls, 8.2:
The quic_transport_parameters extension is carried in the ClientHello
and the EncryptedExtensions messages during the handshake. Endpoints
MUST send the quic_transport_parameters extension; endpoints that
receive ClientHello or EncryptedExtensions messages without the
quic_transport_parameters extension MUST close the connection with an
error of type 0x16d (equivalent to a fatal TLS missing_extension
alert, see Section 4.10).
Clearing cache based on free space left on a file system is
expected to allow better disk utilization in some cases, notably
when disk space might be also used for something other than nginx
cache (including nginx own temporary files) and while loading
cache (when cache size might be inaccurate for a while, effectively
disabling max_size cache clearing).
Based on a patch by Adam Bambuch.
With XFS, using "allocsize=64m" mount option results in large preallocation
being reported in the st_blocks as returned by fstat() till the file is
closed. This in turn results in incorrect cache size calculations and
wrong clearing based on max_size.
To avoid too aggressive cache clearing on such volumes, st_blocks values
which result in sizes larger than st_size and eight blocks (an arbitrary
limit) are no longer trusted, and we use st_size instead.
The ngx_de_fs_size() counterpart is intentionally not modified, as
it is used on closed files and hence not affected by this problem.
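A sketch of the heuristic under the stated assumptions (512-byte st_blocks
units, an eight-block slack):

    /* Hedged sketch: distrust st_blocks when it implies more than
     * st_size plus eight 512-byte blocks of preallocation. */

    #include <sys/stat.h>

    static off_t
    cached_file_size(const struct stat *st)
    {
        off_t  size = (off_t) st->st_blocks * 512;

        if (size > st->st_size + 8 * 512) {
            size = st->st_size;    /* likely preallocation, e.g. XFS */
        }

        return size;
    }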
NFS on Linux is known to report wsize as a block size (in both f_bsize
and f_frsize, both in statfs() and statvfs()). On the other hand,
typical file system block sizes on Linux (ext2/ext3/ext4, XFS) are limited
to pagesize. (With FAT, block sizes can be at least up to 512k in
extreme cases, but this doesn't really matter, see below.)
To avoid too aggressive cache clearing on NFS volumes on Linux, block
sizes larger than pagesize are now ignored.
Note that it is safe to ignore large block sizes. Since 3899:e7cd13b7f759
(1.0.1) cache size is calculated based on fstat() st_blocks, and rounding
to file system block size is preserved mostly for Windows.
Note well that on other OSes valid block sizes seen are at least up
to 65536. In particular, UFS on FreeBSD is known to work well with block
and fragment sizes set to 65536.
When validating the second and further certificates, the ssl callback could be
called twice to report the error. After the first call the client connection
is terminated and its memory is released. Prior to the second call, and within
it, the released connection memory is accessed.
Errors triggering this behavior:
- failure to create the request
- failure to start resolving OCSP responder name
- failure to start connecting to the OCSP responder
The fix is to rearrange the code to eliminate the second call.
The flush flag was not set when forwarding the request body to the uwsgi
server. When using uwsgi_pass suwsgi://..., this causes the uwsgi server
to wait indefinitely for the request body and eventually time out due to
SSL buffering.
This is essentially the same change as 4009:3183165283cc, which was made
to ngx_http_proxy_module.c.
This will fix the uwsgi bug https://github.com/unbit/uwsgi/issues/1490.
This is a temporary workaround, proper retransmission mechanism based on
quic-recovery rfc draft is yet to be implemented.
The currently hardcoded value is too small for real networks. The patch
sets a static PTO, assuming an RTT of ~333 ms, which gives about 1 s.
This ensures that certificate verification is properly logged to debug
log during upstream server certificate verification. This should help
with debugging various certificate issues.
Listening UNIX sockets were not removed on graceful shutdown, preventing
the next runs. The fix is to replace the custom socket closing code in
ngx_master_process_cycle() by the ngx_close_listening_sockets() call.
When changing binary, sending a SIGTERM to the new binary's master process
should not remove inherited UNIX sockets unless the old binary's master
process has exited.
Also, if both are present, require that they have the same value. These
requirements are specified in HTTP/3 draft 28.
Current implementation of HTTP/2 treats ":authority" and "Host"
interchangeably. New checks only make sure at least one of these values is
present in the request. A similar check existed earlier and was limited only
to HTTP/1.1 in 38c0898b6df7.
The flag was originally added by 8f038068f4bc, and is propagated correctly
in the stream module. With the QUIC introduction, the http module now uses
datagram sockets as well, hence the fix.
Previously, invalid connection preface errors were only logged at debug
level, providing no visible feedback, in particular, when a plain text
HTTP/2 listening socket is erroneously used for HTTP/1.x connections.
Now these are explicitly logged at the info level, much like other
client-related errors.
When enabled, certificate status is stored in cache and is used to validate
the certificate in future requests.
New directive ssl_ocsp_cache is added to configure the cache.
OCSP validation for client certificates is enabled by the "ssl_ocsp" directive.
OCSP responder can be optionally specified by "ssl_ocsp_responder".
When session is reused, peer chain is not available for validation.
If the verified chain contains certificates from the peer chain not available
at the server, validation will fail.
Previously only the first responder address was used per each stapling update.
Now, in case of a network or parsing error, next address is used.
This also fixes the issue with unsupported responder address families
(ticket #1330).
Preserving pointers within the client buffer is not needed for HTTP/3 because
all data is either allocated from the pool or static. Unlike with HTTP/1, data
typically cannot be referenced directly within the client buffer. Trying to
preserve NULLs or external pointers led to broken pointers.
Also, reverted changes in ngx_http_alloc_large_header_buffer() not relevant
for HTTP/3 to minimize diff to mainstream.
New field r->parse_start is introduced to substitute r->request_start and
r->header_name_start for request length accounting. These fields only work for
this purpose in HTTP/1 because HTTP/1 request line and header line start with
these values.
Also, error logging is now fixed to output the right part of the request.
As per HTTP/3 draft 27, a request or response containing uppercase header
field names MUST be treated as malformed. Additionally, the existing rules
applied when parsing HTTP/1 header names now apply to HTTP/3 header names
as well (see the sketch below):
- the null character is not allowed
- the underscore character may or may not be treated as invalid depending on
the value of "underscores_in_headers"
- all non-alphanumeric characters with the exception of '-' are treated as
invalid
Also, the r->locase_header field is now filled while parsing an HTTP/3
header.
Error logging for invalid headers is fixed as well.
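A sketch of the per-character checks listed above (illustrative only, not
nginx's actual parser):

    /* Hedged sketch: returns non-zero if the character is acceptable
     * in an HTTP/3 header name under the rules above. */

    static int
    valid_h3_header_name_char(unsigned char ch, int underscores_allowed)
    {
        if (ch >= 'A' && ch <= 'Z') {
            return 0;                       /* uppercase is malformed */
        }

        if (ch == '\0') {
            return 0;                       /* null is not allowed */
        }

        if (ch == '_') {
            return underscores_allowed;     /* underscores_in_headers */
        }

        /* all non-alphanumeric characters except '-' are invalid */
        return (ch >= 'a' && ch <= 'z')
               || (ch >= '0' && ch <= '9')
               || ch == '-';
    }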
The first one parses pseudo-headers and is analogous to the request line
parser in HTTP/1. The second one parses regular headers and is analogous to
the header parser in HTTP/1.
Additionally, error handling of client passing malformed uri is now fixed.
The function ngx_http_parse_chunked() is also called from the proxy module to
parse the upstream response. It should always parse HTTP/1 body in this case.
According to quic-transport draft 28 section 10.3.1:
When sending CONNECTION_CLOSE, the goal is to ensure that the peer
will process the frame. Generally, this means sending the frame in a
packet with the highest level of packet protection to avoid the
packet being discarded. After the handshake is confirmed (see
Section 4.1.2 of [QUIC-TLS]), an endpoint MUST send any
CONNECTION_CLOSE frames in a 1-RTT packet. However, prior to
confirming the handshake, it is possible that more advanced packet
protection keys are not available to the peer, so another
CONNECTION_CLOSE frame MAY be sent in a packet that uses a lower
packet protection level.
There is no need in a separate type for the QUIC connection state.
The only state not found in the SSL library is NGX_QUIC_ST_UNAVAILABLE,
which is actually a flag used by the ngx_quic_close_quic() function
to prevent cleanup of uninitialized connection.
Sections 4.10.1 and 4.10.2 of quic transport describe discarding of initial
and handshake keys. Since the keys are discarded, we no longer need
to retransmit packets, and the corresponding queues should be emptied.
This patch removes the previously added workaround that did not require
acknowledgement for initial packets, avoiding retransmission; that was wrong
because a packet could be lost and we have to retransmit it.
It was possible that the retransmit timer was not set after the first
retransmission attempt, because ngx_quic_retransmit() did not set the
wait time properly, and the condition in the retransmit handler was incorrect.
Section 17.2 and 17.3 of QUIC transport:
Fixed bit: Packets containing a zero value for this bit are not
valid packets in this version and MUST be discarded.
Reserved bit: An endpoint MUST treat receipt of a packet that has
a non-zero value for these bits, after removing both packet and
header protection, as a connection error of type PROTOCOL_VIOLATION.
When an error occurs, the c->quic->error field may be populated
with an appropriate error code, and a CONNECTION_CLOSE frame will be
sent to the peer before the connection is closed. Otherwise, the error is
treated as internal and the INTERNAL_ERROR code is sent.
The pkt->error field is populated by functions processing packets to
indicate an error when it does not fit into a pass/fail return status.
As per QUIC transport, the first flight of 0-RTT packets uses the same
Destination and Source Connection ID values as the client's first Initial.
The fix is to match 0-RTT against the original DCID after it has been switched.
The ordered frame handler is always called for the existing stream, as it is
allocated from this stream. Instead of searching for the stream by id, a
pointer to the stream node is passed.
The idea is to skip any zeroes that follow a valid QUIC packet. Currently such
behavior can only be observed with Firefox, which sends zero-padded initial
packets.
Now there's no need to annotate every frame in an ACK-eliciting packet.
Sending the ACK was moved to the first place, so that queueing the ACK frame
is no longer postponed until the next packet after pushing STREAM frames.
+ added "quic" prefix to all error messages
+ rephrased some messages
+ removed excessive error logging from frame parser
+ added ngx_quic_check_peer() function to check proper source/destination
match and do it in one place
- the ngx_quic_hexdump0() macro is renamed to ngx_quic_hexdump();
the original ngx_quic_hexdump() macro with variable argument is
removed, extra information is logged normally, with ngx_log_debug()
- all labels in hex dumps are prefixed with "quic"
- the hexdump format is simplified, length is moved forward to avoid
situations when the dump is truncated, and length is not shown
- ngx_quic_flush_flight() function contents are debug-only, placed under
the NGX_DEBUG macro to avoid "unused variable" warnings from the compiler
- frame names in labels are capitalized, similar to other places
+ all dumps are moved under one of the following macros (undefined by default):
NGX_QUIC_DEBUG_PACKETS
NGX_QUIC_DEBUG_FRAMES
NGX_QUIC_DEBUG_FRAMES_ALLOC
NGX_QUIC_DEBUG_CRYPTO
+ all QUIC debug messages got "quic " prefix
+ all input frames are reported as "quic frame in FOO_FRAME bar:1 baz:2"
+ all outgoing frames are reported as "quic frame out foo bar baz"
+ all stream operations are prefixed with id, like: "quic stream id 0x33 recv"
+ all transport parameters are prefixed with "quic tp"
(hex dump is moved to caller, to avoid using ngx_cycle->log)
+ packet flags and some other debug messages are updated to
include packet type
As per https://tools.ietf.org/html/rfc7540#section-8.1,
: A server can send a complete response prior to the client
: sending an entire request if the response does not depend on
: any portion of the request that has not been sent and
: received. When this is true, a server MAY request that the
: client abort transmission of a request without error by
: sending a RST_STREAM with an error code of NO_ERROR after
: sending a complete response (i.e., a frame with the
: END_STREAM flag). Clients MUST NOT discard responses as a
: result of receiving such a RST_STREAM, though clients can
: always discard responses at their discretion for other
: reasons.
Previously, RST_STREAM(NO_ERROR) received from upstream after
a frame with the END_STREAM flag was incorrectly treated as an
error. Now, a single RST_STREAM(NO_ERROR) is properly handled.
This fixes problems observed with modern grpc-c [1], as well
as with the Go gRPC module.
[1] https://github.com/grpc/grpc/pull/1661
We always generate stream frames that have a length. The 'len' member is used
while parsing incoming frames and can be safely ignored when generating
output.
There are following flags in quic connection:
closing - true, when a connection close is initiated, for whatever reason
draining - true, when a CC frame is received from peer
The following state machine is used for closing:
+------------------+
| I/HS/AD |
+------------------+
| | |
| | V
| | immediate close initiated:
| | reasons: close by top-level protocol, fatal error
| | + sends CC (probably with app-level message)
| | + starts close_timer: 3 * PTO (current probe timeout)
| | |
| | V
| | +---------+ - Reply to input with CC (rate-limited)
| | | CLOSING | - Close/Reset all streams
| | +---------+
| | | |
| V V |
| receives CC |
| | |
idle | |
timer | |
| V |
| +----------+ | - MUST NOT send anything (MAY send a single CC)
| | DRAINING | | - if not already started, starts close_timer: 3 * PTO
| +----------+ | - if not already done, close all streams
| | |
| | |
| close_timer fires
| |
V V
+------------------------+
| CLOSED | - clean up all the resources, drop connection
+------------------------+ state completely
The ngx_quic_close_connection() function gets an "rc" argument that signals
the reason for closing the connection:
NGX_OK - initiated by application (i.e. http/3), follow state machine
NGX_DONE - timed out (while idle or draining)
NGX_ERROR - fatal error, destroy connection immediately
The PTO calculations are not yet implemented; a hardcoded value of 5s is used.
The function is split into three:
ngx_quic_close_connection() itself cleans up all core nginx things
ngx_quic_close_quic() deals with everything inside c->quic
ngx_quic_close_streams() deals with streams cleanup
The quic and streams cleanup functions may return NGX_AGAIN, thus signalling
that cleanup is not ready yet, and the close cannot continue to the next step.
The header size macros for long and short packets were fixed to provide
correct values in bytes.
Currently the sending code limits frames so they don't exceed max_packet_size,
but it does not account for the case when a single frame exceeds the limit.
With this patch, big payloads (CRYPTO and STREAM) are split into a number of
smaller frames that fit into the advertised max_packet_size (which specifies
the final packet size, after encryption).
chrome-unstable 83.0.4103.7 starts with Initial packet number 1.
I couldn't find a proper explanation besides this text in quic-transport:
An endpoint MAY skip packet numbers when sending
packets to detect this (Optimistic ACK Attack) behavior.
+ MAX_STREAM_DATA frame is sent when recv() is performed on a stream.
The new value is a sum of the total bytes received by the stream and the
free space in its buffer;
The sending of the MAX_STREAM_DATA frame in response to the
STREAM_DATA_BLOCKED frame is adjusted to follow the same logic as above.
+ MAX_DATA frame is sent when the total amount of received data is 2x
of the current limit. The limit is doubled.
+ Default values of transport parameters are adjusted to more meaningful
values:
initial stream limits are set to the quic buffer size instead of the
unrealistically small 255.
initial max data is decreased to 16 buffer sizes, on the assumption that
this is enough for a relatively short connection, instead of a randomly
chosen big number.
All this allows initiating a stable flow of streams that does not block
on stream/connection limits (tested with FF 77.0a1 and 100K requests)
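A minimal sketch of the MAX_STREAM_DATA computation described above (names
are illustrative):

    /* Hedged sketch: the new flow-control limit advertised on recv()
     * is the total bytes already received by the stream plus the free
     * space left in its buffer. */

    #include <stdint.h>
    #include <stddef.h>

    static uint64_t
    new_max_stream_data(uint64_t total_received, size_t buf_free)
    {
        return total_received + (uint64_t) buf_free;
    }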
Before the patch, full STREAM frame handling was delayed until the frame with
zero offset was received; only a node in the streams tree was created.
This led to problems when such a stream was deleted; in particular, it had no
handlers set for read events.
This patch creates the new stream immediately, but delays data delivery until
the proper offset arrives. This is somewhat similar to how the accept()
operation works.
The ngx_quic_add_stream() function is no longer needed and merged into stream
handler. The ngx_quic_stream_input() now only handles frames for existing
streams and does not deal with stream creation.
Frames can still float in the following queues:
- crypto frames reordering queues (one per encryption level)
- stream frames reordering queues (one per packet number namespace)
- frames retransmit queues (one per packet number namespace)
Crypto frames cleanup is moved to the moment when all streams are closed.
Each stream node now includes an incoming frames queue and sent/received
counters for tracking offsets. The sent counter is not used; c->sent is used
instead, unlike in crypto buffers, which have no connection.
If the offset in a CRYPTO frame doesn't match the expected one, the following
actions are taken (see the sketch after this list):
a) Duplicate frames or frames within [0...current offset] are ignored
b) New data from intersecting ranges (starting before current_offset, ending
after it) is consumed
c) "Future" frames are stored in a sorted queue (min offset .. max offset)
Once a frame is consumed, the current offset is updated and the queue is
inspected: we iterate the queue until a gap is found and act as described
above for each frame.
The amount of data in buffered frames is limited by a corresponding macro.
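A sketch of the classification of cases (a)-(c) above (illustrative names):

    /* Hedged sketch: classify an incoming frame against the current
     * consumed offset. */

    #include <stdint.h>

    typedef enum {
        FRAME_IGNORE,        /* (a) duplicate or fully-known data */
        FRAME_CONSUME_TAIL,  /* (b) intersecting: consume the new part */
        FRAME_BUFFER         /* (c) future data: store in sorted queue */
    } frame_action_t;

    static frame_action_t
    classify_frame(uint64_t current_offset, uint64_t offset, uint64_t len)
    {
        if (offset + len <= current_offset) {
            return FRAME_IGNORE;
        }

        if (offset <= current_offset) {
            return FRAME_CONSUME_TAIL;
        }

        return FRAME_BUFFER;
    }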
The CRYPTO and STREAM frame structures are now compatible: they share
the same set of initial fields. This allows having code that deals with
both of these frames.
The ordering layer now processes the frame with offset and invokes the
handler when it can organise an ordered stream of data.
Quote: Conceptually, a packet number space is the context in which a packet
can be processed and acknowledged.
ngx_quic_namespace_t => ngx_quic_send_ctx_t
qc->ns => qc->send_ctx
ns->largest => send_ctx->largest_ack
The ngx_quic_ns(level) macro now returns pointer, not just index:
ngx_quic_get_send_ctx(c->quic, level)
ngx_quic_retransmit_ns() => ngx_quic_retransmit()
ngx_quic_output_ns() => ngx_quic_output_frames()
The request processing is delayed by a timer. Since nginx updates
internal time once at the start of each event loop iteration, this
normally ensures constant time delay, adding a mitigation from
time-based attacks.
A notable exception to this is the case when there are no additional
events before the timer expires. To ensure constant-time processing
in this case as well, we trigger an additional event loop iteration
by posting a dummy event for the next event loop iteration.
The offset in client CRYPTO frames is tracked in c->quic->crypto_offset_in.
This means that CRYPTO frames with a non-zero offset are now accepted, making
it possible to finish the handshake with client certificates that exceed the
max packet size (if no reordering happens).
The c->quic->crypto_offset field is renamed to crypto_offset_out to avoid
confusion with the tracking of the incoming CRYPTO stream.
+ since the number of ranges is unknown, provide a function to parse them
once again in the handler, to avoid memory allocation
+ ack handler now processes all ranges, not only the first
+ ECN counters are parsed and saved into the frame if present
Such frames are grouped together in a switch and just ignored, instead of
closing the connection. This may improve test coverage. All such frames
require acknowledgment.
The qc->closing flag is set when a connection close is initiated for the first
time.
No timers will be set if the flag is active.
TODO: this is a temporary solution to avoid running timer handlers after
the connection (and its pool) was destroyed. It looks like currently we have
no clear policy of connection closing in regard to timers.
Found with a previously received Initial packet with ACK only, which
instantiates a new connection but does not produce the handshake keys.
This can be triggered by a fairly well behaving client, if the server
stands behind a load balancer that stripped Initial packets exchange.
Found by F5 test suite.
This makes sending a large number of bidirectional streams work with ngtcp2,
which doesn't bother sending the optional STREAMS_BLOCKED frame when exhausted.
This also introduces tracking of the currently opened and maximum allowed
streams.
Currently, the output is called periodically, every 200 ms, to invoke
ngx_quic_output(), which pushes all pending frames into packets.
TODO: implement flags à la Nagle & co (NO_DELAY/NO_PUSH...)
All frames collected into a packet are moved into a per-namespace send queue.
The QUIC connection has a timer which fires at the closest max_ack_delay time.
A frame is deleted from the queue when a corresponding packet is acknowledged.
The NGX_QUIC_MAX_RETRANSMISSION timeout defines the maximum duration of
a frame's retransmission.
The quic->keys[4] array now contains secrets related to the corresponding
encryption level. All protection-level functions get proper keys and do
not need to switch manually between levels.
If early data is accepted, SSL_do_handshake() completes as soon as ClientHello
is processed. SSL_in_init() will report the handshake is still in progress.
Static buffers are used instead in functions where decryption takes place.
The pkt->plaintext points to the beginning of a static buffer.
The pkt->payload.data points to the actual start of the decrypted data.
+ ngx_quic_encrypt():
- no longer accepts pool as argument
- pkt is 1st arg
- payload is passed as pkt->payload
- performs encryption to the specified static buffer
+ ngx_quic_create_long/short_packet() functions:
- single buffer for everything, allocated by caller
- buffer layout is: [ ad | payload | TAG ]
the result is in the beginning of buffer with proper length
- nonce is calculated on stack
- log is passed explicitly, pkt is 1st arg
- no more allocations inside
+ ngx_quic_create_long_header():
- args changed: no need to pass str_t
+ added ngx_quic_create_short_header()
+ Client-related errors (i.e. parsing) are logged at INFO level
+ c->log->action is updated throughout the process of receiving, parsing,
handling packet/payload and generating frames/output.
For the ngx_http_process_request() part to work, this requires setting both
r->http_connection->ssl and c->ssl on a QUIC stream. To avoid damaging the
global SSL object, ngx_ssl_shutdown() is made to ignore QUIC streams.
+ ngx_quic_init_ssl_methods() is no longer there; we set up methods on the SSL
connection directly.
+ the handshake_handler is actually a generic quic input handler
+ updated c->log->action and debug to reflect changes and be more informative
+ c->quic is always set in ngx_quic_input()
+ the quic connection state is set by the results of SSL_do_handshake();
note:
+ parameters are available in the SSL connection since they are obtained by
the ssl stack
quote:
During connection establishment, both endpoints make authenticated
declarations of their transport parameters. These declarations are
made unilaterally by each endpoint.
and indeed, we send our parameters before we read the client's.
No handling of incoming parameters is done by this patch.
- integer parameters can be configured using the following directives:
quic_max_idle_timeout
quic_max_ack_delay
quic_max_packet_size
quic_initial_max_data
quic_initial_max_stream_data_bidi_local
quic_initial_max_stream_data_bidi_remote
quic_initial_max_stream_data_uni
quic_initial_max_streams_bidi
quic_initial_max_streams_uni
quic_ack_delay_exponent
quic_active_migration
quic_active_connection_id_limit
- only following parameters are actually sent:
active_connection_id_limit
initial_max_streams_uni
initial_max_streams_bidi
initial_max_stream_data_bidi_local
initial_max_stream_data_bidi_remote
initial_max_stream_data_uni
(other parameters are to be added into ngx_quic_create_transport_params()
function as needed, should be easy now)
- draft 24 and draft 27 are now supported
(at compile-time using quic_version macro)
The ngx_quic_parse_frame() function now has a new 'pkt' argument: the packet
header of the currently processed frame. This allows logging errors/debug
closer to the reasons and performing additional checks regarding possible
frame types. The handler only performs processing of good frames.
A number of functions like read_uint32() and parse_int[_multi] probably should
be implemented as macros, but currently it is better to have them as
functions for simpler debugging.
Cleanup in ngx_event_quic.c:
+ reordered functions, structures
+ added missing prototypes
+ added separate handlers for each frame type
+ numerous indentation/comments/TODO fixes
+ removed the non-implemented qc->state and the corresponding enum;
this requires deep thinking, the stub was unused.
+ streams inside quic connection are now in own structure
All code dealing with serializing/deserializing
is moved into the src/event/ngx_event_quic_transport.c/h files.
All macros for dealing with data are internal to the source file.
The header file exposes frame types and error codes.
The exported functions are currently the packet header parsers and writers
and the frames parser/writer.
The ngx_quic_header_t structure is updated with a 'log' member. This avoids
passing an extra argument to parsing functions that need to report errors.
+ support for more than one initial packet
+ workaround for trailing zeroes in packet
+ ignore application data packet if no keys yet (issue in draft 27/ff nightly)
+ fixed PING frame parser
+ STREAM frames need to be acknowledged
The following HTTP configuration is used for firefox (v74):
http {
ssl_certificate_key localhost.key;
ssl_certificate localhost.crt;
ssl_protocols TLSv1.2 TLSv1.3;
server {
listen 127.0.0.1:10368 reuseport http3;
ssl_quic on;
server_name localhost;
location / {
return 200 "This-is-QUICK\n";
}
}
server {
listen 127.0.0.1:5555 ssl; # point the browser here
server_name localhost;
location / {
add_header Alt-Svc 'h3-24=":10368";ma=100';
return 200 "ALT-SVC";
}
}
}
New files:
src/event/ngx_event_quic_protection.h
src/event/ngx_event_quic_protection.c
The protection.h header provides interface to the crypto part of the QUIC:
2 functions to initialize corresponding secrets:
ngx_quic_set_initial_secret()
ngx_quic_set_encryption_secret()
and 2 functions to deal with packet processing:
ngx_quic_encrypt()
ngx_quic_decrypt()
Also, structures representing secrets are defined there.
All functions require an SSL connection and a pool; only crypto operations
happen inside, with no access to nginx connections or events.
Currently pool->log is used for the logging (instead of the original c->log).
- events handling moved into src/event/ngx_event_quic.c
- http invokes once ngx_quic_run() and passes stream callback
(diff to original http_request.c is now minimal)
- streams are stored in rbtree using ID as a key
- when a new stream is registered, appropriate callback is called
- ngx_quic_stream_t type represents a STREAM and is stored in c->qs
- now NEW_CONNECTION_ID frames can be received and parsed
The packet structure is created in ngx_quic_input() and passed
to all handlers (initial, handshake and application data).
The UDP datagram buffer is saved as pkt->raw;
The QUIC packet is stored as pkt->data and pkt->len (instead of pkt->buf)
(pkt->len is adjusted after parsing headers to actual length)
The pkt->pos is removed, pkt->raw->pos is used instead.
- added basic parsing of ACK, PING and PADDING frames on input
- added preliminary parsing of SHORT headers
The ngx_quic_output() is now called after processing of each input packet.
Frames are added into the output queue according to their level: initial
packets go ahead of handshake and application data, so they can be merged
properly.
The payload handler is called from the new, handshake and application data
handlers (the latter is a stub).
As was observed with the ngtcp2 client, the Finished CRYPTO frame within a
Handshake packet may not be sent for some reason if there's nothing to append
on 1-RTT. This results in an unnecessary retransmit. To avoid this edge case,
a non-zero active_connection_id_limit transport parameter is now used to
append the datagram with NEW_CONNECTION_ID 1-RTT frames.
Now handshake generates frames, and they are queued in c->quic->frames.
The ngx_quic_output() is called from ngx_quic_flush_flight() or manually,
processes the queue and encrypts all frames according to required encryption
level.
ngx_quic_hexdump0(log, format, buffer, buffer_size);
- logs hexdump of buffer to specified error log
ngx_quic_hexdump0(c->log, "this is foo:", foo.data, foo.len);
ngx_quic_hexdump(log, format, buffer, buffer_size, ...)
- same as hexdump0, but more format/args possible:
ngx_quic_hexdump(c->log, "a=%d b=%d, foo is:", foo.data, foo.len, a, b);
When "aio" or "aio threads" is used while processing the response body of an
in-memory background subrequest, the subrequest could be finalized with an aio
operation still in progress. Upon aio completion either parent request is
woken or the old r->write_event_handler is called again. The latter may result
in request errors. In either case post_subrequest handler is never called with
the full response body, which is typically expected when using in-memory
subrequests.
Currently in nginx background subrequests are created by the upstream module
and the mirror module. The issue does not manifest itself with these
subrequests because they are header-only. But it can manifest itself with
third-party modules which create in-memory background subrequests.
We used to have default error_page overwrite for 495, 496, and 497, so
a configuration like
error_page 495 /error;
will result in error 400, much like without any error_page configured.
The 494 status code was introduced later (in 3848:de59ad6bf557, nginx 0.9.4),
and relevant changes to ngx_http_core_error_page() were missed, resulting
in inconsistent behaviour of "error_page 494" - with error_page configured
it results in 494 being returned instead of 400.
Reported by Frank Liu,
http://mailman.nginx.org/pipermail/nginx/2020-February/058957.html.
Introduced ngx_quic_input() and ngx_quic_output() as interface between
nginx and protocol. They are the only functions that are exported.
While there, added copyrights.
In "co64" atom chunk start offset is a 64-bit unsigned integer. When trimming
the "mdat" atom, chunk offsets are casted to off_t values which are typically
64-bit signed integers. A specially crafted mp4 file with huge chunk offsets
may lead to off_t overflow and result in negative trim boundaries.
The consequences of the overflow are:
- Incorrect Content-Length header value in the response.
- Negative left boundary of the response file buffer holding the trimmed "mdat".
This leads to pread()/sendfile() errors followed by closing the client
connection.
On rare systems where off_t is a 32-bit integer, this scenario is also feasible
with the "stco" atom.
The fix is to add checks which make sure data chunks referenced by each track
are within the mp4 file boundaries. Additionally a few more checks are added to
ensure mp4 file consistency and log errors.
Duplicate "Host" headers were allowed in nginx 0.7.0 (revision b9de93d804ea)
as a workaround for some broken Motorola phones which used to generate
requests with two "Host" headers[1]. It is believed that this workaround
is no longer relevant.
[1] http://mailman.nginx.org/pipermail/nginx-ru/2008-May/017845.html
The "identity" transfer coding has been removed in RFC 7230. It is
believed that it is not used in real life, and at the same time it
provides a potential attack vector.
We anyway do not support more than one transfer encoding, so accepting
requests with multiple Transfer-Encoding headers doesn't make sense.
Further, we do not handle multiple headers, and ignore anything but
the first header.
Reported by Filippo Valsorda.
A connection could get stuck without timers if a client has partially sent
the HEADERS frame such that it was split on the individual header boundary.
In this case, it cannot be processed without the rest of the HEADERS frame.
The fix is to call ngx_http_v2_state_headers_save() in this case. Normally,
it would be called from the ngx_http_v2_state_header_block() handler on the
next iteration, when there is not enough data to continue processing. This
isn't the case if recv_buffer became empty and there's no more data to read.
With the recent change to prevent frames flood in d4448892a294,
nginx will finalize the connection with NGX_HTTP_V2_INTERNAL_ERROR
whenever flood is detected, causing nginx to abort or stop if
the debug_points directive is used in the nginx config.
Previous change 1ce3f01a4355 incorrectly introduced processing of the
ngx_posted_next_events queue at the end of operation, effectively making
posted next events a nop, since at the end of an event loop iteration
the queue is always empty. Correct approach is to move events to the
ngx_posted_events queue at an iteration start, as it was done previously.
Further, in some cases the c->read event might be already in the
ngx_posted_events queue, and calling ngx_post_event() with the
ngx_posted_next_events queue won't do anything. To make sure the event
will be correctly placed into the ngx_posted_next_events queue
we now check if it is already posted.
The available bytes handling in SSL introduced in 9d2ad2fb4423 relied
on the connection read handler being overwritten to set the ready flag
and the amount of available bytes. This approach, however, does not
work properly when the connection read handler is changed, for example,
when switching to the next pipelined request, and can result in unexpected
connection timeouts, see here:
http://mailman.nginx.org/pipermail/nginx-devel/2019-December/012825.html
Fix is to introduce ngx_event_process_posted_next() instead, which
will set ready and available regardless of how event handler is set.
When ngx_http_v2_close_stream_handler() is used to retry stream close
after queued frames are sent, client timeouts on the stream can be
logged multiple times and/or in addition to already happened errors.
To resolve this, separate ngx_http_v2_retry_close_stream_handler()
was introduced, which does not try to log timeouts.
If a stream is closed with queued frames, it is possible that no further
write events will occur on the stream, leading to the socket leak.
To fix this, the stream's fake connection read handler is set to
ngx_http_v2_close_stream_handler(), to make sure that finalizing the
connection with ngx_http_v2_finalize_connection() will be able to
close the stream regardless of the current number of queued frames.
Additionally, the stream's fake connection fc->error flag is explicitly
set, so ngx_http_v2_handle_stream() will post a write event when queued
frames are finally sent even if stream flow control window is exhausted.
These checks were missed when chunked support was introduced. And also
added an explicit error message to ngx_http_dav_copy_move_handler()
(it was missed for some reason, in contrast to DELETE and MKCOL handlers).
While empty replacements were caught at run-time, parsing code
of the "rewrite" directive expects that a minimum length of the
"replacement" argument is 1.
If a rewritten URI contains the null character, only a part of the URI was
copied to a memory buffer allocated for the path. In some setups this
could be exploited to expose uninitialized memory via the Location
header.
The "alias" directive cannot be used in the same location where URI
was rewritten. This has been detected in the "rewrite ... break"
case, but not when the standalone "break" directive was used.
This change also fixes proxy_pass with URI component in a similar
case:
location /aaa/ {
rewrite ^ /xxx/yyy;
break;
proxy_pass http://localhost:8080/bbb/;
}
Previously, the "/bbb/yyy" would be sent to a backend instead of
"/xxx/yyy". And if location's prefix was longer than the rewritten
URI, a segmentation fault might occur.
Previously, connections returned from keepalive cache had c->data
pointing to the keepalive cache item. While this shouldn't be a problem
for correct code, as c->data is not expected to be used before it is set,
explicitly clearing it might help to avoid confusion.
Previously only an rbtree was associated with a limit_conn. To make it
possible to associate more data with a limit_conn, shared context is introduced
similar to limit_req. Also, shared pool pointer is kept in a way similar to
limit_req.
Now a new structure, ngx_proxy_protocol_t, holds these fields. This allows
adding more PROXY protocol fields in the future without modifying the
connection structure.
With MinGW-w64, building 64-bit nginx binary with GCC 8 and above
results in warning due to cast of GetProcAddress() result to ngx_wsapoll_pt,
which GCC thinks is incorrect. Added intermediate cast to "void *" to
silence the warning.
FormatMessage() seems to return many errors which essentially indicate that
the language in question is not available. At least the following were
observed in the wild and during testing: ERROR_MUI_FILE_NOT_FOUND (15100)
(ticket #1868), ERROR_RESOURCE_TYPE_NOT_FOUND (1813). While documentation
says it should be ERROR_RESOURCE_LANG_NOT_FOUND (1815), this doesn't seem
to be the case.
As such, checking error code was removed, and as long as FormatMessage()
returns an error, we now always try the default language.
Added code to track number of bytes available in the socket.
This makes it possible to avoid looping for a long time while
working with fast enough peer when data are added to the socket buffer
faster than we are able to read and process data.
When kernel does not provide number of bytes available, it is
retrieved using ioctl(FIONREAD) as long as a buffer is filled by
SSL_read().
It is assumed that number of bytes returned by SSL_read() is close
to the number of bytes read from the socket, as we do not use
SSL compression. But even if it is not true for some reason, this
is not important, as we post an additional reading event anyway.
Note that data can be buffered at SSL layer, and it is not possible
to simply stop reading at some point and wait till the event will
be reported by the kernel again. This can be only done when there
are no data in SSL buffers, and there is no good way to find out if
it's the case.
Instead of trying to figure out if SSL buffers are empty, this patch
introduces events posted for the next event loop iteration - such
events will be processed only on the next event loop iteration,
after going into the kernel and retrieving additional events. This
seems to be a simple and reliable approach.
This makes it possible to avoid looping for a long time while working
with a fast enough peer when data are added to the socket buffer faster
than we are able to read and process them (ticket #1431). This is
basically what we already do on FreeBSD with kqueue, where information
about the number of bytes in the socket buffer is returned by
the kevent() call.
With other event methods rev->available is now set to -1 when the socket
is ready for reading. Later in ngx_recv() and ngx_recv_chain(), if
full buffer is received, real number of bytes in the socket buffer is
retrieved using ioctl(FIONREAD). Reading more than this number of bytes
ensures that even with edge-triggered event methods the event will be
triggered again, so it is safe to stop processing of the socket and
switch to other connections.
Using ioctl(FIONREAD) only after reading a full buffer is an optimization.
With this approach we only call ioctl(FIONREAD) when there are at least
two recv()/readv() calls.
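For illustration, a sketch of the FIONREAD query used after a full buffer is
received (not nginx's actual code):

    /* Hedged sketch: ask the kernel how many bytes are still pending
     * in the socket buffer. */

    #include <sys/ioctl.h>

    static long
    bytes_pending(int fd)
    {
        int  avail;

        if (ioctl(fd, FIONREAD, &avail) == -1) {
            return -1;
        }

        return avail;
    }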
As long as there are data to read in the socket, yet the amount of data
is less than total size of the buffers in the chain, this saves one
unneeded read() syscall. Before this change, reading only stopped if
ngx_ssl_recv() returned no data, that is, two read() syscalls in a row
returned EAGAIN.
In SSL connections, data can be buffered by the SSL layer, and it is
wrong to avoid doing c->recv_chain() if c->read->available is 0 and
c->read->pending_eof is set. And tests show that the optimization in
question indeed can result in incorrect detection of premature connection
close if upstream closes the connection without sending a close notify
alert at the same time. Fix is to disable c->read->available optimization
for SSL connections.
This could happen when graceful shutdown configured by worker_shutdown_timeout
times out and is then followed by another timeout such as proxy_read_timeout.
In this case, the HEADERS frame is added to the output queue, but attempt to
send it fails (due to c->error forcibly set during graceful shutdown timeout).
This triggers request finalization which attempts to close the stream. But the
stream cannot be closed because there is a frame in the output queue, and the
connection cannot be finalized. This leaves the connection open without any
timer events leading to alert.
The fix is to post write event when sending output queue fails on c->error.
That will finalize the connection.
With this patch, all traffic over an HTTP/2 connection is counted in
the h2c->total_bytes field, and payload traffic is counted in
the h2c->payload_bytes field. As long as total traffic is many times
larger than payload traffic, we consider this to be a flood.
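For illustration, a sketch of such a heuristic; the ratio and slack values
are assumptions, not necessarily the ones used by nginx:

    /* Hedged sketch: treat the connection as flooding when total
     * traffic dwarfs payload traffic; the fixed slack avoids flagging
     * short, mostly-control connections. */

    #include <stdint.h>

    static int
    is_flood(uint64_t total_bytes, uint64_t payload_bytes)
    {
        return total_bytes / 8 > payload_bytes + 1048576;
    }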
In 8df664ebe037, we've switched to maximizing stream window instead
of sending RST_STREAM. Since then handling of RST_STREAM with NO_ERROR
was fixed at least in Chrome, hence we switch back to using RST_STREAM.
This allows more effective rejection of large bodies, and also minimizes
the non-payload traffic to be accounted in the next patch.
Previously, if a response to the PTR request was cached, and ngx_resolver_dup()
failed to allocate memory for the resulting name, then the original node was
freed but left in expire_queue. A subsequent address resolving would end up
in a use-after-free memory access of the node either in ngx_resolver_expire()
or ngx_resolver_process_ptr(), when accessing it through expire_queue.
The fix is to leave the resolver node intact.
Don't waste server resources by sending RST_STREAM frames. Instead,
reject WINDOW_UPDATE frames with invalid zero increment by closing
connection with PROTOCOL_ERROR.
Don't waste server resources by sending RST_STREAM frames. Instead,
reject HEADERS and PRIORITY frames with self-dependency by closing
connection with PROTOCOL_ERROR.
When ngx_http_discard_request_body() call was added to ngx_http_send_response(),
there were no return codes other than NGX_OK and NGX_HTTP_INTERNAL_SERVER_ERROR.
Now it can also return NGX_HTTP_BAD_REQUEST, but ngx_http_send_response() still
incorrectly transforms it to NGX_HTTP_INTERNAL_SERVER_ERROR.
The fix is to propagate ngx_http_discard_request_body() errors.
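A sketch of the corrected call site in ngx_http_send_response()
(simplified):

    rc = ngx_http_discard_request_body(r);

    if (rc != NGX_OK) {
        /* previously transformed to NGX_HTTP_INTERNAL_SERVER_ERROR;
         * now NGX_HTTP_BAD_REQUEST is passed through as well */
        return rc;
    }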
As defined in HTTP/1.1, body chunks have the following ABNF:
chunk = chunk-size [ chunk-ext ] CRLF chunk-data CRLF
where chunk-data is a sequence of chunk-size octets.
With this change, chunk-data that doesn't end with CRLF at the chunk-size
offset will be treated as invalid, as in the example below, where the
first chunk declares a size of 4 but CRLF does not follow the 4th octet:

    4
    SEE-THIS-AND-
    4
    THAT
    0
Previously, if unbuffered request body reading wasn't finished before
the request was redirected to a different location using error_page
or X-Accel-Redirect, and the request body was read again, this could
lead to disastrous effects, such as a duplicate post_handler call or
an "http request count is zero" alert followed by a segmentation fault.
This happened in the following configuration (ticket #1819):

    location / {
        proxy_request_buffering off;
        proxy_pass http://bad;
        proxy_intercept_errors on;
        error_page 502 = /error;
    }

    location /error {
        proxy_pass http://backend;
    }
Fixed excessive memory growth and CPU usage if stream windows are
manipulated in a way that results in generating many small DATA frames.
Fix is to limit the number of simultaneously allocated DATA frames.
Fixed uncontrolled memory growth if a peer sends a stream of
headers with a 0-length header name and 0-length header value.
Fix is to reject headers with zero name length.
When using SMTP with SSL and resolver, read events might be enabled
during address resolving, leading to duplicate ngx_mail_ssl_handshake_handler()
calls if something arrives from the client, and duplicate session
initialization - including starting another resolving. This can lead
to a segmentation fault if the session is closed after the first
resolving finishes. Fix is to block read events while resolving.
Reported by Robert Norris,
http://mailman.nginx.org/pipermail/nginx/2019-July/058204.html.
After ac5a741d39cf it is now possible that after zstream.avail_out
reaches 0 and we allocate additional buffer, there will be no more data
to put into this buffer, triggering "zero size buf" alert. Fix is to
reset b->temporary flag in this case.
Additionally, an optimization was added to avoid allocating an additional
buffer in this case, by checking whether the last deflate() call returned
Z_STREAM_END. Note that checking for Z_STREAM_END by itself is not enough
to fix the alerts, as deflate() can return Z_STREAM_END without producing
any output if the buffer is smaller than the gzip trailer.
Reported by Witold Filipczyk,
http://mailman.nginx.org/pipermail/nginx-devel/2019-July/012469.html.
Due to shortcomings of the ccv->zero flag implementation in the complex
value interface, the length of the string returned by
ngx_http_complex_value() may or may not include the terminating null
character, so the only safe way to work with the result is to use it as
a null-terminated string.
Reported by Patrick Wollgast.
When "-" follows a parameter of maximum length, a single byte buffer
overflow happens, since the error branch does not check parameter length.
Fix is to avoid saving "-" to the parameter key, and instead use an error
message with "-" explicitly written. The message is mostly identical to
one used in similar cases in the preequal state.
Reported by Patrick Wollgast.
With level-triggered event methods it is important to specify
the NGX_CLOSE_EVENT flag to ngx_handle_read_event(), otherwise
the event won't be removed, resulting in CPU hog.
Reported by Patrick Wollgast.
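For illustration, the call looks roughly like this (a sketch; error
handling at the real call site may differ):

    /* with level-triggered methods, NGX_CLOSE_EVENT tells the event
     * module to delete the read event instead of leaving it armed */

    if (ngx_handle_read_event(rev, NGX_CLOSE_EVENT) != NGX_OK) {
        return;
    }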
To save memory, the hash code uses u_short to store resulting bucket
sizes, so the maximum bucket size is limited to 65536 minus
ngx_cacheline_size (larger values will be aligned to 65536, which will
overflow u_short). However, there were no checks to enforce this, and
using larger bucket sizes resulted in overflows and segmentation faults.
Appropriate safety checks were added to ngx_hash_init().
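A hedged sketch of the kind of check added (the exact error message in
ngx_hash_init() may differ):

    if (hinit->bucket_size > 65536 - ngx_cacheline_size) {
        ngx_log_error(NGX_LOG_EMERG, hinit->pool->log, 0,
                      "could not build %s, too large "
                      "%s_bucket_size: %i",
                      hinit->name, hinit->name, hinit->bucket_size);
        return NGX_ERROR;
    }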
When nginx is used with zlib patched with [1], which provides
integration with the future IBM Z hardware deflate acceleration, it ends
up computing CRC32 twice: once in hardware, which always computes it,
and once in software, by explicitly calling crc32().
crc32() calls were added in changesets 133:b27548f540ad ("nginx-0.0.1-
2003-09-24-23:51:12 import") and 134:d57c6835225c ("nginx-0.0.1-
2003-09-26-09:45:21 import") as part of gzip wrapping feature - back
then zlib did not support it.
However, since then gzip wrapping was implemented in zlib v1.2.0.4,
and it's already being used by nginx for log compression.
This patch replaces hand-written gzip wrapping with the one provided by
zlib. It simplifies the code, and makes it avoid computing CRC32 twice
when using hardware acceleration.
[1] https://github.com/madler/zlib/pull/410
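The zlib mechanism in question is the windowBits convention of
deflateInit2(): adding 16 to windowBits requests a gzip header and
trailer (including the CRC32) from zlib itself. A minimal sketch:

    z_stream  zstream;
    int       rc;

    ngx_memzero(&zstream, sizeof(z_stream));

    /* MAX_WBITS + 16: produce a gzip stream (header, trailer, CRC32)
     * instead of a raw deflate stream; available since zlib 1.2.0.4 */

    rc = deflateInit2(&zstream, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
                      MAX_WBITS + 16, 8, Z_DEFAULT_STRATEGY);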
Similarly to the change in 5491:74bfa803a5aa (1.5.9), we should accept
properly escaped URIs and unescape them as needed, else it is not possible
to handle URIs with question marks.
As we now have the ctx->header_sent flag, it is further used to prevent
duplicate $r->send_http_header() calls, to prevent output before the
header is sent, and to prevent $r->internal_redirect() after the header
is sent. Further, $r->send_http_header() is now protected from being
called after $r->internal_redirect().
Returning NGX_HTTP_INTERNAL_SERVER_ERROR if perl code dies after
sending the header would lead to a "header already sent" alert. To
avoid this, we now check whether the header was already sent, and
return NGX_ERROR instead if it was.
Previously, redirects scheduled with $r->internal_redirect() were followed
even if the code then died. Now these are ignored and nginx will return
an error instead.
Variable handlers are not expected to send anything to the client, cannot
sleep or read the request body, and are not expected to modify the request.
Appropriate protection was added to prevent accidental foot shooting.
Duplicate $r->sleep() and/or $r->has_request_body() calls result
in undefined behaviour (in practice, connection leaks were observed).
To prevent this, croak() calls were added in appropriate places.
Previously, allocation errors in nginx.xs were more or less ignored,
potentially resulting in incorrect code execution in specific low-memory
conditions. This is changed to use ctx->error bit and croak(), similarly
to how output errors are now handled.
Note that this is mostly a cosmetic change, as Perl itself exits on memory
allocation errors, and hence nginx with Perl is hardly usable in low-memory
conditions.
When an error happens, the ctx->error bit is now set, and croak()
is called to terminate further processing. The ctx->error bit is
checked in ngx_http_perl_call_handler() to cancel further processing,
and is also checked in various output functions - to make sure these won't
be called if croak() was handled by an eval{} in perl code.
In particular, this ensures that output chain won't be called after
errors, as filters might not expect this to happen. This fixes some
segmentation faults under low memory conditions. Also this stops
request processing after filter finalization or request body reading
errors.
For cases where an HTTP error status can be additionally returned (for
example, 416 (Requested Range Not Satisfiable) from the range filter),
the ctx->status field is also added.
This ensures that the correct ctx is always available, including after
filter finalization. In particular, this fixes a segmentation fault
with the following configuration:
    location / {
        image_filter test;
        perl 'sub {
            my $r = shift;
            $r->send_http_header();
            $r->print("foo\n");
            $r->print("bar\n");
        }';
    }
This also seems to be the only way to correctly handle filter finalization
in various complex cases, for example, when embedded perl is used both
in the original handler and in an error page called after filter
finalization.
The NGX_DONE test in ngx_http_perl_handle_request() was introduced
in 1702:86bb52e28ce0, which also modified ngx_http_perl_call_handler()
to return NGX_DONE with c->destroyed. The latter part was then
removed in 3050:f54b02dbb12b, so the NGX_DONE test is no longer needed.
Embedded perl does not set any request fields needed for conditional
requests processing. Further, filter finalization in the not_modified
filter can cause segmentation faults due to cleared ctx as in
ticket #1786.
Before 5fb1e57c758a (1.7.3) the not_modified filter was implicitly disabled
for perl responses, as r->headers_out.last_modified_time was -1. This
change restores this behaviour by using the explicit r->disable_not_modified
flag.
Note that this patch doesn't try to address perl module robustness against
filter finalization and other errors returned from filter chains. It should
be eventually reworked to handle errors instead of ignoring them.
A new directive limit_req_dry_run allows enabling the dry run mode. In this
mode requests are neither rejected nor delayed, but reject/delay status is
logged as usual.
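For example (the zone name, size and rate are illustrative):

    limit_req_zone $binary_remote_addr zone=one:10m rate=10r/s;

    server {
        location / {
            limit_req zone=one burst=5;
            limit_req_dry_run on;
        }
    }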
In case of filter finalization, essential request fields like r->uri,
r->args etc could be changed, which affected the cache update subrequest.
Also, after filter finalization r->cache could be set to NULL, leading to
null pointer dereference in ngx_http_upstream_cache_background_update().
The fix is to create background cache update subrequest before sending the
cached response.
Since initial introduction in 1aeaae6e9446 (1.11.10) background cache update
subrequest was created after sending the cached response because otherwise it
blocked the parent request output. In 9552758a786e (1.13.1) background
subrequests were introduced to eliminate the delay before sending the final
part of the cached response. This also made it possible to create the
background cache update subrequest before sending the response.
Note that creating the subrequest earlier does not change the fact that in case
of filter finalization the background cache update subrequest will likely not
have enough time to successfully update the cache entry. Filter finalization
leads to the main request termination as soon as the current iteration of
request processing is complete.
Previously, a variant not present in shared memory and stored on disk using a
secondary key was read using c->body_start from a variant stored with a main
key. This could result in critical errors "cache file .. has too long header".
Previously the stale-if-error extension of the Cache-Control upstream header
triggered the return of a stale response for all error conditions that can be
specified in the proxy_cache_use_stale directive. The list of these errors
includes both network/timeout/format errors, as well as some HTTP codes like
503, 504, 403, 429 etc. The latter prevented a cache entry from being updated
by a response with any of these HTTP codes during the stale-if-error period.
Now stale-if-error only works for network/timeout/format errors and ignores
the upstream HTTP code. The return of a stale response for certain HTTP codes
is still possible using the proxy_cache_use_stale directive.
This change also applies to the stale-while-revalidate extension of the
Cache-Control header, which triggers stale-if-error if it is missing.
Reported at
http://mailman.nginx.org/pipermail/nginx/2020-July/059723.html.
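If stale responses for specific upstream HTTP codes are still desired,
this remains expressible explicitly, for example:

    proxy_cache_use_stale error timeout updating
                          http_500 http_502 http_503 http_504;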
In ngx_http_range_singlepart_body() special buffers were passed
unmodified, including ones after the end of the range. As such,
if the last buffer of a response was sent separately as a special
buffer, two buffers with b->last_buf set were present in the response.
In particular, this might result in a duplicate final chunk when using
chunked transfer encoding (normally range filter and chunked transfer
encoding are not used together, but this may happen if there are trailers
in the response). This is also likely to cause problems in HTTP/2.
Fix is to skip all special buffers after we've sent the last part of
the range requested. These special buffers are not meaningful anyway,
since we set b->last_buf in the buffer with the last part of the range,
and everything is expected to be flushed due to it.
Additionally, ngx_http_next_body_filter() is now called even
if no buffers are to be passed to it. This ensures that various
write events are properly propagated through the filter chain. In
particular, this fixes test failures observed with the above change
and aio enabled.
Filters are not allowed to change incoming chain links, and should allocate
their own links if any modifications are needed. Nevertheless
ngx_http_range_singlepart_body() modified incoming chain links in some
cases, notably at the end of the requested range.
No problems caused by this are currently known, mostly because of the
limited number of possible modifications and the position of the range
body filter in the filter chain. Still, this behaviour is clearly
incorrect, and tests demonstrate that it can at least cause some proxy
buffers to be lost when using proxy_force_ranges, leading to less
effective handling of responses.
Fix is to always allocate new chain links in ngx_http_range_singlepart_body().
Links are explicitly freed to ensure constant memory usage with long-lived
requests.
If a complex value is expected to be of type size_t, and the compiled
value is constant, the constant size_t value is remembered at compile
time.
The value is accessed through ngx_http_complex_value_size() which
either returns the remembered constant or evaluates the expression
and parses it as size_t.
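Usage is expected to look roughly like this (a hedged sketch;
clcf->limit_rate stands for any complex value compiled as size_t):

    size_t  limit;

    /* returns the remembered constant if the value was constant at
     * compile time; otherwise evaluates and parses it as size_t */

    limit = ngx_http_complex_value_size(r, clcf->limit_rate, 0);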
Previously, ngx_utf8_decode() was called from ngx_utf8_length() with
an incorrect length, potentially resulting in an out-of-bounds read when
handling invalid UTF-8 strings.
In practice, out-of-bounds reads are not possible though, as autoindex,
the only user of ngx_utf8_length(), provides null-terminated strings, and
ngx_utf8_decode() returns an error anyway when it sees a null in the
middle of a UTF-8 sequence.
Reported by Yunbin Liu.
If OCSP stapling was enabled with dynamic certificate loading, with some
OpenSSL versions (1.0.2o and older, 1.1.0h and older; fixed in 1.0.2p,
1.1.0i, 1.1.1) a segmentation fault might happen.
The reason is that during an abbreviated handshake the certificate
callback is not called, but the certificate status callback is called
(https://github.com/openssl/openssl/issues/1662), leading to NULL being
returned from SSL_get_certificate().
Fix is to explicitly check SSL_get_certificate() result.
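A sketch of the fix in the certificate status callback (simplified;
the OCSP-specific handling is omitted):

    X509  *cert;

    cert = SSL_get_certificate(ssl_conn);

    if (cert == NULL) {
        /* abbreviated handshake with dynamic certificate loading:
         * no certificate was selected, nothing to staple */
        return SSL_TLSEXT_ERR_NOACK;
    }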
If X509_get_issuer_name() or X509_get_subject_name() returned NULL,
this could lead to a certificate reference leak. It cannot happen
in practice though, since each function returns an internal pointer
to a mandatory subfield of the certificate successfully decoded by
d2i_X509() during certificate message processing (closes #1751).
Previously the ngx_inet_resolve_host() function sorted addresses in a way that
IPv4 addresses came before IPv6 addresses. This was implemented in eaf95350d75c
(1.3.10) along with the introduction of getaddrinfo() which could resolve host
names to IPv6 addresses. Since the "listen" directive only used the first
address, sorting made it possible to preserve "listen" compatibility with
the previous behavior and with the behavior of nginx built without IPv6
support. Now "listen" uses all resolved addresses, which makes sorting
pointless.
Previously, only one address was used by the listen directive handler,
even if the host name resolved to multiple addresses. Now a separate
listening socket is
created for each address.
This makes it possible to provide certificates directly via variables
in ssl_certificate / ssl_certificate_key directives, without using
intermediate files.
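For example (the paths are illustrative):

    ssl_certificate     /etc/nginx/certs/$ssl_server_name.crt;
    ssl_certificate_key /etc/nginx/certs/$ssl_server_name.key;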
It was accidentally introduced in 77436d9951a1 (1.15.9). In MSVC 2015
and more recent MSVC versions it triggers warning C4456 (declaration of
'pkey' hides previous local declaration). Previously, all such warnings
were resolved in 2a621245f4cf.
Reported by Steve Stevenson.
The server name callback is always called by OpenSSL, even if the
server_name extension is not present in the ClientHello. As such,
checking c->ssl->handshaked before checking the SSL_get_servername()
result helps to more effectively prevent renegotiation in
OpenSSL 1.1.0 - 1.1.0g, where neither SSL3_FLAGS_NO_RENEGOTIATE_CIPHERS
nor SSL_OP_NO_RENEGOTIATION is available.
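A hedged sketch of the check in the server name callback:

    if (c->ssl->handshaked) {
        /* the handshake is already complete, so this is a
         * renegotiation attempt; reject it regardless of the
         * SSL_get_servername() result */
        *ad = SSL_AD_NO_RENEGOTIATION;
        return SSL_TLSEXT_ERR_ALERT_FATAL;
    }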