Without explicit handling, a zero timer was actually added, leading to
multiple unneeded syscalls. Further, sending GOAWAY frame early might
be beneficial for clients.
Reported by Sergey Kandaurov.
Unlike in 75e908236701, which added the logic to ngx_http_finalize_request(),
this change moves it to a more generic routine ngx_http_finalize_connection()
to cover cases when a request is finalized with NGX_DONE.
In particular, this fixes unwanted connection transition into the keepalive
state after receiving EOF while discarding request body. With edge-triggered
event methods that means the connection will last for extra seconds as set in
the keepalive_timeout directive.
The response size check introduced in 39501ce97e29 did not take into
account possible padding on DATA frames, resulting in incorrect
"upstream sent response body larger than indicated content length" errors
if upstream server used padding in responses with known length.
Fix is to check the actual size of response buffers produced by the code,
similarly to how it is done in other protocols, instead of checking
the size of DATA frames.
Reported at:
http://mailman.nginx.org/pipermail/nginx-devel/2021-March/013907.html
Currently listener contains rbtree with multiple nodes for single QUIC
connection: each corresponding to specific server id. Each udp node points
to same ngx_connection_t, which points to QUIC connection via c->udp field.
Thus when an event handler is called, it only gets ngx_connection_t with
c->udp pointing to QUIC connection. This makes it hard to obtain actual
node which was used to dispatch packet (it requires to repeat DCID lookup).
Additionally, ngx_quic_connection_t->udp field is only needed to keep a
pointer in c->udp. The node is not added into the tree and does not carry
useful information.
Sometimes it is required to process datagram properties at higher level (i.e.
QUIC is interested in source address which may change and IP options). The
patch adds ngx_udp_dgram_t structure used to pass packet-related information
in c->udp.
Previously, when a new datagram arrived, data were copied from the UDP layer
to the QUIC layer via c->recv() interface. Now UDP buffer is accessed
directly.
OpenSSL 3.0 started to require HKDF-Extract output PRK length pointer
used to represent the amount of data written to contain the length of
the key buffer before the call. EVP_PKEY_derive() documents this.
See HKDF_Extract() internal implementation update in this change:
https://github.com/openssl/openssl/commit/5a285ad
The function ngx_quic_shutdown_connection() waits until all non-cancelable
streams are closed, and then closes the connection. In HTTP/3 cancelable
streams are all unidirectional streams except push streams.
The function is called from HTTP/3 when client reaches keepalive_requests.
In case of long header packets, dcid length was not read correctly.
While there, macros to parse uint64 was fixed as well as format specifiers
to print it in debug mode.
Thanks to Gao Yan <gaoyan09@baidu.com>.
Activated with the "proxy_protocol" directive. Can be combined with
"listen ... proxy_protocol;" and "set_real_ip_from ...;" to pass
client address provided to nginx in the PROXY protocol header.
Activated with the "proxy_protocol" parameter of the "listen" directive.
Obtained information is passed to the auth_http script in Proxy-Protocol-Addr,
Proxy-Protocol-Port, Proxy-Protocol-Server-Addr, and Proxy-Protocol-Server-Port
headers.
Similarly to 40e8ce405859 in the stream module, this reduces the time
accept mutex is held. This also simplifies following changes to
introduce PROXY protocol support.
If we need to be notified about further events, ngx_handle_read_event()
needs to be called after a read event is processed. Without this,
an event can be removed from the kernel and won't be reported again,
notably when using oneshot event methods, such as eventport on Solaris.
For consistency, existing ngx_handle_read_event() call removed from
ngx_mail_read_command(), as this call only covers one of the code paths
where ngx_mail_read_command() returns NGX_AGAIN. Instead, appropriate
processing added to the callers, covering all code paths where NGX_AGAIN
is returned.
As long as a read event is blocked (ignored), ngx_handle_read_event()
needs to be called to make sure no further notifications will be
triggered when using level-triggered event methods, such as select() or
poll().
The "!rev->ready" test seems to be a typo, introduced in the original
commit (719:f30b1a75fd3b). The ngx_handle_write_event() code properly
tests for "rev->ready" instead.
Due to this typo, read events might be unexpectedly removed during
proxying after an event on the other part of the proxied connection.
Catched by mail proxying tests.
The strerrordesc_np() function, introduced in glibc 2.32, provides an
async-signal-safe way to obtain error messages. This makes it possible
to avoid copying error messages.
Previously, systems without sys_nerr (or _sys_nerr) were handled with an
assumption that errors start at 0 and continuous. This is, however, not
something POSIX requires, and not true on some platforms.
Notably, on Linux, where sys_nerr is no longer available for newly linked
binaries starting with glibc 2.32, there are gaps in error list, which
used to stop us from properly detecting maximum errno. Further, on
GNU/Hurd errors start at 0x40000001.
With this change, maximum errno detection is moved to the runtime code,
now able to ignore gaps, and also detects the first error if needed.
This fixes observed "Unknown error" messages as seen on Linux with
glibc 2.32 and on GNU/Hurd.
With this change, behaviour of HTTP/2 becomes even closer to HTTP/1.x,
and client_header_timeout instead of keepalive_timeout is used before
the first request is received.
This fixes HTTP/2 connections being closed even before the first request
if "keepalive_timeout 0;" was used in the configuration; the problem
appeared in f790816a0e87 (1.19.7).
As per quic-transport-34:
An endpoint also restarts its idle timer when sending an ack-eliciting
packet if no other ack-eliciting packets have been sent since last receiving
and processing a packet.
Previously, the timer was set for any packet.
The structure is used to parse an HTTP/3 request. An object of this type is
added to ngx_http_request_t instead of h3_parse generic pointer.
Also, the new field is located outside of the request ephemeral zone to keep it
safe after request headers are parsed.
Previously, PINGs and other frames extended possible keepalive time,
making it possible to keep an open HTTP/2 connection for a long time.
Now the connection is always closed as long as keepalive_timeout expires,
similarly to how it happens in HTTP/1.x.
Note that as a part of this change, incomplete frames are no longer
trigger a separate timeout, so http2_recv_timeout (replaced by
client_header_timeout in previous patches) is essentially cancelled.
The client_header_timeout is, however, used for SSL handshake and
while reading HEADERS frames.
Instead, keepalive_timeout and keepalive_requests are now used. This
is expected to simplify HTTP/2 code and usage. This also matches
directives used by upstream module for all protocols.
In case of default settings, this effectively changes maximum number
of requests per connection from 1000 to 100. This looks acceptable,
especially given that HTTP/2 code now properly supports lingering close.
Further, this changes default keepalive timeout in HTTP/2 from 300 seconds
to 75 seconds. This also looks acceptable, and larger than PING interval
used by Firefox (network.http.spdy.ping-threshold defaults to 58s),
the only browser to use PINGs.
Instead, the client_header_timeout is now used for HTTP/2 reading.
Further, the timeout is changed to be set once till no further data
left to read, similarly to how client_header_timeout is used in other
places.
New connections are marked reusable by ngx_http_init_connection() if there
are no data available for reading. As a result, if SSL is not used,
ngx_http_v2_init() might be called when the connection is marked reusable.
If a HEADERS frame is immediately available for reading, this resulted
in connection being preserved in reusable state with an active request,
and possibly closed later as if during worker shutdown (that is, after
all active requests were finalized).
Fix is to explicitly mark connections non-reusable in ngx_http_v2_init()
instead of (incorrectly) assuming they are already non-reusable.
Found by Sergey Kandaurov.
If ngx_drain_connections() fails to immediately reuse any connections
and there are no free connections, it now additionally tries to reuse
a connection again. This helps to provide at least one free connection
in case of HTTP/2 with lingering close, where merely trying to reuse
a connection once does not free it, but makes it reusable again,
waiting for lingering close.
This is particularly important in HTTP/2, where keepalive connections
are closed with lingering. Before the patch, reusing a keepalive HTTP/2
connection resulted in the connection waiting for lingering close to
remain in the reusable connections queue, preventing ngx_drain_connections()
from closing additional connections.
The patch fixes it by marking the connection reusable again, and so
moving it in the reusable connections queue. Further, it makes actually
possible to reuse such connections if needed.
The "max_ack_delay", "ack_delay_exponent", and "max_udp_payload_size"
transport parameters were not communicated to client.
The "disable_active_migration" and "active_connection_id_limit"
parameters were not saved into zero-rtt context.
18.1. Reserved Transport Parameters
Transport parameters with an identifier of the form "31 * N + 27" for
integer values of N are reserved to exercise the requirement that
unknown transport parameters be ignored. These transport parameters
have no semantics, and can carry arbitrary values.
Setting the timer is brought into compliance with quic-recovery-34. Now it's
set from a single function ngx_quic_set_lost_timer() that takes into account
both loss detection and PTO. The following issues are fixed with this change:
- when in loss detection mode, discarding a context could turn off the
timer forever after switching to the PTO mode
- when in loss detection mode, sending a packet resulted in rescheduling the
timer as if it's always in the PTO mode
As per quic-transport-33:
An endpoint MUST acknowledge all ack-eliciting Initial and Handshake
packets immediately
If a packet carrying Initial or Handshake ACK was lost, a non-immediate ACK
should not be sent later. Instead, client is expected to send a new packet
to acknowledge.
Sending non-immediate ACKs for Initial packets can cause the client to
generate an inflated RTT sample.
The token generation in QUIC is reworked. Single host key is used to generate
all required keys of needed sizes using HKDF.
The "quic_stateless_reset_token_key" directive is removed. Instead, the
"quic_host_key" directive is used, which reads key from file, or sets it
to random bytes if not specified.
The flag was introduced to create type-aware CONNECTION_CLOSE frames,
and now is replaced with frame type information, directly accessible.
Notably, this fixes type logging for received frames in b3d9e57d0f62.
The flag is used in ngx_http_finalize_connection() to switch client connection
to the keepalive mode. Since eaea7dac3292 this code is not executed for HTTP/3
which allows us to revert the change and get back to the default branch code.
This part somehow slipped away from c5840ca2063d.
While it is not expected to be needed in case of lingering close,
it is good to keep it for correctness (see 2b5528023f6b).
The change reduces diff to the default branch for
src/http/ngx_http_request_body.c.
Also, client Content-Length, if present, is now checked against the real body
size sent by client.
- split ngx_quic_process_packet() in two functions with the second one called
ngx_quic_process_payload() in charge of decrypring and handling the payload
- renamed ngx_quic_payload_handler() to ngx_quic_handle_frames()
- moved error cleanup from ngx_quic_input() to ngx_quic_process_payload()
- moved handling closed connection from ngx_quic_handle_frames() to
ngx_quic_process_payload()
- minor fixes
Previously, quic connection object was created when Retry packet was sent.
This is neither necessary nor convenient, and contradicts the idea of retry:
protecting from bad clients and saving server resources.
Now, the connection is not created, token is verified cryptographically
instead of holding it in connection.
This function should be called at the end of an event handler to prepare the
event for the next handler call. Particularly, the "active" flag is set or
cleared depending on data availability.
With this call missing in one code path, read handler was not called again
after handling the initial part of the client request, if the request was too
big to fit into a single STREAM frame.
Now ngx_handle_read_event() is called in this code path. Also, read timer is
restarted.
Keeping post_accept_timeout in ngx_listening_t is no longer needed since
we've switched to 1 second timeout for deferred accept in 5541:fdb67cfc957d.
Further, using it in HTTP code can result in client_header_timeout being
used from an incorrect server block, notably if address-specific virtual
servers are used along with a wildcard listening socket, or if we've switched
to a different server block based on SNI in SSL handshake.
The stub status module and ngx_http_send_response() (used by the empty gif
module and the "return" directive) incorrectly assumed that responding
to HEAD requests always results in r->header_only being set. This is not
true, and results in incorrect behaviour, for example, in the following
configuration:
location / {
image_filter size;
return 200 test;
}
Fix is to remove this incorrect micro-optimization from both stub status
module and ngx_http_send_response().
Reported by Chris Newton.
After 7675:9afa45068b8f and 7678:bffcc5af1d72 (1.19.1), during non-buffered
simple proxying, responses with extra data might result in zero size buffers
being generated and "zero size buf" alerts in writer. This bug is similar
to the one with FastCGI proxying fixed in 7689:da8d758aabeb.
In non-buffered mode, normally the filter function is not called if
u->length is already 0, since u->length is checked after each call of
the filter function. There is a case when this can happen though: if
the response length is 0, and there are pre-read response body data left
after reading response headers. As such, a check for u->length is needed
at the start of non-buffered filter functions, similar to the one
for p->length present in buffered filter functions.
Appropriate checks added to the existing non-buffered copy filters
in the upstream (used by scgi and uwsgi proxying) and proxy modules.
A header with the name containing null, CR, LF, colon or uppercase characters,
is now considered an error. A header with the value containing null, CR or LF,
is also considered an error.
Also, header is considered invalid unless its name only contains lowercase
characters, digits, minus and optionally underscore. Such header can be
optionally ignored.
- :method, :path and :scheme are expected exactly once and not empty
- :method and :scheme character validation is added
- :authority cannot appear more than once
Notably, the version negotiation table is updated to reject draft-33/QUICv1
(which requires a new TLS codepoint) unless explicitly asked to built with.
The quic kernel bpf helper inspects packet payload for DCID, extracts key
and routes the packet into socket matching the key.
Due to reuseport feature, each worker owns a personal socket, which is
identified by the same key, used to create DCID.
BPF objects are locked in RAM and are subject to RLIMIT_MEMLOCK.
The "ulimit -l" command may be used to setup proper limits, if maps
cannot be created with EPERM or updated with ETOOLONG.
With introduction of open_file_cache in 1454:f497ed7682a7, opening a file
with ngx_open_cached_file() automatically adds a cleanup handler to close
the file. As such, calling ngx_close_file() directly for non-regular files
is no longer needed and will result in duplicate close() call.
In 1454:f497ed7682a7 ngx_close_file() call for non-regular files was removed
in the static module, but wasn't in the flv module. And the resulting
incorrect code was later copied to the mp4 module. Fix is to remove the
ngx_close_file() call from both modules.
Reported by Chris Newton.
The ngx_http_parse_complex_uri() function cannot make URI longer and does
not null-terminate URI, so there is no need to allocate an extra byte. This
allocation appears to be a leftover from changes in 461:a88a3e4e158f (0.1.5),
where null-termination of r->uri and many other strings was removed.
When the request line contains request-target in the absolute-URI form,
it can contain path-empty instead of a single slash (see RFC 7230, RFC 3986).
Previously, the ngx_http_parse_request_line() function only accepted empty
path when there was no query string.
With this change, non-empty query is also correctly handled. That is,
request line "GET http://example.com?foo HTTP/1.1" is accepted and results
in $uri "/" and $args "foo".
Note that $request_uri remains "?foo", similarly to how spaces in URIs
are handled. Providing "/?foo", similarly to how "/" is provided for
"GET http://example.com HTTP/1.1", requires allocation.
Previously, when processing client ACK, rtt could be calculated for a packet
different than the largest if it was missing in the sent chain. Even though
this is an unlikely situation, rtt based on a different packet could be larger
than needed leading to bigger pto timeout and performance degradation.
Previously, this only worked for Application level because before
quic-transport-30, there were the following constraints:
Because the receiver doesn't use the ACK Delay for Initial and Handshake
packets, a sender SHOULD send a value of 0.
When adjusting an RTT sample using peer-reported acknowledgement delays, an
endpoint ... MUST ignore the ACK Delay field of the ACK frame for packets
sent in the Initial and Handshake packet number space.
Now initial output packet is not padded anymore if followed by a handshake
packet. If the datagram is still not big enough to satisfy minimum size
requirements, handshake packet is padded.
The patch replaces c->send() occurences with c->send_chain(), because the
latter accounts for the local address, which may be different if the wildcard
listener is used.
Previously, server sent response to client using address different from
one client connected to.
Notably, this fixes an issue with Chrome that can emit a "certificate_unknown"
alert during the SSL handshake where c->ssl->no_wait_shutdown is not yet set.
The filter is responsible for creating HTTP/3 response header and body.
The change removes differences to the default branch for
ngx_http_chunked_filter_module and ngx_http_header_filter_module.
Instead, appropriate format specifier for hexadecimal is used
in ngx_log_debug().
The STREAM frame "data" debug is moved into ngx_quic_log_frame(), similar
to all other frame fields debug.
Previously, the number of next_upstream tries included servers marked
as "down", resulting in "no live upstreams" with the code 502 instead
of the code derived from an attempt to connect to the last tried "up"
server (ticket #2096).
Similarly to the problem fixed in 2096b21fcd10 (ticket #1792),
when a "trailer only" gRPC response (that is, a response with the
END_STREAM flag in the HEADERS frame) was immediately followed by
RST_STREAM(NO_ERROR) in the data preread along with the response
header, RST_STREAM wasn't properly skipped and caused "upstream
rejected request with error 0" errors.
Observed with "unknown service" gRPC errors returned by grpc-go.
Fix is to set ctx->done if we are going to parse additional data,
so the RST_STREAM(NO_ERROR) is properly skipped. Additionally, now
ngx_http_grpc_filter() will complain about frames sent for closed
stream if there are any.
When installing or running from a non-root user it is sometimes required to
override default, compiled in error log path. There was no way to do this
without rebuilding the binary (ticket #147).
This patch introduced "-e" command line option which allows one to override
compiled in error log path.
Header value returned from the HTTP parser is expected to be null-terminated or
have a spare byte after the value bytes. When an empty header value was passed
by client in a literal header representation, neither was true. This could
result in segfault. The fix is to assign a literal empty null-terminated
string in this case.
Thanks to Andrey Kolyshkin.
The largely duplicate type-specific functions ngx_quic_parse_initial_header(),
ngx_quic_parse_handshake_header(), and a missing one for 0-RTT, were merged.
The new order of functions listed in ngx_event_quic_transport.c reflects this.
|_ ngx_quic_parse_long_header - version-invariant long header fields
\_ ngx_quic_supported_version - a helper to decide we can go further
\_ ngx_quic_parse_long_header_v1 - QUICv1-specific long header fields
0-RTT packets previously appeared as Handshake are now logged as appropriate:
*1 quic packet rx long flags:db version:ff00001d
*1 quic packet rx early len:870
Logging SCID/DCID is no longer duplicated as were seen with Initial packets.
If trailers were missing and a chain carrying the last_buf flag had no data
in it, then last HTTP/1 chunk was broken. The problem was introduced while
implementing HTTP/3 response body generation.
The change fixes the issue and reduces diff to the mainline nginx.
Previously, if quic_stateless_reset_token_key was empty or unspecified,
initial stateless reset token was not generated. However subsequent tokens
were generated with empty key, which resulted in error with certain SSL
libraries, for example OpenSSL.
Now a random 32-byte stateless reset token key is generated if none is
specified in the configuration. As a result, stateless reset tokens are now
generated for all server ids.
Previously new dcid was generated in the same memory that was allocated for
qc->dcid when creating the QUIC connection. However this memory was also
referenced by initial_source_connection_id and retry_source_connection_id
transport parameters. As a result these parameters changed their values after
retry which broke the protocol.
Before introduction of request body filter in 42d9beeb22db, the only
possible return code from the ngx_http_request_body_filter() call
without actual buffers was NGX_HTTP_INTERNAL_SERVER_ERROR, and
the code in ngx_http_read_client_request_body() hardcoded the only
possible error to simplify the code of initial call to set rb->rest.
This is no longer true after introduction of request body filters though,
as a request body filter might need to return other errors, such as 403.
Fix is to preserve the error code actually returned by the call
instead of assuming 500.
Added logging before returning NGX_HTTP_INTERNAL_SERVER_ERROR if there
are busy buffers after a request body flush. This should never happen
with current code, though bugs can be introduced by 3rd party modules.
Make sure debugging will be easy enough.
A negotiated version is decoupled from NGX_QUIC_VERSION and, if supported,
now stored in c->quic->version after packets processing. It is then used
to create long header packets. Otherwise, the list of supported versions
(which may be many now) is sent in the Version Negotiation packet.
All packets in the connection are expected to have the same version.
Incoming packets with mismatched version are now rejected.
When doing lingering close, the socket was first shut down for writing,
so SSL shutdown initiated after lingering close was not able to send
the close_notify alerts (ticket #2056).
The fix is to call ngx_ssl_shutdown() before shutting down the socket.
A number of unsigned variables has a special value, usually -1 or some maximum,
which produces huge numeric value in logs and makes them hard to read.
In order to distinguish such values in log, they are casted to the signed type
and printed as literal '-1'.
The client address validation didn't complete with a valid token,
which was broken after packet processing refactoring in d0d3fc0697a0.
An invalid or expired token was treated as a connection error.
Now we proceed as outlined in draft-ietf-quic-transport-32,
section 8.1.3 "Address Validation for Future Connections" below,
which is unlike validating the client address using Retry packets.
When a server receives an Initial packet with an address validation
token, it MUST attempt to validate the token, unless it has already
completed address validation. If the token is invalid then the
server SHOULD proceed as if the client did not have a validated
address, including potentially sending a Retry.
The connection is now closed in this case on internal errors only.
All key handling functionality is moved into ngx_quic_protection.c.
Public structures from ngx_quic_protection.h are now private and new
methods are available to manipulate keys.
A negotiated cipher is cached in QUIC connection from the set secret callback
to avoid calling SSL_get_current_cipher() on each encrypt/decrypt operation.
This also reduces the number of unwanted c->ssl->connection occurrences.
Now "s", "V", and "v" format specifiers may be prefixed with "x" (lowercase)
or "X" (uppercase) to output corresponding data in hexadecimal format.
In collaboration with Maxim Dounin.
Previously, tx ACK frames held ranges in an array of ngx_quic_ack_range_t,
while rx ACK frames held ranges in the serialized format. Now serialized format
is used for both types of frames.
The patch resets ctx->frames queue, which may contain frames. It was possible
that congestion or amplification limits prevented all frames to be sent.
Retransmitted frames could be accounted twice as inflight: first time in
ngx_quic_congestion_lost() called from ngx_quic_resend_frames(), and later
from ngx_quic_discard_ctx().
This accounts for the following change:
* Require expansion of datagrams to ensure that a path supports at
least 1200 bytes:
- During the handshake ack-eliciting Initial packets from the
server need to be expanded
All values are prefixed with name and separated from it using colon.
Multiple values are listed without commas in between.
Rationale: this greatly simplifies log parsing for analysis.
For application level packets, only every second packet is now acknowledged,
respecting max ack delay.
13.2.1 Sending ACK Frames
In order to assist loss detection at the sender, an endpoint SHOULD
generate and send an ACK frame without delay when it receives an ack-
eliciting packet either:
* when the received packet has a packet number less than another
ack-eliciting packet that has been received, or
* when the packet has a packet number larger than the highest-
numbered ack-eliciting packet that has been received and there are
missing packets between that packet and this packet.
13.2.2. Acknowledgement Frequency
A receiver SHOULD send an ACK frame after receiving at least two
ack-eliciting packets.
In some cases it might be needed to reject SSL handshake based on SNI
server name provided, for example, to make sure an invalid certificate
is not returned to clients trying to contact a name-based virtual server
without SSL configured. Previously, a "ssl_ciphers aNULL;" was used for
this. This workaround, however, is not compatible with TLSv1.3, in
particular, when using BoringSSL, where it is not possible to configure
TLSv1.3 ciphers at all.
With this change, the ssl_reject_handshake directive is introduced,
which instructs nginx to reject SSL handshakes with an "unrecognized_name"
alert in a particular server block.
For example, to reject handshake with names other than example.com,
one can use the following configuration:
server {
listen 443 ssl;
ssl_reject_handshake on;
}
server {
listen 443 ssl;
server_name example.com;
ssl_certificate example.com.crt;
ssl_certificate_key example.com.key;
}
The following configuration can be used to reject all SSL handshakes
without SNI server name provided:
server {
listen 443 ssl;
ssl_reject_handshake on;
}
server {
listen 443 ssl;
server_name ~^;
ssl_certificate example.crt;
ssl_certificate_key example.key;
}
Additionally, the ssl_reject_handshake directive makes configuring
certificates for the default server block optional. If no certificates
are configured in the default server for a given listening socket,
certificates must be defined in all non-default server blocks with
the listening socket in question.
Similarly to ssl_conf_command, proxy_ssl_conf_command can be used to
set arbitrary OpenSSL configuration parameters as long as nginx is
compiled with OpenSSL 1.0.2 or later, when connecting to upstream
servers with SSL. Full list of available configuration commands
can be found in the SSL_CONF_cmd manual page
(https://www.openssl.org/docs/man1.1.1/man3/SSL_CONF_cmd.html).
Similarly to ssl_conf_command, proxy_ssl_conf_command (grpc_ssl_conf_command,
uwsgi_ssl_conf_command) can be used to set arbitrary OpenSSL configuration
parameters as long as nginx is compiled with OpenSSL 1.0.2 or later,
when connecting to upstream servers with SSL. Full list of available
configuration commands can be found in the SSL_CONF_cmd manual page
(https://www.openssl.org/docs/man1.1.1/man3/SSL_CONF_cmd.html).
With the ssl_conf_command directive it is now possible to set
arbitrary OpenSSL configuration parameters as long as nginx is compiled
with OpenSSL 1.0.2 or later. Full list of available configuration
commands can be found in the SSL_CONF_cmd manual page
(https://www.openssl.org/docs/man1.1.1/man3/SSL_CONF_cmd.html).
In particular, this allows configuring PrioritizeChaCha option
(ticket #1445):
ssl_conf_command Options PrioritizeChaCha;
It can be also used to configure TLSv1.3 ciphers in OpenSSL,
which fails to configure them via the SSL_CTX_set_cipher_list()
interface (ticket #1529):
ssl_conf_command Ciphersuites TLS_CHACHA20_POLY1305_SHA256;
Configuration commands are applied after nginx own configuration
for SSL, so they can be used to override anything set by nginx.
Note though that configuring OpenSSL directly with ssl_conf_command
might result in a behaviour nginx does not expect, and should be
done with care.
With this change, it is now possible to use ngx_conf_merge_ptr_value()
to merge keyval arrays. This change actually follows much earlier
changes in ngx_conf_merge_ptr_value() and ngx_conf_set_str_array_slot()
in 1452:cd586e963db0 (0.6.10) and 1701:40d004d95d88 (0.6.22).
To preserve compatibility with existing 3rd party modules, both NULL
and NGX_CONF_UNSET_PTR are accepted for now.
13.2.4. Limiting Ranges by Tracking ACK Frames
When a packet containing an ACK frame is sent, the largest
acknowledged in that frame may be saved. When a packet containing an
ACK frame is acknowledged, the receiver can stop acknowledging
packets less than or equal to the largest acknowledged in the sent
ACK frame.
The history of acknowledged packet is kept in send context as ranges.
Up to NGX_QUIC_MAX_RANGES ranges is stored.
As a result, instead of separate ack frames, single frame with ranges
is sent.
Per draft-ietf-quic-transport-32 on the topic:
: Similarly, a server MUST expand the payload of all UDP datagrams carrying
: ack-eliciting Initial packets to at least the smallest allowed maximum
: datagram size of 1200 bytes.
The history of acknowledged packet is kept in send context as ranges.
Up to NGX_QUIC_MAX_RANGES ranges is stored.
As a result, instead of separate ack frames, single frame with ranges
is sent.
Previously, if there were multiple limits configured, errors in
ngx_http_complex_value() during processing of a non-first limit
resulted in reference count leak in shared memory nodes of already
processed limits. Fix is to explicity unlock relevant nodes, much
like we do when rejecting requests.
The proxy_smtp_auth directive instructs nginx to authenticate users
on backend via the AUTH command (using the PLAIN SASL mechanism),
similar to what is normally done for IMAP and POP3.
If xclient is enabled along with proxy_smtp_auth, the XCLIENT command
won't try to send the LOGIN parameter.
In 7717:e3e8b8234f05, the 1st bit was incorrectly used. It shouldn't
be used for bitmask values, as it is used by NGX_CONF_BITMASK_SET.
Additionally, special value "off" added to make it possible to clear
inherited userid_flags value.
The "false" parameter of the proxy_redirect directive is deprecated.
Warning has been emitted since c2230102df6f (0.7.54).
The "off" parameter of the proxy_redirect, proxy_cookie_domain, and
proxy_cookie_path directives tells nginx not to inherit the
configuration from the previous configuration level.
Previously, after specifying the directive with the "off" parameter,
any other directives were ignored, and syntax checking was disabled.
The syntax was enforced to allow either one directive with the "off"
parameter, or several directives with other parameters.
Also, specifying "proxy_redirect default foo" no longer works like
"proxy_redirect default".
Previously, this field was not set while creating a QUIC stream connection.
As a result, calling ngx_connection_local_sockaddr() led to getsockname()
bad descriptor error.
If client acknowledged an Initial packet with CRYPTO frame and then
sent another Initial packet containing duplicate CRYPTO again, this
could result in resending frames off the empty send queue.
The new "quic_stateless_reset_token_key" directive is added. It sets the
endpoint key used to generate stateless reset tokens and enables feature.
If the endpoint receives short-header packet that can't be matched to
existing connection, a stateless reset packet is generated with
a proper token.
If a valid stateless reset token is found in the incoming packet,
the connection is closed.
Example configuration:
http {
quic_stateless_reset_token_key "foo";
...
}
The flag is tied to the initial secret creation. The presence of c->quic
pointer is sufficient to enable execution of ngx_quic_close_quic().
The ngx_quic_new_connection() function now returns the allocated quic
connection object and the c->quic pointer is set by the caller.
If an early error occurs before secrets initialization (i.e. in cases
of invalid retry token or nginx exiting), it is still possible to
generate an error response by trying to initialize secrets directly
in the ngx_quic_send_cc() function.
Before the change such early errors failed to send proper connection close
message and logged an error.
An auxilliary ngx_quic_init_secrets() function is introduced to avoid
verbose call to ngx_quic_set_initial_secret() requiring local variable.
All packet header parsing is now performed by ngx_quic_parse_packet()
function, located in the ngx_quic_transport.c file.
The packet processing is centralized in the ngx_quic_process_packet()
function which decides if the packet should be accepted, ignored or
connection should be closed, depending on the connection state.
As a result of refactoring, behavior has changed in some places:
- minimal size of Initial packet is now always tested
- connection IDs are always tested in existing connections
- old keys are discarded on encryption level switch
Now flags are processed in ngx_quic_input(), and raw->pos points to the first
byte after the flags. Redundant checks from ngx_quic_parse_short_header() and
ngx_quic_parse_long_header() are removed.
Previously, when a packet was declared lost, another packet was sent with the
same frames. Now lost frames are moved to the output frame queue and push
event is posted. This has the advantage of forming packets with more frames
than before.
Also, the start argument is removed from the ngx_quic_resend_frames()
function as excess information.
Previously the default server configuration context was used until the
:authority or host header was parsed. This led to using the configuration
parameters like client_header_buffer_size or request_pool_size from the default
server rather than from the server selected by SNI.
Also, the switch to the right server log is implemented. This issue manifested
itself as QUIC stream being logged to the default server log until :authority
or host is parsed.
Initially, client certificate verification didn't work due to the missing
hc->ssl on a QUIC stream, which is started to be set in 7738:7f0981be07c4.
Then it was lost in 7999:0d2b2664b41c introducing "quic" listen parameter.
This change re-adds hc->ssl back for all QUIC connections, similar to SSL.
As per HTTP/3 draft 30, section 7.2.8:
Frame types that were used in HTTP/2 where there is no corresponding
HTTP/3 frame have also been reserved (Section 11.2.1). These frame
types MUST NOT be sent, and their receipt MUST be treated as a
connection error of type H3_FRAME_UNEXPECTED.
In rare cases, such as memory allocation failure, SSL_set_SSL_CTX() returns
NULL, which could mean that a different SSL configuration has not been set.
Note that this new behaviour seemingly originated in OpenSSL-1.1.0 release.
HTTP/2 code failed to run posted requests after calling the request body
handler, and this resulted in connection hang if a subrequest was created
in the body handler and no other actions were made.
If 400 errors were redirected to an upstream server using the error_page
directive, DATA frames from the client might cause segmentation fault
due to null pointer dereference. The bug had appeared in 6989:2c4dbcd6f2e4
(1.13.0).
Fix is to skip such frames in ngx_http_v2_state_read_data() (similarly
to 7561:9f1f9d6e056a). With the fix, behaviour of 400 errors in HTTP/2
is now similar to one in HTTP/1.x, that is, nginx doesn't try to read the
request body.
Note that proxying 400 errors, as well as other early stage errors, to
upstream servers might not be a good idea anyway. These errors imply
that reading and processing of the request (and the request headers)
wasn't complete, and proxying of such incomplete request might lead to
various errors.
Reported by Chenglong Zhang.
This fixes "SSL_shutdown() failed (SSL: ... bad write retry)" errors
as observed on the second SSL_shutdown() call after SSL shutdown fixes in
09fb2135a589 (1.19.2), notably when HTTP/2 connections are closed due
to read timeouts while there are incomplete writes.
This fixes "SSL_shutdown() failed (SSL: ... bad write retry)" errors
as observed on the second SSL_shutdown() call after SSL shutdown fixes in
09fb2135a589 (1.19.2), notably when sending fails in ngx_http_test_expect(),
similarly to ticket #1194.
Note that there are some places where c->error is misused to prevent
further output, such as ngx_http_v2_finalize_connection() if there
are pending streams, or in filter finalization. These places seem
to be extreme enough to don't care about missing shutdown though.
For example, filter finalization currently prevents keepalive from
being used.
The c->read->ready and c->write->ready flags need to be cleared to ensure
that appropriate read or write events will be reported by kernel. Without
this, SSL shutdown might wait till the timeout after blocking on writing
or reading even if there is a socket activity.
OpenSSL 1.1.1 fails to return SSL_ERROR_SYSCALL if an error happens
during SSL_write() after close_notify alert from the peer, and returns
SSL_ERROR_ZERO_RETURN instead. Broken by this commit, which removes
the "i == 0" check around the SSL_RECEIVED_SHUTDOWN one:
https://git.openssl.org/?p=openssl.git;a=commitdiff;h=8051ab2
In particular, if a client closed the connection without reading
the response but with properly sent close_notify alert, this resulted in
unexpected "SSL_write() failed while ..." critical log message instead
of correct "SSL_write() failed (32: Broken pipe)" at the info level.
Since SSL_ERROR_ZERO_RETURN cannot be legitimately returned after
SSL_write(), the fix is to convert all SSL_ERROR_ZERO_RETURN errors
after SSL_write() to SSL_ERROR_SYSCALL.
If the variant hash doesn't match one we used as a secondary cache key,
we switch back to the original key. In this case, c->body_start was kept
updated from an existing cache node overwriting the new response value.
After file cache update, it led to discrepancy between a cache node and
cache file seen as critical errors "file cache .. has too long header".
As per HTTP/3 draft 29, section 4.1:
Frames of unknown types (Section 9), including reserved frames
(Section 7.2.8) MAY be sent on a request or push stream before,
after, or interleaved with other frames described in this section.
Also, trailers frame is now used as an indication of the request body end.
While for HTTP/1 unexpected eof always means an error, for HTTP/3 an eof right
after a DATA frame end means the end of the request body. For this reason,
since adding HTTP/3 support, eof no longer produced an error right after recv()
but was passed to filters which would make a decision. This decision was made
in ngx_http_parse_chunked() and ngx_http_v3_parse_request_body() based on the
b->last_buf flag.
Now that since 0f7f1a509113 (1.19.2) rb->chunked->length is a lower threshold
for the expected number of bytes, it can be set to zero to indicate that more
bytes may or may not follow. Now it's possible to move the check for eof from
parser functions to ngx_http_request_body_chunked_filter() and clean up the
parsing code.
Also, in the default branch, in case of eof, the following three things
happened, which were replaced with returning NGX_ERROR while implementing
HTTP/3:
- "client prematurely closed connection" message was logged
- c->error flag was set
- NGX_HTTP_BAD_REQUEST was returned
The change brings back this behavior for HTTP/1 as well as HTTP/3.
If a packet sent in response to an initial client packet was lost, then
successive client initial packets were dropped by nginx with the unexpected
dcid message logged. This was because the new DCID generated by the server was
not available to the client.
The check tested the total size of a packet header and unprotected packet
payload, which doesn't include the packet number length and expansion of
the packet protection AEAD. If the packet was corrupted, it could cause
false triggering of the condition due to unsigned type underflow leading
to a connection error.
Existing checks for the QUIC header and protected packet payload lengths
should be enough.
From quic-tls draft, section 5.4.2:
An endpoint MUST discard packets that are not long enough to contain
a complete sample.
The check includes the Packet Number field assumed to be 4 bytes long.
During long packet header parsing, pkt->len is updated with the Length
field value that is used to find next coalesced packets in a datagram.
For short packets it still contained the whole QUIC packet size.
This change uniforms packet length handling to always contain the total
length of the packet number and protected packet payload in pkt->len.
Previously STOP_SENDING was sent to client upon stream closure if rev->eof and
rev->error were not set. This was an indirect indication that no RESET_STREAM
or STREAM fin has arrived. But it is indeed possible that rev->eof is not set,
but STREAM fin has already been received, just not read out by the application.
In this case sending STOP_SENDING does not make sense and can be misleading for
some clients.
The peer may issue additional connection IDs up to the limit defined by
transport parameter "active_connection_id_limit", using NEW_CONNECTION_ID
frames, and retire such IDs using RETIRE_CONNECTION_ID frame.
It is required to distinguish internal errors from corrupted packets and
perform actions accordingly: drop the packet or close the connection.
While there, made processing of ngx_quic_decrypt() erorrs similar and
removed couple of protocol violation errors.
quic-transport
5.2:
Packets that are matched to an existing connection are discarded if
the packets are inconsistent with the state of that connection.
5.2.2:
Servers MUST drop incoming packets under all other circumstances.
The removal of QUIC packet protection depends on the largest packet number
received. When a garbage packet was received, the decoder still updated the
largest packet number from that packet. This could affect removing protection
from subsequent QUIC packets.
As per HTTP/3 draft 29, section 4.1:
When the server does not need to receive the remainder of the request,
it MAY abort reading the request stream, send a complete response, and
cleanly close the sending part of the stream.
On QUIC connections, SSL_shutdown() is used to call the send_alert callback
to send a CONNECTION_CLOSE frame. The reverse side is handled by other means.
At least BoringSSL doesn't differentiate whether this is a QUIC SSL method,
so waiting for the peer's close_notify alert should be explicitly disabled.
The logical quic connection state is tested by handler functions that
process corresponding types of packets (initial/handshake/application).
The packet is declined if state is incorrect.
No timeout is required for the input queue.
If a client attemtps to start a new connection with unsupported version,
a version negotiation packet is sent that contains a list of supported
versions (currently this is a single version, selected at compile time).
The function ngx_http_upstream_check_broken_connection() terminates the HTTP/1
request if client sends eof. For QUIC (including HTTP/3) the c->write->error
flag is now checked instead. This flag is set when the entire QUIC connection
is closed or STOP_SENDING was received from client.
Previously the request body DATA frame header was read by one byte because
filters were called only when the requested number of bytes were read. Now,
after 08ff2e10ae92 (1.19.2), filters are called after each read. More bytes
can be read at once, which simplifies and optimizes the code.
This also reduces diff with the default branch.
Previously, such packets weren't handled as the resulting zero remaining time
prevented setting the loss detection timer, which, instead, could be disarmed.
For implementation details, see quic-recovery draft 29, appendix A.10.
The PTO handler is split into separate PTO and loss detection handlers
that operate interchangeably depending on which timer should be set.
The present ngx_quic_lost_handler is now only used for packet loss detection.
It replaces ngx_quic_pto_handler if there are packets preceeding largest_ack.
Once there is no more such packets, ngx_quic_pto_handler is installed again.
Probes carry unacknowledged data previously sent in the oldest packet number,
one per each packet number space. That is, it could be up to two probes.
PTO backoff is now increased before scheduling next probes.
In particular, this prevents declaring packet number 0 as lost if
there aren't yet any acknowledgements in this packet number space.
For example, only Initial packets were acknowledged in handshake.
Previously a single STREAM frame was created for each buffer in stream output
chain which is wasteful with respect to memory. The following changes were
made in the stream send code:
- ngx_quic_stream_send_chain() no longer calls ngx_quic_stream_send() and got
a separate implementation that coalesces neighbouring buffers into a single
frame
- the new ngx_quic_stream_send_chain() respects the limit argument, which fixes
sendfile_max_chunk and limit_rate
- ngx_quic_stream_send() is reimplemented to call ngx_quic_stream_send_chain()
- stream frame size limit is moved out to a separate function
ngx_quic_max_stream_frame()
- flow control is moved out to a separate function ngx_quic_max_stream_flow()
- ngx_quic_stream_send_chain() is relocated next to ngx_quic_stream_send()
Reworked connections reuse, so closing connections is attempted in
advance, as long as number of free connections is less than 1/16 of
worker connections configured. This ensures that new connections can
be handled even if closing a reusable connection requires some time,
for example, for a lingering close (ticket #2017).
The 1/16 ratio is selected to be smaller than 1/8 used for disabling
accept when working with accept mutex, so nginx will try to balance
new connections to different workers first, and will start reusing
connections only if this won't help.
Previously, reusing connections happened silently and was only
visible in monitoring systems. This was shown to be not very user-friendly,
and administrators often didn't realize there were too few connections
available to withstand the load, and configured timeouts (keepalive_timeout
and http2_idle_timeout) were effectively reduced to keep things running.
To provide at least some information about this, a warning is now logged
(at most once per second, to avoid flooding the logs).
Sending shutdown when ngx_http_test_reading() detects the connection is
closed can result in "SSL_shutdown() failed (SSL: ... bad write retry)"
critical log messages if there are blocked writes.
Fix is to avoid sending shutdown via the c->ssl->no_send_shutdown flag,
similarly to how it is done in ngx_http_keepalive_handler() for kqueue
when pending EOF is detected.
Reported by Jan Prachař
(http://mailman.nginx.org/pipermail/nginx-devel/2018-December/011702.html).
Without the flag, SSL shutdown is attempted on such connections,
resulting in useless work and/or bogus "SSL_shutdown() failed
(SSL: ... bad write retry)" critical log messages if there are
blocked writes.
Previously, bidirectional shutdown never worked, due to two issues
in the code:
1. The code only tested SSL_ERROR_WANT_READ and SSL_ERROR_WANT_WRITE
when there was an error in the error queue, which cannot happen.
The bug was introduced in an attempt to fix unexpected error logging
as reported with OpenSSL 0.9.8g
(http://mailman.nginx.org/pipermail/nginx/2008-January/003084.html).
2. The code never called SSL_shutdown() for the second time to wait for
the peer's close_notify alert.
This change fixes both issues.
Note that after this change bidirectional shutdown is expected to work for
the first time, so c->ssl->no_wait_shutdown now makes a difference. This
is not a problem for HTTP code which always uses c->ssl->no_wait_shutdown,
but might be a problem for stream and mail code, as well as 3rd party
modules.
To minimize the effect of the change, the timeout, which was used to be 30
seconds and not configurable, though never actually used, is now set to
3 seconds. It is also expanded to apply to both SSL_ERROR_WANT_READ and
SSL_ERROR_WANT_WRITE, so timeout is properly set if writing to the socket
buffer is not possible.
If some additional data from a pipelined request happens to be
read into the body buffer, we copy it to r->header_in or allocate
an additional large client header buffer for it.
This ensures that copying won't write more than the buffer size
even if the buffer comes from hc->free and it is smaller than the large
client header buffer size in the virtual host configuration. This might
happen if size of large client header buffers is different in name-based
virtual hosts, similarly to the problem with number of buffers fixed
in 6926:e662cbf1b932.
Creating client-initiated streams is moved from ngx_quic_handle_stream_frame()
to a separate function ngx_quic_create_client_stream(). This function is
responsible for creating streams with lower ids as well.
Also, simplified and fixed initial data buffering in
ngx_quic_handle_stream_frame(). It is now done before calling the initial
handler as the handler can destroy the stream.
Previously this function generated an error trying to figure out if client shut
down the write end of the connection. The reason for this error was that a
QUIC stream has no socket descriptor. However checking for eof is not the
right thing to do for an HTTP/3 QUIC stream since HTTP/3 clients are expected
to shut down the write end of the stream after sending the request.
Now the function handles QUIC streams separately. It checks if c->read->error
is set. The error flags for c->read and c->write are now set for all streams
when closing the QUIC connection instead of setting the pending_eof flag.
According to quic-transport draft 29, section 19.3.1:
The value of the Gap field establishes the largest packet number
value for the subsequent ACK Range using the following formula:
largest = previous_smallest - gap - 2
Thus, given a largest packet number for the range, the smallest value
is determined by the formula:
smallest = largest - ack_range
While here, changed min/max to uint64_t for consistency.
A QUIC stream could be destroyed by handler while in ngx_quic_stream_input().
To detect this, ngx_quic_find_stream() is used to check that it still exists.
Previously, a stream id was passed to this routine off the frame structure.
In case of stream cleanup, it is freed along with other frames belonging to
the stream on cleanup. Then, a cleanup handler reuses last frames to update
MAX_STREAMS and serve other purpose. Thus, ngx_quic_find_stream() is passed
a reused frame with zeroed out part pointed by stream_id. If a stream with
id 0x0 still exists, this leads to use-after-free.
After 05e42236e95b (1.19.1) responses with extra data might result in
zero size buffers being generated and "zero size buf" alerts in writer
(if f->rest happened to be 0 when processing additional stdout data).
The limits on active bidi and uni client streams are maintained at their
initial values initial_max_streams_bidi and initial_max_streams_uni by sending
a MAX_STREAMS frame upon each client stream closure.
Also, the following is changed for data arriving to non-existing streams:
- if a stream was already closed, such data is ignored
- when creating a new stream, all streams of the same type with lower ids are
created too
Previously, the document generated by the xslt filter was always fully sent
to client even if a range was requested and response status was 206 with
appropriate Content-Range.
The xslt module is unable to serve a range because of suspending the header
filter chain. By the moment full response xml is buffered by the xslt filter,
range header filter is not called yet, but the range body filter has already
been called and did nothing.
The fix is to disable ranges by resetting the r->allow_ranges flag much like
the image filter that employs a similar technique.
The ngx_http_perl_module module doesn't have a notion of including additional
search paths through --with-cc-opt, which results in compile error incomplete
type 'enum ssl_encryption_level_t' when building nginx without QUIC support.
The enum is visible from quic event headers and eventually pollutes ngx_core.h.
The fix is to limit including headers to compile units that are real consumers.
According to quic-transport draft 29, section 21.12.1.1:
Prior to validation, endpoints are limited in what they are able to
send. During the handshake, a server cannot send more than three
times the data it receives; clients that initiate new connections or
migrate to a new network path are limited.
The ngx_quic_queue_frame() functions puts a frame into send queue and
schedules a push timer to actually send data.
The patch adds tracking for data amount in the queue and sends data
immediately if amount of data exceeds limit.
Instead of timer-based retransmissions with constant packet lifetime,
this patch implements ack-based loss detection and probe timeout
for the cases, when no ack is received, according to the quic-recovery
draft 29.
The c->quic->retransmit timer is now called "pto".
The ngx_quic_retransmit() function is renamed to "ngx_quic_detect_lost()".
This is a preparation for the following patches.
According to the quic-recovery 29, Section 5: Estimating the Round-Trip Time.
Currently, integer arithmetics is used, which loses sub-millisecond accuracy.
The slice filter allows ranges for the response by setting the r->allow_ranges
flag, which enables the range filter. If the range was not requested, the
range filter adds an Accept-Ranges header to the response to signal the
support for ranges.
Previously, if an Accept-Ranges header was already present in the first slice
response, client received two copies of this header. Now, the slice filter
removes the Accept-Ranges header from the response prior to setting the
r->allow_ranges flag.
As long as the "Content-Length" header is given, we now make sure
it exactly matches the size of the response. If it doesn't,
the response is considered malformed and must not be forwarded
(https://tools.ietf.org/html/rfc7540#section-8.1.2.6). While it
is not really possible to "not forward" the response which is already
being forwarded, we generate an error instead, which is the closest
equivalent.
Previous behaviour was to pass everything to the client, but this
seems to be suboptimal and causes issues (ticket #1695). Also this
directly contradicts HTTP/2 specification requirements.
Note that the new behaviour for the gRPC proxy is more strict than that
applied in other variants of proxying. This is intentional, as HTTP/2
specification requires us to do so, while in other types of proxying
malformed responses from backends are well known and historically
tolerated.
Previous behaviour was to pass everything to the client, but this
seems to be suboptimal and causes issues (ticket #1695). Fix is to
drop extra data instead, as it naturally happens in most clients.
Additionally, we now also issue a warning if the response is too
short, and make sure the fact it is truncated is propagated to the
client. The u->error flag is introduced to make it possible to
propagate the error to the client in case of unbuffered proxying.
For responses to HEAD requests there is an exception: we do allow
both responses without body and responses with body matching the
Content-Length header.
Previous behaviour was to pass everything to the client, but this
seems to be suboptimal and causes issues (ticket #1695). Fix is to
drop extra data instead, as it naturally happens in most clients.
This change covers generic buffered and unbuffered filters as used
in the scgi and uwsgi modules. Appropriate input filter init
handlers are provided by the scgi and uwsgi modules to set corresponding
lengths.
Note that for responses to HEAD requests there is an exception:
we do allow any response length. This is because responses to HEAD
requests might be actual full responses, and it is up to nginx
to remove the response body. If caching is enabled, only full
responses matching the Content-Length header will be cached
(see b779728b180c).
Previously, additional data after final chunk was either ignored
(in the same buffer, or during unbuffered proxying) or sent to the
client (in the next buffer already if it was already read from the
socket). Now additional data are properly detected and ignored
in all cases. Additionally, a warning is now logged and keepalive
is disabled in the connection.
Previous behaviour was to pass everything to the client, but this
seems to be suboptimal and causes issues (ticket #1695). Fix is to
drop extra data instead, as it naturally happens in most clients.
If a memcached response was followed by a correct trailer, and then
the NUL character followed by some extra data - this was accepted by
the trailer checking code. This in turn resulted in ctx->rest underflow
and caused negative size buffer on the next reading from the upstream,
followed by the "negative size buf in writer" alert.
Fix is to always check for too long responses, so a correct trailer cannot
be followed by extra data.
After sending the GOAWAY frame, a connection is now closed using
the lingering close mechanism.
This allows for the reliable delivery of the GOAWAY frames, while
also fixing connection resets observed when http2_max_requests is
reached (ticket #1250), or with graceful shutdown (ticket #1544),
when some additional data from the client is received on a fully
closed connection.
For HTTP/2, the settings lingering_close, lingering_timeout, and
lingering_time are taken from the "server" level.
Previously, the expression (ch & 0x7f) was promoted to a signed integer.
Depending on the platform, the size of this integer could be less than 8 bytes,
leading to overflow when handling the higher bits of the result. Also, sign
bit of this integer could be replicated when adding to the 64-bit st->value.
Previously errors led only to closing streams.
To simplify closing QUIC connection from a QUIC stream context, new macro
ngx_http_v3_finalize_connection() is introduced. It calls
ngx_quic_finalize_connection() for the parent connection.
The function finalizes QUIC connection with an application protocol error
code and sends a CONNECTION_CLOSE frame with type=0x1d.
Also, renamed NGX_QUIC_FT_CONNECTION_CLOSE2 to NGX_QUIC_FT_CONNECTION_CLOSE_APP.
Previously dynamic table was not functional because of zero limit on its size
set by default. Now the following changes enable it:
- new directives to set SETTINGS_QPACK_MAX_TABLE_CAPACITY and
SETTINGS_QPACK_BLOCKED_STREAMS
- send settings with SETTINGS_QPACK_MAX_TABLE_CAPACITY and
SETTINGS_QPACK_BLOCKED_STREAMS to the client
- send Insert Count Increment to the client
- send Header Acknowledgement to the client
- evict old dynamic table entries on overflow
- decode Required Insert Count from client
- block stream if Required Insert Count is not reached
Using SSL_CTX_set_verify(SSL_VERIFY_PEER) implies that OpenSSL will
send a certificate request during an SSL handshake, leading to unexpected
certificate requests from browsers as long as there are any client
certificates installed. Given that ngx_ssl_trusted_certificate()
is called unconditionally by the ngx_http_ssl_module, this affected
all HTTPS servers. Broken by 699f6e55bbb4 (not released yet).
Fix is to set verify callback in the ngx_ssl_trusted_certificate() function
without changing the verify mode.
Client streams may send literal strings which are now limited in size by the
new directive. The default value is 4096.
The directive is similar to HTTP/2 directive http2_max_field_size.
So that connections are protected from failing from on-path attacks.
Decryption failure of long packets used during handshake still leads
to connection close since it barely makes sense to handle them there.
A previously used undefined error code is now replaced with the generic one.
Note that quic-transport prescribes keeping connection intact, discarding such
QUIC packets individually, in the sense that coalesced packets could be there.
This is selectively handled in the next change.
The patch removes remnants of the old state tracking mechanism, which did
not take into account assimetry of read/write states and was not very
useful.
The encryption state now is entirely tracked using SSL_quic_read/write_level().
quic-transport draft 29:
section 7:
* authenticated negotiation of an application protocol (TLS uses
ALPN [RFC7301] for this purpose)
...
Endpoints MUST explicitly negotiate an application protocol. This
avoids situations where there is a disagreement about the protocol
that is in use.
section 8.1:
When using ALPN, endpoints MUST immediately close a connection (see
Section 10.3 of [QUIC-TRANSPORT]) with a no_application_protocol TLS
alert (QUIC error code 0x178; see Section 4.10) if an application
protocol is not negotiated.
Changes in ngx_quic_close_quic() function are required to avoid attempts
to generated and send packets without proper keys, what happens in case
of failed ALPN check.
quic-transport draft 29, section 14:
QUIC depends upon a minimum IP packet size of at least 1280 bytes.
This is the IPv6 minimum size [RFC8200] and is also supported by most
modern IPv4 networks. Assuming the minimum IP header size, this
results in a QUIC maximum packet size of 1232 bytes for IPv6 and 1252
bytes for IPv4.
Since the packet size can change during connection lifetime, the
ngx_quic_max_udp_payload() function is introduced that currently
returns minimal allowed size, depending on address family.
quic-tls, 8.2:
The quic_transport_parameters extension is carried in the ClientHello
and the EncryptedExtensions messages during the handshake. Endpoints
MUST send the quic_transport_parameters extension; endpoints that
receive ClientHello or EncryptedExtensions messages without the
quic_transport_parameters extension MUST close the connection with an
error of type 0x16d (equivalent to a fatal TLS missing_extension
alert, see Section 4.10).
Clearing cache based on free space left on a file system is
expected to allow better disk utilization in some cases, notably
when disk space might be also used for something other than nginx
cache (including nginx own temporary files) and while loading
cache (when cache size might be inaccurate for a while, effectively
disabling max_size cache clearing).
Based on a patch by Adam Bambuch.
With XFS, using "allocsize=64m" mount option results in large preallocation
being reported in the st_blocks as returned by fstat() till the file is
closed. This in turn results in incorrect cache size calculations and
wrong clearing based on max_size.
To avoid too aggressive cache clearing on such volumes, st_blocks values
which result in sizes larger than st_size and eight blocks (an arbitrary
limit) are no longer trusted, and we use st_size instead.
The ngx_de_fs_size() counterpart is intentionally not modified, as
it is used on closed files and hence not affected by this problem.
NFS on Linux is known to report wsize as a block size (in both f_bsize
and f_frsize, both in statfs() and statvfs()). On the other hand,
typical file system block sizes on Linux (ext2/ext3/ext4, XFS) are limited
to pagesize. (With FAT, block sizes can be at least up to 512k in
extreme cases, but this doesn't really matter, see below.)
To avoid too aggressive cache clearing on NFS volumes on Linux, block
sizes larger than pagesize are now ignored.
Note that it is safe to ignore large block sizes. Since 3899:e7cd13b7f759
(1.0.1) cache size is calculated based on fstat() st_blocks, and rounding
to file system block size is preserved mostly for Windows.
Note well that on other OSes valid block sizes seen are at least up
to 65536. In particular, UFS on FreeBSD is known to work well with block
and fragment sizes set to 65536.
When validating second and further certificates, ssl callback could be called
twice to report the error. After the first call client connection is
terminated and its memory is released. Prior to the second call and in it
released connection memory is accessed.
Errors triggering this behavior:
- failure to create the request
- failure to start resolving OCSP responder name
- failure to start connecting to the OCSP responder
The fix is to rearrange the code to eliminate the second call.
The flush flag was not set when forwarding the request body to the uwsgi
server. When using uwsgi_pass suwsgi://..., this causes the uwsgi server
to wait indefinitely for the request body and eventually time out due to
SSL buffering.
This is essentially the same change as 4009:3183165283cc, which was made
to ngx_http_proxy_module.c.
This will fix the uwsgi bug https://github.com/unbit/uwsgi/issues/1490.
This is a temporary workaround, proper retransmission mechanism based on
quic-recovery rfc draft is yet to be implemented.
Currently hardcoded value is too small for real networks. The patch
sets static PTO, considering rtt of ~333ms, what gives about 1s.
This ensures that certificate verification is properly logged to debug
log during upstream server certificate verification. This should help
with debugging various certificate issues.
Listening UNIX sockets were not removed on graceful shutdown, preventing
the next runs. The fix is to replace the custom socket closing code in
ngx_master_process_cycle() by the ngx_close_listening_sockets() call.
When changing binary, sending a SIGTERM to the new binary's master process
should not remove inherited UNIX sockets unless the old binary's master
process has exited.
Also, if both are present, require that they have the same value. These
requirements are specified in HTTP/3 draft 28.
Current implementation of HTTP/2 treats ":authority" and "Host"
interchangeably. New checks only make sure at least one of these values is
present in the request. A similar check existed earlier and was limited only
to HTTP/1.1 in 38c0898b6df7.
The flags was originally added by 8f038068f4bc, and is propagated correctly
in the stream module. With QUIC introduction, http module now uses datagram
sockets as well, thus the fix.
Previously, invalid connection preface errors were only logged at debug
level, providing no visible feedback, in particular, when a plain text
HTTP/2 listening socket is erroneously used for HTTP/1.x connections.
Now these are explicitly logged at the info level, much like other
client-related errors.
When enabled, certificate status is stored in cache and is used to validate
the certificate in future requests.
New directive ssl_ocsp_cache is added to configure the cache.
OCSP validation for client certificates is enabled by the "ssl_ocsp" directive.
OCSP responder can be optionally specified by "ssl_ocsp_responder".
When session is reused, peer chain is not available for validation.
If the verified chain contains certificates from the peer chain not available
at the server, validation will fail.
Previously only the first responder address was used per each stapling update.
Now, in case of a network or parsing error, next address is used.
This also fixes the issue with unsupported responder address families
(ticket #1330).
Preserving pointers within the client buffer is not needed for HTTP/3 because
all data is either allocated from pool or static. Unlike with HTTP/1, data
typically cannot be referenced directly within the client buffer. Trying to
preserve NULLs or external pointers lead to broken pointers.
Also, reverted changes in ngx_http_alloc_large_header_buffer() not relevant
for HTTP/3 to minimize diff to mainstream.
New field r->parse_start is introduced to substitute r->request_start and
r->header_name_start for request length accounting. These fields only work for
this purpose in HTTP/1 because HTTP/1 request line and header line start with
these values.
Also, error logging is now fixed to output the right part of the request.
As per HTTP/3 draft 27, a request or response containing uppercase header
field names MUST be treated as malformed. Also, existing rules applied
when parsing HTTP/1 header names are also applied to HTTP/3 header names:
- null character is not allowed
- underscore character may or may not be treated as invalid depending on the
value of "underscores_in_headers"
- all non-alphanumeric characters with the exception of '-' are treated as
invalid
Also, the r->locase_header field is now filled while parsing an HTTP/3
header.
Error logging for invalid headers is fixed as well.
The first one parses pseudo-headers and is analagous to the request line
parser in HTTP/1. The second one parses regular headers and is analogous to
the header parser in HTTP/1.
Additionally, error handling of client passing malformed uri is now fixed.
The function ngx_http_parse_chunked() is also called from the proxy module to
parse the upstream response. It should always parse HTTP/1 body in this case.
According to quic-transport draft 28 section 10.3.1:
When sending CONNECTION_CLOSE, the goal is to ensure that the peer
will process the frame. Generally, this means sending the frame in a
packet with the highest level of packet protection to avoid the
packet being discarded. After the handshake is confirmed (see
Section 4.1.2 of [QUIC-TLS]), an endpoint MUST send any
CONNECTION_CLOSE frames in a 1-RTT packet. However, prior to
confirming the handshake, it is possible that more advanced packet
protection keys are not available to the peer, so another
CONNECTION_CLOSE frame MAY be sent in a packet that uses a lower
packet protection level.
There is no need in a separate type for the QUIC connection state.
The only state not found in the SSL library is NGX_QUIC_ST_UNAVAILABLE,
which is actually a flag used by the ngx_quic_close_quic() function
to prevent cleanup of uninitialized connection.
Sections 4.10.1 and 4.10.2 of quic transport describe discarding of initial
and handshake keys. Since the keys are discarded, we no longer need
to retransmit packets and corresponding queues should be emptied.
This patch removes previously added workaround that did not require
acknowledgement for initial packets, resulting in avoiding retransmission,
which is wrong because a packet could be lost and we have to retransmit it.
It was possible that retransmit timer was not set after the first
retransmission attempt, due to ngx_quic_retransmit() did not set
wait time properly, and the condition in retransmit handler was incorrect.
Section 17.2 and 17.3 of QUIC transport:
Fixed bit: Packets containing a zero value for this bit are not
valid packets in this version and MUST be discarded.
Reserved bit: An endpoint MUST treat receipt of a packet that has
a non-zero value for these bits, after removing both packet and
header protection, as a connection error of type PROTOCOL_VIOLATION.
When an error occurs, then c->quic->error field may be populated
with an appropriate error code, and the CONNECTION CLOSE frame will be
sent to the peer before the connection is closed. Otherwise, the error
treated as internal and INTERNAL_ERROR code is sent.
The pkt->error field is populated by functions processing packets to
indicate an error when it does not fit into pass/fail return status.
As per QUIC transport, the first flight of 0-RTT packets obviously uses same
Destination and Source Connection ID values as the client's first Initial.
The fix is to match 0-RTT against original DCID after it has been switched.
The ordered frame handler is always called for the existing stream, as it is
allocated from this stream. Instead of searching stream by id, pointer to the
stream node is passed.
The idea is to skip any zeroes that follow valid QUIC packet. Currently such
behavior can be only observed with Firefox which sends zero-padded initial
packets.
Now there's no need to annotate every frame in ACK-eliciting packet.
Sending ACK was moved to the first place, so that queueing ACK frame
no longer postponed up to the next packet after pushing STREAM frames.
+ added "quic" prefix to all error messages
+ rephrased some messages
+ removed excessive error logging from frame parser
+ added ngx_quic_check_peer() function to check proper source/destination
match and do it one place
- the ngx_quic_hexdump0() macro is renamed to ngx_quic_hexdump();
the original ngx_quic_hexdump() macro with variable argument is
removed, extra information is logged normally, with ngx_log_debug()
- all labels in hex dumps are prefixed with "quic"
- the hexdump format is simplified, length is moved forward to avoid
situations when the dump is truncated, and length is not shown
- ngx_quic_flush_flight() function contents is debug-only, placed under
NGX_DEBUG macro to avoid "unused variable" warnings from compiler
- frame names in labels are capitalized, similar to other places
+ all dumps are moved under one of the following macros (undefined by default):
NGX_QUIC_DEBUG_PACKETS
NGX_QUIC_DEBUG_FRAMES
NGX_QUIC_DEBUG_FRAMES_ALLOC
NGX_QUIC_DEBUG_CRYPTO
+ all QUIC debug messages got "quic " prefix
+ all input frames are reported as "quic frame in FOO_FRAME bar:1 baz:2"
+ all outgoing frames re reported as "quic frame out foo bar baz"
+ all stream operations are prefixed with id, like: "quic stream id 0x33 recv"
+ all transport parameters are prefixed with "quic tp"
(hex dump is moved to caller, to avoid using ngx_cycle->log)
+ packet flags and some other debug messages are updated to
include packet type
As per https://tools.ietf.org/html/rfc7540#section-8.1,
: A server can send a complete response prior to the client
: sending an entire request if the response does not depend on
: any portion of the request that has not been sent and
: received. When this is true, a server MAY request that the
: client abort transmission of a request without error by
: sending a RST_STREAM with an error code of NO_ERROR after
: sending a complete response (i.e., a frame with the
: END_STREAM flag). Clients MUST NOT discard responses as a
: result of receiving such a RST_STREAM, though clients can
: always discard responses at their discretion for other
: reasons.
Previously, RST_STREAM(NO_ERROR) received from upstream after
a frame with the END_STREAM flag was incorrectly treated as an
error. Now, a single RST_STREAM(NO_ERROR) is properly handled.
This fixes problems observed with modern grpc-c [1], as well
as with the Go gRPC module.
[1] https://github.com/grpc/grpc/pull/1661
We always generate stream frames that have length. The 'len' member is used
during parsing incoming frames and can be safely ignored when generating
output.
There are following flags in quic connection:
closing - true, when a connection close is initiated, for whatever reason
draining - true, when a CC frame is received from peer
The following state machine is used for closing:
+------------------+
| I/HS/AD |
+------------------+
| | |
| | V
| | immediate close initiated:
| | reasons: close by top-level protocol, fatal error
| | + sends CC (probably with app-level message)
| | + starts close_timer: 3 * PTO (current probe timeout)
| | |
| | V
| | +---------+ - Reply to input with CC (rate-limited)
| | | CLOSING | - Close/Reset all streams
| | +---------+
| | | |
| V V |
| receives CC |
| | |
idle | |
timer | |
| V |
| +----------+ | - MUST NOT send anything (MAY send a single CC)
| | DRAINING | | - if not already started, starts close_timer: 3 * PTO
| +----------+ | - if not already done, close all streams
| | |
| | |
| close_timer fires
| |
V V
+------------------------+
| CLOSED | - clean up all the resources, drop connection
+------------------------+ state completely
The ngx_quic_close_connection() function gets an "rc" argument, that signals
reason of connection closing:
NGX_OK - initiated by application (i.e. http/3), follow state machine
NGX_DONE - timedout (while idle or draining)
NGX_ERROR - fatal error, destroy connection immediately
The PTO calculations are not yet implemented, hardcoded value of 5s is used.
The function is split into three:
ngx_quic_close_connection() itself cleans up all core nginx things
ngx_quic_close_quic() deals with everything inside c->quic
ngx_quic_close_streams() deals with streams cleanup
The quic and streams cleanup functions may return NGX_AGAIN, thus signalling
that cleanup is not ready yet, and the close cannot continue to next step.
The header size macros for long and short packets were fixed to provide
correct values in bytes.
Currently the sending code limits frames so they don't exceed max_packet_size.
But it does not account the case when a single frame can exceed the limit.
As a result of this patch, big payload (CRYPTO and STREAM) will be split
into a number of smaller frames that fit into advertised max_packet_size
(which specifies final packet size, after encryption).
chrome-unstable 83.0.4103.7 starts with Initial packet number 1.
I couldn't find a proper explanation besides this text in quic-transport:
An endpoint MAY skip packet numbers when sending
packets to detect this (Optimistic ACK Attack) behavior.
+ MAX_STREAM_DATA frame is sent when recv() is performed on stream
The new value is a sum of total bytes received by stream + free
space in a buffer;
The sending of MAX_STREM_DATA frame in response to STREAM_DATA_BLOCKED
frame is adjusted to follow the same logic as above.
+ MAX_DATA frame is sent when total amount of received data is 2x
of current limit. The limit is doubled.
+ Default values of transport parameters are adjusted to more meaningful
values:
initial stream limits are set to quic buffer size instead of
unrealistically small 255.
initial max data is decreased to 16 buffer sizes, in an assumption that
this is enough for a relatively short connection, instead of randomly
chosen big number.
All this allows to initiate a stable flow of streams that does not block
on stream/connection limits (tested with FF 77.0a1 and 100K requests)
Before the patch, full STREAM frame handling was delayed until the frame with
zero offset is received. Only node in the streams tree was created.
This lead to problems when such stream was deleted, in particular, it had no
handlers set for read events.
This patch creates new stream immediately, but delays data delivery until
the proper offset will arrive. This is somewhat similar to how accept()
operation works.
The ngx_quic_add_stream() function is no longer needed and merged into stream
handler. The ngx_quic_stream_input() now only handles frames for existing
streams and does not deal with stream creation.
Frames can still float in the following queues:
- crypto frames reordering queues (one per encryption level)
- moved crypto frames cleanup to the moment where all streams are closed
- stream frames reordering queues (one per packet number namespace)
- frames retransmit queues (one per packet number namespace)
Each stream node now includes incoming frames queue and sent/received counters
for tracking offset. The sent counter is not used, c->sent is used, not like
in crypto buffers, which have no connections.
If offset in CRYPTO frame doesn't match expected, following actions are taken:
a) Duplicate frames or frames within [0...current offset] are ignored
b) New data from intersecting ranges (starts before current_offset, ends
after) is consumed
c) "Future" frames are stored in a sorted queue (min offset .. max offset)
Once a frame is consumed, current offset is updated and the queue is inspected:
we iterate the queue until the gap is found and act as described
above for each frame.
The amount of data in buffered frames is limited by corresponding macro.
The CRYPTO and STREAM frame structures are now compatible: they share
the same set of initial fields. This allows to have code that deals with
both of this frames.
The ordering layer now processes the frame with offset and invokes the
handler when it can organise an ordered stream of data.
Quote: Conceptually, a packet number space is the context in which a packet
can be processed and acknowledged.
ngx_quic_namespace_t => ngx_quic_send_ctx_t
qc->ns => qc->send_ctx
ns->largest => send_ctx->largest_ack
The ngx_quic_ns(level) macro now returns pointer, not just index:
ngx_quic_get_send_ctx(c->quic, level)
ngx_quic_retransmit_ns() => ngx_quic_retransmit()
ngx_quic_output_ns() => ngx_quic_output_frames()
The request processing is delayed by a timer. Since nginx updates
internal time once at the start of each event loop iteration, this
normally ensures constant time delay, adding a mitigation from
time-based attacks.
A notable exception to this is the case when there are no additional
events before the timer expires. To ensure constant-time processing
in this case as well, we trigger an additional event loop iteration
by posting a dummy event for the next event loop iteration.
The offset in client CRYPTO frames is tracked in c->quic->crypto_offset_in.
This means that CRYPTO frames with non-zero offset are now accepted making
possible to finish handshake with client certificates that exceed max packet
size (if no reordering happens).
The c->quic->crypto_offset field is renamed to crypto_offset_out to avoid
confusion with tracking of incoming CRYPTO stream.
+ since number of ranges in unknown, provide a function to parse them once
again in handler to avoid memory allocation
+ ack handler now processes all ranges, not only the first
+ ECN counters are parsed and saved into frame if present
Such frames are grouped together in a switch and just ignored, instead of
closing the connection This may improve test coverage. All such frames
require acknowledgment.
The qc->closing flag is set when a connection close is initiated for the first
time.
No timers will be set if the flag is active.
TODO: this is a temporary solution to avoid running timer handlers after
connection (and it's pool) was destroyed. It looks like currently we have
no clear policy of connection closing in regard to timers.
Found with a previously received Initial packet with ACK only, which
instantiates a new connection but do not produce the handshake keys.
This can be triggered by a fairly well behaving client, if the server
stands behind a load balancer that stripped Initial packets exchange.
Found by F5 test suite.
This makes sending large number of bidirectional stream work within ngtcp2,
which doesn't bother sending optional STREAMS_BLOCKED when exhausted.
This also introduces tracking currently opened and maximum allowed streams.
Currently, the output is called periodically, each 200 ms to invoke
ngx_quic_output() that will push all pending frames into packets.
TODO: implement flags a-là Nagle & co (NO_DELAY/NO_PUSH...)
All frames collected to packet are moved into a per-namespace send queue.
QUIC connection has a timer which fires on the closest max_ack_delay time.
The frame is deleted from the queue when a corresponding packet is acknowledged.
The NGX_QUIC_MAX_RETRANSMISSION is a timeout that defines maximum length
of retransmission of a frame.
The quic->keys[4] array now contains secrets related to the corresponding
encryption level. All protection-level functions get proper keys and do
not need to switch manually between levels.
If early data is accepted, SSL_do_handshake() completes as soon as ClientHello
is processed. SSL_in_init() will report the handshake is still in progress.
Static buffers are used instead in functions where decryption takes place.
The pkt->plaintext points to the beginning of a static buffer.
The pkt->payload.data points to decrypted data actual start.
+ ngx_quic_encrypt():
- no longer accepts pool as argument
- pkt is 1st arg
- payload is passed as pkt->payload
- performs encryption to the specified static buffer
+ ngx_quic_create_long/short_packet() functions:
- single buffer for everything, allocated by caller
- buffer layout is: [ ad | payload | TAG ]
the result is in the beginning of buffer with proper length
- nonce is calculated on stack
- log is passed explicitly, pkt is 1st arg
- no more allocations inside
+ ngx_quic_create_long_header():
- args changed: no need to pass str_t
+ added ngx_quic_create_short_header()
+ Client-related errors (i.e. parsing) are done at INFO level
+ c->log->action is updated through the process of receiving, parsing.
handling packet/payload and generating frames/output.
For ngx_http_process_request() part to work, this required to set both
r->http_connection->ssl and c->ssl on a QUIC stream. To avoid damaging
global SSL object, ngx_ssl_shutdown() is managed to ignore QUIC streams.
+ ngx_quic_init_ssl_methods() is no longer there, we setup methods on SSL
connection directly.
+ the handshake_handler is actually a generic quic input handler
+ updated c->log->action and debug to reflect changes and be more informative
+ c->quic is always set in ngx_quic_input()
+ the quic connection state is set by the results of SSL_do_handshake();
note:
+ parameters are available in SSL connection since they are obtained by ssl
stack
quote:
During connection establishment, both endpoints make authenticated
declarations of their transport parameters. These declarations are
made unilaterally by each endpoint.
and really, we send our parameters before we read client's.
no handling of incoming parameters is made by this patch.
- integer parameters can be configured using the following directives:
quic_max_idle_timeout
quic_max_ack_delay
quic_max_packet_size
quic_initial_max_data
quic_initial_max_stream_data_bidi_local
quic_initial_max_stream_data_bidi_remote
quic_initial_max_stream_data_uni
quic_initial_max_streams_bidi
quic_initial_max_streams_uni
quic_ack_delay_exponent
quic_active_migration
quic_active_connection_id_limit
- only following parameters are actually sent:
active_connection_id_limit
initial_max_streams_uni
initial_max_streams_bidi
initial_max_stream_data_bidi_local
initial_max_stream_data_bidi_remote
initial_max_stream_data_uni
(other parameters are to be added into ngx_quic_create_transport_params()
function as needed, should be easy now)
- draft 24 and draft 27 are now supported
(at compile-time using quic_version macro)
The ngx_quic_parse_frame() functions now has new 'pkt' argument: the packet
header of a currently processed frame. This allows to log errors/debug
closer to reasons and perform additional checks regarding possible frame
types. The handler only performs processing of good frames.
A number of functions like read_uint32(), parse_int[_multi] probably should
be implemented as a macro, but currently it is better to have them as
functions for simpler debugging.
Cleanup in ngx_event_quic.c:
+ reorderded functions, structures
+ added missing prototypes
+ added separate handlers for each frame type
+ numerous indentation/comments/TODO fixes
+ removed non-implemented qc->state and corresponding enum;
this requires deep thinking, stub was unused.
+ streams inside quic connection are now in own structure
All code dealing with serializing/deserializing
is moved int srv/event/ngx_event_quic_transport.c/h file.
All macros for dealing with data are internal to source file.
The header file exposes frame types and error codes.
The exported functions are currently packet header parsers and writers
and frames parser/writer.
The ngx_quic_header_t structure is updated with 'log' member. This avoids
passing extra argument to parsing functions that need to report errors.
+ support for more than one initial packet
+ workaround for trailing zeroes in packet
+ ignore application data packet if no keys yet (issue in draft 27/ff nightly)
+ fixed PING frame parser
+ STREAM frames need to be acknowledged
The following HTTP configuration is used for firefox (v74):
http {
ssl_certificate_key localhost.key;
ssl_certificate localhost.crt;
ssl_protocols TLSv1.2 TLSv1.3;
server {
listen 127.0.0.1:10368 reuseport http3;
ssl_quic on;
server_name localhost;
location / {
return 200 "This-is-QUICK\n";
}
}
server {
listen 127.0.0.1:5555 ssl; # point the browser here
server_name localhost;
location / {
add_header Alt-Svc 'h3-24=":10368";ma=100';
return 200 "ALT-SVC";
}
}
}
New files:
src/event/ngx_event_quic_protection.h
src/event/ngx_event_quic_protection.c
The protection.h header provides interface to the crypto part of the QUIC:
2 functions to initialize corresponding secrets:
ngx_quic_set_initial_secret()
ngx_quic_set_encryption_secret()
and 2 functions to deal with packet processing:
ngx_quic_encrypt()
ngx_quic_decrypt()
Also, structures representing secrets are defined there.
All functions require SSL connection and a pool, only crypto operations
inside, no access to nginx connections or events.
Currently pool->log is used for the logging (instead of original c->log).
- events handling moved into src/event/ngx_event_quic.c
- http invokes once ngx_quic_run() and passes stream callback
(diff to original http_request.c is now minimal)
- streams are stored in rbtree using ID as a key
- when a new stream is registered, appropriate callback is called
- ngx_quic_stream_t type represents STREAM and stored in c->qs
- now NEW_CONNECTION_ID frames can be received and parsed
The packet structure is created in ngx_quic_input() and passed
to all handlers (initial, handshake and application data).
The UDP datagram buffer is saved as pkt->raw;
The QUIC packet is stored as pkt->data and pkt->len (instead of pkt->buf)
(pkt->len is adjusted after parsing headers to actual length)
The pkt->pos is removed, pkt->raw->pos is used instead.
- added basic parsing of ACK, PING and PADDING frames on input
- added preliminary parsing of SHORT headers
The ngx_quic_output() is now called after processing of each input packet.
Frames are added into output queue according to their level: inital packets
go ahead of handshake and application data, so they can be merged properly.
The payload handler is called from both new, handshake and applicataion data
handlers (latter is a stub).
As was objserved with ngtcp2 client, Finished CRYPTO frame within Handshake
packet may not be sent for some reason if there's nothing to append on 1-RTT.
This results in unnecessary retransmit. To avoid this edge case, a non-zero
active_connection_id_limit transport parameter is now used to append datagram
with NEW_CONNECTION_ID 1-RTT frames.
Now handshake generates frames, and they are queued in c->quic->frames.
The ngx_quic_output() is called from ngx_quic_flush_flight() or manually,
processes the queue and encrypts all frames according to required encryption
level.
ngx_quic_hexdump0(log, format, buffer, buffer_size);
- logs hexdump of buffer to specified error log
ngx_quic_hexdump0(c->log, "this is foo:", foo.data, foo.len);
ngx_quic_hexdump(log, format, buffer, buffer_size, ...)
- same as hexdump0, but more format/args possible:
ngx_quic_hexdump(c->log, "a=%d b=%d, foo is:", foo.data, foo.len, a, b);
When "aio" or "aio threads" is used while processing the response body of an
in-memory background subrequest, the subrequest could be finalized with an aio
operation still in progress. Upon aio completion either parent request is
woken or the old r->write_event_handler is called again. The latter may result
in request errors. In either case post_subrequest handler is never called with
the full response body, which is typically expected when using in-memory
subrequests.
Currently in nginx background subrequests are created by the upstream module
and the mirror module. The issue does not manifest itself with these
subrequests because they are header-only. But it can manifest itself with
third-party modules which create in-memory background subrequests.
We used to have default error_page overwrite for 495, 496, and 497, so
a configuration like
error_page 495 /error;
will result in error 400, much like without any error_page configured.
The 494 status code was introduced later (in 3848:de59ad6bf557, nginx 0.9.4),
and relevant changes to ngx_http_core_error_page() were missed, resulting
in inconsistent behaviour of "error_page 494" - with error_page configured
it results in 494 being returned instead of 400.
Reported by Frank Liu,
http://mailman.nginx.org/pipermail/nginx/2020-February/058957.html.
Introduced ngx_quic_input() and ngx_quic_output() as interface between
nginx and protocol. They are the only functions that are exported.
While there, added copyrights.
In "co64" atom chunk start offset is a 64-bit unsigned integer. When trimming
the "mdat" atom, chunk offsets are casted to off_t values which are typically
64-bit signed integers. A specially crafted mp4 file with huge chunk offsets
may lead to off_t overflow and result in negative trim boundaries.
The consequences of the overflow are:
- Incorrect Content-Length header value in the response.
- Negative left boundary of the response file buffer holding the trimmed "mdat".
This leads to pread()/sendfile() errors followed by closing the client
connection.
On rare systems where off_t is a 32-bit integer, this scenario is also feasible
with the "stco" atom.
The fix is to add checks which make sure data chunks referenced by each track
are within the mp4 file boundaries. Additionally a few more checks are added to
ensure mp4 file consistency and log errors.
Duplicate "Host" headers were allowed in nginx 0.7.0 (revision b9de93d804ea)
as a workaround for some broken Motorola phones which used to generate
requests with two "Host" headers[1]. It is believed that this workaround
is no longer relevant.
[1] http://mailman.nginx.org/pipermail/nginx-ru/2008-May/017845.html
The "identity" transfer coding has been removed in RFC 7230. It is
believed that it is not used in real life, and at the same time it
provides a potential attack vector.
We anyway do not support more than one transfer encoding, so accepting
requests with multiple Transfer-Encoding headers doesn't make sense.
Further, we do not handle multiple headers, and ignore anything but
the first header.
Reported by Filippo Valsorda.
A connection could get stuck without timers if a client has partially sent
the HEADERS frame such that it was split on the individual header boundary.
In this case, it cannot be processed without the rest of the HEADERS frame.
The fix is to call ngx_http_v2_state_headers_save() in this case. Normally,
it would be called from the ngx_http_v2_state_header_block() handler on the
next iteration, when there is not enough data to continue processing. This
isn't the case if recv_buffer became empty and there's no more data to read.
With the recent change to prevent frames flood in d4448892a294,
nginx will finalize the connection with NGX_HTTP_V2_INTERNAL_ERROR
whenever flood is detected, causing nginx aborting or stopping if
the debug_points directive is used in nginx config.
Previous change 1ce3f01a4355 incorrectly introduced processing of the
ngx_posted_next_events queue at the end of operation, effectively making
posted next events a nop, since at the end of an event loop iteration
the queue is always empty. Correct approach is to move events to the
ngx_posted_events queue at an iteration start, as it was done previously.
Further, in some cases the c->read event might be already in the
ngx_posted_events queue, and calling ngx_post_event() with the
ngx_posted_next_events queue won't do anything. To make sure the event
will be correctly placed into the ngx_posted_next_events queue
we now check if it is already posted.
Introduced in 9d2ad2fb4423 available bytes handling in SSL relied
on connection read handler being overwritten to set the ready flag
and the amount of available bytes. This approach is, however, does
not work properly when connection read handler is changed, for example,
when switching to a next pipelined request, and can result in unexpected
connection timeouts, see here:
http://mailman.nginx.org/pipermail/nginx-devel/2019-December/012825.html
Fix is to introduce ngx_event_process_posted_next() instead, which
will set ready and available regardless of how event handler is set.
When ngx_http_v2_close_stream_handler() is used to retry stream close
after queued frames are sent, client timeouts on the stream can be
logged multiple times and/or in addition to already happened errors.
To resolve this, separate ngx_http_v2_retry_close_stream_handler()
was introduced, which does not try to log timeouts.
If a stream is closed with queued frames, it is possible that no further
write events will occur on the stream, leading to the socket leak.
To fix this, the stream's fake connection read handler is set to
ngx_http_v2_close_stream_handler(), to make sure that finalizing the
connection with ngx_http_v2_finalize_connection() will be able to
close the stream regardless of the current number of queued frames.
Additionally, the stream's fake connection fc->error flag is explicitly
set, so ngx_http_v2_handle_stream() will post a write event when queued
frames are finally sent even if stream flow control window is exhausted.
These checks were missed when chunked support was introduced. And also
added an explicit error message to ngx_http_dav_copy_move_handler()
(it was missed for some reason, in contrast to DELETE and MKCOL handlers).
While empty replacements were caught at run-time, parsing code
of the "rewrite" directive expects that a minimum length of the
"replacement" argument is 1.
If a rewritten URI has the null character, only a part of URI was
copied to a memory buffer allocated for path. In some setups this
could be exploited to expose uninitialized memory via the Location
header.
The "alias" directive cannot be used in the same location where URI
was rewritten. This has been detected in the "rewrite ... break"
case, but not when the standalone "break" directive was used.
This change also fixes proxy_pass with URI component in a similar
case:
location /aaa/ {
rewrite ^ /xxx/yyy;
break;
proxy_pass http://localhost:8080/bbb/;
}
Previously, the "/bbb/yyy" would be sent to a backend instead of
"/xxx/yyy". And if location's prefix was longer than the rewritten
URI, a segmentation fault might occur.
Previously, connections returned from keepalive cache had c->data
pointing to the keepalive cache item. While this shouldn't be a problem
for correct code, as c->data is not expected to be used before it is set,
explicitly clearing it might help to avoid confusion.
Previously only an rbtree was associated with a limit_conn. To make it
possible to associate more data with a limit_conn, shared context is introduced
similar to limit_req. Also, shared pool pointer is kept in a way similar to
limit_req.
Now a new structure ngx_proxy_protocol_t holds these fields. This allows
to add more PROXY protocol fields in the future without modifying the
connection structure.
With MinGW-w64, building 64-bit nginx binary with GCC 8 and above
results in warning due to cast of GetProcAddress() result to ngx_wsapoll_pt,
which GCC thinks is incorrect. Added intermediate cast to "void *" to
silence the warning.
FormatMessage() seems to return many errors which essentially indicate that
the language in question is not available. At least the following were
observed in the wild and during testing: ERROR_MUI_FILE_NOT_FOUND (15100)
(ticket #1868), ERROR_RESOURCE_TYPE_NOT_FOUND (1813). While documentation
says it should be ERROR_RESOURCE_LANG_NOT_FOUND (1815), this doesn't seem
to be the case.
As such, checking error code was removed, and as long as FormatMessage()
returns an error, we now always try the default language.
Added code to track number of bytes available in the socket.
This makes it possible to avoid looping for a long time while
working with fast enough peer when data are added to the socket buffer
faster than we are able to read and process data.
When kernel does not provide number of bytes available, it is
retrieved using ioctl(FIONREAD) as long as a buffer is filled by
SSL_read().
It is assumed that number of bytes returned by SSL_read() is close
to the number of bytes read from the socket, as we do not use
SSL compression. But even if it is not true for some reason, this
is not important, as we post an additional reading event anyway.
Note that data can be buffered at SSL layer, and it is not possible
to simply stop reading at some point and wait till the event will
be reported by the kernel again. This can be only done when there
are no data in SSL buffers, and there is no good way to find out if
it's the case.
Instead of trying to figure out if SSL buffers are empty, this patch
introduces events posted for the next event loop iteration - such
events will be processed only on the next event loop iteration,
after going into the kernel and retrieving additional events. This
seems to be simple and reliable approach.
This makes it possible to avoid looping for a long time while working
with a fast enough peer when data are added to the socket buffer faster
than we are able to read and process them (ticket #1431). This is
basically what we already do on FreeBSD with kqueue, where information
about the number of bytes in the socket buffer is returned by
the kevent() call.
With other event methods rev->available is now set to -1 when the socket
is ready for reading. Later in ngx_recv() and ngx_recv_chain(), if
full buffer is received, real number of bytes in the socket buffer is
retrieved using ioctl(FIONREAD). Reading more than this number of bytes
ensures that even with edge-triggered event methods the event will be
triggered again, so it is safe to stop processing of the socket and
switch to other connections.
Using ioctl(FIONREAD) only after reading a full buffer is an optimization.
With this approach we only call ioctl(FIONREAD) when there are at least
two recv()/readv() calls.
As long as there are data to read in the socket, yet the amount of data
is less than total size of the buffers in the chain, this saves one
unneeded read() syscall. Before this change, reading only stopped if
ngx_ssl_recv() returned no data, that is, two read() syscalls in a row
returned EAGAIN.
In SSL connections, data can be buffered by the SSL layer, and it is
wrong to avoid doing c->recv_chain() if c->read->available is 0 and
c->read->pending_eof is set. And tests show that the optimization in
question indeed can result in incorrect detection of premature connection
close if upstream closes the connection without sending a close notify
alert at the same time. Fix is to disable c->read->available optimization
for SSL connections.
This could happen when graceful shutdown configured by worker_shutdown_timeout
times out and is then followed by another timeout such as proxy_read_timeout.
In this case, the HEADERS frame is added to the output queue, but attempt to
send it fails (due to c->error forcibly set during graceful shutdown timeout).
This triggers request finalization which attempts to close the stream. But the
stream cannot be closed because there is a frame in the output queue, and the
connection cannot be finalized. This leaves the connection open without any
timer events leading to alert.
The fix is to post write event when sending output queue fails on c->error.
That will finalize the connection.
With this patch, all traffic over an HTTP/2 connection is counted in
the h2c->total_bytes field, and payload traffic is counted in
the h2c->payload_bytes field. As long as total traffic is many times
larger than payload traffic, we consider this to be a flood.
In 8df664ebe037, we've switched to maximizing stream window instead
of sending RST_STREAM. Since then handling of RST_STREAM with NO_ERROR
was fixed at least in Chrome, hence we switch back to using RST_STREAM.
This allows more effective rejecting of large bodies, and also minimizes
non-payload traffic to be accounted in the next patch.
Previously, if a response to the PTR request was cached, and ngx_resolver_dup()
failed to allocate memory for the resulting name, then the original node was
freed but left in expire_queue. A subsequent address resolving would end up
in a use-after-free memory access of the node either in ngx_resolver_expire()
or ngx_resolver_process_ptr(), when accessing it through expire_queue.
The fix is to leave the resolver node intact.
Don't waste server resources by sending RST_STREAM frames. Instead,
reject WINDOW_UPDATE frames with invalid zero increment by closing
connection with PROTOCOL_ERROR.
Don't waste server resources by sending RST_STREAM frames. Instead,
reject HEADERS and PRIORITY frames with self-dependency by closing
connection with PROTOCOL_ERROR.
When ngx_http_discard_request_body() call was added to ngx_http_send_response(),
there were no return codes other than NGX_OK and NGX_HTTP_INTERNAL_SERVER_ERROR.
Now it can also return NGX_HTTP_BAD_REQUEST, but ngx_http_send_response() still
incorrectly transforms it to NGX_HTTP_INTERNAL_SERVER_ERROR.
The fix is to propagate ngx_http_discard_request_body() errors.
As defined in HTTP/1.1, body chunks have the following ABNF:
chunk = chunk-size [ chunk-ext ] CRLF chunk-data CRLF
where chunk-data is a sequence of chunk-size octets.
With this change, chunk-data that doesn't end up with CRLF at chunk-size
offset will be treated as invalid, such as in the example provided below:
4
SEE-THIS-AND-
4
THAT
0
Previously, if unbuffered request body reading wasn't finished before
the request was redirected to a different location using error_page
or X-Accel-Redirect, and the request body is read again, this could
lead to disastrous effects, such as a duplicate post_handler call or
"http request count is zero" alert followed by a segmentation fault.
This happened in the following configuration (ticket #1819):
location / {
proxy_request_buffering off;
proxy_pass http://bad;
proxy_intercept_errors on;
error_page 502 = /error;
}
location /error {
proxy_pass http://backend;
}
Fixed excessive memory growth and CPU usage if stream windows are
manipulated in a way that results in generating many small DATA frames.
Fix is to limit the number of simultaneously allocated DATA frames.
Fixed uncontrolled memory growth if peer sends a stream of
headers with a 0-length header name and 0-length header value.
Fix is to reject headers with zero name length.
When using SMTP with SSL and resolver, read events might be enabled
during address resolving, leading to duplicate ngx_mail_ssl_handshake_handler()
calls if something arrives from the client, and duplicate session
initialization - including starting another resolving. This can lead
to a segmentation fault if the session is closed after first resolving
finished. Fix is to block read events while resolving.
Reported by Robert Norris,
http://mailman.nginx.org/pipermail/nginx/2019-July/058204.html.
After ac5a741d39cf it is now possible that after zstream.avail_out
reaches 0 and we allocate additional buffer, there will be no more data
to put into this buffer, triggering "zero size buf" alert. Fix is to
reset b->temporary flag in this case.
Additionally, an optimization added to avoid allocating additional buffer
in this case, by checking if last deflate() call returned Z_STREAM_END.
Note that checking for Z_STREAM_END by itself is not enough to fix alerts,
as deflate() can return Z_STREAM_END without producing any output if the
buffer is smaller than gzip trailer.
Reported by Witold Filipczyk,
http://mailman.nginx.org/pipermail/nginx-devel/2019-July/012469.html.
Due to shortcomings of the ccv->zero flag implementation in complex value
interface, length of the resulting string from ngx_http_complex_value()
might either not include terminating null character or include it,
so the only safe way to work with the result is to use it as a
null-terminated string.
Reported by Patrick Wollgast.
When "-" follows a parameter of maximum length, a single byte buffer
overflow happens, since the error branch does not check parameter length.
Fix is to avoid saving "-" to the parameter key, and instead use an error
message with "-" explicitly written. The message is mostly identical to
one used in similar cases in the preequal state.
Reported by Patrick Wollgast.
With level-triggered event methods it is important to specify
the NGX_CLOSE_EVENT flag to ngx_handle_read_event(), otherwise
the event won't be removed, resulting in CPU hog.
Reported by Patrick Wollgast.
To save memory hash code uses u_short to store resulting bucket sizes,
so maximum bucket size is limited to 65536 minus ngx_cacheline_size (larger
values will be aligned to 65536 which will overflow u_short). However,
there were no checks to enforce this, and using larger bucket sizes
resulted in overflows and segmentation faults.
Appropriate safety checks to enforce this added to ngx_hash_init().
When nginx is used with zlib patched with [1], which provides
integration with the future IBM Z hardware deflate acceleration, it ends
up computing CRC32 twice: one time in hardware, which always does this,
and one time in software by explicitly calling crc32().
crc32() calls were added in changesets 133:b27548f540ad ("nginx-0.0.1-
2003-09-24-23:51:12 import") and 134:d57c6835225c ("nginx-0.0.1-
2003-09-26-09:45:21 import") as part of gzip wrapping feature - back
then zlib did not support it.
However, since then gzip wrapping was implemented in zlib v1.2.0.4,
and it's already being used by nginx for log compression.
This patch replaces hand-written gzip wrapping with the one provided by
zlib. It simplifies the code, and makes it avoid computing CRC32 twice
when using hardware acceleration.
[1] https://github.com/madler/zlib/pull/410
Similarly to the change in 5491:74bfa803a5aa (1.5.9), we should accept
properly escaped URIs and unescape them as needed, else it is not possible
to handle URIs with question marks.
As we now have ctx->header_sent flag, it is further used to prevent
duplicate $r->send_http_header() calls, prevent output before sending
header, and $r->internal_redirect() after sending header.
Further, $r->send_http_header() protected from calls after
$r->internal_redirect().
Returning NGX_HTTP_INTERNAL_SERVER_ERROR if a perl code died after
sending header will lead to a "header already sent" alert. To avoid
it, we now check if header was already sent, and return NGX_ERROR
instead if it was.
Previously, redirects scheduled with $r->internal_redirect() were followed
even if the code then died. Now these are ignored and nginx will return
an error instead.
Variable handlers are not expected to send anything to the client, cannot
sleep or read body, and are not expected to modify the request. Added
appropriate protection to prevent accidental foot shooting.
Duplicate $r->sleep() and/or $r->has_request_body() calls result
in undefined behaviour (in practice, connection leaks were observed).
To prevent this, croak() added in appropriate places.
Previously, allocation errors in nginx.xs were more or less ignored,
potentially resulting in incorrect code execution in specific low-memory
conditions. This is changed to use ctx->error bit and croak(), similarly
to how output errors are now handled.
Note that this is mostly a cosmetic change, as Perl itself exits on memory
allocation errors, and hence nginx with Perl is hardly usable in low-memory
conditions.
When an error happens, the ctx->error bit is now set, and croak()
is called to terminate further processing. The ctx->error bit is
checked in ngx_http_perl_call_handler() to cancel further processing,
and is also checked in various output functions - to make sure these won't
be called if croak() was handled by an eval{} in perl code.
In particular, this ensures that output chain won't be called after
errors, as filters might not expect this to happen. This fixes some
segmentation faults under low memory conditions. Also this stops
request processing after filter finalization or request body reading
errors.
For cases where an HTTP error status can be additionally returned (for
example, 416 (Requested Range Not Satisfiable) from the range filter),
the ctx->status field is also added.
This ensures that correct ctx is always available, including after
filter finalization. In particular, this fixes a segmentation fault
with the following configuration:
location / {
image_filter test;
perl 'sub {
my $r = shift;
$r->send_http_header();
$r->print("foo\n");
$r->print("bar\n");
}';
}
This also seems to be the only way to correctly handle filter finalization
in various complex cases, for example, when embedded perl is used both
in the original handler and in an error page called after filter
finalization.
The NGX_DONE test in ngx_http_perl_handle_request() was introduced
in 1702:86bb52e28ce0, which also modified ngx_http_perl_call_handler()
to return NGX_DONE with c->destroyed. The latter part was then
removed in 3050:f54b02dbb12b, so NGX_DONE test is no longer needed.
Embedded perl does not set any request fields needed for conditional
requests processing. Further, filter finalization in the not_modified
filter can cause segmentation faults due to cleared ctx as in
ticket #1786.
Before 5fb1e57c758a (1.7.3) the not_modified filter was implicitly disabled
for perl responses, as r->headers_out.last_modified_time was -1. This
change restores this behaviour by using the explicit r->disable_not_modified
flag.
Note that this patch doesn't try to address perl module robustness against
filter finalization and other errors returned from filter chains. It should
be eventually reworked to handle errors instead of ignoring them.
A new directive limit_req_dry_run allows enabling the dry run mode. In this
mode requests are neither rejected nor delayed, but reject/delay status is
logged as usual.
In case of filter finalization, essential request fields like r->uri,
r->args etc could be changed, which affected the cache update subrequest.
Also, after filter finalization r->cache could be set to NULL, leading to
null pointer dereference in ngx_http_upstream_cache_background_update().
The fix is to create background cache update subrequest before sending the
cached response.
Since initial introduction in 1aeaae6e9446 (1.11.10) background cache update
subrequest was created after sending the cached response because otherwise it
blocked the parent request output. In 9552758a786e (1.13.1) background
subrequests were introduced to eliminate the delay before sending the final
part of the cached response. This also made it possible to create the
background cache update subrequest before sending the response.
Note that creating the subrequest earlier does not change the fact that in case
of filter finalization the background cache update subrequest will likely not
have enough time to successfully update the cache entry. Filter finalization
leads to the main request termination as soon the current iteration of request
processing is complete.
Previously, a variant not present in shared memory and stored on disk using a
secondary key was read using c->body_start from a variant stored with a main
key. This could result in critical errors "cache file .. has too long header".
Previously the stale-if-error extension of the Cache-Control upstream header
triggered the return of a stale response for all error conditions that can be
specified in the proxy_cache_use_stale directive. The list of these errors
includes both network/timeout/format errors, as well as some HTTP codes like
503, 504, 403, 429 etc. The latter prevented a cache entry from being updated
by a response with any of these HTTP codes during the stale-if-error period.
Now stale-if-error only works for network/timeout/format errors and ignores
the upstream HTTP code. The return of a stale response for certain HTTP codes
is still possible using the proxy_cache_use_stale directive.
This change also applies to the stale-while-revalidate extension of the
Cache-Control header, which triggers stale-if-error if it is missing.
Reported at
http://mailman.nginx.org/pipermail/nginx/2020-July/059723.html.
In ngx_http_range_singlepart_body() special buffers where passed
unmodified, including ones after the end of the range. As such,
if the last buffer of a response was sent separately as a special
buffer, two buffers with b->last_buf set were present in the response.
In particular, this might result in a duplicate final chunk when using
chunked transfer encoding (normally range filter and chunked transfer
encoding are not used together, but this may happen if there are trailers
in the response). This also likely to cause problems in HTTP/2.
Fix is to skip all special buffers after we've sent the last part of
the range requested. These special buffers are not meaningful anyway,
since we set b->last_buf in the buffer with the last part of the range,
and everything is expected to be flushed due to it.
Additionally, ngx_http_next_body_filter() is now called even
if no buffers are to be passed to it. This ensures that various
write events are properly propagated through the filter chain. In
particular, this fixes test failures observed with the above change
and aio enabled.
Filters are not allowed to change incoming chain links, and should allocate
their own links if any modifications are needed. Nevertheless
ngx_http_range_singlepart_body() modified incoming chain links in some
cases, notably at the end of the requested range.
No problems caused by this are currently known, mostly because of
limited number of possible modifications and the position of the range
body filter in the filter chain. Though this behaviour is clearly incorrect
and tests demonstrate that it can at least cause some proxy buffers being
lost when using proxy_force_ranges, leading to less effective handling
of responses.
Fix is to always allocate new chain links in ngx_http_range_singlepart_body().
Links are explicitly freed to ensure constant memory usage with long-lived
requests.
If a complex value is expected to be of type size_t, and the compiled
value is constant, the constant size_t value is remembered at compile
time.
The value is accessed through ngx_http_complex_value_size() which
either returns the remembered constant or evaluates the expression
and parses it as size_t.
Previously, ngx_utf8_decode() was called from ngx_utf8_length() with
incorrect length, potentially resulting in out-of-bounds read when
handling invalid UTF-8 strings.
In practice out-of-bounds reads are not possible though, as autoindex, the
only user of ngx_utf8_length(), provides null-terminated strings, and
ngx_utf8_decode() anyway returns an errors when it sees a null in the
middle of an UTF-8 sequence.
Reported by Yunbin Liu.
If OCSP stapling was enabled with dynamic certificate loading, with some
OpenSSL versions (1.0.2o and older, 1.1.0h and older; fixed in 1.0.2p,
1.1.0i, 1.1.1) a segmentation fault might happen.
The reason is that during an abbreviated handshake the certificate
callback is not called, but the certificate status callback was called
(https://github.com/openssl/openssl/issues/1662), leading to NULL being
returned from SSL_get_certificate().
Fix is to explicitly check SSL_get_certificate() result.
If X509_get_issuer_name() or X509_get_subject_name() returned NULL,
this could lead to a certificate reference leak. It cannot happen
in practice though, since each function returns an internal pointer
to a mandatory subfield of the certificate successfully decoded by
d2i_X509() during certificate message processing (closes#1751).
Previously the ngx_inet_resolve_host() function sorted addresses in a way that
IPv4 addresses came before IPv6 addresses. This was implemented in eaf95350d75c
(1.3.10) along with the introduction of getaddrinfo() which could resolve host
names to IPv6 addresses. Since the "listen" directive only used the first
address, sorting allowed to preserve "listen" compatibility with the previous
behavior and with the behavior of nginx built without IPv6 support. Now
"listen" uses all resolved addresses which makes sorting pointless.
Previously only one address was used by the listen directive handler even if
host name resolved to multiple addresses. Now a separate listening socket is
created for each address.
This makes it possible to provide certificates directly via variables
in ssl_certificate / ssl_certificate_key directives, without using
intermediate files.
It was accidentally introduced in 77436d9951a1 (1.15.9). In MSVC 2015
and more recent MSVC versions it triggers warning C4456 (declaration of
'pkey' hides previous local declaration). Previously, all such warnings
were resolved in 2a621245f4cf.
Reported by Steve Stevenson.
Server name callback is always called by OpenSSL, even
if server_name extension is not present in ClientHello. As such,
checking c->ssl->handshaked before the SSL_get_servername() result
should help to more effectively prevent renegotiation in
OpenSSL 1.1.0 - 1.1.0g, where neither SSL3_FLAGS_NO_RENEGOTIATE_CIPHERS
nor SSL_OP_NO_RENEGOTIATION is available.
The SSL_OP_NO_CLIENT_RENEGOTIATION option was introduced in LibreSSL 2.5.1.
Unlike OpenSSL's SSL_OP_NO_RENEGOTIATION, it only disables client-initiated
renegotiation, and hence can be safely used on all SSL contexts.
If ngx_pool_cleanup_add() fails, we have to clean just created SSL context
manually, thus appropriate call added.
Additionally, ngx_pool_cleanup_add() moved closer to ngx_ssl_create() in
the ngx_http_ssl_module, to make sure there are no leaks due to intermediate
code.
Notably this affects various allocation errors, and should generally
improve things if an allocation error actually happens during a callback.
Depending on the OpenSSL version, returning an error can result in
either SSL_R_CALLBACK_FAILED or SSL_R_CLIENTHELLO_TLSEXT error from
SSL_do_handshake(), so both errors were switched to the "info" level.
OpenSSL 1.1.1 does not save server name to the session if server name
callback returns anything but SSL_TLSEXT_ERR_OK, thus breaking
the $ssl_server_name variable in resumed sessions.
Since $ssl_server_name can be used even if we've selected the default
server and there are no other servers, it looks like the only viable
solution is to always return SSL_TLSEXT_ERR_OK regardless of the actual
result.
To fix things in the stream module as well, added a dummy server name
callback which always returns SSL_TLSEXT_ERR_OK.
A virtual server may have no SSL context if it does not have certificates
defined, so we have to use config of the ngx_http_ssl_module from the
SSL context in the certificate callback. To do so, it is now passed as
the argument of the callback.
The stream module doesn't really need any changes, but was modified as
well to match http code.
Dynamic certificates re-introduce problem with incorrect session
reuse (AKA "virtual host confusion", CVE-2014-3616), since there are
no server certificates to generate session id context from.
To prevent this, session id context is now generated from ssl_certificate
directives as specified in the configuration. This approach prevents
incorrect session reuse in most cases, while still allowing sharing
sessions across multiple machines with ssl_session_ticket_key set as
long as configurations are identical.
Passwords have to be copied to the configuration pool to be used
at runtime. Also, to prevent blocking on stdin (with "daemon off;")
an empty password list is provided.
To make things simpler, password handling was modified to allow
an empty array (with 0 elements and elts set to NULL) as an equivalent
of an array with 1 empty password.
To evaluate variables, a request is created in the certificate callback,
and then freed. To do this without side effects on the stub_status
counters and connection state, an additional function was introduced,
ngx_http_alloc_request().
Only works with OpenSSL 1.0.2+, since there is no SSL_CTX_set_cert_cb()
in older versions.
This makes it possible to reuse certificate loading at runtime,
as introduced in the following patches.
Additionally, this improves error logging, so nginx will now log
human-friendly messages "cannot load certificate" instead of only
referring to sometimes cryptic names of OpenSSL functions.
The "(SSL:)" snippet currently appears in logs when nginx code uses
ngx_ssl_error() to log an error, but OpenSSL's error queue is empty.
This can happen either because the error wasn't in fact from OpenSSL,
or because OpenSSL did not indicate the error in the error queue
for some reason.
In particular, currently "(SSL:)" can be seen in errors at least in
the following cases:
- When SSL_write() fails due to a syscall error,
"[info] ... SSL_write() failed (SSL:) (32: Broken pipe)...".
- When loading a certificate with no data in it,
"[emerg] PEM_read_bio_X509_AUX(...) failed (SSL:)".
This can easily happen due to an additional empty line before
the end line, so all lines of the certificate are interpreted
as header lines.
- When trying to configure an unknown curve,
"[emerg] SSL_CTX_set1_curves_list("foo") failed (SSL:)".
Likely there are other cases as well.
With this change, "(SSL:...)" will be only added to the error message
if there is something in the error queue. This is expected to make
logs more readable in the above cases. Additionally, with this change
it is now possible to use ngx_ssl_error() to log errors when some
of the possible errors are not from OpenSSL and not expected to have
anything in the error queue.
Checking multiple errors at once is a bad practice, as in general
it is not guaranteed that an object can be used after the error.
In this particular case, checking errors after multiple allocations
can result in excessive errors being logged when there is no memory
available.
On Windows, connect() errors are only reported via exceptfds descriptor set
from select(). Previously exceptfds was set to NULL, and connect() errors
were not detected at all, so connects to closed ports were waiting till
a timeout occurred.
Since ongoing connect() means that there will be a write event active,
except descriptor set is copied from the write one. While it is possible
to construct except descriptor set as a concatenation of both read and write
descriptor sets, this looks unneeded.
With this change, connect() errors are properly detected now when using
select(). Note well that it is not possible to detect connect() errors with
WSAPoll() (see https://daniel.haxx.se/blog/2012/10/10/wsapoll-is-broken/).
WSAPoll() is only available with Windows Vista and newer (and only
available during compilation if _WIN32_WINNT >= 0x0600). To make
sure the code works with Windows XP, we do not redefine _WIN32_WINNT,
but instead load WSAPoll() dynamically if it is not available during
compilation.
Also, sockets are not guaranteed to be small integers on Windows.
So an index array is used instead of NGX_USE_FD_EVENT to map
events to connections.
Previously, the code incorrectly assumed "ngx_event_t *" elements
instead of "struct pollfd".
This is mostly cosmetic change, as this code is never called now.
Previously, when using proxy_upload_rate and proxy_download_rate, the buffer
size for reading from a socket could be reduced as a result of rate limiting.
For connection-oriented protocols this behavior is normal since unread data will
normally be read at the next iteration. But for datagram-oriented protocols
this is not the case, and unread part of the datagram is lost.
Now buffer size is not limited for datagrams. Rate limiting still works in this
case by delaying the next reading event.
A shared connection does not own its file descriptor, which means that
ngx_handle_read_event/ngx_handle_write_event calls should do nothing for it.
Currently the c->shared flag is checked in several places in the stream proxy
module prior to calling these functions. However it was not done everywhere.
Missing checks could lead to calling
ngx_handle_read_event/ngx_handle_write_event on shared connections.
The problem manifested itself when using proxy_upload_rate and resulted in
either duplicate file descriptor error (e.g. with epoll) or incorrect further
udp packet processing (e.g. with kqueue).
The fix is to set and reset the event active flag in a way that prevents
ngx_handle_read_event/ngx_handle_write_event from scheduling socket events.
Previous interface of ngx_open_dir() assumed that passed directory name
has a room for NGX_DIR_MASK at the end (NGX_DIR_MASK_LEN bytes). While all
direct users of ngx_dir_open() followed this interface, this also implied
similar requirements for indirect uses - in particular, via ngx_walk_tree().
Currently none of ngx_walk_tree() uses provides appropriate space, and
fixing this does not look like a right way to go. Instead, ngx_dir_open()
interface was changed to not require any additional space and use
appropriate allocations instead.
If SSL_write_early_data() returned SSL_ERROR_WANT_WRITE, stop further reading
using a newly introduced c->ssl->write_blocked flag, as otherwise this would
result in SSL error "ssl3_write_bytes:bad length". Eventually, normal reading
will be restored by read event posted from successful SSL_write_early_data().
While here, place "SSL_write_early_data: want write" debug on the path.
Previously, if an SRV record was successfully resolved, but all of its A
records failed to resolve, NXDOMAIN was returned to the caller, which is
considered a successful resolve rather than an error. This could result in
losing the result of a previous successful resolve by the caller.
Now NXDOMAIN is only returned if at least one A resolve completed with this
code. Otherwise the error state of the first A resolve is returned.
Previously, unnamed regex captures matched in the parent request, were not
available in a cloned subrequest. Now 3 fields related to unnamed captures
are copied to a cloned subrequest: r->ncaptures, r->captures and
r->captures_data. Since r->captures cannot be changed by either request after
creating a clone, a new flag r->realloc_captures is introduced to force
reallocation of r->captures.
The issue was reported as a proxy_cache_background_update misbehavior in
http://mailman.nginx.org/pipermail/nginx/2018-December/057251.html.
In the past, there were several security issues which resulted in
worker process memory disclosure due to buffers with negative size.
It looks reasonable to check for such buffers in various places,
much like we already check for zero size buffers.
While here, removed "#if 1 / #endif" around zero size buffer checks.
It looks highly unlikely that we'll disable these checks anytime soon.
On 32-bit platforms mp4->buffer_pos might overflow when a large
enough (close to 4 gigabytes) atom is being skipped, resulting in
incorrect memory addesses being read further in the code. In most
cases this results in harmless errors being logged, though may also
result in a segmentation fault if hitting unmapped pages.
To address this, ngx_mp4_atom_next() now only increments mp4->buffer_pos
up to mp4->buffer_end. This ensures that overflow cannot happen.
Variables now do not depend on presence of the HTTP status code in response.
If the corresponding event occurred, variables contain time between request
creation and the event, and "-" otherwise.
Previously, intermediate value of the $upstream_response_time variable held
unix timestamp.
The directive allows to drop binding between a client and existing UDP stream
session after receiving a specified number of packets. First packet from the
same client address and port will start a new session. Old session continues
to exist and will terminate at moment defined by configuration: either after
receiving the expected number of responses, or after timeout, as specified by
the "proxy_responses" and/or "proxy_timeout" directives.
By default, proxy_requests is zero (disabled).
An attack that continuously switches HTTP/2 connection between
idle and active states can result in excessive CPU usage.
This is because when a connection switches to the idle state,
all of its memory pool caches are freed.
This change limits the maximum allowed number of idle state
switches to 10 * http2_max_requests (i.e., 10000 by default).
This limits possible CPU usage in one connection, and also
imposes a limit on the maximum lifetime of a connection.
Initially reported by Gal Goldshtein from F5 Networks.
Fixed uncontrolled memory growth in case peer is flooding us with
some frames (e.g., SETTINGS and PING) and doesn't read data. Fix
is to limit the number of allocated control frames.
Previously there was no validation for the size of a 64-bit atom
in an mp4 file. This could lead to a CPU hog when the size is 0,
or various other problems due to integer underflow when calculating
atom data size, including segmentation fault or worker process
memory disclosure.
Size of a shared memory zones must be at least two pages - one page
for slab allocator internal data, and another page for actual allocations.
Using 8192 instead is wrong, as there are systems with page sizes other
than 4096.
Note well that two pages is usually too low as well. In particular, cache
is likely to use two allocations of different sizes for global structures,
and at least four pages will be needed to properly allocate cache nodes.
Except in a few very special cases, with keys zone of just two pages nginx
won't be able to start. Other uses of shared memory impose a limit
of 8 pages, which provides some room for global allocations. This patch
doesn't try to address this though.
Inspired by ticket #1665.
With maximum version explicitly set, TLSv1.3 will not be unexpectedly
enabled if nginx compiled with OpenSSL 1.1.0 (without TLSv1.3 support)
will be run with OpenSSL 1.1.1 (with TLSv1.3 support).
In e3ba4026c02d (1.15.4) nginx own renegotiation checks were disabled
if SSL_OP_NO_RENEGOTIATION is available. But since SSL_OP_NO_RENEGOTIATION
is only set on a connection, not in an SSL context, SSL_clear_option()
removed it as long as a matching virtual server was found. This resulted
in a segmentation fault similar to the one fixed in a6902a941279 (1.9.8),
affecting nginx built with OpenSSL 1.1.0h or higher.
To fix this, SSL_OP_NO_RENEGOTIATION is now explicitly set in
ngx_http_ssl_servername() after adjusting options. Additionally, instead
of c->ssl->renegotiation we now check c->ssl->handshaked, which seems
to be a more correct flag to test, and will prevent the segmentation fault
from happening even if SSL_OP_NO_RENEGOTIATION is not working.
The "no suitable signature algorithm" errors are reported by OpenSSL 1.1.1
when using TLSv1.3 if there are no shared signature algorithms. In
particular, this can happen if the client limits available signature
algorithms to something we don't have a certificate for, or to an empty
list. For example, the following command:
openssl s_client -connect 127.0.0.1:8443 -sigalgs rsa_pkcs1_sha1
will always result in the "no suitable signature algorithm" error
as the "rsa_pkcs1_sha1" algorithm refers solely to signatures which
appear in certificates and not defined for use in TLS 1.3 handshake
messages.
The SSL_R_NO_COMMON_SIGNATURE_ALGORITHMS error is what BoringSSL returns
in the same situation.
The "no suitable key share" errors are reported by OpenSSL 1.1.1 when
using TLSv1.3 if there are no shared groups (that is, elliptic curves).
In particular, it is easy enough to trigger by using only a single
curve in ssl_ecdh_curve:
ssl_ecdh_curve secp384r1;
and using a different curve in the client:
openssl s_client -connect 127.0.0.1:443 -curves prime256v1
On the client side it is seen as "sslv3 alert handshake failure",
"SSL alert number 40":
0:error:14094410:SSL routines:ssl3_read_bytes:sslv3 alert handshake failure:ssl/record/rec_layer_s3.c:1528:SSL alert number 40
It can be also triggered with default ssl_ecdh_curve by using a curve
which is not in the default list (X25519, prime256v1, X448, secp521r1,
secp384r1):
openssl s_client -connect 127.0.0.1:8443 -curves brainpoolP512r1
Given that many clients hardcode prime256v1, these errors might become
a common problem with TLSv1.3 if ssl_ecdh_curve is redefined. Previously
this resulted in not using ECDH with such clients, but with TLSv1.3 it
is no longer possible and will result in a handshake failure.
The SSL_R_NO_SHARED_GROUP error is what BoringSSL returns in the same
situation.
Seen at:
https://serverfault.com/questions/932102/nginx-ssl-handshake-error-no-suitable-key-share
Previously, configurations with typo, for example
fastcgi_cache_valid 200301 302 5m;
successfully pass configuration test. Adding check for status
codes > 599, and such configurations are now properly rejected.
The bgcolor attribute overrides compatibility settings in browsers
and leads to undesirable behavior when the default font color is set
to white in the browser, since font-color is not also overridden.
Following 7319:dcab86115261, as long as SSL_OP_NO_RENEGOTIATION is
defined, it is OpenSSL library responsibility to prevent renegotiation,
so the checks are meaningless.
Additionally, with TLSv1.3 OpenSSL tends to report SSL_CB_HANDSHAKE_START
at various unexpected moments - notably, on KeyUpdate messages and
when sending tickets. This change prevents unexpected connection
close on KeyUpdate messages and when finishing handshake with upcoming
early data changes.
Trying to look into r->err_status in the "return" directive
makes it behave differently than real errors generated in other
parts of the code, and is an endless source of various problems.
This behaviour was introduced in 726:7b71936d5299 (0.4.4) with
the comment "fix: "return" always overrode "error_page" response code".
It is not clear if there were any real cases this was expected to fix,
but there are several cases which are broken due to this change, some
previously fixed (4147:7f64de1cc2c0).
In ticket #1634, the problem is that when r->err_status is set to
a non-special status code, it is not possible to return a response
by simply returning r->err_status. If this is the case, the only
option is to return script's e->status instead. An example
configuration:
location / {
error_page 404 =200 /err502;
return 404;
}
location = /err502 {
return 502;
}
After the change, such a configuration will properly return
standard 502 error, much like it happens when a 502 error is
generated by proxy_pass.
This also fixes the following configuration to properly close
connection as clearly requested by "return 444":
location / {
error_page 404 /close;
return 404;
}
location = /close {
return 444;
}
Previously, this required "error_page 404 = /close;" to work
as intended.
Socket leak was observed in the following configuration:
error_page 400 = /close;
location = /close {
return 444;
}
The problem is that "return 444" triggers termination of the request,
and due to error_page termination thinks that it needs to use a posted
request to clear stack. But at the early request processing where 400
errors are generated there are no ngx_http_run_posted_requests() calls,
so the request is only terminated after an external event.
Variants of the problem include "error_page 497" instead (ticket #695)
and various other errors generated during early request processing
(405, 414, 421, 494, 495, 496, 501, 505).
The same problem can be also triggered with "return 499" and "return 408"
as both codes trigger ngx_http_terminate_request(), much like "return 444".
To fix this, the patch adds ngx_http_run_posted_requests() calls to
ngx_http_process_request_line() and ngx_http_process_request_headers()
functions, and to ngx_http_v2_run_request() and ngx_http_v2_push_stream()
functions in HTTP/2.
Since the ngx_http_process_request() function is now only called via
other functions which call ngx_http_run_posted_requests(), the call
there is no longer needed and was removed.
It is possible that after SSL_read() will return SSL_ERROR_WANT_WRITE,
further calls will return SSL_ERROR_WANT_READ without reading any
application data. We have to call ngx_handle_write_event() and
switch back to normal write handling much like we do if there are some
application data, or the write there will be reported again and again.
Similarly, we have to switch back to normal read handling if there
is saved read handler and SSL_write() returns SSL_ERROR_WANT_WRITE.
While SSL_read() most likely to return SSL_ERROR_WANT_WRITE (and SSL_write()
accordingly SSL_ERROR_WANT_READ) during an SSL renegotiation, it is
not necessary mean that a renegotiation was started. In particular,
it can never happen during a renegotiation or can happen multiple times
during a renegotiation.
Because of the above, misleading "peer started SSL renegotiation" info
messages were replaced with "SSL_read: want write" and "SSL_write: want read"
debug ones.
Additionally, "SSL write handler" and "SSL read handler" are now logged
by the SSL write and read handlers, to make it easier to understand that
temporary SSL handlers are called instead of normal handlers.
The "do { c->recv() } while (c->read->ready)" form used in the
ngx_http_lingering_close_handler() is not really correct, as for
example with SSL c->read->ready may be still set when returning NGX_AGAIN
due to SSL_ERROR_WANT_WRITE. Therefore the above might be an infinite loop.
This doesn't really matter in lingering close, as we shutdown write side
of the socket anyway and also disable renegotiation (and even without shutdown
and with renegotiation it requires using very large certificate chain and
tuning socket buffers to trigger SSL_ERROR_WANT_WRITE). But for the sake of
correctness added an NGX_AGAIN check.
If sending request body was not completed (u->request_body_sent is not set),
the upstream keepalive module won't save such a connection. However, it
is theoretically possible (though highly unlikely) that sending of some
control frames can be blocked after the request body was sent. The
ctx->output_blocked flag introduced to disable keepalive in such cases.
The code is now able to parse additional control frames after
the response is received, and can send control frames as well.
This fixes keepalive problems as observed with grpc-c, which can
send window update and ping frames after the response, see
http://mailman.nginx.org/pipermail/nginx/2018-August/056620.html.
Previously the preread phase code ignored NGX_AGAIN value returned from
c->recv() and relied only on c->read->ready. But this flag is not reliable and
should only be checked for optimization purposes. For example, when using
SSL, c->read->ready may be set when no input is available. This can lead to
calling preread handler infinitely in a loop.
The problem does not manifest itself currently, because in case of
non-buffered reading, chain link created by u->create_request method
consists of a single element.
Found by PVS-Studio.
The directive configures maximum number of requests allowed on
a connection kept in the cache. Once a connection reaches the number
of requests configured, it is no longer saved to the cache.
The default is 100.
Much like keepalive_requests for client connections, this is mostly
a safeguard to make sure connections are closed periodically and the
memory allocated from the connection pool is freed.
The directive configures maximum time a connection can be kept in the
cache. By configuring a time which is smaller than the corresponding
timeout on the backend side one can avoid the race between closing
a connection by the backend and nginx trying to use the same connection
to send a request at the same time.
LibreSSL 2.8.0 "added const annotations to many existing APIs from OpenSSL,
making interoperability easier for downstream applications". This includes
the const change in the SSL_CTX_sess_set_get_cb() callback function (see
9dd43f4ef67e), which breaks compilation.
To fix this, added a condition on how we redefine OPENSSL_VERSION_NUMBER
when working with LibreSSL (see 382fc7069e3a). With LibreSSL 2.8.0,
we now set OPENSSL_VERSION_NUMBER to 0x1010000fL (OpenSSL 1.1.0), so the
appropriate conditions in the code will use "const" as it happens with
OpenSSL 1.1.0 and later versions.
There are clients which cannot handle HPACK's dynamic table size updates
as added in 12cadc4669a7 (1.13.6). Notably, old versions of OkHttp library
are known to fail on it (ticket #1397).
This change makes it possible to work with such clients by only sending
dynamic table size updates in response to SETTINGS_HEADER_TABLE_SIZE. As
a downside, clients which do not use SETTINGS_HEADER_TABLE_SIZE will
continue to maintain default 4k table.
Previously, a chunk of spaces larger than NGX_CONF_BUFFER (4096 bytes)
resulted in the "too long parameter" error during parsing such a
configuration. This was because the code only set start and start_line
on non-whitespace characters, and hence adjacent whitespace characters
were preserved when reading additional data from the configuration file.
Fix is to always move start and start_line if the last character was
a space.
Early data AKA 0-RTT mode is enabled as long as "ssl_early_data on" is
specified in the configuration (default is off).
The $ssl_early_data variable evaluates to "1" if the SSL handshake
isn't yet completed, and can be used to set the Early-Data header as
per draft-ietf-httpbis-replay-04.
BoringSSL currently requires SSL_CTX_set_max_proto_version(TLS1_3_VERSION)
to be able to enable TLS 1.3. This is because by default max protocol
version is set to TLS 1.2, and the SSL_OP_NO_* options are merely used
as a blacklist within the version range specified using the
SSL_CTX_set_min_proto_version() and SSL_CTX_set_max_proto_version()
functions.
With this change, we now call SSL_CTX_set_max_proto_version() with an
explicit maximum version set. This enables TLS 1.3 with BoringSSL.
As a side effect, this change also limits maximum protocol version to
the newest protocol we know about, TLS 1.3. This seems to be a good
change, as enabling unknown protocols might have unexpected results.
Additionally, we now explicitly call SSL_CTX_set_min_proto_version()
with 0. This is expected to help with Debian system-wide default
of MinProtocol set to TLSv1.2, see
http://mailman.nginx.org/pipermail/nginx-ru/2017-October/060411.html.
Note that there is no SSL_CTX_set_min_proto_version macro in BoringSSL,
so we call SSL_CTX_set_min_proto_version() and SSL_CTX_set_max_proto_version()
as long as the TLS1_3_VERSION macro is defined.
The behaviour is now in line with COPY of a directory with contents,
which preserves access masks on individual files, as well as the "cp"
command.
Requested by Roman Arutyunyan.
This fixes wrong permissions and file time after cross-device MOVE
in the DAV module (ticket #1577). Broken in 8101d9101ed8 (0.8.9) when
cross-device copying was introduced in ngx_ext_rename_file().
With this change, ngx_copy_file() always calls ngx_set_file_time(),
either with the time provided, or with the time from the original file.
This is considered acceptable given that copying the file is costly anyway,
and optimizing cases when we do not need to preserve time will require
interface changes.
Previously, ngx_open_file(NGX_FILE_CREATE_OR_OPEN) was used, resulting
in destination file being partially rewritten if exists. Notably,
this affected WebDAV COPY command (ticket #1576).
Previously, "%uA" was used, which corresponds to ngx_atomic_uint_t.
Size of ngx_atomic_uint_t can be easily different from uint64_t,
leading to undefined results.
In TLSv1.3, NewSessionTicket messages arrive after the handshake and
can come at any time. Therefore we use a callback to save the session
when we know about it. This approach works for < TLSv1.3 as well.
The callback function is set once per location on merge phase.
Since SSL_get_session() in BoringSSL returns an unresumable session for
TLSv1.3, peer save_session() methods have been updated as well to use a
session supplied within the callback. To preserve API, the session is
cached in c->ssl->session. It is preferably accessed in save_session()
methods by ngx_ssl_get_session() and ngx_ssl_get0_session() wrappers.