So that connections are protected from failing from on-path attacks.
Decryption failure of long packets used during handshake still leads
to connection close since it barely makes sense to handle them there.
A previously used undefined error code is now replaced with the generic one.
Note that quic-transport prescribes keeping connection intact, discarding such
QUIC packets individually, in the sense that coalesced packets could be there.
This is selectively handled in the next change.
The patch removes remnants of the old state tracking mechanism, which did
not take into account assimetry of read/write states and was not very
useful.
The encryption state now is entirely tracked using SSL_quic_read/write_level().
quic-transport draft 29:
section 7:
* authenticated negotiation of an application protocol (TLS uses
ALPN [RFC7301] for this purpose)
...
Endpoints MUST explicitly negotiate an application protocol. This
avoids situations where there is a disagreement about the protocol
that is in use.
section 8.1:
When using ALPN, endpoints MUST immediately close a connection (see
Section 10.3 of [QUIC-TRANSPORT]) with a no_application_protocol TLS
alert (QUIC error code 0x178; see Section 4.10) if an application
protocol is not negotiated.
Changes in ngx_quic_close_quic() function are required to avoid attempts
to generated and send packets without proper keys, what happens in case
of failed ALPN check.
quic-transport draft 29, section 14:
QUIC depends upon a minimum IP packet size of at least 1280 bytes.
This is the IPv6 minimum size [RFC8200] and is also supported by most
modern IPv4 networks. Assuming the minimum IP header size, this
results in a QUIC maximum packet size of 1232 bytes for IPv6 and 1252
bytes for IPv4.
Since the packet size can change during connection lifetime, the
ngx_quic_max_udp_payload() function is introduced that currently
returns minimal allowed size, depending on address family.
quic-tls, 8.2:
The quic_transport_parameters extension is carried in the ClientHello
and the EncryptedExtensions messages during the handshake. Endpoints
MUST send the quic_transport_parameters extension; endpoints that
receive ClientHello or EncryptedExtensions messages without the
quic_transport_parameters extension MUST close the connection with an
error of type 0x16d (equivalent to a fatal TLS missing_extension
alert, see Section 4.10).
Clearing cache based on free space left on a file system is
expected to allow better disk utilization in some cases, notably
when disk space might be also used for something other than nginx
cache (including nginx own temporary files) and while loading
cache (when cache size might be inaccurate for a while, effectively
disabling max_size cache clearing).
Based on a patch by Adam Bambuch.
With XFS, using "allocsize=64m" mount option results in large preallocation
being reported in the st_blocks as returned by fstat() till the file is
closed. This in turn results in incorrect cache size calculations and
wrong clearing based on max_size.
To avoid too aggressive cache clearing on such volumes, st_blocks values
which result in sizes larger than st_size and eight blocks (an arbitrary
limit) are no longer trusted, and we use st_size instead.
The ngx_de_fs_size() counterpart is intentionally not modified, as
it is used on closed files and hence not affected by this problem.
NFS on Linux is known to report wsize as a block size (in both f_bsize
and f_frsize, both in statfs() and statvfs()). On the other hand,
typical file system block sizes on Linux (ext2/ext3/ext4, XFS) are limited
to pagesize. (With FAT, block sizes can be at least up to 512k in
extreme cases, but this doesn't really matter, see below.)
To avoid too aggressive cache clearing on NFS volumes on Linux, block
sizes larger than pagesize are now ignored.
Note that it is safe to ignore large block sizes. Since 3899:e7cd13b7f759
(1.0.1) cache size is calculated based on fstat() st_blocks, and rounding
to file system block size is preserved mostly for Windows.
Note well that on other OSes valid block sizes seen are at least up
to 65536. In particular, UFS on FreeBSD is known to work well with block
and fragment sizes set to 65536.
When validating second and further certificates, ssl callback could be called
twice to report the error. After the first call client connection is
terminated and its memory is released. Prior to the second call and in it
released connection memory is accessed.
Errors triggering this behavior:
- failure to create the request
- failure to start resolving OCSP responder name
- failure to start connecting to the OCSP responder
The fix is to rearrange the code to eliminate the second call.
The flush flag was not set when forwarding the request body to the uwsgi
server. When using uwsgi_pass suwsgi://..., this causes the uwsgi server
to wait indefinitely for the request body and eventually time out due to
SSL buffering.
This is essentially the same change as 4009:3183165283cc, which was made
to ngx_http_proxy_module.c.
This will fix the uwsgi bug https://github.com/unbit/uwsgi/issues/1490.
This is a temporary workaround, proper retransmission mechanism based on
quic-recovery rfc draft is yet to be implemented.
Currently hardcoded value is too small for real networks. The patch
sets static PTO, considering rtt of ~333ms, what gives about 1s.
This ensures that certificate verification is properly logged to debug
log during upstream server certificate verification. This should help
with debugging various certificate issues.
Listening UNIX sockets were not removed on graceful shutdown, preventing
the next runs. The fix is to replace the custom socket closing code in
ngx_master_process_cycle() by the ngx_close_listening_sockets() call.
When changing binary, sending a SIGTERM to the new binary's master process
should not remove inherited UNIX sockets unless the old binary's master
process has exited.
Also, if both are present, require that they have the same value. These
requirements are specified in HTTP/3 draft 28.
Current implementation of HTTP/2 treats ":authority" and "Host"
interchangeably. New checks only make sure at least one of these values is
present in the request. A similar check existed earlier and was limited only
to HTTP/1.1 in 38c0898b6df7.
The flags was originally added by 8f038068f4bc, and is propagated correctly
in the stream module. With QUIC introduction, http module now uses datagram
sockets as well, thus the fix.
Previously, invalid connection preface errors were only logged at debug
level, providing no visible feedback, in particular, when a plain text
HTTP/2 listening socket is erroneously used for HTTP/1.x connections.
Now these are explicitly logged at the info level, much like other
client-related errors.
When enabled, certificate status is stored in cache and is used to validate
the certificate in future requests.
New directive ssl_ocsp_cache is added to configure the cache.
OCSP validation for client certificates is enabled by the "ssl_ocsp" directive.
OCSP responder can be optionally specified by "ssl_ocsp_responder".
When session is reused, peer chain is not available for validation.
If the verified chain contains certificates from the peer chain not available
at the server, validation will fail.
Previously only the first responder address was used per each stapling update.
Now, in case of a network or parsing error, next address is used.
This also fixes the issue with unsupported responder address families
(ticket #1330).
Preserving pointers within the client buffer is not needed for HTTP/3 because
all data is either allocated from pool or static. Unlike with HTTP/1, data
typically cannot be referenced directly within the client buffer. Trying to
preserve NULLs or external pointers lead to broken pointers.
Also, reverted changes in ngx_http_alloc_large_header_buffer() not relevant
for HTTP/3 to minimize diff to mainstream.
New field r->parse_start is introduced to substitute r->request_start and
r->header_name_start for request length accounting. These fields only work for
this purpose in HTTP/1 because HTTP/1 request line and header line start with
these values.
Also, error logging is now fixed to output the right part of the request.
As per HTTP/3 draft 27, a request or response containing uppercase header
field names MUST be treated as malformed. Also, existing rules applied
when parsing HTTP/1 header names are also applied to HTTP/3 header names:
- null character is not allowed
- underscore character may or may not be treated as invalid depending on the
value of "underscores_in_headers"
- all non-alphanumeric characters with the exception of '-' are treated as
invalid
Also, the r->locase_header field is now filled while parsing an HTTP/3
header.
Error logging for invalid headers is fixed as well.
The first one parses pseudo-headers and is analagous to the request line
parser in HTTP/1. The second one parses regular headers and is analogous to
the header parser in HTTP/1.
Additionally, error handling of client passing malformed uri is now fixed.
The function ngx_http_parse_chunked() is also called from the proxy module to
parse the upstream response. It should always parse HTTP/1 body in this case.
According to quic-transport draft 28 section 10.3.1:
When sending CONNECTION_CLOSE, the goal is to ensure that the peer
will process the frame. Generally, this means sending the frame in a
packet with the highest level of packet protection to avoid the
packet being discarded. After the handshake is confirmed (see
Section 4.1.2 of [QUIC-TLS]), an endpoint MUST send any
CONNECTION_CLOSE frames in a 1-RTT packet. However, prior to
confirming the handshake, it is possible that more advanced packet
protection keys are not available to the peer, so another
CONNECTION_CLOSE frame MAY be sent in a packet that uses a lower
packet protection level.
There is no need in a separate type for the QUIC connection state.
The only state not found in the SSL library is NGX_QUIC_ST_UNAVAILABLE,
which is actually a flag used by the ngx_quic_close_quic() function
to prevent cleanup of uninitialized connection.
Sections 4.10.1 and 4.10.2 of quic transport describe discarding of initial
and handshake keys. Since the keys are discarded, we no longer need
to retransmit packets and corresponding queues should be emptied.
This patch removes previously added workaround that did not require
acknowledgement for initial packets, resulting in avoiding retransmission,
which is wrong because a packet could be lost and we have to retransmit it.
It was possible that retransmit timer was not set after the first
retransmission attempt, due to ngx_quic_retransmit() did not set
wait time properly, and the condition in retransmit handler was incorrect.
Section 17.2 and 17.3 of QUIC transport:
Fixed bit: Packets containing a zero value for this bit are not
valid packets in this version and MUST be discarded.
Reserved bit: An endpoint MUST treat receipt of a packet that has
a non-zero value for these bits, after removing both packet and
header protection, as a connection error of type PROTOCOL_VIOLATION.
When an error occurs, then c->quic->error field may be populated
with an appropriate error code, and the CONNECTION CLOSE frame will be
sent to the peer before the connection is closed. Otherwise, the error
treated as internal and INTERNAL_ERROR code is sent.
The pkt->error field is populated by functions processing packets to
indicate an error when it does not fit into pass/fail return status.
As per QUIC transport, the first flight of 0-RTT packets obviously uses same
Destination and Source Connection ID values as the client's first Initial.
The fix is to match 0-RTT against original DCID after it has been switched.
The ordered frame handler is always called for the existing stream, as it is
allocated from this stream. Instead of searching stream by id, pointer to the
stream node is passed.
The idea is to skip any zeroes that follow valid QUIC packet. Currently such
behavior can be only observed with Firefox which sends zero-padded initial
packets.
Now there's no need to annotate every frame in ACK-eliciting packet.
Sending ACK was moved to the first place, so that queueing ACK frame
no longer postponed up to the next packet after pushing STREAM frames.
+ added "quic" prefix to all error messages
+ rephrased some messages
+ removed excessive error logging from frame parser
+ added ngx_quic_check_peer() function to check proper source/destination
match and do it one place
- the ngx_quic_hexdump0() macro is renamed to ngx_quic_hexdump();
the original ngx_quic_hexdump() macro with variable argument is
removed, extra information is logged normally, with ngx_log_debug()
- all labels in hex dumps are prefixed with "quic"
- the hexdump format is simplified, length is moved forward to avoid
situations when the dump is truncated, and length is not shown
- ngx_quic_flush_flight() function contents is debug-only, placed under
NGX_DEBUG macro to avoid "unused variable" warnings from compiler
- frame names in labels are capitalized, similar to other places
+ all dumps are moved under one of the following macros (undefined by default):
NGX_QUIC_DEBUG_PACKETS
NGX_QUIC_DEBUG_FRAMES
NGX_QUIC_DEBUG_FRAMES_ALLOC
NGX_QUIC_DEBUG_CRYPTO
+ all QUIC debug messages got "quic " prefix
+ all input frames are reported as "quic frame in FOO_FRAME bar:1 baz:2"
+ all outgoing frames re reported as "quic frame out foo bar baz"
+ all stream operations are prefixed with id, like: "quic stream id 0x33 recv"
+ all transport parameters are prefixed with "quic tp"
(hex dump is moved to caller, to avoid using ngx_cycle->log)
+ packet flags and some other debug messages are updated to
include packet type
As per https://tools.ietf.org/html/rfc7540#section-8.1,
: A server can send a complete response prior to the client
: sending an entire request if the response does not depend on
: any portion of the request that has not been sent and
: received. When this is true, a server MAY request that the
: client abort transmission of a request without error by
: sending a RST_STREAM with an error code of NO_ERROR after
: sending a complete response (i.e., a frame with the
: END_STREAM flag). Clients MUST NOT discard responses as a
: result of receiving such a RST_STREAM, though clients can
: always discard responses at their discretion for other
: reasons.
Previously, RST_STREAM(NO_ERROR) received from upstream after
a frame with the END_STREAM flag was incorrectly treated as an
error. Now, a single RST_STREAM(NO_ERROR) is properly handled.
This fixes problems observed with modern grpc-c [1], as well
as with the Go gRPC module.
[1] https://github.com/grpc/grpc/pull/1661
We always generate stream frames that have length. The 'len' member is used
during parsing incoming frames and can be safely ignored when generating
output.
There are following flags in quic connection:
closing - true, when a connection close is initiated, for whatever reason
draining - true, when a CC frame is received from peer
The following state machine is used for closing:
+------------------+
| I/HS/AD |
+------------------+
| | |
| | V
| | immediate close initiated:
| | reasons: close by top-level protocol, fatal error
| | + sends CC (probably with app-level message)
| | + starts close_timer: 3 * PTO (current probe timeout)
| | |
| | V
| | +---------+ - Reply to input with CC (rate-limited)
| | | CLOSING | - Close/Reset all streams
| | +---------+
| | | |
| V V |
| receives CC |
| | |
idle | |
timer | |
| V |
| +----------+ | - MUST NOT send anything (MAY send a single CC)
| | DRAINING | | - if not already started, starts close_timer: 3 * PTO
| +----------+ | - if not already done, close all streams
| | |
| | |
| close_timer fires
| |
V V
+------------------------+
| CLOSED | - clean up all the resources, drop connection
+------------------------+ state completely
The ngx_quic_close_connection() function gets an "rc" argument, that signals
reason of connection closing:
NGX_OK - initiated by application (i.e. http/3), follow state machine
NGX_DONE - timedout (while idle or draining)
NGX_ERROR - fatal error, destroy connection immediately
The PTO calculations are not yet implemented, hardcoded value of 5s is used.
The function is split into three:
ngx_quic_close_connection() itself cleans up all core nginx things
ngx_quic_close_quic() deals with everything inside c->quic
ngx_quic_close_streams() deals with streams cleanup
The quic and streams cleanup functions may return NGX_AGAIN, thus signalling
that cleanup is not ready yet, and the close cannot continue to next step.
The header size macros for long and short packets were fixed to provide
correct values in bytes.
Currently the sending code limits frames so they don't exceed max_packet_size.
But it does not account the case when a single frame can exceed the limit.
As a result of this patch, big payload (CRYPTO and STREAM) will be split
into a number of smaller frames that fit into advertised max_packet_size
(which specifies final packet size, after encryption).
chrome-unstable 83.0.4103.7 starts with Initial packet number 1.
I couldn't find a proper explanation besides this text in quic-transport:
An endpoint MAY skip packet numbers when sending
packets to detect this (Optimistic ACK Attack) behavior.
+ MAX_STREAM_DATA frame is sent when recv() is performed on stream
The new value is a sum of total bytes received by stream + free
space in a buffer;
The sending of MAX_STREM_DATA frame in response to STREAM_DATA_BLOCKED
frame is adjusted to follow the same logic as above.
+ MAX_DATA frame is sent when total amount of received data is 2x
of current limit. The limit is doubled.
+ Default values of transport parameters are adjusted to more meaningful
values:
initial stream limits are set to quic buffer size instead of
unrealistically small 255.
initial max data is decreased to 16 buffer sizes, in an assumption that
this is enough for a relatively short connection, instead of randomly
chosen big number.
All this allows to initiate a stable flow of streams that does not block
on stream/connection limits (tested with FF 77.0a1 and 100K requests)
Before the patch, full STREAM frame handling was delayed until the frame with
zero offset is received. Only node in the streams tree was created.
This lead to problems when such stream was deleted, in particular, it had no
handlers set for read events.
This patch creates new stream immediately, but delays data delivery until
the proper offset will arrive. This is somewhat similar to how accept()
operation works.
The ngx_quic_add_stream() function is no longer needed and merged into stream
handler. The ngx_quic_stream_input() now only handles frames for existing
streams and does not deal with stream creation.
Frames can still float in the following queues:
- crypto frames reordering queues (one per encryption level)
- moved crypto frames cleanup to the moment where all streams are closed
- stream frames reordering queues (one per packet number namespace)
- frames retransmit queues (one per packet number namespace)
Each stream node now includes incoming frames queue and sent/received counters
for tracking offset. The sent counter is not used, c->sent is used, not like
in crypto buffers, which have no connections.
If offset in CRYPTO frame doesn't match expected, following actions are taken:
a) Duplicate frames or frames within [0...current offset] are ignored
b) New data from intersecting ranges (starts before current_offset, ends
after) is consumed
c) "Future" frames are stored in a sorted queue (min offset .. max offset)
Once a frame is consumed, current offset is updated and the queue is inspected:
we iterate the queue until the gap is found and act as described
above for each frame.
The amount of data in buffered frames is limited by corresponding macro.
The CRYPTO and STREAM frame structures are now compatible: they share
the same set of initial fields. This allows to have code that deals with
both of this frames.
The ordering layer now processes the frame with offset and invokes the
handler when it can organise an ordered stream of data.
Quote: Conceptually, a packet number space is the context in which a packet
can be processed and acknowledged.
ngx_quic_namespace_t => ngx_quic_send_ctx_t
qc->ns => qc->send_ctx
ns->largest => send_ctx->largest_ack
The ngx_quic_ns(level) macro now returns pointer, not just index:
ngx_quic_get_send_ctx(c->quic, level)
ngx_quic_retransmit_ns() => ngx_quic_retransmit()
ngx_quic_output_ns() => ngx_quic_output_frames()
The request processing is delayed by a timer. Since nginx updates
internal time once at the start of each event loop iteration, this
normally ensures constant time delay, adding a mitigation from
time-based attacks.
A notable exception to this is the case when there are no additional
events before the timer expires. To ensure constant-time processing
in this case as well, we trigger an additional event loop iteration
by posting a dummy event for the next event loop iteration.
The offset in client CRYPTO frames is tracked in c->quic->crypto_offset_in.
This means that CRYPTO frames with non-zero offset are now accepted making
possible to finish handshake with client certificates that exceed max packet
size (if no reordering happens).
The c->quic->crypto_offset field is renamed to crypto_offset_out to avoid
confusion with tracking of incoming CRYPTO stream.
+ since number of ranges in unknown, provide a function to parse them once
again in handler to avoid memory allocation
+ ack handler now processes all ranges, not only the first
+ ECN counters are parsed and saved into frame if present
Such frames are grouped together in a switch and just ignored, instead of
closing the connection This may improve test coverage. All such frames
require acknowledgment.
The qc->closing flag is set when a connection close is initiated for the first
time.
No timers will be set if the flag is active.
TODO: this is a temporary solution to avoid running timer handlers after
connection (and it's pool) was destroyed. It looks like currently we have
no clear policy of connection closing in regard to timers.
Found with a previously received Initial packet with ACK only, which
instantiates a new connection but do not produce the handshake keys.
This can be triggered by a fairly well behaving client, if the server
stands behind a load balancer that stripped Initial packets exchange.
Found by F5 test suite.
This makes sending large number of bidirectional stream work within ngtcp2,
which doesn't bother sending optional STREAMS_BLOCKED when exhausted.
This also introduces tracking currently opened and maximum allowed streams.
Currently, the output is called periodically, each 200 ms to invoke
ngx_quic_output() that will push all pending frames into packets.
TODO: implement flags a-là Nagle & co (NO_DELAY/NO_PUSH...)
All frames collected to packet are moved into a per-namespace send queue.
QUIC connection has a timer which fires on the closest max_ack_delay time.
The frame is deleted from the queue when a corresponding packet is acknowledged.
The NGX_QUIC_MAX_RETRANSMISSION is a timeout that defines maximum length
of retransmission of a frame.
The quic->keys[4] array now contains secrets related to the corresponding
encryption level. All protection-level functions get proper keys and do
not need to switch manually between levels.
If early data is accepted, SSL_do_handshake() completes as soon as ClientHello
is processed. SSL_in_init() will report the handshake is still in progress.
Static buffers are used instead in functions where decryption takes place.
The pkt->plaintext points to the beginning of a static buffer.
The pkt->payload.data points to decrypted data actual start.
+ ngx_quic_encrypt():
- no longer accepts pool as argument
- pkt is 1st arg
- payload is passed as pkt->payload
- performs encryption to the specified static buffer
+ ngx_quic_create_long/short_packet() functions:
- single buffer for everything, allocated by caller
- buffer layout is: [ ad | payload | TAG ]
the result is in the beginning of buffer with proper length
- nonce is calculated on stack
- log is passed explicitly, pkt is 1st arg
- no more allocations inside
+ ngx_quic_create_long_header():
- args changed: no need to pass str_t
+ added ngx_quic_create_short_header()
+ Client-related errors (i.e. parsing) are done at INFO level
+ c->log->action is updated through the process of receiving, parsing.
handling packet/payload and generating frames/output.
For ngx_http_process_request() part to work, this required to set both
r->http_connection->ssl and c->ssl on a QUIC stream. To avoid damaging
global SSL object, ngx_ssl_shutdown() is managed to ignore QUIC streams.
+ ngx_quic_init_ssl_methods() is no longer there, we setup methods on SSL
connection directly.
+ the handshake_handler is actually a generic quic input handler
+ updated c->log->action and debug to reflect changes and be more informative
+ c->quic is always set in ngx_quic_input()
+ the quic connection state is set by the results of SSL_do_handshake();
note:
+ parameters are available in SSL connection since they are obtained by ssl
stack
quote:
During connection establishment, both endpoints make authenticated
declarations of their transport parameters. These declarations are
made unilaterally by each endpoint.
and really, we send our parameters before we read client's.
no handling of incoming parameters is made by this patch.
- integer parameters can be configured using the following directives:
quic_max_idle_timeout
quic_max_ack_delay
quic_max_packet_size
quic_initial_max_data
quic_initial_max_stream_data_bidi_local
quic_initial_max_stream_data_bidi_remote
quic_initial_max_stream_data_uni
quic_initial_max_streams_bidi
quic_initial_max_streams_uni
quic_ack_delay_exponent
quic_active_migration
quic_active_connection_id_limit
- only following parameters are actually sent:
active_connection_id_limit
initial_max_streams_uni
initial_max_streams_bidi
initial_max_stream_data_bidi_local
initial_max_stream_data_bidi_remote
initial_max_stream_data_uni
(other parameters are to be added into ngx_quic_create_transport_params()
function as needed, should be easy now)
- draft 24 and draft 27 are now supported
(at compile-time using quic_version macro)
The ngx_quic_parse_frame() functions now has new 'pkt' argument: the packet
header of a currently processed frame. This allows to log errors/debug
closer to reasons and perform additional checks regarding possible frame
types. The handler only performs processing of good frames.
A number of functions like read_uint32(), parse_int[_multi] probably should
be implemented as a macro, but currently it is better to have them as
functions for simpler debugging.
Cleanup in ngx_event_quic.c:
+ reorderded functions, structures
+ added missing prototypes
+ added separate handlers for each frame type
+ numerous indentation/comments/TODO fixes
+ removed non-implemented qc->state and corresponding enum;
this requires deep thinking, stub was unused.
+ streams inside quic connection are now in own structure
All code dealing with serializing/deserializing
is moved int srv/event/ngx_event_quic_transport.c/h file.
All macros for dealing with data are internal to source file.
The header file exposes frame types and error codes.
The exported functions are currently packet header parsers and writers
and frames parser/writer.
The ngx_quic_header_t structure is updated with 'log' member. This avoids
passing extra argument to parsing functions that need to report errors.
+ support for more than one initial packet
+ workaround for trailing zeroes in packet
+ ignore application data packet if no keys yet (issue in draft 27/ff nightly)
+ fixed PING frame parser
+ STREAM frames need to be acknowledged
The following HTTP configuration is used for firefox (v74):
http {
ssl_certificate_key localhost.key;
ssl_certificate localhost.crt;
ssl_protocols TLSv1.2 TLSv1.3;
server {
listen 127.0.0.1:10368 reuseport http3;
ssl_quic on;
server_name localhost;
location / {
return 200 "This-is-QUICK\n";
}
}
server {
listen 127.0.0.1:5555 ssl; # point the browser here
server_name localhost;
location / {
add_header Alt-Svc 'h3-24=":10368";ma=100';
return 200 "ALT-SVC";
}
}
}
New files:
src/event/ngx_event_quic_protection.h
src/event/ngx_event_quic_protection.c
The protection.h header provides interface to the crypto part of the QUIC:
2 functions to initialize corresponding secrets:
ngx_quic_set_initial_secret()
ngx_quic_set_encryption_secret()
and 2 functions to deal with packet processing:
ngx_quic_encrypt()
ngx_quic_decrypt()
Also, structures representing secrets are defined there.
All functions require SSL connection and a pool, only crypto operations
inside, no access to nginx connections or events.
Currently pool->log is used for the logging (instead of original c->log).
- events handling moved into src/event/ngx_event_quic.c
- http invokes once ngx_quic_run() and passes stream callback
(diff to original http_request.c is now minimal)
- streams are stored in rbtree using ID as a key
- when a new stream is registered, appropriate callback is called
- ngx_quic_stream_t type represents STREAM and stored in c->qs
- now NEW_CONNECTION_ID frames can be received and parsed
The packet structure is created in ngx_quic_input() and passed
to all handlers (initial, handshake and application data).
The UDP datagram buffer is saved as pkt->raw;
The QUIC packet is stored as pkt->data and pkt->len (instead of pkt->buf)
(pkt->len is adjusted after parsing headers to actual length)
The pkt->pos is removed, pkt->raw->pos is used instead.
- added basic parsing of ACK, PING and PADDING frames on input
- added preliminary parsing of SHORT headers
The ngx_quic_output() is now called after processing of each input packet.
Frames are added into output queue according to their level: inital packets
go ahead of handshake and application data, so they can be merged properly.
The payload handler is called from both new, handshake and applicataion data
handlers (latter is a stub).
As was objserved with ngtcp2 client, Finished CRYPTO frame within Handshake
packet may not be sent for some reason if there's nothing to append on 1-RTT.
This results in unnecessary retransmit. To avoid this edge case, a non-zero
active_connection_id_limit transport parameter is now used to append datagram
with NEW_CONNECTION_ID 1-RTT frames.
Now handshake generates frames, and they are queued in c->quic->frames.
The ngx_quic_output() is called from ngx_quic_flush_flight() or manually,
processes the queue and encrypts all frames according to required encryption
level.
ngx_quic_hexdump0(log, format, buffer, buffer_size);
- logs hexdump of buffer to specified error log
ngx_quic_hexdump0(c->log, "this is foo:", foo.data, foo.len);
ngx_quic_hexdump(log, format, buffer, buffer_size, ...)
- same as hexdump0, but more format/args possible:
ngx_quic_hexdump(c->log, "a=%d b=%d, foo is:", foo.data, foo.len, a, b);
When "aio" or "aio threads" is used while processing the response body of an
in-memory background subrequest, the subrequest could be finalized with an aio
operation still in progress. Upon aio completion either parent request is
woken or the old r->write_event_handler is called again. The latter may result
in request errors. In either case post_subrequest handler is never called with
the full response body, which is typically expected when using in-memory
subrequests.
Currently in nginx background subrequests are created by the upstream module
and the mirror module. The issue does not manifest itself with these
subrequests because they are header-only. But it can manifest itself with
third-party modules which create in-memory background subrequests.
We used to have default error_page overwrite for 495, 496, and 497, so
a configuration like
error_page 495 /error;
will result in error 400, much like without any error_page configured.
The 494 status code was introduced later (in 3848:de59ad6bf557, nginx 0.9.4),
and relevant changes to ngx_http_core_error_page() were missed, resulting
in inconsistent behaviour of "error_page 494" - with error_page configured
it results in 494 being returned instead of 400.
Reported by Frank Liu,
http://mailman.nginx.org/pipermail/nginx/2020-February/058957.html.
Introduced ngx_quic_input() and ngx_quic_output() as interface between
nginx and protocol. They are the only functions that are exported.
While there, added copyrights.
In "co64" atom chunk start offset is a 64-bit unsigned integer. When trimming
the "mdat" atom, chunk offsets are casted to off_t values which are typically
64-bit signed integers. A specially crafted mp4 file with huge chunk offsets
may lead to off_t overflow and result in negative trim boundaries.
The consequences of the overflow are:
- Incorrect Content-Length header value in the response.
- Negative left boundary of the response file buffer holding the trimmed "mdat".
This leads to pread()/sendfile() errors followed by closing the client
connection.
On rare systems where off_t is a 32-bit integer, this scenario is also feasible
with the "stco" atom.
The fix is to add checks which make sure data chunks referenced by each track
are within the mp4 file boundaries. Additionally a few more checks are added to
ensure mp4 file consistency and log errors.
Duplicate "Host" headers were allowed in nginx 0.7.0 (revision b9de93d804ea)
as a workaround for some broken Motorola phones which used to generate
requests with two "Host" headers[1]. It is believed that this workaround
is no longer relevant.
[1] http://mailman.nginx.org/pipermail/nginx-ru/2008-May/017845.html
The "identity" transfer coding has been removed in RFC 7230. It is
believed that it is not used in real life, and at the same time it
provides a potential attack vector.
We anyway do not support more than one transfer encoding, so accepting
requests with multiple Transfer-Encoding headers doesn't make sense.
Further, we do not handle multiple headers, and ignore anything but
the first header.
Reported by Filippo Valsorda.
A connection could get stuck without timers if a client has partially sent
the HEADERS frame such that it was split on the individual header boundary.
In this case, it cannot be processed without the rest of the HEADERS frame.
The fix is to call ngx_http_v2_state_headers_save() in this case. Normally,
it would be called from the ngx_http_v2_state_header_block() handler on the
next iteration, when there is not enough data to continue processing. This
isn't the case if recv_buffer became empty and there's no more data to read.
With the recent change to prevent frames flood in d4448892a294,
nginx will finalize the connection with NGX_HTTP_V2_INTERNAL_ERROR
whenever flood is detected, causing nginx aborting or stopping if
the debug_points directive is used in nginx config.
Previous change 1ce3f01a4355 incorrectly introduced processing of the
ngx_posted_next_events queue at the end of operation, effectively making
posted next events a nop, since at the end of an event loop iteration
the queue is always empty. Correct approach is to move events to the
ngx_posted_events queue at an iteration start, as it was done previously.
Further, in some cases the c->read event might be already in the
ngx_posted_events queue, and calling ngx_post_event() with the
ngx_posted_next_events queue won't do anything. To make sure the event
will be correctly placed into the ngx_posted_next_events queue
we now check if it is already posted.
Introduced in 9d2ad2fb4423 available bytes handling in SSL relied
on connection read handler being overwritten to set the ready flag
and the amount of available bytes. This approach is, however, does
not work properly when connection read handler is changed, for example,
when switching to a next pipelined request, and can result in unexpected
connection timeouts, see here:
http://mailman.nginx.org/pipermail/nginx-devel/2019-December/012825.html
Fix is to introduce ngx_event_process_posted_next() instead, which
will set ready and available regardless of how event handler is set.
When ngx_http_v2_close_stream_handler() is used to retry stream close
after queued frames are sent, client timeouts on the stream can be
logged multiple times and/or in addition to already happened errors.
To resolve this, separate ngx_http_v2_retry_close_stream_handler()
was introduced, which does not try to log timeouts.
If a stream is closed with queued frames, it is possible that no further
write events will occur on the stream, leading to the socket leak.
To fix this, the stream's fake connection read handler is set to
ngx_http_v2_close_stream_handler(), to make sure that finalizing the
connection with ngx_http_v2_finalize_connection() will be able to
close the stream regardless of the current number of queued frames.
Additionally, the stream's fake connection fc->error flag is explicitly
set, so ngx_http_v2_handle_stream() will post a write event when queued
frames are finally sent even if stream flow control window is exhausted.
These checks were missed when chunked support was introduced. And also
added an explicit error message to ngx_http_dav_copy_move_handler()
(it was missed for some reason, in contrast to DELETE and MKCOL handlers).
While empty replacements were caught at run-time, parsing code
of the "rewrite" directive expects that a minimum length of the
"replacement" argument is 1.
If a rewritten URI has the null character, only a part of URI was
copied to a memory buffer allocated for path. In some setups this
could be exploited to expose uninitialized memory via the Location
header.
The "alias" directive cannot be used in the same location where URI
was rewritten. This has been detected in the "rewrite ... break"
case, but not when the standalone "break" directive was used.
This change also fixes proxy_pass with URI component in a similar
case:
location /aaa/ {
rewrite ^ /xxx/yyy;
break;
proxy_pass http://localhost:8080/bbb/;
}
Previously, the "/bbb/yyy" would be sent to a backend instead of
"/xxx/yyy". And if location's prefix was longer than the rewritten
URI, a segmentation fault might occur.
Previously, connections returned from keepalive cache had c->data
pointing to the keepalive cache item. While this shouldn't be a problem
for correct code, as c->data is not expected to be used before it is set,
explicitly clearing it might help to avoid confusion.
Previously only an rbtree was associated with a limit_conn. To make it
possible to associate more data with a limit_conn, shared context is introduced
similar to limit_req. Also, shared pool pointer is kept in a way similar to
limit_req.
Now a new structure ngx_proxy_protocol_t holds these fields. This allows
to add more PROXY protocol fields in the future without modifying the
connection structure.
With MinGW-w64, building 64-bit nginx binary with GCC 8 and above
results in warning due to cast of GetProcAddress() result to ngx_wsapoll_pt,
which GCC thinks is incorrect. Added intermediate cast to "void *" to
silence the warning.
FormatMessage() seems to return many errors which essentially indicate that
the language in question is not available. At least the following were
observed in the wild and during testing: ERROR_MUI_FILE_NOT_FOUND (15100)
(ticket #1868), ERROR_RESOURCE_TYPE_NOT_FOUND (1813). While documentation
says it should be ERROR_RESOURCE_LANG_NOT_FOUND (1815), this doesn't seem
to be the case.
As such, checking error code was removed, and as long as FormatMessage()
returns an error, we now always try the default language.
Added code to track number of bytes available in the socket.
This makes it possible to avoid looping for a long time while
working with fast enough peer when data are added to the socket buffer
faster than we are able to read and process data.
When kernel does not provide number of bytes available, it is
retrieved using ioctl(FIONREAD) as long as a buffer is filled by
SSL_read().
It is assumed that number of bytes returned by SSL_read() is close
to the number of bytes read from the socket, as we do not use
SSL compression. But even if it is not true for some reason, this
is not important, as we post an additional reading event anyway.
Note that data can be buffered at SSL layer, and it is not possible
to simply stop reading at some point and wait till the event will
be reported by the kernel again. This can be only done when there
are no data in SSL buffers, and there is no good way to find out if
it's the case.
Instead of trying to figure out if SSL buffers are empty, this patch
introduces events posted for the next event loop iteration - such
events will be processed only on the next event loop iteration,
after going into the kernel and retrieving additional events. This
seems to be simple and reliable approach.
This makes it possible to avoid looping for a long time while working
with a fast enough peer when data are added to the socket buffer faster
than we are able to read and process them (ticket #1431). This is
basically what we already do on FreeBSD with kqueue, where information
about the number of bytes in the socket buffer is returned by
the kevent() call.
With other event methods rev->available is now set to -1 when the socket
is ready for reading. Later in ngx_recv() and ngx_recv_chain(), if
full buffer is received, real number of bytes in the socket buffer is
retrieved using ioctl(FIONREAD). Reading more than this number of bytes
ensures that even with edge-triggered event methods the event will be
triggered again, so it is safe to stop processing of the socket and
switch to other connections.
Using ioctl(FIONREAD) only after reading a full buffer is an optimization.
With this approach we only call ioctl(FIONREAD) when there are at least
two recv()/readv() calls.
As long as there are data to read in the socket, yet the amount of data
is less than total size of the buffers in the chain, this saves one
unneeded read() syscall. Before this change, reading only stopped if
ngx_ssl_recv() returned no data, that is, two read() syscalls in a row
returned EAGAIN.
In SSL connections, data can be buffered by the SSL layer, and it is
wrong to avoid doing c->recv_chain() if c->read->available is 0 and
c->read->pending_eof is set. And tests show that the optimization in
question indeed can result in incorrect detection of premature connection
close if upstream closes the connection without sending a close notify
alert at the same time. Fix is to disable c->read->available optimization
for SSL connections.