When it's known that the kernel supports EPOLLRDHUP, there is no need in
additional recv() call to get EOF or error when the flag is absent in the
event generated by the kernel. A special runtime test is done at startup
to detect if EPOLLRDHUP is actually supported by the kernel because
epoll_ctl() silently ignores unknown flags.
With this knowledge it's now possible to drop the "ready" flag for partial
read. Previously, the "ready" flag was kept until the recv() returned EOF
or error. In particular, this change allows the lingering close heuristics
(which relies on the "ready" flag state) to actually work on Linux, and not
wait for more data in most cases.
The "available" flag is now used in the read event with the semantics similar
to the corresponding counter in kqueue.
Previously, there were only three timeouts used globally for the whole HTTP/2
connection:
1. Idle timeout for inactivity when there are no streams in processing
(the "http2_idle_timeout" directive);
2. Receive timeout for incomplete frames when there are no streams in
processing (the "http2_recv_timeout" directive);
3. Send timeout when there are frames waiting in the output queue
(the "send_timeout" directive on a server level).
Reaching one of these timeouts leads to HTTP/2 connection close.
This left a number of scenarios when a connection can get stuck without any
processing and timeouts:
1. A client has sent the headers block partially so nginx starts processing
a new stream but cannot continue without the rest of HEADERS and/or
CONTINUATION frames;
2. When nginx waits for the request body;
3. All streams are stuck on exhausted connection or stream windows.
The first idea that was rejected was to detect when the whole connection
gets stuck because of these situations and set the global receive timeout.
The disadvantage of such approach would be inconsistent behaviour in some
typical use cases. For example, if a user never replies to the browser's
question about where to save the downloaded file, the stream will be
eventually closed by a timeout. On the other hand, this will not happen
if there's some activity in other concurrent streams.
Now almost all the request timeouts work like in HTTP/1.x connections, so
the "client_header_timeout", "client_body_timeout", and "send_timeout" are
respected. These timeouts close the request.
The global timeouts work as before.
Previously, the c->write->delayed flag was abused to avoid setting timeouts on
stream events. Now, the "active" and "ready" flags are manipulated instead to
control the processing of individual streams.
Since 667aaf61a778 (1.1.17) the ngx_http_parse_header_line() function can return
NGX_HTTP_PARSE_INVALID_HEADER when a header contains NUL character. In this
case the r->header_end pointer isn't properly initialized, but the log message
in ngx_http_process_request_headers() hasn't been adjusted. It used the pointer
in size calculation, which might result in up to 2k buffer over-read.
Found with afl-fuzz.
Skip SSL_CTX_set_tlsext_servername_callback in case of renegotiation.
Do nothing in SNI callback as in this case it will be supplied with
request in c->data which isn't expected and doesn't work this way.
This was broken by b40af2fd1c16 (1.9.6) with OpenSSL master branch and LibreSSL.
OpenSSL doesn't check if the negotiated protocol has been announced.
As a result, the client might force using HTTP/2 even if it wasn't
enabled in configuration.
The r->request_body_no_buffering flag was introduced. It instructs
client request body reading code to avoid reading the whole body, and
to call post_handler early instead. The caller should use the
ngx_http_read_unbuffered_request_body() function to read remaining
parts of the body.
Upstream module is now able to use this mode, if configured with
the proxy_request_buffering directive.
Previously, connection hung after calling ngx_http_ssl_handshake() with
rev->ready set and no bytes in socket to read. It's possible in at least the
following cases:
- when processing a connection with expired TCP_DEFER_ACCEPT on Linux
- after parsing PROXY protocol header if it arrived in a separate TCP packet
Thanks to James Hamlin.
To ensure proper logging make sure to set current_request in all event
handlers, including resolve, ssl handshake, cache lock wait timer and
aio read handlers. A macro ngx_http_set_log_request() introduced to
simplify this.
The SPDY module doesn't expect timers can be set on stream events for reasons
other than delaying output. But ngx_http_writer() could add timer on write
event if the delayed flag wasn't set and nginx is waiting for AIO completion.
That could cause delays in sending response over SPDY when file AIO was used.
This ensures that debug logging and the $uri variable (if used in
400 Bad Request processing) will not try to access uninitialized
memory.
Found by Sergey Bobrov.
Client address specified in the PROXY protocol header is now
saved in the $proxy_protocol_addr variable and can be used in
the realip module.
This is currently not implemented for mail.
Backed out 05a56ebb084a, as it turns out that kernel can return connections
without any delay if syncookies are used. This basically means we can't
assume anything about connections returned with deferred accept set.
To solve original problem the 05a56ebb084a tried to solve, i.e. to don't
wait longer than needed if a connection was accepted after deferred accept
timeout, this patch changes a timeout set with setsockopt(TCP_DEFER_ACCEPT)
to 1 second, unconditionally. This is believed to be enough for speed
improvements, and doesn't imply major changes to timeouts used.
Note that before 2.6.32 connections were dropped after a timeout. Though
it is believed that 1s is still appropriate for kernels before 2.6.32,
as previously tcp_synack_retries controlled the actual timeout and 1s results
in more than 1 minute actual timeout by default.
It is believed to be better than fallback to HTTP/0.9, because most of
the clients at present time support HTTP/1.0. It allows nginx to return
error response code for them in cases when it fail to parse request line,
and therefore fail to detect client protocol version.
Even if the client does not support HTTP/1.0, this assumption should not
cause any harm, since from the HTTP/0.9 point of view it still a valid
response.
Previous code called ngx_http_finalize_request() with rc = 0. This is
ok if a response status was already set, but resulted in "000" being
logged if it wasn't. In particular this happened with limit_req
if a connection was prematurely closed during limit_req delay.
There are two significant changes in this patch:
1) The <= 0 comparison is done with a signed type. This fixes the case
of ngx_time() being larger than r->lingering_time.
2) Calculation of r->lingering_time - ngx_time() is now always done
in the ngx_msec_t type. This ensures the calculation is correct
even if time_t is unsigned and differs in size from ngx_msec_t.
Thanks to Lanshun Zhou.
If nginx was compiled without --with-http_ssl_module, but with some
other module which uses OpenSSL (e.g. --with-mail_ssl_module), insufficient
preprocessor check resulted in build failure. The problem was introduced
by e0a3714a36f8 (1.3.14).
Reported by Roman Arutyunyan.
This should improve behavior under deficiency of connections.
Since SSL handshake usually takes significant amount of time,
we exclude connections from reusable queue during this period
to avoid premature flush of them.
If c->recv() returns 0 there is no sense in using ngx_socket_errno for
logging, its value meaningless. (The code in question was copied from
ngx_http_keepalive_handler(), but ngx_socket_errno makes sense there as it's
used as a part of ECONNRESET handling, and the c->recv() call is preceeded
by the ngx_set_socket_errno(0) call.)
The c->single_connection was intended to be used as lock mechanism
to serialize modifications of request object from several threads
working with client and upstream connections. The flag is redundant
since threads in nginx have never been used that way.
In Linux 2.6.32, TCP_DEFER_ACCEPT was changed to accept connections
after the deferring period is finished without any data available.
(Reading from the socket returns EAGAIN in this case.)
Since in nginx TCP_DEFER_ACCEPT is set to "post_accept_timeout", we
do not need to wait longer if deferred accept returns with no data.
Previously, only the first request in a connection used timeout
value from the "client_header_timeout" directive while reading
header. All subsequent requests used "keepalive_timeout" for
that.
It happened because timeout of the read event was set to the
value of "keepalive_timeout" in ngx_http_set_keepalive(), but
was not removed when the next request arrived.
Previously, we always created an object and logged 400 (Bad Request)
in access log if a client closed connection without sending any data.
Such a connection was counted as "reading".
Since it's common for modern browsers to behave like this, it's no
longer considered an error if a client closes connection without
sending any data, and such a connection will be counted as "waiting".
Now, we do not log 400 (Bad Request) and keep memory footprint as
small as possible.
Previously, it was allocated from a connection pool and
was selectively freed for an idle keepalive connection.
The goal is to put coupled things in one chunk of memory,
and to simplify handling of request objects.
According to RFC 6066, client is not supposed to request a different server
name at the application layer. Server implementations that rely upon these
names being equal must validate that a client did not send a different name
in HTTP request. Current versions of Apache HTTP server always return 400
"Bad Request" in such cases.
There exist implementations however (e.g., SPDY) that rely on being able to
request different host names in one connection. Given this, we only reject
requests with differing host names if verification of client certificates
is enabled in a corresponding server configuration.
An example of configuration that might not work as expected:
server {
listen 433 ssl default;
return 404;
}
server {
listen 433 ssl;
server_name example.org;
ssl_client_certificate org.cert;
ssl_verify_client on;
}
server {
listen 433 ssl;
server_name example.com;
ssl_client_certificate com.cert;
ssl_verify_client on;
}
Previously, a client was able to request example.com by presenting
a certificate for example.org, and vice versa.
Not only this is consistent with a case without SNI, but this also
prevents abusing configurations that assume that the $host variable
is limited to one of the configured names for a server.
An example of potentially unsafe configuration:
server {
listen 443 ssl default_server;
...
}
server {
listen 443;
server_name example.com;
location / {
proxy_pass http://$host;
}
}
Note: it is possible to negotiate "example.com" by SNI, and to request
arbitrary host name that does not exist in the configuration above.
Previously, this was done only after the whole request header
was parsed, and if an error occurred earlier then the request
was processed in the default server (or server chosen by SNI),
while r->headers_in.server might be set to the value from the
Host: header or host from request line.
r->headers_in.server is in turn used for $host variable and
in HTTP redirects if "server_name_in_redirect" is disabled.
Without the change, configurations that rely on this during
error handling are potentially unsafe if SNI is used.
This change also allows to use server specific settings of
"underscores_in_headers", "ignore_invalid_headers", and
"large_client_header_buffers" directives for HTTP requests
and HTTPS requests without SNI.
The request object will not be created until SSL handshake is complete.
This simplifies adding another connection handler that does not need
request object right after handshake (e.g., SPDY).
There are also a few more intentional effects:
- the "client_header_buffer_size" directive will be taken from the
server configuration that was negotiated by SNI;
- SSL handshake errors and timeouts are not logged into access log
as bad requests;
- ngx_ssl_create_connection() is not called until the first byte of
ClientHello message was received. This also decreases memory
consumption if plain HTTP request is sent to SSL socket.
Previously, only the first request in a connection was assigned the
configuration selected by SNI. All subsequent requests initially
used the default server's configuration, ignoring SNI, which was
wrong.
Now all subsequent requests in a connection will initially use the
configuration selected by SNI. This is done by storing a pointer
to configuration in http connection object. It points to default
server's configuration initially, but changed upon receipt of SNI.
(The request's configuration can be further refined when parsing
the request line and Host: header.)
This change was not made specific to SNI as it also allows slightly
faster access to configuration without the request object.
This change helps to decouple ngx_http_ssl_servername() from the request
object.
Note: now we close connection in case of error during server name lookup
for request. Previously, we did so only for HTTP/0.9 requests.