The IP_BIND_ADDRESS_NO_PORT option is set on upstream sockets
if proxy_bind does not specify a port. The SO_REUSEADDR option
is set on UDP upstream sockets if proxy_bind specifies a port.
Due to checking of the wrong port, IP_BIND_ADDRESS_NO_PORT was
never set, and SO_REUSEPORT was always set.
This flag appeared in Linux 4.5 and is useful for avoiding thundering herd
problem.
The current Linux kernel implementation walks the list of exclusive waiters,
and queues an event to each epfd, until it finds the first waiter that has
threads blocked on it via epoll_wait().
Now it is believed that the accept mutex brings more harm than benefits.
Especially in various benchmarks it often results in situation where only
one worker grabs all connections.
The option is only set if the socket is bound to a specific port to allow
several such sockets coexist at the same time. This is required, for example,
when nginx acts as a transparent proxy and receives two datagrams from the same
client in a short time.
The feature is only implemented for Linux.
This patch moves various OpenSSL-specific function calls into the
OpenSSL module and introduces ngx_ssl_ciphers() to make nginx more
crypto-library-agnostic.
Using the same DH parameters on multiple servers is believed to be subject
to precomputation attacks, see http://weakdh.org/. Additionally, 1024 bits
are not enough in the modern world as well. Let users provide their own
DH parameters with the ssl_dhparam directive if they want to use EDH ciphers.
Note that SSL_CTX_set_dh_auto() as provided by OpenSSL 1.1.0 uses fixed
DH parameters from RFC 5114 and RFC 3526, and therefore subject to the same
precomputation attacks. We avoid using it as well.
This change also fixes compilation with OpenSSL 1.1.0-pre5 (aka Beta 2),
as OpenSSL developers changed their policy after releasing Beta 1 and
broke API once again by making the DH struct opaque (see ticket #860).
OpenSSL 1.0.2+ allows configuring a curve list instead of a single curve
previously supported. This allows use of different curves depending on
what client supports (as available via the elliptic_curves extension),
and also allows use of different curves in an ECDHE key exchange and
in the ECDSA certificate.
The special value "auto" was introduced (now the default for ssl_ecdh_curve),
which means "use an internal list of curves as available in the OpenSSL
library used". For versions prior to OpenSSL 1.0.2 it maps to "prime256v1"
as previously used. The default in 1.0.2b+ prefers prime256v1 as well
(and X25519 in OpenSSL 1.1.0+).
As client vs. server preference of curves is controlled by the
same option as used for ciphers (SSL_OP_CIPHER_SERVER_PREFERENCE),
the ssl_prefer_server_ciphers directive now controls both.
The SSL_CTX_add0_chain_cert() function as introduced in OpenSSL 1.0.2 now
used instead of SSL_CTX_add_extra_chain_cert().
SSL_CTX_add_extra_chain_cert() adds extra certs for all certificates
in the context, while SSL_CTX_add0_chain_cert() only to a particular
certificate. There is no difference unless multiple certificates are used,
though it is important when using multiple certificates.
Additionally, SSL_CTX_select_current_cert() is now called before using
a chain to make sure correct chain will be returned.
A pointer to a previously configured certificate now stored in a certificate.
This makes it possible to iterate though all certificates configured in
the SSL context. This is now used to configure OCSP stapling for all
certificates, and in ngx_ssl_session_id_context().
As SSL_CTX_use_certificate() frees previously loaded certificate of the same
type, and we have no way to find out if it's the case, X509_free() calls
are now posponed till ngx_ssl_cleanup_ctx().
Note that in OpenSSL 1.0.2+ this can be done without storing things in exdata
using the SSL_CTX_set_current_cert() and SSL_CTX_get0_certificate() functions.
These are not yet available in all supported versions though, so it's easier
to continue to use exdata for now.
This makes it possible to properly return OCSP staple with multiple
certificates configured.
Note that it only works properly in OpenSSL 1.0.1d+, 1.0.0k, 0.9.8y+.
In older versions SSL_get_certificate() fails to return correct certificate
when the certificate status callback is called.
When it's known that the kernel supports EPOLLRDHUP, there is no need in
additional recv() call to get EOF or error when the flag is absent in the
event generated by the kernel. A special runtime test is done at startup
to detect if EPOLLRDHUP is actually supported by the kernel because
epoll_ctl() silently ignores unknown flags.
With this knowledge it's now possible to drop the "ready" flag for partial
read. Previously, the "ready" flag was kept until the recv() returned EOF
or error. In particular, this change allows the lingering close heuristics
(which relies on the "ready" flag state) to actually work on Linux, and not
wait for more data in most cases.
The "available" flag is now used in the read event with the semantics similar
to the corresponding counter in kqueue.
This parameter lets binding the proxy connection to a non-local address.
Upstream will see the connection as coming from that address.
When used with $remote_addr, upstream will accept the connection from real
client address.
Example:
proxy_bind $remote_addr transparent;
SSLeay_version() and SSLeay() are no longer available if OPENSSL_API_COMPAT
is set to 0x10100000L. Switched to using OpenSSL_version() instead.
Additionally, we now compare version strings instead of version numbers,
and this correctly works for LibreSSL as well.
OPENSSL_config() deprecated in OpenSSL 1.1.0. Additionally,
SSL_library_init(), SSL_load_error_strings() and OpenSSL_add_all_algorithms()
are no longer available if OPENSSL_API_COMPAT is set to 0x10100000L.
The OPENSSL_init_ssl() function is now used instead with appropriate
arguments to trigger the same behaviour. The configure test changed to
use SSL_CTX_set_options().
Deinitialization now happens automatically in OPENSSL_cleanup() called
via atexit(3), so we no longer call EVP_cleanup() and ENGINE_cleanup()
directly.
LibreSSL defines OPENSSL_VERSION_NUMBER to 0x20000000L, but uses an old
API derived from OpenSSL at the time LibreSSL forked. As a result, every
version check we use to test for new API elements in newer OpenSSL versions
requires an explicit check for LibreSSL.
To reduce clutter, redefine OPENSSL_VERSION_NUMBER to 0x1000107fL if
LibreSSL is used. The same is done by FreeBSD port of LibreSSL.
Fixes various aspects of --test-build-devpoll, --test-build-eventport, and
--test-build-epoll.
In particular, if --test-build-devpoll was used on Linux, then "devpoll"
event method would be preferred over "epoll". Also, wrong definitions of
event macros were chosen.
The "aio_write" directive is introduced, which enables use of aio
for writing. Currently it is meaningful only with "aio threads".
Note that aio operations can be done by both event pipe and output
chain, so proper mapping between r->aio and p->aio is provided when
calling ngx_event_pipe() and in output filter.
In collaboration with Valentin Bartenev.
If a write event happens after sendfile() but before we've got the
sendfile results in the main thread, this write event will be ignored.
And if no more events will happen, the connection will hang.
Removing the events works in the simple cases, but not always, as
in some cases events are added back by an unrelated code. E.g.,
the upstream module adds write event in the ngx_http_upstream_init()
to track client aborts.
Fix is to use wev->complete instead. It is now set to 0 before
a sendfile() task is posted, and it is set to 1 once a write event
happens. If on completion of the sendfile() task wev->complete is 1,
we know that an event happened while we were executing sendfile(), and
the socket is still ready for writing even if sendfile() did not sent
all the data or returned EAGAIN.
This fixes "called a function you should not call" and
"shutdown while in init" errors as observed with OpenSSL 1.0.2f
due to changes in how OpenSSL handles SSL_shutdown() during
SSL handshakes.
As setitimer() isn't available on Windows, time wasn't updated at all
if timer_resolution was used with the select event method. Fix is
to ignore timer_resolution in such cases.
This context is needed for shared sessions cache to work in configurations
with multiple virtual servers sharing the same port. Unfortunately, OpenSSL
does not provide an API to access the session context, thus storing it
separately.
In collaboration with Vladimir Homutov.
If no space left in buffer after adding formatting symbols, error message
could be left without terminating null. The fix is to output message using
actual length.
RAND_pseudo_bytes() is deprecated in the OpenSSL master branch, so the only
use was changed to RAND_bytes(). Access to internal structures is no longer
possible, so now we don't try to set SSL3_FLAGS_NO_RENEGOTIATE_CIPHERS even
if it's defined.
OCSP responses may contain no nextUpdate. As per RFC 6960, this means
that nextUpdate checks should be bypassed. Handle this gracefully by
using NGX_MAX_TIME_T_VALUE as "valid" in such a case.
The problem was introduced by 6893a1007a7c (1.9.2).
Reported by Matthew Baldwin.
Broken by 6893a1007a7c (1.9.2) during introduction of strict OCSP response
validity checks. As stapling file is expected to be returned unconditionally,
fix is to set its validity to the maximum supported time.
Reported by Faidon Liambotis.
When configured, an individual listen socket on a given address is
created for each worker process. This allows to reduce in-kernel lock
contention on configurations with high accept rates, resulting in better
performance. As of now it works on Linux and DragonFly BSD.
Note that on Linux incoming connection requests are currently tied up
to a specific listen socket, and if some sockets are closed, connection
requests will be reset, see https://lwn.net/Articles/542629/. With
nginx, this may happen if the number of worker processes is reduced.
There is no such problem on DragonFly BSD.
Based on previous work by Sepherosa Ziehau and Yingqi Lu.
This may happen if eventfd() returns ENOSYS, notably seen on CentOS 5.4.
Such a failure will now just disable the notification mechanism and let
the callers cope with it, instead of failing to start worker processes.
If thread pools are not configured, this can safely be ignored.
Missing call to X509_STORE_CTX_free when X509_STORE_CTX_init fails.
Missing call to OCSP_CERTID_free when OCSP_request_add0_id fails.
Possible leaks in vary particular scenariis of memory shortage.
The SSL_MODE_NO_AUTO_CHAIN mode prevents OpenSSL from automatically
building a certificate chain on the fly if there is no certificate chain
explicitly provided. Before this change, certificates provided via the
ssl_client_certificate and ssl_trusted_certificate directives were
used by OpenSSL to automatically build certificate chains, resulting
in unexpected (and in some cases unneeded) chains being sent to clients.
LibreSSL 2.1.1+ started to set SSL_OP_NO_SSLv3 option by default on
new contexts. This makes sure to clear it to make it possible to use SSLv3
with LibreSSL if enabled in nginx config.
Prodded by Kuramoto Eiji.
This reduces layering violation and simplifies the logic of AIO preread, since
it's now triggered by the send chain function itself without falling back to
the copy filter. The context of AIO operation is now stored per file buffer,
which makes it possible to properly handle cases when multiple buffers come
from different locations, each with its own configuration.
Instead of collecting a number of the possible SSL_CTX_use_PrivateKey_file()
error codes that becomes more and more difficult with the rising variety of
OpenSSL versions and its derivatives, just continue with the next password.
Multiple passwords in a single ssl_password_file feature was broken after
recent OpenSSL changes (commit 4aac102f75b517bdb56b1bcfd0a856052d559f6e).
Affected OpenSSL releases: 0.9.8zc, 1.0.0o, 1.0.1j and 1.0.2-beta3.
Reported by Piotr Sikora.
GetQueuedCompletionStatus() document on MSDN says the
following signature:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa364986.aspx
BOOL WINAPI GetQueuedCompletionStatus(
_In_ HANDLE CompletionPort,
_Out_ LPDWORD lpNumberOfBytes,
_Out_ PULONG_PTR lpCompletionKey,
_Out_ LPOVERLAPPED *lpOverlapped,
_In_ DWORD dwMilliseconds
);
In the latest specification, the type of the third argument
(lpCompletionKey) is PULONG_PTR not LPDWORD.
This prevents inappropriate session reuse in unrelated server{}
blocks, while preserving ability to restore sessions on other servers
when using TLS Session Tickets.
Additionally, session context is now set even if there is no session cache
configured. This is needed as it's also used for TLS Session Tickets.
Thanks to Antoine Delignat-Lavaud and Piotr Sikora.
The new directives {proxy,fastcgi,scgi,uwsgi,memcached}_next_upstream_tries
and {proxy,fastcgi,scgi,uwsgi,memcached}_next_upstream_timeout limit
the number of upstreams tried and the maximum time spent for these tries
when searching for a valid upstream.
Some of the OpenSSL forks (read: BoringSSL) started removing unused,
no longer necessary and/or not really working bug workarounds along
with the SSL options and defines for them.
Instead of fixing nginx build after each removal, be proactive
and guard use of all SSL options for bug workarounds.
Signed-off-by: Piotr Sikora <piotr@cloudflare.com>
In theory, this can provide a bit better distribution of latencies.
Also it simplifies the code, since ngx_queue_t is now used instead
of custom implementation.
LibreSSL developers decided that LibreSSL is OpenSSL-2.0.0, so tests
for OpenSSL-1.0.2+ are now passing, even though the library doesn't
provide functions that are expected from that version of OpenSSL.
Signed-off-by: Piotr Sikora <piotr@cloudflare.com>
This change adds support for using BoringSSL as a drop-in replacement
for OpenSSL without adding support for any of the BoringSSL-specific
features.
Signed-off-by: Piotr Sikora <piotr@cloudflare.com>
This is really just a prerequisite for building against BoringSSL,
which doesn't provide either of those features.
Signed-off-by: Piotr Sikora <piotr@cloudflare.com>
SSL_SESSION struct is internal part of the OpenSSL library and it's fields
should be accessed via API (when exposed), not directly.
The unfortunate side-effect of this change is that we're losing reference
count that used to be printed at the debug log level, but this seems to be
an acceptable trade-off.
Almost fixes build with -DOPENSSL_NO_SSL_INTERN.
Signed-off-by: Piotr Sikora <piotr@cloudflare.com>
The RSA_generate_key() is marked as deprecated and causes build to
fail. On the other hand, replacement function, RSA_generate_key_ex(),
requires much more code. Since RSA_generate_key() is only needed
for barely usable EXP ciphers, the #ifdef was added instead.
Prodded by Piotr Sikora.
This change is mostly cosmetic, because in practice this callback
is used only for 512-bit RSA keys.
Signed-off-by: Piotr Sikora <piotr@cloudflare.com>
Previously, <bn.h>, <dh.h>, <rand.h> and <rsa.h> were pulled in
by <engine.h> using OpenSSL's deprecated interface, which meant
that nginx couldn't have been built with -DOPENSSL_NO_DEPRECATED.
Both <x509.h> and <x509v3.h> are pulled in by <ocsp.h>, but we're
calling X509 functions directly, so let's include those as well.
<crypto.h> is pulled in by virtually everything, but we're calling
CRYPTO_add() directly, so let's include it as well.
Signed-off-by: Piotr Sikora <piotr@cloudflare.com>
Previously, the NGX_LOG_INFO level was used unconditionally. This is
correct for client SSL connections, but too low for connections to
upstream servers. To resolve this, ngx_connection_error() now used
to log this error, it will select logging level appropriately.
With this change, if an upstream connection is closed during SSL
handshake, it is now properly logged at "error" level.
Previously, nginx closed client connection in cases when a response body
from upstream was needed to be cached or stored but shouldn't be sent to
the client. While this is normal for HTTP, it is unacceptable for SPDY.
Fix is to use instead the p->downstream_error flag to prevent nginx from
sending anything downstream. To make this work, the event pipe code was
modified to properly cache empty responses with the flag set.
This fixes --with-file-aio support on systems that lack eventfd()
syscall, notably aarch64 Linux.
The syscall(SYS_eventfd) may still be necessary on systems that
have eventfd() syscall in the kernel but lack it in glibc, e.g.
as seen in the current CentOS 5 release.
The flag allows to suppress "ngx_slab_alloc() failed: no memory" messages
from a slab allocator, e.g., if an LRU expiration is used by a consumer
and allocation failures aren't fatal.
The flag is now used in the SSL session cache code, and in the limit_req
module.
Even during execution of a request it is possible that there will be
no session available, notably in case of renegotiation. As a result
logging of $ssl_session_id in some cases caused NULL pointer dereference
after revision 97e3769637a7 (1.5.9). The check added returns an empty
string if there is no session available.
Previously, it used to contain full session serialized instead of just
a session id, making it almost impossible to use the variable in a safe
way.
Thanks to Ivan Ristić.
If c->read->ready was reset, but later some data were read from a socket
buffer due to a call to ngx_ssl_recv(), the c->read->ready flag should
be restored if not all data were read from OpenSSL buffers (as kernel
won't notify us about the data anymore).
More details are available here:
http://mailman.nginx.org/pipermail/nginx/2013-November/041178.html
In order to support key rollover, ssl_session_ticket_key can be defined
multiple times. The first key will be used to issue and resume Session
Tickets, while the rest will be used only to resume them.
ssl_session_ticket_key session_tickets/current.key;
ssl_session_ticket_key session_tickets/prev-1h.key;
ssl_session_ticket_key session_tickets/prev-2h.key;
Please note that nginx supports Session Tickets even without explicit
configuration of the keys and this feature should be only used in setups
where SSL traffic is distributed across multiple nginx servers.
Signed-off-by: Piotr Sikora <piotr@cloudflare.com>
The timeout set is used by OpenSSL as a hint for clients in TLS Session
Tickets. Previous code resulted in a default timeout (5m) used for TLS
Sessions Tickets if there was no session cache configured.
Prodded by Piotr Sikora.