Iterating through all connections takes a lot of CPU time, especially
with large number of worker connections configured. As a result
nginx processes used to consume CPU time during graceful shutdown.
To mitigate this we now only do a full scan for idle connections when
shutdown signal is received.
Transitions of connections to idle ones are now expected to be
avoided if the ngx_exiting flag is set. The upstream keepalive module
was modified to follow this.
If nginx was used under OpenVZ and a container with nginx was suspended
and resumed, configuration tests started to fail because of EADDRINUSE
returned from listen() instead of bind():
# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: [emerg] listen() to 0.0.0.0:80, backlog 511 failed (98: Address already in use)
nginx: configuration file /etc/nginx/nginx.conf test failed
With this change EADDRINUSE errors returned by listen() are handled
similarly to errors returned by bind(), and configuration tests work
fine in the same environment:
# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
More details about OpenVZ suspend/resume bug:
https://bugzilla.openvz.org/show_bug.cgi?id=2470
If the -T option is passed, additionally to configuration test, configuration
files are output to stdout.
In the debug mode, configuration files are kept in memory and can be accessed
using a debugger.
The function is now called ngx_parse_http_time(), and can be used by
any code to parse HTTP-style date and time. In particular, it will be
used for OCSP stapling.
For compatibility, a macro to map ngx_http_parse_time() to the new name
provided for a while.
When configured, an individual listen socket on a given address is
created for each worker process. This allows to reduce in-kernel lock
contention on configurations with high accept rates, resulting in better
performance. As of now it works on Linux and DragonFly BSD.
Note that on Linux incoming connection requests are currently tied up
to a specific listen socket, and if some sockets are closed, connection
requests will be reset, see https://lwn.net/Articles/542629/. With
nginx, this may happen if the number of worker processes is reduced.
There is no such problem on DragonFly BSD.
Based on previous work by Sepherosa Ziehau and Yingqi Lu.
Two mechanisms are implemented to make it possible to store pointers
in shared memory on Windows, in particular on Windows Vista and later
versions with ASLR:
- The ngx_shm_remap() function added to allow remapping of a shared memory
zone to the address originally used for it in the master process. While
important, it doesn't solve the problem by itself as in many cases it's
not possible to use the address because of conflicts with other
allocations.
- We now create mappings at the same address in all processes by starting
mappings at predefined addresses normally unused by newborn processes.
These two mechanisms combined allow to use shared memory on Windows
almost without problems, including reloads.
Based on the patch by Sergey Brester:
http://mailman.nginx.org/pipermail/nginx-devel/2015-April/006836.html
Similar to ngx_http_file_cache_set_slot(), the last component of file->name
with a fixed length of 10 bytes, as generated in ngx_create_temp_path(), is
used as a source for the names of intermediate subdirectories with each one
taking its own part. Ensure that the sum of specified levels with slashes
fits into the length (ticket #731).
Example of usage:
error_log memory:16m debug;
This allows to configure debug logging with minimum impact on performance.
It's especially useful when rare crashes are experienced under high load.
The log can be extracted from a coredump using the following gdb script:
set $log = ngx_cycle->log
while $log->writer != ngx_log_memory_writer
set $log = $log->next
end
set $buf = (ngx_log_memory_buf_t *) $log->wdata
dump binary memory debug_log.txt $buf->start $buf->end
Initial size as calculated from the number of elements may be bigger
than max_size. If this happens, make sure to set size to max_size.
Reported by Chris West.
Previously, this function checked for connection local address existence
and returned error if it was missing. Now a new address is assigned in this
case making it possible to call this function not only for accepted connections.
It appeared that the NGX_HAVE_AIO_SENDFILE macro was defined regardless of
the "--with-file-aio" configure option and the NGX_HAVE_FILE_AIO macro.
Now they are related.
Additionally, fixed one macro.
This reduces layering violation and simplifies the logic of AIO preread, since
it's now triggered by the send chain function itself without falling back to
the copy filter. The context of AIO operation is now stored per file buffer,
which makes it possible to properly handle cases when multiple buffers come
from different locations, each with its own configuration.
The mtx->wait counter was not decremented if we were able to obtain the lock
right after incrementing it. This resulted in unneeded sem_post() calls,
eventually leading to EOVERFLOW errors being logged, "sem_post() failed
while wake shmtx (75: Value too large for defined data type)".
To close the race, mtx->wait is now decremented if we obtain the lock right
after incrementing it in ngx_shmtx_lock(). The result can become -1 if a
concurrent ngx_shmtx_unlock() decrements mtx->wait before the added code does.
However, that only leads to one extra iteration in the next call of
ngx_shmtx_lock().
The use_temp_path http cache feature is now implemented using a separate temp
hierarchy in cache directory. Prefix-based temp files are no longer needed.
The original check for NGX_AGAIN was surplus, since the function returns
only NGX_OK or NGX_ERROR. Now it looks similar to other places.
No functional changes.
In 954867a2f0a6, we switched to using resolver node as the timer event data.
This broke debug event logging.
Replaced now unused ngx_resolver_ctx_t.ident with ngx_resolver_node_t.ident
so that ngx_event_ident() extracts something sensible when accessing
ngx_resolver_node_t as ngx_connection_t.
In 954867a2f0a6, we switched to using resolver node as the
timer event data, so make sure we do not free resolver node
memory until the corresponding timer is deleted.
If a syslog daemon is restarted and the unix socket is used, further logging
might stop to work. In case of send error, socket is closed, forcing
a reconnection at the next logging attempt.
The ngx_cycle->log is used when sending the message. This allows to log syslog
send errors in another log.
Logging to syslog after its cleanup handler has been executed was prohibited.
Previously, this was possible from ngx_destroy_pool(), which resulted in error
messages caused by attempts to write into the closed socket.
The "processing" flag is renamed to "busy" to better match its semantics.
In theory, this can provide a bit better distribution of latencies.
Also it simplifies the code, since ngx_queue_t is now used instead
of custom implementation.
RFC3986 says that, for consistency, URI producers and normalizers
should use uppercase hexadecimal digits for all percent-encodings.
This is also what modern web browsers and other tools use.
Using lowercase hexadecimal digits makes it harder to interact with
those tools in case when use of the percent-encoded URI is required,
for example when $request_uri is part of the cache key.
Signed-off-by: Piotr Sikora <piotr@cloudflare.com>
The check became meaningless after refactoring in 2a92804f4109.
With the loop currently in place, "current" can't be NULL, hence
the check can be dropped.
Additionally, the local variable "current" was removed to
simplify code, and pool->current now used directly instead.
Found by Coverity (CID 714236).
This isn't really important as configuration testing shortly ends with
a process termination which will free all sockets, though Coverity
complains.
Prodded by Coverity (CID 400872).
Large allocations from a slab pool result in free page blocks being fragmented,
eventually leading to a situation when no further allocation larger than a page
size are possible from the pool. While this isn't a problem for nginx itself,
it is known to be bad for various 3rd party modules. Fix is to merge adjacent
blocks of free pages in the ngx_slab_free_pages() function.
Prodded by Wandenberg Peixoto and Yichun Zhang.
Previous code failed to properly restore cf->conf_file in case of
ngx_close_file() errors, potentially resulting in double free of
cf->conf_file->buffer->start.
Found by Coverity (CID 1087507).
The flag allows to suppress "ngx_slab_alloc() failed: no memory" messages
from a slab allocator, e.g., if an LRU expiration is used by a consumer
and allocation failures aren't fatal.
The flag is now used in the SSL session cache code, and in the limit_req
module.
Client address specified in the PROXY protocol header is now
saved in the $proxy_protocol_addr variable and can be used in
the realip module.
This is currently not implemented for mail.
Linux returns EOPNOTSUPP for non-TCP sockets and ENOPROTOOPT for TCP
sockets, because getsockopt(TCP_FASTOPEN) is not implemented so far.
While there, lower the log level from ALERT to NOTICE to match other
getsockopt() failures.
Signed-off-by: Piotr Sikora <piotr@cloudflare.com>
Backed out 05a56ebb084a, as it turns out that kernel can return connections
without any delay if syncookies are used. This basically means we can't
assume anything about connections returned with deferred accept set.
To solve original problem the 05a56ebb084a tried to solve, i.e. to don't
wait longer than needed if a connection was accepted after deferred accept
timeout, this patch changes a timeout set with setsockopt(TCP_DEFER_ACCEPT)
to 1 second, unconditionally. This is believed to be enough for speed
improvements, and doesn't imply major changes to timeouts used.
Note that before 2.6.32 connections were dropped after a timeout. Though
it is believed that 1s is still appropriate for kernels before 2.6.32,
as previously tcp_synack_retries controlled the actual timeout and 1s results
in more than 1 minute actual timeout by default.
Previously pool->current wasn't moved back to pool, resulting in blocks
not used for further allocations if pool->current was already moved at the
time of ngx_reset_pool(). Additionally, to preserve logic of moving
pool->current, the p->d.failed counters are now properly cleared. While
here, pool->chain is also cleared.
This change is essentially a nop with current code, but generally improves
things.
Fallback to synchronous sendfile() now only done on 3rd EBUSY without
any progress in a row. Not falling back is believed to be better
in case of occasional EBUSY, though protection is still needed to
make sure there will be no infinite loop.
Stricten response header checks: ensure that reserved bits are zeroes,
and that the opcode is "standard query".
Fixed the "zero-length domain name in DNS response" condition.
Renamed ngx_resolver_query_t to ngx_resolver_hdr_t as it describes
the header that is common to DNS queries and answers.
Replaced the magic number 12 by the size of the header structure.
The other changes are self-explanatory.
Recent Linux versions started to return EOPNOTSUPP to getsockopt() calls
on unix sockets, resulting in log pollution on binary upgrade. Such errors
are silently ignored now.
The accept_filter and deferred options were not applied to sockets
that were added to configuration during binary upgrade cycle.
Signed-off-by: Piotr Sikora <piotr@cloudflare.com>
This patch fixes incorrect handling of auto redirect in configurations
like:
location /0 { }
location /a- { }
location /a/ { proxy_pass ... }
With previously used sorting, this resulted in the following locations
tree (as "-" is less than "/"):
"/a-"
"/0" "/a/"
and a request to "/a" didn't match "/a/" with auto_redirect, as it
didn't traverse relevant tree node during lookup (it tested "/a-",
then "/0", and then falled back to null location).
To preserve locale use for non-ASCII characters on case-insensetive
systems, libc's tolower() used.
Found by using auth_basic.t from mdounin nginx-tests under valgrind.
==10470== Invalid write of size 1
==10470== at 0x43603D: ngx_crypt_to64 (ngx_crypt.c:168)
==10470== by 0x43648E: ngx_crypt (ngx_crypt.c:153)
==10470== by 0x489D8B: ngx_http_auth_basic_crypt_handler (ngx_http_auth_basic_module.c:297)
==10470== by 0x48A24A: ngx_http_auth_basic_handler (ngx_http_auth_basic_module.c:240)
==10470== by 0x44EAB9: ngx_http_core_access_phase (ngx_http_core_module.c:1121)
==10470== by 0x44A822: ngx_http_core_run_phases (ngx_http_core_module.c:895)
==10470== by 0x44A932: ngx_http_handler (ngx_http_core_module.c:878)
==10470== by 0x455EEF: ngx_http_process_request (ngx_http_request.c:1852)
==10470== by 0x456527: ngx_http_process_request_headers (ngx_http_request.c:1283)
==10470== by 0x456A91: ngx_http_process_request_line (ngx_http_request.c:964)
==10470== by 0x457097: ngx_http_wait_request_handler (ngx_http_request.c:486)
==10470== by 0x4411EE: ngx_epoll_process_events (ngx_epoll_module.c:691)
==10470== Address 0x5866fab is 0 bytes after a block of size 27 alloc'd
==10470== at 0x4A074CD: malloc (vg_replace_malloc.c:236)
==10470== by 0x43B251: ngx_alloc (ngx_alloc.c:22)
==10470== by 0x421B0D: ngx_malloc (ngx_palloc.c:119)
==10470== by 0x421B65: ngx_pnalloc (ngx_palloc.c:147)
==10470== by 0x436368: ngx_crypt (ngx_crypt.c:140)
==10470== by 0x489D8B: ngx_http_auth_basic_crypt_handler (ngx_http_auth_basic_module.c:297)
==10470== by 0x48A24A: ngx_http_auth_basic_handler (ngx_http_auth_basic_module.c:240)
==10470== by 0x44EAB9: ngx_http_core_access_phase (ngx_http_core_module.c:1121)
==10470== by 0x44A822: ngx_http_core_run_phases (ngx_http_core_module.c:895)
==10470== by 0x44A932: ngx_http_handler (ngx_http_core_module.c:878)
==10470== by 0x455EEF: ngx_http_process_request (ngx_http_request.c:1852)
==10470== by 0x456527: ngx_http_process_request_headers (ngx_http_request.c:1283)
==10470==
The same path names with different "data" context should not be allowed.
In particular it rejects configurations like this:
proxy_cache_path /var/cache/ keys_zone=one:10m max_size=1g inactive=5m;
proxy_cache_path /var/cache/ keys_zone=two:20m max_size=4m inactive=30s;
Casts between pointers and integers produce warnings on size mismatch. To
silence them, cast to (u)intptr_t should be used. Prevoiusly, casts to
ngx_(u)int_t were used in some cases, and several ngx_int_t expressions had
no casts.
As of now it's mostly style as ngx_int_t is defined as intptr_t.
Several warnings silenced, notably (ngx_socket_t) -1 is now checked
on socket operations instead of -1, as ngx_socket_t is unsigned on win32
and gcc complains on comparison.
With this patch, it's now possible to compile nginx using mingw gcc,
with options we normally compile on win32.
Precompiled headers are disabled as they lead to internal compiler errors
with long configure lines. Couple of false positive warnings silenced.
Various win32 typedefs are adjusted to work with Open Watcom C 1.9 headers.
With this patch, it's now again possible to compile nginx using owc386,
with options we normally compile on win32 minus ipv6 and ssl.
It was introduced in Linux 2.6.39, glibc 2.14 and allows to obtain
file descriptors without actually opening files. Thus made it possible
to traverse path with openat() syscalls without the need to have read
permissions for path components. It is effectively emulates O_SEARCH
which is missing on Linux.
O_PATH is used in combination with O_RDONLY. The last one is ignored
if O_PATH is used, but it allows nginx to not fail when it was built on
modern system (i.e. glibc 2.14+) and run with a kernel older than 2.6.39.
Then O_PATH is unknown to the kernel and ignored, while O_RDONLY is used.
Sadly, fstat() is not working with O_PATH descriptors till Linux 3.6.
As a workaround we fallback to fstatat() with the AT_EMPTY_PATH flag
that was introduced at the same time as O_PATH.
While ngx_get_full_name() might have a bit more descriptive arguments,
the ngx_conf_full_name() is generally easier to use when parsing
configuration and limits exposure of cycle->prefix / cycle->conf_prefix
details.
If a relative path is set by variables, then the ngx_conf_full_name()
function was called while processing requests, which causes allocations
from the cycle pool.
A new function that takes pool as an argument was introduced.
This is done by passing AI_ADDRCONFIG to getaddrinfo().
On Linux, setting net.ipv6.conf.all.disable_ipv6 to 1 will now be
respected.
On FreeBSD, AI_ADDRCONFIG filtering is currently implemented by
attempting to create a datagram socket for the corresponding family,
which succeeds even if the system doesn't in fact have any addresses
of that family configured. That is, if the system with IPv6 support
in the kernel doesn't have IPv6 addresses configured, AI_ADDRCONFIG
will filter out IPv6 only inside a jail without IPv6 addresses or
with IPv6 disabled.
The call to ngx_sock_ntop() in ngx_connection_local_sockaddr() might be
performed with the uninitialized "len" variable. The fix is to initialize
variable to the size of corresponding socket address type.
The issue was introduced in commit 05ba5bce31e0.
On Linux, sockaddr length is required to process unix socket addresses properly
due to unnamed sockets (which don't have sun_path set at all) and abstract
namespace sockets.
When several "error_log" directives are specified in the same configuration
block, logs are written to all files with a matching log level.
All logs are stored in the singly-linked list that is sorted by log level in
the descending order.
Specific debug levels (NGX_LOG_DEBUG_HTTP,EVENT, etc.) are not supported
if several "error_log" directives are specified. In this case all logs
will use debug level that has largest absolute value.
The cycle->new_log->log_level should only be initialized by ngx_init_cycle()
if no error logs were found in the configuration. This move allows to get rid
of extra initialization in ngx_error_log().
If "stderr" was specified in one of the "error_log" directives,
stderr is not redirected to the first error_log on startup,
configuration reload, and reopening log files.
On win32 stderr was not redirected into a file specified by "error_log"
while reopening files. Fix is to use platform-independent functions to
work with stderr, as already used by ngx_init_cycle() and main() since
rev. d8316f307b6a.
It is now a syntax error if tokens passed to a custom configuration
handler are terminated by "{".
The following incorrect configuration is now properly rejected:
map $v $v2 {
a b {
c d {
e f {
}
On Win32 platforms 0 is used to indicate errors in file operations, so
comparing against -1 is not portable.
This was not much of an issue in patched code, since only ngx_fd_info() test
is actually reachable on Win32 and in worst case it might result in bogus
error log entry.
Patch by Piotr Sikora.
And corresponding variable $connections_waiting was added.
Previously, waiting connections were counted as the difference between
active connections and the sum of reading and writing connections.
That made it impossible to count more than one request in one connection
as reading or writing (as is the case for SPDY).
Also, we no longer count connections in handshake state as waiting.
The c->single_connection was intended to be used as lock mechanism
to serialize modifications of request object from several threads
working with client and upstream connections. The flag is redundant
since threads in nginx have never been used that way.
Note: use of {SHA} passwords is discouraged as {SHA} password scheme is
vulnerable to attacks using rainbow tables. Use of {SSHA}, $apr1$ or
crypt() algorithms as supported by OS is recommended instead.
The {SHA} password scheme support is added to avoid the need of changing
the scheme recorded in password files from {SHA} to {SSHA} because such
a change hides security problem with {SHA} passwords.
Patch by Louis Opter, with minor changes.
Upstreams created by "proxy_pass" with IP address and no port were
broken in 1.3.10, by not initializing port in u->sockaddr.
API change: ngx_parse_url() was modified to always initialize port
(in u->sockaddr and in u->port), even for the u->no_resolve case;
ngx_http_upstream() and ngx_http_upstream_add() were adopted.
Uninitialized pointer may result in arbitrary segfaults if access_log is used
without buffer and without variables in file path.
Patch by Tatsuhiko Kubo (ticket #268).
The code refactored in a way to call custom handler that can do appropriate
cleanup work (if any), like flushing buffers, finishing compress streams,
finalizing connections to log daemon, etc..
This includes "debug_connection", upstreams, "proxy_pass", etc.
(ticket #92)
To preserve compatibility, "listen" specified with a domain name
selects the first IPv4 address, if available. If not available,
the first IPv6 address will be used (ticket #186).
The URL parsing code is not expected to initialize port from default port
when in "no_resolve" mode. This got broken in r4671 for the case of IPv6
literals.
The ngx_write_fd() and ngx_read_fd() functions return -1 in case of error,
so the incorrect comparison with NGX_FILE_ERROR (which is 0 on windows
platforms) might result in inaccurate error message in the error log.
Also the ngx_errno global variable is being set only if the returned value
is -1.
nginx doesn't allow the same shared memory zone to be used for different
purposes, but failed to check this on reconfiguration. If a shared memory
zone was used for another purpose in the new configuration, nginx attempted
to reuse it and crashed.
This includes the ssl_stapling_responder directive (defaults to OCSP
responder set in certificate's AIA extension).
OCSP response for a given certificate is requested once we get at least
one connection with certificate_status extension in ClientHello, and
certificate status won't be sent in the connection in question. This due
to limitations in the OpenSSL API (certificate status callback is blocking).
Note: SSL_CTX_use_certificate_chain_file() was reimplemented as it doesn't
allow to access the certificate loaded via SSL_CTX.
The "include" directive should be able to include multiple files if
given a filename mask. Fixed this to work for "include" directives
inside the "map" or "types" blocks. The "include" directive inside
the "geo" block is still not fixed.