The code failed to ensure that "s" is within the buffer passed for
parsing when checking for "ms", and this resulted in unexpected errors when
parsing non-null-terminated strings with trailing "m". The bug manifested
itself when the expires directive was used with variables.
Found by Roman Arutyunyan.
The code for displaying version info and configuration info seemed to be
cluttering up the main function. I was finding it hard to read main. This
extracts out all of the logic for displaying version and configuration info
into its own function, thus making main easier to read.
A configuration like
server { server_name .foo^@; }
server { server_name .foo; }
resulted in a segmentation fault during construction of server names hash.
Reported by Markus Linnala.
Found with afl-fuzz.
Iterating through all connections takes a lot of CPU time, especially
with large number of worker connections configured. As a result
nginx processes used to consume CPU time during graceful shutdown.
To mitigate this we now only do a full scan for idle connections when
shutdown signal is received.
Transitions of connections to idle ones are now expected to be
avoided if the ngx_exiting flag is set. The upstream keepalive module
was modified to follow this.
If nginx was used under OpenVZ and a container with nginx was suspended
and resumed, configuration tests started to fail because of EADDRINUSE
returned from listen() instead of bind():
# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: [emerg] listen() to 0.0.0.0:80, backlog 511 failed (98: Address already in use)
nginx: configuration file /etc/nginx/nginx.conf test failed
With this change EADDRINUSE errors returned by listen() are handled
similarly to errors returned by bind(), and configuration tests work
fine in the same environment:
# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
More details about OpenVZ suspend/resume bug:
https://bugzilla.openvz.org/show_bug.cgi?id=2470
If the -T option is passed, additionally to configuration test, configuration
files are output to stdout.
In the debug mode, configuration files are kept in memory and can be accessed
using a debugger.
The function is now called ngx_parse_http_time(), and can be used by
any code to parse HTTP-style date and time. In particular, it will be
used for OCSP stapling.
For compatibility, a macro to map ngx_http_parse_time() to the new name
provided for a while.
When configured, an individual listen socket on a given address is
created for each worker process. This allows to reduce in-kernel lock
contention on configurations with high accept rates, resulting in better
performance. As of now it works on Linux and DragonFly BSD.
Note that on Linux incoming connection requests are currently tied up
to a specific listen socket, and if some sockets are closed, connection
requests will be reset, see https://lwn.net/Articles/542629/. With
nginx, this may happen if the number of worker processes is reduced.
There is no such problem on DragonFly BSD.
Based on previous work by Sepherosa Ziehau and Yingqi Lu.
Two mechanisms are implemented to make it possible to store pointers
in shared memory on Windows, in particular on Windows Vista and later
versions with ASLR:
- The ngx_shm_remap() function added to allow remapping of a shared memory
zone to the address originally used for it in the master process. While
important, it doesn't solve the problem by itself as in many cases it's
not possible to use the address because of conflicts with other
allocations.
- We now create mappings at the same address in all processes by starting
mappings at predefined addresses normally unused by newborn processes.
These two mechanisms combined allow to use shared memory on Windows
almost without problems, including reloads.
Based on the patch by Sergey Brester:
http://mailman.nginx.org/pipermail/nginx-devel/2015-April/006836.html
Similar to ngx_http_file_cache_set_slot(), the last component of file->name
with a fixed length of 10 bytes, as generated in ngx_create_temp_path(), is
used as a source for the names of intermediate subdirectories with each one
taking its own part. Ensure that the sum of specified levels with slashes
fits into the length (ticket #731).
Example of usage:
error_log memory:16m debug;
This allows to configure debug logging with minimum impact on performance.
It's especially useful when rare crashes are experienced under high load.
The log can be extracted from a coredump using the following gdb script:
set $log = ngx_cycle->log
while $log->writer != ngx_log_memory_writer
set $log = $log->next
end
set $buf = (ngx_log_memory_buf_t *) $log->wdata
dump binary memory debug_log.txt $buf->start $buf->end
Initial size as calculated from the number of elements may be bigger
than max_size. If this happens, make sure to set size to max_size.
Reported by Chris West.
Previously, this function checked for connection local address existence
and returned error if it was missing. Now a new address is assigned in this
case making it possible to call this function not only for accepted connections.
It appeared that the NGX_HAVE_AIO_SENDFILE macro was defined regardless of
the "--with-file-aio" configure option and the NGX_HAVE_FILE_AIO macro.
Now they are related.
Additionally, fixed one macro.
This reduces layering violation and simplifies the logic of AIO preread, since
it's now triggered by the send chain function itself without falling back to
the copy filter. The context of AIO operation is now stored per file buffer,
which makes it possible to properly handle cases when multiple buffers come
from different locations, each with its own configuration.
The mtx->wait counter was not decremented if we were able to obtain the lock
right after incrementing it. This resulted in unneeded sem_post() calls,
eventually leading to EOVERFLOW errors being logged, "sem_post() failed
while wake shmtx (75: Value too large for defined data type)".
To close the race, mtx->wait is now decremented if we obtain the lock right
after incrementing it in ngx_shmtx_lock(). The result can become -1 if a
concurrent ngx_shmtx_unlock() decrements mtx->wait before the added code does.
However, that only leads to one extra iteration in the next call of
ngx_shmtx_lock().
The use_temp_path http cache feature is now implemented using a separate temp
hierarchy in cache directory. Prefix-based temp files are no longer needed.
The original check for NGX_AGAIN was surplus, since the function returns
only NGX_OK or NGX_ERROR. Now it looks similar to other places.
No functional changes.
In 954867a2f0a6, we switched to using resolver node as the timer event data.
This broke debug event logging.
Replaced now unused ngx_resolver_ctx_t.ident with ngx_resolver_node_t.ident
so that ngx_event_ident() extracts something sensible when accessing
ngx_resolver_node_t as ngx_connection_t.
In 954867a2f0a6, we switched to using resolver node as the
timer event data, so make sure we do not free resolver node
memory until the corresponding timer is deleted.
If a syslog daemon is restarted and the unix socket is used, further logging
might stop to work. In case of send error, socket is closed, forcing
a reconnection at the next logging attempt.
The ngx_cycle->log is used when sending the message. This allows to log syslog
send errors in another log.
Logging to syslog after its cleanup handler has been executed was prohibited.
Previously, this was possible from ngx_destroy_pool(), which resulted in error
messages caused by attempts to write into the closed socket.
The "processing" flag is renamed to "busy" to better match its semantics.
In theory, this can provide a bit better distribution of latencies.
Also it simplifies the code, since ngx_queue_t is now used instead
of custom implementation.
RFC3986 says that, for consistency, URI producers and normalizers
should use uppercase hexadecimal digits for all percent-encodings.
This is also what modern web browsers and other tools use.
Using lowercase hexadecimal digits makes it harder to interact with
those tools in case when use of the percent-encoded URI is required,
for example when $request_uri is part of the cache key.
Signed-off-by: Piotr Sikora <piotr@cloudflare.com>
The check became meaningless after refactoring in 2a92804f4109.
With the loop currently in place, "current" can't be NULL, hence
the check can be dropped.
Additionally, the local variable "current" was removed to
simplify code, and pool->current now used directly instead.
Found by Coverity (CID 714236).
This isn't really important as configuration testing shortly ends with
a process termination which will free all sockets, though Coverity
complains.
Prodded by Coverity (CID 400872).
Large allocations from a slab pool result in free page blocks being fragmented,
eventually leading to a situation when no further allocation larger than a page
size are possible from the pool. While this isn't a problem for nginx itself,
it is known to be bad for various 3rd party modules. Fix is to merge adjacent
blocks of free pages in the ngx_slab_free_pages() function.
Prodded by Wandenberg Peixoto and Yichun Zhang.
Previous code failed to properly restore cf->conf_file in case of
ngx_close_file() errors, potentially resulting in double free of
cf->conf_file->buffer->start.
Found by Coverity (CID 1087507).
The flag allows to suppress "ngx_slab_alloc() failed: no memory" messages
from a slab allocator, e.g., if an LRU expiration is used by a consumer
and allocation failures aren't fatal.
The flag is now used in the SSL session cache code, and in the limit_req
module.
Client address specified in the PROXY protocol header is now
saved in the $proxy_protocol_addr variable and can be used in
the realip module.
This is currently not implemented for mail.
Linux returns EOPNOTSUPP for non-TCP sockets and ENOPROTOOPT for TCP
sockets, because getsockopt(TCP_FASTOPEN) is not implemented so far.
While there, lower the log level from ALERT to NOTICE to match other
getsockopt() failures.
Signed-off-by: Piotr Sikora <piotr@cloudflare.com>
Backed out 05a56ebb084a, as it turns out that kernel can return connections
without any delay if syncookies are used. This basically means we can't
assume anything about connections returned with deferred accept set.
To solve original problem the 05a56ebb084a tried to solve, i.e. to don't
wait longer than needed if a connection was accepted after deferred accept
timeout, this patch changes a timeout set with setsockopt(TCP_DEFER_ACCEPT)
to 1 second, unconditionally. This is believed to be enough for speed
improvements, and doesn't imply major changes to timeouts used.
Note that before 2.6.32 connections were dropped after a timeout. Though
it is believed that 1s is still appropriate for kernels before 2.6.32,
as previously tcp_synack_retries controlled the actual timeout and 1s results
in more than 1 minute actual timeout by default.
Previously pool->current wasn't moved back to pool, resulting in blocks
not used for further allocations if pool->current was already moved at the
time of ngx_reset_pool(). Additionally, to preserve logic of moving
pool->current, the p->d.failed counters are now properly cleared. While
here, pool->chain is also cleared.
This change is essentially a nop with current code, but generally improves
things.
Fallback to synchronous sendfile() now only done on 3rd EBUSY without
any progress in a row. Not falling back is believed to be better
in case of occasional EBUSY, though protection is still needed to
make sure there will be no infinite loop.
Stricten response header checks: ensure that reserved bits are zeroes,
and that the opcode is "standard query".
Fixed the "zero-length domain name in DNS response" condition.
Renamed ngx_resolver_query_t to ngx_resolver_hdr_t as it describes
the header that is common to DNS queries and answers.
Replaced the magic number 12 by the size of the header structure.
The other changes are self-explanatory.
Recent Linux versions started to return EOPNOTSUPP to getsockopt() calls
on unix sockets, resulting in log pollution on binary upgrade. Such errors
are silently ignored now.
The accept_filter and deferred options were not applied to sockets
that were added to configuration during binary upgrade cycle.
Signed-off-by: Piotr Sikora <piotr@cloudflare.com>
This patch fixes incorrect handling of auto redirect in configurations
like:
location /0 { }
location /a- { }
location /a/ { proxy_pass ... }
With previously used sorting, this resulted in the following locations
tree (as "-" is less than "/"):
"/a-"
"/0" "/a/"
and a request to "/a" didn't match "/a/" with auto_redirect, as it
didn't traverse relevant tree node during lookup (it tested "/a-",
then "/0", and then falled back to null location).
To preserve locale use for non-ASCII characters on case-insensetive
systems, libc's tolower() used.
Found by using auth_basic.t from mdounin nginx-tests under valgrind.
==10470== Invalid write of size 1
==10470== at 0x43603D: ngx_crypt_to64 (ngx_crypt.c:168)
==10470== by 0x43648E: ngx_crypt (ngx_crypt.c:153)
==10470== by 0x489D8B: ngx_http_auth_basic_crypt_handler (ngx_http_auth_basic_module.c:297)
==10470== by 0x48A24A: ngx_http_auth_basic_handler (ngx_http_auth_basic_module.c:240)
==10470== by 0x44EAB9: ngx_http_core_access_phase (ngx_http_core_module.c:1121)
==10470== by 0x44A822: ngx_http_core_run_phases (ngx_http_core_module.c:895)
==10470== by 0x44A932: ngx_http_handler (ngx_http_core_module.c:878)
==10470== by 0x455EEF: ngx_http_process_request (ngx_http_request.c:1852)
==10470== by 0x456527: ngx_http_process_request_headers (ngx_http_request.c:1283)
==10470== by 0x456A91: ngx_http_process_request_line (ngx_http_request.c:964)
==10470== by 0x457097: ngx_http_wait_request_handler (ngx_http_request.c:486)
==10470== by 0x4411EE: ngx_epoll_process_events (ngx_epoll_module.c:691)
==10470== Address 0x5866fab is 0 bytes after a block of size 27 alloc'd
==10470== at 0x4A074CD: malloc (vg_replace_malloc.c:236)
==10470== by 0x43B251: ngx_alloc (ngx_alloc.c:22)
==10470== by 0x421B0D: ngx_malloc (ngx_palloc.c:119)
==10470== by 0x421B65: ngx_pnalloc (ngx_palloc.c:147)
==10470== by 0x436368: ngx_crypt (ngx_crypt.c:140)
==10470== by 0x489D8B: ngx_http_auth_basic_crypt_handler (ngx_http_auth_basic_module.c:297)
==10470== by 0x48A24A: ngx_http_auth_basic_handler (ngx_http_auth_basic_module.c:240)
==10470== by 0x44EAB9: ngx_http_core_access_phase (ngx_http_core_module.c:1121)
==10470== by 0x44A822: ngx_http_core_run_phases (ngx_http_core_module.c:895)
==10470== by 0x44A932: ngx_http_handler (ngx_http_core_module.c:878)
==10470== by 0x455EEF: ngx_http_process_request (ngx_http_request.c:1852)
==10470== by 0x456527: ngx_http_process_request_headers (ngx_http_request.c:1283)
==10470==
The same path names with different "data" context should not be allowed.
In particular it rejects configurations like this:
proxy_cache_path /var/cache/ keys_zone=one:10m max_size=1g inactive=5m;
proxy_cache_path /var/cache/ keys_zone=two:20m max_size=4m inactive=30s;
Casts between pointers and integers produce warnings on size mismatch. To
silence them, cast to (u)intptr_t should be used. Prevoiusly, casts to
ngx_(u)int_t were used in some cases, and several ngx_int_t expressions had
no casts.
As of now it's mostly style as ngx_int_t is defined as intptr_t.
Several warnings silenced, notably (ngx_socket_t) -1 is now checked
on socket operations instead of -1, as ngx_socket_t is unsigned on win32
and gcc complains on comparison.
With this patch, it's now possible to compile nginx using mingw gcc,
with options we normally compile on win32.
Precompiled headers are disabled as they lead to internal compiler errors
with long configure lines. Couple of false positive warnings silenced.
Various win32 typedefs are adjusted to work with Open Watcom C 1.9 headers.
With this patch, it's now again possible to compile nginx using owc386,
with options we normally compile on win32 minus ipv6 and ssl.
It was introduced in Linux 2.6.39, glibc 2.14 and allows to obtain
file descriptors without actually opening files. Thus made it possible
to traverse path with openat() syscalls without the need to have read
permissions for path components. It is effectively emulates O_SEARCH
which is missing on Linux.
O_PATH is used in combination with O_RDONLY. The last one is ignored
if O_PATH is used, but it allows nginx to not fail when it was built on
modern system (i.e. glibc 2.14+) and run with a kernel older than 2.6.39.
Then O_PATH is unknown to the kernel and ignored, while O_RDONLY is used.
Sadly, fstat() is not working with O_PATH descriptors till Linux 3.6.
As a workaround we fallback to fstatat() with the AT_EMPTY_PATH flag
that was introduced at the same time as O_PATH.
While ngx_get_full_name() might have a bit more descriptive arguments,
the ngx_conf_full_name() is generally easier to use when parsing
configuration and limits exposure of cycle->prefix / cycle->conf_prefix
details.
If a relative path is set by variables, then the ngx_conf_full_name()
function was called while processing requests, which causes allocations
from the cycle pool.
A new function that takes pool as an argument was introduced.
This is done by passing AI_ADDRCONFIG to getaddrinfo().
On Linux, setting net.ipv6.conf.all.disable_ipv6 to 1 will now be
respected.
On FreeBSD, AI_ADDRCONFIG filtering is currently implemented by
attempting to create a datagram socket for the corresponding family,
which succeeds even if the system doesn't in fact have any addresses
of that family configured. That is, if the system with IPv6 support
in the kernel doesn't have IPv6 addresses configured, AI_ADDRCONFIG
will filter out IPv6 only inside a jail without IPv6 addresses or
with IPv6 disabled.