Discussion:
Dealing with WP cron's raciness
David Anderson
2013-09-18 11:06:32 UTC
Permalink
Hi,

The WP cron system suffers from race conditions. Is this known by core
developers, and how do they feel about it? i.e. Is it here to stay, or
are there plans to improve it?

Easy way to reproduce (every time):

1) Load up the machine with your MySQL server on so that it's fully
loaded (so that there's some latency - this is the crucial bit):
(e.g. run a few of these: while true; do wget -O -
http://localhost/mydevwebsite; done )

2) Create a simple mu-plugin:

add_action('crontest', 'testing_crontest_da');
function testing_crontest_da() { error_log('cron test!'); }

3) Create a few processes that are continually calling wp-cron.php (two
or more):
while true; do wget -O - http://localhost/mydevwebsite/wp-cron.php; done

4) Schedule a single cron event:

<?php require('wp-load.php'); wp_schedule_single_event(time()+5,
'crontest'); ?>

The result will be that the 'listener' in 2) logs as many lines as you
had tasks calling wp-cron.php in 3). i.e. the action gets called that
many times.

This isn't a problem for scheduled tasks that are performing cleanup
jobs (e.g. cache purges). But for something like a scheduled backup,
it's annoying - you get two backups. For a task that sends out report
emails, you get multiple reports. etc. Of course, each plugin that
performs such tasks could write its own locking system... but these
would be likely to each have their own ad hoc quirks. It'd be much
better if WP's cron system was less racy.

Thoughts?

David
--
WordShell - WordPress fast from the CLI - www.wordshell.net
Alex King
2013-09-18 18:24:37 UTC
Permalink
WP-CRON uses a "run at least once" model - you need to account for this in your code. In our Social plugin we implemented a semaphore to handle the queue of things we needed to do *only once*, but that could take longer than the PHP timeout (multiple API calls).

Feel free to learn from/borrow that code:

https://github.com/crowdfavorite/wp-social

Cheers,
--Alex

http://alexking.org | http://crowdfavorite.com
Hi,
The WP cron system suffers from race conditions. Is this known by core developers, and how do they feel about it? i.e. Is it here to stay, or are there plans to improve it?
(e.g. run a few of these: while true; do wget -O - http://localhost/mydevwebsite; done )
add_action('crontest', 'testing_crontest_da');
function testing_crontest_da() { error_log('cron test!'); }
while true; do wget -O - http://localhost/mydevwebsite/wp-cron.php; done
<?php require('wp-load.php'); wp_schedule_single_event(time()+5, 'crontest'); ?>
The result will be that the 'listener' in 2) logs as many lines as you had tasks calling wp-cron.php in 3). i.e. the action gets called that many times.
This isn't a problem for scheduled tasks that are performing cleanup jobs (e.g. cache purges). But for something like a scheduled backup, it's annoying - you get two backups. For a task that sends out report emails, you get multiple reports. etc. Of course, each plugin that performs such tasks could write its own locking system... but these would be likely to each have their own ad hoc quirks. It'd be much better if WP's cron system was less racy.
Thoughts?
David
--
WordShell - WordPress fast from the CLI - www.wordshell.net
_______________________________________________
wp-hackers mailing list
http://lists.automattic.com/mailman/listinfo/wp-hackers
David Anderson
2013-09-18 22:31:52 UTC
Permalink
Post by Alex King
WP-CRON uses a "run at least once" model - you need to account for
this in your code. In our Social plugin we implemented a semaphore to
handle the queue of things we needed to do *only once*, but that could
take longer than the PHP timeout (multiple API calls). Feel free to
learn from/borrow that code: https://github.com/crowdfavorite/wp-social
Cheers,
--Alex
Hi Alex,

Thank you... it's nice... a note for people finding this in Google: from
my reading and testing, I learned that your code is designed not so much
to only do the job only once, but to make sure that it can only do it
once *at the same time* (i.e. no concurrent runs). Successive runs were
possible, if the task was short-running enough and released the lock
early enough. (Which is not likely to happen with my example of a backup
job, but could happen with my other example of aggregating and
amalgamating statistics). I suppose that if that's likely to be a
problem, then the task could impose an arbitrary minimum time before
releasing the lock.

Again, for others - the relevant part of the code in the plugin that
Alex pointed to is in lib/social/semaphore.php. The simplest use case is
then this (though of course you'd want to rename the class to avoid
collisions):

$semaphore = Social_Semaphore::factory();
if ($semaphore->lock()) {
do_something();
sleep(5); # Perhaps - but it'd be more sophisticated to time
yourself and only sleep as much as needed
$semaphore->unlock();
}

David
Otto
2013-09-18 22:59:10 UTC
Permalink
3) Create a few processes that are continually calling wp-cron.php (two or
while true; do wget -O - http://localhost/mydevwebsite/wp-cron.php; done
Why are you hitting the wp-cron.php file directly? WordPress doesn't
do that normally.

WordPress normally sets the doing_wp_cron transient with the current
time(), then calls wp-cron.php?doing_wp_cron=time. This is a locking
mechanism to prevent the subsequent calls after that from running.

This is not *perfect*, but it is pretty good and prevents most of the
problem you're describing. If you're calling wp-cron.php directly,
you're bypassing this mechanism, and the delay you've added to the
database will cause the locking mechanism to take longer, causing the
race issues.

-Otto
David Anderson
2013-09-19 00:07:31 UTC
Permalink
Post by Otto
Why are you hitting the wp-cron.php file directly? WordPress doesn't
do that normally.
WordPress normally sets the doing_wp_cron transient with the current
time(), then calls wp-cron.php?doing_wp_cron=time. This is a locking
mechanism to prevent the subsequent calls after that from running.
This is not *perfect*, but it is pretty good and prevents most of the
problem you're describing. If you're calling wp-cron.php directly,
you're bypassing this mechanism, and the delay you've added to the
database will cause the locking mechanism to take longer, causing the
race issues.
Hi Otto,

Thanks... I was aware of that; the reason I was hitting wp-cron.php
directly in the tests I mentioned in my early mail was for two reasons:
1) On my reading of the code, wp-cron.php tries to apply the same lock
(the section beginning with the comment "// Use global $doing_wp_cron
lock otherwise use the GET lock. If no lock, trying grabbing a new
lock."). 2) In my tests, going indirectly (hitting the site's home page)
had the same effect - i.e. the action would still consistently run
multiple times due to raciness. It was just slightly less racy, but I
surmised that that was due to a side-effect of timing due to the extra
HTTP round-trip, rather than due to any extra locking taking place
(which I surmised because of 1) - I can't see any extra locking take
place - only a difference as to the timing of *when* the locking takes
place).

David
--
WordShell - WordPress fast from the CLI - www.wordshell.net
Brian Hogg
2013-09-19 00:19:17 UTC
Permalink
I'm guessing you're trying to press the limits of the lock with the example, but if you disable automatic wp cron (with a flag in wp-config.php, DISABLE_WP_CRON define I believe) you would be able to control how many times wp cron was fired under normal conditions, for example once per minute in a system cron job you create. Unless for some reason you need to fire it more often?
Post by David Anderson
Post by Otto
Why are you hitting the wp-cron.php file directly? WordPress doesn't
do that normally.
WordPress normally sets the doing_wp_cron transient with the current
time(), then calls wp-cron.php?doing_wp_cron=time. This is a locking
mechanism to prevent the subsequent calls after that from running.
This is not *perfect*, but it is pretty good and prevents most of the
problem you're describing. If you're calling wp-cron.php directly,
you're bypassing this mechanism, and the delay you've added to the
database will cause the locking mechanism to take longer, causing the
race issues.
Hi Otto,
Thanks... I was aware of that; the reason I was hitting wp-cron.php directly in the tests I mentioned in my early mail was for two reasons: 1) On my reading of the code, wp-cron.php tries to apply the same lock (the section beginning with the comment "// Use global $doing_wp_cron lock otherwise use the GET lock. If no lock, trying grabbing a new lock."). 2) In my tests, going indirectly (hitting the site's home page) had the same effect - i.e. the action would still consistently run multiple times due to raciness. It was just slightly less racy, but I surmised that that was due to a side-effect of timing due to the extra HTTP round-trip, rather than due to any extra locking taking place (which I surmised because of 1) - I can't see any extra locking take place - only a difference as to t
he timing of *when* the locking takes place).
Post by David Anderson
David
--
WordShell - WordPress fast from the CLI - www.wordshell.net
_______________________________________________
wp-hackers mailing list
http://lists.automattic.com/mailman/listinfo/wp-hackers
Otto
2013-09-19 00:27:21 UTC
Permalink
Post by David Anderson
Hi Otto,
Thanks... I was aware of that; the reason I was hitting wp-cron.php directly
in the tests I mentioned in my early mail was for two reasons: 1) On my
reading of the code, wp-cron.php tries to apply the same lock (the section
beginning with the comment "// Use global $doing_wp_cron lock otherwise use
the GET lock. If no lock, trying grabbing a new lock."). 2) In my tests,
going indirectly (hitting the site's home page) had the same effect - i.e.
the action would still consistently run multiple times due to raciness. It
was just slightly less racy, but I surmised that that was due to a
side-effect of timing due to the extra HTTP round-trip, rather than due to
any extra locking taking place (which I surmised because of 1) - I can't see
any extra locking take place - only a difference as to the timing of *when*
the locking takes place).
Well, like I said, it's not perfect, but it's still going to be
slightly better as to the timing.

Hitting it directly means that the lock logic is running inside the
hit to wp-cron.php. So if you have multiple instances running
concurrently, then they can both get the same "lock" because of timing
differences between the check and the setting of the lock.

Running it normally means that the lock is grabbed before wp-cron.php
is ever called, in the wp-includes/cron.php file. Now, if you have two
hits in the same second, or hits on multiple servers and the like,
then you can still get the race condition to occur and have the main
cron.php script call the wp-cron.php script twice with the same
values, but this is better for the most part... for a given definition
of "better". :)

If you're expecting more than 1 hit per second on a normal basis, and
it's a problem for your particular setup, then you'd probably be
better served setting DISABLE_WP_CRON and creating a real cron job to
call it every so often instead. It doesn't even have to be a wget to
the URL, you can run it with a direct "php wp-cron.php" command just
fine, for non-multisite instances.

-Otto
Dion Hulse (dd32)
2013-09-19 13:19:57 UTC
Permalink
3) Create a few processes that are continually calling wp-cron.php (two or
while true; do wget -O - http://localhost/mydevwebsite/**wp-cron.php<http://localhost/mydevwebsite/wp-cron.php>;
done
I just wanted to point out, that doing this bypasses the major component of
WP Cron which attempts to allow things to only run once.. doing so you'll
almost be guaranteed to have duplicate events fired.

When WordPress fires off the cron, it creates a locking transient (which
should be pretty reliable when using an external object cache, not awesome,
but still pretty good, when not) which is based on microtime(true) which
includes microseconds: 1379596515.2753798961639404296875
The request is then made via
wp-cron.php?doing_wp_cron=1379596515.2753798961639404296875. If the GET arg
doesn't match exactly what's in the transient, that request doesn't perform
any operations.

In the past (pre-3.6 I believe?), the transient was based on time(), so any
requests made within a 1second timeframe would trigger duplicate cron
tasks.

It's definitely not foolproof, but it's pretty much as locking-as-possible
that WordPress can achieve in a shared hosting environment at present..
anything that absolutely requires only one process to be running at a time
should definately add something of their own if they don't trust core's
ability.

It should also be noted, that if you have a long-running job (like some
backup plugins), and you have a server where someone triggers wp-cron.php
manually with a real cron every 1/2/5minutes, then you'll most likely get
duplicate events running.. simply because your event is only de-queued
AFTER it's run, so if it takes 5 minutes to run, and another request it
made 2 minutes into that window, you'll get two jobs

If you want to test WordPress's cron locks out, instead of using the above
code, do something like this in a stand alone script instead:
<?php include 'wp-load.php'; while(true) { spawn_cron(); } ?>
Dion Hulse (dd32)
2013-09-19 13:24:09 UTC
Permalink
Just to confirm, 3.6 doesn't seem to have changed anything (wow, how time
flies)

I was thinking of 3.3 and 3.4:
http://core.trac.wordpress.org/ticket/17462
http://core.trac.wordpress.org/ticket/19700
Post by Dion Hulse (dd32)
Post by David Anderson
3) Create a few processes that are continually calling wp-cron.php (two
while true; do wget -O - http://localhost/mydevwebsite/**wp-cron.php<http://localhost/mydevwebsite/wp-cron.php>;
done
I just wanted to point out, that doing this bypasses the major component
of WP Cron which attempts to allow things to only run once.. doing so
you'll almost be guaranteed to have duplicate events fired.
When WordPress fires off the cron, it creates a locking transient (which
should be pretty reliable when using an external object cache, not awesome,
but still pretty good, when not) which is based on microtime(true) which
includes microseconds: 1379596515.2753798961639404296875
The request is then made via
wp-cron.php?doing_wp_cron=1379596515.2753798961639404296875. If the GET arg
doesn't match exactly what's in the transient, that request doesn't perform
any operations.
In the past (pre-3.6 I believe?), the transient was based on time(), so
any requests made within a 1second timeframe would trigger duplicate cron
tasks.
It's definitely not foolproof, but it's pretty much as locking-as-possible
that WordPress can achieve in a shared hosting environment at present..
anything that absolutely requires only one process to be running at a time
should definately add something of their own if they don't trust core's
ability.
It should also be noted, that if you have a long-running job (like some
backup plugins), and you have a server where someone triggers wp-cron.php
manually with a real cron every 1/2/5minutes, then you'll most likely get
duplicate events running.. simply because your event is only de-queued
AFTER it's run, so if it takes 5 minutes to run, and another request it
made 2 minutes into that window, you'll get two jobs
If you want to test WordPress's cron locks out, instead of using the above
<?php include 'wp-load.php'; while(true) { spawn_cron(); } ?>
Paul Menard
2013-09-19 13:34:48 UTC
Permalink
Post by Dion Hulse (dd32)
It should also be noted, that if you have a long-running job (like some
backup plugins), and you have a server where someone triggers wp-cron.php
manually with a real cron every 1/2/5minutes, then you'll most likely get
duplicate events running.. simply because your event is only de-queued
AFTER it's run, so if it takes 5 minutes to run, and another request it
made 2 minutes into that window, you'll get two jobs
Just my thoughts. This is really no different than dealing directly with UNIX cron calling shell scripts. If you know there is potential that another process will start before the previous process finishes you implement a locking system of your own design. OR as been illustrated in this thread you have users who setup UNIX crontabs to call wp-cron.php every 2 minutes. Things you just cannot control.

I would not personally like to see this added to WordPress core. I actually have reasons in the past to rely on the multi-instance cron processing for some sampling plugins I wrote. Each instance needed to be independent. I don't want WP to be the traffic cop.
David Anderson
2013-09-20 12:46:56 UTC
Permalink
Post by Dion Hulse (dd32)
I just wanted to point out, that doing this bypasses the major component of
WP Cron which attempts to allow things to only run once.. doing so you'll
almost be guaranteed to have duplicate events fired.
When WordPress fires off the cron, it creates a locking transient (which
should be pretty reliable when using an external object cache, not awesome,
but still pretty good, when not) which is based on microtime(true) which
includes microseconds: 1379596515.2753798961639404296875
The request is then made via
wp-cron.php?doing_wp_cron=1379596515.2753798961639404296875. If the GET arg
doesn't match exactly what's in the transient, that request doesn't perform
any operations.
Hi Dion,

Thanks for the feedback... I mentioned before a couple of things:

1) My motivation is to torture-test the worst-case scenarios, because
they are what generate support requests. A significant number (i.e.
significant enough to generate a flow of support requests, once you have
5,000 new installs a week) of hosting companies disable this mechanism
(in part, through disabling loopback HTTP connections), and instruct
their users to call wp-cron.php directly (via a scheduler in their
control panel). (I know that's "non-optimal". I know about
ALTERNATE_WP_CRON - as I say, I'm trying to deal with what users are
doing and likely to keep on doing, not what they ought to do).

2) I said before to Otto that from my reading of the code in
wp-cron.php, there's also an attempt to apply the same locking mechanism
you describe above in there. The only difference is that the loopback
HTTP introduces a slightly random timing delay, which fortuitously gives
the time delay needed for the locking mechanism to work - i.e. the
effect is a side-effect of the method,rather than procured directly by
it. Am I reading that wrong?
Post by Dion Hulse (dd32)
I actually have reasons in the past to rely on the multi-instance cron processing for some sampling plugins I wrote. Each instance needed to be independent. I don't want WP to be the traffic cop.
If you need to *rely* upon an action running twice, then surely you
should just schedule two actions? Note that I wasn't suggesting locking
at the *individual action* level but at the *cron queue processor* level.

Best wishes,
David
--
WordShell - WordPress fast from the CLI - www.wordshell.net
Dion Hulse (dd32)
2013-09-20 13:12:29 UTC
Permalink
Post by David Anderson
Post by Dion Hulse (dd32)
I just wanted to point out, that doing this bypasses the major component of
WP Cron which attempts to allow things to only run once.. doing so you'll
almost be guaranteed to have duplicate events fired.
When WordPress fires off the cron, it creates a locking transient (which
should be pretty reliable when using an external object cache, not awesome,
but still pretty good, when not) which is based on microtime(true) which
includes microseconds: 1379596515.**2753798961639404296875
The request is then made via
wp-cron.php?doing_wp_cron=**1379596515.**2753798961639404296875. If the
GET arg
doesn't match exactly what's in the transient, that request doesn't perform
any operations.
Hi Dion,
1) My motivation is to torture-test the worst-case scenarios, because they
are what generate support requests. A significant number (i.e. significant
enough to generate a flow of support requests, once you have 5,000 new
installs a week) of hosting companies disable this mechanism (in part,
through disabling loopback HTTP connections), and instruct their users to
call wp-cron.php directly (via a scheduler in their control panel). (I know
that's "non-optimal". I know about ALTERNATE_WP_CRON - as I say, I'm trying
to deal with what users are doing and likely to keep on doing, not what
they ought to do).
Yes, If you take the worst-possible case, and bypass the locking mechanisms
which WordPress has, then you will no doubt have more issues, it's really
as simple as that, and as a plugin developer you have to expect the worst
from a users configuration and program defensively.
WordPress can't work 100% under every condition, and will do it's best
under the conditions presented.

Unfortunately, WordPress only has a default cron timeout of 60 seconds when
you call wp-cron.php directly, so if you have a real cronjob running every
2 minutes, even if the first cron is still running, the 2nd will assume
that the first has timed out due to poor hosting configurations or a PHP
error - both conditions that it has to contend with as many hosts will kill
processes after 45~60 seconds still.

If you're calling wp-cron.php directly, and you want each process to hold
the lock longer, you can define 'WP_CRON_LOCK_TIMEOUT' in wp-config.php to
increase the time (it defaults to 60 as mentioned before).
It should be safe to increase it to 5 minutes (300) with minimal issues for
most people.
Post by David Anderson
2) I said before to Otto that from my reading of the code in wp-cron.php,
there's also an attempt to apply the same locking mechanism you describe
above in there. The only difference is that the loopback HTTP introduces a
slightly random timing delay, which fortuitously gives the time delay
needed for the locking mechanism to work - i.e. the effect is a side-effect
of the method,rather than procured directly by it. Am I reading that wrong?
The loopback request doesn't really cause any benefit due to delays or
anything, the only benefit is that in doing so, it's locking the cron
request for the DURATION of the cronjob through the random url value, where
as when you call wp-cron.php directly, you're only locking it for
WP_CRON_LOCK_TIMEOUT seconds.


I don't say this lightly, I'm just trying to explain how the current system
works, if you have some ideas on how to improve the experience for users,
while retaining compatibility with shitty hosts and bad PHP configurations
(and even worse plugins) then I'm all ears, and would love to see you
submit a trac ticket with a patch or suggestions.
David Anderson
2013-09-20 13:52:43 UTC
Permalink
Post by Dion Hulse (dd32)
The loopback request doesn't really cause any benefit due to delays or
anything, the only benefit is that in doing so, it's locking the cron
request for the DURATION of the cronjob through the random url value, where
as when you call wp-cron.php directly, you're only locking it for
WP_CRON_LOCK_TIMEOUT seconds.
I've read through the code a lot to try to understand this, but I
don't... can you give me a clue?

1) "random url value" - where is this? I can't see any random element
being added in the code, or in my web logs. I just see:

POST /wp-cron.php?doing_wp_cron=(microtime)

2) I also don't see how locking "for the DURATION of the cronjob" is
happening. The lock is applied via the transient 'doing_cron'; but I see
nothing in wp-includes/cron.php that is stopping that transient being
over-written once WP_CRON_LOCK_TIMEOUT seconds have passed, and nothing
that continuously updates it whilst the cron job progresses. I see these
lines of code, in which the comment seems to say that this is where it
is done:

// don't run if another process is currently running it or more
than once every 60 sec.
if ( $lock + WP_CRON_LOCK_TIMEOUT > $gmt_time )
return;

The comment says "don't run if another process is currently running" -
but I don't see the connection between the comment and the code; the
code only seems to do "don't run if ... more than once every 60
(WP_CRON_LOCK_TIMEOUT) sec".

What am I missing?

David
--
WordShell - WordPress fast from the CLI - www.wordshell.net
Loading...