Discussion:
MySQL: Union vs global table
Dino Termini
2014-02-20 22:15:35 UTC
Permalink
Hi list,

I am working on a plugin that stores information in a table. In MU environments, each blog will have its own instance of that table (just like wp itself does for posts etc). Now, I am also being asked to provide a "network view" of that information. I evidently have two choices: UNION ALL on wp_*_mytable, or use a global table with blog_id. We may be talking about millions of rows per blog. So the latter approach would affect performance when analyzing the info for a given site (more frequent), I assume.

What would you do in this case?

Thanks
Dino
Chloé Desoutter
2014-02-20 22:36:50 UTC
Permalink
Hi Dino,

Fake a materialized view (by a CREATE OR REPLACE TABLE x AS SELECT) as
MySQL doesn't support them. Update it on a regular period. This will suck
the less.

Yours sincerely
Post by Dino Termini
Hi list,
I am working on a plugin that stores information in a table. In MU
environments, each blog will have its own instance of that table (just like
wp itself does for posts etc). Now, I am also being asked to provide a
"network view" of that information. I evidently have two choices: UNION ALL
on wp_*_mytable, or use a global table with blog_id. We may be talking
about millions of rows per blog. So the latter approach would affect
performance when analyzing the info for a given site (more frequent), I
assume.
What would you do in this case?
Thanks
Dino
_______________________________________________
wp-hackers mailing list
http://lists.automattic.com/mailman/listinfo/wp-hackers
Eric Hendrix
2014-02-20 22:43:20 UTC
Permalink
Agreed, updated as infrequently as daily. Either way your performance for
"millions of rows per blog" will take a serious hit - hope folks running
this are on their own server - not a particularly suitable situation for
VPS or otherwise, right?


On Thu, Feb 20, 2014 at 5:36 PM, Chloé Desoutter <
Post by Chloé Desoutter
Hi Dino,
Fake a materialized view (by a CREATE OR REPLACE TABLE x AS SELECT) as
MySQL doesn't support them. Update it on a regular period. This will suck
the less.
Yours sincerely
Post by Dino Termini
Hi list,
I am working on a plugin that stores information in a table. In MU
environments, each blog will have its own instance of that table (just
like
Post by Dino Termini
wp itself does for posts etc). Now, I am also being asked to provide a
"network view" of that information. I evidently have two choices: UNION
ALL
Post by Dino Termini
on wp_*_mytable, or use a global table with blog_id. We may be talking
about millions of rows per blog. So the latter approach would affect
performance when analyzing the info for a given site (more frequent), I
assume.
What would you do in this case?
Thanks
Dino
_______________________________________________
wp-hackers mailing list
http://lists.automattic.com/mailman/listinfo/wp-hackers
_______________________________________________
wp-hackers mailing list
http://lists.automattic.com/mailman/listinfo/wp-hackers
--
*Eric A. Hendrix*
***@gmail.com
(910) 644-8940

*"Non Timebo Mala"*
Dino Termini
2014-02-21 00:21:13 UTC
Permalink
So you're both saying that this would be better than putting all the records in a global table with blog_id?

Yes, people using this have their own server.

Thank you all,
Dino
Post by Eric Hendrix
Agreed, updated as infrequently as daily. Either way your performance for
"millions of rows per blog" will take a serious hit - hope folks running
this are on their own server - not a particularly suitable situation for
VPS or otherwise, right?
On Thu, Feb 20, 2014 at 5:36 PM, Chloé Desoutter <
Post by Chloé Desoutter
Hi Dino,
Fake a materialized view (by a CREATE OR REPLACE TABLE x AS SELECT)
as
Post by Chloé Desoutter
MySQL doesn't support them. Update it on a regular period. This will
suck
Post by Chloé Desoutter
the less.
Yours sincerely
Post by Dino Termini
Hi list,
I am working on a plugin that stores information in a table. In MU
environments, each blog will have its own instance of that table
(just
Post by Chloé Desoutter
like
Post by Dino Termini
wp itself does for posts etc). Now, I am also being asked to
provide a
UNION
Post by Chloé Desoutter
ALL
Post by Dino Termini
on wp_*_mytable, or use a global table with blog_id. We may be
talking
Post by Chloé Desoutter
Post by Dino Termini
about millions of rows per blog. So the latter approach would
affect
Post by Chloé Desoutter
Post by Dino Termini
performance when analyzing the info for a given site (more
frequent), I
Post by Chloé Desoutter
Post by Dino Termini
assume.
What would you do in this case?
Thanks
Dino
_______________________________________________
wp-hackers mailing list
http://lists.automattic.com/mailman/listinfo/wp-hackers
_______________________________________________
wp-hackers mailing list
http://lists.automattic.com/mailman/listinfo/wp-hackers
--
*Eric A. Hendrix*
(910) 644-8940
*"Non Timebo Mala"*
_______________________________________________
wp-hackers mailing list
http://lists.automattic.com/mailman/listinfo/wp-hackers
Morgan Tocker
2014-02-21 01:00:27 UTC
Permalink
Hi Dino,
Post by Dino Termini
I am working on a plugin that stores information in a table. In MU environments, each blog will have its own instance of that table (just like wp itself does for posts etc). Now, I am also being asked to provide a "network view" of that information. I evidently have two choices: UNION ALL on wp_*_mytable, or use a global table with blog_id. We may be talking about millions of rows per blog. So the latter approach would affect performance when analyzing the info for a given site (more frequent), I assume.
What would you do in this case?
I work for Oracle (MySQL team).

If you have an example query (and ideally can provide EXPLAIN <select statement> output), I’d be happy to tell you if it is efficient.

- Morgan
Dino Termini
2014-02-21 16:39:04 UTC
Permalink
Hi Morgan,

thank you so much for your availability. I will contact you in private
shortly.

Best,
Dino
Post by Chloé Desoutter
Hi Dino,
Post by Dino Termini
I am working on a plugin that stores information in a table. In MU environments, each blog will have its own instance of that table (just like wp itself does for posts etc). Now, I am also being asked to provide a "network view" of that information. I evidently have two choices: UNION ALL on wp_*_mytable, or use a global table with blog_id. We may be talking about millions of rows per blog. So the latter approach would affect performance when analyzing the info for a given site (more frequent), I assume.
What would you do in this case?
I work for Oracle (MySQL team).
If you have an example query (and ideally can provide EXPLAIN <select statement> output), I’d be happy to tell you if it is efficient.
- Morgan
_______________________________________________
wp-hackers mailing list
http://lists.automattic.com/mailman/listinfo/wp-hackers
Morgan Tocker
2014-02-24 15:11:59 UTC
Permalink
Hi wp-hackers,
thank you so much for your availability. I will contact you in private shortly.
With Dino’s permission I’m sharing the email he sent me and my reply -

———

Hi Dino,
thank you again for your time. Attached you can find the definition for the tables used by my plugin, for your reference.
It's basically a real time analytics tool for WordPress. The main table is wp_slim_stats, which contains all the pageviews.
In a MU environment, the plugin will create its tables for each new site (wp_1_slim_stats, wp_2_slim_stats, etc). Site admins can access reports for their own blogs.
SELECT t1.*, tb.*,tci.*
FROM wp_slim_stats t1
INNER JOIN wp_slim_browsers tb ON t1.browser_id = tb.browser_id
INNER JOIN wp_slim_content_info tci ON t1.content_info_id = tci.content_info_id
ORDER BY t1.dt desc
LIMIT 0, 50
The corresponding explain is attached as explain-select-1.png.
So here is how this is working:

MySQL examines table t1 (wp_slim_stats) first and applies a sort on dt (will be expensive if there are a lot of rows).
From the largest dt value until 50 is reached, it will very efficiently do a primary key lookup on tb (wp_slim_browsers) and then tci (wp_slim_content_info). As long as there is full participation (INNER JOIN always finds rows in this table, the joining is efficient).

Short story:
I think you need an index on wp_slim_stats.dt to avoid the sort - but otherwise it’s an efficient query.
Now, some users have requested to get a Network View of their reports, which basically combines all the pageviews to see what post is the most popular network-wide, etc.
In order to do that, I was thinking about two approaches: 1) do a UNION ALL of all those selects or 2) replace all the blog-specific tables with a global wp_slim_stats that includes a new column blog_id.
The latter seems to be easier to implement, but what about performances? Think about a network with 100 blogs that make 100k pageviews/month. The new global table would have 100x100k = 10M records.
You would not be UNION’ing ALL millions of rows though, you’d be UNIONing top 50 from all tables right? As long as you can use the indexes (like the query you specified), then the number of rows in the table doesn’t matter. I think as long as it is efficient enough, this is probably the more desired option?

In MySQL until 5.7 (currently in developer preview) UNION ALL needs to creates a temporary table to buffer results before sending them to the client (so there is a small level of inefficiency), so make sure that each query being unioned doesn’t have too many rows. More information on this optimization on my blog here:

http://www.tocker.ca/2013/12/10/testing-the-union-all-optimization-in-mysql-5.7-dmr3.html
Loading...