Issue Details (XML | Word | Printable)

Key: MBS-5111
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Normal Normal
Assignee: Ian McEwen
Reporter: Oliver Charles
Votes: 1
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
MusicBrainz Server

Editor statistics data regularly falls out of cache, and cannot be recomputed

Created: 08/Aug/12 02:51 PM   Updated: 01/Oct/12 01:25 PM   Resolved: 01/Oct/12 01:25 PM
Component/s: None
Affects Version/s: None
Fix Version/s: 2012-10-01

Issue Links:
Duplicate
 


 Description  « Hide

The SQL to calculate statistics over our editor table takes too long to run, and thus the request times out:

Aug  8 06:59:10 totoro postgres[28158]: [2-1] ERROR:  canceling statement due to user request
Aug  8 06:59:10 totoro postgres[28158]: [2-2] STATEMENT:  SELECT editor, count(vote.id) FROM vote
Aug  8 06:59:10 totoro postgres[28158]: [2-3] #011                     JOIN editor ON vote.editor = editor.id
Aug  8 06:59:10 totoro postgres[28158]: [2-4] #011                     WHERE NOT superseded AND vote != -1
Aug  8 06:59:10 totoro postgres[28158]: [2-5] #011                       AND cast(privs AS bit(10)) & 2::bit(10) = 0::bit(10)
Aug  8 06:59:10 totoro postgres[28158]: [2-6] #011                     GROUP BY editor, editor.name
Aug  8 06:59:10 totoro postgres[28158]: [2-7] #011                     ORDER BY count(vote.id) DESC, musicbrainz_collate(editor.name)
Aug  8 06:59:10 totoro postgres[28158]: [2-8] #011                     LIMIT 25

As the job of a cache is simply to speed things up, but not be the primary source, this solution will not work, and we need to investigate either speeding these queries up, or storing the results. It may be good enough to simply calculate these stats once a day and add the to our statistics table.



Sort Order: Ascending order - Click to sort in descending order
Oliver Charles added a comment - 08/Aug/12 02:51 PM

I've created this in 2012-09-03 as it's somewhat of a regression/incomplete new feature.


Ian McEwen added a comment - 08/Aug/12 09:55 PM

IMO we should probably just store these in the statistic table and compute them daily like the others, since editor IDs are also numbers; graphing the stat would end up looking very silly, but that's okay given we don't need to add it to the main graph.


Oliver Charles added a comment - 21/Aug/12 01:20 PM

Hmm, how would we record the statistic? It would have to be stat.editor.top.$editor_id = $edit_count, which is a bit... weird. Whatcha think?


Ian McEwen added a comment - 21/Aug/12 03:36 PM

I'd say you'd need two stats per line: one that's for, say edit.editor.top.overall.$rank = $editor_id and one for count.edit.editor.top.overall.$rank = $edit_count. Then you'd fetch all stats, pull out edit.editor.top.overall.([0-9]+) mapped to count.edit.editor.top.overall.$1 and order by $1. Repeat for vote.editor.top.overall.$rank and whatever else.


Ian McEwen added a comment - 21/Aug/12 03:38 PM

Hence my comment about editor IDs just being integers – we can store them in the 'value' end of the equation, not just in the name. If we don't want to preserve history, we can set up some sort of overwrite and set them all to some dummy date, but just doing them as normal stats seems saner to me.


Oliver Charles added a comment - 29/Aug/12 10:15 AM

Moving to the next release as we're in freeze now.


Ian McEwen added a comment - 10/Sep/12 10:22 AM

Moving to next fix version, freeze is today and nobody seems to be working on this.


Ian McEwen added a comment - 11/Sep/12 09:11 AM