Commit e0c77eb3 authored by Georgios D. Tsoukalas

Fix serious inefficiency in pithos 0.13 migration

This fix is as it was run on production during the migration.

The migration looped over all rows (node, muser) from versions,
and updated muser in each one.

However, the (node, muser) tuples are not unique in the table,
and more importantly, there are far fewer muser values than nodes
(since there are by definition far fewer users than files).

Instead, we now loop over the distinct muser values
and issue an update statement for each one (which updates many more
than one row: one for every version this muser has produced).
parent 0a0abc65
@@ -118,17 +118,17 @@ def migrate(callback):
         bar.next()
     bar.finish()
-    s = sa.select([v.c.node, v.c.muser])
-    versions = connection.execute(s).fetchall()
+    s = sa.select([v.c.muser]).distinct()
+    musers = connection.execute(s).fetchall()
     bar = IncrementalBar('Migrating version modification users...',
-                         max=len(versions)
+                         max=len(musers)
                          )
-    for node, muser in versions:
+    for muser, in musers:
         match = callback(muser)
         if not match:
             bar.next()
             continue
-        u = v.update().where(v.c.node == node).values({'muser':match})
+        u = v.update().where(v.c.muser == muser).values({'muser': match})
         connection.execute(u)
         bar.next()
     bar.finish()
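
The effect of looping over distinct muser values can be reproduced in isolation. The following is a minimal sketch, not the migration itself: it uses the standard-library sqlite3 module instead of SQLAlchemy, and the table contents, the `migrate_musers` helper, and the `mapping` dictionary standing in for `callback` are all hypothetical.

```python
import sqlite3


def migrate_musers(connection, callback):
    """Rewrite versions.muser using one UPDATE per distinct muser value.

    Returns the number of UPDATE statements issued.
    """
    rows = connection.execute("SELECT DISTINCT muser FROM versions").fetchall()
    statements = 0
    for (muser,) in rows:
        match = callback(muser)
        if not match:
            continue  # no mapping for this user; leave its rows untouched
        connection.execute(
            "UPDATE versions SET muser = ? WHERE muser = ?", (match, muser)
        )
        statements += 1
    return statements


# Hypothetical data: five version rows produced by three users.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE versions (node INTEGER, muser TEXT)")
connection.executemany(
    "INSERT INTO versions VALUES (?, ?)",
    [(1, "alice"), (1, "alice"), (2, "alice"), (3, "bob"), (4, "carol")],
)

# Hypothetical stand-in for callback: carol has no match and is skipped.
mapping = {"alice": "uuid-a", "bob": "uuid-b"}
issued = migrate_musers(connection, mapping.get)
result = connection.execute(
    "SELECT node, muser FROM versions ORDER BY rowid"
).fetchall()
```

With this sample data, two UPDATE statements cover all four affected rows, whereas a per-row loop would issue one statement for each of the five rows it visits.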