1
0
mirror of https://github.com/ARM-software/devlib.git synced 2024-10-06 02:40:50 +01:00

memoized: fix for bug where wrong cached results were returned

memoized() used id() to get "unique" representations for arguments
passed to a function, in order to ensure that results for the same
function called with different arguments are treated differently.

However, Python object identities are only guaranteed to be unique at a
particular point in time. It is possible than a particular ID gets
reused for a different object if the previous object associated with
that ID no longer exists. In particular, in CPython, the IDs are just
addresses of the corresponding PyObject's in memory. If a PyObject gets
garbage collected, another object may get allocated in its place, and
the new object will "inherit" the ID by virtue of being in the same
memory location. If the new object is then used to call a memoized
function that was previously called with the old object, the old cached
result will be incorrectly returned.

To get around this issue, the cache is now indexed not only by the ID of
an object but also but the first few bytes of its value.

NOTE: there is still a potential issue it two relatively large objects
gets allocated in the same place and happen to share the first few
(currently 32) bytes, and are then both used as parameters to the same
memoized function. The only way around that is to hash the entire value
of the object. However, given the performance penalty that would incur
for larger object, and the unlikeliness of this situation actually
arising, it is currently deemed not worth it.
This commit is contained in:
Sergei Trofimov 2016-09-06 09:57:58 +01:00
parent 390a544a92
commit 6d854fd4dc

View File

@ -29,6 +29,7 @@ import subprocess
import pkgutil
import logging
import random
import ctypes
from operator import itemgetter
from itertools import groupby
from functools import partial
@ -541,13 +542,36 @@ def reset_memo_cache():
__memo_cache.clear()
def __get_memo_id(obj):
"""
An object's id() may be re-used after an object is freed, so it's not
sufficiently unique to identify params for the memo cache (two different
params may end up with the same id). this attempts to generate a more unique
ID string.
"""
obj_id = id(obj)
obj_pyobj = ctypes.cast(obj_id, ctypes.py_object)
# TODO: Note: there is still a possibility of a clash here. If Two
# different objects get assigned the same ID, an are large and are
# identical in the first thirty two bytes. This shouldn't be much of an
# issue in the current application of memoizing Target calls, as it's very
# unlikely that a target will get passed large params; but may cause
# problems in other applications, e.g. when memoizing results of operations
# on large arrays. I can't really think of a good way around that apart
# form, e.g., md5 hashing the entire raw object, which will have an
# undesirable impact on performance.
num_bytes = min(ctypes.sizeof(obj_pyobj), 32)
obj_bytes = ctypes.string_at(ctypes.addressof(obj_pyobj), num_bytes)
return '{}/{}'.format(obj_id, obj_bytes)
@wrapt.decorator
def memoized(wrapped, instance, args, kwargs):
"""A decorator for memoizing functions and methods."""
func_id = repr(wrapped)
def memoize_wrapper(*args, **kwargs):
id_string = func_id + ','.join([str(id(a)) for a in args])
id_string = func_id + ','.join([__get_memo_id(a) for a in args])
id_string += ','.join('{}={}'.format(k, v)
for k, v in kwargs.iteritems())
if id_string not in __memo_cache: