Skip to content

API pattern for “setdefault”-like operations #40

@encukou

Description

@encukou

IMO, we need an API pattern for “setdefault”-like operations, which would be safer under nogil.

I was thinking about these proposed new APIs:

The first case is like dict.get. IMO, it's good to return 0 and set *result to NULL if the item is not found. If the caller needs a default value, they can create it when needed. However, if they then want to insert the default back into the dict, in nogil they run into a race condition: the item might have already been inserted.

The second case is like dict.setdefault, doing some non-trivial work if the item is not found (in this case, creating a new module and putting it in sys.modules). That should be atomic, so it makes sense to me to use a default -- either take it as an argument, or us an implied empty value. Taking a default argument is not free, though: the caller needs to create it even if it's unused.

The trouble with PyImport_ImportOrAddModule (and other “setdefault”-style API with implied defaults) is that the newly created default sometimes needs further customization, and that will not be atomic under nogil: other threads have a chance to see a uninitialized module.

IMO, we need an API pattern for “setdefault”-like operations that allows creating a custom new value only if needed, and either

  • locks the container while that value is being constructed (which sounds like a deadlock magnet), or
  • handles the race by discarding a losing thread's newly made default:
# rough pseudocode
value = get(key)
if value is NULL:
    new = create_a_default()
    value = setdefault(key, new)   # note that `value == new` is not guaranteed
    ddecref(new)

Note that this get+setdefault is different from get+set (which we use in a bunch of places, for example in https://github.com/python/cpython/blob/d2f305dfd183025a95592319b280fcf4b20c8694/Python/pylifecycle.c#L2276-L2285 -- didn't check if that instance is safe).
With get+set, if there's a race, the first thread's value is discarded -- but that value might already have been retrieved by other code.

cc @colesbury

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions