SQLAlchemy tracking column modifications


Recently I've been playing around with tracking record/field level modifications in SQLAlchemy, here's what I've learnt.

Tracking record created at / updated at times

A common pattern in SQLAlchemy is to track the created_at & updated_at values of individual records.

Typically this is achieved with a mixin as follows:

class TimestampMixin(object):
    created_at: Mapped[dt.datetime] = mapped_column(DateTime, default=func.now())
    updated_at: Mapped[dt.datetime] = mapped_column(
        DateTime, default=func.now(), onupdate=func.now()
    )


class User(TimestampMixin, Base):
    __tablename__ = "user_account"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(30))
    fullname: Mapped[Optional[str]]

Here's an example of this in practice:

>>> session = Session(engine)
>>> spongebob = User(name="spongebob", fullname="Spongebob Squarepants")
>>> session.add(spongebob)
>>> session.commit()

>>> spongebob.created_at
datetime.datetime(2023, 6, 22, 18, 17, 52)

>>> spongebob.updated_at
datetime.datetime(2023, 6, 22, 18, 17, 52)

>>> spongebob.name = "Spongebob"
>>> session.commit()

>>> spongebob.updated_at
datetime.datetime(2023, 6, 22, 18, 18, 11)

Tracking individual field edit times

What if we needed to keep track of when each individual field was updated/changed independently?

There's a million ways to achieve this, but a simple mechanism I came up with is to use a mixin that saves individual field changes to a JSON column on each table

class ChangeTrackingMixin:
    changes: Mapped[Dict] = mapped_column(JSON, default=JSON.NULL)

    def __setattr__(self, attr, value):
        if attr != "changes" and attr in self.__table__.c.keys():
            self._record_field_change(attr, value)
        super().__setattr__(attr, value)

    def _record_field_change(self, attr, new_value):
        if self.changes is JSON.NULL or self.changes is None:
            self.changes = {}
        self.changes[attr] = {
            "value": new_value,
            "updated_at": dt.datetime.now().isoformat(),
        }
        flag_modified(self, "changes")

    def get_updated_at(self, attr):
        updated_at = self.changes.get(attr, {}).get("updated_at")
        if updated_at:
            return dt.datetime.fromisoformat(updated_at)

How it works

The 'magic' here is this section:

def __setattr__(self, attr, value):
    if attr != "changes" and attr in self.__table__.c.keys():
        self._record_field_change(attr, value)
    super().__setattr__(attr, value)

This __setattr__ method gets called every time we try to modify the attribute of a user record.

If we happen to be editing one of the keys in the user table (ignoring the "changes" field):

if attr != "changes" and attr in self.__table__.c.keys():

Then we call some extra bookkeeping logic:

self._record_field_change(attr, value)

Then fall back to calling the original logic to update the field

super().__setattr__(attr, value)


Usage

To lookup the individual edit time of a field you can write

>>> obj.get_updated_at('field_name')

You can see this in action below:

>>> session = Session(engine)
>>> spongebob = User(name="spongebob", fullname="Spongebob Squarepants")
>>> session.add(spongebob)
>>> session.commit()

>>> spongebob.created_at
datetime.datetime(2023, 6, 22, 19, 6, 11)

>>> spongebob.updated_at
datetime.datetime(2023, 6, 22, 19, 6, 11)

# Update the name (uppercase 'S')
>>> spongebob.name = "Spongebob"
>>> session.commit()

# Reflects the new timestamp of when 'name' was updated
>>> spongebob.get_updated_at("name")
datetime.datetime(2023, 6, 22, 16, 6, 53, 350103)

# Note 'fullname' still retains its earlier value
>>> spongebob.changes
{'name': {'value': 'Spongebob', 'updated_at': '2023-06-22T16:06:53.350103'}, 'fullname': {'value': 'Spongebob Squarepants', 'updated_at': '2023-06-22T16:06:04.176241'}}


Downsides

No server time

A major issue with this approach is we're no longer using the server time func.now(). Instead we're using the local time of the Python environment executing the code.

dt.datetime.now().isoformat()

Unfortunately this means that the updated_at timestamp will be different for each dirty field in the session when committed. This is demonstrated below:

>>> session = Session(engine)
>>> spongebob = User(name="spongebob", fullname="Spongebob Squarepants")
>>> session.add(spongebob)
>>> session.commit()

# Both these values are the same (computed server side)
>>> spongebob.created_at
datetime.datetime(2023, 6, 22, 19, 6, 11)

>>> spongebob.updated_at
datetime.datetime(2023, 6, 22, 19, 6, 11)

# Each of these individual field values are different :(
>>> spongebob.get_updated_at("fullname")
datetime.datetime(2023, 6, 22, 16, 6, 4, 176241)

>>> spongebob.get_updated_at("name")
datetime.datetime(2023, 6, 22, 16, 6, 4, 176218)

Doesn't work with update statements

Mutating the record with an update will not propagate changes to the JSON column timestamps:

>>> session = Session(engine)
>>> spongebob = User(name="spongebob", fullname="Spongebob Squarepants")
>>> session.add(spongebob)
>>> session.commit()

>>> spongebob.get_updated_at('name')
datetime.datetime(2023, 6, 22, 16, 38, 25, 595457)

>>> session.execute(update(User).values(name="Spongebob"))
<sqlalchemy.engine.cursor.CursorResult object at 0x7f2cd49657f0>

>>> session.refresh(spongebob)

# This value is is the same as before!!!
>>> spongebob.get_updated_at('name')
datetime.datetime(2023, 6, 22, 16, 38, 25, 595457)

To compare, using an update statement does update record level updated_at field from the TimestampMixin

>>> session = Session(engine)
>>> spongebob = User(name="spongebob", fullname="Spongebob Squarepants")
>>> session.add(spongebob)
>>> session.commit()

>>> spongebob.updated_at
datetime.datetime(2023, 6, 22, 19, 35, 52)

>>> session.execute(update(User).values(name="Spongebob"))
<sqlalchemy.engine.cursor.CursorResult object at 0x7f35d8f697f0>

>>> session.refresh(spongebob)

# Value is updated!
>>> spongebob.updated_at
datetime.datetime(2023, 6, 22, 19, 36, 10)

Conclusion

Perhaps there's a clever mechanism with event listeners that avoids these drawbacks but I've yet to figure it out.

If you happen to know and I haven't updated this post, please let me know what I'm missing!