Currently when edits are submitted by normal users we capture the JSON version of the POI before and after the change. Editors can then approve or reject the change; if nobody acts, the change is auto-approved after a few days.
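As a rough sketch of that queue item, here's what a before/after snapshot with timed auto-approval might look like. The class name, field names, and the three-day window are all illustrative assumptions, not our actual schema:

```python
import json
from datetime import datetime, timedelta, timezone

AUTO_APPROVE_AFTER = timedelta(days=3)  # assumption: "a few days"

class EditQueueItem:
    """Hypothetical queued POI edit: full JSON snapshots before and after."""

    def __init__(self, poi_id, before: dict, after: dict):
        self.poi_id = poi_id
        self.before_json = json.dumps(before)  # POI as it was
        self.after_json = json.dumps(after)    # POI as submitted
        self.submitted_at = datetime.now(timezone.utc)
        self.status = "pending"  # pending | approved | rejected

    def review(self, approve: bool):
        """An editor explicitly approves or rejects the change."""
        self.status = "approved" if approve else "rejected"

    def auto_approve_if_due(self, now=None):
        """If nobody has acted within the window, approve automatically."""
        now = now or datetime.now(timezone.utc)
        if self.status == "pending" and now - self.submitted_at >= AUTO_APPROVE_AFTER:
            self.status = "approved"
        return self.status
```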
After a while we archive the edit queue history for a POI and may eventually discard old edits.
If we were going to implement a new process for differential edits I’d want us to use something well established, so we don’t have to build a bunch of custom stuff: maybe git, maybe dat (https://dat.foundation/), maybe some other open data versioning tool.
The main issue with bulk updates is that the submitter may or may not have bothered to de-duplicate their data or compare it to what we already have. There is also data model transformation necessary in most cases to translate connector types, field names etc. We have a slight gap in our data model when it comes to grouping EVSEs that is more obvious when compared to OCPI, and it can be an issue or not depending on what we’re importing.
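The transformation step usually boils down to a couple of lookup tables. A minimal sketch, where every field name, connector code, and ID below is made up for illustration and does not reflect our real schema:

```python
# Hypothetical mappings from an external source's naming to our model.
FIELD_MAP = {"lat": "Latitude", "lng": "Longitude", "station_name": "Title"}
# External connector code -> internal ConnectionTypeID (IDs invented here).
CONNECTOR_MAP = {"CCS": 33, "CHAdeMO": 2, "Type2": 25}

def translate(record: dict) -> dict:
    """Rename fields and translate connector codes for one imported record."""
    out = {}
    for key, value in record.items():
        if key == "connector":
            # None means an unrecognised connector type that needs manual review.
            out["ConnectionTypeID"] = CONNECTOR_MAP.get(value)
        else:
            out[FIELD_MAP.get(key, key)] = value
    return out
```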
The vast majority of data we see elsewhere in other open data sources (gov repositories etc.) has large segments of low-quality data (POIs in the wrong country, invalid latitude/longitude etc.), so ideally if we know someone is trying to make a batch update we would first feed it through some filters. Currently we do this with our own imports, but I don’t plan to continue these as they are not really sustainable (they are moving targets).
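The kind of pre-import filter I mean is cheap to write; something along these lines, where the field names and checks are illustrative (a real version would also check coordinates against the claimed country, e.g. via bounding boxes):

```python
def basic_filters(poi: dict) -> list:
    """Return a list of problems found; an empty list means the record passes."""
    problems = []
    lat, lng = poi.get("Latitude"), poi.get("Longitude")
    if lat is None or not -90 <= lat <= 90:
        problems.append("invalid latitude")
    if lng is None or not -180 <= lng <= 180:
        problems.append("invalid longitude")
    if lat == 0 and lng == 0:
        problems.append("null island")  # a common junk-data signature
    if not poi.get("Title"):
        problems.append("missing title")
    return problems
```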
Operators etc. could use our normal API to submit one change at a time, but they would inevitably try to replace all the fields rather than doing a differential update of just one or two, overwriting anything real users may have contributed (e.g. corrections). Currently, if a user edits something that was previously from an import we take control of that POI from then on and don’t attempt to import anything over the top of it (it’s been this way since about 2013).
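A differential update would only touch fields the source actually changed relative to its own last submission, leaving user corrections alone. A sketch of that merge rule (all names here are illustrative, and it assumes we kept the source's previous submission as a baseline):

```python
def differential_update(current: dict, submitted: dict, source_baseline: dict) -> dict:
    """Apply only the fields the source actually changed since its last
    submission (source_baseline), so user-contributed edits in `current`
    survive an operator re-submitting their full record."""
    merged = dict(current)
    for field, new_value in submitted.items():
        if source_baseline.get(field) != new_value:
            # The source genuinely changed this field: take their value.
            merged[field] = new_value
        # Otherwise the source is just re-sending old data: keep ours.
    return merged
```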
It’s a fairly classic data management problem of which data source is the primary and which is just a copy. Ultimately we would rather be the primary, or lock records so that they can only be updated by the source (which was considered in the past but means you have to keep mistakes and can’t fix them directly).
I’m very open to suggestions and even the development of new tools/systems, but if it requires me to do anything then it’s probably not going to happen unfortunately.