Possible source of Duplicate Stations

Hello OCM,

I see quite a few duplicate stations where the station with the lower OCM number has no Operational Status set. I am wondering whether new imports from the AFDC automatically generate a duplicate because the existing station at that location cannot be found when this field is blank.

Here’s one example, but there are many.

OCM-121591 (Walmart 124 Little Rock) has no Operational Status.

OCM-122748 is a duplicate of the above.

Andy


Thanks for raising that Andy, I’ve not seen that issue before.

It looks like many date from 2019 or thereabouts, with about 5K items affected (so about 2.5K duplicates in that data set of 67K). I’ll have a look at whether we can clean these up as a batch at some point.

I have been finding that the higher-numbered OCM IDs are generally closer to the true location of the chargers.

I’ve got an automated duplicate detector and have been removing duplicates by hand in the US. I would be happy to send a list of duplicate OCM pairs, …

Andy

Thanks @AndyK, I’ve applied an automated cleanup for most of these (highest ID preferred, except where there are user comments or photos), plus some manual work on about 60 items.
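One reading of that cleanup rule is sketched below. It is not the actual OCM cleanup code: it assumes records shaped like the OCM API's POI JSON (fields `ID`, `DataProviderID`, `DataProvidersReference`, `UserComments`, `MediaItems`), groups items that share a data provider reference, and keeps the highest ID unless some item in the group has user comments or photos, in which case the highest-ID item among those is kept instead. The helper names are hypothetical.

```python
from collections import defaultdict


def has_user_content(poi):
    """True if the item has user comments or photos (media items)."""
    return bool(poi.get("UserComments")) or bool(poi.get("MediaItems"))


def choose_keeper(group):
    """Prefer items with user content; among candidates, keep the highest ID."""
    with_content = [p for p in group if has_user_content(p)]
    candidates = with_content or group
    return max(candidates, key=lambda p: p["ID"])


def plan_cleanup(pois):
    """Return (keep_ids, remove_ids) for items sharing a data provider reference."""
    groups = defaultdict(list)
    for poi in pois:
        ref = poi.get("DataProvidersReference")
        if ref:  # items without a provider reference are left alone
            groups[(poi["DataProviderID"], ref)].append(poi)

    keep, remove = set(), set()
    for group in groups.values():
        if len(group) < 2:
            continue  # not a duplicate
        keeper = choose_keeper(group)
        keep.add(keeper["ID"])
        remove.update(p["ID"] for p in group if p["ID"] != keeper["ID"])
    return keep, remove
```

Returning a plan rather than deleting anything directly leaves room for the manual review step mentioned above.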

Okay, cool, thank you!

I must have a slightly different algorithm because I’m still detecting some duplicates.

For instance, I’m correctly flagging OCM-113045 and OCM-123618 at 1340 North Fourth Street, Wytheville, Virginia 24382 as duplicates. My algorithm looks for the same Region (e.g., Texas or British Columbia), the same Provider, and locations within 100 meters of each other. Note that I’m hand-checking these, so I’m not concerned about false matches, which I have seen.
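A minimal sketch of a detector along those lines follows. Andy's actual code isn't shown, so this is an assumption-laden illustration: the record keys `id`, `region`, `provider`, `lat`, and `lng` are hypothetical, and distance is computed with the haversine great-circle formula. Stations are bucketed by (region, provider) and every pair within a bucket is compared.

```python
import math
from collections import defaultdict
from itertools import combinations

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two lat/lng points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def candidate_duplicates(stations, max_distance_m=100.0):
    """Yield (id_a, id_b, distance_m) for stations with the same region and
    provider that lie within max_distance_m of each other."""
    buckets = defaultdict(list)
    for s in stations:
        buckets[(s["region"], s["provider"])].append(s)
    for group in buckets.values():
        # Pairwise comparison is O(n^2) per bucket, which is fine for
        # per-region, per-provider groups that are hand-checked afterwards.
        for a, b in combinations(group, 2):
            d = haversine_m(a["lat"], a["lng"], b["lat"], b["lng"])
            if d <= max_distance_m:
                yield a["id"], b["id"], d
```

Bucketing first keeps the pairwise step small; a spatial index would only be worth adding for very large buckets.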

Some of the duplicate stations are pretty far apart. For instance, OCM-280697 and OCM-282032 at 13306 Saint John Church Rd, Stony Creek, VA, 23882 are 70 meters from each other.

Thanks!
Andy

Thanks @AndyK, the changes I made focused only on duplicated data provider references, i.e. two items appearing in the imported data with the same data provider reference.

We do have generic de-duplication on imports (I think within a 50 m radius, but I can’t remember without looking at the code). However, we don’t enforce de-duplication for edits: if someone manually submits a location, we ask whether they are sure they’re not submitting a duplicate, but we won’t prevent it if they insist.

We do have an admin report that analyses the data set for duplicates, but there are so many potential duplicates that it’s basically useless.

Looking at the detailed JSON data it’s possible to determine how the OCM-280697 / OCM-282032 duplicate came about.

  1. Construction of the station on the NE corner of the Davis Travel Center was underway in July 2023, according to the Google Maps camera car images.

  2. A user added OCM-280697 on 2023-11-29 using http://openchargemap.org. The lat/lng is about 70 meters away from the chargers’ true location.

  3. An automated data import from afdc.energy.gov occurred on 2023-12-04 (about a week later) and created duplicate OCM-282032.

So the user created the first entry and the automated import of AFDC created the duplicate.

The addresses are also slightly different.
USER: 13306 Saint John Church Rd, Stony Creek, VA, 23882
AFDC: 13306 St John Church Rd, Stony Creek, VA, 23882
GOOGLE: 13306 St John Church Rd, Stony Creek, VA 23882

I’m guessing that the user’s location being off by more than 50 meters, together with the slight address discrepancies, led to the automated importer not detecting the duplicate.
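For illustration, the three address variants above collapse to the same key under a simple normalization: lowercase, drop punctuation, and map common words to abbreviations. This is a hypothetical sketch, not how the OCM importer compares addresses; the `ABBREVIATIONS` table is a small assumed sample, and mapping both "Saint" and "Street" to "st" is deliberate because the result is only a comparison key.

```python
import re

# Hypothetical, deliberately small token map; a real matcher would need more.
ABBREVIATIONS = {
    "saint": "st",
    "street": "st",
    "road": "rd",
    "avenue": "ave",
    "north": "n",
    "south": "s",
    "east": "e",
    "west": "w",
}


def normalize_address(address):
    """Lowercase, drop punctuation, and collapse common abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


for addr in (
    "13306 Saint John Church Rd, Stony Creek, VA, 23882",  # user
    "13306 St John Church Rd, Stony Creek, VA, 23882",     # AFDC
    "13306 St John Church Rd, Stony Creek, VA 23882",      # Google
):
    # each line prints: 13306 st john church rd stony creek va 23882
    print(normalize_address(addr))
```

A matcher could fall back to comparing normalized addresses when two same-provider candidates sit just outside the distance radius, as in this 70 m case.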

Since, as you mention in this thread, the cleanup “focused on duplicated data provider references”, it’s probably best to retain the AFDC-generated station data, which carries the data provider reference that future imports match against. However, I’ll attempt to sleuth out a few more duplicates to see if this is a consistent pattern, especially across providers.