This fell out while investigating why the issue with online IDs
mismatching in the `.osu` could be worked around by importing the map
three times in total when starting from it not being available locally.
Here follows an explanation of why that "helped".
Import 1:
- The beatmap set is imported normally.
- Online metadata population sees the online ID mismatch and resets it
on the problematic beatmap.
Import 2:
- The existing beatmap set is found, but deemed not reusable
because of the single beatmap having its ID reset to -1.
- The existing beatmap set is marked deleted, and all the IDs of
its beatmaps are reset to -1.
- The beatmap set is reimported afresh.
- Online metadata population still sees the online ID mismatch
and resets it on the problematic beatmap.
Note that at this point the first import *is still physically present
in the database* but marked deleted.
Import 3:
- When trying to find the existing beatmap set to see if it can be
reused, *the one pending deletion and with its IDs reset -
- the remnant from import 1 - is returned*.
- Because of this, `validateOnlineIds()` resets online IDs
*on the model representing the current reimport*.
- The beatmap set is reimported yet again.
- With the online ID reset, the online metadata population check for
online ID mismatch does not run because *the IDs were reset to -1*
earlier.
Preferring undeleted models when picking the model instance for reuse
prevents this scenario.
Both online and offline using the cache.
The rationale behind this change is that in the current state of
affairs, `TestPartiallyMaliciousSet()` fails in a way that cannot be
reconciled without this sort of change.
The test exercises a scenario where the beatmap being imported has an
online ID in the `.osu` file, but its hash does not match the online
hash of the beatmap. This turns out to be a more frequent scenario than
envisioned because of users doing stupid things with manual file editing
rather than reporting issues properly.
The scenario is realistic only because the behaviour of the endpoint
responsible for looking up beatmaps is such that if multiple parameters
are given (e.g. all three of beatmap MD5, online ID, and filename), it
will try the three in succession:
f6b341813b/app/Http/Controllers/BeatmapsController.php (L260-L266)
and the local metadata cache implementation reflected this
implementation.
Because online ID and filename are inherently unreliable in this
scenario due to being directly manipulable by clueless or malicious
users, neither should not be used as a fallback.
I don't know why this was ever a good idea, and would say that we want
this to fail *hard* not soft. If things ever get in this state, things
have gone *seriously* wrong elsewhere, and need to be fixed there.