After way too much time investigating this, the encoding situation is
not great right now.
- Stable sets the "default code page" to be used for encoding filenames
to Shift-JIS (932):
c29ebd7fc5/osu!/GameBase.cs#L3099
- Lazer does nothing (therefore using UTF-8).
When importing to lazer, stable files are assumed to be UTF-8. This
means that the linked beatmaps don't work correctly. Forcing lazer to
decompress *and* compress using Shift-JIS will fix this.
Here's a rough idea of how things look for japanese character filenames
in current `master`:
| | stable | lazer |
|--------|--------|--------|
| export encoding | shift-jis | utf8 |
| utf8 [bit flag](https://superuser.com/a/1507988) set | ❌ | ❌ |
| import stable export osz | ✅ | ❌ |
| import lazer export osz | ❌ | ✅ |
| windows unzip | ❌ | ❌ |
| macos unzip | ✅ | ✅ |
and after this change
| | stable | lazer |
|--------|--------|--------|
| export encoding | shift-jis | shift-jis |
| utf8 [bit flag](https://superuser.com/a/1507988) set | ❌ | ❌ |
| import stable export osz | ✅ | ✅ |
| import lazer export osz | ✅ | ✅ |
| windows unzip | ❌ | ❌ |
| macos unzip | ✅ | ✅ |
A future endeavour to improve compatibility would be to look at setting
the utf8 flag in lazer, switching the default to utf8, and ensuring the
stable supports this flag (I don't believe it does right now).
Closes https://github.com/ppy/osu/issues/27540.
As it turns out, some ZIP archivers (like 7zip) will decide to add fake
entries for directories, and some (like windows zipfolders) won't.
The "directory" entries aren't really properly supported using any
actual data or attributes, they're detected by sharpcompress basically
by heuristics:
ab5535eba3/src/SharpCompress/Common/Zip/Headers/ZipFileEntry.cs (L19-L31)
When importing into realm we have thus far presumed that these directory
entries will not be a thing. Having them be a thing breaks multiple
things, like:
- When importing from ZIPs with separate directory entries, a separate
`RealmFile` is created for a directory entry even though it doesn't
represent a real file
- As a result, when re-exporting a model with files imported from such
an archive, a zero-byte file would be created because to the database
it looks like it was originally a zero-byte file.
If you want to have fun, google "zip empty directories". You'll see
a whole gamut of languages, libraries, and developers stepping on this
rake. Yet another episode of underspecced mistakes from decades ago
that were somebody's "good idea" but continue to wreak havoc forevermore
because now there are two competing conventions you can't just pick one.
Switch from InvariantCultureIgnoreCase to OrdinalIgnoreCase when
checking file paths in archives for substrings indicating the file can
be ignored for performance gains.
Co-Authored-By: Dan Balasescu <smoogipoo@smgi.me>
Add a filename ignore list to ZipArchiveReader to filter out superfluous
OS-generated files from archives during the import process. In addition
to decreasing the size of files imported this allows imports of some
incorrectly-constructed archives. An example is the case of having
a __MACOSX directory next to a single directory with the actual files -
filtering out the former at ZipArchiveReader allows the fallback added
in #6170 to work.