We have Unicode these days: blåhaj
Unicode in filenames? Are you crazy?!
Okay that was /s to some extent but I gotta rant: I'm totally convinced that there's still new software today that completely trips over itself when files or paths have non-ASCII characters, or sometimes even a space. Incompetence didn't go anywhere.
I still use underscores for filenames, basically muscle memory at this point
Spaces in file names will always be fiddly though. It'll work, but it'll still be wrong, because arguments are space separated, and having spaced file names totally messes with that.
I try to always put file names or paths in quotes on the CLI, or tie them to a variable when programming. That way spaces are accepted and the name stays separate from the other arguments.
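To make that concrete, here's a minimal Python sketch of the same idea. The filename and the `count_args` helper are made up for illustration; the child process just reports how many arguments it received. Passing the name as a single list element keeps it in one piece, and `shlex.quote` is the fallback if you really have to build a shell command line.

```python
import shlex
import subprocess
import sys


def count_args(argv_tail):
    """Run a child Python process and return how many arguments it received."""
    child = [sys.executable, "-c", "import sys; print(len(sys.argv) - 1)"]
    result = subprocess.run(
        child + argv_tail, capture_output=True, text=True, check=True
    )
    return int(result.stdout.strip())


# Hypothetical filename with spaces and a non-ASCII character.
name = "my blåhaj photo.txt"

# Naive whitespace splitting turns one filename into three arguments.
naive = count_args(name.split())

# One list element per argument: no shell involved, the name stays intact.
safe = count_args([name])

# If a shell command string is unavoidable, quote the name first.
quoted = shlex.quote(name)
roundtrip = shlex.split(f"cat {quoted}")

print(naive, safe, roundtrip)
```

The list form is usually the nicer option since nothing ever has to be parsed back apart.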
Yeah. It's a good idea to guard against it, but I would still never put spaces in filenames that I myself choose.
Unicode in filenames can be a bad idea, since there is more than one way to produce what looks like the same character. So pattern matching can fail if you assume one representation but the name actually uses another.
Good point. Do filesystems use a normal form to at least prevent having two files with effectively the same name?
I should point out the flip side though, that there's no avoiding Unicode in filenames. Users in languages that don't use the Latin alphabet (such as Japanese, Chinese, Korean, Hebrew, Arabic, Greek and Russian, and the list could go on) can reasonably expect to be able to give a file a name they can read and understand with no extra effort. All the software woes that come with it - too bad, software needs to deal with it.
I'm not sure. A few years ago I remember that OpenBSD expected ASCII for files, but I think Linux expects utf-8. I could be wrong though.
I'm assuming Unicode anyway, and UTF-8 is by far the most natural because most files will be in ASCII. A "normal form" (see link above), you might think of it as a canonical form, is a way to check if two strings are equivalent, even if they encoded the text differently. Like the example mentioned on Wikipedia:
For example, the distinct Unicode strings "U+212B" (the angstrom sign "Å") and "U+00C5" (the Swedish letter "Å") are both expanded by NFD (or NFKD) into the sequence "U+0041 U+030A" (Latin letter "A" and combining ring above "°") which is then reduced by NFC (or NFKC) to "U+00C5" (the Swedish letter "Å").
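Here's a small sketch of that quote using Python's standard `unicodedata` module. The `same_name` helper is just an illustrative name; it compares two strings after normalizing both, which is the kind of check a pattern matcher would need before comparing filenames.

```python
import unicodedata

angstrom_sign = "\u212B"  # ANGSTROM SIGN
a_ring = "\u00C5"         # LATIN CAPITAL LETTER A WITH RING ABOVE
decomposed = "A\u030A"    # "A" followed by COMBINING RING ABOVE


def same_name(a, b, form="NFC"):
    """Compare two strings after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# The raw code points differ, so a plain comparison says "different".
raw_equal = angstrom_sign == a_ring

# NFD expands both precomposed characters to "A" + combining ring.
nfd_angstrom = unicodedata.normalize("NFD", angstrom_sign)
nfd_a_ring = unicodedata.normalize("NFD", a_ring)

# NFC recomposes the sequence into U+00C5.
nfc = unicodedata.normalize("NFC", decomposed)

# The same applies to the thread title: two spellings of "blåhaj".
composed_title = "bl\u00E5haj"
decomposed_title = "bla\u030Ahaj"

print(raw_equal, same_name(angstrom_sign, a_ring))
print(len(composed_title), len(decomposed_title),
      same_name(composed_title, decomposed_title))
```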
Incompetence didn't go anywhere.
Now that's certainly true, but the beauty of open source software is that we can fix bugs when we encounter them.
I'm too lazy to memorize alt codes
Use a compose key
Why you torture blahaj?
Why are we still kink shaming?
Blahaj cannot speak, therefore Blahaj cannot give consent.
You don't necessarily need speech for consent since non-verbal/mute people exist.
blahaj.exe.tar.gz
blahaj.elf.tar.gz.part
blåhaj.squashfs
I feel like Unicode in the filename is heavily against the spirit of using squashfs, or at least the ways I've seen it used.
Ok, what kind of monster names their executables .elf?
Well, a.out doesn't make much sense these days.
Gotta move to .elf
Pi Pico SDK does. Well, the version for debugging symbols, anyway. Regular executable is .uf2.
I reserve .elf for executables for other platforms, like microcontroller firmware.
wii homebrew developer maybe?
mv blahaj.elf.tar.gz.part ./rivendell
Speaking of which, it blew my mind when I discovered that .EXEs are just ~~zip files~~ compressed archives. Same goes for .DLLs, and a lot of other common Windows file extensions as well (.DOC too, for example, IIRC). They all open in your favorite archiver software (I like NanaZip, which is a fork of 7-Zip with a modern UI).
I don't think that's true for .exe or .dll files, but it's definitely true for .docx files and other Office files ending with x. Some .exe's are self-extracting archives or have other files embedded in them, so maybe that's what you've been seeing.
You are actually correct. They can contain archived files or resources that can be unpacked with an archive program (including on Linux btw), but they aren't just a zip file. That's why my Linux archive manager (Ark, I think) offers to open one but won't execute it. It can see the extra content even if it can't run the file as intended.
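A rough way to see this for yourself, sketched in Python. The "executable" here is a fake: just the two `MZ` bytes that Windows executables start with, some zero padding, and then an ordinary zip appended. It would never run, and `build_zip` and `leading_magic` are made-up helper names. The point is only that a zip reader finds the archive at the end of the file even though the file doesn't start like a zip.

```python
import io
import zipfile


def build_zip(files):
    """Return the bytes of a zip archive holding the given name -> data pairs."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def leading_magic(data):
    """Classify a blob by its first bytes only."""
    if data[:2] == b"MZ":
        return "exe/dll"
    if data[:4] == b"PK\x03\x04":
        return "zip"
    return "unknown"


plain_zip = build_zip({"hello.txt": b"hi"})

# Fake self-extracting archive: stand-in stub + zip payload (not runnable).
fake_sfx = b"MZ" + b"\x00" * 62 + plain_zip

# The first bytes say "executable"...
kind = leading_magic(fake_sfx)

# ...but zipfile looks for the archive directory at the END of the file,
# so it still finds and reads the embedded archive.
has_archive = zipfile.is_zipfile(io.BytesIO(fake_sfx))
with zipfile.ZipFile(io.BytesIO(fake_sfx)) as zf:
    payload = zf.read("hello.txt")

print(leading_magic(plain_zip), kind, has_archive, payload)
```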
Aren’t the x-suffixed files just an xml format?
It's a zip file that includes a bunch of things, including embedded images and a bunch of other junk, but yes - the most important and central files in the zip are XML-based.
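If you want to poke at that structure, here's a sketch using only the Python standard library. It builds a toy stand-in for a .docx (just enough to show the layout; Word would not open it, and the `[Content_Types].xml` body is a placeholder) and then reads the text back out of `word/document.xml`. `make_toy_docx` and `extract_text` are illustrative names, not part of any library.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def make_toy_docx(text):
    """Build a minimal docx-like zip in memory (not a valid Word file)."""
    document_xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p><w:r>'
        f"<w:t>{text}</w:t>"
        "</w:r></w:p></w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")  # placeholder body
        zf.writestr("word/document.xml", document_xml)
    return buf.getvalue()


def extract_text(docx_bytes):
    """Open the zip, parse the central XML part, and join its text runs."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return "".join(t.text or "" for t in root.iter(f"{{{W_NS}}}t"))


toy = make_toy_docx("blåhaj")
with zipfile.ZipFile(io.BytesIO(toy)) as zf:
    members = zf.namelist()

print(members)
print(extract_text(toy))
```

Pointing `zipfile.ZipFile` at a real .docx on disk should show the same `word/document.xml` entry alongside the images and other parts.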
Just because they open in 7-Zip or whatever doesn't mean they are just a zip file. There are several kinds of archives. EXEs are a special case as well. They aren't archives at all. Rather, they can contain archives or extra content along with being an executable. One reason is self-extracting archives, where an archive is packaged together with an extraction program as a single exe. The other case is exes that carry extra resources like images, videos, graphics textures, etc. Either way it's an executable plus some extra stuff, not a zip archive. DLLs I am not sure about, but I suspect something similar is happening here.
Next time you should research stuff before posting it on Lemmy. Things are sometimes more complicated than they appear.
docx you are correct about though. Specifically it's a zip file that contains XML files and resources.
Edit: I actually found an article on self-extracting archives. It's quite an interesting technology, to be fair, even if it causes confusion: https://en.m.wikipedia.org/wiki/Executable_compression
By "zip file", I meant a compressed archive. I'm not as nerdy as you guys are so I see now that there is a difference. I appreciate the correction.
That said, you have to admit that it's still cool that these different file formats are nothing more than archives. Maybe not to you but it blew my mind when I first learned this.
blahaj.zone
free him
You don't need to tape archive it, it's one thing
Yeah but you can
I feel so compressed.