• pmk@lemmy.sdf.org
    link
    fedilink
    arrow-up
    2
    ·
    5 months ago

    I’m not sure. A few years ago I remember that OpenBSD expected ASCII for files, but I think Linux expects utf-8. I could be wrong though.

    • NeatNit@discuss.tchncs.de
      link
      fedilink
      arrow-up
      3
      ·
      edit-2
      5 months ago

      I’m assuming Unicode anyway, and UTF-8 is by far the most natural because most files will be in ASCII. A “normal form” (see link above), you might think of it as a canonical form, is a way to check if two strings are equivalent, even if they encoded the text differently. Like the example mentioned on Wikipedia:

      For example, the distinct Unicode strings “U+212B” (the angstrom sign “Å”) and “U+00C5” (the Swedish letter “Å”) are both expanded by NFD (or NFKD) into the sequence “U+0041 U+030A” (Latin letter “A” and combining ring above “°”) which is then reduced by NFC (or NFKC) to “U+00C5” (the Swedish letter “Å”).