The column Subject shows non-ascii characters in a weird way
1 bnielsen
I've noticed that Subject is shown one way in "Your books" (wrong) and another way in the export files (correct) when Subject contains non-ASCII characters.
Example: https://www.librarything.com/catalog/bnielsen?&deepsearch=232960202
finds a book in my library by Juan Moreno.
Choosing a style that includes Title and Subject shows Title as
Tusind linjer løgn : den utrolige historie om en forudsigelig medieskandale
and Subject as
bedragere
journalister
journalisti.k
medier
pressevÊsen
Relotius, Claas
skandaler
Tidsskrifter
Tyskland
When I export as TSV the Subject is given as Journalister|Medier|Relotius, Claas|Skandaler|TYSKLAND|Tidsskrifter|Tyskland|bedragere|journalister|journalisti.k|medier|pressevæsen|skandaler|tidsskrifter|tyskland
The export file has pressevæsen as one Subject but LT shows it as pressevÊsen which is weirdly wrong, so I guess that some character set conversion is wrong.
I don't think this is something my browser is doing, but I'd like others to confirm that they are seeing the same as me :-)
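One conversion that would produce exactly this symptom can be sketched as below. This is only a guess at the mechanism, not something confirmed about LT's code: it assumes the record stored æ as the single Latin-1 byte 0xE6 and that the catalog view decodes that byte as Mac OS Roman, where 0xE6 is Ê. The variable names are made up for the illustration.

```python
# Hypothetical reproduction: the codec pair (Latin-1 in, Mac OS Roman out)
# is an assumption, not LibraryThing's confirmed behaviour.
correct = "pressevæsen"

raw = correct.encode("latin-1")    # æ becomes the single byte 0xE6
garbled = raw.decode("mac_roman")  # Mac OS Roman maps 0xE6 to Ê

print(garbled)
```

If that guess is right, the stored bytes are fine and only the display path picks the wrong character set, which would fit the export being correct.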
2 waltzmn
>1 bnielsen: I don't think this is something my browser is doing, but I'd like others to confirm that they are seeing the same as me :-)
I see it too, in your library.
I'd guess it's not a character set issue with the browser, because many of the non-standard characters are correct. It looks more like a character set issue in the data import.
It happens a LOT in the books that are imported "from old catalog(ue)," with which I have much experience. :-)
3 bnielsen
>2 waltzmn: Thanks for confirming it. I guess that maybe the export mechanism sanitizes this?
4 AnnieMod
I don’t think that the export sanitizes anything - I suspect that we are looking at differences in the default encoding of LT and the encoding in the export.
5 bnielsen
>4 AnnieMod: Possible. The catalog view uses frames, so it's beyond me to guess the character sets involved :-)
However a mouse over pressevÊsen shows that it links to
https://www.librarything.com/subject/pressevæsen, i.e.
https://www.librarything.com/subject/pressev%C3%A6sen
which gives:
Subject: pressevæsen
Books Under This Subject
None
https://www.librarything.com/subject/pressev%C3%83%C5%A0sen
gives:
Subject: pressevÊsen
Books Under This Subject
None
so there's another bug with the links.
ETA: Source: Det kongelige Bibliotek / Københavns Universitet
ETA: Hmm, lots of Subjects link to something empty. Some don't, like:
https://www.librarything.com/subject/Usa
so I don't know what to expect. I would have expected each link to at least give the book containing the link, but apparently not?
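The two URLs in this post can be picked apart with a few lines of Python. This is a sketch under the assumption that the second URL is a case of double encoding (UTF-8 bytes misread as Windows-1252 and then encoded as UTF-8 again); nothing here is taken from LT's code, and the variable names are only for illustration.

```python
from urllib.parse import quote, unquote

# The first URL is plain UTF-8 percent-encoding of the correct subject.
print(quote("pressevæsen"))  # pressev%C3%A6sen

# The second URL, percent-decoded as UTF-8, is not "pressevÊsen" yet.
url_segment = "pressev%C3%83%C5%A0sen"
decoded = unquote(url_segment)
print(decoded)  # pressevÃŠsen

# Assumption: the UTF-8 bytes of Ê (C3 8A) were misread as Windows-1252
# ("ÃŠ") and encoded again. Reversing that step recovers the Ê.
repaired = decoded.encode("cp1252").decode("utf-8")
print(repaired)  # pressevÊsen
```

So the second link seems to carry the already-wrong Ê plus one extra round of mis-encoding on top, which would be a separate problem from the one in the catalog view.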