The column Subject shows non-ASCII characters in a weird way

1 bnielsen
Apr 10, 2023, 3:41 pm

I've noticed that Subject is shown one way in "Your books" (wrong) and another way in the export files (correct) when the Subject contains non-ASCII characters.

Example: https://www.librarything.com/catalog/bnielsen?&deepsearch=232960202
finds a book in my library by Juan Moreno.
Choosing a style that includes Title and Subject shows Title as
Tusind linjer løgn : den utrolige historie om en forudsigelig medieskandale
and Subject as
bedragere
journalister
journalisti.k
medier
pressevÊsen
Relotius, Claas
skandaler
Tidsskrifter
Tyskland

When I export as TSV the Subject is given as Journalister|Medier|Relotius, Claas|Skandaler|TYSKLAND|Tidsskrifter|Tyskland|bedragere|journalister|journalisti.k|medier|pressevæsen|skandaler|tidsskrifter|tyskland

The export file has pressevæsen as one Subject, but LT shows it as pressevÊsen, which is weirdly wrong, so I guess that some character set conversion is going wrong somewhere.
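One way to test the character-set guess is to brute-force a few encode/decode pairs and see which ones turn æ into Ê. The sketch below is only an illustration: the candidate codec list and the function name are assumptions, and nothing here is known about what LT's import actually does.

```python
# Candidate codecs are an assumption, not a statement about what LT uses.
CANDIDATES = ["utf-8", "latin-1", "cp1252", "mac_roman", "cp437", "cp850"]


def find_mismatched_codecs(good, bad, codecs=CANDIDATES):
    """Return (written_as, read_as) pairs that turn `good` into `bad`."""
    hits = []
    for written_as in codecs:
        for read_as in codecs:
            if written_as == read_as:
                continue
            try:
                garbled = good.encode(written_as).decode(read_as)
            except (UnicodeEncodeError, UnicodeDecodeError):
                continue  # this pair cannot encode or decode the text
            if garbled == bad:
                hits.append((written_as, read_as))
    return hits


pairs = find_mismatched_codecs("pressevæsen", "pressevÊsen")
print(pairs)
```

With this candidate list, the matching pairs have the text written as Latin-1 or Windows-1252 (where æ is byte 0xE6) and read back as Mac Roman (where 0xE6 is Ê). That would fit a conversion problem on the import or display side, but it is only a plausible reading, not a confirmed cause.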

I don't think this is something my browser is doing, but I'd like others to confirm that they are seeing the same as me :-)

2 waltzmn
Apr 10, 2023, 7:16 pm

>1 bnielsen: I don't think this is something my browser is doing, but I'd like others to confirm that they are seeing the same as me :-)

I see it too, in your library.

I'd guess it's not a character set issue with the browser, because many of the non-standard characters are correct. It looks more like a character set issue in the data import.

It happens a LOT in the books that are imported "from old catalog(ue)," with which I have much experience. :-)

3 bnielsen
Apr 11, 2023, 12:42 am

>2 waltzmn: Thanks for confirming it. I guess that maybe the export mechanism sanitizes this?

4 AnnieMod
Apr 11, 2023, 12:53 am

I don’t think that the export sanitizes anything - I suspect that we are looking at differences in the default encoding of LT and the encoding in the export.

5 bnielsen
Edited: Apr 11, 2023, 2:37 am

>4 AnnieMod: Possible. The catalog view uses frames, so it's beyond me to guess the character sets involved :-)

However, a mouse-over on pressevÊsen shows that it links to
https://www.librarything.com/subject/pressevæsen, i.e.
https://www.librarything.com/subject/pressev%C3%A6sen
which gives:

Subject: pressevæsen
Books Under This Subject
None

https://www.librarything.com/subject/pressev%C3%83%C5%A0sen
gives:

Subject: pressevÊsen
Books Under This Subject
None

so there's another bug with the links.
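The two URLs can be unpacked with a few lines of Python to see what the percent-escapes actually contain. This is a minimal sketch: the idea that the second form is UTF-8 bytes misread as Windows-1252 and then encoded again is an assumption, and the helper name is made up for illustration.

```python
from urllib.parse import unquote


def undo_double_encoding(text):
    """Reverse one round of 'UTF-8 bytes misread as Windows-1252' (assumed cause)."""
    return text.encode("cp1252").decode("utf-8")


good = unquote("pressev%C3%A6sen")        # percent-escapes decoded as UTF-8
odd = unquote("pressev%C3%83%C5%A0sen")   # same, for the second URL

print(good)                        # pressevæsen
print(odd)                         # pressevÃŠsen
print(undo_double_encoding(odd))   # pressevÊsen
```

If that reading is right, the second URL holds the already garbled pressevÊsen wrapped in one further layer of mis-encoding, which might be why that page still displays pressevÊsen.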

ETA: Source: Det kongelige Bibliotek / Københavns Universitet

ETA: Hmm, lots of Subjects link to something empty. Some don't, like:

https://www.librarything.com/subject/Usa

so I don't know what to expect. I would have expected each link to at least give the book containing the link, but apparently not?