'Stop list' words misapplied in other languages

1 Cynfelyn
Jul 27, 2016, 3:43 pm

This is probably asking for the moon, in which case can anyone suggest a work-around?

Sorting my library alphabetically by title, books include (with the sort character in brackets):
Friends at Thrush Green (1)
I fyd sy well (3)
Gabriela (1)
...
Makers of Rome (1)
Un man (4)
The man who mistook his wife for a hat (5)
...
Ultimate sticker book : Dinosaur (1)
Un ennyd fer : bywyd a gwaith E. O. Jones, 1873-1915, ffotograffydd cynnar (1)
The upanishads (5)

LT clearly has a multi-lingual stop list of words which automatically trigger sort characters, in these cases including Roman I, French indefinite article 'un', and English definite article 'the'. And it's clearly my job to edit my books if they are misapplied; 'i' is 'to', and 'un' is 'one' in Welsh. Fair enough so far.

But this stop list or something similar is also being applied when arranging books indexed with the same CK People/Characters, CK Places, and (within your own library) tagged with the same tag. For example, Un ennyd fer is listed on http://www.librarything.com/place/Ceredigion%2C+Wales between '(The) dual nature of the Irish colonization of Dyfed ...' and 'Hand-list ...', rather than between 'Tregaron ...' and 'Valley ...', as it would be if it was only the sort character in play.

See also where Die another day (German def. art.) appears in http://www.librarything.com/character/James+Bond. And, ad absurdum, where The Les Dawson Joke Book (English + French def. art's) appears in http://www.librarything.com/place/Britain. How to de-register individual books from the stop list?

2 MarthaJeanne
Jul 27, 2016, 4:12 pm

This is very peculiar. Back before we could adjust the sort character, a big problem for non-English books was that words that are articles in another language but ordinary words in English were not on the list. 'Der' and 'das' were not sorted on, but 'die' was, making it very hard to handle books in German.

3 jjwilson61
Jul 27, 2016, 4:23 pm

I'm surprised about 'I' since in English that wouldn't be a stop word.

4 Cynfelyn
Jul 27, 2016, 7:02 pm

>3 jjwilson61: For more examples, see http://www.librarything.com/place/Los+Angeles,+California,+USA.

I Am Legend, I Am Not Sidney Poitier and I Am Not Spock all come between Always Outnumbered, Always Outgunned and American Express Pocket Guide to Los Angeles and San Francisco. So it's pretty consistent.

Interestingly, 'I' followed by a punctuation mark is indexed under I, immediately after books beginning "I'll" and "I'm". See I, Fatty, "I, Richard" (faulty touchstone) and I, Vampire (New 52) Vol. 1: Tainted Love.

There are examples of other misapplied stop words on this page. Die A Little is indexed under "little". A Is for Alibi and O Is For Outlaw are both indexed under "is". O yes, and Los Angeles itself! And LA!!

5 reading_fox
Jul 28, 2016, 6:56 am

You can set the sort character in Edit Book.

Next to the title is a dropdown arrow; choose which character number you want the title to sort by. I've not yet needed to change the defaults, but then I normally sort by date or tags anyway.

6 gilroy
Jul 28, 2016, 7:16 am

>4 Cynfelyn: That's weird. I am not a cop sorts between House Rules and Insurgent for me...

7 Cynfelyn
Jul 28, 2016, 8:48 am

>6 gilroy: Yes, that's in your own "My Library", where, as >5 reading_fox: says, you can reset the sort character yourself. But it only seems to apply to your own library.

I am not a cop has no CK, so on the basis of the tag cloud, I've just added "New York, New York, USA". If it's inappropriate, please change it. But the point is, LT has just filed the book on the http://www.librarything.com/place/New+York,+New+York,+USA page along with several other "I am ..." books, between Always On My Mind and Amanda / Miranda.

So unless anyone can think of a better approach, my RSI would be that the automatic stop word/sort character arrangement for any specific work should be subject to some threshold of users changing the sort character in their own libraries. There would probably need to be a sliding scale of thresholds: what would be appropriate for the two copies of I fyd sy well (presumably 50%) would not be appropriate for the 66 copies of Die Hard 3: With a Vengeance, or the 3,234 copies of I Am Legend.
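
One way to picture this proposal is the minimal Python sketch below. The cut-off values are invented for illustration (only the 50% figure for a two-copy work comes from the post), and `override_threshold` and `use_plain_sort` are hypothetical names, not anything LT exposes.

```python
def override_threshold(copies):
    """Hypothetical sliding scale: the share of copies whose owners must have
    reset the sort character before site pages stop applying the stop list."""
    if copies < 10:
        return 0.5    # e.g. one of two copies, as suggested in the post
    if copies < 100:
        return 0.25   # assumed value, not from the thread
    return 0.1        # assumed value, not from the thread


def use_plain_sort(copies, overrides):
    """Return True if enough owners overrode the automatic sort for this work."""
    if copies == 0:
        return False
    return overrides / copies >= override_threshold(copies)
```

Under these assumed numbers, one override out of the two copies of I fyd sy well would be enough, while a widely held work such as I Am Legend would need a few hundred.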

A couple of other examples that caught my eye:
Al Capone Does My Shirts filed under "Capone" on http://www.librarything.com/character/Al+Capone
The Al Qaeda Connection filed under "Qaeda" on http://www.librarything.com/character/Osama+bin+Laden

8 lorax
Jul 28, 2016, 9:30 am

>5 reading_fox:

The original post is tremendously confusing, I admit, but the OP is not actually talking about the sort character or the sort within Your Library (which he appears to bring up only to indicate that he understands it, but spending so much time on it obscures the point). He's talking about the sort on other places such as CK character pages, where the list of 'articles to ignore' is applied universally.

9 gilroy
Jul 28, 2016, 9:47 am

>7 Cynfelyn: Okay, I see your glitch now.
(Weird, I'm slacking. Normally I fill in the CK when I finish reading a book...)

10 jjwilson61
Jul 28, 2016, 10:36 am

I believe this must be a recent change and in the not too distant past it just did a straight sort on the title in those places. I think this should get filed as a bug so it will get looked at by LT staff (and not languish away in this group which is where ideas go to die).

11 lorax
Jul 28, 2016, 11:01 am

>10 jjwilson61:

"I think this should get filed as a bug so it will get looked at by LT staff (and not languish away in this group which is where ideas go to die)."

If it gets filed as a bug it will just languish away in the group where bug reports go to die.

12 lorannen
Jul 28, 2016, 1:29 pm

>10 jjwilson61: This behavior shouldn't have been caused by any recent changes. It might be a bit of a time-sink in terms of code to make it happen, but I agree, this is something we should address.

13 gilroy
Jul 28, 2016, 1:43 pm

>12 lorannen: Hey lorannen, whatever happened to the big RSI list/meeting that was supposed to happen? It's been almost two years since Kristi created it, then it got buried under TinyCat (Thanks, Tim. :P) So what news?

14 gilroy
Jul 29, 2016, 9:11 am

Going through different things on the site, I realized this happens not just on the Series or Author pages but on the combination pages as well...

15 MarthaJeanne
Edited: Jul 29, 2016, 10:35 am

No, on the author combine pages there are no stop words. Everything is in alphabetical order.

https://www.librarything.com/combine.php?author=everettpercival

A history
A portrait
Abstraktion
...

16 gilroy
Jul 29, 2016, 11:52 am

>15 MarthaJeanne: Um, that's what this is about.

17 jjwilson61
Jul 29, 2016, 11:56 am

>16 gilroy: I'm not sure what you think it's about, but what I think it's about is that I'd rather see lists of book titles everywhere just be strictly alphabetical instead of the attempt at ignoring certain words that doesn't work. In your catalog, it's fine because it's under your control, but everywhere else, it should just be alphabetical.

18 gilroy
Jul 29, 2016, 11:58 am

>17 jjwilson61: I'd say it was about consistency. If you're going to offer the ability to say where the sort starts, it should be site-wide, not just in the library.

Which may be where the issue comes in anyway.

19 lorax
Jul 29, 2016, 12:21 pm

>16 gilroy:

No, they're different issues.

There are three types of sorts going on.

1. In Your Library, the sort character is user-controllable for every title; there is a list of articles to be ignored, such as 'The' and 'Der' which is applied by default.

2. On CK pages, the list of articles is applied universally to all titles and cannot be overridden. This results in unexpected sorting for titles beginning with the English word 'Die', which is treated as the German article and thus ignored for sort purposes. Consider this to be using the article list but not allowing overrides.

3. On combine/separate pages, the list of articles is never used. Titles beginning with the English word 'A' sort at the top of the list.

Consider two hypothetical titles:

"A is for Apple"
"A history of the world"

A user would most likely choose to override the article sort in the first case and not in the second, sorting the children's alphabet book under "A" and the history under "H".

The CK page would apply the article list in both cases, sorting the alphabet book under "I" and the history under "H".

The combine/separate page would ignore the article list, sorting them both under "A".
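
A rough way to see the three behaviours side by side is the Python sketch below. Everything in it is an assumption for illustration: the `ARTICLES` set, the function names, and the idea that leading articles are stripped word by word are guesses based on the examples in this thread, not LT's actual code.

```python
# Hypothetical article list, assembled from words mentioned in this thread.
ARTICLES = {"a", "an", "the", "der", "das", "die", "un", "la", "los", "i", "o"}


def strip_articles(title):
    """Drop leading words found on the article list, keeping at least one word."""
    words = title.split()
    while len(words) > 1 and words[0].lower() in ARTICLES:
        words.pop(0)
    return " ".join(words)


def library_key(title, sort_char=None):
    """Your Library: article list by default; sort_char (1-based) overrides it."""
    if sort_char is not None:
        return title[sort_char - 1:].lower()
    return strip_articles(title).lower()


def ck_key(title):
    """CK pages: article list always applied, no override possible."""
    return strip_articles(title).lower()


def combine_key(title):
    """Combine/separate pages: plain alphabetical, article list never used."""
    return title.lower()


titles = ["A is for Apple", "A history of the world"]
ck_order = sorted(titles, key=ck_key)            # history (H) before alphabet book (I)
combine_order = sorted(titles, key=combine_key)  # both sorted under "A"
```

With these assumptions, `ck_key` files "A is for Apple" under "is", `library_key("A is for Apple", sort_char=1)` keeps it under "A", and `combine_key` leaves both titles under "A". A first word with punctuation attached, as in "I, Fatty", does not match the list in this sketch and so is left alone.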

20 Cynfelyn
Mar 31, 2018, 3:35 pm

21 Cynfelyn
Jul 25, 2021, 10:11 am

Bump.

For example, books with the CK Important place "Los Angeles, California, USA" (https://www.librarything.com/place/Los+Angeles,+California,+USA). The "The" and "A" examples below are being applied appropriately. The "Los", "La", "Die" and "I" examples are misapplied, presumably from non-English stop lists:

Angel: Season 4
Angel: Season 5
Los Angeles by Santi Visalli
(plus 130+ other "Los Angeles ..." and "The Los Angeles ..." titles)
Angels Flight
Angels flight : a Los Angeles funicular railway

Corridor {short story}
La corsa del levriero
The Coterie
Cottage by the Sea: A Novel

Happy Family: A Novel
Harbinger Renegade Volume 1: The Judgment of Solomon
Die Hard 3: With a Vengeance {1995 film}
A Die Hard Christmas
Hardcover

Heart Breaker
I Heart Hollywood
The Heart of a Woman
Heart Song

Looking Through Blind Eyes
Lord of Shadows
*You'd expect the 130+ "Los Angeles ..." titles to be here
Lost Hills
The Lost Letter

Short Bike Rides in and Around Los Angeles
LA Shorts
Shot

Warner Bros Story
I Was a Teenage Fairy
I Was Told It Would Get Easier
Watch Me

22 Cynfelyn
Edited: Aug 27, 2023, 1:09 pm

Reminded of this by tripping over another example. The Welsh version of the multi-language From the four corners of Europe : tales and folk legends, 'O bedwar ban Ewrop : straeon gwerin o Ewrop', appears in LT's Welsh site's CK/People&Characters/Myrddin list of books under "Bedwar", or at least before "Black" (see https://cym.librarything.com/character/Myrddin).

Willa Cather's O Pioneers! also appears under "Pioneers!" in CK/Places/Virginia, USA. I don't know in what language "O" would be a candidate for a stop list. It's an exclamation in English, and "from" in Welsh, so it's being mis-applied in those languages.

I've just wasted a happy hour trying to find a book title with the first three words stop-listed, but still can't do better than the two-word titles already mentioned:
The Les Dawson Joke Book
Die A Little
The Al Qaeda Connection
A Die Hard Christmas
Also:
The UN Conference Study
The La Follettes and the Wisconsin Idea, plus all the "The La Tene culture", "The La Salle expedition", "The La Guardia", "The LA" &c. titles.
The Ein Harod Museum of Art. Besides being a German indefinite article, "Ein" also seems to be latinized Hebrew for 'spring/oasis'.
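
For anyone wanting to hunt for such titles in an exported title list, a small hypothetical helper could count how many leading words fall on a stop list. The `STOP_WORDS` set here is a guess assembled from the words mentioned in this thread, not LT's real list, and the function names are made up for illustration.

```python
# Assumed stop list; LT's actual list is not given in this thread.
STOP_WORDS = {"a", "the", "die", "les", "al", "la", "los", "un", "ein", "i", "o"}


def leading_stop_words(title):
    """Count consecutive stop-listed words at the start of a title."""
    count = 0
    for word in title.lower().split():
        if word not in STOP_WORDS:
            break
        count += 1
    return count


def titles_with_at_least(titles, n):
    """Return the titles that begin with n or more stop-listed words."""
    return [t for t in titles if leading_stop_words(t) >= n]
```

Calling `titles_with_at_least(my_titles, 3)` on a list of titles would then pick out any three-word candidates, if they exist.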

P.S. I'm also posting something to Bug Collectors, as suggested above.

23 AndreasJ
Aug 27, 2023, 3:14 pm

FWIW, ”O” means “The” (masculine singular) in Portuguese.

24 Cynfelyn
Aug 27, 2023, 3:53 pm

>23 AndreasJ: Aha. Thanks.