since the dst change I've been waking up in the morning right in the middle of a REM cycle (maybe my only REM cycle of the night because I am not good at getting enough sleep) and it's reeeeally starting to mess me up
@wrenpile characterizing semantics would be a different set of techniques altogether (topic modeling). I'm personally more interested in keywords as a way of analyzing distinctive word *choices* in a text even if those words have similar meanings to higher-frequency words.
@wrenpile "keyword" in this context meaning words that are particular to the text in question, as opposed to some other randomly chosen English text. the insight here is that many texts contain the word "beautiful," but shakespeare's sonnets are special in that the word "beauteous" occurs frequently there but not in English in general
(for comparison, the 7+ character adjectives deemed least keyword-like by this method are words like 'married', 'curious', 'necessary', 'private', 'religious', 'forward', 'beautiful', 'several', 'certain', 'different' etc. and of course it works with *all* kinds of words— limiting it to just 7+ character adjectives was just a fun experiment)
(this is from my latest attempt to do the thing that I was talking about here https://mastodon.social/@aparrish/99660550376065760, i.e., simple keyword extraction from small corpora using only stuff you get with spaCy, in particular spaCy's English-wide unigram log probabilities. current solution: use G² [see e.g. http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html], as implemented in scipy's chi2_contingency function, using a wild guess for what the actual token count is for spaCy's unigram frequencies. I guessed 1000000 and it seems to work fine)
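(a rough sketch of the G² scoring described above, not the actual code. It's plain Python, so it doesn't use scipy; `scipy.stats.chi2_contingency` with `lambda_="log-likelihood"` computes the same kind of statistic, though it applies a continuity correction to 2x2 tables by default. The names `g2`, `keyword_score` and `REF_TOTAL` are made up for illustration. Treating the reference value as a natural-log probability and putting a sign on the score are both assumptions.)

```python
import math

# Wild guess at how many tokens the reference frequencies were estimated from,
# mirroring the 1000000 guess mentioned in the post.
REF_TOTAL = 1_000_000


def g2(table):
    """G² (log-likelihood ratio) statistic for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if observed > 0:  # a zero cell contributes nothing to the sum
                stat += observed * math.log(observed / expected)
    return 2 * stat


def keyword_score(doc_count, doc_total, ref_logprob, ref_total=REF_TOTAL):
    """Signed G² for one word: document counts vs. reference log probability.

    ref_logprob is assumed to be a natural-log unigram probability.
    The sign is an illustrative addition: positive means the word is more
    frequent in the document than in the reference, negative means less.
    """
    # Turn the reference probability back into a pseudo-count.
    ref_count = math.exp(ref_logprob) * ref_total
    table = [
        [doc_count, doc_total - doc_count],
        [ref_count, ref_total - ref_count],
    ]
    score = g2(table)
    if doc_count / doc_total < ref_count / ref_total:
        score = -score
    return score
```

(sorting a text's words by `keyword_score`, highest first, puts the "beauteous"-type words at the top and the "beautiful"-type words near the bottom, at least in principle. How well it works depends on the guessed reference size.)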
@enkiv2 yeah, that's it exactly. in the paper you linked it's called "Weirdness" (by doing log(p in doc) / log(p in ref) I'm sorta implicitly scaling the sizes of the corpora)
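(taking the formula in that reply literally, a tiny sketch of the ratio might look like this. `log_ratio` is a hypothetical name, the reference value is assumed to be a natural-log probability strictly below zero, and the word has to occur at least once in the document.)

```python
import math


def log_ratio(doc_count, doc_total, ref_logprob):
    """log(p in doc) / log(p in ref), with p in doc as a relative frequency.

    Both logs are negative, so values below 1 mean the word is more
    frequent in the document than in the reference, and values above 1
    mean it is less frequent.
    """
    doc_logprob = math.log(doc_count / doc_total)
    return doc_logprob / ref_logprob
```

(no corpus size guess is needed here, since both sides are already probabilities.)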
"An Account of the Land of Witches" by Sofia Samatar is just breathtaking geez https://theoffingmag.com/fiction/account-land-witches/ "The word she gave me was pomegranate. It was not only a word; it was a dream. In the Land of Witches, words open doors in the dreamscape. In the dream-language, said Verken, pomegranate means dusk and the rattling of dry leaves. It also means winter. It means black bile and a cloister. It means a tooth."
as I brew tea waiting for a few million records to insert, I am beginning to realize that maybe the infrastructure for this project shouldn't have been "sqlite on a thumb drive"
this piece in which Liza Daly researches the biography of an obscure 19th century writer is fascinating and beautiful 'I traced Anna’s family and friends backwards and forwards across a continent... and all their voices, all but one, are men’s. For all her faults, her “colossal ignorance” and her crackpot theories, she had a voice, one that she insisted be heard. Thanks to accidents of history and mass digitization, her voice has carried a hundred years forward.' https://medium.com/@liza/always-a-fan-of-the-marvelous-the-hidden-history-of-anna-adolph-8c0bc3888db4
it's super weird to me that there are living people who were born before the discovery of what feel like very fundamental facts of biology, like dna's helix structure, neurotransmitters, rem sleep