Too braindrained for subject lines
-I'd avoided working on it for *cough* two-and-a-half years, but I'm finally poking at the AO3 HTML parser. I've thus far succeeded in triggering a number of infinite loops, rewriting it to be 10 times slower, and causing bits of text to come out in a completely jumbled order (which makes for some interestingly surreal reading). If you need me for anything, I'll be off in the land of nodes, deep in the forests of recursion. Wish me luck.
-Finally watched The Time of Angels. Very creepy, Amy was awesome, and the Doctor was great. She saved herself! He bit her hand! Not that keen on River Song, but I'm curious to see if we learn more about her next week - I hope so.
-Have also been re-reading Agatha Christie every night before bed. I'd forgotten the extent to which the early Hercule Poirot novels are all about Hastings'... lack of perspicacity, shall we say? Really, the entire point of both The Mysterious Affair at Styles and The Murder on the Links is that he's entirely clueless and instantly infatuated with every pretty girl who walks by, and the plot hinges on it on multiple occasions. I'm halfway through The Big Four (not her best work), and age and experience only seem to have tempered those qualities of Hastings' to some extent. (I had also forgotten that he was fairly young when he first appeared, although it's possible I had a different scale for relative age when I was 12.)
"Unlikely friendship" is kind of an understatement for those two, isn't it?
no subject
I loved how cool Amy is and I loved it when he bit her hand!
no subject
I've been listening to the Christie BBC Radio plays on my way to and from work; it's been fun.
PS: Let me know if you want to bounce around ideas for the HTML parser. I know I kind of messed it up before, but I did learn a lot about the issues involved and can probably at least discuss it.
no subject
So basically, what I'm trying to do now is see what we can offload to third-party tools that we wouldn't have to maintain. Nokogiri makes the actual parsing a lot easier, but it doesn't add linebreaks for us, which is the difficult part. If I just use gsub to turn the newlines into p and br tags and then hand it off to Nokogiri to make sense of, that produces mostly workable results, but it also adds things we don't want (br tags in invalid places) and strips out things we do want (people using inline tags to cover multiple paragraphs -- not that we want that, but I'm guessing they'd be pissed if it suddenly stopped working!). If I parse it first and then try to add the paragraphs, I wind up with text soup, at least so far. I'm going to give that a few more tries, though.
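A rough illustration of the gsub-first approach, as a plain-Ruby sketch rather than the actual AO3 code (add_linebreaks is a hypothetical name, and Nokogiri is left out so it runs on its own): blank lines become paragraph boundaries and single newlines become br tags. The last example shows the catch: an inline tag that spans a blank line comes out misnested, which is what the parser then has to "repair".

```ruby
# Hypothetical helper sketching the "gsub first, parse later" idea.
# Blank lines become paragraph boundaries; single newlines become <br />.
def add_linebreaks(text)
  # Normalise Windows/old-Mac line endings so the patterns below stay simple.
  text = text.gsub(/\r\n?/, "\n").strip
  # A blank (or whitespace-only) line separates paragraphs.
  paragraphs = text.split(/\n\s*\n/)
  paragraphs.map do |para|
    "<p>" + para.gsub("\n", "<br />\n") + "</p>"
  end.join("\n")
end

puts add_linebreaks("First line\nsecond line\n\nNew paragraph")
# <p>First line<br />
# second line</p>
# <p>New paragraph</p>

# An inline tag covering two paragraphs ends up crossing the <p> boundary:
puts add_linebreaks("<em>one\n\ntwo</em>")
# <p><em>one</p>
# <p>two</em></p>
```

Whatever the parser does with that last result (closing the em early, dropping the stray close tag, and so on) is where the unwanted changes creep in.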
no subject
I had exactly the same problems you're discovering. I think in the end I was toying with dumping it into the parser with no modification, then iterating through it node by node and creating a new document, adding paragraphs and newlines where appropriate. That required jumping into recursion, though, which didn't thrill me, and I had a lot of trouble with text nodes vs. elements.
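A minimal sketch of that "walk the parsed tree and build a new document" idea, assuming the input has already been parsed without modification. TextNode and Element are hypothetical stand-ins for a parser's node classes (Nokogiri's real API is different), and the inline tag list is an assumption. It follows the rule discussed further down the thread: at block level, text and inline elements are gathered into p elements, while inside inline elements only br tags are added, so text nodes and elements each get their own branch of the recursion.

```ruby
# Hypothetical stand-ins for parsed nodes (not Nokogiri's classes).
TextNode = Struct.new(:text)
Element  = Struct.new(:name, :children)

# Assumed, incomplete list of inline tag names.
INLINE = %w[a abbr b cite code em i small span strong sub sup u].freeze

# Splits one text run into text nodes separated by <br> elements.
def text_with_breaks(text)
  nodes = []
  text.split("\n", -1).each_with_index do |line, i|
    nodes << Element.new("br", []) unless i.zero?
    nodes << TextNode.new(line) unless line.empty?
  end
  nodes
end

# Inside inline elements: never start paragraphs, only add <br>.
def rebuild_inline(children)
  children.flat_map do |child|
    if child.is_a?(TextNode)
      text_with_breaks(child.text)
    else
      [Element.new(child.name, rebuild_inline(child.children))]
    end
  end
end

# At block level: text and inline elements collect into the current <p>.
# A blank line inside a text node closes it; so does a block element,
# which is then rebuilt recursively in block context.
def rebuild_block(children)
  out = []
  para = []
  close_para = lambda do
    para.pop while para.last.is_a?(Element) && para.last.name == "br"
    out << Element.new("p", para) unless para.empty?
    para = []
  end
  children.each do |child|
    if child.is_a?(TextNode)
      child.text.split(/\n[ \t]*\n/, -1).each_with_index do |chunk, i|
        close_para.call unless i.zero?
        # Skip whitespace between block elements; keep it between inline ones.
        next if chunk.strip.empty? && para.empty?
        chunk = chunk.sub(/\A\n+/, "") if para.empty?
        para.concat(text_with_breaks(chunk))
      end
    elsif INLINE.include?(child.name)
      para << Element.new(child.name, rebuild_inline(child.children))
    else
      close_para.call
      out << Element.new(child.name, rebuild_block(child.children))
    end
  end
  close_para.call
  out
end

# Serialises the new tree (attributes are ignored in this sketch).
def to_html(nodes)
  nodes.map do |node|
    if node.is_a?(TextNode)
      node.text.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
    elsif node.name == "br"
      "<br />"
    else
      "<#{node.name}>#{to_html(node.children)}</#{node.name}>"
    end
  end.join
end

# Roughly "Hello <em>world</em>\n\n<em>one\n\ntwo</em>" after parsing.
doc = [
  TextNode.new("Hello "),
  Element.new("em", [TextNode.new("world")]),
  TextNode.new("\n\n"),
  Element.new("em", [TextNode.new("one\n\ntwo")])
]
puts to_html(rebuild_block(doc))
# <p>Hello <em>world</em></p><p><em>one<br /><br />two</em></p>
```

Building a separate output tree, rather than editing the parsed one in place, keeps the original order intact while nodes are being regrouped into paragraphs.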
Originally I was putting multiple tags between inline elements, but I wonder whether they look different from our paragraph linebreaks under our stylesheet. Maybe it would be better to add a style attribute to any block elements inside them, or maybe add a surrounding div.
*ponders*
If we're on Ruby 1.9 (I think that was the number), there's a new regex engine we could use that allows lookbehinds as well as lookaheads, so we wouldn't have to read in entire blocks of tags, but it probably wouldn't be any faster. XD
no subject
Yeah. I've been trying to do it without creating a new document, but I don't think that's going to work - too hard to keep things in order. I'm trying a second document now; we'll see if that goes any better.
I've been trying out only adding br tags inside inline elements, instead of paragraphs. I might stick with that unless somebody disapproves loudly, since it's a bit more valid. :D
And we're sadly still on Ruby 1.8.7 - moving to 1.9 is a big job, so I'm not sure when we'll get to that.
no subject
I really liked 'Time of Angels' too. The angels are the creepiest monster ever, I am always pleasurably freaked out by the episodes with them in. I've liked River Song in the past and was pleased when she turned up again, although I thought they laid the all-knowing wife thing on a bit too thick this time.
no subject
The angel pulling a Samara and escaping out of the television - yeah, thanks, I was creeped out FOREVER by The Ring, really needed a repeat. I watched from between my fingers, sort of.
And for Elz: good luck, don't get eaten by a grue! *gives tiny code fairy to light the way, and a machete for the thicker parts of the code jungle*