flamebyrd wrote in elz, 2010-04-29 02:18 pm (UTC)

I tried something with the precursor to Nokogiri for a while, but eventually decided it was taking me too long, and that regexes would be enough to fix the bug I had originally set out to fix.

I had exactly the same problems you're discovering. In the end, I think I was toying with dumping it into the parser unmodified, then iterating through it node by node and building a new document, adding paragraphs and line breaks where appropriate. That required jumping into recursion, though, which didn't thrill me, and I had a lot of trouble with text nodes vs. elements.
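A rough sketch of that node-by-node rebuild, just to show the recursion and the text-node vs. element split. It doesn't use Nokogiri: TextNode, ElementNode, render and the PRESERVE list are hypothetical stand-ins for a real parser's node classes, and the only transformation is turning newlines inside text into <br /> tags everywhere except inside <pre>.

```ruby
# Hypothetical minimal node types standing in for a real parser's nodes.
TextNode    = Struct.new(:text)
ElementNode = Struct.new(:name, :children)

# Tags whose text should be copied verbatim (an assumed list).
PRESERVE = %w[pre].freeze

# Recursively rebuild the document as an HTML string.
# preserve is true once we are anywhere inside a PRESERVE tag.
# Note: attributes are ignored and text is assumed to be HTML-safe already.
def render(node, preserve = false)
  case node
  when TextNode
    # Text nodes are where the newline handling happens.
    preserve ? node.text : node.text.gsub("\n", "<br />\n")
  when ElementNode
    # Elements just recurse into their children, passing the flag down.
    inner_preserve = preserve || PRESERVE.include?(node.name)
    inner = node.children.map { |child| render(child, inner_preserve) }.join
    "<#{node.name}>#{inner}</#{node.name}>"
  else
    raise ArgumentError, "unknown node type: #{node.class}"
  end
end

doc = ElementNode.new("div", [
  TextNode.new("line one\nline two"),
  ElementNode.new("pre", [TextNode.new("a\nb")])
])
puts render(doc)
```

Keeping the type check in a single case expression means the text vs. element distinction only has to be handled in one place; the recursion itself just carries a single flag down the tree.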

Originally I was putting multiple <br> tags between inline elements, but I wonder if they look different from our paragraph line breaks with our stylesheet? Maybe it would be better to add a style attribute to any block elements inside them, or maybe add a surrounding div.

*ponders*
If we're on Ruby 1.9 (I think that was the number), there's a new regex engine we could use that supports lookbehind as well as lookahead, so we wouldn't have to read in entire blocks of tags, but it probably wouldn't be any faster. XD
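A small sketch of the lookaround idea, assuming Ruby 1.9 or later (for the lookbehind). The helper name add_linebreaks and the "is the newline touching a tag?" rule are hypothetical and much cruder than the real formatting rules would need to be. The point is that the lookarounds only peek at the neighbouring characters without consuming them, so the regex never has to match a whole block of tags.

```ruby
# Hypothetical helper: turn newlines into <br /> tags, except when the
# newline sits right after a tag (preceded by '>') or right before one
# (followed by '<'). (?<![>]) is a negative lookbehind and (?![<]) a
# negative lookahead; neither consumes any characters.
def add_linebreaks(text)
  text.gsub(/(?<![>])\n(?![<])/, "<br />\n")
end

puts add_linebreaks("one\ntwo\n<p>three</p>\nfour")
```

Here only the newline between "one" and "two" gets a <br />: the next newline is followed by <p>, and the last one follows </p>.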
