elz: (archive of our own)
elz ([personal profile] elz) wrote 2010-04-29 01:47 pm (UTC)

What Zooey said! The regex parser had that one problem with timing out or giving "regular expression too large" errors when the input was very large or complex, but it worked well for the majority of cases. When Ronan came on board, he said he knew regular expressions very well, so I asked him to look at it, and he wound up trying something different. That, in turn, also had a lot to recommend it, but it had a few bugs of its own, and its complexity made them hard for anyone else to fix.

So basically, what I'm trying to do now is see what we can offload to third-party tools that we wouldn't have to maintain. Nokogiri makes the actual parsing a lot easier, but it doesn't add linebreaks for us, which is the difficult part. If I just use gsub to turn the newlines into p and br tags and then hand it off to Nokogiri to make sense of, that produces mostly workable results, but it also adds things we don't want (br tags in invalid places) and strips out things we do want (people using inline tags to cover multiple paragraphs -- not that we want that, but I'm guessing they'd be pissed if it suddenly stopped working!). If I parse it first and then try to add the paragraphs, I wind up with text soup, at least so far. I'm going to give that a few more tries, though.
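
To make the gsub approach concrete, here's a rough sketch of that kind of pass. This is not the actual archive code: `naive_linebreaks` is a made-up name, and treating blank lines as paragraph breaks and single newlines as br tags is my assumption about the rules. It stops before the Nokogiri step and just shows what the markup looks like going in.

```ruby
# Hypothetical helper (not the real archive code): a naive newline-to-markup
# pass. Assumes blank lines separate paragraphs and single newlines are
# line breaks within a paragraph.
def naive_linebreaks(text)
  body = text.strip
             .gsub(/\n{2,}/, "</p><p>") # blank line(s) => paragraph boundary
             .gsub("\n", "<br />")      # any remaining newline => line break
  "<p>#{body}</p>"
end

# Plain text comes out fine.
plain = "First line\nsecond line\n\nNew paragraph"
puts naive_linebreaks(plain)

# Input that already contains block-level HTML does not: br tags land
# between the list items, and the whole list ends up wrapped in a p tag.
list = "<ul>\n<li>one</li>\n<li>two</li>\n</ul>"
puts naive_linebreaks(list)
```

The second call is the "br tags in invalid places" problem: the substitution has no idea it's inside a list, so whatever parses the result afterwards has to clean up markup that was never valid to begin with.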
