RedHanded

Christoffer's Hpricot Goodies

by why in inspect

So, in what ways have you guys extended Hpricot? I really enjoy this collection of accessories to Hpricot by Christoffer Sawicki, who also wrote the Hpricot-based HTML-to-feed library called Feedalizer.

He has one script that does gsub! on all text nodes in the document. Another script is for generating tables of contents from the headers on an HTML page. I imagine that would go great with Markdown and Textile. (See also: del.icio.us/tag/hpricot.)

said on

Yeah, thanks go to us for letting him share it :))

But they are nice little trinkets of code, they are. Props to Qerub, aka Christoffer!

said on

I have an HTML Scrubber based on Hpricot, but I’m currently working on redoing it so that scrub is part of Hpricot instead of being a separate class.

Thanks for making it so easy.

said on

Does Hpricot support OpenSearch-style XML? I am having a difficult time parsing the following:

title

Also, how do I deal with

http://link.com

The above is valid XML, but I can’t seem to figure it out.

Any help?

    said on

    The tags didn’t show up. Let’s try again... Nope, seems like I can’t provide an example here. Anyway, OpenSearch tags have a “:” in their tag name, like “opensearch:title”.

    Also, how do I deal with single XML tags like “br /” or “link /”?

    Thanks for your help.

    said on

    Nice one Qerub, Hpricot Goodies is really useful. Thanks! ;)

    said on

    Heh. Thanks for the publicity, but more important: thanks again for Hpricot!

    Yes, HTML Outliner is being used to generate tables of contents for articles. I should probably bundle some code that takes the HTMLOutliner#outline tree and returns a nested <ul> that is ready to be used.
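As a rough sketch of that idea: assuming the outline is a tree of nodes that each carry a title and a list of children, a nested list can be built recursively. The hash shape (`:title`, `:children`) and the `outline_to_ul` helper are hypothetical stand-ins here, not the actual return format of HTMLOutliner#outline.

```ruby
require 'erb'

# Recursively render a list of outline nodes as a nested <ul>.
# Each node is assumed to be a hash with :title and :children keys.
def outline_to_ul(nodes)
  return '' if nodes.empty?

  items = nodes.map do |node|
    title = ERB::Util.html_escape(node[:title])
    "<li>#{title}#{outline_to_ul(node[:children])}</li>"
  end
  "<ul>#{items.join}</ul>"
end

# Made-up outline for illustration.
outline = [
  { title: 'Intro', children: [] },
  { title: 'Usage', children: [
    { title: 'Install', children: [] },
    { title: 'Examples & tips', children: [] }
  ] }
]

puts outline_to_ul(outline)
```

Escaping the titles keeps stray `&` or `<` characters in headers from breaking the generated markup.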

    said on

    UnderpantsGnome: That would be a great addition to the main lib. I actually really like the strip methods you’ve made. What other plans do you have?

    Andrew: Send me some XML. Hpricot doesn’t have problems parsing namespaces; however, its XPath syntax doesn’t support namespaces, since it’s a hybrid of CSS and XPath.

    said on

    why: I was/am basically just moving the block from HtmlScrubber, less the config, into my Hpricot additions. Mostly because then I could call it Hpricot::Scrub, and that made me laugh.

    Then you could do:

    require 'hpricot'
    require 'open-uri'  # lets open() fetch URLs
    doc = Hpricot(open('http://slashdot.org/').read)
    doc.scrub(config_hash)
    

    I haven’t had any new needs for this, though I was considering making the config more like Perl’s HTML::Scrubber, where you can specify global attributes to allow/deny but also specify attributes to allow/deny at the tag level.
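A minimal sketch of what such a config might look like, assuming a plain Ruby hash. The keys (`:allow_attributes`, `:deny_attributes`, `:tags`) and the `attribute_allowed?` helper are hypothetical; they are not part of Hpricot or of the existing scrub code. In this sketch tag-level rules take precedence over the global ones.

```ruby
# Hypothetical config: global attribute rules plus per-tag overrides.
scrub_config = {
  allow_attributes: %w[title],
  deny_attributes: %w[style onclick],
  tags: {
    'a'   => { allow_attributes: %w[href] },
    'img' => { allow_attributes: %w[src alt], deny_attributes: %w[title] }
  }
}

# Decide whether an attribute survives on a given tag. Tag-level rules
# win over global rules, and deny wins over allow at the same level.
def attribute_allowed?(config, tag, attr)
  tag_rules = config.fetch(:tags, {}).fetch(tag, {})
  return false if tag_rules.fetch(:deny_attributes, []).include?(attr)
  return true  if tag_rules.fetch(:allow_attributes, []).include?(attr)
  return false if config.fetch(:deny_attributes, []).include?(attr)

  config.fetch(:allow_attributes, []).include?(attr)
end

p attribute_allowed?(scrub_config, 'a', 'href')     # allowed at the tag level
p attribute_allowed?(scrub_config, 'img', 'title')  # denied at the tag level
p attribute_allowed?(scrub_config, 'p', 'title')    # allowed globally
p attribute_allowed?(scrub_config, 'p', 'style')    # denied globally
```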

    Other than that I’m open to suggestions.

    Thanks again for making this so easy to accomplish, I so didn’t want to rewrite HTML::Scrubber from scratch.

    said on

    why: I was playing around with Hpricot Scrub and it seems to have gotten unhappy since 0.4.86 (the last working version). I also have an image sneaking through that I don’t think should be. I have the current changes with a “test” that shows the failure on recent gems and the stray image on <= 0.4.86.

    You can grab it here if you’d like to take a look: hpricot_scrub.zip

    As usual, feedback welcome.

    said on

    Well, it looks like scrub’s use of traverse_all_element is the source of the problem. If you remove stuff while it’s traversing, things end up getting skipped.

     >> a = [:cadillac, :driver, :teacup]
     >> a.each { |x| a.delete(x) }
     => [:driver]
    

    You know what I’m saying?
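The skipping can be traced in plain Ruby. In this small sketch (variable names are only illustrative), `each` walks the array by index, so every deletion shifts the remaining elements left and the next one is never visited. Iterating over a copy avoids that.

```ruby
a = [:cadillac, :driver, :teacup]
visited = []

# Deleting from the array being iterated: after :cadillac is removed,
# :driver slides into index 0 and the iterator moves on to index 1.
a.each do |x|
  visited << x
  a.delete(x)
end

p visited  # => [:cadillac, :teacup]
p a        # => [:driver]

# Iterating over a snapshot leaves the traversal unaffected by deletions.
b = [:cadillac, :driver, :teacup]
b.dup.each { |x| b.delete(x) }
p b        # => []
```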

    said on

    Oh, duh… this works better, unless you see a potential issue with this that I’m missing.

    
    children.reverse.each {|e|
      e.strip unless e.class == Hpricot::Text ||
        config[:allow_tags].include?(e.name)
    }
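One way to see why this holds up: `reverse` returns a new array, so removing nodes from the real child list no longer disturbs the traversal. The sketch below uses a stand-in `Node` class with a simple `remove` method; it is an assumption for illustration and does not reproduce Hpricot's classes or the actual `strip` behaviour.

```ruby
# Stand-in node type for illustration only; not Hpricot's classes.
class Node
  attr_reader :name, :children
  attr_accessor :parent

  def initialize(name, children = [])
    @name = name
    @children = children
    children.each { |c| c.parent = self }
  end

  # Detach this node from its parent's child list.
  def remove
    parent.children.delete(self)
  end
end

root = Node.new('div', [Node.new('p'), Node.new('script'),
                        Node.new('img'), Node.new('em')])
allow_tags = %w[p em]

# The traversal runs over the reversed copy, while removals
# happen on root.children itself.
root.children.reverse.each do |e|
  e.remove unless allow_tags.include?(e.name)
end

p root.children.map(&:name)  # => ["p", "em"]
```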
    