
RedHanded

Hpricot and Sandbox for Win32

by why in inspect

Mauricio checked in some Rakefiles for cross-compiling to win32, so I’ve got some win32 gems for Hpricot and the (FF)Sandbox. The majority of you can now:

 gem install hpricot --source code.whytheluckystiff.net
 gem install sandbox --source code.whytheluckystiff.net

THE SANDBOX ONE IS COMPILED FOR 1.8.4 SO IT HAS TOTALLY GOT HOLES. And yet, fun can still be had I’m sure.

As for cross-compiling, I’m using mingw on FreeBSD. Here’s how:

 cd /usr/ports/devel/mingw32-gcc
 sudo make install

 cd ~/sand
 wget http://ftp.ruby-lang.org/pub/ruby/binaries/mingw/1.8/ruby-1.8.4-i386-mingw32.tar.gz
 tar xzf ruby-1.8.4-i386-mingw32.tar.gz
 mv usr/local RUBY-1_8_4-MINGW32
 rm -rf usr

 cd ~/dev
 svn co https://code.whytheluckystiff.net/svn/sandbox/trunk sandbox
 cd sandbox
 export MINGW32_RUBY=/home/why/sand/RUBY-1_8_4-MINGW32
 export MINGW32_PREFIX=mingw32
 rake rubygems_win32

I mean that’s a lot better than putting together a VM and trying to track down a decent free Microsoft compiler. A couple weeks ago, I spent six hours on it and made no progress.

said on

Yay! Hpricot totally rocks the Windows. Thanks for the speedy work on that!

said on
Is this what you mean by VM?

VM/370 ONLINE
            VV        VV    MM        MM
            VV        VV    MMM      MMM
            VV        VV    MMMM    MMMM
            VV        VV    MM MM  MM MM
     3333333333     777777777777MMMM  00000000
   333333333333    77777777777  MM  0000000000
    33      VV33    77VV    77      00MM      00
             V33     VV    77M      00MM      00
              33    VV    77MM      00MM      00
           3333VV  VV    77 MM      00MM      00
           3333 VVVV     77 MM      00MM      00
              33 VV      77 MM      00MM      00
              33         77         00        00
    33        33         77         00        00
    333333333333         77          0000000000
     3333333333          77           00000000
said on

I mean that’s a lot better than putting together a VM and trying to track down a decent free Microsoft compiler. A couple weeks ago, I spent six hours on it and made no progress.

Is there something wrong with Microsoft’s own C++ compiler (on Windows, that is, of course)? The toolkit is free as in beer.

said on

These win32 binaries are provided by the puny MinGW toolchain, which doesn’t stand comparison with the almighty MS VC2005 Express but empowers non-win32 developers to build binaries for their win32 users.

More details about the cross-compilation magic are in the Rakefile, and here.

said on

Asztal: First of all, it runs on win32 only :-) Second, and most importantly: it’s not binary-compatible with the VC6 ruby build (I read now that MinGW isn’t fully either, but it works for all but a few extensions).

said on

Does Hpricot not support [] in an XPath? I know I can index the elements from the resultant Ruby array following a search, but I have to believe it could be faster using an internal lookup to an internal C data structure via an XPath string.

said on
OK, even though Hpricot isn’t an XML parser, it seems to handle RSS well enough, but feedburner’s atom feeds force this exception…
irb(main):008:0> Hpricot(URI.parse("http://feeds.feedburner.com/vedana").read)
/usr/lib/ruby/gems/1.8/gems/hpricot-0.3/lib/hpricot/parse.rb:164:in `build_node': 
[bug] unknown structure: 
[:xmlprocins, "href=\"http://feeds.feedburner.com/~d/styles/itemcontent.css\" 
type=\"text/css\" media=\"screen", nil, nil] (Exception)
 from /usr/lib/ruby/gems/1.8/gems/hpricot-0.3/lib/hpricot/parse.rb:59:in `make' 
 from /usr/lib/ruby/gems/1.8/gems/hpricot-0.3/lib/hpricot/parse.rb:59:in `make'
 from /usr/lib/ruby/gems/1.8/gems/hpricot-0.3/lib/hpricot/parse.rb:11:in `parse'
 from /usr/lib/ruby/gems/1.8/gems/hpricot-0.3/lib/hpricot/parse.rb:4:in `Hpricot'
 from (irb):8:in `irb_binding'
 from /usr/lib/ruby/1.8/irb/workspace.rb:52:in `irb_binding' 
 from /usr/lib/ruby/1.8/irb/workspace.rb:52
said on

hoyhoy: Thank you for that. It’s fixed in SVN, and the feed parses fine.

The thing is: I’m not keeping any internal C data structures here. It’s usually the parsing that really slows you down, not the data structures. It would be faster to do it all in C, but then it would be something that I couldn’t possibly finish.

said on

That makes supporting [] in XPath kind of moot. I’ve often considered making my own whizbang XML parser gem using flex/bison or spirit. Unfortunately, after having that kind of thought, I usually sit down and eat a sandwich and it goes away.

said on

Haha, oh yeah dreaming about scanners and parsers is really fun, it’s too bad the reality is so vomitously bleak.

said on

Well, maybe we should make that not so. :) (HAH)

said on

I am the proud owner of a 2006 Win32 Sandbox, Unsafe Edition, with leather seats.

said on
More XML weirdness:

irb(main):035:0> Hpricot(URI.parse("http://rss.slashdot.org/Slashdot/slashdot").read).search("item")[1].to_html.each_line { |l| puts l if l.match(/dc:date/) }; nil<dc:date>2006-07-22T19:29:00+00:00</dc:date>=> nilirb(main):036:0> puts Hpricot(URI.parse("http://rss.slashdot.org/Slashdot/slashdot").read).search("dc:date")[1]nil=> nil
said on
Terminal.app ate my newlines.
irb(main):035:0> Hpricot(URI.parse("http://rss.slashdot.org/Slashdot/slashdot").read).search("item")[1].to_html.each_line { |l| puts l if l.match(/dc:date/) }; nil
<dc:date>2006-07-22T19:29:00+00:00</dc:date>
=> nil
irb(main):036:0> puts Hpricot(URI.parse("http://rss.slashdot.org/Slashdot/slashdot").read).search("dc:date")[1]
nil
=> nil
said on
I added a colon to this match on line 102, and then search will find tags with colons.

m = expr.match %r!^([#.]?)([a-z0-9\\*_:-]*)!i
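
To see why the colon matters, the pattern can be tried on its own in plain Ruby, without Hpricot. The `WITH_COLON` pattern is the one quoted above; `WITHOUT_COLON` is only an assumption about what the line looked like before the change (the same pattern minus the colon), and `leading_name` is a made-up helper for this sketch.

```ruby
# Pattern from the comment above, with ':' allowed in the name.
WITH_COLON    = %r!^([#.]?)([a-z0-9\\*_:-]*)!i
# Assumed earlier form: identical except for the missing colon.
WITHOUT_COLON = %r!^([#.]?)([a-z0-9\\*_-]*)!i

# Hypothetical helper: returns the name captured at the start of a selector.
def leading_name(expr, pattern)
  expr.match(pattern)[2]
end

puts leading_name("dc:date", WITHOUT_COLON)  # prints "dc"
puts leading_name("dc:date", WITH_COLON)     # prints "dc:date"
```

Without the colon the match stops at `dc`, which would fit the `search("dc:date")` call coming back empty.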
said on

Line 102 in traverse.rb, but you knew that.

said on

... it’s not binary-compatible with the VC6 ruby build…

Dear God, why do people still use VC6?

said on

Capn: Because it’s become a de facto standard, and tends to have good windows compatibility.

said on

hoyhoy: You have a sneaky smiley man in your regexp! :-]

said on

this is great! but i have a little problem: is v0.3 missing something? Container::Trav.filter calls to_node.subst_subnode on line 402, but that doesn’t seem to exist?

said on

FlashHater: However, VC7 has the same perfect level of windows compatibility AND is one of the most standards compliant compilers on the market. Essentially, with the availability of VC7, there is no reason to use VC6.

said on

So am I using hpricot wrong or is this a bug?

 doc = Hpricot(open("http://usgenweb.org/"))
 puts (doc/:a).length # => 15 but it should be 66

Also, the following errors out:

doc.to_s

said on

Hpricot doesn’t support this:

doc.search("//table/tr/td[3]")

... does it?

I’m trying to fetch the 3rd td in each tr.

said on

bearik, this should work

doc.search("//table/tr/td")[3]
said on
bearik, I missed the word “each” in your post, so my previous recipe would not work. I think the only way is to use nested searches. Something along the lines of:
doc.search("//table/tr").each do |x|
  x.search("td")[2]
end
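
The difference between indexing the whole result and indexing inside each row can be sketched with plain Ruby arrays standing in for the parsed table. No Hpricot is involved here, and the `rows` data is made up for illustration.

```ruby
# Each inner array stands in for the td cells of one tr.
rows = [
  %w[a1 a2 a3 a4],
  %w[b1 b2 b3 b4]
]

# Indexing the flattened list of all cells picks a single cell overall.
one_cell = rows.flatten[2]

# Indexing inside each row picks the third cell of every row.
third_per_row = rows.map { |cells| cells[2] }

p one_cell       # prints "a3"
p third_per_row  # prints ["a3", "b3"]
```

Note that Ruby arrays count from zero, so the third cell is index 2.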
said on
 doc.search("table tr td:nth(3)")

There’s still some unimplemented XPath. Refer to the supported CSS selectors for other ideas.
