RedHanded

Friday

2005.08.26

Nearing Greasemonkey for Browser-of-Choice with the Hoodlum Proxy! #

by why in inspect

Here it is. Day five of the Hoodwink’d docu-drama. Can you believe this? Here, look. We are left with a smoldering meteor from an alien planet in our hands:

hoodlum.rb

No more greasemonkey.

No more HOSTS entry.

Run hoodlum.rb.

Firefox > Preferences > General > Connection Settings… > Manual Proxy > 127.0.0.1, port 37004

Not yet for Safari and IE.

There’s so much to say about this proxy. I’ll start by saying that I barely wrote any of the code. MenTaLguY wrote the good stuff. And Goto Kentaro wrote the really good stuff because he wrote WEBrick::HTTPProxyServer, which we’re just clawing inside and operating from within.

The overhead view is this:

  1. The prewink method rewrites the request. We serve cached stuff from within the proxy (so the JS file doesn’t get retrieved every time you go anywhere) and mimic HOSTS.
  2. The upwink method injects a Javascript into the page.
  3. The fixup_script method replaces GM_API methods with HoOdLuM_API methods. This is where browser compatibility stuff will go. We could end up implementing all of Firefox inside IE and Safari, though. So this is still up in the air.
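
To make those three steps a bit more concrete, here is a minimal sketch in plain Ruby. It is not the code in hoodlum.rb. The names rewrite_host and inject_script, the overrides table and the hostnames in it are all hypothetical, and the sketch assumes the GM_ to HoOdLuM_ swap is nothing more than a prefix rename.

```ruby
# Sketch only: none of this is lifted from hoodlum.rb.

# Step 1 (prewink-style): pick the address a request should really go to.
# `overrides` is a hypothetical stand-in for a HOSTS file (hostname => address).
def rewrite_host( host, overrides )
  overrides.fetch( host, host )
end

# Step 2 (upwink-style): slip a script tag in just before </head>,
# or tack it on the end if the page has no head at all.
def inject_script( html, src )
  tag = %(<script type="text/javascript" src="#{src}"></script>)
  if html =~ /<\/head>/i
    html.sub( /<\/head>/i ) { |close| tag + close }
  else
    html + tag
  end
end

# Step 3 (fixup_script-style): assumes the API swap is a plain prefix rename.
def fixup_script( js )
  js.gsub( /\bGM_(\w+)/ ) { "HoOdLuM_#{$1}" }
end
```

Feed fixup_script the string `GM_xmlhttpRequest(opts)` and it hands back `HoOdLuM_xmlhttpRequest(opts)`. The real proxy has to do all of this from inside WEBrick's request and response objects, which is where the clawing comes in.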

I still want to see what MrCode’s got. Wonderland could be a lot cleaner and could use Ruby a lot more effectively. For now, Hoodlum sits on the Javascript side.

No, XPath on Messy HTML is Just as Easy in Ruby #

by why in inspect

You think XPath is easier in Javascript than in Ruby when it comes to invalid HTML? I’ve heard this in a lot of correspondence over the past week. Because Javascript has the DOM, right?

Use HTree+REXML. HTree cleans and REXML peppers and gobbles. Here’s a hairy, little method that will save some pain:

 require 'htree'
 require 'rexml/document'
 require 'open-uri'

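 # Fetch a page, let HTree clean up the markup, and hand back the
 # <html> element as a REXML::Document (nil if there isn't one).
 def read_xhtml_from( uri )
   open( uri ) { |f| HTree.parse f }.each_child do |child|
     # Skip children that aren't named elements.
     next unless child.respond_to? :qualified_name
     next unless child.qualified_name == 'html'
     doc = ""; child.display_xml( doc )
     return REXML::Document.new( doc )
   end
   nil
 end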

Okay, so. How to use it? That nice REXML way you’re already used to.

 html = read_xhtml_from "http://redhanded.hobix.com/" 
 html.each_element( "//div[@class='entryFooter']" ) do |e|
   puts e.text( "./a[starts-with(@href, 'http://redhanded.hobix.com/')]" )
 end
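
If you want to poke at that XPath without hitting the network or installing HTree, here is a small sketch that runs the same two REXML calls against a hand-made, already-tidy XHTML string. The markup and the link text in it are made up for the example. They only assume that each entry footer is a div with class entryFooter holding a few links.

```ruby
require 'rexml/document'

# Made-up markup standing in for a page HTree has already cleaned.
xhtml = <<~XML
  <html><body>
    <div class="entryFooter">
      <a href="http://redhanded.hobix.com/inspect/sample.html">permalink</a>
      <a href="http://example.com/">elsewhere</a>
    </div>
    <div class="entry"><a href="http://redhanded.hobix.com/">home</a></div>
  </body></html>
XML

doc = REXML::Document.new( xhtml )
links = []
doc.each_element( "//div[@class='entryFooter']" ) do |e|
  # text( path ) returns the text of the first element matching the path.
  links << e.text( "./a[starts-with(@href, 'http://redhanded.hobix.com/')]" )
end
puts links
```

Only the footer div matches the outer path, and inside it only the first link whose href starts with the RedHanded address gets its text collected.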

Hoodlums IV: No Sleep Till Borken #

by why in inspect

hoodlum.rb

Not quite there yet.

Proxies are suddenly a riot.