For an R&D project I needed something to help scrape our existing web pages, specifically pages with forms. Preferably it would be a Java library, so it could easily be plugged into ColdFusion. After trying a few (jTidy, Cobra and two HtmlParsers) I stumbled on Jericho: http://sourceforge.net/projects/jerichohtml/. It did everything I needed and then some. It did a great job parsing HTML and getting the values I was after, and the project's Javadocs helped a lot.
There is only one .jar file (as of this writing, jericho-html-2.6.jar), so put it somewhere in the CLASSPATH. To check that you put it in the right place, open the ColdFusion 8 Administrator, go to Settings Summary and see whether it's listed under Java Class Path.
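If you'd rather check from code than from the Administrator, a minimal Java sketch can ask the class loader for a Jericho class and report which jar it was loaded from. It assumes the 2.x package name au.id.jericho.lib.html (Jericho moved to net.htmlparser.jericho in 3.0); the ClasspathCheck class is made up for illustration.

```java
import java.net.URL;
import java.security.CodeSource;

public class ClasspathCheck {

    /** Returns true if the named class can be loaded by this class's loader. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: locate and load the class without running static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Returns the jar or directory a class came from, or null if missing or unknown. */
    public static String locationOf(String className) {
        try {
            Class<?> c = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            CodeSource source = c.getProtectionDomain().getCodeSource();
            // JDK bootstrap classes such as java.lang.String have no CodeSource
            if (source == null) {
                return null;
            }
            URL location = source.getLocation();
            return location == null ? null : location.toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Assumed Jericho 2.x class name; pass another class name as the first argument to override
        String name = args.length > 0 ? args[0] : "au.id.jericho.lib.html.Source";
        if (isOnClasspath(name)) {
            System.out.println(name + " found in " + locationOf(name));
        } else {
            System.out.println(name + " is NOT on the classpath");
        }
    }
}
```

Run it with the same classpath the server uses (for example java -cp /path/to/jericho-html-2.6.jar:. ClasspathCheck, where the path is a placeholder). A "NOT on the classpath" result points at the same problem the Settings Summary would show.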
I didn't go as far as writing a full wrapper for it. I was after form fields, so here is the code that gets what I need.
Here is the code for parseFormValues(), where you can see some of the API Jericho provides in action. Looking through it again, I noticed that I collect list values with listAppend(); if a value contains a comma, that will create a problem, because comma is the default list delimiter.
So keep it in mind if you plan to use it.
I'm sure other improvements can be made, since this is a first pass.
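The listAppend() comma problem is easy to reproduce outside ColdFusion: once values are joined with a comma, a value that itself contains a comma can't be recovered. This Java sketch is not the original parseFormValues() (which isn't reproduced here); it round-trips values through a comma-delimited string, then shows one alternative, keeping each field's values in a list keyed by field name. In CFML the rough equivalent would be a struct of arrays. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FormValues {

    /** Mimics a comma-delimited list: join the values, then split them back apart. */
    public static List<String> roundTripAsCommaList(List<String> values) {
        String list = String.join(",", values);
        return Arrays.asList(list.split(","));
    }

    /** Collects repeated field values per field name, with no delimiter involved. */
    public static Map<String, List<String>> collect(List<String[]> nameValuePairs) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (String[] pair : nameValuePairs) {
            // pair[0] is the field name, pair[1] one of its values
            fields.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return fields;
    }

    public static void main(String[] args) {
        List<String> picked = Arrays.asList("Smith, John", "Doe");
        // The comma inside "Smith, John" turns two values into three list items
        System.out.println(roundTripAsCommaList(picked).size()); // 3

        Map<String, List<String>> fields = collect(Arrays.asList(
                new String[] {"name", "Smith, John"},
                new String[] {"name", "Doe"}));
        // Both values survive intact
        System.out.println(fields.get("name").size()); // 2
    }
}
```

Switching to a different delimiter only moves the problem to values that contain that character, which is why the sketch avoids a delimited string altogether.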
A note on "this" scope usage: the component this code is in extends BaseComponent (thank you, Hal Helms), which implements generic (can you say "lazy"? :-) ) setters and getters with onMissingMethod. You have to call them through "this" for that to work inside the component.
Later in the project I found, by accident, that using this (ooh, cool pun) technique is slower than actually writing a setter and a getter, which kind of makes sense: every call has to go through onMissingMethod first.
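Java has no onMissingMethod, but java.lang.reflect.Proxy is a close analogue: every call on the proxy goes through one handler that inspects the method name. This sketch is a Java illustration, not Hal Helms' BaseComponent, and the Person interface is hypothetical. It contrasts a generic handler with a plain getter/setter pair and hints at why the generic route is slower: each call pays for reflective dispatch, string parsing and a map lookup, while the explicit setter is a single field write.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

public class GenericAccessors {

    /** Hypothetical bean interface; the proxy supplies its behavior at runtime. */
    public interface Person {
        void setName(String name);
        String getName();
    }

    /** Explicit accessors for comparison: a direct field read/write per call. */
    public static final class PlainPerson implements Person {
        private String name;
        public void setName(String name) { this.name = name; }
        public String getName() { return name; }
    }

    /** Routes every call through one method, much like onMissingMethod does. */
    static final class AccessorHandler implements InvocationHandler {
        private final Map<String, Object> properties = new HashMap<>();

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) {
            String name = method.getName();
            if (name.startsWith("set") && args != null && args.length == 1) {
                properties.put(name.substring(3), args[0]); // setName -> key "Name"
                return null;
            }
            if (name.startsWith("get") && (args == null || args.length == 0)) {
                return properties.get(name.substring(3));
            }
            // toString/equals/hashCode and anything else are not handled in this sketch
            throw new UnsupportedOperationException(name);
        }
    }

    /** Creates a proxy whose get/set calls are backed by a map. */
    public static <T> T create(Class<T> type) {
        return type.cast(Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, new AccessorHandler()));
    }

    public static void main(String[] args) {
        Person generic = create(Person.class);
        generic.setName("Ann");
        Person explicit = new PlainPerson();
        explicit.setName("Ann");
        System.out.println(generic.getName() + " / " + explicit.getName()); // Ann / Ann
    }
}
```

Both objects behave the same to callers; the difference is the per-call work hidden behind the generic version.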
Thursday, February 19, 2009
Wednesday, February 18, 2009
Railo 3 on Ubuntu 8.10
I wanted to try Railo, so I installed it on my Ubuntu laptop. Thanks to Mark Mandel for a short blog post here: http://www.compoundtheory.com/?action=displayPost&ID=393.
I didn't know what to enter for the mail server in the Railo admin, and it kept asking for one for the cfmail tag when I tried to load a site. I ended up installing postfix following the instructions here: http://my.opera.com/Contrid/blog/show.dml/478684.
The settings menu is a little strange: old school, and I mean old-school looking. I chose "local setup" in the first menu; use the arrow keys to move between choices and Space to select. It worked in the end, and Railo recognized "localhost" as the mail server.
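Before pointing Railo at localhost, it can help to confirm that postfix is actually listening. This Java sketch opens a socket and looks for the "220" greeting an SMTP server sends on connect (RFC 5321). Port 25, the timeout and the SmtpProbe name are assumptions; it neither sends mail nor tests Railo itself.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class SmtpProbe {

    /** Returns true if host:port answers with an SMTP "220" service-ready greeting. */
    public static boolean greets(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            socket.setSoTimeout(timeoutMillis); // don't wait forever for the banner
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
            String banner = in.readLine();
            // Multi-line greetings start with "220-", single-line ones with "220 "
            return banner != null && banner.startsWith("220");
        } catch (IOException e) {
            // Connection refused, timeout, unknown host, ...
            return false;
        }
    }

    public static void main(String[] args) {
        boolean ok = greets("localhost", 25, 2000);
        System.out.println(ok ? "SMTP server answering on localhost:25"
                              : "No SMTP greeting on localhost:25");
    }
}
```

If this reports no greeting, the mail server setting in the Railo admin won't help either, so it's a quick way to tell a postfix problem from a Railo one.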
Monday, February 16, 2009
Ubuntu ColdFusion 8 Java classpath
A quick note on Ubuntu. I was once again setting up Ubuntu 8.10 with ColdFusion, since the hard drive failed on the last install, and I needed to add an additional Java classpath location. I edited /opt/jrun4/bin/jvm.config under the #JVM classpath section.