There is a package in your installation called file-storage, which probably does what you want (well, it does more, actually, but that's beside the point).
I don't know of a parser for HTML, but the range of functions within OpenACS and Tcl should give you what you need.