Forum OpenACS Development: Pasting HTML into Xinha

Posted by Janine Ohmer on
I'm trying to transfer text from an existing, very badly coded page into an xowiki page. Unfortunately, copying and pasting the text brings much of the unwanted HTML formatting code with it.

I've looked in the xinha forums and have found several plugins that purport to fix this; I tried one of them but could not get it to work. It's a bit clunky anyway; it uses a popup window to intercept the paste and strip the unwanted HTML out of it.

There is a low-tech workaround for this, which is to do the paste into a text editor and re-copy from there, but that's just too kludgy to settle for unless I have to.

Has anyone solved this problem? This is in Firefox, which probably matters since these things are so browser-dependent.

Posted by Dave Bauer on
Seems like you need to run HTML tidy on it.

http://tidy.sourceforge.net/

Google seems to think there is a plugin for that.

http://www.google.com/search?q=xinha+html+tidy

Did you try it? I don't have any experience with it.

Posted by Malte Sussdorff on
Do we have an HTML Tidy in OpenACS? Meaning, could we run something like set content [tidy::html -content [lindex $content_body 0]]? Or does something like this exist in Tcl(lib) or someplace else?

As much as I like Xinha, cleaning up HTML is something we should enforce on the server side, as users are usually way too lazy to press the Tidy button if they are copying and pasting from Word et al.

Posted by Janine Ohmer on
I agree with you in principle, Malte, but in this case I want the filter to remove all of the incoming HTML and just leave the text, and that would be a bit drastic for a server-side filter.

It turns out that HTMLTidy has been superseded by SuperClean (http://xinha.python-hosting.com/wiki/SuperClean). I've tried it and it sort of works, but using the Tidy option removes everything including the text, which is a tiny bit *too* tidy. :) I think this is because it wants to call Tidy via PHP, which is not going to work from xowiki.

The plugin I tried to implement, but could not get working, is here: http://xinha.python-hosting.com/ticket/349

The idea is to trap the paste command and, instead of having the user paste directly into the text box, have them paste into a popup box. The code behind the box strips all HTML out of the pasted text before passing it on to the real paste handler. Although I dislike popups I was willing to give it a try, but I could not get the popup to appear. There were no errors of any kind; it just didn't work.

Any other thoughts/suggestions?
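
For reference, the "strip all HTML and just leave the text" step described above could look roughly like the sketch below. This is not the code from the ticket 349 plugin; stripHtml is a hypothetical helper name, and the regex approach is an assumption chosen to keep the example short. It only handles well-formed markup and a handful of common entities.

```javascript
// Hypothetical helper, not part of Xinha or the ticket 349 plugin.
// Reduces an HTML fragment to plain text using regular expressions.
function stripHtml(html) {
  var text = html
    // Drop script/style elements together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "")
    // Drop comments (Word wraps conditional markup in them).
    .replace(/<!--[\s\S]*?-->/g, "")
    // Turn <br> and the end of common block elements into newlines.
    .replace(/<br\s*\/?>|<\/(p|div|h[1-6]|li|tr)\s*>/gi, "\n")
    // Remove every remaining tag.
    .replace(/<[^>]+>/g, "");

  // Decode a few common entities in a single pass.
  var entities = {
    "&nbsp;": " ", "&lt;": "<", "&gt;": ">", "&quot;": '"', "&amp;": "&"
  };
  text = text.replace(/&(nbsp|lt|gt|quot|amp);/g, function (match) {
    return entities[match];
  });

  // Collapse runs of spaces and blank lines, then trim the ends.
  return text
    .replace(/[ \t]+/g, " ")
    .replace(/ ?\n ?/g, "\n")
    .replace(/\n{2,}/g, "\n")
    .trim();
}
```

A paste handler would call stripHtml on the pasted fragment and insert the returned string as text. In a browser, a DOM-based approach (parsing the fragment and reading its text content) would likely be more robust than regular expressions.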

Posted by Hamilton Chua on
I looked deeper into the htmlarea.js file from Xinha and I noticed that it supports cleaning Word HTML.

In fact, there is a button with the icon of a Word document with a red circle/slash that users can press to clean up any Word cruft in the content of the textarea.

The JavaScript function that does this (HTMLArea.prototype._wordClean) can be found in htmlarea.js and is called whenever that button with the Word icon is clicked.
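
To give an idea of what this kind of cleanup involves, here is a simplified sketch. It is not the actual _wordClean code from htmlarea.js; cleanWordHtml is a hypothetical name, and the list of patterns is an assumption based on markup that Word typically emits (Mso classes, inline styles, o:p tags).

```javascript
// Hypothetical illustration of Word cleanup, NOT Xinha's _wordClean.
function cleanWordHtml(html) {
  return html
    // Conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    .replace(/<!--[\s\S]*?-->/g, "")
    // Office namespace tags such as <o:p> and </o:p>
    .replace(/<\/?(o|v|w|st1):[^>]*>/gi, "")
    // class="MsoNormal" and similar, quoted or unquoted
    .replace(/\s+class\s*=\s*("Mso[^"]*"|'Mso[^']*'|Mso[^\s>]*)/gi, "")
    // Inline style attributes
    .replace(/\s+style\s*=\s*("[^"]*"|'[^']*')/gi, "")
    // span and font wrappers (their contents are kept)
    .replace(/<\/?(span|font)\b[^>]*>/gi, "")
    // Elements left empty, or holding only &nbsp;, after the steps above
    .replace(/<(\w+)[^>]*>\s*(&nbsp;)?\s*<\/\1>/gi, "");
}
```

Unlike stripping all tags, this keeps the structural markup (paragraphs, lists, emphasis) and only removes the Word-specific noise. Being regex-based, it can also touch text that merely looks like an attribute, so treat it as an illustration only.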

Also, I've found that you can automatically execute this function as soon as content is pasted by setting this.htmlareaPaste and this.killWordOnPaste to true in the htmlarea.js file.

However, htmlareaPaste works only in IE, so users of Mozilla-based browsers will still need to click the clean Word icon after pasting.
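
If the two flags can be set on the editor's configuration object rather than by editing htmlarea.js directly, the change might look like the sketch below. That is an assumption: the post above only says they are set as this.htmlareaPaste and this.killWordOnPaste inside htmlarea.js. enablePasteCleaning is a hypothetical helper name.

```javascript
// Hypothetical helper. `config` stands in for the object on which
// htmlarea.js sets this.htmlareaPaste and this.killWordOnPaste
// (assumed here to be the editor's configuration object).
function enablePasteCleaning(config) {
  // Let the editor intercept paste events (IE only, per the post above).
  config.htmlareaPaste = true;
  // Run the Word cleanup on content as soon as it is pasted.
  config.killWordOnPaste = true;
  return config;
}
```

On a page that builds its own configuration, this would be called on that object before the editors are created, for example enablePasteCleaning(new HTMLArea.Config()), assuming the installed version exposes both properties there.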

Posted by Malte Sussdorff on
I think we should make the patch Hamilton proposes the default in OpenACS. Or are we really concerned about pasted Word HTML "crap"?

I assume from the way you have been describing it that you cannot make this an acs-parameter, can you?

Posted by Janine Ohmer on
I found a solution to our immediate need. There is a Firefox extension called "Copy Plain Text" (https://addons.mozilla.org/firefox/134/) which cleans the text when you copy it from Firefox (provided you use the Copy as Plain Text item it adds to the right-click menu). So far it works as advertised.

This isn't a solution to the longer-term problem of users pasting in junky HTML, but that one is going to be harder to solve.