Rich text (development): Difference between revisions

Revision as of 07:46, 10 June 2007

Using HTML versus XHTML in rich text nodes

What follows is a preliminary analysis of the issues concerning HTML versus XHTML used for rich text in FreeMind.

AFAIK there are two separate questions: (1) should we store (a) HTML or (b) XHTML in nodes, and (2) should we (a) store only one thing or (b) store plain text in one attribute, and store HTML/XHTML in another attribute where HTML is available, modelling on email systems.

As for the first question, using HTML has the advantage of being straightforward: it is supported by JLabel, it is supported by the Java HTML editing component, and it is the format now mostly used in web pages. An advantage of XHTML is that it is a flavor of XML and thus easily amenable to XSLT processing.

As for the second question, if we also stored plain text, it would automatically be available to all XSLT processing, which would make the first question less decisive. However, it would considerably increase the size of mind maps stored on disk, by my estimate by a factor of 1.5 once a lot of rich text is used.

Transforming HTML to XHTML on the fly before processing outside FreeMind would not really save the day, as XHTML would still need to be stored in mind maps on disk; if we used HTML internally, we would have to convert XHTML to HTML for all nodes upon loading a map, instead of doing that only when a node is shown, for the purpose of JLabel.

If you find the other options more attractive, an important question is: is it really possible to process XHTML from within an XML attribute? Can someone demonstrate that? If yes, that would make XHTML rather attractive. But if not really, then using HTML would be as good as using XHTML.
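As a partial answer to the question above, here is a minimal Java sketch, assuming the TEXT attribute holds well-formed XHTML with a single root element. It parses the map with the standard DOM API, reads the attribute (whose entities the outer parser has already unescaped), and parses that string a second time. The class and method names are hypothetical. Note that this shows the two-step parse in Java only; as far as I know, XSLT 1.0 has no standard function for re-parsing a string as markup, so it does not settle the XSLT side of the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of FreeMind.
class AttributeXhtmlDemo {

    /** Parses a string as XML and returns the DOM document. */
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML: " + xml, e);
        }
    }

    /**
     * Reads the TEXT attribute of the first node element, parses its value
     * as XHTML, and returns the text content with all markup removed.
     */
    static String plainTextOfFirstNode(String mapXml) {
        Element node = (Element) parse(mapXml).getElementsByTagName("node").item(0);
        // The outer parser has already turned &lt; and &gt; back into < and >.
        String xhtml = node.getAttribute("TEXT");
        return parse(xhtml).getDocumentElement().getTextContent();
    }

    public static void main(String[] args) {
        String map = "<map><node TEXT=\"&lt;html&gt;&lt;body&gt;Hello &lt;br/&gt;"
                   + " Dolly.&lt;/body&gt;&lt;/html&gt;\"/></map>";
        System.out.println(plainTextOfFirstNode(map));
    }
}
```

The second parse fails on anything that is not well-formed, such as an unclosed `<br>`, which is exactly where XHTML and HTML differ for this purpose.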

Converting HTML from node attribute TEXT to plain text within XSLT

Converting HTML from node attribute TEXT to plain text within XSLT is virtually impossible. XSLT does not feature regular expression replaces, and it does not even feature simple text replaces (although simple text replaces can be more or less implemented within XSLT, see http://aspn.activestate.com/ASPN/Cookbook/XSLT/Recipe/65426). My old view on this option follows.

I estimate that the most straightforward way of solving the related problems is to find out how to convert HTML into plain text within an XSLT script. I currently do not have sufficient knowledge of XSLT to judge that; it should be possible using several regular expression replacements, modelled on what we already have in FreeMind in Java. As soon as we are able to do that, there might even be a regular expression way to convert boldface and italics to the OpenDocument format of OpenOffice. Admittedly, instead of having one conversion routine from HTML to plain text in FreeMind, it would have to be replicated in every XSLT script dealing with FreeMind mind maps.
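For comparison, the kind of regular expression replacements meant here could look as follows in Java. This is only a sketch: it is not FreeMind's actual conversion routine, the class and method names are hypothetical, and it handles just a handful of entities.

```java
// Hypothetical sketch; not the routine used in FreeMind.
class HtmlToPlainTextSketch {

    /** Converts simple HTML to plain text using a chain of replacements. */
    static String htmlToPlain(String html) {
        return html
            .replaceAll("(?i)<br\\s*/?>", "\n")  // line breaks become newlines
            .replaceAll("(?i)</p\\s*>", "\n")    // paragraph ends become newlines
            .replaceAll("<[^>]*>", "")           // drop all remaining tags
            .replace("&lt;", "<")
            .replace("&gt;", ">")
            .replace("&nbsp;", " ")
            .replace("&amp;", "&")               // last, so "&amp;lt;" stays "&lt;"
            .trim();
    }

    public static void main(String[] args) {
        System.out.println(
            htmlToPlain("<html><body>Hello <br> <b>Dolly</b> &amp; co.</body></html>"));
    }
}
```

Each step in the chain would need its own recursive template in XSLT 1.0, which illustrates why replicating the routine in every stylesheet is unattractive.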

Storing XHTML on par with FreeMind XML

The option of storing XHTML elements on par with FreeMind XML elements like <node> would require a considerable effort. The benefits of the effort would include

  • + better support for XSLT transformations
  • + more readable XML of FreeMind mind maps

The costs would include

  • - switching away from NanoXML/Lite to a more bloated technology for reading and saving of mind maps, meaning considerable slowing down upon loading and saving of mind maps. (That is not true; I was wrong. We would only have to adjust NanoXML so that it stops parsing XML within certain elements and reads everything within them as uninterpreted string instead. --Danielpolansky 06:34, 13 May 2006 (PDT))

By storing on par, I mean the following.

  <map>
    <node>
      <html>
        <body>Hello <br/> Dolly.</body>
      </html>
    </node>
  </map>

I recommend avoiding this option. Our current option is

  <map>
    <node TEXT="&lt;HTML&gt;Hello &lt;br&gt; Dolly."/>
  </map>

A discussion shows the following possibilities of storing on par.

  1. Storing HTML directly, like <node><html></html></node>
  2. Storing HTML within a content element, like <node><content><html></html></content></node>
  3. Storing plain text within the TEXT attribute of the node element
  4. Storing plain text as <node><text>Plain</text></node>
  5. Storing plain text as <node><content>Plain</content></node>

Converting HTML to XHTML in Java

I find a conversion of HTML to XHTML possible with reasonable effort. I think there are not many more subtleties apart from those already addressed: closing the tags <td>, <tr> and the like, self-closing the empty tags <br/>, <hr/>, <img/> and several others. We would be able to discover all the subtleties by reading the XHTML standard and by empirical testing. (For reference, html2xhtml at shredzone.net, http://www.shredzone.net/articles/java/html2xhtml/, thanks to Dimitri)

However, W3C points out that converting HTML to XHTML is a tricky business.

The main problem of developing your own converter is that either you are sure your HTML is correct (and so you only need to fix cases, quotes in attributes, entities and close the few HTML empty tags) or you will go crazy trying to cope with all the possible errors that the "official" web browsers accept but that would kill any simple parser.

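See also Convert HTML content to PDF format at JavaWorld (http://www.javaworld.com/javaworld/jw-04-2006/jw-0410-html.html) using JTidy (http://sourceforge.net/projects/jtidy/).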
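For the easy case described in the quotation, where the HTML is known to be correct, the "close the few HTML empty tags" step can be sketched with a single regular expression. This is a minimal sketch under strong assumptions: tag names are already lowercase, attribute values are already quoted and contain no `>`, and the list of empty elements is deliberately short. The class and method names are hypothetical. Real-world or pasted HTML is better left to a tool such as JTidy.

```java
// Hypothetical sketch; covers only the "close empty tags" step.
class HtmlToXhtmlSketch {

    /** Rewrites empty elements such as <br> or <img ...> as <br/> or <img .../>. */
    static String closeEmptyTags(String html) {
        // Group 1 is the tag name, group 2 the optional attributes.
        // An already present "/" is consumed and written back exactly once.
        return html.replaceAll(
            "<(br|hr|img|input|meta|link)(\\s[^>]*?)?\\s*/?>",
            "<$1$2/>");
    }

    public static void main(String[] args) {
        System.out.println(closeEmptyTags("Hello <br> Dolly.<img src=\"a.png\"><hr/>"));
    }
}
```

Fixing case, quoting attributes, and closing elements like <td> and <tr> are separate steps and considerably harder to do with regular expressions alone.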

Converting XHTML to HTML in Java

Searching the web using the expressions

  • "XHTML to HTML" java
  • XHTML2HTML java GPL
  • XHTML2HTML java

I have found very little existing GNU GPL licensed code for converting XHTML to HTML in Java. Thus, my recommendation is to create a new method for that, in the class Tools. The method would be created with the use of [1] as a checklist. The method would use regular expression replaces, unless we see that this is too slow, which I do not think will be the case. (We already use regular expression replaces in method update() of NodeView.)

A preliminary code is as follows.

   public String xhtmlToHtml(String xhtmlText) {
      // Remove '/' from <.../> of elements that do not have '/' there in HTML.
      // Group 1 captures the tag name plus any attributes, so "<br/>" becomes "<br>".
      return xhtmlText.
         replaceAll("<(("+
                    "br|area|base|basefont|"+
                    "bgsound|button|col|colgroup|embed|hr"+
                    "|img|input|isindex|keygen|link|meta"+
                    "|object|plaintext|spacer|wbr"+
                    ")(\\s[^>]*)?)/>",
                    "<$1>");
   }

See also

  1. Discussion on converting XHTML to HTML using XSLT (can be checked against when developing conversion in Java)
  2. XHTML2HTML in Java (http://membres.lycos.fr/cvincent/xml/api/fr.bizolin.xml.XHTML2HTML.html), although it needs a Perl package that is difficult to understand. It dates from the year 2000.

Requirements on file format of FreeMind

We have identified the following requirements on the file format of FreeMind. They have different priorities; people may disagree about what the priority of each requirement is.

  1. guarantee file format integrity, i.e. XML conformance.
  2. be flexible for new features like (x)html, svg, mathml,...
  3. stay reasonably backwards compatible, so that all existing generators of FreeMind mind maps work with later versions of FreeMind too. Put differently, create the new format only as an extension of the old format, that is, by adding new elements and attributes only.
  4. new format should allow for import of all old format features (from stable versions).
  5. limit redundancy of information.
  6. keep it easy to do XSLT transformations.
  7. keep the format as simple as possible.
  8. make it easy to create and edit mm files manually in an editor like Vim, Emacs or Notepad.
  9. make it easy to create mm files programmatically.
  10. the solution should be fast.
  11. the solution should be safe.
  12. the file format of both notes and nodes should be XHTML (a further specification of the first requirement)

For Danielpolansky, the requirement of staying reasonably backwards compatible is important, and so is the requirement that the solution be fast; the requirement on XSLT transformations is completely unimportant in view of the possibility of replacing XSLT transformations with small Java functions doing the same with a smaller footprint; the requirement of keeping the format simple is important; the requirement of making it easy to create mind maps programmatically is important.

Sources of HTML coming to mind maps

Rich text in the form of HTML will be coming into FreeMind from the following sources:

  • directly entered in FreeMind using WYSIWYG editor
  • pasted from web pages
  • pasted from Microsoft Word documents, and other applications exposing HTML to the clipboard

In my experience, pasting is much more common and of higher volume than direct editing.

See also

  • XHTML at Wikipedia (http://en.wikipedia.org/wiki/Xhtml)

--Danielpolansky 03:29, 29 Apr 2006 (PDT)