How To Parse HTML From JavaFX WebView And Transfer This Data To Jsoup Document?

I am trying to parse the sidebar TOC (table of contents) of a documentation site. I have tried Jsoup, but I cannot get the TOC elements because the HTML content in this tag is not p

Solution 1:

    WebView browser = new WebView();
    WebEngine webEngine = browser.getEngine();
    String url = "https://docs.microsoft.com/en-us/ef/ef6/";
    // load() is asynchronous: getDocument() returns null until the page has finished loading
    webEngine.load(url);
    // get the W3C document from the WebEngine
    org.w3c.dom.Document w3cDocument = webEngine.getDocument();
    // use the jsoup helper to convert it to a string
    String html = new org.jsoup.helper.W3CDom().asString(w3cDocument);
    // create a jsoup document by parsing the HTML (arguments: html first, then base URI)
    Document doc = Jsoup.parse(html, url);

Solution 2:

I can't promise this is the best way as I've not used Jsoup before and I'm not an expert on the XML API.

The org.jsoup.Jsoup class has a method for parsing HTML in String form: Jsoup.parse(String). This means we need to get the HTML from the WebView as a String. The WebEngine class has a document property that holds an org.w3c.dom.Document. This Document is the HTML content of the web page currently being shown. We just need to convert this Document into a String, which we can do with a Transformer.

import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.jsoup.Jsoup;

public class Utils {

  private static Transformer transformer;

  // not thread safe
  public static org.jsoup.nodes.Document convert(org.w3c.dom.Document doc) throws TransformerException {
    if (transformer == null) {
      transformer = TransformerFactory.newDefaultInstance().newTransformer();
    }

    StringWriter writer = new StringWriter();
    transformer.transform(new DOMSource(doc), new StreamResult(writer));
    return Jsoup.parse(writer.toString());
  }

}

You would call this every time the document property changes. I did some "tests" by browsing Google and printing the org.jsoup.nodes.Document to the console and everything seems to be working.
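
To try the conversion step without a running WebView, a small hand-built DOM can stand in for the page. The following is only a sketch under assumptions: the class name DomToHtml, the sampleDocument helper and its content are hypothetical, and explicitly setting the output method to "html" is a choice made here, not something the answer above does. It uses only JDK classes, so the Jsoup step is left out.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, used for illustration only.
final class DomToHtml {

  // Serializes a W3C DOM to a string. A new Transformer is created on every
  // call, so no Transformer instance is shared between threads.
  static String toHtml(Document doc) {
    try {
      Transformer t = TransformerFactory.newInstance().newTransformer();
      // Assumption: ask for HTML serialization explicitly instead of relying on defaults.
      t.setOutputProperty(OutputKeys.METHOD, "html");
      t.setOutputProperty(OutputKeys.INDENT, "no");
      StringWriter writer = new StringWriter();
      t.transform(new DOMSource(doc), new StreamResult(writer));
      return writer.toString();
    } catch (TransformerException e) {
      throw new IllegalStateException("Could not serialize the document", e);
    }
  }

  // Builds a tiny stand-in document (made-up content) in place of WebEngine.getDocument().
  static Document sampleDocument() {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
      Element html = doc.createElement("html");
      Element body = doc.createElement("body");
      Element p = doc.createElement("p");
      p.setTextContent("Hello");
      body.appendChild(p);
      html.appendChild(body);
      doc.appendChild(html);
      return doc;
    } catch (ParserConfigurationException e) {
      throw new IllegalStateException("Could not create a DocumentBuilder", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(toHtml(sampleDocument()));
  }
}
```

The string returned by toHtml could then be handed to Jsoup.parse(String), in the same way the convert method above does.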

There is a caveat, though: as far as I understand it, the document property does not change when there are changes within the same web page (the Document itself may be updated, however). I'm not a web person, so pardon me if I don't make sense here, but I believe that this includes things like a frame changing its content. There may be a way around this by interfacing with the JavaScript using WebEngine.executeScript(String), but I don't know how.
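
One possible way to use the JavaScript route, offered as a sketch and not as a confirmed solution, is to ask the page for its current markup with the standard DOM expression document.documentElement.outerHTML and treat the result as a String. The class name LiveHtmlSnapshot and the scriptRunner parameter are hypothetical. The script runner is passed in as a Function so that the sketch runs without JavaFX on the classpath; it stands in for WebEngine.executeScript(String), which returns an Object.

```java
import java.util.function.Function;

// Hypothetical helper class, used for illustration only.
final class LiveHtmlSnapshot {

  // JavaScript expression that evaluates to the page's markup at the moment of the call.
  static final String OUTER_HTML_SCRIPT = "document.documentElement.outerHTML";

  // scriptRunner stands in for WebEngine.executeScript(String).
  // Throws if the script did not produce a string (for example, null).
  static String snapshot(Function<String, Object> scriptRunner) {
    Object result = scriptRunner.apply(OUTER_HTML_SCRIPT);
    if (!(result instanceof String)) {
      throw new IllegalStateException("Expected the script to return a string, got: " + result);
    }
    return (String) result;
  }

  public static void main(String[] args) {
    // Fake runner with made-up markup, so the sketch can run without a WebView.
    Function<String, Object> fake =
        script -> "<html><body><p>stub for " + script + "</p></body></html>";
    System.out.println(snapshot(fake));
  }
}
```

With a real WebView, the call would presumably look like LiveHtmlSnapshot.snapshot(webEngine::executeScript), made on the JavaFX Application Thread after the page has finished loading, and the returned string could be passed to Jsoup.parse(String). Content inside frames would likely need separate handling, since outerHTML of the top-level document does not include the documents loaded into frames.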
