Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Faster XML Append Method? 1

Status
Not open for further replies.

userMikeD

Programmer
Nov 5, 2008
28
US
Hi,

I need to append some XML to a vary large XML doc (6.7mb), and it's taking a really long time. For one or two records, it wouldn't be a problem, but I have to do this on hundreds of thousands of records.

Here are the three different ways I've tried so far:

dbms_xmldom:
Code:
FUNCTION insert_node(p_xml1 XMLTYPE, p_xml2 XMLTYPE) RETURN XMLTYPE AS
	v_doc1  dbms_xmldom.domdocument;
	v_doc2  dbms_xmldom.domdocument;
	v_node1 dbms_xmldom.domnode;
	v_node2 dbms_xmldom.domnode;
BEGIN
	v_doc1  := dbms_xmldom.newdomdocument(p_xml1);
	v_doc2  := dbms_xmldom.newdomdocument(p_xml2);
	v_node1 := dbms_xmldom.makenode(dbms_xmldom.getdocumentelement(v_doc1));
	v_node2 := dbms_xmldom.makenode(dbms_xmldom.getdocumentelement(v_doc2));
	v_node2 := dbms_xmldom.importnode(v_doc1, v_node2, true);
	v_node1 := dbms_xmldom.appendchild(v_node1, v_node2);
	RETURN dbms_xmldom.getxmltype(v_doc1);
END insert_node;
~5.5 secs.

Clob Concatenation:
Code:
   FUNCTION insert_node(p_xml1 XMLTYPE, p_xml2 XMLTYPE) RETURN XMLTYPE AS
      v_clob1     CLOB := p_xml1.getClobVal();
      v_clob2     CLOB := p_xml2.extract('/*').getClobVal();
      v_dest      CLOB := EMPTY_CLOB;
      v_close     VARCHAR2(100) := '</'||p_xml1.getRootElement()||'>';
      v_len1      NUMBER;
      v_len2      NUMBER;
      v_close_idx NUMBER;
   BEGIN
      dbms_lob.createtemporary(v_dest, TRUE);
      v_len1 := dbms_lob.getlength(v_clob1);
      v_len2 := dbms_lob.getlength(v_clob2);
      v_close_idx := v_len1 - LENGTH(v_close);      
      dbms_lob.copy(v_dest, v_clob1, v_close_idx, 1, 1);
      dbms_lob.writeappend(v_dest, v_len2, v_clob2);
      dbms_lob.writeappend(v_dest, LENGTH(v_close), v_close);
      RETURN XMLTYPE(v_dest);
   END insert_node;
~7.1 secs.

Java Stringbuffer Concatenation:
Note: This is not perfect and was just a POC.
Code:
CREATE OR REPLACE AND COMPILE JAVA SOURCE NAMED StringUtils AS
import oracle.sql.CLOB;
import java.io.Reader;
import java.io.Writer;
import java.io.PrintWriter;
import java.lang.Exception;
public class StringUtils {
   public static void concat(oracle.sql.CLOB clob1, oracle.sql.CLOB clob2, oracle.sql.CLOB output) throws Exception
   {
      try
      {
        Writer w = output.getCharacterOutputStream();
        PrintWriter pw = new PrintWriter(w);
        StringBuffer sb = new StringBuffer();
        Reader clobStream;
        int nchars = 0;
        char[] buffer = new char[100000];
        clobStream = clob1.getCharacterStream();
        while((nchars = clobStream.read(buffer)) != -1)
            sb.append(buffer, 0, nchars);
        clobStream.close();
        clobStream = clob2.getCharacterStream();
        while((nchars = clobStream.read(buffer)) != -1)
            sb.append(buffer, 0, nchars);
        clobStream.close();
        pw.write(sb.toString().toCharArray());
        pw.close();
      }
      catch (Exception ex)
      {
        throw ex;
      }
   }  
}

CREATE OR REPLACE PROCEDURE java_concat(p_clob1 CLOB, p_clob2 CLOB, output CLOB) IS 
LANGUAGE JAVA NAME 'StringUtils.concat(oracle.sql.CLOB, oracle.sql.CLOB, oracle.sql.CLOB)';
~6.6 secs.

None of these are fast, but if I could shave even half a second off, it would make a huge difference. Does anyone have an idea for a faster way to append xml to a really large XML Doc?

Thanks,
Mike
 

Have you tried NOCOPY?
Code:
FUNCTION insert_node(p_xml1 NOCOPY XMLTYPE
                    ,p_xml2 NOCOPY XMLTYPE)
RETURN XMLTYPE AS
-- Etc...
[3eyes]



----------------------------------------------------------------------------
The person who says it can't be done should not interrupt the person doing it. -- Chinese proverb
 
That doesn't compile. I thought NOCOPY was for IN/OUT variables anyway?
 

The main idea is to pass the parameters by reference and not by value.

Try the corrected syntax:
Code:
FUNCTION insert_node(p_xml1 [red]IN[/red] NOCOPY XMLTYPE
                    ,p_xml2 [red]IN[/red] NOCOPY XMLTYPE)
RETURN XMLTYPE AS
-- Etc...
[noevil]


----------------------------------------------------------------------------
The person who says it can't be done should not interrupt the person doing it. -- Chinese proverb
 
Thanks for the follow up, but that still doesn't compile because NOCOPY is only for IN/OUT or OUT parameters. IN parameters are already passed by reference.
I changed the insert_node function to a procedure and declared p_xml1 as IN OUT NOCOPY, but that was actually slower than as a function. So I don't think NOCOPY will do the trick.

Thanks again for the reply.
 
This is the built-in xmltype member: appendchildxml() which I expect not only more versatile for the purpose in insert and I would expect accomplish with much improved speed.
[tt]
FUNCTION insert_node(p_xml1 XMLTYPE, p_xml2 XMLTYPE) RETURN XMLTYPE AS
BEGIN
RETURN [blue]appendchildxml(p_xml1,'/*',p_xml2);[/blue]
END insert_node;
[/tt]
 
Thanks for the reply, tsuji. That method completed in ~4.7 secs. So it's almost a full second faster! Thanks so much for the suggestion!
 
Thanks! I am a bit surprised the improvement is not more. Could you try a slight alternative, which I know as far as the xpath locating operation would result in the same result if p_xml1 is the xml node instead of a fragment (which would be refuted before reaching the xpath locating.)
[tt]
RETURN appendchildxml(p_xml1,'/*[blue][1][/blue]',p_xml2);
[/tt]
Check it see if it improves more for a materially large doc you're working on.
 
Thanks for the alternative xpath. After 100 runs, the average for /* was 4.149, and the average for /*[1] was 4.148. The first run on each xpath was ~4.7, and the rest were ~4.15. So actually, either one averages out to more than a second faster than the previous methods, which is awesome!

Since they basically take the same amount of time, is one xpath easier on resources than the other?

Again, thanks so much for the suggestion! This will save more than 24 hours of processing with the volume I'm working with!

Cheers,
Mike
 
Thanks again, Mike, for the feedback. That is quite good case study.

The two xpath's effectively won't make much difference. That's what I would anticipate. However, if the (local-)node name is deterministic, you can replace the wildcard (*) by the name, that can be just another slight improvement only. But, if it come dynamic and varying, that won't do without hardcoding.

Hence, the last significant chunk of performance improvement should come from making the logic inline rather than calling the udf insert_node() like that, as now the function is a one-liner built-in method. Inline logic becomes possible. That would make another significant gain in performance.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top