For the implementation, I've thought of two approaches. Broadly:
Approach 1
The first approach is the one I used to create the POC. I created a Maven project and used the JAXB Maven Plugin from JAXB Tools to generate docx4j's classes from the XSDs. I added several XJC plugins to the process: the Copyable plugin to implement a
But I only did enough to get it working at runtime for two particular test cases. Rather than compile these classes as part of docx4j, I built a separate JAR with them and used it in place of the original docx4j-openxml-objects* JARs when building our application. I think there would be a lot of work left to bring the generated classes into line with the existing classes, due to the manual changes they've accumulated.
The benefit of this approach is that it would become possible to use XJC plugins to make this deep-copy change to all 2500+ classes automatically, and in the future to make other changes/fixes automatically across all generated classes (including any new classes from new XSDs). But the downsides are notable:
Approach 2
Directly implement deep-copy methods in the existing classes. This would avoid the work and risk of automating the changes via XJC and JAXB Tools. But that is replaced by the work and risk of coming up with a way to do this across all 2500+ classes using something like a script, OpenRewrite recipe, or AI, hopefully in a reliable and deterministic way. And by making the change directly, we might make automation or big changes harder in the future by introducing more custom code across all these files, rather than making big changes easier by establishing a repeatable process for generating the code from the XSDs.
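To make the idea of a hand-written deep-copy method concrete, here is a minimal sketch. The classes Text, Run and Para and the method name deepCopy are hypothetical stand-ins invented for illustration; they are not docx4j's classes, and this is not the POC code. Each class copies its own fields and delegates to its children, so no marshaling or unmarshaling is involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for generated classes such as P, R and Text.
// These are NOT the docx4j classes; names and fields are illustrative only.
final class DeepCopySketch {

    static final class Text {
        String value;

        Text deepCopy() {
            Text copy = new Text();
            copy.value = value; // String is immutable, so sharing it is safe
            return copy;
        }
    }

    static final class Run {
        final List<Text> content = new ArrayList<>();

        Run deepCopy() {
            Run copy = new Run();
            for (Text t : content) {
                copy.content.add(t.deepCopy()); // copy each child, not the reference
            }
            return copy;
        }
    }

    static final class Para {
        final List<Run> runs = new ArrayList<>();

        Para deepCopy() {
            Para copy = new Para();
            for (Run r : runs) {
                copy.runs.add(r.deepCopy());
            }
            return copy;
        }
    }

    public static void main(String[] args) {
        Text text = new Text();
        text.value = "Hello";
        Run run = new Run();
        run.content.add(text);
        Para original = new Para();
        original.runs.add(run);

        Para copy = original.deepCopy();
        copy.runs.get(0).content.get(0).value = "Changed";

        // The original tree is untouched by edits to the copy.
        System.out.println(original.runs.get(0).content.get(0).value); // prints "Hello"
    }
}
```

The real classes would likely need more care than this (for example, content lists holding mixed object types, or references back to a parent element), which the sketch does not attempt to cover.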
Hi Buck
A faster deep copy as indicated by your POC results would be a very worthwhile improvement, so yes, there is interest in having this as a contribution. Thank you for your analysis of available approaches. Some initial notes:
My initial preference would be for some variant on approach 1. Approach 2 may be feasible using OpenRewrite, but it wouldn't be a one-off operation (owing to new XSDs), so it would be another moving part to maintain, which is another reason for favouring the XJC-based approach 1. Were this to be implemented, we would release it in a docx4j 15.0 (or other N.0) as a signal that it is a ".0" release which may introduce new bugs.
Hi Buck
The just-committed https://github.com/plutext/docx4j/blob/VERSION_11_5_10/xsd/ROOT.xsd will generate pretty much all docx4j-openxml-objects-* (sans manual edits). There are a couple of small discrepancies I will look into later this week, but I hope that gives you a solid base to work from.
I also added https://github.com/plutext/docx4j/blob/VERSION_11_5_10/xsd/docx4j_jaxb_packages.xlsx as a bit of a guide as to which xsd files result in which Java packages.
Historically, docx4j was a monolithic project built using Ant. When we converted to Maven and modules, it seemed like a good idea to have:
that is, to split the pml and sml specific generated classes into separate modules from wml, dml etc. I am not wedded to this. That is, if it is simpler to use ROOT.xsd (or equivalent) to build
cheers .. Jason
Hi Jason,
Amazing, thank you again! I had been generating classes from a list of separate XSDs, but having one root XSD seems like a much better idea. And indeed, I hadn't even anticipated it yet, but it will probably be simpler to combine the modules. I will take a closer look at this tomorrow. (I'm in the US, Central time)
Cheers!
Buck
Small update: I haven't given up on this, I just have competing priorities while I work through the necessary changes. I've got to a point where docx4j compiles with the generated classes, and have started working on failing tests. (Mostly missing
I'm working internally so far; I'll see if I can push code to GitHub next week so it's open for feedback.
Hi @plutext,
We now have a working version of this feature internally, and we'd still like to contribute it if possible. Would you need us to sign individual and/or corporate Contributor License Agreements? I see there is an individual CLA in the docx4j repo that looks to be based on Apache's CLAs, but I thought we'd better check with you about whether it's required, whether it's up-to-date, and whether a corporate CLA is required in place of or in addition to the individual agreement.
Hi! In our application, we call XmlUtils.deepCopy repeatedly to duplicate parts of a DOCX document. This is slow, since XmlUtils.deepCopy is based on marshaling and unmarshaling the XML.
A much faster alternative is to implement a deep-copy method on each of docx4j's OpenXML object classes (Body, P, R, Text, etc.). That's a challenge, because there appear to be more than 2500 of those classes. Nevertheless, I've been able to create a proof of concept that, for my own particular test cases, is 25X faster for 10K table rows and 46X faster for 100K table rows.
We plan to pursue this improvement ourselves, but have some questions: