
Merge Method Retains Attachments & Metadata from First PDF Only #104

@Yanal-Yves

Describe the bug
The current Merge method implementation combines multiple PDFs by importing the pages of subsequent documents into the first document. As a result, the final merged PDF incorrectly retains only the attachments and metadata of the first PDF. All attachments and metadata from the other source PDFs are discarded.

This behavior is particularly problematic for standards such as Factur-X/ZUGFeRD, in which an invoice consists of a PDF with a mandatory XML file attached. Crucially, the standard specifies that one PDF file must correspond to exactly one invoice.

This means that merging the attachments from multiple Factur-X invoices into a single PDF is fundamentally invalid, because the standard does not permit multiple invoices in one file. The library's current behavior of keeping only the first document's attachment produces a misleading and technically incorrect file: it appears to contain multiple invoices but carries the structured data for the first one only.

Given this, the most logical and safest approach is for the Merge method to create a brand new, clean PDF document that contains only the imported pages, without inheriting any attachments or metadata. This would provide a predictable result and allow the developer to manage attachments and metadata explicitly if needed.

Notably, this proposed behavior would be consistent with the existing Split method, which correctly creates new PDF files containing only the requested pages.
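
To make the difference concrete, here is a minimal toy model of the two behaviors described above. It is not the library's code: the `Pdf` class and both function names are hypothetical, and a "PDF" is reduced to a list of page labels plus attachment and metadata dictionaries. It only illustrates which data survives each merge strategy.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Pdf:
    """Hypothetical stand-in for a PDF: pages, attachments and metadata only."""
    pages: list[str]
    attachments: dict[str, bytes] = field(default_factory=dict)
    metadata: dict[str, str] = field(default_factory=dict)


def merge_into_first(docs: list[Pdf]) -> Pdf:
    """Model of the reported behavior: later pages are imported into the first document."""
    first, *rest = docs
    # The result starts as a copy of the first document, so its
    # attachments and metadata are carried over implicitly.
    merged = Pdf(list(first.pages), dict(first.attachments), dict(first.metadata))
    for doc in rest:
        # Only pages are imported; attachments and metadata of later documents are dropped.
        merged.pages.extend(doc.pages)
    return merged


def merge_into_new(docs: list[Pdf]) -> Pdf:
    """Model of the proposed behavior: a clean document that receives pages only."""
    merged = Pdf(pages=[])
    for doc in docs:
        merged.pages.extend(doc.pages)
    return merged


doc_a = Pdf(["A1", "A2"], {"invoice-a.xml": b"<a/>"}, {"Title": "Invoice A"})
doc_b = Pdf(["B1"], {"invoice-b.xml": b"<b/>"}, {"Title": "Invoice B"})

current = merge_into_first([doc_a, doc_b])
proposed = merge_into_new([doc_a, doc_b])

print(current.pages, sorted(current.attachments), current.metadata)
print(proposed.pages, sorted(proposed.attachments), proposed.metadata)
```

In this model both strategies yield the same pages, but only the first one silently carries over docA's attachment and title, which is the asymmetry this issue is about.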

To Reproduce
Steps to reproduce the behavior:

  1. Take two separate PDF files, docA.pdf and docB.pdf, each with its own unique attachments.
  2. Merge them using the library's Merge method, with docA.pdf as the first document.
  3. Inspect the resulting merged.pdf.

Actual behavior
The merged.pdf file contains all pages from docA.pdf and docB.pdf, but it only has the attachments and metadata from docA.pdf. The attachments from docB.pdf are lost.

Expected behavior
The merged.pdf file should be a new document containing all the pages from the source PDFs but with no attachments or metadata inherited from any of the source files.

Desktop (please complete the following information):

  • .NET version: .NET 8
  • OS: TuxedoOS and Windows 10
  • Version: 2.6.0

Pull request implementing this suggestion: https://github.com/GowenGit/docnet/pull/101/files
