Common Ways to Convert DOCX to PDF in Java

This article is part of the author's series on getting started with Java and its best practices.

DOCX2PDF

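Converting DOCX documents to PDF is a common project requirement. The mainstream approaches fall into two broad categories. The first drives an office application to do the conversion, such as Microsoft Office, WPS, or LibreOffice. The second reads the document through a language-level API for Office files (for example Apache POI) and then builds the PDF with a dedicated PDF generation library such as iText. In general, the office applications preserve styling better but are comparatively slow. Microsoft Office raises licensing issues and should not be used casually (the author's company was caught out on this). WPS is widely used today, but it truncates hyperlinks longer than 256 characters. LibreOffice is comparatively loose about layout and styling. Reading and generating through the POI API performs better and suits cases where formatting requirements are not strict. There are also ready-made online and command-line tools, such as docx2pdf and OfficeToPDF.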
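
Because of the WPS hyperlink limit mentioned above, it can help to check a document's links before choosing WPS as the converter. The following is only a minimal sketch: the class and method names are made up for illustration, the 256-character threshold is taken from the text above, and extracting the hyperlinks from the DOCX file is assumed to happen elsewhere.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-flight check for the WPS hyperlink limit described above. */
final class HyperlinkLengthCheck {

    // Threshold taken from the article: longer hyperlinks are reported to be truncated by WPS.
    static final int WPS_MAX_HYPERLINK_LENGTH = 256;

    /** Returns the links that exceed the limit, in their original order. */
    static List<String> findLinksAtRisk(List<String> hyperlinks) {
        List<String> atRisk = new ArrayList<>();
        for (String link : hyperlinks) {
            if (link.length() > WPS_MAX_HYPERLINK_LENGTH) {
                atRisk.add(link);
            }
        }
        return atRisk;
    }
}
```

If the returned list is not empty, the document could be routed to one of the other converters instead.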

Microsoft Office

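The core code for this part is shown below. It drives Word through the JACOB COM bridge (com.jacob.activeX.ActiveXComponent, com.jacob.com.Dispatch and com.jacob.com.Variant). For the complete code, see here: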

private ActiveXComponent oleComponent = null;
private Dispatch activeDoc = null;
private final static String APP_ID = "Word.Application";

// Constants that map onto Word's WdSaveOptions enumeration and that
// may be passed to the close(int) method
public static final int DO_NOT_SAVE_CHANGES = 0;
public static final int PROMPT_TO_SAVE_CHANGES = -2;
public static final int SAVE_CHANGES = -1;

// These constant values determine whether or not the application
// instance will be displayed on the user's screen.
public static final boolean VISIBLE = true;
public static final boolean HIDDEN = false;

/**
 * Create a new instance of the OfficeConverter class using the following
 * parameters.
 *
 * @param visibility A primitive boolean whose value will determine whether
 *                   or not the Word application will be visible to the user. Pass true
 *                   to display Word, false otherwise.
 */
public OfficeConverter(boolean visibility) {
    this.oleComponent = new ActiveXComponent(OfficeConverter.APP_ID);
    this.oleComponent.setProperty("Visible", new Variant(visibility));
}

/**
 * Open an existing Word document.
 *
 * @param docName An instance of the String class that encapsulates the
 *                path to and name of a valid Word file. Note that there are a few
 *                limitations applying to the format of this String; it must specify
 *                the absolute path to the file and it must not use the single forward
 *                slash to specify the path separator.
 */
public void openDoc(String docName) {
    Dispatch disp = null;
    Variant var = null;
    // First get a Dispatch object referencing the Documents collection - for
    // collections, think of ArrayLists of objects.
    var = Dispatch.get(this.oleComponent, "Documents");
    disp = var.getDispatch();
    // Now call the Open method on the Documents collection Dispatch object
    // to both open the file and add it to the collection. It would be possible
    // to open a series of files and access each from the Documents collection
    // but for this example, it is simpler to store a reference to the
    // active document in a private instance variable.
    var = Dispatch.call(disp, "Open", docName);
    this.activeDoc = var.getDispatch();
}

/**
 * There is more than one way to convert the document into PDF format: you
 * can either explicitly use a FileConverter object or call the
 * ExportAsFixedFormat method on the active document. This method opts for
 * the latter and calls the ExportAsFixedFormat method passing the name
 * of the file along with the integer value of 17. This value maps onto one
 * of Word's constants called wdExportFormatPDF and causes the application
 * to convert the file into PDF format. If you wanted to do so, for testing
 * purposes, you could add another value to the args array, a Boolean value
 * of true. This would open the newly converted document automatically.
 *
 * @param filename Path of the PDF file to create.
 */
public void publishAsPDF(String filename) {
    // The code to export as a PDF is 17 (wdExportFormatPDF)
    // Object[] args = new Object[] {filename, Integer.valueOf(17), Boolean.TRUE};
    Object[] args = new Object[] {filename, Integer.valueOf(17)};
    Dispatch.call(this.activeDoc, "ExportAsFixedFormat", args);
}

/**
 * Called to close the active document. Note that this method simply
 * calls the overloaded closeDoc(int) method passing the value 0 which
 * instructs Word to close the document and discard any changes that may
 * have been made since the document was opened or edited.
 */
public void closeDoc() {
    this.closeDoc(OfficeConverter.DO_NOT_SAVE_CHANGES);
}

/**
 * Called to close the active document. It is possible with this overloaded
 * version of the close() method to specify what should happen if the user
 * has made changes to the document that have not been saved. There are three
 * possible values, defined by the following manifest constants:
 * DO_NOT_SAVE_CHANGES - Close the document and discard any changes
 * the user may have made.
 * PROMPT_TO_SAVE_CHANGES - Display a prompt to the user asking them
 * how to proceed.
 * SAVE_CHANGES - Save the changes the user has made to the document.
 *
 * @param saveOption A primitive integer whose value indicates how the close
 *                   operation should proceed if the user has made changes to the active
 *                   document. Note that no checks are made on the value passed to
 *                   this argument.
 */
public void closeDoc(int saveOption) {
    Object[] args = {Integer.valueOf(saveOption)};
    Dispatch.call(this.activeDoc, "Close", args);
}

/**
 * Called once processing has completed in order to close down the instance
 * of Word.
 */
public void quit() {
    Dispatch.call(this.oleComponent, "Quit");
}
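
The methods above are meant to be called in a fixed order: open, export, close, quit. Because a Word process left running keeps the file locked, it is worth making sure the last two steps always run. The sketch below illustrates that call order only; WordSession is a hypothetical interface that mirrors the method names of the class above, and it is not part of JACOB.

```java
/** Hypothetical interface mirroring the methods of the converter class above. */
interface WordSession {
    void openDoc(String docName);
    void publishAsPDF(String filename);
    void closeDoc();
    void quit();
}

final class WordSessionRunner {

    /**
     * Opens the document and exports it as PDF. The document is closed
     * whenever it was opened, and the application is always asked to quit,
     * even if opening or exporting throws.
     */
    static void convert(WordSession session, String docx, String pdf) {
        try {
            session.openDoc(docx);
            try {
                session.publishAsPDF(pdf);
            } finally {
                // Only reached once openDoc has succeeded.
                session.closeDoc();
            }
        } finally {
            session.quit();
        }
    }
}
```

With the class above adapted to such an interface, a call might look like WordSessionRunner.convert(converter, "C:\\docs\\in.docx", "C:\\docs\\out.pdf"), where both paths are placeholders.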

WPS

The core code is shown below. For the complete code, see here:

        @Override
        public boolean convert(String word, String pdf) {
            File pdfFile = new File(pdf);
            File wordFile = new File(word);
            boolean convertSuccessfully = false;

            ActiveXComponent wps = null;
            ActiveXComponent doc = null;


            try {
                wps = new ActiveXComponent("KWPS.Application");

//                Dispatch docs = wps.getProperty("Documents").toDispatch();
//                Dispatch d = Dispatch.call(docs, "Open", wordFile.getAbsolutePath(), false, true).toDispatch();
//                Dispatch.call(d, "SaveAs", pdfFile.getAbsolutePath(), 17);
//                Dispatch.call(d, "Close", false);

                doc = wps.invokeGetComponent("Documents")
                        .invokeGetComponent("Open", new Variant(wordFile.getAbsolutePath()));

                try {
                    // Save to the requested PDF path (the original example hard-coded
                    // a local test path here); 17 selects PDF as the output format.
                    doc.invoke("SaveAs",
                            new Variant(pdfFile.getAbsolutePath()),
                            new Variant(17));
                    convertSuccessfully = true;
                } catch (Exception e) {
                    logger.warning("Failed to generate the PDF");
                    e.printStackTrace();
                }

                // Also saves a copy of the document to a hard-coded path, as in the original example.
                File saveAsFile = new File("C:\\Users\\lotuc\\Documents\\saveasfile.doc");
                try {
                    doc.invoke("SaveAs", saveAsFile.getAbsolutePath());
                    logger.info("Saved a copy as " + saveAsFile.getAbsolutePath());
                } catch (Exception e) {
                    logger.info("Failed to save a copy as " + saveAsFile.getAbsolutePath());
                    e.printStackTrace();
                }
            } finally {
                if (doc == null) {
                    logger.info("Failed to open file " + wordFile.getAbsolutePath());
                } else {
                    try {
                        logger.info("Releasing file " + wordFile.getAbsolutePath());
                        doc.invoke("Close");
                        doc.safeRelease();
                    } catch (Exception e1) {
                        logger.info("Failed to release file " + wordFile.getAbsolutePath());
                    }
                }

                if (wps == null) {
                    logger.info("Failed to load the WPS component");
                } else {
                    try {
                        logger.info("Releasing the WPS component");
                        wps.invoke("Quit");
                        wps.safeRelease();
                    } catch (Exception e1) {
                        logger.info("Failed to release the WPS component");
                    }
                }
            }

            return convertSuccessfully;
        }
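
The @Override on convert(String, String) implies an interface that is not shown here. Since the method reports success as a boolean, one possible use is to try several converters in turn, for example an office application first and a POI-based converter as a fallback. The sketch below assumes a hypothetical WordToPdfConverter interface with that signature; it is an illustration and not the interface from the original project.

```java
import java.util.List;

/** Hypothetical interface matching the convert(String, String) method above. */
interface WordToPdfConverter {
    boolean convert(String word, String pdf);
}

final class ConverterChain {

    /** Tries each converter in order and returns true as soon as one succeeds. */
    static boolean convertWithFallback(List<WordToPdfConverter> converters,
                                       String word, String pdf) {
        for (WordToPdfConverter converter : converters) {
            try {
                if (converter.convert(word, pdf)) {
                    return true;
                }
            } catch (RuntimeException e) {
                // Treat an exception like a failed attempt and try the next converter.
            }
        }
        return false;
    }
}
```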

LibreOffice

LibreOffice ships with a command-line tool that can do the conversion. Once LibreOffice is installed, run:

/usr/local/bin/soffice --convert-to pdf:writer_pdf_Export /Users/lotuc/Downloads/test.doc

If a LibreOffice instance is already open, pass the -env option to give the conversion its own working directory:

/usr/local/bin/soffice "-env:UserInstallation=file:///tmp/LibreOffice_Conversion_abc" --convert-to pdf:writer_pdf_Export /Users/lotuc/Downloads/test.doc
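
The same command can be launched from Java with ProcessBuilder. The following is a minimal sketch under a few assumptions: the class and method names are made up, the soffice path and the profile URL are supplied by the caller, and the --headless and --outdir options (standard soffice options that do not appear in the commands above) are added so that no window is opened and the output directory is explicit.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class SofficeCommand {

    /**
     * Builds the argument list for a headless conversion to PDF.
     * userInstallationUrl may be null, in which case the -env option is omitted.
     */
    static List<String> build(String sofficePath, String userInstallationUrl,
                              File outDir, File input) {
        List<String> cmd = new ArrayList<>();
        cmd.add(sofficePath);
        if (userInstallationUrl != null) {
            // Separate profile directory, needed when another instance is already open.
            cmd.add("-env:UserInstallation=" + userInstallationUrl);
        }
        cmd.add("--headless");
        cmd.add("--convert-to");
        cmd.add("pdf:writer_pdf_Export");
        cmd.add("--outdir");
        cmd.add(outDir.getAbsolutePath());
        cmd.add(input.getAbsolutePath());
        return cmd;
    }

    /** Runs the command, forwarding its output to this process, and returns the exit code. */
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        return process.waitFor();
    }
}
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.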

To drive LibreOffice from Java instead, first install LibreOffice, then create a Java project in your favorite editor and add these JARs to your classpath:
  [Libre Office Dir]/URE/java/juh.jar
  [Libre Office Dir]/URE/java/jurt.jar
  [Libre Office Dir]/URE/java/ridl.jar
  [Libre Office Dir]/program/classes/unoil.jar

Next, start a LibreOffice process:

import java.util.Date;
import java.io.File;
import com.sun.star.beans.PropertyValue;
import com.sun.star.comp.helper.Bootstrap;
import com.sun.star.frame.XComponentLoader;
import com.sun.star.frame.XDesktop;
import com.sun.star.frame.XStorable;
import com.sun.star.lang.XComponent;
import com.sun.star.lang.XMultiComponentFactory;
import com.sun.star.text.XTextDocument;
import com.sun.star.uno.UnoRuntime;
import com.sun.star.uno.XComponentContext;
import com.sun.star.util.XReplaceDescriptor;
import com.sun.star.util.XReplaceable;

public class MailMergeExample {

public static void main(String[] args) throws Exception {

 // Initialise
 XComponentContext xContext = Bootstrap.bootstrap();

 XMultiComponentFactory xMCF = xContext.getServiceManager();
 
 Object oDesktop = xMCF.createInstanceWithContext(
      "com.sun.star.frame.Desktop", xContext);
 
 XDesktop xDesktop = (XDesktop) UnoRuntime.queryInterface(
      XDesktop.class, oDesktop);

Then load the target document:

// Load the Document
String workingDir = "C:/projects/";
String myTemplate = "letterTemplate.doc";

if (!new File(workingDir + myTemplate).canRead()) {
 throw new RuntimeException("Cannot load template:" + new File(workingDir + myTemplate));
}

XComponentLoader xCompLoader = (XComponentLoader) UnoRuntime
 .queryInterface(com.sun.star.frame.XComponentLoader.class, xDesktop);

String sUrl = "file:///" + workingDir + myTemplate;

// Open the document hidden, that is, without showing a window
PropertyValue[] propertyValues = new PropertyValue[1];
propertyValues[0] = new PropertyValue();
propertyValues[0].Name = "Hidden";
propertyValues[0].Value = Boolean.TRUE;

XComponent xComp = xCompLoader.loadComponentFromURL(
 sUrl, "_blank", 0, propertyValues);
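
The snippet above builds the file URL by string concatenation, which works for a simple path like C:/projects/ but breaks as soon as the path contains spaces or other characters that need escaping. A hedged alternative is to let the JDK build the URL. The helper below is a sketch with made-up names and uses only java.nio.file.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class FileUrls {

    /** Converts a local path into an absolute, percent-encoded file URL string. */
    static String toFileUrl(String path) {
        // Relative paths are resolved against the current working directory.
        Path absolute = Paths.get(path).toAbsolutePath().normalize();
        return absolute.toUri().toString();
    }
}
```

The result could then be passed to loadComponentFromURL in place of the concatenated sUrl.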


The content can then be replaced as follows:

// Search and replace
XReplaceDescriptor xReplaceDescr = null;
XReplaceable xReplaceable = null;

XTextDocument xTextDocument = (XTextDocument) UnoRuntime
  .queryInterface(XTextDocument.class, xComp);

xReplaceable = (XReplaceable) UnoRuntime
  .queryInterface(XReplaceable.class, xTextDocument);

xReplaceDescr = (XReplaceDescriptor) xReplaceable
  .createReplaceDescriptor();

// mail merge the date
xReplaceDescr.setSearchString("<date>");
xReplaceDescr.setReplaceString(new Date().toString());
xReplaceable.replaceAll(xReplaceDescr);

// mail merge the addressee
xReplaceDescr.setSearchString("<addressee>");
xReplaceDescr.setReplaceString("Best Friend");
xReplaceable.replaceAll(xReplaceDescr);

// mail merge the signatory
xReplaceDescr.setSearchString("<signatory>");
xReplaceDescr.setReplaceString("Your New Boss");
xReplaceable.replaceAll(xReplaceDescr);

Finally, export the result to PDF:

// save as a PDF
XStorable xStorable = (XStorable) UnoRuntime
  .queryInterface(XStorable.class, xComp);

propertyValues = new PropertyValue[2];
propertyValues[0] = new PropertyValue();
propertyValues[0].Name = "Overwrite";
propertyValues[0].Value = Boolean.TRUE;
propertyValues[1] = new PropertyValue();
propertyValues[1].Name = "FilterName";
propertyValues[1].Value = "writer_pdf_Export";

// Appending the favoured extension to the origin document name
String myResult = workingDir + "letterOutput.pdf";
xStorable.storeToURL("file:///" + myResult, propertyValues);

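System.out.println("Saved " + myResult);

 // Shut down the LibreOffice instance started by Bootstrap.bootstrap().
 // This step and the closing braces are added here to complete the example;
 // the snippet as quoted ends at the println above.
 xDesktop.terminate();
}
}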
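
The comment in the export step talks about appending the extension to the original document name, although the output name is hard-coded. Deriving the name might look like the sketch below. The helper and its name are made up for illustration, and it works on plain strings only.

```java
final class OutputNames {

    /** Replaces the last extension of fileName with ".pdf", or appends ".pdf" if there is none. */
    static String toPdfName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        int sep = Math.max(fileName.lastIndexOf('/'), fileName.lastIndexOf('\\'));
        // Only treat the dot as an extension separator if it belongs to the
        // last path segment and is not that segment's first character.
        String base = (dot > sep + 1) ? fileName.substring(0, dot) : fileName;
        return base + ".pdf";
    }
}
```

For example, OutputNames.toPdfName("letterTemplate.doc") gives "letterTemplate.pdf".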

XDocReport

The core code is shown below. For the complete code, see here:

/**
 * Generates a PDF file from the input file using Apache POI.
 *
 * @param inpuFile input stream of the source DOCX file
 * @param outFile  the PDF file to write
 */
@SneakyThrows
public static void convertWithPOI(InputStream inpuFile, File outFile) {

    // Create the document object from the input stream
    XWPFDocument document = new XWPFDocument(inpuFile);

    // Create the PDF options
    PdfOptions pdfOptions = PdfOptions.create(); //.fontEncoding("windows-1250")

    // Create the directory for the output file
    outFile.getParentFile().mkdirs();

    // Run the PDF conversion and close the output stream afterwards
    try (FileOutputStream out = new FileOutputStream(outFile)) {
        PdfConverter.getInstance().convert(document, out, pdfOptions);
    }
}

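/**
 * Fills the render parameters into the DOCX template first, then generates the PDF.
 *
 * @param inpuFile     input stream of the DOCX template
 * @param outFile      the PDF file to write
 * @param renderParams values to render into the template, keyed by name
 */
@SneakyThrows
public static void convertFromTemplateWithFreemarker(InputStream inpuFile, File outFile, Map<String, Object> renderParams) {

    // Create the report instance
    IXDocReport report = XDocReportRegistry.getRegistry().loadReport(
            inpuFile, TemplateEngineKind.Freemarker);

    // Create the context
    IContext context = report.createContext();

    // Fill in the render parameters
    renderParams.forEach((s, o) -> {
        context.put(s, o);
    });

    // Create the directory for the output file
    outFile.getParentFile().mkdirs();

    // Create the conversion options
    Options options = Options.getTo(ConverterTypeTo.PDF).via(
            ConverterTypeVia.XWPF);

    // Run the conversion and close the output stream afterwards
    try (FileOutputStream out = new FileOutputStream(outFile)) {
        report.convert(context, options, out);
    }
}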
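
Both methods call outFile.getParentFile().mkdirs(). getParentFile() returns null when the File was created from a bare name such as "out.pdf", so that call can throw a NullPointerException. A more defensive variant might look like the sketch below. The helper name is made up, and it uses only the JDK.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class OutputDirs {

    /** Creates the parent directory of outFile if needed; safe for bare file names. */
    static void ensureParentExists(File outFile) throws IOException {
        // Resolving to an absolute path first gives bare names a parent
        // (the current working directory).
        Path parent = outFile.toPath().toAbsolutePath().getParent();
        if (parent != null) {
            // Does nothing if the directory already exists.
            Files.createDirectories(parent);
        }
    }
}
```

Calling OutputDirs.ensureParentExists(outFile) before opening the FileOutputStream would replace the mkdirs() line in either method.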