The Muhimbi PDF Converter can convert HTML to PDF. However, because HTML is not designed for output to a printer (or PDF), some pages may not look as expected.
Note that as of version 8.3 the PDF Converter comes with a brand new HTML to PDF Converter. This converter solves a number of the issues described below, but not all of them.
Listed below are a number of possible workarounds that may improve the formatting of the generated PDFs.
When using the Internet Explorer based HTML to PDF option (it is possible to switch to 'WebKit' in release 8.3 and later), the PDF Converter does not go through Internet Explorer's print processing engine, so any print-specific CSS entries are ignored. If you have control over the page that is being converted, you can add logic to the page that checks for a query string parameter (e.g. ?pdfconversion=true). When the parameter is present, the page can emit different CSS / HTML that improves the formatting, e.g. a different page width. In SharePoint this can be achieved by modifying the master page or inserting a hidden 'content editor web part'.
<!-- Optional delay (in milliseconds) between loading the web page and ... -->
<add key="HTMLConverterFullFidelity.ConversionDelay" value="0"/>
<!-- Max number of cycles (1sec apart) before a converter is considered 'hanging' and will be terminated -->
<add key="ProcessMonitor.MAX_HUNG_COUNT" value="15" />
Alternatively, consider changing one or more of the following settings in the Conversion Service's config file.
- HTMLConverterFullFidelity.PaperSize: Specifies the paper size to use for the PDF when converting HTML pages, for example A4, Letter or a custom page size. For full details see the Conversion Service's config file.
- HTMLConverterFullFidelity.PageOrientation: Specifies the page orientation, either 'Portrait' or 'Landscape'.
- HTMLConverterFullFidelity.PageMargin: Specifies the margin / border around the generated PDF file.
- HTMLConverterFullFidelity.ScaleMode: Determines how the HTML is scaled to the PDF page size, either FitWidth, FitWidthScaleImagesOnly or NoScale.
- HTMLConverterFullFidelity.SplitTextLines: Should text be broken up or wrapped to a new page?
- HTMLConverterFullFidelity.SplitImages: Should images be broken up or wrapped to a new page?
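As an illustration, the fragment below sketches how three of these settings might look in the Conversion Service's config file. It assumes they use the same `<add key="..." value="..."/>` form as the other entries in that file. The values are only the examples named in the list above, not recommended defaults. PageMargin, SplitTextLines and SplitImages are left out because their value formats are not described here; check the comments in the config file itself before changing them.

```xml
<!-- Illustrative values only; the key names are taken from the list above -->
<!-- Assumption: entries follow the same add key/value form used elsewhere in the file -->
<add key="HTMLConverterFullFidelity.PaperSize" value="A4"/>
<add key="HTMLConverterFullFidelity.PageOrientation" value="Landscape"/>
<add key="HTMLConverterFullFidelity.ScaleMode" value="FitWidth"/>
```

Landscape orientation combined with FitWidth is one plausible starting point for wide pages that would otherwise be scaled down heavily. Whether the Conversion Service needs a restart before the change takes effect is not covered here.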
When using the Internet Explorer based HTML to PDF option (it is possible to switch to 'WebKit' in release 8.3 and later), converting a URL sometimes results in 'bitmap' output, where the generated PDF appears to contain a screenshot of the page content rather than 'real text'. This happens when the server running the Conversion Service has Internet Explorer 9 or later installed. Microsoft introduced a change in that version that causes all HTML5 content to be rendered as a bitmap. Non-HTML5 content is rendered just fine. To solve this problem, roll back to Internet Explorer 8 or, if you have control over the content of the web page, skip output of the HTML5 doctype when a query string is passed in that indicates PDF conversion. Alternatively, update to the latest release and use the default 'WebKit' based conversion engine.
Although there is no 'one size fits all' answer, the workarounds recommended above should make it possible to generate acceptable looking PDF files in most situations.