No matter how fancy you make your Tableu report or you interactive bokeh+flask website, more often than not you are going to end up having someone ask for the same exact report in a pdf sent to someones email.
This can be a pain, but it does have some advantages, mainly that now you have a email trail of the report being sent out and you are not relying on people who may or not be very technologically savvy to run your Tableau/ Qlikview report. Also if we use the correct HTML headers in the PDF import a nice bookmark navigation will be shown in most PDF readers.
I haven't found a perfect out of the box solution but I have found that a little css, jinja, weasyprint and python cover most of my needs although sometimes a little bit of PIL is needed.
First we need to install all the necessary packages. Jinja can just be installed using:
pip install jinja2
We also need to install weasyprint, this requires a bit more work depending on your operating system and specially on windows since GTK+ is needed.
Once we are have those two packages we are ready to begin. The process requires 3 files to be created:
- python script
- HTML jinja template
- CSS styling
First we will start out creating our python script. For this sample project we will be looking at the diabetes data set.
We will look at patients within the age group with qualifier being emphasized. We want to generate a PDF for emailing with a nice chart being automatically created. Just to have some fun we will use Faker to give some names to everyone.
import pandas as pd
import numpy as np
from jinja2 import Environment, FileSystemLoader
import re
from faker import Faker
from weasyprint import HTML
#Initialize our Faker
fake=Faker()
#Load the csv with diabetes information into a pandas df
diab_org=pd.read_csv('https://www4.stat.ncsu.edu/~boos/var.select/diabetes.tab.txt',sep='\t')
#Select only the columns we are interested in
diab=diab_org[['AGE','SEX','BMI','BP']].copy()
#Just for fun assign some fake names to the dataset
diab['Name']=diab.AGE.apply(lambda x: fake.name())
Once we have created the specific data we want to report on we need to load our jinja template.
env=Environment(loader=FileSystemLoader('.'))
#declare our jinja template
template= env.get_template('report_template.html')
template_vars={'title':'Diabetes Dataset',
'data':diab.sort_values('AGE').head(20),
'headers':list(diab.columns.values)}
We pass a dictionary along with the render_template function call.
#render out our html with jinja
html_out=template.render(template_vars)
The jinja template is where most of the magic happens. The following is the template used for a simple table with specific row highlighting.
There are a couple of interesting things, the first is how the information is passed. We can access specific named variables from the dictionary passed on the render_template call, just need to wrap them in `{{}}`
Second is how we interact with arrays and generate for loops inside jinja. Jinja includes a lot of functionality and reading up on it is recommended. For the purposes of my reporting the main aspects being used are if loops and for loops (which can iterate thru arrays, pandas data frames and dictionaries). We initialize a loop with {% %}
Once we generate the HTML code from the template we can take a look at it.
If we wanted to save the HTML we can just do so with:
file_name='report.html'
with open(file_name, 'w') as fh:
fh.write(html_out)
However we want to then convert this HTML code into a PDF file. This is where weasyprint comes in.
We can convert this using:
#Save our pdf file
pdf_name='report_pdf.pdf'
HTML(string=html_out).write_pdf(pdf_name)
However if we take a look at the resulting file it doesn't look too sexy. In order to fix our presentation we can pass a style sheet parameter to the PDF function. It's important to note that passing the CSS in the HTML code/ template does not work properly most of the time and it is better to include in the write_pdf
call.
Since we want to make this look good we can pass a style sheet as such:
pdf_name='report_pdf.pdf'
HTML(string=html_out).write_pdf(pdf_name,stylesheets=["report.css"])
I would recommend using a downloaded bootstrap CSS that does not contain the print logic behind it as it tends to remove colors and alter the document in ways that aren't always great
Realistically that still looks pretty hideous. So I'm going to be lazy and just use bootstrap like 95% of the internet, and modify our template a bit to use bootstrap classes.
pdf_name='report_pdf.pdf'
HTML(string=html_out).write_pdf(pdf_name,stylesheets=["bootstrap.css"])
This is starting to look better, however once this tries to be printed we run into some problems relating to page size, this is also where we would change to landscape or large size if we wanted to.
In order to fix this we also pass along our own costume css with the important parameter in the definitions. Passing the following css:
and changing our PDF export function
HTML(string=html_out).write_pdf(pdf_name,stylesheets=["report_fancy.css","bootstrap.css"])
The resulting document looks as something we wouldn't feel terrible emailing our CEO, plus is pretty easy to parametrize and automatically send out.
In Pt 2. we will go over how to add images and plots to our documents using the same procedure, and how to generate PDFs without weasyprint/ jinja when all we want is to send some charts.