How to download as pdf?

#74
by vcoyk - opened

How do I download this as a pdf so I can read it offline? If I try to save it as a pdf using cmd+p it only saves 1 page.

The actual web page of the blog is here: https://nanotron-ultrascale-playbook.static.hf.space/dist/index.html
You can try saving it as a pdf from here 👆.

Details of the issue, if you are interested:
Saving only 1 page is probably because the actual blog is embedded in an iframe inside the HF Space shell, so the browser's save-as-PDF cannot scroll inside it.
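For anyone who wants to build that direct link for another Space, here is a minimal sketch. It assumes that static Spaces are served from a subdomain of the form `<owner>-<space>.static.hf.space`, which matches the URL above but is not something confirmed in this thread. The helper name `static_space_url`, the owner/space split (`nanotron` / `ultrascale-playbook`), and the default `dist/index.html` path are all assumptions for illustration.

```python
def static_space_url(owner, space, path="dist/index.html"):
    """Build the direct (non-iframe) URL of a static Space.

    Assumption: the page is served from "<owner>-<space>.static.hf.space".
    The default path is the one used by the playbook; other Spaces may differ.
    """
    subdomain = f"{owner}-{space}".lower()
    return f"https://{subdomain}.static.hf.space/{path}"


# Hypothetical owner/space split, inferred from the URL posted above
print(static_space_url("nanotron", "ultrascale-playbook"))
```

Opening the resulting URL directly means the browser prints the blog page itself rather than the shell around it.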

The formatting is definitely messed up! In Immersive Reader mode the tables are blacked out. I will wait until the authors clean it up and provide the link.

Nanotron Research org

We’ll add a rough pdf version tomorrow/this weekend and a much more polished version a bit later with the physical version

that would be amazing!! thanks @thomwolf and team.

commenting for notification

Looking forward!

There's a button near the top of the page on the right: Download Pdf. I could not get this to work in Safari, but it did work in Brave. The pdf is well formatted.

Nanotron Research org

We can now download as PDF 🥳

nouamanetazi changed discussion status to closed


Sorry to bring this up again, but it is still not printer-friendly. The PDF is still a single page and can't be split into multiple pages. For now I am following @noobmldude's method to print the pages.

@noobmldude have you tried printing this PDF? The whole PDF is squeezed onto one A4/letter page, so it is not printable at all.

Yes @xm2023, it is a single-page PDF, and I agree it is not printer-friendly. I read it on a digital device. Unfortunately, I have not found a way to print this playbook.

This was meant to be a rough PDF version. See the comment from @thomwolf:

We’ll add a rough pdf version tomorrow/this weekend and a much more polished version a bit later with the physical version

This is probably because, with so many images and non-text elements, it is difficult to add page breaks. A single-page PDF may have been the fastest way to produce a PDF-like file while preserving the formatting.
Hopefully the polished version, when it comes, is split into pages and more printer-friendly.

Used PyMuPDF to split the pdf into separate pages to print. Hope this helps!

import fitz  # PyMuPDF (pip install pymupdf)

def split_pdf(input_pdf, output_pdf):
    """Split a single tall PDF page into a title page plus roughly A4-shaped pages."""
    # Open the source and create an empty output document
    source_doc = fitz.open(input_pdf)
    output_doc = fitz.open()

    # Constants, in PDF points (SKIP_TOP and TITLE_HEIGHT are hard-coded for this PDF)
    TOP_MARGIN = 25
    BOTTOM_MARGIN = 25
    SKIP_TOP = 692             # Skip this much content from top (title area)
    TITLE_HEIGHT = 692         # Height of the title section to extract
    A4_RATIO = 210 / 297       # Width / Height for A4

    first_page = source_doc[0]
    page_width = first_page.rect.width
    page_height = first_page.rect.height

    # Height of an A4-shaped page that has the same width as the source page
    a4_height = page_width / A4_RATIO

    # --- Create Title Page ---
    # Center the title block vertically on its own page
    title_margin_top = (a4_height - TITLE_HEIGHT) / 2
    title_page = output_doc.new_page(width=page_width, height=a4_height)

    title_clip_rect = fitz.Rect(0, 0, page_width, TITLE_HEIGHT)
    title_target_rect = fitz.Rect(0, title_margin_top, page_width, title_margin_top + TITLE_HEIGHT)

    title_page.show_pdf_page(
        title_target_rect,
        source_doc,
        0,
        clip=title_clip_rect
    )

    # --- Split Remaining Content ---
    remaining_height = page_height - SKIP_TOP
    # At least one slice, so a short source page does not cause a division by zero
    num_splits = max(1, int(remaining_height / a4_height))

    # Divide the full remaining height evenly so no strip at the bottom is dropped
    content_height = remaining_height / num_splits
    new_page_height = content_height + TOP_MARGIN + BOTTOM_MARGIN

    for i in range(num_splits):
        top = SKIP_TOP + i * content_height
        bottom = top + content_height

        # Region of the source page to copy, and where it lands on the new page
        content_clip = fitz.Rect(0, top, page_width, bottom)
        content_target = fitz.Rect(0, TOP_MARGIN, page_width, TOP_MARGIN + content_height)

        new_page = output_doc.new_page(width=page_width, height=new_page_height)
        new_page.show_pdf_page(
            content_target,
            source_doc,
            0,
            clip=content_clip
        )

    # Save the new document, then release both files
    output_doc.save(output_pdf, garbage=4, deflate=True)
    output_doc.close()
    source_doc.close()
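
If you want to check where the cuts will land before touching the PDF, the slicing arithmetic can be tried on its own. The sketch below is a variant, not the script above: the helper name `slice_bounds` is hypothetical, it needs no PyMuPDF, and it rounds the slice count up so that no slice is taller than an A4 page of the same width. The page size in the example (600 x 10000 points) is made up for illustration.

```python
import math

A4_RATIO = 210 / 297  # width / height of an A4 sheet


def slice_bounds(page_width, page_height, skip_top=0.0):
    """Return (top, bottom) pairs that cut a tall page into equal slices.

    Each slice is at most as tall as an A4 page of the same width, and the
    slices together cover everything from skip_top to the bottom of the page.
    """
    remaining = page_height - skip_top
    if remaining <= 0:
        return []
    max_slice = page_width / A4_RATIO          # tallest slice that still fits A4
    count = math.ceil(remaining / max_slice)   # round up so slices never exceed it
    height = remaining / count
    return [(skip_top + i * height, skip_top + (i + 1) * height)
            for i in range(count)]


# Made-up page size; skip_top=692 mirrors the title height used in the script
for top, bottom in slice_bounds(600, 10000, skip_top=692):
    print(f"{top:8.1f} -> {bottom:8.1f}")
```

Each pair could be passed as the top and bottom of a `fitz.Rect` clip. Since the cuts are evenly spaced, they can still land in the middle of a figure or a line of text.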
