nanotron/ultrascale-playbook · How to download as pdf?

Feb 19

How do I download this as a PDF so I can read it offline? If I try to save it as a PDF using Cmd+P, it only saves 1 page.

noobmldude

Feb 20

The actual web page of the blog is here: https://nanotron-ultrascale-playbook.static.hf.space/dist/index.html
You can try saving it as a pdf from here 👆.

Details of the issue, if you are interested:
Saving only 1 page could be because the actual blog is embedded in an iframe inside the HF Space shell, so the browser's Save as PDF is unable to scroll inside it.
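
For anyone hitting the same iframe problem on another static Space, here is a minimal sketch of how the direct URL above appears to be built. The `owner-name.static.hf.space` host pattern is inferred from the link in this post and is an assumption for other Spaces (names with unusual characters may be normalized differently); `static_space_url` is a hypothetical helper, not part of any Hugging Face library.

```python
def static_space_url(space_id: str, path: str = "") -> str:
    """Build the direct (non-iframe) URL for a static Space.

    Assumption: the host is "<owner>-<name>.static.hf.space", as seen in the
    link posted above. This is not guaranteed for every Space.
    """
    owner, name = space_id.split("/", 1)
    host = f"{owner}-{name}.static.hf.space"
    return f"https://{host}/{path.lstrip('/')}"


# The path "dist/index.html" is specific to this Space (taken from the post).
playbook_url = static_space_url("nanotron/ultrascale-playbook", "dist/index.html")
print(playbook_url)
```

Opening that URL directly gives the browser the real page instead of the outer shell, which is why printing from there has a chance of working.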

vmsenthil

Feb 20

The formatting is definitely messed up! In Immersive Reader mode the tables are blacked out. I will wait until the authors clean it up and provide the link.

thomwolf

Nanotron Research org Feb 20

We’ll add a rough pdf version tomorrow/this weekend and a much more polished version a bit later with the physical version

noobmldude

Feb 20

that would be amazing!! thanks @thomwolf and team.

knoel

Feb 21

commenting for notification

TristanBehrens

Feb 21

Looking forward!

erlebach

Feb 21

There's a button near the top of the page on the right: Download Pdf. I could not get this to work in Safari, but it did work in Brave. The pdf is well formatted.

nouamanetazi

Nanotron Research org Feb 22

We can now download as PDF 🥳

nouamanetazi changed discussion status to closed Feb 22

xm2023

Mar 27

We can now download as PDF 🥳

Sorry to bring this up again. It is still not print-friendly: the PDF is still a single page and can't be split into multiple pages. For now I am following @noobmldude's method to print the pages.

noobmldude

Mar 27

@xm2023 There is already a prepared PDF provided by the HF team.
You can find it here: https://huggingface.co/spaces/nanotron/ultrascale-playbook/blob/main/The_Ultra-Scale_Playbook_Training_LLMs_on_GPU_Clusters.pdf
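
A small sketch in case the link opens the file viewer instead of downloading: on the Hub, a `/blob/` URL shows the viewer page, while the same path with `/resolve/` serves the raw file. `blob_to_resolve` is a hypothetical helper name used only for illustration.

```python
def blob_to_resolve(url: str) -> str:
    """Swap the first '/blob/' path segment for '/resolve/' to get the raw file URL."""
    return url.replace("/blob/", "/resolve/", 1)


pdf_page = (
    "https://huggingface.co/spaces/nanotron/ultrascale-playbook/blob/main/"
    "The_Ultra-Scale_Playbook_Training_LLMs_on_GPU_Clusters.pdf"
)
pdf_file = blob_to_resolve(pdf_page)
print(pdf_file)
```

The printed URL can be pasted into a browser or passed to any download tool.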

xm2023

Mar 27

@noobmldude have you tried to print this PDF? The whole PDF is squeezed into 1 A4/letter page, so it is not printable at all.

noobmldude

Mar 28

edited Mar 28

Yes @xm2023, it is a single-page PDF, and I agree it is not printer-friendly. I read it on a digital device. Unfortunately, I have not found a way to print this playbook.

This was meant to be a rough PDF version. See the comment from @thomwolf:

We’ll add a rough pdf version tomorrow/this weekend and a much more polished version a bit later with the physical version

It is probably a single page because, with so many images and non-text elements, adding page breaks could be difficult. A single-page PDF may have been the fastest way to preserve the formatting and still produce a PDF-like file.
Hopefully the polished version, when it comes, is split into pages and more printer-friendly.

sleepingllama

Apr 6

I was able to download the PDF version by opening https://huggingface.co/spaces/nanotron/ultrascale-playbook/blob/main/The_Ultra-Scale_Playbook_Training_LLMs_on_GPU_Clusters.pdf in Safari.
Hope this works.

colinlaganier

23 days ago

Used PyMuPDF to split the pdf into separate pages to print. Hope this helps!

import fitz  # PyMuPDF

def split_pdf(input_pdf, output_pdf):
    # Open documents
    source_doc = fitz.open(input_pdf)
    output_doc = fitz.open()

    # Constants
    TOP_MARGIN = 25
    BOTTOM_MARGIN = 25
    SKIP_TOP = 692             # Skip this much content from top (title area)
    TITLE_HEIGHT = 692         # Height of the title section to extract
    A4_RATIO = 210 / 297       # Width / Height for A4

    first_page = source_doc[0]
    page_width = first_page.rect.width
    page_height = first_page.rect.height

    # Calculate A4-based target content height
    target_content_height = page_width / A4_RATIO
    a4_height = target_content_height

    # --- Create Title Page ---
    title_margin_top = (a4_height - TITLE_HEIGHT) / 2
    title_page = output_doc.new_page(width=page_width, height=a4_height)

    title_clip_rect = fitz.Rect(0, 0, page_width, TITLE_HEIGHT)
    title_target_rect = fitz.Rect(0, title_margin_top, page_width, title_margin_top + TITLE_HEIGHT)

    title_page.show_pdf_page(
        title_target_rect,
        source_doc,
        0,
        clip=title_clip_rect
    )

    # --- Split Remaining Content ---
    remaining_height = page_height - SKIP_TOP
    # At least one split, so a short source page cannot cause a division by zero
    num_splits = max(1, int(remaining_height / target_content_height))

    # Divide the remaining content evenly so nothing at the bottom is dropped;
    # the margins are added to each output page, not taken from the content
    adjusted_content_height = remaining_height / num_splits
    new_page_height = adjusted_content_height + TOP_MARGIN + BOTTOM_MARGIN

    for i in range(num_splits):
        top = SKIP_TOP + i * adjusted_content_height
        bottom = top + adjusted_content_height

        content_clip = fitz.Rect(0, top, page_width, bottom)
        content_target = fitz.Rect(0, TOP_MARGIN, page_width, TOP_MARGIN + adjusted_content_height)

        new_page = output_doc.new_page(width=page_width, height=new_page_height)
        new_page.show_pdf_page(
            content_target,
            source_doc,
            0,
            clip=content_clip
        )

    # Save the new document, then release both file handles
    output_doc.save(output_pdf, garbage=4, deflate=True)
    output_doc.close()
    source_doc.close()
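
To sanity-check the slicing arithmetic without opening a PDF, here is a minimal standalone sketch. It is a variant, not the script above: it rounds the page count up and keeps every band at exact A4 proportions, so the last band may be shorter. `plan_slices` and its parameters are hypothetical names, and all values are in PDF points.

```python
import math


def plan_slices(page_width, page_height, skip_top=0.0):
    """Return (top, bottom) pairs that cut a tall page into A4-shaped bands."""
    if page_width <= 0 or page_height <= skip_top:
        return []
    # Height of an A4-proportioned band that is as wide as the source page
    band_height = page_width * 297 / 210
    remaining = page_height - skip_top
    # Round up so the tail of the document gets its own (shorter) band
    count = max(1, math.ceil(remaining / band_height))
    slices = []
    for i in range(count):
        top = skip_top + i * band_height
        bottom = min(top + band_height, page_height)
        slices.append((top, bottom))
    return slices


# Example with made-up dimensions: a 210 x 1000 pt page, skipping the top 100 pt
for top, bottom in plan_slices(210, 1000, skip_top=100):
    print(f"{top:.0f} -> {bottom:.0f}")
```

Each pair could be used as the vertical extent of a clip rectangle like `content_clip` in the script above. The fixed cuts can still land in the middle of a figure or a line of text, so the output is worth checking before printing.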