Ondřej Plátek Archive
PhD candidate@UFAL, Prague. LLM & TTS evaluation. Engineer. Researcher. Speaker. Father.

Merge PDF files in Ubuntu

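Using Ghostscript:

gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=combinedpdf.pdf 1.pdf 2.pdf 3.pdf

Or using pdftk (writing the output to the parent directory keeps it out of the *pdf glob if the command is run again):

pdftk *pdf cat output ../FINAL_NAME.pdf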
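The Ghostscript one-liner can be wrapped in a small shell function that refuses to run when the output name is also one of the inputs. This is a minimal sketch assuming bash and an installed `gs`; the name `merge_pdfs` is hypothetical, and the inputs are merged in the order given on the command line.

```shell
#!/usr/bin/env bash
# merge_pdfs OUTPUT INPUT...
# Hypothetical helper: merges the INPUT PDFs, in the order given, into OUTPUT
# using Ghostscript's pdfwrite device (same idea as the gs one-liner above).
merge_pdfs() {
  if [ "$#" -lt 2 ]; then
    echo "usage: merge_pdfs OUTPUT INPUT..." >&2
    return 2
  fi

  local out="$1"
  shift

  local f
  for f in "$@"; do
    # Refuse to run if OUTPUT is also an input (same name, or the same file
    # reached via a different path), since overwriting an input is unsafe.
    if [ "$f" = "$out" ] || [ "$f" -ef "$out" ]; then
      echo "merge_pdfs: output '$out' is also an input" >&2
      return 1
    fi
  done

  # -dNOPAUSE/-dBATCH: process all files without prompting, then exit.
  # -q: suppress Ghostscript's startup messages.
  gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile="$out" "$@"
}
```

For example, `merge_pdfs combinedpdf.pdf 1.pdf 2.pdf 3.pdf` runs the same merge as the gs command above.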

Extract images from PDF

pdfimages -j Arco\ Big\ Walls-en.pdf arco-big
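With `-j`, pdfimages saves DCT-encoded (JPEG) images as `.jpg` files; other images are written as PPM/PBM files, named `PREFIX-000.ext`, `PREFIX-001.ext`, and so on. Below is a minimal bash sketch that quotes the file name instead of escaping the spaces and derives the prefix from the PDF name when none is given. The name `extract_pdf_images` is hypothetical, and poppler-utils (which provides `pdfimages`) is assumed to be installed.

```shell
#!/usr/bin/env bash
# extract_pdf_images PDF [PREFIX]
# Hypothetical helper around poppler's pdfimages: writes the images embedded in
# PDF as PREFIX-000.jpg, PREFIX-001.ppm, ... in the current directory.
extract_pdf_images() {
  local pdf="${1:-}"
  local prefix="${2:-}"

  if [ ! -f "$pdf" ]; then
    echo "extract_pdf_images: no such file: '$pdf'" >&2
    return 1
  fi

  # Default prefix: the PDF's file name without directory and .pdf suffix,
  # e.g. "Arco Big Walls-en.pdf" -> "Arco Big Walls-en".
  if [ -z "$prefix" ]; then
    prefix="$(basename "$pdf" .pdf)"
  fi

  # -j: save DCT (JPEG) images as .jpg instead of converting them to PPM.
  pdfimages -j "$pdf" "$prefix"
}
```

For example, `extract_pdf_images "Arco Big Walls-en.pdf" arco-big` is equivalent to the command above.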
Many other PDF tricks, including OCR, are covered in this tutorial: http://blog.konradvoelkel.de/2010/01/linux-ocr-and-pdf-problem-solved/
The OCR does not work very well; it would need some boosting (a language model, domain adaptation, contrast adjustment, etc.).