Some Epstein files can be unredacted... The Trump administration is so bloody stupid they just highlighted the text in black in the PDFs, leaving all the text available underneath.

slothrop@lemmy.ca · 2 days ago

MedicsOfAnarchy@lemmy.world · 2 days ago

If anyone would like to post instructions for the PDF-impaired…

pieland@piefed.social · 2 days ago

I haven’t tried this, but this is what I’ve read:

You can highlight the redacted text, copy it, and paste the text into another document (like Word, WordPad, Notepad, etc.).

Another method I saw mentioned on Facebook:

“The backgrounds are transparent. Pull them into photoshop and throw a layer of white between the text and the black background and you have your text.”
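Both methods rely on the same thing: the black box is a separate filled shape drawn on top of text that is still in the file. A tool that hunts for these boxes first has to decide which filled shapes count as "black". Here's a minimal Python sketch of just that check, assuming the fill color has already been pulled out of the PDF as a tuple of values from 0 to 1 (one value for grayscale, three for RGB, four for CMYK, which is how pdfplumber usually reports a rectangle's `non_stroking_color`). The names `luminance` and `is_dark` and the 0.2 cutoff are my own hypothetical choices, not taken from any of the tools in this thread.

```python
from typing import Optional, Sequence


def luminance(color: Optional[Sequence[float]]) -> Optional[float]:
    """Approximate brightness of a PDF fill color: 0.0 is black, 1.0 is white.

    Assumes components are floats in [0, 1]: 1 value = grayscale,
    3 = RGB, 4 = CMYK. Returns None for anything else (e.g. no fill).
    """
    if color is None:
        return None
    c = [float(v) for v in color]
    if len(c) == 1:  # DeviceGray: the single value is already brightness
        return c[0]
    if len(c) == 3:  # DeviceRGB: weighted sum using Rec. 601 luma weights
        r, g, b = c
        return 0.299 * r + 0.587 * g + 0.114 * b
    if len(c) == 4:  # DeviceCMYK: naive conversion to RGB, then luma
        cy, m, y, k = c
        r, g, b = (1 - cy) * (1 - k), (1 - m) * (1 - k), (1 - y) * (1 - k)
        return 0.299 * r + 0.587 * g + 0.114 * b
    return None


def is_dark(color: Optional[Sequence[float]], threshold: float = 0.2) -> bool:
    """Hypothetical rule: a fill counts as a 'black box' if brightness < threshold."""
    lum = luminance(color)
    return lum is not None and lum < threshold


if __name__ == "__main__":
    print(is_dark((0,)))          # black in grayscale
    print(is_dark((0, 0, 0)))     # black in RGB
    print(is_dark((0, 0, 0, 1)))  # 100% K in CMYK
    print(is_dark((1, 1, 1)))     # white in RGB
```

The threshold is deliberately loose so that near-black fills (dark gray, "rich black") also count.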

If anything is unclear, please ask. Even if I can’t answer it, maybe someone else can.

adhd_traco@piefed.social · 2 days ago

It looks like the maintainers have updated the project and its page. It’s much simpler now.

Installation:

pip install pdfplumber pymupdf

Usage:

python redact_extract.py example.pdf
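I haven't reproduced the script's actual code here, but the core idea these tools rely on can be sketched: find words whose bounding box sits under a filled rectangle. This minimal Python sketch assumes the word and rectangle boxes have already been extracted as dicts with `x0`, `x1`, `top` and `bottom` keys (the shape pdfplumber's `page.extract_words()` and `page.rects` use). `overlap_fraction`, `hidden_words` and the 0.5 overlap cutoff are hypothetical, not from redact_extract.py.

```python
from typing import Dict, List

Box = Dict[str, float]


def overlap_fraction(word: Box, rect: Box) -> float:
    """Fraction (0.0 to 1.0) of the word's area that lies inside the rectangle."""
    w = max(0.0, min(word["x1"], rect["x1"]) - max(word["x0"], rect["x0"]))
    h = max(0.0, min(word["bottom"], rect["bottom"]) - max(word["top"], rect["top"]))
    area = (word["x1"] - word["x0"]) * (word["bottom"] - word["top"])
    return (w * h) / area if area > 0 else 0.0


def hidden_words(words: List[dict], rects: List[dict], min_overlap: float = 0.5) -> List[str]:
    """Return the text of every word covered at least `min_overlap` by some rectangle."""
    found = []
    for word in words:
        if any(overlap_fraction(word, r) >= min_overlap for r in rects):
            found.append(word["text"])
    return found


if __name__ == "__main__":
    words = [
        {"text": "Visible", "x0": 10, "x1": 50, "top": 100, "bottom": 110},
        {"text": "Secret", "x0": 60, "x1": 95, "top": 100, "bottom": 110},
    ]
    # A box drawn over the second word only.
    rects = [{"x0": 58, "x1": 100, "top": 98, "bottom": 112}]
    print(hidden_words(words, rects))  # prints ['Secret']
```

In a real tool you'd only pass in rectangles that are filled with a dark color, otherwise table borders and shaded cells would show up as "redactions" too.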

pieland@piefed.social · 2 days ago

If anyone has any questions about this comment as well, please feel free to ask. I know what this comment means, but a few years ago I wouldn’t’ve had a clue.

thesohoriots@lemmy.world · 2 days ago

You’re asking on Lemmy?

Install Gentoo.

adhd_traco@piefed.social · 2 days ago

Here are step-by-step instructions for the tool that the OP of the Reddit thread open-sourced. It creates a side-by-side PDF of the redacted and unredacted versions at the end.

No root access is required at any point.

Download and extract the files from https://github.com/leedrake5/unredact?tab=readme-ov-file

Create a Python virtual environment. Make sure the destination folder (here, ~/.env) doesn’t already exist:

python3 -m venv ~/.env

Activate the environment:

source ~/.env/bin/activate

You should now see (.env) before your prompt.

Now install the Python dependencies:

pip install pdfplumber pymupdf

You’re all set up.

With your virtual environment still active (indicated by the (.env) before your prompt), navigate to the downloaded GitHub project, where the ‘redact_extract.py’ file is located.

Copy whatever PDF document you want to try to unredact into the same folder.

Execute the script:

python redact_extract.py taco_crimes.pdf

The script should now have created a file for you in the current folder with the redacted and unredacted versions side by side.
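If you have a whole folder of PDFs, a small wrapper can run the script on each one in turn. This is a minimal sketch, assuming you save it in the project folder next to redact_extract.py (under a hypothetical name like run_all.py) and run it with the virtual environment active. `find_pdfs`, `build_commands` and `run_all` are my own hypothetical helpers, and by default it only prints the commands it would run.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def find_pdfs(folder: str) -> List[Path]:
    """PDF files directly inside `folder` (any .pdf/.PDF casing), sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )


def build_commands(folder: str, script: str = "redact_extract.py") -> List[List[str]]:
    """One command per PDF, using the same Python interpreter (i.e. the same venv)."""
    return [[sys.executable, script, str(pdf)] for pdf in find_pdfs(folder)]


def run_all(folder: str, dry_run: bool = True) -> None:
    """Print each command; actually run it only when dry_run is False."""
    for cmd in build_commands(folder):
        print(" ".join(cmd))
        if not dry_run:
            # check=False: keep going even if one file makes the script fail
            subprocess.run(cmd, check=False)


if __name__ == "__main__":
    # Usage: python run_all.py [folder] [--run]
    args = [a for a in sys.argv[1:] if a != "--run"]
    run_all(args[0] if args else ".", dry_run="--run" not in sys.argv)
```

Using `sys.executable` instead of a bare `python` makes sure each run uses the interpreter from the active virtual environment, where pdfplumber and pymupdf are installed.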

To leave the virtual environment:

deactivate

To enter it again:

source ~/.env/bin/activate

To delete everything cleanly, just delete the virtual environment folder (~/.env in this case).
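If you'd rather script the create and delete steps, Python's standard-library `venv` and `shutil` modules can do both. A minimal sketch: `make_env`, `env_python` and `remove_env` are hypothetical helper names, and the default path mirrors the ~/.env example. Activation still has to happen in your shell with `source ~/.env/bin/activate`, because a Python program can't change the environment of the shell that started it.

```python
import shutil
import sys
import venv
from pathlib import Path


def make_env(path: str = "~/.env", with_pip: bool = True) -> Path:
    """Create a virtual environment, refusing to reuse an existing folder."""
    target = Path(path).expanduser()
    if target.exists():
        raise FileExistsError(f"{target} already exists; choose another location")
    venv.create(str(target), with_pip=with_pip)
    return target


def env_python(env: Path) -> Path:
    """The environment's own interpreter: bin/python, or Scripts/python.exe on Windows."""
    if sys.platform == "win32":
        return env / "Scripts" / "python.exe"
    return env / "bin" / "python"


def remove_env(path: str = "~/.env") -> None:
    """Delete the environment folder; everything installed into it goes with it."""
    shutil.rmtree(Path(path).expanduser())
```

For example, `env = make_env()` followed by `subprocess.run([str(env_python(env)), "-m", "pip", "install", "pdfplumber", "pymupdf"], check=True)` installs the same dependencies as the pip step, without activating anything first.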

The project linked in evacide’s Mastodon toot is even simpler to install. Create and activate a virtual environment like before, but at a different location (~/.env1 instead of ~/.env, for example).

Then install the tool from pip in the virtual environment:

pip install x-ray

The tool is now installed and can be executed with a pdf file like so:

xray /path/to/your/file.pdf

https://github.com/freelawproject/x-ray
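x-ray's README also describes a Python API, `xray.inspect()`, that returns the suspect text page by page. The sketch below only turns such a result into a readable report. It assumes the result is a dict mapping page numbers to lists of dicts with `bbox` and `text` keys, which is the shape the README shows as far as I know, so check it against the version you installed. `report` is a hypothetical helper, not part of x-ray.

```python
from typing import Dict, List


def report(results: Dict[int, List[dict]]) -> str:
    """Format {page: [{"bbox": (x0, y0, x1, y1), "text": ...}, ...]} as one line per hit."""
    lines = []
    for page in sorted(results):
        for hit in results[page]:
            x0, y0, _x1, _y1 = hit["bbox"]
            lines.append(f"page {page} at ({x0:.0f}, {y0:.0f}): {hit['text']}")
    return "\n".join(lines) if lines else "no bad redactions found"


# With x-ray installed in the active virtual environment (assumed API):
#   import xray
#   print(report(xray.inspect("/path/to/your/file.pdf")))
```

An empty result just means x-ray didn't find text under any box it recognized; it doesn't prove the file is safely redacted.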

(Sorry for the bad formatting. After posting, I can’t preview anymore to figure out how to fix it.)

gwl@lemmy.blahaj.zone · 2 days ago

Could you show some of what it extracted then?

adhd_traco@piefed.social · 2 days ago

I haven’t tried any yet, except to verify it with the sample from the Reddit OP.

Here’s a Google Drive link from the Reddit thread with three files: one original justice.gov PDF with bad redactions, the same file unredacted, and a single side-by-side PDF of those two files. By default, OP’s tool creates this side-by-side PDF.

lmmarsano@lemmynsfw.com · 2 days ago

You don’t need the lame script. Just select the obscured text in a document like the one they linked to and copy and paste it, or have the voice reader read it back.
