#Malicious pdf attributes pdf
/ObjStm counts the number of object streams. An object stream is a stream object that can contain other objects, and can therefore be used to obfuscate objects (by using different filters).

/JS and /JavaScript indicate that the PDF document contains JavaScript. Almost all malicious PDF documents that I’ve found in the wild contain JavaScript (to exploit a JavaScript vulnerability and/or to execute a heap spray). Of course, you can also find JavaScript in PDF documents without malicious intent.

/AA and /OpenAction indicate an automatic action to be performed when the page/document is viewed. All malicious PDF documents with JavaScript I’ve seen in the wild had an automatic action to launch the JavaScript without user interaction. The combination of automatic action and JavaScript makes a PDF document very suspicious.

/JBIG2Decode indicates if the PDF document uses JBIG2 compression. This is not necessarily an indication of a malicious PDF document, but requires further investigation.

A number that appears between parentheses after the counter represents the number of obfuscated occurrences. For example, /JBIG2Decode 1(1) tells you that the PDF document contains the name /JBIG2Decode and that it was obfuscated (using hex codes).
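The counters described above can be illustrated with a short Python sketch. This is not PDFiD's actual code: the name list is limited to the names discussed here, and `count_names` and `looks_suspicious` are hypothetical helpers. It relies on the PDF rule that any character in a name may be written as `#` followed by two hex digits, and it scans the raw bytes of the file (stream content included), so stray matches are possible.

```python
import re

# Names discussed in the text above; the list used by PDFiD itself may differ.
NAMES = ["/ObjStm", "/JS", "/JavaScript", "/AA", "/OpenAction", "/JBIG2Decode"]

# A PDF name: a slash followed by characters that are neither whitespace nor delimiters.
NAME_RE = re.compile(rb"/[^\x00\t\n\x0c\r ()<>\[\]{}/%]+")
# Inside a name, "#xx" encodes the byte with hexadecimal value xx.
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def decode_name(raw: bytes) -> bytes:
    """Replace every #xx escape in a PDF name with the byte it encodes."""
    return HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)


def count_names(data: bytes) -> dict:
    """Return {name: (total, obfuscated)} for every name in NAMES."""
    counts = {name: [0, 0] for name in NAMES}
    for match in NAME_RE.finditer(data):
        raw = match.group(0)
        decoded = decode_name(raw)
        key = decoded.decode("latin-1")
        if key in counts:
            counts[key][0] += 1
            if decoded != raw:  # the name was written with hex escapes
                counts[key][1] += 1
    return {name: tuple(pair) for name, pair in counts.items()}


def looks_suspicious(counts: dict) -> bool:
    """True when JavaScript is combined with an automatic action."""
    has_js = counts["/JS"][0] > 0 or counts["/JavaScript"][0] > 0
    has_auto = counts["/AA"][0] > 0 or counts["/OpenAction"][0] > 0
    return has_js and has_auto
```

Calling `count_names(open(path, "rb").read())` on a file and passing the result to `looks_suspicious` gives a rough triage signal; a positive result only means the document deserves a closer look.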
#Malicious pdf attributes password
PDFiD is not a PDF parser, but it will scan a file to look for certain PDF keywords, allowing you to identify PDF documents that contain (for example) JavaScript or execute an action when opened. The idea is to use this tool first to triage PDF documents, and then analyze the suspicious ones with my pdf-parser.

An important design criterion for this program is simplicity. Parsing a PDF document completely requires a very complex program, and hence it is bound to contain many (security) bugs. To avoid the risk of getting exploited, I decided to keep this program very simple (it is even simpler than pdf-parser.py).

PDFiD will scan a PDF document for a given list of strings and count the occurrences (total and obfuscated) of each word. Almost every PDF document will contain the first 7 words (obj through startxref), and to a lesser extent stream and endstream. I’ve found a couple of PDF documents without xref or trailer, but these are rare (BTW, this is not an indication of a malicious PDF document).

/Page gives an indication of the number of pages in the PDF document. Most malicious PDF documents have only one page.

/Encrypt indicates that the PDF document has DRM or needs a password to be read.

make-pdf-javascript.py allows one to create a simple PDF document with embedded JavaScript that will execute upon opening of the PDF document. It’s essentially glue code for the mPDF.py module, which contains a class with methods to create headers, indirect objects, stream objects, trailers and XREFs. If you execute it without options, it will generate a PDF document with JavaScript to display a message box (calling app.alert). To provide your own JavaScript, use option --javascript for a script on the command line, or --javascriptfile for a script contained in a file.

make-pdf-embedded.py creates a PDF file with an embedded file.

The type option of pdf-parser allows you to select all objects of a given type. The type is a Name and as such is case-sensitive and must start with a slash-character (/).
The reference option of pdf-parser allows you to select all objects referencing the specified indirect object.
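To make the type and reference selections concrete, here is a minimal Python sketch. It is not how pdf-parser is implemented: `indirect_objects`, `select_by_type` and `select_by_reference` are hypothetical helpers built on regular expressions, and they assume that `endobj` never appears inside stream data.

```python
import re

# Matches "<id> <version> obj ... endobj"; DOTALL lets "." span line breaks.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)


def indirect_objects(data: bytes) -> list:
    """Return a list of (object id, version, body bytes) tuples."""
    return [(int(m.group(1)), int(m.group(2)), m.group(3))
            for m in OBJ_RE.finditer(data)]


def select_by_type(objects: list, type_name: bytes) -> list:
    """Objects whose /Type is exactly type_name (a Name such as b"/Page")."""
    # Names are case-sensitive; the lookahead keeps /Page from matching /Pages.
    pattern = re.compile(rb"/Type\s*" + re.escape(type_name) + rb"(?![A-Za-z0-9])")
    return [obj for obj in objects if pattern.search(obj[2])]


def select_by_reference(objects: list, target_id: int) -> list:
    """Objects whose body holds an indirect reference "<target_id> <version> R"."""
    # The lookbehind keeps target 5 from matching inside "15 0 R".
    pattern = re.compile(rb"(?<!\d)" + str(target_id).encode() + rb"\s+\d+\s+R\b")
    return [obj for obj in objects if pattern.search(obj[2])]
```

Because the type is compared as a Name, `select_by_type(objects, b"/page")` finds nothing in a document whose pages are declared as `/Page`.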
The stats option displays statistics of the objects found in the PDF document. Use this to identify PDF documents with unusual/unexpected objects, or to classify PDF documents. For example, I generated statistics for 2 malicious PDF files, and although they were very different in content and size, the statistics were identical, proving that they used the same attack vector and shared the same origin.

The search option searches for a string in indirect objects (not inside the stream of indirect objects). The search is not case-sensitive, and is susceptible to the obfuscation techniques I documented (as I’ve yet to encounter these obfuscation techniques in the wild, I decided not to resort to canonicalization).

The filter option applies the filter(s) to the stream. For the moment, only FlateDecode is supported.

The raw option makes pdf-parser output raw data (e.g. not the printable Python representation).

The objects option outputs the data of the indirect object whose ID was specified. If more than one object has the same ID (disregarding the version), all these objects will be output.

You can see the parser in action in this screencast.
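As a rough illustration of what applying the FlateDecode filter involves, the following Python sketch decompresses the stream of a single object with the standard `zlib` module. It is an assumption-laden sketch rather than pdf-parser's code: `decode_flate_stream` is a hypothetical helper, it locates the stream by its keywords instead of the /Length entry, and it handles only a single /FlateDecode filter.

```python
import re
import zlib

# Stream data sits between "stream" + end-of-line and "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def decode_flate_stream(obj_body: bytes) -> bytes:
    """Return the decompressed stream of an object that declares /FlateDecode."""
    if b"/FlateDecode" not in obj_body:
        raise ValueError("object does not declare /FlateDecode")
    match = STREAM_RE.search(obj_body)
    if match is None:
        raise ValueError("object has no stream")
    # decompressobj stops at the end of the zlib data, so the end-of-line
    # bytes that precede "endstream" do not cause an error.
    return zlib.decompressobj().decompress(match.group(1))
```

A robust parser would read exactly /Length bytes instead, since compressed data could in principle contain the bytes `endstream`.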
#Malicious pdf attributes code
pdf-parser will parse a PDF document to identify the fundamental elements used in the analyzed file. The code of the parser is quick-and-dirty; I’m not recommending this as a textbook case for PDF parsers, but it gets the job done.
#Malicious pdf attributes how to
Here is a set of free YouTube videos showing how to use my tools: Malicious PDF Analysis Workshop.