This article is intended for faculty and staff who may have certain types of PI (personal information) stored on a computer running the Linux or Solaris operating systems.
Overview:
Find_SSNs is a Python program written at Virginia Tech that searches a computer's files for Social Security numbers and credit card numbers. It requires Python 2.4 or later. By default, Find_SSNs searches the following file types: doc, docx, xlsx, xls, rtf, zip, text files (e.g. html, xml, txt) and Open Office 2 documents. It can also search pdf files when the pdftotext binary, which is part of the poppler package, is installed. We provide two versions of Find_SSNs: one that searches pdfs and one that doesn't (in case you can't install the poppler package). The instructions below include the steps needed to install the poppler package.
The Find_SSNs software webpage at Virginia Polytechnic Institute is located here: http://security.vt.edu/resources_and_information/find_ssns.html
The full Find_SSNs documentation at Virginia Polytechnic Institute is located here: http://security.vt.edu/Find_SSNs/find_ssns_referance_manual.html
Installation:
Linux:
Note: While these install steps should work on any modern Linux distro, we've only verified them on RHEL5, RHEL6 and Ubuntu 11.04.
The requirements to run Find_SSNs are:
Note: Default installs of RHEL and Ubuntu come with Python already installed.
On RHEL5, install poppler-utils with this command:
yum install poppler-utils
On Ubuntu, install poppler-utils with this command:
apt-get install poppler-utils
Note: If you can't install poppler-utils to scan pdf files, you can download a copy of Find_SSNs with pdf searching turned off: http://www.hawaii.edu/its/docs/find_ssns_nopdf.tar
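To help decide which copy of Find_SSNs fits a given machine, the following is a minimal sketch that looks for the pdftotext binary on the PATH and compares a Python version against the 2.4 minimum mentioned in the overview. The function names (python_ok, pdftotext_path, recommended_build) are hypothetical helpers and are not part of Find_SSNs. The sketch itself needs Python 3.3 or later because it uses shutil.which, and it makes no claim about which interpreter Find_SSNs should be run with.

```python
import shutil
import sys


def python_ok(version_info=sys.version_info, minimum=(2, 4)):
    """Return True when the given version meets the documented 2.4 minimum."""
    return tuple(version_info[:2]) >= minimum


def pdftotext_path():
    """Return the full path of pdftotext if it is on PATH, otherwise None."""
    return shutil.which("pdftotext")


def recommended_build():
    """Hypothetical helper: 'pdf' if pdftotext was found, otherwise 'nopdf'."""
    return "pdf" if pdftotext_path() else "nopdf"


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("pdftotext:", pdftotext_path() or "not found")
    print("Suggested Find_SSNs copy:", recommended_build())
```

If the sketch reports "nopdf", either install poppler-utils as described above or use the copy with pdf searching turned off.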
Solaris 10:
Requirements:
Scanning:
Scanning your filesystem(s) for files that contain SSNs or CC #'s works the same way on all Unix/Linux boxes.
Note: Find_SSNs uses a few innovative methods to reduce false positives (if you're interested, see the project webpage at http://security.vt.edu/resources_and_information/find_ssns.html), but it *will* still report some false positives when it scans your computer.
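To give a sense of what a false-positive filter can look like, here is a sketch of the Luhn checksum, a widely used validity test for payment card numbers. This is an illustration only: this article does not state which methods Find_SSNs actually uses, so do not read the sketch as a description of its internals. The function name luhn_valid and the 13-digit minimum length are assumptions made for the example.

```python
def luhn_valid(candidate):
    """Return True if the digits in candidate pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch in "0123456789"]
    if len(digits) < 13:  # assumption: card numbers have at least 13 digits
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


if __name__ == "__main__":
    # 4111 1111 1111 1111 is a well-known test number that passes the check.
    print(luhn_valid("4111 1111 1111 1111"))
    print(luhn_valid("4111 1111 1111 1112"))
```

A random 16-digit string passes this check only about one time in ten, which is why a checksum of this kind removes many, but not all, false matches.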
We've found that the best way to reduce the number of false positives is to scan only the locations on the server that could hold PII, for example /home, /fileshare, etc.
We've included the false positives that Find_SSNs finds on a full scan of a default install of RHEL5 and Solaris 10 in the Find_SSNs packages in the directory named "default_false_positives".
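The advice to scan only locations that could hold PII can be sketched as a small script that prints one Find_SSNs command per directory. It reuses the flags from the full-scan command given later in this article (-p, -o, -t csv, -a) unchanged. The function names and the choice to give each scan its own output subdirectory are assumptions for illustration; whether Find_SSNs creates a missing output directory is not covered here, so you may need to create those directories first. The script only prints the commands and does not run them.

```python
import posixpath


def build_scan_command(scan_path, output_dir, script="Find_SSNs.pyw"):
    """Return the argument list for one scan, mirroring the article's command."""
    return ["python", script, "-p", scan_path, "-o", output_dir, "-t", "csv", "-a"]


def build_targeted_scans(scan_paths, output_root):
    """Build one command per path, each with its own output subdirectory."""
    commands = []
    for scan_path in scan_paths:
        # Turn "/home" into "home"; fall back to "root" for "/" (assumption).
        label = scan_path.strip("/").replace("/", "_") or "root"
        output_dir = posixpath.join(output_root, label) + "/"
        commands.append(build_scan_command(scan_path, output_dir))
    return commands


if __name__ == "__main__":
    # /home and /fileshare are the example locations named in this article.
    for command in build_targeted_scans(["/home", "/fileshare"], "/root/find_ssns"):
        print(" ".join(command))
```

Review each printed command before running it, and adjust the list of directories to match where your users actually store files.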
The steps that are required for Find_SSNs to successfully run:
Note: For the full documentation on Find_SSNs, please refer to the Find_SSNs official documentation, located here: http://security.vt.edu/Find_SSNs/find_ssns_referance_manual.html.
A basic scenario for using Find_SSNs is as follows:
To scan your whole computer for SSNs and CC #'s, use this command:
python Find_SSNs.pyw -p / -o /root/find_ssns/ -t csv -a
After you have reviewed the two output files, securely delete them from the computer.
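One possible way to remove an output file is to overwrite its contents before deleting it. The sketch below does this with random bytes. The function name overwrite_and_delete and its parameters are hypothetical, and the approach carries an important caveat: overwriting in place is not guaranteed to erase the data on journaling or copy-on-write filesystems (ZFS, for example), on SSDs, or in any backups or snapshots. On Linux, the shred utility from GNU coreutils performs a similar overwrite and has the same limitations.

```python
import os


def overwrite_and_delete(path, passes=1, chunk_size=1024 * 1024):
    """Overwrite a file in place with random bytes, then remove it."""
    size = os.path.getsize(path)
    with open(path, "r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            remaining = size
            while remaining > 0:
                block = os.urandom(min(chunk_size, remaining))
                handle.write(block)
                remaining -= len(block)
            # Push the overwritten data out of Python and OS buffers.
            handle.flush()
            os.fsync(handle.fileno())
    os.remove(path)
```

Call the function once for each output file in the output directory, for example the directory passed with -o in the scan command above.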