Installing and Using Find_SSNs on Linux and Solaris

This article is intended for faculty and staff who may have certain types of PI (personal information) stored on a computer running the Linux or Solaris operating system.

Overview:

Find_SSNs is a program written in Python at Virginia Tech that searches a computer's files for Social Security numbers and credit card numbers. It requires Python version 2.4+ to run. By default, Find_SSNs searches the following file types: doc, docx, xlsx, xls, rtf, zip, text files (e.g. html, xml, txt) and OpenOffice 2 documents. It can additionally search PDF files when the pdftotext binary, which is part of the poppler package, is installed. We provide two versions of Find_SSNs: one that searches PDFs and one that does not (in case you can't install the poppler package). The instructions below include the steps needed to install the poppler package.

The Find_SSNs software webpage at Virginia Polytechnic Institute is located here: https://security.vt.edu/software/Find_SSNs.html
The full Find_SSNs documentation at Virginia Polytechnic Institute is located here: https://security.vt.edu/software/Find_SSNs/find_ssns_referance_manual.html
Find_SSNs download: http://www.hawaii.edu/its/docs/find_ssns.tar
Find_SSNs download without PDF support: http://www.hawaii.edu/its/docs/find_ssns_nopdf.tar

Requirements:

Linux:

  • Python v2.4+. Note: default installs of RHEL and Ubuntu already include Python.
  • pdftotext binary, which is part of the poppler-utils package on both RHEL and Ubuntu.

Solaris 10:

  • Python v2.4+
  • pdftotext binary, which is part of the poppler package* (only if you need to search PDFs).

*The poppler package is not part of the default Solaris install, so it must be installed, along with its dependencies, from third-party packages. To simplify this, we've created a tar download that bundles the poppler package and its dependencies with an install script that automates the installation.
The script checks whether each package is already installed on the system before installing it, and it installs the packages under the /usr/local directory structure.
Download the Solaris poppler install bundle here: http://www.hawaii.edu/its/docs/poppler_install.tar.gz
Extract it and run 'install_poppler.sh', which installs poppler and any of its dependencies that are not already installed.

Installation:

Linux:
Note: While these install steps should work on any modern Linux distribution, we've only verified them on RHEL5, RHEL6 and Ubuntu 11.04.

  1. Install Python, if it is not already installed.
  2. Install the poppler-utils package:
    1. On RHEL, install it with this command: yum install poppler-utils
    2. On Ubuntu, install it with this command: apt-get install poppler-utils
  3. Download a copy of Find_SSNs here: http://www.hawaii.edu/its/docs/find_ssns.tar
  4. Extract Find_SSNs to the root user's home directory, or to another location that only root can access.
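
The extraction step can also be scripted. The following is a minimal sketch in Python 3, not part of Find_SSNs itself: the function name extract_private and the destination /root/find_ssns used in the usage note are assumptions chosen for illustration. It creates the destination directory with owner-only permissions and refuses archive entries that would be written outside that directory.

```python
import os
import tarfile


def extract_private(archive_path, dest_dir):
    """Extract a tar archive into dest_dir, which only its owner can access."""
    # Create the destination and restrict it to the owner (rwx------).
    os.makedirs(dest_dir, exist_ok=True)
    os.chmod(dest_dir, 0o700)

    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path) as archive:
        for member in archive.getmembers():
            # Refuse entries that would land outside the destination.
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError("unsafe path in archive: " + member.name)
        archive.extractall(dest_root)
    return dest_root
```

Run as root, a call such as extract_private("find_ssns.tar", "/root/find_ssns") would leave the extracted files in a directory that only root can read.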

Note: If for some reason you can't install poppler-utils to scan PDF files, you can download a copy of Find_SSNs with PDF searching turned off: http://www.hawaii.edu/its/docs/find_ssns_nopdf.tar

Solaris 10:

  1. Install the poppler package. It is not installed on Solaris by default and must be installed, along with its dependencies, from a third-party package (see the install bundle under Requirements).
  2. Download the Find_SSNs package here: http://www.hawaii.edu/its/docs/find_ssns.tar
  3. Extract the Find_SSNs package.

Note: If you cannot install poppler and its dependencies to search PDFs, you can download a copy of Find_SSNs with PDF support turned off: http://www.hawaii.edu/its/docs/find_ssns_nopdf.tar

Before Scanning:

  • Ensure poppler-utils (poppler on Solaris) is installed, if you plan to search PDFs.
  • Ensure you are connected to the Internet, so the program can download the latest SSN patterns.
  • Ensure the Python interpreter is in your $PATH environment variable.
  • Ensure the pdftotext binary is in your $PATH environment variable (unless you are using the version of Find_SSNs with PDF support disabled).
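
The two $PATH checks above can be automated. This is a minimal sketch in Python 3, separate from Find_SSNs itself; the function name missing_tools and its parameters are assumptions for illustration. It looks up the command names "python" and "pdftotext" on the search path and does not test the Internet connection.

```python
import shutil


def missing_tools(need_pdf=True, path=None):
    """Return the names of required programs that cannot be found.

    path is an optional search path string; None means use $PATH.
    """
    required = ["python"]
    if need_pdf:
        # pdftotext is only needed by the PDF-enabled version of Find_SSNs.
        required.append("pdftotext")
    return [name for name in required if shutil.which(name, path=path) is None]
```

An empty list from missing_tools() means both programs were found. Pass need_pdf=False if you are using the version of Find_SSNs with PDF support disabled.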

Reducing False Positives:

Find_SSNs uses a few innovative methods to reduce false positives, but it *will* still find some false positives when it scans your computer.

  • We've found that the best way to reduce the number of false positives is to scan only the locations on the server that could hold personal information, for example /home, /fileshare, etc.
  • The Find_SSNs packages include a directory named "default_false_positives", which contains the false positives that Find_SSNs reports on a full scan of a default install of RHEL5 and Solaris 10.
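
One way to use a list of known false positives is to filter them out of a scan report before review. The sketch below, in Python 3, rests on two assumptions that this article does not confirm: that you have a plain-text file of known false-positive paths, one per line (the layout of the "default_false_positives" directory is not described here), and that each row of the CSV report holds the file path in one of its cells. The function names are hypothetical.

```python
import csv


def load_known_paths(list_file):
    """Read known false-positive paths, one per line, ignoring blank lines."""
    with open(list_file) as handle:
        return {line.strip() for line in handle if line.strip()}


def drop_known(report_csv, known_paths):
    """Return the report rows in which no cell matches a known path."""
    with open(report_csv, newline="") as handle:
        return [row for row in csv.reader(handle)
                if not any(cell in known_paths for cell in row)]
```

Rows that remain still need manual review. The filter only removes exact path matches, so a file that was modified after the default install is treated like any other known path and should be checked separately.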

Scanning:

For the full documentation on Find_SSNs, please refer to the official Find_SSNs documentation, located here: http://security.vt.edu/software/Find_SSNs/find_ssns_referance_manual.html
Scanning your file systems for files that contain SSNs or credit card numbers works the same way on all Unix and Linux systems.

To scan your whole computer for SSNs and credit card numbers, use this command:
python Find_SSNs.pyw -p / -o /root/find_ssns/ -t csv -a

    '-p' indicates the starting path.
    '-o' indicates the directory where the scan results are written.
    '-t' tells Find_SSNs that you want your results in a CSV file.
    '-a' tells Find_SSNs to search for both SSNs and credit card numbers.
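
If you want to run the same scan from a script, for example against a single directory such as /home instead of /, the command can be assembled programmatically. This is a minimal sketch in Python 3 that uses only the four flags described above; the function names build_scan_command and run_scan are assumptions for illustration, and the sketch expects to be run from the directory that contains Find_SSNs.pyw.

```python
import subprocess


def build_scan_command(start_path, output_dir, script="Find_SSNs.pyw"):
    """Build the argument list for the scan command shown in this article."""
    return ["python", script,
            "-p", start_path,   # starting path
            "-o", output_dir,   # directory for the scan results
            "-t", "csv",        # results in a CSV file
            "-a"]               # search for both SSNs and credit card numbers


def run_scan(start_path, output_dir):
    """Run one scan and return the exit status of the Find_SSNs process."""
    return subprocess.run(build_scan_command(start_path, output_dir)).returncode
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces. The "python" in the command is whichever interpreter comes first on $PATH, which may differ from the one running this helper.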

Note: If you receive the error "Error - Cannot load SSA areas to groups information. Are you connected to the Internet?", replace the URL on line 89 of numbers.py with http://www.hawaii.edu/infosec/assets/find_ssn/areas_groups.txt

Output Files:

  • A CSV file that lists the files containing suspicious numbers.
  • A TXT file that lists the filenames and the actual suspicious numbers.

After you have reviewed the two output files, securely delete them from the computer.
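
This article does not prescribe a deletion method. As one possible approach, the sketch below (Python 3, with the hypothetical function name overwrite_and_delete) overwrites a file in place with random bytes before removing it, which is similar in spirit to the shred utility from GNU coreutils. Treat it as best effort: on journaling or copy-on-write file systems such as ZFS, on SSDs, and wherever backups or snapshots exist, earlier copies of the data may survive an in-place overwrite.

```python
import os


def overwrite_and_delete(path, passes=1):
    """Overwrite a file in place with random bytes, then remove it."""
    size = os.path.getsize(path)
    with open(path, "r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            remaining = size
            while remaining > 0:
                # Write in chunks of at most 1 MiB to limit memory use.
                chunk = min(remaining, 1024 * 1024)
                handle.write(os.urandom(chunk))
                remaining -= chunk
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push the overwrite to disk
    os.remove(path)
```

Call it once for each of the two output files after you have finished reviewing them.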

Article ID: 1323
Created: Sun, 11 Sep 2011 4:14pm
Modified: Tue, 15 Sep 2020 1:40pm