Installing and Using Find_SSNs on Linux and Solaris

This article is intended for faculty and staff who may have certain types of personal information (PI) stored on a computer running the Linux or Solaris operating system.

Overview:



Find_SSNs is a program written in Python at Virginia Tech that searches a computer's files for Social Security numbers (SSNs) and credit card numbers. It requires Python 2.4 or later. By default Find_SSNs searches the following file types: doc, docx, xls, xlsx, rtf, zip, text files (e.g. html, xml, txt) and OpenOffice 2 documents. It can also search PDF files when the pdftotext binary, which is part of the poppler package, is installed. We provide two versions of Find_SSNs: one that searches PDFs and one that doesn't (in case you can't install the poppler package). The instructions below include the steps needed to install the poppler package.

The Find_SSNs software webpage at Virginia Polytechnic Institute is located here: http://security.vt.edu/resources_and_information/find_ssns.html

The full Find_SSNs documentation at Virginia Polytechnic Institute is located here: http://security.vt.edu/Find_SSNs/find_ssns_referance_manual.html

 
Installation:



Linux:

Note: While these install steps should work on any modern Linux distribution, we've only verified them on RHEL5, RHEL6 and Ubuntu 11.04.


The requirements to run Find_SSNs are:

  • Python 2.4 or later
  • The pdftotext binary, which is part of the poppler-utils package on both RHEL and Ubuntu.

Note: Default installs of RHEL and Ubuntu include Python.

  • Grab a copy of Find_SSNs here: http://www.hawaii.edu/its/docs/find_ssns.tar
  • Extract Find_SSNs into root's home directory or another location that only root can access.
  • Before you run Find_SSNs you need to have the poppler-utils package installed.
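If you prefer to script the extraction step above, the following is a minimal Python 3 sketch (a separate helper, not part of Find_SSNs). It restricts the destination directory to its owner (mode 0700) and then unpacks the tarball into it. The destination path in the comment is only an example; run it as root so that root owns the directory.

```python
import os
import stat
import tarfile


def extract_private(tar_path, dest_dir):
    """Unpack tar_path into dest_dir and leave dest_dir accessible only to its owner."""
    os.makedirs(dest_dir, exist_ok=True)
    # Lock the directory down first so extracted files are never exposed to other users.
    os.chmod(dest_dir, stat.S_IRWXU)  # 0o700: owner read/write/execute only
    with tarfile.open(tar_path) as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths, ".." entries, device files, etc.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    # Re-apply in case the archive contained an entry for "." that changed the mode.
    os.chmod(dest_dir, stat.S_IRWXU)
    return dest_dir


# Example (hypothetical destination), run as root:
# extract_private("find_ssns.tar", "/root/find_ssns_tool")
```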


On RHEL5 and RHEL6, install poppler-utils with this command:

yum install poppler-utils

On Ubuntu install it with this command:

apt-get install poppler-utils

Note: If you can't install poppler-utils, you can download a copy of Find_SSNs with PDF searching turned off: http://www.hawaii.edu/its/docs/find_ssns_nopdf.tar
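The choice described in the steps above (install poppler-utils with yum or apt-get, or fall back to the no-PDF build) can be summarized as a small decision helper. This is a hypothetical Python 3 sketch, not part of Find_SSNs: it only looks up commands on $PATH and returns a suggestion; it does not install anything.

```python
import shutil

NOPDF_URL = "http://www.hawaii.edu/its/docs/find_ssns_nopdf.tar"


def pdf_support_plan(which=shutil.which):
    """Suggest how to get PDF scanning working on this Linux host.

    `which` is injectable so the logic can be tested without touching $PATH.
    """
    if which("pdftotext"):
        return "pdftotext already installed; use the standard find_ssns.tar"
    if which("yum"):
        return "yum install poppler-utils"
    if which("apt-get"):
        return "apt-get install poppler-utils"
    # No supported package manager found: fall back to the build with PDF search off.
    return "use " + NOPDF_URL
```

Calling print(pdf_support_plan()) as root on the target host shows the suggested next step.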

Solaris 10:

Requirements:

  • Python 2.4+ (part of a standard Solaris 10 install)
  • The pdftotext binary, which is part of the poppler package (only needed if you want to search PDFs)

    The poppler package is not part of the default Solaris install, so it must be installed from a third-party package along with its dependencies. To simplify this, we've created a tar download that bundles poppler and its dependencies with an install script that automates the installation. The script verifies that none of the packages it installs are already on the system, and it installs them under the /usr/local directory structure.
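The bundled install script itself isn't reproduced in this article. As a hypothetical illustration of its "already installed" check: on Solaris, pkginfo -q <pkg> prints nothing and exits with status 0 when that package instance is installed. The Python sketch below uses this to list packages that would clash. The package names in the comment are placeholders, since the bundle's actual package names aren't given here.

```python
import subprocess


def already_installed(pkgs, run=subprocess.call):
    """Return the Solaris package instance names from pkgs that are already installed.

    `pkginfo -q <pkg>` exits 0 when the package is installed. `run` is
    injectable so the logic can be exercised on a non-Solaris machine.
    """
    return [p for p in pkgs if run(["pkginfo", "-q", p]) == 0]


# Hypothetical package names, for illustration only:
# clashes = already_installed(["EXAMPLEpoppler", "EXAMPLElibjpeg"])
# If clashes is non-empty, resolve those packages before installing into /usr/local.
```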


Scanning:



Scanning your file systems for files that contain SSNs or credit card numbers works the same way on all Unix and Linux systems.

Note: Find_SSNs uses a few innovative methods to reduce false positives (if you're interested, see its web page http://security.vt.edu/resources_and_information/find_ssns.html), but it *will* still report some false positives when it scans your computer.
         We've found that the best way to reduce the number of false positives is to scan only the locations on your servers that could hold PII, for example /home, /fileshare, etc.
         The Find_SSNs packages include a directory named "default_false_positives" that lists the false positives Find_SSNs reports on a full scan of a default install of RHEL5 and Solaris 10.
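Following the advice above to scan only locations that could hold PII, one way to script it is to build a separate Find_SSNs command for each directory. This hypothetical Python 3 helper uses only the flags documented in this article (-p, -o, -t csv, -a). Giving each scan its own output directory is a design choice of this sketch to keep results apart; we haven't confirmed whether Find_SSNs creates a missing output directory, so you may need to create them first.

```python
import os


def scan_commands(paths, out_root="/root/find_ssns"):
    """Build one Find_SSNs command line (as an argv list) per directory to scan."""
    cmds = []
    for path in paths:
        # e.g. /home -> /root/find_ssns/home, so each scan's results stay separate
        name = path.strip("/").replace("/", "_") or "root"
        out_dir = os.path.join(out_root, name)
        cmds.append(["python", "Find_SSNs.pyw", "-p", path,
                     "-o", out_dir, "-t", "csv", "-a"])
    return cmds
```

Each returned list can be passed to subprocess.call() from the directory that contains Find_SSNs.pyw.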

Find_SSNs needs the following to run successfully:

  • An active Internet connection. At startup, Find_SSNs contacts a Virginia Tech web server to download the latest SSN patterns. This greatly reduces the number of false positives, because Find_SSNs flags only numbers that could actually be valid SSNs.
  • The Python interpreter in your $PATH environment variable.
  • The pdftotext binary in your $PATH environment variable (unless you are using the version of Find_SSNs with PDF support disabled).
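Find_SSNs gets its current SSN pattern data from Virginia Tech, and that logic isn't reproduced here. As a simplified illustration of the idea of flagging only "numbers that could possibly make a real SSN", the Python 3 sketch below applies just the fixed structural rules: area numbers 000, 666 and 900-999, group 00 and serial 0000 are never issued. It is not Find_SSNs' algorithm.

```python
import re

# Nine digits, optionally written as AAA-GG-SSSS.
SSN_RE = re.compile(r"\b(\d{3})-?(\d{2})-?(\d{4})\b")


def plausible_ssn(area, group, serial):
    """Structural check only: reject number ranges that are never issued as SSNs."""
    a, g, s = int(area), int(group), int(serial)
    if a == 0 or a == 666 or a >= 900:
        return False
    return g != 0 and s != 0


def find_plausible_ssns(text):
    """Return SSN-shaped strings in text that pass the structural check."""
    return [m.group(0) for m in SSN_RE.finditer(text) if plausible_ssn(*m.groups())]
```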


Note: For the full documentation on Find_SSNs, please refer to the Find_SSNs official documentation, located here: http://security.vt.edu/Find_SSNs/find_ssns_referance_manual.html.

A basic example of using Find_SSNs:

To scan your whole computer for SSNs and credit card numbers, use this command:

python Find_SSNs.pyw -p / -o /root/find_ssns/ -t csv -a

  • '-p' indicates the starting path.
  • '-o' indicates the directory to output the scan results.
  • '-t' tells Find_SSNs that you want your results in a CSV file.
  • '-a' tells Find_SSNs to search for both SSNs and credit card numbers.
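Card-number scanners commonly use the Luhn (mod 10) checksum to discard most random digit strings. This article doesn't describe how Find_SSNs validates card numbers internally, so treat this Python 3 sketch as a generic illustration of the Luhn technique rather than Find_SSNs' code. Restricting candidates to 13 to 19 digits is an assumption of this sketch, matching common card brands.

```python
def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn (mod 10) checksum.

    Non-digit characters such as spaces and dashes are ignored. Only 13-19
    digit candidates are considered (an assumption made for this sketch).
    """
    digits = [int(c) for c in number if c in "0123456789"]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost (check) digit, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total % 10 == 0
```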


Find_SSNs outputs two files:
  • A CSV file that lists the files containing suspicious numbers.
  • A txt file that lists the file names and the actual suspicious numbers.

After reviewing the two output files, securely delete them from the computer; the txt file in particular contains the suspicious numbers themselves.
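On Linux, GNU coreutils' shred -u is the usual tool for this step. Where shred may not be available (for example on some Solaris systems), a minimal Python 3 sketch of the same idea is below: overwrite the file's contents with random bytes, flush to disk, then unlink it. Treat it as best effort only: on copy-on-write file systems such as ZFS, on journaling file systems, and on SSDs, overwriting in place is not guaranteed to erase the original data blocks.

```python
import os


def overwrite_and_delete(path, passes=1):
    """Best-effort secure delete: overwrite with random bytes, fsync, then unlink."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                chunk = min(remaining, 1024 * 1024)  # write in 1 MiB pieces
                f.write(os.urandom(chunk))
                remaining -= chunk
            f.flush()
            os.fsync(f.fileno())  # push the overwrite to disk before deleting
    os.remove(path)
```

For example, call it on the CSV and txt files in /root/find_ssns/ once you have finished reviewing them.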

Article ID: 1323
Created: Sun, 11 Sep 2011 4:14pm
Modified: Fri, 26 Oct 2012 10:45am