Script to Find Referenced Files in HTML, ColdFusion, and Other Languages

The Problem: Missing References in ColdFusion Code

Some rip-off contractors dumped a pile of trash-filled ColdFusion code on some totally naive (or crooked) project managers. Not one process ran without errors, and the web server logs were full of 404 errors. The project manager told me to make it work. The code came as a big wad of multiple directory trees holding many thousands of files of many types, with no installation package and no build tools.

I decided to run a bunch of lint-type searches on the ColdFusion files, one for each type of error (such as missing references), to document the bugs. I found thousands of errors: missing file references, unused files, empty files, and empty directories, as well as many coding errors that indicated the contractor never tested the code.
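Two of the simpler searches, for empty files and empty directories, can be sketched as follows. This is a minimal illustration, not the exact commands used on the project: the function names are hypothetical, and -empty is an extension available in GNU and BSD find rather than a POSIX option.

```shell
#!/usr/bin/env bash
# Sketch: list empty files and empty directories under a code tree.
# find_empty_files and find_empty_dirs are hypothetical names.
# -empty is a GNU/BSD find extension, not part of POSIX find.

find_empty_files () {
    # $1 is the root of the tree to search (defaults to the current directory).
    find "${1:-.}" -type f -empty | sort
}

find_empty_dirs () {
    # A directory counts as empty when it has no entries at all.
    find "${1:-.}" -type d -empty | sort
}
```

Calling find_empty_files docs and find_empty_dirs docs prints one path per line, which is easy to paste into a bug report.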

A project that uses common software engineering processes (specifications, testing to specifications, bug tracking, build tools, packaging code for release, regression testing, and release testing) would not turn up much with a lint script. Lint-type programs usually give out a bunch of false positives. On this bunch of code, though, there were so many real errors that I would have been happy to see a false positive.

Here is an example of a bash shell script that finds missing files referenced from "img" tags in HTML. It is easy to modify for other file reference tags, like "link" tags for CSS, "script" tags for JavaScript, "form" tags, "a" tags, ColdFusion "cfinclude" tags, etc. I did exactly that and found hundreds of missing references for every type of tag I checked. It is a quick and dirty script, but it was useful for the situation.

Missing File Reference Script for Tag Based Languages

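#######################
# Find HTML files that reference image files through img tags,
# and report every referenced image that does not exist.
# Uses dirname to resolve each reference relative to the file that contains it.
#
# Note: the browser swallowed part of the published script when it
#       interpreted the embedded tags. The grep pattern, the outer loop
#       header, and the comment-skipping awk rule below are reconstructed
#       from the surviving fragments and are assumptions.

# Root directory of code files.
LOCATION=docs

find_img () {

cd "${LOCATION}" || return 1
pwd

# Reconstructed: list the files that contain at least one img tag.
CHECK_FILES=$(find . -name "*.htm*" -exec grep -li '<img' {} +)

for FILE in ${CHECK_FILES}
do
     DIRECTORYROOT=$(dirname "${FILE}")

     # First rule (assumed): skip HTML comments.
     # Second rule: for each img tag, keep only the value of its src attribute.
     # The angle brackets are left unescaped because gawk treats \< and \>
     # as word boundaries, not as literal characters.
     REFERENCES=$(awk '/<!--/, /-->/ {next;}
     /<[Ii][Mm][Gg] /, />/ {
                              sub(/^.*[Ss][Rr][Cc]=/,"");
                              $0=$1;
                              gsub(/"/,"");
                              sub(/\/?>.*$/,"");   # drop a trailing > or />
                              print;}' "${FILE}" |
                                 grep -Ei 'gif|jpg|jpeg|png' | sort -u  )

     for REF_FILE in ${REFERENCES}
     do
         # Report references whose target does not exist.
         [ -e "${DIRECTORYROOT}/${REF_FILE}" ] ||
             printf 'file:%s     missing:%s\n' "${FILE}" "${REF_FILE}"
     done
done
}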

find_img

# Here are some other types of file reference that can use
# a modified form of this script:
#cfinclude template = "../cfdocs/dochome.htm"
#link rel="prefetch" href="/images/big.jpeg"
#link rel="StyleSheet" href="CSS/default.css" type="text/css" 
#form method="get" action="/some/form_script/form_stuff.php"
#script src="missing_javascript_file.js" type="text/javascript"
#a href="missing_link" ...
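
One way to cover the other tags listed above without copying the script for each one is to pass the tag name and attribute name as parameters. The sketch below is an illustration under stated assumptions, not the script used on the project: find_missing_refs and its four arguments are hypothetical, it only handles double-quoted attribute values on a tag that fits on one line, and it treats every reference as relative to the directory of the referencing file, so root-relative paths such as /images/big.jpeg and full URLs will be reported as missing.

```shell
#!/usr/bin/env bash
# Sketch: report missing files referenced by any tag/attribute pair.
# find_missing_refs is a hypothetical helper.
# Usage: find_missing_refs TAG ATTR FILE_GLOB [ROOT_DIR]

find_missing_refs () {
    local tag=$1 attr=$2 pattern=$3 root=${4:-.}
    local file ref dir

    find "$root" -type f -name "$pattern" | while IFS= read -r file
    do
        dir=$(dirname "$file")

        # grep -o prints each tag from its opening bracket up to the closing
        # quote of ATTR; sed then keeps only the text inside the last quotes.
        grep -Eio "<${tag}[[:space:]]([^>]*[[:space:]])?${attr}[[:space:]]*=[[:space:]]*\"[^\"]*\"" "$file" |
            sed -E 's/.*"([^"]*)"$/\1/' | sort -u |
        while IFS= read -r ref
        do
            # Report references whose target does not exist.
            [ -e "${dir}/${ref}" ] ||
                printf 'file:%s     missing:%s\n' "$file" "$ref"
        done
    done
}
```

For example, find_missing_refs cfinclude template '*.cfm' docs checks ColdFusion includes, and find_missing_refs script src '*.htm*' docs checks JavaScript references. Reading file names line by line, instead of looping over an unquoted variable, also keeps paths that contain spaces intact.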