P_W999

Because life has no try-catch

Source code swear count

After some discussion in our development team, I got curious about how much we swear in our source code, especially since a complete log of exactly that is available in the Linux Swear Count.

I downloaded Vidar's script and realized it was indeed rubbish ;). I used a small excerpt from it to create my own script, which counts swears in every source file (.java and .jsp) found in a directory. It uses basic Linux commands like awk, grep, …

#!/bin/bash
# P_W999 - 2013 - This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
# http://creativecommons.org/licenses/by-nc-sa/3.0/deed.en_US
# This script comes as-is, with no guarantees or warranties; use it at your own risk.
# http://pw999.wordpress.com
# v0.1 - 2014-04-30 : initial release
# v0.2 - 2014-05-01 : changed 'find' so files with spaces in their names work too + no more suffix in the awk match part
# Small script based on http://www.vidarholen.net/contents/wordcount/ (Vidar Holen) to calculate the number of swears (and some other words) that can be found in your source code.
# The script takes a single parameter: the folder where your source code is located. So for example ./swearcount.sh ./trunk will look up all nasty words in the trunk folder and its subfolders.
# The script matches a word only when it is preceded by a space, "//" or "," (otherwise 'fix' would also match 'prefix').
# Since v0.2 nothing is checked after the word, so 'fix' also matches 'fixed'.

# Check input
SRCPATH="$1"
if [[ "$SRCPATH" == "" ]] ; then
	echo "You must pass the source code folder as first parameter"
	exit 1
fi

# Define paths to ignore
IGNORES='branch\|tag\|target\|generated\|test'
echo "Script will ignore files with following patterns $IGNORES" | sed 's/\\|/, /g'

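# Magic: list the files NUL-delimited (so names with spaces survive), drop ignored paths, then count in awk.
# The \( ... \) grouping is needed so that -type f and -print0 apply to both name patterns.
# Note: grep -z and xargs -0 are GNU extensions.
find "$SRCPATH" -type f \( -name '*.java' -o -name '*.jsp' \) -print0 | grep -z -v -e "$IGNORES" | xargs -0 cat | awk '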
        BEGIN {
            lines=0;
            w="fuck fucking shit love piss fire bastard crap crappy goto bullshit xxx todo fixme temporary bug fix";
            print "Looking for words: " w
            n=split(w,t," ");
            for(i=1; i<=n; i++) {
                c[t[i]]=0;
            }
        }
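        {
            lines++;
            # Trimmed copy for printing; $0 itself stays untouched so every word sees the same line
            line=$0;
            sub(/^[ \t]+/, "", line);
            for(k in c) {
                f=tolower($0);
                do {
                    a=index(f," " k);
                    a1=index(f,"//" k);
                    a2=index(f,"," k);
                    if(a==0 && a1==0 && a2==0) break;
                    # Pick the earliest match (p) and remember the length of its prefix (plen)
                    p=0; plen=0;
                    if(a!=0) { p=a; plen=1; }
                    if(a1!=0 && (p==0 || a1<p)) { p=a1; plen=2; }
                    if(a2!=0 && (p==0 || a2<p)) { p=a2; plen=1; }
                    c[k]++;
                    # print the lines found
                    print k ": " line;
                    # Continue searching right after this match
                    f=substr(f,p+plen+length(k));
                } while(1);
            }
        }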
	# Print stats
        END {
            print "lines " w;
            printf "%d ",lines;
            for(i=1; i<=n; i++) printf "%d ",c[t[i]];
            printf "\n";
        }'
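
To try the matching rule on its own, here is a minimal sketch that is not part of the script above. The helper name `count_word` is hypothetical, and it uses awk's `match()` with a small regular expression instead of the three `index()` calls, assuming the word contains no regex metacharacters. It counts how often a word shows up on stdin when preceded by a space, "//" or ",".

```shell
#!/bin/bash
# Hypothetical helper (an illustration, not the original script):
# count occurrences of word $1 on stdin when preceded by " ", "//" or ",".
count_word() {
    awk -v k="$1" '
        {
            f = tolower($0)
            # Find the leftmost prefixed match, count it, then keep
            # searching in whatever follows that match.
            while (match(f, "( |//|,)" k)) {
                n++
                f = substr(f, RSTART + RLENGTH)
            }
        }
        END { print n + 0 }'
}

# "quick fix" and "//FIX" count, "prefix" does not.
printf '%s\n' '// quick fix for the prefix bug' 'int a = 1; //FIX later' | count_word fix
# prints 2
```

As in the script, nothing is checked after the word, so `count_word fix` would also count "fixed" or "fixme".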

Author: Phillip

Belgian Java EE consultant, software developer, technology enthusiast and digital photographer.

2 thoughts on “Source code swear count”

  1. Pingback: Github top 10 projects: swear count | P_W999

  2. In case you’re wondering, this is the swearcount of some of our client’s projects:
    Our swearcount
