Because life has no try-catch

Source code swear count


After some discussions in our development team, I got curious about how much we swear in our source code, especially since a complete log of this kind is available on the Linux Swear Count.

I downloaded the script from Vidar and realized it was indeed rubbish ;). I took a small excerpt from it to create my own script, which counts the swears in every .java and .jsp file found in a directory and its subdirectories. It uses basic Linux commands like awk, grep, …

#!/bin/bash
# P_W999 - 2013 - This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
# http://creativecommons.org/licenses/by-nc-sa/3.0/deed.en_US
# This script comes as-is, with no guarantees or warranties, and is used at your own risk.
# https://pw999.wordpress.com
# v0.1 - 2014-04-30
# * Initial release
# v0.2 - 2014-05-01 : changed 'find' so files with spaces in their names work too + no more suffix in the awk match part
# Small script based on http://www.vidarholen.net/contents/wordcount/ (Vidar Holen) to count the swears (and some other words) that can be found in your source code.
# The script takes a single parameter: the folder where your source code is located. For example, ./swearcount.sh ./trunk will look up all nasty words in the trunk folder and its subfolders.
# A word only counts when it directly follows a space, '//' or ',' (otherwise 'fix' would also match 'prefix'). Since v0.2 nothing is checked after the word, so 'fix' still matches 'fixme'.

# Check input
SRCPATH="$1"
if [[ "$SRCPATH" == "" ]] ; then
	echo "You must pass the source code folder as first parameter"
	exit 1
fi

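# Define paths to ignore
# NOTE: the original pattern list is missing from this copy of the script. The value below is only an
# illustrative assumption; use a grep basic regular expression with the alternatives separated by \|
IGNORES='/\.svn/\|/target/'
echo "Script will ignore files with following patterns $IGNORES" | sed 's/\\|/, /g'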

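# Magic (grep -z and xargs -r are GNU extensions)
find "$SRCPATH" -type f \( -name '*.java' -o -name '*.jsp' \) -print0 | grep -z -v -e "$IGNORES" | xargs -0 -r cat | awk '
        BEGIN {
            w="fuck fucking shit love piss fire bastard crap crappy goto bullshit xxx todo fixme temporary bug fix";
            print "Looking for words: " w
            n=split(w,t," ");
            # Start every counter at zero
            for(i=1; i<=n; i++) {
                c[t[i]]=0;
            }
        }
        {
            f=tolower($0);
            for(k in c) {
                r=f;
                do {
                    # A word only counts right after a space, "//" or ","
                    a=index(r," " k);
                    a1=index(r,"//" k);
                    a2=index(r,"," k);
                    if(a!=0 || a1!=0 || a2!=0) {
                        # p is the position of the first match
                        p=0;
                        if(a!=0) p=a;
                        if(a1!=0 && (p==0 || a1<p)) p=a1;
                        if(a2!=0 && (p==0 || a2<p)) p=a2;
                        c[k]++;
                        r=substr(r,p+1);
                        # print the lines found
                        gsub(/^[ \t]+/, "", $0);
                        print k ": " $0
                    }
                    else break;
                } while(1);
            }
            lines++;
        }
        # Print stats
        END {
            print "lines " w;
            printf "%d ",lines;
            for(i=1; i<=n; i++) printf "%d ",c[t[i]];
            printf "\n";
        }'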
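
To see the matching rule in isolation, here is a minimal sketch. It assumes a helper called count_prefixed, which is a hypothetical name and not part of swearcount.sh, and it only illustrates the idea that a word counts when it directly follows a space, '//' or ','. It is a sketch of that rule, not necessarily the exact logic of the script above.

```shell
# Sketch only: count_prefixed is a hypothetical helper, not part of swearcount.sh.
# Usage: count_prefixed WORD TEXT
# Prints how often WORD occurs in TEXT directly after a space, "//" or ",".
count_prefixed() {
    printf '%s\n' "$2" | awk -v k="$1" '
        {
            rest = tolower($0)
            while (1) {
                a  = index(rest, " " k)
                a1 = index(rest, "//" k)
                a2 = index(rest, "," k)
                # p becomes the position of the earliest match, or stays 0
                p = 0
                if (a  != 0) p = a
                if (a1 != 0 && (p == 0 || a1 < p)) p = a1
                if (a2 != 0 && (p == 0 || a2 < p)) p = a2
                if (p == 0) break
                count++
                # Continue searching just behind the start of this match
                rest = substr(rest, p + 1)
            }
        }
        END { print count + 0 }'
}

# " fix" and " fixme" both count, "prefix" does not
count_prefixed fix "// fix the prefix, see FIXME"   # prints 2
```

Because nothing is checked after the word, 'fix' is also counted inside 'fixme'. If that is not what you want, you would have to test the character following the match as well.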


Author: Phillip

Belgian Java EE developer, technology enthusiast and digital photographer urbex'er.

2 thoughts on “Source code swear count”

  1. Pingback: Github top 10 projects: swear count | P_W999

  2. In case you’re wondering, this is the swearcount of some of our client’s projects:
    Our swearcount
