Github top 10 projects: swear count

After yesterday’s post about the swear count script, I got a bit bored and decided to pull the top 10 projects on GitHub and run a swear count on them. Turns out these people are very patient 🙂.

Github top 10 projects - Swear Count

I slightly tweaked the swearcount script so that it matches a word when it is prefixed with a space ‘ ‘, a double forward slash ‘//’ or a comma ‘,’.
It no longer requires a suffix, because that would exclude nasty words at the end of a line or words followed by a dot ‘.’.
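
The prefix rule can be tried out on a few sample lines. The following is only a minimal sketch using grep instead of the awk code in the script; `count_prefixed` and the sample input are made up for illustration.

```shell
#!/bin/bash
# count_prefixed is a hypothetical helper, not part of the original script.
# It reads text on stdin and counts how often WORD appears directly after
# a space, a comma or "//". WORD is used as a regex, so keep it to plain letters.
count_prefixed() {
    # -o: print each match on its own line, -i: ignore case, -E: extended regex
    grep -oiE "( |,|//)$1" | wc -l | tr -d ' '
}

# Made-up sample input: "prefix" must not count, the other three lines must.
printf '%s\n' 'int prefix = 1;' '// fix this later' 'a,fix' 'please fix.' | count_prefixed fix
# prints 3
```

Note that without a suffix check a longer word such as ‘fixme’ also counts as a match for ‘fix’.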

 

If you’re interested, this is the tweaked swearcount script:

#!/bin/bash
# P_W999 - 2013 - This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
# http://creativecommons.org/licenses/by-nc-sa/3.0/deed.en_US
# This script comes as-is, with no guarantees or warranties and it is used at own risk.
# https://pw999.wordpress.com
# v0.1 - 2014-04-30
# * Initial release
# Small script based on http://www.vidarholen.net/contents/wordcount/ (Vidar Holen) to calculate the number of swears (and some other words) that can be found in your source code.
# The script takes a single parameter: the folder where your source code is located. So for example ./swearcount.sh ./trunk will look up all nasty words in the trunk folder and its subfolders.
# The script matches a word when it is prefixed with a space, a comma or '//' (otherwise 'fix' would also match 'prefix'); no suffix is required.
 
# Check input
SRCPATH="$1"
if [[ "$SRCPATH" == "" ]] ; then
    echo "You must pass the source code folder as first parameter"
    exit 1
fi
# Define paths to ignore
IGNORES='branch\|tag\|target\|generated'
# echo "Script will ignore files with following patterns $IGNORES" | sed 's/\\|/, /g'


# Magic: list all files, drop the ignored paths, concatenate the rest and count the words with awk.
# Variables are quoted and file names are passed NUL-separated so that paths with spaces survive.
find "$SRCPATH" -type f | grep -v -e "$IGNORES" | tr '\n' '\0' | xargs -0 cat | awk '
	        BEGIN {
			lines=0;
			w="fuck shit love piss fire bastard crap goto bullshit xxx todo fixme temporary bug fix";
			#    print "Looking for words: " w
			n=split(w,t," ");
			for(i=1; i<=n; i++) {
				c[t[i]]=0;
			}
		}
		{
			lines++;
			line=tolower($0);
			for(k in c) {
				f=line;
				while(1) {
					# find the earliest position where k follows a space, a comma or "//"
					p=0; len=0;
					a=index(f," " k);   if(a!=0) { p=a; len=1; }
					a2=index(f,"," k);  if(a2!=0 && (p==0 || a2<p)) { p=a2; len=1; }
					a4=index(f,"//" k); if(a4!=0 && (p==0 || a4<p)) { p=a4; len=2; }
					if(p==0) break;
					c[k]++;
					# print the lines found
					# print k ": " $0
					# continue searching right after this match
					f=substr(f,p+len+length(k));
				}
			}
		}
		# Print stats
		END {
			print "lines " w;
			printf "%d ",lines;
			for(i=1; i<=n; i++) printf "%d ",c[t[i]];
			printf "\n";
		}'
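
The script prints two lines: a header with the words and a line with the counts. A small sketch for reading that output as word/count pairs could look like this; `to_pairs` and the sample numbers are made up, they are not real results.

```shell
#!/bin/bash
# to_pairs is a hypothetical helper, not part of the original script.
# It reads the two-line output (header line, then counts line) on stdin
# and prints one "word count" pair per line.
to_pairs() {
    awk 'NR==1 { n=NF; for (i=1; i<=n; i++) h[i]=$i }
         NR==2 { for (i=1; i<=n; i++) print h[i], $i }'
}

# Made-up sample output with a shortened word list.
printf '%s\n' 'lines todo fixme' '120 4 1 ' | to_pairs
```

With the real script this would be used as ./swearcount.sh ./trunk | to_pairs.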

And this is the script I wrote to get the results:

#!/bin/bash
# P_W999 - 2013 - This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
# http://creativecommons.org/licenses/by-nc-sa/3.0/deed.en_US
# This script comes as-is, with no guarantees or warranties and it is used at own risk.
# https://pw999.wordpress.com
# v0.1 - 2014-05-01
# * Initial release


# Step 1: call the GitHub API, find everything with more than 1 star and sort the result by number of stars. Then grep the first 10 'clone_url' lines found and remove unneeded chars using the following sed file (saved as 'seds'):
#
#s/"//g
#s/,//g
#s/clone_url//g
#s/://

curl -s "https://api.github.com/search/repositories?q=stars:%3E1&sort=stars" | grep -i "clone_url" -m 10 | sed -f seds > urls.out

# Step 2: clone the top 10 projects on disk
xargs -a urls.out -n 1 git clone
# Too lazy 🙂
mkdir output

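# List all folders (the cloned GitHub repos plus 'output' itself), filter out the current directory '.', then pass each one to the swearcount script and redirect the output.
# The run on 'output' produces output/output.out, which is only used for the header line below.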
find . -maxdepth 1 -type d | grep -e '[a-zA-Z0-9/]' | sed s/\\.\\///g | xargs -I{} sh -c "./swearcount.sh ./{} > output/{}.out"

# Generate output file
# Get header and append projectname
cd output
sed -n '1p' output.out | sed s/^/"project "/ > result.txt

# Get numbers and append name
find . -type f -not -name output.out -not -name result.txt | xargs -n 1 -I{} sh -c "echo -n '{} ' && sed -n '2p' {}" >> result.txt
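
The sed file from the comments in step 1 is not shown as a separate file, so here is a minimal sketch of how it could be created and what it does to a single line. The sample line, the `clean_url` helper and the temporary directory are assumptions for illustration only.

```shell
#!/bin/bash
# Write the four sed commands from the script comments into a file.
# The original script expects this file to be called "seds" in the working
# directory; a temporary directory is used here only to keep the sketch tidy.
workdir=$(mktemp -d)
cat > "$workdir/seds" <<'EOF'
s/"//g
s/,//g
s/clone_url//g
s/://
EOF

# clean_url is a hypothetical helper: apply the sed file, then strip the
# leftover indentation (the tr step is an addition, not in the original).
clean_url() {
    sed -f "$workdir/seds" | tr -d ' '
}

# Made-up sample line in the shape of a "clone_url" entry in the API response.
sample='      "clone_url": "https://github.com/example/project.git",'
printf '%s\n' "$sample" | clean_url
# prints https://github.com/example/project.git
```

The last sed command has no g flag on purpose: it only removes the first colon (the one after clone_url) and leaves the colon in https:// alone.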

Disclaimer: results come as they are, I do not guarantee that they are 100% correct. This is just something I did for fun, no pun intended.
