Recently, I had to convert the encoding of a multi-module Maven project from our default Cp-1252 encoding to UTF-8. Changing the project settings is rather easy and there are multiple guides available on the internet, so I won’t reinvent the wheel.
The most difficult task, however, was converting all our source files from Cp-1252 to UTF-8, preferably on Windows 🙂 . I looked into applications that would auto-convert everything for me, but none of them actually converted the content, resulting in garbage files. I had almost started converting all the files by hand using Notepad++ when I discovered that this process could be automated!
First of all you’ll need to install the Python Script plugin using the Notepad++ Plugin Manager. Then, after installing and restarting, you have to create a new script with the following code:
import os

# Root folder of the sources to convert
filePathSrc = "C:\\Temp\\UTF8"

# Binary file types that must not be touched (compared in lower case)
skipExtensions = ('.jar', '.ear', '.gif', '.jpg', '.jpeg', '.xls', '.png', '.cab', '.ico')

# notepad and console are objects provided by the Python Script plugin
for root, dirs, files in os.walk(filePathSrc):
    for fn in files:
        if not fn.lower().endswith(skipExtensions):
            path = os.path.join(root, fn)
            notepad.open(path)
            console.write(path + "\r\n")
            # The menu item name must match the Encoding menu of your Notepad++ version
            notepad.runMenuCommand("Encoding", "Convert to UTF-8 without BOM")
            notepad.save()
            notepad.close()
I think the code speaks for itself. Just be 100% sure that you convert to UTF-8 without the UTF-8 byte order mark (BOM), since javac does not support this special character at the start of a source file.
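To double-check that no BOM slipped through, something like the following could be run after the conversion. It is only a minimal sketch, meant for a regular Python interpreter outside Notepad++; the function name files_with_bom is made up for this example and is not part of the plugin.

```python
import codecs
import os


def files_with_bom(root_dir):
    """Return the paths under root_dir that start with the UTF-8 BOM (EF BB BF)."""
    flagged = []
    for root, dirs, files in os.walk(root_dir):
        for name in files:
            path = os.path.join(root, name)
            # Read only the first three bytes; that is where a BOM would sit
            with open(path, "rb") as handle:
                if handle.read(3) == codecs.BOM_UTF8:
                    flagged.append(path)
    return flagged
```

Calling files_with_bom("C:\\Temp\\UTF8") should return an empty list if every file was converted without a BOM.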
If you have problems running the script, first open the console (Plugins > Python Script > Show Console). Chances are that the indents got messed up. For those who don’t know Python: it doesn’t use curly brackets to identify a code block, it relies on consistent indentation instead.
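For comparison, here is roughly what "converting the content" means, written as plain Python without Notepad++. Treat it as a sketch under assumptions, not as a replacement for the script above: it assumes every file that is not skipped really is Cp-1252 (it does no detection like Notepad++ does), and the names convert_file, convert_tree and SKIP_EXTENSIONS are invented for this example. Run it on a copy, and only once, because a second run would mangle the already converted files.

```python
import os

# Same binary file types as in the Notepad++ script, compared in lower case
SKIP_EXTENSIONS = {".jar", ".ear", ".gif", ".jpg", ".jpeg", ".xls", ".png", ".cab", ".ico"}


def convert_file(path, source_encoding="cp1252"):
    """Re-encode one file in place from source_encoding to UTF-8 without BOM."""
    with open(path, "rb") as handle:
        raw = handle.read()
    # Decode the bytes with the old encoding, then write them back as UTF-8.
    # Raises UnicodeDecodeError if a byte is not valid in source_encoding.
    text = raw.decode(source_encoding)
    with open(path, "wb") as handle:
        handle.write(text.encode("utf-8"))  # the "utf-8" codec writes no BOM


def convert_tree(root_dir, source_encoding="cp1252"):
    """Convert every non-binary file under root_dir and return the converted paths."""
    converted = []
    for root, dirs, files in os.walk(root_dir):
        for name in files:
            if os.path.splitext(name)[1].lower() in SKIP_EXTENSIONS:
                continue
            path = os.path.join(root, name)
            convert_file(path, source_encoding)
            converted.append(path)
    return converted
```

Working on raw bytes instead of text mode keeps the line endings exactly as they were, which matters when the files live in version control.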