Mass convert a project to UTF-8 using Notepad++

Recently, I had to convert the encoding of a multi-module Maven project from our default Cp1252 encoding to UTF-8. Changing the project settings is rather easy and there are multiple guides available on the internet, so I won’t reinvent the wheel.

The most difficult task, however, was converting all our source files from Cp1252 to UTF-8, preferably on Windows 🙂 . I looked into applications that would auto-convert everything for me, but none of them actually converted the content, resulting in garbage files. I had almost started converting all the files by hand using Notepad++ when I discovered that this process could be automated!

First of all you’ll need to install the Python Script plugin using the Notepad++ Plugin Manager. Then, after installing and restarting, you have to create a new script with the following code:

import os

# Root folder of the project to convert (adjust to your own path).
filePathSrc = "C:\\Temp\\UTF8"
# Binary file types that must not be touched (compared in lower case).
skipExtensions = ('.jar', '.ear', '.gif', '.jpg', '.jpeg', '.xls', '.png', '.cab', '.ico')
for root, dirs, files in os.walk(filePathSrc):
	for fn in files:
		if not fn.lower().endswith(skipExtensions):
			path = os.path.join(root, fn)
			notepad.open(path)
			console.write(path + "\r\n")
			notepad.runMenuCommand("Encoding", "Convert to UTF-8 without BOM")
			notepad.save()
			notepad.close()

I think the code speaks for itself. Just be 100% sure that you convert to UTF-8 without the byte order mark (BOM), since javac does not accept a BOM at the start of a source file.
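To double-check the result after a run, a small stand-alone Python script (run outside Notepad++) can list any files that still start with the UTF-8 BOM. This is only a sketch: the function names `has_utf8_bom` and `find_bom_files` and the default `.java` filter are assumptions made for illustration, not part of the Python Script plugin.

```python
import codecs
import os


def has_utf8_bom(path):
    """Return True if the file at `path` starts with the UTF-8 byte order mark."""
    with open(path, "rb") as handle:
        return handle.read(3) == codecs.BOM_UTF8


def find_bom_files(root_dir, extensions=(".java",)):
    """Walk `root_dir` and list matching files that still start with a BOM."""
    offenders = []
    for root, dirs, files in os.walk(root_dir):
        for name in sorted(files):
            # str.endswith accepts a tuple of suffixes.
            if name.lower().endswith(tuple(extensions)):
                path = os.path.join(root, name)
                if has_utf8_bom(path):
                    offenders.append(path)
    return offenders
```

Calling `find_bom_files("C:\\Temp\\UTF8")` should return an empty list once every Java source file has been converted without a BOM.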

If you have problems running the script, then first open the console (Plugins > Python Script > Show Console). Chances are that the indents got messed up (for those who don’t know Python, it doesn’t use curly brackets to identify a code block, it uses correct indentation instead).

54 thoughts on “Mass convert a project to UTF-8 using Notepad++”

  1. In case someone runs into this problem as well: I had to move the folder out of a parent folder that had umlauts in its name to get it to work.

  2. I would not recommend doing it this way:
    if fn[-4:] != '.jar' and fn[-5:] != '.ear' […]
    The script will modify every file that is not in the != list.
    In my case, it found my .git directory and messed up my local repository.
    You should instead list the file extensions you want to modify.
    In my case:
    if fn[-4:] == '.php'
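The include-list idea from this comment can be sketched as a small helper that also keeps `os.walk` out of folders such as `.git`. The names `INCLUDE_EXTENSIONS`, `SKIP_DIRS` and `files_to_convert` are hypothetical and the default values are only examples; adjust them to your project.

```python
import os

# Hypothetical defaults for illustration only.
INCLUDE_EXTENSIONS = (".java", ".php")
SKIP_DIRS = (".git", ".svn")


def files_to_convert(root_dir, extensions=INCLUDE_EXTENSIONS, skip_dirs=SKIP_DIRS):
    """Return the paths under `root_dir` whose extension is on the include list."""
    selected = []
    for root, dirs, files in os.walk(root_dir):
        # Prune in place so os.walk never descends into the skipped folders.
        dirs[:] = [d for d in dirs if d not in skip_dirs]
        for name in sorted(files):
            if name.lower().endswith(tuple(extensions)):
                selected.append(os.path.join(root, name))
    return selected
```

Inside Notepad++ you would then loop over the returned paths and call notepad.open, notepad.runMenuCommand, notepad.save and notepad.close on each one, as in the script from the post.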

  3. Great post.
    The strange thing is that if you use a localized version of Notepad++, you have to adapt the runMenuCommand call.

    German example: notepad.runMenuCommand("Encoding", "Konvertiere zu UTF-8")

    Even though “Encoding” is still English, it only worked for me after using the German name of the menu item.

    1. Looks more like an encoding issue in Notepad++ or the Python plugin. I’d say try to contact those developers, as I cannot help you with this issue.

  4. Hi, thanks for the wisdom

    I get this error:

    File "C:\Users\user\AppData\Roaming\Notepad++\plugins\Config\PythonScript\scripts\convertutf8.py", line 5
    for fn in files:
    ^
    IndentationError: expected an indented block

    1. In Python, indentation has the same meaning as curly brackets in many other languages. Make sure the indentation is exactly the same as in this post (and make sure you use tabs, not spaces).

      1. OK, I tried everything (with and without tabs on every line) but nothing changed. Strange.

        Is there any way you can embed the code so that it stays the same when copied and pasted?

        Bless

  5. When I click on my script to run it, nothing happens 😦 I have nothing in the console and nothing is converted. It’s as if nothing occurred 😦

      1. Oh actually, I solved my problem.

        I didn’t use the New Script option :/ Using the New Script option, everything works perfectly! And as laczkour said, I prefer to use == '.java' rather than an exclusion list, but this script really does the job well. Thank you!

  6. I adapted the example (paths and file types modified) and tried running it after installing Python Script.
    All I got was a runtime crash in Notepad++ .
    It seems that the plugin is immature, and the documentation is lousy.

  7. Awesome script, thank you very much for sharing it! I would like to add some comments with my experiences:
    – The Python Script plugin did not work for me when I installed it through the Plugin Manager (I use Np++ 6.6.6 on Win 8.1 x64). I downloaded the MSI installer from the plugin developer’s site http://npppythonscript.sourceforge.net/download.shtml and it worked well.
    – At first it seemed to run OK, but when I checked my files I realized that nothing had happened. I made some tests and found out that the script did not recognize the command. I tried both the English and the local (Hungarian) version of the command, and neither worked. Finally I changed the language of Np++ to English and it worked well with the English command. The Hungarian name has special characters, which probably caused the problem.

  8. henri
    using notepad++ 6.5.5 and npppythonscript 1.0.8.0 full 7zip
    it worked,
    but with 1.0.6.0 from the Plugin Manager it threw a runtime error

  9. I’m new to Python. I ran the same script, but I don’t see the encoding changed to UTF-8.

    When I go to Plugins > Python Script > Scripts > myfile, nothing happens.

    I see nothing in the console.

    1. Update on this: the code is running, but it says “NameError: name 'notepad' is not defined”.

      I think I need to import some Notepad++ DLL. Could you help me with this?

      1. I’ve updated my NPP to v6.8.1 and the Python script plugin is at version 1.0.6.0 and it’s working perfectly fine from within NPP. I have no idea why it’s not working for you, maybe try reinstalling NPP and the Python plugin ?

  10. In C:\Temp I have a folder with several subfolders.
    Each folder and subfolder contains several *.html files encoded as “windows-1250” or “UTF-8 with BOM”.
    I need a script for Notepad++ that converts all of them to UTF-8 in a single action.
    I have Notepad++ 6.8.1 and I installed the Python Script plugin.

    Can you help me?
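For a mixed set of files like this, one option outside Notepad++ is to re-encode the raw bytes in plain Python. This is a different technique from the menu-command script in the post, and only a sketch under stated assumptions: the function name `to_utf8_without_bom` is hypothetical, and any file that is neither BOM-prefixed nor valid UTF-8 is simply assumed to be windows-1250 (Python codec name `cp1250`).

```python
import codecs


def to_utf8_without_bom(raw, fallback_encoding="cp1250"):
    """Return the bytes `raw` re-encoded as UTF-8 without a byte order mark.

    Assumption: data that is not valid UTF-8 is in `fallback_encoding`.
    """
    if raw.startswith(codecs.BOM_UTF8):
        # Already UTF-8; just drop the three BOM bytes.
        return raw[len(codecs.BOM_UTF8):]
    try:
        raw.decode("utf-8")
        return raw  # Already valid UTF-8 without a BOM.
    except UnicodeDecodeError:
        # Raises UnicodeDecodeError again if the bytes are not valid
        # in the fallback encoding either.
        return raw.decode(fallback_encoding).encode("utf-8")
```

To use it on a file, read the file in binary mode ("rb"), pass the bytes through the function, and write the result back in binary mode ("wb"). Work on a copy of the folder first, because the fallback guess cannot be verified automatically.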

  11. Saved me tons of time, thanks a lot!

    If you run into an exception while trying to show the Python console or while running the script, you have to install the FULL newest version of the Python plugin!

  12. Hi Philip,
    Thanks man for the powerful script you shared with us 🙂
    This is my tuned version, which worked for me on Windows 7 Pro 64-bit. I share it with the community:

    N.B.: I list the files to be converted rather than excluding those that should not be converted, as laczkour suggested, because it is the safest way.

    import os

    filePathSrc = "E:\\vhosts\\ticket_support\\191\\bataille"
    # Only these text file types are converted.
    extensions = ('.php', '.phtml', '.htm', '.html', '.js', '.css', '.txt')
    for root, dirs, files in os.walk(filePathSrc):
        for fn in files:
            if fn.endswith(extensions):
                notepad.open(root + "\\" + fn)
                console.write(root + "\\" + fn + "\r\n")
                notepad.runMenuCommand("Encoding", "Convert to UTF-8 without BOM")
                notepad.save()
                notepad.close()
    # Show the confirmation once, after all files have been processed.
    notepad.messageBox('Project successfully converted to UTF-8 boss', 'Converting current project to UTF-8', 0)

    Thanks one more time 🙂

    1. I have no idea, but you can always test it by hand in Notepad++ by opening the file and converting it to UTF-8 and check the result 😉

  13. Firstly, thank you TamasToth (3:18 PM on 12/16/2014) for providing this reference: “Python script plugin did not work for me when I installed it through the Plugin Manager. (I use Np++ 6.6.6 on Win 8.1 x64). I downloaded the MSI installer from the plugin’s developer site http://npppythonscript.sourceforge.net/download.shtml and it worked well”. I had the same problem. After installing from the link you provided, my Python Script worked great.

    Secondly, for the author of this article: using
    notepad.runMenuCommand("Encoding", "Convert to UTF-8 without BOM")
    didn’t work. I used
    notepad.runMenuCommand("Encoding", "Convert to UTF-8")

    Lastly, I have some references for people interested.
    http://stackoverflow.com/questions/7256049/notepad-converting-ansi-encoded-file-to-utf-8
    http://www.joelonsoftware.com/articles/Unicode.html

    All in all, I find this WordPress post very useful. Thank you Phillip for writing this article.

    1. Thank you very much! It may be that Notepad++ changed the name of the menu item, causing the script to stop working. However, for Java projects you must not use UTF-8 with the byte order mark, because compilation will fail (at least on Java 6).

  14. It’s my first time using Python, so forgive me if the question is silly, but I get this error:
    Traceback (most recent call last):
    File "f:\! rapid\utf.py", line 7, in <module>
    notepad.open(root + "\\" + fn)
    NameError: name 'notepad' is not defined
    I ran it with idle.py as shown here: https://www.youtube.com/watch?v=sJipYE1JT38

  15. Thank you so much. But with Notepad++ 6.9.1 and Python Script 1.0.8.0 we have to use just “Convert to UTF-8”. It worked for me for converting files to UTF-8 without BOM.

    Thanks again.

  16. Thank you for this very useful script. It helps a lot.
    However, I ran into a problem: when a .txt filename contains an accent (for example an e acute), the conversion fails.
    Has anyone else run into this?
    Is it possible to convert the content of the .txt file to UTF-8 but leave the filename intact?
    PS: I’m running Windows 10, 64-bit.

    Thanks in advance

  17. When I want to run it and click on “Scripts”, nothing appears. How can I fix this problem?
    Thank you

  18. I am running Notepad++ v6.9.2.
    I have the problem already mentioned: when I click
    Plugins->Python Script->Scripts->Convert.py
    absolutely nothing happens. I added console.write('Test') at the beginning, but it does not show up when the script is executed.
    On an additional note, I have Python 2.7 installed on the machine.
    Any help would be welcome.

  19. Hi, thanks a lot. Just want to add that this doesn’t work with the latest “Python Script” plugin for Notepad++. You have to use Python Script version 1.0.8.

  20. Hello,

    I’m looking for a way to limit the depth of os.walk method. I want to apply the conversion to HTML files located in the first level only, excluding those in subdirectories.

    Can someone help on this? Thanks.

    import os
    import sys
    Path = "C:\\python-test"
    for root, dirs, files in os.walk(Path):
        for filename in files:
            if filename[-5:] == '.html':
                notepad.open(root + "\\" + filename)
                console.write(root + "\\" + filename + "\r\n")
                notepad.runMenuCommand("Encodage", "Convertir en UTF-8")
                notepad.save()
                notepad.close()

  21. OK, I found 2 solutions for my problem; both need to be added below os.walk(Path):


    for root, dirs, files in os.walk(Path):
        if root == Path:

    OR

    for root, dirs, files in os.walk(Path):
        nested_levels = root.split('/')
        if len(nested_levels) > 0:
            del dirs[:]

  22. Sorry, I made a mistake in the second solution; it should be:


    nested_levels = root.split(“\\”)
    if len(nested_levels) == 2:
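The "first level only" approach from these comments can also be written without counting path separators, by emptying the `dirs` list that `os.walk` hands out so that it never descends. A minimal sketch, assuming a hypothetical helper name `top_level_files`:

```python
import os


def top_level_files(path, extension=".html"):
    """Return the files directly inside `path` that end with `extension`."""
    matches = []
    for root, dirs, files in os.walk(path):
        # Emptying `dirs` in place stops os.walk from visiting subfolders,
        # so this loop body runs exactly once, for `path` itself.
        del dirs[:]
        for name in sorted(files):
            if name.lower().endswith(extension):
                matches.append(os.path.join(root, name))
    return matches
```

This relies on the default top-down mode of os.walk; with topdown=False, changing `dirs` has no effect on which folders are visited.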

  23. Thank you for sharing, awesome job.
    Two notes for my usage that others may find handy:

    I had to install the Python plugin through the MSI installer as well.
    I changed “Convert to UTF-8 without BOM” to “Convert to UTF-8”, as the former didn’t work.
