Tower Help & Support

Character encoding for commit messages

Creating Commits

When Git creates and stores a commit, it stores the commit message exactly as the bytes it receives, without converting between encodings. The encoding of your commit message is therefore determined by the client you use to compose it.

However, Git does record the name of the commit encoding in the commit if the config key "i18n.commitEncoding" is set to something other than the default value "utf-8". You can print its current value with the following command:

$ git config i18n.commitEncoding

If the command prints nothing, the key is not set and Git uses the default, "utf-8".
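In a script, this fallback can be made explicit. The following is a minimal sketch; the function name "effective_commit_encoding" is made up for this example and is not part of Git or Tower.

```shell
# Hypothetical helper: print the commit encoding Git will record,
# falling back to Git's default when the key is unset.
effective_commit_encoding() {
    # "git config --get" exits non-zero when the key is not set.
    git config --get i18n.commitEncoding || echo "utf-8"
}

effective_commit_encoding
```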

If you commit changes from the command line, this value must match the encoding set in your shell environment. Otherwise, the wrong encoding name is stored with the commit, which can lead to garbled output when viewing the commit history.
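To see such a mismatch in isolation, the sketch below commits UTF-8 bytes in a throwaway repository while Git is told the message is ISO-8859-1. The function name, the sample message and the identity values are assumptions chosen for illustration.

```shell
# Sketch: commit UTF-8 bytes while Git is told the message is ISO-8859-1,
# then show what Git recorded. Runs in a temporary repository.
demo_mislabeled_commit() (
    repo=$(mktemp -d) || exit 1
    cd "$repo" || exit 1
    git init -q .
    # "caf\303\251" is "café" encoded as UTF-8 (é = 0xC3 0xA9).
    printf 'caf\303\251\n' |
        git -c user.name="Example" -c user.email="example@example.com" \
            -c i18n.commitEncoding=ISO-8859-1 \
            commit -q --allow-empty -F -
    # The raw commit object carries the label as an "encoding" header,
    # whether or not it matches the actual bytes of the message.
    git cat-file -p HEAD | grep '^encoding '
    cd / && rm -rf "$repo"
)

demo_mislabeled_commit   # prints: encoding ISO-8859-1
```

Git does not check the message against the label here, so the header reflects the config value rather than the bytes that were actually committed.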

Tower uses and enforces UTF-8 as the encoding for commits (regardless of what is set for "i18n.commitEncoding") to ensure a valid commit encoding.

On the command line, you can verify your encoding with the following command:

$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL="en_US.UTF-8"

This prints your current locale settings, which include the character encoding. Additionally, when using Terminal.app on OS X, you should make sure that your preferred encoding is set correctly in its settings as well.

You can set your preferred shell encoding with the following lines in your shell profile:

export LANG="en_US.UTF-8"
export LC_ALL="en_US.UTF-8"

Note: If your shell environment and your Git config disagree, switch your shell environment to UTF-8 rather than changing your Git config to a different encoding, because UTF-8 is the recommended encoding.
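To check which encoding a locale value actually names, the part after the dot can be extracted. This is a small sketch with a made-up helper name, "locale_codeset". It only parses the variable's text, and it assumes the usual POSIX precedence of LC_ALL over LC_CTYPE over LANG.

```shell
# Hypothetical helper: extract the codeset from a locale name such as
# "en_US.UTF-8" or "de_DE.ISO-8859-1@euro". Prints nothing if the name
# carries no codeset (for example "C").
locale_codeset() {
    case "$1" in
        *.*)
            codeset=${1#*.}                 # drop "language_TERRITORY."
            printf '%s\n' "${codeset%%@*}"  # drop an optional "@modifier"
            ;;
    esac
}

# POSIX precedence for character handling: LC_ALL, then LC_CTYPE, then LANG.
locale_codeset "${LC_ALL:-${LC_CTYPE:-${LANG:-}}}"
```

On most systems, "locale charmap" reports the encoding the C library actually resolved, which is the more reliable check when it is available.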

Viewing Commit History and Encodings

If you view the commit log on the command line, the config value "i18n.logOutputEncoding" (which defaults to the value of "i18n.commitEncoding") needs to match your shell encoding as well. The "git log" command converts messages from the encoding stored with the commit to the output encoding. If your shell encoding does not match the output encoding, you will again see garbled output.

However, if the commit message is stored with the wrong encoding and then viewed with the same wrong encoding, the two mistakes cancel out and the commit message displays correctly. While this may look fine on your system, as soon as you share the commits with someone else, they will see garbled output.
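The cancelling effect can be imitated outside of Git. In the sketch below, iconv stands in for the decoding step of a terminal or client; the function names "garble" and "ungarble" and the sample text are made up for this example.

```shell
# "caf\303\251" is "café" encoded as UTF-8 (é = 0xC3 0xA9).

# Wrong once: UTF-8 bytes are treated as ISO-8859-1, so each byte of
# the two-byte "é" becomes a separate character.
garble() {
    iconv -f ISO-8859-1 -t UTF-8
}

# Wrong twice: the opposite mistake restores the original bytes.
ungarble() {
    iconv -f UTF-8 -t ISO-8859-1
}

printf 'caf\303\251\n' | garble              # garbled on a UTF-8 terminal
printf 'caf\303\251\n' | garble | ungarble   # original bytes again
```

Someone who only applies one of the two steps, for example a collaborator whose environment is set up correctly, is left with the garbled version.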

Inspecting Commit Encodings

Once a commit has been stored with a wrong encoding, clients have no reliable way to detect and fix the encoding when they display the commit. If possible, try to recreate the commit with the correct encoding by rebasing it.

If you want to examine a commit and its stored encoding, you can use the following command to inspect it:

$ git log -1 --pretty='format:%h: "%B" (Encoding: "%e")' SHA

You can also override the config value for "i18n.logOutputEncoding" when invoking the command, so that the message is converted to the given output encoding:

$ git -c i18n.logOutputEncoding=UTF-8 log -1 --pretty='format:%h: "%B" (Encoding: "%e")' SHA