Tower Help & Support

Untracked files with unicode names

Using Unicode file names (for example, names containing the German umlaut "Ü") in Git can cause problems unless the correct configuration is in place before you start working with the repository.

Certain characters can be represented in two different forms in Unicode. For example, Ü can be represented as the single character Ü (known as “precomposed form” or NFC) or as two characters, U followed by a combining ¨ (known as “decomposed form” or NFD). Both are valid representations of the same string.
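
To make the difference concrete, here is a minimal shell sketch that builds both forms of Ü from their UTF-8 bytes and compares them. The variable names are purely illustrative; the octal escapes encode U+00DC (precomposed) and U+0055 followed by U+0308 (decomposed).

```shell
# Precomposed form (NFC): the single code point U+00DC, UTF-8 bytes C3 9C
nfc=$(printf '\303\234')
# Decomposed form (NFD): U followed by the combining diaeresis U+0308, UTF-8 bytes 55 CC 88
nfd=$(printf 'U\314\210')

# Count the bytes of each form (tr strips the padding some wc versions add)
nfc_len=$(printf '%s' "$nfc" | wc -c | tr -d ' ')
nfd_len=$(printf '%s' "$nfd" | wc -c | tr -d ' ')
printf 'NFC: %s bytes, NFD: %s bytes\n' "$nfc_len" "$nfd_len"

# A plain string comparison works on bytes, so the two forms do not match
if [ "$nfc" = "$nfd" ]; then
  printf 'same bytes\n'
else
  printf 'different bytes\n'
fi
```

Both strings look identical in a terminal, yet they are 2 and 3 bytes long and compare as different. Any tool that compares file names byte by byte will treat them as two distinct names.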

You can read more about this topic on Wikipedia: Unicode equivalence.

Depending on the operating system and file system, Unicode file names might get converted to either form. Mac OS X (with HFS+) decomposes file names before storing them (thus using NFD), whereas Linux and Windows usually use NFC.

The Git config setting core.precomposeunicode converts between NFD filenames on Mac OS X and NFC filenames in Git:

core.precomposeunicode
    This option is only used by Mac OS implementation of Git.
    When core.precomposeunicode=true, Git reverts the unicode decomposition
    of filenames done by Mac OS. This is useful when sharing a repository
    between Mac OS and Linux or Windows. (Git for Windows 1.7.10 or higher is
    needed, or Git under cygwin 1.7). When false, file names are handled fully
    transparent by Git, which is backward compatible with older versions of Git.

The default for this config setting on OS X is true and there should be no reason to override it. Note that the setting is also important when sharing repos between Macs.

The description does not go into great detail, but the setting affects various Git commands, most importantly git add: if core.precomposeunicode is false when a Unicode file name is added to Git on OS X, Git registers the decomposed file name. This leads to the following problems:

  • Users on Mac OS X with core.precomposeunicode set to true will see the file as untracked in Git status
  • Users on Linux or Windows will see the file as untracked in Git status (as core.precomposeunicode is not used on those platforms)

This can easily be reproduced with the following test repository on Mac OS X:

  git init .

  touch decomposed-filename-with-ü
  git -c core.precomposeunicode=false add decomposed-*
  git commit -m "Add file with decomposed filename on Mac OS X (core.precomposeunicode=false)"

  touch precomposed-filename-with-ü
  git -c core.precomposeunicode=true add precomposed-*
  git commit -m "Add file with precomposed filename on Mac OS X (core.precomposeunicode=true)"

  git -c core.precomposeunicode=false status --porcelain
  => ?? precomposed-filename-with-ü

  git -c core.precomposeunicode=true status --porcelain
  => ?? decomposed-filename-with-ü
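
To check which form Git has actually recorded for a file name, one option is to list the index with core.quotepath enabled, which prints non-ASCII bytes as octal escapes. The following is a sketch in a scratch repository; the file name and the variables repo, name and listing are made up for illustration.

```shell
# Scratch repository so the commands do not touch real work
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# Create a file whose name is in decomposed form (u + combining diaeresis)
name=$(printf 'decomposed-u\314\210')
: > "$name"

# precomposeunicode=false makes Git record the name exactly as it was given
git -c core.precomposeunicode=false add -- "$name"

# With core.quotepath=true, bytes above 0x7f are shown as octal escapes
listing=$(git -c core.quotepath=true ls-files)
printf '%s\n' "$listing"
```

A decomposed ü appears as u\314\210 in this listing, while a precomposed ü would appear as \303\274.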

Once a file name has been added to a Git repository in decomposed form, the only way to solve the problem is to remove the affected files from Git and re-add them, either on Mac OS X with core.precomposeunicode set to true or on Linux or Windows.
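
The remove-and-re-add step could be sketched as follows. restage_names is a hypothetical helper, not a Git command, and it deliberately re-stages the whole repository rather than individual files, which sidesteps having to type the decomposed name on the command line. It assumes a clean working tree, because "git add ." also stages any untracked files that are not ignored.

```shell
# Hypothetical helper: re-stage every path so file names are recorded anew.
# Intended to be run on Mac OS X against the repository given as $1.
restage_names() {
  repo=$1
  # Drop all entries from the index; files in the working tree are left alone
  git -C "$repo" rm -r -q --cached . || return 1
  # Re-add with precomposition enabled so names are stored in NFC
  git -C "$repo" -c core.precomposeunicode=true add . || return 1
  # Show what changed; affected files are reported under their new names
  git -C "$repo" status --short
}
```

Usage would be along the lines of restage_names /path/to/repo, followed by reviewing the status output and committing the result.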

To recap, if you have problems with unicode file names showing up as untracked:

  1. Make sure core.precomposeunicode is globally set to true on OS X

     $ git config --global core.precomposeunicode
     => true
  2. All files still shown as untracked need to be removed from and re-added to Git.
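
The command in step 1 only reads the setting. If it prints nothing or false, the value can be set explicitly. A small sketch, where ensure_precompose is a hypothetical helper name:

```shell
# Hypothetical helper: set core.precomposeunicode globally unless it is already true
ensure_precompose() {
  current=$(git config --global --get core.precomposeunicode)
  if [ "$current" != "true" ]; then
    git config --global core.precomposeunicode true
  fi
  # Print the effective global value for confirmation
  git config --global --get core.precomposeunicode
}
```

Note that a value stored in a repository's own configuration takes precedence over the global one, so it may be worth running git config core.precomposeunicode inside the affected repository as well.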