Linux & Windows - NTFS differences and potential problems
Post date: Aug 14, 2016 9:11:16 AM
NTFS file names behave quite differently when the volume is used from a POSIX system than when it is used from Windows. Some more or less interesting problems can arise from this. I've run into it before, when git internals got broken by file naming confusion.
First, create four files: testFile, TestFile, testfile, Testfile. Same file or different files? On Windows, they are all the same file. On Linux, they are four different files. From this single observation grow many kinds of interesting cases and problems, which all lead back to the same root cause.
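To make the collision concrete, here is a minimal Python sketch that groups names differing only by letter case. The function name find_case_collisions is my own, and str.casefold() is only an approximation of the case mapping Windows applies on NTFS, so treat it as an illustration rather than an exact check.

```python
from collections import defaultdict

def find_case_collisions(names):
    """Return groups of names that differ only by letter case.

    Assumption: str.casefold() approximates the Windows case mapping.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    # Keep only the groups where more than one name maps to the same key.
    return [group for group in groups.values() if len(group) > 1]

if __name__ == "__main__":
    names = ["testFile", "TestFile", "testfile", "Testfile", "other"]
    for group in find_case_collisions(names):
        print("Would collide on Windows:", group)
```

Running something like this over a directory listing taken on Linux, before the tree ever reaches Windows, shows which files would fold into one.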
With git, some files end up named differently on Linux and on Windows. The Windows version thinks certain files are the same file, whereas on Linux they are different files. That pretty much guarantees that things will get screwed up at some point. This is a similar failure to the ones related to git and NTFS alternate streams, which were reported to affect git users a month or so ago. On Windows, some data goes into an NTFS alternate stream, but on Linux it is held in its own separate file with only one primary stream. As an example, take the files test and test:alternate. Phew, luckily the problems I ran into were small, but in some cases these might cause a major headache, plus security and other problems.
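A rough sketch of how the Win32 side reads such a name: everything before the first colon is the file, the rest is the stream. The helper split_stream is hypothetical, works on bare file names only (no drive letters or paths), and ignores the optional :$DATA type suffix of the full stream syntax.

```python
def split_stream(name):
    """Split a bare file name into (file, stream) at the first colon.

    Simplified on purpose: no paths, no drive letters, no ':$DATA' suffix.
    Returns (name, None) when there is no colon at all.
    """
    base, sep, stream = name.partition(":")
    if not sep:
        return name, None
    return base, stream

if __name__ == "__main__":
    for name in ["test", "test:alternate"]:
        print(name, "->", split_stream(name))
```

On Linux, test:alternate is simply one file name that happens to contain a colon.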
I created the files so that the content of each file is the same as its filename. The funny thing is that on Windows, when I run type testfile, I get testFile, testFile, testFile, testFile. That is the content of the first file four times over. It seems that the type command loops through a filename mask, even though I'm not using any wildcards. Probably the loop technically goes through the name variations, but the actual open call always ends up opening the same file.
Yet this is hardly anything unknown. All these issues are well known and are 'features', not bugs. Wikipedia has a nice article about this. POSIX and Windows use different namespaces for NTFS files: "In POSIX namespace, any UTF-16 code unit (case-sensitive) except U+0000 (NUL) and / (slash). In Win32 namespace, any UTF-16 code unit (case-insensitive) except U+0000 (NUL) / (slash) \ (backslash) : (colon) * (asterisk) ? (Question mark) " (quote) < (less than) > (greater than) and | (pipe)"
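The Win32 list from that quote is easy to turn into a quick check. This is just a sketch based on the quoted character list; the names WIN32_FORBIDDEN and win32_invalid_chars are mine, and Windows has further naming rules that are not covered here.

```python
# Characters excluded from the Win32 namespace, per the quote above.
WIN32_FORBIDDEN = set('/\\:*?"<>|') | {"\x00"}

def win32_invalid_chars(name):
    """Return the sorted list of characters in name that Win32 rejects."""
    return sorted({ch for ch in name if ch in WIN32_FORBIDDEN})

if __name__ == "__main__":
    for name in ["testFile", "test:alternate", "*test*file*"]:
        print(repr(name), "->", win32_invalid_chars(name))
```

An empty list means the name is fine as far as this character list goes.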
More fun stuff happens when I use PowerShell instead of cmd: I get an exception when listing the files.
PS T:\tst> dir
dir : The given path's format is not supported.
At line:1 char:1
+ dir
+ ~~~
+ CategoryInfo : NotSpecified: (:) [Get-ChildItem], NotSupportedException
+ FullyQualifiedErrorId : System.NotSupportedException,Microsoft.PowerShell.Commands.GetChildItemCommand
This is because of the test:alternate file name, which I created on Linux in the tst directory. cmd lists the files nicely.
T:\tst> dir
01.08.2016 00:00 18 test:alternate
02.08.2016 00:00 9 testFile
31.07.2016 00:00 9 TestFile
31.07.2016 00:00 9 testFile
31.07.2016 00:00 9 testfile
Next a few even more complex examples.
One great real-world example is .git\logs\refs\remotes\origin\selectTest versus SelectTest (Windows path format). On Linux I had both, with different content. On Windows, both names show the same content, which is the content of the SelectTest file.
Windows 10 chkdsk did exactly what I was expecting.
Deleted invalid filename test:alternate (51378) in directory 51301.
File 51378 has been orphaned since all its filenames were invalid
Windows will recover the file in the orphan recovery phase.
Correcting minor file name errors in file 51378.
Deleting index entry test:alternate in index $I30 of file 51378.
As stated, POSIX file naming also allows file names like \test\file and *test*file* and so on, including backslashes and asterisks. No surprises there; those just won't work too well on Windows. The same goes for a file name like:
<haX> said: "Are we having fun yet?"
Yes, the previous line is the file name, including the <>:"? characters. No problem (on Linux). Let's see how Windows likes those file names. I know, | is still missing. Just to be sure, let's create a file with it too. And just as expected, it worked flawlessly. So don't be surprised by file names like these. They're perfectly OK.
On Windows, cmd claims that the directory is empty, while PowerShell and File Manager say that the directory is corrupted and unreadable. Yeah, way to go. That's all, folks, in this case.
Running chkdsk on Windows removed all files with an "invalid file name". As an additional test, I also extracted a 7-Zip archive which contained 'invalid file names'. 7-Zip seemed to be pretty smart about it and replaced the non-allowed characters with _ underscores.
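The same kind of replacement is easy to sketch in Python. To be clear, this is not 7-Zip's actual code, only my guess at the observable behaviour: every character outside the Win32 namespace becomes an underscore. The function name sanitize_for_windows is made up for this example.

```python
import re

# Characters not allowed in the Win32 namespace (NUL / \ : * ? " < > |).
_FORBIDDEN = re.compile(r'[\x00/\\:*?"<>|]')

def sanitize_for_windows(name, replacement="_"):
    """Replace every Win32-forbidden character in name with replacement."""
    return _FORBIDDEN.sub(replacement, name)

if __name__ == "__main__":
    for name in ["test:alternate", "*test*file*"]:
        print(repr(name), "->", repr(sanitize_for_windows(name)))
```

Note that two different Linux names can map to the same sanitized name, so a real tool would also need to handle collisions.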
Ref: Wikipedia NTFS