Xah Lee, 2009-06-03, 2009-09-04, 2009-10-31
This document contains technical info and tools for transferring files from Mac to Windows. For example, dealing with the Mac resource fork, and file name characters not supported in Windows.
This exposition is technically oriented, and may be useful to sys admins or programmers who need to move files between Mac/Unix/Windows and preserve data as much as possible.
In the 1990s, before the days of OS X, Mac OS files relied heavily on the resource fork. With OS X, it was decided in the early 2000s that the resource fork was to be deprecated.
The vast majority of Mac apps today do not create files with a resource fork. However, Mac applications (those in the “/Applications/” folder) may still rely on the resource fork to function.
Warning: You cannot simply delete the resource fork of a file and expect the file to function, because for some files, such as “unflattened” QuickTime movie files, the main data is in the resource fork.
For Perl scripts that tell you which files in a dir have a resource fork, and other scripts for preparing the transfer of Mac files to Windows, see: Perl Scripts For Mac/Windows File Moving.
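As a rough sketch of what such a check can look like (in Python here, and not the Perl scripts mentioned above): on Mac OS X the resource fork of a file can be reached thru the special path “‹file›/..namedfork/rsrc”, so a non-zero size there means the file has a resource fork. The function names are made up for illustration.

```python
import os

def resource_fork_size(path):
    """Return the resource fork size of `path` in bytes, or 0 if there is none."""
    try:
        # On Mac OS X, "<file>/..namedfork/rsrc" addresses the resource fork.
        return os.stat(os.path.join(path, "..namedfork", "rsrc")).st_size
    except OSError:
        # No such fork, or an OS / file system without named forks.
        return 0

def files_with_resource_fork(root):
    """Walk `root` and return the paths of files that have a resource fork."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if resource_fork_size(full) > 0:
                found.append(full)
    return sorted(found)
```

Run on anything other than a Mac, this simply reports no resource forks, because the special path does not exist there.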
Mac files have 4-character type and creator codes, which indicate the file's type and the application that created it. (Their purpose is similar to that of the filename extension and the Internet media type, aka MIME type.) These codes are not in the resource fork. They are a feature of the HFS+ file system.
File type/creator codes are largely deprecated in favor of file name extensions. However, they are still created by Mac OS X apps today. It is pretty safe to delete the type/creator code if the file has a file name extension.
When you move your files to Windows, the type/creator code is automatically gone.
Another kind of Mac-specific file is the one named “Icon^M”, where the “^M” is the Return character (ASCII 13). These are folder icon files. I cannot find info on the web about them. I don't know if they are in Apple Icon Image format.
You can still find these “Icon^M” file names in OS X. For example, you'll find one in “/Applications/Adobe Reader 8/”, as well as in StuffIt 10, Mac Pov-Ray 3.6, Adium (v 1.3.7), and if you use Jamie Zawinski's XScreenSaver for OS X, you'll find many “Icon^M” files in your “~/Library/” dir. (I think the “Icon^M”, at least the filename itself, is deprecated.)
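To see how many of these you have before a transfer, a short walk over the dir tree is enough. This is a minimal sketch. The only thing it relies on is the file name described above, “Icon” followed by the Return character. The function name is made up.

```python
import os

ICON_FILE_NAME = "Icon\r"  # "Icon" followed by the Return character (ASCII 13)

def find_icon_files(root):
    """Return the paths of all files under `root` named exactly "Icon^M"."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name == ICON_FILE_NAME:
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

For example, find_icon_files(os.path.expanduser("~/Library")) lists the ones in your Library dir.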
Mac OS X uses the HFS+ file system; Windows uses NTFS. Both encode file names using UTF-16, although the encoding scheme is a bit different. Both also allow a max of 255 Unicode chars in a file name. NTFS does not allow some characters, such as the following: / \ : * ? " < > |. In Mac OS X, you cannot use “:”. In practice, this means that when you have files with a lot of less-often-used chars, you'll have problems transferring them from Mac to Windows. Depending on what tool you use to transfer the files, the tool may stop dead, or change the file names in different ways.
You may think that these weird chars don't occur in practice. Actually, they do.
For example, the Adium chat client saves its chat logs with file names like this:
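A minimal sketch of checking one name against that list. The set below is the list of chars given above. The check for control chars (code below 32) is my addition, on the assumption that Windows rejects those too. The function name is made up.

```python
# The characters listed above as not allowed in Windows file names.
WINDOWS_FORBIDDEN = set('/\\:*?"<>|')

def forbidden_chars(name):
    """Return the sorted list of distinct chars in `name` that Windows rejects."""
    return sorted(
        {c for c in name if c in WINDOWS_FORBIDDEN or ord(c) < 32}
    )
```

An empty list means the name is fine as far as these chars go. It says nothing about name length or non-ASCII chars.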
233598025 (2004|10|29).html
Note that the “|” char is not allowed on Windows.
The chars ? / " * may also occur often: arbitrary webpages you saved from online over the years may have them, some math file names may use the asterisk “*” and the slash “/” as part of a math formula in the file name, your mom may have saved files with question marks and slashes in them, etc.
You may think a few file name screwups are ok. True. However, in some critical places it matters. For example, saved html files have local links that rely on correct file names. And, for programming systems to work (databases, language libraries, code repositories, etc), correct file names are critical.
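One way to keep control of the renaming, instead of leaving it to the transfer tool, is to compute the new names yourself first. The following is a sketch under my own assumptions, not a description of any existing tool: forbidden chars become “_”, trailing dots and spaces are dropped (Windows does not keep them), and names that collide after replacement (compared case-insensitively) get a numeric suffix. All names are hypothetical.

```python
import os
import re

# Chars Windows rejects in file names, plus control chars.
_FORBIDDEN_RE = re.compile(r'[/\\:*?"<>|\x00-\x1f]')

def windows_safe_name(name, replacement="_"):
    """Return `name` with chars that Windows rejects replaced."""
    safe = _FORBIDDEN_RE.sub(replacement, name)
    # Assumption: trailing dots and spaces are also dropped.
    return safe.rstrip(" .") or replacement

def plan_renames(names):
    """Map each original name to a unique Windows-safe name."""
    used = set()
    plan = {}
    for name in names:
        safe = windows_safe_name(name)
        stem, ext = os.path.splitext(safe)
        candidate, n = safe, 1
        # Compare case-insensitively, as Windows does.
        while candidate.lower() in used:
            n += 1
            candidate = f"{stem}_{n}{ext}"
        used.add(candidate.lower())
        plan[name] = candidate
    return plan
```

Because the result is a plain old-name to new-name mapping, the same mapping can be used afterwards to fix local links in saved html files.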
If you have Chinese chars in file names, the file names may get garbled when moving the files between Mac OS X and Windows, because a particular application or media or file transferring protocol may not understand non-ASCII chars.
• Windows Vista (64 bits, SP2) zip utility does not handle Chinese. (right click, send to, Compressed (zipped) Folder) If your folder or file name has Chinese chars, Windows will complain and refuse to compress.
• Windows Console does not support Unicode out of the box. It prints Chinese chars as gibberish. This applies to any app using Windows Console, such as cmd.exe, PowerShell, Cygwin bash. (Windows Console may support Unicode, but if so, it involves some non-trivial setup. See: http://blogs.msdn.com/michkap/archive/2005/06/29/433669.aspx)
• Unison file sync tool does not handle Chinese names. (unison version 2.27.57) (Detail at The Complexity And Tedium of Software Engineering.)
• GNU Emacs's dired does not handle files with Chinese names. They show up as gibberish. (GNU Emacs 23.1.1 (i386-mingw-nt6.0.6002) of 2009-07-29 on SOFT-MJASON)
• Not sure if rsync supports Chinese fully. I think when using rsync on OS X to copy files from Mac to Windows (rsync version 2.6.9 protocol version 29), it works fine, but when using rsync on Windows (thru Cygwin; rsync version 3.0.4 protocol version 30) to copy files from Windows to Mac, it has problems. Here's an example of its error message:
building file list ...
file has vanished: "/cygdrive/c/Users/xah/Documents/kacma pixra/prenu/200403_tony_relative/????.JPG"
file has vanished: "/cygdrive/c/Users/xah/Documents/kacma pixra/prenu/200403_tony_relative/???.JPG"
file has vanished: "/cygdrive/c/Users/xah/Documents/kacma pixra/prenu/200403_tony_relative/????.JPG"
Here are sample file names that cause such an error:
OS X 10.5's Terminal app supports Chinese fully. (OS X 10.4.x doesn't though. Detail: OS X 10.4.11's Terminal app can display Chinese chars encoded with utf-8, e.g. “cat ‹text file›” where the text file is utf-8 encoded. However, if a file name has Chinese, it does not show up correctly when doing “ls”. (I think this is because file names in OS X are encoded with utf-16, since it is HFS+.) The Terminal app has an option under “Terminal‣Window Settings...‣Display‣Character Set Encoding”. However, the menu doesn't have utf-16 as a choice.)
Mac OS X 10.5's zip tool supports Chinese. (untested by me)
OS X 10.4.11, when connecting to Windows shares, supports Chinese.
Windows Vista (sp2), when connecting to Mac shares, supports Chinese.
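Since support differs so much from tool to tool, it helps to know beforehand which of your file and folder names contain non-ASCII chars at all, and to try the transfer on those first. A minimal sketch, with made-up function names:

```python
import os

def has_non_ascii(name):
    """True if `name` contains any char outside ASCII."""
    return not name.isascii()

def names_with_non_ascii(root):
    """Return paths under `root` whose file or folder name has non-ASCII chars."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if has_non_ascii(name):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

This only finds the candidates. Whether a given tool handles them still has to be tested with that tool.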
Today, both Windows and Mac allow filenames to be a max of 255 chars. However, i'm not sure what the max length for a dir path is. At least, i know that as of 2000, the unix GNU tar util will break if you have a path that's longer than about 120 chars. (For detail, see bottom of: Unix, RFC, and Line Truncation.)
Mac creates a .DS_Store file in each folder. You'd want to remove these files if you are copying the folders to Windows. You can run the following commands in bash to remove them:
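Whatever the real limit of your tool is, you can at least find the paths that would go over a limit you pick. In this sketch, the default of 260 is the classic Windows path length limit as i understand it (an assumption, not something tested here); pass 120 or so if you are worried about an old tar. The function name and the destination folder in the usage line are made up.

```python
import os

def long_paths(root, dest_root, limit=260):
    """Return relative paths under `root` that would be longer than `limit`
    chars once placed under `dest_root` on the destination machine."""
    too_long = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            # +1 for the separator between dest_root and the relative path.
            if len(dest_root) + 1 + len(rel) > limit:
                too_long.append(rel)
    return sorted(too_long)
```

For example: long_paths(".", "C:\\Users\\me\\Documents") lists what would not fit under that folder.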
# list the .DS_Store files under the current dir
find . -name ".DS_Store"
# delete them
find . -name ".DS_Store" -exec rm {} \;
Note: Windows likewise creates Thumbs.db files. Windows Vista no longer produces them locally, but it does create one in the dir when accessing non-Windows networked files.
There are many file transfer methods and tools. You can copy the files thru a USB flash drive, or you can use the built-in file sharing in Mac OS X's Finder or Windows's Explorer, or you can use unix tools such as rsync or unison. Also, you might want to compress them first with zip or tar gz.
The issue of preserving filenames with non-ascii chars, unicode chars, or filename length, depends on the transfer method and tool.
When files are shared thru Windows file sharing, it is done thru the SMB/CIFS protocol. Am not sure how that protocol handles file names, but i do know that as late as 2006, the open source Samba software included in Mac OS X for sharing Windows files will garble Chinese chars in filenames.
The Unison file syncing utility also doesn't understand Unicode as of 2009-06. See: The Complexity And Tedium of Software Engineering.
Another method to transfer files is to zip them first, then pass the archive thru the network. ZIP itself has problems with non-ASCII characters. (As far as i know there are a few variants of zip, and some don't handle Unicode well. As late as OS X 10.4.x, when you have a downloaded zip file containing Chinese names, and you unzip thru Finder's BOMArchiveHelper, the Chinese names will become gibberish. The solution is to use The Unarchiver, which is used in OS X 10.5.x to replace BOMArchiveHelper.)
Another way is using tar or tar gz instead of zip. (Similarly, you can use any file compression scheme.) Tar had problems with path length back in 2000. Am not sure how it deals with unusual chars, Unicode chars, or long paths today.
Another common way today is to put the files on a USB flash drive first. This method depends on the file system used on the USB drive. Most USB drives are pre-formatted with the FAT32 file system. I think in practice there are variations and parameter differences that affect file transfer by it. Also see: Long filename.
Whatever your method or tool for file transfer, if you want to preserve really long file names or paths, or Unicode chars such as Chinese chars, you should test it out first. Also, the result may be different depending on whether you are moving from Mac to Windows or Windows to Mac.
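One simple way to do such a test is to transfer a sample folder, bring the result back to where both copies are visible from one machine (e.g. thru a network share), and compare the two sets of relative paths. A sketch, with made-up function names. The names are normalized to Unicode NFC before comparing, on the assumption that HFS+ stores accented chars in decomposed form while other systems usually keep them composed; without that step, names that look identical could be reported as different.

```python
import os
import unicodedata

def relative_names(root):
    """Return the set of relative paths (files and dirs) under `root`."""
    names = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            # Use "/" everywhere and one Unicode form, so that only real
            # differences show up.
            names.add(unicodedata.normalize("NFC", rel.replace(os.sep, "/")))
    return names

def compare_trees(src, dst):
    """Return (missing_in_dst, extra_in_dst) as two sorted lists."""
    s, d = relative_names(src), relative_names(dst)
    return sorted(s - d), sorted(d - s)
```

A renamed file shows up in both lists: its old name in the first, its new name in the second.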
Another issue is whether file date, owner/group, permissions, etc, are preserved. This is important when you are dealing with software data. This again depends on the tool.
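To check what a given tool preserved, you can compare the metadata of a file and its copy. This sketch only looks at size, modification time, and permission bits (owner/group are left out, since they rarely map between Mac and Windows anyway). The 2-second tolerance is my assumption, to allow for file systems such as FAT32 that store times coarsely. The function name is made up.

```python
import os
import stat

def metadata_diff(src, dst, mtime_tolerance=2.0):
    """Return a list naming which of size, mtime, mode differ between two files."""
    a, b = os.stat(src), os.stat(dst)
    diffs = []
    if a.st_size != b.st_size:
        diffs.append("size")
    # Allow a small difference in modification time (coarse timestamps).
    if abs(a.st_mtime - b.st_mtime) > mtime_tolerance:
        diffs.append("mtime")
    # Compare only the permission bits, not the file type bits.
    if stat.S_IMODE(a.st_mode) != stat.S_IMODE(b.st_mode):
        diffs.append("mode")
    return diffs
```

An empty list means the copy kept all three.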
Misc notes:
Mac version of AOL Instant Messenger saves its chat log's file names like this:
xahlee’s Logs IM “Gesutus”-2003-12.html
node60091’ Logs IM “rogerhoward@mac…”-2004.02.html
When rsync in Cygwin on Windows Vista is used to sync files from Windows to Mac, and a file name on the Windows side contains single or double curly quotes, rsync will choke.
It'll also choke if the file name contains the ellipsis character “…”. Here's a sample error:
xah@xah-PC ~
$ Documents/scripts/sync_pc_mac.sh
Password:
building file list ... done
rsync: recv_generator: failed to stat "/Users/xah/Documents/tavla_vreji/AIM logs 2005/node60091's Logs/IM [rogerhoward@mac.]-2004.02.html": Invalid argument (22)
xah@xah-PC ~
$ rsync --version
rsync version 3.0.4 protocol version 30
Copyright (C) 1996-2008 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities:
64-bit files, 64-bit inums, 32-bit timestamps, 64-bit long ints,
socketpairs, hardlinks, symlinks, no IPv6, batchfiles, inplace,
append, ACLs, no xattrs, iconv, symtimes
rsync comes with ABSOLUTELY NO WARRANTY. This is free software, and you
are welcome to redistribute it under certain conditions. See the GNU
General Public Licence for details.
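In the error message above, the curly quotes and the ellipsis have already been turned into other chars, so it's hard to tell from the message what the original chars were. A small sketch that lists the non-ASCII chars of a file name with their Unicode code points and names, which makes it easier to see what a tool is choking on. The function name is made up.

```python
import unicodedata

def describe_non_ascii(name):
    """Return (char, code point, Unicode name) for each distinct non-ASCII
    char in `name`, in order of first appearance."""
    seen = []
    for ch in name:
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)
    return [
        (ch, "U+%04X" % ord(ch), unicodedata.name(ch, "UNKNOWN"))
        for ch in seen
    ]
```

For the AIM log name shown earlier, this reports the left and right double quotation marks and the horizontal ellipsis.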