Xah Lee, 2009-06-03, 2009-09-04, 2009-10-31
This article discusses issues with moving files between Mac and Windows, for example, dealing with the Mac's resource fork and file name characters not supported on Windows. It also provides tips, tech info, and tools that help you do it.
In the 1990s, before the days of OS X, Mac OS files relied heavily on the resource fork. With OS X, Apple decided in the early 2000s that the resource fork was to be deprecated.
The vast majority of Mac apps today do not create files with a resource fork. However, Mac applications (those in the 〔/Applications/〕 folder) may still rely on the resource fork to function.
Warning: You cannot simply delete the resource fork of a file and expect the file to function, because for some files, such as “unflattened” QuickTime movie files, the main data is in the resource fork.
For Perl scripts that tell you which files in a dir have a resource fork, and other scripts for preparing the transfer of Mac files to Windows, see: Perl Scripts For Mac/Windows File Moving.
For tech detail on resource fork, type/creator code, and OS X command tools for them, see: Mac OS X Resource Fork and Command Line Tips.
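As an illustration separate from the Perl scripts mentioned above, here is a minimal Python sketch that lists files with a non-empty resource fork. It assumes it runs on OS X, where the fork can be read at the special path 〔‹file›/..namedfork/rsrc〕. The function names are made up for this example.

```python
import os

def resource_fork_size(path):
    """Return the size in bytes of a file's resource fork, or 0 if there is none.

    Assumes OS X, where the fork is exposed at <path>/..namedfork/rsrc.
    On other systems that special path does not exist, so this returns 0.
    """
    try:
        return os.path.getsize(os.path.join(path, "..namedfork", "rsrc"))
    except OSError:
        return 0

def files_with_resource_fork(root):
    """Yield (path, fork_size) for every file under root with a non-empty fork."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            size = resource_fork_size(full)
            if size > 0:
                yield full, size
```

Running `list(files_with_resource_fork("."))` before a transfer gives you the files that deserve a closer look, since their fork will not survive on Windows.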
Mac files have a 4-letter type code and a 4-letter creator code, to indicate the file type and the application that created the file. (Their purpose is similar to the filename extension and the Internet media type (aka MIME type).) These codes are not in the resource fork. They are a feature of the Mac's file systems, HFS and HFS+.
File type/creator codes are largely deprecated in favor of file name extensions. However, it is still common to see Mac OS X apps creating them. It is pretty safe to delete the type/creator code if the file has a file name extension.
When you move your files to Windows, the type/creator code is automatically gone.
Another kind of Mac-specific file is the one named “Icon^M”, where the “^M” is the Return character (ASCII 13). These are folder icon files. I cannot find info on the web about them. I don't know if they are in Apple Icon Image format.
You can still find these “Icon^M” file names in OS X. For example, you'll find one in 〔/Applications/Adobe Reader 8/〕, as well as in StuffIt 10, Mac Pov-Ray 3.6, Adium (v 1.3.7), and if you use Jamie Zawinski's XScreenSaver for OS X, you'll find many “Icon^M” files in your 〔~/Library/〕 dir. (I think the “Icon^M”, at least the filename itself, is deprecated.)
For Perl scripts that find these files or remove them, see: Perl Scripts For Mac/Windows File Moving.
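For readers who prefer Python, the search can be sketched as below. This is not the Perl script referred to above, just a minimal illustration. `find_icon_files` is a hypothetical name, and it only lists the files, leaving any removal to you.

```python
import os

ICON_NAME = "Icon\r"  # "Icon" followed by the Return character (ASCII 13)

def find_icon_files(root):
    """Return sorted paths of all files named Icon<Return> under root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if ICON_NAME in filenames:
            found.append(os.path.join(dirpath, ICON_NAME))
    return sorted(found)
```

Printing the result with `repr()` is a good idea, because the trailing Return character is invisible in normal terminal output.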
See: What Characters Are Not Allowed in File Names?
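To check Mac file names against Windows rules before moving them, something like the following Python sketch may help. The rules encoded here (the characters `< > : " / \ | ? *`, ASCII control characters, trailing space or dot, and reserved device names such as CON or NUL) follow Windows naming conventions as i understand them. The function name and the wording of the messages are made up for this example.

```python
import re

# Characters Windows rejects in a file name, plus ASCII control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

# Device names Windows reserves, with or without an extension.
_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)

def windows_name_problems(name):
    """Return a list of reasons why a single file name is a problem on Windows."""
    problems = []
    if _INVALID_CHARS.search(name):
        problems.append("invalid character")
    if name.endswith((" ", ".")):
        problems.append("trailing space or dot")
    if name.split(".")[0].upper() in _RESERVED:
        problems.append("reserved device name")
    return problems
```

For example, `windows_name_problems("Icon\r")` reports an invalid character, because the Return character is a control character.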
If you have Chinese chars in file names, the file names may get mangled when moving the files between Mac OS X and Windows, because a particular application, media, or file transfer protocol may not understand non-ASCII chars.
• Windows Vista (64 bits, SP2) zip utility does not handle Chinese. (right click, send to, Compressed (zipped) Folder) If your folder or file name has Chinese chars, Windows will complain and refuse to compress.
• Windows Console does not support Unicode out of the box. It prints Chinese chars as gibberish. This applies to any app using Windows Console, such as cmd.exe, PowerShell, Cygwin bash. (Windows Console may support Unicode, but if so, it involves some non-trivial setup. http://blogs.msdn.com/michkap/archive/2005/06/29/433669.aspx)
• Unison file sync tool does not handle Chinese names. (unison version 2.27.57) (Detail at The Complexity And Tedium of Software Engineering.)
• GNU Emacs's dired does not handle files with Chinese names. They show up as gibberish. (GNU Emacs 23.1.1 (i386-mingw-nt6.0.6002) of 2009-07-29 on SOFT-MJASON)
• Not sure if rsync supports Chinese fully. I think when using rsync on OS X to copy files from Mac to Windows (rsync version 2.6.9 protocol version 29), it works fine, but when using rsync on Windows (thru cygwin, rsync version 3.0.4 protocol version 30) to copy files from Windows to Mac, it has problems. Here's an example of its error message:
building file list ...
file has vanished: "/cygdrive/c/Users/xah/Documents/kacma pixra/prenu/200403_tony_relative/????.JPG"
file has vanished: "/cygdrive/c/Users/xah/Documents/kacma pixra/prenu/200403_tony_relative/???.JPG"
file has vanished: "/cygdrive/c/Users/xah/Documents/kacma pixra/prenu/200403_tony_relative/????.JPG"
Here are sample file names that cause this error:
OS X 10.5's Terminal app supports Chinese fully. (OS X 10.4.x doesn't though. Detail: OS X 10.4.11's Terminal app can display Chinese chars encoded in utf-8, e.g. “cat ‹text file›” where the text file is utf-8 encoded. However, if a file name has Chinese, it does not show up correctly when doing “ls”. (That's because file names in OS X are encoded in utf-16, since the file system is HFS+.) The Terminal app has an option under “Terminal‣Window Settings...‣Display‣Character Set Encoding”. However, the menu doesn't have utf-16 as a choice.)
Mac OS X 10.5's zip tool supports Chinese. (untested by me)
OS X 10.4.11, when connecting to Windows shares, supports Chinese.
Windows Vista (sp2), when connecting to Mac shares, supports Chinese.
Today, both Windows and Mac allow filenames to be a max of 255 chars. However, i'm not sure what the max length for a dir path is. At least, i know that as of 2000, the unix GNU tar util will break if you have a path that's longer than about 120 chars. So, if you are using tar to archive a directory, be careful if the file paths are long. (for detail, see bottom of: Unix, RFC, and Line Truncation.)
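Since the safe path length depends on the tool, one way to know where you stand is to list the longest paths in a folder before archiving or copying it. Below is a minimal Python sketch. The default limit of 260 is the classic Windows path limit and is only an assumption here, so pass whatever number suits your tool. The function name is made up.

```python
import os

def long_paths(root, limit=260):
    """Return (length, relative_path) pairs for entries whose path exceeds limit.

    Lengths are measured on the path relative to root, so remember that the
    destination folder's own path gets added on top after the transfer.
    """
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            if len(rel) > limit:
                hits.append((len(rel), rel))
    return sorted(hits, reverse=True)
```

For the tar problem described above, `long_paths(".", limit=120)` would show the paths at risk.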
Mac creates a .DS_Store file in each folder. You'd want to remove them if you are copying the folders to Windows. You can run the following commands in bash to remove them:
# list the .DS_Store files first, to see what will be deleted
find . -name ".DS_Store"
# then delete them
find . -name ".DS_Store" -exec rm {} \;
Note, Windows creates Thumbs.db files. Windows Vista no longer produces them in local folders, but when accessing non-Windows networked files, it does create one in the dir.
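If you'd rather do the cleanup in one place for both systems, here is a minimal Python sketch that handles .DS_Store and Thumbs.db together. The name `remove_junk` and the `dry_run` parameter are made up for this example. By default it only reports what it would delete.

```python
import os

JUNK_NAMES = {".DS_Store", "Thumbs.db"}

def remove_junk(root, dry_run=True):
    """Return paths of junk files under root; delete them unless dry_run is true."""
    matched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name in JUNK_NAMES:
                path = os.path.join(dirpath, name)
                matched.append(path)
                if not dry_run:
                    os.remove(path)
    return sorted(matched)
```

Call `remove_junk(".")` first to review the list, then `remove_junk(".", dry_run=False)` to actually delete.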
For Perl Scripts, See: Perl Scripts For Mac/Windows File Moving.
There are many file transfer methods and tools. You can copy the files thru a USB flash drive, or you can use the built-in file sharing in Mac OS X's Finder or Windows's Explorer, or you can use unix tools such as rsync or unison. Also, you might compress them using zip or tar gz before using any of the above transfer methods.
The issue of preserving filenames with non-ascii chars, unicode chars, or filename length, depends on the transfer method and the compression tool.
When files are shared thru Windows file sharing, it is done thru the SMB/CIFS protocol. I am not sure how that protocol handles file name transfer, but i do know that as late as 2006, the open source Samba software, which is part of Mac OS X for sharing files with Windows, would mangle Chinese chars in filenames.
The Unison file syncing utility also doesn't understand Unicode as of 2009-06. See: The Complexity And Tedium of Software Engineering.
Another method to transfer files is to zip them first, then pass the archive thru the network. ZIP itself has problems with non-ASCII characters. (As far as i know there are a few variants of zip, and some don't handle Unicode well. As late as OS X 10.4.x, when you have a downloaded zip file containing Chinese names, and you unzip it thru Finder's BOMArchiveHelper, the Chinese names will become gibberish. The solution is to use The Unarchiver, which is used in OS X 10.5.x to replace BOMArchiveHelper.)
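One way to see which kind of zip file you have is to look at the flag that marks entry names as UTF-8 (general-purpose bit 11 in the ZIP format). The Python sketch below lists entries with non-ASCII names that lack that flag. Those are the ones where the unzipping tool has to guess the encoding. The function name is made up, and this only inspects the archive without fixing anything.

```python
import zipfile

UTF8_NAME_FLAG = 0x800  # general-purpose bit 11: entry name is UTF-8

def names_not_marked_utf8(zip_file):
    """Return non-ASCII entry names in a zip archive that lack the UTF-8 flag.

    Python decodes unflagged names as cp437, so the names returned here are
    likely to look like gibberish already.
    """
    suspect = []
    with zipfile.ZipFile(zip_file) as zf:
        for info in zf.infolist():
            is_ascii = all(ord(c) < 128 for c in info.filename)
            if not is_ascii and not (info.flag_bits & UTF8_NAME_FLAG):
                suspect.append(info.filename)
    return suspect
```

An empty result does not prove every unzip tool will do the right thing, only that the archive itself declares its name encoding.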
Another way is to use tar or tar gz instead of zip. (Similarly, you can use any file compression scheme.) Tar had problems with full path length back in 2000. I am not sure how it deals with unusual chars, Unicode chars, or file name lengths today.
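A quick way to find out for a particular tar implementation is a round-trip test: write some awkward names into an archive, read them back, and compare. The sketch below does this in memory with Python's tarfile module using the pax format. Note that it exercises Python's implementation only, not GNU tar, and the function name is made up.

```python
import io
import tarfile

def tar_round_trip(names):
    """Write empty members with the given names to an in-memory tar, then read them back."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        for name in names:
            tar.addfile(tarfile.TarInfo(name=name))  # size 0, so no file data is needed
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return tar.getnames()
```

If `tar_round_trip(names) == names` holds for your longest and strangest names, that format at least can carry them. Whether the tar on the other machine reads them correctly still needs its own test.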
Another common way today is to copy the files to a USB flash drive, then copy them to another machine. How well this method preserves file integrity depends on the file system used on the USB drive. Most USB drives are pre-formatted with the FAT32 file system, which is an old file system. I think in practice there are variations and parameter differences that affect file transfer by it. Also see: Long filename.
Whatever your method or tool for file transfer, if you want to preserve really long file names or paths, or Unicode chars such as Chinese chars, you should test it out first. Also, the result may be different depending on whether you are moving from Mac to Windows or Windows to Mac.
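A simple test is to transfer a small sample folder and then compare the names on both sides. Here is a minimal Python sketch, assuming both the original folder and the transferred copy are reachable from one machine (for example thru a mounted share). The function names are made up.

```python
import os

def relative_names(root):
    """Return the set of relative paths (with / separators) of everything under root."""
    names = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            names.add(rel.replace(os.sep, "/"))
    return names

def compare_trees(source, copy):
    """Return (names only in source, names only in copy), each sorted."""
    src, dst = relative_names(source), relative_names(copy)
    return sorted(src - dst), sorted(dst - src)
```

A name that got mangled shows up twice: the original in the first list and the mangled form in the second.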
Another issue is whether file date, owner/group, permissions, etc, are preserved. This is important when you are dealing with software data. This again depends on the tool. In general, my experience is that you should expect these data to be lost.
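To see what a given tool actually loses, you can compare a file with its transferred copy. The Python sketch below checks modification time, permission bits, and size. The 2-second tolerance is an assumption, chosen because FAT file systems store modification times at 2-second resolution. The function name is made up, and owner/group are left out since they rarely mean the same thing across Mac and Windows.

```python
import os
import stat

def metadata_differences(original, copy, mtime_tolerance=2.0):
    """Return a list naming which of mtime, permissions, size differ between two files."""
    a, b = os.stat(original), os.stat(copy)
    diffs = []
    if abs(a.st_mtime - b.st_mtime) > mtime_tolerance:
        diffs.append("mtime")
    if stat.S_IMODE(a.st_mode) != stat.S_IMODE(b.st_mode):
        diffs.append("permissions")
    if a.st_size != b.st_size:
        diffs.append("size")
    return diffs
```

An empty list means the copy kept those three attributes. Anything else tells you what the transfer method dropped.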
See also:
Misc notes:
Mac version of AOL Instant Messenger saves its chat log's file names like this:
xahlee’s Logs IM “Gesutus”-2003-12.html
node60091’ Logs IM “rogerhoward@mac…”-2004.02.html
Note the curly single quote, curly double quotes, and the ellipsis char.
When rsync in cygwin on Windows Vista is used to sync files from Windows to Mac, and a file name on the Windows side contains single or double curly quotes, rsync will choke.
It'll also choke if the file name contains the ellipsis char “…”. Here's a sample error:
xah@xah-PC ~
$ Documents/scripts/sync_pc_mac.sh
Password:
building file list ... done
rsync: recv_generator: failed to stat "/Users/xah/Documents/tavla_vreji/AIM logs 2005/node60091's Logs/IM [rogerhoward@mac.]-2004.02.html": Invalid argument (22)
xah@xah-PC ~
$ rsync --version
rsync version 3.0.4 protocol version 30
Copyright (C) 1996-2008 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities:
64-bit files, 64-bit inums, 32-bit timestamps, 64-bit long ints,
socketpairs, hardlinks, symlinks, no IPv6, batchfiles, inplace,
append, ACLs, no xattrs, iconv, symtimes
rsync comes with ABSOLUTELY NO WARRANTY. This is free software, and you
are welcome to redistribute it under certain conditions. See the GNU
General Public Licence for details.
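One workaround, if you don't mind changing the names, is to rename such files to ASCII stand-ins before syncing. The Python sketch below only computes the new name. The replacement table is my own choice, not something rsync or cygwin prescribes, and the function name is made up. Curly double quotes map to a straight single quote because a straight double quote is not allowed in Windows file names.

```python
# Hypothetical mapping from troublesome characters to ASCII stand-ins.
ASCII_REPLACEMENTS = {
    "\u2018": "'",    # left single curly quote
    "\u2019": "'",    # right single curly quote
    "\u201c": "'",    # left double curly quote
    "\u201d": "'",    # right double curly quote
    "\u2026": "...",  # ellipsis
}

def ascii_safe_name(name):
    """Return name with curly quotes and ellipsis replaced by ASCII stand-ins."""
    return "".join(ASCII_REPLACEMENTS.get(ch, ch) for ch in name)
```

To apply it, walk the folder and call `os.rename` wherever `ascii_safe_name(name) != name`, after checking that the new name does not collide with an existing file.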