Wednesday, April 13, 2011

Cross-platform file name handling



The challenge

On many operating systems, file names are case sensitive. Windows, however, treats file names as case insensitive. The tool should preferably be cross-platform, which in my opinion means that we should be able to analyze:

  • code written on a Linux system with a tool running under Windows
  • code written on a Linux system with a tool running under Linux
  • code written on a Windows system with a tool running under Windows
  • code written on a Windows system with a tool running under Linux

The first option is not possible in the most general case. On Linux it is perfectly possible to have two different files in the same folder, includeme.h and IncludeMe.h. On Windows those two names would refer to the same file. Too bad, nothing we can do about it. The second option presents no problems.

The headache starts when trying to implement scenarios 3 and 4. Code written on Windows systems tends to be rather sloppy when it comes to specifying include files. The compiler certainly won't enforce the correct case for file names, so on Windows an include directive that spells a header as, for example, includeme.h means exactly the same as one that spells it IncludeMe.h.


The approach

To be able to handle all four scenarios transparently, I have opted to add a command-line option that explicitly tells the tool which case sensitivity mode to operate in. File names are wrapped in a FileNode object that always holds the case-sensitive file name (including the full path).

The FileNode object then exposes two methods: file_name, which returns the case-sensitive name, and analysis_name. In case-insensitive mode, analysis_name returns the all-lowercase version of the file name. In case-sensitive mode, it returns the case-sensitive file name, possibly truncated to a maximum length (no one wants to write queries involving a file name like /home/username/development/cpp/largeproject/library/sublibrary/longfilename.cpp, right?).
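A minimal sketch of what such a wrapper could look like. Only the names FileNode, file_name and analysis_name come from the description above. The constructor arguments, the default maximum length and the choice to keep the tail of the path when truncating are assumptions made for illustration, not the tool's actual code.

```python
class FileNode:
    """Illustrative wrapper around a file name (hypothetical sketch)."""

    def __init__(self, file_name, case_sensitive=True, max_length=40):
        # Always keep the original, case-sensitive name (including full path).
        self._file_name = file_name
        # Assumed parameters: in the real tool the mode comes from a
        # command-line option; max_length=40 is an arbitrary default.
        self._case_sensitive = case_sensitive
        self._max_length = max_length

    def file_name(self):
        """Return the case-sensitive name exactly as it was given."""
        return self._file_name

    def analysis_name(self):
        """Return the name used for analysis and queries."""
        if not self._case_sensitive:
            # Case-insensitive mode: fold everything to lowercase so that
            # IncludeMe.h and includeme.h map to the same analysis name.
            return self._file_name.lower()
        name = self._file_name
        if self._max_length is not None and len(name) > self._max_length:
            # Assumption: keep the tail, since the end of a path is
            # usually its most distinctive part.
            name = name[len(name) - self._max_length:]
        return name


windows_node = FileNode("C:/Src/IncludeMe.h", case_sensitive=False)
print(windows_node.file_name())      # original spelling is preserved
print(windows_node.analysis_name())  # lowercased for case-insensitive lookup
```

Keeping the original name next to the analysis name means the file can still be opened on a case-sensitive file system while lookups use the folded form.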



Apparently path.py only recognizes os.sep as the path separator: \ on Windows and / on Linux. This makes it a bit harder to analyze Windows software, where both the forward slash / and the backslash \ are valid path separators. This issue explains the kludgy lines at the top of the method FileNode.__pp(self, for_display), where path separators are replaced.
To make the tool work cross-platform as defined in this post, the allowed path separators are passed into the tool via a command-line option.
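As a rough illustration of that idea, the sketch below reads the allowed separators from the command line and rewrites them to a single canonical separator before any further path handling. The option names --path-separators and --case-insensitive and the helper normalize_separators are hypothetical; the real tool's options may be spelled differently.

```python
import argparse


def normalize_separators(file_name, separators, canonical="/"):
    """Replace every allowed separator in file_name by the canonical one."""
    for sep in separators:
        if sep != canonical:
            file_name = file_name.replace(sep, canonical)
    return file_name


def parse_args(argv):
    """Parse the (hypothetical) options that control cross-platform mode."""
    parser = argparse.ArgumentParser(description="cross-platform analysis sketch")
    parser.add_argument(
        "--path-separators",
        default="/",
        help="characters treated as path separators in the analyzed code",
    )
    parser.add_argument(
        "--case-insensitive",
        action="store_true",
        help="treat file names as case insensitive (code written on Windows)",
    )
    return parser.parse_args(argv)


# Analyzing Windows code: both / and \ count as separators.
args = parse_args(["--path-separators", "/\\", "--case-insensitive"])
print(normalize_separators("src\\lib/IncludeMe.h", args.path_separators))
```

Normalizing up front means the rest of the code only ever sees one separator, regardless of the platform the tool itself runs on.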
