Wednesday, June 23. 2010
Perl exec
Posted by Daniel Fischer in Compilers, Operating Systems at 08:34
Perl is a popular scripting language in the UNIX world, and there are several ports available for Windows. If your UNIX project uses perl, then porting that code to Windows is fairly painless. Perl doesn't attempt to be entirely portable, but most of the differences are quite obvious.
One that is apparently not obvious is perl's exec function. I've seen this happen to two different projects recently, in short succession. The symptom was that a process was started by a perl script, but no output from it was recorded on Windows. On UNIX, everything worked as expected.
This turns out to be not very complicated. On UNIX, exec is implemented in terms of the execl()/execv() family of functions. This means the new program replaces the perl process itself, and any process waiting for perl automatically waits for the new program instead. On Windows, there is no such thing. Windows perl implements exec by spawning a child process and returning control to the parent immediately, which then exits. Any process waiting for the parent regains control instantly and is left unaware of the child. A script written for UNIX will assume that the work is done before it has even started.
There is a simple workaround: use system (plus exit, if necessary) instead of exec. While the semantics are slightly different from exec, they are at least the same on both platforms; system spawns a new process and then waits for it, returning its wait status (the exit code is that value shifted right by eight bits). Notice that system has a list form too, so you don't need to worry about shell escaping either.
Friday, May 29. 2009
Bad register name "dil" (or "sil")
This is a piece of code from a project that runs on both the x86 and x86_64 architectures:
```c
static inline int swap_int(int *a, int b) {
    asm volatile ("xchg %0, %1" : "+r" (b) , "+m" (*a));
    return b;
}
```

It's fairly easy to see what it does: it swaps two values of type int. This code works perfectly fine on both architectures, provided that you're using a compiler that understands the asm statement, such as gcc. Later, this similar piece of code appears:

```c
static inline int swap_char(char *a, char b) {
    asm volatile ("xchg %0, %1" : "+r" (b) , "+m" (*a));
    return b;
}
```

This code still compiles and works just fine, for the most part. But when optimisation is turned on, you may get the error from the title, bad register name "dil" (or "sil"), when building with gcc for x86. Then you try the same on x86_64 and the error is gone again. No wonder: as opposed to x86, the x86_64 architecture actually has the %dil register.
At first, this appears to be a compiler bug. After all, the compiler is choosing a register that doesn't exist on x86. On closer look, it's a bug in the example code. The issue is that for the b argument, the constraint r is used, indicating that the value should be stored in any general-purpose register. In the first example, this is just fine. All of them will do for 32-bit operations. The second example, on closer examination, actually requires a register whose lower byte is accessible. On x86, there are only four general-purpose registers where this is true: EAX, EBX, ECX and EDX. On x86_64, this is also true for the ESI and EDI registers that are also treated as general-purpose registers on x86.
So what happens is that the compiler correctly chooses the %edi register, which satisfies the r constraint. Later, the xchg instruction is interpreted as referring to two byte-sized values due to the size of the arguments *a and b. Thus, the compiler translates the instruction to its 8-bit form and replaces the register placeholder %0 with the 8-bit form of the %edi register, which is %dil. During assembly, this fails because %dil doesn't actually exist on x86.
If there is a compiler bug, it is only that the error output is misleading. It shouldn't even try to use %dil; it should warn about the real problem. The real bug is that in the example source, a byte-sized argument was qualified with a constraint that allowed any general-purpose register to be used, where instead, the set should be constrained to registers whose lower byte is available. In gcc, this can be achieved by using the q constraint instead of r.
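For illustration, here is a minimal sketch of swap_char with the q constraint applied. It is not taken from the project in question: the `__asm__` spelling (so that it also builds in strict ISO C modes), the assert, and the plain C fallback for non-x86 targets are my own additions.

```c
#include <assert.h>

/*
 * Exchange the byte stored at *a with b and return the old value of *a.
 * The "q" constraint restricts gcc to registers whose low byte can be
 * addressed on 32-bit x86 (a, b, c, d), so %dil or %sil is never chosen
 * there. On x86_64, "q" permits any integer register.
 */
static inline int swap_char(char *a, char b) {
    assert(a != 0);
#if defined(__i386__) || defined(__x86_64__)
    __asm__ volatile ("xchg %0, %1" : "+q" (b), "+m" (*a));
#else
    /* Assumption: fallback so the sketch builds elsewhere; not atomic. */
    char old = *a;
    *a = b;
    b = old;
#endif
    return b;
}
```

Calling swap_char(&c, 'y') with c holding 'x' should return 'x' and leave 'y' in c, with or without optimisation.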
Tuesday, March 24. 2009
SO_SNDTIMEO and SO_RCVTIMEO
Posted by Daniel Fischer in Operating Systems at 13:30
Implementations of the BSD socket interface support various socket options. Two of them are SO_SNDTIMEO and SO_RCVTIMEO. They allow the user to specify a timeout for otherwise blocking send() and recv() calls. They're often described as the two socket options whose implementations differ the most, and therefore as among the least portable ones of all.
It's not quite that bad, but among the major UNIX and Unix-like operating systems, there are at least three different behaviours for stream sockets, ignoring that the behaviour for datagram sockets may yet be different.
Most of the operating systems I tested this on actually support these timeouts. Mac OS X, Linux, FreeBSD and AIX were all among the platforms where I was able to use these timeouts in a simple test, using reasonably recent versions of these operating systems. This means that the OSes accepted setting the option and the timeout that was configured did actually work. The BSD socket implementation that is used in Microsoft Windows also falls into this category; the timeouts work reliably there.
One notable exception was Solaris, which reported that the protocol did not support this option. This means that the setsockopt() call failed. Since this is detectable, it's not a problem; there are other ways to implement timeouts without support from the socket layer.
The other notable exception was HP-UX (tested with 11iv3 and 11iv1). On HP-UX, you can get the same behaviour as on Solaris if you use the UNIX03 socket library, meaning setsockopt() fails with ENOPROTOOPT. However, if you're using the BSD socket library that is also provided with HP-UX, then you are allowed to set the timeouts, but they are silently ignored. Even worse, the system will remember the timeout you set, and querying it with getsockopt() will return it. This means you can't verify that the timeout is available by querying it after setting it and comparing it to the value you set it to.
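A simple test of the kind described above could look roughly like the following sketch. It is not the test I originally ran: the function name probe_recv_timeout is made up, it uses the POSIX struct timeval form of the option, and it probes an AF_UNIX stream socket pair for convenience, which is an assumption that may not carry over to TCP sockets on every platform.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical probe: set SO_RCVTIMEO on an idle stream socket and see
 * whether recv() really gives up.
 * Returns  1 if the timeout was accepted and recv() timed out,
 *          0 if setsockopt() rejected the option (e.g. ENOPROTOOPT),
 *         -1 on any other failure.
 */
static int probe_recv_timeout(long milliseconds) {
    int sv[2];
    int result;
    struct timeval tv;
    char byte;

    assert(milliseconds > 0);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    tv.tv_sec = milliseconds / 1000;
    tv.tv_usec = (milliseconds % 1000) * 1000;

    if (setsockopt(sv[0], SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) == -1) {
        result = 0;                     /* detectable: option rejected */
    } else {
        /* Nothing is ever written to sv[1], so only a timeout can end this. */
        ssize_t n = recv(sv[0], &byte, 1, 0);
        if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
            result = 1;
        else
            result = -1;
    }
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

Note that on a platform that silently ignores the timeout, this probe blocks forever in recv(), so it only makes sense under an external watchdog. It deliberately does not read the value back with getsockopt().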
In summary, on most UNIX and Unix-like operating systems, and also on Windows, you can expect these timeouts to either work or cause an error when you try to set them. If you are really concerned about portability, though, you need a backup plan, and either a whitelist of platforms where the backup plan is not necessary, or a very short blacklist that mostly consists of HP-UX.
Friday, January 18. 2008
Version Number Formatting
Posted by Daniel Fischer in Operating Systems at 12:01
One thing that's biting me currently is imposed version number formatting. The product I'm working on uses the traditional Major-Minor-Patch format, that is, there are three numerical components in a version number. Additionally, we add a suffix for special builds based on a previous release, such as a single letter for a hotfix build, or a service pack identifier, or a support ticket identifier.
Getting software certified for Windows Vista necessitates including a manifest with the software that, among other information, contains the name of the executable, the version number, security settings, and a signature. The version number consists of four components: major, minor, build and revision.
The version number goes into an XML attribute, so one programmer apparently forgot to double-check and put in our regular three-component version number. Needless to say, we figured that out fairly fast, since Windows 2003 will refuse to run applications with embedded manifests that contain errors. Interestingly, the tool that actually embeds the manifest doesn't complain at all. And the error that you get from Windows is not very transparent - you have to look into the system event log to get a clue, and unless you're on Vista, the clue you're getting is only that there's something wrong with your manifest.
Anyway, we just added a zero as the fourth component and all was well. Over time, we started to use a 1 instead of a 0 as the fourth component for a revision based on a previous release. Some time later, we figured out that we actually have multiple different types of revisions. So now, the fourth component is calculated from all of them (or, all of those that we thought of, so far). It's starting to look like a bit field.
Of course, the "revision" field is completely meaningless for comparison purposes: revision 100 is based on completely different changes to the source code than revision 1. Granted, you can still tell from revision 101 that it includes both sets of updates, but at this point I'm just glad that our other platforms aren't as limiting.
Friday, December 28. 2007
O_DIRECT
Posted by Daniel Fischer in Operating Systems at 11:03
Defined tags for this entry: linux
On UNIXoid operating systems, you can open(2) a file in many modes. On some operating systems, one of them is O_DIRECT, which stands for direct i/o without any caching. To sum it all up:
The whole notion of "direct IO" is totally braindamaged. Just say no.
--Linus Torvalds
Accessing files that were opened with O_DIRECT requires aligned buffers for reading and writing. For example, on Linux 2.6, all buffers must be aligned to 512 bytes and reads and writes can only happen in multiples of 512 bytes. It's fairly easy to align your data to 512 bytes, though.
On Linux 2.4, on the other hand, buffers have to be aligned to multiples of the underlying file system's logical block size - which is generally much larger than 512 bytes. The size also has to be a multiple of the block size. It's not so easy to solve this generally. It's also often forgotten because nobody wants to use Linux 2.4 anymore, at least not for development work.
We ran into this problem at least three times in three different places in 2007. You can tell that I work for a database company.
Saturday, December 22. 2007
socklen_t confusion
Posted by Daniel Fischer in Operating Systems at 14:22
The BSD socket API (accept, bind, and so on) uses a struct sockaddr to pass socket addresses. Additionally, there's a parameter for passing the size of the memory block allocated for a struct sockaddr to the socket functions. This argument is passed as a pointer to the actual size, and upon completion of the API call, will contain the actual length of the address stored in the memory block instead of its size. In the original BSD API, this argument was of type int *.
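To make the in/out nature of that length argument concrete, here is a small sketch using getsockname(). The helper name local_address is hypothetical, and the sketch declares the length as socklen_t, the type current POSIX specifies; with the original BSD API the variable would be a plain int.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: bind a fresh TCP socket to an ephemeral port and
 * fetch its local address. On return, *len holds the length of the address
 * that was stored, not the size of the buffer.
 * Returns 0 on success, -1 on failure.
 */
static int local_address(struct sockaddr_storage *addr, socklen_t *len) {
    struct sockaddr_in any;
    int rc;
    int fd;

    assert(addr != NULL && len != NULL);
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    memset(&any, 0, sizeof any);
    any.sin_family = AF_INET;
    any.sin_addr.s_addr = htonl(INADDR_ANY);
    any.sin_port = 0;                   /* let the system pick a port */

    rc = bind(fd, (struct sockaddr *)&any, sizeof any);
    if (rc == 0) {
        *len = sizeof *addr;            /* in: size of the memory block */
        rc = getsockname(fd, (struct sockaddr *)addr, len);
        /* out: *len is now the length of the stored address */
    }
    close(fd);
    return rc;
}
```

The buffer passed in is a struct sockaddr_storage, but for an AF_INET socket the length that comes back should be the smaller sizeof(struct sockaddr_in).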
At one point, a draft of the POSIX.1g standard defined this to be size_t *. This was a bit broken because size_t is usually not the same type as int on 64-bit platforms, the re-definition thus resulting in an unintended and incompatible change. The short of it is that people complained, and the type was changed again. Instead of reverting to int *, however, a new type socklen_t was introduced. This new type is defined to be the same as int on most platforms. As Linus Torvalds puts it,
_Any_ sane library _must_ have "socklen_t" be the same size as int. Anything else breaks any BSD socket layer stuff. POSIX initially did make it a size_t, and I (and hopefully others, but obviously not too many) complained to them very loudly indeed. Making it a size_t is completely broken, exactly because size_t very seldom is the same size as "int" on 64-bit architectures, for example. And it has to be the same size as "int" because that's what the BSD socket interface is.
(Quote taken from man 2 accept on Linux.)
So for a small period of time, operating system vendors were preparing for POSIX.1g as it was known back then and started to use size_t instead of int. As the draft was changed to use socklen_t, and later SUSv2 included socklen_t, they mostly started to use socklen_t and defined it to int. Some defined it to be the same type as size_t. One example that is mentioned in Linux's man 2 accept is SunOS 5. This includes the current release of Solaris, being based on SunOS 5.10. However, that's not a big portability issue. It's broken as described above, but code that uses socklen_t in all places should be just fine.
On HP-UX, you get the worst of both worlds. On the one hand, you can use the old BSD API with pointers to int. On the other hand, you can define _XOPEN_SOURCE_EXTENDED and get the new API with socklen_t. However, if you don't define _XOPEN_SOURCE_EXTENDED, you still get a definition of socklen_t as size_t. The type exists but is completely worthless as it can't be used with the socket API, which expects int. I've actually seen code that fell over this in both possible ways, using int * in one place and socklen_t * in another...
Wednesday, December 19. 2007
UNIX domain sockets
Posted by Daniel Fischer in Operating Systems at 12:05
Unlike sockets in other domains, sockets in the UNIX domain are visible in the file system. Because of this, they're sometimes confused with regular files. When they're not being confused with regular files, their specific restrictions that don't apply to other files are still easily forgotten. One such restriction of a UNIX domain socket is the length of its name.
Path names can be rather long on UNIX-like systems these days. Gone are the days of file names limited to 14 characters, and for path names, POSIX-compliant operating systems generally support up to 256 characters. On many platforms, path names can be even longer than that. For example, PATH_MAX is 1024 on Mac OS X and 4096 on GNU/Linux.
In contrast, the full path name of a UNIX domain socket must fit into a struct sockaddr_un. Its component sun_path generally has much less room for the socket's name than 1024 characters. Typical numbers are 108 (Linux, Solaris, Cygwin), 104 (AIX, BSDs, Mac OS X), and 92 (HP-UX). At some point, it was only 14 in Interix 5.2. This means that, while UNIX domain sockets do appear as if they were files, they can't be placed in an arbitrary location in the file system.
Now, imagine an automated testing process that can test multiple instances of the software at a time, and keeps all files relevant to one test run within one directory. Path names can easily become longer than 92 characters in a scenario like this. In one case, this happened in a system where the name of one such instance's directory didn't have a constant length and occasionally brought the complete path to more than 92 characters, causing random total failures on HP-UX.
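The length restriction is easy to check before calling bind() or connect(). The following is only a sketch of such a check: make_unix_address is a made-up helper name, and reporting the problem as ENAMETOOLONG is my own choice rather than something the socket API prescribes.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Hypothetical helper: fill in addr for the given socket path.
 * Returns 0 on success, or -1 with errno set to ENAMETOOLONG if the path,
 * including its terminating NUL, does not fit into sun_path.
 */
static int make_unix_address(struct sockaddr_un *addr, const char *path) {
    size_t length;

    assert(addr != NULL && path != NULL);
    length = strlen(path);
    if (length >= sizeof addr->sun_path) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memset(addr, 0, sizeof *addr);
    addr->sun_family = AF_UNIX;
    memcpy(addr->sun_path, path, length + 1);   /* copies the NUL as well */
    return 0;
}
```

Because the check uses sizeof on sun_path instead of a hard-coded number, the same code rejects a path at 108, 104 or 92 characters depending on the platform it is built on.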
Blog posts are licensed under a Creative Commons Attribution-Share Alike 2.0 Germany License.