
Interesting internationalization

Posted on September 18, 2008

I’ve been having a rather enjoyable discussion with Lorenzo Bettini, the author of GNU Gengetopt and Gengen. If you are not familiar with Gengetopt, it’s a tool that generates the code a program needs to accept command-line options. I talked about Gengetopt a while back on this blog.

The discussion centers on support for internationalization, which simply means a program printing its output in whatever language your system is configured to use. A very popular way to accomplish this is the GNU gettext library, or its counterparts on other operating systems. Disclaimer: if you do not care about C standards and common UNIX programming, this post will probably bore you to tears. However, you’re welcome to read on, if only to discover how much I value life in boring geeky flavors :)

Recently, while working on HelenOS, I found the need to implement getopt. I’m working on something a lot like BusyBox, but written from scratch. Rather than reinvent the getopt wheel yet again, I dove into the source code of GNU getopt as well as the various *BSD implementations to find the one that was easiest to port. One thing that stood out to me was that most implementations declared their error strings and formats as static English strings.

A week later, a proposal to add gettext support came across the gengetopt mailing list. I voted against including it because the resulting output would be mixed at best: error strings in a basic C89/C99 libc are hard-coded, usually in English. The getopt family of functions is an interesting corner case, because most core libc functions never print anything; they just return some value and set the global errno to tell the caller what went wrong. It’s then up to the caller to decide whether to use something like gettext to display that information in the appropriate language.
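The errno convention described above looks roughly like this in practice. This is a sketch with hypothetical helper names (parse_long and report_parse_error): the parsing routine prints nothing and only returns an errno-style code, and the caller alone picks the wording. EINVAL is a POSIX errno value rather than an ISO C one, so this assumes a UNIX-like system.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Like most core libc routines, this prints nothing: it returns 0 on
 * success or an errno-style code on failure.
 */
int parse_long(const char *text, long *out)
{
    char *end;

    errno = 0;
    long value = strtol(text, &end, 10);
    if (errno != 0)
        return errno;   /* e.g. ERANGE on overflow */
    if (end == text || *end != '\0')
        return EINVAL;  /* empty string or trailing junk */
    *out = value;
    return 0;
}

/* The caller chooses how to report; strerror() is just one option. */
void report_parse_error(const char *text, int err)
{
    fprintf(stderr, "cannot parse '%s': %s\n", text, strerror(err));
}
```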

This got me thinking: if we’re making a truly modern operating system based on C89/C99, internationalization has to be easy to leverage. Talking it over with Lorenzo brought some interesting ideas to light.

Gettext is a great tool; however, it introduces dependencies on shared objects (or requires static linking), and simply parsing command-line arguments should not introduce additional dependencies. Additionally, a standard libc should not favor one internationalization library over another, so it’s just not practical to build internationalization into libc itself. However, the parts of libc that print strings can remain friendly to internationalization by requiring one tiny additional step: have the programmer register a callback function that receives all output, instead of having libc print to the error file descriptor directly.
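A minimal sketch of the library side of this idea follows. Every name in it (report_fn, lib_register_callback, lib_check_option, count_errors) is hypothetical rather than part of any existing libc; it simply shows a library routine handing its message to a registered function pointer, with a built-in fallback that prints the English string as-is.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type: a severity level plus the message text. */
typedef void (*report_fn)(int level, const char *msg);

enum { REPORT_INFO = 0, REPORT_ERROR = 1 };

/* Fallback used until a program registers its own handler. */
static void report_default(int level, const char *msg)
{
    fprintf(level == REPORT_ERROR ? stderr : stdout, "%s\n", msg);
}

static report_fn report_cb = report_default;

/* Passing NULL restores the default, so existing programs keep working. */
void lib_register_callback(report_fn cb)
{
    report_cb = (cb != NULL) ? cb : report_default;
}

/* A library routine: it hands its message to the callback, never printing. */
int lib_check_option(char opt, const char *known)
{
    if (opt == '\0' || strchr(known, opt) == NULL) {
        report_cb(REPORT_ERROR, "unrecognized option");
        return -1;
    }
    return 0;
}

/* Example program-side handler: counts errors instead of printing them. */
static int error_count;

void count_errors(int level, const char *msg)
{
    (void)msg;
    if (level == REPORT_ERROR)
        error_count++;
}
```

Registering count_errors, for example, makes lib_check_option('x', "fo") return -1 and bump error_count without writing anything to stderr.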

For instance, a modern getopt might look like this:

```c
#include <stdio.h>
#include <getopt.h>

/* Receives every message that getopt would otherwise print itself. */
void getopt_callback(int level, char *format, int line)
{
    switch (level) {
    case GETOPT_EOK:
        fprintf(stdout, "line %d of getopt.c says %s", line, format);
        break;
    case GETOPT_EFAIL:
        fprintf(stderr, "line %d of getopt.c says %s", line, format);
        break;
    }
    return;
}

int main(int argc, char **argv)
{
    int c, opt_ind;

    /* Proposed API: hand all getopt output to our callback. */
    getopt_register_callback(getopt_callback);

    for (c = 0, optind = 0, opt_ind = 0; c != -1;) {
        c = getopt_long(argc, argv, "f:o:O", long_options, &opt_ind);
        ......
```

As we can see, the getopt implementation does not actually print anything; it leaves that to the callback function. One could very easily use internationalization tools within the callback, if desired. The same approach applies, more or less, to anything in core libc that prints output. The callback would probably be better declared through a typedef, something like getopt_callback_t.
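To illustrate what using internationalization tools within the callback could look like, here is a sketch in which a tiny hard-coded German catalog stands in for gettext or a similar library. The getopt_callback_t typedef, the catalog, and translating_callback are all assumptions made for this example; a message missing from the catalog simply falls through in English.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical typedef for the callback proposed above. */
typedef void (*getopt_callback_t)(int level, const char *msg, int line);

/* A toy message catalog standing in for gettext or similar (assumption). */
struct translation {
    const char *english;
    const char *german;
};

static const struct translation catalog[] = {
    { "unrecognized option", "unbekannte Option" },
    { "option requires an argument", "Option erfordert ein Argument" },
};

/* Look up msg; fall back to the original English if it is not listed. */
static const char *translate(const char *msg)
{
    size_t i;

    for (i = 0; i < sizeof catalog / sizeof catalog[0]; i++)
        if (strcmp(catalog[i].english, msg) == 0)
            return catalog[i].german;
    return msg;
}

/* Last message shown, kept only so the behaviour is easy to check. */
static char last_shown[128];

/* Translates before printing; getopt itself never prints anything. */
void translating_callback(int level, const char *msg, int line)
{
    (void)line;
    snprintf(last_shown, sizeof last_shown, "%s", translate(msg));
    fprintf(level ? stderr : stdout, "%s\n", last_shown);
}
```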

This seems to be the most favorable approach. Of course, getopt would ship its own static callback that prints the strings as-is if the calling program does not register one, so existing programs keep working unchanged.

After I finish some other tasks, I am going to take a stroll through the generic userspace libc in HelenOS to see whether getopt is the only place where this should be implemented. At the very least, I’ll implement this approach to getopt there so that programs can be fully internationalized.

This also shows the virtue of not calling printf() directly when you can avoid it. Routing warnings and messages through your own functions lets someone else (or you) go back later and make only a few minor changes for your program to support any language.
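As a sketch of that habit, the code below routes every warning through one hypothetical function, app_warn(), which passes the format string through a swappable translate_hook before formatting it. The identity hook keeps today’s English output; to_german is a stand-in for a real catalog lookup, and all of these names are invented for illustration.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Default hook: no translation, so output stays exactly as before. */
static const char *no_translation(const char *s)
{
    return s;
}

/* Hypothetical hook a program could later point at a real catalog. */
static const char *(*translate_hook)(const char *) = no_translation;

/* Example alternative hook with a single hard-coded German entry. */
const char *to_german(const char *s)
{
    return strcmp(s, "file %s not found") == 0 ? "Datei %s nicht gefunden" : s;
}

/* Last message written, kept only so the behaviour is easy to inspect. */
static char last_warning[256];

/* Every warning in the program goes through here instead of printf(). */
void app_warn(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(last_warning, sizeof last_warning, translate_hook(fmt), ap);
    va_end(ap);
    fprintf(stderr, "%s\n", last_warning);
}
```

Because only the format string is translated, each translation must keep the same conversion specifiers in the same order as the original.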

Working on a new OS that aims to be fully modern presents interesting opportunities to pick things apart. Hopefully, this didn’t put you to sleep :)

