This is the old RCRchive. It's available only for reading and reference. To submit RCRs for Ruby 1.9/2.0, and to participate in discussions about them, please visit the new RCRchive.

RCR 345: Make Ruby 2 standard libraries use Symbols in place of C-sty

Submitted by Tomasz Wegrzanowski (Sun Aug 20 03:31:30 UTC 2006)

Abstract

In C most libraries use #define'd integers to emulate symbols. This is very inconvenient and error-prone. Ruby has real Symbols, so there is no reason to emulate C-style pseudosymbols. The standard library should use "real" symbols instead.

Problem

The C programming language has no way of passing "tags" around. The most common workaround is a header containing a set of definitions of the form:

#define NAMESPACE_SYMBOL number

NAMESPACE_SYMBOL is then used wherever a tag needs to be passed.

Currently the Ruby standard library uses C-style pseudo-symbols all over the place. For example, to create a socket one has to say:

socket = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM, 0)

Instead of more Ruby-like:

socket = Socket.new(:inet, :stream, 0)

This is highly verbose - one needs to remember many multi-level namespaces (here Socket::SOCK_* and Socket::AF_*) and also highly error-prone. It is very easy to make a mistake like:

socket = Socket.new(Socket::SOCK_STREAM, Socket::AF_INET, 0)

and it doesn't even raise an error; it just creates the wrong kind of socket, which is very hard to debug.

We could make Socket::SOCK_STREAM evaluate to a Socket::SOCKEnum object (some Ruby libraries do something like that), but this would be ugly. It's much easier to simply use Symbols.
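To see why the swap goes unnoticed, here is a minimal sketch. It only inspects the constants and does not open a socket. The variable names are illustrative.

```ruby
require 'socket'

# Both pseudo-symbols are ordinary Integers, so nothing distinguishes
# an address family from a socket type at the call site.
family = Socket::AF_INET
type   = Socket::SOCK_STREAM

correct_args = [family, type, 0]
swapped_args = [type, family, 0]

# Both argument lists have the same shape: three Integers.
same_shape = correct_args.map(&:class) == swapped_args.map(&:class)
puts same_shape
```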

Proposal

The proposal has two parts (and a third, optional part). First, make functions like Socket.new accept Symbols instead of Integers, and resolve them at runtime.
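A minimal sketch of how such runtime resolution might look. The helpers resolve_socket_constant and socket_args are hypothetical names, not part of Ruby's socket library, and the prefix-plus-upcase rule is only one possible naming convention.

```ruby
require 'socket'

# Hypothetical helper: map a symbol such as :inet to Socket::AF_INET
# by prepending the C prefix and upcasing the name.
def resolve_socket_constant(sym, prefix)
  name = "#{prefix}#{sym.to_s.upcase}"
  unless Socket.const_defined?(name)
    raise ArgumentError, "unknown #{prefix}* symbol: #{sym.inspect}"
  end
  Socket.const_get(name)
end

# Hypothetical helper: build the integer triple the underlying call needs.
def socket_args(family, type, protocol = 0)
  [resolve_socket_constant(family, "AF_"),
   resolve_socket_constant(type, "SOCK_"),
   protocol]
end

args = socket_args(:inet, :stream)
puts args == [Socket::AF_INET, Socket::SOCK_STREAM, 0]

# Swapped arguments now fail loudly, because there is no AF_STREAM.
begin
  socket_args(:stream, :inet)
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

Because each argument position knows its own prefix, the swapped call from the Problem section is rejected instead of silently creating a different socket.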

Second, make all existing pseudo-symbols evaluate to real symbols. So Socket::SOCK_STREAM will evaluate to :stream etc. This way most programs will continue to work.
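As a rough illustration of this second part, the sketch below uses a hypothetical stand-in module, SketchSocket, rather than the real Socket class. The constant values and the check method are assumptions made for the example.

```ruby
# Hypothetical stand-in for a Ruby 2 Socket; not the real class.
module SketchSocket
  # Old-style names now evaluate to real Symbols.
  AF_INET     = :inet
  AF_UNIX     = :unix
  SOCK_STREAM = :stream
  SOCK_DGRAM  = :dgram

  FAMILIES = [AF_INET, AF_UNIX].freeze
  TYPES    = [SOCK_STREAM, SOCK_DGRAM].freeze

  # Validate each argument against the set allowed in its position.
  def self.check(family, type)
    unless FAMILIES.include?(family)
      raise ArgumentError, "not an address family: #{family.inspect}"
    end
    unless TYPES.include?(type)
      raise ArgumentError, "not a socket type: #{type.inspect}"
    end
    [family, type]
  end
end

# Old-style and new-style calls resolve to the same values.
old_style = SketchSocket.check(SketchSocket::AF_INET, SketchSocket::SOCK_STREAM)
new_style = SketchSocket.check(:inet, :stream)
puts old_style == new_style
```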

Third, and optional, is to define Symbol#| to return some object that can later be converted to a number in the right way. It should probably not be an Array, because Array#| already does something, and :a|:b|:c would mean [:a,:b]|:c, which would be bad.
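One possible shape for that object, as a sketch only. The class name SymbolFlags and its to_i(namespace) method are assumptions; the RCR does not specify them. Symbol is patched globally here purely for illustration.

```ruby
# Hypothetical flag-set object; the RCR does not fix its name or API.
class SymbolFlags
  attr_reader :names

  def initialize(names)
    @names = names.uniq
  end

  # Combining returns another SymbolFlags, so :a | :b | :c stays flat
  # instead of turning into an Array set-union.
  def |(other)
    extra = other.is_a?(SymbolFlags) ? other.names : [other]
    SymbolFlags.new(@names + extra)
  end

  # Resolve each symbol against a module of integer constants and OR them.
  def to_i(namespace)
    @names.inject(0) do |bits, name|
      bits | namespace.const_get(name.to_s.upcase)
    end
  end
end

# Illustration only: Symbol has no | method by default.
class Symbol
  def |(other)
    SymbolFlags.new([self]) | other
  end
end

flags = :creat | :excl | :wronly
puts flags.names.inspect
puts flags.to_i(File::Constants) == (File::CREAT | File::EXCL | File::WRONLY)
```

The conversion to a number is deferred until the receiving function supplies a namespace, which is the "converted later" behaviour described above.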

Maybe we should let such functions accept integers too, so people can pass nonstandard pseudosymbols. This would be very rarely used of course.

Analysis

The change will make the Ruby standard library significantly easier and less error-prone to use.

This proposal is mostly backwards-compatible, but there are a few cases where it is not.

Case 1 - if the fact that pseudosymbols are integers is actually used in a program. The most common case is probably flags, like open("file", Fcntl::O_CREAT|Fcntl::O_EXCL). We can make it backwards-compatible by defining Symbol#| or accept the incompatibility. I'm not sure whether Symbol#| is a good idea or not. If we added Symbol#|, the backwards compatibility would be pretty much complete in this case.

Case 2 - we don't really know what namespace to use. Most calls like Socket.new know how to convert each of their arguments. Some, however, don't, like ioctl(2), and they'd need more complex solutions.

It should not be slower in most cases.

Both:

 socket = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM, 0)

and

 socket = Socket.new(:inet, :stream, 0)

have to convert symbols to numbers at run-time. The only difference is that one does it before the call and the other after the call. Unless Ruby does some sort of optimization here, that is, but both cases seem to be equally optimizable.

Because of the slight backwards incompatibility, it's best to do the switch when moving to Ruby 2.

Implementation

Implementation should be relatively straightforward.

Comments
How can I supersede? This supersedes my RCR 343, which supersedes 178. Mine is a bit different because you can pass a Fixnum instead of a Symbol, which makes it backward compatible. The implementation is not so easy. The easy part is adding the methods from my RCR. The harder part is browsing the source and replacing all occurrences of constants with rb_const_get.


Forgot my signature again. Ondrej Bilka


I have an impression that it should be fairly simple to code it. If most of us agree that this is a good idea, I can try doing some implementation. But it's better to ask first and code later, especially if the coding has to be done in C :-)

The number of RCRs aimed at this problem is really huge, so the feeling that something ought to be done about it seems quite prevalent. I think this RCR is the most backward-compatible and most elegant so far, but well - I'm certainly biased here, so I'd like to hear your views too :-)


Translating C code to

  socket = Socket.new(Socket::SOCK_STREAM, Socket::AF_INET, 0)

is pretty straightforward, whereas

 socket = Socket.new(:stream, :inet, 0)

is not. One might be unsure whether to choose :SOCK_STREAM or :stream or :STREAM or something totally different. I am not refusing this idea, but there must be some specific rule for converting C code to Ruby symbolic code.

matz.


As far as I can tell, C symbols in almost all libraries are all-uppercase, and even when they use mixed case, it is almost unheard of for different cases to mean different things. Grepping /usr/include on an Ubuntu system (33520 different symbols that are #defined as numbers), I've found only one such case, in X11/keysymdef.h, where Unicode characters are #define'd and case matters:

 #define XK_Egrave 0x00c8 /* U+00C8 LATIN CAPITAL LETTER E WITH GRAVE */
 #define XK_egrave 0x00e8 /* U+00E8 LATIN SMALL LETTER E WITH GRAVE */
etc.
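A survey like this could be approximated with a short script. The sketch below runs on an inline sample rather than on /usr/include. The sample text, the regular expression, and the method names are assumptions, not the commands actually used.

```ruby
# Sample header text standing in for real files under /usr/include.
header = <<~HEADER
  #define AF_INET 2
  #define SOCK_STREAM 1
  #define XK_Egrave 0x00c8
  #define XK_egrave 0x00e8
  #define VERSION "1.0"
HEADER

# Collect names that are #defined as decimal or hexadecimal numbers.
def numeric_defines(text)
  text.scan(/^\s*#\s*define\s+([A-Za-z_]\w*)\s+(?:0[xX]\h+|\d+)\b/).flatten
end

# Group names that differ only by letter case.
def case_collisions(names)
  names.uniq.group_by(&:downcase).values.select { |group| group.size > 1 }
end

names = numeric_defines(header)
collisions = case_collisions(names)
puts names.size
puts collisions.inspect
```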

So we can simply select any rules for character case. I propose :all_lower_case_with_underscores as this would be most consistent with the rest of Ruby.

So the only problem left is whether to use :stream or :sock_stream, as about 67% of pseudosymbols have two or more underscores. I think that most of the time it should be obvious what "the right thing" to do is (usually cutting the first part, or the first two parts if they always go together, as when XML_SCHEMAS_ELEM_DEFAULT becomes :elem_default). Even in cases where it's not, it's often better than the current situation. If the RCR gets accepted, the choice will be between :o_rdonly and :rdonly. Now one must guess whether it's Socket::O_RDONLY or IO::O_RDONLY or File::O_RDONLY or Fcntl::O_RDONLY or something else.

There is one more problem - some C symbols have a number just after an underscore. As many as 7.5% have a number following some underscore, but most of them are safe because the number is somewhere further along in the symbol, as in GL_DOT_PRODUCT_TEXTURE_1D_NV. However, 1.4% have a number following the first underscore, and in Ruby they'd need to be translated to something like :"3d_color_texture". This isn't perfect, but we can probably live with it.
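The conversion rule discussed above could be sketched as follows. The method name c_name_to_symbol and its prefix_parts parameter are hypothetical. How many leading parts to drop would still have to be decided per library.

```ruby
# Hypothetical converter: drop the first prefix_parts underscore-separated
# parts of a C name and lower-case the rest.
def c_name_to_symbol(c_name, prefix_parts = 1)
  parts = c_name.split("_")
  if parts.size <= prefix_parts
    raise ArgumentError, "nothing left after the prefix in #{c_name}"
  end
  parts.drop(prefix_parts).join("_").downcase.to_sym
end

puts c_name_to_symbol("SOCK_STREAM").inspect                  # :stream
puts c_name_to_symbol("O_RDONLY").inspect                     # :rdonly
puts c_name_to_symbol("XML_SCHEMAS_ELEM_DEFAULT", 2).inspect  # :elem_default
puts c_name_to_symbol("GL_3D_COLOR_TEXTURE").inspect          # :"3d_color_texture"
```

The last line shows the quoting that becomes necessary when the remaining name starts with a digit.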

-- Tomasz Wegrzanowski


[You wrote]"I have an impression that it should be fairly simple to code it. If most of us agree that this is a good idea, I can try doing some implementation. But it's better to ask first and code later, especially if the coding has to be done in C :-)"

Actually the implementation is part of the RCR process. That makes it easier for people to try it out and decide what they think of it. Sometimes an implementation can't be written (as when someone suggests a fundamental syntax change), but in general the approach should be: ask and code at the same time, not ask first and code later.


I submitted my implementation at [ruby-core:08850]. What is still needed is an enum class that holds symbols and allows negation, and perhaps making constants from the CONSTANTS. Ondrej Bilka


I can't think of a place where it would make sense to do different things depending on whether you receive a :socket_o_rdonly, an :io_o_rdonly, or a :file_o_rdonly. You could just have :o_rdonly and let the function decide. That's the beauty of symbols: context is everything. It even gives a more "what you expect" feel.

My 2 cents


Current voting

Strongly opposed: 0
Opposed: 0
Neutral: 1
In favor: 1
Strongly advocate: 4