#3 new
Hector E. Gomez Morales

ASCII-8BIT encoding of query results in rails 2.3.2 and ruby 1.9.1

Reported by Hector E. Gomez Morales | April 8th, 2009 @ 07:00 PM

From #2188:

Hello! We've got the same problem, except that the error occurs when we fetch data from the database. We're using MySQL and the charset is UTF-8, but Active Record returns ASCII-8BIT. Is it possible to make changes to ActiveRecord similar to the ones you made to ActionPack? It seems we're not the only ones with that problem.

Problem

Fetching data from any database (MySQL, PostgreSQL, SQLite 2 and 3), each configured to use UTF-8 as its character set, still returns the data as ASCII-8BIT in Ruby 1.9.1 with Rails 2.3.2.1.

This has been reported in #2188 and in the Rails talk group (1).

Possible Solution

As in #2188, Rails is not the culprit here. The only problem with Rails is its inherent trust that all the data it receives is UTF-8; when the data arrives in another encoding, the problems arise.
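A short sketch of that failure mode, written for a modern Ruby (2.0 or later, for `String#b`) rather than 1.9.1: a value tagged ASCII-8BIT mixes with UTF-8 text only while it holds plain ASCII bytes, and raises `Encoding::CompatibilityError` as soon as both sides carry multibyte characters. The variable names and sample values are illustrative assumptions.

```ruby
# A UTF-8 string, e.g. text produced by a view template.
template_text = "Résumé: "

# Simulated column values as a C extension built on rb_str_new would return
# them: the bytes are UTF-8, but the strings are tagged ASCII-8BIT.
ascii_only_value = "Hector".b
multibyte_value  = "Gómez".b

# Only ASCII bytes in the ASCII-8BIT string: the encodings are compatible
# and the result is tagged UTF-8.
ok = template_text + ascii_only_value
puts ok.encoding # => UTF-8

# Non-ASCII bytes on both sides: Ruby refuses to combine them.
begin
  template_text + multibyte_value
rescue Encoding::CompatibilityError => e
  puts "#{e.class}: #{e.message}"
end
```

This is why the bug tends to surface only once real non-ASCII data reaches a page: ASCII-only rows hide the wrong tag.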

The real problem is that all the adapters use native C extensions as glue, and those extensions create strings with the rb_str_new function, which in Ruby 1.9.1 returns a String with ASCII-8BIT encoding (2). That is why all the data comes back in this encoding. Because the initial problems were detected in MySQL (mysql-ruby), I made the needed modifications and created a fork on GitHub (3). This fork is compatible only with Ruby 1.9.1; it returns ASCII-8BIT for binary fields and UTF-8 for all other fields.
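The fork makes this change in C, and its code is not shown here. As a rough Ruby-level analogue (an illustration, not the fork's implementation), the idea is to keep the bytes exactly as received and change only the encoding tag: UTF-8 for text fields, ASCII-8BIT for binary ones. `tag_column_value` and its `binary:` keyword are hypothetical names.

```ruby
# Hypothetical helper: re-tag a raw value (as rb_str_new would return it)
# without touching its bytes. Binary columns stay ASCII-8BIT; everything
# else is declared UTF-8, matching a connection configured for UTF-8.
def tag_column_value(raw, binary:)
  return nil if raw.nil?   # SQL NULL stays nil
  value = raw.dup          # don't mutate the caller's string
  if binary
    value.force_encoding(Encoding::ASCII_8BIT)
  else
    value.force_encoding(Encoding::UTF_8)
    # force_encoding only relabels; check that the bytes really are UTF-8.
    raise ArgumentError, "column value is not valid UTF-8" unless value.valid_encoding?
    value
  end
end

raw_name = "Gómez".b       # what the driver hands back for a text column
raw_blob = "\x89PNG\r\n".b # contents of a binary column

name = tag_column_value(raw_name, binary: false)
blob = tag_column_value(raw_blob, binary: true)

puts name.encoding                      # => UTF-8
puts blob.encoding
puts name.bytesize == raw_name.bytesize # => true
```

`force_encoding` only changes the label, which is why the sketch checks `valid_encoding?`: if the connection were not really delivering UTF-8, tagging it as UTF-8 would just move the failure elsewhere.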

The problem is that $KCODE is not set in Ruby 1.9, so the call to mb_chars doesn't proxy the string. That's why I did the explicit wrapping.
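The gate can be reduced to a few lines. The sketch below mirrors the shape of `mb_chars` and `wants?` from the ActiveSupport code quoted in this ticket, but it is self-contained: `MiniChars` and the `kcode` argument are hypothetical stand-ins, with `kcode` passed explicitly instead of reading `$KCODE`, and the real `consumes?` check omitted. With `kcode` set to `nil`, as on Ruby 1.9, `mb_chars` returns the bare string, so the only way to get the proxy is to construct it explicitly.

```ruby
# Hypothetical stand-in for ActiveSupport::Multibyte::Chars: a thin proxy
# around a string.
class MiniChars
  attr_reader :wrapped_string

  def initialize(string)
    @wrapped_string = string
  end

  # Same shape as Chars.wants?: proxy only when KCODE says UTF-8.
  # (The real method also calls consumes?(string), omitted here.)
  def self.wants?(kcode)
    kcode == 'UTF8'
  end
end

# Mirrors String#mb_chars: wrap only if the proxy class wants the string.
def mb_chars(string, kcode)
  MiniChars.wants?(kcode) ? MiniChars.new(string) : string
end

s = "Gómez"

# Ruby 1.8 with $KCODE = 'u' (which reads back as 'UTF8'): proxied.
p mb_chars(s, 'UTF8').class # => MiniChars
# Ruby 1.9: $KCODE is not set, so the plain String comes back.
p mb_chars(s, nil).class    # => String
# Explicit wrapping sidesteps the $KCODE check entirely.
p MiniChars.new(s).class    # => MiniChars
```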

With this modified mysql-ruby gem, all ActiveRecord tests for MySQL pass except test_validate_case_sensitive_uniqueness. This is because the implementation of this validation downcases the data and executes a query using LOWER() on the unique field. In Ruby 1.8, downcasing non-ASCII strings is done through Multibyte. In Ruby 1.9.1, however, String#downcase still does nothing to non-ASCII characters, so I use the Multibyte downcase method instead.
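The mismatch can be reproduced on a current Ruby, where `String#downcase` is Unicode-aware by default (since Ruby 2.4) but still accepts an `:ascii` option that restricts it to the letters A to Z, which is effectively the Ruby 1.9.1 behaviour described above. The `lower_in_db` lambda is a hypothetical stand-in that assumes the database's `LOWER()` lowercases every letter of a UTF-8 value.

```ruby
stored   = "ÉCOLE" # value already saved in the unique column
incoming = "ÉCOLE" # value being validated

# Assumption: a UTF-8-aware database lowercases every letter.
lower_in_db = ->(s) { s.downcase }

# Ruby 1.9.1-style downcase: only ASCII letters change.
ruby19_downcase = ->(s) { s.downcase(:ascii) }

puts ruby19_downcase.call(incoming) # => École
puts lower_in_db.call(stored)       # => école

# The validation compares LOWER(column) with the Ruby-downcased value,
# so the duplicate is missed.
puts lower_in_db.call(stored) == ruby19_downcase.call(incoming) # => false
```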

I attach a patch so that validates_uniqueness uses the Multibyte downcase method on the string when running on Ruby 1.9.1. With this patch, all tests pass.
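The patch itself is an attachment and is not shown here. As a hedged sketch of the branching only, the helper below picks a downcase strategy from the Ruby version; `downcase_strategy` and its return symbols are hypothetical. It compares versions with `Gem::Version` rather than plain strings, since a string comparison such as `RUBY_VERSION < '1.9'` would order a hypothetical '1.10.0' before '1.9'.

```ruby
require 'rubygems' # defines Gem::Version (already loaded on modern Ruby)

# Hypothetical helper: decide how to downcase a value before it is
# compared with LOWER(column) in the uniqueness query.
def downcase_strategy(ruby_version = RUBY_VERSION)
  if Gem::Version.new(ruby_version) >= Gem::Version.new('1.9')
    :explicit_multibyte # $KCODE unset: mb_chars would return a bare String
  else
    :mb_chars           # 1.8 with $KCODE = 'u': mb_chars already proxies
  end
end

puts downcase_strategy('1.8.7')  # => mb_chars
puts downcase_strategy('1.9.1')  # => explicit_multibyte
puts downcase_strategy('1.10.0') # => explicit_multibyte

# A plain string comparison gets the last case wrong:
puts '1.10.0' < '1.9'            # => true
```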

Links

  1. Rails Group Post
  2. ASCII-8BIT default in rb_str_new
  3. mysql-ruby fork


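# activesupport/lib/active_support/core_ext/string/multibyte.rb
# Returns a Multibyte proxy only when the proxy class accepts the string;
# otherwise the plain String is returned unchanged.
def mb_chars
  if ActiveSupport::Multibyte.proxy_class.wants?(self)
    ActiveSupport::Multibyte.proxy_class.new(self)
  else
    self
  end
end

# activesupport/lib/active_support/multibyte/chars.rb
# On Ruby 1.9 $KCODE is not set to 'UTF8', so this returns false.
def self.wants?(string)
  $KCODE == 'UTF8' && consumes?(string)
end

# railties/lib/initializer.rb
# $KCODE is only set on Ruby < 1.9, which is why wants? fails on 1.9.
def initialize_encoding
  $KCODE='u' if RUBY_VERSION < '1.9'
end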

Proposal for the implementation of an ActiveEncoding library that will make manipulating strings of different encodings (compatible and incompatible) transparent.