static typing for Ruby
July 6th, 2008
Looks like some people did some research on a ‘type checker’ for Ruby:
http://blog.segment7.net/articles/2008/04/16/static-typing-for-ruby
Overcome the errors in the comments by upgrading to the latest svn of 1.9.
Our goal is to create a single threaded, evented ruby web server that doesn’t block on IO for requests.
Background:
Typical ruby web servers run something like this
Get a request and process it until you hit a DB access. Wait until the DB access comes back. Parse the results. Rinse, repeat. Render.
Unfortunately during the time you are hitting the DB, the current thread ‘blocks’ all other executing threads from running. You are basically stuck, frozen in all threads, until MySql calculates the result and sends it back to you.
Note that on typical web services, in my experience, the total time spent is something like 50% DB “waiting” and 50% rendering. So theoretically, if we didn’t block on IO, we could use the 50% DB time to render other requests. Currently, that time just goes to waste. There is also some overhead in using threads, in general, versus a true single threaded implementation.
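To put rough numbers on that claim, here is a back-of-the-envelope sketch. The method name and the 10 ms figures are made up for illustration, and it assumes an idealised single process with no scheduling overhead, so treat the result as an upper bound rather than a measurement.

```ruby
# Idealised upper bound on requests per second for a single process.
# render_ms and db_wait_ms are illustrative numbers, not measurements.
def max_requests_per_second(render_ms, db_wait_ms, overlap_db_wait)
  # If the DB wait can be overlapped with other work, only rendering keeps
  # the process busy; otherwise every request costs render + wait.
  busy_ms = overlap_db_wait ? render_ms : render_ms + db_wait_ms
  1000.0 / busy_ms
end

blocking = max_requests_per_second(10, 10, false)
evented  = max_requests_per_second(10, 10, true)
puts "blocking: #{blocking} req/s, evented: #{evented} req/s"
```

With a 50/50 split the ceiling doubles, which is the best case the rest of this post is chasing.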
So our goal is to somehow create a web server in Ruby that doesn’t block on DB access, while still leaving the code looking the same as it now does. The tools we’ll use for this are basically fibers and an evented MySql driver.
Enter EventMachine + asynchronous MySql driver: [1]
With asynchronous MySql [asymy package -- http://github.com/tqbf/asymy/tree/master], you send off a request and when the request comes back it runs a block. In the interim, however, it is not blocking the CPU. It uses EventMachine to receive data in a non blocking manner.
example:
EventMachine::run {
  c = Asymy::Connection.new(:target => "localhost",
                            :port => 13306,
                            :username => "user",
                            :password => "pass",
                            :database => "mysql")
  c.exec("show databases") do |fields, rows|
    # this block is run possibly much later
    pp fields
    pp rows
  end
}
So if you wanted to compare asymy to ‘normal’ threaded mysql:
Here’s the code for a “normal” ruby threaded mysql
http://code.google.com/p/ruby-roger-useful-functions/source/browse/trunk/asymy/big_thread.rb
and here’s evented:
http://code.google.com/p/ruby-roger-useful-functions/source/browse/trunk/asymy/big_em.rb
For my tests, it ran something like
~/dev/ruby-roger-useful-functions/asymy time ruby big_em.rb
real 0m4.731s
user 0m0.121s
sys 0m0.037s
~/dev/ruby-roger-useful-functions/asymy time ruby big_thread.rb
real 0m14.033s
user 0m0.089s
sys 0m0.039s
The evented version is faster because it is able to run several requests against the MySql server [which happened to be on localhost] at once, so the MySql server was able to “work harder” and utilize more CPU. It also does them asynchronously, so the average time spent per query is FAR less. The quick queries return almost immediately, despite the fact that long running queries were already running when the quick queries began.
So asynchronous mysql does work in an asynchronous manner and does not block. Snozzleberries are snozzleberries.
Here’s an example of a web server that spins off some SQL queries per request.
It works as expected: a server can receive an incoming request and handle it even if there are DB queries outstanding. Example:
http://code.google.com/p/ruby-roger-useful-functions/source/browse/trunk/asymy/prototype_server_non_fiber.rb
Unfortunately "just" using the asymy adapter would mean you'd have to put all your code in nested blocks, like
def action
  conn.exec(query) do |answer|
    a = answer
    conn.exec(query_2) do |answer_2|
      b = answer_2
      render
    end
  end
end
Which is unnatural and hard to use with subroutines and the like.
We will use fibers to create a solution which doesn't require existing code to change.
Fibers:
With Ruby 1.9 you can, instead of using a thread, use 'fibers', which give you almost as much functionality at a far lower cost.
We want to avoid using threads because, at least with ruby 1.8.6, threads are slow and have high overhead.
Fibers are basically threads that run until they 'pause themselves', then return to running when you later tell them to resume. They're like uninterruptible threads that have a 'sleep' function.
ex:
require 'fiber'
a = Fiber.new {
  print 'in fiber'
  Fiber.yield # pause myself
  print 'fiber has continued'
}
a.resume # this prints "in fiber"
print 'it has deferred execution back to us'
a.resume # this prints "fiber has continued"
So the fiber we created is basically a block which can stop itself arbitrarily at certain points. And then it can be restarted.
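A fiber can also trade values with whoever resumes it: what you pass to Fiber.yield comes back as the return value of resume, and what you pass to the next resume comes back as the return value of Fiber.yield. A small sketch (the method and variable names are just for illustration):

```ruby
# Returns [value of first resume, value of second resume, log of what the fiber saw].
def fiber_round_trip
  log = []
  worker = Fiber.new do |first|
    log << "fiber got #{first}"
    second = Fiber.yield(:paused) # hand :paused to the caller and stop here
    log << "fiber got #{second}"
    :done                         # the block's value is returned by the last resume
  end
  r1 = worker.resume("a") # runs until Fiber.yield
  r2 = worker.resume("b") # continues after Fiber.yield, runs to the end
  [r1, r2, log]
end

p fiber_round_trip
```

This value passing is what lets a paused fiber receive a query result when it is resumed.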
So now let's join this with the mysql example, above, so that we can have code that "pauses" when it is waiting for a DB query, then resumes when the query returns.
The code will be something like this:
fib = Fiber.new {
  myself_as_fiber = Fiber.current
  a = nil # declare it here so we can set it within the block and return it
  conn.exec(query) do |answer|
    # this runs much later, when query returns
    a = answer
    myself_as_fiber.resume # this will resume us "past" the yield call, below
  end
  Fiber.yield # pause ourselves until the block, above gets run and we are resumed.
  # the above line basically returns control to whoever resumed this fiber
  a # the fiber's final value; a bare 'return' does not work inside a fiber block
}
fib.resume # start it
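Here is a version of the same idea you can run without a database. FakeAsyncConn is a made-up stand-in, not the asymy API: its exec only stores the callback, and deliver_all plays the part of the event loop handing results back later. sync_exec is also a hypothetical helper name.

```ruby
require 'fiber'

# Hypothetical stand-in for an evented DB connection (NOT the asymy API).
class FakeAsyncConn
  def initialize
    @pending = []
  end

  # Remember the query and its callback; nothing runs yet.
  def exec(query, &callback)
    @pending << [query, callback]
  end

  # Pretend to be the event loop: deliver every outstanding result.
  def deliver_all
    until @pending.empty?
      query, callback = @pending.shift
      callback.call("rows for #{query}")
    end
  end
end

# Looks synchronous to the caller, but pauses the current fiber while waiting.
def sync_exec(conn, query)
  fiber = Fiber.current
  conn.exec(query) { |answer| fiber.resume(answer) }
  Fiber.yield # returns whatever the callback passes to resume
end

def run_demo
  conn = FakeAsyncConn.new
  trace = []
  Fiber.new do
    trace << sync_exec(conn, "query_1")
    trace << sync_exec(conn, "query_2")
  end.resume
  trace << "fiber paused, caller keeps going"
  conn.deliver_all
  trace
end

p run_demo
```

The code inside the fiber reads top to bottom with no nested blocks, which is the whole point.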
An ugly example of a server with asymy and fibers is
http://code.google.com/p/ruby-roger-useful-functions/source/browse/trunk/asymy/fibers/prototype_server_fibers.rb
So now we have a fibered web server. We have accomplished our task.
So let's add it to mongrel:
Well, it turns out that it is pretty easy to add fibers to mongrel. We just add a wrapper to Kirk Haines' existing fibered mongrel by wrapping requests in a Fiber.new {}.resume, so that requests can pause their fiber when they need to wait for a DB query to return, thus transferring control back to mongrel so it can continue [and accept new requests, etc.].
See
http://code.google.com/p/ruby-roger-useful-functions/source/browse/trunk/asymy/fibers/mongrel_test/fibered_mongrel.rb
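The wrapping idea can be shown in miniature without mongrel or EventMachine. Everything here is hypothetical scaffolding: TinyLoop is a toy event loop that fires callbacks in order of a fake clock, and handle_request stands in for the per-request wrapper.

```ruby
require 'fiber'

# Toy event loop (hypothetical): callbacks fire in order of their due tick.
class TinyLoop
  def initialize
    @now = 0
    @seq = 0
    @timers = []
  end

  def after(ticks, &blk)
    @timers << [@now + ticks, @seq += 1, blk]
  end

  def run
    until @timers.empty?
      timer = @timers.min_by { |due, seq, _| [due, seq] }
      @timers.delete(timer)
      @now = timer[0]
      timer[2].call
    end
  end
end

# Each request gets its own fiber, like Fiber.new {}.resume in the wrapper.
def handle_request(ev, name, db_ticks, finished)
  Fiber.new do
    fiber = Fiber.current
    ev.after(db_ticks) { fiber.resume } # the "DB answer" arrives later
    Fiber.yield                         # give control back to the server
    finished << name                    # "render" once the answer is in
  end.resume
end

def serve_demo
  ev = TinyLoop.new
  finished = []
  handle_request(ev, :long, 8, finished)  # accepted first, slow query
  handle_request(ev, :short, 1, finished) # accepted second, fast query
  ev.run
  finished
end

p serve_demo
```

The short request finishes first even though it was accepted second, because the long one is parked in its fiber instead of holding up the server.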
Note: for mongrel to work with 1.9 you’ll have to do a minor edit: install the gem, then run it
irb >> require 'rubygems'; require 'mongrel'
It’ll give a syntax error–go to those lines within handlers.rb and replace the :’s with “then”’s
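For reference, this is the kind of change involved. Ruby 1.8 allowed a colon after a when condition and 1.9 does not. The method below is a made-up example, not the actual code from handlers.rb.

```ruby
# Ruby 1.8 accepted:   when 404: "not found"
# Ruby 1.9 and later need "then" (or a newline) in place of the colon.
def status_text(code)
  case code
  when 200 then "ok"
  when 404 then "not found"
  else "unknown"
  end
end

puts status_text(404)
```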
Here
http://code.google.com/p/ruby-roger-useful-functions/source/browse/trunk/asymy/fibers/mongrel_test/test_simple_mongrel.rb
is a “normal” mongrel handler, and here
http://code.google.com/p/ruby-roger-useful-functions/source/browse/trunk/asymy/fibers/mongrel_test/test_mongrel_with_db.rb
is a mongrel handler that uses asymy [and our fibered mongrel, above] to create a working, single threaded, evented ruby web server that doesn’t block on requests.
from an email:
“And, snozzleberries it is. I’m able to pound it with lots of requests
and those that do “very little” IO finish [as hoped for]
very quickly [like 20ms], and those with “lots” of IO finish…after
their IO finishes, Much later. Like 8s [since their query was long].
”
It also seems to work slightly faster if you use a connection pool than when you just have one connection per fiber. See http://rubyforge.org/pipermail/eventmachine-talk/2008-June/001887.html
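A connection pool fits naturally with fibers: when no connection is free, the requesting fiber parks itself and gets resumed when one is handed back. This is only a sketch of that idea. FiberPool and its methods are names I made up, the "connections" are plain symbols, and it is not the code used in the tests above.

```ruby
require 'fiber'

# Hypothetical fiber-aware pool; connections can be any objects.
class FiberPool
  def initialize(conns)
    @free = conns.dup
    @waiting = []
  end

  # Borrow a connection for the duration of the block. If none is free,
  # pause the calling fiber until another fiber returns one.
  def with_connection
    conn = @free.pop || wait_for_connection
    yield conn
  ensure
    release(conn) if conn
  end

  private

  def wait_for_connection
    @waiting << Fiber.current
    Fiber.yield # release() resumes us and passes in a connection
  end

  def release(conn)
    if (fiber = @waiting.shift)
      fiber.resume(conn) # hand it straight to the next waiter
    else
      @free.push(conn)
    end
  end
end

def pool_demo
  pool = FiberPool.new([:conn_a])
  order = []
  f1 = Fiber.new do
    pool.with_connection do |c|
      order << [:f1, c]
      Fiber.yield # pretend to wait for the DB
    end
  end
  f2 = Fiber.new do
    pool.with_connection { |c| order << [:f2, c] }
  end
  f1.resume # takes :conn_a, then pauses
  f2.resume # pool is empty, so f2 is parked
  order << :f2_parked
  f1.resume # f1 finishes and its connection goes to f2
  order
end

p pool_demo
```

A real pool would also need to cope with dropped connections and errors in the callback path, which this leaves out.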
The next steps would be to get ramaze or merb working [really working] using this, as well as to make sure there aren’t any other weird bugs
Update: got it working [well at least slightly] on ramaze!
This ramaze project incorporates them all, with instructions given in start.rb.
Let me know if anybody actually wants to set this up, I can help you through it since it requires some code edits [they're described in the files, but are still slightly complicated].
I created several test URLs, for comparisons:
http://127.0.0.1:7000/{short, long, mixed}_{new, old}_school
with short and long meaning a quick sql query or one that takes a long time [like 1s+], mixed meaning every 5th connection is a long sql query [so short with one long one every so often], and new meaning our event driven sql adapter and old school being the ‘ask and wait’ existing C mysql adapter.
for these tests we’re using the latest git ramaze, a connection pool size of 20 for the evented sql adapter. Ramaze is run with sourcereload on [so slightly slower than it could be] and some amount of debugging info output to the screen per request.
The most interesting results come from the mixed cases:
ab -n 50 -c 20 http://127.0.0.1:7000/mixed_old_school
ab -n 50 -c 20 http://127.0.0.1:7000/short_new_school
which is close but a bit slower.
So overall snozzleberries are snozzleberries–we were able to increase the throughput of ’short requests’ significantly, and only have a small’ish drawback for short requests. I’d say…it’s more successful than today’s ruby web servers. Except that this was for an artificial [and flattering] workload, so true results have yet to be found.
Also note that if we run concurrent requests to the ‘long queries’, the evented version again comes out on top, as it can basically pass those requests all on to the mysql server, which then uses about 160% cpu instead of 130% and answers the queries more quickly.
Future work:
The next step is to get DataMapper or Sequel or ActiveRecord to work with it.
It might be convenient to build a ‘duck typed mysql compatible’ drop in replacement for the mysql C library. This might work with all of them.
There’s a Google SoC project to help make rails thread safe, so it may even theoretically be possible for us to see Rails + non blocking IO on the horizon. We can only hope. This probably wouldn’t be enough to make rails actually fast, but it might help.
[1] EventMachine runs with its own event loop [single threaded] and responds to events on sockets, so works more quickly than a multi-threaded approach. Rev, built on libev, could also be used as a drop in replacement for the same.
So…doing the old unofficial benchmark:
a controller with a trivial view
ab -n 400 -c 100
rails 2.1/ruby 1.8.6/Evented Mongrel serving static [i.e. / of a newly created rails app]:
Percentage of the requests served within a certain time (ms)
50% 64
66% 86
75% 87
80% 87
90% 88
95% 92
98% 96
99% 100
100% 100 (longest request)
[note that in the logs, it says that it sends them in
Completed in 0.00040 (2531 reqs/sec) | Rendering: 0.00025 (62%) | DB: 0.00000 (0%) | 200 OK]
rails2.1/1.8.6/EM serving trivial view
Percentage of the requests served within a certain time (ms)
50% 151
66% 218
75% 228
80% 232
90% 233
95% 233
98% 234
99% 234
100% 234 (longest request)
django 0.96 serving / of a newly created app [I think that's static]
Percentage of the requests served within a certain time (ms)
50% 7
66% 7
75% 7
80% 8
90% 9
95% 13
98% 16
99% 16
100% 17 (longest request)
So overall it seems that EM didn’t help rails out as mightily as we’d expected it to, for some reason.
Without EM, though:
rails 2.1/1.8.6/Mongrel trivial request:
Percentage of the requests served within a certain time (ms)
50% 286
66% 326
75% 328
80% 330
90% 487
95% 561
98% 591
99% 603
100% 616 (longest request)
serving static:
Percentage of the requests served within a certain time (ms)
50% 213
66% 220
75% 254
80% 255
90% 257
95% 258
98% 262
99% 265
100% 267 (longest request)
So…overall in this situation [very very high load] it appears that EM + Rails come up far far short, somehow. And Django is indeed the speed demon.
Now ramaze + 1.8.6 + Mongrel [unable to test non Evented] serving a trivial view:
Percentage of the requests served within a certain time (ms)
50% 260
66% 283
75% 284
80% 285
90% 287
95% 287
98% 288
99% 288
100% 288 (longest request)
and ramaze + 1.8.6 + EM
Percentage of the requests served within a certain time (ms)
50% 279
66% 281
75% 282
80% 283
90% 286
95% 286
98% 304
99% 304
100% 305 (longest request)
and ramaze/1.9 trivial view [evented and normal seem the same]
Percentage of the requests served within a certain time (ms)
50% 360
66% 375
75% 382
80% 398
90% 447
95% 457
98% 491
99% 504
100% 514 (longest request)
So overall I’d say there appears to be something very wrong with how Ruby currently handles multiple connections, which is very odd to me.
EM tweak? Not sure.
Note that with low concurrency the speed is…well…hard to tell the difference:
EM:
instructions here
http://groups.google.com/group/merb/browse_thread/thread/12387cbf81127f0c/e606dc37c420c67d?lnk=gst&q=1.9#e606dc37c420c67d
except at that point in time, it still wouldn’t work at all.
Seems that a few dependencies need just a little love.
apparently it’s a datamapper specific mysql driver.
See http://datamapper.org/getting_started.html
unable to get your javascript to work right on your conversion page? Make sure it’s AFTER the html and body and everything.
Maybe that will help, or also remember that it might not be letting it get to that page, if it relies on post data.
Socket.getaddrinfo(Socket.gethostname, nil, Socket::AF_UNSPEC, Socket::SOCK_STREAM, nil, Socket::AI_CANONNAME).select{|type| type[0] == 'AF_INET'}[0][3] # ltodo hmm we don't want it to do the reverse lookups!
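One possible way around the lookups, on Ruby 1.9.2 or later, is to ask for the interface addresses directly with Socket.ip_address_list, which does not go through the hostname at all. This is a sketch, not what the line above does: local_ipv4 is a made-up helper name, and it simply picks the first non-loopback IPv4 address, which may or may not be the one you want on a multi-homed box.

```ruby
require 'socket'

# Returns the first non-loopback IPv4 address as a string, or nil if there is none.
def local_ipv4
  addr = Socket.ip_address_list.find { |a| a.ipv4? && !a.ipv4_loopback? }
  addr && addr.ip_address
end

puts local_ipv4.inspect
```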
there it is
./lib/em/lib/eventmachine.rb:235: [BUG] Segmentation fault
ruby 1.8.6 (2007-09-24) [i486-linux]
or
Illegal instruction
for me meant: “it seems that EM does callbacks during its process loop. During these callbacks the EM thread itself can spontaneously “jump” to do its release_machine [for example, if there's an uncaught exception, or if the root thread terminates, or if you call Thread.raise on it]. This means that during its release_machine, its Descriptor array is in a half-terminated state.”
To overcome it: go to em.cpp and change the logic to just substitute in the end one when it fails.
http://dev.rubyonrails.org/ticket/2172
appears to be one.
http://dev.rubyonrails.org/ticket/10978 was already committed.
So…we need to find the bottleneck, and fix that bottleneck. Fix it. As in fix it.
todo: override :find and :find_by_sql and also output ‘this long creating AR stuffs!’ per request lol.
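A rough sketch of how that todo could look, using Module#prepend (Ruby 2.0 or later) to wrap the finders and add up the time spent in them. TimedFinders and FakeModel are made up so the example runs without ActiveRecord. On a real model you would prepend onto the model's singleton class instead, and I have not tried that against Rails.

```ruby
require 'benchmark'

# Hypothetical wrapper that times calls to the class-level finders.
module TimedFinders
  def find(*args)
    timed(:find) { super }
  end

  def find_by_sql(*args)
    timed(:find_by_sql) { super }
  end

  # Seconds spent so far, keyed by finder name.
  def finder_seconds
    @finder_seconds ||= Hash.new(0.0)
  end

  private

  def timed(label)
    result = nil
    finder_seconds[label] += Benchmark.realtime { result = yield }
    result
  end
end

# Stand-in for a model class (NOT ActiveRecord).
class FakeModel
  def self.find(id)
    "record #{id}"
  end

  def self.find_by_sql(sql)
    ["row for #{sql}"]
  end

  class << self
    prepend TimedFinders
  end
end

FakeModel.find(1)
FakeModel.find_by_sql("select 1")
FakeModel.finder_seconds.each do |name, secs|
  puts "this long creating AR stuffs via #{name}: #{secs}s"
end
```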
Investigate 11000 hash calls.
Use it with mfp, not the big one! no!!!!!
trivial erb takes .0002[1]
[1]http://www.ruby-doc.org/stdlib/libdoc/erb/rdoc/