Examples:
* A required Ruby module is not compatible with JRuby (e.g. Qt)
* The Java code is only needed by optional parts of the system such as plugins
The solution is to use Distributed Ruby (DRb) to bridge between a Ruby application and a JRuby process. A JRuby application is written that contains a DRb service which acts as a facade for the Java code. A Ruby class provides an API for managing the JRuby process. The Ruby application uses the API to start the JRuby process, connect to the facade object via DRb, and stop the JRuby process when it is no longer needed.
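Before the full example, here is a minimal sketch of the DRb mechanics involved. It is not part of the Tika code: `UpcaseFacade` and its `shout` method are hypothetical stand-ins for a facade over Java code, and both ends run in one process purely to keep the sketch self-contained. In practice the server half would live in the JRuby process and the client half in the Ruby application.

```ruby
require 'drb'

# Hypothetical facade; in the real setup this class would wrap Java calls.
class UpcaseFacade
  def shout(text)
    text.upcase
  end
end

# Server side. Port 0 asks the OS for any free port (an assumption made here
# to avoid clashes; the example in this post uses a fixed default port).
DRb.start_service('druby://localhost:0', UpcaseFacade.new)
uri = DRb.uri

# Client side: obtain a proxy for the facade and call it like a local object.
facade = DRb::DRbObject.new_with_uri(uri)
result = facade.shout('hello')
puts result

DRb.stop_service
```

The client never needs the facade's class definition; it only needs the URI and the names of the methods to call.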
The following example is a simple implementation that provides access to the Apache Tika library. It is, of course, for illustration only; Tika comes with a much more useful command-line utility, and a (J)ruby-tika project already exists.
The directory structure for this example is as follows:
bin/test_app.rb
lib/tika-app-1.1.jar
lib/tika_service.rb
lib/tika_service_jruby
The JRuby application, tika_service_jruby, requires the java module and the tika jar. It also requires the tika_service module, which defines shared defaults such as the port number.
#!/usr/bin/env jruby
raise ScriptError.new("Tika requires JRuby") unless RUBY_PLATFORM =~ /java/
require 'java'
require 'tika-app-1.1.jar'
require 'tika_service'
# =============================================================================
module Tika
  # ------------------------------------------------------------------------
  # Namespaces for Tika plugins
  module ContentHandler
    Body = Java::org.apache.tika.sax.BodyContentHandler
    Boilerpipe = Java::org.apache.tika.parser.html.BoilerpipeContentHandler
    Writeout = Java::org.apache.tika.sax.WriteOutContentHandler
  end
  module Parser
    Auto = Java::org.apache.tika.parser.AutoDetectParser
  end
  module Detector
    Default = Java::org.apache.tika.detect.DefaultDetector
    Language = Java::org.apache.tika.language.LanguageIdentifier
  end
  Metadata = Java::org.apache.tika.metadata.Metadata
  class Service
    # ----------------------------------------------------------------------
    # JRuby Bridge
    # Number of clients connected to TikaServer
    attr_reader :usage_count
    def initialize
      @usage_count = 0
      Tika::Detector::Language.initProfiles
    end
    def inc_usage; @usage_count += 1; end
    def dec_usage; @usage_count -= 1; end
    def stop_if_unused; DRb.stop_service if (usage_count <= 0); end
    def self.drb_start(port)
      port ||= DEFAULT_PORT
      DRb.start_service "druby://localhost:#{port.to_i}", self.new
      puts "tika daemon started (#{Process.pid}). Connect to #{DRb.uri}"
      trap('HUP') { DRb.stop_service; Tika::Service.drb_start(port) }
      trap('INT') { puts 'Stopping tika daemon'; DRb.stop_service }
      DRb.thread.join
    end
    # ----------------------------------------------------------------------
    # Tika Facade
    def parse(str)
      # to_java_bytes passes the raw bytes through unchanged (safe for binary files)
      input = java.io.ByteArrayInputStream.new(str.to_java_bytes)
      content = Tika::ContentHandler::Body.new(-1)
      metadata = Tika::Metadata.new
      Tika::Parser::Auto.new.parse(input, content, metadata)
      # Detect the language from the extracted text, not from the input stream object
      lang = Tika::Detector::Language.new(content.to_string)
      { :content => content.to_string,
        :language => lang.getLanguage(),
        :metadata => metadata_to_hash(metadata) }
    end
    def metadata_to_hash(mdata)
      h = {}
      Metadata.constants.each do |name|
        begin
          val = mdata.get(Metadata.const_get name)
          h[name.downcase.to_sym] = val if val
        rescue NameError
          # nop
        end
      end
      h
    end
  end
end
# ----------------------------------------------------------------------
# main()
Tika::Service.drb_start ARGV.first if __FILE__ == $0
The details of the Tika Facade are not of interest here. What is important for the technique is the if __FILE__ == $0 line, the drb_start class method, and the inc_usage, dec_usage, and stop_if_unused instance methods. These will be used by the Ruby tika_service module to manage the Tika::Service instance.
When run, this application starts a DRb instance on the requested port number, starts a Tika::Service instance, and returns a DRb Proxy object for that instance when a DRb client connects.
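The usage-counting methods are easy to overlook, so here is a small sketch of how they interact when two clients share the service. `CountedService` is a hypothetical stand-in for Tika::Service that keeps only the bookkeeping; a flag takes the place of the real call to DRb.stop_service.

```ruby
# Hypothetical stand-in for Tika::Service: only the usage bookkeeping is kept.
class CountedService
  attr_reader :usage_count, :stopped

  def initialize
    @usage_count = 0
    @stopped = false
  end

  def inc_usage; @usage_count += 1; end
  def dec_usage; @usage_count -= 1; end

  # The real service calls DRb.stop_service here; a flag stands in for that.
  def stop_if_unused
    @stopped = true if usage_count <= 0
    @stopped
  end
end

service = CountedService.new
service.inc_usage                      # client A connects
service.inc_usage                      # client B connects
service.dec_usage                      # client A disconnects
first_stop = service.stop_if_unused    # B is still connected, so nothing happens
service.dec_usage                      # client B disconnects
second_stop = service.stop_if_unused   # nobody is left, so the service stops
puts "first=#{first_stop} second=#{second_stop}"
```

The point of the counter is that a stop request from one client cannot pull the service out from under another client that is still connected.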
The Ruby module, tika_service.rb, uses fork-exec to launch a JRuby process running the tika_service_jruby application. Note that the port number for the DRb service to listen on is passed as an argument to tika_service_jruby.
#!/usr/bin/env ruby
require 'drb'
module Tika
  class Service
    DAEMON = File.join(File.dirname(__FILE__), 'tika_service_jruby')
    DEFAULT_PORT = 44344
    DEFAULT_URI = "druby://localhost:#{DEFAULT_PORT}"
    TIMEOUT = 300 # in 100-ms increments
    # Return command to launch JRuby interpreter
    def self.get_jruby
      # 1. detect system JRuby
      jruby = `which jruby`
      return jruby.chomp if (! jruby.empty?)
      # 2. detect RVM-managed JRuby
      return nil if (`which rvm`).empty?
      jruby = `rvm list`.split("\n").select { |rb| rb.include? 'jruby' }.first
      return nil if (! jruby)
      "rvm #{jruby.strip.split(' ').first} do ruby "
    end
    # Replace current process with JRuby running Tika Service
    def self.exec(port)
      jruby = get_jruby
      Kernel.exec "#{jruby} #{DAEMON} #{port || ''}" if jruby
      $stderr.puts "No JRUBY found!"
      return 1
    end
    def self.start
      return @pid if @pid
      @pid = Process.fork do
        exit(::Tika::Service::exec DEFAULT_PORT)
      end
      Process.detach(@pid)
      connected = false
      TIMEOUT.times do
        begin
          DRb::DRbObject.new_with_uri(DEFAULT_URI).to_s
          connected = true
          break
        rescue DRb::DRbConnError
          sleep 0.1
        end
      end
      raise "Could not connect to #{DEFAULT_URI}" if ! connected
    end
    def self.stop
      service_send(:stop_if_unused)
    end
    # this will return a new Tika DRuby connection
    def self.service_send(method, *args)
      begin
        obj = DRb::DRbObject.new_with_uri(DEFAULT_URI)
        obj.send(method, *args)
        obj
      rescue DRb::DRbConnError => e
        $stderr.puts "Could not connect to #{DEFAULT_URI}"
        raise e
      end
    end
    def self.connect
      service_send(:inc_usage)
    end
    def self.disconnect
      service_send(:dec_usage)
    end
  end
end
The API provided by this module is straightforward: an application uses the start/stop class methods to execute or terminate the JRuby process as needed, and the connect/disconnect methods to obtain (and free) a DRb Proxy object for the "remote" Tika::Service instance.
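Stripped of the Tika specifics, the launch step in start is a plain fork followed by an exec. The sketch below shows just that step, assuming a Unix-like system where Process.fork is available. To stay runnable without JRuby it launches the current Ruby interpreter (via RbConfig.ruby) with a one-line script in place of the daemon, and it captures the child's output through a pipe, which the real module does not do.

```ruby
require 'rbconfig'

port = 44344
reader, writer = IO.pipe

pid = Process.fork do
  reader.close
  $stdout.reopen(writer)
  # exec replaces the forked child with the new interpreter. The list form
  # bypasses the shell; the port travels as an ordinary argument.
  Kernel.exec(RbConfig.ruby, '-e', 'puts "port=#{ARGV.first}"', port.to_s)
end

writer.close
child_output = reader.read            # returns once the child exits
_, status = Process.wait2(pid)
puts child_output
```

The module in this post calls Process.detach rather than waiting, because the JRuby daemon is meant to outlive the call to start.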
The only complications are in the detection of the JRuby interpreter (including support for RVM-managed interpreters) and the timeout while waiting for JRuby to initialize (which can take many seconds).
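The wait for JRuby to come up is a retry loop with a fixed delay and a cap on the number of attempts. A generic sketch of that pattern follows; `wait_until_ready` is a hypothetical helper, not part of the module above, and the IOError raised in the demo stands in for DRb::DRbConnError.

```ruby
# Retry a block until it succeeds or the attempts run out.
# attempts * delay is the effective timeout in seconds.
def wait_until_ready(attempts:, delay:, error: StandardError)
  attempts.times do
    begin
      return yield
    rescue error
      sleep delay
    end
  end
  raise "not ready after #{attempts} attempts"
end

# Demo: the "service" refuses the first two connection attempts.
calls = 0
result = wait_until_ready(attempts: 5, delay: 0.01, error: IOError) do
  calls += 1
  raise IOError, 'still starting' if calls < 3
  :connected
end
puts "#{result} after #{calls} calls"
```

With the values used in tika_service.rb (300 attempts, 0.1 seconds apart) the application waits up to about 30 seconds before giving up.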
The test_app.rb example application is a simple proof-of-concept. It takes any number of filenames as arguments, and uses Tika to analyze the contents of each file. The results are printed to STDOUT via inspect.
#!/usr/bin/env ruby
require 'tika_service'
Tika::Service.start
begin
  tika = Tika::Service.connect
  ARGV.each { |x| File.open(x, 'rb') {|f| puts tika.parse(f.read).inspect} } if tika
ensure
  Tika::Service.disconnect
  Tika::Service.stop
end
The real meat of this technique lies in the tika_service module. This contains a Service class that will manage a JRuby application in a manner that conforms to an abstract Service API (start/stop, connect/disconnect) and which can be generalized to support any number of specific JRuby-based services.
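One possible shape for such a generalization is sketched below. Everything in it is an assumption made for illustration: the `BridgedService` base class, its `bridge` declaration, and the second service are hypothetical, and only the per-service configuration is shown (the start/stop and connect/disconnect logic would move into the base class unchanged).

```ruby
# Hypothetical base class: holds per-service settings and derives the DRb URI
# and the launch command from them.
class BridgedService
  class << self
    attr_reader :daemon, :port

    # Declare which JRuby script backs this service and which port it uses.
    def bridge(daemon:, port:)
      @daemon = daemon
      @port = port
    end

    def uri
      "druby://localhost:#{port}"
    end

    # Argument list for Kernel.exec; a 'jruby' on the PATH is an assumption.
    def command(jruby = 'jruby')
      [jruby, daemon, port.to_s]
    end
  end
end

class TikaBridge < BridgedService
  bridge daemon: 'lib/tika_service_jruby', port: 44344
end

# Hypothetical second service, to show that settings do not leak between classes.
class OtherBridge < BridgedService
  bridge daemon: 'lib/other_service_jruby', port: 44345
end

puts TikaBridge.uri
puts OtherBridge.command.inspect
```

Class-level instance variables keep each subclass's settings separate, so any number of services can share the same management code.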
Update: A generalized (but entirely untested) version is up on Github.
Thanks!
Excellent idea. We needed to access Ruby methods in a JRuby process, from a non-JRuby process. This code served as an excellent base for what we ended up hacking together last night.
Github for the jruby_bridge
Rubygems for the jruby_bridge
That is a very nice implementation. If I need to write a third Ruby-JRuby application (grog forbid), I'll give it a go.