GSoC 2015 Abinash Meher: Ruby bindings for CSymPy

Ruby bindings to the CSymPy C++ symbolic library

Motivation

The motivation was to have a computer algebra system for Ruby. Initially, the idea was to use a Ruby wrapper for Sage, which uses Pynac (based on GiNaC) because it is much faster.
However, the benchmarks show that CSymPy is much faster than Pynac. Pynac is probably slower because it uses Sage's GMP functionality. In CSymPy, no Python is called from the C++ code; the code is entirely C++, so there is no need to worry about any potential Python overhead. This also allows us to use it from other languages with the help of wrappers.
Another option was to wrap GiNaC directly, but CSymPy now also seems to be a bit faster than GiNaC itself. The code has been designed so that one can experiment with various data structures and make sure the code is fast.
There is one small limitation, however. Compared to GiNaC, CSymPy is missing specialized polynomial manipulation, series expansion and pattern matching. All of these are being worked on, and most of the work might be completed this summer as part of GSoC projects.

Execution

Tool to generate the wrappers
There were many choices, such as Ruby Inline, Rice, FFI, SWIG, and using the Ruby C API manually. Of the first three, FFI seems to be the fastest according to the linked benchmark. However, with FFI a changed interface in CSymPy shows up only as a segfault at runtime while running the Ruby tests (which we will be adding whichever tool we use).
The manual method is advantageous here because it is compiled: if an interface in CSymPy changes, we immediately get a compile error on Travis, so we know it has to be fixed while merging the patch. That will be a huge comfort. The manual method is also preferred when dealing with a lot of pointers in the C++ code, since either way we would end up doing about as much work, and the resulting code is clearer.
With SWIG, C++ references are supported, but SWIG transforms them back into pointers, as noted in the linked reference. References are a feature we might need later. So going with the manual method seems the wisest.
File structure
Currently all the Python wrappers are in a folder named csympy under the root folder. The idea is to keep all the wrapper code in a single place, i.e. inside src in separate folders such as src/c, src/python and src/ruby. The same layout can be applied to other languages later, e.g. src/julia.
Each folder can then be configured as a ready-to-install package: the Python wrappers as a pip package and the Ruby wrappers as a gem.
Exposing the C++ functions to C with extern "C"
The Ruby C API can interface only with C functions, so the C++ code has to be exposed through extern "C". The functions can then be called from C. Let's take an example C++ class, using one header file (Test.hpp) for demonstration:

//File: Test.hpp
class Test {
  public:
      void testfunc();
      Test(int i);

  private:
      int testint;
};

and one implementation file (Test.cpp)

//File: Test.cpp
#include <iostream>
#include "Test.hpp"

using namespace std;

Test::Test(int i) {
  this->testint = i;
}

void Test::testfunc() {
  cout << "test " << this->testint << endl;
}

This is how csympy is structured: C++ code that does the actual job.

We will have some glue code. This code is something in-between C and C++. Again, we will have one header file (TestWrapper.h, just .h as it doesn't contain any C++ code)

//File: TestWrapper.h
typedef void CTest;

#ifdef __cplusplus
extern "C" {
#endif

CTest * test_new(int i);
void test_testfunc(const CTest *t);
void test_delete(CTest *t);
#ifdef __cplusplus
}
#endif

and the function implementations (TestWrapper.cc, .cc as it contains C++ code):

//File: TestWrapper.cc
#include "TestWrapper.h"
#include "Test.hpp"

extern "C" {

  CTest * test_new(int i) {
      Test *t = new Test(i);
      return static_cast<CTest *>(t);
  }

  void test_testfunc(const CTest *test) {
      //testfunc() is not const-qualified, so drop const before casting back
      Test *t = static_cast<Test *>(const_cast<CTest *>(test));
      t->testfunc();
  }

  void test_delete(CTest *test) {
      //cast back to the real type so that the Test destructor runs
      delete static_cast<Test *>(test);
  }
}

These are the wrappers that expose the C++ functions to C.
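
The wrappers above only print, but a binding usually also needs to get values back across the C boundary. The following single-file sketch shows one way that could look. It is an illustration under assumptions, not the project's API: the method describe() and the wrapper test_describe() are hypothetical names introduced here, and the class is a stand-in for Test that returns its text instead of writing to cout.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Stand-in for the C++ class from Test.hpp; describe() is a hypothetical
// addition that returns the text instead of printing it.
class Test {
public:
    explicit Test(int i) : testint(i) {}
    std::string describe() const {
        std::ostringstream out;
        out << "test " << testint;
        return out.str();
    }
private:
    int testint;
};

// Opaque handle, as in TestWrapper.h.
typedef void CTest;

extern "C" {

CTest *test_new(int i) {
    return static_cast<CTest *>(new Test(i));
}

// Copies the description into a caller-provided buffer (truncating if
// needed) and returns the number of characters written, excluding '\0'.
int test_describe(const CTest *test, char *buf, int buflen) {
    assert(test != nullptr && buf != nullptr && buflen > 0);
    const Test *t = static_cast<const Test *>(test);
    const std::string s = t->describe();
    int n = static_cast<int>(s.size());
    if (n > buflen - 1) n = buflen - 1;
    s.copy(buf, n);  // std::string::copy does not append a terminator
    buf[n] = '\0';
    return n;
}

void test_delete(CTest *test) {
    delete static_cast<Test *>(test);
}

}  // extern "C"
```

A C caller would create the handle with test_new, pass a character buffer and its size to test_describe, and release the handle with test_delete. Letting the caller own the buffer keeps std::string out of the C interface, at the cost of possible truncation.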

Writing the extensions
I will be following the documentation in README.EXT, along with Chris Lalance's blog.
mkmf and extconf.rb
The extconf.rb script configures a Makefile that will build our extension. It must check for the functions, macros and shared libraries the extension depends upon, and exit with an error if any of these are missing. For that it requires the mkmf library, which provides the MakeMakefile module.

require 'mkmf'

# Gives the ability to easily use alternate compilers to build the extension
RbConfig::MAKEFILE_CONFIG['CC'] = ENV['CC'] if ENV['CC']

extension_name = 'csympy'

# Check to see if the csympy library required to build this extension exists.
# Typically, we would want to use libcsympy installed, including the header files
unless pkg_config('libcsympy')
  raise "libcsympy not found"
end

# If found, will define HAVE_USEFUL_FUNCTION in extconf.h
have_func('useful_function', 'libcsympy/lib.h')
# If found, will define HAVE_TYPE_USEFUL_TYPE in extconf.h
have_type('useful_type', 'libcsympy/lib.h')

# Creates the header file extconf.h, based on the results from all of the previous have_*() functions.
# The extconf.h file will be included by all of the C files in the project
# to gain access to the HAVE_* macros that extconf defines.
create_header

create_makefile(extension_name)

Data structures in CSymPy
CSymPy uses some STL data types like std::vector and std::map. These map cleanly to the
Exception Handling
All the C++ code that is to be interfaced will first have to be wrapped in a C function in order to be used with the Ruby C API. However, C does not seem to offer a way to handle the exceptions that the underlying C++ code might throw, so all exceptions will have to be handled in the C++ code itself. Since the exception functions defined by the Ruby C API can also be called from C++, the idea is to catch the exceptions from the underlying C++ code and re-raise them with rb_raise after copying the required information from the caught exception. For example, let's say that the member function Test::testfunc() throws an exception. This is how we will be changing the TestWrapper.cc file:

//the previous function definition of test_testfunc will be replaced by
void test_testfunc(const CTest *test) {
    try {
        //the handle is a const void *, so drop const before casting back to Test *
        Test *t = static_cast<Test *>(const_cast<CTest *>(test));
        t->testfunc();
    } catch (const ExceptionClass1 &e1) {
        //Copy the required information from e1 into a Ruby exception object
        VALUE exception = rb_exc_new2(rb_eException, "Error message");
        rb_iv_set(exception, "@additional_info",
                  rb_str_new2("information from e1"));
        rb_exc_raise(exception);
    }
}

This way all the exception handling is done within the C wrapper implementation, and the Ruby wrappers just interface with the C functions.
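
One possible variation, sketched here as an assumption rather than as the plan of this proposal, is to turn C++ exceptions into status codes inside the wrapper and leave the actual Ruby raise to the caller. Ruby's raise functions unwind with longjmp, so calling them from inside a catch block may skip C++ cleanup; returning a code first avoids that. The names checked_divide, wrap_divide and the WRAP_* codes are hypothetical and exist only for this example, which deliberately does not include ruby.h.

```cpp
#include <cassert>
#include <cstring>
#include <stdexcept>

// Hypothetical C++ function that may throw, standing in for library code.
static int checked_divide(int a, int b) {
    if (b == 0) throw std::domain_error("division by zero");
    return a / b;
}

extern "C" {

// Hypothetical status codes for this sketch.
enum { WRAP_OK = 0, WRAP_DOMAIN_ERROR = 1, WRAP_UNKNOWN_ERROR = 2 };

// Never lets an exception cross the C boundary: the result goes to *out,
// and on failure the reason is copied into msg (msglen must be >= 1).
int wrap_divide(int a, int b, int *out, char *msg, int msglen) {
    assert(out != nullptr && msg != nullptr && msglen > 0);
    try {
        *out = checked_divide(a, b);
        return WRAP_OK;
    } catch (const std::domain_error &e) {
        std::strncpy(msg, e.what(), msglen - 1);
        msg[msglen - 1] = '\0';
        return WRAP_DOMAIN_ERROR;
    } catch (...) {
        std::strncpy(msg, "unknown C++ exception", msglen - 1);
        msg[msglen - 1] = '\0';
        return WRAP_UNKNOWN_ERROR;
    }
}

}  // extern "C"
```

The Ruby-facing function would then check the returned code after the try/catch has finished and, if it is not WRAP_OK, call rb_raise with the copied message.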

Garbage Collection
Ruby 2.1 introduced a generational garbage collector (called RGenGC). RGenGC (mostly) keeps compatibility. Generational GC (http://en.wikipedia.org/wiki/Garbage_collection_%28computer_science%29) generally requires extension libraries to use a technique called write barriers. RGenGC, however, works fine without write barriers in extension libraries, although taking care of write barriers can improve the performance of the GC.
We won't need to do that if we use built-in types from the Ruby C API, since most built-in types support write barriers. If we use Ruby's T_DATA type, which doesn't have write barriers, we might have to write one. That is strongly discouraged, because it is easy to introduce critical bugs when writing write barriers and there is too much risk associated with it. We won't use T_DATA unless there's no other way.
For this to work properly, it is advised not to touch the pointers directly, but rather to use the C API's methods to acquire pointers to the internal data structures.
Making this a gem
Packaging the wrappers as a gem makes them easier to install. Users will be able to choose whether to install the wrappers along with csympy or on their own. A check can be included to automate this, together with a way to compile the extensions separately.

Tools I am going to use

I am using a system dual-booted with Ubuntu 14.04.2 LTS and Windows 8.1. Following are the configurations on my machine

abinashmeher999@JARVIS:~$ ruby --version
ruby 2.0.0p481 (2014-05-08 revision 45883) [x86_64-linux]
abinashmeher999@JARVIS:~$ gem --version
1.8.23
abinashmeher999@JARVIS:~$ rake --version
rake, version 10.0.4
abinashmeher999@JARVIS:~$ rspec --version
3.2.2
abinashmeher999@JARVIS:~$ bundle --version
Bundler version 1.3.5
abinashmeher999@JARVIS:~$ rdoc --version
rdoc 3.9.5

I am using RVM to manage the ruby versions in my system. Besides that, I will be using vim as my primary text editor. Apart from that I will be using the following

Rake-compiler
rake-compiler is a set of rake tasks for automating extension building. It eases the process of making extensions through its 'rake/extensiontask'. If a proper project structure is followed, generating extensions requires only a few lines of code.
RSpec
RSpec tests the code the way a developer would, to make sure everything works as intended, much like unit tests. Cucumber, in contrast, tests the software the way a client or consumer would expect it to behave, much like integration tests. Most sources suggest that the two go hand in hand. But since the underlying C++ code is tested elsewhere, integration testing won't be needed.
Bundler
Bundler provides a consistent environment for Ruby projects by tracking and installing the exact gems and versions that are needed. It ensures that the gems you need are present in development, staging, and production.
RDoc
For documentation of code and tests. I would also like to look into generating ri (Ruby interactive) documentation if time permits.

Timeline (tentative)

Community Bonding period (27th April - 24th May) and Week 1

I don't know everybody yet, nor did I get enough time before the application to get to know even a few developers in the community. This will be a great time to get to know everybody, including my fellow students. My summer vacation will start on the 29th of April.

Week 2

I can get up to speed by reading the documentation and getting to know the practices followed in the community. I will also use this time to read the documentation for the tools I will be using, so that I am aware of the best way to achieve the result and make informed decisions.
Writing the C wrappers with extern.

Week 3

Week 4

Week 5

Writing the source code for the wrappers.
Documenting it on the go.

Week 6

Week 7

Week 8

Testing and Week 9

Writing and carrying out tests using RSpec
Might need to go back to make some changes to accommodate the coding style in ruby

