Craic Computing Tech Tips: 2019

Thursday, April 25, 2019

Ruby serializable_hash removes ruby/object from YAML output

I use YAML as a convenient way to serialize data in a number of projects, most of which are written in Ruby and Rails.

You generate the YAML representation with the .to_yaml method - really simple.

p = Paper.find(359)
puts p.to_yaml

But if the input is a Ruby or Rails object, the output is prefixed with a !ruby/object tag, and that causes problems when you try to load the document in another script that does not know about that class.

For example, here is a Paper object from a Rails application.

--- !ruby/object:Paper
attributes:
  id: 359
  pmid: 7945531
  title: 'Pharmacokinetics of a new human monoclonal antibody against cytomegalovirus.
    Third communication: correspondence of the idiotype activity and virus neutralization
    activity of the new monoclonal antibody, regavirumab in rat serum and its pharmacokinetics'
[...]

If I try to read this file in a separate script, I get this error because that script has no concept of a Paper object.

y = YAML.load_file('test.yml')
ArgumentError: undefined class/module Paper
from /Users/jones/.rbenv/versions/2.1.1/lib/ruby/gems/2.1.0/gems/psych-2.0.5/lib/psych/class_loader.rb:53:in `path2class'
from /Users/jones/.rbenv/versions/2.1.1/lib/ruby/gems/2.1.0/gems/psych-2.0.5/lib/psych/class_loader.rb:53:in `resolve'
from /Users/jones/.rbenv/versions/2.1.1/lib/ruby/gems/2.1.0/gems/psych-2.0.5/lib/psych/class_loader.rb:45:in `find'
[...]

The way to strip off the ruby/object 'header' is to call serializable_hash on the object before to_yaml.

puts p.serializable_hash.to_yaml

---
abstract: TI-23 consists of lyophilized regavirumab (monoclonal antibody C23, MCA
[...]
id: 359
pmid: 7945531
publication_date: 1994-07-01
title: 'Pharmacokinetics of a new human monoclonal antibody against cytomegalovirus.
  Third communication: correspondence of the idiotype activity and virus neutralization
  activity of the new monoclonal antibody, regavirumab in rat serum and its pharmacokinetics'
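
The same effect can be reproduced without Rails. What follows is only a minimal sketch: the Paper class here is a hypothetical plain-Ruby stand-in for the Rails model, and to_plain_hash is a hand-written method playing the role of serializable_hash. The loadable? helper is also just for illustration. It uses YAML.safe_load, which refuses any class that has not been explicitly permitted, to mimic a reader that knows nothing about Paper.

```ruby
require 'yaml'

# Hypothetical stand-in for the Rails model (not the real Paper class).
class Paper
  def initialize(id, pmid)
    @id = id
    @pmid = pmid
  end

  # Plays the role of serializable_hash: a plain Hash with string keys.
  def to_plain_hash
    { 'id' => @id, 'pmid' => @pmid }
  end
end

# Illustrative helper: true if the YAML text loads without needing
# any custom Ruby classes, false if safe_load rejects it.
def loadable?(yaml_text)
  YAML.safe_load(yaml_text)
  true
rescue Psych::DisallowedClass
  false
end

paper = Paper.new(359, 7945531)

tagged_yaml = paper.to_yaml               # starts with "--- !ruby/object:Paper"
plain_yaml  = paper.to_plain_hash.to_yaml # starts with "---", no tag

puts tagged_yaml
puts plain_yaml

puts loadable?(tagged_yaml) # false: the tag names a class the loader will not build
puts loadable?(plain_yaml)  # true: it is just a mapping of strings and integers
```

The point of going through a Hash is that the YAML then contains only basic types, so any script (or any other language) can read it.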

It looks like the keys in the YAML output are written in alphabetical order.
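
If a script depends on that ordering, it may be safer to sort the keys explicitly rather than rely on what the serializer happens to do. This is a small sketch, assuming a Hash with string keys; to_sorted_yaml is a hypothetical helper name and the sample values are only illustrative.

```ruby
require 'yaml'

# Hypothetical helper: emit a Hash as YAML with its keys in sorted order.
# Ruby Hashes keep insertion order, and to_yaml writes keys in that order,
# so rebuilding the Hash from sorted pairs controls the output order.
def to_sorted_yaml(hash)
  hash.sort_by { |key, _value| key }.to_h.to_yaml
end

record = { 'title' => 'Example title', 'pmid' => 7945531, 'id' => 359 }
puts to_sorted_yaml(record)
```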

It's a simple fix but I had to hunt around to find it.

Thursday, March 14, 2019

Using Google BERT to Classify Biomedical Papers

I have been using Google's BERT language representation model to help classify a certain type of biomedical paper based on abstracts in the PubMed database.

The scripts that I use for data preparation and a detailed walkthrough of the process are written up in a new GitHub repository that I have created.

My work was based on a blog post by Javed Qadrud-Din that I found extremely helpful.

With my dataset I am getting around 91% accuracy, which is much better than my earlier experiments with LSTM, CNN, and similar approaches.
