I use YAML as a convenient way to serialize data in a number of projects, most of which are written in Ruby and Rails.
You generate the YAML representation with the .to_yaml method - really simple.
p = Paper.find(359)
puts p.to_yaml
But if the input is a Ruby or Rails object, the output is prefixed with !ruby/object, and that causes problems when you try to load the document in another script that does not know about that class.
For example, here is a Paper object from a Rails application.
--- !ruby/object:Paper
attributes:
  id: 359
  pmid: 7945531
  title: 'Pharmacokinetics of a new human monoclonal antibody against cytomegalovirus.
    Third communication: correspondence of the idiotype activity and virus neutralization
    activity of the new monoclonal antibody, regavirumab in rat serum and its pharmacokinetics'
[...]
If I try to read this file in a separate script, I get this error because that script has no concept of a Paper object.
y = YAML.load_file('test.yml')
ArgumentError: undefined class/module Paper
 from /Users/jones/.rbenv/versions/2.1.1/lib/ruby/gems/2.1.0/gems/psych-2.0.5/lib/psych/class_loader.rb:53:in `path2class'
 from /Users/jones/.rbenv/versions/2.1.1/lib/ruby/gems/2.1.0/gems/psych-2.0.5/lib/psych/class_loader.rb:53:in `resolve'
 from /Users/jones/.rbenv/versions/2.1.1/lib/ruby/gems/2.1.0/gems/psych-2.0.5/lib/psych/class_loader.rb:45:in `find'
[...]
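The failure can be reproduced without Rails. The following is a minimal sketch, assuming a script in which no Paper class is defined; the constant TAGGED_YAML and the helper try_load are hypothetical names introduced only for illustration. Depending on the Psych version, YAML.load may raise ArgumentError (as in the trace above) or Psych::DisallowedClass, so the sketch rescues both.

```ruby
require 'yaml'

# A shortened copy of the tagged document shown above. No Paper class is
# defined in this script, which mimics a reader that knows nothing about
# the Rails model.
TAGGED_YAML = <<~DOC
  --- !ruby/object:Paper
  attributes:
    id: 359
    pmid: 7945531
DOC

# Hypothetical helper: returns the loaded data, or nil when the
# !ruby/object tag cannot be resolved to a class.
def try_load(text)
  YAML.load(text)
rescue ArgumentError, Psych::DisallowedClass => e
  warn "Could not load document: #{e.class}: #{e.message}"
  nil
end

puts try_load(TAGGED_YAML).inspect
```

Rescuing the error only makes the failure visible; the data itself is still tied to the Paper class.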
The way to strip off the !ruby/object 'header' is to call serializable_hash before to_yaml (p.serializable_hash.to_yaml), which gives output like this:
---
abstract: TI-23 consists of lyophilized regavirumab (monoclonal antibody C23, MCA
[...]
id: 359
pmid: 7945531
publication_date: 1994-07-01
title: 'Pharmacokinetics of a new human monoclonal antibody against cytomegalovirus.
  Third communication: correspondence of the idiotype activity and virus neutralization
  activity of the new monoclonal antibody, regavirumab in rat serum and its pharmacokinetics'
It looks like the keys in the YAML block are output in alphabetical order.
It's a simple fix but I had to hunt around to find it.
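To see the difference side by side without a Rails application, here is a rough sketch in plain Ruby. The Paper class below is a hypothetical stand-in, not the real model, and its serializable_hash method only imitates the Rails one by returning a plain Hash with string keys. The title is shortened, and SAMPLE_PAPER, tagged_yaml and plain_yaml are names made up for this example.

```ruby
require 'yaml'

# Hypothetical stand-in for the Rails model. A real ActiveRecord class
# would already provide serializable_hash.
class Paper
  def initialize(id, pmid, title)
    @id = id
    @pmid = pmid
    @title = title
  end

  # Assumption: imitates serializable_hash by returning a plain Hash
  # with string keys.
  def serializable_hash
    { 'id' => @id, 'pmid' => @pmid, 'title' => @title }
  end
end

SAMPLE_PAPER = Paper.new(
  359,
  7945531,
  'Pharmacokinetics of a new human monoclonal antibody against cytomegalovirus.'
)

# Serializing the object directly keeps the !ruby/object:Paper tag.
def tagged_yaml(paper)
  paper.to_yaml
end

# Serializing the Hash produces a plain mapping with no class tag.
def plain_yaml(paper)
  paper.serializable_hash.to_yaml
end

puts tagged_yaml(SAMPLE_PAPER)
puts plain_yaml(SAMPLE_PAPER)
```

The second document contains only strings and integers, so a script with no knowledge of Paper can read it back with YAML.safe_load.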
A collection of computer systems and programming tips that you may find useful.
 
Brought to you by Craic Computing LLC, a bioinformatics consulting company.
Thursday, April 25, 2019
Thursday, March 14, 2019
Using Google BERT to Classify Biomedical Papers
I have been using Google's BERT language representation model to help classify a certain type of biomedical paper based on abstracts in the PubMed database.
The scripts that I use for data preparation, along with a detailed walkthrough of the process, are written up in a new GitHub repository that I have created.
My work was based on a blog post by Javed Qadrud-Din that I found extremely helpful.
With my dataset I am getting around 91% accuracy, which is much better than my earlier experiments with LSTM, CNN and other approaches.
