I have been using Google's BERT language representation model to help classify a certain type of biomedical paper based on abstracts in the PubMed database.
The scripts that I use for data preparation and a detailed walk through of the process are written up in a new GitHub repository that I have created.
My work was based on a blog post by Javed Qadrud-Din that I found extremely helpful.
With my dataset I am getting around 91% accuracy - which is much better than my earlier experiments with LSTM, CNN, etc approaches.
A collection of computer systems and programming tips that you may find useful.
Brought to you by Craic Computing LLC, a bioinformatics consulting company.
Thursday, March 14, 2019
Subscribe to:
Posts (Atom)