
Experiment with OpenAI for lab mapping to LOINC codes

I ran an experiment using Azure OpenAI to map free-text short descriptions of lab tests, together with their units (“hemoglobin g/dL”), to industry-standard LOINC codes. I was surprised to get a decent result with minimal effort: roughly a 70% match against known results, after only a few hours of experimenting with the prompt.
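A "match against known results" figure like the one above is a simple ratio. The sketch below shows one way to compute it. It assumes exact equality between the suggested and expected LOINC code, and it treats "no answer" as a miss. The `LabMapping` record and the sample rows are hypothetical, and the codes are illustrative only.

```scala
// Hypothetical record: a free-text lab description and its units, the LOINC
// code considered correct, and the code the model suggested (if any).
final case class LabMapping(
    description: String,
    units: String,
    expectedLoinc: String,
    predictedLoinc: Option[String]
)

object MatchRate {
  // Fraction of rows whose predicted code exactly equals the expected code.
  // Rows where the model returned nothing count as misses (assumption).
  def matchRate(rows: Seq[LabMapping]): Double =
    if (rows.isEmpty) 0.0
    else rows.count(r => r.predictedLoinc.contains(r.expectedLoinc)).toDouble / rows.size

  def main(args: Array[String]): Unit = {
    // Illustrative rows only, not data from the experiment.
    val rows = Seq(
      LabMapping("hemoglobin", "g/dL", "718-7", Some("718-7")),
      LabMapping("creatinine", "mg/dL", "2160-0", Some("2160-0")),
      LabMapping("glucose", "mg/dL", "2345-7", Some("2339-0"))
    )
    println(f"match rate: ${matchRate(rows) * 100}%.1f%%")
  }
}
```

Exact matching is a strict choice. A looser evaluation might also accept codes that differ only in specimen or method.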


Spark encoders, implicits and custom encoders

One of the nice things about Spark SQL is that you can treat datasets as if they were statically typed collections of Scala case classes. However, Spark datasets do not natively store case class instances; Spark has its own internal format for representing rows. Conversion between the two happens on demand, through an encoder. When you write code like this: …
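As a rough illustration (a sketch, not the post's own code), the example below shows both routes to an encoder. It assumes Spark 3.x with `spark-sql` on the classpath. `Reading` and `Opaque` are hypothetical classes. `import spark.implicits._` supplies a derived `Encoder` for the case class. `Encoders.kryo` acts as a custom encoder for a plain class that Spark cannot derive one for.

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

// Hypothetical case class: Spark can derive an encoder for it automatically.
final case class Reading(sensor: String, value: Double)

// Hypothetical plain class (not a case class / Product): no derived encoder.
final class Opaque(val payload: String)

object EncoderDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("encoder-demo").getOrCreate()
    // Brings implicit encoders for case classes, primitives, strings, etc. into scope.
    import spark.implicits._

    // toDS() requires an implicit Encoder[Reading], supplied by spark.implicits.
    val readings: Dataset[Reading] = Seq(Reading("a", 1.0), Reading("b", 2.5)).toDS()
    // Rows live in Spark's internal format; the encoder rebuilds Reading objects
    // when the lambda runs and when results are collected.
    readings.filter(_.value > 2.0).collect().foreach(println)

    // Custom encoder: serialize each Opaque with Kryo into a single binary column.
    implicit val opaqueEncoder: Encoder[Opaque] = Encoders.kryo[Opaque]
    val opaques: Dataset[Opaque] = spark.createDataset(Seq(new Opaque("x"), new Opaque("y")))
    println(opaques.map(_.payload).collect().mkString(","))

    spark.stop()
  }
}
```

The trade-off is that a Kryo-encoded dataset is opaque to Spark SQL. Its schema is one binary column, so Spark cannot prune or filter on individual fields the way it can with a derived case-class encoder.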


Run azcopy from AWS Fargate

Microsoft provides the azcopy tool for copying data into Azure storage accounts, including from AWS S3. If you’re otherwise serverless or fully containerized and don’t already have an EC2 instance running, it makes sense to run azcopy in a Fargate task.
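One way to launch such a task on demand is the ECS `RunTask` API. The sketch below uses it through the AWS SDK for Java v2 and overrides the container command with an `azcopy copy` invocation. It is a sketch under several assumptions:

- A task definition already exists whose image has `azcopy` on its `PATH` and no `azcopy` entrypoint.
- AWS credentials for the S3 source (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`) are provided to the container.
- The destination URL carries a SAS token.

The cluster, task definition, container, subnet, and URL values are all hypothetical.

```scala
import software.amazon.awssdk.services.ecs.EcsClient
import software.amazon.awssdk.services.ecs.model.{
  AssignPublicIp, AwsVpcConfiguration, ContainerOverride, LaunchType,
  NetworkConfiguration, RunTaskRequest, TaskOverride
}

object RunAzcopyTask {
  // Builds a RunTask request for a one-off Fargate task whose container runs
  // `azcopy copy <source> <destination> --recursive`. All names are hypothetical.
  def buildRequest(cluster: String, taskDefinition: String, containerName: String,
                   subnetId: String, source: String, destination: String): RunTaskRequest = {
    val network = NetworkConfiguration.builder()
      .awsvpcConfiguration(
        AwsVpcConfiguration.builder()
          .subnets(subnetId)
          // Outbound access (public IP or NAT) is needed to reach S3 and Azure.
          .assignPublicIp(AssignPublicIp.ENABLED)
          .build())
      .build()

    val overrides = TaskOverride.builder()
      .containerOverrides(
        ContainerOverride.builder()
          .name(containerName)
          .command("azcopy", "copy", source, destination, "--recursive")
          .build())
      .build()

    RunTaskRequest.builder()
      .cluster(cluster)
      .taskDefinition(taskDefinition)
      .launchType(LaunchType.FARGATE)
      .networkConfiguration(network)
      .overrides(overrides)
      .build()
  }

  def main(args: Array[String]): Unit = {
    val request = buildRequest(
      "my-cluster", "azcopy-task", "azcopy", "subnet-0123456789abcdef0",
      "https://s3.amazonaws.com/my-bucket",
      "https://myaccount.blob.core.windows.net/my-container?<SAS>")
    // Region and credentials come from the default AWS provider chain.
    val ecs = EcsClient.create()
    try ecs.runTask(request).tasks().forEach(t => println(t.taskArn()))
    finally ecs.close()
  }
}
```

Overriding the command per run keeps a single generic task definition reusable for different buckets and containers.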
