
Spark encoders, implicits and custom encoders

One of the nice things about Spark SQL is that you can reference datasets as if they were statically typed collections of Scala case classes. However, Spark datasets do not natively store case class instances; Spark has its own internal format for representing rows in datasets. Conversion happens on demand in something called an encoder. When you write code like this:

Read More

Run azcopy from AWS Fargate

Microsoft provides the azcopy tool for copying data between Azure storage accounts and AWS S3. If you’re otherwise serverless or fully containerized, and don’t already have an EC2 instance up, it makes sense to run azcopy in a Fargate task.

Read More

Spark mistakes I made

I built a Spark process to extract a SQL Server database to Parquet files on S3, using an EMR cluster. I am using as much parallelism as possible, both extracting multiple tables at a time and splitting tables into partitions to be extracted in parallel. My goal is to size the EMR cluster and the total number of parallel threads to the point where I saturate the SQL Server.

Read More

Another look back at C

I just worked on C code again for the first time in a long while. I’ve worked almost exclusively in some kind of managed runtime environment – Java, C#, Python, or JavaScript in a browser. So working on C again was like going through a bit of a time warp. I spent a while trying to figure out how to connect the old stuff I used to know with the new. Like, are gcc, gdb etc. all still there? And how do I get them to work with VS Code, and on macOS?

Read More

SQL Server tempdb on EC2 instance storage, on Linux

Normally I’m a big fan of using managed AWS services like RDS, Redshift and Aurora, so you don’t have to be in the business of managing your own database. Still, there are some edge cases where you need finer-grained control over storage, and running a DB like SQL Server on EC2 makes sense. AWS makes a SQL Server AMI for Linux available on the marketplace.

Read More