Get Spark to use your AWS credentials file for S3
Spark can access files in S3, even when running in local mode, given AWS credentials. By default, with s3a URLs, Spark will search for credentials in a few different places:
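One way to hand Spark the keys from a credentials file explicitly is to read the file yourself and map a profile onto the S3A Hadoop properties. The following is a minimal sketch, not necessarily the approach the post takes: the function name `s3a_conf_from_credentials` and the `default` profile are assumptions for illustration, and the file is assumed to use the usual INI layout with `aws_access_key_id` and `aws_secret_access_key` entries.

```python
import configparser
from pathlib import Path


def s3a_conf_from_credentials(path=None, profile="default"):
    """Map one profile of an AWS credentials file to S3A Spark properties.

    Hypothetical helper for illustration; `path` defaults to
    ~/.aws/credentials, the conventional location of the file.
    """
    if path is None:
        path = Path.home() / ".aws" / "credentials"

    parser = configparser.ConfigParser()
    # read() returns the list of files it managed to parse
    if not parser.read(path):
        raise FileNotFoundError(f"could not read credentials file: {path}")
    if not parser.has_section(profile):
        raise KeyError(f"profile not found in credentials file: {profile}")

    section = parser[profile]
    conf = {
        "spark.hadoop.fs.s3a.access.key": section["aws_access_key_id"],
        "spark.hadoop.fs.s3a.secret.key": section["aws_secret_access_key"],
    }

    # Temporary credentials carry a session token and need the matching
    # S3A credentials provider.
    token = section.get("aws_session_token")
    if token:
        conf["spark.hadoop.fs.s3a.session.token"] = token
        conf["spark.hadoop.fs.s3a.aws.credentials.provider"] = (
            "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider"
        )
    return conf
```

Each key and value in the returned dictionary can then be passed to `SparkSession.builder.config(key, value)` before the session is created. Setting the properties explicitly trades convenience for predictability: Spark no longer has to guess where the credentials live.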
I’ve worked with both old-school ETL tools (Informatica, SSIS) and, more recently, Spark. My takeaway is that AWS Glue is a mash-up of both concepts in a single tool.
I just passed the AWS Certified Solutions Architect - Associate exam. I wanted to share some observations and tips from my personal experience with the exam. (Disclaimer: this is only my personal experience, so YMMV.)
I was surprised how little code I needed to get a Spring Boot application listening to an Amazon SQS queue.
I previously wrote about integrating MSTR with SAML.
MicroStrategy’s directions for enabling single sign-on with SAML are actually pretty good. MSTR bundles Spring Security with SAML support and provides directions for how to enable it by editing web.xml.
MicroStrategy offers two different ways to connect to databases with ad hoc SQL, bypassing the managed schema (metrics and attributes):
MicroStrategy doesn’t provide a lot of guidance on how to manage Web plugins in source control, or build and deployment automation. However, it turns out that Maven WAR overlays are the perfect solution to MSTR web customizations.
RDS instances in AWS do not get a static IP address. This is usually a good thing, not a problem: it gives AWS the flexibility to preserve availability while the physical RDS host shifts around for resizing or fails over to a different availability zone (AZ). In either case, clients connect to RDS by hostname, and AWS magically updates the hostname to point at the IP address of the currently active host.
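To make the hostname indirection concrete, a client can look up the address behind a hostname at the moment it connects instead of caching an IP. This is a minimal sketch using only the standard library; the function name `current_addresses` and the default port are assumptions for illustration, and a real RDS endpoint hostname would take the place of whatever name you pass in.

```python
import socket


def current_addresses(hostname, port=5432):
    """Return the IP addresses a hostname resolves to right now.

    Hypothetical helper; the default port is an arbitrary placeholder
    and should match your database engine.
    """
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})
```

Calling this before and after a failover would be expected to return different addresses for the same hostname, which is why a hard-coded IP is fragile here.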
Sometimes in Spark you will see code like
In music, you can often write the same note two different ways. For example, B-flat and A-sharp correspond to the same key on a piano keyboard. Which one you use depends on the surrounding context. A chord C/E/G/B-flat is a C dominant 7th and resolves to an F chord. The same chord written as C/E/G/A-sharp is an augmented 6th and resolves to B major. So the way the chord is written tells you something about where it’s going next.
I have a Lenovo desktop PC at home and some of the diagnostics on the hard drive were failing, like the Targeted Read Test. The drive is a 2TB Seagate.
I found it helpful to create Spark UDFs to make it easier to migrate logic in SQL from another database like SQL Server.
I began to learn Scala specifically to work with Spark. The sheer number of language features in Scala can be overwhelming, so I find it useful to learn Scala features one by one, in the context of specific use cases. In a sense, I’m treating Scala like a DSL for writing Spark jobs.
I managed to get Spark to run on Windows in local mode, and to submit jobs to an EMR cluster in AWS.