We want to self-host Observable Framework on Azure. In our Azure DevOps pipeline we use a container image based on the framework runtime to build the project, and we then publish the build with Azure Static Web Apps.
Most of the data loaders use DuckDB + Ibis to read from Azure Data Lake Storage. However, some of them need far more memory and CPU than the build agents have. Any suggestions on how to deal with this? I suspect we're forced to write data loaders that kick off remote jobs, e.g., via Azure Functions or some other serverless service.
Yeah, what you’re dealing with is basically the classic CI/CD vs. heavy-data problem.
Azure DevOps hosted agents are meant to be small and temporary, so anything memory- or CPU-hungry running DuckDB and Ibis will struggle there.
You have a few options. First, move those heavy data loaders to a remote execution context.
Serverless (Azure Functions) works for short jobs that can tolerate cold starts, but you'll quickly hit memory and execution-time limits once you're processing a lot of data.
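The kick-off pattern the question describes can stay inside an ordinary Observable Framework data loader: the loader triggers the remote job, polls until it's done, and streams the result to stdout. Here's a minimal sketch, assuming a Durable Functions HTTP starter (the function URL is a placeholder; `statusQueryGetUri` and `runtimeStatus` are the standard Durable Functions status-API fields):

```python
# Sketch of an Observable Framework data loader (e.g. sales.json.py) that
# delegates heavy work to an Azure Durable Function and polls for the result.
# FUNCTION_URL is a placeholder, not a real endpoint.
import json
import sys
import time
import urllib.request

FUNCTION_URL = "https://example-func.azurewebsites.net/api/build-dataset"  # hypothetical

def next_poll_interval(attempt: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Exponential backoff between status checks, capped at `cap` seconds."""
    return min(base * (2 ** attempt), cap)

def run(url: str = FUNCTION_URL) -> None:
    # Start the orchestration; the HTTP starter returns a status-query URL.
    with urllib.request.urlopen(urllib.request.Request(url, method="POST")) as resp:
        status_url = json.load(resp)["statusQueryGetUri"]
    # Poll until the job completes, then write its output to stdout --
    # stdout is how an Observable Framework data loader emits its file.
    for attempt in range(20):
        time.sleep(next_poll_interval(attempt))
        with urllib.request.urlopen(status_url) as resp:
            status = json.load(resp)
        if status["runtimeStatus"] == "Completed":
            json.dump(status["output"], sys.stdout)
            return
    raise TimeoutError("remote job did not finish in time")

# A real loader would end with: run()
```

The backoff keeps the build agent mostly idle while it waits, which is exactly the "light orchestrator" role the agent is suited for.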
Second, consider setting up Azure Batch or an Azure Container Instance just for these loaders.
You can control the vCPU and RAM and scale them up or down for each job.
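For the Container Instance route, a one-off job can be launched from the pipeline with the Azure CLI, sized per loader. This is a sketch with placeholder names, image, and sizes (`--memory` is in GB):

```shell
# Hypothetical one-shot ACI job for a heavy data loader.
# Resource group, image, and output path are placeholders.
az container create \
  --resource-group data-loaders-rg \
  --name sales-loader-job \
  --image myregistry.azurecr.io/loaders/sales:latest \
  --cpu 4 --memory 16 \
  --restart-policy Never \
  --environment-variables OUTPUT_PATH="abfss://data@mylake.dfs.core.windows.net/builds/sales.parquet"
```

`--restart-policy Never` makes the container behave like a batch job: it runs once, writes its output to ADLS, and exits.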
Third, if you want something closer to traditional ETL, run the heavy loads asynchronously on a dedicated VM or AKS pod and write the results back to ADLS for your Observable build to pick up.
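With that split, the data loader in the Observable build becomes trivial: it only copies the precomputed artifact from ADLS to stdout. A minimal sketch, assuming the blob is reachable over HTTPS with a SAS token (account, container, and path are placeholders):

```python
# Sketch of the lightweight "pickup" loader (e.g. sales.parquet.py):
# the heavy job already wrote the Parquet file to ADLS; this loader
# just streams its bytes to stdout for the Observable Framework build.
# Account/container/path names below are placeholders.
import sys
import urllib.request

def artifact_url(account: str, container: str, path: str) -> str:
    """Build the HTTPS URL of a blob in an ADLS Gen2 storage account."""
    return f"https://{account}.blob.core.windows.net/{container}/{path.lstrip('/')}"

def run(url: str) -> None:
    # Copy bytes straight through; no DuckDB or Ibis needed at build time.
    with urllib.request.urlopen(url) as resp:
        sys.stdout.buffer.write(resp.read())

# A real loader would append a SAS token and end with, e.g.:
# run(artifact_url("mylake", "builds", "sales.parquet") + "?<sas-token>")
```

This keeps the build agent's footprint tiny regardless of how large the upstream computation was.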
Don’t fight the DevOps agent; treat it as a light orchestrator and let the remote compute do the hard work.