Serverless Cells

tomlarkworthy · January 21, 2021, 11:40am

Serverless cells are HTTP endpoints that read a notebook when called and run the executable code within. This has a few nice properties:

API endpoints are not a black box: you can see what code is running before sending data to it, lowering the risk of the data shenanigans that are too common these days.
You can fork your own serverside implementation of an API service using Observable notebook features.
You can program backend services using front end technology.
You can learn how others approach backend programming from tangible, live implementations.
You can distribute work and exploit parallelism.
It's faster to redeploy code than on other Function-as-a-Service platforms.
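
As a rough illustration of "an HTTP endpoint backed by a notebook", here is a minimal sketch that builds the URL of a serverless cell. The path layout is inferred from the single TAP report link posted later in this thread; the host, the "deployments" segment and the function name `serverlessCellUrl` are assumptions and may not match the current service.

```javascript
// Build the public URL of a serverless cell (hypothetical helper).
// Assumption: the layout is <host>/notebooks/@<user>/<notebook>/deployments/<name>,
// as seen in one example link in this thread.
function serverlessCellUrl({ user, notebook, name, host = "https://endpointservice.web.app" }) {
  const enc = encodeURIComponent;
  return `${host}/notebooks/@${enc(user)}/${enc(notebook)}/deployments/${enc(name)}`;
}

const url = serverlessCellUrl({ user: "tomlarkworthy", notebook: "testing", name: "tests" });
console.log(url);
// https://endpointservice.web.app/notebooks/@tomlarkworthy/testing/deployments/tests

// Calling it is then an ordinary HTTP request, for example:
// fetch(url).then((response) => response.text()).then(console.log);
```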

Currently live at

but I expect I will move it to

soon.

tomlarkworthy · January 21, 2021, 11:41am

Console logging and network requests of serverside cells can now be viewed and searched in Google Cloud Logging. This puts the serverless cells development experience on par with other FaaS implementations.

tomlarkworthy · January 24, 2021, 9:39pm

Added yet-another-cors-proxy using serverless cells. The implementation is very simple: because the proxy is also a web environment, you are just forwarding the arguments of a fetch call to be executed remotely “as is”.
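
The "forward the fetch arguments as is" idea can be sketched as follows. This is not the code of the actual proxy notebook: the JSON payload shape (`url`, `options`), the Express-like `req`/`res` objects and both function names are assumptions made for illustration.

```javascript
// Turn a payload such as {"url": "...", "options": {...}} into fetch() arguments.
// Only http(s) targets are accepted; anything else is rejected.
function toFetchArgs(payload) {
  const { url, options = {} } = payload;
  const parsed = new URL(url); // throws on malformed URLs
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`unsupported protocol: ${parsed.protocol}`);
  }
  return [parsed.href, options];
}

// Hypothetical handler: run the forwarded fetch remotely and relay status and body.
// `req.body` is assumed to be a JSON string; `fetchImpl` can be swapped in tests.
async function proxyHandler(req, res, fetchImpl = fetch) {
  try {
    const [url, options] = toFetchArgs(JSON.parse(req.body));
    const upstream = await fetchImpl(url, options);
    res.status(upstream.status).send(await upstream.text());
  } catch (error) {
    res.status(400).send(error.message);
  }
}
```

Because the remote side already speaks `fetch`, the proxy needs no translation layer; the only logic worth adding is validation of the target URL.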

(apologies to Alec Glassford / Observable for stealing his joke from his so-fetch notebook.)

Notebooks have a rate limit, but you can fork and run in your own subdomain/team with distinct resource limits rather than share that one. This is simpler than creating an account at Glitch IMHO, as it's semi-native to Observable.

BTW I saw a similar concept to Serverless cells at #1 on HN recently: Gist.cafe – Execute Gists of Python Node Deno C# Dart Swift Go Kotlin Java VB F# | Hacker News, so it's not a terrible idea.

chonghorizons · January 25, 2021, 3:05am

Very cool. I’m new to ObservableHQ, so knowing that this can be done is pretty great.

I copied the example 1 and got it working painlessly! (at Play with serverless cells / chonghorizons / Observable)

-[] next step will be to implement it as a webAPI point for a COVID calculator: Covid Individual Event Risk Calculator v0.2 / chonghorizons / Observable
-[] get it working with req.params
-[] I’ll return some json.

tomlarkworthy · January 26, 2021, 9:56pm

I have upgraded the reactive testing library to export a Test Anything Protocol (TAP) report link using serverless cells.

So now you can see if a notebook is passing its tests remotely via a simple URL.

The link at the bottom is:
https://endpointservice.web.app/notebooks/@tomlarkworthy/testing/deployments/tests
which is a plain text TAP report.

TAP version 13
1..4
ok 1 - async function
not ok 2 - sync fail
ok 3 - sync pass
ok 4 - throw exception
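
For readers unfamiliar with TAP, a report like the one above is simple to produce. The following is a minimal sketch, not the testing library's actual code; the `tapReport` name and the `{ name, ok }` result shape are assumptions.

```javascript
// Render test results as a TAP version 13 report (plain text).
function tapReport(results) {
  const lines = ["TAP version 13", `1..${results.length}`];
  results.forEach(({ name, ok }, index) => {
    lines.push(`${ok ? "ok" : "not ok"} ${index + 1} - ${name}`);
  });
  return lines.join("\n");
}

const report = tapReport([
  { name: "async function", ok: true },
  { name: "sync fail", ok: false },
  { name: "sync pass", ok: true },
  { name: "throw exception", ok: true },
]);
console.log(report);
```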

tomlarkworthy · February 3, 2021, 8:48pm

Federated login to allow login even after forking.

Often you want data behind a security layer, so a login is a good choice for providing security. However, a login is usually tied to a specific notebook origin (due to the OAuth 2.0 redirect URL). This breaks the ability for other people to improve your notebook by forking, because the notebook login feature stops working after forking.

So I have tried to improve on this by building Federated Login / Endpoint Services / Observable

If you are the user associated with a domain, you can fork federated login notebooks and still log in. However, third parties won’t be able to log in, as there is a risk you might booby trap a notebook and mess with their data. You can of course merge back with the original (making it generally available), or third parties can fork your notebook if they trust you; because the notebook is then on their domain, they will be able to log in to it.

To get this working, serverless cells have had encrypted cookie support added. Federated login works by exchanging a single-domain login for a signed SameSite=None cookie serverside. That cookie can follow the user around across notebooks.
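
To make the "signed SameSite=None cookie" idea concrete, here is a minimal sketch using Node's built-in crypto module. It only signs the value (it does not encrypt it), and the cookie format, function names and attributes other than SameSite=None are assumptions, not the service's actual implementation.

```javascript
const { createHmac, timingSafeEqual } = require("crypto");

// Sign a value with HMAC-SHA256; the result is "<value>.<hex signature>".
function sign(value, secret) {
  const mac = createHmac("sha256", secret).update(value).digest("hex");
  return `${value}.${mac}`;
}

// Return the original value if the signature checks out, otherwise null.
function unsign(signed, secret) {
  const idx = signed.lastIndexOf(".");
  if (idx < 0) return null;
  const value = signed.slice(0, idx);
  const expected = Buffer.from(sign(value, secret));
  const actual = Buffer.from(signed);
  if (expected.length !== actual.length) return null;
  return timingSafeEqual(expected, actual) ? value : null;
}

// Build a Set-Cookie header value. Browsers require Secure alongside SameSite=None.
function sessionCookie(name, value, secret) {
  const payload = encodeURIComponent(sign(value, secret));
  return `${name}=${payload}; Path=/; HttpOnly; Secure; SameSite=None`;
}
```

The server can trust whatever `unsign` returns because only it knows the secret; the cookie attributes decide whether the browser will send it in a cross-site context at all.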

As I said, the major constraint I have artificially added is that login federation only works for domains I know you have proved you have write access to.

Update: it seems that this use of cookies will break in 2022 for Chrome and has already broken for Safari.

tomlarkworthy · February 14, 2021, 3:08pm

Now you can deploy static sites from Observable!

You call “deployStaticFile” in a notebook which returns a UI widget for triggering partial Netlify deploys. Some cool stuff is done behind the scenes to avoid the painful slow deploy times of other static site technologies.

I started developing this a while ago but realised it would not work nicely until I solved federated login. Federated login means you can access your content database from your subdomain even though endpointservices is the infra provider. So if you want to migrate from Netlify, but keep your content database, this is entirely possible. If you want to script your deploys or whatever, you can write whatever tooling you want.

So I provide you with some basics for static website deployment, but you can create your own. Everything has been implemented under an ISC license*, all within notebooks, so you are free to customize, and you can do so without ever leaving the Observiverse.

Please send me your feature requests. Thanks to @chonghorizons for helping to improve the testing framework another level.

tomlarkworthy · February 22, 2021, 9:21am

I am trying to decide what to add next to the Serverless cells. Options are

Usage/Billing. Not a very interesting feature, but it might make people feel comfortable about the incentives behind offering a serverless runtime if the pricing is known. (FYI, I think I will add a fixed invocation price to the headline Cloud Run pricing, i.e. usage-based costs plus a fixed fee.) Fixed plus, because I do not want to be disincentivized from making the runtime faster. I also intend to have a pay-upfront model, as I think an issue with many clouds is the potential for unbounded spend.
Cron. Call an endpoint regularly. I think serverless cells will really shine for dataviz when you can do periodic tasks like collecting data daily, generating test reports, or tweeting.
Improved debugging. The technical feature I am most excited about is adding a step debugger to the serverless runtime. This is something other FaaS platforms don’t have, and it's enabled because the runtime is a V8 runtime. Visibility is a sore spot for most serverless runtimes (that and deploy times, which we are already ahead on), so that would address a real industry problem.
Open source the runtime. The serverless runtime is the only closed source part at the moment. This is not the long term position but it needs some abstraction and cleanup work before I can consider open sourcing it. That won’t add anything but it might make people feel more comfortable using the runtime if they know they can run their own version.
Other. There is a ton of other stuff I could do. Latency improvements, regionalisation, resource customisation.

Next feature for Serverless Cells

Pricing
Cron
Step debugger
Open Source
Something else


keystroke · February 26, 2021, 7:33am

I think I finally unlocked the ability to comment. Sorry if I left too many comments on your notebook

I think open sourcing the backend would be a great next step! Or just info about how to set up your own. Sort of like setting up your own CORS proxy just for your user subdomain on observablehq. I think it would be awesome to be building out a development environment on top of observablehq. You can write a notebook to be your API client to deploy something to some cloud service, then another to be a dashboard to monitor it in real time, testing, etc., and then various different API clients in notebooks and frontend interfaces, all of which can then be embedded into a website on your own domain for customers, or even used directly from observablehq, which would work most of the time. And then people can build various “dev plugins” and things like your CI testing framework that others can use, and get a cool ecosystem going!

keystroke · February 28, 2021, 10:21pm

I messed around with this a bit more and got below working for me. I changed your approach and set things up so I can “deploy” a new server cell from a notebook without importing anything:

const express = require('express');
const cors = require('cors');
const puppeteer = require("puppeteer");

const host = process.env.HOST || '0.0.0.0';
const port = process.env.PORT || 8080;

const herokuChromeOptions = {
    args: [
        '--incognito',
        '--no-sandbox',
        '--single-process',
        '--no-zygote',
    ],
};

const app = express();

const x = '([\-0-9@A-Z_a-z]+)';
app.get(`/api/:user${x}/:notebook${x}/:cell${x}?`, cors(), async (req, res) => {
    const start = new Date();
    const { method, url, params: { user, notebook, cell = 'app' } } = req;
    let browser;
    try {
        const content = getRunNoteBookScript({ user, notebook, cell });
        browser = await puppeteer.launch(herokuChromeOptions);
        const page = await browser.newPage();
        await page.addScriptTag({ type: 'module', content });
        await page.waitForFunction(`window['${cell}']`, { timeout: 5000 });
        const result = await page.evaluate(
            (req, cell) => window[cell](req),
            { url, method },
            cell);
        // return string as html, otherwise as json
        switch (typeof result) {
            case 'string':
                log('html');
                res.send(result);
                break;
            default:
                log('json');
                res.json(result);
                break;
        }
    } catch (error) {
        log('error', error.message);
        res.status(500).json({ error: error.message });
    } finally {
        // a browser is launched per request, so close it to avoid leaking Chrome processes
        if (browser) await browser.close().catch(() => {});
    }
    function log(resultType, resultData) {
        const end = new Date();
        const duration = ((end - start) / 1000).toPrecision(3);
        console.log(`(+${duration}s) ${method} [${resultType}] ${url}\n${resultData || ''}`.trim());
    }
});

function getRunNoteBookScript({ user, notebook, cell } = {}) {
    return `
import { Runtime } from "https://cdn.jsdelivr.net/npm/@observablehq/runtime@4/dist/runtime.js";
import define from "https://api.observablehq.com/${user}/${notebook}.js?v=3";
new Runtime().module(define, name => {
    if (name === '${cell}') return {
        fulfilled(value) { 
            window['${cell}'] = value;
        },
        rejected(error) {
            window['${cell}'] = () => { throw error; };
        }
    };
});`
}

app.listen(port, host);

So then I can make use of this in a new link-shared notebook:

app = async function(req) {
    return { message: 'Hello world!', req }
}

keystroke · March 1, 2021, 1:05am

I made a few more changes. Still need to pass body / url / search params.

- Added a 3-second timeout for all evaluations (rate limiting still needs to be added though).
- The browser is launched on server startup and the instance is re-used.
  - A new page is launched for each request.
  - An alternative setup could re-use the page for all requests if you dedicate your server to a specific notebook app, for better perf.
- Return type processing adjusted.
  - More work can be done here to refine which properties of the request are set; Tom emulates the look and feel of express, whereas my approach is simplified but restricted.
  - I wanted this to feel more like ObservableHQ, so if my cell returns html`<p>Hello!` then I’ll get an html response back, just like I would on the site.
  - You could still adjust my approach below to check the returned object for other properties; for example, you could check for a “result.res” object and copy status / headers / content to send, for more control over the response.

const express = require('express');
const cors = require('cors');
const puppeteer = require("puppeteer");

const host = process.env.HOST || '0.0.0.0';
const port = process.env.PORT || 8080;

const herokuChromeOptions = {
    args: [
        '--incognito',
        '--no-sandbox',
        '--single-process',
        '--no-zygote',
    ],
};

const app = express();

const browser = puppeteer.launch(herokuChromeOptions);

const x = '([\-0-9@A-Z_a-z]+)';
app.get(`/api/:user${x}/:notebook${x}/:cell${x}?`, cors(), async (req, res) => {
    const start = new Date();
    const { method, url, params: { user, notebook, cell = 'app' } } = req;
    try {
        const content = getRunNoteBookScript({ user, notebook, cell });
        const page = await (await browser).newPage();
        await page.addScriptTag({ type: 'module', content });
        const handle = await page.waitForFunction(
            async (req, cell) => {
                const func = window[cell];
                if (!func) return false;
                let result;
                try { result = await func(req); }
                catch (error) { return { error: error.message }; }
                if (!result) return {};
                if (typeof result === 'string') return { html: result };
                if (result.outerHTML) return { html: result.outerHTML };
                return { json: result };
            },
            { timeout: 3000 },
            { url, method },
            cell);

        const result = await handle.jsonValue();
        await page.close();

        if (result.error) {
            log('error', result.error);
            res.status(500).json({ error: result.error });
        } else if (result.html) {
            log('html');
            res.send(result.html);
        } else if (result.json) {
            log('json');
            res.json(result.json);
        } else {
            log('empty');
            res.status(204).end();
        }
    } catch (error) {
        log('error', error.message);
        res.status(500).json({ error: error.message });
    }
    function log(resultType, resultData) {
        const end = new Date();
        const duration = ((end - start) / 1000).toPrecision(3);
        console.log(`(+${duration}s) ${method} [${resultType}] ${url}\n${resultData || ''}`.trim());
    }
});

function getRunNoteBookScript({ user, notebook, cell } = {}) {
    return `
import { Runtime } from "https://cdn.jsdelivr.net/npm/@observablehq/runtime@4/dist/runtime.js";
import define from "https://api.observablehq.com/${user}/${notebook}.js?v=3";
new Runtime().module(define, name => {
    if (name === '${cell}') return {
        fulfilled(value) { 
            window['${cell}'] = value;
        },
        rejected(error) {
            window['${cell}'] = () => { throw error; };
        }
    };
});`
}

app.listen(port, host);

tomlarkworthy · March 1, 2021, 1:44am

This is cool. I do think about a simpler DX, so I like the simpler return processing, but the requirement for the user to write “deploy …” into a notebook is a deliberate choice to collect consent from the domain owner. It's quite elegant because that consent is recorded in the version log, and can only be given by a logged-in Observable user that owns the domain. There are no quibbles about whether the copyright owner of the notebook wants the service to read it.

If you let people read arbitrary cells through your API, you run the risk of taking the heat for being someone else’s scraper. So having the user define exactly where they want an external 3rd party service to integrate is a feature IMHO.

I try to be super transparent with my service and set the browser user agent too, so Observable ops team can filter it out if they want.

I will open source mine too as the next job. I use this as the rate limiter: Exponentially Weighted Moving Rate Estimation with Fast Initialization / Tom Larkworthy / Observable
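
For context, the general idea of an exponentially weighted rate estimate can be sketched like this. It is a generic textbook-style estimator, not the algorithm from the linked notebook, and it leaves out the "fast initialization" part entirely; the class name and the time constant `tauSeconds` are assumptions.

```javascript
// Estimate an event rate (events per second) with exponential decay.
// Each event adds 1/tau; between events the estimate decays by exp(-dt/tau).
class RateEstimator {
  constructor(tauSeconds) {
    this.tau = tauSeconds;
    this.rate = 0;
    this.last = null;
  }

  // Record one event at time t (in seconds) and return the updated estimate.
  observe(t) {
    if (this.last !== null) {
      this.rate *= Math.exp(-(t - this.last) / this.tau);
    }
    this.rate += 1 / this.tau;
    this.last = t;
    return this.rate;
  }
}

// A rate limiter can then reject a request when the estimate exceeds a limit.
function overLimit(estimator, nowSeconds, limitPerSecond) {
  return estimator.observe(nowSeconds) > limitPerSecond;
}
```

Read immediately after an event, the estimate sits slightly above the true rate for evenly spaced events, and it starts from zero, which is the slow warm-up that a fast-initialization scheme would presumably address.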

keystroke · March 1, 2021, 2:02am

Good point about the opt-in. I was also thinking that it helps you see who is using it, as your notebook would be linked to others by pulling in the deploy function, which is a nifty thing for Observable to track. However, I prefer the approach where people host their own instance of the server rather than using a shared one, just like with cors-anywhere. This removes the consent problem and you can do some optimizations like re-using a page and simplifying the URLs. Of course that means restricting the notebooks to your own notebooks, and we can't support link-shared draft versions in that approach (not a big deal).

I also thought about not using puppeteer and instead having some sort of control in the notebook (it would be grayed out if you have unpublished changes) to do a “push” deployment to heroku, which is set up with a build task to npm install the latest version of your notebook, so it could run directly without puppeteer. That gives better perf (assuming you don’t do browser stuff), but you don’t want to be running npm install on untrusted code on your server, so it would have to be a private instance that only pulls from your own notebooks.

The goal would be to set up a notebook with info on how to “deploy your own instance of this to heroku”. I also like the idea of using a private server and replicating the secrets syntax on backend and frontend, so the code that uses secrets looks the same in your notebooks and on the server. Then we could enable flows like debugging in your notebook by calling the function locally instead of making a request to the “live” version, which isn’t updated yet. Also imagine that the deploy could push your referenced secret values from Observable to sync them with the heroku env vars! AH IM SO EXCITED this could be really cool. I need to review the security of the oauth2 controls to push to heroku / github / etc. though.

tomlarkworthy · March 1, 2021, 7:41pm

People voted for the runtime to be open sourced*. Here it is serverlesscells/index.mjs at main · endpointservices/serverlesscells · GitHub

I have subsequently realised I should have used the term source-available license. I have one of those licenses aimed at preventing AWS/GCP/Azure from directly competing with my service, which is not an OSI-approved “Open Source” license. Sorry for the confusion. Hopefully people still find it useful, as I definitely want people to run their own infra, customise and self-host if they wish (personal or enterprise).

tomlarkworthy · March 4, 2021, 1:21pm

I wrote a quickstart guide to show how easy it is to create your own HTTP endpoint

Now I need to figure out what to do next, I quite liked the poll last time, let’s see what people choose this time. There were not so many votes so your opinion carries weight.

I am thinking I should maybe do something more #dataviz focussed, so for your consideration there is hooking this up to Google Colab so it's easier to leverage the Python ecosystem. I am still gonna vote cron though, as that will help me do stuff like latency monitoring and CI pipelines.

What to build next for Endpoint Services?

cron (regular tasks)
colab
step debugger
billing
something else


tomlarkworthy · March 7, 2021, 10:26pm

OK! Cron it is!

Schedule regular tasks including stuff like “Every hour during work hours”

The expectation is you will want to run other notebooks serving serverless-cells on a schedule, but the functionality will poll any URL actually so you can use it to automate things outside of the platform too.
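
The thread does not say which schedule syntax the cron notebook accepts. Assuming a standard five-field cron expression, “every hour during work hours” could be written as `0 9-17 * * 1-5`, and a minimal matcher (supporting only `*`, numbers, ranges and comma lists, evaluated in UTC) might look like this:

```javascript
// Does one cron field ("*", "5", "9-17", "1,3,5") match a number?
function fieldMatches(field, value) {
  return field.split(",").some((part) => {
    if (part === "*") return true;
    const [lo, hi = lo] = part.split("-").map(Number);
    return value >= lo && value <= hi;
  });
}

// Check a Date against "minute hour day-of-month month day-of-week" (UTC).
// Simplification: all five fields must match; real cron treats a restricted
// day-of-month and day-of-week pair as "either one".
function cronMatches(expression, date) {
  const [minute, hour, dayOfMonth, month, dayOfWeek] = expression.trim().split(/\s+/);
  return (
    fieldMatches(minute, date.getUTCMinutes()) &&
    fieldMatches(hour, date.getUTCHours()) &&
    fieldMatches(dayOfMonth, date.getUTCDate()) &&
    fieldMatches(month, date.getUTCMonth() + 1) &&
    fieldMatches(dayOfWeek, date.getUTCDay())
  );
}

const WORK_HOURS = "0 9-17 * * 1-5"; // on the hour, 09:00 to 17:00, Monday to Friday
console.log(cronMatches(WORK_HOURS, new Date(Date.UTC(2021, 2, 8, 10, 0)))); // true (a Monday)
```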

I am personally quite excited about this. Serverless cells are quite slow but I have held off optimising them until I can make scientific measurements. With cron I can test them regularly and identify

the overall reliability
their latency
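
Both measurements reduce to simple statistics over probe results. Here is a minimal sketch, assuming each probe is recorded as `{ ok, ms }` (a hypothetical shape); the sample numbers are made up for illustration and are not measurements from the service.

```javascript
// Nearest-rank percentile over an ascending-sorted array.
function percentile(sorted, p) {
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Summarise probes into a success ratio and latency percentiles (successes only).
function summarize(samples) {
  const latencies = samples
    .filter((sample) => sample.ok)
    .map((sample) => sample.ms)
    .sort((a, b) => a - b);
  return {
    reliability: samples.length ? latencies.length / samples.length : null,
    p50: latencies.length ? percentile(latencies, 50) : null,
    p95: latencies.length ? percentile(latencies, 95) : null,
  };
}

// Illustrative input only.
const summary = summarize([
  { ok: true, ms: 100 },
  { ok: true, ms: 200 },
  { ok: true, ms: 300 },
  { ok: true, ms: 400 },
  { ok: false, ms: 0 },
]);
console.log(summary); // { reliability: 0.8, p50: 200, p95: 400 }
```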

As always, this feature was 100% implemented in notebook code. I have not written a single line of code in any other environment! This whole thing is bootstrapped off the serverless-cell runtime. This time I put the “backend” code in its own notebook, Cron backend / Endpoint Services / Observable, which is where we mint access_tokens using a service account, log in to Firebase and call GCP APIs using a GAPI client. As it's my 3rd round of doing a service like this, I think I am beginning to boil it down to a nice pattern.

Oh, the other cool thing is that cron does not require a login. You have to publish a notebook containing your desired cron schedule, and then sync it. I do not need a login, as the presence of the config in the notebook is enough to demonstrate an authorised person wanted it. So it's pretty easy to try out.

tomlarkworthy · March 12, 2021, 12:57pm

I wrote a tutorial on how to use cron to drive a Twitter bot with Zapier.

I realised sometimes you want to dynamically generate an image, so Serverless Cells now support serving binary data by res.send(<ArrayBuffer>) which is a pretty close equivalent to the node.js res.send(<Buffer>).
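
A minimal sketch of serving generated bytes this way follows. Only `res.send(<ArrayBuffer>)` comes from the post above; the `(req, res)` handler shape, the helper names and the choice of SVG are assumptions, and how a content type would be set is not covered here.

```javascript
// Escape text for safe inclusion in XML/SVG.
function escapeXml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Generate a tiny SVG image and return its UTF-8 bytes as an ArrayBuffer.
function svgBytes(label) {
  const svg =
    `<svg xmlns="http://www.w3.org/2000/svg" width="200" height="40">` +
    `<text x="10" y="25">${escapeXml(label)}</text></svg>`;
  const bytes = new TextEncoder().encode(svg);
  // Copy exactly the encoded range, in case the view sits on a larger buffer.
  return bytes.buffer.slice(bytes.byteOffset, bytes.byteOffset + bytes.byteLength);
}

// Hypothetical handler: respond with the binary payload.
function imageHandler(req, res) {
  res.send(svgBytes("hello"));
}
```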

Being able to serve data should make it much simpler to get data out of a notebook from an always-on direct link (see Serverless Cells / Endpoint Services / Observable for more details)

tomlarkworthy · March 19, 2021, 8:12pm

I built the latency monitor I wanted

I find it very cool that a full end-to-end prober and dashboard can be expressed (securely) in a notebook.

tomlarkworthy · March 21, 2021, 7:08pm

This morning I added regions! us-central1 (Iowa) and asia-east1 (Taiwan) are new (default is Netherlands).

I upgraded the latency monitor to measure the performance of the different regions, and Asia was TERRIBLE, 22 second cold starts!!! (they were all pretty bad though)

So I spent the day tuning, and I am happy to say the latency is now around 2 seconds for Asia (it can be as low as 800ms) and 700ms for the US. It was the serverside-cell dependency tree that was slowing things down a lot.

The effect on the latency monitor is clear

Anyway, now the inline tests and examples for serverless cells are hosted in their own notebook (Serverless Cell Tests / Endpoint Services / Observable) so they do not affect serverless cell users.

Enjoy!

tomlarkworthy · March 23, 2021, 12:36pm

When building the Twitter bot, the latency monitor, the TAP continuous integration tester and so on, the rule that serverless cells cannot call other cells regularly got in the way.

So I have finally relaxed that! Now the TAP Report links work properly even for serverless cell based services https://endpointservice.web.app/notebooks/@tomlarkworthy/fetchp/deploys/tests/mods/O

Serverless cells can now be designated “orchestrator”, “external” or “terminal”. Stuff like the TAP report generator is an orchestrator and is allowed to call the other two. Terminal cells, like the “send to Zapier” call, can be called by any other cell but cannot call cells themselves.

With these roles in place it is still impossible to create self triggering loops, but the design space is much larger now.
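
One way to picture why these roles rule out loops is as a small table of allowed calls. The sketch below is an interpretation, not the runtime's code: that orchestrators cannot call other orchestrators and that external cells can call only terminal cells are assumptions beyond what the post states.

```javascript
// Which role may call which (assumed reading of the rules above).
const ALLOWED_CALLS = {
  orchestrator: ["external", "terminal"],
  external: ["terminal"],
  terminal: [],
};

function canCall(callerRole, calleeRole) {
  return (ALLOWED_CALLS[callerRole] || []).includes(calleeRole);
}

// Depth-first search for a cycle in a graph given as { node: [successors] }.
function hasCycle(graph) {
  const state = {}; // undefined = unvisited, 1 = in progress, 2 = done
  const visit = (node) => {
    if (state[node] === 1) return true;
    if (state[node] === 2) return false;
    state[node] = 1;
    const found = (graph[node] || []).some(visit);
    state[node] = 2;
    return found;
  };
  return Object.keys(graph).some(visit);
}

console.log(hasCycle(ALLOWED_CALLS)); // false
```

Because every allowed call moves strictly "down" the role order, no chain of calls can return to where it started.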

The point of all this is to build an OPEN and SAFE cloud.

Serverless cells are OPEN because they ONLY run public source code. You can always audit them. You can always fork and self host them. You are never boxed in.

Serverless cells are SAFE because they cannot self-trigger and expose you to unbounded financial risk (see We Burnt $72K testing Firebase + Cloud Run and almost went Bankrupt [Part 1] | Milkie Way). Not that they charge money anyway, but at some point they will, and I want to indemnify you against unbounded risk.
