Vega-Lite: External Image Problem

Can anyone please help me understand how to coerce Vega-Lite to show an external image in a detail panel?

The following notebook only seems to work for the oldest victim, Monroe Isadore (the mark at the top center): https://observablehq.com/@jonhelfman/vega-lite-external-image-problem

Others throw the following console error and then show a duplicate of the top scatterplot instead of an image!

“ERROR DOMException: Failed to execute ‘drawImage’ on ‘CanvasRenderingContext2D’: The HTMLImageElement provided is in the ‘broken’ state.”

I have tried changing the image URLs to use https as suggested here: https://github.com/altair-viz/altair/issues/407

Thanks in advance for any guidance!
–jon

The problem doesn't seem to be with Vega, but with the CORS settings on the source images. More details about the issue and a solution here: https://observablehq.com/@mbostock/cross-origin-images

Console error: Access to image at ‘https://fatalencounters.org/wp-content/uploads/2020/06/Caine-Van-Pelt..jpg’ from origin ‘https://jonhelfman.static.observableusercontent.com’ has been blocked by CORS policy: No ‘Access-Control-Allow-Origin’ header is present on the requested resource.

Thanks a10k.

I understand that part of the problem is CORS-related.

But the Vega-Lite API runs in a web notebook and provides a way to set a URL for an image mark – how is anyone supposed to use this feature?

Is there a way to have Vega-Lite add the equivalent of image.crossOrigin = 'anonymous'?

Or is there a way for Vega-Lite to handle the error and perhaps show a broken-image icon, instead of duplicating the first chart?

Thanks!
–jon

By using images from the same host (or a CORS-enabled host). :wink:

Vega-Lite already sets the equivalent of image.crossOrigin = 'anonymous' by default, but it still requires that the server hosting the images serve them with a matching Access-Control-Allow-Origin header.
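To illustrate what that means in plain DOM terms, here is a minimal sketch (not Vega's actual loader code) of a CORS-mode image load and why the missing header leads to the "broken" state error you quoted:

const img = new Image();
img.crossOrigin = 'anonymous'; // request the image in CORS mode, as Vega does
img.onload = () => {
  // Only reached when the server answered with a matching
  // Access-Control-Allow-Origin header; the canvas stays untainted.
  const canvas = document.createElement('canvas');
  canvas.width = img.width;
  canvas.height = img.height;
  canvas.getContext('2d').drawImage(img, 0, 0);
};
img.onerror = () => {
  // Without that header the CORS check fails and the image ends up in the
  // "broken" state; that is the state in which Vega then tries to draw it,
  // producing the DOMException quoted at the top of this thread.
  console.warn('Image failed to load (missing CORS header, or 404/403).');
};
img.src = 'https://example.com/portrait.jpg'; // placeholder URL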

If we try to fetch each image in your dataset and log the results, we get:

  • network error (including blocked requests): 12529
  • HTTP 200: 304
  • HTTP 404: 31
  • HTTP 403: 4

As you can see, only the 304 images that returned HTTP 200 are actually usable. And because of the enormous number of images, a (free) CORS proxy is not an option either.

Even setting the technical issues aside, there are other problems with this strategy:

  • You’re hotlinking images, which can put additional strain on affected servers or may even be prohibited.
  • There is no image attribution, which might pose a serious problem.

Code to produce the statistics:

{
  // Observable generator cell: fires a HEAD request for every image URL and
  // keeps a running tally of the outcomes, yielding it until all requests
  // have settled.
  const counts = {};
  const count = status => { counts[status] = (counts[status] || 0) + 1 };

  let done;
  Promise.allSettled(data.map(d =>
    fetch(d.image, {method: 'HEAD'})              // headers only, no body
      .then(r => { count(`HTTP ${r.status}`) })
      .catch(() => { count('network error') })    // includes CORS-blocked requests
  )).then(() => done = true);

  // Yielding once per animation frame keeps the displayed counts live.
  while(!done) yield counts;
  return counts;
}

Thanks Mootari for the clear explanation and for including your test code.

Please note that this is not 'my dataset'; I am trying to make some visualizations of the datasets linked from here: https://observablehq.com/@observablehq/black-lives-matter-and-racial-equality-resources?collection=@observablehq/equality

I am using a subset of a dataset hosted by fatalencounters.org – just the rows that have an image URL.

Actually, fatalencounters.org does not seem to be listed in the black-lives-matter-and-racial-equality-resources notebook, so I’m not sure how I came upon it now.

I was really hoping to include images of the victims because aggregate statistics of death can seem quite dehumanizing, while images of people can make the data personal and meaningful.

I don’t understand the nuances or how to work around the other problems you noted (hot-linking or image attribution), but I welcome any further ideas or insights.

Thanks again!
–jon

Mootari, there is something else I do not understand about your answer:

My intention is not to 'fetch each image', but instead to show a single image at a time in a detail-on-demand type of interaction using vl.selectSingle().empty('none').
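Roughly, the interaction I have in mind looks like this (a sketch with placeholder field names, not my exact notebook code):

const selection = vl.selectSingle().empty('none'); // nothing selected => no image

const scatter = vl.markCircle()
  .select(selection)
  .encode(
    vl.x().fieldT('date'),
    vl.y().fieldQ('age')
  );

const detail = vl.markImage({width: 150, height: 150})
  .transform(vl.filter(selection))   // keep only the clicked row
  .encode(vl.url().fieldN('image')); // 'image' holds the photo URL

return vl.vconcat(scatter, detail).data(data).render();

So only one image URL would ever be requested at a time.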

Does that change any of your answers about a proxy, hotlinking, or image attributions?

I really do not know, but it seems that it might.

Thank you very much!
–jon

@jonhelfman Sorry for the long delay, it took me some time to find the answers to your initial problem.

Debugging the minified Vega/Vega-Lite mashup turned out to be quite horrible, so I’ve provided an unminified version of the official VL notebook here:

Given the rather sparse Vega-Lite API documentation, this made things a lot easier. And, as it turns out, you can solve most of your problems by switching to the SVG renderer: unlike the canvas renderer, it places the photos into regular SVG <image> elements, so the canvas CORS/tainting restrictions no longer apply.

To change the renderer, replace the code at the end of your cell with the following:

const element = await vl.vconcat(plot1, plot2)
    .data(data)
    .autosize({type: 'fit-x', contains: 'padding'})
    .render();
// .value contains the Vega view; switch it from canvas to the SVG renderer
// and re-render
element.value.renderer('svg').runAsync();
return element;

I stopped looking into replacing the loader, because this solution seemed sufficient.


Going back to the image/attribution problem (yes yes, I know, sorry about the earlier lecture :slightly_smiling_face: ), I tried to find solutions to preserve the images in ways that satisfy all constraints. My suggestion, if you want to contribute to the project in a big way, would be the following:

  1. For each image URL, try to find the website/page from which the image originated.
  2. Go to the Internet Archive's Wayback Machine and enter the URL there so that it gets indexed (if it hasn't been already).
  3. Once/if the page is indexed, fetch the image URL from there and document it (along with the parent source). A scripted version of this lookup is sketched below this list.
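For step 3, a rough sketch of how the lookup could be scripted (this assumes the Wayback Machine's public availability endpoint; the exact response shape and rate limits may differ):

async function waybackSnapshot(url) {
  // Ask the archive whether a snapshot of this URL already exists.
  const res = await fetch(
    `https://archive.org/wayback/available?url=${encodeURIComponent(url)}`
  );
  const {archived_snapshots} = await res.json();
  const closest = archived_snapshots && archived_snapshots.closest;
  // Returns something like
  // "https://web.archive.org/web/20200601000000/<original-url>",
  // or null if the page has never been archived.
  return closest && closest.available ? closest.url : null;
}

// Example: resolve one image URL to its archived copy, or flag it for
// submission via https://web.archive.org/save/<url> if none exists yet.
const archived = await waybackSnapshot('https://example.com/portrait.jpg');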

The benefits from this approach are:

  1. Site owners who object to the use of an image can contact the Internet Archive to get their site blacklisted.
  2. For each image you now have at least one source for attribution.
  3. The archive ensures that the images will be preserved for the years to come.
  4. Images that are already missing might still be available through the archive.

Sadly the Internet Archive doesn’t serve images with a CORS header. If you still need to render to canvas, your options are:

  • Use a CORS proxy (you’d likely have to host one yourself, as the free ones will impose a rate limit that won’t suffice for the number of expected requests).
  • Host on dedicated image hosting sites like imgur.com. Ignoring any legal aspects, you’d probably have to upload each image via script and store the resulting URL in your data set.
  • Perhaps generate a tilemap of all images? With ~13000 entries you could probably fit all pictures into a single (very) large image, with a size of around 150x150 per portrait. Again, this would have to be scripted, and the tile offsets would have to be stored in your data set (a rough sketch follows below this list).
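For the tilemap option, a rough offline sketch (assuming Node.js with the npm 'canvas' package and the portraits already downloaded into ./portraits/; folder and file names are placeholders):

const fs = require('fs');
const {createCanvas, loadImage} = require('canvas');

async function buildTilemap(files, tile = 150, columns = 120) {
  // One fixed-size cell per portrait; ~13000 images => roughly 120x109 cells.
  // Note that a canvas this large is memory-heavy; in practice you may want
  // to split the output into several sheets.
  const rows = Math.ceil(files.length / columns);
  const canvas = createCanvas(columns * tile, rows * tile);
  const ctx = canvas.getContext('2d');
  const offsets = [];

  for (let i = 0; i < files.length; i++) {
    const img = await loadImage(`./portraits/${files[i]}`);
    const x = (i % columns) * tile;
    const y = Math.floor(i / columns) * tile;
    ctx.drawImage(img, x, y, tile, tile); // scale each portrait into its cell
    offsets.push({file: files[i], x, y}); // offsets to merge into the data set
  }

  fs.writeFileSync('tilemap.png', canvas.toBuffer('image/png'));
  fs.writeFileSync('offsets.json', JSON.stringify(offsets, null, 2));
}

buildTilemap(fs.readdirSync('./portraits')).catch(console.error);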