Vega-Lite: External Image Problem

Can anyone please help me understand how to coerce Vega-Lite to show an external image in a detail panel?

The following notebook only seems to work for the oldest victim, Monroe Isadore (the mark at the top center): https://observablehq.com/@jonhelfman/vega-lite-external-image-problem

Others throw the following console error and then show a duplicate of the top scatterplot instead of an image!

“ERROR DOMException: Failed to execute ‘drawImage’ on ‘CanvasRenderingContext2D’: The HTMLImageElement provided is in the ‘broken’ state.”

I have tried changing the image URLs to use https as suggested here: https://github.com/altair-viz/altair/issues/407

Thanks in advance for any guidance!
–jon

The problem doesn't seem to be with Vega, but with the CORS settings on the source images. More details about the issue and a solution here: https://observablehq.com/@mbostock/cross-origin-images

Console error: Access to image at ‘https://fatalencounters.org/wp-content/uploads/2020/06/Caine-Van-Pelt..jpg’ from origin ‘https://jonhelfman.static.observableusercontent.com’ has been blocked by CORS policy: No ‘Access-Control-Allow-Origin’ header is present on the requested resource.

Thanks a10k.

I understand that part of the problem is CORS-related.

But the Vega-Lite API runs in a web notebook and provides a way to set a URL for an image mark – how is anyone supposed to use this feature?

Is there a way to have Vega-Lite add the equivalent of image.crossOrigin = 'anonymous'?

Or is there a way for Vega-Lite to handle the error and perhaps show a broken-image icon, instead of duplicating the first chart?

Thanks!
–jon

By using images from the same host (or a CORS-enabled host). :wink:

Vega-Lite already sets the equivalent of image.crossOrigin = 'anonymous' by default, but it still requires that the server hosting the images serve them with a matching Access-Control-Allow-Origin header.
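To illustrate what that means in plain DOM terms, here is a minimal sketch (not Vega's actual loader code) of a CORS-mode image load and why the missing header leads to the "broken" state error you quoted:

const img = new Image();
img.crossOrigin = 'anonymous'; // request the image in CORS mode, as Vega does
img.onload = () => {
  // Only reached when the server answered with a matching
  // Access-Control-Allow-Origin header; the canvas stays untainted.
  const canvas = document.createElement('canvas');
  canvas.width = img.width;
  canvas.height = img.height;
  canvas.getContext('2d').drawImage(img, 0, 0);
};
img.onerror = () => {
  // Without that header the CORS check fails and the image ends up in the
  // "broken" state; that is the state in which Vega then tries to draw it,
  // producing the DOMException quoted at the top of this thread.
  console.warn('Image failed to load (missing CORS header, or 404/403).');
};
img.src = 'https://example.com/portrait.jpg'; // placeholder URL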

If we try to fetch each image in your dataset and log the results, we get:

  • network error (including blocked requests): 12529
  • HTTP 200: 304
  • HTTP 404: 31
  • HTTP 403: 4

As you can see, only the 304 images that returned HTTP 200 are actually usable. And because of the enormous number of images, a (free) CORS proxy is not an option either.

Even setting the technical issues aside, there are other problems with this strategy:

  • You’re hotlinking images, which can put additional strain on affected servers or may even be prohibited.
  • There is no image attribution, which might pose a serious problem.

Code to produce the statistics:

{
  // Observable generator cell: fires a HEAD request for every image URL and
  // keeps a running tally of the outcomes, yielding it until all requests
  // have settled.
  const counts = {};
  const count = status => { counts[status] = (counts[status] || 0) + 1 };

  let done;
  Promise.allSettled(data.map(d =>
    fetch(d.image, {method: 'HEAD'})              // headers only, no body
      .then(r => { count(`HTTP ${r.status}`) })
      .catch(() => { count('network error') })    // includes CORS-blocked requests
  )).then(() => done = true);

  // Yielding once per animation frame keeps the displayed counts live.
  while(!done) yield counts;
  return counts;
}

Thanks Mootari for the clear explanation and for including your test code.

Please note that this is not 'my dataset'; I am trying to make some visualizations of the datasets linked from here: https://observablehq.com/@observablehq/black-lives-matter-and-racial-equality-resources?collection=@observablehq/equality

I am using a subset of a dataset hosted by fatalencounters.org – just the rows that have an image URL.

Actually, fatalencounters.org does not seem to be listed in the black-lives-matter-and-racial-equality-resources notebook, so I’m not sure how I came upon it now.

I was really hoping to include images of the victims because aggregate statistics of death can seem quite dehumanizing, while images of people can make the data personal and meaningful.

I don’t understand the nuances or how to work around the other problems you noted (hot-linking or image attribution), but I welcome any further ideas or insights.

Thanks again!
–jon

Mootari, there is something else I do not understand about your answer:

My intention is not to 'fetch each image', but instead to show a single image at a time in a detail-on-demand type of interaction using vl.selectSingle().empty('none').
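Roughly, the interaction I have in mind looks like this (a sketch with placeholder field names, not my exact notebook code):

const selection = vl.selectSingle().empty('none'); // nothing selected => no image

const scatter = vl.markCircle()
  .select(selection)
  .encode(
    vl.x().fieldT('date'),
    vl.y().fieldQ('age')
  );

const detail = vl.markImage({width: 150, height: 150})
  .transform(vl.filter(selection))   // keep only the clicked row
  .encode(vl.url().fieldN('image')); // 'image' holds the photo URL

return vl.vconcat(scatter, detail).data(data).render();

So only one image URL would ever be requested at a time.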

Does that change any of your answers about a proxy, hotlinking, or image attributions?

I really do not know, but it seems that it might.

Thank you very much!
–jon

@jonhelfman Sorry for the long delay, it took me some time to find the answers to your initial problem.

Debugging the minified Vega/Vega-Lite mashup turned out to be quite horrible, so I’ve provided an unminified version of the official VL notebook here:

Given the rather sparse Vega-Lite API documentation, this made things a lot easier. And, as it turns out, you can solve most of your problems by switching to the SVG renderer: unlike the canvas renderer, it places the photos into regular SVG <image> elements, so the canvas CORS/tainting restrictions no longer apply.

To change the renderer, replace the code at the end of your cell with the following:

const element = await vl.vconcat(plot1, plot2)
    .data(data)
    .autosize({type: 'fit-x', contains: 'padding'})
    .render();
// .value contains the Vega view; switch it from canvas to the SVG renderer
// and re-render
element.value.renderer('svg').runAsync();
return element;

I stopped looking into replacing the loader, because this solution seemed sufficient.


Going back to the image/attribution problem (yes yes, I know, sorry about the earlier lecture :slightly_smiling_face: ), I tried to find solutions to preserve the images in ways that satisfy all constraints. My suggestion, if you want to contribute to the project in a big way, would be the following:

  1. For each image URL, try to find the website/page from which the image originated.
  2. Go to the Internet Archive's Wayback Machine and enter the URL there so that it gets indexed (if it hasn't been already).
  3. Once/if the page is indexed, fetch the image URL from there and document it (along with the parent source). A scripted version of this lookup is sketched below this list.
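For step 3, a rough sketch of how the lookup could be scripted (this assumes the Wayback Machine's public availability endpoint; the exact response shape and rate limits may differ):

async function waybackSnapshot(url) {
  // Ask the archive whether a snapshot of this URL already exists.
  const res = await fetch(
    `https://archive.org/wayback/available?url=${encodeURIComponent(url)}`
  );
  const {archived_snapshots} = await res.json();
  const closest = archived_snapshots && archived_snapshots.closest;
  // Returns something like
  // "https://web.archive.org/web/20200601000000/<original-url>",
  // or null if the page has never been archived.
  return closest && closest.available ? closest.url : null;
}

// Example: resolve one image URL to its archived copy, or flag it for
// submission via https://web.archive.org/save/<url> if none exists yet.
const archived = await waybackSnapshot('https://example.com/portrait.jpg');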

The benefits from this approach are:

  1. Site owners who object to the use of an image can contact the Internet Archive to get their site blacklisted.
  2. For each image you now have at least one source for attribution.
  3. The archive ensures that the images will be preserved for the years to come.
  4. Images that are already missing might still be available through the archive.

Sadly the Internet Archive doesn’t serve images with a CORS header. If you still need to render to canvas, your options are:

  • Use a CORS proxy (you’d likely have to host one yourself, as the free ones will impose a rate limit that won’t suffice for the number of expected requests).
  • Host on dedicated image hosting sites like imgur.com. Ignoring any legal aspects, you’d probably have to upload each image via script and store the resulting URL in your data set.
  • Perhaps generate a tilemap of all images? With ~13000 entries you could probably fit all pictures into a single (very) large image, with a size of around 150x150 per portrait. Again, this would have to be scripted, and the tile offsets would have to be stored in your data set (a rough sketch follows below this list).
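For the tilemap option, a rough offline sketch (assuming Node.js with the npm 'canvas' package and the portraits already downloaded into ./portraits/; folder and file names are placeholders):

const fs = require('fs');
const {createCanvas, loadImage} = require('canvas');

async function buildTilemap(files, tile = 150, columns = 120) {
  // One fixed-size cell per portrait; ~13000 images => roughly 120x109 cells.
  // Note that a canvas this large is memory-heavy; in practice you may want
  // to split the output into several sheets.
  const rows = Math.ceil(files.length / columns);
  const canvas = createCanvas(columns * tile, rows * tile);
  const ctx = canvas.getContext('2d');
  const offsets = [];

  for (let i = 0; i < files.length; i++) {
    const img = await loadImage(`./portraits/${files[i]}`);
    const x = (i % columns) * tile;
    const y = Math.floor(i / columns) * tile;
    ctx.drawImage(img, x, y, tile, tile); // scale each portrait into its cell
    offsets.push({file: files[i], x, y}); // offsets to merge into the data set
  }

  fs.writeFileSync('tilemap.png', canvas.toBuffer('image/png'));
  fs.writeFileSync('offsets.json', JSON.stringify(offsets, null, 2));
}

buildTilemap(fs.readdirSync('./portraits')).catch(console.error);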