By using images from the same host (or a CORS-enabled host). Vega-Lite already does this by default, but it still requires that the server hosting the images serve them with a matching `Access-Control-Allow-Origin` header.
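Concretely, the image responses need to include a header such as:

```
Access-Control-Allow-Origin: *
```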
If we try to fetch each image in your dataset and log the results, we get:
- network error (including blocked requests): 12529
- HTTP 200: 304
- HTTP 404: 31
- HTTP 403: 4
As you can see, only 304 images are actually usable. And given the sheer number of images, a (free) CORS proxy is no longer an option either.
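For reference, a check along those lines can be scripted directly in a cell. A rough sketch, where `data` and `imageUrl` are placeholders for the actual dataset cell and column:

```js
counts = {
  // Fetch every image URL and tally the outcomes by status code.
  const tally = {};
  for (const row of data) {
    try {
      const res = await fetch(row.imageUrl);
      tally[`HTTP ${res.status}`] = (tally[`HTTP ${res.status}`] || 0) + 1;
    } catch (err) {
      // CORS-blocked requests and other network failures throw
      tally['network error'] = (tally['network error'] || 0) + 1;
    }
  }
  return tally;
}
```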
Even setting that aside, there are other problems with this strategy:
- You’re hotlinking images, which can put additional strain on the affected servers or may even be prohibited.
- There is no image attribution, which might pose a serious problem.
I am using a subset of a dataset hosted by fatalencounters.org – just the rows that have an image URL.
Actually, fatalencounters.org does not seem to be listed in the black-lives-matter-and-racial-equality-resources notebook, so I’m no longer sure how I came upon it.
I was really hoping to include images of the victims because aggregate statistics of death can seem quite dehumanizing, while images of people can make the data personal and meaningful.
I don’t understand the nuances or how to work around the other problems you noted (hotlinking and image attribution), but I welcome any further ideas or insights.
@Mootari, there is something else I do not understand about your answer:
My intention is not to "fetch each image" but rather to show a single image at a time, in a detail-on-demand interaction using `vl.selectSingle().empty('none')`.
Does that change any of your answers about the proxy, hotlinking, or image attribution?
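For illustration, the interaction I have in mind looks roughly like this (just a sketch; the field names are placeholders):

```js
{
  const single = vl.selectSingle().empty('none');

  // base scatter plot; clicking a point sets the selection
  const points = vl.markPoint()
    .select(single)
    .encode(vl.x().fieldT('date'), vl.y().fieldQ('age'));

  // detail layer: only the selected row passes the filter,
  // so a single portrait is shown at a time
  const portrait = vl.markImage({width: 100, height: 100})
    .transform(vl.filter(single))
    .encode(
      vl.x().fieldT('date'),
      vl.y().fieldQ('age'),
      vl.url().fieldN('imageUrl')
    );

  return vl.layer(points, portrait).data(data).render();
}
```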
@jonhelfman Sorry for the long delay; it took me some time to find answers to your initial problem.
Debugging the minified Vega/Vega-Lite mashup turned out to be quite horrible, so I’ve provided an unminified version of the official VL notebook here:
Given the rather sparse Vega-Lite API documentation, this made things a lot easier. And as it turns out, you can solve most of your problems by switching to the SVG renderer.
To change the renderer, replace the code at the end of your cell with the following:
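Something along these lines should do it. This is a sketch assuming the setup cell from the official notebook, where `register()` is called at the end; the essential addition is the `view: {renderer: 'svg'}` option (exact package versions may differ):

```js
vl = {
  const [vega, vegaLite, api] = await Promise.all([
    require('vega@5'),
    require('vega-lite@4'),
    require('vega-lite-api')
  ]);
  // The "view" options are passed through to the Vega View constructor;
  // the SVG renderer sidesteps the canvas CORS taint entirely.
  return api.register(vega, vegaLite, {view: {renderer: 'svg'}});
}
```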
I stopped looking into replacing the loader, because this solution seemed sufficient.
Going back to the image/attribution problem (yes, yes, I know, sorry about the earlier lecture), I tried to find ways to preserve the images that satisfy all constraints. My suggestion, if you want to contribute to the project in a big way, would be the following:
1. For each image URL, try to find the website/page from which the image originated.
2. Go to the Internet Archive’s Wayback Machine and enter the URL there so that it gets indexed (if it hasn’t been already).
3. Once/if the page is indexed, fetch the image URL from there and document it (along with the parent source). This lookup can be scripted; see the sketch below.
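Step 3 can be scripted against the Wayback Machine’s availability API. A minimal sketch (the endpoint exists; the helper name is made up, and error handling is omitted):

```js
// Look up the closest archived snapshot for a URL.
// Returns the archived URL, or null if no snapshot exists yet.
async function waybackUrl(url) {
  const endpoint =
    `https://archive.org/wayback/available?url=${encodeURIComponent(url)}`;
  const res = await fetch(endpoint);
  const {archived_snapshots} = await res.json();
  return archived_snapshots.closest ? archived_snapshots.closest.url : null;
}
```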
The benefits of this approach are:

- Site owners who object to the use of an image can contact the Internet Archive to get their site blacklisted.
- For each image you now have at least one source for attribution.
- The archive ensures that the images will be preserved for years to come.
- Images that are already missing might still be available through the archive.
Sadly, the Internet Archive doesn’t serve images with a CORS header. If you still need to render to canvas, your options are:
- Use a CORS proxy (you’d likely have to host one yourself, as the free ones impose rate limits that won’t suffice for the number of expected requests).
- Host the images on a dedicated image hosting site like imgur.com. Legal aspects aside, you’d probably have to upload each image via script and store the resulting URL in your dataset.
- Perhaps generate a tilemap of all images? With ~13,000 entries you could probably fit all portraits into a single (very) large image at around 150×150 pixels each. Again, this would have to be scripted, and the tile offsets would have to be stored in your dataset (see the sketch below).
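To illustrate that last option, here is a rough sketch in Node.js using the `canvas` package (assumptions: `urls` holds your list of image URLs, and remote URLs load successfully; failed loads are simply skipped):

```js
// Stitch all portraits into a single spritesheet and record
// each image's tile offset for later lookup in the dataset.
const {createCanvas, loadImage} = require('canvas');
const fs = require('fs');

async function buildTilemap(urls, tile = 150) {
  const cols = Math.ceil(Math.sqrt(urls.length));
  const rows = Math.ceil(urls.length / cols);
  const canvas = createCanvas(cols * tile, rows * tile);
  const ctx = canvas.getContext('2d');
  const offsets = [];

  for (let i = 0; i < urls.length; i++) {
    const x = (i % cols) * tile;
    const y = Math.floor(i / cols) * tile;
    try {
      const img = await loadImage(urls[i]);
      ctx.drawImage(img, x, y, tile, tile);
    } catch (err) {
      // skip images that are missing or fail to load
    }
    offsets.push({url: urls[i], x, y}); // joins back onto the dataset
  }

  fs.writeFileSync('tilemap.png', canvas.toBuffer('image/png'));
  return offsets;
}
```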