Hiding data source

I have been a free user of observablehq for many years, and will be moving to the individual pro tier on December 1.
Is there any option for hiding the data source, or for disabling data download? To be clear: say I have a CSV file that is the basis of all visualizations in a notebook. Can I keep other users from downloading that CSV file?

Thanks in advance for responding,
Rini (rbhttchr@illinois.edu)

Hi Rini. This isn’t directly possible, but I think it’s worth explaining why not.

Everything you build on Observable runs in the browser. That means the code that generates the visualizations runs on the computer of the person looking at them. That code needs access to the CSV file, so the viewer’s browser has to download the file. If the file were restricted, the code would not have what it needs and wouldn’t be able to render the visualizations.

It would in theory be possible for Observable to disable downloading through the sidebar. I personally think this would give a false sense of security, however. Viewers would still have access to the data; they would just have to get it from another place (the code on the page instead of a button on the page).

One method you could use to hide the original data is to generate, with a private tool, a derived data set that is OK to publish and still sufficient to produce your visualization. You would then publish only the derived data and keep the original data private. You could even use a private notebook to do this derivation.

Thanks so much, mythmon. Would you please elaborate on or give an example of the derived data? I am sure it would be very useful to lots of users.
Rini

It’s a general technique that’s commonly used in data processing. I can give a toy example here. Let’s say we have a list of people with their eye colors.

people = [
  {name: "Alex", eye: "blue"},
  {name: "Brianna", eye: "green"},
  {name: "Charles", eye: "brown"},
  {name: "Davis", eye: "green"},
];

And then let’s say we want a bar chart showing how common each eye color is.

Plot.plot({
  marks: [Plot.barY(people, Plot.groupX({ y: "count" }, { x: "eye" }))]
})

But this would leak the names, which are private data, because the full people array is shipped to the viewer. Instead, we could make a derived data set:

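derived = Array.from(
  d3.rollup(
    people,
    (ds) => ds.length, // number of people in each group
    (d) => d.eye // group by eye color
  ),
  ([color, count]) => ({ color, count }) // one {color, count} object per group
)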

That would produce the value

[
  {"color":"blue","count":1},
  {"color":"green","count":2},
  {"color":"brown","count":1}
]

You can then choose “Download JSON” on that cell to get that output.

You can then use that derived data set as an attachment in a brand new notebook.

Plot.plot({
  marks: [
    Plot.barY(await FileAttachment("derived.json").json(), {
      x: "color",
      y: "count"
    })
  ]
})

That new notebook will produce the same output, but won’t reveal the private data.

Of course, the process of producing the intermediate data is going to depend on your exact situation. It also won’t update as the original source data changes, since it is of course just a snapshot.
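Because the derived file is only a snapshot, it can help to keep the derivation as a small function you can rerun privately whenever the source changes. Here is a minimal sketch in plain JavaScript, with no d3 dependency, so it could run in a private notebook or a local Node script. The helper name deriveCounts and the sample rows are hypothetical, made up for illustration; your own derivation will depend on your data.

```javascript
// Hypothetical helper (not part of Observable or d3): counts rows per value
// of one field and keeps nothing else from the original rows.
function deriveCounts(rows, field) {
  const counts = new Map();
  for (const row of rows) {
    const key = row[field];
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  // Emit {color, count} objects, matching the shape used by the chart above.
  return Array.from(counts, ([color, count]) => ({ color, count }));
}

// Made-up rows mirroring the toy example; the names are the private part.
const people = [
  { name: "Alex", eye: "blue" },
  { name: "Brianna", eye: "green" },
  { name: "Charles", eye: "brown" },
  { name: "Davis", eye: "green" }
];

const derived = deriveCounts(people, "eye");

// Text you would save as derived.json and attach to the public notebook.
console.log(JSON.stringify(derived, null, 2));
```

Only the JSON text printed at the end needs to leave your machine; the people array stays in the private script or notebook.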

Thanks for your answer, @mythmon

I have a related question on Slack and am curious to hear your thoughts. Thanks!