Hiding data source

rbhttchr · October 13, 2022, 5:08pm

I have been a free user of observablehq for many years, and will be moving to the individual pro tier on December 1.
Is there any option for hiding the data source or disabling data downloads? To be clear, say I have a csv that is the basis of all visualizations in a notebook. Can I keep other users from downloading the csv file?

Thanks in advance for responding,
Rini (rbhttchr@illinois.edu)

mythmon · October 13, 2022, 5:55pm

Hi Rini. This isn’t directly possible, but I think it’s worth explaining why not.

Everything you build on Observable runs in the browser. That means that the code that generates the visualizations runs on the computer of the person looking at them. That code needs access to the csv file, so the viewer’s browser has to download the file. If the file were restricted, the code would not have what it needs and wouldn’t be able to render the visualizations.

It would in theory be possible for Observable to disable downloading through the sidebar. I personally think that this would give a false sense of security, however. Users would still have access to the data; they would just have to get it from another place (the code on the page instead of a button on the page).

One method you could use if you want to hide the original data is to use a private tool to generate a derived data set that is ok to publish and still contains everything your visualization needs. You could then publish only that derived data and keep the original data private. You could even use a private notebook to do this derivation.

rbhttchr · October 13, 2022, 8:44pm

Thanks so much, mythmon. Would you please elaborate on or give an example of the derived data? I am sure it would be very useful to lots of users.
Rini

mythmon · October 18, 2022, 11:23pm

It’s a general technique that’s commonly used in data processing. I can give a toy example here. Let’s say we have a list of people with their eye colors.

people = [
  {name: "Alex", eye: "blue"},
  {name: "Brianna", eye: "green"},
  {name: "Charles", eye: "brown"},
  {name: "Davis", eye: "green"},
];

And then let’s say we want to have a bar chart showing how common each eye color is.

Plot.plot({
  marks: [Plot.barY(people, Plot.groupX({ y: "count" }, { x: "eye" }))]
})

But publishing this would leak the private names, since the notebook needs the whole people list to draw the chart. Instead, we could make a derived data set:

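derived = Array.from(
  d3.rollup(
    people,
    (ds) => ds.length,
    (d) => d.eye
  ),
  // Turn each [eye color, count] entry into a record with no names in it
  ([color, count]) => ({ color, count })
)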

That would produce the value

[
  {"color":"blue","count":1},
  {"color":"green","count":2},
  {"color":"brown","count":1}
]

You can then choose “Download JSON” on that cell to get that output.

You can then use that derived data set as an attachment in a brand new notebook.

Plot.plot({
  marks: [
    Plot.barY(await FileAttachment("derived.json").json(), {
      x: "color",
      y: "count"
    })
  ]
})

That new notebook will produce the same output, but won’t reveal the private data.

Of course, the process of producing the intermediate data is going to depend on your exact situation. It also won’t update as the original source data changes, since it is of course just a snapshot.
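
Since the derived file is only a snapshot, one way to make refreshing it less painful is to wrap the derivation in a small function that can be re-run whenever the source changes. Here is a rough sketch in plain JavaScript with no D3 dependency. The names `deriveCounts` and `label` are made up for illustration and are not part of Observable or D3, and the data is the same toy list as above.

```javascript
// Hypothetical helper (not an Observable or D3 API): count rows per value of
// one field, keeping only that value and its count in the result.
function deriveCounts(rows, field, label = field) {
  const counts = new Map();
  for (const row of rows) {
    const key = row[field];
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  // Emit plain objects so the result serializes cleanly to JSON.
  return Array.from(counts, ([value, count]) => ({ [label]: value, count }));
}

// Toy private data, mirroring the example above.
const people = [
  { name: "Alex", eye: "blue" },
  { name: "Brianna", eye: "green" },
  { name: "Charles", eye: "brown" },
  { name: "Davis", eye: "green" },
];

// Only eye colors and counts survive; the names never reach the output.
const derived = deriveCounts(people, "eye", "color");
console.log(JSON.stringify(derived, null, 2));
```

Re-running this against the updated source and re-uploading the result as the attachment is still a manual step, but the published notebook itself never has to touch the private rows.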

w_rose_ai · July 29, 2024, 6:32pm

Thanks for your answer, @mythmon

I have a related question at Slack and am curious for your thoughts. Thanks!
