I have a problem with this notebook: Mastodon instances list / neocarto | Observable. To collect the data, I had been using Heroku, but it no longer works, so I moved to corsproxy.io. The problem is that the result of https://corsproxy.io/?https://mastodon.help/instances/ is not the same as
Mastodon Help - Instances. This is not a problem with Observable, but is there a way to solve it?
Can you share an example / excerpt of the differences? Note that you can also inspect the server response (both headers and content) in your Developer Tools’ network tab.
Note that Servers - Mastodon has a much more accessible list that you can consume directly via its API, for example:
fetch('https://api.joinmastodon.org/servers').then(r => r.json())
But the API returns only 169 servers, and I don’t see where I can add parameters. Anyway, I do not understand why the rendering of the site changes when using http://corsproxy.io
Again, please describe or share an example of how the rendering changes. Are you only talking about the missing styles, or is the actual HTML output different?
Mastodon Help - Instances gives the first screenshot.
https://corsproxy.io/?https://mastodon.help/instances gives the second one,
and the code written to retrieve the data no longer works.
What happens if you try:
url = 'https://corsproxy.io/?' + encodeURIComponent('https://mastodon.help/instances')
Edit: I see that you have this listed in your “params” cell, so I would have to experiment to figure out how to better help (I am trying to get this working in your ‘scrapPage’ cell), and I see @mootari is responding. He is the one who pointed out to me that corsproxy.io works a bit differently, so it is best to let the experts respond. However, if you missed it as I did, scrolling to the bottom of the corsproxy.io site reveals this different construct for standard use in notebooks.
As hinted at by Aaron, the URL that you produce lacks any encoding and is thus invalid.
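To illustrate why the encoding matters, here is a minimal sketch in plain JavaScript. The target URL with its `ord` and `p` parameters is only an example, and how corsproxy.io itself parses the request is an assumption; the point is merely what any server that splits the query string on "&" would see.

```javascript
// Example target URL with its own query string (illustrative values).
const target = "https://mastodon.help/instances?ord=tusersd&p=2";

// Naive concatenation: the target's "&" leaks into the outer query string.
const naive = "https://corsproxy.io/?" + target;

// Encoded: the whole target becomes one opaque query value.
const encoded = "https://corsproxy.io/?" + encodeURIComponent(target);

// Split each outer query string on "&", as a server typically would.
const naiveParts = new URL(naive).search.slice(1).split("&");
const encodedParts = new URL(encoded).search.slice(1).split("&");

console.log(naiveParts);                           // the target is cut at "&p=2"
console.log(encodedParts.map(decodeURIComponent)); // the full target URL, intact
```

With the unencoded form, the proxy cannot tell whether `p=2` belongs to the target URL or to the proxy request itself.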
I suggest that you create proper abstractions:
proxyUrl = url => `https://corsproxy.io/?${encodeURIComponent(url)}`
getUrl = ({proxy, url, ...params}, page) => proxyUrl(`${url}?${new URLSearchParams({
  ...params,
  ord: "tusersd",
  p: page
})}`)
Once you have that you can modify your intro cell to inspect the data:
intro = {
  let htmlString = await fetch(getUrl(params, 1)).then((res) => res.text());
  let $ = cheerio.load(htmlString);
  let data = $(".introe")[0]
    .children.filter((d) => d.type == "tag")
    .map((d) => d.children[0].data.replaceAll(".", ""));
  return data;
  // ...
}
which will give you
intro = Array(4) ["12,538", "5,673,157", "2,300,187", "566,287,683"]
What you can see here is that some numbers appear to be formatted in the browser’s locale.
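One way to make the parsing independent of the separator, as a rough sketch: strip every non-digit character before converting. The helper name `parseCount` is hypothetical, and this assumes the values are non-negative integers without a decimal part, which holds for the four counts shown above.

```javascript
// Hypothetical helper: turn a grouped integer string into a number,
// whether the grouping separator is ".", ",", or a space.
// Assumes non-negative integers with no decimal part.
const parseCount = (text) => Number(text.replace(/\D/g, ""));

const intro = ["12,538", "5,673,157", "2,300,187", "566,287,683"];
const counts = intro.map(parseCount);
console.log(counts); // [12538, 5673157, 2300187, 566287683]
```

In the notebook, this would replace the `.replaceAll(".", "")` step, which only handles one particular locale.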
Once again, you have saved my life.
Thank you so much.