what's a good approach for identifying broken links?

… and you won’t, because CORS limitations will prevent you from checking these links inside a notebook iframe.
Off the top of my head I can think of these alternative approaches:

  1. Use one of the many online link checker tools provided by online SEO services. There’s a huge number of these, so you may have to dig a bit until you find one that:
    • checks external links
    • handles dynamic pages (or allows you to paste text/html)
    • is free
  2. Use a browser extension that scans links for you. Prefer extensions that are hosted on GitHub, and be wary of those offered by SEO companies.
  3. Use a JS bookmarklet that you can run in the current tab context. Requires programming and might not be worth the effort. No real benefit over extensions.
  4. Set up a glitch.me server that you can pass a list of links (or a blob of HTML) to check (e.g. via a POST request), and use the broken-link-checker package there. A benefit would be that you can integrate a “check links” button into your notebooks (and even make it accessible only to yourself).
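
To give a rough idea of what the server side of option 4 has to do, here is a minimal sketch. Note that it does not use the broken-link-checker package; it swaps in a plain regex plus `fetch`-style requests so the logic is visible. The names `extractLinks`, `checkLinks`, `Fetcher` and `LinkResult` are made up for illustration, and treating any status of 400 or above as "broken" is an assumption.

```typescript
// Hypothetical helper names; none of this is the broken-link-checker API.
type Fetcher = (url: string, init?: { method?: string }) => Promise<{ status: number }>;

interface LinkResult {
  url: string;
  status: number | null; // null means the request itself failed
  broken: boolean;
}

// Collect unique absolute http(s) URLs from href attributes in an HTML blob.
// A regex is a simplification; a real HTML parser would be more robust.
function extractLinks(html: string): string[] {
  const found = new Set<string>();
  const pattern = /href\s*=\s*["'](https?:\/\/[^"']+)["']/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    found.add(match[1]);
  }
  return Array.from(found);
}

// Request each URL with HEAD and flag it as broken on a network error
// or a status >= 400 (assumed threshold).
async function checkLinks(urls: string[], fetcher: Fetcher): Promise<LinkResult[]> {
  return Promise.all(
    urls.map(async (url): Promise<LinkResult> => {
      try {
        const res = await fetcher(url, { method: "HEAD" });
        return { url, status: res.status, broken: res.status >= 400 };
      } catch (_err) {
        return { url, status: null, broken: true };
      }
    })
  );
}
```

On a server with a global `fetch` (recent Node versions have one), a POST handler could call `checkLinks(extractLinks(html), fetch)` and return the result as JSON. Some servers reject HEAD requests, so a fallback to GET would probably be needed before trusting the "broken" flag.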

If you have several non-public notebooks that can contain stale links, you might also need a crawler. I’d recommend an external one that can handle dynamic pages, because scanning the actual sources would likely be a painful process (HTML, Markdown, etc.).
