Transient error with dates and BigQuery

Hi,

I have a strange transient error with BigQuery and I have a hard time figuring out how to troubleshoot.

I have 3 queries in my observable notebook, the last query sometimes results in invalid dates.

  1. Sometimes my query fails resulting in a column with invalid date (all dates for the entire year are invalid) but the second column is fine.

  2. When I check it directly in BigQuery it works. It also works if I use a browser on a different computer.

  3. When it fails, it will work after a few hours or minutes.

The query is very simple, it just returns all the days in the year and sometimes returns Invalid Date, just now on Safari or Chrome on my iPad. I find no errors in the BQ logs:

-- This should not fail with Invalid date:
SELECT CURRENT_DATE() AS current_date

Here is the complete use case:

WITH all_dates AS (
  SELECT day
  FROM UNNEST(GENERATE_DATE_ARRAY(DATE_TRUNC(CURRENT_DATE(), YEAR), DATE_TRUNC(DATE_ADD(CURRENT_DATE(), INTERVAL 1 YEAR), YEAR), INTERVAL 1 DAY)) AS day
)
SELECT all_dates.day AS dag, COALESCE(SUM(antall), 0) AS total
FROM all_dates
LEFT JOIN `my_long_table_of_stuff`
ON DATE(timestamp) = all_dates.day
WHERE all_dates.day >= DATE_TRUNC(CURRENT_DATE(), YEAR)
AND all_dates.day < DATE_TRUNC(DATE_ADD(CURRENT_DATE(), INTERVAL 1 YEAR), YEAR)
GROUP BY all_dates.day
ORDER BY all_dates.day

Could it be a variation in how dates are handled based on which browser is used, or something else ?

Any suggestion on how I might make this stable as a rock?

As in, you’re logged into your Observable account on a different device and run the query there? Does the query still fail then in the first browser? If yes, are you able to tweak it to a point where it starts producing valid dates again?

Could it be that you’re encountering a rate limit?

Hi,

I did a minimal test.

  1. a single query in Chrome on my Mac, it succeeded getting the date.
  2. I opened the notebook in Safari on iPad, and it failed.
  3. I opened it in Safari on my Mac and it failed.
  4. i opened it in Chrome on my iPad and it failed.
  5. I did a page refresh in the fist chrome instance, it succeded getting the date.

Since it is only one query and does not push any limits, I wonder if it might be a cache somewhere ?

Thanks for the additional info, I can reproduce the problem. I’ll check if this may be a problem with our database proxy or the Node.JS client library for BigQuery.

Good news: this is not an issue with the query, merely with the table display in SQL cells. We’ll investigate, but in the meantime you can use the following workaround:

db = DatabaseClient("homepage-961")
results = db.sql`select current_date`

Edit: Sadly it’s not that simple. I’ll continue to look into it.


Edit 2: This appears to be a browser issue. BigQuery returns the date as 2023-07-19Z. In Chrome new Date("2023-07-19Z") produces a valid date, in Firefox and Safari it does not.

The workaround is to cast to datetime:

select cast(current_date as datetime)
2 Likes

Thank you for investigating this issue. I couldn´t figure out the workaround so I made myself a small Cloud function in GCP which runs the BQ-query and writes the result to json in Google Cloud Storage every hour, and then I retrieve the file from there using fetch in Observable.

It would be handy if I could use BQ directly in the future, as it makes iteration faster.

Try changing

SELECT all_dates.day AS dag, COALESCE(SUM(antall), 0) AS total

to

SELECT CAST(all_dates.day AS DATETIME) AS dag, COALESCE(SUM(antall), 0) AS total

You only need to ensure that every DATE field that is returned by the query gets cast to DATETIME.

Thank you for the clarification, here is my resulting query:

WITH all_dates AS (
  SELECT CAST(day AS DATETIME) as day
  FROM UNNEST(GENERATE_DATE_ARRAY(DATE_TRUNC(CURRENT_DATE(), YEAR), DATE_TRUNC(DATE_ADD(CURRENT_DATE(), INTERVAL 1 YEAR), YEAR), INTERVAL 1 DAY)) AS day
)
SELECT all_dates.day AS dag, COALESCE(SUM(antall), 0) AS total
FROM all_dates
LEFT JOIN `myproject.mydb.mytable`
ON DATE(timestamp) = all_dates.day
WHERE all_dates.day >= CAST(DATE_TRUNC(CURRENT_DATE(), YEAR) AS DATETIME)
AND all_dates.day < CAST(DATE_TRUNC(DATE_ADD(CURRENT_DATE(), INTERVAL 1 YEAR), YEAR) AS DATETIME)
GROUP BY all_dates.day
ORDER BY all_dates.day