feat(bigquery): Support DATE-type event timestamp columns #6362

Merged
ntkathole merged 6 commits into feast-dev:master from Jwrede:feat/bq-date-timestamp-type on May 14, 2026

Conversation

@Jwrede
Contributor

@Jwrede Jwrede commented May 3, 2026

What this PR does / why we need it:

When the event_timestamp column in BigQuery is a DATE type (not TIMESTAMP), the generated SQL wraps comparison values in TIMESTAMP(), causing a type mismatch error. This makes DATE-partitioned summary tables unusable without creating views or duplicate tables.

This PR adds an optional timestamp_field_type parameter to BigQuerySource. When set to "DATE", SQL generation uses DATE('YYYY-MM-DD') comparisons instead of TIMESTAMP('...'), both in direct queries (pull_latest_from_table_or_query, pull_all_from_table_or_query) and in the point-in-time join Jinja template.

Usage:

BigQuerySource(
    table="project:dataset.daily_features",
    timestamp_field="event_date",
    timestamp_field_type="DATE",
)

Changes:

  • Proto: add timestamp_field_type string field (field 28) to DataSource
  • DataSource base class: add timestamp_field_type attribute, equality check, and __init__ parameter
  • BigQuerySource: wire timestamp_field_type through __init__, from_proto, and _to_proto_impl
  • get_timestamp_filter_sql(): add "date_func" cast style that generates DATE('YYYY-MM-DD')
  • BigQueryOfflineStore: select cast style based on timestamp_field_type
  • Jinja template: conditional DATE() comparisons for DATE-type timestamp fields
  • FeatureViewQueryContext: propagate timestamp_field_type to template context

Backward-compatible: when timestamp_field_type is unset, behavior is unchanged.
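As a rough illustration of the two comparison styles described above, here is a minimal sketch. The helper name `sketch_timestamp_filter` and the `"timestamp_func"` style name are assumptions made for this example and are not taken from the Feast codebase; only the `"date_func"` style name and the `DATE('YYYY-MM-DD')` output format come from this PR.

```python
from datetime import datetime, timezone
from typing import Literal

# "date_func" is the style named in this PR; "timestamp_func" is an assumed
# name for the pre-existing TIMESTAMP('...') behaviour.
CastStyle = Literal["timestamp_func", "date_func"]


def sketch_timestamp_filter(
    column: str,
    start: datetime,
    end: datetime,
    cast_style: CastStyle = "timestamp_func",
) -> str:
    """Hypothetical helper: build a range predicate for the timestamp column."""
    if cast_style == "date_func":
        # DATE columns are compared against DATE('YYYY-MM-DD') values.
        lower = f"DATE('{start:%Y-%m-%d}')"
        upper = f"DATE('{end:%Y-%m-%d}')"
    else:
        # TIMESTAMP columns keep the TIMESTAMP('...') wrapping.
        lower = f"TIMESTAMP('{start.isoformat()}')"
        upper = f"TIMESTAMP('{end.isoformat()}')"
    return f"{column} BETWEEN {lower} AND {upper}"


start = datetime(2026, 5, 1, tzinfo=timezone.utc)
end = datetime(2026, 5, 3, tzinfo=timezone.utc)
date_filter = sketch_timestamp_filter("event_date", start, end, "date_func")
ts_filter = sketch_timestamp_filter("event_timestamp", start, end)
```

With these inputs, `date_filter` compares `event_date` against `DATE('2026-05-01')` and `DATE('2026-05-03')`, while `ts_filter` keeps the `TIMESTAMP('...')` wrapping that fails against a DATE column.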

Which issue(s) this PR fixes:

Fixes #2530 (part 2 -- DATE type event_timestamp support; part 1 was addressed by #6076)

How to test:

python -m pytest sdk/python/tests/unit/infra/offline_stores/test_bigquery.py -v

4 new tests added:

  • test_pull_latest_date_type_timestamp_field -- verifies DATE() cast in pull_latest
  • test_pull_all_date_type_timestamp_field -- verifies DATE() cast in pull_all
  • test_pull_latest_date_type_with_partition_column -- DATE type combined with partition pruning
  • test_bigquery_source_date_type_proto_roundtrip -- proto serialization roundtrip

@Jwrede Jwrede requested review from a team and sudohainguyen as code owners May 3, 2026 08:47
@Jwrede Jwrede requested review from HaoXuAI, nquinn408 and shuchu and removed request for a team May 3, 2026 08:47
@Jwrede
Contributor Author

Jwrede commented May 13, 2026

Friendly ping @sudohainguyen -- this and #6365 have been open ~10 days. Happy to address feedback.

@ntkathole
Member

@Jwrede please resolve the conflicts and fix linting

@Jwrede Jwrede force-pushed the feat/bq-date-timestamp-type branch from d8ccd7e to aed2012 Compare May 14, 2026 06:43
@Jwrede
Contributor Author

Jwrede commented May 14, 2026

@ntkathole Rebased onto master and verified ruff check/format passes. All conflicts resolved.

@ntkathole
Member

@Jwrede can you please regenerate the protos using the mypy-protobuf version pinned in CI?

@Jwrede
Contributor Author

Jwrede commented May 14, 2026

@ntkathole Done -- reset all non-DataSource generated files to match master, then hand-edited DataSource_pb2.pyi to add only the timestamp_field_type field in the existing import style. Only DataSource_pb2.py and DataSource_pb2.pyi differ from master now.

Jwrede added 6 commits May 14, 2026 14:32
When the event_timestamp column in BigQuery is a DATE type, the
generated SQL wraps comparison values in TIMESTAMP(), causing a type
mismatch error. This adds a timestamp_field_type parameter to
BigQuerySource that, when set to "DATE", generates DATE() comparisons
instead.

Closes feast-dev#2530 (part 2)

Signed-off-by: Jonathan Wrede <[email protected]>
The proto files were regenerated with protobuf 6.31.1 / grpcio-tools
1.80.0, which imports runtime_version -- a module that does not exist
in protobuf 4.25.x used by the project. Revert generated code to
4.25.1 format while keeping the new timestamp_field_type field.

Signed-off-by: Jonathan Wrede <[email protected]>
Mypy infers str from the ternary expression; annotate with the
exact Literal union so the call to get_timestamp_filter_sql passes
type checking.

Signed-off-by: Jonathan Wrede <[email protected]>
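
The typing issue in the commit above can be sketched as follows. This is an illustration only: `describe_cast`, `select_cast_style`, and the `"timestamp_func"` style name are hypothetical stand-ins, not the actual Feast functions or the actual Literal union.

```python
from typing import Literal, Optional

# Assumed union for this sketch; only "date_func" is named in the PR.
CastStyle = Literal["timestamp_func", "date_func"]


def describe_cast(cast_style: CastStyle) -> str:
    """Stand-in for a function whose parameter only accepts the Literal union."""
    return "DATE(...)" if cast_style == "date_func" else "TIMESTAMP(...)"


def select_cast_style(timestamp_field_type: Optional[str]) -> CastStyle:
    # Without the explicit annotation, mypy infers plain `str` for the
    # conditional expression, and passing that to describe_cast() is reported
    # as an incompatible argument type.
    cast_style: CastStyle = (
        "date_func" if timestamp_field_type == "DATE" else "timestamp_func"
    )
    return cast_style


date_cast = describe_cast(select_cast_style("DATE"))
default_cast = describe_cast(select_cast_style(None))
```

The annotation changes nothing at runtime; it only gives the type checker the narrower type it needs at the call site.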
…text

Callers that do not use DATE-typed timestamp fields (e.g. Spark offline
store tests) should not be forced to pass timestamp_field_type. Adding
a default keeps the new field backward-compatible.

Signed-off-by: Jonathan Wrede <[email protected]>
A default value on timestamp_field_type breaks the
SparkFeatureViewQueryContext subclass because its non-default fields
(min_date_partition, max_date_partition) would follow a field with a
default. Instead, keep it required and update the Spark test to pass it.

Signed-off-by: Jonathan Wrede <[email protected]>
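
The field-ordering constraint behind the two commits above can be reproduced with a small sketch. It assumes the query context classes are standard Python dataclasses; the class names below are hypothetical, and only the field names `timestamp_field_type`, `min_date_partition`, and `max_date_partition` come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


def define_with_default_in_base() -> Optional[str]:
    """Return the TypeError message raised by the invalid layout, if any."""
    try:
        @dataclass
        class BaseContext:
            name: str
            timestamp_field_type: Optional[str] = None  # default in the base

        @dataclass
        class SparkLikeContext(BaseContext):
            # Required fields now follow a defaulted one, which dataclasses
            # reject at class-definition time.
            min_date_partition: str
            max_date_partition: str
    except TypeError as exc:
        return str(exc)
    return None


@dataclass
class RequiredBaseContext:
    name: str
    timestamp_field_type: Optional[str]  # required; callers pass None explicitly


@dataclass
class SparkLikeRequiredContext(RequiredBaseContext):
    min_date_partition: str
    max_date_partition: str


error_message = define_with_default_in_base()
ctx = SparkLikeRequiredContext("fv", None, "2026-01-01", "2026-01-31")
```

Keeping the field required avoids the ordering error at the cost of every caller passing a value, which is why the Spark test had to be updated.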
Reset all non-DataSource generated files to match master.
Only DataSource_pb2.py and DataSource_pb2.pyi contain our
timestamp_field_type additions (field 28). The .pyi stub
is hand-edited to match the existing import style used on
master.

Signed-off-by: Jonathan Wrede <[email protected]>
@ntkathole ntkathole force-pushed the feat/bq-date-timestamp-type branch from 27579ae to a28b283 Compare May 14, 2026 09:03
@ntkathole ntkathole merged commit 753dee5 into feast-dev:master May 14, 2026
22 of 25 checks passed
rpathade pushed a commit to rpathade/feast that referenced this pull request May 21, 2026
…6362)

* feat(bigquery): Support DATE-type event timestamp columns

When the event_timestamp column in BigQuery is a DATE type, the
generated SQL wraps comparison values in TIMESTAMP(), causing a type
mismatch error. This adds a timestamp_field_type parameter to
BigQuerySource that, when set to "DATE", generates DATE() comparisons
instead.

Closes feast-dev#2530 (part 2)

Signed-off-by: Jonathan Wrede <[email protected]>

* fix(bigquery): Use protobuf 4.25.x compatible generated code

The proto files were regenerated with protobuf 6.31.1 / grpcio-tools
1.80.0, which imports runtime_version -- a module that does not exist
in protobuf 4.25.x used by the project. Revert generated code to
4.25.1 format while keeping the new timestamp_field_type field.

Signed-off-by: Jonathan Wrede <[email protected]>

* fix(bigquery): Add Literal type annotation for cast_style

Mypy infers str from the ternary expression; annotate with the
exact Literal union so the call to get_timestamp_filter_sql passes
type checking.

Signed-off-by: Jonathan Wrede <[email protected]>

* fix: Make timestamp_field_type default to None in FeatureViewQueryContext

Callers that do not use DATE-typed timestamp fields (e.g. Spark offline
store tests) should not be forced to pass timestamp_field_type. Adding
a default keeps the new field backward-compatible.

Signed-off-by: Jonathan Wrede <[email protected]>

* fix: Keep timestamp_field_type required in FeatureViewQueryContext

A default value on timestamp_field_type breaks the
SparkFeatureViewQueryContext subclass because its non-default fields
(min_date_partition, max_date_partition) would follow a field with a
default. Instead, keep it required and update the Spark test to pass it.

Signed-off-by: Jonathan Wrede <[email protected]>

* fix: regenerate protos matching upstream mypy-protobuf style

Reset all non-DataSource generated files to match master.
Only DataSource_pb2.py and DataSource_pb2.pyi contain our
timestamp_field_type additions (field 28). The .pyi stub
is hand-edited to match the existing import style used on
master.

Signed-off-by: Jonathan Wrede <[email protected]>

---------

Signed-off-by: Jonathan Wrede <[email protected]>
Signed-off-by: RutujaPathade <[email protected]>
ntkathole added a commit that referenced this pull request May 23, 2026
* feat: Add enabled/disabled toggle for feature views

Signed-off-by: RutujaPathade <[email protected]>

* feat: Add demo notebooks for users

Signed-off-by: ntkathole <[email protected]>
Signed-off-by: RutujaPathade <[email protected]>

* feat: Add CLI enable/disable commands and registry metadata support

Signed-off-by: RutujaPathade <[email protected]>

* Added features

Signed-off-by: RutujaPathade <[email protected]>

* fix(compute-engine/local): Honor field_mapping on join keys in dedup + join nodes (#6395)

* fix: Apply field mapping to join keys in local compute engine nodes

When a batch source defines a `field_mapping` that renames an entity join
key (e.g. `USERID` -> `user_id`), the source-read node renames the columns
on the pulled Arrow table to their mapped names. Downstream `LocalDedupNode`
and `LocalJoinNode` then look up the *pre-mapping* names from
`column_info.join_keys`, which raises `KeyError: Index(['USERID'])` during
materialization (or returns an empty join).

Add a `join_keys_columns` property on `ColumnInfo` that mirrors the existing
`timestamp_column` / `created_timestamp_column` properties — returning join
keys translated through `field_mapping` — and use it from the dedup and
join nodes.

Fixes #5942.

Signed-off-by: 1fanwang <[email protected]>

* test: also cover LocalJoinNode field_mapping case

Signed-off-by: 1fanwang <[email protected]>

---------

Signed-off-by: 1fanwang <[email protected]>
Signed-off-by: RutujaPathade <[email protected]>

* feat: Add Prometheus gauges for FeatureStore installation telemetry (#6354)

Signed-off-by: ntkathole <[email protected]>
Signed-off-by: RutujaPathade <[email protected]>

* docs: Rename Atlas Vector Search to MongoDB Vector Search and fix code examples

Signed-off-by: jvincent-mongodb <[email protected]>
Signed-off-by: RutujaPathade <[email protected]>

* feat(dynamodb): Use ProjectionExpression when requested_features is set

The requested_features parameter was accepted by online_read and
online_read_async but never used -- DynamoDB always fetched all
features stored in the values map regardless. Add a
ProjectionExpression to BatchGetItem requests when requested_features
is provided, reducing data transfer, latency, and read costs.

Fixes #6058

Signed-off-by: Jonathan Wrede <[email protected]>
Signed-off-by: RutujaPathade <[email protected]>

* fix(dynamodb): Fix mypy type for _build_projection_expression return

The return dict contains both str and Dict[str, str] values, so the
return type must be Dict[str, Any] not Dict[str, str].

Signed-off-by: Jonathan Wrede <[email protected]>
Signed-off-by: RutujaPathade <[email protected]>

* fix(bigquery): Enable list inference for parquet loads in offline_write_batch

When pushing features with array/list types (e.g. STRING_LIST) to
BigQuery via offline_write_batch, the data arrives as empty arrays
because BigQuery's parquet loader does not infer list structure by
default. Set parquet_options.enable_list_inference = True on the
LoadJobConfig so array columns are written correctly.

Fixes #5845

Signed-off-by: Jonathan Wrede <[email protected]>
Signed-off-by: RutujaPathade <[email protected]>

* fix(trino): Clean up temporary entity tables after retrieval (#6381)

* fix(trino): Clean up temporary entity tables after retrieval

TrinoOfflineStore.get_historical_features() creates a temporary table
for the entity DataFrame but never drops it, leaking tables
indefinitely. Apply the same context manager pattern used by
BigQuery, Redshift, and Athena offline stores: wrap the query in a
generator that issues DROP TABLE IF EXISTS in a finally block.

Fixes #6306

Signed-off-by: Jonathan Wrede <[email protected]>

* fix: sort imports for ruff compliance

Signed-off-by: Jonathan Wrede <[email protected]>

* fix: decouple temp table cleanup from query access

Avoid dropping the temporary entity table on to_sql() calls.
Previously, every method used a context manager that dropped
the table on exit, so calling to_sql() before to_df() would
destroy the table and cause subsequent queries to fail.

Now the query is stored as a plain string and cleanup is
handled by a dedicated _drop_temp_table() method called only
after query execution (to_df, to_trino). A __del__ fallback
ensures cleanup if execution methods are never called. The
_cleaned_up flag makes the drop idempotent.

Signed-off-by: Jonathan Wrede <[email protected]>

---------

Signed-off-by: Jonathan Wrede <[email protected]>
Signed-off-by: RutujaPathade <[email protected]>

* feat(bigquery): Support DATE-type event timestamp columns (#6362)

* feat(bigquery): Support DATE-type event timestamp columns

When the event_timestamp column in BigQuery is a DATE type, the
generated SQL wraps comparison values in TIMESTAMP(), causing a type
mismatch error. This adds a timestamp_field_type parameter to
BigQuerySource that, when set to "DATE", generates DATE() comparisons
instead.

Closes #2530 (part 2)

Signed-off-by: Jonathan Wrede <[email protected]>

* fix(bigquery): Use protobuf 4.25.x compatible generated code

The proto files were regenerated with protobuf 6.31.1 / grpcio-tools
1.80.0, which imports runtime_version -- a module that does not exist
in protobuf 4.25.x used by the project. Revert generated code to
4.25.1 format while keeping the new timestamp_field_type field.

Signed-off-by: Jonathan Wrede <[email protected]>

* fix(bigquery): Add Literal type annotation for cast_style

Mypy infers str from the ternary expression; annotate with the
exact Literal union so the call to get_timestamp_filter_sql passes
type checking.

Signed-off-by: Jonathan Wrede <[email protected]>

* fix: Make timestamp_field_type default to None in FeatureViewQueryContext

Callers that do not use DATE-typed timestamp fields (e.g. Spark offline
store tests) should not be forced to pass timestamp_field_type. Adding
a default keeps the new field backward-compatible.

Signed-off-by: Jonathan Wrede <[email protected]>

* fix: Keep timestamp_field_type required in FeatureViewQueryContext

A default value on timestamp_field_type breaks the
SparkFeatureViewQueryContext subclass because its non-default fields
(min_date_partition, max_date_partition) would follow a field with a
default. Instead, keep it required and update the Spark test to pass it.

Signed-off-by: Jonathan Wrede <[email protected]>

* fix: regenerate protos matching upstream mypy-protobuf style

Reset all non-DataSource generated files to match master.
Only DataSource_pb2.py and DataSource_pb2.pyi contain our
timestamp_field_type additions (field 28). The .pyi stub
is hand-edited to match the existing import style used on
master.

Signed-off-by: Jonathan Wrede <[email protected]>

---------

Signed-off-by: Jonathan Wrede <[email protected]>
Signed-off-by: RutujaPathade <[email protected]>

* fix: Fixes for ray source

Signed-off-by: ntkathole <[email protected]>
Signed-off-by: RutujaPathade <[email protected]>

* feat: Expose registry endpoints on feature server for MCP access

Mount the existing REST registry routers under /registry on the feature
server so that fastapi_mcp automatically exposes registry introspection
(list/get for entities, feature views, data sources, feature services,
permissions, projects, saved datasets, lineage, search) as MCP tools.

The RegistryServer is created in-process from store.registry — no
external registry server is required. Auth is enforced via
inject_user_details on every mounted router.

Made-with: Cursor
Signed-off-by: Chaitany patel <[email protected]>
Made-with: Cursor
Signed-off-by: RutujaPathade <[email protected]>

* fix: Revert state propagation to always update in _update_metadata_fields

Signed-off-by: RutujaPathade <[email protected]>

* fix: Recompile protos for protobuf 4.x compatibility and fix state machine to be opt-in

Signed-off-by: RutujaPathade <[email protected]>

* feat: Add unit tests for state machine and clean up lazy imports in registry

Signed-off-by: RutujaPathade <[email protected]>

* fix: Address review comments for feature view state management

Signed-off-by: RutujaPathade <[email protected]>

* fix: Resolve integration test failures in apply loop

Signed-off-by: RutujaPathade <[email protected]>

* fix: Resolve integration test failures in apply loop

Signed-off-by: RutujaPathade <[email protected]>

* Apply suggestion from @ntkathole

Co-authored-by: Nikhil Kathole <[email protected]>
Signed-off-by: RutujaPathade <[email protected]>

* fix: Resolve review comments for feature_store

Signed-off-by: RutujaPathade <[email protected]>

* fix: Resolve review comments for feature_views.py

Signed-off-by: RutujaPathade <[email protected]>

* feat: Add FeatureStore methods and update describe for enabled/state

Signed-off-by: RutujaPathade <[email protected]>

* fix: Add type: ignore comments for mypy on BaseFeatureView attr access

Signed-off-by: RutujaPathade <[email protected]>

* fix: Remove REST API endpoints for enable/disable/set-state (deferred to follow-up PR)

Signed-off-by: RutujaPathade <[email protected]>

---------

Signed-off-by: RutujaPathade <[email protected]>
Signed-off-by: ntkathole <[email protected]>
Signed-off-by: 1fanwang <[email protected]>
Signed-off-by: jvincent-mongodb <[email protected]>
Signed-off-by: Jonathan Wrede <[email protected]>
Signed-off-by: Chaitany patel <[email protected]>
Co-authored-by: RutujaPathade <[email protected]>
Co-authored-by: ntkathole <[email protected]>
Co-authored-by: Stefan Wang <[email protected]>
Co-authored-by: jvincent-mongodb <[email protected]>
Co-authored-by: Jonathan Wrede <[email protected]>
Co-authored-by: Jwrede <[email protected]>
Co-authored-by: Chaitany patel <[email protected]>

Development

Successfully merging this pull request may close these issues.

Handle BigQuery partitions when event_timestamp is not the partition column, deal with event_timestamp columns of DATE type

3 participants