
Reusing Providers

Featureform's API lets you reuse definitions that have already been applied. You can retrieve previously applied providers and resources and continue building on them.

To reuse a provider, call the get method for its provider type (for example, ff.get_postgres) with the name the provider was registered under.

Example

import featureform as ff
postgres = ff.get_postgres("prod-instance")

postgres.register_table(
    name="transactions",
    variant="2022",
    table="2022_transactions",
)
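The reuse pattern can be pictured with a small, self-contained model. This is a hypothetical sketch, not Featureform's implementation: Registry, ProviderHandle, and get_provider are invented names for illustration. It shows the idea behind the get methods: retrieving a provider does not redefine it; you get back a handle that refers to an already-applied provider by name, and resources registered through that handle point at that name.

```python
from dataclasses import dataclass, field


@dataclass
class ProviderHandle:
    """Hypothetical stand-in for the object a get_* method returns."""
    name: str
    registry: "Registry"

    def register_table(self, name: str, variant: str, table: str) -> dict:
        # The new source references its provider by name only; the provider's
        # connection settings are never copied or redefined here.
        source = {"name": name, "variant": variant, "table": table, "provider": self.name}
        self.registry.resources.append(source)
        return source


@dataclass
class Registry:
    """Hypothetical registry of providers that were applied earlier."""
    applied_providers: set = field(default_factory=set)
    resources: list = field(default_factory=list)

    def get_provider(self, name: str) -> ProviderHandle:
        # This toy model checks the name immediately; when a real client
        # performs such a check is not covered by this sketch.
        if name not in self.applied_providers:
            raise KeyError(f"provider {name!r} has not been applied")
        return ProviderHandle(name=name, registry=self)


# Usage: reuse a provider named "prod-instance" that was applied earlier.
registry = Registry(applied_providers={"prod-instance"})
postgres = registry.get_provider("prod-instance")
transactions = postgres.register_table(name="transactions", variant="2022", table="2022_transactions")
print(transactions["provider"])  # prod-instance
```

The design choice worth noting is that the handle carries only a name, so the same provider can be reused from many scripts without repeating its credentials or configuration.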

Available Providers

BigQuery

Get a BigQuery provider. The returned object can be used to register additional resources.

Examples:

bigquery = ff.get_bigquery("bigquery-quickstart")
transactions = bigquery.register_table(
    name="transactions",
    variant="kaggle",
    description="Fraud Dataset From Kaggle",
    table="Transactions",  # This is the table's name in BigQuery
)

Parameters:

name (str, required): Name of the BigQuery provider to be retrieved.

Returns:

bigquery (OfflineSQLProvider): The retrieved provider.

K8s Runner

Get a k8s provider. The returned object can be used to register additional resources.

Examples:

k8s = ff.get_kubernetes("k8s-azure-quickstart")
transactions = k8s.register_file(
    name="transactions",
    variant="kaggle",
    description="Fraud Dataset From Kaggle",
    path="path/to/blob",
)

Parameters:

name (str, required): Name of the K8s provider to be retrieved.

Returns:

k8s (OfflineK8sProvider): The retrieved provider.

MongoDB

Get a MongoDB provider. The returned object can be used to register additional resources.

Examples:

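mongodb = ff.get_mongodb("mongodb-quickstart")

# Assumes `user` (an entity) and `average_user_transaction` (a registered source) were defined earlier.
average_user_transaction.register_resources(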
    entity=user,
    entity_column="user_id",
    inference_store=mongodb,
    features=[
        {"name": "avg_transactions", "variant": "quickstart", "column": "avg_transaction_amt", "type": "float32"},
    ],
)
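Each entry in features is a plain dictionary with name, variant, column, and type keys. When registering many features at once, a small helper can keep those dictionaries consistent. feature_spec below is a hypothetical convenience function, not part of Featureform's API; it only builds the same dictionaries that the example writes by hand.

```python
def feature_spec(name: str, variant: str, column: str, value_type: str) -> dict:
    """Hypothetical helper: build one entry of a `features=[...]` list."""
    # Keys mirror the hand-written dictionaries in the register_resources example.
    return {"name": name, "variant": variant, "column": column, "type": value_type}


# Equivalent to the single hand-written entry in the MongoDB example.
features = [
    feature_spec("avg_transactions", "quickstart", "avg_transaction_amt", "float32"),
]
```

The resulting list can be passed as features=features to register_resources, whether the inference store is MongoDB or Redis.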

Parameters:

name (str, required): Name of the MongoDB provider to be retrieved.

Returns:

mongodb (OnlineProvider): The retrieved provider.

Postgres

Get a Postgres provider. The returned object can be used to register additional resources.

Examples:

postgres = ff.get_postgres("postgres-quickstart")
transactions = postgres.register_table(
    name="transactions",
    variant="kaggle",
    description="Fraud Dataset From Kaggle",
    table="Transactions",  # This is the table's name in Postgres
)

Parameters:

name (str, required): Name of the Postgres provider to be retrieved.

Returns:

postgres (OfflineSQLProvider): The retrieved provider.

ClickHouse

Get a ClickHouse provider. The returned object can be used to register additional resources.

Examples:

clickhouse = ff.get_clickhouse("clickhouse-quickstart")
transactions = clickhouse.register_table(
    name="transactions",
    variant="kaggle",
    description="Fraud Dataset From Kaggle",
    table="Transactions",  # This is the table's name in ClickHouse
)

Parameters:

name (str, required): Name of the ClickHouse provider to be retrieved.

Returns:

clickhouse (OfflineSQLProvider): The retrieved provider.

Redis

Get a Redis provider. The returned object can be used to register additional resources.

Examples:

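redis = ff.get_redis("redis-quickstart")

# Assumes `user` (an entity) and `average_user_transaction` (a registered source) were defined earlier.
average_user_transaction.register_resources(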
    entity=user,
    entity_column="user_id",
    inference_store=redis,
    features=[
        {"name": "avg_transactions", "variant": "quickstart", "column": "avg_transaction_amt", "type": "float32"},
    ],
)

Parameters:

name (str, required): Name of the Redis provider to be retrieved.

Returns:

redis (OnlineProvider): The retrieved provider.

Redshift

Get a Redshift provider. The returned object can be used to register additional resources.

Examples:

redshift = ff.get_redshift("redshift-quickstart")
transactions = redshift.register_table(
    name="transactions",
    variant="kaggle",
    description="Fraud Dataset From Kaggle",
    table="Transactions",  # This is the table's name in Redshift
)

Parameters:

name (str, required): Name of the Redshift provider to be retrieved.

Returns:

redshift (OfflineSQLProvider): The retrieved provider.

S3

Get an S3 provider. The returned object can be used with other providers, such as Spark and Databricks.

Examples:

s3 = ff.get_s3("s3-quickstart")
# Assumes `emr` is an EMR executor that was defined earlier.
spark = ff.register_spark(
    name="spark-emr-s3",
    description="A Spark deployment we created for the Featureform quickstart",
    team="featureform-team",
    executor=emr,
    filestore=s3,
)

Parameters:

name (str, required): Name of the S3 provider to be retrieved.

Returns:

s3 (FileStore): The retrieved provider.

Snowflake

Get a Snowflake provider. The returned object can be used to register additional resources.

Examples:

snowflake = ff.get_snowflake("snowflake-quickstart")
transactions = snowflake.register_table(
    name="transactions",
    variant="kaggle",
    description="Fraud Dataset From Kaggle",
    table="Transactions",  # This is the table's name in Snowflake
)

Parameters:

name (str, required): Name of the Snowflake provider to be retrieved.

Returns:

snowflake (OfflineSQLProvider): The retrieved provider.

Spark

Get a Spark provider. The returned object can be used to register additional resources.

Examples:

spark = ff.get_spark("spark-quickstart")
transactions = spark.register_file(
    name="transactions",
    variant="kaggle",
    description="Fraud Dataset From Kaggle",
    file_path="s3://bucket/path/to/file/transactions.parquet",  # This is the path to the file
)

Parameters:

name (str, required): Name of the Spark provider to be retrieved.

Returns:

spark (OfflineSparkProvider): The retrieved provider.