
Reusing Sources

get_source() returns a reference to an already registered primary source or transformation. The returned object can be used to register features and labels, or passed as an input to new transformations.
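Because a source is identified by both its name and its variant, the lookup can be pictured as a dictionary keyed by the (name, variant) pair. The sketch below is a hypothetical in-memory registry written only to illustrate that idea. SourceRegistry and SourceRef are invented names, not Featureform classes, and this is not how Featureform implements get_source().

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceRef:
    """Hypothetical stand-in for the object get_source() returns."""
    name: str
    variant: str


@dataclass
class SourceRegistry:
    """Hypothetical in-memory registry keyed by (name, variant)."""
    _sources: dict = field(default_factory=dict)

    def register(self, name: str, variant: str) -> SourceRef:
        ref = SourceRef(name, variant)
        self._sources[(name, variant)] = ref
        return ref

    def get_source(self, name: str, variant: str) -> SourceRef:
        # Both arguments are required: the pair identifies one source variant.
        try:
            return self._sources[(name, variant)]
        except KeyError:
            raise KeyError(
                f"source {name!r} (variant {variant!r}) is not registered"
            ) from None


registry = SourceRegistry()
registry.register("transactions", "kaggle")
ref = registry.get_source("transactions", "kaggle")
print(ref)  # SourceRef(name='transactions', variant='kaggle')
```

Looking up a variant that was never registered raises a KeyError in this sketch, which mirrors why both name and variant are listed as required parameters.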

Examples:

Registering a transformation from an existing source.

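import featureform as ff

spark = ff.get_spark("prod-spark")
transactions = ff.get_source("transactions", "kaggle")

# Build a new transformation on top of the existing source
@spark.df_transformation(inputs=[transactions])
def customer_count(transactions):
    # Number of transactions per customer
    return transactions.groupBy("CustomerID").count()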
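The customer_count transformation groups rows by CustomerID and counts them. The same logic can be shown without Spark using collections.Counter. This is a plain-Python sketch on hypothetical sample rows, meant only to show what the result looks like; it is not how Featureform or Spark executes the transformation.

```python
from collections import Counter

# Hypothetical sample rows; in the real transformation these come from the
# registered "transactions" source as a Spark DataFrame.
rows = [
    {"CustomerID": "C1", "TransactionID": "T1"},
    {"CustomerID": "C2", "TransactionID": "T2"},
    {"CustomerID": "C1", "TransactionID": "T3"},
]


def customer_count(rows):
    """Count rows per CustomerID, like DataFrame.groupBy("CustomerID").count()."""
    counts = Counter(row["CustomerID"] for row in rows)
    # Spark names the aggregate column "count"; mirror that shape here.
    # Note: Spark does not guarantee the order of grouped output rows.
    return [{"CustomerID": cid, "count": n} for cid, n in counts.items()]


print(customer_count(rows))
# [{'CustomerID': 'C1', 'count': 2}, {'CustomerID': 'C2', 'count': 1}]
```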

Registering a label from an existing source.

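import featureform as ff

# The entity that the label values are keyed on
user = ff.register_entity("user")

transactions = ff.get_source("transactions", "kaggle")

# Register a label whose value comes from the "isfraud" column,
# keyed by the "customerid" column of the existing source
transactions.register_resources(
    entity=user,
    entity_column="customerid",
    labels=[
        {"name": "fraudulent", "variant": "quickstart", "column": "isfraud", "type": "bool"},
    ],
)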
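Conceptually, the label definition pairs each value in entity_column with the value in the label's column. The sketch below builds that pairing from hypothetical in-memory rows. label_values is an invented helper, not part of Featureform, and its "last row wins" handling of repeated keys is a simplification chosen for the sketch, not documented Featureform behavior.

```python
# Hypothetical rows from the "transactions" source.
rows = [
    {"customerid": "C1", "isfraud": False},
    {"customerid": "C2", "isfraud": True},
]

# Same shape as the label dictionary passed to register_resources above.
label_def = {"name": "fraudulent", "variant": "quickstart", "column": "isfraud", "type": "bool"}


def label_values(rows, entity_column, label_def):
    """Map each entity key to its label value (last row wins on repeated keys)."""
    column = label_def["column"]
    return {row[entity_column]: bool(row[column]) for row in rows}


print(label_values(rows, "customerid", label_def))
# {'C1': False, 'C2': True}
```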

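Parameters:

| Name | Type | Description | Default |
|---------|------|-------------------------------------------|----------|
| name | str | Name of the source to retrieve | required |
| variant | str | Name of the source variant to retrieve | required |
| local | bool | Whether local mode is being used | False |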

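Returns:

| Name | Type | Description |
|--------|-----------------------|---------------------------------------------------------------------------|
| source | ColumnSourceRegistrar | The retrieved source, usable for registering resources or as a transformation input |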