Reusing Sources
get_source() returns a reference to an already registered primary source or transformation. The returned object can be used to register features and labels, or passed as an input to create additional transformations.
Examples:
Registering a transformation from an existing source.
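```python
import featureform as ff

spark = ff.get_spark("prod-spark")
transactions = ff.get_source("transactions", "kaggle")

# The decorated function receives the input source as a Spark DataFrame.
@spark.df_transformation(inputs=[transactions])
def customer_count(transactions):
    # Count the rows for each customer.
    return transactions.groupBy("CustomerID").count()
```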
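To make the transformation's output concrete, here is a plain-Python sketch of what groupBy("CustomerID").count() computes: the number of rows for each distinct CustomerID. It is only an illustration, not Featureform or Spark code; the list-of-dicts rows and their values are hypothetical stand-ins for the transactions DataFrame.

```python
from collections import Counter


def customer_count(rows):
    """Count rows per CustomerID, mirroring groupBy("CustomerID").count()."""
    # Counter tallies how many times each CustomerID value appears.
    return dict(Counter(row["CustomerID"] for row in rows))


# Hypothetical sample rows; only the CustomerID column matters here.
rows = [
    {"CustomerID": "C1", "amount": 10.0},
    {"CustomerID": "C2", "amount": 5.5},
    {"CustomerID": "C1", "amount": 3.2},
]
print(customer_count(rows))  # {'C1': 2, 'C2': 1}
```

In Spark the result is a DataFrame with a CustomerID column and a count column rather than a dict, and its row order is not guaranteed.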
Registering a label from an existing source.
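```python
import featureform as ff

user = ff.register_entity("user")
transactions = ff.get_source("transactions", "kaggle")

# Register a boolean label keyed on the customerid column.
transactions.register_resources(
    entity=user,
    entity_column="customerid",
    labels=[
        {"name": "fraudulent", "variant": "quickstart", "column": "isfraud", "type": "bool"},
    ],
)
```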
Parameters:

Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the source to retrieve | required |
variant | str | Name of the variant of the source to retrieve | required |
local | bool | Whether local mode is being used | False |
Returns:

Name | Type | Description |
---|---|---|
source | ColumnSourceRegistrar | Reference to the retrieved source, used to register features and labels or as a transformation input |