Skip to content

Transformations

SQL Providers

Register a SQL transformation source.

The name of the function is the name of the resulting source.

Sources for the transformation can be specified by adding the Name and Variant in brackets '{{ name.variant }}'. The correct source is substituted when the query is run.

Examples:

postgres = client.get_provider("my_postgres")
@postgres.sql_transformation(variant="quickstart")
def average_user_transaction():
    return "SELECT CustomerID as user_id, avg(TransactionAmount) as avg_transaction_amt from {{transactions.v1}} GROUP BY user_id"

Parameters:

Name Type Description Default
name str

Name of source

''
variant str

Name of variant

''
schedule str

The frequency at which the transformation is run as a cron expression

''
owner Union[str, UserRegistrar]

Owner

''
description str

Description of primary data to be registered

''
inputs list

A list of Source NameVariant Tuples to input into the transformation

None

Returns:

Name Type Description
source ColumnSourceRegistrar

Source

Spark

SQL Transformation

Register a SQL transformation source. The spark.sql_transformation decorator takes the returned string in the following function and executes it as a SQL Query.

The name of the function is the name of the resulting source.

Sources for the transformation can be specified by adding the Name and Variant in brackets '{{ name.variant }}'. The correct source is substituted when the query is run.

Examples:

@spark.sql_transformation(variant="quickstart")
def average_user_transaction():
    return "SELECT CustomerID as user_id, avg(TransactionAmount) as avg_transaction_amt from {{transactions.v1}} GROUP BY user_id"

Parameters:

Name Type Description Default
name str

Name of source

''
variant str

Name of variant

''
owner Union[str, UserRegistrar]

Owner

''
description str

Description of primary data to be registered

''

Returns:

Name Type Description
source ColumnSourceRegistrar

Source

Dataframe Transformation

Register a Dataframe transformation source. The spark.df_transformation decorator takes the contents of the following function and executes the code it contains at serving time.

The name of the function is used as the name of the source when being registered.

The specified inputs are loaded into dataframes that can be accessed using the function parameters.

Examples:

@spark.df_transformation(inputs=[("source", "one")])        # Sources are added as inputs
def average_user_transaction(df):                           # Sources can be manipulated by adding them as params
    return df

Parameters:

Name Type Description Default
name str

Name of source

''
variant str

Name of variant

''
owner Union[str, UserRegistrar]

Owner

''
description str

Description of primary data to be registered

''
inputs list[Tuple(str, str)]

A list of Source NameVariant Tuples to input into the transformation

[]

Returns:

Name Type Description
source ColumnSourceRegistrar

Source

Kubernetes Pandas Runner

SQL Transformation

Register a SQL transformation source. The k8s.sql_transformation decorator takes the returned string in the following function and executes it as a SQL Query.

The name of the function is the name of the resulting source.

Sources for the transformation can be specified by adding the Name and Variant in brackets '{{ name.variant }}'. The correct source is substituted when the query is run.

Examples:

@k8s.sql_transformation(variant="quickstart")
def average_user_transaction():
    return "SELECT CustomerID as user_id, avg(TransactionAmount) as avg_transaction_amt from {{transactions.v1}} GROUP BY user_id"

Parameters:

Name Type Description Default
name str

Name of source

''
variant str

Name of variant

''
owner Union[str, UserRegistrar]

Owner

''
inputs list

A list of Source NameVariant Tuples to input into the transformation

None
description str

Description of primary data to be registered

''
docker_image str

A custom Docker image to run the transformation

''
resource_specs K8sResourceSpecs

Custom resource requests and limits

None

Returns:

Name Type Description
source ColumnSourceRegistrar

Source

Dataframe Transformation

Register a Dataframe transformation source. The k8s.df_transformation decorator takes the contents of the following function and executes the code it contains at serving time.

The name of the function is used as the name of the source when being registered.

The specified inputs are loaded into dataframes that can be accessed using the function parameters.

Examples:

@k8s.df_transformation(inputs=[("source", "one")])        # Sources are added as inputs
def average_user_transaction(df):                         # Sources can be manipulated by adding them as params
    return df

Parameters:

Name Type Description Default
name str

Name of source

''
variant str

Name of variant

''
owner Union[str, UserRegistrar]

Owner

''
description str

Description of primary data to be registered

''
inputs list[Tuple(str, str)]

A list of Source NameVariant Tuples to input into the transformation

[]
docker_image str

A custom Docker image to run the transformation

''
resource_specs K8sResourceSpecs

Custom resource requests and limits

None

Returns:

Name Type Description
source ColumnSourceRegistrar

Source