Serving

Features

Returns the feature values for the specified entities.

Examples:

    client = ff.Client()
    fpf = client.features([("avg_transactions", "quickstart")], {"user": "C1410926"})
    # Run features through model

Parameters:

Name	Type	Description	Default
`features`	`(list[str, str], list[str])`	List of (name, variant) tuples identifying the features to serve	required
`entities`	`dict`	Dictionary of entity name/value pairs	required

Returns:

Name	Type	Description
`features`	`Array`	A NumPy array of feature values in the order given by the inputs
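
Because the values come back in the same order as the requested features, they can be paired with their (name, variant) tuples by position. The following is a minimal sketch of that pairing, not part of the client API: `label_feature_values` is a hypothetical helper, `max_transaction` is a made-up feature name, and a plain list with invented numbers stands in for the array returned by `client.features`.

```python
from typing import Dict, Sequence, Tuple

NameVariant = Tuple[str, str]


def label_feature_values(
    requested: Sequence[NameVariant], values: Sequence[float]
) -> Dict[NameVariant, float]:
    """Pair each requested (name, variant) with the value at the same position."""
    if len(requested) != len(values):
        raise ValueError("expected exactly one value per requested feature")
    return dict(zip(requested, values))


# Hypothetical request; "max_transaction" is not defined anywhere in these docs.
requested = [("avg_transactions", "quickstart"), ("max_transaction", "quickstart")]
# Stand-in for the array returned by client.features(...); numbers are invented.
values = [25.0, 310.5]

labeled = label_feature_values(requested, values)
print(labeled[("avg_transactions", "quickstart")])
```

Keeping the request list in one variable and reusing it for the lookup avoids mismatches between the order requested and the order assumed when reading the result.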

Training Sets

Returns an iterator over the specified training set.

Examples:

    client = ff.Client()
    dataset = client.training_set("fraud_training", "quickstart")
    training_dataset = dataset.repeat(10).shuffle(1000).batch(8)
    for feature_batch in training_dataset:
        # Train model
        pass

Parameters:

Name	Type	Description	Default
`name`	`str`	Name of training set to be retrieved	required
`variant`	`str`	Variant of training set to be retrieved	`''`

Returns:

Name	Type	Description
`training_set`	`Dataset`	A training set iterator

Train/Test Split (Beta)

(This functionality is currently only available for ClickHouse.)

Splits an existing training set into training and testing iterators. The split is processed on the underlying provider and calculated at serving time.

Examples:

import featureform as ff
client = ff.Client()
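# Parentheses allow the method chain to span multiple lines.
train, test = (
    client
    .training_set("fraud_training", "v1")
    .train_test_split(
        test_size=0.3,
        train_size=0.7,
        shuffle=True,
        random_state=None,
        batch_size=5,
    )
)

# clf is assumed to be a previously defined model that supports partial_fit and score.
for features, label in train: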
    print(features)
    print(label)
    clf.partial_fit(features, label)

for features, label in test:
    print(features)
    print(label)
    clf.score(features, label)


# TRAIN OUTPUT
# np.array([
#   [1, 1, 3],
#   [5, 1, 2],
#   [7, 6, 5],
#   [8, 3, 3],
#   [5, 2, 2],
# ])
# np.array([2, 4, 2, 3, 4])
# np.array([
#   [3, 1, 2],
#   [5, 4, 5],
# ])
# np.array([6, 7])

# TEST OUTPUT
# np.array([
#   [5, 1, 3],
#   [4, 3, 1],
#   [6, 6, 7],
# ])
# np.array([4, 6, 7])

Parameters:

Name	Type	Description	Default
`test_size`	`float`	The fraction of the dataset to include in the test split. Must be a value between 0 and 1. If omitted, it is the complement of train_size. One of test_size or train_size must be specified.	`0`
`train_size`	`float`	The fraction of the dataset to include in the train split. Must be a value between 0 and 1. If omitted, it is the complement of test_size. One of test_size or train_size must be specified.	`0`
`shuffle`	`bool`	Whether to shuffle the dataset before splitting.	`True`
`random_state`	`Optional[int]`	The seed used to shuffle the dataset. If None, the dataset is shuffled randomly on every call. If >0, the value is used as a seed to create a random shuffle that can be repeated if subsequent calls use the same seed.	`None`
`batch_size`	`int`	The size of the batch to return from the iterator. Must be greater than 0.	`1`
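
The relationship between `test_size` and `train_size` described in the table can be sketched as a small validation routine. This is an illustration of the documented rules only, not Featureform's implementation: `resolve_split_sizes` is a hypothetical helper, and rejecting sizes that sum to more than 1 is an assumption the table does not state.

```python
from typing import Tuple


def resolve_split_sizes(test_size: float = 0, train_size: float = 0) -> Tuple[float, float]:
    """Return (test_size, train_size) after applying the documented defaults."""
    for name, value in (("test_size", test_size), ("train_size", train_size)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    if test_size == 0 and train_size == 0:
        raise ValueError("one of test_size or train_size must be specified")
    if test_size == 0:
        # Omitted test_size is the complement of train_size.
        test_size = 1 - train_size
    elif train_size == 0:
        # Omitted train_size is the complement of test_size.
        train_size = 1 - test_size
    elif test_size + train_size > 1 + 1e-9:
        # Assumption: the two fractions may not describe more than the whole dataset.
        raise ValueError("test_size and train_size must not sum to more than 1")
    return test_size, train_size


print(resolve_split_sizes(test_size=0.25))
print(resolve_split_sizes(train_size=0.5))
```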

Returns:

Name	Type	Description
`train`	`Iterator`	An iterator for training values.
`test`	`Iterator`	An iterator for testing values.
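
To make the roles of `shuffle`, `random_state`, and `batch_size` concrete, here is a local, in-memory sketch of a seeded shuffle followed by a split and batching. The real split is processed on the underlying provider, so this only mirrors the behaviour described above; `split_and_batch` is a hypothetical function, and rounding the test count and keeping a smaller final batch are assumptions (the latter matches the sample output, where a batch of 5 is followed by a batch of 2).

```python
import random
from typing import List, Optional, Sequence, Tuple


def split_and_batch(
    rows: Sequence,
    test_size: float,
    shuffle: bool = True,
    random_state: Optional[int] = None,
    batch_size: int = 1,
) -> Tuple[List[list], List[list]]:
    """Split rows into (train_batches, test_batches)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be greater than 0")
    indices = list(range(len(rows)))
    if shuffle:
        # A fixed seed reproduces the same order; None reseeds on every call.
        random.Random(random_state).shuffle(indices)
    n_test = round(len(rows) * test_size)  # assumption: round to the nearest row
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def batches(idx: List[int]) -> List[list]:
        # The last batch may be smaller than batch_size.
        return [
            [rows[i] for i in idx[start:start + batch_size]]
            for start in range(0, len(idx), batch_size)
        ]

    return batches(train_idx), batches(test_idx)


rows = list(range(10))
train_batches, test_batches = split_and_batch(
    rows, test_size=0.3, random_state=42, batch_size=5
)
print([len(batch) for batch in train_batches])
print([len(batch) for batch in test_batches])
```

Calling the function again with the same `random_state` yields the same batches, which is the repeatability the `random_state` parameter is meant to provide.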

Sources

Returns a dataframe from a registered source or transformation.

Example:

definitions.py

transactions_df = client.dataframe("transactions", "quickstart")

avg_user_transaction_df = transactions_df.groupby("CustomerID")["TransactionAmount"].mean()

Parameters:

Name	Type	Description	Default
`source`	`Union[SourceRegistrar, SubscriptableTransformation, str]`	The source or transformation to compute the dataframe from	required
`variant`	`str`	The source variant; can't be None if source is a string	`None`
`limit`	`int`	The maximum number of records to return; defaults to NO_RECORD_LIMIT	`NO_RECORD_LIMIT`
`asynchronous`	`bool`	Flag to determine whether the client should wait for resources to be in either a READY or FAILED state before returning. Defaults to False to ensure that newly registered resources are in a READY state prior to serving them as dataframes.	`False`

Returns:

Name	Type	Description
`df`	`DataFrame`	The dataframe computed from the source or transformation
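
The waiting behaviour behind the `asynchronous` flag can be pictured as a polling loop that returns once the resource reaches READY or FAILED. The sketch below is an assumption about the general pattern, not Featureform's code: `wait_until_settled`, its `timeout` and `poll_interval` parameters, and the `PENDING` status are all hypothetical, and a canned sequence of statuses replaces a real status lookup.

```python
import time
from typing import Callable


def wait_until_settled(
    get_status: Callable[[], str],
    timeout: float = 10.0,
    poll_interval: float = 0.01,
) -> str:
    """Poll get_status until it reports READY or FAILED, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in ("READY", "FAILED"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"resource still {status} after {timeout} seconds")
        time.sleep(poll_interval)


# Hypothetical status source: two pending polls, then ready.
statuses = iter(["PENDING", "PENDING", "READY"])
final_status = wait_until_settled(lambda: next(statuses))
print(final_status)
```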

Nearest Neighbors

Queries the K nearest neighbors of a provided vector in the index of a registered feature variant.

Example:

definitions.py

# Get the 5 nearest neighbors of the vector [0.1, 0.2, 0.3] in the index of the feature "my_feature" with variant "my_variant"
nearest_neighbors = client.nearest(("my_feature", "my_variant"), [0.1, 0.2, 0.3], 5)
print(nearest_neighbors) # prints a list of entities (e.g. ["entity1", "entity2", "entity3", "entity4", "entity5"])

Parameters:

Name	Type	Description	Default
`feature`	`Union[FeatureColumnResource, tuple(str, str)]`	Feature object or tuple of Feature name and variant	required
`vector`	`List[float]`	Query vector	required
`k`	`int`	Number of nearest neighbors to return	required
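
Conceptually, a nearest-neighbor query ranks the indexed entities by their distance to the query vector and returns the closest `k`. The brute-force sketch below illustrates that idea on an in-memory dictionary; it is not how the provider's index works. `nearest_entities` and the sample index are hypothetical, and Euclidean distance is an assumption, since the metric actually used depends on the provider.

```python
import heapq
import math
from typing import Dict, List, Sequence


def nearest_entities(
    index: Dict[str, Sequence[float]], vector: Sequence[float], k: int
) -> List[str]:
    """Return the k entity names whose vectors are closest to the query vector."""
    if k <= 0:
        raise ValueError("k must be greater than 0")
    # Assumption: Euclidean distance; a real index may use another metric.
    return heapq.nsmallest(
        k, index, key=lambda entity: math.dist(index[entity], vector)
    )


# Hypothetical index mapping entity names to embedding vectors.
index = {
    "entity1": [0.1, 0.2, 0.3],
    "entity2": [0.9, 0.9, 0.9],
    "entity3": [0.1, 0.2, 0.4],
}
neighbors = nearest_entities(index, [0.1, 0.2, 0.3], 2)
print(neighbors)
```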

Resource Location

Returns the location of a registered resource. For SQL resources, it will return the table name and for file resources, it will return the file path.

Example:

definitions.py

transaction_location = client.location("transactions", "quickstart", ff.SOURCE)

Parameters:

Name	Type	Description	Default
`source`	`Union[SourceRegistrar, SubscriptableTransformation, str]`	The registered resource, or its name, whose location is returned	required
`variant`	`str`	The resource variant; can't be None if source is a string	`None`
`resource_type`	`ResourceType`	The type of resource; can be one of ff.SOURCE, ff.FEATURE, ff.LABEL, or ff.TRAINING_SET	`None`