Serving

Features

Returns the feature values for the specified entities.

Examples:

    import featureform as ff

    client = ff.Client()
    fpf = client.features([("avg_transactions", "quickstart")], {"user": "C1410926"})
    # Run features through model

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| features | (list[str, str], list[str]) | List of Name Variant Tuples | required |
| entities | dict | Dictionary of entity name/value pairs | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| features | Array | A NumPy array of feature values, in the order given by the inputs |
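Because the values come back in the order the features were requested, they can be matched to their name/variant tuples by position. The sketch below shows that pairing without calling Featureform: `returned_values` is a hypothetical stand-in for the array returned by `client.features`, and the `transaction_count` feature is invented for illustration.

```python
# Requested features as (name, variant) tuples, in the order passed to client.features.
# "transaction_count" is a made-up second feature used only for this illustration.
requested = [("avg_transactions", "quickstart"), ("transaction_count", "quickstart")]

# Hypothetical stand-in for the array returned by client.features(requested, {...}).
returned_values = [25.0, 3]

# Values are positional, so zip pairs each value with the feature that produced it.
by_feature = dict(zip(requested, returned_values))

print(by_feature[("avg_transactions", "quickstart")])
```

Keeping the request list in a variable, rather than writing it inline in the call, makes this positional lookup straightforward.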

Training Sets

Return an iterator that iterates through the specified training set.

Examples:

    import featureform as ff

    client = ff.Client()
    dataset = client.training_set("fraud_training", "quickstart")
    training_dataset = dataset.repeat(10).shuffle(1000).batch(8)
    for feature_batch in training_dataset:
        pass  # Train model

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| name | str | Name of training set to be retrieved | required |
| variant | str | Variant of training set to be retrieved | '' |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| training_set | Dataset | A training set iterator |
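The `repeat`, `shuffle`, and `batch` calls in the example above can be read as stages of a streaming pipeline. The following is a conceptual model only, not Featureform's implementation: the helper names `repeat_rows`, `shuffle_rows`, and `batch_rows` are hypothetical, and the buffer-based shuffle is an assumption borrowed from how data-loading libraries commonly treat a `shuffle(n)` argument.

```python
import random
from itertools import islice


def repeat_rows(rows, times):
    """Yield the whole sequence of rows `times` times over."""
    for _ in range(times):
        yield from rows


def shuffle_rows(rows, buffer_size, seed=None):
    """Shuffle a stream with a bounded buffer (assumed behaviour, see lead-in)."""
    rng = random.Random(seed)
    buffer = []
    for row in rows:
        buffer.append(row)
        if len(buffer) >= buffer_size:
            # Emit a random element once the buffer is full.
            yield buffer.pop(rng.randrange(len(buffer)))
    # Drain whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer


def batch_rows(rows, size):
    """Group a stream into lists of at most `size` rows."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


rows = [(f"features_{i}", i % 2) for i in range(5)]
batches = list(batch_rows(shuffle_rows(repeat_rows(rows, 2), buffer_size=4, seed=0), 3))
print([len(b) for b in batches])  # → [3, 3, 3, 1]
```

In this model the last batch may be smaller than the requested size whenever the row count is not a multiple of it; whether Featureform's `Dataset` behaves the same way is not stated here.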

Train/Test Split (Beta)

(This functionality is currently available only for ClickHouse.)

Splits an existing training set into training and testing iterators. The split is processed on the underlying provider and calculated at serving time.

Examples:

import featureform as ff

client = ff.Client()
# Parentheses let the method chain span multiple lines.
train, test = (
    client
    .training_set("fraud_training", "v1")
    .train_test_split(
        test_size=0.3,
        train_size=0.7,
        shuffle=True,
        random_state=None,
        batch_size=5,
    )
)

# clf is assumed to be a model defined elsewhere that supports
# incremental training through partial_fit.
for features, label in train:
    print(features)
    print(label)
    clf.partial_fit(features, label)

for features, label in test:
    print(features)
    print(label)
    clf.score(features, label)


# TRAIN OUTPUT
# np.array([
#   [1, 1, 3],
#   [5, 1, 2],
#   [7, 6, 5],
#   [8, 3, 3],
#   [5, 2, 2],
# ])
# np.array([2, 4, 2, 3, 4])
# np.array([
#   [3, 1, 2],
#   [5, 4, 5],
# ])
# np.array([6, 7])

# TEST OUTPUT
# np.array([
#   [5, 1, 3],
#   [4, 3, 1],
#   [6, 6, 7],
# ])
# np.array([4, 6, 7])

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| test_size | float | The fraction of the dataset to place in the test split. Must be a value between 0 and 1. If excluded, it will be the complement of train_size. One of test_size or train_size must be specified. | 0 |
| train_size | float | The fraction of the dataset to place in the train split. Must be a value between 0 and 1. If excluded, it will be the complement of test_size. One of test_size or train_size must be specified. | 0 |
| shuffle | bool | Whether to shuffle the dataset before splitting. | True |
| random_state | Optional[int] | A random state used to shuffle the dataset. If None, the dataset will be shuffled randomly on every call. If >0, the value will be used as a seed to create a random shuffle that can be repeated if subsequent calls use the same seed. | None |
| batch_size | int | The size of the batch to return from the iterator. Must be greater than 0. | 1 |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| train | Iterator | An iterator for training values. |
| test | Iterator | An iterator for testing values. |
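The parameter rules above (complementary sizes, seeded shuffling, batching) can be illustrated with a small in-memory sketch. This is not how Featureform performs the split, which is processed on the underlying provider. The function name `sketch_split` is hypothetical, and rounding the split point with `round` is an assumption.

```python
import random


def sketch_split(rows, test_size=0.0, train_size=0.0, shuffle=True,
                 random_state=None, batch_size=1):
    """In-memory illustration of the documented parameter rules (hypothetical helper)."""
    if test_size == 0 and train_size == 0:
        raise ValueError("one of test_size or train_size must be specified")
    # A size that is left out is the complement of the one that was given.
    if train_size == 0:
        train_size = 1 - test_size
    elif test_size == 0:
        test_size = 1 - train_size
    if not (0 <= test_size <= 1 and 0 <= train_size <= 1):
        raise ValueError("sizes must be between 0 and 1")
    if batch_size <= 0:
        raise ValueError("batch_size must be greater than 0")

    rows = list(rows)
    if shuffle:
        # A fixed seed reproduces the same order; None reshuffles on every call.
        random.Random(random_state).shuffle(rows)

    cut = round(len(rows) * train_size)  # assumed rounding rule
    train_rows, test_rows = rows[:cut], rows[cut:]

    def batches(part):
        # Yield consecutive slices of at most batch_size rows.
        for start in range(0, len(part), batch_size):
            yield part[start:start + batch_size]

    return batches(train_rows), batches(test_rows)


train, test = sketch_split(range(10), train_size=0.7, random_state=42, batch_size=5)
print([len(b) for b in train], [len(b) for b in test])  # → [5, 2] [3]
```

With ten rows, a 0.7 train fraction, and a batch size of 5, the train iterator yields a batch of five followed by a batch of two, and the test iterator yields one batch of three.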

Sources

Returns a dataframe from a registered source or transformation.

Example:

definitions.py
transactions_df = client.dataframe("transactions", "quickstart")

avg_user_transaction_df = transactions_df.groupby("CustomerID")["TransactionAmount"].mean()

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| source | Union[SourceRegistrar, SubscriptableTransformation, str] | The source or transformation to compute the dataframe from | required |
| variant | str | The source variant; can't be None if source is a string | None |
| limit | int | The maximum number of records to return; defaults to NO_RECORD_LIMIT | NO_RECORD_LIMIT |
| asynchronous | bool | Flag to determine whether the client should wait for resources to be in either a READY or FAILED state before returning. Defaults to False to ensure that newly registered resources are in a READY state prior to serving them as dataframes. | False |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| df | DataFrame | The dataframe computed from the source or transformation |

Nearest Neighbors

Query the K nearest neighbors of a provided vector in the index of a registered feature variant.

Example:

definitions.py
# Get the 5 nearest neighbors of the vector [0.1, 0.2, 0.3] in the index of the feature "my_feature" with variant "my_variant"
nearest_neighbors = client.nearest(("my_feature", "my_variant"), [0.1, 0.2, 0.3], 5)
print(nearest_neighbors) # prints a list of entities (e.g. ["entity1", "entity2", "entity3", "entity4", "entity5"])

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| feature | Union[FeatureColumnResource, tuple(str, str)] | Feature object or tuple of Feature name and variant | required |
| vector | List[float] | Query vector | required |
| k | int | Number of nearest neighbors to return | required |
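To make the query semantics concrete, here is a brute-force sketch of a K-nearest-neighbors lookup over a tiny in-memory index. It only illustrates the idea of returning the entities whose vectors lie closest to the query. The helper `brute_force_nearest` and the sample `index` are hypothetical, and Euclidean distance is an assumption: the metric actually used depends on the underlying vector index.

```python
import math


def brute_force_nearest(index, vector, k):
    """Return the k entity names whose stored vectors are closest to `vector`."""
    # Rank every entity by Euclidean distance to the query vector (assumed metric).
    ranked = sorted(index, key=lambda entity: math.dist(index[entity], vector))
    return ranked[:k]


# Hypothetical index mapping entity names to embedding vectors.
index = {
    "entity1": [0.1, 0.2, 0.3],
    "entity2": [0.9, 0.9, 0.9],
    "entity3": [0.2, 0.2, 0.3],
}

print(brute_force_nearest(index, [0.1, 0.2, 0.3], 2))  # → ['entity1', 'entity3']
```

As in the example above, the result is a list of entity names rather than the vectors themselves.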

Resource Location

Returns the location of a registered resource. For SQL resources, it returns the table name; for file resources, it returns the file path.

Example:

definitions.py
transaction_location = client.location("transactions", "quickstart", ff.SOURCE)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| source | Union[SourceRegistrar, SubscriptableTransformation, str] | The registered resource, or its name, whose location is returned | required |
| variant | str | The resource variant; can't be None if source is a string | None |
| resource_type | ResourceType | The type of resource; can be one of ff.SOURCE, ff.FEATURE, ff.LABEL, or ff.TRAINING_SET | None |