Serving
Features
Returns the feature values for the specified entities.
Examples:
```python
import featureform as ff

client = ff.Client()
fpf = client.features([("avg_transactions", "quickstart")], {"user": "C1410926"})
# Run features through model
```
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `features` | `(list[str, str], list[str])` | List of (name, variant) tuples | required |
| `entities` | `dict` | Dictionary of entity name/value pairs | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `features` | `Array` | A NumPy array of feature values, in the order given by the inputs |
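The ordering guarantee in the Returns table can be illustrated with a small in-memory stand-in for the serving lookup. This is a sketch, not Featureform's implementation: `FEATURE_STORE`, `serve_features`, the `max_transaction` feature, and all values are hypothetical, and the sketch returns a plain list where the real client returns a NumPy array.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory store with made-up values:
# (feature name, variant) -> (entity type, {entity id: value})
FEATURE_STORE: Dict[Tuple[str, str], Tuple[str, Dict[str, float]]] = {
    ("avg_transactions", "quickstart"): ("user", {"C1410926": 5000.0}),
    ("max_transaction", "quickstart"): ("user", {"C1410926": 25000.0}),
}


def serve_features(
    features: List[Tuple[str, str]], entities: Dict[str, str]
) -> List[float]:
    """Return one value per (name, variant) pair, in the order requested."""
    values = []
    for name, variant in features:
        entity_type, table = FEATURE_STORE[(name, variant)]
        # Use the entity id supplied for this feature's entity type.
        entity_id = entities[entity_type]
        values.append(table[entity_id])
    return values


print(serve_features(
    [("max_transaction", "quickstart"), ("avg_transactions", "quickstart")],
    {"user": "C1410926"},
))  # [25000.0, 5000.0]
```

Reversing the order of the requested features reverses the order of the returned values, which is why the inputs list, not the store, determines the layout of the result.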
Training Sets
Returns an iterator over the specified training set.
Examples:
```python
import featureform as ff

client = ff.Client()
dataset = client.training_set("fraud_training", "quickstart")
training_dataset = dataset.repeat(10).shuffle(1000).batch(8)
for feature_batch in training_dataset:
    # Train model on each batch here
    pass
```
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Name of the training set to retrieve | required |
| `variant` | `str` | Variant of the training set to retrieve | `''` |
Returns:
| Name | Type | Description |
|---|---|---|
| `training_set` | `Dataset` | A training set iterator |
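The chained calls in the training-set example read as: replay the data 10 times, shuffle with a buffer of 1000 rows, then group rows into batches of 8. The sketch below models that pipeline with plain Python generators, assuming `tf.data`-style semantics (a fixed-size shuffle buffer and a shorter final batch when rows run out). It is not Featureform's `Dataset` implementation; `repeat`, `buffered_shuffle`, and `batch` here are hypothetical helpers.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def repeat(rows: List[T], count: int) -> Iterator[T]:
    """Yield the full list of rows `count` times in a row."""
    for _ in range(count):
        yield from rows


def buffered_shuffle(
    rows: Iterable[T], buffer_size: int, seed: Optional[int] = None
) -> Iterator[T]:
    """Approximate shuffle: hold up to buffer_size rows and emit a random one once full."""
    rng = random.Random(seed)
    buffer: List[T] = []
    for row in rows:
        buffer.append(row)
        if len(buffer) >= buffer_size:
            # Swap a random element to the end and pop it (O(1) removal).
            i = rng.randrange(len(buffer))
            buffer[i], buffer[-1] = buffer[-1], buffer[i]
            yield buffer.pop()
    # Input exhausted: emit the remaining buffered rows in random order.
    rng.shuffle(buffer)
    yield from buffer


def batch(rows: Iterable[T], size: int) -> Iterator[List[T]]:
    """Group rows into lists of `size`; the last batch may be shorter."""
    current: List[T] = []
    for row in rows:
        current.append(row)
        if len(current) == size:
            yield current
            current = []
    if current:
        yield current


rows = list(range(20))  # stand-in for 20 training rows
all_batches = list(batch(buffered_shuffle(repeat(rows, 10), 1000, seed=0), 8))
print(len(all_batches), len(all_batches[-1]))  # 25 8
```

Because the buffer (1000) is larger than the 200 repeated rows here, the whole stream is shuffled at once; with a smaller buffer, rows can only move a limited distance from their original position.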
Train/Test Split (Beta)
(This functionality is currently only available for ClickHouse.)
Splits an existing training set into training and testing iterators. The split is processed on the underlying provider and calculated at serving time.
Examples:
```python
import featureform as ff
from sklearn.linear_model import SGDRegressor

# Illustrative model: any estimator with partial_fit() and score() works here.
clf = SGDRegressor()

client = ff.Client()
train, test = (
    client
    .training_set("fraud_training", "v1")
    .train_test_split(
        test_size=0.3,
        train_size=0.7,
        shuffle=True,
        random_state=None,
        batch_size=5,
    )
)

for features, label in train:
    print(features)
    print(label)
    clf.partial_fit(features, label)

for features, label in test:
    print(features)
    print(label)
    clf.score(features, label)

# TRAIN OUTPUT
# np.array([
#     [1, 1, 3],
#     [5, 1, 2],
#     [7, 6, 5],
#     [8, 3, 3],
#     [5, 2, 2],
# ])
# np.array([2, 4, 2, 3, 4])
# np.array([
#     [3, 1, 2],
#     [5, 4, 5],
# ])
# np.array([6, 7])

# TEST OUTPUT
# np.array([
#     [5, 1, 3],
#     [4, 3, 1],
#     [6, 6, 7],
# ])
# np.array([4, 6, 7])
```
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `test_size` | `float` | The proportion of the training set to use for testing. Must be between 0 and 1. If omitted, it is the complement of `train_size`. At least one of `test_size` or `train_size` must be specified. | `0` |
| `train_size` | `float` | The proportion of the training set to use for training. Must be between 0 and 1. If omitted, it is the complement of `test_size`. At least one of `test_size` or `train_size` must be specified. | `0` |
| `shuffle` | `bool` | Whether to shuffle the dataset before splitting. | `True` |
| `random_state` | `Optional[int]` | Seed for shuffling. If `None`, the dataset is shuffled differently on every call. If greater than 0, the value is used as a seed, so subsequent calls with the same seed produce the same shuffle. | `None` |
| `batch_size` | `int` | The number of rows in each batch returned by the iterators. Must be greater than 0. | `1` |
Returns:
| Name | Type | Description |
|---|---|---|
| `train` | `Iterator` | An iterator over the training values. |
| `test` | `Iterator` | An iterator over the testing values. |
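The size parameters interact as follows: whichever of `test_size` or `train_size` is omitted becomes the complement of the other, the rows are optionally shuffled with `random_state` as the seed, and each side is served in batches of `batch_size`. The sketch below reproduces that arithmetic locally, whereas the real split runs on the provider; `resolve_sizes`, `split`, and `batches` are hypothetical helpers. Two behaviors are assumptions not stated in the parameter table: the test count is rounded to the nearest integer, and when both sizes are given they must sum to 1.

```python
import math
import random
from typing import Iterator, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def resolve_sizes(test_size: float = 0, train_size: float = 0) -> Tuple[float, float]:
    """Return (train_fraction, test_fraction); an omitted side (0) becomes the complement."""
    if test_size == 0 and train_size == 0:
        raise ValueError("one of test_size or train_size must be specified")
    for value in (test_size, train_size):
        if value != 0 and not 0 < value < 1:
            raise ValueError("sizes must be between 0 and 1")
    if test_size == 0:
        test_size = 1 - train_size
    elif train_size == 0:
        train_size = 1 - test_size
    elif not math.isclose(test_size + train_size, 1.0):
        # Assumption: when both sizes are given they must cover the whole set.
        raise ValueError("test_size and train_size must sum to 1")
    return train_size, test_size


def split(
    rows: Sequence[T],
    test_size: float = 0,
    train_size: float = 0,
    shuffle: bool = True,
    random_state: Optional[int] = None,
) -> Tuple[List[T], List[T]]:
    """Split rows into (train, test) lists."""
    _, test_fraction = resolve_sizes(test_size, train_size)
    rows = list(rows)
    if shuffle:
        # A fixed seed gives the same order on every call; None reseeds from the OS.
        random.Random(random_state).shuffle(rows)
    # Assumption: the test count is rounded to the nearest integer.
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


def batches(rows: List[T], batch_size: int = 1) -> Iterator[List[T]]:
    """Yield consecutive slices of batch_size rows; the last one may be shorter."""
    if batch_size <= 0:
        raise ValueError("batch_size must be greater than 0")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


train, test = split(range(10), test_size=0.3, train_size=0.7, random_state=42)
print([len(b) for b in batches(train, 5)], [len(b) for b in batches(test, 5)])
# [5, 2] [3]
```

With 10 rows and a 30% test share, the train side is served as batches of 5 and 2 rows and the test side as a single batch of 3, the same batch shapes as the example output.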
Sources
Returns a dataframe computed from a registered source or transformation.
Example:
```python
import featureform as ff

client = ff.Client()
transactions_df = client.dataframe("transactions", "quickstart")
avg_user_transaction_df = transactions_df.groupby("CustomerID")["TransactionAmount"].mean()
```
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `source` | `Union[SourceRegistrar, SubscriptableTransformation, str]` | The source or transformation to compute the dataframe from | required |
| `variant` | `str` | The source variant; required if `source` is a string | `None` |
| `limit` | `int` | The maximum number of records to return | `NO_RECORD_LIMIT` |
| `asynchronous` | `bool` | If `True`, return without waiting for the resource to reach a READY or FAILED state. Defaults to `False`, so the client waits and newly registered resources are READY before they are served as dataframes. | `False` |
Returns:
| Name | Type | Description |
|---|---|---|
| `df` | `DataFrame` | The dataframe computed from the source or transformation |
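With `asynchronous=False`, the client blocks until the resource settles into READY or FAILED before building the dataframe. The polling loop below is a minimal sketch of that kind of wait, assuming the status can be read through a callable; `wait_until_settled`, `get_status`, the intermediate status names, and the timeout and polling parameters are all hypothetical and are not part of the Featureform client.

```python
import time
from typing import Callable

# The two states the asynchronous flag waits for.
TERMINAL_STATES = {"READY", "FAILED"}


def wait_until_settled(
    get_status: Callable[[], str],
    timeout: float = 60.0,
    poll_interval: float = 1.0,
) -> str:
    """Poll get_status() until it reports READY or FAILED; raise TimeoutError after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"resource still {status!r} after {timeout} s")
        time.sleep(poll_interval)


# Simulated resource (hypothetical status names) that becomes READY on the third check.
statuses = iter(["PENDING", "RUNNING", "READY"])
final = wait_until_settled(lambda: next(statuses), poll_interval=0.01)
print(final)  # READY
```

Returning FAILED rather than raising leaves the caller to decide how to report a resource that can never be served.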
Nearest Neighbors
Queries the k nearest neighbors of a provided vector in the index of a registered feature variant.
Example:
```python
import featureform as ff

client = ff.Client()
# Get the 5 nearest neighbors of the vector [0.1, 0.2, 0.3] in the index of
# the feature "my_feature" with variant "my_variant"
nearest_neighbors = client.nearest(("my_feature", "my_variant"), [0.1, 0.2, 0.3], 5)
print(nearest_neighbors)  # a list of entities, e.g. ["entity1", "entity2", "entity3", "entity4", "entity5"]
```
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `feature` | `Union[FeatureColumnResource, tuple(str, str)]` | Feature object or (name, variant) tuple | required |
| `vector` | `List[float]` | Query vector | required |
| `k` | `int` | Number of nearest neighbors to return | required |
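Conceptually, a nearest-neighbor query ranks the entities stored in the feature's vector index by their distance to the query vector and returns the closest `k`. The brute-force sketch below shows that ranking over an in-memory dictionary, assuming Euclidean distance; real vector providers typically use approximate indexes and may use cosine similarity or another metric. `knn` and the sample vectors are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence


def knn(index: Dict[str, Sequence[float]], query: Sequence[float], k: int) -> List[str]:
    """Return the k entity ids whose vectors are closest to query (Euclidean distance)."""
    if k <= 0:
        raise ValueError("k must be positive")
    # heapq.nsmallest keeps only k candidates while scanning every vector once.
    closest = heapq.nsmallest(
        k, index.items(), key=lambda item: math.dist(item[1], query)
    )
    return [entity for entity, _ in closest]


# Hypothetical index: entity id -> embedding vector.
index = {
    "entity1": [0.1, 0.2, 0.3],
    "entity2": [0.1, 0.2, 0.4],
    "entity3": [0.9, 0.9, 0.9],
    "entity4": [0.0, 0.0, 0.3],
}
print(knn(index, [0.1, 0.2, 0.3], 2))  # ['entity1', 'entity2']
```

A brute-force scan is exact but costs one distance computation per stored vector, which is why vector stores build specialized indexes for large collections.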
Resource Location
Returns the location of a registered resource: the table name for SQL resources, or the file path for file resources.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `source` | `Union[SourceRegistrar, SubscriptableTransformation, str]` | The resource to locate, given as a source or transformation object or as a resource name | required |
| `variant` | `str` | The resource variant; required if `source` is a string | `None` |
| `resource_type` | `ResourceType` | The type of resource; one of `ff.SOURCE`, `ff.FEATURE`, `ff.LABEL`, or `ff.TRAINING_SET` | `None` |