Quickstart
Let’s start by installing Rexify:
[1]:
!pip install rexify
Requirement already satisfied: rexify in /home/docs/checkouts/readthedocs.org/user_builds/rexify/envs/latest/lib/python3.10/site-packages (0.1.20)
Get some data:
[2]:
!mkdir -p data
!curl --get https://storage.googleapis.com/roostr-ratings-matrices/rexify/completions.csv > data/events.csv
[3]:
import pandas as pd
[4]:
events = pd.read_csv('data/events.csv')
events
[4]:
| | id | type | account_id | program_id | date |
|---|---|---|---|---|---|
| 0 | 2102 | psa | 51 | CPEC-54 | 2015-08-07 |
| 1 | 2129 | psa | 51 | CPEC-81 | 2015-08-14 |
| 2 | 2132 | psa | 51 | CPEC-84 | 2015-08-14 |
| 3 | 2277 | psa | 50 | CPEC-198 | 2015-08-18 |
| 4 | 3255 | psa | 49 | CPEC-175 | 2015-11-02 |
| ... | ... | ... | ... | ... | ... |
| 1213766 | 80470238 | psa | 525487 | CPEC-22181 | 2021-12-25 |
| 1213767 | 80470249 | psa | 677934 | CPEC-9248 | 2021-12-25 |
| 1213768 | 80470250 | psa | 677934 | CPEC-17386 | 2021-12-25 |
| 1213769 | 80470277 | psa | 682006 | CPEC-11016 | 2021-12-25 |
| 1213770 | 80470278 | psa | 678228 | CPEA-17314 | 2021-12-25 |
1213771 rows × 5 columns
Next, we need to specify our schema:
[5]:
schema = {
    "user": {
        "account_id": "id",
    },
    "item": {
        "program_id": "id",
    },
    "context": {},
}
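Because the schema refers to columns by name, a typo in a column name is an easy mistake to make. The sketch below is a plain-Python check that every feature named in the schema exists among the data's columns. The helper `missing_schema_columns` is hypothetical and not part of Rexify; it only assumes the three top-level keys used in the schema here.

```python
# Hypothetical helper, not part of Rexify: compares a schema with column names.
def missing_schema_columns(schema, columns):
    """Return the schema feature names that are absent from `columns`."""
    available = set(columns)
    missing = []
    for target in ("user", "item", "context"):
        for feature_name in schema.get(target, {}):
            if feature_name not in available:
                missing.append(feature_name)
    return missing


schema = {
    "user": {"account_id": "id"},
    "item": {"program_id": "id"},
    "context": {},
}

# Column names of the events table loaded earlier.
event_columns = ["id", "type", "account_id", "program_id", "date"]

missing = missing_schema_columns(schema, event_columns)
print(missing)  # [] means every schema feature has a matching column
```

With a pandas DataFrame, the second argument could be `events.columns`.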
To preprocess our data, we can use the FeatureExtractor:
[6]:
from rexify.features import FeatureExtractor
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
Cell In[6], line 1
----> 1 from rexify.features import FeatureExtractor
ImportError: cannot import name 'FeatureExtractor' from 'rexify.features' (/home/docs/checkouts/readthedocs.org/user_builds/rexify/envs/latest/lib/python3.10/site-packages/rexify/features/__init__.py)
We only need to pass it the schema, and it is ready to use.
[7]:
feat = FeatureExtractor(schema=schema)
As a scikit-learn Transformer, it has two main methods: .fit() and .transform(). Calling .fit_transform() is equivalent to calling .fit() and then .transform() on the same data.
During .fit(), it takes the schema and infers what the preprocessing should look like, that is, which transformations to apply to the data before it can be passed to the model. During .transform(), it applies those transformations and returns a NumPy array (numpy.ndarray) with the same number of rows as the original data.
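To make the relationship between the three methods concrete, here is a minimal sketch of the fit/transform contract using a toy encoder written in plain Python. `ToyIdEncoder` is a hypothetical stand-in and does not reflect how the FeatureExtractor is implemented; it only shows that fitting learns state, transforming applies it row by row, and `fit_transform()` chains the two.

```python
class ToyIdEncoder:
    """Toy stand-in for a scikit-learn style transformer (not Rexify's code)."""

    def fit(self, values):
        # Learn one integer index per distinct value, in first-seen order.
        self.index_ = {}
        for value in values:
            if value not in self.index_:
                self.index_[value] = len(self.index_)
        return self  # returning self allows .fit(...).transform(...)

    def transform(self, values):
        # Apply the learned mapping; the output has one entry per input row.
        return [self.index_[value] for value in values]

    def fit_transform(self, values):
        return self.fit(values).transform(values)


program_ids = ["CPEC-54", "CPEC-81", "CPEC-84", "CPEC-54"]

chained = ToyIdEncoder().fit(program_ids).transform(program_ids)
combined = ToyIdEncoder().fit_transform(program_ids)

print(chained)   # [0, 1, 2, 0]
print(combined)  # [0, 1, 2, 0]
```

Both calls return the same result, and the output length matches the number of input rows.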
[8]:
features = feat.fit_transform(events)
features
The .make_dataset() method converts the NumPy array to a tf.data.Dataset in the format the model expects.
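The dataset is usually batched before training, which groups consecutive rows into fixed-size chunks. As a rough illustration of that batching step only, the plain-Python sketch below mimics grouping rows into batches of 512, with a smaller final batch when the row count is not a multiple of the batch size. The `batch` function here is a hypothetical helper, not tf.data, and the 1,200 rows are made-up stand-in data.

```python
from itertools import islice


def batch(rows, batch_size):
    """Yield consecutive lists of at most `batch_size` rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk


rows = list(range(1200))  # stand-in for 1,200 feature rows
batch_sizes = [len(chunk) for chunk in batch(rows, 512)]
print(batch_sizes)  # [512, 512, 176]
```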
[9]:
dataset = feat.make_dataset(features).batch(512)
We can now take our Recommender model and instantiate it.
During .fit(), the FeatureExtractor also infers the parameters the model needs, so we don’t have to set them by hand. They are stored in its model_params property.
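The parameters are handed to the model with Python's `**` dictionary unpacking, which turns each key into a keyword argument. The sketch below shows that mechanism with a hypothetical `ToyRecommender`; the parameter names (`n_users`, `n_items`, `embedding_dim`) are invented for illustration and are not the actual contents of model_params.

```python
class ToyRecommender:
    """Hypothetical model; its parameter names are invented for illustration."""

    def __init__(self, n_users, n_items, embedding_dim=32):
        self.n_users = n_users
        self.n_items = n_items
        self.embedding_dim = embedding_dim


# Stand-in for the dictionary a fitted feature extractor might expose.
model_params = {"n_users": 3, "n_items": 5}

# ** unpacks the dictionary into keyword arguments, so this call is the same
# as ToyRecommender(n_users=3, n_items=5).
toy_model = ToyRecommender(**model_params)
print(toy_model.n_users, toy_model.n_items, toy_model.embedding_dim)  # 3 5 32
```

Any argument missing from the dictionary falls back to its default, and an unexpected key raises a `TypeError`.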
[10]:
from rexify.models import Recommender
[11]:
model = Recommender(**feat.model_params)
Recommender is itself a tensorflow.keras.Model, so we need to compile it before we can fit it:
[12]:
model.compile()
To fit it, all we need to do is pass our tf.data.Dataset:
[13]:
# model.fit(dataset)