{ "cells": [ { "cell_type": "markdown", "id": "07e2eea0-dc4a-436c-8605-04df80a20d45", "metadata": {}, "source": [ "# Quickstart\n", "\n", "Let's start by installing Rexify:" ] }, { "cell_type": "code", "execution_count": null, "id": "fee1baf9-f430-44d3-a2f0-82f9cb17f107", "metadata": {}, "outputs": [], "source": [ "!pip install rexify" ] }, { "cell_type": "markdown", "id": "f6ed5c5a-f691-4871-94f3-97895132bf91", "metadata": {}, "source": [ "Get some data:" ] }, { "cell_type": "code", "execution_count": null, "id": "7e7c8d3a-400c-4a6b-bf1f-171c73793c16", "metadata": {}, "outputs": [], "source": [ "!mkdir -p data\n", "!curl --get https://storage.googleapis.com/roostr-ratings-matrices/rexify/completions.csv > data/events.csv" ] }, { "cell_type": "code", "execution_count": null, "id": "e9fbc3cd-e598-4270-a15e-d9a5cfb9ba5f", "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": null, "id": "89e2d0b3-f0fd-4094-b64e-ccca7ae24705", "metadata": {}, "outputs": [], "source": [ "events = pd.read_csv('data/events.csv')\n", "events" ] }, { "cell_type": "markdown", "id": "47ab6ec6-0d08-40c4-83c6-bd797ae40aca", "metadata": {}, "source": [ "Next, we need to specify our schema:" ] }, { "cell_type": "code", "execution_count": null, "id": "09a944b4-045a-49c0-9e6a-efa2f2be14ae", "metadata": {}, "outputs": [], "source": [ "schema = {\n", "    \"user\": {\n", "        \"account_id\": \"id\",\n", "    },\n", "    \"item\": {\n", "        \"program_id\": \"id\",\n", "    },\n", "    \"context\": {}\n", "}" ] }, { "cell_type": "markdown", "id": "ea75dc34-0aa3-4d2f-a938-12734d57bff9", "metadata": {}, "source": [ "To preprocess our data, we can use the `FeatureExtractor`:" ] }, { "cell_type": "code", "execution_count": null, "id": "cbb99040-4e6c-42f9-87dc-1cbe033989b6", "metadata": {}, "outputs": [], "source": [ "from rexify.features import FeatureExtractor" ] }, { "cell_type": "markdown", "id": "616e0441-d2ef-4d2d-8524-35635ed310a1", "metadata": {}, "source": [ "We 
just need to pass it the schema, and it's ready to use." ] }, { "cell_type": "code", "execution_count": null, "id": "0198ea5f-bd27-4304-a4ae-9218fcccc7eb", "metadata": {}, "outputs": [], "source": [ "feat = FeatureExtractor(schema=schema)" ] }, { "cell_type": "markdown", "id": "40911616-99d7-4510-8946-7219d507b87b", "metadata": {}, "source": [ "As a scikit-learn transformer, it has two main methods: `.fit()` and `.transform()`. Calling `.fit_transform()` is essentially equivalent to calling `.fit()` followed by `.transform()`.\n", "\n", "During `.fit()`, it takes the schema and infers what the preprocessing should look like, that is, which transformations to apply to the data before it is passed to the model. During `.transform()`, it applies those transformations, resulting in a `numpy.ndarray` with the same number of rows as the original data." ] }, { "cell_type": "code", "execution_count": null, "id": "8f12e2f1-a724-4139-9102-009b11cda8df", "metadata": {}, "outputs": [], "source": [ "features = feat.fit_transform(events)\n", "features" ] }, { "cell_type": "markdown", "id": "011cd59c-d754-4a22-af0a-de65e81b68f3", "metadata": {}, "source": [ "The `.make_dataset()` method converts the NumPy array to a `tf.data.Dataset` in the format the model expects." ] }, { "cell_type": "code", "execution_count": null, "id": "213b3c47-d612-41d1-a2f1-015f6c0b9b92", "metadata": {}, "outputs": [], "source": [ "dataset = feat.make_dataset(features).batch(512)" ] }, { "cell_type": "markdown", "id": "d356f43c-a722-4bfd-bb0c-12a081d39316", "metadata": {}, "source": [ "We can now take our `Recommender` model and instantiate it.\n", "\n", "During `.fit()`, our `FeatureExtractor` also learns the right model parameters, so we don't need to worry about them. They're stored in the `model_params` property." 
] }, { "cell_type": "code", "execution_count": null, "id": "b1826f76-56a2-44a9-bf49-0854ce1c678a", "metadata": {}, "outputs": [], "source": [ "from rexify.models import Recommender" ] }, { "cell_type": "code", "execution_count": null, "id": "73ff6889-8fc9-4cdf-bf5e-3be307e03235", "metadata": {}, "outputs": [], "source": [ "model = Recommender(**feat.model_params)" ] }, { "cell_type": "markdown", "id": "59a0a545-6e0d-4b3d-927e-0282e7760820", "metadata": {}, "source": [ "Since `Recommender` is itself a `tensorflow.keras.Model`, we need to compile it before we can fit it:" ] }, { "cell_type": "code", "execution_count": null, "id": "62e89747-42fb-4fee-a49f-56328f208b5c", "metadata": {}, "outputs": [], "source": [ "model.compile()" ] }, { "cell_type": "markdown", "id": "d507a703-afa6-44f9-b24c-7362971da047", "metadata": {}, "source": [ "To fit the model, all we need to do is pass it our `tf.data.Dataset`:" ] }, { "cell_type": "code", "execution_count": null, "id": "0d1ef245-2b9c-4bd0-a256-60595a0b699f", "metadata": {}, "outputs": [], "source": [ "# model.fit(dataset)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.10" } }, "nbformat": 4, "nbformat_minor": 5 }