{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "07e2eea0-dc4a-436c-8605-04df80a20d45",
   "metadata": {},
   "source": [
    "# Quickstart\n",
    "\n",
    "Let's start by installing Rexify"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fee1baf9-f430-44d3-a2f0-82f9cb17f107",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install rexify"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f6ed5c5a-f691-4871-94f3-97895132bf91",
   "metadata": {},
   "source": [
    "Get some data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7e7c8d3a-400c-4a6b-bf1f-171c73793c16",
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir data\n",
    "!curl --get https://storage.googleapis.com/roostr-ratings-matrices/rexify/completions.csv > data/events.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e9fbc3cd-e598-4270-a15e-d9a5cfb9ba5f",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "89e2d0b3-f0fd-4094-b64e-ccca7ae24705",
   "metadata": {},
   "outputs": [],
   "source": [
    "events = pd.read_csv('data/events.csv')\n",
    "events"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "47ab6ec6-0d08-40c4-83c6-bd797ae40aca",
   "metadata": {},
   "source": [
    "Next, we need to specify our schema:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "09a944b4-045a-49c0-9e6a-efa2f2be14ae",
   "metadata": {},
   "outputs": [],
   "source": [
    "schema = {\n",
    "    \"user\": {\n",
    "        \"account_id\": \"id\",\n",
    "    },\n",
    "    \"item\": {\n",
    "        \"program_id\": \"id\",\n",
    "    },\n",
    "    \"context\": {}\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ea75dc34-0aa3-4d2f-a938-12734d57bff9",
   "metadata": {},
   "source": [
    "To preprocess our data, we can use the `FeatureExtractor`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cbb99040-4e6c-42f9-87dc-1cbe033989b6",
   "metadata": {},
   "outputs": [],
   "source": [
    "from rexify.features import FeatureExtractor"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "616e0441-d2ef-4d2d-8524-35635ed310a1",
   "metadata": {},
   "source": [
    "We just need to pass it the schema, and it's ready to roll out."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0198ea5f-bd27-4304-a4ae-9218fcccc7eb",
   "metadata": {},
   "outputs": [],
   "source": [
    "feat = FeatureExtractor(schema=schema)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "40911616-99d7-4510-8946-7219d507b87b",
   "metadata": {},
   "source": [
    "As a scikit-learn Transformer, it has two main methods: `.fit()` and `.transform()`. What `.fit_transform()` essentially does is: `.fit().transform()`.\n",
    "\n",
    "During `.fit()`, it will take the schema, and infer what the preprocessing should look like - what transformations it should apply to the data before it's ready to be passed to the model. During `.transform()` it will apply those transformations, resulting in a `numpy.array` with the same number of rows as the original data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8f12e2f1-a724-4139-9102-009b11cda8df",
   "metadata": {},
   "outputs": [],
   "source": [
    "features = feat.fit_transform(events)\n",
    "features"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "011cd59c-d754-4a22-af0a-de65e81b68f3",
   "metadata": {},
   "source": [
    "The `.make_dataset()` method converts the numpy array to a `tf.data.Dataset` with the format it's expecting."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "213b3c47-d612-41d1-a2f1-015f6c0b9b92",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset = feat.make_dataset(features).batch(512)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d356f43c-a722-4bfd-bb0c-12a081d39316",
   "metadata": {},
   "source": [
    "We can now take our `Recommender` model and instantiate it.\n",
    "\n",
    "During `.fit`, our `FeatureExtractor` also learns the right model parameters, so we don't need to worry about them. They're stored in the `model_params` property."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b1826f76-56a2-44a9-bf49-0854ce1c678a",
   "metadata": {},
   "outputs": [],
   "source": [
    "from rexify.models import Recommender"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "73ff6889-8fc9-4cdf-bf5e-3be307e03235",
   "metadata": {},
   "outputs": [],
   "source": [
    "model = Recommender(**feat.model_params)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "59a0a545-6e0d-4b3d-927e-0282e7760820",
   "metadata": {},
   "source": [
    "Being a `tensorflow.keras.Model` itself, in order to fit it, we need to first compile it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "62e89747-42fb-4fee-a49f-56328f208b5c",
   "metadata": {},
   "outputs": [],
   "source": [
    "model.compile()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d507a703-afa6-44f9-b24c-7362971da047",
   "metadata": {},
   "source": [
    "To fit it, all we need to do is pass our `tf.data.Dataset`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0d1ef245-2b9c-4bd0-a256-60595a0b699f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# model.fit(dataset)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}