Developing the Gridsome Recommender Plugin

Overview

Intro

When users land on one of your blog or product pages coming from Google or other sources, they are interested in that very specific page. Whether or not they found what they were looking for, they usually leave the website and move on.

The time they stay on your site is measured as Average Time on Page or Average Session Duration. Especially if your website is ad-driven, you want to keep each user on your website as long as possible so you can show them as many ads as possible.

This is not only crucial for ad-driven websites but also for brands and blogs, as you want users to engage with your site, build a deeper connection and get "hooked".

Improving Average Time on Page

To improve the Average Session Duration of each visit, we need to prevent users from leaving quickly. One way of doing so is recommending other sub-pages of our static site that are related to the page they are interested in.

The problem with Static Sites

In this case static sites have a "disadvantage". Incoming user requests won't hit any server that would let us analyze them and build relations based on certain parameters.

Instead we have to analyze all posts or products and generate all possible relations between them in advance during build time.

This is why I built the Gridsome Recommender Plugin, which fills exactly that need.

It uses text analysis to find related contexts in given fields of a Gridsome Collection, whether it is the title, description or even the body / content of a post or product. Thresholds allow you to adjust the sensitivity of the matching algorithm, which may vary depending on the content length.
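To make the idea of text analysis with a threshold more tangible, here is a rough sketch of how such a matching step could look. It assumes tf-idf weighting combined with cosine similarity; all function names are made up for illustration and this is not the plugin's actual code.

```javascript
// Simplified sketch of content-based matching, NOT the plugin's real implementation.
// tokenize, buildVectors, cosine and relate are hypothetical helper names.

// Split a text into lowercase word tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Build one tf-idf vector (Map of term -> weight) per document.
function buildVectors(docs, field) {
  const tokenLists = docs.map((doc) => tokenize(doc[field]));
  const docFrequency = new Map();
  for (const tokens of tokenLists) {
    for (const term of new Set(tokens)) {
      docFrequency.set(term, (docFrequency.get(term) || 0) + 1);
    }
  }
  return tokenLists.map((tokens) => {
    const vector = new Map();
    for (const term of tokens) {
      vector.set(term, (vector.get(term) || 0) + 1);
    }
    for (const [term, count] of vector) {
      // Smoothed idf: rare terms weigh more than terms found in many documents.
      const idf = Math.log((1 + docs.length) / (1 + docFrequency.get(term))) + 1;
      vector.set(term, count * idf);
    }
    return vector;
  });
}

// Cosine similarity between two sparse vectors.
function cosine(a, b) {
  let dot = 0;
  for (const [term, weight] of a) {
    if (b.has(term)) dot += weight * b.get(term);
  }
  const norm = (v) => Math.sqrt([...v.values()].reduce((sum, w) => sum + w * w, 0));
  const denominator = norm(a) * norm(b);
  return denominator === 0 ? 0 : dot / denominator;
}

// For every document, keep the best matches above minScore, capped at maxRelations.
function relate(docs, { field, minScore, maxRelations }) {
  const vectors = buildVectors(docs, field);
  const relations = {};
  docs.forEach((doc, i) => {
    relations[doc.id] = docs
      .map((other, j) => ({ id: other.id, score: cosine(vectors[i], vectors[j]) }))
      .filter((match) => match.id !== doc.id && match.score >= minScore)
      .sort((x, y) => y.score - x.score)
      .slice(0, maxRelations);
  });
  return relations;
}

const posts = [
  { id: 'a', title: 'Introduction to Machine Learning' },
  { id: 'b', title: 'Machine learning and its application' },
  { id: 'c', title: 'How Python saved my life' },
];

const relations = relate(posts, { field: 'title', minScore: 0.1, maxRelations: 3 });
console.log(relations);
```

In this sketch the two machine learning posts end up related to each other, while the Python post shares no words with the others and gets no relations. Short fields such as titles produce few overlapping terms, which is one reason the threshold may need tuning per field.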

The simplest example is loading one collection of posts and setting up the recommender:

        {
            use: '@gridsome/source-filesystem',
            options: {
                typeName: 'BlogPost',
                path: './content/blog/**/*.md',
            }
        },
        {
            use: 'gridsome-plugin-recommender',
            options: {
                enabled: true,
                debug: true,
                typeName: 'BlogPost',
                field: 'title',
                relatedFieldName: 'related',
                minScore: 0.1,
                maxRelations: 3,
            }
        },

With this configuration we are telling the Gridsome Recommender Plugin to use the BlogPost collection loaded above and train a new model based on its title field. Similar posts lead to relations that are stored in each node as a "related" field in our GraphQL schema for further usage.

Taking this site as an example, we are using that information to render a "Related Posts" widget that sits at the bottom of each post page.
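As a framework-agnostic illustration of such a widget, the following sketch turns a node's related entries into an HTML list. It assumes that every related entry exposes a title and a path, which depends on your own collection setup; the helper names are hypothetical. In a real Gridsome site you would typically loop over the same array in your Vue template instead.

```javascript
// Hypothetical helpers; assumes each related entry has `title` and `path` properties.

// Escape characters that have a special meaning in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Render the widget markup, or an empty string if nothing is related.
function renderRelatedPosts(post) {
  const related = post.related || [];
  if (related.length === 0) return '';
  const items = related
    .map((entry) => `<li><a href="${escapeHtml(entry.path)}">${escapeHtml(entry.title)}</a></li>`)
    .join('');
  return `<aside class="related-posts"><h3>Related Posts</h3><ul>${items}</ul></aside>`;
}

const examplePost = {
  title: 'Machine learning and its application',
  related: [
    { title: 'Introduction to Machine Learning', path: '/blog/intro-to-ml/' },
  ],
};

console.log(renderRelatedPosts(examplePost));
```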

Taking things one step further (Auto-Tagging & Contextual Ads) (Part II)

With our first task being accomplished, we are now able to automatically find relationships within one Post Collection to create Post-To-Post Relationships.

So I thought: if we were able to create relations between different collections, a bunch of new use cases would become possible.

Changes made to the Content-Based-Recommender library

The Recommender Plugin is based on https://github.com/stanleyfok/content-based-recommender created by Stanley Fok, which originally supported training on a single collection only.

I've created a Pull Request that allows training on two datasets simultaneously and finding similarities between the entries of the first collection and those of the opposite collection.

The feature is covered by the trainBidirectional() function:

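const ContentBasedRecommender = require('content-based-recommender');

const collectionA = [
  {
    id: '1',
    content: 'Some content',
  }
];

const collectionB = [
  {
    id: '1',
    content: 'Some other content',
  }
];

const recommender = new ContentBasedRecommender();

// Train on both collections at once
recommender.trainBidirectional(collectionA, collectionB);

// Look up the entries related to the entry with id '1'
recommender.getSimilarDocuments('1');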

Upgrading the Gridsome Recommender Plugin to support multi-collections

After the changes made to the content-based-recommender library, I was able to upgrade the gridsome-plugin-recommender and enhance it to support multiple Gridsome Collections.

The "API" of the plugin can now be configured the following way:

        {
            use: 'gridsome-plugin-recommender',
            options: {
                enabled: true,
                typeName: 'CollectionA',
                field: 'title',
                referenceTypeName: 'CollectionB',
                referenceField: 'title',
                minScore: 0.1,
                maxRelations: 3,
            }
        },

This will create matches between collection A and B based on the title field of each entity and store references to the opposite collection in a node.related array.
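The following sketch illustrates what matching across two collections means in practice. To keep it short it assumes a plain word-overlap (Jaccard) score as a stand-in for the library's real scoring, and every function name is hypothetical. Each node ends up with a related array holding ids from the opposite collection.

```javascript
// Sketch only: word overlap stands in for the library's scoring, names are hypothetical.

// Turn a text into a set of lowercase words.
function tokens(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) || []);
}

// Jaccard similarity: shared words divided by all distinct words.
function overlapScore(a, b) {
  const shared = [...a].filter((word) => b.has(word)).length;
  const total = new Set([...a, ...b]).size;
  return total === 0 ? 0 : shared / total;
}

// Attach to every source node the ids of the best matching target nodes.
function attachMatches(sources, sourceField, targets, targetField, minScore, maxRelations) {
  return sources.map((node) => ({
    ...node,
    related: targets
      .map((target) => ({
        id: target.id,
        score: overlapScore(tokens(node[sourceField]), tokens(target[targetField])),
      }))
      .filter((match) => match.score >= minScore)
      .sort((x, y) => y.score - x.score)
      .slice(0, maxRelations)
      .map((match) => match.id),
  }));
}

// Relate both collections to each other, once in each direction.
function relateCollections(collectionA, collectionB, options) {
  const { field, referenceField, minScore, maxRelations } = options;
  return {
    a: attachMatches(collectionA, field, collectionB, referenceField, minScore, maxRelations),
    b: attachMatches(collectionB, referenceField, collectionA, field, minScore, maxRelations),
  };
}

const collectionA = [
  { id: 'p1', title: 'The future of Bitcoin technology' },
  { id: 'p2', title: 'How Python saved my life' },
];
const collectionB = [
  { id: 't1', title: 'Bitcoin' },
  { id: 't2', title: 'Python' },
  { id: 't3', title: 'future' },
];

const result = relateCollections(collectionA, collectionB, {
  field: 'title',
  referenceField: 'title',
  minScore: 0.1,
  maxRelations: 3,
});
console.log(JSON.stringify(result, null, 2));
```

Note that ids only need to be unique within their own collection, since a node's related array always points to the other side.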

Use Case #2 - Auto Tagging for Static Sites

One way to use the new functionality is to let the plugin automatically tag or categorize your blog posts (as an example, this blog shows a tag list under each article).

This makes a lot of sense if you have created many tags, you are showing tag pages with a list of all related posts, and you don't want to manually find and add each tag. You also don't want to forget important tags.

Focus on the blog post instead and let the plugin do the categorization for you :)

As an example, here we are defining two collections, one of posts and one of tags, and passing them to the content-based-recommender to train our model.

const pseudoPostCollection =  [
  {
    id: '1000001',
    content: 'Why studying javascript is fun?',
  },
  {
    id: '1000002',
    content: 'The trend for javascript in machine learning',
  },
  {
    id: '1000003',
    content: 'The most insightful stories about JavaScript',
  },
  {
    id: '1000004',
    content: 'Introduction to Machine Learning',
  },
  {
    id: '1000005',
    content: 'Machine learning and its application',
  },
  {
    id: '1000006',
    content: 'Python vs Javascript, which is better?',
  },
  {
    id: '1000007',
    content: 'How Python saved my life?',
  },
  {
    id: '1000008',
    content: 'The future of Bitcoin technology',
  },
  {
    id: '1000009',
    content: 'Is it possible to use javascript for machine learning?',
  },
];

const pseudoTagCollection = [
  {
    id: '1',
    content: 'Javascript',
  },
  {
    id: '2',
    content: 'machine learning',
  },
  {
    id: '3',
    content: 'application',
  },
  {
    id: '4',
    content: 'introduction',
  },
  {
    id: '5',
    content: 'future',
  },
  {
    id: '6',
    content: 'Python',
  },
  {
    id: '7',
    content: 'Bitcoin',
  },
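];

const ContentBasedRecommender = require('content-based-recommender');

const recommender = new ContentBasedRecommender();

// Train posts against tags so that each post can be matched with tags
recommender.trainBidirectional(pseudoPostCollection, pseudoTagCollection);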

The full example can be seen here

Under the hood, the recommender plugin does exactly what we've seen above. We just need to tell it which collections it has to operate on and which fields it has to use to train our model.

The Gridsome Recommender Plugin example below shows how to use it.

        {
            use: '@gridsome/source-filesystem',
            options: {
                typeName: 'BlogPost',
                path: './content/blog/**/*.md',
            }
        },
        {
            use: '@gridsome/source-filesystem',
            options: {
                typeName: 'Tag',
                path: './content/tags/**/*.md',
            }
        },
        {
            use: 'gridsome-plugin-recommender',
            options: {
                enabled: true,
                debug: true,
                typeName: 'BlogPost',
                field: 'title',
                referenceTypeName: 'Tag',
                referenceField: 'title',
                relatedFieldName: 'tags',
                minScore: 0.1,
                maxRelations: 3,
            }
        },

We've defined two collections that are loaded from markdown files using the @gridsome/source-filesystem plugin, and we've set the typeNames to BlogPost and Tag.

We are now telling the gridsome-plugin-recommender which collections it should operate on and which fields it should use for text analysis, which in both cases is the title field.

By default the recommender plugin will store relations in the related field in each node, which is perfectly fine for the Tag-BlogPost relationship, but for the opposite BlogPost-Tag relationship we want to call our field tags. We do so by setting the relatedFieldName property.
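The fallback behaviour described above can be pictured with a small sketch. Only the default name related is taken from the text; the helper functions and the way the options are merged are assumptions made for illustration, not the plugin's internals.

```javascript
// Hypothetical option handling; only the 'related' default comes from the article.
const DEFAULT_OPTIONS = { relatedFieldName: 'related' };

// User options win over defaults.
function resolveOptions(userOptions) {
  return { ...DEFAULT_OPTIONS, ...userOptions };
}

// Store the matched ids on the node under the configured field name.
function attachRelations(node, relatedIds, options) {
  return { ...node, [options.relatedFieldName]: relatedIds };
}

const withDefault = resolveOptions({ typeName: 'BlogPost' });
const withCustomName = resolveOptions({ typeName: 'BlogPost', relatedFieldName: 'tags' });

const post = { id: '1000008', title: 'The future of Bitcoin technology' };

console.log(attachRelations(post, ['5', '7'], withDefault));
console.log(attachRelations(post, ['5', '7'], withCustomName));
```

The first call stores the ids in a related property, the second one in a tags property, which is the naming we want for the BlogPost-Tag direction.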

This demo shows the Recommender Plugin in Action

You can see the Example Source Code here

Use Case #3 - Contextual Ads for Static Sites

Sometimes we want to promote products or services in our blog posts or articles, but we likely want to reuse the same advertisement on multiple pages, which is why defining them in our blog markdown doesn't make sense.

Instead we define a new collection, in our case "Books", whose entries have the following properties:

---
id: NB07PywgC
title: Clean Code - A Handbook of Agile Software Craftsmanship (Robert C. Martin)
url: https://www.amazon.de/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882
image: https://images-na.ssl-images-amazon.com/images/I/41-+g1a2Y1L._SX375_BO1,204,203,200_.jpg
---
Even bad code can function. But if code isn’t clean, it can bring a development organization to its knees. Every year, countless hours and significant resources are lost because of poorly written code. But it doesn’t have to be that way. Noted software expert Robert C. Martin presents a revolutionary paradigm with Clean Code: A Handbook of Agile Software Craftsmanship . Martin has teamed up with his colleagues from Object Mentor to distill their best agile practice of cleaning code “on the fly” into a book that will instill within you the values of a software craftsman and make you a better programmer—but only if you work at it. What kind of work will you be doing? You’ll be reading code—lots of code. And you will be challenged to think about what’s right about that code, and what’s wrong with it. More importantly, you will be challenged to reassess your professional values and your commitment to your craft. Clean Code is divided into three parts. The first describes the principles, patterns, and practices of writing clean code. The second part consists of several case studies of increasing complexity. Each case study is an exercise in cleaning up code—of transforming a code base that has some problems into one that is sound and efficient. The third part is the payoff: a single chapter containing a list of heuristics and “smells” gathered while creating the case studies. The result is a knowledge base that describes the way we think when we write, read, and clean code. 
Readers will come away from this book understanding:
- How to tell the difference between good and bad code
- How to write good code and how to transform bad code into good code
- How to create good names, good functions, good objects, and good classes
- How to format code for maximum readability
- How to implement complete error handling without obscuring code logic
- How to unit test and practice test-driven development

This book is a must for any developer, software engineer, project manager, team lead, or systems analyst with an interest in producing better code.

We again configure the gridsome-plugin-recommender, this time with the "Book" collection as the reference collection, and we train against the content rather than the title, as book titles can be rather short and do not tell much about the content itself.

    {
      use: 'gridsome-plugin-recommender',
      options: {
        enabled: true,
        debug: false,
        typeName: 'Post',
        field: 'content',
        relatedFieldName: 'recommended_books',
        referenceTypeName: 'Book',
        referenceField: 'content',
        minScore: 0.01,
        fillWithRandom: false,
        maxRelations: 3,
      }
    },

On this site you can see the matched books in the right sidebar.

The advantage of this approach is that we are showing users ads they are likely interested in: they have already found their way to our article, and the book covers the same or similar topics as the article.

We do so without intruding on the privacy of our users, unlike ad services such as Google AdSense that target users by creating cookies in the browser.

Known Restrictions

This plugin operates on the collection layer of Gridsome, which is one step before the GraphQL schema is created from the data model. This means all sources that are plugged in via GraphQL directly will run after this plugin and therefore cannot be used with the recommender plugin (see the related GitHub Issue).

I currently don't see any way to make it work on the GraphQL layer directly, and I'm not sure if that is even possible with the current Gridsome API.
