Saturday, February 3, 2024

Update on Our Scrapbook Project


Scrapbook is what we call our software/service platform that is part personal information management system, part asset management system, part diary and journal, and part digital keepsake. Here are some existing blog entries describing our Scrapbook project:

Since we started writing about Scrapbook, technologies have come and gone. The technologies we mentioned in the 2017 post, for example, have taken a back seat to the technology of the day: AI agents and large language models. That said, the basic theme we’ve tried to illustrate and encourage throughout the years is that you should take ownership of your own data, regardless of the technology.


Here are some facts and figures about Scrapbook as it stands at the beginning of 2024.

In our main collection “MyJournal”, we have over 16,600 entries distributed over 30 categories. Each entry is a JSON document stored in Azure Cosmos DB, with its associated assets (images, videos, documents, etc.) stored in Azure Blob Storage. (We use other Azure services like registry, key vault, app registration, and more.)
Left: Growth of two Scrapbook collections. Right: Category distribution for MyJournal collection.

There are over 21,000 links between entries. A link can be something like A is related to B or A is in B.

We estimate that we use Scrapbook around 30-60 minutes a day, which means adding new entries, editing existing entries, and looking up information. The amount of development time is another thing!

Costs to run our Scrapbook depend on many factors such as amount of storage, services used, and regions selected to name just a few. For example, if you are setting up a prototype of our Scrapbook with minimal services and redundancy, you might be looking at 20 - 50 USD or so a month as a ballpark figure.

Recent Activity

This is what we’ve been up to since the last post. (Utterances are in quotes. Underlined words are a category type or a synonym of a category. We use # and @ as in other platforms for hashtags and mentions.)


We moved from Azure App Service to Azure Container Apps. Several assumptions we made when using App Service did not hold when going to containers. For example, for our container version of Scrapbook, we needed to rethink our Data Protection Layer and how we were handling and storing session state. Here’s what we followed:

Automated ingest

We found ourselves doing too much manual work to capture data. So we created hooks to pull in different types of data.

For example, our Travelmarx blog RSS feed (for one post) can be pulled in and automatically converted from HTML to markdown with all the photos pulled in to create a new Scrapbook entry. We have a similar hook we use for Spotify API and our Lostvibe music collection. The goal here is to use Scrapbook as a backup of other content creation points or to automate entry creation.
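As a rough illustration of this kind of hook, here is a minimal Python sketch that finds one post in an RSS feed, converts a small subset of its HTML to markdown, and collects the image URLs so they can be pulled in as assets. It is not our actual implementation: the names `MarkdownConverter` and `entry_from_rss`, the shape of the resulting entry, and the sample feed are all assumptions made for the example, and a real hook would need to handle far more HTML than this.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class MarkdownConverter(HTMLParser):
    """Converts a small subset of HTML (p, b/strong, i/em, a, img) to markdown."""

    def __init__(self):
        super().__init__()
        self.parts = []    # markdown fragments, joined at the end
        self.images = []   # image URLs found, to be fetched as assets later
        self.href = None   # target of the <a> tag currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("b", "strong"):
            self.parts.append("**")
        elif tag in ("i", "em"):
            self.parts.append("*")
        elif tag == "a":
            self.href = attrs.get("href", "")
            self.parts.append("[")
        elif tag == "img":
            src = attrs.get("src", "")
            self.images.append(src)
            self.parts.append(f"![{attrs.get('alt', '')}]({src})")

    def handle_endtag(self, tag):
        if tag in ("b", "strong"):
            self.parts.append("**")
        elif tag in ("i", "em"):
            self.parts.append("*")
        elif tag == "a":
            self.parts.append(f"]({self.href})")
            self.href = None
        elif tag == "p":
            self.parts.append("\n\n")

    def handle_data(self, data):
        self.parts.append(data)

    def markdown(self):
        return "".join(self.parts).strip()


def entry_from_rss(rss_xml, link):
    """Build a draft entry (a plain dict) from the feed item whose <link> matches."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        if item.findtext("link") == link:
            converter = MarkdownConverter()
            converter.feed(item.findtext("description", default=""))
            converter.close()
            return {
                "title": item.findtext("title", default=""),
                "source": link,
                "body": converter.markdown(),
                "images": converter.images,
            }
    return None


# Made-up feed content, for illustration only.
SAMPLE_RSS = """<rss version="2.0"><channel>
<item>
  <title>Sample post</title>
  <link>https://example.com/sample-post</link>
  <description>&lt;p&gt;We saw &lt;b&gt;sculptures&lt;/b&gt;.&lt;/p&gt;&lt;img src="https://example.com/a.jpg" alt="Sculpture"&gt;</description>
</item>
</channel></rss>"""

entry = entry_from_rss(SAMPLE_RSS, "https://example.com/sample-post")
print(entry["body"])
```

Keeping the image URLs in a separate list means the download step can run on its own, after the text of the entry has been created.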

Natural language improvements

We rewrote our Natural Language Query Engine (see this post) from scratch to use Conversational Language Understanding (CLU), migrating from Azure LUIS (deprecated).

The query engine now supports
  • Sorting, for example:
    • “Show my wines from France sorted by rating”
    • “Tell me about books sorted by modified date”
    • “What are hikes sorted by length ascending”
  • Compound logic (logical operators in queries)
    • “Show me albums with the keyword cowboy and keyword blue”
  • Numerical comparisons (nice for rating fields)
    • “Show me hikes longer than 20 km”
  • Qualitative comparisons (high, low, great, bad)
    • “Show our favorite hikes greater than 20 km”
  • Inferred queries - “show hashtag XYZ”, “show @person”
    • “Show Greece last year”
    • “Tell me about @Roberto”
  • Negations
    • “Show me items without assets”
    • “Show me activities except for yoga”
    • “Show me food except pasta”
    • “Show me books without ratings”
    • “Show me dining out without lunch”
We know that our NL query engine is a bridge to the time when we can implement Large Language Models (LLMs), which at this point seem like our future.
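To make the query features listed above more concrete, here is a minimal Python sketch of what can happen after an utterance has been understood: the category, conditions, and sort order are held in a small structure and applied to a list of items. This is an illustration under stated assumptions, not our engine and not the CLU output format. The `Condition` and `Query` classes, the comparison names, and the sample items are all hypothetical.

```python
import operator
from dataclasses import dataclass, field
from typing import Optional

# Comparison names a language-understanding step might emit (hypothetical).
COMPARATORS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "lt": operator.lt,
    "contains": lambda value, target: target in value,
}


@dataclass
class Condition:
    field_name: str
    op: str
    target: object
    negate: bool = False  # True for "without ..." or "except ..." utterances

    def matches(self, item):
        value = item.get(self.field_name)
        # A missing field never satisfies the comparison itself.
        result = False if value is None else COMPARATORS[self.op](value, self.target)
        return not result if self.negate else result


@dataclass
class Query:
    category: str
    conditions: list = field(default_factory=list)  # combined with AND
    sort_by: Optional[str] = None
    descending: bool = False

    def run(self, items):
        hits = [
            item for item in items
            if item["category"] == self.category
            and all(c.matches(item) for c in self.conditions)
        ]
        if self.sort_by is not None:
            # Assumes every matching item has the sort field.
            hits.sort(key=lambda item: item[self.sort_by], reverse=self.descending)
        return hits


# Made-up items, for illustration only.
ITEMS = [
    {"category": "hike", "title": "Ridge loop", "length_km": 24, "keywords": ["ridge"]},
    {"category": "hike", "title": "Lake walk", "length_km": 8, "keywords": ["lake"]},
    {"category": "hike", "title": "Summit trail", "length_km": 31, "keywords": ["summit"]},
    {"category": "book", "title": "A novel", "keywords": ["fiction"]},
]

# "Show me hikes longer than 20 km", sorted by length ascending.
long_hikes = Query("hike", [Condition("length_km", "gt", 20)], sort_by="length_km")

# A negation: hikes without the keyword "lake", longest first.
no_lake = Query(
    "hike",
    [Condition("keywords", "contains", "lake", negate=True)],
    sort_by="length_km",
    descending=True,
)

print([hit["title"] for hit in long_hikes.run(ITEMS)])
print([hit["title"] for hit in no_lake.run(ITEMS)])
```

Treating negation as a flag on a condition, rather than as a separate kind of query, keeps “without” and “except” utterances on the same code path as ordinary filters.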

First class assets

We promoted assets to be their own document records in Cosmos.

When we started years ago, Item records were the only document type and everything was packed into them. First, we pulled relationship data (we call them edges) out into their own records. Now we have done the same with assets.

It’s kind of like a normalization you’d do in traditional databases.

Asset document records will help us capture richer data around assets as we move forward without cluttering Item records with asset-related data. Think of AI work, OCR, tagging, etc. Each Asset record can be updated without updating Item records. While partial updates exist in Cosmos, updating an Item just to update Asset info is clumsy. Plus, separate Asset records make it easier to share assets between items.
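Here is a minimal Python sketch of the three record types after this change. The field names (`from_id`, `item_ids`, `blob_path`, and so on) are hypothetical and chosen only for illustration; they are not our actual document schema.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    id: str
    category: str
    title: str


@dataclass
class Edge:
    id: str
    from_id: str
    to_id: str
    relation: str  # for example "relatedTo" or "in"


@dataclass
class Asset:
    id: str
    blob_path: str                                 # where the file lives in blob storage
    item_ids: list = field(default_factory=list)   # an asset can belong to several items
    tags: list = field(default_factory=list)
    ocr_text: str = ""


def assets_for(item_id, assets):
    """Return the assets attached to the given item."""
    return [asset for asset in assets if item_id in asset.item_ids]


# Made-up records, for illustration only.
museum = Item("item-1", "exhibition", "Museum visit")
article = Item("item-2", "article", "Saved article")
link = Edge("edge-1", from_id="item-1", to_id="item-2", relation="relatedTo")
photo = Asset("asset-1", "assets/photo.jpg", item_ids=["item-1", "item-2"])

# Enriching the asset touches only the Asset record; neither Item changes.
photo.tags.append("sculpture")

print([asset.id for asset in assets_for("item-2", [photo])])
```

In this sketch one photo is shared by two items, and adding a tag to it requires no write to either Item record.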

Moving to Blazor

The current version of Scrapbook that we run is based on ASP.NET MVC / Web Forms.

As described here, “...ASP.NET Web Forms framework is based on a page-centric architecture. Each HTTP request for a location in the app is a separate page with which ASP.NET responds.”

On the other hand, “Blazor is a client-side web UI framework similar in nature to JavaScript front-end frameworks like Angular or React. Blazor handles user interactions and renders the necessary UI updates. Blazor isn't based on a request-reply model. User interactions are handled as events that aren't in the context of any particular HTTP request.”

We invested a lot of time in our ASP.NET Core MVC architecture with its server-side Razor pages, JavaScript, and Bootstrap. This has proven to be solid for us, and usable from desktop or mobile. But we are at a point where we can’t make changes as quickly as we’d like and the code is complicated, especially the JavaScript we’ve developed.

So, we’ve been working hard on a Blazor implementation of Scrapbook. There is a big difference between these two approaches. As it goes with new technology, you start rearchitecting and rethinking a lot of what you have done not so well 😊.

Interesting Scenarios

We covered scenarios in the post: Scrapbook Platform – A Personal Information System – Ten User Scenarios. Here we’ll mention just a few others we find particularly useful.

Organizing and documenting trips

We are now using Scrapbook to stub out upcoming trips. If the work turns out to be just planning, that’s fine. We keep it in Scrapbook. Our philosophy is that if it is something we spent time preparing, or something we learned about, we save it. Chances are it could be useful later.

Years later, it’s easy and gratifying to use a simple query to pull up everything related to the trip. For example, with our Natural Language query capability, we can say “Show hashtag #Turkey2023” and pull up all the related items for the trip.

Query "show hashtag Turkey2023". Behind each of these image tiles is as much data about the item as we put in. Who, what, where, when and how.

Staying on the theme of Turkey, we could also query:
  • “Show museums in Istanbul”
  • “Show me hotels in Pamukkale, Turkey”
  • “What were our favorite restaurants within 10 km of Ephesus”
We often get asked by friends for travel recommendations. Our platform makes it easier to pull out this information.

Capturing recurring themes

It’s often the case that the same subject comes up in different contexts (a museum exhibition, a talk) and it captures our attention. For example, in September 2023 we went to a museum in Spoleto where we saw Beverly Pepper sculptures. The name seemed familiar, and it was because we had saved an article 10 years ago in Scrapbook that was about her. It was very gratifying to make the connection between the new entry and the old entry.

Another example is an idea that we return to a lot in different contexts: “memory house”. We can link items dealing with the subject together and create a hashtag to quickly access them.

Left: Query "show hashtag memoryhouse". Right: "show exhibits with hayez".

Other examples:
  • “Show me articles with democracy”
  • “Show me books about naples”
  • “Show me items with Pythagoras”

Collection of ideas

We’ve started thinking that our platform is really a “research platform”. The things it contains could be about physical things, but just as easily about ideas or anything really.

Take our category “Books”. It includes books we’ve read and own and ones we haven’t read or don’t own. For example, when friends recommend books to us, we make notes on them. We may or may not read them, but we always save the information as a “book” entry. Also, we often read a lot about a book and understand its major themes but never actually read it. We make a note of all that information and save it as a book entry. This is the sense of research platform that we are talking about.

Another example is our “Botanical” category. It contains plants we own, plants we used to own, and ones we may have seen in a garden somewhere. The idea is to give all these plants an “entry” in Scrapbook.

Left: Query "show books" in card view. Right: Query "show books in 2005" in list view.

Query "show botanical" in image view.

Other example queries:
  • “Show books read in 2005”
  • “Show plants in the family Crassulaceae”

Museum visits are more interesting

We like going to museums. We often find that we are standing in front of a work (say a painting) and wondering if we’ve seen it before. Often, we think yes, we have. But how to know that? We take out our phones and search for the artist's name in our “exhibition” category. If we find that we have something about the artist, it makes the museum experience richer for us. A connection is made. It’s like having a personal reference guide and context always on hand.

Left: Query "show assets with lorenzo lotto". Right: Query "show assets with picasso".

Other related queries:
  • “Show me exhibits with Picasso sorted by date in ascending order”
  • “Tell me about museums in Milan, Italy”
  • “What are exhibits that we really liked in 2022”
  • “Show exhibits with ‘video installation’”
