I’ve been brushing up on intellectual property and patenting lately (case in point, How to Design Around a Problematic Patent). In this post, I wanted to highlight patent landscaping approaches and explore how they’re evolving with technology.

What is a “Patent Landscape?”

Once you have an idea, you naturally want to know whether someone has already thought of it.

“There is no such thing as a new idea. It is impossible. We simply take a lot of old ideas and put them into a sort of mental kaleidoscope. We give them a turn and they make new and curious combinations. We keep on turning and making new combinations indefinitely; but they are the same old pieces of colored glass that have been in use through all the ages.”
— Mark Twain, Mark Twain’s Own Autobiography

Love that quote, bit of a downer though.

A patent landscape is a high-level review of similar patents in a given space. It takes some work to scope out appropriate keywords and query terms, but with those in hand, one can assess closely related published patents and determine how to move forward. Here is a great visualization from UNC-Chapel Hill’s Market Landscape Research Service using an organic solar technology chemistry example (Reifsnider, Snedden, and Brown 2014):

Patent Landscape Visualization

This visualization is a topographic map of similarity. Ideally, you want to avoid the crowds and carve out your own ‘peak.’ Short of that, crowded areas can be good in that they indicate activity, which presumably translates to commercial value. If an area gets too crowded, however, your slice of the patent landscape becomes smaller and smaller.

The example above uses a commercial product from Clarivate™. I can’t say I understand the details; I’m guessing it’s largely proprietary.

Old School

Conventionally, there are a few different methods for patent searching:

  • Keywords: Using a combination of relevant keywords, Boolean logic, and wildcard operators, you can get pretty far. Defining these greatly benefits from a subject matter expert. For example, searching for “laser” will miss some key patents because the devices were originally described as “optical masers.”
  • Classification: Cooperative Patent Classification (CPC), among others, is an alphanumeric coding scheme that groups similar patents. In the past, I’ve not found this approach practical on its own, but it could be valuable in combination with the other methods listed here. Perhaps it’s most useful for finding primary patents and then reviewing their classification(s).
  • Forward and Backward Citations: A given patent will include both “References Cited” and “Referenced By.” These can be worth exploring, since some of the search work has already been done for you. You might also use this forward/backward approach with an eye for Assignees and/or Inventors.

While there are some great patent search tools out there, Google Patents is pretty solid and easy to use.
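The keyword approach above can be sketched in a few lines. This is a minimal illustration with made-up titles, using a regex as the wildcard operator; a Boolean OR of “laser*” and “mase*” catches the early “optical maser” patents that a plain “laser” search would miss:

```python
import re

# Hypothetical patent titles for illustration.
titles = [
    "Ruby optical maser apparatus",
    "Semiconductor laser diode",
    "Gas masers with improved cavity",
    "Solar cell encapsulation method",
]

# Boolean OR of two wildcard terms, compiled as a case-insensitive regex.
pattern = re.compile(r"\b(laser\w*|mase\w*)", re.IGNORECASE)
hits = [t for t in titles if pattern.search(t)]
print(hits)  # first three titles match; the solar cell patent does not
```

Real search interfaces like Google Patents accept this kind of Boolean/wildcard syntax directly, so the regex here just stands in for what the search engine does behind the scenes.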

New School

Google’s BigQuery has some really interesting public data focused on patents, in addition to many other cool data sources. They published a whitepaper in 2019 that gives a really detailed look at patent landscaping using ML (“Expanding Your Patent Set with ML and BigQuery,” n.d.).

Using this approach, you have more control over the clustering algorithms, but it requires some knowledge of Python and SQL.
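The core idea behind this kind of landscaping is measuring text similarity between patents. Here is a toy sketch of that step, assuming made-up abstracts and a simple bag-of-words representation with cosine similarity (real pipelines use learned embeddings, but the ranking logic is the same):

```python
import math
from collections import Counter

def tf_vector(text):
    # Bag-of-words term frequencies for one document.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical patent abstracts for illustration.
abstracts = {
    "US-1": "organic photovoltaic cell with polymer donor layer",
    "US-2": "polymer donor material for organic solar cells",
    "US-3": "lithium ion battery electrode coating",
}

# Rank patents by similarity to a seed description of our idea.
seed = tf_vector("organic solar cell polymer")
ranked = sorted(
    abstracts,
    key=lambda pid: cosine(seed, tf_vector(abstracts[pid])),
    reverse=True,
)
print(ranked)  # the battery patent lands last
```

Feed similarity scores like these into a clustering or dimensionality-reduction step and you get the “peaks and valleys” topography shown in the landscape maps above.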

Expanding your patent set with ML and BigQuery

This is on my list of projects to try. Seems very cool and I’m sure already in production for companies and patent firms.

CapitalVX

I reviewed an interesting paper this week that takes this concept a few steps further. The authors created a machine learning model to predict outcomes for startup companies, specifically exits like IPO or M&A (Ross et al. 2021). They used a combination of data from Crunchbase and USPTO to train the ML model with a few notable features, including:

  • Average time between funding rounds
  • Number of male / female founders
  • Number of patents
  • Missing data

The authors put some emphasis on using missing data as a feature. While I’ll let the market (i.e., VC folks) decide on its value, I did note the limitations for early-stage companies, shown below:

Ensemble Recall in CapitalVX

Precision and Recall

Worth taking a quick aside to define these two metrics for ML models:

Precision: What proportion of positive identifications was actually correct?

Recall: What proportion of actual positives was identified correctly?
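The two definitions reduce to simple ratios over true positives (TP), false positives (FP), and false negatives (FN). A minimal sketch, with made-up counts for a startup-exit classifier:

```python
def precision_recall(tp, fp, fn):
    # Precision: of everything flagged positive, what fraction was right?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of all actual positives, what fraction did we catch?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Toy example: the model flags 10 startups as likely exits;
# 6 really exited (TP), 4 did not (FP), and it missed 2 real exits (FN).
p, r = precision_recall(tp=6, fp=4, fn=2)
print(p, r)  # 0.6 0.75
```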

Back to CapitalVX’s ML model: there is an inflection point around Series B, so the earlier the stage, the more the model falls short. Seems obvious. Risk is all upfront, particularly when predicting something like an IPO.

The data notably covers over a million companies, spanning tech and other sectors. Biotech and medtech are really different animals with lots of technical risk, so I think it’d be interesting to know how precision and recall differ across sectors.

In this analysis, it’s not entirely clear how heavily patents are weighted in the algorithms. It’s also focused on objective measures, less so on quality of claims, issued claims, etc. Early-stage intellectual property is mostly about demonstrating credibility for the technology, often to investors or funding agencies. If you don’t have it, it’s hard to get far outside of bootstrapping.

The topic of key drivers of startup success reminds me of more traditional research done by Scott Shane at Case Western Reserve University. A future topic perhaps…

References

“Expanding Your Patent Set with ML and BigQuery.” n.d. https://cloud.google.com/blog/products/data-analytics/expanding-your-patent-set-with-ml-and-bigquery.

Reifsnider, Cynthia, Pam Snedden, and Ashley Brown. 2014. “Patent Landscape and Market Evaluation Report.” UNC-Chapel Hill. https://otc.unc.edu/wp-content/uploads/sites/295/2017/09/PLME-example-report-redacted-122014.pdf.

Ross, Greg, Sanjiv Das, Daniel Sciro, and Hussain Raza. 2021. “CapitalVX: A Machine Learning Model for Startup Selection and Exit Prediction.” The Journal of Finance and Data Science 7 (November): 94–114. https://doi.org/10.1016/j.jfds.2021.04.001.