1. Set a goal

Is it an invalidity search, a patentability search, or a freedom-to-operate search? Should the search be exhaustive, exhaustive within a certain field, or simply limited in time or cost? Define a number of search sub-tasks to complete, and budget your time for each. A little planning goes a long way.
2. Scout out search terms by numbers alone

This trick helps you determine what you're up against before running complex keyword/classification database queries. Start with a list of keywords (including synonyms) and classifications, build query "atoms" from the list, and query each atom individually and then in pairs.

For example: Atom "A": "toaster" returns 12,208 records from a database containing 86,619,634 total records, or 0.0141%. Atom "B": "butter or jam or spread" returns 1,801,994 records from the same database, or 2.0804%. This shows that Atom "A" is much more specific than Atom "B" and should be treated as the limiting factor.

But this is only part of the story. If a pair of atoms is unrelated, the expected number of documents returned by a query for both atoms is found by multiplying the percentages of records each returns. For the example above, we would thus expect a query for "toaster AND (butter or jam or spread)" to return 0.0141% × 2.0804% = 0.0003% of all records, or about 254 records. However, the query actually returns 1,689 records, which suggests that these two terms are strongly (positively) correlated. The higher the ratio of actual records to expected records, the stronger the correlation. This can be used to identify terms that go together strongly, or that are rarely encountered together.
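The arithmetic behind this trick can be sketched in a few lines. This is a minimal illustration using the record counts quoted above; the total and per-atom counts would come from your own database in practice.

```python
# Correlation check for two query atoms: compare the actual number of
# co-occurring records against the count expected under independence.

TOTAL = 86_619_634  # total records in the example database

def correlation_ratio(count_a, count_b, count_ab, total=TOTAL):
    """Return (ratio, expected): the expected co-occurrence count if the
    atoms were unrelated, and the ratio of actual to expected."""
    expected = count_a * count_b / total
    return count_ab / expected, expected

# Atom "A" = "toaster", Atom "B" = "butter or jam or spread"
ratio, expected = correlation_ratio(12_208, 1_801_994, 1_689)
print(f"expected ~{expected:.0f} records, actual 1,689, ratio ~{ratio:.1f}")
# A ratio well above 1 means the atoms co-occur far more often than
# chance would predict, i.e. they are strongly positively correlated.
```

For the example, the expected count works out to about 254 records against an actual 1,689, a ratio of roughly 6.6.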
3. Use random sampling to gauge the effectiveness of a search query

Patent databases are closing in on 100 million entries, and even a well-constructed query can return a large number of irrelevant results. You can estimate the effectiveness of a large query without reviewing every single document by randomly choosing a much smaller subset for review. The proportion of relevant results in the subset will be roughly the same as the proportion of relevant results overall, and a higher proportion means a better query. What size should your sample be? The Statistics 101 answer is "as big as you can make it," followed by a set of precise formulas for various situations. You might choose, as a rule of thumb, a sample size of at least 30 documents, at least 50 documents, or a proportion such as 5% to 10% of the total number of results.
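The sampling step can be sketched as follows. This is an illustrative stub, not a real search client: `is_relevant` stands in for the human reviewer's judgement, and the toy result list is fabricated so the true proportion (20% relevant) is known.

```python
import random

def estimate_precision(results, sample_size, is_relevant, seed=None):
    """Estimate the fraction of relevant documents in a result set by
    reviewing only a random sample of it."""
    rng = random.Random(seed)
    sample = rng.sample(results, min(sample_size, len(results)))
    hits = sum(1 for doc in sample if is_relevant(doc))
    return hits / len(sample)

# Toy result set: 1,000 documents, of which 20% are actually relevant.
results = ["relevant"] * 200 + ["noise"] * 800
est = estimate_precision(results, 50, lambda d: d == "relevant", seed=1)
# `est` should land near 0.2 -- reviewing 50 documents instead of 1,000
# gives a rough but serviceable read on the query's precision.
```

Larger samples tighten the estimate: reviewing 400 of the 1,000 toy documents would land reliably within a few percentage points of the true 20%.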
4. Follow forward and backward citations, not just by one degree but by multiple degrees

According to this 2011 article by K. Itakura and G. Shlomo, the average degree of separation between two related patents is 6, while the average distance for a random pair of patents is 15. This suggests that searching within 6 degrees of separation will return some (but not all) related patents. However, the number of patent documents within 6 to 13 degrees of separation of a given "starting" patent is typically very large. Therefore, the recommended approach is to compile a list of patents which fall within X degrees of a "starting" patent, and then search that list using keywords, classifications, etc. This heuristic applies to both patent documents and non-patent academic literature.
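Compiling the within-X-degrees list is a breadth-first traversal of the citation graph. A minimal sketch, assuming a hypothetical `neighbors(patent)` callback that returns both forward and backward citations (in practice, a database or API lookup):

```python
from collections import deque

def patents_within_degrees(start, neighbors, max_degree):
    """Collect every patent reachable within `max_degree` citation hops
    of `start`, following citations in both directions."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        patent, depth = frontier.popleft()
        if depth == max_degree:
            continue  # do not expand beyond the degree limit
        for cited in neighbors(patent):
            if cited not in seen:
                seen.add(cited)
                frontier.append((cited, depth + 1))
    seen.discard(start)  # the starting patent is not its own neighbor
    return seen

# Toy citation graph: A cites B and C; B cites D; D cites E.
graph = {"A": ["B", "C"], "B": ["D"], "C": [], "D": ["E"], "E": []}
two_hops = patents_within_degrees("A", lambda p: graph.get(p, []), 2)
# two_hops == {"B", "C", "D"}: E is three hops away, so it is excluded.
```

The resulting set is what you would then filter with keywords and classifications, rather than reviewing it wholesale.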
(Speaking of citation networks, Amberscope is currently offering a free trial tool for visualizing patent citation networks.)
5. Engage the community

For invalidity searches, Google Patents and StackExchange have teamed up to provide a forum where users can ask others to supply prior art on a particular patent. Likewise, companies such as Article One Partners allow you to put up a $5,000 bounty for prior art relevant to a particular patent.
By Michael Maskery