PATENT PICTURES
It's patently good news
Making pictures out of the words in patents is changing the way high-tech companies do business. And it's good for the patent searchers themselves, Peter Rees discovers
Patents are wordy things. And comparing and contrasting large numbers of words has always been tough. But before words were written down, there were pictures - and maps (the earliest map, from Catal Hoyuk, in Anatolia, predates writing by 3,000 years). Now, patent searchers are increasingly turning to software-generated maps, and other visualisation tools, in order to make sense of endless pages of new patent information available over the Internet. This is bringing a different character to the profession, says Stuart Dodd, vice-president of professional services at MicroPatent, the US provider of patent information: 'Search professionals have now been tasked, not with giving reams of paper, but with giving analysis'.
These visualisation tools are helping to spread patent-searching to staff at almost every level in a company but, rather than making search professionals redundant, this makes them more valuable, argues Dodd. 'It takes human intervention to decipher what's coming back. They elevate the status of researchers and give them more visibility, and more recognition for the value of the information they provide.' The visualisations complement Boolean searching. 'They sit on top of the dataset you created, so that then you can use Boolean searches within that visualisation to probe and analyse further. It creates the ability to do your searches iteratively that much faster - to go back and refine the search strategies'.
MicroPatent is something of a latecomer to the visualisation party, but in mid-2002 it quickly jumped to the front rank with the $12.4m acquisition, from the bankruptcy courts, of text-mining pioneer Aurigin, after bidding against Thomson and Reuters.
The mid-1990s saw the birth of Aurigin, which provided a searchable PDF patent called SmartPatent. It was centred primarily on the legal market, as an information tool for lawyers going to court. 'This was the genesis of Aureka,' says Dodd. Whereas in the mid-1990s MicroPatent was concerned with putting as much content as possible on the Web, and providing an interface that people could use to search online, Aureka developed along different lines, 'providing content but with visualisation capabilities.' The software went through numerous generations: starting as a client/server application in the mid-1990s, it continued to evolve, eventually becoming a completely Web-based architecture.
One of the key tools in Aureka is a text-mining module called ThemeScape. Aurigin brought this out to attract business customers by helping corporate strategists to: defend intellectual property against competitors; understand what business areas to operate in; and explore where to license out or buy-in technology. 'So IP started into the strategic marketing and financial areas of the firm, working in tandem with legal librarians, thereby making IP an integral part of the business decision process of the corporation,' says Dodd.
It helped solve the problem of how to compare, in a simple way, the portfolio of company A with that of company B, when tens of thousands of patents are involved. 'You want to be able to show to your management team, to your executive board, the lie of the landscape for intellectual property holdings. You can't do that with a search report the librarian's run,' says Dodd. 'Corporate strategists don't need to care about the details of what's in each patent, they need to see at a very macro level'. ThemeScape produces pseudo-3D maps - like those used by geographers and walkers - with contoured hills representing the patent themes identified.
This sort of analysis has changed the way companies approach mergers and acquisitions. In the standard due diligence process, the intellectual property portfolio was at the tail-end of the review. Now, says Dodd: 'We see more and more companies, when they're doing acquisitions, say "take my portfolio; see how that company I'm going to acquire overlaps" or "show me my portfolio and show me all the potential candidates that I'm looking at, and show me how those portfolios overlap and which ones should I be buying", based on their IP'.
The overview provided by ThemeScape can be probed and analysed, including with other software tools, to answer more detailed questions asked by less elevated users. Having done a search on 10,000 documents, and seen a rough landscape, users may want to focus on one small subset of patents (see Figure 1 below).
- Figure 1: Technology analysis by using comparative colours to highlight the position of technologies and patent assignees
For example, by selecting a keyword, users can overlay the contour map with coloured dots, each representing a patent containing that keyword. Alternatively, a contour level can be highlighted and all the patents contained within it selected for citation analysis. This could include how these patents are referred to by other firms (see Figure 2 below).
- Figure 2: Highly cited patents with a cite tree; both backward and forward citations appear on the same screen
The visualisations derived from citation analysis yield information about: the companies involved in an area; what technology is hot and what's not; and who are the key inventors (see Figure 3 below). For example, if a company can find a key inventor who is independent of a company - an academic perhaps - there may be potential to form a partnership. Patent citations can serve as an analogue for their potential market value. 'Citation says a lot, because it's generated by third parties,' says Dodd.
- Figure 3: Using IPC analysis with citation trees to highlight new technology trends
With this sort of detailed information: 'Directors of R&D, competitive intelligence/business intelligence people, can then really start to make business decisions on IP portfolios,' says Dodd. Such decisions could include whether to enter a new sector and how to formulate a marketing plan for the attempt. 'In the past, companies typically, and this is probably five years ago, would not really look at the IP in this detail.' Visualisation tools change that, by showing exactly who a company's rivals are, and the evolution and direction of an industry or technological area. 'You know their IP portfolio, you know what they've patented and you know, if you move into that market area, you would have to patent as well,' says Dodd.
The software can help answer the sort of questions that keep R&D directors awake at night: 'Am I betting on the right technology? Is there someone else in this market that I'm not aware of?' 'There's no other public information that's as germane or as standardised as the patent information to gather this input,' says Dodd. 'If I'm a multinational pharmaceutical company, I could say, "pull up some of the chemical compounds they [my competitors] may have, see who's doing some patenting around that," or if I'm a large telco, "pull up some of the cellphone technology, to see what others in this space are patenting".' This helps research directors answer the question: "Should I continue to invest in this technology?"
'One thing we've heard from our customers, especially in the R&D community, is how much time the tools save them,' says Dodd. Work that might previously have taken weeks or even months - depending on the complexity of the project - can be done in hours or days. At the same time, the tools expand the scope of the investigation, says Dodd. 'When you start the project, you have some preconceived notion of what you might get out of it. But you might be focusing just on that area, and it opens up a much broader set of possibilities that would allow a researcher to go and find things that he or she wouldn't have thought of, because of the visualisation aspect of the tool.'
For the future, MicroPatent's President, Daniel Videtto, sees a number of ways in which Aureka might change and adapt. Like other large-scale information providers, MicroPatent is seeing an increasing interest in taking patent-search tools in-house. The search application can already sit behind a customer's firewall, with only the patent data remaining with MicroPatent and accessed via the Web (and there are even some that bring customised data in-house). This allows customers to store and analyse corporate and other non-patent documents. Tighter integration, between internal databases and search and visualisation software, is the wave of the future, says Videtto. Another area of interest is putting a value on patents. 'There's so much research going on right now about how to put a monetary value on IP. IP monetisation is something people need to understand. We see this as an important area for growth, and plan to be at the forefront of it.'
This is part of the greater penetration of intellectual property management into overall business strategy. 'It's about how we moved from providing data to providing analysis. Really, it's about how you extract value and make all this actionable, not just providing that patent report, but: "Should I invest in R&D technology? Should I go to acquire company XYZ? Should I outlicense my patents? What's the value of my patents? How do I integrate my patents? Should I be maintaining this large portfolio?",' reels off Dodd. And it's following a well-trodden path for information systems. 'It's almost like the customer-relationship management industry or ERP [enterprise resource planning] or HR systems, they were very focused on a functional area, then ERP moved from manufacturing to finance'. It's the same with intellectual property. 'It's not only the domain of the researcher or the librarian, but it's for people across the organisation, expanding enterprise-wide solutions.'
Algorithms used in ThemeScape
ThemeScape uses three main algorithms to analyse the words used in patents, and to identify themes and relationships between documents, in a four-step process of harvesting, analysing, clustering, and mapping.
In order to cut the time needed to process thousands of documents, words that aren't essential to classifying them - so-called stopwords - are filtered out. Another trick is to group associated words together. After this the first algorithm, TF-IDF (term frequency-inverse document frequency), gets to work. This scores the words in each document, in order to select concepts that represent the whole document. Each word is scored by the number of times it appears in the document (the 'term frequency') multiplied by the log of the total number of documents divided by the number of documents containing the word (the 'inverse document frequency'). Words that appear in too many documents are removed, leaving the remaining words as 'topics'. There may be 100 to 300 of these, depending on the size of the original document sample.
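The scoring step described above can be illustrated with a short Python sketch. It is not MicroPatent's code: the stopword list, the whitespace tokeniser, the function names and the `max_doc_share` cut-off are all assumptions made for illustration, and the grouping of associated words is left out.

```python
import math
from collections import Counter

# Illustrative stopword list; the article does not say which words ThemeScape filters.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def tokenize(text):
    """Lower-case the text, split on whitespace and drop stopwords."""
    return [word for word in text.lower().split() if word not in STOPWORDS]


def document_frequencies(token_lists):
    """Count, for each word, how many documents contain it at least once."""
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    return doc_freq


def tfidf_scores(documents):
    """Score each word as term frequency * log(total docs / docs containing it)."""
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(token_lists)
    doc_freq = document_frequencies(token_lists)
    scores = []
    for tokens in token_lists:
        term_freq = Counter(tokens)
        scores.append({
            word: count * math.log(n_docs / doc_freq[word])
            for word, count in term_freq.items()
        })
    return scores


def select_topics(documents, max_doc_share=0.5):
    """Keep words found in at most max_doc_share of the documents (assumed cut-off)."""
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(token_lists)
    doc_freq = document_frequencies(token_lists)
    return {word for word, df in doc_freq.items() if df / n_docs <= max_doc_share}


# Three made-up patent titles stand in for real patent text.
patents = [
    "a battery electrode coating",
    "a battery charging circuit",
    "an antenna for a cellphone",
]
scores = tfidf_scores(patents)
topics = select_topics(patents)
```

In this toy sample 'battery' appears in two of the three titles, so it scores lower than 'electrode' and is dropped from the topic list under the assumed 50 per cent cut-off. A word that appeared in every document would score zero, since log(1) is 0.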
Next, using the 'Naive Bayes' algorithm, each document is assigned a 'radial co-ordinate' or 'vector' in n-dimensional space, based upon the topics in the document. Documents with topics in common sit closer together than those whose topics are not related. These closely related documents are then clustered around a central co-ordinate. The algorithms then flatten the representation to produce a two-dimensional ThemeScape map. Each patent is placed once on the map in the same clustered relationship as in n-dimensional space, using a self-organising map (SOM) algorithm. More closely related documents are placed nearer to each other, and colour-shading shows the density of documents.
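The flattening step can be sketched with a very small self-organising map in Python. This is a generic textbook-style SOM written under stated assumptions, not ThemeScape's implementation: the grid size, learning rate, neighbourhood radius, number of training passes and function names are all illustrative. Each document is assumed to arrive as a plain list of topic weights; the Naive Bayes step that the article credits with producing those vectors is not reproduced here.

```python
import math
import random


def best_matching_unit(weights, vec):
    """Return the grid cell whose weight vector is closest to the document vector."""
    return min(weights, key=lambda cell: math.dist(weights[cell], vec))


def train_som(vectors, rows=4, cols=4, epochs=200, seed=0):
    """Train a rows x cols self-organising map on the document vectors."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same map
    dim = len(vectors[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        remaining = 1.0 - epoch / epochs
        rate = 0.5 * remaining                             # learning rate decays to 0
        radius = max(rows, cols) / 2.0 * remaining + 0.5   # neighbourhood shrinks
        for vec in vectors:
            bmu = best_matching_unit(weights, vec)
            for cell, weight in weights.items():
                grid_dist = math.dist(cell, bmu)
                if grid_dist <= radius:
                    # Cells near the winning cell are pulled harder towards the document.
                    influence = math.exp(-(grid_dist ** 2) / (2 * radius ** 2))
                    for i in range(dim):
                        weight[i] += rate * influence * (vec[i] - weight[i])
    return weights


def place_documents(vectors, **som_options):
    """Give each document one (row, column) position on the two-dimensional map."""
    weights = train_som(vectors, **som_options)
    return [best_matching_unit(weights, vec) for vec in vectors]


# Hypothetical topic vectors: two documents share a topic, the third does not.
doc_vectors = [
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
]
positions = place_documents(doc_vectors)
```

Each document lands on exactly one cell, and documents with identical topic vectors always share a cell. Counting the documents per cell would give the density that a map of this kind shows as shading or contour height.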