Extract A Specific Header From Html Using Beautiful Soup
Solution 1:
The citation count shown on the page is filled in dynamically by JavaScript, so it is not available as a ready-made number in the downloaded HTML. You can still get it by counting the elements with itemprop="forwardReferencesFamily".
For example:
import requests
from bs4 import BeautifulSoup
url = 'https://patents.google.com/patent/EP1208209A1/en?oq=medicinal+chemistry'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
print(len(soup.select('tr[itemprop="forwardReferencesFamily"]')))
Prints:
4
Solution 2:
For the page https://patents.google.com/patent/WO2012061469A3/en?oq=medicinal+chemistry I want to print the patent citations, giving the publication number and title of each one. I then want to use pandas to put the publication numbers in one column and the titles in another. So far I have parsed the HTML with Beautiful Soup and selected the backward references rows; within those rows I want to print the publication number and title of each citation. This is a single example, but I have a folder full of HTML files to process later.
x = soup.select('tr[itemprop="backwardReferences"]')
# This searches the whole document, so it returns every title, not only the ones under the patent citations
y = soup.select('td[itemprop="title"]')
print(y)
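One way to restrict the titles to the patent citations is to search inside each backwardReferences row instead of searching the whole soup. The sketch below is only an illustration under assumptions: the inline HTML is a made-up sample, the publication numbers in it are invented, and it assumes each row holds an element with itemprop="publicationNumber" and one with itemprop="title". The function name extract_citations is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (assumption): the real page may nest these elements differently.
SAMPLE_HTML = """
<table>
  <tr itemprop="backwardReferences">
    <td><span itemprop="publicationNumber">US1234567A</span></td>
    <td itemprop="title">First cited patent</td>
  </tr>
  <tr itemprop="backwardReferences">
    <td><span itemprop="publicationNumber">EP7654321B1</span></td>
    <td itemprop="title"> Second cited patent </td>
  </tr>
  <tr itemprop="forwardReferences">
    <td><span itemprop="publicationNumber">WO0000001A1</span></td>
    <td itemprop="title">A citing patent, not a citation</td>
  </tr>
</table>
"""


def extract_citations(html):
    """Return one dict per backwardReferences row (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select('tr[itemprop="backwardReferences"]'):
        # Search within this row only, so titles from other tables are ignored.
        number = tr.select_one('[itemprop="publicationNumber"]')
        title = tr.select_one('[itemprop="title"]')
        if number is None or title is None:
            continue  # skip rows that lack either field
        rows.append({
            "publication_number": number.get_text(strip=True),
            "title": title.get_text(strip=True),
        })
    return rows


citations = extract_citations(SAMPLE_HTML)
for citation in citations:
    print(citation["publication_number"], "|", citation["title"])
```

Passing the resulting list of dicts to pandas.DataFrame(citations) should give one column per key, which matches the two-column layout asked for. For a folder of saved pages, the same function could be called on the text of each file in turn.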