Scraping prescription drug prices from Rx site using the prescription drug name and zipcode #5959
Changes from 5 commits
ec3fcad
d01bf72
bf85317
604e457
a9dbb6a
7538066
0717469
c41a97b
e23f308
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,123 @@ | ||
""" | ||
|
||
Scrape the price and pharmacy name for a prescription drug from rx site | ||
after providing the drug name and zipcode. | ||
|
||
""" | ||
|
||
import lxml | ||
|
||
from typing import Union | ||
from requests import Response, get | ||
from bs4 import BeautifulSoup | ||
|
||
|
||
def format_price(price: str) -> float: | ||
"""[summary] | ||
|
||
Remove the dollar sign from the string and convert it to float. | ||
|
||
Args: | ||
price (str): [price of drug in string format] | ||
|
||
Returns: | ||
float: [formatted price of drug in float] | ||
|
||
>>> format_price("$14") | ||
14.0 | ||
|
||
>>> format_price("$15.67") | ||
15.67 | ||
|
||
>>> format_price("$0.00") | ||
0.0 | ||
|
||
""" | ||
dollar_removed: str = price.replace("$", "") | ||
formatted_price: float = float(dollar_removed) | ||
return formatted_price | ||
|
||
|
||
def fetch_pharmacy_and_price_list(drug_name: str, zip_code: str) -> Union[list, None]: | ||
"""[summary] | ||
|
||
Take a drug name and zip code, request the matching search page on the wellrx.com site, | ||
then scrape the page data to generate the list of lowest prices for the prescription drug. | ||
|
||
Args: | ||
drug_name (str): [Drug name] | ||
zip_code (str): [Zip code] | ||
|
||
Returns: | ||
list: [List of pharmacy name and price] | ||
|
||
>>> fetch_pharmacy_and_price_list(None, None) | ||
|
||
>>> fetch_pharmacy_and_price_list(None, 30303) | ||
|
||
>>> fetch_pharmacy_and_price_list("eliquis", None) | ||
|
||
""" | ||
|
||
try: | ||
|
||
# Has user provided both inputs? | ||
if not drug_name or not zip_code: | ||
return None | ||
|
||
request_url: str = f'https://www.wellrx.com/prescriptions/{drug_name}/{zip_code}/?freshSearch=true' | ||
response: Response = get(request_url) | ||
|
||
# Is the status code ok? | ||
if response.status_code == 200: | ||
Review comment: Let's use
Reply: Added response.raise_for_status() and changed the code accordingly.
||
|
||
# Scrape the data using bs4 | ||
soup: BeautifulSoup = BeautifulSoup(response.text, "lxml") | ||
saptarshi1996 marked this conversation as resolved.
|
||
|
||
# This list will store the name and price. | ||
pharmacy_price_list: list = [] | ||
|
||
# Fetch all the grids that contain the items. | ||
grid_list: list = soup.find_all( | ||
"div", {"class": "grid-x pharmCard"}) | ||
if grid_list and len(grid_list) > 0: | ||
for grid in grid_list: | ||
|
||
# Get the pharmacy name. | ||
pharmacy_name: str = grid.find( | ||
"p", {"class": "list-title"}).text | ||
|
||
# Get price of the drug. | ||
price: str = grid.find( | ||
"span", {"class": "price price-large"}).text | ||
formatted_price: float = format_price(price) | ||
Review comment: What is the reason to get rid of the $ and the two digits to the right of the decimal point? Are we going to do math (add, subtract, multiply, divide) on these numbers? If not, let's not modify the formatting.
Reply: Removed the conversion, as we were not using those values.
||
|
||
pharmacy_price_list.append({ | ||
"pharmacy_name": pharmacy_name, | ||
"price": formatted_price, | ||
}) | ||
|
||
return pharmacy_price_list | ||
|
||
else: | ||
return None | ||
|
||
except Exception as e: | ||
Reply: Changed `except Exception as e:` to `except (HTTPError, exceptions.RequestException, ValueError):`. This was new to me.
||
return None | ||
|
||
|
||
if __name__ == "__main__": | ||
|
||
# Enter a drug name and a zip code | ||
drug_name: str = input("Enter drug Name:\n") | ||
zip_code: str = input("Enter zip code:\n") | ||
Review comment: See README.md for advice on leading and/or trailing spaces in
Reply: Went through the README and codebase and added the input as
||
pharmacy_price_list: Union[list, None] = fetch_pharmacy_and_price_list( | ||
drug_name, zip_code) | ||
|
||
if pharmacy_price_list: | ||
print(f'\nSearch results for {drug_name} at location {zip_code}:') | ||
for pharmacy_price in pharmacy_price_list: | ||
print( | ||
f'Pharmacy: {pharmacy_price["pharmacy_name"]} Price: {pharmacy_price["price"]}') | ||
else: | ||
print("No results found") |
Let's put type hints on function parameters and function return types but we do not need them everywhere. Both Python and mypy are capable of figuring out that a string literal is a string. ;-) Overuse slows down both the writer and reader of code.
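
As a rough illustration of the comment above, here is a minimal sketch of what hints on the signature only might look like. The helper name `build_request_url` is hypothetical and not part of this PR; the URL pattern is the one used in the diff. Stripping the inputs is an assumption about how leading and trailing spaces could be handled, not necessarily what the final commit does.

```python
from typing import Optional


def build_request_url(drug_name: str, zip_code: str) -> Optional[str]:
    """Return the search URL, or None if either input is empty.

    Hypothetical helper for illustration only: the parameters and the
    return type are annotated, the local variables are not.
    """
    # No annotations here: str.strip() is already known to return a str.
    drug = drug_name.strip()
    zip_ = zip_code.strip()

    # Mirror the "has the user provided both inputs?" check from the diff.
    if not drug or not zip_:
        return None

    # An f-string is obviously a str, so this local needs no hint either.
    url = f"https://www.wellrx.com/prescriptions/{drug}/{zip_}/?freshSearch=true"
    return url
```

A caller would pass the raw `input()` values straight in, for example `build_request_url(" eliquis ", "30303 ")`, and treat a `None` result as "nothing to search for".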