Pre/beta #136

Merged
7 changes: 7 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,10 @@
## [0.7.0-beta.1](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.6.2...v0.7.0-beta.1) (2024-05-03)


### Features

* add base_node to __init__.py ([cb1cb61](https://github.com/VinciGit00/Scrapegraph-ai/commit/cb1cb616b7998d3624bf57b19b5f1b1945fea4ef))

## [0.6.2](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.6.1...v0.6.2) (2024-05-02)


120 changes: 120 additions & 0 deletions examples/azure/inputs/books.xml
@@ -0,0 +1,120 @@
<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
<book id="bk103">
<author>Corets, Eva</author>
<title>Maeve Ascendant</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-11-17</publish_date>
<description>After the collapse of a nanotechnology
society in England, the young survivors lay the
foundation for a new society.</description>
</book>
<book id="bk104">
<author>Corets, Eva</author>
<title>Oberon's Legacy</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2001-03-10</publish_date>
<description>In post-apocalypse England, the mysterious
agent known only as Oberon helps to create a new life
for the inhabitants of London. Sequel to Maeve
Ascendant.</description>
</book>
<book id="bk105">
<author>Corets, Eva</author>
<title>The Sundered Grail</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2001-09-10</publish_date>
<description>The two daughters of Maeve, half-sisters,
battle one another for control of England. Sequel to
Oberon's Legacy.</description>
</book>
<book id="bk106">
<author>Randall, Cynthia</author>
<title>Lover Birds</title>
<genre>Romance</genre>
<price>4.95</price>
<publish_date>2000-09-02</publish_date>
<description>When Carla meets Paul at an ornithology
conference, tempers fly as feathers get ruffled.</description>
</book>
<book id="bk107">
<author>Thurman, Paula</author>
<title>Splish Splash</title>
<genre>Romance</genre>
<price>4.95</price>
<publish_date>2000-11-02</publish_date>
<description>A deep sea diver finds true love twenty
thousand leagues beneath the sea.</description>
</book>
<book id="bk108">
<author>Knorr, Stefan</author>
<title>Creepy Crawlies</title>
<genre>Horror</genre>
<price>4.95</price>
<publish_date>2000-12-06</publish_date>
<description>An anthology of horror stories about roaches,
centipedes, scorpions and other insects.</description>
</book>
<book id="bk109">
<author>Kress, Peter</author>
<title>Paradox Lost</title>
<genre>Science Fiction</genre>
<price>6.95</price>
<publish_date>2000-11-02</publish_date>
<description>After an inadvertent trip through a Heisenberg
Uncertainty Device, James Salway discovers the problems
of being quantum.</description>
</book>
<book id="bk110">
<author>O'Brien, Tim</author>
<title>Microsoft .NET: The Programming Bible</title>
<genre>Computer</genre>
<price>36.95</price>
<publish_date>2000-12-09</publish_date>
<description>Microsoft's .NET initiative is explored in
detail in this deep programmer's reference.</description>
</book>
<book id="bk111">
<author>O'Brien, Tim</author>
<title>MSXML3: A Comprehensive Guide</title>
<genre>Computer</genre>
<price>36.95</price>
<publish_date>2000-12-01</publish_date>
<description>The Microsoft MSXML3 parser is covered in
detail, with attention to XML DOM interfaces, XSLT processing,
SAX and more.</description>
</book>
<book id="bk112">
<author>Galos, Mike</author>
<title>Visual Studio 7: A Comprehensive Guide</title>
<genre>Computer</genre>
<price>49.95</price>
<publish_date>2001-04-16</publish_date>
<description>Microsoft Visual Studio 7 is explored in depth,
looking at how Visual Basic, Visual C++, C#, and ASP+ are
integrated into a comprehensive development
environment.</description>
</book>
</catalog>
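Review note: the fields a books prompt would ask about (author, title, genre) can be read from this catalog with the standard library alone, which gives a baseline to compare an LLM answer against. The following is a minimal sketch and not part of the PR; the `list_books` helper and the inlined two-book excerpt are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Two-entry excerpt of the catalog, inlined so the sketch is self-contained.
SAMPLE_XML = """<?xml version="1.0"?>
<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
    <genre>Computer</genre>
  </book>
  <book id="bk102">
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
    <genre>Fantasy</genre>
  </book>
</catalog>"""


def list_books(xml_text):
    """Hypothetical helper: return (id, author, title, genre) for each <book>."""
    root = ET.fromstring(xml_text)
    return [
        (
            book.get("id"),
            book.findtext("author"),
            book.findtext("title"),
            book.findtext("genre"),
        )
        for book in root.iter("book")
    ]


books = list_books(SAMPLE_XML)
for book_id, author, title, genre in books:
    print(f"{book_id}: {author} | {title} | {genre}")
```

To run it against the committed file instead of the excerpt, the file's text can be read and passed to `list_books` unchanged.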
182 changes: 182 additions & 0 deletions examples/azure/inputs/example.json
@@ -0,0 +1,182 @@
{
"kind":"youtube#searchListResponse",
"etag":"q4ibjmYp1KA3RqMF4jFLl6PBwOg",
"nextPageToken":"CAUQAA",
"regionCode":"NL",
"pageInfo":{
"totalResults":1000000,
"resultsPerPage":5
},
"items":[
{
"kind":"youtube#searchResult",
"etag":"QCsHBifbaernVCbLv8Cu6rAeaDQ",
"id":{
"kind":"youtube#video",
"videoId":"TvWDY4Mm5GM"
},
"snippet":{
"publishedAt":"2023-07-24T14:15:01Z",
"channelId":"UCwozCpFp9g9x0wAzuFh0hwQ",
"title":"3 Football Clubs Kylian Mbappe Should Avoid Signing ✍️❌⚽️ #football #mbappe #shorts",
"description":"",
"thumbnails":{
"default":{
"url":"https://i.ytimg.com/vi/TvWDY4Mm5GM/default.jpg",
"width":120,
"height":90
},
"medium":{
"url":"https://i.ytimg.com/vi/TvWDY4Mm5GM/mqdefault.jpg",
"width":320,
"height":180
},
"high":{
"url":"https://i.ytimg.com/vi/TvWDY4Mm5GM/hqdefault.jpg",
"width":480,
"height":360
}
},
"channelTitle":"FC Motivate",
"liveBroadcastContent":"none",
"publishTime":"2023-07-24T14:15:01Z"
}
},
{
"kind":"youtube#searchResult",
"etag":"0NG5QHdtIQM_V-DBJDEf-jK_Y9k",
"id":{
"kind":"youtube#video",
"videoId":"aZM_42CcNZ4"
},
"snippet":{
"publishedAt":"2023-07-24T16:09:27Z",
"channelId":"UCM5gMM_HqfKHYIEJ3lstMUA",
"title":"Which Football Club Could Cristiano Ronaldo Afford To Buy? 💰",
"description":"Sign up to Sorare and get a FREE card: https://sorare.pxf.io/NellisShorts Give Soraredata a go for FREE: ...",
"thumbnails":{
"default":{
"url":"https://i.ytimg.com/vi/aZM_42CcNZ4/default.jpg",
"width":120,
"height":90
},
"medium":{
"url":"https://i.ytimg.com/vi/aZM_42CcNZ4/mqdefault.jpg",
"width":320,
"height":180
},
"high":{
"url":"https://i.ytimg.com/vi/aZM_42CcNZ4/hqdefault.jpg",
"width":480,
"height":360
}
},
"channelTitle":"John Nellis",
"liveBroadcastContent":"none",
"publishTime":"2023-07-24T16:09:27Z"
}
},
{
"kind":"youtube#searchResult",
"etag":"WbBz4oh9I5VaYj91LjeJvffrBVY",
"id":{
"kind":"youtube#video",
"videoId":"wkP3XS3aNAY"
},
"snippet":{
"publishedAt":"2023-07-24T16:00:50Z",
"channelId":"UC4EP1dxFDPup_aFLt0ElsDw",
"title":"PAULO DYBALA vs THE WORLD'S LONGEST FREEKICK WALL",
"description":"Can Paulo Dybala curl a football around the World's longest free kick wall? We met up with the World Cup winner and put him to ...",
"thumbnails":{
"default":{
"url":"https://i.ytimg.com/vi/wkP3XS3aNAY/default.jpg",
"width":120,
"height":90
},
"medium":{
"url":"https://i.ytimg.com/vi/wkP3XS3aNAY/mqdefault.jpg",
"width":320,
"height":180
},
"high":{
"url":"https://i.ytimg.com/vi/wkP3XS3aNAY/hqdefault.jpg",
"width":480,
"height":360
}
},
"channelTitle":"Shoot for Love",
"liveBroadcastContent":"none",
"publishTime":"2023-07-24T16:00:50Z"
}
},
{
"kind":"youtube#searchResult",
"etag":"juxv_FhT_l4qrR05S1QTrb4CGh8",
"id":{
"kind":"youtube#video",
"videoId":"rJkDZ0WvfT8"
},
"snippet":{
"publishedAt":"2023-07-24T10:00:39Z",
"channelId":"UCO8qj5u80Ga7N_tP3BZWWhQ",
"title":"TOP 10 DEFENDERS 2023",
"description":"SoccerKingz https://soccerkingz.nl Use code: 'ILOVEHOF' to get 10% off. TOP 10 DEFENDERS 2023 Follow us! • Instagram ...",
"thumbnails":{
"default":{
"url":"https://i.ytimg.com/vi/rJkDZ0WvfT8/default.jpg",
"width":120,
"height":90
},
"medium":{
"url":"https://i.ytimg.com/vi/rJkDZ0WvfT8/mqdefault.jpg",
"width":320,
"height":180
},
"high":{
"url":"https://i.ytimg.com/vi/rJkDZ0WvfT8/hqdefault.jpg",
"width":480,
"height":360
}
},
"channelTitle":"Home of Football",
"liveBroadcastContent":"none",
"publishTime":"2023-07-24T10:00:39Z"
}
},
{
"kind":"youtube#searchResult",
"etag":"wtuknXTmI1txoULeH3aWaOuXOow",
"id":{
"kind":"youtube#video",
"videoId":"XH0rtu4U6SE"
},
"snippet":{
"publishedAt":"2023-07-21T16:30:05Z",
"channelId":"UCwozCpFp9g9x0wAzuFh0hwQ",
"title":"3 Things You Didn't Know About Erling Haaland ⚽️🇳🇴 #football #haaland #shorts",
"description":"",
"thumbnails":{
"default":{
"url":"https://i.ytimg.com/vi/XH0rtu4U6SE/default.jpg",
"width":120,
"height":90
},
"medium":{
"url":"https://i.ytimg.com/vi/XH0rtu4U6SE/mqdefault.jpg",
"width":320,
"height":180
},
"high":{
"url":"https://i.ytimg.com/vi/XH0rtu4U6SE/hqdefault.jpg",
"width":480,
"height":360
}
},
"channelTitle":"FC Motivate",
"liveBroadcastContent":"none",
"publishTime":"2023-07-21T16:30:05Z"
}
}
]
}
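Review note: this input is a YouTube search response, so reading the video IDs, titles, and channel names directly gives a reference result for whatever the scraper graph returns. The sketch below is not part of the PR; `list_videos` is a hypothetical helper, and the sample keeps only the keys it reads.

```python
import json

# Trimmed excerpt of the search response: only the keys read below are kept.
SAMPLE_JSON = """
{
  "kind": "youtube#searchListResponse",
  "items": [
    {
      "id": {"kind": "youtube#video", "videoId": "rJkDZ0WvfT8"},
      "snippet": {
        "title": "TOP 10 DEFENDERS 2023",
        "channelTitle": "Home of Football"
      }
    },
    {
      "id": {"kind": "youtube#video", "videoId": "wkP3XS3aNAY"},
      "snippet": {
        "title": "PAULO DYBALA vs THE WORLD'S LONGEST FREEKICK WALL",
        "channelTitle": "Shoot for Love"
      }
    }
  ]
}
"""


def list_videos(json_text):
    """Hypothetical helper: return (videoId, title, channelTitle) per item."""
    data = json.loads(json_text)
    return [
        (
            item["id"]["videoId"],
            item["snippet"]["title"],
            item["snippet"]["channelTitle"],
        )
        for item in data.get("items", [])
    ]


videos = list_videos(SAMPLE_JSON)
for video_id, title, channel in videos:
    print(f"{video_id}: {title} ({channel})")
```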
7 changes: 7 additions & 0 deletions examples/azure/inputs/username.csv
@@ -0,0 +1,7 @@
Username;Identifier;First name;Last name
booker12;9012;Rachel;Booker
grey07;2070;Laura;Grey
johnson81;4081;Craig;Johnson
jenkins46;9346;Mary;Jenkins
smith79;5079;Jamie;Smith
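Review note: the file is semicolon-delimited, so readers that assume commas will return each row as a single field. The sketch below, which is not part of the PR, reads it with the standard `csv` module; `read_users` is a hypothetical helper. `skipinitialspace=True` is set so that any whitespace directly after a delimiter does not end up in a field name, and the inlined sample header carries such a space on purpose to exercise that option.

```python
import csv
import io

# Short excerpt of the user list; the header has a space after the first
# delimiter to show what skipinitialspace=True does.
SAMPLE_CSV = """Username; Identifier;First name;Last name
booker12;9012;Rachel;Booker
grey07;2070;Laura;Grey
"""


def read_users(csv_text):
    """Hypothetical helper: parse semicolon-delimited text into dict rows."""
    reader = csv.DictReader(
        io.StringIO(csv_text),
        delimiter=";",
        skipinitialspace=True,  # drop whitespace that follows a delimiter
    )
    return list(reader)


users = read_users(SAMPLE_CSV)
for user in users:
    print(user["Username"], user["Identifier"])
```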

68 changes: 68 additions & 0 deletions examples/azure/json_scraper_azure.py
@@ -0,0 +1,68 @@
"""
Basic example of a scraping pipeline using JSONScraperGraph with an Azure OpenAI key
"""

import os
from dotenv import load_dotenv
from langchain_openai import AzureChatOpenAI
from langchain_openai import AzureOpenAIEmbeddings
from scrapegraphai.graphs import JSONScraperGraph
from scrapegraphai.utils import prettify_exec_info


# Required environment variables in .env:
# AZURE_OPENAI_ENDPOINT
# AZURE_OPENAI_CHAT_DEPLOYMENT_NAME
# MODEL_NAME
# AZURE_OPENAI_API_KEY
# OPENAI_API_TYPE
# AZURE_OPENAI_API_VERSION
# AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME
load_dotenv()


FILE_NAME = "inputs/example.json"
curr_dir = os.path.dirname(os.path.realpath(__file__))
file_path = os.path.join(curr_dir, FILE_NAME)

with open(file_path, 'r', encoding="utf-8") as file:
    text = file.read()

# ************************************************
# Initialize the model instances
# ************************************************

llm_model_instance = AzureChatOpenAI(
    openai_api_version=os.environ["AZURE_OPENAI_API_VERSION"],
    azure_deployment=os.environ["AZURE_OPENAI_CHAT_DEPLOYMENT_NAME"]
)

embedder_model_instance = AzureOpenAIEmbeddings(
    azure_deployment=os.environ["AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME"],
    openai_api_version=os.environ["AZURE_OPENAI_API_VERSION"],
)

# ************************************************
# Create the JSONScraperGraph instance and run it
# ************************************************

graph_config = {
    "llm": {"model_instance": llm_model_instance},
    "embeddings": {"model_instance": embedder_model_instance}
}

smart_scraper_graph = JSONScraperGraph(
    # inputs/example.json holds YouTube search results, so ask about videos
    prompt="List me all the video titles and channel names",
    source=text,  # Pass the content of the file, not the file object
    config=graph_config
)

result = smart_scraper_graph.run()
print(result)

# ************************************************
# Get graph execution info
# ************************************************

graph_exec_info = smart_scraper_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))
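Review note: the script reads several of its settings with `os.environ[...]`, so a missing entry surfaces as a bare `KeyError` at the first lookup. A small pre-flight check can report every missing name at once. This is a sketch under the assumption that all seven variables listed in the script's comment are required; `missing_env_vars` is a hypothetical helper, not part of the PR or of scrapegraphai.

```python
import os

# Names taken from the comment block at the top of the example script.
REQUIRED_ENV_VARS = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_CHAT_DEPLOYMENT_NAME",
    "MODEL_NAME",
    "AZURE_OPENAI_API_KEY",
    "OPENAI_API_TYPE",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME",
)


def missing_env_vars(env=None):
    """Hypothetical helper: return required names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```

In the example script this check would run right after `load_dotenv()`, before either model instance is created.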