Commit 3b7b701

feat: refactoring of mdscraper
1 parent 528a974 commit 3b7b701

2 files changed: 6 additions, 3 deletions

examples/openai/md_scraper_openai.py

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@
 # ************************************************
 
 md_scraper_graph = DocumentScraperGraph(
-    prompt="List me all the authors, title and genres of the books",
+    prompt="List me all the projects",
     source=text,  # Pass the content of the file, not the file object
     config=graph_config
 )

scrapegraphai/nodes/parse_node.py

Lines changed: 5 additions & 2 deletions
@@ -85,10 +85,13 @@ def execute(self, state: dict) -> dict:
 else:
     docs_transformed = docs_transformed[0]
 
-link_urls, img_urls = self._extract_urls(docs_transformed.page_content, source)
+try:
+    link_urls, img_urls = self._extract_urls(docs_transformed.page_content, source)
+except Exception as e:
+    link_urls, img_urls = "", ""
 
 chunk_size = self.chunk_size
-chunk_size = min(chunk_size - 500, int(chunk_size * 0.75))
+chunk_size = min(chunk_size - 500, int(chunk_size * 0.8))
 
 if isinstance(docs_transformed, Document):
     chunks = split_text_into_chunks(text=docs_transformed.page_content,
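
The two behavioural changes in parse_node.py can be tried in isolation. The sketch below is not code from the repository: `effective_chunk_size`, `extract_urls_or_empty` and `failing_extractor` are hypothetical names introduced here for illustration. The formula and the empty-string fallback are copied from the diff; the logging call is an added assumption, since the commit binds the exception to `e` without using it.

```python
import logging
from typing import Callable, Tuple

logger = logging.getLogger(__name__)


def effective_chunk_size(chunk_size: int, ratio: float = 0.8) -> int:
    """Same formula as the diff: the smaller of a fixed 500 margin and a ratio cut."""
    return min(chunk_size - 500, int(chunk_size * ratio))


def extract_urls_or_empty(
    extract: Callable[[str, str], Tuple], content: str, source: str
) -> Tuple:
    """Call `extract`; on any exception return the commit's fallback pair ("", "")."""
    try:
        return extract(content, source)
    except Exception as exc:  # deliberately broad, matching the commit
        # Assumption: logging the failure; the commit itself discards the exception.
        logger.warning("URL extraction failed for %s: %s", source, exc)
        return "", ""


def failing_extractor(content: str, source: str) -> Tuple:
    """Hypothetical stand-in for an extractor that chokes on its input."""
    raise ValueError("malformed URL")


if __name__ == "__main__":
    # Compare the old ratio (0.75) with the new one (0.8).
    for size in (2000, 4096):
        print(size, effective_chunk_size(size, 0.75), effective_chunk_size(size, 0.8))
    print(extract_urls_or_empty(failing_extractor, "some text", "file.md"))
```

With a configured size of 4096 the budget rises from 3072 to 3276. For sizes of 2500 or less the `chunk_size - 500` term is the smaller one under the new ratio, so the change has no effect there. The fallback returns two empty strings, exactly as in the commit; if downstream code iterates over the URL collections, empty lists might be a more consistent choice, but that is outside what this diff shows.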
