Commit 5239a89

Upd readme with authentication
1 parent edc859e commit 5239a89

1 file changed: +31 -0 lines changed

README.md

Lines changed: 31 additions & 0 deletions
@@ -9,6 +9,8 @@ A scraper for your documentation website that indexes the scraped content into a
 - [About the API Key](#about-the-api-key)
 - [Configuration file](#configuration-file)
 - [And for the search bar?](#and-for-the-search-bar)
+- [Authentication](#authentication)
+- [Installing Chrome Headless](#installing-chrome-headless)
 - [Development Workflow](#development-workflow)
 - [Credits](#credits)

@@ -19,6 +21,8 @@ This project supports Python 3.6+.

 ### From Source Code

+The [`pipenv` command](https://pipenv.readthedocs.io/en/latest/install/#installing-pipenv) must be installed.
+
 Set both environment variables `MEILISEARCH_HOST_URL` and `MEILISEARCH_API_KEY`.

 Then, run:
@@ -111,10 +115,37 @@ After having crawled your documentation, you might need a search bar to improve

 For the front part, check out the [docs-searchbar.js repository](https://github.com/meilisearch/docs-searchbar.js), which provides a front-end search bar adapted for documentation.

+## Authentication
+
+__WARNING:__ Please be aware that the scraper will send authentication headers to every scraped site, so use `allowed_domains` to adjust the scope accordingly!
+
+### Basic HTTP <!-- omit in TOC -->
+
+Basic HTTP authentication is supported by setting these environment variables:
+- `DOCS_SCRAPER_BASICAUTH_USERNAME`
+- `DOCS_SCRAPER_BASICAUTH_PASSWORD`
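
As a rough illustration of what these two variables amount to on the wire, the sketch below builds a standard HTTP Basic `Authorization` header from them. The helper name `basic_auth_header` is hypothetical and not taken from the scraper; the commit does not show how the scraper applies the credentials internally, so treat this only as a picture of the header format.

```python
import base64
import os


def basic_auth_header(env=os.environ):
    """Build an HTTP Basic Authorization header from the two variables.

    Hypothetical helper for illustration only; not part of the scraper.
    """
    username = env.get("DOCS_SCRAPER_BASICAUTH_USERNAME")
    password = env.get("DOCS_SCRAPER_BASICAUTH_PASSWORD")
    if not username or not password:
        return {}  # no credentials configured, so send no header

    # HTTP Basic: base64("username:password") after the "Basic " scheme name
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

Base64 is an encoding, not encryption, so anyone who receives this header can read the password. That is why the warning about limiting the scope with `allowed_domains` matters.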
+
+### Cloudflare Access: Identity and Access Management <!-- omit in TOC -->
+
+To scrape sites protected by Cloudflare Access, you must set the appropriate HTTP headers.
+
+The values for these headers are taken from the environment variables `CF_ACCESS_CLIENT_ID` and `CF_ACCESS_CLIENT_SECRET`.
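
The README text does not name the headers. Cloudflare Access service tokens are normally sent as `CF-Access-Client-Id` and `CF-Access-Client-Secret`, so a minimal sketch of the mapping, assuming the scraper uses those standard names, could look like this. The function `cloudflare_access_headers` is hypothetical and not part of the scraper.

```python
import os


def cloudflare_access_headers(env=os.environ):
    """Map the two environment variables to Cloudflare Access token headers.

    Hypothetical helper; the header names are Cloudflare's standard
    service-token headers and are assumed, not confirmed by this commit.
    """
    client_id = env.get("CF_ACCESS_CLIENT_ID")
    client_secret = env.get("CF_ACCESS_CLIENT_SECRET")
    if not client_id or not client_secret:
        return {}  # incomplete configuration, so send nothing

    return {
        "CF-Access-Client-Id": client_id,
        "CF-Access-Client-Secret": client_secret,
    }
```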
+
+For Google Cloud Identity-Aware Proxy, set these environment variables:
+- `IAP_AUTH_CLIENT_ID`: the [client ID of the application](https://console.cloud.google.com/apis/credentials) you are connecting to
+- `IAP_AUTH_SERVICE_ACCOUNT_JSON`: generate it in [Actions](https://console.cloud.google.com/iam-admin/serviceaccounts) -> Create key -> JSON
+
+## Installing Chrome Headless
+
+Websites that need JavaScript for rendering are passed through ChromeDriver.<br>
+[Download the version](http://chromedriver.chromium.org/downloads) suited to your OS and then set the environment variable `CHROMEDRIVER_PATH`.
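
Before launching a crawl, it can help to confirm that `CHROMEDRIVER_PATH` points at a usable binary. The check below is a small sketch using only the standard library. `chromedriver_problem` is a hypothetical name, and the scraper may validate the path differently, or not at all.

```python
import os


def chromedriver_problem(env=os.environ):
    """Return a description of what is wrong with CHROMEDRIVER_PATH,
    or None if the path looks usable.

    Hypothetical pre-flight check; not part of the scraper.
    """
    path = env.get("CHROMEDRIVER_PATH")
    if not path:
        return "CHROMEDRIVER_PATH is not set"
    if not os.path.isfile(path):
        return f"{path} is not a file"
    if not os.access(path, os.X_OK):
        return f"{path} is not executable"
    return None


if __name__ == "__main__":
    problem = chromedriver_problem()
    print(problem or "CHROMEDRIVER_PATH looks usable")
```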
+
 ## Development Workflow

 ### Install and Launch <!-- omit in TOC -->

+The [`pipenv` command](https://pipenv.readthedocs.io/en/latest/install/#installing-pipenv) must be installed.
+
 Set both environment variables `MEILISEARCH_HOST_URL` and `MEILISEARCH_API_KEY`.

 Then, run:
