Commit a075155

Upd README
1 parent fa1e6b1 commit a075155

1 file changed

+94
-7
lines changed

README.md

Lines changed: 94 additions & 7 deletions
@@ -1,27 +1,114 @@
1-
# Docs Scraper
1+
# Docs Scraper <!-- omit in TOC -->
22

3-
A scraper for MeiliSearch's documentation, indexing the content into a MeiliSearch instance.
3+
A scraper for your documentation website, indexing the content into a MeiliSearch instance.
4+
5+
- [Installation and Usage](#installation-and-usage)
6+
- [From source code](#from-source-code)
7+
- [With Docker](#with-docker)
8+
- [In a GitHub Action](#in-a-github-action)
9+
- [About the API Key](#about-the-api-key)
10+
- [Configuration file](#configuration-file)
11+
- [Related projects](#related-projects)
12+
- [Credits](#credits)
413

5-
_Will be generalized soon for all documentations_
614

715
## Installation and Usage
816

917
This project supports Python 3.6+.
1018

19+
### From source code
20+
1121
Set both environment variables `MEILISEARCH_HOST_URL` and `MEILISEARCH_API_KEY`.
1222

1323
Then, run:
1424
```bash
1525
$ pipenv install
16-
$ pipenv shell
17-
$ ./docs_scraper run config/config.json
26+
$ pipenv run ./docs_scraper run <path-to-your-config-file>
27+
```
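Because the scraper reads both variables from the environment, it can help to check them before launching it. The following is a minimal sketch, not part of the project: the helper name `missing_env_vars` is hypothetical, and only the two variable names come from this README.

```python
import os

# Variable names taken from this README; everything else is illustrative.
REQUIRED_VARS = ("MEILISEARCH_HOST_URL", "MEILISEARCH_API_KEY")


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty.

    `environ` defaults to the process environment; passing a dict makes
    the check easy to exercise without touching real settings.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` before `pipenv run ./docs_scraper run <path-to-your-config-file>` and stopping when the returned list is non-empty avoids starting a crawl that cannot write to the instance.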
28+
29+
### With Docker
30+
31+
```bash
32+
$ docker run -t --rm \
33+
    -e MEILISEARCH_HOST_URL=<your-meilisearch-host-url> \
34+
    -e MEILISEARCH_API_KEY=<your-meilisearch-api-key> \
35+
    -v <absolute-path-to-your-config-file>:/docs-scraper/config.json \
36+
    getmeili/docs-scraper:v0.9.0 pipenv run ./docs_scraper config.json
1837
```
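The `-v` option above expects an absolute path to the configuration file. As a hedged illustration, the sketch below assembles the same command as an argument list and resolves the path first. The function name `docker_scraper_command` is hypothetical; the image tag, environment variable names, and container path are copied from the command above.

```python
import os


def docker_scraper_command(config_path, host_url, api_key,
                           image="getmeili/docs-scraper:v0.9.0"):
    """Build the `docker run` argument list shown in this README.

    The config path is made absolute because the bind mount needs one.
    The list can be passed to `subprocess.run` without a shell.
    """
    absolute_config = os.path.abspath(config_path)
    return [
        "docker", "run", "-t", "--rm",
        "-e", "MEILISEARCH_HOST_URL=" + host_url,
        "-e", "MEILISEARCH_API_KEY=" + api_key,
        "-v", absolute_config + ":/docs-scraper/config.json",
        image, "pipenv", "run", "./docs_scraper", "config.json",
    ]
```

Building a list rather than a single string keeps the API key out of shell interpolation, which is a design choice of this sketch rather than a requirement of the scraper.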
1938

20-
_WIP_
39+
### In a GitHub Action
40+
41+
To run the scraper after your deployment job:
42+
43+
```yml
44+
run-scraper:
45+
  needs: <your-deployment-job>
46+
  runs-on: ubuntu-18.04
47+
  steps:
48+
    - uses: actions/checkout@master
49+
    - name: Run scraper
50+
      env:
51+
        HOST_URL: ${{ secrets.MEILISEARCH_HOST_URL }}
52+
        API_KEY: ${{ secrets.MEILISEARCH_API_KEY }}
53+
        CONFIG_FILE_PATH: <path-to-your-config-file>
54+
      run: |
55+
        docker run -t --rm \
56+
          -e MEILISEARCH_HOST_URL=$HOST_URL \
57+
          -e MEILISEARCH_API_KEY=$API_KEY \
58+
          -v $CONFIG_FILE_PATH:/docs-scraper/config.json \
59+
          getmeili/docs-scraper:v0.9.0 pipenv run ./docs_scraper config.json
60+
```
61+
62+
Here is the [GitHub Action file](https://github.com/meilisearch/documentation/blob/master/.github/workflows/gh-pages-scraping.yml) we use in production for the MeiliSearch documentation.
63+
64+
### About the API Key
65+
66+
The API key you provide as an environment variable must have permission to add documents to your MeiliSearch instance.
67+
68+
Thus, you need to provide the private key or the master key.
69+
70+
_More about [MeiliSearch authentication](https://docs.meilisearch.com/guides/advanced_guides/authentication.html)._
71+
72+
## Configuration file
73+
74+
A generic configuration file:
75+
76+
```json
77+
{
78+
  "index_uid": "docs",
79+
  "start_urls": ["https://www.example.com/doc/"],
80+
  "sitemap_urls": ["https://www.example.com/sitemap.xml"],
81+
  "stop_urls": [],
82+
  "selectors": {
83+
    "lvl0": {
84+
      "selector": ".docs-lvl0",
85+
      "global": true,
86+
      "default_value": "Documentation"
87+
    },
88+
    "lvl1": {
89+
      "selector": ".docs-lvl1",
90+
      "global": true,
91+
      "default_value": "Chapter"
92+
    },
93+
    "lvl2": ".docs-content .docs-lvl2",
94+
    "lvl3": ".docs-content .docs-lvl3",
95+
    "lvl4": ".docs-content .docs-lvl4",
96+
    "lvl5": ".docs-content .docs-lvl5",
97+
    "lvl6": ".docs-content .docs-lvl6",
98+
    "text": ".docs-content p, .docs-content li"
99+
  }
100+
}
101+
```
102+
103+
The scraper extracts and indexes only the content matched by your selectors.
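In the generic file above, a selector appears in two shapes: a plain CSS string (`lvl2` to `lvl6`, `text`) or an object with a `selector` key (`lvl0`, `lvl1`). The sketch below shows one way to sanity-check a configuration before running the scraper. It is an assumption-laden illustration, not the scraper's own validation: the helper names and the choice of which keys to treat as required are hypothetical.

```python
import json

# Assumption: these keys are treated as mandatory for this check only.
REQUIRED_KEYS = ("index_uid", "start_urls", "selectors")


def normalize_selector(value):
    """Return a selector as a dict, accepting the string or object form."""
    if isinstance(value, str):
        return {"selector": value}
    if isinstance(value, dict) and "selector" in value:
        return dict(value)
    raise ValueError("Unsupported selector definition: {!r}".format(value))


def load_config(text):
    """Parse a JSON config string and normalize every selector entry."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError("Missing keys: " + ", ".join(missing))
    config["selectors"] = {
        level: normalize_selector(value)
        for level, value in config["selectors"].items()
    }
    return config
```

Reading the file with `open(path).read()` and passing the text to `load_config` surfaces a malformed selector or a missing key before any page is crawled.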
104+
105+
Here is the [configuration file](https://github.com/meilisearch/documentation/blob/master/.vuepress/scraper/config.json) we use for the MeiliSearch documentation.
21106

22107
## Related projects
23108

24-
For the front part, check out the [docs-searchbar.js repository](https://github.com/meilisearch/docs-searchbar.js), which provides a front-end search bar.
109+
Once your documentation has been crawled, you may want a search bar to improve your user experience!
110+
111+
For the front end, check out the [docs-searchbar.js repository](https://github.com/meilisearch/docs-searchbar.js), which provides a front-end search bar adapted for documentation.
25112

26113
## Credits
27114
