
Commit e1e6746

Merge pull request #10 from meilisearch/docs
Upd README
2 parents fa1e6b1 + a85b980 commit e1e6746


1 file changed: +121, -8 lines


README.md

Lines changed: 121 additions & 8 deletions
````diff
@@ -1,27 +1,140 @@
-# Docs Scraper
+# Docs Scraper <!-- omit in TOC -->
 
-A scraper for MeiliSearch's documentation, indexing the content into a MeiliSearch instance.
+A scraper for your documentation website that indexes the scraped content into a MeiliSearch instance.
+
+- [Installation and Usage](#installation-and-usage)
+  - [From Source Code](#from-source-code)
+  - [With Docker](#with-docker)
+  - [In a GitHub Action](#in-a-github-action)
+  - [About the API Key](#about-the-api-key)
+- [Configuration file](#configuration-file)
+- [And for the search bar?](#and-for-the-search-bar)
+- [Development Workflow](#development-workflow)
+- [Credits](#credits)
 
-_Will be generalized soon for all documentations_
 
 ## Installation and Usage
 
 This project supports Python 3.6+.
 
+### From Source Code
+
+Set both environment variables `MEILISEARCH_HOST_URL` and `MEILISEARCH_API_KEY`.
+
+Then, run:
+```bash
+$ pipenv install
+$ pipenv run ./docs_scraper run <path-to-your-config-file>
+```
+
+### With Docker
+
+```bash
+$ docker run -t --rm \
+    -e MEILISEARCH_HOST_URL=<your-meilisearch-host-url> \
+    -e MEILISEARCH_API_KEY=<your-meilisearch-api-key> \
+    -v <absolute-path-to-your-config-file>:/docs-scraper/config.json \
+    getmeili/docs-scraper:v0.9.0 pipenv run ./docs_scraper config.json
+```
+
+### In a GitHub Action
+
+To run after your deployment job:
+
+```yml
+run-scraper:
+  needs: <your-deployment-job>
+  runs-on: ubuntu-18.04
+  steps:
+    - uses: actions/checkout@master
+    - name: Run scraper
+      env:
+        HOST_URL: ${{ secrets.MEILISEARCH_HOST_URL }}
+        API_KEY: ${{ secrets.MEILISEARCH_API_KEY }}
+        CONFIG_FILE_PATH: <path-to-your-config-file>
+      run: |
+        docker run -t --rm \
+          -e MEILISEARCH_HOST_URL=$HOST_URL \
+          -e MEILISEARCH_API_KEY=$API_KEY \
+          -v $CONFIG_FILE_PATH:/docs-scraper/config.json \
+          getmeili/docs-scraper:v0.9.0 pipenv run ./docs_scraper config.json
+```
+
+Here is the [GitHub Action file](https://github.com/meilisearch/documentation/blob/master/.github/workflows/gh-pages-scraping.yml) we use in production for the MeiliSearch documentation.
+
+### About the API Key
+
+The API key you must provide as environment variable should have the permissions to add documents into your MeiliSearch instance.
+
+Thus, you need to provide the private key or the master key.
+
+_More about [MeiliSearch authentication](https://docs.meilisearch.com/guides/advanced_guides/authentication.html)._
+
+## Configuration file
+
+A generic configuration file:
+
+```json
+{
+  "index_uid": "docs",
+  "start_urls": ["https://www.example.com/doc/"],
+  "sitemap_urls": ["https://www.example.com/sitemap.xml"],
+  "stop_urls": [],
+  "selectors": {
+    "lvl0": {
+      "selector": ".docs-lvl0",
+      "global": true,
+      "default_value": "Documentation"
+    },
+    "lvl1": {
+      "selector": ".docs-lvl1",
+      "global": true,
+      "default_value": "Chapter"
+    },
+    "lvl2": ".docs-content .docs-lvl2",
+    "lvl3": ".docs-content .docs-lvl3",
+    "lvl4": ".docs-content .docs-lvl4",
+    "lvl5": ".docs-content .docs-lvl5",
+    "lvl6": ".docs-content .docs-lvl6",
+    "text": ".docs-content p, .docs-content li"
+  }
+}
+```
+
+The scraper will focus on the highlighted information depending on your selectors.
+
+Here is the [configuration file](https://github.com/meilisearch/documentation/blob/master/.vuepress/scraper/config.json) we use for the MeiliSearch documentation.
+
+## And for the search bar?
+
+After having crawled your documentation, you might need a search bar to improve your user experience!
+
+For the front part, check out the [docs-searchbar.js repository](https://github.com/meilisearch/docs-searchbar.js), wich provides a front-end search bar adapted for documentation.
+
+## Development Workflow
+
+### Install and Launch <!-- omit in TOC -->
+
 Set both environment variables `MEILISEARCH_HOST_URL` and `MEILISEARCH_API_KEY`.
 
 Then, run:
 ```bash
 $ pipenv install
-$ pipenv shell
-$ ./docs_scraper run config/config.json
+$ pipenv run ./docs_scraper run <path-to-your-config-file>
 ```
 
-_WIP_
+### Release <!-- omit in TOC -->
+
+Once the changes are merged on `master`, in your terminal, you must be on the `master` branch and push a new tag with the right version:
 
-## Related projects
+```bash
+$ git checkout master
+$ git pull origin master
+$ git tag vX.X.X
+$ git push --tag origin master
+```
 
-For the front part, check out the [docs-searchbar.js repository](https://github.com/meilisearch/docs-searchbar.js), wich provides a front-end search bar.
+A GitHub Action will be triggered and push the `latest` and `vX.X.X` version of Docker image on [DockerHub](https://hub.docker.com/repository/docker/getmeili/docs-scraper)
 
 ## Credits
 
````
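The "With Docker" command in the new README can also be assembled from a script. The sketch below is a hypothetical Python helper (`build_docker_command` is not part of docs-scraper). It reads `MEILISEARCH_HOST_URL` and `MEILISEARCH_API_KEY` from the environment, fails early if either is missing, and resolves the config path to an absolute path, because the README's `-v` mount asks for one. The image tag `v0.9.0` is copied from the README and may need updating.

```python
import os
from typing import List, Mapping, Optional

# Image tag copied from the README; adjust it to the release you want to run.
IMAGE = "getmeili/docs-scraper:v0.9.0"
REQUIRED_VARS = ("MEILISEARCH_HOST_URL", "MEILISEARCH_API_KEY")


def build_docker_command(config_path: str,
                         env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Hypothetical helper mirroring the README's `docker run` command."""
    env = os.environ if env is None else env
    # Fail early if either variable the scraper needs is unset or empty.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variable(s): " + ", ".join(missing))
    # The bind mount expects an absolute host path, so resolve relative paths.
    host_config = os.path.abspath(config_path)
    return [
        "docker", "run", "-t", "--rm",
        "-e", "MEILISEARCH_HOST_URL=" + env["MEILISEARCH_HOST_URL"],
        "-e", "MEILISEARCH_API_KEY=" + env["MEILISEARCH_API_KEY"],
        "-v", host_config + ":/docs-scraper/config.json",
        IMAGE, "pipenv", "run", "./docs_scraper", "config.json",
    ]
```

Passing the returned list to `subprocess.run(cmd, check=True)` would start the container; the list form avoids shell-quoting problems if the API key contains special characters.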
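The "Configuration file" section of the new README mixes two selector forms: an object with `selector`, `global` and `default_value` (as for `lvl0` and `lvl1`) and a bare CSS selector string (as for `lvl2` to `text`). The following is a hypothetical pre-flight check, not part of docs-scraper, that expands both forms into one shape before a run. It assumes that a bare string behaves like an object with `global` set to false and no default value, and that `index_uid`, `start_urls` and `selectors` are required. The README states neither assumption.

```python
import json
from typing import Any, Dict

# Keys the README's example configuration always sets. Treating them as
# mandatory is an assumption of this sketch, not documented behavior.
REQUIRED_KEYS = ("index_uid", "start_urls", "selectors")


def normalize_selectors(config: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Return every selector in the object form used by lvl0/lvl1 in the README.

    Assumption: a plain string such as ".docs-content .docs-lvl2" is shorthand
    for {"selector": <string>, "global": False, "default_value": None}.
    """
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError("missing key(s): " + ", ".join(missing))
    normalized = {}
    for level, value in config["selectors"].items():
        if isinstance(value, str):
            value = {"selector": value}
        if not isinstance(value, dict) or "selector" not in value:
            raise ValueError("selector {!r} needs a 'selector' entry".format(level))
        # Start from the assumed defaults, then keep everything the file sets.
        entry = {"global": False, "default_value": None}
        entry.update(value)
        normalized[level] = entry
    return normalized


def load_config(path: str) -> Dict[str, Dict[str, Any]]:
    """Read a config file from disk and return its normalized selectors."""
    with open(path, encoding="utf-8") as handle:
        return normalize_selectors(json.load(handle))
```

On the README's example, `load_config` would return all eight entries (`lvl0` to `lvl6` plus `text`) in the same shape, so a selector object missing its `selector` key fails before any crawling starts.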
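The new "Release" section depends on pushing a correctly formatted `vX.X.X` tag, since that tag is what the GitHub Action publishes to DockerHub alongside `latest`. Here is a hypothetical check to run before `git tag`, assuming plain three-part numeric versions. The README does not say whether pre-release suffixes are allowed, so this sketch rejects them.

```python
import re
from typing import Tuple

# Assumed meaning of the README's vX.X.X: exactly three numeric parts,
# so pre-release suffixes such as v1.0.0-rc.1 are rejected.
TAG_PATTERN = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def parse_release_tag(tag: str) -> Tuple[int, int, int]:
    """Split a tag like "v0.9.0" into (0, 9, 0), or raise ValueError."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        raise ValueError("tag {!r} does not match vX.X.X".format(tag))
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def is_newer(new_tag: str, previous_tag: str) -> bool:
    """Compare numerically, so v0.10.0 sorts after v0.9.0 (unlike a string sort)."""
    return parse_release_tag(new_tag) > parse_release_tag(previous_tag)
```

For example, `is_newer("v0.10.0", "v0.9.0")` returns `True`, whereas comparing the two tags as plain strings would put `v0.10.0` first.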