feat(rule): recognize "には" as a single particle #16

Merged
merged 3 commits into from
Mar 4, 2017
11 changes: 11 additions & 0 deletions README.md
@@ -123,6 +123,17 @@ textlint --rule no-doubled-joshi README.md

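 Repetition of the conjunctive particle "て" is allowed as an exception.
 
+### Compound particles (連語)
+
+- [連語(助詞) - 修飾語 - 品詞の分類 - Weblio 辞書](http://www.weblio.jp/parts-of-speech/%E9%80%A3%E8%AA%9E(%E5%8A%A9%E8%A9%9E)_1 "連語(助詞) - 修飾語 - 品詞の分類 - Weblio 辞書")
+
+A compound (consecutive particles such as "には") is recognized as a single particle unit.
+
+```
+OK: 文字列の長さを正確**に**測る**には**ある程度の妥協が必要になります。
+NG: 文字列**には**そこ**には**問題がある。
+```
+
 ### Other particles
 
 To treat other particles as exceptions as well, use the `allow` option.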
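
A minimal sketch of what such a configuration might look like. It assumes a JavaScript-format textlint config file and assumes that `allow` takes an array of particle surface forms; the particle "も" is only an illustrative choice, not something this pull request specifies.

```javascript
// .textlintrc.js (assumed file name; a JSON or YAML config would carry the same keys)
module.exports = {
    rules: {
        "no-doubled-joshi": {
            // Assumption: particles listed here may appear more than once in a sentence
            allow: ["も"]
        }
    }
};
```
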
28 changes: 17 additions & 11 deletions src/no-doubled-joshi.js
@@ -6,20 +6,21 @@ import {split as splitSentences, Syntax as SentenceSyntax} from "sentence-splitt
 import StringSource from "textlint-util-to-string";
 import {
     is助詞Token, is読点Token,
-    createKeyFromKey, restoreToSurfaceFromKey
+    concatJoishiTokens,
+    createKeyFromKey,
+    restoreToSurfaceFromKey
 } from "./token-utils";
 /**
  * Create token map object
  * {
- * "で": [token, token],
- * "の": [token, token]
+ * "は:助詞.係助詞": [token, token]
  * }
  * @param tokens
  * @returns {*}
  */
 function createSurfaceKeyMap(tokens) {
     // Only particles (助詞) are targeted
Review comment (Member, Author):

This was not needed here. `tokens` only ever contains particle (助詞) tokens.

-    return tokens.filter(is助詞Token).reduce((keyMap, token) => {
+    return tokens.reduce((keyMap, token) => {
         // "は:助詞.係助詞" : [token]
         const tokenKey = createKeyFromKey(token);
         if (!keyMap[tokenKey]) {
@@ -70,7 +71,7 @@ export default function(context, options = {}) {
     const isStrict = options.strict || defaultOptions.strict;
     const allow = options.allow || defaultOptions.allow;
     const separatorChars = options.separatorChars || defaultOptions.separatorChars;
-    const {Syntax, report, getSource, RuleError} = context;
+    const {Syntax, report, RuleError} = context;
     return {
         [Syntax.Paragraph](node){
             if (helper.isChildNode(node, [Syntax.Link, Syntax.Image, Syntax.BlockQuote, Syntax.Emphasis])) {
@@ -81,13 +82,18 @@ export default function(context, options = {}) {
             const isSentenceNode = node => {
                 return node.type === SentenceSyntax.Sentence;
             };
-            let sentences = splitSentences(text, {
+            const sentences = splitSentences(text, {
                 separatorChars: separatorChars
             }).filter(isSentenceNode);
             return getTokenizer().then(tokenizer => {
                 const checkSentence = (sentence) => {
-                    let tokens = tokenizer.tokenizeForSentence(sentence.raw);
-                    let countableTokens = tokens.filter(token => {
+                    const tokens = tokenizer.tokenizeForSentence(sentence.raw);
+                    // Treat particle + particle as a single particle
+                    // https://github.com/textlint-ja/textlint-rule-no-doubled-joshi/issues/15
+                    // Support for compound particles (連語)
+                    // http://www.weblio.jp/parts-of-speech/%E9%80%A3%E8%AA%9E(%E5%8A%A9%E8%A9%9E)_1
+                    const concatTokens = concatJoishiTokens(tokens);
+                    const countableTokens = concatTokens.filter(token => {
                         if (isStrict) {
                             return is助詞Token(token);
                         }
@@ -96,14 +102,14 @@ export default function(context, options = {}) {
                         // https://github.com/azu/textlint-rule-no-doubled-joshi/issues/2
                         return is助詞Token(token) || is読点Token(token);
                     });
-                    let joshiTokenSurfaceKeyMap = createSurfaceKeyMap(countableTokens);
+                    const joshiTokenSurfaceKeyMap = createSurfaceKeyMap(countableTokens);
                     /*
                      # Data Structure
 
                      joshiTokens = [tokenA, tokenB, tokenC, tokenD, tokenE, tokenF]
                      joshiTokenSurfaceKeyMap = {
-                     "は:助詞.係助詞": [tokenA, tokenC, tokenE],
-                     "で:助詞.係助詞": [tokenB, tokenD, tokenF]
+                     "は:助詞.係助詞": [tokenA, tokenC, tokenE],
+                     "で:助詞.係助詞": [tokenB, tokenD, tokenF]
                      }
                      */
                     Object.keys(joshiTokenSurfaceKeyMap).forEach(key => {
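
As a rough illustration of the data structure described in the comment above, the sketch below groups tokens by a `surface_form:pos.pos_detail_1` key and lists the keys that hold two or more tokens. The token objects and the names `createKey`, `groupByKey`, and `doubledKeys` are hypothetical simplifications, not the rule's code. The rule's own reporting logic is cut off in this hunk and is not reproduced here.

```javascript
// Hypothetical, simplified tokens; real tokens come from the morphological analyzer.
const sampleTokens = [
    {surface_form: "は", pos: "助詞", pos_detail_1: "係助詞"},
    {surface_form: "で", pos: "助詞", pos_detail_1: "格助詞"},
    {surface_form: "は", pos: "助詞", pos_detail_1: "係助詞"}
];

// Same key format as in the diff: "surface:pos.pos_detail_1"
const createKey = (token) => `${token.surface_form}:${token.pos}.${token.pos_detail_1}`;

// Group tokens into { key: [token, token, ...] }
const groupByKey = (tokens) => {
    return tokens.reduce((keyMap, token) => {
        const key = createKey(token);
        if (!keyMap[key]) {
            keyMap[key] = [];
        }
        keyMap[key].push(token);
        return keyMap;
    }, {});
};

const keyMap = groupByKey(sampleTokens);
// Keys that occur at least twice are candidates for a "doubled particle" report
const doubledKeys = Object.keys(keyMap).filter((key) => keyMap[key].length >= 2);
console.log(doubledKeys);
```

Because the key includes the part-of-speech detail, two particles with the same surface form but different subcategories end up in different groups.
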
35 changes: 32 additions & 3 deletions src/token-utils.js
@@ -2,19 +2,48 @@
 "use strict";
 // Whether the token is a particle (助詞)
 export const is助詞Token = (token) => {
-    return token.pos === "助詞";
+    // A concatenated token has a pos such as "助詞助詞", so match on the prefix
+    return token && /^助詞/.test(token.pos);
 };
 
 export const is読点Token = (token) => {
     return token.surface_form === "、" && token.pos === "名詞";
 };

+/**
+ * Append the concatenated key to aToken's _extraKey
+ * @param {Object} aToken
+ * @param {Object} bToken
+ * @returns {Object}
+ */
+const concatToken = (aToken, bToken) => {
+    aToken.surface_form += bToken.surface_form;
+    aToken.pos += bToken.pos;
+    aToken.pos_detail_1 += bToken.surface_form;
+    return aToken;
+};
+/**
+ * Return an array of tokens in which consecutive tokens, such as particle + particle, have been concatenated
+ * @param {Array} tokens
+ * @returns {Array}
+ */
+export const concatJoishiTokens = (tokens) => {
+    const newTokens = [];
+    tokens.forEach((token) => {
+        const prevToken = newTokens[newTokens.length - 1];
+        if (is助詞Token(token) && is助詞Token(prevToken)) {
+            newTokens[newTokens.length - 1] = concatToken(prevToken, token);
+        } else {
+            newTokens.push(token);
+        }
+    });
+    return newTokens;
+};
 // Build a key from a particle token, using the fields up to part-of-speech subcategory 1 (pos_detail_1)
 // http://www.unixuser.org/~euske/doc/postag/index.html#chasen
 // http://chasen.naist.jp/snapshot/ipadic/ipadic/doc/ipadic-ja.pdf
 export const createKeyFromKey = (token) => {
     // e.g.) "は:助詞.係助詞"
-    return `${token.surface_form}:${token.pos}.${token.pos_detail_1}`
+    return `${token.surface_form}:${token.pos}.${token.pos_detail_1}`;
 };
 // Extract the surface form from a key
 export const restoreToSurfaceFromKey = (key) => {
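
To make the effect of the concatenation concrete, here is a self-contained sketch that mirrors the logic added in this hunk on hand-written tokens. The names `isJoshiToken`, `mergeTokens`, and `concatAdjacentJoshi` and the sample tokens are hypothetical. Unlike `concatToken` in the diff, `mergeTokens` returns a new object rather than mutating its first argument.

```javascript
// Prefix match, so an already merged token (pos "助詞助詞") still counts as a particle
const isJoshiToken = (token) => Boolean(token) && /^助詞/.test(token.pos);

// Non-mutating variant of the merge step (an assumption of this sketch)
const mergeTokens = (a, b) => ({
    ...a,
    surface_form: a.surface_form + b.surface_form,
    pos: a.pos + b.pos,
    // As in the diff, the second token's surface form is appended to pos_detail_1
    pos_detail_1: a.pos_detail_1 + b.surface_form
});

// Merge each run of adjacent particle tokens into one token
const concatAdjacentJoshi = (tokens) => {
    const result = [];
    tokens.forEach((token) => {
        const prev = result[result.length - 1];
        if (isJoshiToken(token) && isJoshiToken(prev)) {
            result[result.length - 1] = mergeTokens(prev, token);
        } else {
            result.push(token);
        }
    });
    return result;
};

// Hypothetical tokens for the fragment "文字列には問題"
const fragmentTokens = [
    {surface_form: "文字列", pos: "名詞", pos_detail_1: "一般"},
    {surface_form: "に", pos: "助詞", pos_detail_1: "格助詞"},
    {surface_form: "は", pos: "助詞", pos_detail_1: "係助詞"},
    {surface_form: "問題", pos: "名詞", pos_detail_1: "一般"}
];
const merged = concatAdjacentJoshi(fragmentTokens);
const particleKeys = merged
    .filter(isJoshiToken)
    .map((t) => `${t.surface_form}:${t.pos}.${t.pos_detail_1}`);
console.log(particleKeys);
```

With these inputs the merged particle gets the key "には:助詞助詞.格助詞は", which differs from the key of a lone "に" or "は". That is why "に" followed later by "には" is no longer counted as a repetition, while two occurrences of "には" are.
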
16 changes: 16 additions & 0 deletions test/no-doubled-joshi-test.js
@@ -17,6 +17,10 @@ tester.run("no-double-joshi", rule, {
         "ナイフで切断した後、ハンマーで破砕した。",
         // Repetition of the conjunctive particle て is allowed
         "まずは試していただいて",
+        // **に** and **には** are recognized as different particles
+        "そのため、文字列の長さを正確に測るにはある程度の妥協が必要になります。",
+        "そんな事で言うべきではない。",
+        "言うのは簡単の法則。",
         // The first 「と」 is a case particle, the second 「と」 is a conjunctive particle
         "ターミナルで「test」**と**入力する**と**、画面に表示されます。",
         {
@@ -161,6 +165,18 @@ tester.run("no-double-joshi", rule, {
                     column: 38
                 }
             ]
+        },
+        {
+            // に + は and に + は
+            // https://github.com/textlint-ja/textlint-rule-no-doubled-joshi/issues/15
+            text: "文字列にはそこには問題がある。",
+            errors: [
+                {
+                    message: `一文に二回以上利用されている助詞 "には" がみつかりました。`,
+                    line: 1,
+                    column: 8
+                }
+            ]
         }
     ]
 });