Binary file not shown.
35 changes: 35 additions & 0 deletions app/src/main/assets/mozc_segmenter/pos_group.dat
@@ -0,0 +1,35 @@
Functional 29 29-576,857-1840,1936-2041,2057-2390,2473-2590 ^(助詞|助動詞|動詞,非自立|名詞,非自立|形容詞,非自立|動詞,接尾|名詞,接尾|形容詞,接尾)
Unknown 1841 1841-1849 名詞,サ変接続
FirstName 1922 1922-1922 名詞,固有名詞,人名,名
LastName 1923 1923-1923 名詞,固有名詞,人名,姓
Number 2044 2044-2044 名詞,数,アラビア数字
KanjiNumber 2046 2046-2055 名詞,数,漢数字
WeakCompoundNounPrefix 2600 2600-2637 接頭詞,名詞接続,
WeakCompoundVerbPrefix 2596 2596-2599 接頭詞,動詞接続,
WeakCompoundFillerPrefix 2 2-11 フィラー,
WeakCompoundNounSuffix 1841 1841-1898,1909-1918 ^名詞,(サ変接続|ナイ形容詞語幹|一般|副詞可能|形容詞語幹)
WeakCompoundVerbSuffix 577 577-856 動詞,自立
AcceptableParticleAtBeginOfSegment 271 271-271,274-274,283-285,326-327,330-330,332-332,349-350,363-363,367-375,378-378,387-387,389-389,401-401,420-420,424-424,427-427,433-433 ^助詞,*,*,*,*,*,(が|で|と|に|にて|の|へ|より|も|と|から|は|や)$
JapanesePunctuations 2645 2645-2654,2657-2658 記号,(句点|読点)
OpenBracket 2656 2656-2656 記号,括弧開
CloseBracket 2655 2655-2655 記号,括弧閉
GeneralSymbol 2644 2644-2644 記号,一般,
Zipcode 2672 2672-2672 特殊,郵便番号
IsolatedWord 2673 2673-2673 特殊,短縮よみ
SuggestOnlyWord 2674 2674-2674 特殊,サジェストのみ
ContentWordWithConjugation 645 645-842,2391-2472 ^(動詞,自立,*,*,五段|動詞,自立,*,*,一段|形容詞,自立)
SuffixWord 29 29-576,627-638,857-1840,2196-2390,2473-2590 ^(助詞|助動詞|動詞,非自立|動詞,接尾|形容詞,非自立|形容詞,接尾|動詞,自立,*,*,サ変・スル)
CounterSuffixWord 2011 2011-2018 名詞,接尾,助数詞
UniqueNoun 1920 1920-1929 ^名詞,固有名詞
GeneralNoun 1851 1851-1898 ^名詞,一般,*,*,*,*,*$
Pronoun 1899 1899-1908 ^名詞,代名詞,
ContentNoun 1841 1841-1849,1851-1898,1909-1918,1920-1929 ^名詞,(一般|固有名詞|副詞可能|サ変接続),
NounPrefix 2600 2600-2637 ^接頭詞,名詞接続,
EOSSymbol 2044 2044-2045,2643-2658 ^(記号,(句点|読点|アルファベット|一般|括弧開|括弧閉))|^(名詞,数,(アラビア数字|区切り文字))
Adverb 12 12-28 ^副詞,
AdverbSegmentSuffix 271 271-271,326-327,330-330,349-350,367-368,370-375,379-379,387-387,389-389,401-401,420-420,424-424,433-433 ^助詞,*,*,*,*,*,(から|で|と|に|にて|の|へ|を)$
ParallelMarker 268 268-276 ^助詞,並立助詞
TeSuffix 34 34-34,140-145,268-268,344-346,348-348,419-419,892-892,933-933,974-974,1015-1015,1056-1056,1097-1097,1138-1138,1179-1179,1219-1219,1260-1260,1272-1272,1284-1284,1296-1296,1308-1308,1320-1320,1332-1332,1344-1344,1356-1356,1361-1361,1367-1367,1372-1372,1377-1377,1382-1382,1387-1387,1392-1392,1485-1485,1487-1487,1505-1505,1507-1507,1526-1526,1528-1528,1546-1547,1563-1563,1565-1565,1583-1583,1585-1585,1603-1603,1605-1605,1623-1623,1625-1625,1643-1643,1645-1645,1663-1663,1665-1665,1683-1683,1685-1685,1763-1764,1773-1774,1784-1785,1796-1797,1807-1808,1818-1819,1830-1831 (助詞,接続助詞,*,*,*,*,(て|ちゃ)|助詞,並立助詞,*,*,*,*,たり|助詞,終助詞,*,*,*,*,てん|助動詞,*,*,*,特殊・タ,|動詞,非自立,*,*,一段,*,てる|助動詞,*,*,*,下二・タ行,連用形,つ|動詞,非自立,*,*,五段・カ行イ音便,*,とく|動詞,非自立,*,*,五段・カ行促音便,*,てく|動詞,非自立,*,*,五段・ラ行,*,(たる|とる)|動詞,非自立,*,*,五段・ワ行促音便,*,(ちゃう|ちまう)|動詞,非自立,*,*,一段,連用形,てる)
VerbSuffix 29 29-267,329-367,857-1840 (^動詞,非自立|^助詞,接続助詞|^助動詞)
KagyoTaConnectionVerb 726 726-726,734-734,1330-1341,1385-1389 ^動詞,(非自立|自立),*,*,五段・カ行(促音便|イ音便),連用タ接続
WagyoRenyoConnectionVerb 829 829-832,1825-1835 ^動詞,(非自立|自立),*,*,五段・ワ行促音便,連用形
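Each row of `pos_group.dat` above appears to carry four space-separated fields: a group name, an id equal to the start of the first range, a comma-separated list of inclusive POS id ranges, and the pattern the ranges were derived from. The following is a minimal sketch of reading one row under that assumption. `PosGroup`, `parsePosGroupLine`, and `contains` are hypothetical names for illustration and are not taken from the PR; the real `MozcSegmenterData` loader may work differently.

```kotlin
// Hypothetical model of one pos_group.dat row; field names are assumptions
// inferred from the rows in this diff, not from the PR's own loader.
data class PosGroup(
    val name: String,
    val firstId: Int,
    val ranges: List<IntRange>,
    val pattern: String,
)

// Parses a row such as "KanjiNumber 2046 2046-2055 名詞,数,漢数字".
fun parsePosGroupLine(line: String): PosGroup {
    // limit = 4 keeps the pattern intact even if it were to contain spaces.
    val fields = line.trim().split(" ", limit = 4)
    require(fields.size == 4) { "Expected 4 space-separated fields: $line" }
    val ranges = fields[2].split(",").map { span ->
        val (start, end) = span.split("-").map { it.toInt() }
        start..end
    }
    return PosGroup(fields[0], fields[1].toInt(), ranges, fields[3])
}

// True when the POS id falls inside any inclusive range of the group.
fun PosGroup.contains(posId: Int): Boolean = ranges.any { posId in it }
```

With the `WeakCompoundNounSuffix` row, for instance, `contains(1910)` holds while `contains(1900)` does not, because 1900 lies in the gap between `1841-1898` and `1909-1918`.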
Binary file not shown.
Binary file not shown.
@@ -22,6 +22,21 @@ import com.kazumaproject.markdownhelperkeyboard.converter.bitset.SuccinctBitVect
import com.kazumaproject.markdownhelperkeyboard.converter.candidate.BunsetsuCandidateResult
import com.kazumaproject.markdownhelperkeyboard.converter.candidate.Candidate
import com.kazumaproject.markdownhelperkeyboard.converter.graph.GraphBuilder
import com.kazumaproject.markdownhelperkeyboard.converter.mozc.LoudsTokenArrayMozcDictionary
import com.kazumaproject.markdownhelperkeyboard.converter.mozc.MozcBoundaryDetector
import com.kazumaproject.markdownhelperkeyboard.converter.mozc.MozcCandidateFilter
import com.kazumaproject.markdownhelperkeyboard.converter.mozc.MozcCandidateProvider
import com.kazumaproject.markdownhelperkeyboard.converter.mozc.MozcCompatibleConverter
import com.kazumaproject.markdownhelperkeyboard.converter.mozc.MozcConnector
import com.kazumaproject.markdownhelperkeyboard.converter.mozc.MozcConversionOptions
import com.kazumaproject.markdownhelperkeyboard.converter.mozc.MozcConverterTrace
import com.kazumaproject.markdownhelperkeyboard.converter.mozc.MozcNBestGenerator
import com.kazumaproject.markdownhelperkeyboard.converter.mozc.MozcPrefixSuffixPenalty
import com.kazumaproject.markdownhelperkeyboard.converter.mozc.MozcResegmenter
import com.kazumaproject.markdownhelperkeyboard.converter.mozc.MozcSegmenter
import com.kazumaproject.markdownhelperkeyboard.converter.mozc.MozcSegmenterData
import com.kazumaproject.markdownhelperkeyboard.converter.mozc.MozcUnknownNodeGenerator
import com.kazumaproject.markdownhelperkeyboard.converter.mozc.MozcViterbi
import com.kazumaproject.markdownhelperkeyboard.converter.path_algorithm.FindPath
import com.kazumaproject.markdownhelperkeyboard.dictionary_override.DictionaryBinaryReader
import com.kazumaproject.markdownhelperkeyboard.dictionary_override.DictionaryCategory
@@ -68,6 +83,8 @@ class KanaKanjiEngine {
private lateinit var graphBuilder: GraphBuilder
private lateinit var findPath: FindPath
private var dictionaryBinaryReader: DictionaryBinaryReader? = null
private var mozcCompatibleConverter: MozcCandidateProvider? = null
private var mozcCompatibleConverterForTesting: MozcCandidateProvider? = null

private lateinit var connectionIds: ShortArray
private var connectionMatrixSize: Int = 0
@@ -201,6 +218,12 @@ class KanaKanjiEngine {

fun setDictionaryBinaryReader(reader: DictionaryBinaryReader) {
dictionaryBinaryReader = reader
mozcCompatibleConverter = null
}

fun setMozcCompatibleConverterForTesting(converter: MozcCandidateProvider?) {
mozcCompatibleConverterForTesting = converter
mozcCompatibleConverter = null
}

private fun dictionaryReader(context: Context): DictionaryBinaryReader {
@@ -250,6 +273,77 @@ class KanaKanjiEngine {
)
}

    private fun tryMozcCompatibleCandidatesOrNull(
        enableMozcCompatibleConversion: Boolean,
        input: String,
        n: Int,
        isOmissionSearchEnable: Boolean,
        omissionSearchOffsetScore: Int,
    ): List<Candidate>? {
        if (!enableMozcCompatibleConversion) return null
        if (input.isAllHalfWidthAscii()) return null
        val result = try {
            getMozcCompatibleConverter().getCandidates(
                input = input,
                options = MozcConversionOptions(
                    nBest = n,
                    isOmissionSearchEnabled = isOmissionSearchEnable,
                    omissionSearchOffsetScore = omissionSearchOffsetScore,
                ),
            )
        } catch (error: Throwable) {
            Timber.w(error, "Mozc compatible conversion failed. Falling back to legacy converter.")
            return null
        }
        check(result.isNotEmpty()) {
            "Mozc compatible conversion returned empty candidates for '$input'"
        }
        return result
    }

    private fun getMozcCompatibleConverter(): MozcCandidateProvider {
        mozcCompatibleConverterForTesting?.let { return it }
        mozcCompatibleConverter?.let { return it }

        val reader = dictionaryBinaryReader
            ?: error("DictionaryBinaryReader is required for Mozc compatible conversion")
        val trace = MozcConverterTrace()
        val segmenterData = MozcSegmenterData.fromInputStreams(
            prefixPenalty = reader.openBundledAsset(MozcSegmenterData.PREFIX_PENALTY_ASSET),
            suffixPenalty = reader.openBundledAsset(MozcSegmenterData.SUFFIX_PENALTY_ASSET),
            boundaryRule = reader.openBundledAsset(MozcSegmenterData.BOUNDARY_RULE_ASSET),
            posGroup = reader.openBundledAsset(MozcSegmenterData.POS_GROUP_ASSET),
        )
        val dictionary = LoudsTokenArrayMozcDictionary(
            yomiTrie = systemYomiTrie,
            tangoTrie = systemTangoTrie,
            tokenArray = systemTokenArray,
            succinctBitVectorLBSYomi = systemSuccinctBitVectorLBSYomi,
            succinctBitVectorIsLeafYomi = systemSuccinctBitVectorIsLeafYomi,
            succinctBitVectorTokenArray = systemSuccinctBitVectorTokenArray,
            succinctBitVectorTangoLBS = systemSuccinctBitVectorTangoLBS,
            trace = trace,
        )
        val connector = MozcConnector(
            connectionIds = connectionIds,
            matrixSize = connectionMatrixSize,
        )
        val segmenter = MozcSegmenter(segmenterData, trace)
        val boundaryDetector = MozcBoundaryDetector(segmenter)
        return MozcCompatibleConverter(
            dictionary = dictionary,
            unknownNodeGenerator = MozcUnknownNodeGenerator(segmenterData.posMatcher, trace),
            prefixSuffixPenalty = MozcPrefixSuffixPenalty(segmenter, trace),
            resegmenter = MozcResegmenter(segmenter, connector, boundaryDetector, trace),
            viterbi = MozcViterbi(connector, trace),
            nBestGenerator = MozcNBestGenerator(connector, boundaryDetector, trace),
            candidateFilter = MozcCandidateFilter(trace),
            trace = trace,
        ).also {
            mozcCompatibleConverter = it
        }
    }

fun applyDictionaryOverrideState(context: Context) {
val reader = dictionaryReader(context)
val appContext = context.applicationContext
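The hunk above combines three ideas: a lazily built converter that is cached after first use, a testing override that takes precedence over the cache, and a null return on failure so the caller can fall back to the legacy path. The sketch below isolates that pattern with simplified types. `CandidateProvider`, `ConverterHolder`, and `buildCount` are hypothetical names and not part of the PR, and the sketch catches `Exception` rather than `Throwable`, which is a deliberate difference from the diff.

```kotlin
// Hypothetical stand-in for MozcCandidateProvider; returns plain strings.
fun interface CandidateProvider {
    fun getCandidates(input: String): List<String>
}

// Illustrative holder, assuming the provider is expensive to build.
class ConverterHolder(private val factory: () -> CandidateProvider) {
    private var cached: CandidateProvider? = null
    private var forTesting: CandidateProvider? = null

    // Counts factory invocations so the caching can be observed.
    var buildCount = 0
        private set

    // Analogous to resetting the cache when the dictionary reader changes.
    fun invalidate() {
        cached = null
    }

    // A non-null override wins over the cached instance; null removes it.
    fun setForTesting(provider: CandidateProvider?) {
        forTesting = provider
        cached = null
    }

    fun get(): CandidateProvider {
        forTesting?.let { return it }
        cached?.let { return it }
        buildCount++
        return factory().also { cached = it }
    }

    // Returns null when the provider throws, so the caller can fall back.
    fun candidatesOrNull(input: String): List<String>? =
        try {
            get().getCandidates(input)
        } catch (e: Exception) {
            null
        }
}
```

Calling `candidatesOrNull` twice builds the provider once; after `invalidate()` the next `get()` builds it again. Note that in the diff the `check(result.isNotEmpty())` call sits outside the `try`, so an empty result there raises an `IllegalStateException` instead of triggering the fallback.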
@@ -854,8 +948,16 @@ class KanaKanjiEngine {
enableTypoCorrectionJapaneseFlick: Boolean = false,
enableTypoCorrectionQwertyEnglish: Boolean = false,
typoCorrectionOffsetScore: Int,
omissionSearchOffsetScore: Int
omissionSearchOffsetScore: Int,
enableMozcCompatibleConversion: Boolean = false,
): List<Candidate> {
tryMozcCompatibleCandidatesOrNull(
enableMozcCompatibleConversion = enableMozcCompatibleConversion,
input = input,
n = n,
isOmissionSearchEnable = isOmissionSearchEnable,
omissionSearchOffsetScore = omissionSearchOffsetScore,
)?.let { return it }

val graph = graphBuilder.constructGraph(
input,
@@ -1333,8 +1435,16 @@ class KanaKanjiEngine {
enableTypoCorrectionJapaneseFlick: Boolean = false,
enableTypoCorrectionQwertyEnglish: Boolean = false,
typoCorrectionOffsetScore: Int,
omissionSearchOffsetScore: Int
omissionSearchOffsetScore: Int,
enableMozcCompatibleConversion: Boolean = false,
): BunsetsuCandidateResult {
tryMozcCompatibleCandidatesOrNull(
enableMozcCompatibleConversion = enableMozcCompatibleConversion,
input = input,
n = n,
isOmissionSearchEnable = isOmissionSearchEnable,
omissionSearchOffsetScore = omissionSearchOffsetScore,
)?.let { return BunsetsuCandidateResult(candidates = it, splitPatterns = emptyList()) }

val graph = graphBuilder.constructGraph(
input,
@@ -1838,8 +1948,16 @@ class KanaKanjiEngine {
enableTypoCorrectionJapaneseFlick: Boolean = false,
enableTypoCorrectionQwertyEnglish: Boolean = false,
typoCorrectionOffsetScore: Int,
omissionSearchOffsetScore: Int
omissionSearchOffsetScore: Int,
enableMozcCompatibleConversion: Boolean = false,
): BunsetsuCandidateResult {
tryMozcCompatibleCandidatesOrNull(
enableMozcCompatibleConversion = enableMozcCompatibleConversion,
input = input,
n = n,
isOmissionSearchEnable = isOmissionSearchEnable,
omissionSearchOffsetScore = omissionSearchOffsetScore,
)?.let { return BunsetsuCandidateResult(candidates = it, splitPatterns = emptyList()) }

val graph = graphBuilder.constructGraph(
input,
@@ -2288,8 +2406,16 @@ class KanaKanjiEngine {
enableTypoCorrectionJapaneseFlick: Boolean = false,
enableTypoCorrectionQwertyEnglish: Boolean = false,
typoCorrectionOffsetScore: Int,
omissionSearchOffsetScore: Int
omissionSearchOffsetScore: Int,
enableMozcCompatibleConversion: Boolean = false,
): List<Candidate> {
tryMozcCompatibleCandidatesOrNull(
enableMozcCompatibleConversion = enableMozcCompatibleConversion,
input = input,
n = n,
isOmissionSearchEnable = isOmissionSearchEnable,
omissionSearchOffsetScore = omissionSearchOffsetScore,
)?.let { return it }

val graph = graphBuilder.constructGraph(
input,
@@ -2723,8 +2849,16 @@ class KanaKanjiEngine {
userDictionaryRepository: UserDictionaryRepository,
learnRepository: LearnRepository?,
typoCorrectionOffsetScore: Int,
omissionSearchOffsetScore: Int
omissionSearchOffsetScore: Int,
enableMozcCompatibleConversion: Boolean = false,
): List<Candidate> {
tryMozcCompatibleCandidatesOrNull(
enableMozcCompatibleConversion = enableMozcCompatibleConversion,
input = input,
n = n,
isOmissionSearchEnable = false,
omissionSearchOffsetScore = omissionSearchOffsetScore,
)?.let { return it }

val graph = graphBuilder.constructGraph(
input,
@@ -3186,8 +3320,16 @@ class KanaKanjiEngine {
userDictionaryRepository: UserDictionaryRepository,
learnRepository: LearnRepository?,
typoCorrectionOffsetScore: Int,
omissionSearchOffsetScore: Int
omissionSearchOffsetScore: Int,
enableMozcCompatibleConversion: Boolean = false,
): BunsetsuCandidateResult {
tryMozcCompatibleCandidatesOrNull(
enableMozcCompatibleConversion = enableMozcCompatibleConversion,
input = input,
n = n,
isOmissionSearchEnable = false,
omissionSearchOffsetScore = omissionSearchOffsetScore,
)?.let { return BunsetsuCandidateResult(candidates = it, splitPatterns = emptyList()) }

val graph = graphBuilder.constructGraph(
input,
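All six call sites in this file use the same shape: try the new path first, return early through `?.let { return ... }` when it yields a result, and otherwise continue into the existing graph-based conversion. The flag defaults to `false`, so existing callers keep the legacy behavior. A minimal sketch of that dispatch follows. `primaryOrNull`, `legacy`, and `convert` are hypothetical names, and the ASCII test is only an assumed reading of `isAllHalfWidthAscii()`, whose definition is not shown in this diff.

```kotlin
// Hypothetical stand-in for the new conversion path; not from the PR.
fun primaryOrNull(enabled: Boolean, input: String): List<String>? {
    if (!enabled) return null
    // Assumed meaning of "all half-width ASCII": printable ASCII only.
    if (input.all { it.code in 0x20..0x7E }) return null
    return listOf("primary:$input")
}

// Hypothetical stand-in for the existing graph-based conversion.
fun legacy(input: String): List<String> = listOf("legacy:$input")

fun convert(input: String, enabled: Boolean = false): List<String> {
    // A non-null result from the new path short-circuits the legacy code.
    primaryOrNull(enabled, input)?.let { return it }
    return legacy(input)
}
```

Here `convert("かな", enabled = true)` takes the new path, while `convert("abc", enabled = true)` and `convert("かな")` both fall through to `legacy`. The bunsetsu variants in the diff follow the same shape but wrap the list in `BunsetsuCandidateResult` with an empty `splitPatterns`.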