正则表达式游戏

看到微博有人转了一个正则表达式测试题，于是果断拿来玩玩，记录一下解题思路。

地址：http://regex.alf.nu/

第一题：

Match all of these…	and none of these…
afoot	Atlas
catfoot	Aymoro
dogfoot	Iberic
fanfoot	Mahran
foody	Ormazd
foolery	Silipan
foolish	altared
fooster	chandoo
footage	crenel
foothot	crooked
footle	fardo
footpad	folksy
footway	forest
hotfoot	hebamic
jawfoot	idgah
mafoo	manlike
nonfood	marly
padfoot	palazzi
prefool	sixfold
sfoot	tarrock
unfool	unfold

根据习惯，脑补了第一个答案是：

^\w*foo\w*$

原因很简单，想写严谨，第一列元素都是小写字母开头，以及小写字母结束，中间包含foo的字符串。不过这个复杂度略高，得分只有199，我们来简化的玩。

\w*foo\w*

去掉匹配开头和结尾，得分高了2分，变为201，看来得分和复杂度果然是正相关啊。继续简化的玩。

foo

这个得分207，应该不会有更高了吧…当然你如果为了优雅，可以写成下面的，损失一点分数。

fo{2,}

这个204分，不过看起来有条理些。

第二题：

思路和之前差不多的就不写了，浪费篇幅。看到又是同样两个list：

Match all of these…	and none of these…
Mick	Kickapoo
Rick	Nickneven
allocochick	Rickettsiales
backtrick	billsticker
bestick	borickite
candlestick	chickell
counterprick	fickleness
heartsick	finickily
lampwick	kilbrickenite
lick	lickpenny
lungsick	mispickel
potstick	quickfoot
quick	quickhatch
rampick	ricksha
rebrick	rollicking
relick	slapsticky
seasick	snickdrawing
slick	sunstricken
tick	tricklingly
unsick	unlicked
upstick	unnickeled

ick$

这个是不是一眼看出，但是等等，直接写下面这个是不是更简单。

k$

208分，继续下一题。

第三题：

Match all of these…	and none of these…
abac	beam
accede	buoy
adead	canjac
babe	chymia
bead	corah
bebed	cupula
bedad	griece
bedded	hafter
bedead	idic
bedeaf	lucy
caba	martyr
caffa	matron
dace	messrs
dade	mucose
daff	relose
dead	sonly
deed	tegua
deface	threap
faded	towned
faff	widish
feed	yite

^[a-f][^g-z]*[a-f]$

不是太好看，分数也不高，只有191，想想办法，把分数搞高一点吧。

^[a-f][^g-z]*$

何必全部匹配呢，只要中间有不对的就排斥掉嘛…这下有196了，将就一下，下一题吧。

第四题：

Match all of these…	and none of these…
allochirally	anticker
anticovenanting	corundum
barbary	crabcatcher
calelectrical	damnably
entablement	foxtailed
ethanethiol	galvanotactic
froufrou	gummage
furfuryl	gurniad
galagala	hypergoddess
heavyheaded	kashga
linguatuline	nonimitative
mathematic	parsonage
monoammonium	pouchlike
perpera	presumptuously
photophonic	pylar
purpuraceous	rachioparalysis
salpingonasal	scherzando
testes	swayed
trisectrix	unbridledness
undergrounder	unupbraidingly
untaunted	wellside

(\w{2,3})(?:\w+)\1

偷懒的写法貌似只有182分，继续偷偷懒，把分数提高。

(.{2})(?:.+)\1

把范围精简一下，186分，继续下一题吧。

第五题：

Match all of these…	and none of these…
acritan	abba
aesthophysiology	anallagmatic
amphimictical	bassarisk
baruria	chorioallantois
calomorphic	coccomyces
disarmature	commotive
effusive	engrammatic
fluted	glossoscopia
fusoid	hexacoralla
goblinize	hippogriffin
nihilistic	inflammableness
noisefully	otto
picrorhiza	overattached
postarytenoid	saffarid
revolutionize	sarraceniaceae
suprasphanoidal	scillipicrin
suspenseful	tlapallan
tapachula	trillion
transmit	unclassably
unversatile	unfitting
vibetoite	unsmelled
	warrandice

^(?!.*(.)(.)\2\1)

这个和上面的一样，只不过要排除，193分。

第六题：

Match all of these…	and none of these…
civic	arrogatingly
deedeed	camshach
degged	cinnabar
hallah	defendress
kakkak	derivedly
kook	gourmet
level	hamleteer
murdrum	hydroaviation
noon	lophine
redder	nonalcohol
repaper	outslink
retter	pretest
reviver	psalterium
rotator	psorosperm
sexes	scrummage
sooloos	sporous
tebbet	springer
tenet	sunburn
terret	teleoptile
	unstuttering
	womanways

^((.+)(.*){1}(.+)\3\2)$

中规中矩，157分，那么开始精简吧。

^(.).*\1$

这样就提高到171分了。

第七题：

Match all of these…	and none of these…
xx	xxxx
xxx	xxxxxx
xxxxx	xxxxxxxx
xxxxxxx	xxxxxxxxx
xxxxxxxxxxx	xxxxxxxxxx
xxxxxxxxxxxxx	xxxxxxxxxxxx
xxxxxxxxxxxxxxxxx	xxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxx	xxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxx	xxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxx	xxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxx… [31 chars]	xxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxx… [37 chars]	xxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxx… [41 chars]	xxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxx… [43 chars]	xxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxx… [47 chars]	xxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxx… [53 chars]	xxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxx… [59 chars]	xxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxx… [61 chars]	xxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxx… [67 chars]	xxxxxxxxxxxxxx… [30 chars]
xxxxxxxxxxxxxx… [71 chars]	xxxxxxxxxxxxxx… [32 chars]

这道题真卡住了，因为想用重复来实现….其实不必，网站的思路是这个284分：

^(?!(..+)(\1)+$)

还是分组排除，不难发现，绝对不可以被2整除，那么起点就是两个字符（以上），然后double一下，再排除掉。

第八题：

Match all of these…	and none of these…
Makaraka	Ludgate
Wasagara	Mitsukurinidae
degenerescence	Ternstroemiaceae
desilicification	arrhythmical
elevener	bleater
hipponosological	energetics
homoeomorphy	inthrow
homologous	mecopterous
ileocolotomy	multum
intervisibility	naphthalene
jararaca	nullibicity
locomotory	observancy
micropoikilitic	overpunishment
odontonosology	overregularly
parhomologous	overwilily
pogonotomy	participator
promonopolist	predisable
protohomo	reyield
pseudoprimitivism	rubeola
tocororo	traitorlike
unintelligibility	unregainable

间隔字符四次重复，那么就跟着题意写一下吧，198分：

(.).\1.\1.\1

貌似合并一下，可以提高一分，到199了:

(.)(.\1){3}

第九题：

Match all of these…	and none of these…
access	analyse
accloy	balanism
adeem	baronet
aflow	biddable
aglow	griefless
beefin	harebrain
befist	jestword
billot	laicize
bossy	marvelry
certy	oriole
chintz	pickietar
chips	preferee
chort	primness
cloop	pulghere
coost	rebirth
demos	scupper
fitty	serigraph
flory	sororize
flossy	theowman
ghost	unfrayed
mopsy	wagonman

这道题的副标题提示为cheat，然后题目为order，先写一个搞笑的吧，188分。

^[a-m][c-o](?!st|a|dd)

但是很容易发现左边是不是短了点，基本都是五位的吧，把六位的那几个没选中的扔编辑器中，列编辑去掉前五位，结果是s, y, n, t, t, z, y, 去重就是synzt，分数提到到196分。

^.{5}[synzt]*$

第十题：

Match all of these…	and none of these…
000000000	000000005
000000003	000000008
000000006	000000010
000000009	000000011
000000012	000000014
000000015	018990130
066990060	112057285
140091876	159747125
173655750	176950268
312440187	259108903
321769005	333162608
368542278	388401457
390259104	477848777
402223947	478621693
443512431	531683939
714541758	704168662
747289572	759282218
819148602	769340942
878531775	851936815
905586303	973816159
953734824	979204403

作者在这题写了一句，少年们你们做完这道题去尝试解一下7倍的吧！卧槽，这个作者的喜好好变态。

我觉得这道题应该用重复做，但是感觉实现起来好长，那么用偷懒的方法吧，写段脚本辅助分析一下：

//把左边的数据存一下
var data = [
    '000000000',
    '000000003',
    '000000006',
    '000000009',
    '000000012',
    '000000015',
    '066990060',
    '140091876',
    '173655750',
    '312440187',
    '321769005',
    '368542278',
    '390259104',
    '402223947',
    '443512431',
    '714541758',
    '747289572',
    '819148602',
    '878531775',
    '905586303',
    '953734824'
];
//把字符串切片，算出现次数
var ret = {};
for (var i = 0, j = data.length; i < j; i++) {
    var item = data[i];
    for (var m = 0, n = data[i].length; m < n; m++) {
        for (var p = m + 1, q = data[i].length; p < q; p++) {
            var key = item.slice(m, p);
            if (ret[key]) {
                ret[key] += 1;
            } else {
                ret[key] = 1;
            }
        }
    }
}
//把出现一次的屌丝们干掉
data.length=0;
for (var i in ret) {
    if (ret[i] === 1) {
        delete ret[i]
    }
}

console.log(JSON.stringify(ret))

把结果超过10次的还有共性太大的01系干掉后，结果如下：

{
    "12": 2,
    "14": 3,
    "17": 4,
    "18": 2,
    "22": 3,
    "24": 2,
    "31": 2,
    "36": 2,
    "39": 2,
    "40": 3,
    "43": 2,
    "44": 2,
    "48": 2,
    "53": 2,
    "54": 2,
    "55": 2,
    "57": 2,
    "69": 2,
    "73": 2,
    "75": 2,
    "85": 2,
    "86": 2,
    "87": 2,
    "90": 4,
    "91": 3,
    "95": 2,
    "124": 2,
    "900": 2,
    "06": 2,
    "02": 2
}

经过尝试，貌似不妥，换一个思路，统计间隔字符出现频率。大概实现如下：

var data = [
    '000000000',
    '000000003',
    '000000006',
    '000000009',
    '000000012',
    '000000015',
    '066990060',
    '140091876',
    '173655750',
    '312440187',
    '321769005',
    '368542278',
    '390259104',
    '402223947',
    '443512431',
    '714541758',
    '747289572',
    '819148602',
    '878531775',
    '905586303',
    '953734824'
];
//把字符串切片，算出现次数
var ret = {};
for (var i = 0, j = data.length; i < j; i++) {
    var item = data[i];
    ret[item] = [];
    for (var m = 0, n = 9; m < n; m++) {
        for (var p = 0, q = 9; p < q; p++) {
            var left = item.indexOf(m);
            var right = item.lastIndexOf(p);

            if(left==-1||right==-1||left==right){
                break;
            }
            console.log(m,p,item)
            var tmp = {}
            tmp[m] = p;
            ret[item].push(tmp)
        }
    }
}

console.log(JSON.stringify(ret))

上面两种比较笨的思路行不通,那么我们用观察的好了，开始的一部分以及第七个,第八个其实都有包含00,所以或许我们把前八个抽象更好.

00($|3|6|9|12|15)

接着我们砍掉前八个,就只匹配后面的几个试一试.

00($|3|6|9|12|15)|36.5|3.?2|[79]14|8.3|02.*4+|3.*4[38]|9572

这条写的稀烂，但是分数是571…应该有更多更好的组合方式吧。

十一题：

Match all of these…	and none of these…
err matches superreform	anapaestical matches anapaestically
falleess matches unfallenness	*chegonio matches archegoniophore
illog* matches unphilological	dissoluti matches dissolutional
plentud* matches overplenitude	*domestica matches domesticality
taiodi matches pentaiodide	*expedition matches expedition
*viceberry matches serviceberry	*hormog matches hormogonium
bowdl* matches bowdlerism	stipular matches infrastipular
bronhopleursy matches bronchopleurisy	*strabis matches strabismal
chromatophobia matches chromatophobia	cathartica matches cathartically
cockneyla* matches cockneyland	di matches gerundively
colorlessly matches colorlessly	hacean matches zoanthacean
cretefaction matches cretefaction	headmist matches headmistress
downrightly matches downrightly	herwi matches trencherwise
leather* matches leatherbark	iemphraxia matches cardiemphraxia
mitogenet* matches mitogenetic	kmak matches packmaking
palindrom* matches palindromic	mbable* matches unclimbable
parallelepiped matches parallelepiped	nspi*tor matches inspirator
primigenial matches primigenial	ocumidi matches pseudocumidine
puppe* matches puppetlike	raretinal* matches intraretinal
resurrender matches resurrender	tte matches whitterick
wreathwi* matches wreathwise	uefoliate matches quinquefoliate

ctrl+f一下*，发现两边都有没有*的字符串，并且数量都不一定成双，所以这道题虽然名曰glob，但是却不是考察*，如果非要说考察，看的是转义了。我们来实现一下他的规则吧，首先是完整匹配：

^(.*)(?!\*)\sm.*\s\1$

接着是单词尾部带*：

^(.*)\*\sm.*\s\1.*$

然后是两边都带*的:

^\*(.+)\*\sm.*\s.+\1.+$

中间包含多个*的：

^\*(.+)\*(.+)\*\sm.*\s.+\1.+\2.+$

首尾才有*的：

^\*(.*)\*\sm.*\s.+\1.+$

上面几条规则简化以后大概是这样的：

`^(.+)(?!*).+\s\1$ ^(.+)*\s.+\s\1.+$ ^*(.+)*\s.+\s.+\1.+$ ^*(.+)*(.+)*\s.*\s.+\1.+\2.+$ ^*(.+)*\s.*\s.+\1.+$\`

去重和化简合并后,检查有没有漏的后，就是这个样子了,322分：

^((.+)(?!\*).+\s\2$|(.+)\*\s.+\s\3.+$|\*(.+)\*.+\s.+\4.+$|\*(.+)(?!\*)\s.+\s.+\5$|(?!\*)(.*\*){2})

十二题：

Match all of these…	and none of these…
<<<<<>><<>>><<… [62 chars]	<
<<<<<>><>><<><… [110 chars]	<<<<<<>>><<><>>>>>><<>
<<<<<>><>><>><… [102 chars]	<<<<<>>><>>><<<>>>><>>
<<<<<>><>>>><<… [88 chars]	<<<<<>>>>>>
<<<<<>>><<<>><… [58 chars]	<<<<>><<<<<><<>><><<<<
<<<<<>>><<><>>… [152 chars]	<<<>><<<<><><><><
<<<<<>>><<>><<… [42 chars]	<<<>>>><><<<><>
<<<<<>>><>><<<>>>><<>>	<<><<<<><<><<>>><<
<<<<<>>>><<<<>… [102 chars]	<<><<<>>>>><<
<<<<<>>>><<<><… [30 chars]	<<>>><<<>>
<<<<<>>>><><<<… [66 chars]	<><<<>><<>>><<>
<<<<<>>>><><<<… [124 chars]	<><<>>><<<><>><<<>>><<>>>><
<<<<<>>>><>><<>>	<><<>>><><<<>
<<<<><<>>><<<>… [34 chars]	<><>><>>><><<<… [36 chars]
<<<<>><<<>>>><… [92 chars]	<>><><<<><>
<<<<>>><<<<>><>><<<>>>>>	<>>>>>><<<>><<>><><
<<<<>>><<<><<>>><><<>>>><<>>	<>>>>>>><<<
<<<<>>><<><<<>… [84 chars]	>
<<<<>>>><<<><<… [52 chars]	><
<<<><<<>>>><<<… [50 chars]	><<<>><><<<><<
<<<><<><>>>>	><<<>>>><><<<<><>>><<><><<
<<<><>><<<>>>>	><<><<<<><<<<>>>><
<<<>><<<><<>>>… [44 chars]	><><><<<>>>>>
<<<>><><<<><>>… [48 chars]	><><>>><>><>
<<<>>><<><<<<>>>><<><<<>>>>>	><><>>>><>>>>>>><>>><>>
<<><<<<>><>>>>… [60 chars]	><>><<<<<>>
<<>>	><>><><><<>><<>>><<
<<>><<<<<>>>>>… [54 chars]	><>>><>>>>><<><<<><>><>><<<
<<>><<<<>><<<>… [74 chars]	>><<<><<<<<<><>><<
<>	>><>>><<<><>>><><<>><<><><<
<><>	>>>><>><>>>><>>><>><><
	>>>>><<<>>>

老外君给了一句提示:虽然看起来不可能，但是数量有限。那么就一定是写死一定数量的某东西了… 通过观察发现，复杂的包含简单的，所以我们匹配到简单的就成：这里继续是参考老外童鞋的思路，外层写死7层，为什么是7，因为基数是2层，就是最里面的是2，然后右边用*忽略多出来的层数，这个分数是286。

^(<(<(<(<(<(<<>>)*>)*>)*>)*>)*>)*$

十三题：

Match all of these…	and none of these…
x	xxx
xx	xxxxx
xxxx	xxxxxxx
xxxxxxxx	xxxxxxxxxxx
xxxxxxxxxxxxxxxx	xxxxxxxxxxxxx
xxxxxxxxxxxxxx… [32 chars]	xxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxx… [64 chars]	xxxxxxxxxxxxxx… [48 chars]
xxxxxxxxxxxxxx… [128 chars]	xxxxxxxxxxxxxx… [160 chars]
xxxxxxxxxxxxxx… [256 chars]	xxxxxxxxxxxxxx… [401 chars]
xxxxxxxxxxxxxx… [512 chars]	xxxxxxxxxxxxxx… [600 chars]
xxxxxxxxxxxxxx… [1024 chars]	xxxxxxxxxxxxxx… [1025 chars]

先写一个最笨的方法：

^(x|xx|x{4}|x{8}|x{16}|x{32}|x{64}|x{128}|x{256}|x{512}|x{1024})$

分数太低了，我们改进一下（初中的分段函数），61分：

^((x{32}){6,32}|(x{32}){1,4}|(x{4}){1,6}|x{1,2})$

十四题：

Match “all” of these…	and none of these…
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111	0000 0001 0000 0011 0100 0101 0110 0110 1000 1001 1010 1011 1100 1101 1110 1111
	0000 0001 0010 0010 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
	0000 0001 0010 0011 0000 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
	0000 0001 0010 0011 0100 0101 0110 0011 1000 1001 1010 1011 1100 1101 1110 1111
	0000 0001 0010 0011 0100 0101 0110 0111 1000 0001 1010 1011 1100 1101 1110 1111
	0000 0001 0010 0011 0100 0101 0110 0111 1000 1000 1010 1011 1100 1101 1110 1111
	0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1001 1100 1101 1110 1111
	0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 101011011 1100 1101 1110 1111
	0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1110 1011 1100 1101 1110 1111
	0000 0001 0010 0011 0100 0101 0110 0111 1010 1001 1010 1011 1100 1101 1110 1111
	0000 0001 0010 0011 0100 0101 0110 011111000 1001 1010 1011 1100 1101 1110 1111
	0000 0001 0010 0011 0100 0101 0111 0111 1000 1001 1010 1011 1100 1101 1110 1111
	0000 0001 0010 0011 0100 0101 1110 0111 1000 1001 1010 1011 1100 1101 1110 1111
	0000 0001 0010 0011 0100 0111 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
	0000 0001 0010 0011 010010101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
	0000 0001 0010 1011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
	0000 000110010 0011 0100 0101 0110 0111 1000 1000 1010 1011 1100 1101 1110 1111
	0000 0101 0010 0011 0100 0101 0110 0111 1000 1001 1010 1010 1100 1101 1110 1111
	0001 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1100 1110 1111
	1000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1110

这道题老外疏忽了，我们可以这样（191分）：

0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111

是不是很无赖，仔细看以下，发现是二进制进位，先写了一个排重的语句，发现这货其实并不是所有的不能通过的例子都是重复的，还是要考虑顺序，于是：

砥砺