看到微博有人转了一个正则表达式测试题,于是果断拿来玩玩,记录一下解题思路。

  • 地址:http://regex.alf.nu/

第一题:

Match all of these… and none of these…
afoot Atlas
catfoot Aymoro
dogfoot Iberic
fanfoot Mahran
foody Ormazd
foolery Silipan
foolish altared
fooster chandoo
footage crenel
foothot crooked
footle fardo
footpad folksy
footway forest
hotfoot hebamic
jawfoot idgah
mafoo manlike
nonfood marly
padfoot palazzi
prefool sixfold
sfoot tarrock
unfool unfold

根据习惯,脑补了第一个答案是:

^\w*foo\w*$

原因很简单,想写严谨,第一列元素都是小写字母开头,以及小写字母结束,中间包含foo的字符串。不过这个复杂度略高,得分只有199,我们来简化的玩。

\w*foo\w*

去掉匹配开头和结尾,得分高了2分,变为201,看来得分和复杂度果然是正相关啊。继续简化的玩。

foo

这个得分207,应该不会有更高了吧…当然你如果为了优雅,可以写成下面的,损失一点分数。

fo{2,}

这个204分,不过看起来有条理些。  

第二题:

思路和之前差不多的就不写了,浪费篇幅。 看到又是同样两个list:

Match all of these… and none of these…
Mick Kickapoo
Rick Nickneven
allocochick Rickettsiales
backtrick billsticker
bestick borickite
candlestick chickell
counterprick fickleness
heartsick finickily
lampwick kilbrickenite
lick lickpenny
lungsick mispickel
potstick quickfoot
quick quickhatch
rampick ricksha
rebrick rollicking
relick slapsticky
seasick snickdrawing
slick sunstricken
tick tricklingly
unsick unlicked
upstick unnickeled

ick$

这个是不是一眼看出,但是等等,直接写下面这个是不是更简单。

k$

208分,继续下一题。

第三题:

Match all of these… and none of these…
abac beam
accede buoy
adead canjac
babe chymia
bead corah
bebed cupula
bedad griece
bedded hafter
bedead idic
bedeaf lucy
caba martyr
caffa matron
dace messrs
dade mucose
daff relose
dead sonly
deed tegua
deface threap
faded towned
faff widish
feed yite

^[a-f][^g-z]*[a-f]$

不是太好看,分数也不高,只有191,想想办法,把分数搞高一点吧。

^[a-f][^g-z]*$

何必全部匹配呢,只要中间有不对的就排斥掉嘛…这下有196了,将就一下,下一题吧。

第四题:

Match all of these… and none of these…
allochirally anticker
anticovenanting corundum
barbary crabcatcher
calelectrical damnably
entablement foxtailed
ethanethiol galvanotactic
froufrou gummage
furfuryl gurniad
galagala hypergoddess
heavyheaded kashga
linguatuline nonimitative
mathematic parsonage
monoammonium pouchlike
perpera presumptuously
photophonic pylar
purpuraceous rachioparalysis
salpingonasal scherzando
testes swayed
trisectrix unbridledness
undergrounder unupbraidingly
untaunted wellside

(\w{2,3})(?:\w+)\1

偷懒的写法貌似只有182分,继续偷偷懒,把分数提高。

(.{2})(?:.+)\1

把范围精简一下,186分,继续下一题吧。

第五题:

Match all of these… and none of these…
acritan abba
aesthophysiology anallagmatic
amphimictical bassarisk
baruria chorioallantois
calomorphic coccomyces
disarmature commotive
effusive engrammatic
fluted glossoscopia
fusoid hexacoralla
goblinize hippogriffin
nihilistic inflammableness
noisefully otto
picrorhiza overattached
postarytenoid saffarid
revolutionize sarraceniaceae
suprasphanoidal scillipicrin
suspenseful tlapallan
tapachula trillion
transmit unclassably
unversatile unfitting
vibetoite unsmelled
warrandice

^(?!.*(.)(.)\2\1)

这个和上面的一样,只不过要排除,193分。

第六题:

Match all of these… and none of these…
civic arrogatingly
deedeed camshach
degged cinnabar
hallah defendress
kakkak derivedly
kook gourmet
level hamleteer
murdrum hydroaviation
noon lophine
redder nonalcohol
repaper outslink
retter pretest
reviver psalterium
rotator psorosperm
sexes scrummage
sooloos sporous
tebbet springer
tenet sunburn
terret teleoptile
unstuttering
womanways

^((.+)(.*){1}(.+)\3\2)$

中规中矩,157分,那么开始精简吧。

^(.).*\1$

这样就提高到171分了。

第七题:

Match all of these… and none of these…
xx xxxx
xxx xxxxxx
xxxxx xxxxxxxx
xxxxxxx xxxxxxxxx
xxxxxxxxxxx xxxxxxxxxx
xxxxxxxxxxxxx xxxxxxxxxxxx
xxxxxxxxxxxxxxxxx xxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxx… [31 chars] xxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxx… [37 chars] xxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxx… [41 chars] xxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxx… [43 chars] xxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxx… [47 chars] xxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxx… [53 chars] xxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxx… [59 chars] xxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxx… [61 chars] xxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxx… [67 chars] xxxxxxxxxxxxxx… [30 chars]
xxxxxxxxxxxxxx… [71 chars] xxxxxxxxxxxxxx… [32 chars]

这道题真卡住了,因为想用重复来实现….其实不必,网站的思路是这个284分:

^(?!(..+)(\1)+$)

还是分组排除,不难发现,绝对不可以被2整除,那么起点就是两个字符(以上),然后double一下,再排除掉。

第八题:

Match all of these… and none of these…
Makaraka Ludgate
Wasagara Mitsukurinidae
degenerescence Ternstroemiaceae
desilicification arrhythmical
elevener bleater
hipponosological energetics
homoeomorphy inthrow
homologous mecopterous
ileocolotomy multum
intervisibility naphthalene
jararaca nullibicity
locomotory observancy
micropoikilitic overpunishment
odontonosology overregularly
parhomologous overwilily
pogonotomy participator
promonopolist predisable
protohomo reyield
pseudoprimitivism rubeola
tocororo traitorlike
unintelligibility unregainable

间隔字符四次重复,那么就跟着题意写一下吧,198分:

(.).\1.\1.\1

貌似合并一下,可以提高一分,到199了:

(.)(.\1){3}

第九题:

Match all of these… and none of these…
access analyse
accloy balanism
adeem baronet
aflow biddable
aglow griefless
beefin harebrain
befist jestword
billot laicize
bossy marvelry
certy oriole
chintz pickietar
chips preferee
chort primness
cloop pulghere
coost rebirth
demos scupper
fitty serigraph
flory sororize
flossy theowman
ghost unfrayed
mopsy wagonman

这道题的副标题提示为cheat,然后题目为order,先写一个搞笑的吧,188分。

^[a-m][c-o](?!st|a|dd)

但是很容易发现左边是不是短了点,基本都是五位的吧,把六位的那几个没选中的扔编辑器中,列编辑去掉前五位,结果是s, y, n, t, t, z, y, 去重就是synzt,分数提到到196分。

^.{5}[synzt]*$

 第十题:

Match all of these… and none of these…
000000000 000000005
000000003 000000008
000000006 000000010
000000009 000000011
000000012 000000014
000000015 018990130
066990060 112057285
140091876 159747125
173655750 176950268
312440187 259108903
321769005 333162608
368542278 388401457
390259104 477848777
402223947 478621693
443512431 531683939
714541758 704168662
747289572 759282218
819148602 769340942
878531775 851936815
905586303 973816159
953734824 979204403

作者在这题写了一句,少年们你们做完这道题去尝试解一下7倍的吧!卧槽,这个作者的喜好好变态。

我觉得这道题应该用重复做,但是感觉实现起来好长,那么用偷懒的方法吧,写段脚本辅助分析一下:

//把左边的数据存一下
var data = [
    '000000000',
    '000000003',
    '000000006',
    '000000009',
    '000000012',
    '000000015',
    '066990060',
    '140091876',
    '173655750',
    '312440187',
    '321769005',
    '368542278',
    '390259104',
    '402223947',
    '443512431',
    '714541758',
    '747289572',
    '819148602',
    '878531775',
    '905586303',
    '953734824'
];
//把字符串切片,算出现次数
var ret = {};
for (var i = 0, j = data.length; i < j; i++) {
    var item = data[i];
    for (var m = 0, n = data[i].length; m < n; m++) {
        for (var p = m + 1, q = data[i].length; p < q; p++) {
            var key = item.slice(m, p);
            if (ret[key]) {
                ret[key] += 1;
            } else {
                ret[key] = 1;
            }
        }
    }
}
//把出现一次的屌丝们干掉
data.length=0;
for (var i in ret) {
    if (ret[i] === 1) {
        delete ret[i]
    }
}

console.log(JSON.stringify(ret))

把结果超过10次的还有共性太大的01系干掉后,结果如下:

{
    "12": 2,
    "14": 3,
    "17": 4,
    "18": 2,
    "22": 3,
    "24": 2,
    "31": 2,
    "36": 2,
    "39": 2,
    "40": 3,
    "43": 2,
    "44": 2,
    "48": 2,
    "53": 2,
    "54": 2,
    "55": 2,
    "57": 2,
    "69": 2,
    "73": 2,
    "75": 2,
    "85": 2,
    "86": 2,
    "87": 2,
    "90": 4,
    "91": 3,
    "95": 2,
    "124": 2,
    "900": 2,
    "06": 2,
    "02": 2
}

经过尝试,貌似不妥,换一个思路,统计间隔字符出现频率。 大概实现如下:

var data = [
    '000000000',
    '000000003',
    '000000006',
    '000000009',
    '000000012',
    '000000015',
    '066990060',
    '140091876',
    '173655750',
    '312440187',
    '321769005',
    '368542278',
    '390259104',
    '402223947',
    '443512431',
    '714541758',
    '747289572',
    '819148602',
    '878531775',
    '905586303',
    '953734824'
];
//把字符串切片,算出现次数
var ret = {};
for (var i = 0, j = data.length; i < j; i++) {
    var item = data[i];
    ret[item] = [];
    for (var m = 0, n = 9; m < n; m++) {
        for (var p = 0, q = 9; p < q; p++) {
            var left = item.indexOf(m);
            var right = item.lastIndexOf(p);

            if(left==-1||right==-1||left==right){
                break;
            }
            console.log(m,p,item)
            var tmp = {}
            tmp[m] = p;
            ret[item].push(tmp)
        }
    }
}

console.log(JSON.stringify(ret))

上面两种比较笨的思路行不通,那么我们用观察的好了,开始的一部分以及第七个,第八个其实都有包含00,所以或许我们把前八个抽象更好.

00($|3|6|9|12|15)

接着我们砍掉前八个,就只匹配后面的几个试一试.

00($|3|6|9|12|15)|36.5|3.?2|[79]14|8.3|02.*4+|3.*4[38]|9572

这条写的稀烂,但是分数是571…应该有更多更好的组合方式吧。

十一题:

Match all of these… and none of these…
err matches superreform anapaestical matches anapaestically
falleess matches unfallenness *chegonio matches archegoniophore
illog* matches unphilological dissoluti matches dissolutional
plentud* matches overplenitude *domestica matches domesticality
taiodi matches pentaiodide *expedition matches expedition
*viceberry matches serviceberry  *hormog matches hormogonium
bowdl* matches bowdlerism stipular matches infrastipular
bronhopleursy matches bronchopleurisy *strabis matches strabismal
chromatophobia matches chromatophobia cathartica matches cathartically
cockneyla* matches cockneyland di matches gerundively
colorlessly matches colorlessly hacean matches zoanthacean
cretefaction matches cretefaction headmist matches headmistress
downrightly matches downrightly herwi matches trencherwise
leather* matches leatherbark iemphraxia matches cardiemphraxia
mitogenet* matches mitogenetic kmak matches packmaking
palindrom* matches palindromic mbable* matches unclimbable
parallelepiped matches parallelepiped nspi*tor matches inspirator
primigenial matches primigenial ocumidi matches pseudocumidine
puppe* matches puppetlike raretinal* matches intraretinal
resurrender matches resurrender tte matches whitterick
wreathwi* matches wreathwise uefoliate matches quinquefoliate

ctrl+f一下*,发现两边都有没有*的字符串,并且数量都不一定成双,所以这道题虽然名曰glob,但是却不是考察*,如果非要说考察,看的是转义了。 我们来实现一下他的规则吧,首先是完整匹配:

^(.*)(?!\*)\sm.*\s\1$

接着是单词尾部带*:

^(.*)\*\sm.*\s\1.*$

然后是两边都带*的:

^\*(.+)\*\sm.*\s.+\1.+$

中间包含多个*的:

^\*(.+)\*(.+)\*\sm.*\s.+\1.+\2.+$

首尾才有*的:

^\*(.*)\*\sm.*\s.+\1.+$

上面几条规则简化以后大概是这样的:

`^(.+)(?!*).+\s\1$ ^(.+)*\s.+\s\1.+$ ^*(.+)*\s.+\s.+\1.+$ ^*(.+)*(.+)*\s.*\s.+\1.+\2.+$ ^*(.+)*\s.*\s.+\1.+$\`

去重和化简合并后,检查有没有漏的后,就是这个样子了,322分:

^((.+)(?!\*).+\s\2$|(.+)\*\s.+\s\3.+$|\*(.+)\*.+\s.+\4.+$|\*(.+)(?!\*)\s.+\s.+\5$|(?!\*)(.*\*){2})

 十二题:

Match all of these… and none of these…
<<<<<>><<>>><<… [62 chars] <
<<<<<>><>><<><… [110 chars] <<<<<<>>><<><>>>>>><<>
<<<<<>><>><>><… [102 chars] <<<<<>>><>>><<<>>>><>>
<<<<<>><>>>><<… [88 chars] <<<<<>>>>>>
<<<<<>>><<<>><… [58 chars] <<<<>><<<<<><<>><><<<<
<<<<<>>><<><>>… [152 chars] <<<>><<<<><><><><
<<<<<>>><<>><<… [42 chars] <<<>>>><><<<><>
<<<<<>>><>><<<>>>><<>> <<><<<<><<><<>>><<
<<<<<>>>><<<<>… [102 chars] <<><<<>>>>><<
<<<<<>>>><<<><… [30 chars] <<>>><<<>>
<<<<<>>>><><<<… [66 chars] <><<<>><<>>><<>
<<<<<>>>><><<<… [124 chars] <><<>>><<<><>><<<>>><<>>>><
<<<<<>>>><>><<>> <><<>>><><<<>
<<<<><<>>><<<>… [34 chars] <><>><>>><><<<… [36 chars]
<<<<>><<<>>>><… [92 chars] <>><><<<><>
<<<<>>><<<<>><>><<<>>>>> <>>>>>><<<>><<>><><
<<<<>>><<<><<>>><><<>>>><<>> <>>>>>>><<<
<<<<>>><<><<<>… [84 chars] >
<<<<>>>><<<><<… [52 chars] ><
<<<><<<>>>><<<… [50 chars] ><<<>><><<<><<
<<<><<><>>>> ><<<>>>><><<<<><>>><<><><<
<<<><>><<<>>>> ><<><<<<><<<<>>>><
<<<>><<<><<>>>… [44 chars] ><><><<<>>>>>
<<<>><><<<><>>… [48 chars] ><><>>><>><>
<<<>>><<><<<<>>>><<><<<>>>>> ><><>>>><>>>>>>><>>><>>
<<><<<<>><>>>>… [60 chars] ><>><<<<<>>
<<>> ><>><><><<>><<>>><<
<<>><<<<<>>>>>… [54 chars] ><>>><>>>>><<><<<><>><>><<<
<<>><<<<>><<<>… [74 chars] >><<<><<<<<<><>><<
<> >><>>><<<><>>><><<>><<><><<
<><> >>>><>><>>>><>>><>><><
>>>>><<<>>>

老外君给了一句提示:虽然看起来不可能,但是数量有限。那么就一定是写死一定数量的某东西了… 通过观察发现,复杂的包含简单的,所以我们匹配到简单的就成: 这里继续是参考老外童鞋的思路,外层写死7层,为什么是7,因为基数是2层,就是最里面的是2,然后右边用*忽略多出来的层数,这个分数是286。

^(<(<(<(<(<(<<>>)*>)*>)*>)*>)*>)*$

 十三题:

Match all of these… and none of these…
x xxx
xx xxxxx
xxxx xxxxxxx
xxxxxxxx xxxxxxxxxxx
xxxxxxxxxxxxxxxx xxxxxxxxxxxxx
xxxxxxxxxxxxxx… [32 chars] xxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxx… [64 chars] xxxxxxxxxxxxxx… [48 chars]
xxxxxxxxxxxxxx… [128 chars] xxxxxxxxxxxxxx… [160 chars]
xxxxxxxxxxxxxx… [256 chars] xxxxxxxxxxxxxx… [401 chars]
xxxxxxxxxxxxxx… [512 chars] xxxxxxxxxxxxxx… [600 chars]
xxxxxxxxxxxxxx… [1024 chars] xxxxxxxxxxxxxx… [1025 chars]

先写一个最笨的方法:

^(x|xx|x{4}|x{8}|x{16}|x{32}|x{64}|x{128}|x{256}|x{512}|x{1024})$

分数太低了,我们改进一下(初中的分段函数),61分:

^((x{32}){6,32}|(x{32}){1,4}|(x{4}){1,6}|x{1,2})$

 十四题:

Match “all” of these… and none of these…
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111 0000 0001 0000 0011 0100 0101 0110 0110 1000 1001 1010 1011 1100 1101 1110 1111
0000 0001 0010 0010 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
0000 0001 0010 0011 0000 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
0000 0001 0010 0011 0100 0101 0110 0011 1000 1001 1010 1011 1100 1101 1110 1111
0000 0001 0010 0011 0100 0101 0110 0111 1000 0001 1010 1011 1100 1101 1110 1111
0000 0001 0010 0011 0100 0101 0110 0111 1000 1000 1010 1011 1100 1101 1110 1111
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1001 1100 1101 1110 1111
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 101011011 1100 1101 1110 1111
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1110 1011 1100 1101 1110 1111
0000 0001 0010 0011 0100 0101 0110 0111 1010 1001 1010 1011 1100 1101 1110 1111
0000 0001 0010 0011 0100 0101 0110 011111000 1001 1010 1011 1100 1101 1110 1111
0000 0001 0010 0011 0100 0101 0111 0111 1000 1001 1010 1011 1100 1101 1110 1111
0000 0001 0010 0011 0100 0101 1110 0111 1000 1001 1010 1011 1100 1101 1110 1111
0000 0001 0010 0011 0100 0111 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
0000 0001 0010 0011 010010101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
0000 0001 0010 1011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
0000 000110010 0011 0100 0101 0110 0111 1000 1000 1010 1011 1100 1101 1110 1111
0000 0101 0010 0011 0100 0101 0110 0111 1000 1001 1010 1010 1100 1101 1110 1111
0001 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1100 1110 1111
1000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1110

这道题老外疏忽了,我们可以这样(191分):

0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111

是不是很无赖,仔细看以下,发现是二进制进位,先写了一个排重的语句,发现这货其实并不是所有的不能通过的例子都是重复的,还是要考虑顺序,于是: