看到微博有人转了一个正则表达式测试题,于是果断拿来玩玩,记录一下解题思路。
- 地址:http://regex.alf.nu/
第一题:
Match all of these… | and none of these… |
---|---|
afoot | Atlas |
catfoot | Aymoro |
dogfoot | Iberic |
fanfoot | Mahran |
foody | Ormazd |
foolery | Silipan |
foolish | altared |
fooster | chandoo |
footage | crenel |
foothot | crooked |
footle | fardo |
footpad | folksy |
footway | forest |
hotfoot | hebamic |
jawfoot | idgah |
mafoo | manlike |
nonfood | marly |
padfoot | palazzi |
prefool | sixfold |
sfoot | tarrock |
unfool | unfold |
根据习惯,脑补了第一个答案是:
^\w*foo\w*$
原因很简单,想写严谨,第一列元素都是小写字母开头,以及小写字母结束,中间包含foo的字符串。不过这个复杂度略高,得分只有199,我们来简化的玩。
\w*foo\w*
去掉匹配开头和结尾,得分高了2分,变为201,看来得分和复杂度果然是正相关啊。继续简化的玩。
foo
这个得分207,应该不会有更高了吧…当然你如果为了优雅,可以写成下面的,损失一点分数。
fo{2,}
这个204分,不过看起来有条理些。
第二题:
思路和之前差不多的就不写了,浪费篇幅。 看到又是同样两个list:
Match all of these… | and none of these… |
---|---|
Mick | Kickapoo |
Rick | Nickneven |
allocochick | Rickettsiales |
backtrick | billsticker |
bestick | borickite |
candlestick | chickell |
counterprick | fickleness |
heartsick | finickily |
lampwick | kilbrickenite |
lick | lickpenny |
lungsick | mispickel |
potstick | quickfoot |
quick | quickhatch |
rampick | ricksha |
rebrick | rollicking |
relick | slapsticky |
seasick | snickdrawing |
slick | sunstricken |
tick | tricklingly |
unsick | unlicked |
upstick | unnickeled |
ick$
这个是不是一眼看出,但是等等,直接写下面这个是不是更简单。
k$
208分,继续下一题。
第三题:
Match all of these… | and none of these… |
---|---|
abac | beam |
accede | buoy |
adead | canjac |
babe | chymia |
bead | corah |
bebed | cupula |
bedad | griece |
bedded | hafter |
bedead | idic |
bedeaf | lucy |
caba | martyr |
caffa | matron |
dace | messrs |
dade | mucose |
daff | relose |
dead | sonly |
deed | tegua |
deface | threap |
faded | towned |
faff | widish |
feed | yite |
^[a-f][^g-z]*[a-f]$
不是太好看,分数也不高,只有191,想想办法,把分数搞高一点吧。
^[a-f][^g-z]*$
何必全部匹配呢,只要中间有不对的就排斥掉嘛…这下有196了,将就一下,下一题吧。
第四题:
Match all of these… | and none of these… |
---|---|
allochirally | anticker |
anticovenanting | corundum |
barbary | crabcatcher |
calelectrical | damnably |
entablement | foxtailed |
ethanethiol | galvanotactic |
froufrou | gummage |
furfuryl | gurniad |
galagala | hypergoddess |
heavyheaded | kashga |
linguatuline | nonimitative |
mathematic | parsonage |
monoammonium | pouchlike |
perpera | presumptuously |
photophonic | pylar |
purpuraceous | rachioparalysis |
salpingonasal | scherzando |
testes | swayed |
trisectrix | unbridledness |
undergrounder | unupbraidingly |
untaunted | wellside |
(\w{2,3})(?:\w+)\1
偷懒的写法貌似只有182分,继续偷偷懒,把分数提高。
(.{2})(?:.+)\1
把范围精简一下,186分,继续下一题吧。
第五题:
Match all of these… | and none of these… |
---|---|
acritan | abba |
aesthophysiology | anallagmatic |
amphimictical | bassarisk |
baruria | chorioallantois |
calomorphic | coccomyces |
disarmature | commotive |
effusive | engrammatic |
fluted | glossoscopia |
fusoid | hexacoralla |
goblinize | hippogriffin |
nihilistic | inflammableness |
noisefully | otto |
picrorhiza | overattached |
postarytenoid | saffarid |
revolutionize | sarraceniaceae |
suprasphanoidal | scillipicrin |
suspenseful | tlapallan |
tapachula | trillion |
transmit | unclassably |
unversatile | unfitting |
vibetoite | unsmelled |
warrandice |
^(?!.*(.)(.)\2\1)
这个和上面的一样,只不过要排除,193分。
第六题:
Match all of these… | and none of these… |
---|---|
civic | arrogatingly |
deedeed | camshach |
degged | cinnabar |
hallah | defendress |
kakkak | derivedly |
kook | gourmet |
level | hamleteer |
murdrum | hydroaviation |
noon | lophine |
redder | nonalcohol |
repaper | outslink |
retter | pretest |
reviver | psalterium |
rotator | psorosperm |
sexes | scrummage |
sooloos | sporous |
tebbet | springer |
tenet | sunburn |
terret | teleoptile |
unstuttering | |
womanways |
^((.+)(.*){1}(.+)\3\2)$
中规中矩,157分,那么开始精简吧。
^(.).*\1$
这样就提高到171分了。
第七题:
Match all of these… | and none of these… |
---|---|
xx | xxxx |
xxx | xxxxxx |
xxxxx | xxxxxxxx |
xxxxxxx | xxxxxxxxx |
xxxxxxxxxxx | xxxxxxxxxx |
xxxxxxxxxxxxx | xxxxxxxxxxxx |
xxxxxxxxxxxxxxxxx | xxxxxxxxxxxxxx |
xxxxxxxxxxxxxxxxxxx | xxxxxxxxxxxxxxx |
xxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxxxxxxx |
xxxxxxxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxxxxxxxxx |
xxxxxxxxxxxxxx… [31 chars] | xxxxxxxxxxxxxxxxxxxx |
xxxxxxxxxxxxxx… [37 chars] | xxxxxxxxxxxxxxxxxxxxx |
xxxxxxxxxxxxxx… [41 chars] | xxxxxxxxxxxxxxxxxxxxxx |
xxxxxxxxxxxxxx… [43 chars] | xxxxxxxxxxxxxxxxxxxxxxxx |
xxxxxxxxxxxxxx… [47 chars] | xxxxxxxxxxxxxxxxxxxxxxxxx |
xxxxxxxxxxxxxx… [53 chars] | xxxxxxxxxxxxxxxxxxxxxxxxxx |
xxxxxxxxxxxxxx… [59 chars] | xxxxxxxxxxxxxxxxxxxxxxxxxxx |
xxxxxxxxxxxxxx… [61 chars] | xxxxxxxxxxxxxxxxxxxxxxxxxxxx |
xxxxxxxxxxxxxx… [67 chars] | xxxxxxxxxxxxxx… [30 chars] |
xxxxxxxxxxxxxx… [71 chars] | xxxxxxxxxxxxxx… [32 chars] |
这道题真卡住了,因为想用重复来实现….其实不必,网站的思路是这个284分:
^(?!(..+)(\1)+$)
还是分组排除,不难发现,绝对不可以被2整除,那么起点就是两个字符(以上),然后double一下,再排除掉。
第八题:
Match all of these… | and none of these… |
---|---|
Makaraka | Ludgate |
Wasagara | Mitsukurinidae |
degenerescence | Ternstroemiaceae |
desilicification | arrhythmical |
elevener | bleater |
hipponosological | energetics |
homoeomorphy | inthrow |
homologous | mecopterous |
ileocolotomy | multum |
intervisibility | naphthalene |
jararaca | nullibicity |
locomotory | observancy |
micropoikilitic | overpunishment |
odontonosology | overregularly |
parhomologous | overwilily |
pogonotomy | participator |
promonopolist | predisable |
protohomo | reyield |
pseudoprimitivism | rubeola |
tocororo | traitorlike |
unintelligibility | unregainable |
间隔字符四次重复,那么就跟着题意写一下吧,198分:
(.).\1.\1.\1
貌似合并一下,可以提高一分,到199了:
(.)(.\1){3}
第九题:
Match all of these… | and none of these… |
---|---|
access | analyse |
accloy | balanism |
adeem | baronet |
aflow | biddable |
aglow | griefless |
beefin | harebrain |
befist | jestword |
billot | laicize |
bossy | marvelry |
certy | oriole |
chintz | pickietar |
chips | preferee |
chort | primness |
cloop | pulghere |
coost | rebirth |
demos | scupper |
fitty | serigraph |
flory | sororize |
flossy | theowman |
ghost | unfrayed |
mopsy | wagonman |
这道题的副标题提示为cheat,然后题目为order,先写一个搞笑的吧,188分。
^[a-m][c-o](?!st|a|dd)
但是很容易发现左边是不是短了点,基本都是五位的吧,把六位的那几个没选中的扔编辑器中,列编辑去掉前五位,结果是s, y, n, t, t, z, y, 去重就是synzt,分数提到到196分。
^.{5}[synzt]*$
第十题:
Match all of these… | and none of these… |
---|---|
000000000 | 000000005 |
000000003 | 000000008 |
000000006 | 000000010 |
000000009 | 000000011 |
000000012 | 000000014 |
000000015 | 018990130 |
066990060 | 112057285 |
140091876 | 159747125 |
173655750 | 176950268 |
312440187 | 259108903 |
321769005 | 333162608 |
368542278 | 388401457 |
390259104 | 477848777 |
402223947 | 478621693 |
443512431 | 531683939 |
714541758 | 704168662 |
747289572 | 759282218 |
819148602 | 769340942 |
878531775 | 851936815 |
905586303 | 973816159 |
953734824 | 979204403 |
作者在这题写了一句,少年们你们做完这道题去尝试解一下7倍的吧!卧槽,这个作者的喜好好变态。
我觉得这道题应该用重复做,但是感觉实现起来好长,那么用偷懒的方法吧,写段脚本辅助分析一下:
//把左边的数据存一下
var data = [
'000000000',
'000000003',
'000000006',
'000000009',
'000000012',
'000000015',
'066990060',
'140091876',
'173655750',
'312440187',
'321769005',
'368542278',
'390259104',
'402223947',
'443512431',
'714541758',
'747289572',
'819148602',
'878531775',
'905586303',
'953734824'
];
//把字符串切片,算出现次数
var ret = {};
for (var i = 0, j = data.length; i < j; i++) {
var item = data[i];
for (var m = 0, n = data[i].length; m < n; m++) {
for (var p = m + 1, q = data[i].length; p < q; p++) {
var key = item.slice(m, p);
if (ret[key]) {
ret[key] += 1;
} else {
ret[key] = 1;
}
}
}
}
//把出现一次的屌丝们干掉
data.length=0;
for (var i in ret) {
if (ret[i] === 1) {
delete ret[i]
}
}
console.log(JSON.stringify(ret))
把结果超过10次的还有共性太大的01系干掉后,结果如下:
{
"12": 2,
"14": 3,
"17": 4,
"18": 2,
"22": 3,
"24": 2,
"31": 2,
"36": 2,
"39": 2,
"40": 3,
"43": 2,
"44": 2,
"48": 2,
"53": 2,
"54": 2,
"55": 2,
"57": 2,
"69": 2,
"73": 2,
"75": 2,
"85": 2,
"86": 2,
"87": 2,
"90": 4,
"91": 3,
"95": 2,
"124": 2,
"900": 2,
"06": 2,
"02": 2
}
经过尝试,貌似不妥,换一个思路,统计间隔字符出现频率。 大概实现如下:
var data = [
'000000000',
'000000003',
'000000006',
'000000009',
'000000012',
'000000015',
'066990060',
'140091876',
'173655750',
'312440187',
'321769005',
'368542278',
'390259104',
'402223947',
'443512431',
'714541758',
'747289572',
'819148602',
'878531775',
'905586303',
'953734824'
];
//把字符串切片,算出现次数
var ret = {};
for (var i = 0, j = data.length; i < j; i++) {
var item = data[i];
ret[item] = [];
for (var m = 0, n = 9; m < n; m++) {
for (var p = 0, q = 9; p < q; p++) {
var left = item.indexOf(m);
var right = item.lastIndexOf(p);
if(left==-1||right==-1||left==right){
break;
}
console.log(m,p,item)
var tmp = {}
tmp[m] = p;
ret[item].push(tmp)
}
}
}
console.log(JSON.stringify(ret))
上面两种比较笨的思路行不通,那么我们用观察的好了,开始的一部分以及第七个,第八个其实都有包含00,所以或许我们把前八个抽象更好.
00($|3|6|9|12|15)
接着我们砍掉前八个,就只匹配后面的几个试一试.
00($|3|6|9|12|15)|36.5|3.?2|[79]14|8.3|02.*4+|3.*4[38]|9572
这条写的稀烂,但是分数是571…应该有更多更好的组合方式吧。
十一题:
Match all of these… | and none of these… |
---|---|
err matches superreform | anapaestical matches anapaestically |
falleess matches unfallenness | *chegonio matches archegoniophore |
illog* matches unphilological | dissoluti matches dissolutional |
plentud* matches overplenitude | *domestica matches domesticality |
taiodi matches pentaiodide | *expedition matches expedition |
*viceberry matches serviceberry | *hormog matches hormogonium |
bowdl* matches bowdlerism | stipular matches infrastipular |
bronhopleursy matches bronchopleurisy | *strabis matches strabismal |
chromatophobia matches chromatophobia | cathartica matches cathartically |
cockneyla* matches cockneyland | di matches gerundively |
colorlessly matches colorlessly | hacean matches zoanthacean |
cretefaction matches cretefaction | headmist matches headmistress |
downrightly matches downrightly | herwi matches trencherwise |
leather* matches leatherbark | iemphraxia matches cardiemphraxia |
mitogenet* matches mitogenetic | kmak matches packmaking |
palindrom* matches palindromic | mbable* matches unclimbable |
parallelepiped matches parallelepiped | nspi*tor matches inspirator |
primigenial matches primigenial | ocumidi matches pseudocumidine |
puppe* matches puppetlike | raretinal* matches intraretinal |
resurrender matches resurrender | tte matches whitterick |
wreathwi* matches wreathwise | uefoliate matches quinquefoliate |
ctrl+f一下*,发现两边都有没有*的字符串,并且数量都不一定成双,所以这道题虽然名曰glob,但是却不是考察*,如果非要说考察,看的是转义了。 我们来实现一下他的规则吧,首先是完整匹配:
^(.*)(?!\*)\sm.*\s\1$
接着是单词尾部带*:
^(.*)\*\sm.*\s\1.*$
然后是两边都带*的:
^\*(.+)\*\sm.*\s.+\1.+$
中间包含多个*的:
^\*(.+)\*(.+)\*\sm.*\s.+\1.+\2.+$
首尾才有*的:
^\*(.*)\*\sm.*\s.+\1.+$
上面几条规则简化以后大概是这样的:
`^(.+)(?!*).+\s\1$ ^(.+)*\s.+\s\1.+$ ^*(.+)*\s.+\s.+\1.+$ ^*(.+)*(.+)*\s.*\s.+\1.+\2.+$ ^*(.+)*\s.*\s.+\1.+$\`
去重和化简合并后,检查有没有漏的后,就是这个样子了,322分:
^((.+)(?!\*).+\s\2$|(.+)\*\s.+\s\3.+$|\*(.+)\*.+\s.+\4.+$|\*(.+)(?!\*)\s.+\s.+\5$|(?!\*)(.*\*){2})
十二题:
Match all of these… | and none of these… |
---|---|
<<<<<>><<>>><<… [62 chars] | < |
<<<<<>><>><<><… [110 chars] | <<<<<<>>><<><>>>>>><<> |
<<<<<>><>><>><… [102 chars] | <<<<<>>><>>><<<>>>><>> |
<<<<<>><>>>><<… [88 chars] | <<<<<>>>>>> |
<<<<<>>><<<>><… [58 chars] | <<<<>><<<<<><<>><><<<< |
<<<<<>>><<><>>… [152 chars] | <<<>><<<<><><><>< |
<<<<<>>><<>><<… [42 chars] | <<<>>>><><<<><> |
<<<<<>>><>><<<>>>><<>> | <<><<<<><<><<>>><< |
<<<<<>>>><<<<>… [102 chars] | <<><<<>>>>><< |
<<<<<>>>><<<><… [30 chars] | <<>>><<<>> |
<<<<<>>>><><<<… [66 chars] | <><<<>><<>>><<> |
<<<<<>>>><><<<… [124 chars] | <><<>>><<<><>><<<>>><<>>>>< |
<<<<<>>>><>><<>> | <><<>>><><<<> |
<<<<><<>>><<<>… [34 chars] | <><>><>>><><<<… [36 chars] |
<<<<>><<<>>>><… [92 chars] | <>><><<<><> |
<<<<>>><<<<>><>><<<>>>>> | <>>>>>><<<>><<>><>< |
<<<<>>><<<><<>>><><<>>>><<>> | <>>>>>>><<< |
<<<<>>><<><<<>… [84 chars] | > |
<<<<>>>><<<><<… [52 chars] | >< |
<<<><<<>>>><<<… [50 chars] | ><<<>><><<<><< |
<<<><<><>>>> | ><<<>>>><><<<<><>>><<><><< |
<<<><>><<<>>>> | ><<><<<<><<<<>>>>< |
<<<>><<<><<>>>… [44 chars] | ><><><<<>>>>> |
<<<>><><<<><>>… [48 chars] | ><><>>><>><> |
<<<>>><<><<<<>>>><<><<<>>>>> | ><><>>>><>>>>>>><>>><>> |
<<><<<<>><>>>>… [60 chars] | ><>><<<<<>> |
<<>> | ><>><><><<>><<>>><< |
<<>><<<<<>>>>>… [54 chars] | ><>>><>>>>><<><<<><>><>><<< |
<<>><<<<>><<<>… [74 chars] | >><<<><<<<<<><>><< |
<> | >><>>><<<><>>><><<>><<><><< |
<><> | >>>><>><>>>><>>><>><>< |
>>>>><<<>>> |
老外君给了一句提示:虽然看起来不可能,但是数量有限。那么就一定是写死一定数量的某东西了… 通过观察发现,复杂的包含简单的,所以我们匹配到简单的就成: 这里继续是参考老外童鞋的思路,外层写死7层,为什么是7,因为基数是2层,就是最里面的是2,然后右边用*忽略多出来的层数,这个分数是286。
^(<(<(<(<(<(<<>>)*>)*>)*>)*>)*>)*$
十三题:
Match all of these… | and none of these… |
---|---|
x | xxx |
xx | xxxxx |
xxxx | xxxxxxx |
xxxxxxxx | xxxxxxxxxxx |
xxxxxxxxxxxxxxxx | xxxxxxxxxxxxx |
xxxxxxxxxxxxxx… [32 chars] | xxxxxxxxxxxxxxxxxxxxxxxxxxxx |
xxxxxxxxxxxxxx… [64 chars] | xxxxxxxxxxxxxx… [48 chars] |
xxxxxxxxxxxxxx… [128 chars] | xxxxxxxxxxxxxx… [160 chars] |
xxxxxxxxxxxxxx… [256 chars] | xxxxxxxxxxxxxx… [401 chars] |
xxxxxxxxxxxxxx… [512 chars] | xxxxxxxxxxxxxx… [600 chars] |
xxxxxxxxxxxxxx… [1024 chars] | xxxxxxxxxxxxxx… [1025 chars] |
先写一个最笨的方法:
^(x|xx|x{4}|x{8}|x{16}|x{32}|x{64}|x{128}|x{256}|x{512}|x{1024})$
分数太低了,我们改进一下(初中的分段函数),61分:
^((x{32}){6,32}|(x{32}){1,4}|(x{4}){1,6}|x{1,2})$
十四题:
Match “all” of these… | and none of these… |
---|---|
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111 | 0000 0001 0000 0011 0100 0101 0110 0110 1000 1001 1010 1011 1100 1101 1110 1111 |
0000 0001 0010 0010 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111 | |
0000 0001 0010 0011 0000 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111 | |
0000 0001 0010 0011 0100 0101 0110 0011 1000 1001 1010 1011 1100 1101 1110 1111 | |
0000 0001 0010 0011 0100 0101 0110 0111 1000 0001 1010 1011 1100 1101 1110 1111 | |
0000 0001 0010 0011 0100 0101 0110 0111 1000 1000 1010 1011 1100 1101 1110 1111 | |
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1001 1100 1101 1110 1111 | |
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 101011011 1100 1101 1110 1111 | |
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1110 1011 1100 1101 1110 1111 | |
0000 0001 0010 0011 0100 0101 0110 0111 1010 1001 1010 1011 1100 1101 1110 1111 | |
0000 0001 0010 0011 0100 0101 0110 011111000 1001 1010 1011 1100 1101 1110 1111 | |
0000 0001 0010 0011 0100 0101 0111 0111 1000 1001 1010 1011 1100 1101 1110 1111 | |
0000 0001 0010 0011 0100 0101 1110 0111 1000 1001 1010 1011 1100 1101 1110 1111 | |
0000 0001 0010 0011 0100 0111 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111 | |
0000 0001 0010 0011 010010101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111 | |
0000 0001 0010 1011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111 | |
0000 000110010 0011 0100 0101 0110 0111 1000 1000 1010 1011 1100 1101 1110 1111 | |
0000 0101 0010 0011 0100 0101 0110 0111 1000 1001 1010 1010 1100 1101 1110 1111 | |
0001 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1100 1110 1111 | |
1000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1110 |
这道题老外疏忽了,我们可以这样(191分):
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
是不是很无赖,仔细看以下,发现是二进制进位,先写了一个排重的语句,发现这货其实并不是所有的不能通过的例子都是重复的,还是要考虑顺序,于是: