看到微博有人转了一个正则表达式测试题,于是果断拿来玩玩,记录一下解题思路。
- 地址:http://regex.alf.nu/
第一题:
| Match all of these… | and none of these… |
|---|---|
| afoot | Atlas |
| catfoot | Aymoro |
| dogfoot | Iberic |
| fanfoot | Mahran |
| foody | Ormazd |
| foolery | Silipan |
| foolish | altared |
| fooster | chandoo |
| footage | crenel |
| foothot | crooked |
| footle | fardo |
| footpad | folksy |
| footway | forest |
| hotfoot | hebamic |
| jawfoot | idgah |
| mafoo | manlike |
| nonfood | marly |
| padfoot | palazzi |
| prefool | sixfold |
| sfoot | tarrock |
| unfool | unfold |
根据习惯,脑补了第一个答案是:
^\w*foo\w*$
原因很简单,想写严谨,第一列元素都是小写字母开头,以及小写字母结束,中间包含foo的字符串。不过这个复杂度略高,得分只有199,我们来简化的玩。
\w*foo\w*
去掉匹配开头和结尾,得分高了2分,变为201,看来得分和复杂度果然是正相关啊。继续简化的玩。
foo
这个得分207,应该不会有更高了吧…当然你如果为了优雅,可以写成下面的,损失一点分数。
fo{2,}
这个204分,不过看起来有条理些。
第二题:
思路和之前差不多的就不写了,浪费篇幅。 看到又是同样两个list:
| Match all of these… | and none of these… |
|---|---|
| Mick | Kickapoo |
| Rick | Nickneven |
| allocochick | Rickettsiales |
| backtrick | billsticker |
| bestick | borickite |
| candlestick | chickell |
| counterprick | fickleness |
| heartsick | finickily |
| lampwick | kilbrickenite |
| lick | lickpenny |
| lungsick | mispickel |
| potstick | quickfoot |
| quick | quickhatch |
| rampick | ricksha |
| rebrick | rollicking |
| relick | slapsticky |
| seasick | snickdrawing |
| slick | sunstricken |
| tick | tricklingly |
| unsick | unlicked |
| upstick | unnickeled |
ick$
这个是不是一眼看出,但是等等,直接写下面这个是不是更简单。
k$
208分,继续下一题。
第三题:
| Match all of these… | and none of these… |
|---|---|
| abac | beam |
| accede | buoy |
| adead | canjac |
| babe | chymia |
| bead | corah |
| bebed | cupula |
| bedad | griece |
| bedded | hafter |
| bedead | idic |
| bedeaf | lucy |
| caba | martyr |
| caffa | matron |
| dace | messrs |
| dade | mucose |
| daff | relose |
| dead | sonly |
| deed | tegua |
| deface | threap |
| faded | towned |
| faff | widish |
| feed | yite |
^[a-f][^g-z]*[a-f]$
不是太好看,分数也不高,只有191,想想办法,把分数搞高一点吧。
^[a-f][^g-z]*$
何必全部匹配呢,只要中间有不对的就排斥掉嘛…这下有196了,将就一下,下一题吧。
第四题:
| Match all of these… | and none of these… |
|---|---|
| allochirally | anticker |
| anticovenanting | corundum |
| barbary | crabcatcher |
| calelectrical | damnably |
| entablement | foxtailed |
| ethanethiol | galvanotactic |
| froufrou | gummage |
| furfuryl | gurniad |
| galagala | hypergoddess |
| heavyheaded | kashga |
| linguatuline | nonimitative |
| mathematic | parsonage |
| monoammonium | pouchlike |
| perpera | presumptuously |
| photophonic | pylar |
| purpuraceous | rachioparalysis |
| salpingonasal | scherzando |
| testes | swayed |
| trisectrix | unbridledness |
| undergrounder | unupbraidingly |
| untaunted | wellside |
(\w{2,3})(?:\w+)\1
偷懒的写法貌似只有182分,继续偷偷懒,把分数提高。
(.{2})(?:.+)\1
把范围精简一下,186分,继续下一题吧。
第五题:
| Match all of these… | and none of these… |
|---|---|
| acritan | abba |
| aesthophysiology | anallagmatic |
| amphimictical | bassarisk |
| baruria | chorioallantois |
| calomorphic | coccomyces |
| disarmature | commotive |
| effusive | engrammatic |
| fluted | glossoscopia |
| fusoid | hexacoralla |
| goblinize | hippogriffin |
| nihilistic | inflammableness |
| noisefully | otto |
| picrorhiza | overattached |
| postarytenoid | saffarid |
| revolutionize | sarraceniaceae |
| suprasphanoidal | scillipicrin |
| suspenseful | tlapallan |
| tapachula | trillion |
| transmit | unclassably |
| unversatile | unfitting |
| vibetoite | unsmelled |
| warrandice |
^(?!.*(.)(.)\2\1)
这个和上面的一样,只不过要排除,193分。
第六题:
| Match all of these… | and none of these… |
|---|---|
| civic | arrogatingly |
| deedeed | camshach |
| degged | cinnabar |
| hallah | defendress |
| kakkak | derivedly |
| kook | gourmet |
| level | hamleteer |
| murdrum | hydroaviation |
| noon | lophine |
| redder | nonalcohol |
| repaper | outslink |
| retter | pretest |
| reviver | psalterium |
| rotator | psorosperm |
| sexes | scrummage |
| sooloos | sporous |
| tebbet | springer |
| tenet | sunburn |
| terret | teleoptile |
| unstuttering | |
| womanways |
^((.+)(.*){1}(.+)\3\2)$
中规中矩,157分,那么开始精简吧。
^(.).*\1$
这样就提高到171分了。
第七题:
| Match all of these… | and none of these… |
|---|---|
| xx | xxxx |
| xxx | xxxxxx |
| xxxxx | xxxxxxxx |
| xxxxxxx | xxxxxxxxx |
| xxxxxxxxxxx | xxxxxxxxxx |
| xxxxxxxxxxxxx | xxxxxxxxxxxx |
| xxxxxxxxxxxxxxxxx | xxxxxxxxxxxxxx |
| xxxxxxxxxxxxxxxxxxx | xxxxxxxxxxxxxxx |
| xxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxxxxxxx |
| xxxxxxxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxxxxxxxxx |
| xxxxxxxxxxxxxx… [31 chars] | xxxxxxxxxxxxxxxxxxxx |
| xxxxxxxxxxxxxx… [37 chars] | xxxxxxxxxxxxxxxxxxxxx |
| xxxxxxxxxxxxxx… [41 chars] | xxxxxxxxxxxxxxxxxxxxxx |
| xxxxxxxxxxxxxx… [43 chars] | xxxxxxxxxxxxxxxxxxxxxxxx |
| xxxxxxxxxxxxxx… [47 chars] | xxxxxxxxxxxxxxxxxxxxxxxxx |
| xxxxxxxxxxxxxx… [53 chars] | xxxxxxxxxxxxxxxxxxxxxxxxxx |
| xxxxxxxxxxxxxx… [59 chars] | xxxxxxxxxxxxxxxxxxxxxxxxxxx |
| xxxxxxxxxxxxxx… [61 chars] | xxxxxxxxxxxxxxxxxxxxxxxxxxxx |
| xxxxxxxxxxxxxx… [67 chars] | xxxxxxxxxxxxxx… [30 chars] |
| xxxxxxxxxxxxxx… [71 chars] | xxxxxxxxxxxxxx… [32 chars] |
这道题真卡住了,因为想用重复来实现….其实不必,网站的思路是这个284分:
^(?!(..+)(\1)+$)
还是分组排除,不难发现,绝对不可以被2整除,那么起点就是两个字符(以上),然后double一下,再排除掉。
第八题:
| Match all of these… | and none of these… |
|---|---|
| Makaraka | Ludgate |
| Wasagara | Mitsukurinidae |
| degenerescence | Ternstroemiaceae |
| desilicification | arrhythmical |
| elevener | bleater |
| hipponosological | energetics |
| homoeomorphy | inthrow |
| homologous | mecopterous |
| ileocolotomy | multum |
| intervisibility | naphthalene |
| jararaca | nullibicity |
| locomotory | observancy |
| micropoikilitic | overpunishment |
| odontonosology | overregularly |
| parhomologous | overwilily |
| pogonotomy | participator |
| promonopolist | predisable |
| protohomo | reyield |
| pseudoprimitivism | rubeola |
| tocororo | traitorlike |
| unintelligibility | unregainable |
间隔字符四次重复,那么就跟着题意写一下吧,198分:
(.).\1.\1.\1
貌似合并一下,可以提高一分,到199了:
(.)(.\1){3}
第九题:
| Match all of these… | and none of these… |
|---|---|
| access | analyse |
| accloy | balanism |
| adeem | baronet |
| aflow | biddable |
| aglow | griefless |
| beefin | harebrain |
| befist | jestword |
| billot | laicize |
| bossy | marvelry |
| certy | oriole |
| chintz | pickietar |
| chips | preferee |
| chort | primness |
| cloop | pulghere |
| coost | rebirth |
| demos | scupper |
| fitty | serigraph |
| flory | sororize |
| flossy | theowman |
| ghost | unfrayed |
| mopsy | wagonman |
这道题的副标题提示为cheat,然后题目为order,先写一个搞笑的吧,188分。
^[a-m][c-o](?!st|a|dd)
但是很容易发现左边是不是短了点,基本都是五位的吧,把六位的那几个没选中的扔编辑器中,列编辑去掉前五位,结果是s, y, n, t, t, z, y, 去重就是synzt,分数提到到196分。
^.{5}[synzt]*$
第十题:
| Match all of these… | and none of these… |
|---|---|
| 000000000 | 000000005 |
| 000000003 | 000000008 |
| 000000006 | 000000010 |
| 000000009 | 000000011 |
| 000000012 | 000000014 |
| 000000015 | 018990130 |
| 066990060 | 112057285 |
| 140091876 | 159747125 |
| 173655750 | 176950268 |
| 312440187 | 259108903 |
| 321769005 | 333162608 |
| 368542278 | 388401457 |
| 390259104 | 477848777 |
| 402223947 | 478621693 |
| 443512431 | 531683939 |
| 714541758 | 704168662 |
| 747289572 | 759282218 |
| 819148602 | 769340942 |
| 878531775 | 851936815 |
| 905586303 | 973816159 |
| 953734824 | 979204403 |
作者在这题写了一句,少年们你们做完这道题去尝试解一下7倍的吧!卧槽,这个作者的喜好好变态。
我觉得这道题应该用重复做,但是感觉实现起来好长,那么用偷懒的方法吧,写段脚本辅助分析一下:
//把左边的数据存一下
var data = [
'000000000',
'000000003',
'000000006',
'000000009',
'000000012',
'000000015',
'066990060',
'140091876',
'173655750',
'312440187',
'321769005',
'368542278',
'390259104',
'402223947',
'443512431',
'714541758',
'747289572',
'819148602',
'878531775',
'905586303',
'953734824'
];
//把字符串切片,算出现次数
var ret = {};
for (var i = 0, j = data.length; i < j; i++) {
var item = data[i];
for (var m = 0, n = data[i].length; m < n; m++) {
for (var p = m + 1, q = data[i].length; p < q; p++) {
var key = item.slice(m, p);
if (ret[key]) {
ret[key] += 1;
} else {
ret[key] = 1;
}
}
}
}
//把出现一次的屌丝们干掉
data.length=0;
for (var i in ret) {
if (ret[i] === 1) {
delete ret[i]
}
}
console.log(JSON.stringify(ret))
把结果超过10次的还有共性太大的01系干掉后,结果如下:
{
"12": 2,
"14": 3,
"17": 4,
"18": 2,
"22": 3,
"24": 2,
"31": 2,
"36": 2,
"39": 2,
"40": 3,
"43": 2,
"44": 2,
"48": 2,
"53": 2,
"54": 2,
"55": 2,
"57": 2,
"69": 2,
"73": 2,
"75": 2,
"85": 2,
"86": 2,
"87": 2,
"90": 4,
"91": 3,
"95": 2,
"124": 2,
"900": 2,
"06": 2,
"02": 2
}
经过尝试,貌似不妥,换一个思路,统计间隔字符出现频率。 大概实现如下:
var data = [
'000000000',
'000000003',
'000000006',
'000000009',
'000000012',
'000000015',
'066990060',
'140091876',
'173655750',
'312440187',
'321769005',
'368542278',
'390259104',
'402223947',
'443512431',
'714541758',
'747289572',
'819148602',
'878531775',
'905586303',
'953734824'
];
//把字符串切片,算出现次数
var ret = {};
for (var i = 0, j = data.length; i < j; i++) {
var item = data[i];
ret[item] = [];
for (var m = 0, n = 9; m < n; m++) {
for (var p = 0, q = 9; p < q; p++) {
var left = item.indexOf(m);
var right = item.lastIndexOf(p);
if(left==-1||right==-1||left==right){
break;
}
console.log(m,p,item)
var tmp = {}
tmp[m] = p;
ret[item].push(tmp)
}
}
}
console.log(JSON.stringify(ret))
上面两种比较笨的思路行不通,那么我们用观察的好了,开始的一部分以及第七个,第八个其实都有包含00,所以或许我们把前八个抽象更好.
00($|3|6|9|12|15)
接着我们砍掉前八个,就只匹配后面的几个试一试.
00($|3|6|9|12|15)|36.5|3.?2|[79]14|8.3|02.*4+|3.*4[38]|9572
这条写的稀烂,但是分数是571…应该有更多更好的组合方式吧。
十一题:
| Match all of these… | and none of these… |
|---|---|
| err matches superreform | anapaestical matches anapaestically |
| falleess matches unfallenness | *chegonio matches archegoniophore |
| illog* matches unphilological | dissoluti matches dissolutional |
| plentud* matches overplenitude | *domestica matches domesticality |
| taiodi matches pentaiodide | *expedition matches expedition |
| *viceberry matches serviceberry | *hormog matches hormogonium |
| bowdl* matches bowdlerism | stipular matches infrastipular |
| bronhopleursy matches bronchopleurisy | *strabis matches strabismal |
| chromatophobia matches chromatophobia | cathartica matches cathartically |
| cockneyla* matches cockneyland | di matches gerundively |
| colorlessly matches colorlessly | hacean matches zoanthacean |
| cretefaction matches cretefaction | headmist matches headmistress |
| downrightly matches downrightly | herwi matches trencherwise |
| leather* matches leatherbark | iemphraxia matches cardiemphraxia |
| mitogenet* matches mitogenetic | kmak matches packmaking |
| palindrom* matches palindromic | mbable* matches unclimbable |
| parallelepiped matches parallelepiped | nspi*tor matches inspirator |
| primigenial matches primigenial | ocumidi matches pseudocumidine |
| puppe* matches puppetlike | raretinal* matches intraretinal |
| resurrender matches resurrender | tte matches whitterick |
| wreathwi* matches wreathwise | uefoliate matches quinquefoliate |
ctrl+f一下*,发现两边都有没有*的字符串,并且数量都不一定成双,所以这道题虽然名曰glob,但是却不是考察*,如果非要说考察,看的是转义了。 我们来实现一下他的规则吧,首先是完整匹配:
^(.*)(?!\*)\sm.*\s\1$
接着是单词尾部带*:
^(.*)\*\sm.*\s\1.*$
然后是两边都带*的:
^\*(.+)\*\sm.*\s.+\1.+$
中间包含多个*的:
^\*(.+)\*(.+)\*\sm.*\s.+\1.+\2.+$
首尾才有*的:
^\*(.*)\*\sm.*\s.+\1.+$
上面几条规则简化以后大概是这样的:
`^(.+)(?!*).+\s\1$ ^(.+)*\s.+\s\1.+$ ^*(.+)*\s.+\s.+\1.+$ ^*(.+)*(.+)*\s.*\s.+\1.+\2.+$ ^*(.+)*\s.*\s.+\1.+$\`
去重和化简合并后,检查有没有漏的后,就是这个样子了,322分:
^((.+)(?!\*).+\s\2$|(.+)\*\s.+\s\3.+$|\*(.+)\*.+\s.+\4.+$|\*(.+)(?!\*)\s.+\s.+\5$|(?!\*)(.*\*){2})
十二题:
| Match all of these… | and none of these… |
|---|---|
| <<<<<>><<>>><<… [62 chars] | < |
| <<<<<>><>><<><… [110 chars] | <<<<<<>>><<><>>>>>><<> |
| <<<<<>><>><>><… [102 chars] | <<<<<>>><>>><<<>>>><>> |
| <<<<<>><>>>><<… [88 chars] | <<<<<>>>>>> |
| <<<<<>>><<<>><… [58 chars] | <<<<>><<<<<><<>><><<<< |
| <<<<<>>><<><>>… [152 chars] | <<<>><<<<><><><>< |
| <<<<<>>><<>><<… [42 chars] | <<<>>>><><<<><> |
| <<<<<>>><>><<<>>>><<>> | <<><<<<><<><<>>><< |
| <<<<<>>>><<<<>… [102 chars] | <<><<<>>>>><< |
| <<<<<>>>><<<><… [30 chars] | <<>>><<<>> |
| <<<<<>>>><><<<… [66 chars] | <><<<>><<>>><<> |
| <<<<<>>>><><<<… [124 chars] | <><<>>><<<><>><<<>>><<>>>>< |
| <<<<<>>>><>><<>> | <><<>>><><<<> |
| <<<<><<>>><<<>… [34 chars] | <><>><>>><><<<… [36 chars] |
| <<<<>><<<>>>><… [92 chars] | <>><><<<><> |
| <<<<>>><<<<>><>><<<>>>>> | <>>>>>><<<>><<>><>< |
| <<<<>>><<<><<>>><><<>>>><<>> | <>>>>>>><<< |
| <<<<>>><<><<<>… [84 chars] | > |
| <<<<>>>><<<><<… [52 chars] | >< |
| <<<><<<>>>><<<… [50 chars] | ><<<>><><<<><< |
| <<<><<><>>>> | ><<<>>>><><<<<><>>><<><><< |
| <<<><>><<<>>>> | ><<><<<<><<<<>>>>< |
| <<<>><<<><<>>>… [44 chars] | ><><><<<>>>>> |
| <<<>><><<<><>>… [48 chars] | ><><>>><>><> |
| <<<>>><<><<<<>>>><<><<<>>>>> | ><><>>>><>>>>>>><>>><>> |
| <<><<<<>><>>>>… [60 chars] | ><>><<<<<>> |
| <<>> | ><>><><><<>><<>>><< |
| <<>><<<<<>>>>>… [54 chars] | ><>>><>>>>><<><<<><>><>><<< |
| <<>><<<<>><<<>… [74 chars] | >><<<><<<<<<><>><< |
| <> | >><>>><<<><>>><><<>><<><><< |
| <><> | >>>><>><>>>><>>><>><>< |
| >>>>><<<>>> |
老外君给了一句提示:虽然看起来不可能,但是数量有限。那么就一定是写死一定数量的某东西了… 通过观察发现,复杂的包含简单的,所以我们匹配到简单的就成: 这里继续是参考老外童鞋的思路,外层写死7层,为什么是7,因为基数是2层,就是最里面的是2,然后右边用*忽略多出来的层数,这个分数是286。
^(<(<(<(<(<(<<>>)*>)*>)*>)*>)*>)*$
十三题:
| Match all of these… | and none of these… |
|---|---|
| x | xxx |
| xx | xxxxx |
| xxxx | xxxxxxx |
| xxxxxxxx | xxxxxxxxxxx |
| xxxxxxxxxxxxxxxx | xxxxxxxxxxxxx |
| xxxxxxxxxxxxxx… [32 chars] | xxxxxxxxxxxxxxxxxxxxxxxxxxxx |
| xxxxxxxxxxxxxx… [64 chars] | xxxxxxxxxxxxxx… [48 chars] |
| xxxxxxxxxxxxxx… [128 chars] | xxxxxxxxxxxxxx… [160 chars] |
| xxxxxxxxxxxxxx… [256 chars] | xxxxxxxxxxxxxx… [401 chars] |
| xxxxxxxxxxxxxx… [512 chars] | xxxxxxxxxxxxxx… [600 chars] |
| xxxxxxxxxxxxxx… [1024 chars] | xxxxxxxxxxxxxx… [1025 chars] |
先写一个最笨的方法:
^(x|xx|x{4}|x{8}|x{16}|x{32}|x{64}|x{128}|x{256}|x{512}|x{1024})$
分数太低了,我们改进一下(初中的分段函数),61分:
^((x{32}){6,32}|(x{32}){1,4}|(x{4}){1,6}|x{1,2})$
十四题:
| Match “all” of these… | and none of these… |
|---|---|
| 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111 | 0000 0001 0000 0011 0100 0101 0110 0110 1000 1001 1010 1011 1100 1101 1110 1111 |
| 0000 0001 0010 0010 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111 | |
| 0000 0001 0010 0011 0000 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111 | |
| 0000 0001 0010 0011 0100 0101 0110 0011 1000 1001 1010 1011 1100 1101 1110 1111 | |
| 0000 0001 0010 0011 0100 0101 0110 0111 1000 0001 1010 1011 1100 1101 1110 1111 | |
| 0000 0001 0010 0011 0100 0101 0110 0111 1000 1000 1010 1011 1100 1101 1110 1111 | |
| 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1001 1100 1101 1110 1111 | |
| 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 101011011 1100 1101 1110 1111 | |
| 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1110 1011 1100 1101 1110 1111 | |
| 0000 0001 0010 0011 0100 0101 0110 0111 1010 1001 1010 1011 1100 1101 1110 1111 | |
| 0000 0001 0010 0011 0100 0101 0110 011111000 1001 1010 1011 1100 1101 1110 1111 | |
| 0000 0001 0010 0011 0100 0101 0111 0111 1000 1001 1010 1011 1100 1101 1110 1111 | |
| 0000 0001 0010 0011 0100 0101 1110 0111 1000 1001 1010 1011 1100 1101 1110 1111 | |
| 0000 0001 0010 0011 0100 0111 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111 | |
| 0000 0001 0010 0011 010010101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111 | |
| 0000 0001 0010 1011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111 | |
| 0000 000110010 0011 0100 0101 0110 0111 1000 1000 1010 1011 1100 1101 1110 1111 | |
| 0000 0101 0010 0011 0100 0101 0110 0111 1000 1001 1010 1010 1100 1101 1110 1111 | |
| 0001 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1100 1110 1111 | |
| 1000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1110 |
这道题老外疏忽了,我们可以这样(191分):
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
是不是很无赖,仔细看以下,发现是二进制进位,先写了一个排重的语句,发现这货其实并不是所有的不能通过的例子都是重复的,还是要考虑顺序,于是: