artzub
@artzub
Programmer

Parsing XML with awk?

Hi!


Given:

1. An XML file
<?xml version=1.0?>
<file_events>
  <event date="1254728164000" author="Bin/.svn/entries" filename="f4d64c1a/497b733f81c2866d/81c2866da7e4d268.68" action="D" comment=""/>
  <event date="1254728164000" author="Bin/.svn/entries" filename="51d46ff1/fdb0cf112ec24d1e/2ec24d1e87c7a87a.7a" action="D" comment=""/>
  <event date="1254728164000" author="Bin/.svn/entries" filename="384bccff/ba9fc3f089695f6d/89695f6dea4210c1.c1" action="D" comment=""/>
  <event date="1254728164000" author="Bin/.svn/entries" filename="486c2459/24e0b8e2d1c311d8/d1c311d80290ed01.01" action="D" comment=""/>
  <event date="1254728164000" author="Bin/.svn/entries" filename="415eef3b/1c681c2b8a542c77/8a542c77cb1839ce.ce" action="D" comment=""/>
  <event date="1254728164000" author="Bin/.svn/entries" filename="b3008424/6da995605f28165c/5f28165c84475335.35" action="D" comment=""/>
  <event date="1254728164000" author="Bin/.svn/entries" filename="ff4d0e6d/ea7152595adb7c97/5adb7c97bf59427e.7e" action="D" comment=""/>
</file_events>


Note that the attributes of an event node may appear in any order.


Task:

1. Convert this file to the following format:
date|author|action|filename|comment

2. Optionally, sort the data by the date field.


This is what I currently do:

cat $1 | \
grep -e "event " | \
sed -e "s/^[   ]*//" | \
awk '
  $2 ~ /date/ { p1=$2; }
  $2 ~ /author/ { p2=$2; }
  $2 ~ /action/ { p3=$2; }
  $2 ~ /filename/ { p4=$2; }
  $2 ~ /comment/ { p5=$2; }

  $3 ~ /date/ { p1=$3; }
  $3 ~ /author/ { p2=$3; }
  $3 ~ /action/ { p3=$3; }
  $3 ~ /filename/ { p4=$3; }
  $3 ~ /comment/ { p5=$3; }

  $4 ~ /date/ { p1=$4; }
  $4 ~ /author/ { p2=$4; }
  $4 ~ /action/ { p3=$4; }
  $4 ~ /filename/ { p4=$4; }
  $4 ~ /comment/ { p5=$4; }

  $5 ~ /date/ { p1=$5; }
  $5 ~ /author/ { p2=$5; }
  $5 ~ /action/ { p3=$5; }
  $5 ~ /filename/ { p4=$5; }
  $5 ~ /comment/ { p5=$5; }

  $6 ~ /date/ { p1=$6; }
  $6 ~ /author/ { p2=$6; }
  $6 ~ /action/ { p3=$6; }
  $6 ~ /filename/ { p4=$6; }
  $6 ~ /comment/ { p5=$6; }

  { print p1"|"p2"|"p3"|"p4"|"p5"\n"; } ' | \
sort -t "|" -k1 > $result



The output I get:
date="1254728164000"|author="Bin/.svn/entries"|action="D"|filename="f4d64c1a/497b733f81c2866d/81c2866da7e4d268.68"|comment=""/>
date="1254728164000"|author="Bin/.svn/entries"|action="D"|filename="51d46ff1/fdb0cf112ec24d1e/2ec24d1e87c7a87a.7a"|comment=""/>
date="1254728164000"|author="Bin/.svn/entries"|action="D"|filename="384bccff/ba9fc3f089695f6d/89695f6dea4210c1.c1"|comment=""/>
date="1254728164000"|author="Bin/.svn/entries"|action="D"|filename="486c2459/24e0b8e2d1c311d8/d1c311d80290ed01.01"|comment=""/>
date="1254728164000"|author="Bin/.svn/entries"|action="D"|filename="415eef3b/1c681c2b8a542c77/8a542c77cb1839ce.ce"|comment=""/>
date="1254728164000"|author="Bin/.svn/entries"|action="D"|filename="b3008424/6da995605f28165c/5f28165c84475335.35"|comment=""/>
date="1254728164000"|author="Bin/.svn/entries"|action="D"|filename="ff4d0e6d/ea7152595adb7c97/5adb7c97bf59427e.7e"|comment=""/>
date="1254728164000"|author="Bin/.svn/entries"|action="D"|filename="a0c052d4/b0a0b0c0f70a7d29/f70a7d29231dacbd.bd"|comment=""/>
date="1254728164000"|author="Bin/.svn/entries"|action="D"|filename="eabd8551/ccb2616f5be66fdb/5be66fdb0d4c9a77.77"|comment=""/>
date="1254728164000"|author="Bin/.svn/entries"|action="D"|filename="25046ffa/0dfcd577c31d07d8/c31d07d855ade3e5.e5"|comment=""/>
date="1254728164000"|author="Bin/.svn/entries"|action="D"|filename="cb86925a/bf4f23acb14c6c47/b14c6c474628ff82.82"|comment=""/>
date="1254728164000"|author="Bin/.svn/entries"|action="D"|filename="51d46ff1/fdb0cf112ec24d1e/2ec24d1e87c7a87a.7a"|comment=""/>
Answers: 4
@faust0
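BEGIN {
    # Each <event .../> becomes one record (a multi-character RS needs gawk or mawk)
    RS = "/>"
}

{
    fields = 0
    split("", res)   # clear values left over from the previous record
    for (i = 1; i <= NF; i++) {
        if ($i == "<event") {
            fields = 1
            continue
        }
        if (!fields) continue
        split($i, a, "[=\"]")   # name="value" gives a[1] = name, a[3] = value
        res[a[1]] = a[3]
    }
    # Skip records without an <event tag, such as the trailing </file_events>
    if (fields)
        print res["date"] "|" res["author"] "|" res["action"] "|" res["filename"] "|" res["comment"]
}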
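
A possible variation, not from the answer itself: the script above splits each record on whitespace, so an attribute value containing a space (a real comment, say) would be cut short. The sketch below looks each attribute up by name with match() instead. It assumes one event element per line, as in the sample file, and double-quoted values; the function name events_to_rows is made up for illustration.

```shell
#!/bin/sh
# events_to_rows (hypothetical name): print date|author|action|filename|comment
# for every line that contains an <event ...> element.
# Assumes one event per line and double-quoted attribute values.
events_to_rows() {
    awk '
    # Return the value of attribute "name" found in string s, or "" if absent.
    function attr(s, name,    re) {
        re = "[ \t]" name "=\"[^\"]*\""
        if (!match(s, re)) return ""
        # Skip the leading blank, the name and the two characters =" ,
        # then drop the closing quote.
        return substr(s, RSTART + length(name) + 3, RLENGTH - length(name) - 4)
    }
    /<event[ \t]/ {
        print attr($0, "date") "|" attr($0, "author") "|" attr($0, "action") "|" attr($0, "filename") "|" attr($0, "comment")
    }
    ' "$@"
}
```

For the optional sort by date, something like `events_to_rows events.xml | sort -t '|' -k1,1n > result.txt` should do (both file names are placeholders).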
k12th
@k12th
console.log(`You're pulling my leg, right?`);
Parsing XML with regular expressions is a joyless business.
Try xml-coreutils or XMLStarlet.
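
To illustrate the XMLStarlet suggestion, here is a minimal sketch using its sel subcommand. The wrapper name events_with_xmlstarlet is made up, and on some systems the binary is installed as xml rather than xmlstarlet. Note also that a real XML parser will most likely reject the sample as posted, because the declaration must quote the version value: <?xml version="1.0"?>.

```shell
#!/bin/sh
# events_with_xmlstarlet (hypothetical name): one output row per event element.
# -m iterates over the matching nodes, -v prints an XPath value,
# -o prints a literal string, -n ends the row with a newline.
events_with_xmlstarlet() {
    xmlstarlet sel -t \
        -m '/file_events/event' \
        -v '@date' -o '|' \
        -v '@author' -o '|' \
        -v '@action' -o '|' \
        -v '@filename' -o '|' \
        -v '@comment' -n \
        "$@"
}
```

Attributes are selected by name, so their order inside the element does not matter.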
mitry
@mitry
Maybe use xmllint from libxml2 (I don't know whether it ships with msysgit, but in theory it should) or the Windows msxsl tool, and write an XSLT transformation for this task?
sledopit
@sledopit
Threw this together quickly:
xml2 < 1 | sed 's=/file_events/event[/]*[@]*==;' | awk '/^$/{s++}{printf "%05d %s\n",s,$0}' | sort -k1 -k2rn | sed 's/^[^ ]* //;s/[^=]*=//;s/^$/\&\&\&/' | tr '\n' '|' | sed 's/|&&&|/\n/g'
xml2 comes from the xml2 package. It converts the XML into this form:
/file_events/event
/file_events/event/@date=1254728164000
/file_events/event/@author=Bin/.svn/entries
/file_events/event/@filename=ff4d0e6d/ea7152595adb7c97/5adb7c97bf59427e.7e
/file_events/event/@action=D
/file_events/event/@comment
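
Given that flat form, a single awk pass could replace the sed/sort/tr chain. This is only a sketch under the assumption that xml2 prints lines exactly as in the sample above: a bare /file_events/event line between events, and an attribute with an empty value printed without an = sign. The function name xml2_to_rows is made up.

```shell
#!/bin/sh
# xml2_to_rows (hypothetical name): read xml2 output, print one
# date|author|action|filename|comment row per event.
xml2_to_rows() {
    awk '
    # Print the collected attributes (if any) and reset the state.
    function emit() {
        if (seen)
            print v["date"] "|" v["author"] "|" v["action"] "|" v["filename"] "|" v["comment"]
        split("", v)   # clear the array in a portable way
        seen = 0
    }
    # A bare element path marks the boundary between two events.
    $0 == "/file_events/event" { emit(); next }
    # Attribute lines look like /file_events/event/@name=value
    index($0, "/file_events/event/@") == 1 {
        rest = substr($0, length("/file_events/event/@") + 1)
        eq = index(rest, "=")
        if (eq) v[substr(rest, 1, eq - 1)] = substr(rest, eq + 1)
        else    v[rest] = ""      # empty value, printed without "="
        seen = 1
    }
    END { emit() }
    ' "$@"
}
```

Usage would be along the lines of `xml2 < events.xml | xml2_to_rows` (events.xml is a placeholder).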