    I couldn't figure out how to get attr here; I'm writing this inside the function

    It throws an error
    OVK2015: Bro, thanks a lot, I've already solved it with php simple dom. It turned out almost perfect)
    OVK2015: I've sorted out the images :)
    please help me do the same with links, because I can't get it to work :)

    I'm more interested in the regular expressions here :)
    again, we know the href and we know that it is an a tag
    OVK2015: it ran without errors :)
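For the link case, here is a rough regex sketch rather than a definitive solution. It assumes PHP 7+, that the known href values are matched exactly, and that the attribute is quoted. The function name `removeLinksByHref` and the `$keepInner` flag are made up for illustration; they do not come from the thread.

```php
<?php
// Hypothetical helper: drops <a> elements whose href equals one of $hrefs.
// With $keepInner = true only the <a ...> and </a> tags are removed and the
// inner content (for example an <img>) stays in place.
function removeLinksByHref(string $html, array $hrefs, bool $keepInner = true): string
{
    if ($hrefs === []) {
        return $html;
    }
    // preg_quote() makes characters such as "." and "?" match literally.
    $quoted = array();
    foreach ($hrefs as $href) {
        $quoted[] = preg_quote($href, '#');
    }
    // <a ... href="X" ...>inner</a>; the lazy (.*?) stops at the first </a>.
    $pattern = '#<a\b[^>]*\bhref\s*=\s*(["\'])(?:' . implode('|', $quoted) . ')\1[^>]*>(.*?)</a>#si';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($keepInner) {
            return $keepInner ? $m[2] : '';
        },
        $html
    );
    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}
```

The pattern relies on `[^>]*`, so it would break on attributes that contain a literal `>`; for markup like that a DOM parser is the safer route.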

    I'm racking my brain over how to build this into the common class; the initArray function in particular confuses me
    inside it, I suppose I'll have to loop and push the values?
    $file = preg_replace_callback("#<img(.*?)>#si", array('Crawler', "imgChecker"), $bodyfile);

    $file = preg_replace_callback("#<img(.*?)>#si", array($this, "imgChecker"), $bodyfile);

    And this doesn't work either
    $file = preg_replace_callback("#<img(.*?)>#si", "imgChecker", $bodyfile);
    I did it like this:

    public function delimg($bodyfile, $url)
    {
        $this->missingImg = array();
        preg_match_all("/<img[^>]+>/i", $bodyfile, $matchimages);
        foreach ($matchimages[0] as $brokenimage) {
            preg_match_all("/src=[\"|''](.*?)[\"|'']/", $brokenimage, $matches);
            foreach ($matches[1] as $scrimage) {
                if (!preg_match("/http:\/\//", $scrimage)) {
                    // note: this fills a local $missingImg, not $this->missingImg
                    $missingImg[] = $scrimage;
                }
            }
        }
        // a named function declared inside a method is defined globally on the
        // first call, so a second call to delimg() tries to declare it again
        function imgChecker($item)
        {
            preg_match("#src=\"(.*?)\"#si", $item[0], $match);
            //return file_exists($match[1]) ? $item[0] : "";
            // $this is not available inside a plain function
            return in_array($match[1], $this->$missingImg) ? "" : $item[0];
        }
        $file = preg_replace_callback("#<img(.*?)>#si", "imgChecker", $bodyfile);
        file_put_contents($this->rootPath.$url, $file);
    }

    Fatal error: Cannot redeclare imgChecker() (previously declared in D:\OpenServer\domains\nt\scandir.php:181) in D:\OpenServer\domains\nt\scandir.php on line 181
    Probably yes :)
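One possible way around the "Cannot redeclare imgChecker()" error is to make the callback a regular method of the class and pass it as `array($this, "imgChecker")`. The sketch below is a simplified assumption, not the actual Crawler class: only the names `Crawler`, `delimg`, `imgChecker` and `missingImg` come from the thread, and here `delimg` takes the list as a parameter and returns the cleaned text instead of writing a file.

```php
<?php
// Simplified sketch; the real class in the thread has more responsibilities.
class Crawler
{
    /** @var string[] src values of images that should be removed */
    private $missingImg = array();

    public function delimg($bodyfile, array $missingImg)
    {
        $this->missingImg = $missingImg;
        // array($this, 'imgChecker') is a callable that points at the method below.
        return preg_replace_callback('#<img\b[^>]*>#si', array($this, 'imgChecker'), $bodyfile);
    }

    // Declared once at class level, so repeated delimg() calls cannot redeclare it.
    public function imgChecker(array $item)
    {
        if (!preg_match('#\bsrc\s*=\s*(["\'])(.*?)\1#si', $item[0], $match)) {
            return $item[0]; // no quoted src attribute: leave the tag untouched
        }
        return in_array($match[2], $this->missingImg, true) ? '' : $item[0];
    }
}
```

Because the list lives in a property, the callback can read it through `$this` without any extra parameter.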

    This is what the whole function looks like now:

    public function delimg($bodyfile, $url)
    {
        $missingImg = array();
        preg_match_all("/<img[^>]+>/i", $bodyfile, $matchimages);
        foreach ($matchimages[0] as $brokenimage) {
            preg_match_all("/src=[\"|''](.*?)[\"|'']/", $brokenimage, $matches);
            foreach ($matches[1] as $scrimage) {
                if (!preg_match("/http:\/\//", $scrimage)) {
                    $missingImg[] = $scrimage;
                }
            }
        }
        // preg_replace_callback() passes only the match array to the callback,
        // so the second parameter $missingImg never receives a value
        function imgChecker($item, $missingImg)
        {
            preg_match("#src=\"(.*?)\"#si", $item[0], $match);
            //return file_exists($match[1]) ? $item[0] : "";
            return in_array($match[1], $missingImg) ? "" : $item[0];
        }
        $file = preg_replace_callback("#<img(.*?)>#si", "imgChecker", $bodyfile);
        file_put_contents($this->rootPath.$url, $file);
    }

    bodyfile is the text of the page, url is its address
    OVK2015: And how do I pass an array into the function?
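On passing the array into the callback: preg_replace_callback() only hands the match to the callback, so one common option is an anonymous function that captures the array with `use`. This is a minimal sketch assuming PHP 7+; the function name `stripListedImages` is hypothetical.

```php
<?php
// Hypothetical helper; $missingImg is the list of src values collected elsewhere.
function stripListedImages(string $html, array $missingImg): string
{
    $result = preg_replace_callback(
        '#<img\b[^>]*>#si',
        // "use" copies $missingImg into the closure's scope.
        function (array $item) use ($missingImg) {
            if (!preg_match('#\bsrc\s*=\s*(["\'])(.*?)\1#si', $item[0], $match)) {
                return $item[0]; // no quoted src attribute: keep the tag
            }
            return in_array($match[2], $missingImg, true) ? '' : $item[0];
        },
        $html
    );
    // null signals a PCRE error; return the input unchanged in that case.
    return $result ?? $html;
}
```

An anonymous function is created as a value each time, so it also avoids the redeclaration problem of a named function nested in a method.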
    missingImg is something I collect with separate code
    OVK2015: I'll test it now and report back :)
    I don't know how that will turn out :)

    Well, for example, here is the listing of one file:

    <html><!-- InstanceBegin template="/Templates/my_home.dwt" codeOutsideHTMLIsLocked="false" -->
    <title>:: www.my-myitkyina.com ::</title>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
    <!-- Fireworks MX Dreamweaver MX target.  Created Sun Oct 17 09:05:10 GMT-0700 (Pacific Standard Time) 2004-->
    <script language="JavaScript">
    function MM_findObj(n, d) { //v4.01
      var p,i,x;  if(!d) d=document; if((p=n.indexOf("?"))>0&&parent.frames.length) {
        d=parent.frames[n.substring(p+1)].document; n=n.substring(0,p);}
      if(!(x=d[n])&&d.all) x=d.all[n]; for (i=0;!x&&i<d.forms.length;i++) x=d.forms[i][n];
      for(i=0;!x&&d.layers&&i<d.layers.length;i++) x=MM_findObj(n,d.layers[i].document);
      if(!x && d.getElementById) x=d.getElementById(n); return x;
    function MM_nbGroup(event, grpName) { //v6.0
    var i,img,nbArr,args=MM_nbGroup.arguments;
      if (event == "init" && args.length > 2) {
        if ((img = MM_findObj(args[2])) != null && !img.MM_init) {
          img.MM_init = true; img.MM_up = args[3]; img.MM_dn = img.src;
          if ((nbArr = document[grpName]) == null) nbArr = document[grpName] = new Array();
          nbArr[nbArr.length] = img;
          for (i=4; i < args.length-1; i+=2) if ((img = MM_findObj(args[i])) != null) {
            if (!img.MM_up) img.MM_up = img.src;
            img.src = img.MM_dn = args[i+1];
            nbArr[nbArr.length] = img;
        } }
      } else if (event == "over") {
        document.MM_nbOver = nbArr = new Array();
        for (i=1; i < args.length-1; i+=3) if ((img = MM_findObj(args[i])) != null) {
          if (!img.MM_up) img.MM_up = img.src;
          img.src = (img.MM_dn && args[i+2]) ? args[i+2] : ((args[i+1])?args[i+1] : img.MM_up);
          nbArr[nbArr.length] = img;
      } else if (event == "out" ) {
        for (i=0; i < document.MM_nbOver.length; i++) { img = document.MM_nbOver[i]; img.src = (img.MM_dn) ? img.MM_dn : img.MM_up; }
      } else if (event == "down") {
        nbArr = document[grpName];
        if (nbArr) for (i=0; i < nbArr.length; i++) { img=nbArr[i]; img.src = img.MM_up; img.MM_dn = 0; }
        document[grpName] = nbArr = new Array();
        for (i=2; i < args.length-1; i+=2) if ((img = MM_findObj(args[i])) != null) {
          if (!img.MM_up) img.MM_up = img.src;
          img.src = img.MM_dn = (args[i+1])? args[i+1] : img.MM_up;
          nbArr[nbArr.length] = img;
      } }
    function MM_preloadImages() { //v3.0
     var d=document; if(d.images){ if(!d.MM_p) d.MM_p=new Array();
       var i,j=d.MM_p.length,a=MM_preloadImages.arguments; for(i=0; i<a.length; i++)
       if (a[i].indexOf("#")!=0){ d.MM_p[j]=new Image; d.MM_p[j++].src=a[i];}}
    <body bgcolor="#CCCCFF" onLoad="MM_preloadImages('/mh_images/home_f2.gif','/mh_images/home_f4.gif','/mh_images/home_f3.gif','/mh_images/myitkyina_f2.gif','/mh_images/myitkyina_f4.gif','/mh_images/myitkyina_f3.gif','/mh_images/memory_f2.gif','/mh_images/memory_f4.gif','/mh_images/memory_f3.gif','/mh_images/mdc_title_f2.gif','/mh_images/mdc_title_f4.gif','/mh_images/mdc_title_f3.gif','/mh_images/friends_f2.gif','/mh_images/friends_f4.gif','/mh_images/friends_f3.gif','/mh_images/family_f2.gif','/mh_images/family_f4.gif','/mh_images/family_f3.gif','/mh_images/photos_f2.gif','/mh_images/photos_f4.gif','/mh_images/photos_f3.gif','/mh_images/journal_f2.gif','/mh_images/journal_f4.gif','/mh_images/journal_f3.gif','/mh_images/links_f2.gif','/mh_images/links_f4.gif','/mh_images/links_f3.gif');">
    <table align="center" border="0" cellpadding="0" cellspacing="0" width="750">
    <!-- fwtable fwsrc="/mh_html/new_mka_template.png" fwbase="/mh_html/new_titlebar_template.dwt.gif" fwstyle="Dreamweaver" fwdocid = "742308039" fwnested="0" -->
       <td><img src="/mh_images/spacer.gif" width="16" height="1" border="0" alt=""></td>
       <td><img src="/mh_images/spacer.gif" width="42" height="1" border="0" alt=""></td>
       <td><img src="/mh_images/spacer.gif" width="22" height="1" border="0" alt=""></td>
       <td><img src="/mh_images/spacer.gif" width="69" height="1" border="0" alt=""></td>
       <td><img src="/mh_images/spacer.gif" width="20" height="1" border="0" alt=""></td>
       <td><img src="/mh_images/spacer.gif" width="58" height="1" border="0" alt=""></td>
       <td><img src="/mh_images/spacer.gif" width="39" height="1" border="0" alt=""></td>
       <td><img src="/mh_images/spacer.gif" width="32" height="1" border="0" alt=""></td>
       <td><img src="/mh_images/spacer.gif" width="45" height="1" border="0" alt=""></td>
       <td><img src="/mh_images/spacer.gif" width="54" height="1" border="0" alt=""></td>
       <td><img src="/mh_images/spacer.gif" width="30" height="1" border="0" alt=""></td>
       <td><img src="/mh_images/spacer.gif" width="48" height="1" border="0" alt=""></td>
       <td><img src="/mh_images/spacer.gif" width="37" height="1" border="0" alt=""></td>
       <td><img src="/mh_images/spacer.gif" width="50" height="1" border="0" alt=""></td>
       <td><img src="/mh_images/spacer.gif" width="31" height="1" border="0" alt=""></td>
       <td><img src="/mh_images/spacer.gif" width="55" height="1" border="0" alt=""></td>
       <td><img src="/mh_images/spacer.gif" width="39" height="1" border="0" alt=""></td>
       <td><img src="/mh_images/spacer.gif" width="38" height="1" border="0" alt=""></td>
       <td><img src="/mh_images/spacer.gif" width="25" height="1" border="0" alt=""></td>
       <td><img src="/mh_images/spacer.gif" width="1" height="1" border="0" alt=""></td>
       <td colspan="19"><img name="new_titlebar_templatedwt_r1_c1" src="/mh_images/new_titlebar_template.dwt_r1_c1.gif" width="750" height="59" border="0" alt=""></td>
       <td><img src="/mh_images/spacer.gif" width="1" height="59" border="0" alt=""></td>
       <td rowspan="3"><img name="new_titlebar_templatedwt_r2_c1" src="/mh_images/new_titlebar_template.dwt_r2_c1.gif" width="16" height="18" border="0" alt=""></td>
       <td><a href="/index.htm" onMouseOut="MM_nbGroup('out');" onMouseOver="MM_nbGroup('over','home','/mh_images/home_f2.gif','/mh_images/home_f4.gif',1);" onClick="MM_nbGroup('down','navbar1','home','/mh_images/home_f3.gif',1);"><img name="home" src="/mh_images/home.gif" width="42" height="11" border="0" alt=""></a></td>
       <td rowspan="3"><img name="new_titlebar_templatedwt_r2_c3" src="/mh_images/new_titlebar_template.dwt_r2_c3.gif" width="22" height="18" border="0" alt=""></td>
       <td rowspan="2"><a href="/mh_html/mka.htm" onMouseOut="MM_nbGroup('out');" onMouseOver="MM_nbGroup('over','myitkyina','/mh_images/myitkyina_f2.gif','/mh_images/myitkyina_f4.gif',1);" onClick="MM_nbGroup('down','navbar1','myitkyina','/mh_images/myitkyina_f3.gif',1);"><img name="myitkyina" src="/mh_images/myitkyina.gif" width="69" height="13" border="0" alt=""></a></td>
       <td rowspan="3"><img name="new_titlebar_templatedwt_r2_c5" src="/mh_images/new_titlebar_template.dwt_r2_c5.gif" width="20" height="18" border="0" alt=""></td>
       <td rowspan="2"><a href="/mh_html/memory.htm" onMouseOut="MM_nbGroup('out');" onMouseOver="MM_nbGroup('over','memory','/mh_images/memory_f2.gif','/mh_images/memory_f4.gif',1);" onClick="MM_nbGroup('down','navbar1','memory','/mh_images/memory_f3.gif',1);"><img name="memory" src="/mh_images/memory.gif" width="58" height="13" border="0" alt=""></a></td>
       <td rowspan="3"><img name="new_titlebar_templatedwt_r2_c7" src="/mh_images/new_titlebar_template.dwt_r2_c7.gif" width="39" height="18" border="0" alt=""></td>
       <td><a href="/mh_html/college.htm" onMouseOut="MM_nbGroup('out');" onMouseOver="MM_nbGroup('over','mdc_title','/mh_images/mdc_title_f2.gif','/mh_images/mdc_title_f4.gif',1);" onClick="MM_nbGroup('down','navbar1','mdc_title','/mh_images/mdc_title_f3.gif',1);"><img name="mdc_title" src="/mh_images/mdc_title.gif" width="32" height="11" border="0" alt=""></a></td>
       <td rowspan="3"><img name="new_titlebar_templatedwt_r2_c9" src="/mh_images/new_titlebar_template.dwt_r2_c9.gif" width="45" height="18" border="0" alt=""></td>
       <td><a href="/mh_html/friends.htm" onMouseOut="MM_nbGroup('out');" onMouseOver="MM_nbGroup('over','friends','/mh_images/friends_f2.gif','/mh_images/friends_f4.gif',1);" onClick="MM_nbGroup('down','navbar1','friends','/mh_images/friends_f3.gif',1);"><img name="friends" src="/mh_images/friends.gif" width="54" height="11" border="0" alt=""></a></td>
       <td rowspan="3"><img name="new_titlebar_templatedwt_r2_c11" src="/mh_images/new_titlebar_template.dwt_r2_c11.gif" width="30" height="18" border="0" alt=""></td>
       <td rowspan="2"><a href="/mh_html/family.htm" onMouseOut="MM_nbGroup('out');" onMouseOver="MM_nbGroup('over','family','/mh_images/family_f2.gif','/mh_images/family_f4.gif',1);" onClick="MM_nbGroup('down','navbar1','family','/mh_images/family_f3.gif',1);"><img name="family" src="/mh_images/family.gif" width="48" height="13" border="0" alt=""></a></td>
       <td rowspan="3"><img name="new_titlebar_templatedwt_r2_c13" src="/mh_images/new_titlebar_template.dwt_r2_c13.gif" width="37" height="18" border="0" alt=""></td>
       <td><a href="/mh_html/album.htm" onMouseOut="MM_nbGroup('out');" onMouseOver="MM_nbGroup('over','photos','/mh_images/photos_f2.gif','/mh_images/photos_f4.gif',1);" onClick="MM_nbGroup('down','navbar1','photos','/mh_images/photos_f3.gif',1);"><img name="photos" src="/mh_images/photos.gif" width="50" height="11" border="0" alt=""></a></td>
       <td rowspan="3"><img name="new_titlebar_templatedwt_r2_c15" src="/mh_images/new_titlebar_template.dwt_r2_c15.gif" width="31" height="18" border="0" alt=""></td>
       <td><a href="/mh_html/journal_home.htm" onMouseOut="MM_nbGroup('out');" onMouseOver="MM_nbGroup('over','journal','/mh_images/journal_f2.gif','/mh_images/journal_f4.gif',1);" onClick="MM_nbGroup('down','navbar1','journal','/mh_images/journal_f3.gif',1);"><img name="journal" src="/mh_images/journal.gif" width="55" height="11" border="0" alt=""></a></td>
       <td rowspan="3"><img name="new_titlebar_templatedwt_r2_c17" src="/mh_images/new_titlebar_template.dwt_r2_c17.gif" width="39" height="18" border="0" alt=""></td>
       <td><a href="/mh_html/links.htm" onMouseOut="MM_nbGroup('out');" onMouseOver="MM_nbGroup('over','links','/mh_images/links_f2.gif','/mh_images/links_f4.gif',1);" onClick="MM_nbGroup('down','navbar1','links','/mh_images/links_f3.gif',1);"><img name="links" src="/mh_images/links.gif" width="38" height="11" border="0" alt=""></a></td>
       <td rowspan="3"><img name="new_titlebar_templatedwt_r2_c19" src="/mh_images/new_titlebar_template.dwt_r2_c19.gif" width="25" height="18" border="0" alt=""></td>
       <td><img src="/mh_images/spacer.gif" width="1" height="11" border="0" alt=""></td>
       <td rowspan="2"><img name="new_titlebar_templatedwt_r3_c2" src="/mh_images/new_titlebar_template.dwt_r3_c2.gif" width="42" height="7" border="0" alt=""></td>
       <td rowspan="2"><img name="new_titlebar_templatedwt_r3_c8" src="/mh_images/new_titlebar_template.dwt_r3_c8.gif" width="32" height="7" border="0" alt=""></td>
       <td rowspan="2"><img name="new_titlebar_templatedwt_r3_c10" src="/mh_images/new_titlebar_template.dwt_r3_c10.gif" width="54" height="7" border="0" alt=""></td>
       <td rowspan="2"><img name="new_titlebar_templatedwt_r3_c14" src="/mh_images/new_titlebar_template.dwt_r3_c14.gif" width="50" height="7" border="0" alt=""></td>
       <td rowspan="2"><img name="new_titlebar_templatedwt_r3_c16" src="/mh_images/new_titlebar_template.dwt_r3_c16.gif" width="55" height="7" border="0" alt=""></td>
       <td rowspan="2"><img name="new_titlebar_templatedwt_r3_c18" src="/mh_images/new_titlebar_template.dwt_r3_c18.gif" width="38" height="7" border="0" alt=""></td>
       <td><img src="/mh_images/spacer.gif" width="1" height="2" border="0" alt=""></td>
       <td><img name="new_titlebar_templatedwt_r4_c4" src="/mh_images/new_titlebar_template.dwt_r4_c4.gif" width="69" height="5" border="0" alt=""></td>
       <td><img name="new_titlebar_templatedwt_r4_c6" src="/mh_images/new_titlebar_template.dwt_r4_c6.gif" width="58" height="5" border="0" alt=""></td>
       <td><img name="new_titlebar_templatedwt_r4_c12" src="/mh_images/new_titlebar_template.dwt_r4_c12.gif" width="48" height="5" border="0" alt=""></td>
       <td><img src="/mh_images/spacer.gif" width="1" height="5" border="0" alt=""></td>
    <table width="748" border="0" align="center" cellpadding="0" bordercolor="#CCCCFF" bgcolor="#FFFFFF">
        <td><!-- InstanceBeginEditable name="mka_space_1204" -->
    <table width="743" border="0" cellpadding="0" bgcolor="#CCCCCC">
              <td width="735"> <a href="/mh_html/J1.htm">&lt;<font face="Verdana, Arial, Helvetica, sans-serif"> 
                Previous</font></a><font face="Verdana, Arial, Helvetica, sans-serif"> 
                | <a href="/mh_html/journal_home.htm">Journal Home</a> | <a href="/mh_html/J2.htm">Next</a></font> 
                <a href="/mh_html/J2.htm">&gt;</a></td>
          <p align="left"><strong><font color="#0000FF" face="Verdana, Arial, Helvetica, sans-serif">Life, 
            Education, and Your Future (December 25, 2002) </font></strong> </p>
          <p><font face="Verdana, Arial, Helvetica, sans-serif">What is the ultimate 
            goal of a human life? Where do we come from, and where do we go? Is it 
            good enough to believe in God or to be religious? I don&#8217;t have answers 
            yet. If you are rich, will you be always happy? I don&#8217;t think so. 
            How about if you are poor, can you be always happy? I don&#8217;t think 
            either. <br>
            But most importantly, I think it is very important to enjoy your life 
            with love and forgiveness.</font></p>
          <p><font face="Verdana, Arial, Helvetica, sans-serif">In today&#8217;s society, 
            education is very important. After enjoying student life for more than 
            20 years, I realized that only education couldn't make you ready for the 
            real world. Becoming a college or university student is great but what 
            you can benefit from school depends on how you learn at school. It is 
            not just studying hard and doing well in exams. Actual learning is beyond 
            that. In fact, life is already continuous process of learning. </font></p>
          <p><font face="Verdana, Arial, Helvetica, sans-serif">You have to go to 
            school to learn certain things, and you can learn others valuable things 
            outside the school. Indeed, you can learn something anywhere. Education 
            is not just for making money. The better purpose should be to help our 
            human- kind to have better, safer, and more peaceful lives.</font></p>
          <p><font face="Verdana, Arial, Helvetica, sans-serif">I have heard many 
            students saying which majors are better than others or this major can 
            make more money than that major or Computer Science is hotter than Engineering. 
            I think all subjects are interesting. The only thing is you have to be 
            interested in what you are studying. You have to understand your passion. 
            Sometimes, it is very hard to find out what you are really interested 
            in. Even career assessment test cannot give you 100% correct answers. 
            A better idea will be to take all general education courses at the beginning 
            of the first year of college, and find out what courses are more interesting 
            for you. Again, just studying hard is not enough. Getting more friends, 
            knowing your teachers, and having work experience at school or outside 
            of the school are very important for you future career plan. </font></p>
          <p><font face="Verdana, Arial, Helvetica, sans-serif">Even though, you do 
            everything right as planned and try your best, things can go wrong in 
            you future. It is natural. Things will not always happen exactly as you 
            plan. Don&#8217;t be upset. Don&#8217;t look at the past to blame yourself 
            or others, and foresee the future. You got to do your best at your present 
            time. You have to enjoy and stay calm at your present time. You have to 
            be ready to face anything you are going to face whether it is good or 
            bad. These are more important than your studying. You need to be patient 
            and stay calm in order to achieve something. When the time comes, you 
            will be fine. </font></p>
          <p><font face="Verdana, Arial, Helvetica, sans-serif">Just remember, <strong>life 
            is beautiful</strong>. </font> <!-- InstanceEndEditable --></td>
    <!-- InstanceEnd --></html>

    We need to remove these images from it:

    1. /mh_images/new_titlebar_template.dwt_r2_c5.gif
    2. /mh_images/spacer.gif
    Sorry, but I really don't understand how this is going to work :(
    I read the manual, but somehow it didn't click :)
    not quite )
    I always know the src of the image. That is what I need to delete by.

    The markup can look like this:
    <img name="new_titlebar_templatedwt_r3_c14" src="/mh_images/new_titlebar_template.dwt_r3_c14.gif" width="50" height="7" border="0" alt="">

    or like this:

    <img src="/mh_images/spacer.gif" width="1" height="1" border="0" alt="">

    Out of all this, the only things known are that it is an img tag and what its src is
    Could you show an example? :)
    I can't work out how to do it
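A possible example for the "known src" case, written as a sketch under assumptions: PHP 7+, a quoted src attribute, and an exact match on the src value. The helper name `removeImgBySrc` is invented for illustration. It builds one pattern from the known src and needs no callback, so it does not care which other attributes surround src.

```php
<?php
// Hypothetical helper: deletes every <img> tag whose src equals $src exactly.
function removeImgBySrc(string $html, string $src): string
{
    // preg_quote() escapes "." and other special characters in the path;
    // "#" is the delimiter, so "/" needs no escaping.
    $pattern = '#<img\b[^>]*\bsrc\s*=\s*(["\'])' . preg_quote($src, '#') . '\1[^>]*>#si';
    return preg_replace($pattern, '', $html) ?? $html;
}

// Both tag shapes from the thread: with and without a name attribute before src.
$html = '<td><img name="new_titlebar_templatedwt_r3_c14" src="/mh_images/new_titlebar_template.dwt_r3_c14.gif" width="50" height="7" border="0" alt=""></td>'
      . '<td><img src="/mh_images/spacer.gif" width="1" height="1" border="0" alt=""></td>';

$cleaned = removeImgBySrc($html, '/mh_images/spacer.gif');
```

Calling it once per known src is the simplest option; for a long list, a single pass with a callback is likely cheaper.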
    That's a mystery too :)
    The files are always different.
    Only the img tag needs to be removed, and only if the image it refers to is physically missing on the server
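The "remove only if the file is physically missing" rule could be sketched roughly as below. Assumptions: PHP 7+, src values are site-root-relative paths like `/mh_images/spacer.gif`, and `$rootPath` plays the role of `$this->rootPath` from the code above. The function name `removeMissingImages` is hypothetical.

```php
<?php
// Hypothetical helper: removes <img> tags whose local file does not exist.
function removeMissingImages(string $html, string $rootPath): string
{
    $result = preg_replace_callback(
        '#<img\b[^>]*>#si',
        function (array $item) use ($rootPath) {
            if (!preg_match('#\bsrc\s*=\s*(["\'])(.*?)\1#si', $item[0], $match)) {
                return $item[0]; // no quoted src attribute: keep the tag
            }
            $src = $match[2];
            if (preg_match('#^(?:https?:)?//#i', $src)) {
                return $item[0]; // external image: cannot be checked on local disk
            }
            // Map the site-relative src onto the local directory tree.
            $path = rtrim($rootPath, '/\\') . '/' . ltrim($src, '/');
            return is_file($path) ? $item[0] : '';
        },
        $html
    );
    return $result ?? $html;
}
```

Only the tag itself is removed; surrounding markup such as the enclosing `<td>` is left as it was.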
    In the middle :(

    Yes, you're right, that line ruins everything, but without it nothing works at all :(