p( 7VdZddlZddlZddlmZdgZejdZejdZejdZ ejdZ ejd Z ejd Z ejd Z ejd Zejd ZejdZejdejZejdejZejdejZejd ZejdZGddejZdS)zA parser for HTML and XHTML.N)unescape HTMLParserz[&<]z &[a-zA-Z#]z%&([a-zA-Z][-.a-zA-Z0-9]*)[^a-zA-Z0-9]z)&#(?:[0-9]+|[xX][0-9a-fA-F]+)[^0-9a-fA-F]z <[a-zA-Z]z z--!?>z-?>z0([a-zA-Z][^\t\n\r\f />]*)(?:[\t\n\r\f ]|/(?!>))*a{ ( (?<=['"\t\n\r\f /])[^\t\n\r\f />][^\t\n\r\f /=>]* # attribute name ) ([\t\n\r\f ]*=[\t\n\r\f ]* # value indicator ('[^']*' # LITA-enclosed value |"[^"]*" # LIT-enclosed value |(?!['"])[^>\t\n\r\f ]* # bare value ) )? (?:[\t\n\r\f ]|/(?!>))* # possibly followed by a space a [a-zA-Z][^\t\n\r\f />]* # tag name [\t\n\r\f /]* # optional whitespace before attribute name (?:(?<=['"\t\n\r\f /])[^\t\n\r\f />][^\t\n\r\f /=>]* # attribute name (?:[\t\n\r\f ]*=[\t\n\r\f ]* # value indicator (?:'[^']*' # LITA-enclosed value |"[^"]*" # LIT-enclosed value |(?!['"])[^>\t\n\r\f ]* # bare value ) )? [\t\n\r\f /]* # possibly followed by a space )* >? aF <[a-zA-Z][^\t\n\r\f />\x00]* # tag name (?:[\s/]* # optional whitespace before attribute name (?:(?<=['"\s/])[^\s/>][^\s/=>]* # attribute name (?:\s*=+\s* # value indicator (?:'[^']*' # LITA-enclosed value |"[^"]*" # LIT-enclosed value |(?!['"])[^>\s]* # bare value ) \s* # possibly followed by a space )?(?:\s|/(?!>))* )* )? \s* # trailing whitespace z#ceZdZdZdZdZddddZdZd Zd Z d Z d Z dd dZ dZ d$dZdZdZd$dZd%dZdZdZdZdZdZdZdZdZdZdZd Zd!Zd"Zd#Z d S)&raEFind tags and other markup and call handler functions. Usage: p = HTMLParser() p.feed(data) ... p.close() Start tags are handled by calling self.handle_starttag() or self.handle_startendtag(); end tags by self.handle_endtag(). The data between tags is passed from the parser to the derived class by calling self.handle_data() with the data as argument (the data may be split up in arbitrary chunks). If convert_charrefs is True the character references are converted automatically to the corresponding Unicode character (and self.handle_data() is no longer split in chunks), otherwise they are passed by calling self.handle_entityref() or self.handle_charref() with the string containing respectively the named or numeric reference as the argument. )scriptstylexmpiframenoembednoframes)textareatitleTF)convert_charrefs scriptingcJ||_||_|dS)azInitialize and reset this instance. If convert_charrefs is true (the default), all character references are automatically converted to the corresponding Unicode characters. If *scripting* is false (the default), the content of the ``noscript`` element is parsed normally; if it's true, it's returned as is without being parsed. N)rrreset)selfrrs 2/opt/alt/python311/lib64/python3.11/html/parser.py__init__zHTMLParser.__init__vs$!1" cd|_d|_t|_d|_d|_d|_tj |dS)z1Reset this instance. Loses all unprocessed data.z???NT) rawdatalasttaginteresting_normal interesting cdata_elem_support_cdata _escapable _markupbase ParserBaserrs rrzHTMLParser.resetsK  -"$$T*****rcN|j|z|_|ddS)zFeed data to the parser. Call this as often as you want, with as little or as much text as you want (may include '\n'). rN)rgoaheadrdatas rfeedzHTMLParser.feeds% |d*  Qrc0|ddS)zHandle any buffered data.N)r$r"s rclosezHTMLParser.closes QrNc|jS)z)Return full source of start tag: '<...>'.)_HTMLParser__starttag_textr"s rget_starttag_textzHTMLParser.get_starttag_texts ##r escapablec||_||_|jdkrtjd|_dS|rB|js;tjd|jztjtjz|_dStjd|jztjtjz|_dS)N plaintextz\Zz&|])z])) lowerrrrecompilerr IGNORECASEASCII)relemr/s rset_cdata_modezHTMLParser.set_cdata_modes**,,# ?k ) )!z%00D     Bt4 B!z*Dt*V*,-*@ B BD    "z*BT_*T*,-*@ B BD   rc:t|_d|_d|_dS)NT)rrrrr"s rclear_cdata_modezHTMLParser.clear_cdata_modes-rc||_dS)aEnable or disable support of the CDATA sections. If enabled, "<[CDATA[" starts a CDATA section which ends with "]]>". If disabled, "<[CDATA[" starts a bogus comments which ends with ">". This method is not called by default. Its purpose is to be called in custom handle_starttag() and handle_endtag() methods, with value that depends on the adjusted current node. See https://html.spec.whatwg.org/multipage/parsing.html#markup-declaration-open-state for details. N)r)rflags r_set_support_cdatazHTMLParser._set_support_cdatas#rc |j}d}t|}||krS|jr}|jsv|d|}|dkrY|dt ||dz }|dkr*tjd ||sn|}n=|j ||}|r| }n |jrn|}||krV|jr2|j r+| t|||n| ||||||}||krn|j}|d|rt"||r||} n|d|r||} n|d|r||} nl|d|r||} nJ|d |r||} n(|d z|ks|r| d|d z} nn#| dkr|snt"||rn|d|r_|d z|kr| dnt0||rnd|||d zdnB|d|rU|}d D]/} || |d zr|t| z}n0|||d z|n|d|r(|jr!|||dzdn|||dzdkr!|||d zdni|d |r!|||d zdn<|d|r!|||d zdntAd|} ||| }n-|d|rtB||}|rq|"d d} |#| |$} |d| d z s| d z } ||| }d||dvr9| |||d z|||d z}nS|d|r5tJ||}|rj|"d } |&| |$} |d| d z s| d z } ||| }tN||}|rX|rU|"||dkr5|$} | |kr|} |||d z}nJ|d z|kr/| d|||d z}n nJd||kS|rr||krl|jr2|j r+| t|||n| ||||||}||d|_dS)Nr<&"z [\t\n\r\f ;]>wOO$A(//;; AAA1uu(3T_3$$Xgacl%;%;<<<<$$WQqS\222q!$$AAvvu +Jz#q!!\ 6%%gq11++A..AAZa(( ))!,,AAZ** **1--AAZa(( a((AAZa((33A66AA!eq[[C[$$S)))AAAq55#))'155H#D!,,Hq5A:: ,,T2222'--gq99? !//! >>>>#FA..H&8&&F&//!<<& !S[[ 0 %&++GAaCEN;;;;#K33 H8K H))'!A#$$-8888 1Q3--//;>>((17777#D!,,H++GAaCDDM::::#D!,,Hwqstt}5555,-FGGGANN1a((D!$$+ 6 gq11  ;;==2.D''--- A%:c1Q3//"Eq!,,Agabbk))((1Q3888 NN1ac22C## 6!33 ;;q>>D))$/// A%:c1Q3//"Eq!,,A"((!445u{{}} ;;!IIKK66 !A NN1a!e44!eq[[$$S)))q!a%00AA5555qw!eez  %1q55$ / /  '!A#,!7!78888  1...q!$$Aqrr{ rc^|j}|||dzdks Jd|||dzdkr||S|||dzdkrM|jrF|d|dz}|d krd S|||d z||d zS|||dzd krF|d |dz}|d krd S|||dz||dzS|||d zdkry|d |d z}|d krd S||dz dkr$|||d z|dz n |||dz||dzS||S)NrFrEz+unexpected call to parse_html_declaration()rHrCrKrIz]]>rrMrJrLrr)zV # #%%a(( ( QqsU^{ * *t/B * UAaC((A1uur   gac1fo . . .q5L QqsU^ ! ! # #{ 2 2LLac**E{{r   WQqSY/ 0 0 07N QqsU^u $ $ S!A#&&A1uurqs|s""!!'!A#qs("34444##GAaCFO444q5L++A.. .rch|j}|d|s Jdt||dz}|s"t||dz}|sdS|r4|}|||dz||S)NrC"unexpected call to parse_comment()rHrM) rrW commentcloserScommentabruptcloserYrTr`ri)rrmreportrrYros rr\zHTMLParser.parse_commentps,!!&!,,RR.RRR,##GQqS11 &,,Wac::E r  1 A   !Q 0 0 0yy{{rr)c|j}|||dzdvs Jd|d|dz}|dkrdS|r |||dz||dzS)NrF)rErBryrrMr))rrPr`)rrmr|rposs rrvzHTMLParser.parse_bogus_comments,q1u~---1B---ll3!$$ "992  2   !C 0 1 1 1Qwrc|j}|||dzdks Jdt||dz}|sdS|}|||dz||}|S)NrFrDzunexpected call to parse_pi()rM)rpicloserSrTrdrirrmrrYros rr]zHTMLParser.parse_pis,q1u~%%%'F%%%w!,, 2 KKMM wqsAv''' IIKKrcd|_||}|dkr|S|j}||||_g}t||dz}|s Jd|}|dx|_}||krt||}|sn|ddd\} } } | sd} nI| dddcxkr| ddks"n| dddcxkr| ddkr nn | dd} | rt| } | | | f|}||k||| } | d vr| ||||S| d r|||nj|||||jvs|jr|d ks|d kr||d n ||jvr||d|S)Nrr)z#unexpected call to parse_starttag()rFrJ'rM")r/>rnoscriptr1Fr.T)r,check_for_whole_start_tagrtagfind_tolerantrYrirgr2rattrfind_tolerantrappendstriprUrahandle_startendtaghandle_starttagCDATA_CONTENT_ELEMENTSrr8RCDATA_CONTENT_ELEMENTS) rrmendposrattrsrYrqtagmattrnamerest attrvalueris rrZzHTMLParser.parse_starttags $//22 A::M,&qx0 &&w!44;;;;;u IIKK"[[^^11333 s&jj!''33A ()1a(8(8 %HdI , 2A2$8888)BCC.88882A2#777723377777%adO  0$Y// LL(..**I6 7 7 7A&jjah%%'' k ! !   WQvX. / / /M <<   9  # #C / / / /  e , , ,t2223$':$5$5{""##C5#9999444##C4#888 rc|j}t||dz}|sJ|}||dz dkrdS|S)Nr)rrM)r locatetagendrYrirs rrz$HTMLParser.check_for_whole_start_tagsU,""7AaC00 u IIKK 1Q3<3  2rc|j}|||dzdks Jd|d|dzdkrdSt||s.||dz|dzdkr|dzS||St ||dz}|sJ|}||dz dkrdSt||dz}|sJ|d }| || |S) NrFrBzunexpected call to parse_endtagrrrMrJr)) rrPr_rYrvrrirrgr2 handle_endtagr:)rrmrrYrors rr[zHTMLParser.parse_endtagsU,q1u~%%%'H%%% <<QqS ! !A % %2++ 3qs1Q3w3&&s //222""7AaC00 u IIKK 1Q3<3  2!&&w!44 ukk!nn""$$ 3 rc\|||||dSN)rrrrrs rrzHTMLParser.handle_startendtags2 S%((( 3rcdSrrs rrzHTMLParser.handle_starttag rcdSrr)rrs rrzHTMLParser.handle_endtagrrcdSrrrrss rrhzHTMLParser.handle_charrefrrcdSrrrs rrkzHTMLParser.handle_entityrefrrcdSrrr%s rrUzHTMLParser.handle_datarrcdSrrr%s rr`zHTMLParser.handle_comment rrcdSrr)rdecls rrczHTMLParser.handle_declrrcdSrrr%s rrdzHTMLParser.handle_pirrcdSrrr%s rrbzHTMLParser.unknown_declrr)T)r))!__name__ __module__ __qualname____doc__rrrrr'r*r,r-r8r:r=r$r^r\rvr]rZrr[rrrrhrkrUr`rcrdrbrrrrrZs0Y3+/5     +++O$$$16 B B B B B # # # # G#G#G#X///D           ...d<                                r)rr3r htmlr__all__r4rrlrjrfrXr_rrzr{rVERBOSErrlocatestarttagend_tolerant endendtag endtagfindr!rrrrrs""  . RZ'' RZ % % BJ> ? ? "*@ A Arz+&& RZ % % "*S//rz(## RZ''2:QRRBJ Z  rz Z   (RZ)Z BJsOO RZ> ? ? | | | | | '| | | | | r