a [fa~@sddlmZmZmZddlmZddlmZmZddl Z ddl Z ddl m Z m Z ddlmZddlmZmZmZmZdd lmZdd lmZed d eDZed d eDZedd eDZeeddgBZdZejre edde ddZ!n e eZ!hdZ"e dZ#iZ$Gddde%Z&ddZ'Gddde%Z(Gddde(Z)Gdd d e*Z+Gd!d"d"e%Z,Gd#d$d$e%Z-d%d&Z.dS)')absolute_importdivisionunicode_literals) text_type) http_clienturllibN)BytesIOStringIO) webencodings)EOFspaceCharacters asciiLettersasciiUppercase)_ReparseException)_utilscCsg|]}|dqSasciiencode.0itemrE/usr/lib/python3.9/site-packages/pip/_vendor/html5lib/_inputstream.py rcCsg|]}|dqSrrrrrrrrcCsg|]}|dqSrrrrrrrr> i iii iiiiiiiiii iii ii i iiiiii i i ii iiz[ - -/:-@\[-`{-~]c@sHeZdZdZddZddZddZdd Zd d Zd d Z ddZ dS)BufferedStreamzBuffering for streams that do not have buffering of their own The buffer is implemented as a list of chunks on the assumption that joining many strings will be slow since it is O(n**2) cCs||_g|_ddg|_dS)Nrr)streambufferposition)selfr"rrr__init__:szBufferedStream.__init__cCs<d}|jd|jdD]}|t|7}q||jd7}|SNrr )r#r$len)r%poschunkrrrtell?s zBufferedStream.tellcCsD|}d}t|j||kr6|t|j|8}|d7}q||g|_dSr')r(r#r$)r%r)offsetirrrseekFs  zBufferedStream.seekcCsT|js||S|jdt|jkrF|jdt|jdkrF||S||SdS)Nrr r)r# _readStreamr$r(_readFromBufferr%bytesrrrreadOs  zBufferedStream.readcCstdd|jDS)NcSsg|] }t|qSr)r(rrrrrYrz1BufferedStream._bufferedBytes..)sumr#r%rrr_bufferedBytesXszBufferedStream._bufferedBytescCs<|j|}|j||jdd7<t||jd<|Sr')r"r3r#appendr$r()r%r2datarrrr/[s   zBufferedStream._readStreamcCs|}g}|jd}|jd}|t|jkr|dkr|j|}|t||kr`|}|||g|_n"t||}|t|g|_|d7}|||||||8}d}q|r|||d|S)Nrr r)r$r(r#r7r/join)r%r2ZremainingBytesrvZ bufferIndexZ bufferOffsetZ bufferedDataZ bytesToReadrrrr0bs$    zBufferedStream._readFromBufferN) __name__ __module__ __qualname____doc__r&r+r.r3r6r/r0rrrrr!3s  r!cKst|tjs(t|tjjr.t|jtjr.d}n&t|drJt|dt }n t|t }|rdd|D}|rvt d|t |fi|St |fi|SdS)NFr3rcSsg|]}|dr|qS) _encoding)endswith)rxrrrrrz#HTMLInputStream..z3Cannot set an encoding with a unicode input, set %r) isinstancerZ HTTPResponserZresponseZaddbasefphasattrr3r TypeErrorHTMLUnicodeInputStreamHTMLBinaryInputStream)sourcekwargsZ isUnicode encodingsrrrHTMLInputStream}s      rKc@speZdZdZdZddZddZddZd d Zd d Z d dZ dddZ ddZ ddZ dddZddZdS)rFProvides a unicode stream of characters to the HTMLTokenizer. This class takes care of character encoding and removing or replacing incorrect byte-sequences and also provides column and line tracking. i(cCsZtjsd|_ntddkr$|j|_n|j|_dg|_tddf|_| ||_ | dS)Initialises the HTMLInputStream. HTMLInputStream(source, [encoding]) -> Normalized stream from source for use by html5lib. source can be either a file-object, local filename or a string. The optional encoding parameter must be a string that indicates the encoding. If specified, that encoding will be used, regardless of any BOM or later declaration (such as in a meta element) Nu􏿿r rutf-8certain) rsupports_lone_surrogatesreportCharacterErrorsr(characterErrorsUCS4characterErrorsUCS2ZnewLineslookupEncoding charEncoding openStream dataStreamreset)r%rHrrrr&s   zHTMLUnicodeInputStream.__init__cCs.d|_d|_d|_g|_d|_d|_d|_dS)Nr)r* chunkSize chunkOffseterrors prevNumLines prevNumCols_bufferedCharacterr5rrrrXszHTMLUnicodeInputStream.resetcCst|dr|}nt|}|SzvProduces a file object from source. source can be either a file object, local filename or a string. r3)rDr r%rHr"rrrrVs z!HTMLUnicodeInputStream.openStreamcCsT|j}|dd|}|j|}|dd|}|dkr@|j|}n ||d}||fS)N rrr )r*countr]rfindr^)r%r,r*ZnLinesZ positionLineZ lastLinePosZpositionColumnrrr _positions   z HTMLUnicodeInputStream._positioncCs||j\}}|d|fS)z:Returns (line, col) of the current position in the stream.r )rer[)r%linecolrrrr$szHTMLUnicodeInputStream.positioncCs6|j|jkr|stS|j}|j|}|d|_|S)zo Read one character from the stream or queue if available. Return EOF when EOF is reached. r )r[rZ readChunkr r*)r%r[charrrrris   zHTMLUnicodeInputStream.charNcCs|dur|j}||j\|_|_d|_d|_d|_|j|}|j rX|j |}d|_ n|s`dSt |dkrt |d}|dksd|krdkrnn|d|_ |dd}|j r| || d d }| d d }||_t ||_d S) NrYrFr r iz rb T)_defaultChunkSizererZr]r^r*r[rWr3r_r(ordrQreplace)r%rZr8Zlastvrrrrhs0           z HTMLUnicodeInputStream.readChunkcCs(ttt|D]}|jdqdS)Ninvalid-codepoint)ranger(invalid_unicode_refindallr\r7)r%r8_rrrrRsz*HTMLUnicodeInputStream.characterErrorsUCS4cCsd}t|D]}|rqt|}|}t|||drrt|||d}|tvrl|j dd}q|dkr|dkr|t |dkr|j dqd}|j dqdS)NFrpTrkir ) rrfinditerrngroupstartrZisSurrogatePairZsurrogatePairToCodepointnon_bmp_invalid_codepointsr\r7r()r%r8skipmatchZ codepointr)Zchar_valrrrrS#s"  z*HTMLUnicodeInputStream.characterErrorsUCS2Fc Cszt||f}WnLty\ddd|D}|s>d|}td|}t||f<Yn0g}||j|j}|dur|j|jkrqn0| }||jkr| |j|j|||_q| |j|jd| sbqqbd|}|S)z Returns a string of characters from the stream up to but not including any character in 'characters' or EOF. 'characters' must be a container that supports the 'in' method and iteration over its characters. rYcSsg|]}dt|qS)z\x%02x)rn)rcrrrrHrz5HTMLUnicodeInputStream.charsUntil..z^%sz[%s]+N) charsUntilRegExKeyErrorr9recompiler{r*r[rZendr7rh) r%Z charactersZoppositecharsZregexr:mrrrrr charsUntil:s,     z!HTMLUnicodeInputStream.charsUntilcCs@|tur<|jdkr.||j|_|jd7_n|jd8_dSr')r r[r*rZ)r%rirrrungetis   zHTMLUnicodeInputStream.unget)N)F)r;r<r=r>rmr&rXrVrer$rirhrRrSrrrrrrrFs   & /rFc@sLeZdZdZdddZddZd d Zdd d Zd dZddZ ddZ dS)rGrLN windows-1252TcCs\|||_t||jd|_d|_||_||_||_||_ ||_ | ||_ | dS)rMidN)rV rawStreamrFr& numBytesMetanumBytesChardetoverride_encodingtransport_encodingsame_origin_parent_encodinglikely_encodingdefault_encodingdetermineEncodingrUrX)r%rHrrrrrZ useChardetrrrr&s  zHTMLBinaryInputStream.__init__cCs&|jdj|jd|_t|dS)Nrro)rUZ codec_info streamreaderrrWrFrXr5rrrrXszHTMLBinaryInputStream.resetcCsJt|dr|}nt|}z||WntyDt|}Yn0|Sr`)rDrr.r+ Exceptionr!rarrrrVs  z HTMLBinaryInputStream.openStreamcCs|df}|ddur|St|jdf}|ddur:|St|jdf}|ddurX|S|df}|ddurt|St|jdf}|ddur|djds|St|jdf}|ddur|S|r^zddl m }Wnt yYnv0g}|}|j s*|j |j}|sq*||||q|t|jd}|j d|dur^|dfSt|jdf}|ddur~|StddfS)NrOrZ tentativezutf-16)UniversalDetectorencodingr) detectBOMrTrrdetectEncodingMetarname startswithrZ%pip._vendor.chardet.universaldetectorr ImportErrorZdonerr3rr7Zfeedcloseresultr.r)r%ZchardetrUrZbuffersZdetectorr#rrrrrsP            z'HTMLBinaryInputStream.determineEncodingcCst|}|durdS|jdvr(td}nT||jdkrH|jddf|_n4|jd|df|_|td|jd|fdS)Nutf-16beutf-16lerNrrOzEncoding changed from %s to %s)rTrrUrr.rXr)r%Z newEncodingrrrchangeEncodings   z$HTMLBinaryInputStream.changeEncodingc Cstjdtjdtjdtjdtjdi}|jd}||dd}d}|sp||}d}|sp||dd }d }|r|j |t |S|j d dSdS) zAttempts to detect at BOM at the start of the stream. If an encoding can be determined from the BOM return the name of the encoding otherwise return NonerNrrzutf-32lezutf-32beNrur) codecsBOM_UTF8 BOM_UTF16_LE BOM_UTF16_BE BOM_UTF32_LE BOM_UTF32_BErr3getr.rT)r%ZbomDictstringrr.rrrrs$      zHTMLBinaryInputStream.detectBOMcCsH|j|j}t|}|jd|}|durD|jdvrDtd}|S)z9Report the encoding declared by the meta element rNrrN)rr3rEncodingParserr. getEncodingrrT)r%r#parserrrrrr3s z(HTMLBinaryInputStream.detectEncodingMeta)NNNNrT)T) r;r<r=r>r&rXrVrrrrrrrrrGzs * >"rGc@seZdZdZddZddZddZdd Zd d Zd d Z ddZ ddZ e e e Z ddZe eZefddZddZddZddZdS) EncodingByteszString-like object with an associated position and various extra methods If the position is ever greater than the string length then an exception is raisedcCst||SN)r2__new__lowerr%valuerrrrFszEncodingBytes.__new__cCs d|_dS)Nr)rerrrrr&JszEncodingBytes.__init__cCs|Srrr5rrr__iter__NszEncodingBytes.__iter__cCs>|jd}|_|t|kr"tn |dkr.t|||dS)Nr rrer( StopIterationrEr%prrr__next__Qs  zEncodingBytes.__next__cCs|Sr)rr5rrrnextYszEncodingBytes.nextcCsB|j}|t|krtn |dkr$t|d|_}|||dSr'rrrrrprevious]s zEncodingBytes.previouscCs|jt|krt||_dSrrer(r)r%r$rrr setPositionfszEncodingBytes.setPositioncCs*|jt|krt|jdkr"|jSdSdS)Nrrr5rrr getPositionks  zEncodingBytes.getPositioncCs||j|jdSNr )r$r5rrrgetCurrentByteuszEncodingBytes.getCurrentBytecCsH|j}|t|kr>|||d}||vr4||_|S|d7}q||_dS)zSkip past a list of charactersr Nr$r(rer%rrr|rrrrzzs  zEncodingBytes.skipcCsH|j}|t|kr>|||d}||vr4||_|S|d7}q||_dSrrrrrr skipUntils  zEncodingBytes.skipUntilcCs(|||j}|r$|jt|7_|S)zLook for a sequence of bytes at the start of a string. If the bytes are found return True and advance the position to the byte after the match. Otherwise return False and leave the position alone)rr$r()r%r2r:rrr matchBytesszEncodingBytes.matchBytescCs<z |||jt|d|_Wnty6tYn0dS)zLook for the next sequence of bytes matching a given sequence. If a match is found advance the position to the last byte of the matchr T)indexr$r(re ValueErrorrr1rrrjumpTos    zEncodingBytes.jumpToN)r;r<r=r>rr&rrrrrrpropertyr$r currentBytespaceCharactersBytesrzrrrrrrrrBs      rc@sXeZdZdZddZddZddZdd Zd d Zd d Z ddZ ddZ ddZ dS)rz?Mini parser for detecting character encoding from meta elementscCst||_d|_dS)z3string - the data to work on for encoding detectionN)rr8rr%r8rrrr&s zEncodingParser.__init__c Csd|jvrdSd|jfd|jfd|jfd|jfd|jfd|jff}|jD]}d}z|jdWntyxYqYn0|D]B\}}|j|r~z|}WqWq~tyd}YqYq~0q~|sHqqH|j S) Nsr8rr5rrrrszEncodingParser.handleCommentcCs|jjtvrdSd}d}|}|dur,dS|ddkr\|ddk}|r|dur||_dSq|ddkr|d}t|}|dur||_dSq|ddkrtt|d}|}|durt|}|dur|r||_dS|}qdS) NTFrs http-equivr s content-typecharsetscontent) r8rr getAttributerrTContentAttrParserrparse)r%Z hasPragmaZpendingEncodingattrZtentativeEncodingcodecZ contentParserrrrrs8      zEncodingParser.handleMetacCs |dS)NF)handlePossibleTagr5rrrrsz%EncodingParser.handlePossibleStartTagcCst|j|dS)NT)rr8rr5rrrrs z#EncodingParser.handlePossibleEndTagcCsb|j}|jtvr(|r$||dS|t}|dkrD|n|}|dur^|}qLdS)NTr)r8rasciiLettersBytesrrrspacesAngleBracketsr)r%ZendTagr8r|rrrrrs    z EncodingParser.handlePossibleTagcCs |jdS)Nrrr5rrrrszEncodingParser.handleOthercCs|j}|ttdgB}|dvr&dSg}g}|dkr>|r>qnX|tvrR|}qnD|dvrhd|dfS|tvr||n|durdS||t|}q.|dkr| d|dfSt||}|dvr2|}t|}||kr t|d|d|fS|tvr$||q||qnJ|dkrJd|dfS|tvrd||n|durrdS||t|}|t vrd|d|fS|tvr||n|durdS||q|dS) z_Return a name,value pair for the next attribute in the stream, if one is found, or None/)rNN=)rrr)'"r) r8rzr frozensetr9asciiUppercaseBytesr7rrrr)r%r8r|ZattrNameZ attrValueZ quoteCharrrrrs`             zEncodingParser.getAttributeN) r;r<r=r>r&rrrrrrrrrrrrrs$rc@seZdZddZddZdS)rcCs ||_dSr)r8rrrrr&aszContentAttrParser.__init__cCsz|jd|jjd7_|j|jjdkssF      JgIb='