Cheetah
HTMLPurifier_Lexer_DOMLex Class Reference
Inheritance: HTMLPurifier_Lexer_DOMLex extends HTMLPurifier_Lexer and is extended by HTMLPurifier_Lexer_PH5P.

Public Member Functions

 __construct ()
 
 tokenizeHTML ($html, $config, $context)
 
 muteErrorHandler ($errno, $errstr)
 
 callbackUndoCommentSubst ($matches)
 
 callbackArmorCommentEntities ($matches)
 
- Public Member Functions inherited from HTMLPurifier_Lexer
 parseText ($string, $config)
 
 parseAttr ($string, $config)
 
 parseData ($string, $is_attr, $config)
 
 normalize ($html, $config, $context)
 
 extractBody ($html)
 

Protected Member Functions

 tokenizeDOM ($node, &$tokens, $config)
 
 getTagName ($node)
 
 getData ($node)
 
 createStartNode ($node, &$tokens, $collect, $config)
 
 createEndNode ($node, &$tokens)
 
 transformAttrToAssoc ($node_map)
 
 wrapHTML ($html, $config, $context, $use_div=true)
 

Additional Inherited Members

- Static Public Member Functions inherited from HTMLPurifier_Lexer
static create ($config)
 
- Public Attributes inherited from HTMLPurifier_Lexer
 $tracksLineNumbers = false
 
- Static Protected Member Functions inherited from HTMLPurifier_Lexer
static escapeCDATA ($string)
 
static escapeCommentedCDATA ($string)
 
static removeIEConditional ($string)
 
static CDATACallback ($matches)
 
- Protected Attributes inherited from HTMLPurifier_Lexer
 $_special_entity2str
 

Detailed Description

Parser that uses PHP 5's DOM extension (part of the core).

In PHP 5, the DOM XML extension was revamped into DOM and added to the core. It gives us a forgiving HTML parser, which we use to transform the HTML into a DOM, and then into the tokens. It is blazingly fast (for large documents, it performs twenty times faster than HTMLPurifier_Lexer_DirectLex) and is the default choice for PHP 5.
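A minimal sketch of the HTML-to-DOM step using only PHP's built-in DOMDocument, showing the parser's forgiveness: two unclosed paragraphs are expected to come back as sibling `p` elements. `parse_fragment()` is a hypothetical helper, and wrapping the fragment in a DIV only approximates what wrapHTML() does.

```php
<?php
// Sketch (not HTML Purifier's code): let libxml's forgiving HTML parser
// repair a sloppy fragment, then inspect the resulting DOM.
// parse_fragment() is a hypothetical helper used only for illustration.

function parse_fragment(string $html): DOMElement
{
    $doc = new DOMDocument();
    // Collect libxml diagnostics internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment so there is a single, known container element.
    $doc->loadHTML('<html><body><div>' . $html . '</div></body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $doc->getElementsByTagName('div')->item(0);
}

// Neither <p> is closed; the parser closes the first when the second opens.
$div = parse_fragment('<p>one<p>two');
foreach ($div->childNodes as $child) {
    echo $child->nodeName, ': ', $child->textContent, "\n";
}
```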

Note
Any empty elements will have empty tokens associated with them, even if this is prohibited by the spec. This cannot be fixed until the spec comes into play.
PHP's DOM extension does not actually parse any entities; we use our own function to do that.
Warning
DOM tends to drop whitespace, which may wreak havoc on indenting. If this is a serious problem (for example, because the HTML is hand-edited and you cannot use a parser cache that stores the output of HTML Purifier while keeping the original HTML around), you may want to run Tidy on the resulting output or use HTMLPurifier_Lexer_DirectLex.

Definition at line 18942 of file HTMLPurifier.standalone.php.

Constructor & Destructor Documentation

◆ __construct()

HTMLPurifier_Lexer_DOMLex::__construct ( )

Reimplemented from HTMLPurifier_Lexer.

Definition at line 18950 of file HTMLPurifier.standalone.php.

Member Function Documentation

◆ callbackArmorCommentEntities()

HTMLPurifier_Lexer_DOMLex::callbackArmorCommentEntities ( $matches )

Callback function that entity-izes ampersands in comments so that callbackUndoCommentSubst doesn't clobber them

Parameters
array $matches
Returns
string

Definition at line 19213 of file HTMLPurifier.standalone.php.
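A sketch of the armoring idea as a standalone function: inside every comment, each `&` becomes `&amp;`, so comment text such as `&lt;` survives a later pass that decodes `&lt;` back to `<`. The regex (which also matches an unterminated comment at end of input) and the name `armor_comment_entities()` are assumptions for illustration.

```php
<?php
// Sketch: entity-ize ampersands inside comments only. Text outside
// comments is left untouched. armor_comment_entities() is hypothetical.

function armor_comment_entities(string $html): string
{
    return preg_replace_callback(
        // $m[1] = comment body, $m[2] = closing "-->" or end of input.
        '#<!--(.*?)(-->|\z)#s',
        function (array $m): string {
            return '<!--' . str_replace('&', '&amp;', $m[1]) . $m[2];
        },
        $html
    );
}

echo armor_comment_entities('a &amp; b <!-- x &lt; y -->'), "\n";
```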

◆ callbackUndoCommentSubst()

HTMLPurifier_Lexer_DOMLex::callbackUndoCommentSubst ( $matches )

Callback function for undoing escaping of stray angled brackets in comments

Parameters
array $matches
Returns
string

Definition at line 19202 of file HTMLPurifier.standalone.php.
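A sketch of the undo step, assuming stray `<` characters were first escaped to `&lt;` document-wide after comment ampersands had been armored. `strtr()` with an array never rescans its own output, so an armored `&amp;lt;` becomes `&lt;` rather than `<`. `undo_comment_subst()` is a hypothetical name.

```php
<?php
// Sketch: inside comments only, turn "&lt;" back into "<" and un-armor
// "&amp;" into "&". undo_comment_subst() is hypothetical.

function undo_comment_subst(string $html): string
{
    return preg_replace_callback(
        '#<!--(.*?)(-->|\z)#s',
        function (array $m): string {
            return '<!--' . strtr($m[1], ['&amp;' => '&', '&lt;' => '<']) . $m[2];
        },
        $html
    );
}

// The comment originally read "<!-- a < b &lt; c -->" before armoring
// and stray-'<' escaping.
echo undo_comment_subst('<!-- a &lt; b &amp;lt; c -->'), "\n";
```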

◆ createEndNode()

HTMLPurifier_Lexer_DOMLex::createEndNode ( $node, &$tokens )
protected
Parameters
DOMNode $node
HTMLPurifier_Token[] $tokens

Definition at line 19160 of file HTMLPurifier.standalone.php.

◆ createStartNode()

HTMLPurifier_Lexer_DOMLex::createStartNode ( $node, &$tokens, $collect, $config )
protected
Parameters
DOMNode $node: DOMNode to be tokenized.
HTMLPurifier_Token[] $tokens: Array-list of already tokenized tokens.
bool $collect: Says whether start and close tokens are collected; set to false at the first recursion because it's the implicit DIV tag you're dealing with.
Returns
bool: whether the token needs an end token

Definition at line 19098 of file HTMLPurifier.standalone.php.
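A sketch of the decision this method documents: append a token for the node when `$collect` is true, and report whether a matching end token will be needed. Tokens are plain arrays here rather than HTMLPurifier_Token objects, `$config` is omitted, and `create_start_token()` is a hypothetical name.

```php
<?php
// Sketch: returns true only for elements that have children, i.e. the
// caller must later emit an end token. create_start_token() is hypothetical.

function create_start_token(DOMNode $node, array &$tokens, bool $collect): bool
{
    if ($node instanceof DOMText) {          // also covers CDATA sections
        $tokens[] = ['text', $node->data];
        return false;
    }
    if ($node instanceof DOMComment) {
        $tokens[] = ['comment', $node->data];
        return false;
    }
    if (!($node instanceof DOMElement)) {
        return false;                        // other node types ignored here
    }
    if ($node->childNodes->length === 0) {
        // A childless element becomes a single "empty" token.
        if ($collect) {
            $tokens[] = ['empty', $node->tagName];
        }
        return false;
    }
    if ($collect) {
        $tokens[] = ['start', $node->tagName];
    }
    return true;
}

$doc = new DOMDocument();
$b = $doc->createElement('b');
$b->appendChild($doc->createTextNode('x'));
$tokens = [];
var_dump(create_start_token($b, $tokens, true));
```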

◆ getData()

HTMLPurifier_Lexer_DOMLex::getData ( $node )
protected

Portably retrieve the data of a node; deals with older versions of libxml like 2.7.6

Parameters
DOMNode $node

Definition at line 19076 of file HTMLPurifier.standalone.php.

◆ getTagName()

HTMLPurifier_Lexer_DOMLex::getTagName ( $node )
protected

Portably retrieve the tag name of a node; deals with older versions of libxml like 2.7.6

Parameters
DOMNode $node

Definition at line 19059 of file HTMLPurifier.standalone.php.
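A sketch of portable accessors in the spirit of getTagName() and getData(): prefer the type-specific property and fall back to a property every DOMNode has. `get_tag_name()` and `get_node_data()` are hypothetical names, and the checks HTML Purifier actually performs for libxml 2.7.6 may differ.

```php
<?php
// Sketch of portable node accessors; both function names are hypothetical.

function get_tag_name(DOMNode $node): string
{
    // Elements expose tagName; every node has nodeName ("#text", "#comment", ...).
    return $node instanceof DOMElement ? $node->tagName : $node->nodeName;
}

function get_node_data(DOMNode $node): ?string
{
    // Text, comment and CDATA nodes all extend DOMCharacterData.
    if ($node instanceof DOMCharacterData) {
        return $node->data;
    }
    return $node->nodeValue; // fallback for any other node type
}

$doc = new DOMDocument();
echo get_tag_name($doc->createElement('span')), "\n";
echo get_node_data($doc->createComment(' note ')), "\n";
```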

◆ muteErrorHandler()

HTMLPurifier_Lexer_DOMLex::muteErrorHandler ( $errno, $errstr )

An error handler that mutes all errors

Parameters
int $errno
string $errstr

Definition at line 19192 of file HTMLPurifier.standalone.php.
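A sketch of how a muting handler can be used: install it around DOMDocument::loadHTML(), whose parse warnings go through PHP's error mechanism, then restore the previous handler. Returning true tells PHP the error was handled, so the standard handler is skipped. `mute_error_handler()` and `load_html_quietly()` are hypothetical names.

```php
<?php
// Sketch: silence parser warnings only for the duration of one load.
// Both function names are hypothetical.

function mute_error_handler(int $errno, string $errstr): bool
{
    return true; // swallow the error
}

function load_html_quietly(string $html): DOMDocument
{
    $doc = new DOMDocument();
    set_error_handler('mute_error_handler');
    try {
        $doc->loadHTML($html);
    } finally {
        restore_error_handler(); // always undo, even if loading throws
    }
    return $doc;
}

$doc = load_html_quietly('<p>a<unknowntag>b</unknowntag></p>');
echo $doc->getElementsByTagName('p')->length, "\n";
```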

◆ tokenizeDOM()

HTMLPurifier_Lexer_DOMLex::tokenizeDOM ( $node, &$tokens, $config )
protected

Iterative function that tokenizes a node, putting it into an accumulator. To iterate is human, to recurse divine - L. Peter Deutsch

Parameters
DOMNode $node: DOMNode to be tokenized.
HTMLPurifier_Token[] $tokens: Array-list of already tokenized tokens.
Returns
HTMLPurifier_Token of node appended to previously passed tokens.

Definition at line 19024 of file HTMLPurifier.standalone.php.
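A sketch of iterative (non-recursive) tokenization with an explicit stack, as this method describes. Each stack entry pairs a node with a "closing" flag; children are pushed in reverse so they pop in document order. Tokens are plain arrays, the root itself is not emitted (loosely mirroring how the implicit wrapper DIV is skipped), and `tokenize_dom()` is a hypothetical name.

```php
<?php
// Sketch: depth-first traversal without recursion. tokenize_dom() is hypothetical.

function tokenize_dom(DOMNode $root): array
{
    $tokens = [];
    $stack = [];
    for ($i = $root->childNodes->length - 1; $i >= 0; $i--) {
        $stack[] = [$root->childNodes->item($i), false];
    }
    while ($stack) {
        [$node, $closing] = array_pop($stack);
        if ($closing) {
            $tokens[] = ['end', $node->nodeName];
            continue;
        }
        if ($node instanceof DOMText) {
            $tokens[] = ['text', $node->data];
        } elseif ($node instanceof DOMComment) {
            $tokens[] = ['comment', $node->data];
        } elseif ($node instanceof DOMElement) {
            if ($node->childNodes->length === 0) {
                $tokens[] = ['empty', $node->nodeName];
                continue;
            }
            $tokens[] = ['start', $node->nodeName];
            // Schedule the end token first, then the children on top of it.
            $stack[] = [$node, true];
            for ($i = $node->childNodes->length - 1; $i >= 0; $i--) {
                $stack[] = [$node->childNodes->item($i), false];
            }
        }
    }
    return $tokens;
}

$doc = new DOMDocument();
$div = $doc->createElement('div');
$p = $div->appendChild($doc->createElement('p'));
$p->appendChild($doc->createTextNode('Hi'));
$div->appendChild($doc->createElement('br'));
print_r(tokenize_dom($div));
```

An explicit stack avoids deep PHP call chains on heavily nested input, which is the trade-off the Deutsch quote alludes to.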

◆ tokenizeHTML()

HTMLPurifier_Lexer_DOMLex::tokenizeHTML ( $html, $config, $context )
Parameters
string $html
HTMLPurifier_Config $config
HTMLPurifier_Context $context
Returns
HTMLPurifier_Token[]

Reimplemented from HTMLPurifier_Lexer.

Reimplemented in HTMLPurifier_Lexer_PH5P.

Definition at line 18963 of file HTMLPurifier.standalone.php.

◆ transformAttrToAssoc()

HTMLPurifier_Lexer_DOMLex::transformAttrToAssoc ( $node_map )
protected

Converts a DOMNamedNodeMap of DOMAttr objects into an assoc array.

Parameters
DOMNamedNodeMap $node_map: DOMNamedNodeMap of DOMAttr objects.
Returns
array Associative array of attributes.

Definition at line 19172 of file HTMLPurifier.standalone.php.
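A sketch of the conversion: DOMNamedNodeMap is iterable, and each item is a DOMAttr with `name` and `value` properties. `attr_to_assoc()` is a hypothetical name.

```php
<?php
// Sketch: DOMNamedNodeMap of DOMAttr objects -> ['name' => 'value', ...].
// attr_to_assoc() is hypothetical.

function attr_to_assoc(DOMNamedNodeMap $map): array
{
    $assoc = [];
    foreach ($map as $attr) {
        /** @var DOMAttr $attr */
        $assoc[$attr->name] = $attr->value;
    }
    return $assoc;
}

$doc = new DOMDocument();
$a = $doc->createElement('a');
$a->setAttribute('href', 'https://example.com/');
$a->setAttribute('title', 'Example');
print_r(attr_to_assoc($a->attributes));
```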

◆ wrapHTML()

HTMLPurifier_Lexer_DOMLex::wrapHTML ( $html, $config, $context, $use_div = true )
protected

Wraps an HTML fragment in the necessary HTML

Parameters
string $html
HTMLPurifier_Config $config
HTMLPurifier_Context $context
Returns
string

Definition at line 19225 of file HTMLPurifier.standalone.php.
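A sketch of wrapping a fragment in a minimal document before handing it to DOMDocument::loadHTML(). The UTF-8 meta tag (loadHTML otherwise assumes ISO-8859-1) and the optional DIV container are assumptions about what "the necessary HTML" contains; the real method also receives `$config` and `$context`, which this sketch omits. `wrap_html()` is a hypothetical name.

```php
<?php
// Sketch of fragment wrapping; wrap_html() is hypothetical.

function wrap_html(string $html, bool $useDiv = true): string
{
    $ret = '<html><head>';
    // Declare UTF-8 so the parser does not fall back to ISO-8859-1.
    $ret .= '<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />';
    $ret .= '</head><body>';
    if ($useDiv) {
        $ret .= '<div>'; // implicit container that the tokenizer can skip
    }
    $ret .= $html;
    if ($useDiv) {
        $ret .= '</div>';
    }
    return $ret . '</body></html>';
}

echo wrap_html('<p>Hi</p>'), "\n";
```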


The documentation for this class was generated from the following file: