How to extract text or HTML code from HTML documents or web sites
- Step 1: load the HTML data. Copy and paste your HTML data into the Source Data box, then click the Load button. The Source Data box then turns into a graphical HTML viewer, and your HTML data is displayed as a node tree.
- Step 2: select the HTML data you want to convert. Use the graphical HTML viewer to navigate the node tree, find the node you want, and click its radio button. If the selected node is a simple element, only the content of that element will be converted. If the selected node is a complex element, the content of all its child nodes will be converted. To convert the whole document, select the root node. Click the Convert button, and the selected data will be converted to a plain text file.
- You can repeat Step 2 as many times as you like, selecting a different node of your HTML document each time.
- Choose the target file format, CSV or plain text, by clicking Options.
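
The two steps above can be approximated in a script. The following is a minimal sketch using only Python's standard library, not the tool's actual implementation: the names `Node`, `TreeBuilder`, `load_html`, `find_first` and `text_fields` are hypothetical, and selecting a node by its tag name stands in for clicking a radio button in the viewer.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag (partial list, enough for a sketch).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    """One element in the node tree; children are Nodes or text strings."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simple node tree from HTML source (Step 1)."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag and close it.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def load_html(source):
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()
    return builder.root


def find_first(node, tag):
    """Return the first descendant element with the given tag, or None."""
    for child in node.children:
        if isinstance(child, Node):
            if child.tag == tag:
                return child
            found = find_first(child, tag)
            if found is not None:
                return found
    return None


def text_fields(node):
    """Collect the text of a node and of all its child nodes (Step 2)."""
    fields = []
    for child in node.children:
        if isinstance(child, Node):
            fields.extend(text_fields(child))
        else:
            fields.append(child)
    return fields


html = "<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>"
root = load_html(html)
print(text_fields(find_first(root, "h1")))  # simple element: its own content
print(text_fields(find_first(root, "p")))   # complex element: all child content
print(text_fields(root))                    # root node: the whole document
```

Calling `text_fields` on different nodes of the same tree corresponds to repeating Step 2 with a different selection.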
Options
Use the following options to get the converted text in the format you want.
- Field Separator: Specifies how the converted data fields are separated, which makes the converted text easier to read or parse. The default separator is a space. You can change it to any string of characters.
- Trim XML Format White Space: Some XML files contain spaces and line breaks that exist only for formatting, so that the document is easier to read in a text editor. When the document is converted to text, this formatting white space may be unwanted. Use this option to trim it.
- Add Linebreakers to Rows: If you choose to trim formatting white space, all the data fields will be converted to one long line. Use this option to add a line break to the end of each row of XML elements.
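
To show how the three options interact, here is a small Python sketch. It is one possible reading of the descriptions above, not the tool's own code: the function `convert_rows` and its parameters `separator`, `trim` and `add_linebreaks` are hypothetical names, and a "row" is assumed to be a list of text fields taken from one element.

```python
def convert_rows(rows, separator=" ", trim=True, add_linebreaks=True):
    """Join rows of text fields into one string.

    separator      -- string placed between fields (Field Separator)
    trim           -- collapse formatting white space (Trim XML Format White Space)
    add_linebreaks -- end every row with a line break (Add Linebreakers to Rows)
    """
    lines = []
    for row in rows:
        if trim:
            # Collapse runs of spaces and line breaks inside each field,
            # then drop fields that contained nothing but white space.
            row = [" ".join(field.split()) for field in row]
            row = [field for field in row if field]
        lines.append(separator.join(row))
    if add_linebreaks:
        return "".join(line + "\n" for line in lines)
    # Without line breaks, everything ends up on one long line.
    return separator.join(lines)


rows = [["\n  Alice ", " 30\n"], ["  Bob", "25  "]]
print(convert_rows(rows, separator=","))
print(convert_rows(rows, separator=",", add_linebreaks=False))
```

With a comma as the separator and line breaks enabled, the output resembles CSV, one row per line. With line breaks disabled, the trimmed fields run together on a single line, which is the situation the Add Linebreakers to Rows option is meant to avoid.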