TextDocument (DL-Learner Javadoc)

java.lang.Object
- java.util.AbstractCollection<E>
  - java.util.AbstractList<E>
    - java.util.AbstractSequentialList<E>
      - java.util.LinkedList<Token>
        - org.dllearner.algorithms.isle.index.TextDocument

All Implemented Interfaces:

Serializable, Cloneable, Iterable<Token>, Collection<Token>, Deque<Token>, List<Token>, Queue<Token>, Document
```
public class TextDocument
extends LinkedList<Token>
implements Document
```
A simple text document without further formatting or markup.

Author:

Daniel Fleischhacker

See Also:

Serialized Form

Constructor Summary

Constructors
Constructor and Description
`TextDocument()`

Method Summary

| Modifier and Type | Method and Description |
| --- | --- |
| `String` | `getContent()` Returns the cleaned content of this document represented as a string. |
| `String` | `getContentStartingAtToken(Token start, SurfaceFormLevel l)` Returns a string containing all tokens starting at the token `start` until the end of the list. |
| `String` | `getPOSTaggedContent()` Returns the uncleaned content with POS tags in form of word1/pos1 word2/pos2 ... |
| `String` | `getRawContent()` Returns the uncleaned content, i.e., as originally retrieved, of this document represented as string. |
| `List<Token>` | `getTokensStartingAtToken(Token start, boolean ignorePunctuation)` Returns a list containing all successive tokens from this document starting at the given start token. |
| `List<Token>` | `getTokensStartingAtToken(Token start, int numberOfTokens, boolean ignorePunctuation)` Returns a list containing `numberOfTokens` successive tokens from this document starting at the given start token. |
| `static void` | `main(String[] args)` |

Methods inherited from class java.util.LinkedList
add, add, addAll, addAll, addFirst, addLast, clear, clone, contains, descendingIterator, element, get, getFirst, getLast, indexOf, lastIndexOf, listIterator, offer, offerFirst, offerLast, peek, peekFirst, peekLast, poll, pollFirst, pollLast, pop, push, remove, remove, remove, removeFirst, removeFirstOccurrence, removeLast, removeLastOccurrence, set, size, spliterator, toArray, toArray

Methods inherited from class java.util.AbstractSequentialList
iterator

Methods inherited from class java.util.AbstractList
equals, hashCode, listIterator, subList

Methods inherited from class java.util.AbstractCollection
containsAll, isEmpty, removeAll, retainAll, toString

Methods inherited from class java.lang.Object
getClass, notify, notifyAll, wait, wait, wait

Methods inherited from interface java.util.List
containsAll, equals, hashCode, isEmpty, iterator, listIterator, removeAll, replaceAll, retainAll, sort, subList

Methods inherited from interface java.util.Deque
iterator

Methods inherited from interface java.util.Collection
parallelStream, removeIf, stream

Methods inherited from interface java.lang.Iterable
forEach

- Constructor Detail
  - TextDocument
```
public TextDocument()
```
- Method Detail
  - main
```
public static void main(String[] args)
```
  - getContent
```
public String getContent()
```
    Description copied from interface: Document
    
    Returns the cleaned content of this document represented as a string. In the cleaned content, markup and other structure are removed. The raw content can be retrieved using Document.getRawContent(). Implementations may provide additional methods for retrieving more specialized content formats.
    
    Specified by:
    
    getContent in interface Document
    
    Returns:
    
    this document's text content
  - getRawContent
```
public String getRawContent()
```
    Description copied from interface: Document
    
    Returns the uncleaned content of this document, i.e., the content as originally retrieved, represented as a string.
    
    Specified by:
    
    getRawContent in interface Document
    
    Returns:
    
    uncleaned content of this document
  - getPOSTaggedContent
```
public String getPOSTaggedContent()
```
    Description copied from interface: Document
    
    Returns the uncleaned content with POS tags as a string of the form word1/pos1 word2/pos2 ...
    
    Specified by:
    
    getPOSTaggedContent in interface Document
    
    Returns:
    
    uncleaned content with POS tags
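
    The word1/pos1 word2/pos2 format described above can be illustrated with a minimal, self-contained sketch. The classes `TaggedWord` and `PosTaggedFormatSketch` are hypothetical stand-ins invented for this illustration; they are not part of DL-Learner, and the actual implementation may build the string differently.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a POS-tagged token; NOT the DL-Learner Token class.
final class TaggedWord {
    final String word;
    final String pos;

    TaggedWord(String word, String pos) {
        this.word = word;
        this.pos = pos;
    }
}

final class PosTaggedFormatSketch {
    // Joins tokens as "word/pos" pairs separated by single spaces.
    static String toPosTaggedString(List<TaggedWord> tokens) {
        StringBuilder sb = new StringBuilder();
        for (TaggedWord t : tokens) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(t.word).append('/').append(t.pos);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<TaggedWord> tokens = Arrays.asList(
                new TaggedWord("The", "DT"),
                new TaggedWord("cat", "NN"),
                new TaggedWord("sleeps", "VBZ"));
        // prints "The/DT cat/NN sleeps/VBZ"
        System.out.println(toPosTaggedString(tokens));
    }
}
```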
  - getContentStartingAtToken
```
public String getContentStartingAtToken(Token start,
                                        SurfaceFormLevel l)
```
    Returns a string containing all tokens from the token `start` to the end of the list. The string is built from the tokens' surface forms at level `l`.
    
    Parameters:
    
    start - token to start building the string at, i.e., the first token in the returned string
    
    l - level of surface forms to use
    
    Returns:
    
    built string
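
    A rough sketch of this behavior, written against hypothetical stand-in types rather than the real `Token` and `SurfaceFormLevel`, might look as follows. The enum values, the space separator, matching `start` by identity, and returning an empty string when `start` is absent are all assumptions made for the example and are not confirmed by this page.

```java
import java.util.Arrays;
import java.util.LinkedList;

final class ContentFromTokenSketch {
    // Hypothetical surface-form levels; the real SurfaceFormLevel values may differ.
    enum Level { RAW, STEMMED }

    // Hypothetical token that stores one string per level.
    static final class Tok {
        final String raw;
        final String stemmed;

        Tok(String raw, String stemmed) {
            this.raw = raw;
            this.stemmed = stemmed;
        }

        String form(Level level) {
            return level == Level.RAW ? raw : stemmed;
        }
    }

    // Builds a space-separated string from `start` to the end of the list.
    // Assumption: `start` is matched by identity; an absent start yields "".
    static String contentStartingAt(LinkedList<Tok> doc, Tok start, Level level) {
        StringBuilder sb = new StringBuilder();
        boolean found = false;
        for (Tok t : doc) {
            if (t == start) {
                found = true;
            }
            if (!found) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(t.form(level));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Tok cats = new Tok("Cats", "cat");
        Tok are = new Tok("are", "be");
        Tok sleeping = new Tok("sleeping", "sleep");
        LinkedList<Tok> doc = new LinkedList<>(Arrays.asList(cats, are, sleeping));
        System.out.println(contentStartingAt(doc, are, Level.RAW));     // prints "are sleeping"
        System.out.println(contentStartingAt(doc, are, Level.STEMMED)); // prints "be sleep"
    }
}
```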
  - getTokensStartingAtToken
```
public List<Token> getTokensStartingAtToken(Token start,
                                            int numberOfTokens,
                                            boolean ignorePunctuation)
```
    Returns a list containing numberOfTokens successive tokens from this document, starting at the given start token. If ignorePunctuation is true, punctuation tokens are still added to the result but do not count toward numberOfTokens.
    
    Parameters:
    
    start - token to start collecting tokens from the document
    
    numberOfTokens - number of tokens to collect from the document
    
    ignorePunctuation - if true, punctuation tokens are not counted towards the number of tokens to return
    
    Returns:
    
    list containing the given number of relevant tokens; depending on the value of ignorePunctuation, the list might contain additional non-relevant (punctuation) tokens
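
    The counting rule can be sketched with stand-in types. Everything below is hypothetical and only mirrors the description on this page: `Tok` is not the DL-Learner `Token`, `start` is assumed to be matched by identity, and the sketch assumes collection stops as soon as enough relevant tokens have been gathered, so punctuation following the last counted token is not included.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TokenWindowSketch {
    // Hypothetical token: a surface string plus a punctuation flag.
    static final class Tok {
        final String text;
        final boolean punctuation;

        Tok(String text, boolean punctuation) {
            this.text = text;
            this.punctuation = punctuation;
        }
    }

    // Collects tokens from `start` until `numberOfTokens` relevant tokens are gathered.
    // With ignorePunctuation == true, punctuation is added but not counted.
    static List<Tok> tokensStartingAt(List<Tok> doc, Tok start,
                                      int numberOfTokens, boolean ignorePunctuation) {
        List<Tok> result = new ArrayList<>();
        int counted = 0;
        boolean found = false;
        for (Tok t : doc) {
            if (t == start) {
                found = true;
            }
            if (!found) {
                continue;
            }
            if (counted >= numberOfTokens) {
                break; // assumption: stop immediately once the quota is reached
            }
            result.add(t);
            if (!(ignorePunctuation && t.punctuation)) {
                counted++;
            }
        }
        return result;
    }

    // Joins token texts with spaces, for display only.
    static String render(List<Tok> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Tok t : tokens) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(t.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Tok hello = new Tok("Hello", false);
        Tok comma = new Tok(",", true);
        Tok world = new Tok("world", false);
        Tok bang = new Tok("!", true);
        List<Tok> doc = Arrays.asList(hello, comma, world, bang);
        System.out.println(render(tokensStartingAt(doc, hello, 2, true)));  // prints "Hello , world"
        System.out.println(render(tokensStartingAt(doc, hello, 2, false))); // prints "Hello ,"
    }
}
```

    In this sketch, requesting two tokens with ignorePunctuation set returns three list elements, because the comma is included without being counted.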
  - getTokensStartingAtToken
```
public List<Token> getTokensStartingAtToken(Token start,
                                            boolean ignorePunctuation)
```
    Returns a list containing all successive tokens from this document, starting at the given start token. If ignorePunctuation is true, punctuation tokens are still added to the result but are not counted towards the number of tokens.
    
    Parameters:
    
    start - token to start collecting tokens from the document
    
    ignorePunctuation - if true, punctuation tokens are not counted towards the number of tokens to return
    
    Returns:
    
    list containing all relevant tokens; depending on the value of ignorePunctuation, the list might contain additional non-relevant (punctuation) tokens
