Package org.apache.lucene.document
Document for indexing and searching.
The document package provides the user-level logical representation of content to be indexed
and searched. The package also provides utilities for working with Documents and IndexableFields.
Document and IndexableField
A Document is a collection of IndexableFields. An IndexableField is a
logical representation of a user's content that needs to be indexed or stored. IndexableFields have a number of properties that tell Lucene how to
treat the content (indexed, tokenized, stored, etc.). See the Field implementation of IndexableField for specifics on these properties.
Note: it is common to refer to Documents having Fields, even though technically they have IndexableFields.
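As an illustration of how field properties differ, the following minimal sketch builds a Document from three common Field subclasses and prints the properties Lucene records for each. The field names and values ("id", "body", "source") are hypothetical, and the class name FieldPropertiesSketch is only for this example.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexableField;

public class FieldPropertiesSketch {

    // Builds a Document with three fields that are treated differently.
    // All names and values here are hypothetical.
    static Document buildDocument() {
        Document doc = new Document();
        // Indexed as a single token (not tokenized) and stored.
        doc.add(new StringField("id", "doc-42", Field.Store.YES));
        // Indexed and tokenized, but not stored.
        doc.add(new TextField("body", "Lucene is a search library", Field.Store.NO));
        // Stored only; not indexed.
        doc.add(new StoredField("source", "example.txt"));
        return doc;
    }

    public static void main(String[] args) {
        // A Document is iterable over its IndexableFields.
        for (IndexableField field : buildDocument()) {
            boolean indexed = field.fieldType().indexOptions() != IndexOptions.NONE;
            System.out.println(field.name()
                + " indexed=" + indexed
                + " tokenized=" + field.fieldType().tokenized()
                + " stored=" + field.fieldType().stored());
        }
    }
}
```

Choosing a ready-made subclass such as StringField or TextField is usually simpler than configuring a FieldType by hand, since each one bundles a fixed set of properties.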
Working with Documents
First and foremost, a Document is something created by the
user application. It is your job to create Documents based on the content of the files you are
working with in your application (Word, txt, PDF, Excel, or any other format). How this is done is
completely up to you. That said, there are many tools available in other projects that can
make it easier to take a file and convert it into a Lucene Document.
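For the simplest case, a plain-text file, the conversion might look like the sketch below. It assumes the file is UTF-8 encoded and small enough to read into memory. The field names "path" and "contents" and the class name TextFileToDocument are hypothetical choices for this example, not a convention imposed by Lucene. Extracting text from formats such as Word or PDF would need a separate parsing step that is not shown here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

public class TextFileToDocument {

    // Reads a UTF-8 text file and wraps its content in a Document.
    static Document fromTextFile(Path file) throws IOException {
        String contents = Files.readString(file, StandardCharsets.UTF_8);

        Document doc = new Document();
        // The path is indexed as one token and stored, so a search hit
        // can be traced back to the original file.
        doc.add(new StringField("path", file.toString(), Field.Store.YES));
        // The body is tokenized for full-text search but not stored.
        doc.add(new TextField("contents", contents, Field.Store.NO));
        return doc;
    }

    public static void main(String[] args) throws IOException {
        Document doc = fromTextFile(Path.of(args[0]));
        System.out.println("Built document for " + doc.get("path"));
    }
}
```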
DateTools is a utility class that makes dates and times
searchable. IntPoint, LongPoint, FloatPoint and DoublePoint enable indexing of numeric values (and also dates) for
fast range queries using PointRangeQuery.
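The two approaches to dates can be sketched side by side: DateTools turns a timestamp into a sortable string at a chosen resolution, while LongPoint indexes the raw epoch milliseconds for range queries. The field names "day" and "timestamp" and the class name DateFieldsSketch are hypothetical. Storing the date as epoch milliseconds is one reasonable encoding, not the only one.

```java
import org.apache.lucene.document.DateTools;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.StringField;
import org.apache.lucene.search.Query;

public class DateFieldsSketch {

    // Formats a timestamp as a string truncated to day resolution.
    static String toDayString(long epochMillis) {
        return DateTools.timeToString(epochMillis, DateTools.Resolution.DAY);
    }

    // Adds the same instant twice: as a searchable string and as a point.
    static Document withTimestamp(long epochMillis) {
        Document doc = new Document();
        doc.add(new StringField("day", toDayString(epochMillis), Field.Store.YES));
        doc.add(new LongPoint("timestamp", epochMillis));
        return doc;
    }

    // Builds a range query over the point field; both bounds are inclusive.
    static Query between(long fromMillis, long toMillis) {
        return LongPoint.newRangeQuery("timestamp", fromMillis, toMillis);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        Document doc = withTimestamp(now);
        System.out.println("day=" + doc.get("day"));
        System.out.println("query=" + between(now - 86_400_000L, now));
    }
}
```

Point fields are not stored, so an application that needs to display the original value would add a separate StoredField alongside the LongPoint.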
Class Description
Field that stores a per-document BytesRef value.
An indexed binary field for fast range filters.
A binary representation of a range that wraps a BinaryDocValues field.
Provides support for converting dates to strings and vice-versa.
Specifies the time granularity.
Documents are the unit of indexing and search.
A StoredFieldVisitor that creates a Document from stored fields.
Syntactic sugar for encoding doubles as NumericDocValues via Double.doubleToRawLongBits(double).
Field that stores a per-document double value for scoring, sorting or value retrieval and indexes the field for fast range filters.
An indexed double field for fast range filters.
An indexed Double Range field.
DocValues field for DoubleRange.
Field that can be used to store static scoring factors into documents.
Expert: directly create a field for a document.
Specifies whether and how a field should be stored.
Describes the properties of a field.
Syntactic sugar for encoding floats as NumericDocValues via Float.floatToRawIntBits(float).
Field that stores a per-document float value for scoring, sorting or value retrieval and indexes the field for fast range filters.
An indexed float field for fast range filters.
An indexed Float Range field.
DocValues field for FloatRange.
An indexed 128-bit InetAddress field.
An indexed InetAddress Range field.
Field that stores a per-document int value for scoring, sorting or value retrieval and indexes the field for fast range filters.
An indexed int field for fast range filters.
An indexed Integer Range field.
DocValues field for IntRange.
Describes how an IndexableField should be inverted for indexing terms and postings.
Field that indexes a per-document String or BytesRef into an inverted index for fast filtering, stores values in a columnar fashion using DocValuesType.SORTED_SET doc values for sorting and faceting, and optionally stores values as stored fields for top-hits retrieval.
A field that contains a single byte numeric vector (or none) for each document.
A field that contains a single floating-point numeric vector (or none) for each document.
A per-document location field.
An indexed location field.
A geo shape utility class for indexing and searching GIS geometries whose vertices are latitude, longitude values (in decimal degrees).
A concrete implementation of ShapeDocValues for storing binary doc value representation of LatLonShape geometries in a LatLonShapeDocValuesField.
Concrete implementation of a ShapeDocValuesField for geographic geometries.
Field that stores a per-document long value for scoring, sorting or value retrieval and indexes the field for fast range filters.
An indexed long field for fast range filters.
An indexed Long Range field.
DocValues field for LongRange.
Field that stores a per-document long value for scoring, sorting or value retrieval.
Query class for searching RangeField types by a defined PointValues.Relation.
Used by RangeFieldQuery to check how each internal or leaf node relates to the query.
A doc values field for LatLonShape and XYShape that uses ShapeDocValues as the underlying binary doc value format.
A base shape utility class used for both LatLon (spherical) and XY (cartesian) shape fields.
Represents an encoded triangle using ShapeField.decodeTriangle(byte[], DecodedTriangle).
Type of triangle.
Query Relation Types.
Polygons are decomposed into tessellated triangles using Tessellator; these triangles are encoded and inserted as separate indexed POINT fields.
Field that stores a per-document BytesRef value, indexed for sorting.
Field that stores per-document long values for scoring, sorting or value retrieval.
Field that stores a set of per-document BytesRef values, indexed for faceting, grouping, joining.
A field whose value is stored so that IndexSearcher.storedFields() and IndexReader.storedFields() will return the field and its value.
Abstraction around a stored value.
Type of a StoredValue.
A field that is indexed but not tokenized: the entire String value is indexed as a single token.
A field that is indexed and tokenized, without term vectors.
A per-document location field.
XYGeometry query for XYDocValuesField.
An indexed XY position field.
A cartesian shape utility class for indexing and searching geometries whose vertices are unitless x, y values.
A concrete implementation of ShapeDocValues for storing binary doc value representation of XYShape geometries in a XYShapeDocValuesField.
Concrete implementation of a ShapeDocValuesField for cartesian geometries.