parent
b50c929081
commit
abf901d932
@ -0,0 +1,6 @@
zardoz
bayes.*
/logs
logs/*
binaries/*
binaries
@ -0,0 +1,3 @@
{
    "go.inferGopath": false
}
@ -0,0 +1,15 @@
Zardoz: a lightweight WAF, based on Pseudo-Bayes machine learning.
Copyright (C) 2020 loweel@keinpfusch.net

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
@ -0,0 +1,122 @@
# Zardoz: a lightweight WAF, based on Pseudo-Bayes machine learning.

Zardoz is a small WAF that aims to filter out HTTP calls which are well known to end in an HTTP error. It behaves like a reverse proxy, running as a frontend: it intercepts the calls, forwards them when needed, and learns how the server reacts from the status code.

After a while, the Bayes classifier is able to tell a "good" HTTP call from a bad one, based on the header contents.

It is designed to consume little memory and CPU, so you don't need powerful servers to keep it running, nor does it introduce high latency on the web server.

## STATUS:

This is just an experiment I'm doing with Pseudo-Bayes classifiers. It works pretty well with my blog. Run it in production at your own risk.

## Compiling:

Requirements:

- golang >= 1.12.9

build:

```bash
git clone https://git.keinpfusch.net/LowEel/zardoz
cd zardoz
go build
```
## Starting:

Zardoz has no configuration file; it is configured entirely through environment variables.

In a Dockerfile, this maps like:

```bash
ENV REVERSEURL http://10.0.1.1:3000
ENV PROXYPORT :17000
ENV TRIGGER 0.6
ENV SENIORITY 1025
ENV DEBUG false
ENV DUMPFILE /somewhere/bayes.txt
ENV COLLECTION 2048
```

Using a bash script, this means something like:

```bash
export REVERSEURL=http://10.0.1.1:3000
export PROXYPORT=":17000"
export TRIGGER="0.6"
export SENIORITY="1025"
export DEBUG="true"
export DUMPFILE="/somewhere/bayes.txt"
export COLLECTION="2048"
./zardoz
```
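For container setups, the same variables can also be passed at run time. A hypothetical `docker run` equivalent of the ENV block above (the image name is illustrative, not something this repo builds for you):

```bash
# Hypothetical invocation; assumes an image tagged "zardoz" built from this repo
docker run \
  -e REVERSEURL=http://10.0.1.1:3000 \
  -e PROXYPORT=":17000" \
  -e TRIGGER="0.6" \
  -e SENIORITY="1025" \
  -e DEBUG="false" \
  -e DUMPFILE="/somewhere/bayes.txt" \
  -e COLLECTION="2048" \
  -p 17000:17000 zardoz
```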
## Understanding Configuration:

**REVERSEURL** is the server Zardoz acts as a reverse proxy for. This maps to the IP and port of the server you want to protect.

**PROXYPORT** is the IP and port where Zardoz will listen. If you want Zardoz to listen on all interfaces, just write something like ":17000", meaning it will listen on all interfaces at port 17000.

**TRIGGER**: this is one of the trickiest parts. We can describe the behavior of Zardoz in quadrants, like:

| -                               | BAD > GOOD  | BAD < GOOD |
| ------------------------------- | ----------- | ---------- |
| **\| GOOD - BAD \| > TRIGGER**  | BLOCK       | PASS       |
| **\| GOOD - BAD \| <= TRIGGER** | BLOCK+LEARN | PASS+LEARN |

The value of TRIGGER can range from 0 to 1, like "0.5" or "0.6". The difference between BLOCK without learning and BLOCK with learning is execution time. From the user's point of view nothing changes (the user is blocked either way), but in the "BLOCK+LEARN" case the machine also tries to learn the lesson.

Basically, if GOOD and BAD are far apart, the likelihood is very high, so BLOCK and PASS are applied strictly.

If the likelihood is less than TRIGGER, we aren't sure the prediction is good, so Zardoz still executes the PASS or BLOCK, but it waits for the response and learns from it. To summarize, the concept is "likelihood", which makes the difference between an action and the same action + LEARN.

Personally, I've had good results setting the trigger to 0.6: it doesn't disturb users much, and at the same time it has filtered tons of malicious scans.
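An illustrative sketch of the table above, mirroring the `quadrant` function that appears in proxy.go later in this diff (function and variable names here are made up for the example):

```go
package main

import "math"

// decide mirrors the quadrant table above: when the two scores are far
// apart (difference >= trigger) the verdict is strict; otherwise the
// same verdict is applied, but the classifier keeps learning.
func decide(good, bad, trigger float64) string {
	sure := math.Abs(good-bad) >= trigger
	switch {
	case sure && bad > good:
		return "BLOCK"
	case sure && good > bad:
		return "PASS"
	case bad > good:
		return "BLOCK+LEARN"
	default:
		return "PASS+LEARN"
	}
}
```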
**SENIORITY**: since Zardoz learns what is good for your web server, it takes time to gain seniority. Starting Zardoz empty and letting it decide right away would produce terrible behavior, because of false positives and false negatives. Besides, at the beginning Zardoz is supposed to ALWAYS learn.

The SENIORITY parameter is the number of requests Zardoz keeps in "PASS+LEARN" before filtering starts. During this time it learns from real traffic and blocks no traffic until the seniority threshold is reached. If you set it to 1025, it learns from the first 1025 requests and only then starts actually filtering. The right number depends on many factors: if you serve a lot of pages and a lot of content, I suggest increasing it.
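The gate itself is a single comparison; this is exactly how the `quadrant` function in proxy.go (later in this diff) implements it:

```go
// From quadrant() in proxy.go below: until the request counter reaches
// Maturity (set via SENIORITY), every request is handled as PASS+LEARN.
if ProxyFlow.seniority < Maturity {
	log.Println("Seniority too low. Waiting.")
	return "PASSLEARN"
}
```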
**DUMPFILE**

This is where you want the dump file to be saved. Useful with Docker volumes.

**COLLECTION**

The number of collected tokens considered enough to do a good job. This depends on your service. It is useful to limit memory usage if your server has very complex content, for example.
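The check lives in `updateLearners` in classifier.go, further down in this diff; condensed (locking and logging omitted):

```go
// Condensed from updateLearners() in classifier.go below: the learning
// map is promoted to the working map when a new generation is reached,
// or when the learning map outgrows COLLECTION.
if currentGen > c.Generation || float64(len(c.Learning.sMap)) > ProxyFlow.collection {
	c.Working.sMap = c.Learning.sMap
	c.Learning.sMap = make(map[string]string)
	c.Generation = currentGen
}
```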
**TROUBLESHOOTING:**

Once per minute, Zardoz dumps the sparse matrix describing the whole Bayesian learning state into a file named bayes.json (or the path set in DUMPFILE). This contains the weighted matrix of calls and classes. If Zardoz is not behaving as you expect, you may take a look at this file. The format is a classic sparse matrix. WARNING: this file **may** contain cookies or other sensitive headers.

DEBUG: if set to "true", Zardoz creates a "logs" folder and logs what happens there, together with the dump of the sparse matrix. If set to "false" or not set, no logs are written, but the sparse matrix is still available on disk for post-mortem.
**CREDIT**

Credit for the Bayesian implementation goes to Jake Brukhman: https://github.com/jbrukh/bayesian
## TODO:

- [ ] Loading Bayesian data from file.
- [X] Better Logging
- [ ] Configurable block message.
- [ ] Usage Statistics/Metrics sent to influxDB/prometheus/whatever
@ -0,0 +1,51 @@
package main

import (
	"fmt"
	"log"
	"net/http"
	"time"
)

//HTTPFlow is a type containing all the data we need.
type HTTPFlow struct {
	request     *http.Request
	response    *http.Response
	sensitivity float64 // value which triggers decisions
	seniority   int64
	collection  float64
	refreshtime time.Duration
}

//DebugLog tells if logs are in debug mode or not
var DebugLog bool

//ProxyFlow represents our flow
var ProxyFlow HTTPFlow

//ZClassifier is our bayesian classifier
var ZClassifier *ByClassifier

//BlockMessage is the message we return when blocking
var BlockMessage string

//Maturity is the minimal amount of requests needed to say Zardoz has learnt enough
var Maturity int64

func init() {

	ZClassifier = new(ByClassifier)
	ZClassifier.enroll()

	ProxyFlow.sensitivity = 0.5
	ProxyFlow.seniority = 0

	bl, err := Asset("assets/message.txt")
	if err != nil {
		log.Println("Cannot marshal asset error message!!")
		BlockMessage = ""
	} else {
		BlockMessage = fmt.Sprintf("%s", bl)
	}

}
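Note that `Asset` is not defined anywhere in this diff; given the go-bindata entries in go.mod below, it is presumably a generated accessor for files embedded with go-bindata. A hypothetical regeneration step (output file name is illustrative):

```bash
# Hypothetical: regenerate the embedded-assets accessor with go-bindata
# (which is listed in go.mod); not part of this commit.
go-bindata -o bindata.go assets/
```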
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
@ -0,0 +1,6 @@
penis
wallet
/.well-known/host-meta
/.well-known/host-meta/
/.well-known/nodeinfo

@ -0,0 +1,24 @@
#!/bin/bash

rm ./zardoz

GOOS=linux GOARCH=arm64 CGO_ENABLED=0 go build -mod=vendor
file zardoz
mv ./zardoz ./binaries/arm64/zardoz
tar -cvzf ./binaries/tgz/zardoz_arm64.tgz -C ./binaries/arm64 . --owner=0 --group=0

GOOS=linux GOARCH=arm CGO_ENABLED=0 GOARM=7 go build -mod=vendor
file zardoz
mv ./zardoz ./binaries/armv7/zardoz
tar -cvzf ./binaries/tgz/zardoz_armv7.tgz -C ./binaries/armv7 . --owner=0 --group=0

GOOS=linux GOARCH=mips CGO_ENABLED=0 go build -mod=vendor
file zardoz
mv ./zardoz ./binaries/mips32/zardoz
tar -cvzf ./binaries/tgz/zardoz_mips32.tgz -C ./binaries/mips32 . --owner=0 --group=0

GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -mod=vendor
file zardoz
mv ./zardoz ./binaries/amd64/zardoz
tar -cvzf ./binaries/tgz/zardoz_amd64.tgz -C ./binaries/amd64 . --owner=0 --group=0
@ -0,0 +1,147 @@
package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"log"
	"net"
	"net/http"
)

//Zexpressions is the set of regexps used by zardoz
var Zexpressions = []string{
	`[[:alpha:]]{4,32}`,            // alpha digit token
	`[ ]([A-Za-z0-9-_]{4,}\.)+\w+`, // domain name
	`[ ]/[A-Za-z0-9-_/.]{4,}[ ]`,   // URI path (also partial)
	`[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}`,                                  // IP address
	`[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}`, // UUID
}

func passAndLearn(resp *http.Response) error {

	ProxyFlow.response = resp
	ProxyFlow.seniority++
	req := ProxyFlow.request

	switch {
	case isAuth(resp):
		log.Println("401: We don't want to store credentials")
	case isError(resp):
		buf := bytes.NewBufferString(BlockMessage)
		resp.Body = ioutil.NopCloser(buf)
		resp.Status = "403 Forbidden"
		resp.StatusCode = 403
		resp.Header["Content-Length"] = []string{fmt.Sprint(buf.Len())}
		resp.Header.Set("Content-Encoding", "none")
		resp.Header.Set("Cache-Control", "no-cache, no-store")
		log.Println("Filing inside bad class")
		feedRequest(req, "BAD")
		ControPlane.StatsTokens <- "DOWNGRADE"
	case isSuccess(resp):
		log.Println("Filing inside Good Class: ", resp.StatusCode)
		feedRequest(req, "GOOD")
	}

	return nil
}

func blockAndlearn(resp *http.Response) error {

	ProxyFlow.response = resp
	ProxyFlow.seniority++
	req := ProxyFlow.request

	buf := bytes.NewBufferString(BlockMessage)
	resp.Body = ioutil.NopCloser(buf)
	resp.Status = "403 Forbidden"
	resp.StatusCode = 403
	resp.Header["Content-Length"] = []string{fmt.Sprint(buf.Len())}
	resp.Header.Set("Content-Encoding", "none")
	resp.Header.Set("Cache-Control", "no-cache, no-store")

	switch {
	case isAuth(resp):
		log.Println("401: We don't want to store credentials")
	case isError(resp):
		log.Println("Filing inside bad class")
		feedRequest(req, "BAD")
	case isSuccess(resp):
		log.Println("Filing inside Good Class: ", resp.StatusCode)
		ControPlane.StatsTokens <- "UPGRADED"
		feedRequest(req, "GOOD")
	}

	return nil

}

func feedRequest(req *http.Request, class string) {

	feed := SourceIP(req)

	// feed := formatRequest(req)

	if class == "BAD" {
		log.Println("Feeding BAD token: ", feed)
		ControPlane.BadTokens <- feed
	}

	if class == "GOOD" {
		log.Println("Feeding GOOD Token:", feed)
		ControPlane.GoodTokens <- feed
	}

}

//Unique returns the unique elements of a string slice
func Unique(slice []string) []string {
	// create a map with all the values as keys
	uniqMap := make(map[string]struct{})
	for _, v := range slice {
		uniqMap[v] = struct{}{}
	}

	// turn the map keys into a slice
	uniqSlice := make([]string, 0, len(uniqMap))
	for v := range uniqMap {
		uniqSlice = append(uniqSlice, v)
	}
	return uniqSlice
}

func isSuccess(resp *http.Response) bool {
	return resp.StatusCode <= 299
}

func isAuth(resp *http.Response) bool {
	return resp.StatusCode == 401
}

func isError(resp *http.Response) bool {
	return resp.StatusCode >= 400 && resp.StatusCode != 401
}

//SourceIP returns the source IP of an http call
func SourceIP(req *http.Request) string {

	var feed string

	if feed = req.Header.Get("X-Forwarded-For"); feed != "" {
		log.Println("Got X-Forwarded-For: " + feed)
	} else {
		feed, _, _ = net.SplitHostPort(req.RemoteAddr)
		log.Println("NO X-Forwarded-For, using: "+feed+" from ", req.RemoteAddr)
	}

	return feed

}
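A quick illustration of the precedence in `SourceIP` above, written as a test-style sketch (the file name would be something like httpflow_test.go; addresses are made up):

```go
package main

import (
	"net/http"
	"testing"
)

// Sketch: SourceIP prefers X-Forwarded-For when present, otherwise it
// falls back to the host part of RemoteAddr.
func TestSourceIPPrefersXForwardedFor(t *testing.T) {
	req, _ := http.NewRequest("GET", "http://example.com/", nil)
	req.RemoteAddr = "203.0.113.7:51000"
	if got := SourceIP(req); got != "203.0.113.7" {
		t.Fatalf("want RemoteAddr host, got %q", got)
	}
	req.Header.Set("X-Forwarded-For", "198.51.100.9")
	if got := SourceIP(req); got != "198.51.100.9" {
		t.Fatalf("want X-Forwarded-For, got %q", got)
	}
}
```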
@ -0,0 +1,93 @@
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"log"
	"os"
	"time"
)

// writeToFile will print any string of text to a file safely by
// checking for errors and syncing at the end.
func writeToFile(filename string, data string) error {
	file, err := os.Create(filename)
	if err != nil {
		return err
	}
	defer file.Close()

	_, err = io.WriteString(file, data)
	if err != nil {
		return err
	}
	return file.Sync()
}

func handlepanic() {
	if a := recover(); a != nil {
		fmt.Println("OPS!: Recovering from:", a)
	}
}

func saveBayesToFile() {

	log.Println("Trying to write json file")
	defer handlepanic()

	dumpfile := os.Getenv("DUMPFILE")
	if dumpfile == "" {
		dumpfile = "bayes.json"
	}

	ZClassifier.STATS.busy.Lock()
	defer ZClassifier.STATS.busy.Unlock()

	statsREPORT, err := json.MarshalIndent(ZClassifier.STATS.stats, "", " ")
	if err != nil {
		statsREPORT = []byte(err.Error())
	}

	ZClassifier.Working.busy.Lock()
	defer ZClassifier.Working.busy.Unlock()

	wScores, err := json.MarshalIndent(ZClassifier.Working.sMap, "", " ")
	if err != nil {
		wScores = []byte(err.Error())
	}

	ZClassifier.Learning.busy.Lock()
	defer ZClassifier.Learning.busy.Unlock()

	lScores, err := json.MarshalIndent(ZClassifier.Learning.sMap, "", " ")
	if err != nil {
		lScores = []byte(err.Error())
	}

	report := fmt.Sprintf("STATS: %s\n WORKING: %s\n LEARNING: %s\n", statsREPORT, wScores, lScores)

	writeToFile(dumpfile, report)

	log.Println(report)

}

func jsonEngine() {

	for {
		log.Println("Zardoz Seniority: ", ProxyFlow.seniority)
		saveBayesToFile()
		time.Sleep(1 * time.Minute)
	}

}

func init() {
	log.Printf("File Engine Starting")

	go jsonEngine()
	log.Printf("File Engine Started")
}
@ -0,0 +1,11 @@
module zardoz

go 1.13

require (
	github.com/blevesearch/bleve v0.8.1 // indirect
	github.com/blevesearch/go-porterstemmer v1.0.2 // indirect
	github.com/go-bindata/go-bindata v3.1.2+incompatible // indirect
	github.com/jteeuwen/go-bindata v3.0.7+incompatible // indirect
	github.com/lytics/multibayes v0.0.0-20161108162840-3457a5582021
)
@ -0,0 +1,10 @@
github.com/blevesearch/bleve v0.8.1 h1:20zBREtGe8dvBxCC+717SaxKcUVQOWk3/Fm75vabKpU=
github.com/blevesearch/bleve v0.8.1/go.mod h1:Y2lmIkzV6mcNfAnAdOd+ZxHkHchhBfU/xroGIp61wfw=
github.com/blevesearch/go-porterstemmer v1.0.2 h1:qe7n69gBd1OLY5sHKnxQHIbzn0LNJA4hpAf+5XDxV2I=
github.com/blevesearch/go-porterstemmer v1.0.2/go.mod h1:haWQqFT3RdOGz7PJuM3or/pWNJS1pKkoZJWCkWu0DVA=
github.com/go-bindata/go-bindata v3.1.2+incompatible h1:5vjJMVhowQdPzjE1LdxyFF7YFTXg5IgGVW4gBr5IbvE=
github.com/go-bindata/go-bindata v3.1.2+incompatible/go.mod h1:xK8Dsgwmeed+BBsSy2XTopBn/8uK2HWuGSnA11C3Joo=
github.com/jteeuwen/go-bindata v3.0.7+incompatible h1:91Uy4d9SYVr1kyTJ15wJsog+esAZZl7JmEfTkwmhJts=
github.com/jteeuwen/go-bindata v3.0.7+incompatible/go.mod h1:JVvhzYOiGBnFSYRyV00iY8q7/0PThjIYav1p9h5dmKs=
github.com/lytics/multibayes v0.0.0-20161108162840-3457a5582021 h1:J9Pk5h7TJlqMQtcINI23BUa0+bbxRXPMf7r8gAlfNxo=
github.com/lytics/multibayes v0.0.0-20161108162840-3457a5582021/go.mod h1:lXjTNxya7kn6QNxA3fW8WGYQq0KL/SUcPE9AwcPSgwI=
@ -0,0 +1,80 @@
package main

import (
	"fmt"
	"log"
	"math"
	"net/http"
	"net/http/httputil"
)

func handler(p *httputil.ReverseProxy) func(http.ResponseWriter, *http.Request) {
	return func(w http.ResponseWriter, r *http.Request) {
		// put the request inside our structure
		ProxyFlow.request = r
		log.Println("Received HTTP Request")
		probs := ZClassifier.Posterior(SourceIP(r))
		log.Printf("Posterior Probabilities: %+v\n", probs)
		action := quadrant(probs)
		ControPlane.StatsTokens <- action

		switch action {
		case "BLOCK", "BLOCKLEARN":
			p.ModifyResponse = blockAndlearn
			w.Header().Set("Probabilities", fmt.Sprintf("%v ", probs))
			log.Println("Request Blocked")
			p.ServeHTTP(w, r)

		case "PASS", "PASSLEARN":
			p.ModifyResponse = passAndLearn
			w.Header().Set("Probabilities", fmt.Sprintf("%v ", probs))
			p.ServeHTTP(w, r)
			log.Println("Passing Request")

		default:
			log.Println("No Decision: PASS and LEARN")
			p.ModifyResponse = passAndLearn
			w.Header().Set("Probabilities", fmt.Sprintf("%v ", probs))
			p.ServeHTTP(w, r)
		}

	}
}

func quadrant(p map[string]float64) string {

	sure := math.Abs(p["BAD"]-p["GOOD"]) >= ProxyFlow.sensitivity
	badish := p["BAD"] > p["GOOD"]
	goodish := p["GOOD"] > p["BAD"]

	if ProxyFlow.seniority < Maturity {
		log.Println("Seniority too low. Waiting.")
		return "PASSLEARN"
	}

	if sure {
		if goodish {
			return "PASS"
		}
		if badish {
			return "BLOCK"
		}
	} else {
		if goodish {
			return "PASSLEARN"
		}
		if badish {
			return "BLOCKLEARN"
		}
	}

	return "PASSLEARN"

}
@ -0,0 +1,125 @@
package main

import (
	"io/ioutil"
	"log"
	"os"
	"path/filepath"
	"time"
)

//Zardozlogfile defines the log structure
type Zardozlogfile struct {
	filename string
	logfile  *os.File
	active   bool
}

//VSlogfile is the logger we use
var VSlogfile Zardozlogfile

func init() {

	verbose := os.Getenv("DEBUG")
	log.Println("Verbose mode on: ", verbose)
	DebugLog = (verbose == "true")
	log.Println("DebugLog: ", DebugLog)
	log.Println("Starting Log Engine")
	// just the first time
	var currentFolder = Hpwd()
	os.MkdirAll(filepath.Join(currentFolder, "logs"), 0755)
	//

	VSlogfile.active = DebugLog
	VSlogfile.SetLogFolder()
	go VSlogfile.RotateLogFolder()

}

//RotateLogFolder rotates the log folder
func (lf *Zardozlogfile) RotateLogFolder() {

	for {

		time.Sleep(1 * time.Hour)
		if lf.logfile != nil {
			err := lf.logfile.Close()
			log.Println("[TOOLS][LOG] close logfile returned: ", err)
		}

		lf.SetLogFolder()

	}

}

//SetLogFolder sets the log folder
func (lf *Zardozlogfile) SetLogFolder() {

	if DebugLog {
		lf.EnableLog()
	} else {
		lf.DisableLog()
	}

	if lf.active {

		const layout = "2006-01-02.15"

		orario := time.Now().UTC()

		var currentFolder = Hpwd()
		lf.filename = filepath.Join(currentFolder, "logs", "Zardoz."+orario.Format(layout)+"00.log")

		lf.logfile, _ = os.Create(lf.filename)

		log.Println("[TOOLS][LOG] Logfile is: " + lf.filename)
		log.SetOutput(lf.logfile)

		// log.SetFlags(log.LstdFlags | log.Lshortfile | log.LUTC)
		log.SetFlags(log.LstdFlags | log.LUTC)

	} else {
		log.SetOutput(ioutil.Discard)
	}

}

//EnableLog enables logging
func (lf *Zardozlogfile) EnableLog() {
	lf.active = true
}

//DisableLog disables logging
func (lf *Zardozlogfile) DisableLog() {
	lf.active = false
	log.SetFlags(0)
	log.SetOutput(ioutil.Discard)
}

//LogEngineStart just triggers the init for the package, and logs it.
func LogEngineStart() {
	log.Println("LogRotation Init")
}

//Hpwd behaves like the unix pwd command, returning the current path
func Hpwd() string {

	tmpLoc, err := os.Getwd()
	if err != nil {
		tmpLoc = "/tmp"
		log.Printf("[TOOLS][FS] Problem getting unix pwd: %s", err.Error())
	}

	return tmpLoc
}
@ -0,0 +1,53 @@
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"os"
	"strconv"
)

func main() {

	vip := os.Getenv("REVERSEURL")
	pport := os.Getenv("PROXYPORT")
	sensitivity := os.Getenv("TRIGGER")
	maturity := os.Getenv("SENIORITY")
	collect := os.Getenv("COLLECTION")

	log.Println("Reverse path is: ", vip)
	log.Println("Reverse port is: ", pport)

	remote, err := url.Parse(vip)
	if err != nil {
		panic(err)
	}

	ProxyFlow.sensitivity, err = strconv.ParseFloat(sensitivity, 64)
	if err != nil {
		ProxyFlow.sensitivity = 0.5
	}
	log.Println("Trigger is: ", ProxyFlow.sensitivity)

	Maturity, err = strconv.ParseInt(maturity, 10, 64)
	if err != nil {
		Maturity = 1024
	}
	log.Println("Minimum request to learn: ", Maturity)

	ProxyFlow.collection, err = strconv.ParseFloat(collect, 64)
	if err != nil {
		// This is because we assume every example should add at least one token
		ProxyFlow.collection = float64(Maturity)
	}
	log.Println("Collection limit is: ", ProxyFlow.collection)

	proxy := httputil.NewSingleHostReverseProxy(remote)
	http.HandleFunc("/", handler(proxy))
	err = http.ListenAndServe(pport, nil)
	if err != nil {
		panic(err)
	}
}
@ -0,0 +1,276 @@
package main

import (
	"bufio"
	"log"
	"os"
	"strings"
	"sync"
	"time"
)

//ByControlPlane contains all the channels we need.
type ByControlPlane struct {
	BadTokens   chan string
	GoodTokens  chan string
	StatsTokens chan string
}

type safeClassifier struct {
	sMap map[string]string
	busy sync.Mutex
}

type safeStats struct {
	stats map[string]int64
	busy  sync.Mutex
}

//ControPlane is the control-plane variable
var ControPlane ByControlPlane

//ByClassifier is the structure containing our Pseudo-Bayes classifier.
type ByClassifier struct {
	STATS      safeStats
	Learning   safeClassifier
	Working    safeClassifier
	Generation int64
}

//AddStats adds the statistics after proper blocking.
func (c *ByClassifier) AddStats(action string) {

	c.STATS.busy.Lock()
	defer c.STATS.busy.Unlock()

	if _, exists := c.STATS.stats[action]; exists {
		c.STATS.stats[action]++
	} else {
		c.STATS.stats[action] = 1
	}

}

//IsBAD inserts a bad key in the right place.
func (c *ByClassifier) IsBAD(key string) {

	log.Println("BAD Received", key)

	k := strings.Fields(key)

	c.Learning.busy.Lock()
	defer c.Learning.busy.Unlock()

	for _, tk := range k {

		if kind, exists := c.Learning.sMap[tk]; exists {

			switch kind {
			case "BAD":
				log.Println("Word was known as bad:", tk)
			case "GOOD":
				c.Learning.sMap[tk] = "MEH"
				log.Println("So sad, word was known as good:", tk)
			case "MEH":
				log.Println("Word was known as ambiguous:", tk)
			}

		} else {
			c.Learning.sMap[tk] = "BAD"
		}

	}

	log.Println("BAD Learned", key)

}

//IsGOOD inserts the key in the right place.
func (c *ByClassifier) IsGOOD(key string) {

	k := strings.Fields(key)

	log.Println("GOOD Received", key)

	c.Learning.busy.Lock()
	defer c.Learning.busy.Unlock()

	for _, tk := range k {

		if kind, exists := c.Learning.sMap[tk]; exists {

			switch kind {
			case "GOOD":
				log.Println("Word was known as good: ", tk)
			case "BAD":
				c.Learning.sMap[tk] = "MEH"
				log.Println("So sad, word was known as bad: ", tk)
			case "MEH":
				log.Println("Word was known as ambiguous: ", tk)
			}

		} else {
			c.Learning.sMap[tk] = "GOOD"
		}

	}

	log.Println("GOOD Learned", key)

}

//Posterior scores the header tokens against the working map and returns
//pseudo-probabilities for the BAD and GOOD classes.
func (c *ByClassifier) Posterior(hdr string) map[string]float64 {

	tokens := strings.Fields(hdr)
	ff := make(map[string]float64)

	if c.Generation == 0 || len(tokens) == 0 {
		ff["BAD"] = 0.5
		ff["GOOD"] = 0.5
		return ff
	}

	log.Println("Posterior locking the Working Bayesian")
	c.Working.busy.Lock()
	defer c.Working.busy.Unlock()

	var totalGood, totalBad float64

	for _, tk := range tokens {

		if kind, exists := c.Working.sMap[tk]; exists {

			switch kind {
			case "BAD":
				totalBad++
			case "GOOD":
				totalGood++
			}

		}

	}

	ff["GOOD"] = 1 - (totalBad / float64(len(tokens)))
	ff["BAD"] = 1 - (totalGood / float64(len(tokens)))

	return ff

}

func (c *ByClassifier) enroll() {

	ControPlane.BadTokens = make(chan string, 2048)
	ControPlane.GoodTokens = make(chan string, 2048)
	ControPlane.StatsTokens = make(chan string, 2048)

	c.Generation = 0
	c.Learning.sMap = make(map[string]string)
	c.Working.sMap = make(map[string]string)
	c.STATS.stats = make(map[string]int64)

	c.readInitList("blacklist.txt", "BAD")
	c.readInitList("whitelist.txt", "GOOD")

	go c.readBadTokens()
	go c.readGoodTokens()
	go c.readStatsTokens()
	go c.updateLearners()

	log.Println("Classifier populated...")

}

func (c *ByClassifier) readBadTokens() {

	log.Println("Start reading BAD tokens")

	for token := range ControPlane.BadTokens {
		log.Println("Received BAD Token: ", token)
		c.IsBAD(token)
	}

}

func (c *ByClassifier) readGoodTokens() {

	log.Println("Start reading GOOD tokens")

	for token := range ControPlane.GoodTokens {
		log.Println("Received GOOD Token: ", token)
		c.IsGOOD(token)
	}

}

func (c *ByClassifier) readStatsTokens() {

	log.Println("Start reading STATS tokens")

	for token := range ControPlane.StatsTokens {
		c.AddStats(token)
	}

}

func (c *ByClassifier) readInitList(filePath, class string) {

	inFile, err := os.Open(filePath)
	if err != nil {
		log.Println(err.Error() + `: ` + filePath)
		return
	}
	defer inFile.Close()

	scanner := bufio.NewScanner(inFile)
	for scanner.Scan() {

		if len(scanner.Text()) > 3 {
			switch class {
			case "BAD":
				log.Println("Loading into Blacklist: ", scanner.Text()) // the line
				c.IsBAD(scanner.Text())
			case "GOOD":
				log.Println("Loading into Whitelist: ", scanner.Text()) // the line
				c.IsGOOD(scanner.Text())
			}
		}
	}

}

func (c *ByClassifier) updateLearners() {

	log.Println("Bayes Updater Start...")

	ticker := time.NewTicker(10 * time.Second)

	for ; true; <-ticker.C {
		var currentGen int64
		log.Println("Maturity is:", Maturity)
		log.Println("Seniority is:", ProxyFlow.seniority)
		if Maturity > 0 {
			currentGen = ProxyFlow.seniority / Maturity
		} else {
			currentGen = 0
		}
		log.Println("Current Generation is: ", currentGen)
		log.Println("Working Generation is: ", c.Generation)
		if currentGen > c.Generation || float64(len(c.Learning.sMap)) > ProxyFlow.collection {
			c.Learning.busy.Lock()
			c.Working.busy.Lock()
			c.Working.sMap = c.Learning.sMap
			c.Learning.sMap = make(map[string]string)
			c.Generation = currentGen
			log.Println("Generation Updated to: ", c.Generation)
			ControPlane.StatsTokens <- "GENERATION"
			c.Learning.busy.Unlock()
			c.Working.busy.Unlock()

		}

	}

}
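A worked example of `Posterior`'s scoring, written as a test-style sketch (file name and all values are invented): with one known-BAD token out of two, GOOD = 1 - 1/2 = 0.5 and BAD = 1 - 0/2 = 1.0.

```go
package main

import "testing"

// Sketch: exercises Posterior() above with a hand-built working map.
func TestPosteriorScoring(t *testing.T) {
	c := new(ByClassifier)
	c.Working.sMap = map[string]string{
		"curl":    "BAD",
		"mozilla": "GOOD",
	}
	c.Generation = 1 // skip the cold-start 0.5/0.5 shortcut

	// "curl" is known-BAD, "wget" is unknown: 2 tokens, totalBad = 1,
	// totalGood = 0, so GOOD = 1 - 1/2 = 0.5 and BAD = 1 - 0/2 = 1.0.
	probs := c.Posterior("curl wget")
	if probs["BAD"] != 1.0 || probs["GOOD"] != 0.5 {
		t.Fatalf("unexpected scores: %+v", probs)
	}
}
```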
@ -0,0 +1,8 @@
export REVERSEURL=https://google.com
export PROXYPORT=":8089"
export TRIGGER="0.6"
#export SENIORITY="1025"
export SENIORITY="15"
export DEBUG="true"
export DUMPFILE="bayes.json"
./zardoz
@ -0,0 +1,202 @@

                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!) The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
@ -0,0 +1,152 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package analysis

import (
	"reflect"

	"github.com/blevesearch/bleve/size"
)

var reflectStaticSizeTokenLocation int
var reflectStaticSizeTokenFreq int

func init() {
	var tl TokenLocation
	reflectStaticSizeTokenLocation = int(reflect.TypeOf(tl).Size())
	var tf TokenFreq
	reflectStaticSizeTokenFreq = int(reflect.TypeOf(tf).Size())
}

// TokenLocation represents one occurrence of a term at a particular location in
// a field. Start, End and Position have the same meaning as in analysis.Token.
// Field and ArrayPositions identify the field value in the source document.
// See document.Field for details.
type TokenLocation struct {
	Field          string
	ArrayPositions []uint64
	Start          int
	End            int
	Position       int
}

func (tl *TokenLocation) Size() int {
	rv := reflectStaticSizeTokenLocation
	rv += len(tl.ArrayPositions) * size.SizeOfUint64
	return rv
}

// TokenFreq represents all the occurrences of a term in all fields of a
// document.
type TokenFreq struct {
	Term      []byte
	Locations []*TokenLocation
	frequency int
}

func (tf *TokenFreq) Size() int {
	rv := reflectStaticSizeTokenFreq
	rv += len(tf.Term)
	for _, loc := range tf.Locations {
		rv += loc.Size()
	}
	return rv
}

func (tf *TokenFreq) Frequency() int {
	return tf.frequency
}

// TokenFrequencies maps document terms to their combined frequencies from all
// fields.
type TokenFrequencies map[string]*TokenFreq

func (tfs TokenFrequencies) Size() int {
	rv := size.SizeOfMap
	rv += len(tfs) * (size.SizeOfString + size.SizeOfPtr)
	for k, v := range tfs {
		rv += len(k)
		rv += v.Size()
	}
	return rv
}

func (tfs TokenFrequencies) MergeAll(remoteField string, other TokenFrequencies) {
	// walk the new token frequencies
	for tfk, tf := range other {
		// set the remoteField value in incoming token freqs
		for _, l := range tf.Locations {
			l.Field = remoteField
		}
		existingTf, exists := tfs[tfk]
		if exists {
			existingTf.Locations = append(existingTf.Locations, tf.Locations...)
			existingTf.frequency = existingTf.frequency + tf.frequency
		} else {
			tfs[tfk] = &TokenFreq{
				Term:      tf.Term,
				frequency: tf.frequency,
				Locations: make([]*TokenLocation, len(tf.Locations)),
			}
			copy(tfs[tfk].Locations, tf.Locations)
		}
	}
}

func TokenFrequency(tokens TokenStream, arrayPositions []uint64, includeTermVectors bool) TokenFrequencies {
	rv := make(map[string]*TokenFreq, len(tokens))

	if includeTermVectors {
		tls := make([]TokenLocation, len(tokens))
		tlNext := 0

		for _, token := range tokens {
			tls[tlNext] = TokenLocation{
				ArrayPositions: arrayPositions,
				Start:          token.Start,
				End:            token.End,
				Position:       token.Position,
			}

			curr, ok := rv[string(token.Term)]
			if ok {
				curr.Locations = append(curr.Locations, &tls[tlNext])
				curr.frequency++
			} else {
				rv[string(token.Term)] = &TokenFreq{
					Term:      token.Term,
					Locations: []*TokenLocation{&tls[tlNext]},
					frequency: 1,
				}
			}

			tlNext++
		}
	} else {
		for _, token := range tokens {
			curr, exists := rv[string(token.Term)]
			if exists {
				curr.frequency++
			} else {
				rv[string(token.Term)] = &TokenFreq{
					Term:      token.Term,
					frequency: 1,
				}
			}
		}
	}

	return rv
}
@ -0,0 +1,7 @@
# full line comment
marty
steve # trailing comment
| different format of comment
dustin
siri | different style trailing comment
multiple words with different whitespace
vendor/github.com/blevesearch/bleve/analysis/tokenizer/regexp/regexp.go (84 lines, generated, vendored, new file)
@ -0,0 +1,84 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package regexp

import (
	"fmt"
	"regexp"
	"strconv"

	"github.com/blevesearch/bleve/analysis"
	"github.com/blevesearch/bleve/registry"
)

const Name = "regexp"

var IdeographRegexp = regexp.MustCompile(`\p{Han}|\p{Hangul}|\p{Hiragana}|\p{Katakana}`)

type RegexpTokenizer struct {
	r *regexp.Regexp
}

func NewRegexpTokenizer(r *regexp.Regexp) *RegexpTokenizer {
	return &RegexpTokenizer{
		r: r,
	}
}

func (rt *RegexpTokenizer) Tokenize(input []byte) analysis.TokenStream {
	matches := rt.r.FindAllIndex(input, -1)
	rv := make(analysis.TokenStream, 0, len(matches))
	for i, match := range matches {
		matchBytes := input[match[0]:match[1]]
		if match[1]-match[0] > 0 {
			token := analysis.Token{
				Term:     matchBytes,
				Start:    match[0],
				End:      match[1],
				Position: i + 1,
				Type:     detectTokenType(matchBytes),
			}
			rv = append(rv, &token)
		}
	}
	return rv
}

func RegexpTokenizerConstructor(config map[string]interface{}, cache *registry.Cache) (analysis.Tokenizer, error) {
	rval, ok := config["regexp"].(string)
	if !ok {
		return nil, fmt.Errorf("must specify regexp")
	}
	r, err := regexp.Compile(rval)
	if err != nil {
		return nil, fmt.Errorf("unable to build regexp tokenizer: %v", err)
	}
	return NewRegexpTokenizer(r), nil
}

func init() {
	registry.RegisterTokenizer(Name, RegexpTokenizerConstructor)
}

func detectTokenType(termBytes []byte) analysis.TokenType {
	if IdeographRegexp.Match(termBytes) {
		return analysis.Ideographic
	}
	_, err := strconv.ParseFloat(string(termBytes), 64)
	if err == nil {
		return analysis.Numeric
	}
	return analysis.AlphaNumeric
}
@ -0,0 +1,76 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package analysis

import (
	"bufio"
	"bytes"
	"io"
	"io/ioutil"
	"strings"
)

type TokenMap map[string]bool

func NewTokenMap() TokenMap {
	return make(TokenMap, 0)
}

// LoadFile reads in a list of tokens from a text file,
// one per line.
// Comments are supported using `#` or `|`
func (t TokenMap) LoadFile(filename string) error {
	data, err := ioutil.ReadFile(filename)
	if err != nil {
		return err
	}
	return t.LoadBytes(data)
}

// LoadBytes reads in a list of tokens from memory,
// one per line.
// Comments are supported using `#` or `|`
func (t TokenMap) LoadBytes(data []byte) error {
	bytesReader := bytes.NewReader(data)
	bufioReader := bufio.NewReader(bytesReader)
	line, err := bufioReader.ReadString('\n')
	for err == nil {
		t.LoadLine(line)
		line, err = bufioReader.ReadString('\n')
	}
	// if the err was EOF we still need to process the last value
	if err == io.EOF {
		t.LoadLine(line)
		return nil
	}
	return err
}

func (t TokenMap) LoadLine(line string) {
	// find the start of a comment, if any
	startComment := strings.IndexAny(line, "#|")
	if startComment >= 0 {
		line = line[:startComment]
	}

	tokens := strings.Fields(line)
	for _, token := range tokens {
		t.AddToken(token)
	}
}

func (t TokenMap) AddToken(token string) {
	t[token] = true
}
@ -0,0 +1,103 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package analysis

import (
	"fmt"
	"time"
)

type CharFilter interface {
	Filter([]byte) []byte
}

type TokenType int

const (
	AlphaNumeric TokenType = iota
	Ideographic
	Numeric
	DateTime
	Shingle
	Single
	Double
	Boolean
)

// Token represents one occurrence of a term at a particular location in a
// field.
type Token struct {
	// Start specifies the byte offset of the beginning of the term in the
	// field.
	Start int `json:"start"`

	// End specifies the byte offset of the end of the term in the field.
	End  int    `json:"end"`
	Term []byte `json:"term"`

	// Position specifies the 1-based index of the token in the sequence of
	// occurrences of its term in the field.
	Position int       `json:"position"`
	Type     TokenType `json:"type"`
	KeyWord  bool      `json:"keyword"`
}

func (t *Token) String() string {
	return fmt.Sprintf("Start: %d End: %d Position: %d Token: %s Type: %d", t.Start, t.End, t.Position, string(t.Term), t.Type)
}

type TokenStream []*Token

// A Tokenizer splits an input string into tokens, the usual behaviour being to
// map words to tokens.
type Tokenizer interface {
	Tokenize([]byte) TokenStream
}

// A TokenFilter adds, transforms or removes tokens from a token stream.
type TokenFilter interface {
	Filter(TokenStream) TokenStream
}

type Analyzer struct {
	CharFilters  []CharFilter
	Tokenizer    Tokenizer
	TokenFilters []TokenFilter
}

func (a *Analyzer) Analyze(input []byte) TokenStream {
	if a.CharFilters != nil {
		for _, cf := range a.CharFilters {
			input = cf.Filter(input)
		}
	}
	tokens := a.Tokenizer.Tokenize(input)
	if a.TokenFilters != nil {
		for _, tf := range a.TokenFilters {
			tokens = tf.Filter(tokens)
		}
	}
	return tokens
}

var ErrInvalidDateTime = fmt.Errorf("unable to parse datetime with any of the layouts")

type DateTimeParser interface {
	ParseDateTime(string) (time.Time, error)
}

type ByteArrayConverter interface {
	Convert([]byte) (interface{}, error)
}
@ -0,0 +1,92 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package analysis

import (
    "bytes"
    "unicode/utf8"
)

func DeleteRune(in []rune, pos int) []rune {
    if pos >= len(in) {
        return in
    }
    copy(in[pos:], in[pos+1:])
    return in[:len(in)-1]
}

func InsertRune(in []rune, pos int, r rune) []rune {
    // create a new slice 1 rune larger
    rv := make([]rune, len(in)+1)
    // copy the characters before the insert pos
    copy(rv[0:pos], in[0:pos])
    // set the inserted rune
    rv[pos] = r
    // copy the characters after the insert pos
    copy(rv[pos+1:], in[pos:])
    return rv
}

// BuildTermFromRunesOptimistic will build a term from the provided runes
// AND optimistically attempt to encode into the provided buffer
// if at any point it appears the buffer is too small, a new buffer is
// allocated and that is used instead
// this should be used in cases where frequently the new term is the same
// length or shorter than the original term (in number of bytes)
func BuildTermFromRunesOptimistic(buf []byte, runes []rune) []byte {
    rv := buf
    used := 0
    for _, r := range runes {
        nextLen := utf8.RuneLen(r)
        if used+nextLen > len(rv) {
            // alloc new buf
            buf = make([]byte, len(runes)*utf8.UTFMax)
            // copy work we've already done
            copy(buf, rv[:used])
            rv = buf
        }
        written := utf8.EncodeRune(rv[used:], r)
        used += written
    }
    return rv[:used]
}

func BuildTermFromRunes(runes []rune) []byte {
    return BuildTermFromRunesOptimistic(make([]byte, len(runes)*utf8.UTFMax), runes)
}

func TruncateRunes(input []byte, num int) []byte {
    runes := bytes.Runes(input)
    runes = runes[:len(runes)-num]
    out := BuildTermFromRunes(runes)
    return out
}

func RunesEndsWith(input []rune, suffix string) bool {
    inputLen := len(input)
    suffixRunes := []rune(suffix)
    suffixLen := len(suffixRunes)
    if suffixLen > inputLen {
        return false
    }

    for i := suffixLen - 1; i >= 0; i-- {
        if input[inputLen-(suffixLen-i)] != suffixRunes[i] {
            return false
        }
    }

    return true
}

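A quick sketch of the rune helpers above; the point of working in runes rather than bytes is that multi-byte UTF-8 characters are trimmed and compared whole:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/analysis"
)

func main() {
	// TruncateRunes drops runes (not bytes) from the end, so the
	// two-byte "é" survives intact.
	fmt.Printf("%s\n", analysis.TruncateRunes([]byte("héllo"), 2)) // "hél"

	// RunesEndsWith compares suffixes rune-wise.
	fmt.Println(analysis.RunesEndsWith([]rune("héllo"), "llo")) // true
}
```
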
@ -0,0 +1,101 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package document

import (
    "fmt"
    "reflect"

    "github.com/blevesearch/bleve/size"
)

var reflectStaticSizeDocument int

func init() {
    var d Document
    reflectStaticSizeDocument = int(reflect.TypeOf(d).Size())
}

type Document struct {
    ID              string  `json:"id"`
    Fields          []Field `json:"fields"`
    CompositeFields []*CompositeField
}

func NewDocument(id string) *Document {
    return &Document{
        ID:              id,
        Fields:          make([]Field, 0),
        CompositeFields: make([]*CompositeField, 0),
    }
}

func (d *Document) Size() int {
    sizeInBytes := reflectStaticSizeDocument + size.SizeOfPtr +
        len(d.ID)

    for _, entry := range d.Fields {
        sizeInBytes += entry.Size()
    }

    for _, entry := range d.CompositeFields {
        sizeInBytes += entry.Size()
    }

    return sizeInBytes
}

func (d *Document) AddField(f Field) *Document {
    switch f := f.(type) {
    case *CompositeField:
        d.CompositeFields = append(d.CompositeFields, f)
    default:
        d.Fields = append(d.Fields, f)
    }
    return d
}

func (d *Document) GoString() string {
    fields := ""
    for i, field := range d.Fields {
        if i != 0 {
            fields += ", "
        }
        fields += fmt.Sprintf("%#v", field)
    }
    compositeFields := ""
    for i, field := range d.CompositeFields {
        if i != 0 {
            compositeFields += ", "
        }
        compositeFields += fmt.Sprintf("%#v", field)
    }
    return fmt.Sprintf("&document.Document{ID:%s, Fields: %s, CompositeFields: %s}", d.ID, fields, compositeFields)
}

func (d *Document) NumPlainTextBytes() uint64 {
    rv := uint64(0)
    for _, field := range d.Fields {
        rv += field.NumPlainTextBytes()
    }
    for _, compositeField := range d.CompositeFields {
        for _, field := range d.Fields {
            if compositeField.includesField(field.Name()) {
                rv += field.NumPlainTextBytes()
            }
        }
    }
    return rv
}

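For illustration, a hypothetical sketch of assembling a Document with two of the field constructors that appear later in this commit (NewTextField and NewNumericField); AddField returns the document, so calls chain:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/document"
)

func main() {
	doc := document.NewDocument("doc-1").
		AddField(document.NewTextField("title", nil, []byte("zardoz"))).
		AddField(document.NewNumericField("score", nil, 0.6))

	// 14: six plain-text bytes for "zardoz" plus the fixed 8 counted
	// for the numeric field.
	fmt.Println(doc.NumPlainTextBytes())
	fmt.Printf("%#v\n", doc) // printed via Document.GoString()
}
```
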
@ -0,0 +1,41 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package document

import (
    "github.com/blevesearch/bleve/analysis"
)

type Field interface {
    // Name returns the path of the field from the root DocumentMapping.
    // A root field path is "field", a subdocument field is "parent.field".
    Name() string
    // ArrayPositions returns the intermediate document and field indices
    // required to resolve the field value in the document. For example, if the
    // field path is "doc1.doc2.field" where doc1 and doc2 are slices or
    // arrays, ArrayPositions returns 2 indices used to resolve "doc2" value in
    // "doc1", then "field" in "doc2".
    ArrayPositions() []uint64
    Options() IndexingOptions
    Analyze() (int, analysis.TokenFrequencies)
    Value() []byte

    // NumPlainTextBytes should return the number of plain text bytes
    // that this field represents - this is a common metric for tracking
    // the rate of indexing
    NumPlainTextBytes() uint64

    Size() int
}

@ -0,0 +1,123 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package document

import (
    "fmt"
    "reflect"

    "github.com/blevesearch/bleve/analysis"
    "github.com/blevesearch/bleve/size"
)

var reflectStaticSizeBooleanField int

func init() {
    var f BooleanField
    reflectStaticSizeBooleanField = int(reflect.TypeOf(f).Size())
}

const DefaultBooleanIndexingOptions = StoreField | IndexField | DocValues

type BooleanField struct {
    name              string
    arrayPositions    []uint64
    options           IndexingOptions
    value             []byte
    numPlainTextBytes uint64
}

func (b *BooleanField) Size() int {
    return reflectStaticSizeBooleanField + size.SizeOfPtr +
        len(b.name) +
        len(b.arrayPositions)*size.SizeOfUint64 +
        len(b.value)
}

func (b *BooleanField) Name() string {
    return b.name
}

func (b *BooleanField) ArrayPositions() []uint64 {
    return b.arrayPositions
}

func (b *BooleanField) Options() IndexingOptions {
    return b.options
}

func (b *BooleanField) Analyze() (int, analysis.TokenFrequencies) {
    tokens := make(analysis.TokenStream, 0)
    tokens = append(tokens, &analysis.Token{
        Start:    0,
        End:      len(b.value),
        Term:     b.value,
        Position: 1,
        Type:     analysis.Boolean,
    })

    fieldLength := len(tokens)
    tokenFreqs := analysis.TokenFrequency(tokens, b.arrayPositions, b.options.IncludeTermVectors())
    return fieldLength, tokenFreqs
}

func (b *BooleanField) Value() []byte {
    return b.value
}

func (b *BooleanField) Boolean() (bool, error) {
    if len(b.value) == 1 {
        return b.value[0] == 'T', nil
    }
    return false, fmt.Errorf("boolean field has %d bytes", len(b.value))
}

func (b *BooleanField) GoString() string {
    return fmt.Sprintf("&document.BooleanField{Name:%s, Options: %s, Value: %s}", b.name, b.options, b.value)
}

func (b *BooleanField) NumPlainTextBytes() uint64 {
    return b.numPlainTextBytes
}

func NewBooleanFieldFromBytes(name string, arrayPositions []uint64, value []byte) *BooleanField {
    return &BooleanField{
        name:              name,
        arrayPositions:    arrayPositions,
        value:             value,
        options:           DefaultNumericIndexingOptions,
        numPlainTextBytes: uint64(len(value)),
    }
}

func NewBooleanField(name string, arrayPositions []uint64, b bool) *BooleanField {
    return NewBooleanFieldWithIndexingOptions(name, arrayPositions, b, DefaultNumericIndexingOptions)
}

func NewBooleanFieldWithIndexingOptions(name string, arrayPositions []uint64, b bool, options IndexingOptions) *BooleanField {
    numPlainTextBytes := 5
    v := []byte("F")
    if b {
        numPlainTextBytes = 4
        v = []byte("T")
    }
    return &BooleanField{
        name:              name,
        arrayPositions:    arrayPositions,
        value:             v,
        options:           options,
        numPlainTextBytes: uint64(numPlainTextBytes),
    }
}

@ -0,0 +1,124 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package document

import (
    "reflect"

    "github.com/blevesearch/bleve/analysis"
    "github.com/blevesearch/bleve/size"
)

var reflectStaticSizeCompositeField int

func init() {
    var cf CompositeField
    reflectStaticSizeCompositeField = int(reflect.TypeOf(cf).Size())
}

const DefaultCompositeIndexingOptions = IndexField

type CompositeField struct {
    name                 string
    includedFields       map[string]bool
    excludedFields       map[string]bool
    defaultInclude       bool
    options              IndexingOptions
    totalLength          int
    compositeFrequencies analysis.TokenFrequencies
}

func NewCompositeField(name string, defaultInclude bool, include []string, exclude []string) *CompositeField {
    return NewCompositeFieldWithIndexingOptions(name, defaultInclude, include, exclude, DefaultCompositeIndexingOptions)
}

func NewCompositeFieldWithIndexingOptions(name string, defaultInclude bool, include []string, exclude []string, options IndexingOptions) *CompositeField {
    rv := &CompositeField{
        name:                 name,
        options:              options,
        defaultInclude:       defaultInclude,
        includedFields:       make(map[string]bool, len(include)),
        excludedFields:       make(map[string]bool, len(exclude)),
        compositeFrequencies: make(analysis.TokenFrequencies),
    }

    for _, i := range include {
        rv.includedFields[i] = true
    }
    for _, e := range exclude {
        rv.excludedFields[e] = true
    }

    return rv
}

func (c *CompositeField) Size() int {
    sizeInBytes := reflectStaticSizeCompositeField + size.SizeOfPtr +
        len(c.name)

    for k, _ := range c.includedFields {
        sizeInBytes += size.SizeOfString + len(k) + size.SizeOfBool
    }

    for k, _ := range c.excludedFields {
        sizeInBytes += size.SizeOfString + len(k) + size.SizeOfBool
    }

    return sizeInBytes
}

func (c *CompositeField) Name() string {
    return c.name
}

func (c *CompositeField) ArrayPositions() []uint64 {
    return []uint64{}
}

func (c *CompositeField) Options() IndexingOptions {
    return c.options
}

func (c *CompositeField) Analyze() (int, analysis.TokenFrequencies) {
    return c.totalLength, c.compositeFrequencies
}

func (c *CompositeField) Value() []byte {
    return []byte{}
}

func (c *CompositeField) NumPlainTextBytes() uint64 {
    return 0
}

func (c *CompositeField) includesField(field string) bool {
    shouldInclude := c.defaultInclude
    _, fieldShouldBeIncluded := c.includedFields[field]
    if fieldShouldBeIncluded {
        shouldInclude = true
    }
    _, fieldShouldBeExcluded := c.excludedFields[field]
    if fieldShouldBeExcluded {
        shouldInclude = false
    }
    return shouldInclude
}

func (c *CompositeField) Compose(field string, length int, freq analysis.TokenFrequencies) {
    if c.includesField(field) {
        c.totalLength += length
        c.compositeFrequencies.MergeAll(field, freq)
    }
}

@ -0,0 +1,159 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package document

import (
    "fmt"
    "math"
    "reflect"
    "time"

    "github.com/blevesearch/bleve/analysis"
    "github.com/blevesearch/bleve/numeric"
    "github.com/blevesearch/bleve/size"
)

var reflectStaticSizeDateTimeField int

func init() {
    var f DateTimeField
    reflectStaticSizeDateTimeField = int(reflect.TypeOf(f).Size())
}

const DefaultDateTimeIndexingOptions = StoreField | IndexField | DocValues
const DefaultDateTimePrecisionStep uint = 4

var MinTimeRepresentable = time.Unix(0, math.MinInt64)
var MaxTimeRepresentable = time.Unix(0, math.MaxInt64)

type DateTimeField struct {
    name              string
    arrayPositions    []uint64
    options           IndexingOptions
    value             numeric.PrefixCoded
    numPlainTextBytes uint64
}

func (n *DateTimeField) Size() int {
    return reflectStaticSizeDateTimeField + size.SizeOfPtr +
        len(n.name) +
        len(n.arrayPositions)*size.SizeOfUint64
}

func (n *DateTimeField) Name() string {
    return n.name
}

func (n *DateTimeField) ArrayPositions() []uint64 {
    return n.arrayPositions
}

func (n *DateTimeField) Options() IndexingOptions {
    return n.options
}

func (n *DateTimeField) Analyze() (int, analysis.TokenFrequencies) {
    tokens := make(analysis.TokenStream, 0)
    tokens = append(tokens, &analysis.Token{
        Start:    0,
        End:      len(n.value),
        Term:     n.value,
        Position: 1,
        Type:     analysis.DateTime,
    })

    original, err := n.value.Int64()
    if err == nil {

        shift := DefaultDateTimePrecisionStep
        for shift < 64 {
            shiftEncoded, err := numeric.NewPrefixCodedInt64(original, shift)
            if err != nil {
                break
            }
            token := analysis.Token{
                Start:    0,
                End:      len(shiftEncoded),
                Term:     shiftEncoded,
                Position: 1,
                Type:     analysis.DateTime,
            }
            tokens = append(tokens, &token)
            shift += DefaultDateTimePrecisionStep
        }
    }

    fieldLength := len(tokens)
    tokenFreqs := analysis.TokenFrequency(tokens, n.arrayPositions, n.options.IncludeTermVectors())
    return fieldLength, tokenFreqs
}

func (n *DateTimeField) Value() []byte {
    return n.value
}

func (n *DateTimeField) DateTime() (time.Time, error) {
    i64, err := n.value.Int64()
    if err != nil {
        return time.Time{}, err
    }
    return time.Unix(0, i64).UTC(), nil
}

func (n *DateTimeField) GoString() string {
    return fmt.Sprintf("&document.DateField{Name:%s, Options: %s, Value: %s}", n.name, n.options, n.value)
}

func (n *DateTimeField) NumPlainTextBytes() uint64 {
    return n.numPlainTextBytes
}

func NewDateTimeFieldFromBytes(name string, arrayPositions []uint64, value []byte) *DateTimeField {
    return &DateTimeField{
        name:              name,
        arrayPositions:    arrayPositions,
        value:             value,
        options:           DefaultDateTimeIndexingOptions,
        numPlainTextBytes: uint64(len(value)),
    }
}

func NewDateTimeField(name string, arrayPositions []uint64, dt time.Time) (*DateTimeField, error) {
    return NewDateTimeFieldWithIndexingOptions(name, arrayPositions, dt, DefaultDateTimeIndexingOptions)
}

func NewDateTimeFieldWithIndexingOptions(name string, arrayPositions []uint64, dt time.Time, options IndexingOptions) (*DateTimeField, error) {
    if canRepresent(dt) {
        dtInt64 := dt.UnixNano()
        prefixCoded := numeric.MustNewPrefixCodedInt64(dtInt64, 0)
        return &DateTimeField{
            name:           name,
            arrayPositions: arrayPositions,
            value:          prefixCoded,
            options:        options,
            // not correct, just a place holder until we revisit how fields are
            // represented and can fix this better
            numPlainTextBytes: uint64(8),
        }, nil
    }
    return nil, fmt.Errorf("cannot represent %s in this type", dt)
}

func canRepresent(dt time.Time) bool {
    if dt.Before(MinTimeRepresentable) || dt.After(MaxTimeRepresentable) {
        return false
    }
    return true
}

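The Analyze method above is the classic numeric-trie trick: besides the full-precision term, it indexes one extra term for every DefaultDateTimePrecisionStep bits of precision dropped, which is what keeps date-range queries cheap. A usage sketch, assuming the vendored import path:

```go
package main

import (
	"fmt"
	"time"

	"github.com/blevesearch/bleve/document"
)

func main() {
	when := time.Date(2020, 3, 14, 15, 9, 26, 0, time.UTC)
	f, err := document.NewDateTimeField("created", nil, when)
	if err != nil {
		panic(err) // outside the representable nanosecond range
	}

	// Round-trip back from the prefix-coded value.
	got, _ := f.DateTime()
	fmt.Println(got.Equal(when)) // true

	// One full-precision token plus 15 shifted encodings
	// (shifts 4, 8, ..., 60).
	n, _ := f.Analyze()
	fmt.Println(n) // 16
}
```
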
@ -0,0 +1,152 @@
// Copyright (c) 2017 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package document

import (
    "fmt"
    "reflect"

    "github.com/blevesearch/bleve/analysis"
    "github.com/blevesearch/bleve/geo"
    "github.com/blevesearch/bleve/numeric"
    "github.com/blevesearch/bleve/size"
)

var reflectStaticSizeGeoPointField int

func init() {
    var f GeoPointField
    reflectStaticSizeGeoPointField = int(reflect.TypeOf(f).Size())
}

var GeoPrecisionStep uint = 9

type GeoPointField struct {
    name              string
    arrayPositions    []uint64
    options           IndexingOptions
    value             numeric.PrefixCoded
    numPlainTextBytes uint64
}

func (n *GeoPointField) Size() int {
    return reflectStaticSizeGeoPointField + size.SizeOfPtr +
        len(n.name) +
        len(n.arrayPositions)*size.SizeOfUint64
}

func (n *GeoPointField) Name() string {
    return n.name
}

func (n *GeoPointField) ArrayPositions() []uint64 {
    return n.arrayPositions
}

func (n *GeoPointField) Options() IndexingOptions {
    return n.options
}

func (n *GeoPointField) Analyze() (int, analysis.TokenFrequencies) {
    tokens := make(analysis.TokenStream, 0)
    tokens = append(tokens, &analysis.Token{
        Start:    0,
        End:      len(n.value),
        Term:     n.value,
        Position: 1,
        Type:     analysis.Numeric,
    })

    original, err := n.value.Int64()
    if err == nil {

        shift := GeoPrecisionStep
        for shift < 64 {
            shiftEncoded, err := numeric.NewPrefixCodedInt64(original, shift)
            if err != nil {
                break
            }
            token := analysis.Token{
                Start:    0,
                End:      len(shiftEncoded),
                Term:     shiftEncoded,
                Position: 1,
                Type:     analysis.Numeric,
            }
            tokens = append(tokens, &token)
            shift += GeoPrecisionStep
        }
    }

    fieldLength := len(tokens)
    tokenFreqs := analysis.TokenFrequency(tokens, n.arrayPositions, n.options.IncludeTermVectors())
    return fieldLength, tokenFreqs
}

func (n *GeoPointField) Value() []byte {
    return n.value
}

func (n *GeoPointField) Lon() (float64, error) {
    i64, err := n.value.Int64()
    if err != nil {
        return 0.0, err
    }
    return geo.MortonUnhashLon(uint64(i64)), nil
}

func (n *GeoPointField) Lat() (float64, error) {
    i64, err := n.value.Int64()
    if err != nil {
        return 0.0, err
    }
    return geo.MortonUnhashLat(uint64(i64)), nil
}

func (n *GeoPointField) GoString() string {
    return fmt.Sprintf("&document.GeoPointField{Name:%s, Options: %s, Value: %s}", n.name, n.options, n.value)
}

func (n *GeoPointField) NumPlainTextBytes() uint64 {
    return n.numPlainTextBytes
}

func NewGeoPointFieldFromBytes(name string, arrayPositions []uint64, value []byte) *GeoPointField {
    return &GeoPointField{
        name:              name,
        arrayPositions:    arrayPositions,
        value:             value,
        options:           DefaultNumericIndexingOptions,
        numPlainTextBytes: uint64(len(value)),
    }
}

func NewGeoPointField(name string, arrayPositions []uint64, lon, lat float64) *GeoPointField {
    return NewGeoPointFieldWithIndexingOptions(name, arrayPositions, lon, lat, DefaultNumericIndexingOptions)
}

func NewGeoPointFieldWithIndexingOptions(name string, arrayPositions []uint64, lon, lat float64, options IndexingOptions) *GeoPointField {
    mhash := geo.MortonHash(lon, lat)
    prefixCoded := numeric.MustNewPrefixCodedInt64(int64(mhash), 0)
    return &GeoPointField{
        name:           name,
        arrayPositions: arrayPositions,
        value:          prefixCoded,
        options:        options,
        // not correct, just a place holder until we revisit how fields are
        // represented and can fix this better
        numPlainTextBytes: uint64(8),
    }
}

@ -0,0 +1,145 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package document

import (
    "fmt"
    "reflect"

    "github.com/blevesearch/bleve/analysis"
    "github.com/blevesearch/bleve/numeric"
    "github.com/blevesearch/bleve/size"
)

var reflectStaticSizeNumericField int

func init() {
    var f NumericField
    reflectStaticSizeNumericField = int(reflect.TypeOf(f).Size())
}

const DefaultNumericIndexingOptions = StoreField | IndexField | DocValues

const DefaultPrecisionStep uint = 4

type NumericField struct {
    name              string
    arrayPositions    []uint64
    options           IndexingOptions
    value             numeric.PrefixCoded
    numPlainTextBytes uint64
}

func (n *NumericField) Size() int {
    return reflectStaticSizeNumericField + size.SizeOfPtr +
        len(n.name) +
        len(n.arrayPositions)*size.SizeOfPtr
}

func (n *NumericField) Name() string {
    return n.name
}

func (n *NumericField) ArrayPositions() []uint64 {
    return n.arrayPositions
}

func (n *NumericField) Options() IndexingOptions {
    return n.options
}

func (n *NumericField) Analyze() (int, analysis.TokenFrequencies) {
    tokens := make(analysis.TokenStream, 0)
    tokens = append(tokens, &analysis.Token{
        Start:    0,
        End:      len(n.value),
        Term:     n.value,
        Position: 1,
        Type:     analysis.Numeric,
    })

    original, err := n.value.Int64()
    if err == nil {

        shift := DefaultPrecisionStep
        for shift < 64 {
            shiftEncoded, err := numeric.NewPrefixCodedInt64(original, shift)
            if err != nil {
                break
            }
            token := analysis.Token{
                Start:    0,
                End:      len(shiftEncoded),
                Term:     shiftEncoded,
                Position: 1,
                Type:     analysis.Numeric,
            }
            tokens = append(tokens, &token)
            shift += DefaultPrecisionStep
        }
    }

    fieldLength := len(tokens)
    tokenFreqs := analysis.TokenFrequency(tokens, n.arrayPositions, n.options.IncludeTermVectors())
    return fieldLength, tokenFreqs
}

func (n *NumericField) Value() []byte {
    return n.value
}

func (n *NumericField) Number() (float64, error) {
    i64, err := n.value.Int64()
    if err != nil {
        return 0.0, err
    }
    return numeric.Int64ToFloat64(i64), nil
}

func (n *NumericField) GoString() string {
    return fmt.Sprintf("&document.NumericField{Name:%s, Options: %s, Value: %s}", n.name, n.options, n.value)
}

func (n *NumericField) NumPlainTextBytes() uint64 {
    return n.numPlainTextBytes
}

func NewNumericFieldFromBytes(name string, arrayPositions []uint64, value []byte) *NumericField {
    return &NumericField{
        name:              name,
        arrayPositions:    arrayPositions,
        value:             value,
        options:           DefaultNumericIndexingOptions,
        numPlainTextBytes: uint64(len(value)),
    }
}

func NewNumericField(name string, arrayPositions []uint64, number float64) *NumericField {
    return NewNumericFieldWithIndexingOptions(name, arrayPositions, number, DefaultNumericIndexingOptions)
}

func NewNumericFieldWithIndexingOptions(name string, arrayPositions []uint64, number float64, options IndexingOptions) *NumericField {
    numberInt64 := numeric.Float64ToInt64(number)
    prefixCoded := numeric.MustNewPrefixCodedInt64(numberInt64, 0)
    return &NumericField{
        name:           name,
        arrayPositions: arrayPositions,
        value:          prefixCoded,
        options:        options,
        // not correct, just a place holder until we revisit how fields are
        // represented and can fix this better
        numPlainTextBytes: uint64(8),
    }
}

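NumericField follows the same pattern as the date field: the float64 is mapped to a sortable int64 and prefix-coded at decreasing precisions. A small sketch:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/document"
)

func main() {
	f := document.NewNumericField("price", nil, 19.99)

	// Round-trip from the sortable int64 encoding back to float64.
	num, err := f.Number()
	fmt.Println(num, err) // 19.99 <nil>

	// As with dates: one full-precision token plus one per
	// DefaultPrecisionStep bits dropped.
	n, _ := f.Analyze()
	fmt.Println(n) // 16
}
```
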
@ -0,0 +1,139 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package document

import (
    "fmt"
    "reflect"

    "github.com/blevesearch/bleve/analysis"
    "github.com/blevesearch/bleve/size"
)

var reflectStaticSizeTextField int

func init() {
    var f TextField
    reflectStaticSizeTextField = int(reflect.TypeOf(f).Size())
}

const DefaultTextIndexingOptions = IndexField | DocValues

type TextField struct {
    name              string
    arrayPositions    []uint64
    options           IndexingOptions
    analyzer          *analysis.Analyzer
    value             []byte
    numPlainTextBytes uint64
}

func (t *TextField) Size() int {
    return reflectStaticSizeTextField + size.SizeOfPtr +
        len(t.name) +
        len(t.arrayPositions)*size.SizeOfUint64 +
        len(t.value)
}

func (t *TextField) Name() string {
    return t.name
}

func (t *TextField) ArrayPositions() []uint64 {
    return t.arrayPositions
}

func (t *TextField) Options() IndexingOptions {
    return t.options
}

func (t *TextField) Analyze() (int, analysis.TokenFrequencies) {
    var tokens analysis.TokenStream
    if t.analyzer != nil {
        bytesToAnalyze := t.Value()
        if t.options.IsStored() {
            // need to copy
            bytesCopied := make([]byte, len(bytesToAnalyze))
            copy(bytesCopied, bytesToAnalyze)
            bytesToAnalyze = bytesCopied
        }
        tokens = t.analyzer.Analyze(bytesToAnalyze)
    } else {
        tokens = analysis.TokenStream{
            &analysis.Token{
                Start:    0,
                End:      len(t.value),
                Term:     t.value,
                Position: 1,
                Type:     analysis.AlphaNumeric,
            },
        }
    }
    fieldLength := len(tokens) // number of tokens in this doc field
    tokenFreqs := analysis.TokenFrequency(tokens, t.arrayPositions, t.options.IncludeTermVectors())
    return fieldLength, tokenFreqs
}

func (t *TextField) Analyzer() *analysis.Analyzer {
    return t.analyzer
}

func (t *TextField) Value() []byte {
    return t.value
}

func (t *TextField) GoString() string {
    return fmt.Sprintf("&document.TextField{Name:%s, Options: %s, Analyzer: %v, Value: %s, ArrayPositions: %v}", t.name, t.options, t.analyzer, t.value, t.arrayPositions)
}

func (t *TextField) NumPlainTextBytes() uint64 {
    return t.numPlainTextBytes
}

func NewTextField(name string, arrayPositions []uint64, value []byte) *TextField {
    return NewTextFieldWithIndexingOptions(name, arrayPositions, value, DefaultTextIndexingOptions)
}

func NewTextFieldWithIndexingOptions(name string, arrayPositions []uint64, value []byte, options IndexingOptions) *TextField {
    return &TextField{
        name:              name,
        arrayPositions:    arrayPositions,
        options:           options,
        value:             value,
        numPlainTextBytes: uint64(len(value)),
    }
}

func NewTextFieldWithAnalyzer(name string, arrayPositions []uint64, value []byte, analyzer *analysis.Analyzer) *TextField {
    return &TextField{
        name:              name,
        arrayPositions:    arrayPositions,
        options:           DefaultTextIndexingOptions,
        analyzer:          analyzer,
        value:             value,
        numPlainTextBytes: uint64(len(value)),
    }
}

func NewTextFieldCustom(name string, arrayPositions []uint64, value []byte, options IndexingOptions, analyzer *analysis.Analyzer) *TextField {
    return &TextField{
        name:              name,
        arrayPositions:    arrayPositions,
        options:           options,
        analyzer:          analyzer,
        value:             value,
        numPlainTextBytes: uint64(len(value)),
    }
}

@ -0,0 +1,66 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package document

type IndexingOptions int

const (
    IndexField IndexingOptions = 1 << iota
    StoreField
    IncludeTermVectors
    DocValues
)

func (o IndexingOptions) IsIndexed() bool {
    return o&IndexField != 0
}

func (o IndexingOptions) IsStored() bool {
    return o&StoreField != 0
}

func (o IndexingOptions) IncludeTermVectors() bool {
    return o&IncludeTermVectors != 0
}

func (o IndexingOptions) IncludeDocValues() bool {
    return o&DocValues != 0
}

func (o IndexingOptions) String() string {
    rv := ""
    if o.IsIndexed() {
        rv += "INDEXED"
    }
    if o.IsStored() {
        if rv != "" {
            rv += ", "
        }
        rv += "STORE"
    }
    if o.IncludeTermVectors() {
        if rv != "" {
            rv += ", "
        }
        rv += "TV"
    }
    if o.IncludeDocValues() {
        if rv != "" {
            rv += ", "
        }
        rv += "DV"
    }
    return rv
}

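IndexingOptions is a plain bit mask, so options compose with `|`. A short sketch:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/document"
)

func main() {
	opts := document.IndexField | document.StoreField | document.DocValues
	fmt.Println(opts.IsIndexed())          // true
	fmt.Println(opts.IncludeTermVectors()) // false
	fmt.Println(opts)                      // INDEXED, STORE, DV
}
```
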
@ -0,0 +1,9 @@
# geo support in bleve

First, all of this geo code is a Go adaptation of the [Lucene 5.3.2 sandbox geo support](https://lucene.apache.org/core/5_3_2/sandbox/org/apache/lucene/util/package-summary.html).

## Notes

- All of the APIs will use float64 for lon/lat values.
- When describing a point in function arguments or return values, we always use the order lon, lat.
- High level APIs will use TopLeft and BottomRight to describe bounding boxes. This may not map cleanly to min/max lon/lat when crossing the dateline. The lower level APIs will use min/max lon/lat and require the higher-level code to split boxes accordingly.

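A sketch of the lon-before-lat convention using the Morton hash helpers defined in geo.go below; the coordinates are illustrative:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/geo"
)

func main() {
	// Note the argument order: lon first, then lat.
	hash := geo.MortonHash(-2.2945, 48.8584)

	// The round trip is exact up to the 32-bit scaling step,
	// so this prints approximately -2.2945 48.8584.
	fmt.Println(geo.MortonUnhashLon(hash), geo.MortonUnhashLat(hash))
}
```
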
@ -0,0 +1,208 @@
// Copyright (c) 2017 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package geo

import (
    "fmt"
    "math"

    "github.com/blevesearch/bleve/numeric"
)

// GeoBits is the number of bits used for a single geo point.
// Currently this is 32 bits for lon and 32 bits for lat.
var GeoBits uint = 32

var minLon = -180.0
var minLat = -90.0
var maxLon = 180.0
var maxLat = 90.0
var minLonRad = minLon * degreesToRadian
var minLatRad = minLat * degreesToRadian
var maxLonRad = maxLon * degreesToRadian
var maxLatRad = maxLat * degreesToRadian
var geoTolerance = 1E-6
var lonScale = float64((uint64(0x1)<<GeoBits)-1) / 360.0
var latScale = float64((uint64(0x1)<<GeoBits)-1) / 180.0

// Point represents a geo point.
type Point struct {
    Lon float64
    Lat float64
}

// MortonHash computes the morton hash value for the provided geo point.
// This point is ordered as lon, lat.
func MortonHash(lon, lat float64) uint64 {
    return numeric.Interleave(scaleLon(lon), scaleLat(lat))
}

func scaleLon(lon float64) uint64 {
    rv := uint64((lon - minLon) * lonScale)
    return rv
}

func scaleLat(lat float64) uint64 {
    rv := uint64((lat - minLat) * latScale)
    return rv
}

// MortonUnhashLon extracts the longitude value from the provided morton hash.
func MortonUnhashLon(hash uint64) float64 {
    return unscaleLon(numeric.Deinterleave(hash))
}

// MortonUnhashLat extracts the latitude value from the provided morton hash.
func MortonUnhashLat(hash uint64) float64 {
    return unscaleLat(numeric.Deinterleave(hash >> 1))
}

func unscaleLon(lon uint64) float64 {
    return (float64(lon) / lonScale) + minLon
}

func unscaleLat(lat uint64) float64 {
    return (float64(lat) / latScale) + minLat
}

// compareGeo will compare two float values and see if they are the same
// taking into consideration a known geo tolerance.
func compareGeo(a, b float64) float64 {
    compare := a - b
    if math.Abs(compare) <= geoTolerance {
        return 0
    }
    return compare
}

// RectIntersects checks whether rectangles a and b intersect
func RectIntersects(aMinX, aMinY, aMaxX, aMaxY, bMinX, bMinY, bMaxX, bMaxY float64) bool {
    return !(aMaxX < bMinX || aMinX > bMaxX || aMaxY < bMinY || aMinY > bMaxY)
}

// RectWithin checks whether box a is within box b
func RectWithin(aMinX, aMinY, aMaxX, aMaxY, bMinX, bMinY, bMaxX, bMaxY float64) bool {
    rv := !(aMinX < bMinX || aMinY < bMinY || aMaxX > bMaxX || aMaxY > bMaxY)
    return rv
}

// BoundingBoxContains checks whether the lon/lat point is within the box
func BoundingBoxContains(lon, lat, minLon, minLat, maxLon, maxLat float64) bool {
    return compareGeo(lon, minLon) >= 0 && compareGeo(lon, maxLon) <= 0 &&
        compareGeo(lat, minLat) >= 0 && compareGeo(lat, maxLat) <= 0
}

const degreesToRadian = math.Pi / 180
const radiansToDegrees = 180 / math.Pi

// DegreesToRadians converts an angle in degrees to radians
func DegreesToRadians(d float64) float64 {
    return d * degreesToRadian
}

// RadiansToDegrees converts an angle in radians to degrees
func RadiansToDegrees(r float64) float64 {
    return r * radiansToDegrees
}

var earthMeanRadiusMeters = 6371008.7714

func RectFromPointDistance(lon, lat, dist float64) (float64, float64, float64, float64, error) {
    err := checkLongitude(lon)
    if err != nil {
        return 0, 0, 0, 0, err
    }
    err = checkLatitude(lat)
    if err != nil {
        return 0, 0, 0, 0, err
    }
    radLon := DegreesToRadians(lon)
    radLat := DegreesToRadians(lat)
    radDistance := (dist + 7e-2) / earthMeanRadiusMeters

    minLatL := radLat - radDistance
    maxLatL := radLat + radDistance

    var minLonL, maxLonL float64
    if minLatL > minLatRad && maxLatL < maxLatRad {
        deltaLon := asin(sin(radDistance) / cos(radLat))
        minLonL = radLon - deltaLon
        if minLonL < minLonRad {
            minLonL += 2 * math.Pi
        }
        maxLonL = radLon + deltaLon
        if maxLonL > maxLonRad {
            maxLonL -= 2 * math.Pi
        }
    } else {
        // pole is inside distance
        minLatL = math.Max(minLatL, minLatRad)
        maxLatL = math.Min(maxLatL, maxLatRad)
        minLonL = minLonRad
        maxLonL = maxLonRad
    }

    return RadiansToDegrees(minLonL),
        RadiansToDegrees(maxLatL),
        RadiansToDegrees(maxLonL),
        RadiansToDegrees(minLatL),
        nil
}

func checkLatitude(latitude float64) error {
    if math.IsNaN(latitude) || latitude < minLat || latitude > maxLat {
        return fmt.Errorf("invalid latitude %f; must be between %f and %f", latitude, minLat, maxLat)
    }
    return nil
}

func checkLongitude(longitude float64) error {
    if math.IsNaN(longitude) || longitude < minLon || longitude > maxLon {
        return fmt.Errorf("invalid longitude %f; must be between %f and %f", longitude, minLon, maxLon)
    }
    return nil
}

func BoundingRectangleForPolygon(polygon []Point) (
    float64, float64, float64, float64, error) {
    err := checkLongitude(polygon[0].Lon)
    if err != nil {
        return 0, 0, 0, 0, err
    }
    err = checkLatitude(polygon[0].Lat)
    if err != nil {
        return 0, 0, 0, 0, err
    }
    maxY, minY := polygon[0].Lat, polygon[0].Lat
    maxX, minX := polygon[0].Lon, polygon[0].Lon
    for i := 1; i < len(polygon); i++ {
        err := checkLongitude(polygon[i].Lon)
        if err != nil {
            return 0, 0, 0, 0, err
        }
        err = checkLatitude(polygon[i].Lat)
        if err != nil {
            return 0, 0, 0, 0, err
        }

        maxY = math.Max(maxY, polygon[i].Lat)
        minY = math.Min(minY, polygon[i].Lat)

        maxX = math.Max(maxX, polygon[i].Lon)
        minX = math.Min(minX, polygon[i].Lon)
    }

    return minX, maxY, maxX, minY, nil
}

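A usage sketch of RectFromPointDistance and BoundingBoxContains above; note the return order (minLon, maxLat, maxLon, minLat), i.e. top-left then bottom-right, matching the README convention:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/geo"
)

func main() {
	// Bounding rectangle of all points within roughly 10km of a location.
	minLon, maxLat, maxLon, minLat, err := geo.RectFromPointDistance(-2.2945, 48.8584, 10000)
	if err != nil {
		panic(err)
	}
	fmt.Println(minLon, minLat, maxLon, maxLat)

	// The point itself is inside its own rectangle.
	fmt.Println(geo.BoundingBoxContains(-2.2945, 48.8584, minLon, minLat, maxLon, maxLat)) // true
}
```
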
@ -0,0 +1,98 @@
// Copyright (c) 2017 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package geo

import (
    "fmt"
    "math"
    "strconv"
    "strings"
)

type distanceUnit struct {
    conv     float64
    suffixes []string
}

var inch = distanceUnit{0.0254, []string{"in", "inch"}}
var yard = distanceUnit{0.9144, []string{"yd", "yards"}}
var feet = distanceUnit{0.3048, []string{"ft", "feet"}}
var kilom = distanceUnit{1000, []string{"km", "kilometers"}}
var nauticalm = distanceUnit{1852.0, []string{"nm", "nauticalmiles"}}
var millim = distanceUnit{0.001, []string{"mm", "millimeters"}}
var centim = distanceUnit{0.01, []string{"cm", "centimeters"}}
var miles = distanceUnit{1609.344, []string{"mi", "miles"}}
var meters = distanceUnit{1, []string{"m", "meters"}}

var distanceUnits = []*distanceUnit{
    &inch, &yard, &feet, &kilom, &nauticalm, &millim, &centim, &miles, &meters,
}

// ParseDistance attempts to parse a distance string and return distance in
// meters. Example formats supported:
// "5in" "5inch" "7yd" "7yards" "9ft" "9feet" "11km" "11kilometers"
// "3nm" "3nauticalmiles" "13mm" "13millimeters" "15cm" "15centimeters"
// "17mi" "17miles" "19m" "19meters"
// If the unit cannot be determined, the entire string is parsed and the
// unit of meters is assumed.
// If the number portion cannot be parsed, 0 and the parse error are returned.
func ParseDistance(d string) (float64, error) {
    for _, unit := range distanceUnits {
        for _, unitSuffix := range unit.suffixes {
            if strings.HasSuffix(d, unitSuffix) {
                parsedNum, err := strconv.ParseFloat(d[0:len(d)-len(unitSuffix)], 64)
                if err != nil {
                    return 0, err
                }
                return parsedNum * unit.conv, nil
            }
        }
    }
    // no unit matched, try assuming meters?
    parsedNum, err := strconv.ParseFloat(d, 64)
    if err != nil {
        return 0, err
    }
    return parsedNum, nil
}

// ParseDistanceUnit attempts to parse a distance unit and return the
// multiplier for converting this to meters. If the unit cannot be parsed
// then 0 and the error message is returned.
func ParseDistanceUnit(u string) (float64, error) {
    for _, unit := range distanceUnits {
        for _, unitSuffix := range unit.suffixes {
            if u == unitSuffix {
                return unit.conv, nil
            }
        }
    }
    return 0, fmt.Errorf("unknown distance unit: %s", u)
}

// Haversin computes the distance between two points.
// This implementation uses the sloppy math implementations which trade off
// accuracy for performance. The distance returned is in kilometers.
func Haversin(lon1, lat1, lon2, lat2 float64) float64 {
    x1 := lat1 * degreesToRadian
    x2 := lat2 * degreesToRadian
    h1 := 1 - cos(x1-x2)
    h2 := 1 - cos((lon1-lon2)*degreesToRadian)
    h := (h1 + cos(x1)*cos(x2)*h2) / 2
    avgLat := (x1 + x2) / 2
    diameter := earthDiameter(avgLat)

    return diameter * asin(math.Min(1, math.Sqrt(h)))
}

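A sketch of the distance parsing helpers above:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/geo"
)

func main() {
	m, _ := geo.ParseDistance("11km")
	fmt.Println(m) // 11000

	// A bare number falls back to meters.
	m, _ = geo.ParseDistance("250")
	fmt.Println(m) // 250

	conv, _ := geo.ParseDistanceUnit("mi")
	fmt.Println(conv) // 1609.344
}
```
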
@ -0,0 +1,111 @@
// Copyright (c) 2019 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// This implementation is inspired from the geohash-js
// ref: https://github.com/davetroy/geohash-js

package geo

// encoding encapsulates an encoding defined by a given base32 alphabet.
type encoding struct {
    enc string
    dec [256]byte
}

// newEncoding constructs a new encoding defined by the given alphabet,
// which must be a 32-byte string.
func newEncoding(encoder string) *encoding {
    e := new(encoding)
    e.enc = encoder
    for i := 0; i < len(e.dec); i++ {
        e.dec[i] = 0xff
    }
    for i := 0; i < len(encoder); i++ {
        e.dec[encoder[i]] = byte(i)
    }
    return e
}

// base32encoding with the Geohash alphabet.
var base32encoding = newEncoding("0123456789bcdefghjkmnpqrstuvwxyz")

var masks = []uint64{16, 8, 4, 2, 1}

// DecodeGeoHash decodes the string geohash faster and with
// higher precision. This API is experimental.
func DecodeGeoHash(geoHash string) (float64, float64) {
    even := true
    lat := []float64{-90.0, 90.0}
    lon := []float64{-180.0, 180.0}

    for i := 0; i < len(geoHash); i++ {
        cd := uint64(base32encoding.dec[geoHash[i]])
        for j := 0; j < 5; j++ {
            if even {
                if cd&masks[j] > 0 {
                    lon[0] = (lon[0] + lon[1]) / 2
                } else {
                    lon[1] = (lon[0] + lon[1]) / 2
                }
            } else {
                if cd&masks[j] > 0 {
                    lat[0] = (lat[0] + lat[1]) / 2
                } else {
                    lat[1] = (lat[0] + lat[1]) / 2
                }
            }
            even = !even
        }
    }

    return (lat[0] + lat[1]) / 2, (lon[0] + lon[1]) / 2
}

func EncodeGeoHash(lat, lon float64) string {
    even := true
    lats := []float64{-90.0, 90.0}
    lons := []float64{-180.0, 180.0}
    precision := 12
    var ch, bit uint64
    var geoHash string

    for len(geoHash) < precision {
        if even {
            mid := (lons[0] + lons[1]) / 2
            if lon > mid {
                ch |= masks[bit]
                lons[0] = mid
            } else {
                lons[1] = mid
            }
        } else {
            mid := (lats[0] + lats[1]) / 2
            if lat > mid {
                ch |= masks[bit]
                lats[0] = mid
            } else {
                lats[1] = mid
            }
        }
        even = !even
        if bit < 4 {
            bit++
        } else {
            geoHash += string(base32encoding.enc[ch])
            ch = 0
            bit = 0
        }
    }

    return geoHash
}

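A round-trip sketch of the geohash helpers; note that, unlike the rest of the geo package, these take and return lat first:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/geo"
)

func main() {
	hash := geo.EncodeGeoHash(48.8584, -2.2945)
	fmt.Println(hash) // a 12-character geohash

	lat, lon := geo.DecodeGeoHash(hash)
	fmt.Println(lat, lon) // approximately 48.8584 -2.2945
}
```
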
@ -0,0 +1,179 @@
|
|||
// Copyright (c) 2017 Couchbase, Inc.
|
||||
//
|
||||
// Licensed under the Apache License, Version 2.0 (the "License");
|
||||
// you may not use this file except in compliance with the License.
|
||||
// You may obtain a copy of the License at
|
||||
//
|
||||
// http://www.apache.org/licenses/LICENSE-2.0
|
||||
//
|
||||
// Unless required by applicable law or agreed to in writing, software
|
||||
// distributed under the License is distributed on an "AS IS" BASIS,
|
||||
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
// See the License for the specific language governing permissions and
|
||||
// limitations under the License.
|
||||
|
||||
package geo
|
||||
|
||||
import (
|
||||
"reflect"
|
||||
"strconv"
|
||||
"strings"
|
||||
)
|
||||
|
||||
// ExtractGeoPoint takes an arbitrary interface{} and tries it's best to
|
||||
// interpret it is as geo point. Supported formats:
|
||||
// Container:
|
||||
// slice length 2 (GeoJSON)
|
||||
// first element lon, second element lat
|
||||
// string (coordinates separated by comma, or a geohash)
|
||||
// first element lat, second element lon
|
||||
// map[string]interface{}
|
||||
// exact keys lat and lon or lng
|
||||
// struct
|
||||
// w/exported fields case-insensitive match on lat and lon or lng
|
||||
// struct
|
||||
// satisfying Later and Loner or Lnger interfaces
|
||||
//
|
||||
// in all cases values must be some sort of numeric-like thing: int/uint/float
|
||||
func ExtractGeoPoint(thing interface{}) (lon, lat float64, success bool) {
|
||||
var foundLon, foundLat bool
|
||||
|
||||
thingVal := reflect.ValueOf(thing)
|
||||
if !thingVal.IsValid() {
|
||||
return lon, lat, false
|
||||
}
|
||||
|
||||
thingTyp := thingVal.Type()
|
||||
|
||||
// is it a slice
|
||||
if thingVal.Kind() == reflect.Slice {
|
||||
// must be length 2
|
||||
if thingVal.Len() == 2 {
|
||||
first := thingVal.Index(0)
|
||||
if first.CanInterface() {
|
||||
				firstVal := first.Interface()
				lon, foundLon = extractNumericVal(firstVal)
			}
			second := thingVal.Index(1)
			if second.CanInterface() {
				secondVal := second.Interface()
				lat, foundLat = extractNumericVal(secondVal)
			}
		}
	}

	// is it a string
	if thingVal.Kind() == reflect.String {
		geoStr := thingVal.Interface().(string)
		if strings.Contains(geoStr, ",") {
			// geo point with coordinates split by comma
			points := strings.Split(geoStr, ",")
			for i, point := range points {
				// trim any leading or trailing white spaces
				points[i] = strings.TrimSpace(point)
			}
			if len(points) == 2 {
				var err error
				lat, err = strconv.ParseFloat(points[0], 64)
				if err == nil {
					foundLat = true
				}
				lon, err = strconv.ParseFloat(points[1], 64)
				if err == nil {
					foundLon = true
				}
			}
		} else {
			// geohash
			lat, lon = DecodeGeoHash(geoStr)
			foundLat = true
			foundLon = true
		}
	}

	// is it a map
	if l, ok := thing.(map[string]interface{}); ok {
		if lval, ok := l["lon"]; ok {
			lon, foundLon = extractNumericVal(lval)
		} else if lval, ok := l["lng"]; ok {
			lon, foundLon = extractNumericVal(lval)
		}
		if lval, ok := l["lat"]; ok {
			lat, foundLat = extractNumericVal(lval)
		}
	}

	// now try reflection on struct fields
	if thingVal.Kind() == reflect.Struct {
		for i := 0; i < thingVal.NumField(); i++ {
			fieldName := thingTyp.Field(i).Name
			if strings.HasPrefix(strings.ToLower(fieldName), "lon") {
				if thingVal.Field(i).CanInterface() {
					fieldVal := thingVal.Field(i).Interface()
					lon, foundLon = extractNumericVal(fieldVal)
				}
			}
			if strings.HasPrefix(strings.ToLower(fieldName), "lng") {
				if thingVal.Field(i).CanInterface() {
					fieldVal := thingVal.Field(i).Interface()
					lon, foundLon = extractNumericVal(fieldVal)
				}
			}
			if strings.HasPrefix(strings.ToLower(fieldName), "lat") {
				if thingVal.Field(i).CanInterface() {
					fieldVal := thingVal.Field(i).Interface()
					lat, foundLat = extractNumericVal(fieldVal)
				}
			}
		}
	}

	// last hope, some interfaces
	// lon
	if l, ok := thing.(loner); ok {
		lon = l.Lon()
		foundLon = true
	} else if l, ok := thing.(lnger); ok {
		lon = l.Lng()
		foundLon = true
	}
	// lat
	if l, ok := thing.(later); ok {
		lat = l.Lat()
		foundLat = true
	}

	return lon, lat, foundLon && foundLat
}

// extractNumericVal extracts a numeric value (if possible) and returns it as a float64
func extractNumericVal(v interface{}) (float64, bool) {
	val := reflect.ValueOf(v)
	if !val.IsValid() {
		return 0, false
	}
	typ := val.Type()
	switch typ.Kind() {
	case reflect.Float32, reflect.Float64:
		return val.Float(), true
	case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
		return float64(val.Int()), true
	case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
		return float64(val.Uint()), true
	}

	return 0, false
}

// various support interfaces which can be used to find lat/lon
type loner interface {
	Lon() float64
}

type later interface {
	Lat() float64
}

type lnger interface {
	Lng() float64
}
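The extraction above falls through several shapes: two-element slices, "lat,lon" strings, geohashes, maps, structs, and Lon()/Lat()-style interfaces. A minimal usage sketch; since the hunk starts mid-function, the assumption here is that the enclosing function is the package's exported `ExtractGeoPoint` helper:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/geo"
)

func main() {
	// map form: "lon"/"lng" and "lat" keys
	lon, lat, ok := geo.ExtractGeoPoint(map[string]interface{}{"lon": 9.19, "lat": 45.46})
	fmt.Println(lon, lat, ok) // 9.19 45.46 true

	// string form: "lat,lon", whitespace tolerated
	lon, lat, ok = geo.ExtractGeoPoint("45.46, 9.19")
	fmt.Println(lon, lat, ok) // 9.19 45.46 true
}
```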
@ -0,0 +1,212 @@
// Copyright (c) 2017 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package geo

import (
	"math"
)

var earthDiameterPerLatitude []float64
var sinTab []float64
var cosTab []float64
var asinTab []float64
var asinDer1DivF1Tab []float64
var asinDer2DivF2Tab []float64
var asinDer3DivF3Tab []float64
var asinDer4DivF4Tab []float64

const radiusTabsSize = (1 << 10) + 1
const radiusDelta = (math.Pi / 2) / (radiusTabsSize - 1)
const radiusIndexer = 1 / radiusDelta
const sinCosTabsSize = (1 << 11) + 1
const asinTabsSize = (1 << 13) + 1
const oneDivF2 = 1 / 2.0
const oneDivF3 = 1 / 6.0
const oneDivF4 = 1 / 24.0

// 1.57079632673412561417e+00 first 33 bits of pi/2
var pio2Hi = math.Float64frombits(0x3FF921FB54400000)

// 6.07710050650619224932e-11 pi/2 - PIO2_HI
var pio2Lo = math.Float64frombits(0x3DD0B4611A626331)

var asinPio2Hi = math.Float64frombits(0x3FF921FB54442D18) // 1.57079632679489655800e+00
var asinPio2Lo = math.Float64frombits(0x3C91A62633145C07) // 6.12323399573676603587e-17
var asinPs0 = math.Float64frombits(0x3fc5555555555555)    // 1.66666666666666657415e-01
var asinPs1 = math.Float64frombits(0xbfd4d61203eb6f7d)    // -3.25565818622400915405e-01
var asinPs2 = math.Float64frombits(0x3fc9c1550e884455)    // 2.01212532134862925881e-01
var asinPs3 = math.Float64frombits(0xbfa48228b5688f3b)    // -4.00555345006794114027e-02
var asinPs4 = math.Float64frombits(0x3f49efe07501b288)    // 7.91534994289814532176e-04
var asinPs5 = math.Float64frombits(0x3f023de10dfdf709)    // 3.47933107596021167570e-05
var asinQs1 = math.Float64frombits(0xc0033a271c8a2d4b)    // -2.40339491173441421878e+00
var asinQs2 = math.Float64frombits(0x40002ae59c598ac8)    // 2.02094576023350569471e+00
var asinQs3 = math.Float64frombits(0xbfe6066c1b8d0159)    // -6.88283971605453293030e-01
var asinQs4 = math.Float64frombits(0x3fb3b8c5b12e9282)    // 7.70381505559019352791e-02

var twoPiHi = 4 * pio2Hi
var twoPiLo = 4 * pio2Lo
var sinCosDeltaHi = twoPiHi / (sinCosTabsSize - 1)
var sinCosDeltaLo = twoPiLo / (sinCosTabsSize - 1)
var sinCosIndexer = 1 / (sinCosDeltaHi + sinCosDeltaLo)
var sinCosMaxValueForIntModulo = ((math.MaxInt64 >> 9) / sinCosIndexer) * 0.99
var asinMaxValueForTabs = math.Sin(73.0 * degreesToRadian)

var asinDelta = asinMaxValueForTabs / (asinTabsSize - 1)
var asinIndexer = 1 / asinDelta

func init() {
	// initializes the tables used for the sloppy math functions

	// sin and cos
	sinTab = make([]float64, sinCosTabsSize)
	cosTab = make([]float64, sinCosTabsSize)
	sinCosPiIndex := (sinCosTabsSize - 1) / 2
	sinCosPiMul2Index := 2 * sinCosPiIndex
	sinCosPiMul05Index := sinCosPiIndex / 2
	sinCosPiMul15Index := 3 * sinCosPiIndex / 2
	for i := 0; i < sinCosTabsSize; i++ {
		// angle: in [0,2*PI].
		angle := float64(i)*sinCosDeltaHi + float64(i)*sinCosDeltaLo
		sinAngle := math.Sin(angle)
		cosAngle := math.Cos(angle)
		// For indexes corresponding to null cosine or sine, we make sure the value is zero
		// and not an epsilon. This allows for a much better accuracy for results close to zero.
		if i == sinCosPiIndex {
			sinAngle = 0.0
		} else if i == sinCosPiMul2Index {
			sinAngle = 0.0
		} else if i == sinCosPiMul05Index {
			cosAngle = 0.0
		} else if i == sinCosPiMul15Index {
			cosAngle = 0.0
		}
		sinTab[i] = sinAngle
		cosTab[i] = cosAngle
	}

	// asin
	asinTab = make([]float64, asinTabsSize)
	asinDer1DivF1Tab = make([]float64, asinTabsSize)
	asinDer2DivF2Tab = make([]float64, asinTabsSize)
	asinDer3DivF3Tab = make([]float64, asinTabsSize)
	asinDer4DivF4Tab = make([]float64, asinTabsSize)
	for i := 0; i < asinTabsSize; i++ {
		// x: in [0,ASIN_MAX_VALUE_FOR_TABS].
		x := float64(i) * asinDelta
		asinTab[i] = math.Asin(x)
		oneMinusXSqInv := 1.0 / (1 - x*x)
		oneMinusXSqInv05 := math.Sqrt(oneMinusXSqInv)
		oneMinusXSqInv15 := oneMinusXSqInv05 * oneMinusXSqInv
		oneMinusXSqInv25 := oneMinusXSqInv15 * oneMinusXSqInv
		oneMinusXSqInv35 := oneMinusXSqInv25 * oneMinusXSqInv
		asinDer1DivF1Tab[i] = oneMinusXSqInv05
		asinDer2DivF2Tab[i] = (x * oneMinusXSqInv15) * oneDivF2
		asinDer3DivF3Tab[i] = ((1 + 2*x*x) * oneMinusXSqInv25) * oneDivF3
		asinDer4DivF4Tab[i] = ((5 + 2*x*(2+x*(5-2*x))) * oneMinusXSqInv35) * oneDivF4
	}

	// earth radius
	a := 6378137.0
	b := 6356752.31420
	a2 := a * a
	b2 := b * b
	earthDiameterPerLatitude = make([]float64, radiusTabsSize)
	earthDiameterPerLatitude[0] = 2.0 * a / 1000
	earthDiameterPerLatitude[radiusTabsSize-1] = 2.0 * b / 1000
	for i := 1; i < radiusTabsSize-1; i++ {
		lat := math.Pi * float64(i) / (2*radiusTabsSize - 1)
		one := math.Pow(a2*math.Cos(lat), 2)
		two := math.Pow(b2*math.Sin(lat), 2)
		three := math.Pow(a*math.Cos(lat), 2)
		four := math.Pow(b*math.Sin(lat), 2)
		radius := math.Sqrt((one + two) / (three + four))
		earthDiameterPerLatitude[i] = 2 * radius / 1000
	}
}

// earthDiameter returns an estimation of the earth's diameter at the specified
// latitude in kilometers
func earthDiameter(lat float64) float64 {
	index := math.Mod(math.Abs(lat)*radiusIndexer+0.5, float64(len(earthDiameterPerLatitude)))
	if math.IsNaN(index) {
		return 0
	}
	return earthDiameterPerLatitude[int(index)]
}

var pio2 = math.Pi / 2

func sin(a float64) float64 {
	return cos(a - pio2)
}

// cos is a sloppy math (faster) implementation of math.Cos
func cos(a float64) float64 {
	if a < 0.0 {
		a = -a
	}
	if a > sinCosMaxValueForIntModulo {
		return math.Cos(a)
	}
	// index: possibly outside tables range.
	index := int(a*sinCosIndexer + 0.5)
	delta := (a - float64(index)*sinCosDeltaHi) - float64(index)*sinCosDeltaLo
	// Making sure index is within tables range.
	// Last value of each table is the same as the first, so we ignore it (tabs size minus one) for modulo.
	index &= (sinCosTabsSize - 2) // index % (SIN_COS_TABS_SIZE-1)
	indexCos := cosTab[index]
	indexSin := sinTab[index]
	return indexCos + delta*(-indexSin+delta*(-indexCos*oneDivF2+delta*(indexSin*oneDivF3+delta*indexCos*oneDivF4)))
}

// asin is a sloppy math (faster) implementation of math.Asin
func asin(a float64) float64 {
	var negateResult bool
	if a < 0 {
		a = -a
		negateResult = true
	}
	if a <= asinMaxValueForTabs {
		index := int(a*asinIndexer + 0.5)
		delta := a - float64(index)*asinDelta
		result := asinTab[index] + delta*(asinDer1DivF1Tab[index]+delta*(asinDer2DivF2Tab[index]+delta*(asinDer3DivF3Tab[index]+delta*asinDer4DivF4Tab[index])))
		if negateResult {
			return -result
		}
		return result
	}
	// value > ASIN_MAX_VALUE_FOR_TABS, or value is NaN
	// This part is derived from fdlibm.
	if a < 1 {
		t := (1.0 - a) * 0.5
		p := t * (asinPs0 + t*(asinPs1+t*(asinPs2+t*(asinPs3+t*(asinPs4+t*asinPs5)))))
		q := 1.0 + t*(asinQs1+t*(asinQs2+t*(asinQs3+t*asinQs4)))
		s := math.Sqrt(t)
		z := s + s*(p/q)
		result := asinPio2Hi - ((z + z) - asinPio2Lo)
		if negateResult {
			return -result
		}
		return result
	}
	// value >= 1.0, or value is NaN
	if a == 1.0 {
		if negateResult {
			return -math.Pi / 2
		}
		return math.Pi / 2
	}
	return math.NaN()
}
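A rough in-package sanity check of how the table-driven `cos`, `sin` and `asin` are expected to track the stdlib. This test is only a sketch (not part of the changeset), and the tolerance is an assumption:

```go
package geo

import (
	"math"
	"testing"
)

func TestSloppyTrig(t *testing.T) {
	for _, a := range []float64{0, 0.25, 1.0, 1.5} {
		if d := math.Abs(cos(a) - math.Cos(a)); d > 1e-6 {
			t.Fatalf("cos(%v) off by %v", a, d)
		}
		if d := math.Abs(sin(a) - math.Sin(a)); d > 1e-6 {
			t.Fatalf("sin(%v) off by %v", a, d)
		}
	}
	if d := math.Abs(asin(0.5) - math.Asin(0.5)); d > 1e-6 {
		t.Fatalf("asin(0.5) off by %v", d)
	}
}
```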
@ -0,0 +1,110 @@
// Copyright (c) 2015 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package index

import (
	"reflect"

	"github.com/blevesearch/bleve/analysis"
	"github.com/blevesearch/bleve/document"
	"github.com/blevesearch/bleve/size"
)

var reflectStaticSizeAnalysisResult int

func init() {
	var ar AnalysisResult
	reflectStaticSizeAnalysisResult = int(reflect.TypeOf(ar).Size())
}

type IndexRow interface {
	KeySize() int
	KeyTo([]byte) (int, error)
	Key() []byte

	ValueSize() int
	ValueTo([]byte) (int, error)
	Value() []byte
}

type AnalysisResult struct {
	DocID string
	Rows  []IndexRow

	// scorch
	Document *document.Document
	Analyzed []analysis.TokenFrequencies
	Length   []int
}

func (a *AnalysisResult) Size() int {
	rv := reflectStaticSizeAnalysisResult
	for _, analyzedI := range a.Analyzed {
		rv += analyzedI.Size()
	}
	rv += len(a.Length) * size.SizeOfInt
	return rv
}

type AnalysisWork struct {
	i  Index
	d  *document.Document
	rc chan *AnalysisResult
}

func NewAnalysisWork(i Index, d *document.Document, rc chan *AnalysisResult) *AnalysisWork {
	return &AnalysisWork{
		i:  i,
		d:  d,
		rc: rc,
	}
}

type AnalysisQueue struct {
	queue chan *AnalysisWork
	done  chan struct{}
}

func (q *AnalysisQueue) Queue(work *AnalysisWork) {
	q.queue <- work
}

func (q *AnalysisQueue) Close() {
	close(q.done)
}

func NewAnalysisQueue(numWorkers int) *AnalysisQueue {
	rv := AnalysisQueue{
		queue: make(chan *AnalysisWork),
		done:  make(chan struct{}),
	}
	for i := 0; i < numWorkers; i++ {
		go AnalysisWorker(rv)
	}
	return &rv
}

func AnalysisWorker(q AnalysisQueue) {
	// read work off the queue
	for {
		select {
		case <-q.done:
			return
		case w := <-q.queue:
			r := w.i.Analyze(w.d)
			w.rc <- r
		}
	}
}
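The queue fans analysis work out to a fixed pool of worker goroutines over unbuffered channels. A sketch of the intended producer/consumer flow, assuming `idx` is some `index.Index` implementation:

```go
package main

import (
	"github.com/blevesearch/bleve/document"
	"github.com/blevesearch/bleve/index"
)

// analyzeOne pushes a single document through the queue and
// waits for a worker's result.
func analyzeOne(idx index.Index, doc *document.Document) *index.AnalysisResult {
	queue := index.NewAnalysisQueue(4) // four worker goroutines
	defer queue.Close()                // signals the workers to exit

	rc := make(chan *index.AnalysisResult)
	queue.Queue(index.NewAnalysisWork(idx, doc, rc))
	return <-rc // produced by whichever worker picked the job up
}
```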
@ -0,0 +1,88 @@
// Copyright (c) 2015 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package index

import (
	"sync"
)

type FieldCache struct {
	fieldIndexes   map[string]uint16
	indexFields    []string
	lastFieldIndex int
	mutex          sync.RWMutex
}

func NewFieldCache() *FieldCache {
	return &FieldCache{
		fieldIndexes:   make(map[string]uint16),
		lastFieldIndex: -1,
	}
}

func (f *FieldCache) AddExisting(field string, index uint16) {
	f.mutex.Lock()
	f.addLOCKED(field, index)
	f.mutex.Unlock()
}

func (f *FieldCache) addLOCKED(field string, index uint16) uint16 {
	f.fieldIndexes[field] = index
	if len(f.indexFields) < int(index)+1 {
		prevIndexFields := f.indexFields
		f.indexFields = make([]string, int(index)+16)
		copy(f.indexFields, prevIndexFields)
	}
	f.indexFields[int(index)] = field
	if int(index) > f.lastFieldIndex {
		f.lastFieldIndex = int(index)
	}
	return index
}

// FieldNamed returns the index of the field, and whether or not it existed
// before this call. If createIfMissing is true, a new field index is assigned,
// but the second return value will still be false.
func (f *FieldCache) FieldNamed(field string, createIfMissing bool) (uint16, bool) {
	f.mutex.RLock()
	if index, ok := f.fieldIndexes[field]; ok {
		f.mutex.RUnlock()
		return index, true
	} else if !createIfMissing {
		f.mutex.RUnlock()
		return 0, false
	}
	// trade read lock for write lock
	f.mutex.RUnlock()
	f.mutex.Lock()
	// need to check again with write lock
	if index, ok := f.fieldIndexes[field]; ok {
		f.mutex.Unlock()
		return index, true
	}
	// assign next field id
	index := f.addLOCKED(field, uint16(f.lastFieldIndex+1))
	f.mutex.Unlock()
	return index, false
}

func (f *FieldCache) FieldIndexed(index uint16) (field string) {
	f.mutex.RLock()
	if int(index) < len(f.indexFields) {
		field = f.indexFields[int(index)]
	}
	f.mutex.RUnlock()
	return field
}
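A short sketch of the cache's lookup-or-assign contract; note the deliberately surprising second return value on first use:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/index"
)

func main() {
	fc := index.NewFieldCache()

	i, existed := fc.FieldNamed("title", true) // first use assigns the next free index
	fmt.Println(i, existed)                    // 0 false

	j, existedNow := fc.FieldNamed("title", false) // later lookups find it
	fmt.Println(j, existedNow)                     // 0 true

	fmt.Println(fc.FieldIndexed(i)) // reverse lookup: "title"
}
```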
@ -0,0 +1,369 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package index

import (
	"bytes"
	"encoding/json"
	"fmt"
	"reflect"

	"github.com/blevesearch/bleve/document"
	"github.com/blevesearch/bleve/index/store"
	"github.com/blevesearch/bleve/size"
)

var reflectStaticSizeTermFieldDoc int
var reflectStaticSizeTermFieldVector int

func init() {
	var tfd TermFieldDoc
	reflectStaticSizeTermFieldDoc = int(reflect.TypeOf(tfd).Size())
	var tfv TermFieldVector
	reflectStaticSizeTermFieldVector = int(reflect.TypeOf(tfv).Size())
}

var ErrorUnknownStorageType = fmt.Errorf("unknown storage type")

type Index interface {
	Open() error
	Close() error

	Update(doc *document.Document) error
	Delete(id string) error
	Batch(batch *Batch) error

	SetInternal(key, val []byte) error
	DeleteInternal(key []byte) error

	// Reader returns a low-level accessor on the index data. Close it to
	// release associated resources.
	Reader() (IndexReader, error)

	Stats() json.Marshaler
	StatsMap() map[string]interface{}

	Analyze(d *document.Document) *AnalysisResult

	Advanced() (store.KVStore, error)
}

type DocumentFieldTermVisitor func(field string, term []byte)

type IndexReader interface {
	TermFieldReader(term []byte, field string, includeFreq, includeNorm, includeTermVectors bool) (TermFieldReader, error)

	// DocIDReader returns an iterator over all doc ids.
	// The caller must close returned instance to release associated resources.
	DocIDReaderAll() (DocIDReader, error)

	DocIDReaderOnly(ids []string) (DocIDReader, error)

	FieldDict(field string) (FieldDict, error)

	// FieldDictRange is currently defined to include the start and end terms
	FieldDictRange(field string, startTerm []byte, endTerm []byte) (FieldDict, error)
	FieldDictPrefix(field string, termPrefix []byte) (FieldDict, error)

	Document(id string) (*document.Document, error)
	DocumentVisitFieldTerms(id IndexInternalID, fields []string, visitor DocumentFieldTermVisitor) error

	DocValueReader(fields []string) (DocValueReader, error)

	Fields() ([]string, error)

	GetInternal(key []byte) ([]byte, error)

	DocCount() (uint64, error)

	ExternalID(id IndexInternalID) (string, error)
	InternalID(id string) (IndexInternalID, error)

	DumpAll() chan interface{}
	DumpDoc(id string) chan interface{}
	DumpFields() chan interface{}

	Close() error
}

// The Regexp interface defines the subset of the regexp.Regexp API
// methods that are used by bleve indexes, allowing callers to pass in
// alternate implementations.
type Regexp interface {
	FindStringIndex(s string) (loc []int)

	LiteralPrefix() (prefix string, complete bool)

	String() string
}

type IndexReaderRegexp interface {
	FieldDictRegexp(field string, regex string) (FieldDict, error)
}

type IndexReaderFuzzy interface {
	FieldDictFuzzy(field string, term string, fuzziness int, prefix string) (FieldDict, error)
}

type IndexReaderOnly interface {
	FieldDictOnly(field string, onlyTerms [][]byte, includeCount bool) (FieldDict, error)
}

type IndexReaderContains interface {
	FieldDictContains(field string) (FieldDictContains, error)
}

// FieldTerms contains the terms used by a document, keyed by field
type FieldTerms map[string][]string

// FieldsNotYetCached returns a list of fields not yet cached out of a larger list of fields
func (f FieldTerms) FieldsNotYetCached(fields []string) []string {
	rv := make([]string, 0, len(fields))
	for _, field := range fields {
		if _, ok := f[field]; !ok {
			rv = append(rv, field)
		}
	}
	return rv
}

// Merge will combine two FieldTerms.
// It assumes that the terms lists are complete (thus do not need to be merged);
// field terms from the other list always replace the ones in the receiver.
func (f FieldTerms) Merge(other FieldTerms) {
	for field, terms := range other {
		f[field] = terms
	}
}

type TermFieldVector struct {
	Field          string
	ArrayPositions []uint64
	Pos            uint64
	Start          uint64
	End            uint64
}

func (tfv *TermFieldVector) Size() int {
	return reflectStaticSizeTermFieldVector + size.SizeOfPtr +
		len(tfv.Field) + len(tfv.ArrayPositions)*size.SizeOfUint64
}

// IndexInternalID is an opaque document identifier internal to the index impl
type IndexInternalID []byte

func (id IndexInternalID) Equals(other IndexInternalID) bool {
	return id.Compare(other) == 0
}

func (id IndexInternalID) Compare(other IndexInternalID) int {
	return bytes.Compare(id, other)
}

type TermFieldDoc struct {
	Term    string
	ID      IndexInternalID
	Freq    uint64
	Norm    float64
	Vectors []*TermFieldVector
}

func (tfd *TermFieldDoc) Size() int {
	sizeInBytes := reflectStaticSizeTermFieldDoc + size.SizeOfPtr +
		len(tfd.Term) + len(tfd.ID)

	for _, entry := range tfd.Vectors {
		sizeInBytes += entry.Size()
	}

	return sizeInBytes
}

// Reset allows an already allocated TermFieldDoc to be reused
func (tfd *TermFieldDoc) Reset() *TermFieldDoc {
	// remember the []byte used for the ID
	id := tfd.ID
	vectors := tfd.Vectors
	// idiom to copy over from empty TermFieldDoc (0 allocations)
	*tfd = TermFieldDoc{}
	// reuse the []byte already allocated (and reset len to 0)
	tfd.ID = id[:0]
	tfd.Vectors = vectors[:0]
	return tfd
}

// TermFieldReader is the interface exposing the enumeration of documents
// containing a given term in a given field. Documents are returned in byte
// lexicographic order over their identifiers.
type TermFieldReader interface {
	// Next returns the next document containing the term in this field, or nil
	// when it reaches the end of the enumeration. The preAlloced TermFieldDoc
	// is optional, and when non-nil, will be used instead of allocating memory.
	Next(preAlloced *TermFieldDoc) (*TermFieldDoc, error)

	// Advance resets the enumeration at specified document or its immediate
	// follower.
	Advance(ID IndexInternalID, preAlloced *TermFieldDoc) (*TermFieldDoc, error)

	// Count returns the number of documents containing the term in this field.
	Count() uint64
	Close() error

	Size() int
}

type DictEntry struct {
	Term  string
	Count uint64
}

type FieldDict interface {
	Next() (*DictEntry, error)
	Close() error
}

type FieldDictContains interface {
	Contains(key []byte) (bool, error)
}

// DocIDReader is the interface exposing enumeration of documents identifiers.
// Close the reader to release associated resources.
type DocIDReader interface {
	// Next returns the next document internal identifier in the natural
	// index order, nil when the end of the sequence is reached.
	Next() (IndexInternalID, error)

	// Advance resets the iteration to the first internal identifier greater than
	// or equal to ID. If ID is smaller than the start of the range, the iteration
	// will start there instead. If ID is greater than or equal to the end of
	// the range, Next() call will return io.EOF.
	Advance(ID IndexInternalID) (IndexInternalID, error)

	Size() int

	Close() error
}

type BatchCallback func(error)

type Batch struct {
	IndexOps          map[string]*document.Document
	InternalOps       map[string][]byte
	persistedCallback BatchCallback
}

func NewBatch() *Batch {
	return &Batch{
		IndexOps:    make(map[string]*document.Document),
		InternalOps: make(map[string][]byte),
	}
}

func (b *Batch) Update(doc *document.Document) {
	b.IndexOps[doc.ID] = doc
}

func (b *Batch) Delete(id string) {
	b.IndexOps[id] = nil
}

func (b *Batch) SetInternal(key, val []byte) {
	b.InternalOps[string(key)] = val
}

func (b *Batch) DeleteInternal(key []byte) {
	b.InternalOps[string(key)] = nil
}

func (b *Batch) SetPersistedCallback(f BatchCallback) {
	b.persistedCallback = f
}

func (b *Batch) PersistedCallback() BatchCallback {
	return b.persistedCallback
}

func (b *Batch) String() string {
	rv := fmt.Sprintf("Batch (%d ops, %d internal ops)\n", len(b.IndexOps), len(b.InternalOps))
	for k, v := range b.IndexOps {
		if v != nil {
			rv += fmt.Sprintf("\tINDEX - '%s'\n", k)
		} else {
			rv += fmt.Sprintf("\tDELETE - '%s'\n", k)
		}
	}
	for k, v := range b.InternalOps {
		if v != nil {
			rv += fmt.Sprintf("\tSET INTERNAL - '%s'\n", k)
		} else {
			rv += fmt.Sprintf("\tDELETE INTERNAL - '%s'\n", k)
		}
	}
	return rv
}

func (b *Batch) Reset() {
	b.IndexOps = make(map[string]*document.Document)
	b.InternalOps = make(map[string][]byte)
	b.persistedCallback = nil
}

func (b *Batch) Merge(o *Batch) {
	for k, v := range o.IndexOps {
		b.IndexOps[k] = v
	}
	for k, v := range o.InternalOps {
		b.InternalOps[k] = v
	}
}

func (b *Batch) TotalDocSize() int {
	var s int
	for k, v := range b.IndexOps {
		if v != nil {
			s += v.Size() + size.SizeOfString
		}
		s += len(k)
	}
	return s
}

// Optimizable represents an optional interface that may be implemented by
// optimizable resources (e.g., TermFieldReaders, Searchers). These
// optimizable resources are provided the same OptimizableContext
// instance, so that they can coordinate via dynamic interface
// casting.
type Optimizable interface {
	Optimize(kind string, octx OptimizableContext) (OptimizableContext, error)
}

// Represents a result of optimization -- see the Finish() method.
type Optimized interface{}

type OptimizableContext interface {
	// Once all the optimizable resources have been provided the same
	// OptimizableContext instance, the optimization preparations are
	// finished or completed via the Finish() method.
	//
	// Depending on the optimization being performed, the Finish()
	// method might return a non-nil Optimized instance. For example,
	// the Optimized instance might represent an optimized
	// TermFieldReader instance.
	Finish() (Optimized, error)
}

type DocValueReader interface {
	VisitDocValues(id IndexInternalID, visitor DocumentFieldTermVisitor) error
}
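A sketch of assembling a batch of mutations with the API above (this assumes `document.NewDocument` as the document constructor):

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/document"
	"github.com/blevesearch/bleve/index"
)

func main() {
	b := index.NewBatch()
	b.Update(document.NewDocument("doc-a")) // index (or reindex) doc-a
	b.Delete("doc-b")                       // remove doc-b
	b.SetInternal([]byte("k"), []byte("v")) // non-document payload

	fmt.Print(b.String()) // human-readable summary of the queued ops
}
```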
@ -0,0 +1,62 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package store

type op struct {
	K []byte
	V []byte
}

type EmulatedBatch struct {
	Ops    []*op
	Merger *EmulatedMerge
}

func NewEmulatedBatch(mo MergeOperator) *EmulatedBatch {
	return &EmulatedBatch{
		Ops:    make([]*op, 0, 1000),
		Merger: NewEmulatedMerge(mo),
	}
}

func (b *EmulatedBatch) Set(key, val []byte) {
	ck := make([]byte, len(key))
	copy(ck, key)
	cv := make([]byte, len(val))
	copy(cv, val)
	b.Ops = append(b.Ops, &op{ck, cv})
}

func (b *EmulatedBatch) Delete(key []byte) {
	ck := make([]byte, len(key))
	copy(ck, key)
	b.Ops = append(b.Ops, &op{ck, nil})
}

func (b *EmulatedBatch) Merge(key, val []byte) {
	ck := make([]byte, len(key))
	copy(ck, key)
	cv := make([]byte, len(val))
	copy(cv, val)
	b.Merger.Merge(ck, cv)
}

func (b *EmulatedBatch) Reset() {
	b.Ops = b.Ops[:0]
}

func (b *EmulatedBatch) Close() error {
	return nil
}
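Note the copy semantics: `Set`, `Delete` and `Merge` copy the caller's slices before storing them, so buffers can be reused immediately, mirroring the `KVBatch` contract defined further down. A sketch (passing a nil `MergeOperator` is safe only as long as `Merge` is never called):

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/index/store"
)

func main() {
	// nil MergeOperator: fine here because Merge() is never invoked
	b := store.NewEmulatedBatch(nil)

	key := []byte("k")
	b.Set(key, []byte("v1"))
	key[0] = 'x' // safe: Set copied the key before storing it
	b.Delete([]byte("gone"))

	fmt.Println(len(b.Ops)) // 2: one set, one delete
}
```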
@ -0,0 +1,174 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package store

import "encoding/json"

// KVStore is an abstraction for working with KV stores. Note that
// in order to be used with the bleve.registry, it must also implement
// a constructor function of the registry.KVStoreConstructor type.
type KVStore interface {

	// Writer returns a KVWriter which can be used to
	// make changes to the KVStore. If a writer cannot
	// be obtained a non-nil error is returned.
	Writer() (KVWriter, error)

	// Reader returns a KVReader which can be used to
	// read data from the KVStore. If a reader cannot
	// be obtained a non-nil error is returned.
	Reader() (KVReader, error)

	// Close closes the KVStore
	Close() error
}

// KVReader is an abstraction of an **ISOLATED** reader.
// In this context isolated is defined to mean that
// writes/deletes made after the KVReader is opened
// are not observed.
// Because there is usually a cost associated with
// keeping isolated readers active, users should
// close them as soon as they are no longer needed.
type KVReader interface {

	// Get returns the value associated with the key.
	// If the key does not exist, nil is returned.
	// The caller owns the bytes returned.
	Get(key []byte) ([]byte, error)

	// MultiGet retrieves multiple values in one call.
	MultiGet(keys [][]byte) ([][]byte, error)

	// PrefixIterator returns a KVIterator that will
	// visit all K/V pairs with the provided prefix
	PrefixIterator(prefix []byte) KVIterator

	// RangeIterator returns a KVIterator that will
	// visit all K/V pairs >= start AND < end
	RangeIterator(start, end []byte) KVIterator

	// Close closes the reader
	Close() error
}

// KVIterator is an abstraction around key iteration
type KVIterator interface {

	// Seek will advance the iterator to the specified key
	Seek(key []byte)

	// Next will advance the iterator to the next key
	Next()

	// Key returns the key pointed to by the iterator.
	// The bytes returned are **ONLY** valid until the next call to Seek/Next/Close.
	// Continued use after that requires that they be copied.
	Key() []byte

	// Value returns the value pointed to by the iterator.
	// The bytes returned are **ONLY** valid until the next call to Seek/Next/Close.
	// Continued use after that requires that they be copied.
	Value() []byte

	// Valid returns whether or not the iterator is in a valid state
	Valid() bool

	// Current returns Key(),Value(),Valid() in a single operation
	Current() ([]byte, []byte, bool)

	// Close closes the iterator
	Close() error
}

// KVWriter is an abstraction for mutating the KVStore.
// KVWriter does **NOT** enforce restrictions of a single writer;
// if the underlying KVStore allows concurrent writes, the
// KVWriter interface should also do so. It is up to the caller
// to do this in a way that is safe and makes sense.
type KVWriter interface {

	// NewBatch returns a KVBatch for performing batch operations on this kvstore
	NewBatch() KVBatch

	// NewBatchEx returns a KVBatch and an associated byte array
	// that's pre-sized based on the KVBatchOptions. The caller can
	// use the returned byte array for keys and values associated with
	// the batch. Once the batch is either executed or closed, the
	// associated byte array should no longer be accessed by the
	// caller.
	NewBatchEx(KVBatchOptions) ([]byte, KVBatch, error)

	// ExecuteBatch will execute the KVBatch; the provided KVBatch **MUST** have
	// been created by the same KVStore (though not necessarily the same KVWriter).
	// Batch execution is atomic: either all the operations or none will be performed.
	ExecuteBatch(batch KVBatch) error

	// Close closes the writer
	Close() error
}

// KVBatchOptions provides the KVWriter.NewBatchEx() method with batch
// preparation and preallocation information.
type KVBatchOptions struct {
	// TotalBytes is the sum of key and value bytes needed by the
	// caller for the entire batch. It affects the size of the
	// returned byte array of KVWriter.NewBatchEx().
	TotalBytes int

	// NumSets is the number of Set() calls the caller will invoke on
	// the KVBatch.
	NumSets int

	// NumDeletes is the number of Delete() calls the caller will invoke
	// on the KVBatch.
	NumDeletes int

	// NumMerges is the number of Merge() calls the caller will invoke
	// on the KVBatch.
	NumMerges int
}

// KVBatch is an abstraction for making multiple KV mutations at once
type KVBatch interface {

	// Set updates the key with the specified value;
	// both key and value []byte may be reused as soon as this call returns
	Set(key, val []byte)

	// Delete removes the specified key;
	// the key []byte may be reused as soon as this call returns
	Delete(key []byte)

	// Merge merges old value with the new value at the specified key
	// as prescribed by the KVStore's merge operator;
	// both key and value []byte may be reused as soon as this call returns
	Merge(key, val []byte)

	// Reset frees resources for this batch and allows reuse
	Reset()

	// Close frees resources
	Close() error
}

// KVStoreStats is an optional interface that KVStores can implement
// if they're able to report any useful stats
type KVStoreStats interface {
	// Stats returns a JSON serializable object representing stats for this KVStore
	Stats() json.Marshaler

	StatsMap() map[string]interface{}
}
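The canonical read path over these interfaces is reader, iterator, `Current`/`Next` loop, close. A sketch against any concrete `store.KVStore` implementation:

```go
package kvutil

import (
	"fmt"

	"github.com/blevesearch/bleve/index/store"
)

// dumpPrefix walks every K/V pair under prefix; kv is any concrete
// store.KVStore implementation.
func dumpPrefix(kv store.KVStore, prefix []byte) error {
	reader, err := kv.Reader()
	if err != nil {
		return err
	}
	defer reader.Close() // isolated readers should be closed promptly

	it := reader.PrefixIterator(prefix)
	defer it.Close()

	for k, v, valid := it.Current(); valid; k, v, valid = it.Current() {
		fmt.Printf("%s = %s\n", k, v) // bytes only valid until the next Next/Close
		it.Next()
	}
	return nil
}
```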
@ -0,0 +1,64 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package store

// At the moment this happens to be the same interface as described by
// RocksDB, but this may not always be the case.

type MergeOperator interface {

	// FullMerge the full sequence of operands on top of the existingValue;
	// if no value currently exists, existingValue is nil.
	// Return the merged value, and success/failure.
	FullMerge(key, existingValue []byte, operands [][]byte) ([]byte, bool)

	// Partially merge these two operands.
	// If partial merge cannot be done, return nil,false, which will defer
	// all processing until the FullMerge is done.
	PartialMerge(key, leftOperand, rightOperand []byte) ([]byte, bool)

	// Name returns an identifier for the operator
	Name() string
}

type EmulatedMerge struct {
	Merges map[string][][]byte
	mo     MergeOperator
}

func NewEmulatedMerge(mo MergeOperator) *EmulatedMerge {
	return &EmulatedMerge{
		Merges: make(map[string][][]byte),
		mo:     mo,
	}
}

func (m *EmulatedMerge) Merge(key, val []byte) {
	ops, ok := m.Merges[string(key)]
	if ok && len(ops) > 0 {
		last := ops[len(ops)-1]
		mergedVal, partialMergeOk := m.mo.PartialMerge(key, last, val)
		if partialMergeOk {
			// replace last entry with the result of the merge
			ops[len(ops)-1] = mergedVal
		} else {
			// could not partial merge, append this to the end
			ops = append(ops, val)
		}
	} else {
		ops = [][]byte{val}
	}
	m.Merges[string(key)] = ops
}
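A sketch of a tiny operator satisfying this interface (a counter that keeps operands as decimal strings; the type is illustrative, not part of the library), plus how `EmulatedMerge` collapses successive operands via `PartialMerge`:

```go
package kvutil

import (
	"strconv"

	"github.com/blevesearch/bleve/index/store"
)

// counterMergeOp treats every operand as a decimal delta.
type counterMergeOp struct{}

func (counterMergeOp) FullMerge(key, existing []byte, operands [][]byte) ([]byte, bool) {
	total := int64(0)
	if len(existing) > 0 {
		n, err := strconv.ParseInt(string(existing), 10, 64)
		if err != nil {
			return nil, false
		}
		total = n
	}
	for _, op := range operands {
		n, err := strconv.ParseInt(string(op), 10, 64)
		if err != nil {
			return nil, false
		}
		total += n
	}
	return []byte(strconv.FormatInt(total, 10)), true
}

func (counterMergeOp) PartialMerge(key, left, right []byte) ([]byte, bool) {
	l, err1 := strconv.ParseInt(string(left), 10, 64)
	r, err2 := strconv.ParseInt(string(right), 10, 64)
	if err1 != nil || err2 != nil {
		return nil, false
	}
	return []byte(strconv.FormatInt(l+r, 10)), true
}

func (counterMergeOp) Name() string { return "counter" }

func demo() int {
	m := store.NewEmulatedMerge(counterMergeOp{})
	m.Merge([]byte("hits"), []byte("1"))
	m.Merge([]byte("hits"), []byte("2")) // partial-merged with the previous op
	return len(m.Merges["hits"])         // 1: the two deltas collapsed into "3"
}
```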
@ -0,0 +1,33 @@
// Copyright (c) 2016 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package store

// MultiGet is a helper function to retrieve multiple keys from a
// KVReader, and might be used by KVStore implementations that don't
// have a native multi-get facility.
func MultiGet(kvreader KVReader, keys [][]byte) ([][]byte, error) {
	vals := make([][]byte, len(keys))

	for i, key := range keys {
		val, err := kvreader.Get(key)
		if err != nil {
			return nil, err
		}

		vals[i] = val
	}

	return vals, nil
}
@ -0,0 +1,43 @@
package numeric

var interleaveMagic = []uint64{
	0x5555555555555555,
	0x3333333333333333,
	0x0F0F0F0F0F0F0F0F,
	0x00FF00FF00FF00FF,
	0x0000FFFF0000FFFF,
	0x00000000FFFFFFFF,
	0xAAAAAAAAAAAAAAAA,
}

var interleaveShift = []uint{1, 2, 4, 8, 16}

// Interleave the first 32 bits of each uint64
// adapted from org.apache.lucene.util.BitUtil
// which was adapted from:
// http://graphics.stanford.edu/~seander/bithacks.html#InterleaveBMN
func Interleave(v1, v2 uint64) uint64 {
	v1 = (v1 | (v1 << interleaveShift[4])) & interleaveMagic[4]
	v1 = (v1 | (v1 << interleaveShift[3])) & interleaveMagic[3]
	v1 = (v1 | (v1 << interleaveShift[2])) & interleaveMagic[2]
	v1 = (v1 | (v1 << interleaveShift[1])) & interleaveMagic[1]
	v1 = (v1 | (v1 << interleaveShift[0])) & interleaveMagic[0]
	v2 = (v2 | (v2 << interleaveShift[4])) & interleaveMagic[4]
	v2 = (v2 | (v2 << interleaveShift[3])) & interleaveMagic[3]
	v2 = (v2 | (v2 << interleaveShift[2])) & interleaveMagic[2]
	v2 = (v2 | (v2 << interleaveShift[1])) & interleaveMagic[1]
	v2 = (v2 | (v2 << interleaveShift[0])) & interleaveMagic[0]
	return (v2 << 1) | v1
}

// Deinterleave the 32-bit value starting at position 0;
// to get the other 32-bit value, shift it by 1 first
func Deinterleave(b uint64) uint64 {
	b &= interleaveMagic[0]
	b = (b ^ (b >> interleaveShift[0])) & interleaveMagic[1]
	b = (b ^ (b >> interleaveShift[1])) & interleaveMagic[2]
	b = (b ^ (b >> interleaveShift[2])) & interleaveMagic[3]
	b = (b ^ (b >> interleaveShift[3])) & interleaveMagic[4]
	b = (b ^ (b >> interleaveShift[4])) & interleaveMagic[5]
	return b
}
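A quick round-trip sketch of the Morton-interleaving helpers; as the comment above says, the second value is recovered by shifting right once:

```go
package numericutil

import "github.com/blevesearch/bleve/numeric"

// roundTrip returns (x, y) unchanged for 32-bit inputs.
func roundTrip(x, y uint64) (uint64, uint64) {
	z := numeric.Interleave(x, y)      // even bits from x, odd bits from y
	x2 := numeric.Deinterleave(z)      // recover the first 32-bit value
	y2 := numeric.Deinterleave(z >> 1) // shift by 1 to recover the second
	return x2, y2
}
```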
@ -0,0 +1,34 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package numeric

import (
	"math"
)

func Float64ToInt64(f float64) int64 {
	fasint := int64(math.Float64bits(f))
	if fasint < 0 {
		fasint = fasint ^ 0x7fffffffffffffff
	}
	return fasint
}

func Int64ToFloat64(i int64) float64 {
	if i < 0 {
		i ^= 0x7fffffffffffffff
	}
	return math.Float64frombits(uint64(i))
}
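The point of the bit flip is to give the int64 image the same total order as the floats, negatives included. A small illustration:

```go
package numericutil

import "github.com/blevesearch/bleve/numeric"

// sortedLikeFloats reports whether the int64 images preserve float order.
func sortedLikeFloats() bool {
	a := numeric.Float64ToInt64(-2.5)
	b := numeric.Float64ToInt64(-1.0)
	c := numeric.Float64ToInt64(0.0)
	d := numeric.Float64ToInt64(3.25)
	return a < b && b < c && c < d // true: int64 order matches float order
}
```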
@ -0,0 +1,111 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package numeric

import "fmt"

const ShiftStartInt64 byte = 0x20

// PrefixCoded is a byte array encoding of
// 64-bit numeric values shifted by 0-63 bits
type PrefixCoded []byte

func NewPrefixCodedInt64(in int64, shift uint) (PrefixCoded, error) {
	rv, _, err := NewPrefixCodedInt64Prealloc(in, shift, nil)
	return rv, err
}

func NewPrefixCodedInt64Prealloc(in int64, shift uint, prealloc []byte) (
	rv PrefixCoded, preallocRest []byte, err error) {
	if shift > 63 {
		return nil, prealloc, fmt.Errorf("cannot shift %d, must be between 0 and 63", shift)
	}

	nChars := ((63 - shift) / 7) + 1

	size := int(nChars + 1)
	if len(prealloc) >= size {
		rv = PrefixCoded(prealloc[0:size])
		preallocRest = prealloc[size:]
	} else {
		rv = make(PrefixCoded, size)
	}

	rv[0] = ShiftStartInt64 + byte(shift)

	sortableBits := int64(uint64(in) ^ 0x8000000000000000)
	sortableBits = int64(uint64(sortableBits) >> shift)
	for nChars > 0 {
		// Store 7 bits per byte for compatibility
		// with UTF-8 encoding of terms
		rv[nChars] = byte(sortableBits & 0x7f)
		nChars--
		sortableBits = int64(uint64(sortableBits) >> 7)
	}

	return rv, preallocRest, nil
}

func MustNewPrefixCodedInt64(in int64, shift uint) PrefixCoded {
	rv, err := NewPrefixCodedInt64(in, shift)
	if err != nil {
		panic(err)
	}
	return rv
}

// Shift returns the number of bits shifted;
// returns an error if in an uninitialized or invalid state
func (p PrefixCoded) Shift() (uint, error) {
	if len(p) > 0 {
		shift := p[0] - ShiftStartInt64
		if shift <= 63 {
			return uint(shift), nil
		}
	}
	return 0, fmt.Errorf("invalid prefix coded value")
}

func (p PrefixCoded) Int64() (int64, error) {
	shift, err := p.Shift()
	if err != nil {
		return 0, err
	}
	var sortableBits int64
	for _, inbyte := range p[1:] {
		sortableBits <<= 7
		sortableBits |= int64(inbyte)
	}
	return int64(uint64(sortableBits<<shift) ^ 0x8000000000000000), nil
}

func ValidPrefixCodedTerm(p string) (bool, int) {
	return ValidPrefixCodedTermBytes([]byte(p))
}

func ValidPrefixCodedTermBytes(p []byte) (bool, int) {
	if len(p) > 0 {
		if p[0] < ShiftStartInt64 || p[0] > ShiftStartInt64+63 {
			return false, 0
		}
		shift := p[0] - ShiftStartInt64
		nChars := ((63 - int(shift)) / 7) + 1
		if len(p) != nChars+1 {
			return false, 0
		}
		return true, int(shift)
	}
	return false, 0
}
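A round-trip sketch: shift 0 keeps all 64 bits, so `Int64()` recovers the original value exactly, while larger shifts deliberately drop low-order bits:

```go
package numericutil

import "github.com/blevesearch/bleve/numeric"

func prefixRoundTrip() (int64, error) {
	pc, err := numeric.NewPrefixCodedInt64(-42, 0) // shift 0: lossless
	if err != nil {
		return 0, err
	}
	return pc.Int64() // recovers -42
}
```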
@ -0,0 +1,89 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package registry

import (
	"fmt"

	"github.com/blevesearch/bleve/analysis"
)

func RegisterAnalyzer(name string, constructor AnalyzerConstructor) {
	_, exists := analyzers[name]
	if exists {
		panic(fmt.Errorf("attempted to register duplicate analyzer named '%s'", name))
	}
	analyzers[name] = constructor
}

type AnalyzerConstructor func(config map[string]interface{}, cache *Cache) (*analysis.Analyzer, error)
type AnalyzerRegistry map[string]AnalyzerConstructor

type AnalyzerCache struct {
	*ConcurrentCache
}

func NewAnalyzerCache() *AnalyzerCache {
	return &AnalyzerCache{
		NewConcurrentCache(),
	}
}

func AnalyzerBuild(name string, config map[string]interface{}, cache *Cache) (interface{}, error) {
	cons, registered := analyzers[name]
	if !registered {
		return nil, fmt.Errorf("no analyzer with name or type '%s' registered", name)
	}
	analyzer, err := cons(config, cache)
	if err != nil {
		return nil, fmt.Errorf("error building analyzer: %v", err)
	}
	return analyzer, nil
}

func (c *AnalyzerCache) AnalyzerNamed(name string, cache *Cache) (*analysis.Analyzer, error) {
	item, err := c.ItemNamed(name, cache, AnalyzerBuild)
	if err != nil {
		return nil, err
	}
	return item.(*analysis.Analyzer), nil
}

func (c *AnalyzerCache) DefineAnalyzer(name string, typ string, config map[string]interface{}, cache *Cache) (*analysis.Analyzer, error) {
	item, err := c.DefineItem(name, typ, config, cache, AnalyzerBuild)
	if err != nil {
		if err == ErrAlreadyDefined {
			return nil, fmt.Errorf("analyzer named '%s' already defined", name)
		}
		return nil, err
	}
	return item.(*analysis.Analyzer), nil
}

func AnalyzerTypesAndInstances() ([]string, []string) {
	emptyConfig := map[string]interface{}{}
	emptyCache := NewCache()
	var types []string
	var instances []string
	for name, cons := range analyzers {
		_, err := cons(emptyConfig, emptyCache)
		if err == nil {
			instances = append(instances, name)
		} else {
			types = append(types, name)
		}
	}
	return types, instances
}
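The registration pattern is the usual init-hook style; a sketch of wiring in a custom constructor (the name and the empty `analysis.Analyzer` literal are illustrative only):

```go
package myanalyzer

import (
	"github.com/blevesearch/bleve/analysis"
	"github.com/blevesearch/bleve/registry"
)

// constructor is illustrative; a real one would assemble a tokenizer
// and token filters from the config map.
func constructor(config map[string]interface{}, cache *registry.Cache) (*analysis.Analyzer, error) {
	return &analysis.Analyzer{}, nil
}

func init() {
	// panics if the name is already taken, so register exactly once
	registry.RegisterAnalyzer("my_analyzer", constructor)
}
```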
@ -0,0 +1,87 @@
// Copyright (c) 2016 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package registry

import (
	"fmt"
	"sync"
)

var ErrAlreadyDefined = fmt.Errorf("item already defined")

type CacheBuild func(name string, config map[string]interface{}, cache *Cache) (interface{}, error)

type ConcurrentCache struct {
	mutex sync.RWMutex
	data  map[string]interface{}
}

func NewConcurrentCache() *ConcurrentCache {
	return &ConcurrentCache{
		data: make(map[string]interface{}),
	}
}

func (c *ConcurrentCache) ItemNamed(name string, cache *Cache, build CacheBuild) (interface{}, error) {
	c.mutex.RLock()
	item, cached := c.data[name]
	if cached {
		c.mutex.RUnlock()
		return item, nil
	}
	// give up read lock
	c.mutex.RUnlock()
	// try to build it
	newItem, err := build(name, nil, cache)
	if err != nil {
		return nil, err
	}
	// acquire write lock
	c.mutex.Lock()
	defer c.mutex.Unlock()
	// check again because it could have been created while trading locks
	item, cached = c.data[name]
	if cached {
		return item, nil
	}
	c.data[name] = newItem
	return newItem, nil
}

func (c *ConcurrentCache) DefineItem(name string, typ string, config map[string]interface{}, cache *Cache, build CacheBuild) (interface{}, error) {
	c.mutex.RLock()
	_, cached := c.data[name]
	if cached {
		c.mutex.RUnlock()
		return nil, ErrAlreadyDefined
	}
	// give up read lock so other lookups can proceed
	c.mutex.RUnlock()
	// really not there, try to build it
	newItem, err := build(typ, config, cache)
	if err != nil {
		return nil, err
	}
	// now we've built it, acquire lock
	c.mutex.Lock()
	defer c.mutex.Unlock()
	// check again because it could have been created while trading locks
	_, cached = c.data[name]
	if cached {
		return nil, ErrAlreadyDefined
	}
	c.data[name] = newItem
	return newItem, nil
}
@ -0,0 +1,89 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package registry

import (
	"fmt"

	"github.com/blevesearch/bleve/analysis"
)

func RegisterCharFilter(name string, constructor CharFilterConstructor) {
	_, exists := charFilters[name]
	if exists {
		panic(fmt.Errorf("attempted to register duplicate char filter named '%s'", name))
	}
	charFilters[name] = constructor
}

type CharFilterConstructor func(config map[string]interface{}, cache *Cache) (analysis.CharFilter, error)
type CharFilterRegistry map[string]CharFilterConstructor

type CharFilterCache struct {
	*ConcurrentCache
}

func NewCharFilterCache() *CharFilterCache {
	return &CharFilterCache{
		NewConcurrentCache(),
	}
}

func CharFilterBuild(name string, config map[string]interface{}, cache *Cache) (interface{}, error) {
	cons, registered := charFilters[name]
	if !registered {
		return nil, fmt.Errorf("no char filter with name or type '%s' registered", name)
	}
	charFilter, err := cons(config, cache)
	if err != nil {
		return nil, fmt.Errorf("error building char filter: %v", err)
	}
	return charFilter, nil
}

func (c *CharFilterCache) CharFilterNamed(name string, cache *Cache) (analysis.CharFilter, error) {
	item, err := c.ItemNamed(name, cache, CharFilterBuild)
	if err != nil {
		return nil, err
	}
	return item.(analysis.CharFilter), nil
}

func (c *CharFilterCache) DefineCharFilter(name string, typ string, config map[string]interface{}, cache *Cache) (analysis.CharFilter, error) {
	item, err := c.DefineItem(name, typ, config, cache, CharFilterBuild)
	if err != nil {
		if err == ErrAlreadyDefined {
			return nil, fmt.Errorf("char filter named '%s' already defined", name)
		}
		return nil, err
	}
	return item.(analysis.CharFilter), nil
}

func CharFilterTypesAndInstances() ([]string, []string) {
	emptyConfig := map[string]interface{}{}
	emptyCache := NewCache()
	var types []string
	var instances []string
	for name, cons := range charFilters {
		_, err := cons(emptyConfig, emptyCache)
		if err == nil {
			instances = append(instances, name)
		} else {
			types = append(types, name)
		}
	}
	return types, instances
}
@ -0,0 +1,89 @@
|
|||
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package registry

import (
    "fmt"

    "github.com/blevesearch/bleve/analysis"
)

func RegisterDateTimeParser(name string, constructor DateTimeParserConstructor) {
    _, exists := dateTimeParsers[name]
    if exists {
        panic(fmt.Errorf("attempted to register duplicate date time parser named '%s'", name))
    }
    dateTimeParsers[name] = constructor
}

type DateTimeParserConstructor func(config map[string]interface{}, cache *Cache) (analysis.DateTimeParser, error)
type DateTimeParserRegistry map[string]DateTimeParserConstructor

type DateTimeParserCache struct {
    *ConcurrentCache
}

func NewDateTimeParserCache() *DateTimeParserCache {
    return &DateTimeParserCache{
        NewConcurrentCache(),
    }
}

func DateTimeParserBuild(name string, config map[string]interface{}, cache *Cache) (interface{}, error) {
    cons, registered := dateTimeParsers[name]
    if !registered {
        return nil, fmt.Errorf("no date time parser with name or type '%s' registered", name)
    }
    dateTimeParser, err := cons(config, cache)
    if err != nil {
        return nil, fmt.Errorf("error building date time parser: %v", err)
    }
    return dateTimeParser, nil
}

func (c *DateTimeParserCache) DateTimeParserNamed(name string, cache *Cache) (analysis.DateTimeParser, error) {
    item, err := c.ItemNamed(name, cache, DateTimeParserBuild)
    if err != nil {
        return nil, err
    }
    return item.(analysis.DateTimeParser), nil
}

func (c *DateTimeParserCache) DefineDateTimeParser(name string, typ string, config map[string]interface{}, cache *Cache) (analysis.DateTimeParser, error) {
    item, err := c.DefineItem(name, typ, config, cache, DateTimeParserBuild)
    if err != nil {
        if err == ErrAlreadyDefined {
            return nil, fmt.Errorf("date time parser named '%s' already defined", name)
        }
        return nil, err
    }
    return item.(analysis.DateTimeParser), nil
}

func DateTimeParserTypesAndInstances() ([]string, []string) {
    emptyConfig := map[string]interface{}{}
    emptyCache := NewCache()
    var types []string
    var instances []string
    for name, cons := range dateTimeParsers {
        _, err := cons(emptyConfig, emptyCache)
        if err == nil {
            instances = append(instances, name)
        } else {
            types = append(types, name)
        }
    }
    return types, instances
}

vendor/github.com/blevesearch/bleve/registry/fragment_formatter.go (generated, vendored, new file, 89 lines)
@@ -0,0 +1,89 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package registry

import (
    "fmt"

    "github.com/blevesearch/bleve/search/highlight"
)

func RegisterFragmentFormatter(name string, constructor FragmentFormatterConstructor) {
    _, exists := fragmentFormatters[name]
    if exists {
        panic(fmt.Errorf("attempted to register duplicate fragment formatter named '%s'", name))
    }
    fragmentFormatters[name] = constructor
}

type FragmentFormatterConstructor func(config map[string]interface{}, cache *Cache) (highlight.FragmentFormatter, error)
type FragmentFormatterRegistry map[string]FragmentFormatterConstructor

type FragmentFormatterCache struct {
    *ConcurrentCache
}

func NewFragmentFormatterCache() *FragmentFormatterCache {
    return &FragmentFormatterCache{
        NewConcurrentCache(),
    }
}

func FragmentFormatterBuild(name string, config map[string]interface{}, cache *Cache) (interface{}, error) {
    cons, registered := fragmentFormatters[name]
    if !registered {
        return nil, fmt.Errorf("no fragment formatter with name or type '%s' registered", name)
    }
    fragmentFormatter, err := cons(config, cache)
    if err != nil {
        return nil, fmt.Errorf("error building fragment formatter: %v", err)
    }
    return fragmentFormatter, nil
}

func (c *FragmentFormatterCache) FragmentFormatterNamed(name string, cache *Cache) (highlight.FragmentFormatter, error) {
    item, err := c.ItemNamed(name, cache, FragmentFormatterBuild)
    if err != nil {
        return nil, err
    }
    return item.(highlight.FragmentFormatter), nil
}

func (c *FragmentFormatterCache) DefineFragmentFormatter(name string, typ string, config map[string]interface{}, cache *Cache) (highlight.FragmentFormatter, error) {
    item, err := c.DefineItem(name, typ, config, cache, FragmentFormatterBuild)
    if err != nil {
        if err == ErrAlreadyDefined {
            return nil, fmt.Errorf("fragment formatter named '%s' already defined", name)
        }
        return nil, err
    }
    return item.(highlight.FragmentFormatter), nil
}

func FragmentFormatterTypesAndInstances() ([]string, []string) {
    emptyConfig := map[string]interface{}{}
    emptyCache := NewCache()
    var types []string
    var instances []string
    for name, cons := range fragmentFormatters {
        _, err := cons(emptyConfig, emptyCache)
        if err == nil {
            instances = append(instances, name)
        } else {
            types = append(types, name)
        }
    }
    return types, instances
}

vendor/github.com/blevesearch/bleve/registry/fragmenter.go (generated, vendored, new file, 89 lines)
@@ -0,0 +1,89 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package registry

import (
    "fmt"

    "github.com/blevesearch/bleve/search/highlight"
)

func RegisterFragmenter(name string, constructor FragmenterConstructor) {
    _, exists := fragmenters[name]
    if exists {
        panic(fmt.Errorf("attempted to register duplicate fragmenter named '%s'", name))
    }
    fragmenters[name] = constructor
}

type FragmenterConstructor func(config map[string]interface{}, cache *Cache) (highlight.Fragmenter, error)
type FragmenterRegistry map[string]FragmenterConstructor

type FragmenterCache struct {
    *ConcurrentCache
}

func NewFragmenterCache() *FragmenterCache {
    return &FragmenterCache{
        NewConcurrentCache(),
    }
}

func FragmenterBuild(name string, config map[string]interface{}, cache *Cache) (interface{}, error) {
    cons, registered := fragmenters[name]
    if !registered {
        return nil, fmt.Errorf("no fragmenter with name or type '%s' registered", name)
    }
    fragmenter, err := cons(config, cache)
    if err != nil {
        return nil, fmt.Errorf("error building fragmenter: %v", err)
    }
    return fragmenter, nil
}

func (c *FragmenterCache) FragmenterNamed(name string, cache *Cache) (highlight.Fragmenter, error) {
    item, err := c.ItemNamed(name, cache, FragmenterBuild)
    if err != nil {
        return nil, err
    }
    return item.(highlight.Fragmenter), nil
}

func (c *FragmenterCache) DefineFragmenter(name string, typ string, config map[string]interface{}, cache *Cache) (highlight.Fragmenter, error) {
    item, err := c.DefineItem(name, typ, config, cache, FragmenterBuild)
    if err != nil {
        if err == ErrAlreadyDefined {
            return nil, fmt.Errorf("fragmenter named '%s' already defined", name)
        }
        return nil, err
    }
    return item.(highlight.Fragmenter), nil
}

func FragmenterTypesAndInstances() ([]string, []string) {
    emptyConfig := map[string]interface{}{}
    emptyCache := NewCache()
    var types []string
    var instances []string
    for name, cons := range fragmenters {
        _, err := cons(emptyConfig, emptyCache)
        if err == nil {
            instances = append(instances, name)
        } else {
            types = append(types, name)
        }
    }
    return types, instances
}

vendor/github.com/blevesearch/bleve/registry/highlighter.go (generated, vendored, new file, 89 lines)
@@ -0,0 +1,89 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package registry

import (
    "fmt"

    "github.com/blevesearch/bleve/search/highlight"
)

func RegisterHighlighter(name string, constructor HighlighterConstructor) {
    _, exists := highlighters[name]
    if exists {
        panic(fmt.Errorf("attempted to register duplicate highlighter named '%s'", name))
    }
    highlighters[name] = constructor
}

type HighlighterConstructor func(config map[string]interface{}, cache *Cache) (highlight.Highlighter, error)
type HighlighterRegistry map[string]HighlighterConstructor

type HighlighterCache struct {
    *ConcurrentCache
}

func NewHighlighterCache() *HighlighterCache {
    return &HighlighterCache{
        NewConcurrentCache(),
    }
}

func HighlighterBuild(name string, config map[string]interface{}, cache *Cache) (interface{}, error) {
    cons, registered := highlighters[name]
    if !registered {
        return nil, fmt.Errorf("no highlighter with name or type '%s' registered", name)
    }
    highlighter, err := cons(config, cache)
    if err != nil {
        return nil, fmt.Errorf("error building highlighter: %v", err)
    }
    return highlighter, nil
}

func (c *HighlighterCache) HighlighterNamed(name string, cache *Cache) (highlight.Highlighter, error) {
    item, err := c.ItemNamed(name, cache, HighlighterBuild)
    if err != nil {
        return nil, err
    }
    return item.(highlight.Highlighter), nil
}

func (c *HighlighterCache) DefineHighlighter(name string, typ string, config map[string]interface{}, cache *Cache) (highlight.Highlighter, error) {
    item, err := c.DefineItem(name, typ, config, cache, HighlighterBuild)
    if err != nil {
        if err == ErrAlreadyDefined {
            return nil, fmt.Errorf("highlighter named '%s' already defined", name)
        }
        return nil, err
    }
    return item.(highlight.Highlighter), nil
}

func HighlighterTypesAndInstances() ([]string, []string) {
    emptyConfig := map[string]interface{}{}
    emptyCache := NewCache()
    var types []string
    var instances []string
    for name, cons := range highlighters {
        _, err := cons(emptyConfig, emptyCache)
        if err == nil {
            instances = append(instances, name)
        } else {
            types = append(types, name)
        }
    }
    return types, instances
}

vendor/github.com/blevesearch/bleve/registry/index_type.go (generated, vendored, new file, 45 lines)
@@ -0,0 +1,45 @@
// Copyright (c) 2015 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package registry

import (
    "fmt"

    "github.com/blevesearch/bleve/index"
)

func RegisterIndexType(name string, constructor IndexTypeConstructor) {
    _, exists := indexTypes[name]
    if exists {
        panic(fmt.Errorf("attempted to register duplicate index encoding named '%s'", name))
    }
    indexTypes[name] = constructor
}

type IndexTypeConstructor func(storeName string, storeConfig map[string]interface{}, analysisQueue *index.AnalysisQueue) (index.Index, error)
type IndexTypeRegistry map[string]IndexTypeConstructor

func IndexTypeConstructorByName(name string) IndexTypeConstructor {
    return indexTypes[name]
}

func IndexTypesAndInstances() ([]string, []string) {
    var types []string
    var instances []string
    for name := range stores {
        types = append(types, name)
    }
    return types, instances
}

vendor/github.com/blevesearch/bleve/registry/registry.go (generated, vendored, new file, 184 lines)
@@ -0,0 +1,184 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package registry

import (
    "fmt"

    "github.com/blevesearch/bleve/analysis"
    "github.com/blevesearch/bleve/search/highlight"
)

var stores = make(KVStoreRegistry, 0)
var indexTypes = make(IndexTypeRegistry, 0)

// highlight
var fragmentFormatters = make(FragmentFormatterRegistry, 0)
var fragmenters = make(FragmenterRegistry, 0)
var highlighters = make(HighlighterRegistry, 0)

// analysis
var charFilters = make(CharFilterRegistry, 0)
var tokenizers = make(TokenizerRegistry, 0)
var tokenMaps = make(TokenMapRegistry, 0)
var tokenFilters = make(TokenFilterRegistry, 0)
var analyzers = make(AnalyzerRegistry, 0)
var dateTimeParsers = make(DateTimeParserRegistry, 0)

type Cache struct {
    CharFilters        *CharFilterCache
    Tokenizers         *TokenizerCache
    TokenMaps          *TokenMapCache
    TokenFilters       *TokenFilterCache
    Analyzers          *AnalyzerCache
    DateTimeParsers    *DateTimeParserCache
    FragmentFormatters *FragmentFormatterCache
    Fragmenters        *FragmenterCache
    Highlighters       *HighlighterCache
}

func NewCache() *Cache {
    return &Cache{
        CharFilters:        NewCharFilterCache(),
        Tokenizers:         NewTokenizerCache(),
        TokenMaps:          NewTokenMapCache(),
        TokenFilters:       NewTokenFilterCache(),
        Analyzers:          NewAnalyzerCache(),
        DateTimeParsers:    NewDateTimeParserCache(),
        FragmentFormatters: NewFragmentFormatterCache(),
        Fragmenters:        NewFragmenterCache(),
        Highlighters:       NewHighlighterCache(),
    }
}

func typeFromConfig(config map[string]interface{}) (string, error) {
    prop, ok := config["type"]
    if !ok {
        return "", fmt.Errorf("'type' property is not defined")
    }
    typ, ok := prop.(string)
    if !ok {
        return "", fmt.Errorf("'type' property must be a string, not %T", prop)
    }
    return typ, nil
}

func (c *Cache) CharFilterNamed(name string) (analysis.CharFilter, error) {
    return c.CharFilters.CharFilterNamed(name, c)
}

func (c *Cache) DefineCharFilter(name string, config map[string]interface{}) (analysis.CharFilter, error) {
    typ, err := typeFromConfig(config)
    if err != nil {
        return nil, err
    }
    return c.CharFilters.DefineCharFilter(name, typ, config, c)
}

func (c *Cache) TokenizerNamed(name string) (analysis.Tokenizer, error) {
    return c.Tokenizers.TokenizerNamed(name, c)
}

func (c *Cache) DefineTokenizer(name string, config map[string]interface{}) (analysis.Tokenizer, error) {
    typ, err := typeFromConfig(config)
    if err != nil {
        return nil, fmt.Errorf("cannot resolve '%s' tokenizer type: %s", name, err)
    }
    return c.Tokenizers.DefineTokenizer(name, typ, config, c)
}

func (c *Cache) TokenMapNamed(name string) (analysis.TokenMap, error) {
    return c.TokenMaps.TokenMapNamed(name, c)
}

func (c *Cache) DefineTokenMap(name string, config map[string]interface{}) (analysis.TokenMap, error) {
    typ, err := typeFromConfig(config)
    if err != nil {
        return nil, err
    }
    return c.TokenMaps.DefineTokenMap(name, typ, config, c)
}

func (c *Cache) TokenFilterNamed(name string) (analysis.TokenFilter, error) {
    return c.TokenFilters.TokenFilterNamed(name, c)
}

func (c *Cache) DefineTokenFilter(name string, config map[string]interface{}) (analysis.TokenFilter, error) {
    typ, err := typeFromConfig(config)
    if err != nil {
        return nil, err
    }
    return c.TokenFilters.DefineTokenFilter(name, typ, config, c)
}

func (c *Cache) AnalyzerNamed(name string) (*analysis.Analyzer, error) {
    return c.Analyzers.AnalyzerNamed(name, c)
}

func (c *Cache) DefineAnalyzer(name string, config map[string]interface{}) (*analysis.Analyzer, error) {
    typ, err := typeFromConfig(config)
    if err != nil {
        return nil, err
    }
    return c.Analyzers.DefineAnalyzer(name, typ, config, c)
}

func (c *Cache) DateTimeParserNamed(name string) (analysis.DateTimeParser, error) {
    return c.DateTimeParsers.DateTimeParserNamed(name, c)
}

func (c *Cache) DefineDateTimeParser(name string, config map[string]interface{}) (analysis.DateTimeParser, error) {
    typ, err := typeFromConfig(config)
    if err != nil {
        return nil, err
    }
    return c.DateTimeParsers.DefineDateTimeParser(name, typ, config, c)
}

func (c *Cache) FragmentFormatterNamed(name string) (highlight.FragmentFormatter, error) {
    return c.FragmentFormatters.FragmentFormatterNamed(name, c)
}

func (c *Cache) DefineFragmentFormatter(name string, config map[string]interface{}) (highlight.FragmentFormatter, error) {
    typ, err := typeFromConfig(config)
    if err != nil {
        return nil, err
    }
    return c.FragmentFormatters.DefineFragmentFormatter(name, typ, config, c)
}

func (c *Cache) FragmenterNamed(name string) (highlight.Fragmenter, error) {
    return c.Fragmenters.FragmenterNamed(name, c)
}

func (c *Cache) DefineFragmenter(name string, config map[string]interface{}) (highlight.Fragmenter, error) {
    typ, err := typeFromConfig(config)
    if err != nil {
        return nil, err
    }
    return c.Fragmenters.DefineFragmenter(name, typ, config, c)
}

func (c *Cache) HighlighterNamed(name string) (highlight.Highlighter, error) {
    return c.Highlighters.HighlighterNamed(name, c)
}

func (c *Cache) DefineHighlighter(name string, config map[string]interface{}) (highlight.Highlighter, error) {
    typ, err := typeFromConfig(config)
    if err != nil {
        return nil, err
    }
    return c.Highlighters.DefineHighlighter(name, typ, config, c)
}
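
// Usage sketch (illustrative, not part of the vendored file): every Define*
// method above pulls the component type out of the config map's "type" key via
// typeFromConfig. Defining a named tokenizer instance therefore looks roughly
// like this, assuming bleve's "regexp" tokenizer type is linked in:
//
//  cache := NewCache()
//  tok, err := cache.DefineTokenizer("word_tok", map[string]interface{}{
//      "type":   "regexp", // selects the registered constructor
//      "regexp": `\w+`,    // remaining keys go to that constructor
//  })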

vendor/github.com/blevesearch/bleve/registry/store.go (generated, vendored, new file, 51 lines)
@@ -0,0 +1,51 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package registry

import (
    "fmt"

    "github.com/blevesearch/bleve/index/store"
)

func RegisterKVStore(name string, constructor KVStoreConstructor) {
    _, exists := stores[name]
    if exists {
        panic(fmt.Errorf("attempted to register duplicate store named '%s'", name))
    }
    stores[name] = constructor
}

// KVStoreConstructor is used to build a KVStore of a specific type when
// specified by the index configuration. In addition to meeting the
// store.KVStore interface, KVStores must also support this constructor.
// Note that currently the values of config must be able to be marshaled
// and unmarshaled using the encoding/json library (used when
// reading/writing the index metadata file).
type KVStoreConstructor func(mo store.MergeOperator, config map[string]interface{}) (store.KVStore, error)
type KVStoreRegistry map[string]KVStoreConstructor

func KVStoreConstructorByName(name string) KVStoreConstructor {
    return stores[name]
}

func KVStoreTypesAndInstances() ([]string, []string) {
    var types []string
    var instances []string
    for name := range stores {
        types = append(types, name)
    }
    return types, instances
}
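
// Usage sketch (illustrative, not part of the vendored file): a KVStore
// implementation registers its constructor once, typically from an init
// function, and the index layer later resolves it by the name recorded in the
// index metadata ("mystore" and myStoreConstructor are hypothetical):
//
//  func init() {
//      RegisterKVStore("mystore", myStoreConstructor)
//  }
//
//  if cons := KVStoreConstructorByName("mystore"); cons != nil {
//      kv, err := cons(mergeOperator, storeConfig)
//      // ...
//  }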

vendor/github.com/blevesearch/bleve/registry/token_filter.go (generated, vendored, new file, 89 lines)
@@ -0,0 +1,89 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package registry

import (
    "fmt"

    "github.com/blevesearch/bleve/analysis"
)

func RegisterTokenFilter(name string, constructor TokenFilterConstructor) {
    _, exists := tokenFilters[name]
    if exists {
        panic(fmt.Errorf("attempted to register duplicate token filter named '%s'", name))
    }
    tokenFilters[name] = constructor
}

type TokenFilterConstructor func(config map[string]interface{}, cache *Cache) (analysis.TokenFilter, error)
type TokenFilterRegistry map[string]TokenFilterConstructor

type TokenFilterCache struct {
    *ConcurrentCache
}

func NewTokenFilterCache() *TokenFilterCache {
    return &TokenFilterCache{
        NewConcurrentCache(),
    }
}

func TokenFilterBuild(name string, config map[string]interface{}, cache *Cache) (interface{}, error) {
    cons, registered := tokenFilters[name]
    if !registered {
        return nil, fmt.Errorf("no token filter with name or type '%s' registered", name)
    }
    tokenFilter, err := cons(config, cache)
    if err != nil {
        return nil, fmt.Errorf("error building token filter: %v", err)
    }
    return tokenFilter, nil
}

func (c *TokenFilterCache) TokenFilterNamed(name string, cache *Cache) (analysis.TokenFilter, error) {
    item, err := c.ItemNamed(name, cache, TokenFilterBuild)
    if err != nil {
        return nil, err
    }
    return item.(analysis.TokenFilter), nil
}

func (c *TokenFilterCache) DefineTokenFilter(name string, typ string, config map[string]interface{}, cache *Cache) (analysis.TokenFilter, error) {
    item, err := c.DefineItem(name, typ, config, cache, TokenFilterBuild)
    if err != nil {
        if err == ErrAlreadyDefined {
            return nil, fmt.Errorf("token filter named '%s' already defined", name)
        }
        return nil, err
    }
    return item.(analysis.TokenFilter), nil
}

func TokenFilterTypesAndInstances() ([]string, []string) {
    emptyConfig := map[string]interface{}{}
    emptyCache := NewCache()
    var types []string
    var instances []string
    for name, cons := range tokenFilters {
        _, err := cons(emptyConfig, emptyCache)
        if err == nil {
            instances = append(instances, name)
        } else {
            types = append(types, name)
        }
    }
    return types, instances
}

vendor/github.com/blevesearch/bleve/registry/token_maps.go (generated, vendored, new file, 89 lines)
@@ -0,0 +1,89 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package registry

import (
    "fmt"

    "github.com/blevesearch/bleve/analysis"
)

func RegisterTokenMap(name string, constructor TokenMapConstructor) {
    _, exists := tokenMaps[name]
    if exists {
        panic(fmt.Errorf("attempted to register duplicate token map named '%s'", name))
    }
    tokenMaps[name] = constructor
}

type TokenMapConstructor func(config map[string]interface{}, cache *Cache) (analysis.TokenMap, error)
type TokenMapRegistry map[string]TokenMapConstructor

type TokenMapCache struct {
    *ConcurrentCache
}

func NewTokenMapCache() *TokenMapCache {
    return &TokenMapCache{
        NewConcurrentCache(),
    }
}

func TokenMapBuild(name string, config map[string]interface{}, cache *Cache) (interface{}, error) {
    cons, registered := tokenMaps[name]
    if !registered {
        return nil, fmt.Errorf("no token map with name or type '%s' registered", name)
    }
    tokenMap, err := cons(config, cache)
    if err != nil {
        return nil, fmt.Errorf("error building token map: %v", err)
    }
    return tokenMap, nil
}

func (c *TokenMapCache) TokenMapNamed(name string, cache *Cache) (analysis.TokenMap, error) {
    item, err := c.ItemNamed(name, cache, TokenMapBuild)
    if err != nil {
        return nil, err
    }
    return item.(analysis.TokenMap), nil
}

func (c *TokenMapCache) DefineTokenMap(name string, typ string, config map[string]interface{}, cache *Cache) (analysis.TokenMap, error) {
    item, err := c.DefineItem(name, typ, config, cache, TokenMapBuild)
    if err != nil {
        if err == ErrAlreadyDefined {
            return nil, fmt.Errorf("token map named '%s' already defined", name)
        }
        return nil, err
    }
    return item.(analysis.TokenMap), nil
}

func TokenMapTypesAndInstances() ([]string, []string) {
    emptyConfig := map[string]interface{}{}
    emptyCache := NewCache()
    var types []string
    var instances []string
    for name, cons := range tokenMaps {
        _, err := cons(emptyConfig, emptyCache)
        if err == nil {
            instances = append(instances, name)
        } else {
            types = append(types, name)
        }
    }
    return types, instances
}

vendor/github.com/blevesearch/bleve/registry/tokenizer.go (generated, vendored, new file, 89 lines)
@@ -0,0 +1,89 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package registry

import (
    "fmt"

    "github.com/blevesearch/bleve/analysis"
)

func RegisterTokenizer(name string, constructor TokenizerConstructor) {
    _, exists := tokenizers[name]
    if exists {
        panic(fmt.Errorf("attempted to register duplicate tokenizer named '%s'", name))
    }
    tokenizers[name] = constructor
}

type TokenizerConstructor func(config map[string]interface{}, cache *Cache) (analysis.Tokenizer, error)
type TokenizerRegistry map[string]TokenizerConstructor

type TokenizerCache struct {
    *ConcurrentCache
}

func NewTokenizerCache() *TokenizerCache {
    return &TokenizerCache{
        NewConcurrentCache(),
    }
}

func TokenizerBuild(name string, config map[string]interface{}, cache *Cache) (interface{}, error) {
    cons, registered := tokenizers[name]
    if !registered {
        return nil, fmt.Errorf("no tokenizer with name or type '%s' registered", name)
    }
    tokenizer, err := cons(config, cache)
    if err != nil {
        return nil, fmt.Errorf("error building tokenizer: %v", err)
    }
    return tokenizer, nil
}

func (c *TokenizerCache) TokenizerNamed(name string, cache *Cache) (analysis.Tokenizer, error) {
    item, err := c.ItemNamed(name, cache, TokenizerBuild)
    if err != nil {
        return nil, err
    }
    return item.(analysis.Tokenizer), nil
}

func (c *TokenizerCache) DefineTokenizer(name string, typ string, config map[string]interface{}, cache *Cache) (analysis.Tokenizer, error) {
    item, err := c.DefineItem(name, typ, config, cache, TokenizerBuild)
    if err != nil {
        if err == ErrAlreadyDefined {
            return nil, fmt.Errorf("tokenizer named '%s' already defined", name)
        }
        return nil, err
    }
    return item.(analysis.Tokenizer), nil
}

func TokenizerTypesAndInstances() ([]string, []string) {
    emptyConfig := map[string]interface{}{}
    emptyCache := NewCache()
    var types []string
    var instances []string
    for name, cons := range tokenizers {
        _, err := cons(emptyConfig, emptyCache)
        if err == nil {
            instances = append(instances, name)
        } else {
            types = append(types, name)
        }
    }
    return types, instances
}

vendor/github.com/blevesearch/bleve/search/collector.go (generated, vendored, new file, 52 lines)
@@ -0,0 +1,52 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package search

import (
    "context"
    "time"

    "github.com/blevesearch/bleve/index"
)

type Collector interface {
    Collect(ctx context.Context, searcher Searcher, reader index.IndexReader) error
    Results() DocumentMatchCollection
    Total() uint64
    MaxScore() float64
    Took() time.Duration
    SetFacetsBuilder(facetsBuilder *FacetsBuilder)
    FacetResults() FacetResults
}

// DocumentMatchHandler is the type of document match callback
// bleve will invoke during the search.
// Eventually, bleve will indicate the completion of an ongoing search
// by passing a nil value for the document match callback.
// The application should take a copy of the hit/documentMatch
// if it wishes to own it or needs prolonged access to it.
type DocumentMatchHandler func(hit *DocumentMatch) error

type MakeDocumentMatchHandlerKeyType string

var MakeDocumentMatchHandlerKey = MakeDocumentMatchHandlerKeyType(
    "MakeDocumentMatchHandlerKey")

// MakeDocumentMatchHandler is an optional DocumentMatchHandler
// builder function which applications can pass to bleve.
// These builder methods give a DocumentMatchHandler function
// to bleve, which it will invoke for every document match.
type MakeDocumentMatchHandler func(ctx *SearchContext) (
    callback DocumentMatchHandler, loadID bool, err error)
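
// Usage sketch (illustrative, not part of the vendored file): per the comment
// on DocumentMatchHandler, a handler must copy any hit it wants to keep, and a
// nil hit signals that the search has completed:
//
//  var kept []DocumentMatch
//  handler := DocumentMatchHandler(func(hit *DocumentMatch) error {
//      if hit == nil {
//          return nil // search complete
//      }
//      kept = append(kept, *hit) // take a copy; bleve may recycle *hit
//      return nil
//  })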

vendor/github.com/blevesearch/bleve/search/explanation.go (generated, vendored, new file, 55 lines)
@@ -0,0 +1,55 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package search

import (
    "encoding/json"
    "fmt"
    "reflect"

    "github.com/blevesearch/bleve/size"
)

var reflectStaticSizeExplanation int

func init() {
    var e Explanation
    reflectStaticSizeExplanation = int(reflect.TypeOf(e).Size())
}

type Explanation struct {
    Value    float64        `json:"value"`
    Message  string         `json:"message"`
    Children []*Explanation `json:"children,omitempty"`
}

func (expl *Explanation) String() string {
    js, err := json.MarshalIndent(expl, "", "  ")
    if err != nil {
        return fmt.Sprintf("error serializing explanation to json: %v", err)
    }
    return string(js)
}

func (expl *Explanation) Size() int {
    sizeInBytes := reflectStaticSizeExplanation + size.SizeOfPtr +
        len(expl.Message)

    for _, entry := range expl.Children {
        sizeInBytes += entry.Size()
    }

    return sizeInBytes
}
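
// Illustrative note (not part of the vendored file): String renders the whole
// explanation tree as indented JSON, so a typical scoring explanation prints
// roughly as follows (values are made up):
//
//  {
//    "value": 0.52,
//    "message": "product of:",
//    "children": [...]
//  }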

vendor/github.com/blevesearch/bleve/search/facets_builder.go (generated, vendored, new file, 341 lines)
@@ -0,0 +1,341 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package search

import (
    "reflect"
    "sort"

    "github.com/blevesearch/bleve/index"
    "github.com/blevesearch/bleve/size"
)

var reflectStaticSizeFacetsBuilder int
var reflectStaticSizeFacetResult int
var reflectStaticSizeTermFacet int
var reflectStaticSizeNumericRangeFacet int
var reflectStaticSizeDateRangeFacet int

func init() {
    var fb FacetsBuilder
    reflectStaticSizeFacetsBuilder = int(reflect.TypeOf(fb).Size())
    var fr FacetResult
    reflectStaticSizeFacetResult = int(reflect.TypeOf(fr).Size())
    var tf TermFacet
    reflectStaticSizeTermFacet = int(reflect.TypeOf(tf).Size())
    var nrf NumericRangeFacet
    reflectStaticSizeNumericRangeFacet = int(reflect.TypeOf(nrf).Size())
    var drf DateRangeFacet
    reflectStaticSizeDateRangeFacet = int(reflect.TypeOf(drf).Size())
}

type FacetBuilder interface {
    StartDoc()
    UpdateVisitor(field string, term []byte)
    EndDoc()

    Result() *FacetResult
    Field() string

    Size() int
}

type FacetsBuilder struct {
    indexReader index.IndexReader
    facetNames  []string
    facets      []FacetBuilder
    fields      []string
}

func NewFacetsBuilder(indexReader index.IndexReader) *FacetsBuilder {
    return &FacetsBuilder{
        indexReader: indexReader,
    }
}

func (fb *FacetsBuilder) Size() int {
    sizeInBytes := reflectStaticSizeFacetsBuilder + size.SizeOfPtr

    for k, v := range fb.facets {
        sizeInBytes += size.SizeOfString + v.Size() + len(fb.facetNames[k])
    }

    for _, entry := range fb.fields {
        sizeInBytes += size.SizeOfString + len(entry)
    }

    return sizeInBytes
}

func (fb *FacetsBuilder) Add(name string, facetBuilder FacetBuilder) {
    fb.facetNames = append(fb.facetNames, name)
    fb.facets = append(fb.facets, facetBuilder)
    fb.fields = append(fb.fields, facetBuilder.Field())
}

func (fb *FacetsBuilder) RequiredFields() []string {
    return fb.fields
}

func (fb *FacetsBuilder) StartDoc() {
    for _, facetBuilder := range fb.facets {
        facetBuilder.StartDoc()
    }
}

func (fb *FacetsBuilder) EndDoc() {
    for _, facetBuilder := range fb.facets {
        facetBuilder.EndDoc()
    }
}

func (fb *FacetsBuilder) UpdateVisitor(field string, term []byte) {
    for _, facetBuilder := range fb.facets {
        facetBuilder.UpdateVisitor(field, term)
    }
}

type TermFacet struct {
    Term  string `json:"term"`
    Count int    `json:"count"`
}

type TermFacets []*TermFacet

func (tf TermFacets) Add(termFacet *TermFacet) TermFacets {
    for _, existingTerm := range tf {
        if termFacet.Term == existingTerm.Term {
            existingTerm.Count += termFacet.Count
            return tf
        }
    }
    // if we got here it wasn't already in the existing terms
    tf = append(tf, termFacet)
    return tf
}

func (tf TermFacets) Len() int      { return len(tf) }
func (tf TermFacets) Swap(i, j int) { tf[i], tf[j] = tf[j], tf[i] }
func (tf TermFacets) Less(i, j int) bool {
    if tf[i].Count == tf[j].Count {
        return tf[i].Term < tf[j].Term
    }
    return tf[i].Count > tf[j].Count
}

type NumericRangeFacet struct {
    Name  string   `json:"name"`
    Min   *float64 `json:"min,omitempty"`
    Max   *float64 `json:"max,omitempty"`
    Count int      `json:"count"`
}

func (nrf *NumericRangeFacet) Same(other *NumericRangeFacet) bool {
    if nrf.Min == nil && other.Min != nil {
        return false
    }
    if nrf.Min != nil && other.Min == nil {
        return false
    }
    if nrf.Min != nil && other.Min != nil && *nrf.Min != *other.Min {
        return false
    }
    if nrf.Max == nil && other.Max != nil {
        return false
    }
    if nrf.Max != nil && other.Max == nil {
        return false
    }
    if nrf.Max != nil && other.Max != nil && *nrf.Max != *other.Max {
        return false
    }

    return true
}

type NumericRangeFacets []*NumericRangeFacet

func (nrf NumericRangeFacets) Add(numericRangeFacet *NumericRangeFacet) NumericRangeFacets {
    for _, existingNr := range nrf {
        if numericRangeFacet.Same(existingNr) {
            existingNr.Count += numericRangeFacet.Count
            return nrf
        }
    }
    // if we got here it wasn't already in the existing terms
    nrf = append(nrf, numericRangeFacet)
    return nrf
}

func (nrf NumericRangeFacets) Len() int      { return len(nrf) }
func (nrf NumericRangeFacets) Swap(i, j int) { nrf[i], nrf[j] = nrf[j], nrf[i] }
func (nrf NumericRangeFacets) Less(i, j int) bool {
    if nrf[i].Count == nrf[j].Count {
        return nrf[i].Name < nrf[j].Name
    }
    return nrf[i].Count > nrf[j].Count
}

type DateRangeFacet struct {
    Name  string  `json:"name"`
    Start *string `json:"start,omitempty"`
    End   *string `json:"end,omitempty"`
    Count int     `json:"count"`
}

func (drf *DateRangeFacet) Same(other *DateRangeFacet) bool {
    if drf.Start == nil && other.Start != nil {
        return false
    }
    if drf.Start != nil && other.Start == nil {
        return false
    }
    if drf.Start != nil && other.Start != nil && *drf.Start != *other.Start {
        return false
    }
    if drf.End == nil && other.End != nil {
        return false
    }
    if drf.End != nil && other.End == nil {
        return false
    }
    if drf.End != nil && other.End != nil && *drf.End != *other.End {
        return false
    }

    return true
}

type DateRangeFacets []*DateRangeFacet

func (drf DateRangeFacets) Add(dateRangeFacet *DateRangeFacet) DateRangeFacets {
    for _, existingDr := range drf {
        if dateRangeFacet.Same(existingDr) {
            existingDr.Count += dateRangeFacet.Count
            return drf
        }
    }
    // if we got here it wasn't already in the existing terms
    drf = append(drf, dateRangeFacet)
    return drf
}

func (drf DateRangeFacets) Len() int      { return len(drf) }
func (drf DateRangeFacets) Swap(i, j int) { drf[i], drf[j] = drf[j], drf[i] }
func (drf DateRangeFacets) Less(i, j int) bool {
    if drf[i].Count == drf[j].Count {
        return drf[i].Name < drf[j].Name
    }
    return drf[i].Count > drf[j].Count
}

type FacetResult struct {
    Field         string             `json:"field"`
    Total         int                `json:"total"`
    Missing       int                `json:"missing"`
    Other         int                `json:"other"`
    Terms         TermFacets         `json:"terms,omitempty"`
    NumericRanges NumericRangeFacets `json:"numeric_ranges,omitempty"`
    DateRanges    DateRangeFacets    `json:"date_ranges,omitempty"`
}

func (fr *FacetResult) Size() int {
    return reflectStaticSizeFacetResult + size.SizeOfPtr +
        len(fr.Field) +
        len(fr.Terms)*(reflectStaticSizeTermFacet+size.SizeOfPtr) +
        len(fr.NumericRanges)*(reflectStaticSizeNumericRangeFacet+size.SizeOfPtr) +
        len(fr.DateRanges)*(reflectStaticSizeDateRangeFacet+size.SizeOfPtr)
}

func (fr *FacetResult) Merge(other *FacetResult) {
    fr.Total += other.Total
    fr.Missing += other.Missing
    fr.Other += other.Other
    if fr.Terms != nil && other.Terms != nil {
        for _, term := range other.Terms {
            fr.Terms = fr.Terms.Add(term)
        }
    }
    if fr.NumericRanges != nil && other.NumericRanges != nil {
        for _, nr := range other.NumericRanges {
            fr.NumericRanges = fr.NumericRanges.Add(nr)
        }
    }
    if fr.DateRanges != nil && other.DateRanges != nil {
        for _, dr := range other.DateRanges {
            fr.DateRanges = fr.DateRanges.Add(dr)
        }
    }
}

func (fr *FacetResult) Fixup(size int) {
    if fr.Terms != nil {
        sort.Sort(fr.Terms)
        if len(fr.Terms) > size {
            moveToOther := fr.Terms[size:]
            for _, mto := range moveToOther {
                fr.Other += mto.Count
            }
            fr.Terms = fr.Terms[0:size]
        }
    } else if fr.NumericRanges != nil {
        sort.Sort(fr.NumericRanges)
        if len(fr.NumericRanges) > size {
            moveToOther := fr.NumericRanges[size:]
            for _, mto := range moveToOther {
                fr.Other += mto.Count
            }
            fr.NumericRanges = fr.NumericRanges[0:size]
        }
    } else if fr.DateRanges != nil {
        sort.Sort(fr.DateRanges)
        if len(fr.DateRanges) > size {
            moveToOther := fr.DateRanges[size:]
            for _, mto := range moveToOther {
                fr.Other += mto.Count
            }
            fr.DateRanges = fr.DateRanges[0:size]
        }
    }
}

type FacetResults map[string]*FacetResult

func (fr FacetResults) Merge(other FacetResults) {
    for name, oFacetResult := range other {
        facetResult, ok := fr[name]
        if ok {
            facetResult.Merge(oFacetResult)
        } else {
            fr[name] = oFacetResult
        }
    }
}

func (fr FacetResults) Fixup(name string, size int) {
    facetResult, ok := fr[name]
    if ok {
        facetResult.Fixup(size)
    }
}

func (fb *FacetsBuilder) Results() FacetResults {
    fr := make(FacetResults)
    for i, facetBuilder := range fb.facets {
        facetResult := facetBuilder.Result()
        fr[fb.facetNames[i]] = facetResult
    }
    return fr
}
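
// Behavior sketch (illustrative, not part of the vendored file): TermFacets.Add
// merges by Term, so combining partial results from two readers accumulates
// counts instead of duplicating entries:
//
//  var tf TermFacets
//  tf = tf.Add(&TermFacet{Term: "go", Count: 2})
//  tf = tf.Add(&TermFacet{Term: "go", Count: 3})
//  // len(tf) == 1 and tf[0].Count == 5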

vendor/github.com/blevesearch/bleve/search/highlight/highlighter.go (generated, vendored, new file, 64 lines)
@@ -0,0 +1,64 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package highlight

import (
    "github.com/blevesearch/bleve/document"
    "github.com/blevesearch/bleve/search"
)

type Fragment struct {
    Orig           []byte
    ArrayPositions []uint64
    Start          int
    End            int
    Score          float64
    Index          int // used by heap
}

func (f *Fragment) Overlaps(other *Fragment) bool {
    if other.Start >= f.Start && other.Start < f.End {
        return true
    } else if f.Start >= other.Start && f.Start < other.End {
        return true
    }
    return false
}

type Fragmenter interface {
    Fragment([]byte, TermLocations) []*Fragment
}

type FragmentFormatter interface {
    Format(f *Fragment, orderedTermLocations TermLocations) string
}

type FragmentScorer interface {
    Score(f *Fragment) float64
}

type Highlighter interface {
    Fragmenter() Fragmenter
    SetFragmenter(Fragmenter)

    FragmentFormatter() FragmentFormatter
    SetFragmentFormatter(FragmentFormatter)

    Separator() string
    SetSeparator(string)

    BestFragmentInField(*search.DocumentMatch, *document.Document, string) string
    BestFragmentsInField(*search.DocumentMatch, *document.Document, string, int) []string
}

vendor/github.com/blevesearch/bleve/search/highlight/term_locations.go (generated, vendored, new file, 105 lines)
@@ -0,0 +1,105 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package highlight

import (
    "reflect"
    "sort"

    "github.com/blevesearch/bleve/search"
)

type TermLocation struct {
    Term           string
    ArrayPositions search.ArrayPositions
    Pos            int
    Start          int
    End            int
}

func (tl *TermLocation) Overlaps(other *TermLocation) bool {
    if reflect.DeepEqual(tl.ArrayPositions, other.ArrayPositions) {
        if other.Start >= tl.Start && other.Start < tl.End {
            return true
        } else if tl.Start >= other.Start && tl.Start < other.End {
            return true
        }
    }
    return false
}

type TermLocations []*TermLocation

func (t TermLocations) Len() int      { return len(t) }
func (t TermLocations) Swap(i, j int) { t[i], t[j] = t[j], t[i] }
func (t TermLocations) Less(i, j int) bool {
    shortestArrayPositions := len(t[i].ArrayPositions)
    if len(t[j].ArrayPositions) < shortestArrayPositions {
        shortestArrayPositions = len(t[j].ArrayPositions)
    }

    // compare all the common array positions
    for api := 0; api < shortestArrayPositions; api++ {
        if t[i].ArrayPositions[api] < t[j].ArrayPositions[api] {
            return true
        }
        if t[i].ArrayPositions[api] > t[j].ArrayPositions[api] {
            return false
        }
    }
    // all the common array positions are the same
    if len(t[i].ArrayPositions) < len(t[j].ArrayPositions) {
        return true // j array positions, longer so greater
    } else if len(t[i].ArrayPositions) > len(t[j].ArrayPositions) {
        return false // j array positions, shorter so less
    }

    // array positions the same, compare starts
    return t[i].Start < t[j].Start
}

func (t TermLocations) MergeOverlapping() {
    var lastTl *TermLocation
    for i, tl := range t {
        if lastTl == nil && tl != nil {
            lastTl = tl
        } else if lastTl != nil && tl != nil {
            if lastTl.Overlaps(tl) {
                // ok merge this with previous
                lastTl.End = tl.End
                t[i] = nil
            }
        }
    }
}

func OrderTermLocations(tlm search.TermLocationMap) TermLocations {
    rv := make(TermLocations, 0)
    for term, locations := range tlm {
        for _, location := range locations {
            tl := TermLocation{
                Term:           term,
                ArrayPositions: location.ArrayPositions,
                Pos:            int(location.Pos),
                Start:          int(location.Start),
                End:            int(location.End),
            }
            rv = append(rv, &tl)
        }
    }
    sort.Sort(rv)
    return rv
}
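
// Behavior sketch (illustrative, not part of the vendored file): after sorting,
// MergeOverlapping extends the surviving location and nils out the one it
// absorbed, so callers must be prepared to skip nil entries:
//
//  tls := TermLocations{
//      &TermLocation{Start: 0, End: 5},
//      &TermLocation{Start: 3, End: 9}, // overlaps the first
//  }
//  tls.MergeOverlapping()
//  // now tls[0].End == 9 and tls[1] == nil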

vendor/github.com/blevesearch/bleve/search/levenshtein.go (generated, vendored, new file, 114 lines)
@@ -0,0 +1,114 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package search

import (
    "math"
)

func LevenshteinDistance(a, b string) int {
    la := len(a)
    lb := len(b)
    d := make([]int, la+1)
    var lastdiag, olddiag, temp int

    for i := 1; i <= la; i++ {
        d[i] = i
    }
    for i := 1; i <= lb; i++ {
        d[0] = i
        lastdiag = i - 1
        for j := 1; j <= la; j++ {
            olddiag = d[j]
            min := d[j] + 1
            if (d[j-1] + 1) < min {
                min = d[j-1] + 1
            }
            if a[j-1] == b[i-1] {
                temp = 0
            } else {
                temp = 1
            }
            if (lastdiag + temp) < min {
                min = lastdiag + temp
            }
            d[j] = min
            lastdiag = olddiag
        }
    }
    return d[la]
}

// LevenshteinDistanceMax is the same as LevenshteinDistance, but attempts
// to bail early once we know the distance will be greater than max, in
// which case the first return value will be the max and the second will
// be true, indicating max was exceeded.
func LevenshteinDistanceMax(a, b string, max int) (int, bool) {
    v, wasMax, _ := LevenshteinDistanceMaxReuseSlice(a, b, max, nil)
    return v, wasMax
}

func LevenshteinDistanceMaxReuseSlice(a, b string, max int, d []int) (int, bool, []int) {
    la := len(a)
    lb := len(b)

    ld := int(math.Abs(float64(la - lb)))
    if ld > max {
        return max, true, d
    }

    if cap(d) < la+1 {
        d = make([]int, la+1)
    }
    d = d[:la+1]

    var lastdiag, olddiag, temp int

    for i := 1; i <= la; i++ {
        d[i] = i
    }
    for i := 1; i <= lb; i++ {
        d[0] = i
        lastdiag = i - 1
        rowmin := max + 1
        for j := 1; j <= la; j++ {
            olddiag = d[j]
            min := d[j] + 1
            if (d[j-1] + 1) < min {
                min = d[j-1] + 1
            }
            if a[j-1] == b[i-1] {
                temp = 0
            } else {
                temp = 1
            }
            if (lastdiag + temp) < min {
                min = lastdiag + temp
            }
            if min < rowmin {
                rowmin = min
            }
            d[j] = min

            lastdiag = olddiag
        }
        // after each row, if rowmin is already greater than max, stop early
        if rowmin > max {
            return max, true, d
        }
    }
    return d[la], false, d
}
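
// Usage sketch (illustrative, not part of the vendored file): the ReuseSlice
// variant lets a caller amortize the row allocation across many comparisons,
// as a fuzzy matcher iterating candidate terms might (names hypothetical):
//
//  var row []int
//  for _, cand := range candidates {
//      d, exceeded, r := LevenshteinDistanceMaxReuseSlice(term, cand, 2, row)
//      row = r
//      if !exceeded && d <= 2 {
//          matches = append(matches, cand)
//      }
//  }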

vendor/github.com/blevesearch/bleve/search/pool.go (generated, vendored, new file, 91 lines)
@@ -0,0 +1,91 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package search

import (
    "reflect"
)

var reflectStaticSizeDocumentMatchPool int

func init() {
    var dmp DocumentMatchPool
    reflectStaticSizeDocumentMatchPool = int(reflect.TypeOf(dmp).Size())
}

// DocumentMatchPoolTooSmall is a callback function that can be executed
// when the DocumentMatchPool does not have sufficient capacity.
// By default we just perform just-in-time allocation, but you could log
// a message, or panic, etc.
type DocumentMatchPoolTooSmall func(p *DocumentMatchPool) *DocumentMatch

// DocumentMatchPool manages use/re-use of DocumentMatch instances.
// It pre-allocates space from a single large block with the expected
// number of instances. It is not thread-safe, as currently all
// aspects of search take place in a single goroutine.
type DocumentMatchPool struct {
    avail    DocumentMatchCollection
    TooSmall DocumentMatchPoolTooSmall
}

func defaultDocumentMatchPoolTooSmall(p *DocumentMatchPool) *DocumentMatch {
    return &DocumentMatch{}
}

// NewDocumentMatchPool will build a DocumentMatchPool with memory
// pre-allocated to accommodate the requested number of DocumentMatch
// instances.
func NewDocumentMatchPool(size, sortsize int) *DocumentMatchPool {
    avail := make(DocumentMatchCollection, size)
    // pre-allocate the expected number of instances
    startBlock := make([]DocumentMatch, size)
    startSorts := make([]string, size*sortsize)
    // make these initial instances available
    i, j := 0, 0
    for i < size {
        avail[i] = &startBlock[i]
        avail[i].Sort = startSorts[j:j]
        i += 1
        j += sortsize
    }
    return &DocumentMatchPool{
        avail:    avail,
        TooSmall: defaultDocumentMatchPoolTooSmall,
    }
}

// Get returns an available DocumentMatch from the pool.
// If the pool was not allocated with sufficient size, an allocation will
// occur to satisfy this request. As a side-effect this will grow the size
// of the pool.
func (p *DocumentMatchPool) Get() *DocumentMatch {
    var rv *DocumentMatch
    if len(p.avail) > 0 {
        rv, p.avail = p.avail[len(p.avail)-1], p.avail[:len(p.avail)-1]
    } else {
        rv = p.TooSmall(p)
    }
    return rv
}

// Put returns a DocumentMatch to the pool
func (p *DocumentMatchPool) Put(d *DocumentMatch) {
    if d == nil {
        return
    }
    // reset DocumentMatch before returning it to available pool
    d.Reset()
    p.avail = append(p.avail, d)
}
|
|
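A sketch of the pool's intended recycle loop (editorial addition, same import-path assumption as above); `Get` pops a pre-allocated match and `Put` resets it for re-use, so a steady-state search allocates nothing:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/search"
)

func main() {
	// room for 10 matches, each with capacity for 2 sort keys
	pool := search.NewDocumentMatchPool(10, 2)

	dm := pool.Get() // pops one of the pre-allocated instances
	dm.ID = "doc-1"
	dm.Score = 0.42
	fmt.Println(dm.ID, dm.Score)

	// Put calls Reset() and makes the instance available again;
	// an exhausted pool falls back to TooSmall, which by default
	// just heap-allocates a fresh DocumentMatch
	pool.Put(dm)
}
```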
@@ -0,0 +1,378 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// 		http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package search

import (
	"fmt"
	"reflect"
	"sort"

	"github.com/blevesearch/bleve/index"
	"github.com/blevesearch/bleve/size"
)

var reflectStaticSizeDocumentMatch int
var reflectStaticSizeSearchContext int
var reflectStaticSizeLocation int

func init() {
	var dm DocumentMatch
	reflectStaticSizeDocumentMatch = int(reflect.TypeOf(dm).Size())
	var sc SearchContext
	reflectStaticSizeSearchContext = int(reflect.TypeOf(sc).Size())
	var l Location
	reflectStaticSizeLocation = int(reflect.TypeOf(l).Size())
}

type ArrayPositions []uint64

func (ap ArrayPositions) Equals(other ArrayPositions) bool {
	if len(ap) != len(other) {
		return false
	}
	for i := range ap {
		if ap[i] != other[i] {
			return false
		}
	}
	return true
}

func (ap ArrayPositions) Compare(other ArrayPositions) int {
	for i, p := range ap {
		if i >= len(other) {
			return 1
		}
		if p < other[i] {
			return -1
		}
		if p > other[i] {
			return 1
		}
	}
	if len(ap) < len(other) {
		return -1
	}
	return 0
}

type Location struct {
	// Pos is the position of the term within the field, starting at 1
	Pos uint64 `json:"pos"`

	// Start and End are the byte offsets of the term in the field
	Start uint64 `json:"start"`
	End   uint64 `json:"end"`

	// ArrayPositions contains the positions of the term within any elements.
	ArrayPositions ArrayPositions `json:"array_positions"`
}

func (l *Location) Size() int {
	return reflectStaticSizeLocation + size.SizeOfPtr +
		len(l.ArrayPositions)*size.SizeOfUint64
}

type Locations []*Location

func (p Locations) Len() int      { return len(p) }
func (p Locations) Swap(i, j int) { p[i], p[j] = p[j], p[i] }

func (p Locations) Less(i, j int) bool {
	c := p[i].ArrayPositions.Compare(p[j].ArrayPositions)
	if c < 0 {
		return true
	}
	if c > 0 {
		return false
	}
	return p[i].Pos < p[j].Pos
}

func (p Locations) Dedupe() Locations { // destructive!
	if len(p) <= 1 {
		return p
	}

	sort.Sort(p)

	slow := 0

	for _, pfast := range p {
		pslow := p[slow]
		if pslow.Pos == pfast.Pos &&
			pslow.Start == pfast.Start &&
			pslow.End == pfast.End &&
			pslow.ArrayPositions.Equals(pfast.ArrayPositions) {
			continue // duplicate, so only move fast ahead
		}

		slow++

		p[slow] = pfast
	}

	return p[:slow+1]
}

type TermLocationMap map[string]Locations

func (t TermLocationMap) AddLocation(term string, location *Location) {
	t[term] = append(t[term], location)
}

type FieldTermLocationMap map[string]TermLocationMap

type FieldTermLocation struct {
	Field    string
	Term     string
	Location Location
}

type FieldFragmentMap map[string][]string

type DocumentMatch struct {
	Index           string                `json:"index,omitempty"`
	ID              string                `json:"id"`
	IndexInternalID index.IndexInternalID `json:"-"`
	Score           float64               `json:"score"`
	Expl            *Explanation          `json:"explanation,omitempty"`
	Locations       FieldTermLocationMap  `json:"locations,omitempty"`
	Fragments       FieldFragmentMap      `json:"fragments,omitempty"`
	Sort            []string              `json:"sort,omitempty"`

	// Fields contains the values for document fields listed in
	// SearchRequest.Fields. Text fields are returned as strings, numeric
	// fields as float64s and date fields as time.RFC3339 formatted strings.
	Fields map[string]interface{} `json:"fields,omitempty"`

	// used to maintain natural index order
	HitNumber uint64 `json:"-"`

	// used to temporarily hold field term location information during
	// search processing in an efficient, recycle-friendly manner, to
	// be later incorporated into the Locations map when search
	// results are completed
	FieldTermLocations []FieldTermLocation `json:"-"`
}

func (dm *DocumentMatch) AddFieldValue(name string, value interface{}) {
	if dm.Fields == nil {
		dm.Fields = make(map[string]interface{})
	}
	existingVal, ok := dm.Fields[name]
	if !ok {
		dm.Fields[name] = value
		return
	}

	valSlice, ok := existingVal.([]interface{})
	if ok {
		// already a slice, append to it
		valSlice = append(valSlice, value)
	} else {
		// create a slice
		valSlice = []interface{}{existingVal, value}
	}
	dm.Fields[name] = valSlice
}

// Reset allows an already allocated DocumentMatch to be reused
func (dm *DocumentMatch) Reset() *DocumentMatch {
	// remember the []byte used for the IndexInternalID
	indexInternalID := dm.IndexInternalID
	// remember the []interface{} used for sort
	sort := dm.Sort
	// remember the FieldTermLocations backing array
	ftls := dm.FieldTermLocations
	for i := range ftls { // recycle the ArrayPositions of each location
		ftls[i].Location.ArrayPositions = ftls[i].Location.ArrayPositions[:0]
	}
	// idiom to copy over from empty DocumentMatch (0 allocations)
	*dm = DocumentMatch{}
	// reuse the []byte already allocated (and reset len to 0)
	dm.IndexInternalID = indexInternalID[:0]
	// reuse the []interface{} already allocated (and reset len to 0)
	dm.Sort = sort[:0]
	// reuse the FieldTermLocations already allocated (and reset len to 0)
	dm.FieldTermLocations = ftls[:0]
	return dm
}

func (dm *DocumentMatch) Size() int {
	sizeInBytes := reflectStaticSizeDocumentMatch + size.SizeOfPtr +
		len(dm.Index) +
		len(dm.ID) +
		len(dm.IndexInternalID)

	if dm.Expl != nil {
		sizeInBytes += dm.Expl.Size()
	}

	for k, v := range dm.Locations {
		sizeInBytes += size.SizeOfString + len(k)
		for k1, v1 := range v {
			sizeInBytes += size.SizeOfString + len(k1) +
				size.SizeOfSlice
			for _, entry := range v1 {
				sizeInBytes += entry.Size()
			}
		}
	}

	for k, v := range dm.Fragments {
		sizeInBytes += size.SizeOfString + len(k) +
			size.SizeOfSlice

		for _, entry := range v {
			sizeInBytes += size.SizeOfString + len(entry)
		}
	}

	for _, entry := range dm.Sort {
		sizeInBytes += size.SizeOfString + len(entry)
	}

	for k := range dm.Fields {
		sizeInBytes += size.SizeOfString + len(k) +
			size.SizeOfPtr
	}

	return sizeInBytes
}

// Complete performs final preparation & transformation of the
// DocumentMatch at the end of search processing, also allowing the
// caller to provide an optional preallocated locations slice
func (dm *DocumentMatch) Complete(prealloc []Location) []Location {
	// transform the FieldTermLocations slice into the Locations map
	nlocs := len(dm.FieldTermLocations)
	if nlocs > 0 {
		if cap(prealloc) < nlocs {
			prealloc = make([]Location, nlocs)
		}
		prealloc = prealloc[:nlocs]

		var lastField string
		var tlm TermLocationMap
		var needsDedupe bool

		for i, ftl := range dm.FieldTermLocations {
			if lastField != ftl.Field {
				lastField = ftl.Field

				if dm.Locations == nil {
					dm.Locations = make(FieldTermLocationMap)
				}

				tlm = dm.Locations[ftl.Field]
				if tlm == nil {
					tlm = make(TermLocationMap)
					dm.Locations[ftl.Field] = tlm
				}
			}

			loc := &prealloc[i]
			*loc = ftl.Location

			if len(loc.ArrayPositions) > 0 { // copy
				loc.ArrayPositions = append(ArrayPositions(nil), loc.ArrayPositions...)
			}

			locs := tlm[ftl.Term]

			// if the loc is before or at the last location, then there
			// might be duplicates that need to be deduplicated
			if !needsDedupe && len(locs) > 0 {
				last := locs[len(locs)-1]
				cmp := loc.ArrayPositions.Compare(last.ArrayPositions)
				if cmp < 0 || (cmp == 0 && loc.Pos <= last.Pos) {
					needsDedupe = true
				}
			}

			tlm[ftl.Term] = append(locs, loc)

			dm.FieldTermLocations[i] = FieldTermLocation{ // recycle
				Location: Location{
					ArrayPositions: ftl.Location.ArrayPositions[:0],
				},
			}
		}

		if needsDedupe {
			for _, tlm := range dm.Locations {
				for term, locs := range tlm {
					tlm[term] = locs.Dedupe()
				}
			}
		}
	}

	dm.FieldTermLocations = dm.FieldTermLocations[:0] // recycle

	return prealloc
}

func (dm *DocumentMatch) String() string {
	return fmt.Sprintf("[%s-%f]", string(dm.IndexInternalID), dm.Score)
}

type DocumentMatchCollection []*DocumentMatch

func (c DocumentMatchCollection) Len() int           { return len(c) }
func (c DocumentMatchCollection) Swap(i, j int)      { c[i], c[j] = c[j], c[i] }
func (c DocumentMatchCollection) Less(i, j int) bool { return c[i].Score > c[j].Score }

type Searcher interface {
	Next(ctx *SearchContext) (*DocumentMatch, error)
	Advance(ctx *SearchContext, ID index.IndexInternalID) (*DocumentMatch, error)
	Close() error
	Weight() float64
	SetQueryNorm(float64)
	Count() uint64
	Min() int
	Size() int

	DocumentMatchPoolSize() int
}

type SearcherOptions struct {
	Explain            bool
	IncludeTermVectors bool
	Score              string
}

// SearchContext represents the context around a single search
type SearchContext struct {
	DocumentMatchPool *DocumentMatchPool
	Collector         Collector
	IndexReader       index.IndexReader
}

func (sc *SearchContext) Size() int {
	sizeInBytes := reflectStaticSizeSearchContext + size.SizeOfPtr +
		reflectStaticSizeDocumentMatchPool + size.SizeOfPtr

	if sc.DocumentMatchPool != nil {
		for _, entry := range sc.DocumentMatchPool.avail {
			if entry != nil {
				sizeInBytes += entry.Size()
			}
		}
	}

	return sizeInBytes
}
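One detail worth seeing in isolation is how `AddFieldValue` promotes a field from a scalar to a slice when a second value arrives. A short sketch (editorial addition, same import-path assumption as above):

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/search"
)

func main() {
	dm := &search.DocumentMatch{ID: "doc-1"}

	dm.AddFieldValue("tags", "go")
	fmt.Println(dm.Fields["tags"]) // go (stored as a plain value)

	// the second value converts the entry into a []interface{}
	dm.AddFieldValue("tags", "search")
	fmt.Println(dm.Fields["tags"]) // [go search]
}
```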
@@ -0,0 +1,741 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// 		http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package search

import (
	"bytes"
	"encoding/json"
	"fmt"
	"math"
	"sort"
	"strings"

	"github.com/blevesearch/bleve/geo"
	"github.com/blevesearch/bleve/numeric"
)

var HighTerm = strings.Repeat(string([]byte{0xff}), 10)
var LowTerm = string([]byte{0x00})

type SearchSort interface {
	UpdateVisitor(field string, term []byte)
	Value(a *DocumentMatch) string
	Descending() bool

	RequiresDocID() bool
	RequiresScoring() bool
	RequiresFields() []string

	Reverse()

	Copy() SearchSort
}

func ParseSearchSortObj(input map[string]interface{}) (SearchSort, error) {
	descending, ok := input["desc"].(bool)
	by, ok := input["by"].(string)
	if !ok {
		return nil, fmt.Errorf("search sort must specify by")
	}
	switch by {
	case "id":
		return &SortDocID{
			Desc: descending,
		}, nil
	case "score":
		return &SortScore{
			Desc: descending,
		}, nil
	case "geo_distance":
		field, ok := input["field"].(string)
		if !ok {
			return nil, fmt.Errorf("search sort mode geo_distance must specify field")
		}
		lon, lat, foundLocation := geo.ExtractGeoPoint(input["location"])
		if !foundLocation {
			return nil, fmt.Errorf("unable to parse geo_distance location")
		}
		rvd := &SortGeoDistance{
			Field:    field,
			Desc:     descending,
			Lon:      lon,
			Lat:      lat,
			unitMult: 1.0,
		}
		if distUnit, ok := input["unit"].(string); ok {
			var err error
			rvd.unitMult, err = geo.ParseDistanceUnit(distUnit)
			if err != nil {
				return nil, err
			}
			rvd.Unit = distUnit
		}
		return rvd, nil
	case "field":
		field, ok := input["field"].(string)
		if !ok {
			return nil, fmt.Errorf("search sort mode field must specify field")
		}
		rv := &SortField{
			Field: field,
			Desc:  descending,
		}
		typ, ok := input["type"].(string)
		if ok {
			switch typ {
			case "auto":
				rv.Type = SortFieldAuto
			case "string":
				rv.Type = SortFieldAsString
			case "number":
				rv.Type = SortFieldAsNumber
			case "date":
				rv.Type = SortFieldAsDate
			default:
				return nil, fmt.Errorf("unknown sort field type: %s", typ)
			}
		}
		mode, ok := input["mode"].(string)
		if ok {
			switch mode {
			case "default":
				rv.Mode = SortFieldDefault
			case "min":
				rv.Mode = SortFieldMin
			case "max":
				rv.Mode = SortFieldMax
			default:
				return nil, fmt.Errorf("unknown sort field mode: %s", mode)
			}
		}
		missing, ok := input["missing"].(string)
		if ok {
			switch missing {
			case "first":
				rv.Missing = SortFieldMissingFirst
			case "last":
				rv.Missing = SortFieldMissingLast
			default:
				return nil, fmt.Errorf("unknown sort field missing: %s", missing)
			}
		}
		return rv, nil
	}

	return nil, fmt.Errorf("unknown search sort by: %s", by)
}

func ParseSearchSortString(input string) SearchSort {
	descending := false
	if strings.HasPrefix(input, "-") {
		descending = true
		input = input[1:]
	} else if strings.HasPrefix(input, "+") {
		input = input[1:]
	}
	if input == "_id" {
		return &SortDocID{
			Desc: descending,
		}
	} else if input == "_score" {
		return &SortScore{
			Desc: descending,
		}
	}
	return &SortField{
		Field: input,
		Desc:  descending,
	}
}

func ParseSearchSortJSON(input json.RawMessage) (SearchSort, error) {
	// first try to parse it as string
	var sortString string
	err := json.Unmarshal(input, &sortString)
	if err != nil {
		var sortObj map[string]interface{}
		err = json.Unmarshal(input, &sortObj)
		if err != nil {
			return nil, err
		}
		return ParseSearchSortObj(sortObj)
	}
	return ParseSearchSortString(sortString), nil
}

func ParseSortOrderStrings(in []string) SortOrder {
	rv := make(SortOrder, 0, len(in))
	for _, i := range in {
		ss := ParseSearchSortString(i)
		rv = append(rv, ss)
	}
	return rv
}

func ParseSortOrderJSON(in []json.RawMessage) (SortOrder, error) {
	rv := make(SortOrder, 0, len(in))
	for _, i := range in {
		ss, err := ParseSearchSortJSON(i)
		if err != nil {
			return nil, err
		}
		rv = append(rv, ss)
	}
	return rv, nil
}

type SortOrder []SearchSort

func (so SortOrder) Value(doc *DocumentMatch) {
	for _, soi := range so {
		doc.Sort = append(doc.Sort, soi.Value(doc))
	}
}

func (so SortOrder) UpdateVisitor(field string, term []byte) {
	for _, soi := range so {
		soi.UpdateVisitor(field, term)
	}
}

func (so SortOrder) Copy() SortOrder {
	rv := make(SortOrder, len(so))
	for i, soi := range so {
		rv[i] = soi.Copy()
	}
	return rv
}

// Compare will compare two document matches using the specified sort order;
// if both are numbers, we avoid converting back to term
func (so SortOrder) Compare(cachedScoring, cachedDesc []bool, i, j *DocumentMatch) int {
	// compare the documents on all search sorts until a difference is found
	for x := range so {
		c := 0
		if cachedScoring[x] {
			if i.Score < j.Score {
				c = -1
			} else if i.Score > j.Score {
				c = 1
			}
		} else {
			iVal := i.Sort[x]
			jVal := j.Sort[x]
			c = strings.Compare(iVal, jVal)
		}

		if c == 0 {
			continue
		}
		if cachedDesc[x] {
			c = -c
		}
		return c
	}
	// if they are the same at this point, impose order based on index natural sort order
	if i.HitNumber == j.HitNumber {
		return 0
	} else if i.HitNumber > j.HitNumber {
		return 1
	}
	return -1
}

func (so SortOrder) RequiresScore() bool {
	for _, soi := range so {
		if soi.RequiresScoring() {
			return true
		}
	}
	return false
}

func (so SortOrder) RequiresDocID() bool {
	for _, soi := range so {
		if soi.RequiresDocID() {
			return true
		}
	}
	return false
}

func (so SortOrder) RequiredFields() []string {
	var rv []string
	for _, soi := range so {
		rv = append(rv, soi.RequiresFields()...)
	}
	return rv
}

func (so SortOrder) CacheIsScore() []bool {
	rv := make([]bool, 0, len(so))
	for _, soi := range so {
		rv = append(rv, soi.RequiresScoring())
	}
	return rv
}

func (so SortOrder) CacheDescending() []bool {
	rv := make([]bool, 0, len(so))
	for _, soi := range so {
		rv = append(rv, soi.Descending())
	}
	return rv
}

func (so SortOrder) Reverse() {
	for _, soi := range so {
		soi.Reverse()
	}
}

// SortFieldType lets you control some internal sort behavior;
// normally leaving this at the zero-value of SortFieldAuto is fine
type SortFieldType int

const (
	// SortFieldAuto applies heuristics to attempt to automatically sort correctly
	SortFieldAuto SortFieldType = iota
	// SortFieldAsString forces sort as string (no prefix coded terms removed)
	SortFieldAsString
	// SortFieldAsNumber forces sort as number (prefix coded terms with shift > 0 removed)
	SortFieldAsNumber
	// SortFieldAsDate forces sort as date (prefix coded terms with shift > 0 removed)
	SortFieldAsDate
)

// SortFieldMode describes the behavior if the field has multiple values
type SortFieldMode int

const (
	// SortFieldDefault uses the first (or only) value, this is the default zero-value
	SortFieldDefault SortFieldMode = iota // FIXME name is confusing
	// SortFieldMin uses the minimum value
	SortFieldMin
	// SortFieldMax uses the maximum value
	SortFieldMax
)

// SortFieldMissing controls where documents missing a field value should be sorted
type SortFieldMissing int

const (
	// SortFieldMissingLast sorts documents missing a field at the end
	SortFieldMissingLast SortFieldMissing = iota

	// SortFieldMissingFirst sorts documents missing a field at the beginning
	SortFieldMissingFirst
)

// SortField will sort results by the value of a stored field.
//   Field is the name of the field
//   Descending reverses the sort order (default false)
//   Type allows forcing of string/number/date behavior (default auto)
//   Mode controls behavior for multi-valued fields (default first)
//   Missing controls behavior of missing values (default last)
type SortField struct {
	Field   string
	Desc    bool
	Type    SortFieldType
	Mode    SortFieldMode
	Missing SortFieldMissing
	values  [][]byte
	tmp     [][]byte
}

// UpdateVisitor notifies this sort field that in this document
// this field has the specified term
func (s *SortField) UpdateVisitor(field string, term []byte) {
	if field == s.Field {
		s.values = append(s.values, term)
	}
}

// Value returns the sort value of the DocumentMatch;
// it also resets the state of this SortField for
// processing the next document
func (s *SortField) Value(i *DocumentMatch) string {
	iTerms := s.filterTermsByType(s.values)
	iTerm := s.filterTermsByMode(iTerms)
	s.values = s.values[:0]
	return iTerm
}

// Descending determines the order of the sort
func (s *SortField) Descending() bool {
	return s.Desc
}

func (s *SortField) filterTermsByMode(terms [][]byte) string {
	if len(terms) == 1 || (len(terms) > 1 && s.Mode == SortFieldDefault) {
		return string(terms[0])
	} else if len(terms) > 1 {
		switch s.Mode {
		case SortFieldMin:
			sort.Sort(BytesSlice(terms))
			return string(terms[0])
		case SortFieldMax:
			sort.Sort(BytesSlice(terms))
			return string(terms[len(terms)-1])
		}
	}

	// handle missing terms
	if s.Missing == SortFieldMissingLast {
		if s.Desc {
			return LowTerm
		}
		return HighTerm
	}
	if s.Desc {
		return HighTerm
	}
	return LowTerm
}

// filterTermsByType attempts to make one pass on the terms:
// if we are in auto-mode AND all the terms look like prefix-coded numbers,
// return only the terms which had a shift of 0;
// if we are in explicit number or date mode, return only valid
// prefix coded numbers with a shift of 0
func (s *SortField) filterTermsByType(terms [][]byte) [][]byte {
	stype := s.Type
	if stype == SortFieldAuto {
		allTermsPrefixCoded := true
		termsWithShiftZero := s.tmp[:0]
		for _, term := range terms {
			valid, shift := numeric.ValidPrefixCodedTermBytes(term)
			if valid && shift == 0 {
				termsWithShiftZero = append(termsWithShiftZero, term)
			} else if !valid {
				allTermsPrefixCoded = false
			}
		}
		if allTermsPrefixCoded {
			terms = termsWithShiftZero
			s.tmp = termsWithShiftZero[:0]
		}
	} else if stype == SortFieldAsNumber || stype == SortFieldAsDate {
		termsWithShiftZero := s.tmp[:0]
		for _, term := range terms {
			valid, shift := numeric.ValidPrefixCodedTermBytes(term)
			if valid && shift == 0 {
				termsWithShiftZero = append(termsWithShiftZero, term)
			}
		}
		terms = termsWithShiftZero
		s.tmp = termsWithShiftZero[:0]
	}
	return terms
}

// RequiresDocID says this SearchSort does not require the DocID be loaded
func (s *SortField) RequiresDocID() bool { return false }

// RequiresScoring says this SearchSort does not require scoring
func (s *SortField) RequiresScoring() bool { return false }

// RequiresFields says this SearchSort requires the specified stored field
func (s *SortField) RequiresFields() []string { return []string{s.Field} }

func (s *SortField) MarshalJSON() ([]byte, error) {
	// see if simple format can be used
	if s.Missing == SortFieldMissingLast &&
		s.Mode == SortFieldDefault &&
		s.Type == SortFieldAuto {
		if s.Desc {
			return json.Marshal("-" + s.Field)
		}
		return json.Marshal(s.Field)
	}
	sfm := map[string]interface{}{
		"by":    "field",
		"field": s.Field,
	}
	if s.Desc {
		sfm["desc"] = true
	}
	if s.Missing > SortFieldMissingLast {
		switch s.Missing {
		case SortFieldMissingFirst:
			sfm["missing"] = "first"
		}
	}
	if s.Mode > SortFieldDefault {
		switch s.Mode {
		case SortFieldMin:
			sfm["mode"] = "min"
		case SortFieldMax:
			sfm["mode"] = "max"
		}
	}
	if s.Type > SortFieldAuto {
		switch s.Type {
		case SortFieldAsString:
			sfm["type"] = "string"
		case SortFieldAsNumber:
			sfm["type"] = "number"
		case SortFieldAsDate:
			sfm["type"] = "date"
		}
	}

	return json.Marshal(sfm)
}

func (s *SortField) Copy() SearchSort {
	rv := *s
	return &rv
}

func (s *SortField) Reverse() {
	s.Desc = !s.Desc
	if s.Missing == SortFieldMissingFirst {
		s.Missing = SortFieldMissingLast
	} else {
		s.Missing = SortFieldMissingFirst
	}
}

// SortDocID will sort results by the document identifier
type SortDocID struct {
	Desc bool
}

// UpdateVisitor is a no-op for SortDocID as its value
// is not dependent on any field terms
func (s *SortDocID) UpdateVisitor(field string, term []byte) {
}

// Value returns the sort value of the DocumentMatch
func (s *SortDocID) Value(i *DocumentMatch) string {
	return i.ID
}

// Descending determines the order of the sort
func (s *SortDocID) Descending() bool {
	return s.Desc
}

// RequiresDocID says this SearchSort does require the DocID be loaded
func (s *SortDocID) RequiresDocID() bool { return true }

// RequiresScoring says this SearchSort does not require scoring
func (s *SortDocID) RequiresScoring() bool { return false }

// RequiresFields says this SearchSort does not require any stored fields
func (s *SortDocID) RequiresFields() []string { return nil }

func (s *SortDocID) MarshalJSON() ([]byte, error) {
	if s.Desc {
		return json.Marshal("-_id")
	}
	return json.Marshal("_id")
}

func (s *SortDocID) Copy() SearchSort {
	rv := *s
	return &rv
}

func (s *SortDocID) Reverse() {
	s.Desc = !s.Desc
}

// SortScore will sort results by the document match score
type SortScore struct {
	Desc bool
}

// UpdateVisitor is a no-op for SortScore as its value
// is not dependent on any field terms
func (s *SortScore) UpdateVisitor(field string, term []byte) {
}

// Value returns the sort value of the DocumentMatch
func (s *SortScore) Value(i *DocumentMatch) string {
	return "_score"
}

// Descending determines the order of the sort
func (s *SortScore) Descending() bool {
	return s.Desc
}

// RequiresDocID says this SearchSort does not require the DocID be loaded
func (s *SortScore) RequiresDocID() bool { return false }

// RequiresScoring says this SearchSort does require scoring
func (s *SortScore) RequiresScoring() bool { return true }

// RequiresFields says this SearchSort does not require any stored fields
func (s *SortScore) RequiresFields() []string { return nil }

func (s *SortScore) MarshalJSON() ([]byte, error) {
	if s.Desc {
		return json.Marshal("-_score")
	}
	return json.Marshal("_score")
}

func (s *SortScore) Copy() SearchSort {
	rv := *s
	return &rv
}

func (s *SortScore) Reverse() {
	s.Desc = !s.Desc
}

var maxDistance = string(numeric.MustNewPrefixCodedInt64(math.MaxInt64, 0))

// NewSortGeoDistance creates a SearchSort instance for sorting documents by
// their distance from the specified point.
func NewSortGeoDistance(field, unit string, lon, lat float64, desc bool) (
	*SortGeoDistance, error) {
	rv := &SortGeoDistance{
		Field: field,
		Desc:  desc,
		Unit:  unit,
		Lon:   lon,
		Lat:   lat,
	}
	var err error
	rv.unitMult, err = geo.ParseDistanceUnit(unit)
	if err != nil {
		return nil, err
	}
	return rv, nil
}

// SortGeoDistance will sort results by the distance of an
// indexed geo point, from the provided location.
//   Field is the name of the field
//   Descending reverses the sort order (default false)
type SortGeoDistance struct {
	Field    string
	Desc     bool
	Unit     string
	values   []string
	Lon      float64
	Lat      float64
	unitMult float64
}

// UpdateVisitor notifies this sort field that in this document
// this field has the specified term
func (s *SortGeoDistance) UpdateVisitor(field string, term []byte) {
	if field == s.Field {
		s.values = append(s.values, string(term))
	}
}

// Value returns the sort value of the DocumentMatch;
// it also resets the state of this SortField for
// processing the next document
func (s *SortGeoDistance) Value(i *DocumentMatch) string {
	iTerms := s.filterTermsByType(s.values)
	iTerm := s.filterTermsByMode(iTerms)
	s.values = s.values[:0]

	if iTerm == "" {
		return maxDistance
	}

	i64, err := numeric.PrefixCoded(iTerm).Int64()
	if err != nil {
		return maxDistance
	}
	docLon := geo.MortonUnhashLon(uint64(i64))
	docLat := geo.MortonUnhashLat(uint64(i64))

	dist := geo.Haversin(s.Lon, s.Lat, docLon, docLat)
	// dist is returned in km, so convert to m
	dist *= 1000
	if s.unitMult != 0 {
		dist /= s.unitMult
	}
	distInt64 := numeric.Float64ToInt64(dist)
	return string(numeric.MustNewPrefixCodedInt64(distInt64, 0))
}

// Descending determines the order of the sort
func (s *SortGeoDistance) Descending() bool {
	return s.Desc
}

func (s *SortGeoDistance) filterTermsByMode(terms []string) string {
	if len(terms) >= 1 {
		return terms[0]
	}

	return ""
}

// filterTermsByType attempts to make one pass on the terms and
// returns only valid prefix coded numbers with a shift of 0
func (s *SortGeoDistance) filterTermsByType(terms []string) []string {
	var termsWithShiftZero []string
	for _, term := range terms {
		valid, shift := numeric.ValidPrefixCodedTerm(term)
		if valid && shift == 0 {
			termsWithShiftZero = append(termsWithShiftZero, term)
		}
	}
	return termsWithShiftZero
}

// RequiresDocID says this SearchSort does not require the DocID be loaded
func (s *SortGeoDistance) RequiresDocID() bool { return false }

// RequiresScoring says this SearchSort does not require scoring
func (s *SortGeoDistance) RequiresScoring() bool { return false }

// RequiresFields says this SearchSort requires the specified stored field
func (s *SortGeoDistance) RequiresFields() []string { return []string{s.Field} }

func (s *SortGeoDistance) MarshalJSON() ([]byte, error) {
	sfm := map[string]interface{}{
		"by":    "geo_distance",
		"field": s.Field,
		"location": map[string]interface{}{
			"lon": s.Lon,
			"lat": s.Lat,
		},
	}
	if s.Unit != "" {
		sfm["unit"] = s.Unit
	}
	if s.Desc {
		sfm["desc"] = true
	}

	return json.Marshal(sfm)
}

func (s *SortGeoDistance) Copy() SearchSort {
	rv := *s
	return &rv
}

func (s *SortGeoDistance) Reverse() {
	s.Desc = !s.Desc
}

type BytesSlice [][]byte

func (p BytesSlice) Len() int           { return len(p) }
func (p BytesSlice) Less(i, j int) bool { return bytes.Compare(p[i], p[j]) < 0 }
func (p BytesSlice) Swap(i, j int)      { p[i], p[j] = p[j], p[i] }
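The string form handled by `ParseSearchSortString` is the compact syntax most callers use: a leading `-` flips the order, and `_id`/`_score` select the two non-field sorts. A sketch (editorial addition, same import-path assumption as above):

```go
package main

import (
	"encoding/json"
	"fmt"

	"github.com/blevesearch/bleve/search"
)

func main() {
	order := search.ParseSortOrderStrings([]string{"-_score", "title", "_id"})

	for _, ss := range order {
		// MarshalJSON round-trips back to the compact form when possible
		b, _ := json.Marshal(ss)
		fmt.Printf("%s desc=%v\n", b, ss.Descending())
	}
	// Output:
	// "-_score" desc=true
	// "title" desc=false
	// "_id" desc=false
}
```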
@@ -0,0 +1,69 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// 		http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package search

func MergeLocations(locations []FieldTermLocationMap) FieldTermLocationMap {
	rv := locations[0]

	for i := 1; i < len(locations); i++ {
		nextLocations := locations[i]
		for field, termLocationMap := range nextLocations {
			rvTermLocationMap, rvHasField := rv[field]
			if rvHasField {
				rv[field] = MergeTermLocationMaps(rvTermLocationMap, termLocationMap)
			} else {
				rv[field] = termLocationMap
			}
		}
	}

	return rv
}

func MergeTermLocationMaps(rv, other TermLocationMap) TermLocationMap {
	for term, locationMap := range other {
		// for a given term/document there cannot be different locations
		// if they came back from different clauses, overwrite is ok
		rv[term] = locationMap
	}
	return rv
}

func MergeFieldTermLocations(dest []FieldTermLocation, matches []*DocumentMatch) []FieldTermLocation {
	n := len(dest)
	for _, dm := range matches {
		n += len(dm.FieldTermLocations)
	}
	if cap(dest) < n {
		dest = append(make([]FieldTermLocation, 0, n), dest...)
	}

	for _, dm := range matches {
		for _, ftl := range dm.FieldTermLocations {
			dest = append(dest, FieldTermLocation{
				Field: ftl.Field,
				Term:  ftl.Term,
				Location: Location{
					Pos:            ftl.Location.Pos,
					Start:          ftl.Location.Start,
					End:            ftl.Location.End,
					ArrayPositions: append(ArrayPositions(nil), ftl.Location.ArrayPositions...),
				},
			})
		}
	}

	return dest
}
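`MergeLocations` folds later maps into the first one; because a given term in a given document can only have one set of locations, a plain overwrite is safe when field and term collide. A sketch (editorial addition, same import-path assumption as above; note the first map is mutated):

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/search"
)

func main() {
	a := search.FieldTermLocationMap{
		"body": search.TermLocationMap{
			"zardoz": search.Locations{{Pos: 1}},
		},
	}
	b := search.FieldTermLocationMap{
		"body": search.TermLocationMap{
			"waf": search.Locations{{Pos: 7}},
		},
	}

	merged := search.MergeLocations([]search.FieldTermLocationMap{a, b})
	fmt.Println(len(merged["body"])) // 2 (both terms now under "body")
}
```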
@@ -0,0 +1,59 @@
// Copyright (c) 2018 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// 		http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package size

import (
	"reflect"
)

func init() {
	var b bool
	SizeOfBool = int(reflect.TypeOf(b).Size())
	var f32 float32
	SizeOfFloat32 = int(reflect.TypeOf(f32).Size())
	var f64 float64
	SizeOfFloat64 = int(reflect.TypeOf(f64).Size())
	var i int
	SizeOfInt = int(reflect.TypeOf(i).Size())
	var m map[int]int
	SizeOfMap = int(reflect.TypeOf(m).Size())
	var ptr *int
	SizeOfPtr = int(reflect.TypeOf(ptr).Size())
	var slice []int
	SizeOfSlice = int(reflect.TypeOf(slice).Size())
	var str string
	SizeOfString = int(reflect.TypeOf(str).Size())
	var u8 uint8
	SizeOfUint8 = int(reflect.TypeOf(u8).Size())
	var u16 uint16
	SizeOfUint16 = int(reflect.TypeOf(u16).Size())
	var u32 uint32
	SizeOfUint32 = int(reflect.TypeOf(u32).Size())
	var u64 uint64
	SizeOfUint64 = int(reflect.TypeOf(u64).Size())
}

var SizeOfBool int
var SizeOfFloat32 int
var SizeOfFloat64 int
var SizeOfInt int
var SizeOfMap int
var SizeOfPtr int
var SizeOfSlice int
var SizeOfString int
var SizeOfUint8 int
var SizeOfUint16 int
var SizeOfUint32 int
var SizeOfUint64 int
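This package computes every constant once at init time via reflection, so the byte counts track whatever platform the binary runs on. A quick check (editorial addition; the printed values assume a 64-bit platform):

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/size"
)

func main() {
	fmt.Println(size.SizeOfPtr)    // 8 on 64-bit
	fmt.Println(size.SizeOfString) // 16: pointer + length header
	fmt.Println(size.SizeOfSlice)  // 24: pointer + length + capacity header
}
```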
@@ -0,0 +1,8 @@
#*
*.sublime-*
*~
.#*
.project
.settings
.DS_Store
/testdata
@@ -0,0 +1,16 @@
language: go

go:
  - 1.4

script:
  - go get golang.org/x/tools/cmd/vet
  - go get golang.org/x/tools/cmd/cover
  - go get github.com/mattn/goveralls
  - go test -v -covermode=count -coverprofile=profile.out
  - go vet
  - goveralls -service drone.io -coverprofile=profile.out -repotoken $COVERALLS

notifications:
  email:
    - marty.schoch@gmail.com
@@ -0,0 +1,19 @@
Copyright (c) 2013 Charles Iliya Krempeaux <charles@reptile.ca> :: http://changelog.ca/

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
@@ -0,0 +1,118 @@
# This fork...

I'm maintaining this fork because the original author was not replying to issues or pull requests. For now I plan on maintaining this fork as necessary.

## Status

[![Build Status](https://travis-ci.org/blevesearch/go-porterstemmer.svg?branch=master)](https://travis-ci.org/blevesearch/go-porterstemmer)

[![Coverage Status](https://coveralls.io/repos/blevesearch/go-porterstemmer/badge.png?branch=HEAD)](https://coveralls.io/r/blevesearch/go-porterstemmer?branch=HEAD)

# Go Porter Stemmer

A native Go clean-room implementation of the Porter Stemming Algorithm.

This algorithm is of interest to people doing Machine Learning or
Natural Language Processing (NLP).

This is NOT a port. This is a native Go implementation from the human-readable
description of the algorithm.

I've tried to make it (more) efficient by NOT internally using strings, but
instead internally using []rune slices, and by using the same (array) buffer
backing the []rune slice (and sub-slices) at all steps of the algorithm.

For the Porter Stemmer algorithm, see:

http://tartarus.org/martin/PorterStemmer/def.txt (URL #1)

http://tartarus.org/martin/PorterStemmer/ (URL #2)

# Departures

When I initially implemented it, it failed the tests at...

http://tartarus.org/martin/PorterStemmer/voc.txt (URL #3)

http://tartarus.org/martin/PorterStemmer/output.txt (URL #4)

... After reading the human-readable text over and over again to try to figure out
what error I had made (and doing all sorts of things to debug it), I came to the
conclusion that some of these tests were wrong according to the human-readable
description of the algorithm.

This led me to wonder if maybe other people's code that was passing these tests had
rules that were not in the human-readable description. Which led me to look at the source
code here...

http://tartarus.org/martin/PorterStemmer/c.txt (URL #5)

... When I looked there I noticed that there are some items marked as a "DEPARTURE",
which differ from the original algorithm. (There are 2 of these.)

I implemented these departures, and the tests at URL #3 and URL #4 all passed.

## Usage

To use this Golang library, write something like:

	package main

	import (
		"fmt"
		"github.com/reiver/go-porterstemmer"
	)

	func main() {

		word := "Waxes"

		stem := porterstemmer.StemString(word)

		fmt.Printf("The word [%s] has the stem [%s].\n", word, stem)
	}

Alternatively, if you want to be a bit more efficient, use []rune slices instead, with code like:

	package main

	import (
		"fmt"
		"github.com/reiver/go-porterstemmer"
	)

	func main() {

		word := []rune("Waxes")

		stem := porterstemmer.Stem(word)

		fmt.Printf("The word [%s] has the stem [%s].\n", string(word), string(stem))
	}

Although NOTE that the above code may modify the original slice (named "word" in the example) as a side
effect, for efficiency reasons. And the slice named "stem" in the example above may be a
sub-slice of the slice named "word".

Also alternatively, if you already know that your word is already lowercase (and you don't need
this library to lowercase your word for you) you can instead use code like:

	package main

	import (
		"fmt"
		"github.com/reiver/go-porterstemmer"
	)

	func main() {

		word := []rune("waxes")

		stem := porterstemmer.StemWithoutLowerCasing(word)

		fmt.Printf("The word [%s] has the stem [%s].\n", string(word), string(stem))
	}

Again NOTE (like with the previous example) that the above code may modify the original slice (named
"word" in the example) as a side effect, for efficiency reasons. And the slice named "stem"
in the example above may be a sub-slice of the slice named "word".
@@ -0,0 +1,839 @@
package porterstemmer

import (
	// "log"
	"unicode"
)

func isConsonant(s []rune, i int) bool {

	//DEBUG
	//log.Printf("isConsonant: [%+v]", string(s[i]))

	result := true

	switch s[i] {
	case 'a', 'e', 'i', 'o', 'u':
		result = false
	case 'y':
		if 0 == i {
			result = true
		} else {
			result = !isConsonant(s, i-1)
		}
	default:
		result = true
	}

	return result
}

func measure(s []rune) uint {

	// Initialize.
	lenS := len(s)
	result := uint(0)
	i := 0

	// Short Circuit.
	if 0 == lenS {
		/////////// RETURN
		return result
	}

	// Ignore (potential) consonant sequence at the beginning of word.
	for isConsonant(s, i) {

		//DEBUG
		//log.Printf("[measure([%s])] Eat Consonant [%d] -> [%s]", string(s), i, string(s[i]))

		i++
		if i >= lenS {
			/////////////// RETURN
			return result
		}
	}

	// For each pair of a vowel sequence followed by a consonant sequence, increment result.
Outer:
	for i < lenS {

		for !isConsonant(s, i) {

			//DEBUG
			//log.Printf("[measure([%s])] VOWEL [%d] -> [%s]", string(s), i, string(s[i]))

			i++
			if i >= lenS {
				/////////// BREAK
				break Outer
			}
		}
		for isConsonant(s, i) {

			//DEBUG
			//log.Printf("[measure([%s])] CONSONANT [%d] -> [%s]", string(s), i, string(s[i]))

			i++
			if i >= lenS {
				result++
				/////////// BREAK
				break Outer
			}
		}
		result++
	}

	// Return
	return result
}
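
// (Editorial note, not part of the upstream source.) measure computes m
// in Porter's [C](VC)^m[V] decomposition of the word. For example,
// "trouble" splits as C="tr", V="ou", C="bl", V="e", giving m=1, while
// "troubles" gains a final consonant group "s", giving m=2.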

func hasSuffix(s, suffix []rune) bool {

	lenSMinusOne := len(s) - 1
	lenSuffixMinusOne := len(suffix) - 1

	if lenSMinusOne <= lenSuffixMinusOne {
		return false
	} else if s[lenSMinusOne] != suffix[lenSuffixMinusOne] { // I suspect checking this first should speed this function up in practice.
		/////// RETURN
		return false
	} else {

		for i := 0; i < lenSuffixMinusOne; i++ {

			if suffix[i] != s[lenSMinusOne-lenSuffixMinusOne+i] {
				/////////////// RETURN
				return false
			}

		}

	}

	return true
}

func containsVowel(s []rune) bool {

	lenS := len(s)

	for i := 0; i < lenS; i++ {

		if !isConsonant(s, i) {
			/////////// RETURN
			return true
		}

	}

	return false
}

func hasRepeatDoubleConsonantSuffix(s []rune) bool {

	// Initialize.
	lenS := len(s)

	result := false

	// Do it!
	if 2 > lenS {
		result = false
	} else if s[lenS-1] == s[lenS-2] && isConsonant(s, lenS-1) { // Will using isConsonant() cause a problem with "YY"?
		result = true
	} else {
		result = false
	}

	// Return,
	return result
}

func hasConsonantVowelConsonantSuffix(s []rune) bool {

	// Initialize.
	lenS := len(s)

	result := false

	// Do it!
	if 3 > lenS {
		result = false
	} else if isConsonant(s, lenS-3) && !isConsonant(s, lenS-2) && isConsonant(s, lenS-1) {
		result = true
	} else {
		result = false
	}

	// Return
	return result
}

func step1a(s []rune) []rune {

	// Initialize.
	var result []rune = s

	lenS := len(s)

	// Do it!
	if suffix := []rune("sses"); hasSuffix(s, suffix) {

		lenTrim := 2

		subSlice := s[:lenS-lenTrim]

		result = subSlice
	} else if suffix := []rune("ies"); hasSuffix(s, suffix) {
		lenTrim := 2

		subSlice := s[:lenS-lenTrim]

		result = subSlice
	} else if suffix := []rune("ss"); hasSuffix(s, suffix) {

		result = s
	} else if suffix := []rune("s"); hasSuffix(s, suffix) {

		lenSuffix := 1

		subSlice := s[:lenS-lenSuffix]

		result = subSlice
	}

	// Return.
	return result
}

func step1b(s []rune) []rune {

	// Initialize.
	var result []rune = s

	lenS := len(s)

	// Do it!
	if suffix := []rune("eed"); hasSuffix(s, suffix) {
		lenSuffix := len(suffix)

		subSlice := s[:lenS-lenSuffix]

		m := measure(subSlice)

		if 0 < m {
			lenTrim := 1

			result = s[:lenS-lenTrim]
		}
	} else if suffix := []rune("ed"); hasSuffix(s, suffix) {
		lenSuffix := len(suffix)

		subSlice := s[:lenS-lenSuffix]

		if containsVowel(subSlice) {

			if suffix2 := []rune("at"); hasSuffix(subSlice, suffix2) {
				lenTrim := -1

				result = s[:lenS-lenSuffix-lenTrim]
			} else if suffix2 := []rune("bl"); hasSuffix(subSlice, suffix2) {
				lenTrim := -1

				result = s[:lenS-lenSuffix-lenTrim]
			} else if suffix2 := []rune("iz"); hasSuffix(subSlice, suffix2) {
				lenTrim := -1

				result = s[:lenS-lenSuffix-lenTrim]
			} else if c := subSlice[len(subSlice)-1]; 'l' != c && 's' != c && 'z' != c && hasRepeatDoubleConsonantSuffix(subSlice) {
				lenTrim := 1

				lenSubSlice := len(subSlice)

				result = subSlice[:lenSubSlice-lenTrim]
			} else if c := subSlice[len(subSlice)-1]; 1 == measure(subSlice) && hasConsonantVowelConsonantSuffix(subSlice) && 'w' != c && 'x' != c && 'y' != c {
				lenTrim := -1

				result = s[:lenS-lenSuffix-lenTrim]

				result[len(result)-1] = 'e'
			} else {
				result = subSlice
			}

		}
	} else if suffix := []rune("ing"); hasSuffix(s, suffix) {
		lenSuffix := len(suffix)

		subSlice := s[:lenS-lenSuffix]

		if containsVowel(subSlice) {

			if suffix2 := []rune("at"); hasSuffix(subSlice, suffix2) {
				lenTrim := -1

				result = s[:lenS-lenSuffix-lenTrim]

				result[len(result)-1] = 'e'
			} else if suffix2 := []rune("bl"); hasSuffix(subSlice, suffix2) {
				lenTrim := -1

				result = s[:lenS-lenSuffix-lenTrim]

				result[len(result)-1] = 'e'
			} else if suffix2 := []rune("iz"); hasSuffix(subSlice, suffix2) {
				lenTrim := -1

				result = s[:lenS-lenSuffix-lenTrim]

				result[len(result)-1] = 'e'
			} else if c := subSlice[len(subSlice)-1]; 'l' != c && 's' != c && 'z' != c && hasRepeatDoubleConsonantSuffix(subSlice) {
				lenTrim := 1

				lenSubSlice := len(subSlice)

				result = subSlice[:lenSubSlice-lenTrim]
			} else if c := subSlice[len(subSlice)-1]; 1 == measure(subSlice) && hasConsonantVowelConsonantSuffix(subSlice) && 'w' != c && 'x' != c && 'y' != c {
				lenTrim := -1

				result = s[:lenS-lenSuffix-lenTrim]

				result[len(result)-1] = 'e'
			} else {
				result = subSlice
			}

		}
	}

	// Return.
	return result
}

func step1c(s []rune) []rune {

	// Initialize.
	lenS := len(s)

	result := s

	// Do it!
	if 2 > lenS {
		/////////// RETURN
		return result
	}

	if 'y' == s[lenS-1] && containsVowel(s[:lenS-1]) {

		result[lenS-1] = 'i'

	} else if 'Y' == s[lenS-1] && containsVowel(s[:lenS-1]) {

		result[lenS-1] = 'I'

	}

	// Return.
	return result
}

func step2(s []rune) []rune {

	// Initialize.
	lenS := len(s)

	result := s

	// Do it!
	if suffix := []rune("ational"); hasSuffix(s, suffix) {
		if 0 < measure(s[:lenS-len(suffix)]) {
			result[lenS-5] = 'e'
			result = result[:lenS-4]
		}
	} else if suffix := []rune("tional"); hasSuffix(s, suffix) {
		if 0 < measure(s[:lenS-len(suffix)]) {
			result = result[:lenS-2]
		}
	} else if suffix := []rune("enci"); hasSuffix(s, suffix) {
		if 0 < measure(s[:lenS-len(suffix)]) {
			result[lenS-1] = 'e'
		}
	} else if suffix := []rune("anci"); hasSuffix(s, suffix) {
		if 0 < measure(s[:lenS-len(suffix)]) {
			result[lenS-1] = 'e'
		}
	} else if suffix := []rune("izer"); hasSuffix(s, suffix) {
		if 0 < measure(s[:lenS-len(suffix)]) {
			result = s[:lenS-1]
		}
	} else if suffix := []rune("bli"); hasSuffix(s, suffix) { // --DEPARTURE--
		// } else if suffix := []rune("abli") ; hasSuffix(s, suffix) {
		if 0 < measure(s[:lenS-len(suffix)]) {
			result[lenS-1] = 'e'
		}
	} else if suffix := []rune("alli"); hasSuffix(s, suffix) {
		if 0 < measure(s[:lenS-len(suffix)]) {
			result = s[:lenS-2]
		}
	} else if suffix := []rune("entli"); hasSuffix(s, suffix) {
		if 0 < measure(s[:lenS-len(suffix)]) {
			result = s[:lenS-2]
		}
	} else if suffix := []rune("eli"); hasSuffix(s, suffix) {
		if 0 < measure(s[:lenS-len(suffix)]) {
			result = s[:lenS-2]
		}
	} else if suffix := []rune("ousli"); hasSuffix(s, suffix) {
		if 0 < measure(s[:lenS-len(suffix)]) {
			result = s[:lenS-2]
		}
	} else if suffix := []rune("ization"); hasSuffix(s, suffix) {
		if 0 < measure(s[:lenS-len(suffix)]) {
			result[lenS-5] = 'e'

			result = s[:lenS-4]
		}
	} else if suffix := []rune("ation"); hasSuffix(s, suffix) {
		if 0 < measure(s[:lenS-len(suffix)]) {
			result[lenS-3] = 'e'

			result = s[:lenS-2]
		}
	} else if suffix := []rune("ator"); hasSuffix(s, suffix) {
		if 0 < measure(s[:lenS-len(suffix)]) {
			result[lenS-2] = 'e'

			result = s[:lenS-1]
		}
	} else if suffix := []rune("alism"); hasSuffix(s, suffix) {
		if 0 < measure(s[:lenS-len(suffix)]) {
			result = s[:lenS-3]
		}
	} else if suffix := []rune("iveness"); hasSuffix(s, suffix) {
		if 0 < measure(s[:lenS-len(suffix)]) {
			result = s[:lenS-4]
		}
	} else if suffix := []rune("fulness"); hasSuffix(s, suffix) {
		if 0 < measure(s[:lenS-len(suffix)]) {
			result = s[:lenS-4]
		}
	} else if suffix := []rune("ousness"); hasSuffix(s, suffix) {
		if 0 < measure(s[:lenS-len(suffix)]) {
			result = s[:lenS-4]
		}
	} else if suffix := []rune("aliti"); hasSuffix(s, suffix) {
		if 0 < measure(s[:lenS-len(suffix)]) {
			result = s[:lenS-3]
		}
	} else if suffix := []rune("iviti"); hasSuffix(s, suffix) {
		if 0 < measure(s[:lenS-len(suffix)]) {
			result[lenS-3] = 'e'

			result = result[:lenS-2]
		}
	} else if suffix := []rune("biliti"); hasSuffix(s, suffix) {
		if 0 < measure(s[:lenS-len(suffix)]) {
			result[lenS-5] = 'l'
			result[lenS-4] = 'e'

			result = result[:lenS-3]
		}
	} else if suffix := []rune("logi"); hasSuffix(s, suffix) { // --DEPARTURE--
		if 0 < measure(s[:lenS-len(suffix)]) {
			lenTrim := 1

			result = s[:lenS-lenTrim]
		}
	}

	// Return.
	return result
}

func step3(s []rune) []rune {

	// Initialize.
	lenS := len(s)
	result := s

	// Do it!
	if suffix := []rune("icate"); hasSuffix(s, suffix) {
		lenSuffix := len(suffix)

		if 0 < measure(s[:lenS-lenSuffix]) {
			result = result[:lenS-3]
		}
	} else if suffix := []rune("ative"); hasSuffix(s, suffix) {
		lenSuffix := len(suffix)

		subSlice := s[:lenS-lenSuffix]

		m := measure(subSlice)

		if 0 < m {
			result = subSlice
		}
	} else if suffix := []rune("alize"); hasSuffix(s, suffix) {
		lenSuffix := len(suffix)

		if 0 < measure(s[:lenS-lenSuffix]) {
			result = result[:lenS-3]
		}
	} else if suffix := []rune("iciti"); hasSuffix(s, suffix) {
		lenSuffix := len(suffix)

		if 0 < measure(s[:lenS-lenSuffix]) {
			result = result[:lenS-3]
		}
	} else if suffix := []rune("ical"); hasSuffix(s, suffix) {
		lenSuffix := len(suffix)

		if 0 < measure(s[:lenS-lenSuffix]) {
			result = result[:lenS-2]
		}
	} else if suffix := []rune("ful"); hasSuffix(s, suffix) {
		lenSuffix := len(suffix)

		subSlice := s[:lenS-lenSuffix]
||||
|
||||
m := measure(subSlice)
|
||||
|
||||
if 0 < m {
|
||||
result = subSlice
|
||||
}
|
||||
} else if suffix := []rune("ness"); hasSuffix(s, suffix) {
|
||||
lenSuffix := len(suffix)
|
||||
|
||||
subSlice := s[:lenS-lenSuffix]
|
||||
|
||||
m := measure(subSlice)
|
||||
|
||||
if 0 < m {
|
||||
result = subSlice
|
||||
}
|
||||
}
|
||||
|
||||
// Return.
|
||||
return result
|
||||
}
|
||||
|
||||
func step4(s []rune) []rune {
|
||||
|
||||
// Initialize.
|
||||
lenS := len(s)
|
||||
result := s
|
||||
|
||||
// Do it!
|
||||
if suffix := []rune("al"); hasSuffix(s, suffix) {
|
||||
lenSuffix := len(suffix)
|
||||
|
||||
subSlice := s[:lenS-lenSuffix]
|
||||
|
||||
m := measure(subSlice)
|
||||
|
||||
if 1 < m {
|
||||
result = result[:lenS-lenSuffix]
|
||||
}
|
||||
} else if suffix := []rune("ance"); hasSuffix(s, suffix) {
|
||||
lenSuffix := len(suffix)
|
||||
|
||||
subSlice := s[:lenS-lenSuffix]
|
||||
|
||||
m := measure(subSlice)
|
||||
|
||||
if 1 < m {
|
||||
result = result[:lenS-lenSuffix]
|
||||
}
|
||||
} else if suffix := []rune("ence"); hasSuffix(s, suffix) {
|
||||
lenSuffix := len(suffix)
|
||||
|
||||
subSlice := s[:lenS-lenSuffix]
|
||||
|
||||
m := measure(subSlice)
|
||||
|
||||
if 1 < m {
|
||||
result = result[:lenS-lenSuffix]
|
||||
}
|
||||
} else if suffix := []rune("er"); hasSuffix(s, suffix) {
|
||||
lenSuffix := len(suffix)
|
||||
|
||||
subSlice := s[:lenS-lenSuffix]
|
||||
|
||||
m := measure(subSlice)
|
||||
|
||||
if 1 < m {
|
||||
result = subSlice
|
||||
}
|
||||
} else if suffix := []rune("ic"); hasSuffix(s, suffix) {
|
||||
lenSuffix := len(suffix)
|
||||
|
||||
subSlice := s[:lenS-lenSuffix]
|
||||
|
||||
m := measure(subSlice)
|
||||
|
||||
if 1 < m {
|
||||
result = subSlice
|
||||
}
|
||||
} else if suffix := []rune("able"); hasSuffix(s, suffix) {
|
||||
lenSuffix := len(suffix)
|
||||
|
||||
subSlice := s[:lenS-lenSuffix]
|
||||
|
||||
m := measure(subSlice)
|
||||
|
||||
if 1 < m {
|
||||
result = subSlice
|
||||
}
|
||||
} else if suffix := []rune("ible"); hasSuffix(s, suffix) {
|
||||
lenSuffix := len(suffix)
|
||||
|
||||
subSlice := s[:lenS-lenSuffix]
|
||||
|
||||
m := measure(subSlice)
|
||||
|
||||
if 1 < m {
|
||||
result = subSlice
|
||||
}
|
||||
} else if suffix := []rune("ant"); hasSuffix(s, suffix) {
|
||||
lenSuffix := len(suffix)
|
||||
|
||||
subSlice := s[:lenS-lenSuffix]
|
||||
|
||||
m := measure(subSlice)
|
||||
|
||||
if 1 < m {
|
||||
result = subSlice
|
||||
}
|
||||
} else if suffix := []rune("ement"); hasSuffix(s, suffix) {
|
||||
lenSuffix := len(suffix)
|
||||
|
||||
subSlice := s[:lenS-lenSuffix]
|
||||
|
||||
m := measure(subSlice)
|
||||
|
||||
if 1 < m {
|
||||
result = subSlice
|
||||
}
|
||||
} else if suffix := []rune("ment"); hasSuffix(s, suffix) {
|
||||
lenSuffix := len(suffix)
|
||||
|
||||
subSlice := s[:lenS-lenSuffix]
|
||||
|
||||
m := measure(subSlice)
|
||||
|
||||
if 1 < m {
|
||||
result = subSlice
|
||||
}
|
||||
} else if suffix := []rune("ent"); hasSuffix(s, suffix) {
|
||||
lenSuffix := len(suffix)
|
||||
|
||||
subSlice := s[:lenS-lenSuffix]
|
||||
|
||||
m := measure(subSlice)
|
||||
|
||||
if 1 < m {
|
||||
result = subSlice
|
||||
}
|
||||
} else if suffix := []rune("ion"); hasSuffix(s, suffix) {
|
||||
lenSuffix := len(suffix)
|
||||
|
||||
subSlice := s[:lenS-lenSuffix]
|
||||
|
||||
m := measure(subSlice)
|
||||
|
||||
c := subSlice[len(subSlice)-1]
|
||||
|
||||
if 1 < m && ('s' == c || 't' == c) {
|
||||
result = subSlice
|
||||
}
|
||||
} else if suffix := []rune("ou"); hasSuffix(s, suffix) {
|
||||
lenSuffix := len(suffix)
|
||||
|
||||
subSlice := s[:lenS-lenSuffix]
|
||||
|
||||
m := measure(subSlice)
|
||||
|
||||
if 1 < m {
|
||||
result = subSlice
|
||||
}
|
||||
} else if suffix := []rune("ism"); hasSuffix(s, suffix) {
|
||||
lenSuffix := len(suffix)
|
||||
|
||||
subSlice := s[:lenS-lenSuffix]
|
||||
|
||||
m := measure(subSlice)
|
||||
|
||||
if 1 < m {
|
||||
result = subSlice
|
||||
}
|
||||
} else if suffix := []rune("ate"); hasSuffix(s, suffix) {
|
||||
lenSuffix := len(suffix)
|
||||
|
||||
subSlice := s[:lenS-lenSuffix]
|
||||
|
||||
m := measure(subSlice)
|
||||
|
||||
if 1 < m {
|
||||
result = subSlice
|
||||
}
|
||||
} else if suffix := []rune("iti"); hasSuffix(s, suffix) {
|
||||
lenSuffix := len(suffix)
|
||||
|
||||
subSlice := s[:lenS-lenSuffix]
|
||||
|
||||
m := measure(subSlice)
|
||||
|
||||
if 1 < m {
|
||||
result = subSlice
|
||||
}
|
||||
} else if suffix := []rune("ous"); hasSuffix(s, suffix) {
|
||||
lenSuffix := len(suffix)
|
||||
|
||||
subSlice := s[:lenS-lenSuffix]
|
||||
|
||||
m := measure(subSlice)
|
||||
|
||||
if 1 < m {
|
||||
result = subSlice
|
||||
}
|
||||
} else if suffix := []rune("ive"); hasSuffix(s, suffix) {
|
||||
lenSuffix := len(suffix)
|
||||
|
||||
subSlice := s[:lenS-lenSuffix]
|
||||
|
||||
m := measure(subSlice)
|
||||
|
||||
if 1 < m {
|
||||
result = subSlice
|
||||
}
|
||||
} else if suffix := []rune("ize"); hasSuffix(s, suffix) {
|
||||
lenSuffix := len(suffix)
|
||||
|
||||
subSlice := s[:lenS-lenSuffix]
|
||||
|
||||
m := measure(subSlice)
|
||||
|
||||
if 1 < m {
|
||||
result = subSlice
|
||||
}
|
||||
}
|
||||
|
||||
// Return.
|
||||
return result
|
||||
}
|
||||
|
||||
func step5a(s []rune) []rune {
|
||||
|
||||
// Initialize.
|
||||
lenS := len(s)
|
||||
result := s
|
||||
|
||||
// Do it!
|
||||
if 'e' == s[lenS-1] {
|
||||
lenSuffix := 1
|
||||
|
||||
subSlice := s[:lenS-lenSuffix]
|
||||
|
||||
m := measure(subSlice)
|
||||
|
||||
if 1 < m {
|
||||
result = subSlice
|
||||
} else if 1 == m {
|
||||
if c := subSlice[len(subSlice)-1]; !(hasConsonantVowelConsonantSuffix(subSlice) && 'w' != c && 'x' != c && 'y' != c) {
|
||||
result = subSlice
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Return.
|
||||
return result
|
||||
}
|
||||
|
||||
func step5b(s []rune) []rune {
|
||||
|
||||
// Initialize.
|
||||
lenS := len(s)
|
||||
result := s
|
||||
|
||||
// Do it!
|
||||
if 2 < lenS && 'l' == s[lenS-2] && 'l' == s[lenS-1] {
|
||||
|
||||
lenSuffix := 1
|
||||
|
||||
subSlice := s[:lenS-lenSuffix]
|
||||
|
||||
m := measure(subSlice)
|
||||
|
||||
if 1 < m {
|
||||
result = subSlice
|
||||
}
|
||||
}
|
||||
|
||||
// Return.
|
||||
return result
|
||||
}
|
||||
|
||||
func StemString(s string) string {
|
||||
|
||||
// Convert string to []rune
|
||||
runeArr := []rune(s)
|
||||
|
||||
// Stem.
|
||||
runeArr = Stem(runeArr)
|
||||
|
||||
// Convert []rune to string
|
||||
str := string(runeArr)
|
||||
|
||||
// Return.
|
||||
return str
|
||||
}
|
||||
|
||||
func Stem(s []rune) []rune {
|
||||
|
||||
// Initialize.
|
||||
lenS := len(s)
|
||||
|
||||
// Short circuit.
|
||||
if 0 == lenS {
|
||||
/////////// RETURN
|
||||
return s
|
||||
}
|
||||
|
||||
// Make all runes lowercase.
|
||||
for i := 0; i < lenS; i++ {
|
||||
s[i] = unicode.ToLower(s[i])
|
||||
}
|
||||
|
||||
// Stem
|
||||
result := StemWithoutLowerCasing(s)
|
||||
|
||||
// Return.
|
||||
return result
|
||||
}
|
||||
|
||||
func StemWithoutLowerCasing(s []rune) []rune {
|
||||
|
||||
// Initialize.
|
||||
lenS := len(s)
|
||||
|
||||
// Words that are of length 2 or less is already stemmed.
|
||||
// Don't do anything.
|
||||
if 2 >= lenS {
|
||||
/////////// RETURN
|
||||
return s
|
||||
}
|
||||
|
||||
// Stem
|
||||
s = step1a(s)
|
||||
s = step1b(s)
|
||||
s = step1c(s)
|
||||
s = step2(s)
|
||||
s = step3(s)
|
||||
s = step4(s)
|
||||
s = step5a(s)
|
||||
s = step5b(s)
|
||||
|
||||
// Return.
|
||||
return s
|
||||
}
|
|
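
For orientation, `StemString` above is this vendored package's exported entry point (the Zardoz tokenizer calls it as `porterstemmer.StemString`). A minimal usage sketch, with an illustrative input word:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/go-porterstemmer"
)

func main() {
	// Porter-stems a single word: "running" -> "run".
	fmt.Println(porterstemmer.StemString("running"))
}
```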
@@ -0,0 +1,2 @@
language: go
script: go test -race -cpu 1,2,4 -v ./...
@@ -0,0 +1,63 @@
Multibayes
==========

[![Build Status](https://travis-ci.org/lytics/multibayes.svg?branch=master)](https://travis-ci.org/lytics/multibayes) [![GoDoc](https://godoc.org/github.com/lytics/multibayes?status.svg)](https://godoc.org/github.com/lytics/multibayes)

Multiclass naive Bayesian document classification.

Often in document classification, a document may have more than one relevant classification -- a question on [stackoverflow](http://stackoverflow.com) might have tags "go", "map", and "interface".

While multinomial Bayesian classification offers one-of-many classification, multibayes offers tools for many-of-many classification. The multibayes library strives to offer efficient storage and calculation of multiple Bayesian posterior classification probabilities.

## Usage

A new classifier is created with the `NewClassifier` function, and can be trained by adding documents and classes via the `Add` method:

```go
classifier.Add("A new document", []string{"class1", "class2"})
```

Posterior probabilities for a new document are calculated by calling the `Posterior` method:

```go
classifier.Posterior("Another new document")
```

A posterior class probability is returned for each class observed in the training set. The user can then assign classifications according to his or her own heuristics -- for example, by using all classes that yield a posterior probability greater than 0.8, as in the sketch below.
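
A minimal sketch of that thresholding heuristic (the 0.8 cutoff and the variable names are illustrative, not part of the multibayes API):

```go
// accept every class whose posterior probability clears the cutoff
threshold := 0.8

assigned := []string{}
for class, prob := range classifier.Posterior("Another new document") {
	if prob >= threshold {
		assigned = append(assigned, class)
	}
}
```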
## Example

```go
documents := []struct {
	Text    string
	Classes []string
}{
	{
		Text:    "My dog has fleas.",
		Classes: []string{"vet"},
	},
	{
		Text:    "My cat has ebola.",
		Classes: []string{"vet", "cdc"},
	},
	{
		Text:    "Aaron has ebola.",
		Classes: []string{"cdc"},
	},
}

classifier := NewClassifier()
classifier.MinClassSize = 0

// train the classifier
for _, document := range documents {
	classifier.Add(document.Text, document.Classes)
}

// predict new classes
probs := classifier.Posterior("Aaron's dog has fleas.")
fmt.Printf("Posterior Probabilities: %+v\n", probs)

// Posterior Probabilities: map[vet:0.8571 cdc:0.2727]
```
@@ -0,0 +1,9 @@
// Multiclass naive Bayesian document classification.
//
// While multinomial Bayesian classification offers
// one-of-many classification, multibayes offers tools
// for many-of-many classification. The multibayes
// library strives to offer efficient storage and
// calculation of multiple Bayesian posterior classification
// probabilities.
package multibayes
@@ -0,0 +1,66 @@
package multibayes

import (
	"encoding/json"
	"io/ioutil"
)

type jsonableClassifier struct {
	Matrix *sparseMatrix `json:"matrix"`
}

func (c *Classifier) MarshalJSON() ([]byte, error) {
	return json.Marshal(&jsonableClassifier{c.Matrix})
}

func (c *Classifier) UnmarshalJSON(buf []byte) error {
	j := jsonableClassifier{}

	err := json.Unmarshal(buf, &j)
	if err != nil {
		// propagate the error instead of silently succeeding
		return err
	}

	*c = *NewClassifier()
	c.Matrix = j.Matrix

	return nil
}

// Initialize a new classifier from a JSON byte slice.
func NewClassifierFromJSON(buf []byte) (*Classifier, error) {
	classifier := &Classifier{}

	err := classifier.UnmarshalJSON(buf)
	if err != nil {
		return nil, err
	}

	return classifier, nil
}

func LoadClassifierFromFile(filename string) (*Classifier, error) {
	buf, err := ioutil.ReadFile(filename)
	if err != nil {
		return nil, err
	}

	return NewClassifierFromJSON(buf)
}

func (s *sparseColumn) MarshalJSON() ([]byte, error) {
	return json.Marshal(s.Data)
}

func (s *sparseColumn) UnmarshalJSON(buf []byte) error {
	var data []int

	err := json.Unmarshal(buf, &data)
	if err != nil {
		return err
	}

	s.Data = data

	return nil
}
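
The file above ships a loader but no writer; a complementary save helper is easy to sketch (the function name and file mode are assumptions, not part of the library):

```go
// SaveClassifierToFile is a hypothetical helper mirroring
// LoadClassifierFromFile: it serializes the classifier with
// MarshalJSON and writes the bytes to disk.
func SaveClassifierToFile(c *Classifier, filename string) error {
	buf, err := c.MarshalJSON()
	if err != nil {
		return err
	}
	return ioutil.WriteFile(filename, buf, 0644)
}
```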
@@ -0,0 +1,73 @@
package multibayes

type sparseMatrix struct {
	Tokens  map[string]*sparseColumn `json:"tokens"`  // map[token] -> indices of rows containing the token
	Classes map[string]*sparseColumn `json:"classes"` // map[class] -> indices of rows labeled with the class
	N       int                      `json:"n"`       // number of rows currently in the matrix
}

type sparseColumn struct {
	Data []int `json:"data"`
}

func newSparseColumn() *sparseColumn {
	return &sparseColumn{
		Data: make([]int, 0, 1000),
	}
}

func (s *sparseColumn) Add(index int) {
	s.Data = append(s.Data, index)
}

// return the number of rows that contain the column
func (s *sparseColumn) Count() int {
	return len(s.Data)
}

// sparse to dense
func (s *sparseColumn) Expand(n int) []float64 {
	expanded := make([]float64, n)
	for _, index := range s.Data {
		expanded[index] = 1.0
	}
	return expanded
}

func newSparseMatrix() *sparseMatrix {
	return &sparseMatrix{
		Tokens:  make(map[string]*sparseColumn),
		Classes: make(map[string]*sparseColumn),
		N:       0,
	}
}

func (s *sparseMatrix) Add(ngrams []ngram, classes []string) {
	if len(ngrams) == 0 || len(classes) == 0 {
		return
	}
	for _, class := range classes {
		if _, ok := s.Classes[class]; !ok {
			s.Classes[class] = newSparseColumn()
		}

		s.Classes[class].Add(s.N)
	}

	// add ngrams uniquely
	added := make(map[string]int)
	for _, ngram := range ngrams {
		gramString := ngram.String()
		if _, ok := s.Tokens[gramString]; !ok {
			s.Tokens[gramString] = newSparseColumn()
		}

		// only add the document index once for the ngram
		if _, ok := added[gramString]; !ok {
			added[gramString] = 1
			s.Tokens[gramString].Add(s.N)
		}
	}
	// increment the row counter
	s.N++
}
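
To make the sparse storage concrete, a small sketch (it would need to live inside the package, since the types are unexported; the row indices are illustrative):

```go
// A column added to rows 1 and 2 of a three-row matrix expands to a
// dense indicator vector with ones at exactly those rows.
col := newSparseColumn()
col.Add(1)
col.Add(2)

fmt.Println(col.Expand(3)) // [0 1 1]
```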
@@ -0,0 +1,181 @@
package multibayes

var (
	stopbytes = [][]byte{
		[]byte(`i`),
		[]byte(`me`),
		[]byte(`my`),
		[]byte(`myself`),
		[]byte(`we`),
		[]byte(`our`),
		[]byte(`ours`),
		[]byte(`ourselves`),
		[]byte(`you`),
		[]byte(`your`),
		[]byte(`yours`),
		[]byte(`yourself`),
		[]byte(`yourselves`),
		[]byte(`he`),
		[]byte(`him`),
		[]byte(`his`),
		[]byte(`himself`),
		[]byte(`she`),
		[]byte(`her`),
		[]byte(`hers`),
		[]byte(`herself`),
		[]byte(`it`),
		[]byte(`its`),
		[]byte(`itself`),
		[]byte(`they`),
		[]byte(`them`),
		[]byte(`their`),
		[]byte(`theirs`),
		[]byte(`themselves`),
		[]byte(`what`),
		[]byte(`which`),
		[]byte(`who`),
		[]byte(`whom`),
		[]byte(`this`),
		[]byte(`that`),
		[]byte(`these`),
		[]byte(`those`),
		[]byte(`am`),
		[]byte(`is`),
		[]byte(`are`),
		[]byte(`was`),
		[]byte(`were`),
		[]byte(`be`),
		[]byte(`been`),
		[]byte(`being`),
		[]byte(`have`),
		[]byte(`has`),
		[]byte(`had`),
		[]byte(`having`),
		[]byte(`do`),
		[]byte(`does`),
		[]byte(`did`),
		[]byte(`doing`),
		[]byte(`would`),
		[]byte(`should`),
		[]byte(`could`),
		[]byte(`ought`),
		[]byte(`i'm`),
		[]byte(`you're`),
		[]byte(`he's`),
		[]byte(`she's`),
		[]byte(`it's`),
		[]byte(`we're`),
		[]byte(`they're`),
		[]byte(`i've`),
		[]byte(`you've`),
		[]byte(`we've`),
		[]byte(`they've`),
		[]byte(`i'd`),
		[]byte(`you'd`),
		[]byte(`he'd`),
		[]byte(`she'd`),
		[]byte(`we'd`),
		[]byte(`they'd`),
		[]byte(`i'll`),
		[]byte(`you'll`),
		[]byte(`he'll`),
		[]byte(`she'll`),
		[]byte(`we'll`),
		[]byte(`they'll`),
		[]byte(`isn't`),
		[]byte(`aren't`),
		[]byte(`wasn't`),
		[]byte(`weren't`),
		[]byte(`hasn't`),
		[]byte(`haven't`),
		[]byte(`hadn't`),
		[]byte(`doesn't`),
		[]byte(`don't`),
		[]byte(`didn't`),
		[]byte(`won't`),
		[]byte(`wouldn't`),
		[]byte(`shan't`),
		[]byte(`shouldn't`),
		[]byte(`can't`),
		[]byte(`cannot`),
		[]byte(`couldn't`),
		[]byte(`mustn't`),
		[]byte(`let's`),
		[]byte(`that's`),
		[]byte(`who's`),
		[]byte(`what's`),
		[]byte(`here's`),
		[]byte(`there's`),
		[]byte(`when's`),
		[]byte(`where's`),
		[]byte(`why's`),
		[]byte(`how's`),
		[]byte(`a`),
		[]byte(`an`),
		[]byte(`the`),
		[]byte(`and`),
		[]byte(`but`),
		[]byte(`if`),
		[]byte(`or`),
		[]byte(`because`),
		[]byte(`as`),
		[]byte(`until`),
		[]byte(`while`),
		[]byte(`of`),
		[]byte(`at`),
		[]byte(`by`),
		[]byte(`for`),
		[]byte(`with`),
		[]byte(`about`),
		[]byte(`against`),
		[]byte(`between`),
		[]byte(`into`),
		[]byte(`through`),
		[]byte(`during`),
		[]byte(`before`),
		[]byte(`after`),
		[]byte(`above`),
		[]byte(`below`),
		[]byte(`to`),
		[]byte(`from`),
		[]byte(`up`),
		[]byte(`down`),
		[]byte(`in`),
		[]byte(`out`),
		[]byte(`on`),
		[]byte(`off`),
		[]byte(`over`),
		[]byte(`under`),
		[]byte(`again`),
		[]byte(`further`),
		[]byte(`then`),
		[]byte(`once`),
		[]byte(`here`),
		[]byte(`there`),
		[]byte(`when`),
		[]byte(`where`),
		[]byte(`why`),
		[]byte(`how`),
		[]byte(`all`),
		[]byte(`any`),
		[]byte(`both`),
		[]byte(`each`),
		[]byte(`few`),
		[]byte(`more`),
		[]byte(`most`),
		[]byte(`other`),
		[]byte(`some`),
		[]byte(`such`),
		[]byte(`no`),
		[]byte(`nor`),
		[]byte(`not`),
		[]byte(`only`),
		[]byte(`own`),
		[]byte(`same`),
		[]byte(`so`),
		[]byte(`than`),
		[]byte(`too`),
		[]byte(`very`),
		[]byte(`-`),
	}
)
@@ -0,0 +1,33 @@
package multibayes

type document struct {
	Text    string
	Classes []string
}

func getTestData() []document {

	documents := []document{
		{
			Text:    "My dog has fleas.",
			Classes: []string{"vet"},
		},
		{
			Text:    "My cat has ebola.",
			Classes: []string{"vet", "cdc"},
		},
		{
			Text:    "Aaron has ebola.",
			Classes: []string{"cdc"},
		},
	}

	return documents
}

func (c *Classifier) trainWithTestData() {
	testdata := getTestData()
	for _, document := range testdata {
		c.Add(document.Text, document.Classes)
	}
}
@@ -0,0 +1,166 @@
package multibayes

import (
	"bytes"
	"encoding/base64"
	"regexp"
	"strings"

	"github.com/blevesearch/bleve/analysis"
	regexp_tokenizer "github.com/blevesearch/bleve/analysis/tokenizer/regexp"
	"github.com/blevesearch/go-porterstemmer"
)

const (
	tokenSeparator = "_"
)

type ngram struct {
	Tokens [][]byte
}

// joins the tokens for comparison (a base64 encoding is kept
// commented out below as a safer alternative)
func (ng *ngram) String() string {
	encoded := make([]string, len(ng.Tokens))

	for i, token := range ng.Tokens {
		encoded[i] = string(token)
		//encoded[i] = base64.StdEncoding.EncodeToString(token) // safer?
	}

	return strings.Join(encoded, tokenSeparator)
}

func decodeNGram(s string) (*ngram, error) {
	encodedTokens := strings.Split(s, tokenSeparator)

	tokens := make([][]byte, len(encodedTokens))

	var err error
	for i, encodedToken := range encodedTokens {
		tokens[i], err = base64.StdEncoding.DecodeString(encodedToken)
		if err != nil {
			return nil, err
		}
	}
	return &ngram{tokens}, nil
}

type tokenizerConf struct {
	regexp    *regexp.Regexp
	NGramSize int64
}

type tokenizer struct {
	regexp_tokenizer.RegexpTokenizer
	Conf *tokenizerConf
}

func validateConf(tc *tokenizerConf) {
	tc.regexp = regexp.MustCompile(`[0-9A-z_'\-]+|\%|\$`)

	// TODO: We force NGramSize = 1 so as to create disjoint ngrams,
	// which is necessary for the naive assumption of conditional
	// independence among tokens. It would be great to allow ngrams
	// to be greater than 1 and select only disjoint ngrams from the
	// tokenizer.
	tc.NGramSize = 1
}

func newTokenizer(tc *tokenizerConf) (*tokenizer, error) {
	validateConf(tc)

	return &tokenizer{*regexp_tokenizer.NewRegexpTokenizer(tc.regexp), tc}, nil
}

// Tokenize and Gramify
func (t *tokenizer) Parse(doc string) []ngram {
	// maybe use token types for datetimes or something instead of
	// the actual byte slice
	alltokens := t.Tokenize([]byte(strings.ToLower(doc)))
	filtered := make(map[int][]byte)
	for i, token := range alltokens {
		exclude := false
		for _, stop := range stopbytes {
			if bytes.Equal(token.Term, stop) {
				exclude = true
				break
			}
		}

		if exclude {
			continue
		}

		tokenString := porterstemmer.StemString(string(token.Term))
		//tokenBytes := porterstemmer.Stem(token.Term) // takes runes, not bytes

		if token.Type == analysis.Numeric {
			tokenString = "NUMBER"
		} else if token.Type == analysis.DateTime {
			tokenString = "DATE"
		}

		filtered[i] = []byte(tokenString)
	}

	// only consider sequential terms as candidates for ngrams;
	// terms separated by stopwords are ineligible
	allNGrams := make([]ngram, 0, 100)
	currentTokens := make([][]byte, 0, 100)

	// walk positions in ascending order: ranging over the filtered
	// map directly would visit tokens in random order and break the
	// sequential grouping below
	lastObserved := -1
	for i := 0; i < len(alltokens); i++ {
		token, ok := filtered[i]
		if !ok {
			continue
		}

		if (i - 1) != lastObserved {

			ngrams := t.tokensToNGrams(currentTokens)
			allNGrams = append(allNGrams, ngrams...)

			currentTokens = make([][]byte, 0, 100)
		}

		currentTokens = append(currentTokens, token)
		lastObserved = i
	}

	// bring in the last one
	if len(currentTokens) > 0 {
		ngrams := t.tokensToNGrams(currentTokens)
		allNGrams = append(allNGrams, ngrams...)
	}

	return allNGrams
}

func (t *tokenizer) tokensToNGrams(tokens [][]byte) []ngram {
	nTokens := int64(len(tokens))

	nNGrams := int64(0)
	for i := int64(1); i <= t.Conf.NGramSize; i++ {
		chosen := choose(nTokens, i)
		nNGrams += chosen
	}

	ngrams := make([]ngram, 0, nNGrams)
	for ngramSize := int64(1); ngramSize <= t.Conf.NGramSize; ngramSize++ {
		nNGramsOfSize := choose(nTokens, ngramSize)

		for i := int64(0); i < nNGramsOfSize; i++ {
			ngrams = append(ngrams, ngram{tokens[i:(i + ngramSize)]})
		}
	}

	return ngrams
}

// not a binomial coefficient -- combinations must be sequential
func choose(n, k int64) int64 {
	return max(n-k+int64(1), 0)
}

func max(x, y int64) int64 {
	if x > y {
		return x
	}
	return y
}
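
A quick worked check of the sequential `choose` above, which counts sliding windows rather than true combinations (assuming an `fmt` import if run inside this package):

```go
// 4 tokens, window size 2: the windows are tokens[0:2], tokens[1:3],
// and tokens[2:4], so choose returns 4-2+1 = 3. A binomial C(4,2)
// would count 6, including non-adjacent pairs that are never emitted.
fmt.Println(choose(4, 2)) // 3
fmt.Println(choose(2, 4)) // 0: too few tokens to form any 4-gram
```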
@@ -0,0 +1,16 @@
# github.com/blevesearch/bleve v0.8.1
github.com/blevesearch/bleve/analysis
github.com/blevesearch/bleve/analysis/tokenizer/regexp
github.com/blevesearch/bleve/document
github.com/blevesearch/bleve/geo
github.com/blevesearch/bleve/index
github.com/blevesearch/bleve/index/store
github.com/blevesearch/bleve/numeric
github.com/blevesearch/bleve/registry
github.com/blevesearch/bleve/search
github.com/blevesearch/bleve/search/highlight
github.com/blevesearch/bleve/size
# github.com/blevesearch/go-porterstemmer v1.0.2
github.com/blevesearch/go-porterstemmer
# github.com/lytics/multibayes v0.0.0-20161108162840-3457a5582021
github.com/lytics/multibayes
@@ -0,0 +1,4 @@
guns
cors
/favicon.ico
favicon
@@ -0,0 +1,26 @@
package main

import (
	"log"
	"runtime"
	"time"
)

func init() {

	log.Println("Garbage Collector Thread Starting")

	go memoryCleanerThread()

}

func memoryCleanerThread() {

	for {
		time.Sleep(10 * time.Minute)
		log.Println("Time to clean memory...")
		runtime.GC()
		log.Println("Garbage Collection done.")
	}

}
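
`runtime.GC()` forces a collection but does not necessarily hand freed pages back to the operating system. If that were the goal, a variant of this loop could call `runtime/debug.FreeOSMemory` instead; this is a sketch of an alternative, not what the commit does:

```go
package main

import (
	"log"
	"runtime/debug"
	"time"
)

// Variant cleaner: FreeOSMemory forces a garbage collection and then
// returns as much memory to the operating system as possible.
func memoryCleanerThread() {
	for {
		time.Sleep(10 * time.Minute)
		log.Println("Time to clean memory...")
		debug.FreeOSMemory()
		log.Println("Garbage Collection done.")
	}
}
```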