Initial commit

master 1.0.0
LowEel 2020-10-08 21:53:05 +02:00
parent b50c929081
commit abf901d932
90 changed files with 9854 additions and 0 deletions

6
.gitignore vendored Normal file
View File

@ -0,0 +1,6 @@
zardoz
bayes.*
/logs
logs/*
binaries/*
binaries

3
.vscode/settings.json vendored Normal file
View File

@ -0,0 +1,3 @@
{
"go.inferGopath": false
}

15
LICENSE Normal file
View File

@ -0,0 +1,15 @@
Zardoz: a lightweight WAF, based on Pseudo-Bayes machine learning.
Copyright (C) 2020 loweel@keinpfusch.net
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.

122
README.md Normal file
View File

@ -0,0 +1,122 @@
# Zardoz: a lightweight WAF, based on Pseudo-Bayes machine learning.
Zardoz is a small WAF, aiming to block HTTP calls that are well known to end in some HTTP error. It behaves like a reverse proxy, running as a frontend: it intercepts the calls, forwards them when needed, and learns from the status code how the server reacts.
After a while, the Bayesian classifier is able to tell a "good" HTTP call from a bad one, based on the header contents.
It is designed to consume little memory and CPU, so that you don't need powerful servers to keep it running, nor does it introduce high latency on the web server.
## STATUS:
This is just an experiment I'm doing with Pseudo-Bayes classifiers. It works pretty well on my blog. Run it in production at your own risk.
## Compiling:
Requirements:
- golang >= 1.12.9
build:
```bash
git clone https://git.keinpfusch.net/LowEel/zardoz
cd zardoz
go build
```
## Starting:
Zardoz has no configuration file; it is configured entirely through environment variables.
In a Dockerfile, this maps to:
```bash
ENV REVERSEURL http://10.0.1.1:3000
ENV PROXYPORT :17000
ENV TRIGGER 0.6
ENV SENIORITY 1025
ENV DEBUG false
ENV DUMPFILE /somewhere/bayes.txt
ENV COLLECTION 2048
```
Using a bash script, this means something like:
```bash
export REVERSEURL=http://10.0.1.1:3000
export PROXYPORT=":17000"
export TRIGGER="0.6"
export SENIORITY="1025"
export DEBUG="true"
export DUMPFILE="/somewhere/bayes.txt"
export COLLECTION="2048"
./zardoz
```
## Understanding Configuration:
**REVERSEURL** is the server Zardoz will act as a reverse proxy for. This maps to the IP and port of the server you want to protect.
**PROXYPORT** is the IP and PORT where Zardoz will listen. If you want Zardoz to listen on all interfaces, just write the port, like ":17000": it will then listen on every interface at port 17000.
**TRIGGER**: this is one of the trickiest parts. We can describe the behavior of Zardoz in quadrants:
| - | BAD > GOOD | BAD < GOOD |
| ------------------------------- | ----------- | ---------- |
| **\| GOOD - BAD \| > TRIGGER** | BLOCK | PASS |
| **\| GOOD - BAD \| <= TRIGGER** | BLOCK+LEARN | PASS+LEARN |
The value of TRIGGER can go from 0 to 1, like "0.5" or "0.6". The difference between BLOCK without learning and BLOCK with learning is execution time. From the point of view of user experience nothing changes (the user is blocked), but in the case of "block+learn" the machine will also try to learn the lesson.
Basically, if GOOD and BAD are very far apart, the "likelihood" is very high, so BLOCK and PASS are applied strictly.
If the likelihood is less than TRIGGER, we aren't sure the prediction is good, so Zardoz still executes the PASS or BLOCK, but it waits for the response and learns from it. To summarize, the concept is "likelihood", which makes the difference between an action and the same action + LEARN.
Personally I've had good results putting the trigger at 0.6: it doesn't disturb users much, and at the same time it has filtered tons of malicious scans.
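As a reference, here is a minimal, self-contained sketch of that decision logic in Go; it mirrors the `quadrant()` function in `handler.go`, though `decide` and its demo values are illustrative, not part of the actual code:

```go
package main

import (
	"fmt"
	"math"
)

// decide sketches the quadrant table above: when the distance between
// the two scores reaches the trigger, act strictly; otherwise act and
// keep learning. Mirrors quadrant() in handler.go.
func decide(p map[string]float64, trigger float64) string {
	sure := math.Abs(p["BAD"]-p["GOOD"]) >= trigger
	badish := p["BAD"] > p["GOOD"]
	switch {
	case sure && badish:
		return "BLOCK"
	case sure:
		return "PASS"
	case badish:
		return "BLOCKLEARN"
	default:
		return "PASSLEARN"
	}
}

func main() {
	fmt.Println(decide(map[string]float64{"GOOD": 0.9, "BAD": 0.1}, 0.6)) // PASS
	fmt.Println(decide(map[string]float64{"GOOD": 0.6, "BAD": 0.4}, 0.6)) // PASSLEARN
}
```

In the real code, the seniority check described below runs before this quadrant logic.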
**SENIORITY**: since Zardoz will learn what is good for your web server, it takes time to gain seniority. Starting Zardoz empty and leaving it to decide would generate some terrible behavior, because of false positives and false negatives. Plus, at the beginning Zardoz is supposed to ALWAYS learn.
The parameter "SENIORITY" is the amount of requests Zardoz will keep in "PASS+LEARN" before the filtering starts. During this time it learns from real traffic, and it blocks no traffic until "seniority" is reached. If you set it to 1025, it will learn from 1025 requests and then start to actually filter the requests. The right number depends on many factors: if you serve a lot of pages and a lot of content, I suggest increasing it.
**DUMPFILE**
This is where you want the dumpfile to be saved. Useful with Docker volumes.
**COLLECTION**
The amount of collected tokens considered enough to do a good job. This depends on your service. It is useful, for example, to limit memory usage if your server has very complex content.
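To give an idea of how COLLECTION bounds memory, here is a sketch of the map swap performed by `updateLearners()` in `matrix.go`; the `swapIfFull` helper and its sample tokens are invented for illustration:

```go
package main

import "fmt"

// swapIfFull sketches the swap in updateLearners() (matrix.go): when the
// learning map exceeds the COLLECTION limit, promote it to working and
// start learning from scratch, bounding memory usage.
func swapIfFull(working, learning map[string]string, collection float64) (map[string]string, map[string]string) {
	if float64(len(learning)) > collection {
		return learning, make(map[string]string)
	}
	return working, learning
}

func main() {
	working := map[string]string{}
	learning := map[string]string{"scanner": "BAD", "blog": "GOOD", "admin": "MEH"}
	working, learning = swapIfFull(working, learning, 2)
	fmt.Println(len(working), len(learning)) // 3 0
}
```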
**TROUBLESHOOTING:**
Once per minute, Zardoz dumps the sparse matrix describing the whole Bayesian learning into a file named bayes.json (or the path set in DUMPFILE). This contains the weighted matrix of calls and classes. If Zardoz is not behaving like you expected, you may take a look at this file. The format is a classic sparse matrix. WARNING: this file **may** contain cookies or other sensitive headers.
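For orientation, the dump written by `saveBayesToFile()` in `file.go` is three labelled, indented JSON objects concatenated; the token names and counters below are invented for illustration:

```
STATS: {
 "PASSLEARN": 42,
 "BLOCK": 3
}
 WORKING: {
 "scanner": "BAD",
 "blog": "GOOD"
}
 LEARNING: {
 "wallet": "BAD"
}
```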
DEBUG: if set to "true", Zardoz will create a "logs" folder and log what happens, together with the dump of the sparse matrix. If set to "false" or not set, only the sparse matrix will be available on disk for post-mortem.
**CREDIT**
Credits for the Bayesian implementation go to Jake Brukhman: https://github.com/jbrukh/bayesian
## TODO:
- [ ] Loading Bayesian data from file.
- [X] Better Logging
- [ ] Configurable block message.
- [ ] Usage Statistics/Metrics sent to influxDB/prometheus/whatever

51
alloc.go Normal file
View File

@ -0,0 +1,51 @@
package main
import (
"fmt"
"log"
"net/http"
"time"
)
//HTTPFlow is a type containing all the data we need.
type HTTPFlow struct {
request *http.Request
response *http.Response
sensitivity float64 // value that triggers decisions
seniority int64
collection float64
refreshtime time.Duration
}
//DebugLog tells if logs are in debug mode or not
var DebugLog bool
//ProxyFlow represents our flow
var ProxyFlow HTTPFlow
//ZClassifier is our bayesian classifier
var ZClassifier *ByClassifier
//BlockMessage is the message we return when blocking
var BlockMessage string
//Maturity is the minimal amount of requests needed to say Zardoz has learnt enough
var Maturity int64
func init() {
ZClassifier = new(ByClassifier)
ZClassifier.enroll()
ProxyFlow.sensitivity = 0.5
ProxyFlow.seniority = 0
bl, err := Asset("assets/message.txt")
if err != nil {
log.Println("Cannot marshal asset error message!!")
BlockMessage = ""
} else {
BlockMessage = fmt.Sprintf("%s", bl)
}
}

40
assets/message.txt Normal file

File diff suppressed because one or more lines are too long

237
bindata.go Normal file

File diff suppressed because one or more lines are too long

6
blacklist.txt Normal file
View File

@ -0,0 +1,6 @@
penis
wallet
/.well-known/host-meta
/.well-known/host-meta/
/.well-known/nodeinfo

24
build.sh Executable file
View File

@ -0,0 +1,24 @@
#!/bin/bash
rm ./zardoz
GOOS=linux GOARCH=arm64 CGO_ENABLED=0 go build -mod=vendor
file zardoz
mv ./zardoz ./binaries/arm64/zardoz
tar -cvzf ./binaries/tgz/zardoz_arm64.tgz -C ./binaries/arm64 . --owner=0 --group=0
GOOS=linux GOARCH=arm CGO_ENABLED=0 GOARM=7 go build -mod=vendor
file zardoz
mv ./zardoz ./binaries/armv7/zardoz
tar -cvzf ./binaries/tgz/zardoz_armv7.tgz -C ./binaries/armv7 . --owner=0 --group=0
GOOS=linux GOARCH=mips CGO_ENABLED=0 go build -mod=vendor
file zardoz
mv ./zardoz ./binaries/mips32/zardoz
tar -cvzf ./binaries/tgz/zardoz_mips32.tgz -C ./binaries/mips32 . --owner=0 --group=0
GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -mod=vendor
file zardoz
mv ./zardoz ./binaries/amd64/zardoz
tar -cvzf ./binaries/tgz/zardoz_amd64.tgz -C ./binaries/amd64 . --owner=0 --group=0

147
classifier.go Normal file
View File

@ -0,0 +1,147 @@
package main
import (
"bytes"
"fmt"
"io/ioutil"
"log"
"net"
"net/http"
)
//Zexpressions is the set of regexps used by zardoz
var Zexpressions = []string{
`[[:alpha:]]{4,32}`, // alpha digit token
`[ ]([A-Za-z0-9-_]{4,}\.)+\w+`, // domain name
`[ ]/[A-Za-z0-9-_/.]{4,}[ ]`, // URI path (also partial)
`[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}`, // IP address
`[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}`, // UUID
}
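// passAndLearn is set as the proxy's ModifyResponse hook when a request
// is passed: it counts the request toward seniority, rewrites origin
// error responses into the 403 block page and files them as BAD, files
// successful responses as GOOD, and skips 401s so credentials are
// never stored.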
func passAndLearn(resp *http.Response) error {
ProxyFlow.response = resp
ProxyFlow.seniority++
req := ProxyFlow.request
switch {
case isAuth(resp):
log.Println("401: We don't want to store credentials")
case isError(resp):
buf := bytes.NewBufferString(BlockMessage)
resp.Body = ioutil.NopCloser(buf)
resp.Status = "403 Forbidden"
resp.StatusCode = 403
resp.Header["Content-Length"] = []string{fmt.Sprint(buf.Len())}
resp.Header.Set("Content-Encoding", "none")
resp.Header.Set("Cache-Control", "no-cache, no-store")
log.Println("Filing inside bad class")
feedRequest(req, "BAD")
ControPlane.StatsTokens <- "DOWNGRADE"
case isSuccess(resp):
log.Println("Filing inside Good Class: ", resp.StatusCode)
feedRequest(req, "GOOD")
}
return nil
}
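// blockAndlearn is set as ModifyResponse when a request is blocked: the
// response is always replaced with the 403 block page, while the origin's
// real status code still teaches the classifier (errors file as BAD,
// successes as GOOD).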
func blockAndlearn(resp *http.Response) error {
ProxyFlow.response = resp
ProxyFlow.seniority++
req := ProxyFlow.request
buf := bytes.NewBufferString(BlockMessage)
resp.Body = ioutil.NopCloser(buf)
resp.Status = "403 Forbidden"
resp.StatusCode = 403
resp.Header["Content-Length"] = []string{fmt.Sprint(buf.Len())}
resp.Header.Set("Content-Encoding", "none")
resp.Header.Set("Cache-Control", "no-cache, no-store")
switch {
case isAuth(resp):
log.Println("401: We don't want to store credentials")
case isError(resp):
log.Println("Filing inside bad class")
feedRequest(req, "BAD")
case isSuccess(resp):
log.Println("Filing inside Good Class: ", resp.StatusCode)
ControPlane.StatsTokens <- "UPGRADED"
feedRequest(req, "GOOD")
}
return nil
}
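// feedRequest pushes the request's source IP onto the BAD or GOOD
// control-plane channel, where the classifier goroutines pick it up.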
func feedRequest(req *http.Request, class string) {
feed := SourceIP(req)
// feed := formatRequest(req)
if class == "BAD" {
log.Println("Feeding BAD token: ", feed)
ControPlane.BadTokens <- feed
}
if class == "GOOD" {
log.Println("Feeding GOOD Token:", feed)
ControPlane.GoodTokens <- feed
}
}
//Unique returns the unique elements of a string slice
func Unique(slice []string) []string {
// create a map with all the values as key
uniqMap := make(map[string]struct{})
for _, v := range slice {
uniqMap[v] = struct{}{}
}
// turn the map keys into a slice
uniqSlice := make([]string, 0, len(uniqMap))
for v := range uniqMap {
uniqSlice = append(uniqSlice, v)
}
return uniqSlice
}
func isSuccess(resp *http.Response) bool {
return resp.StatusCode <= 299
}
func isAuth(resp *http.Response) bool {
return resp.StatusCode == 401
}
func isError(resp *http.Response) bool {
return resp.StatusCode >= 400 && resp.StatusCode != 401
}
//SourceIP returns the source IP of an HTTP call
func SourceIP(req *http.Request) string {
var feed string
if feed = req.Header.Get("X-Forwarded-For"); feed != "" {
log.Println("Got X-Forwarded-For: " + feed)
} else {
feed, _, _ = net.SplitHostPort(req.RemoteAddr)
log.Println("NO X-Forwarded-For, using: "+feed+" from ", req.RemoteAddr)
}
return feed
}

93
file.go Normal file
View File

@ -0,0 +1,93 @@
package main
import (
"encoding/json"
"fmt"
"io"
"log"
"os"
"time"
)
// writeToFile writes a string of text to a file safely by
// checking for errors and syncing at the end.
func writeToFile(filename string, data string) error {
file, err := os.Create(filename)
if err != nil {
return err
}
defer file.Close()
_, err = io.WriteString(file, data)
if err != nil {
return err
}
return file.Sync()
}
func handlepanic() {
if a := recover(); a != nil {
fmt.Println("OPS!: Recovering from:", a)
}
}
func saveBayesToFile() {
log.Println("Trying to write json file")
defer handlepanic()
dumpfile := os.Getenv("DUMPFILE")
if dumpfile == "" {
dumpfile = "bayes.json"
}
ZClassifier.STATS.busy.Lock()
defer ZClassifier.STATS.busy.Unlock()
statsREPORT, err := json.MarshalIndent(ZClassifier.STATS.stats, "", " ")
if err != nil {
statsREPORT = []byte(err.Error())
}
ZClassifier.Working.busy.Lock()
defer ZClassifier.Working.busy.Unlock()
wScores, err := json.MarshalIndent(ZClassifier.Working.sMap, "", " ")
if err != nil {
wScores = []byte(err.Error())
}
ZClassifier.Learning.busy.Lock()
defer ZClassifier.Learning.busy.Unlock()
lScores, err := json.MarshalIndent(ZClassifier.Learning.sMap, "", " ")
if err != nil {
lScores = []byte(err.Error())
}
report := fmt.Sprintf("STATS: %s\n WORKING: %s\n LEARNING: %s\n", statsREPORT, wScores, lScores)
writeToFile(dumpfile, report)
log.Println(report)
}
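// jsonEngine dumps the classifier state to disk once per minute,
// logging the current seniority each round.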
func jsonEngine() {
for {
log.Println("Zardoz Seniority: ", ProxyFlow.seniority)
saveBayesToFile()
time.Sleep(1 * time.Minute)
}
}
func init() {
log.Printf("File Engine Starting")
go jsonEngine()
log.Printf("FIle Engine Started")
}

11
go.mod Normal file
View File

@ -0,0 +1,11 @@
module zardoz
go 1.13
require (
github.com/blevesearch/bleve v0.8.1 // indirect
github.com/blevesearch/go-porterstemmer v1.0.2 // indirect
github.com/go-bindata/go-bindata v3.1.2+incompatible // indirect
github.com/jteeuwen/go-bindata v3.0.7+incompatible // indirect
github.com/lytics/multibayes v0.0.0-20161108162840-3457a5582021
)

10
go.sum Normal file
View File

@ -0,0 +1,10 @@
github.com/blevesearch/bleve v0.8.1 h1:20zBREtGe8dvBxCC+717SaxKcUVQOWk3/Fm75vabKpU=
github.com/blevesearch/bleve v0.8.1/go.mod h1:Y2lmIkzV6mcNfAnAdOd+ZxHkHchhBfU/xroGIp61wfw=
github.com/blevesearch/go-porterstemmer v1.0.2 h1:qe7n69gBd1OLY5sHKnxQHIbzn0LNJA4hpAf+5XDxV2I=
github.com/blevesearch/go-porterstemmer v1.0.2/go.mod h1:haWQqFT3RdOGz7PJuM3or/pWNJS1pKkoZJWCkWu0DVA=
github.com/go-bindata/go-bindata v3.1.2+incompatible h1:5vjJMVhowQdPzjE1LdxyFF7YFTXg5IgGVW4gBr5IbvE=
github.com/go-bindata/go-bindata v3.1.2+incompatible/go.mod h1:xK8Dsgwmeed+BBsSy2XTopBn/8uK2HWuGSnA11C3Joo=
github.com/jteeuwen/go-bindata v3.0.7+incompatible h1:91Uy4d9SYVr1kyTJ15wJsog+esAZZl7JmEfTkwmhJts=
github.com/jteeuwen/go-bindata v3.0.7+incompatible/go.mod h1:JVvhzYOiGBnFSYRyV00iY8q7/0PThjIYav1p9h5dmKs=
github.com/lytics/multibayes v0.0.0-20161108162840-3457a5582021 h1:J9Pk5h7TJlqMQtcINI23BUa0+bbxRXPMf7r8gAlfNxo=
github.com/lytics/multibayes v0.0.0-20161108162840-3457a5582021/go.mod h1:lXjTNxya7kn6QNxA3fW8WGYQq0KL/SUcPE9AwcPSgwI=

80
handler.go Normal file
View File

@ -0,0 +1,80 @@
package main
import (
"fmt"
"log"
"math"
"net/http"
"net/http/httputil"
)
func handler(p *httputil.ReverseProxy) func(http.ResponseWriter, *http.Request) {
return func(w http.ResponseWriter, r *http.Request) {
//put the request inside our structure
ProxyFlow.request = r
log.Println("Received HTTP Request")
probs := ZClassifier.Posterior(SourceIP(r))
log.Printf("Posterior Probabilities: %+v\n", probs)
action := quadrant(probs)
ControPlane.StatsTokens <- action
switch action {
case "BLOCK", "BLOCKLEARN":
p.ModifyResponse = blockAndlearn
w.Header().Set("Probabilities", fmt.Sprintf("%v ", probs))
log.Println("Request Blocked")
p.ServeHTTP(w, r)
case "PASS", "PASSLEARN":
p.ModifyResponse = passAndLearn
w.Header().Set("Probabilities", fmt.Sprintf("%v ", probs))
p.ServeHTTP(w, r)
log.Println("Passing Request")
default:
log.Println("No Decision: PASS and LEARN")
p.ModifyResponse = passAndLearn
w.Header().Set("Probabilities", fmt.Sprintf("%v ", probs))
p.ServeHTTP(w, r)
}
}
}
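// quadrant implements the decision table from the README: below Maturity
// it always returns PASSLEARN; otherwise it returns a strict PASS/BLOCK
// when the distance between the class scores reaches the trigger, and
// PASSLEARN/BLOCKLEARN when it does not.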
func quadrant(p map[string]float64) string {
sure := math.Abs(p["BAD"]-p["GOOD"]) >= ProxyFlow.sensitivity
badish := p["BAD"] > p["GOOD"]
goodish := p["GOOD"] > p["BAD"]
if ProxyFlow.seniority < Maturity {
log.Println("Seniority too low. Waiting.")
return "PASSLEARN"
}
if sure {
if goodish {
return "PASS"
}
if badish {
return "BLOCK"
}
} else {
if goodish {
return "PASSLEARN"
}
if badish {
return "BLOCKLEARN"
}
}
return "PASSLEARN"
}

125
log.go Normal file
View File

@ -0,0 +1,125 @@
package main
import (
"io/ioutil"
"log"
"os"
"path/filepath"
"time"
)
//Zardozlogfile defines the log structure
type Zardozlogfile struct {
filename string
logfile *os.File
active bool
}
//VSlogfile is the logger we use
var VSlogfile Zardozlogfile
func init() {
verbose := os.Getenv("DEBUG")
log.Println("Verbose mode on: ", verbose)
DebugLog = (verbose == "true")
log.Println("DebugLog: ", DebugLog)
log.Println("Starting Log Engine")
// just the first time
var currentFolder = Hpwd()
os.MkdirAll(filepath.Join(currentFolder, "logs"), 0755)
//
VSlogfile.active = DebugLog
VSlogfile.SetLogFolder()
go VSlogfile.RotateLogFolder()
}
//RotateLogFolder rotates the log folder
func (lf *Zardozlogfile) RotateLogFolder() {
for {
time.Sleep(1 * time.Hour)
if lf.logfile != nil {
err := lf.logfile.Close()
log.Println("[TOOLS][LOG] close logfile returned: ", err)
}
lf.SetLogFolder()
}
}
//SetLogFolder sets the log folder
func (lf *Zardozlogfile) SetLogFolder() {
if DebugLog {
lf.EnableLog()
} else {
lf.DisableLog()
}
if lf.active {
const layout = "2006-01-02.15"
orario := time.Now().UTC()
var currentFolder = Hpwd()
lf.filename = filepath.Join(currentFolder, "logs", "Zardoz."+orario.Format(layout)+"00.log")
lf.logfile, _ = os.Create(lf.filename)
log.Println("[TOOLS][LOG] Logfile is: " + lf.filename)
log.SetOutput(lf.logfile)
// log.SetFlags(log.LstdFlags | log.Lshortfile | log.LUTC)
log.SetFlags(log.LstdFlags | log.LUTC)
} else {
log.SetOutput(ioutil.Discard)
}
}
//EnableLog enables logging
func (lf *Zardozlogfile) EnableLog() {
lf.active = true
}
//DisableLog disables logging
func (lf *Zardozlogfile) DisableLog() {
lf.active = false
log.SetFlags(0)
log.SetOutput(ioutil.Discard)
}
//LogEngineStart just triggers the init for the package, and logs it.
func LogEngineStart() {
log.Println("LogRotation Init")
}
//Hpwd behaves like the unix pwd command, returning the current path
func Hpwd() string {
tmpLoc, err := os.Getwd()
if err != nil {
tmpLoc = "/tmp"
log.Printf("[TOOLS][FS] Problem getting unix pwd: %s", err.Error())
}
return tmpLoc
}

53
main.go Normal file
View File

@ -0,0 +1,53 @@
package main
import (
"log"
"net/http"
"net/http/httputil"
"net/url"
"os"
"strconv"
)
func main() {
vip := os.Getenv("REVERSEURL")
pport := os.Getenv("PROXYPORT")
sensitivity := os.Getenv("TRIGGER")
maturity := os.Getenv("SENIORITY")
collect := os.Getenv("COLLECTION")
log.Println("Reverse path is: ", vip)
log.Println("Reverse port is: ", pport)
remote, err := url.Parse(vip)
if err != nil {
panic(err)
}
ProxyFlow.sensitivity, err = strconv.ParseFloat(sensitivity, 64)
if err != nil {
ProxyFlow.sensitivity = 0.5
}
log.Println("Trigger is: ", ProxyFlow.sensitivity)
Maturity, err = strconv.ParseInt(maturity, 10, 64)
if err != nil {
Maturity = 1024
}
log.Println("Minimum request to learn: ", Maturity)
ProxyFlow.collection, err = strconv.ParseFloat(collect, 64)
if err != nil {
// This is because we assume every example should add at least one token
ProxyFlow.collection = float64(Maturity)
}
log.Println("Collection limit is: ", ProxyFlow.collection)
proxy := httputil.NewSingleHostReverseProxy(remote)
http.HandleFunc("/", handler(proxy))
err = http.ListenAndServe(pport, nil)
if err != nil {
panic(err)
}
}

276
matrix.go Normal file
View File

@ -0,0 +1,276 @@
package main
import (
"bufio"
"log"
"os"
"strings"
"sync"
"time"
)
//ByControlPlane contains all the channels we need.
type ByControlPlane struct {
BadTokens chan string
GoodTokens chan string
StatsTokens chan string
}
type safeClassifier struct {
sMap map[string]string
busy sync.Mutex
}
type safeStats struct {
stats map[string]int64
busy sync.Mutex
}
//ControPlane is the variable holding our control-plane channels
var ControPlane ByControlPlane
//ByClassifier is the structure containing our Pseudo-Bayes classifier.
type ByClassifier struct {
STATS safeStats
Learning safeClassifier
Working safeClassifier
Generation int64
}
//AddStats adds the statistics after proper blocking.
func (c *ByClassifier) AddStats(action string) {
c.STATS.busy.Lock()
defer c.STATS.busy.Unlock()
if _, exists := c.STATS.stats[action]; exists {
c.STATS.stats[action]++
} else {
c.STATS.stats[action] = 1
}
}
//IsBAD inserts a bad key in the right place.
func (c *ByClassifier) IsBAD(key string) {
log.Println("BAD Received", key)
k := strings.Fields(key)
c.Learning.busy.Lock()
defer c.Learning.busy.Unlock()
for _, tk := range k {
if kind, exists := c.Learning.sMap[tk]; exists {
switch kind {
case "BAD":
log.Println("Word was known as bad:", tk)
case "GOOD":
c.Learning.sMap[tk] = "MEH"
log.Println("So sad, work was known as good", tk)
case "MEH":
log.Println("Word was known as ambiguos:", tk)
}
} else {
c.Learning.sMap[tk] = "BAD"
}
}
log.Println("BAD Learned", key)
}
//IsGOOD inserts the key in the right place.
func (c *ByClassifier) IsGOOD(key string) {
k := strings.Fields(key)
log.Println("GOOD Received", key)
c.Learning.busy.Lock()
defer c.Learning.busy.Unlock()
for _, tk := range k {
if kind, exists := c.Learning.sMap[tk]; exists {
switch kind {
case "GOOD":
log.Println("Word was known as good: ", tk)
case "BAD":
c.Learning.sMap[tk] = "MEH"
log.Println("So sad, work was known as bad: ", tk)
case "MEH":
log.Println("Word was known as ambiguos: ", tk)
}
} else {
c.Learning.sMap[tk] = "GOOD"
}
}
log.Println("GOOD Learned", key)
}
//Posterior estimates pseudo-probabilities for the GOOD and BAD classes, based on the fraction of known tokens in the header
func (c *ByClassifier) Posterior(hdr string) map[string]float64 {
tokens := strings.Fields(hdr)
ff := make(map[string]float64)
if c.Generation == 0 || len(tokens) == 0 {
ff["BAD"] = 0.5
ff["GOOD"] = 0.5
return ff
}
log.Println("Posterior locking the Working Bayesian")
c.Working.busy.Lock()
defer c.Working.busy.Unlock()
var totalGood, totalBad float64
for _, tk := range tokens {
if kind, exists := c.Working.sMap[tk]; exists {
switch kind {
case "BAD":
totalBad++
case "GOOD":
totalGood++
}
}
}
ff["GOOD"] = 1 - (totalBad / float64(len(tokens)))
ff["BAD"] = 1 - (totalGood / float64(len(tokens)))
return ff
}
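// enroll initializes the control-plane channels and the classifier maps,
// seeds them from blacklist.txt and whitelist.txt, and starts the
// background goroutines that consume tokens and rotate generations.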
func (c *ByClassifier) enroll() {
ControPlane.BadTokens = make(chan string, 2048)
ControPlane.GoodTokens = make(chan string, 2048)
ControPlane.StatsTokens = make(chan string, 2048)
c.Generation = 0
c.Learning.sMap = make(map[string]string)
c.Working.sMap = make(map[string]string)
c.STATS.stats = make(map[string]int64)
c.readInitList("blacklist.txt", "BAD")
c.readInitList("whitelist.txt", "GOOD")
go c.readBadTokens()
go c.readGoodTokens()
go c.readStatsTokens()
go c.updateLearners()
log.Println("Classifier populated...")
}
func (c *ByClassifier) readBadTokens() {
log.Println("Start reading BAD tokens")
for token := range ControPlane.BadTokens {
log.Println("Received BAD Token: ", token)
c.IsBAD(token)
}
}
func (c *ByClassifier) readGoodTokens() {
log.Println("Start reading GOOD tokens")
for token := range ControPlane.GoodTokens {
log.Println("Received GOOD Token: ", token)
c.IsGOOD(token)
}
}
func (c *ByClassifier) readStatsTokens() {
log.Println("Start reading STATS tokens")
for token := range ControPlane.StatsTokens {
c.AddStats(token)
}
}
func (c *ByClassifier) readInitList(filePath, class string) {
inFile, err := os.Open(filePath)
if err != nil {
log.Println(err.Error() + `: ` + filePath)
return
}
defer inFile.Close()
scanner := bufio.NewScanner(inFile)
for scanner.Scan() {
if len(scanner.Text()) > 3 {
switch class {
case "BAD":
log.Println("Loading into Blacklist: ", scanner.Text()) // the line
c.IsBAD(scanner.Text())
case "GOOD":
log.Println("Loading into Whitelist: ", scanner.Text()) // the line
c.IsGOOD(scanner.Text())
}
}
}
}
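// updateLearners checks every 10 seconds whether a new generation has
// been reached (seniority / Maturity) or the learning map has outgrown
// the COLLECTION limit; if so, it promotes the learning map to the
// working map and starts a fresh learning map.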
func (c *ByClassifier) updateLearners() {
log.Println("Bayes Updater Start...")
ticker := time.NewTicker(10 * time.Second)
for ; true; <-ticker.C {
var currentGen int64
log.Println("Maturity is:", Maturity)
log.Println("Seniority is:", ProxyFlow.seniority)
if Maturity > 0 {
currentGen = ProxyFlow.seniority / Maturity
} else {
currentGen = 0
}
log.Println("Current Generation is: ", currentGen)
log.Println("Working Generation is: ", c.Generation)
if currentGen > c.Generation || float64(len(c.Learning.sMap)) > ProxyFlow.collection {
c.Learning.busy.Lock()
c.Working.busy.Lock()
c.Working.sMap = c.Learning.sMap
c.Learning.sMap = make(map[string]string)
c.Generation = currentGen
log.Println("Generation Updated to: ", c.Generation)
ControPlane.StatsTokens <- "GENERATION"
c.Learning.busy.Unlock()
c.Working.busy.Unlock()
}
}
}

8
run.sh Executable file
View File

@ -0,0 +1,8 @@
export REVERSEURL=https://google.com
export PROXYPORT=":8089"
export TRIGGER="0.6"
#export SENIORITY="1025"
export SENIORITY="15"
export DEBUG="true"
export DUMPFILE="bayes.json"
./zardoz

202
vendor/github.com/blevesearch/bleve/LICENSE generated vendored Normal file
View File

@ -0,0 +1,202 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

152
vendor/github.com/blevesearch/bleve/analysis/freq.go generated vendored Normal file
View File

@ -0,0 +1,152 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package analysis
import (
"reflect"
"github.com/blevesearch/bleve/size"
)
var reflectStaticSizeTokenLocation int
var reflectStaticSizeTokenFreq int
func init() {
var tl TokenLocation
reflectStaticSizeTokenLocation = int(reflect.TypeOf(tl).Size())
var tf TokenFreq
reflectStaticSizeTokenFreq = int(reflect.TypeOf(tf).Size())
}
// TokenLocation represents one occurrence of a term at a particular location in
// a field. Start, End and Position have the same meaning as in analysis.Token.
// Field and ArrayPositions identify the field value in the source document.
// See document.Field for details.
type TokenLocation struct {
Field string
ArrayPositions []uint64
Start int
End int
Position int
}
func (tl *TokenLocation) Size() int {
rv := reflectStaticSizeTokenLocation
rv += len(tl.ArrayPositions) * size.SizeOfUint64
return rv
}
// TokenFreq represents all the occurrences of a term in all fields of a
// document.
type TokenFreq struct {
Term []byte
Locations []*TokenLocation
frequency int
}
func (tf *TokenFreq) Size() int {
rv := reflectStaticSizeTokenFreq
rv += len(tf.Term)
for _, loc := range tf.Locations {
rv += loc.Size()
}
return rv
}
func (tf *TokenFreq) Frequency() int {
return tf.frequency
}
// TokenFrequencies maps document terms to their combined frequencies from all
// fields.
type TokenFrequencies map[string]*TokenFreq
func (tfs TokenFrequencies) Size() int {
rv := size.SizeOfMap
rv += len(tfs) * (size.SizeOfString + size.SizeOfPtr)
for k, v := range tfs {
rv += len(k)
rv += v.Size()
}
return rv
}
func (tfs TokenFrequencies) MergeAll(remoteField string, other TokenFrequencies) {
// walk the new token frequencies
for tfk, tf := range other {
// set the remoteField value in incoming token freqs
for _, l := range tf.Locations {
l.Field = remoteField
}
existingTf, exists := tfs[tfk]
if exists {
existingTf.Locations = append(existingTf.Locations, tf.Locations...)
existingTf.frequency = existingTf.frequency + tf.frequency
} else {
tfs[tfk] = &TokenFreq{
Term: tf.Term,
frequency: tf.frequency,
Locations: make([]*TokenLocation, len(tf.Locations)),
}
copy(tfs[tfk].Locations, tf.Locations)
}
}
}
func TokenFrequency(tokens TokenStream, arrayPositions []uint64, includeTermVectors bool) TokenFrequencies {
rv := make(map[string]*TokenFreq, len(tokens))
if includeTermVectors {
tls := make([]TokenLocation, len(tokens))
tlNext := 0
for _, token := range tokens {
tls[tlNext] = TokenLocation{
ArrayPositions: arrayPositions,
Start: token.Start,
End: token.End,
Position: token.Position,
}
curr, ok := rv[string(token.Term)]
if ok {
curr.Locations = append(curr.Locations, &tls[tlNext])
curr.frequency++
} else {
rv[string(token.Term)] = &TokenFreq{
Term: token.Term,
Locations: []*TokenLocation{&tls[tlNext]},
frequency: 1,
}
}
tlNext++
}
} else {
for _, token := range tokens {
curr, exists := rv[string(token.Term)]
if exists {
curr.frequency++
} else {
rv[string(token.Term)] = &TokenFreq{
Term: token.Term,
frequency: 1,
}
}
}
}
return rv
}

View File

@ -0,0 +1,7 @@
# full line comment
marty
steve # trailing comment
| different format of comment
dustin
siri | different style trailing comment
multiple words with different whitespace

View File

@ -0,0 +1,84 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package regexp
import (
"fmt"
"regexp"
"strconv"
"github.com/blevesearch/bleve/analysis"
"github.com/blevesearch/bleve/registry"
)
const Name = "regexp"
var IdeographRegexp = regexp.MustCompile(`\p{Han}|\p{Hangul}|\p{Hiragana}|\p{Katakana}`)
type RegexpTokenizer struct {
r *regexp.Regexp
}
func NewRegexpTokenizer(r *regexp.Regexp) *RegexpTokenizer {
return &RegexpTokenizer{
r: r,
}
}
func (rt *RegexpTokenizer) Tokenize(input []byte) analysis.TokenStream {
matches := rt.r.FindAllIndex(input, -1)
rv := make(analysis.TokenStream, 0, len(matches))
for i, match := range matches {
matchBytes := input[match[0]:match[1]]
if match[1]-match[0] > 0 {
token := analysis.Token{
Term: matchBytes,
Start: match[0],
End: match[1],
Position: i + 1,
Type: detectTokenType(matchBytes),
}
rv = append(rv, &token)
}
}
return rv
}
func RegexpTokenizerConstructor(config map[string]interface{}, cache *registry.Cache) (analysis.Tokenizer, error) {
rval, ok := config["regexp"].(string)
if !ok {
return nil, fmt.Errorf("must specify regexp")
}
r, err := regexp.Compile(rval)
if err != nil {
return nil, fmt.Errorf("unable to build regexp tokenizer: %v", err)
}
return NewRegexpTokenizer(r), nil
}
func init() {
registry.RegisterTokenizer(Name, RegexpTokenizerConstructor)
}
func detectTokenType(termBytes []byte) analysis.TokenType {
if IdeographRegexp.Match(termBytes) {
return analysis.Ideographic
}
_, err := strconv.ParseFloat(string(termBytes), 64)
if err == nil {
return analysis.Numeric
}
return analysis.AlphaNumeric
}

View File

@ -0,0 +1,76 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package analysis
import (
"bufio"
"bytes"
"io"
"io/ioutil"
"strings"
)
type TokenMap map[string]bool
func NewTokenMap() TokenMap {
return make(TokenMap, 0)
}
// LoadFile reads in a list of tokens from a text file,
// one per line.
// Comments are supported using `#` or `|`
func (t TokenMap) LoadFile(filename string) error {
data, err := ioutil.ReadFile(filename)
if err != nil {
return err
}
return t.LoadBytes(data)
}
// LoadBytes reads in a list of tokens from memory,
// one per line.
// Comments are supported using `#` or `|`
func (t TokenMap) LoadBytes(data []byte) error {
bytesReader := bytes.NewReader(data)
bufioReader := bufio.NewReader(bytesReader)
line, err := bufioReader.ReadString('\n')
for err == nil {
t.LoadLine(line)
line, err = bufioReader.ReadString('\n')
}
// if the err was EOF we still need to process the last value
if err == io.EOF {
t.LoadLine(line)
return nil
}
return err
}
func (t TokenMap) LoadLine(line string) {
// find the start of a comment, if any
startComment := strings.IndexAny(line, "#|")
if startComment >= 0 {
line = line[:startComment]
}
tokens := strings.Fields(line)
for _, token := range tokens {
t.AddToken(token)
}
}
func (t TokenMap) AddToken(token string) {
t[token] = true
}

103
vendor/github.com/blevesearch/bleve/analysis/type.go generated vendored Normal file
View File

@ -0,0 +1,103 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package analysis
import (
"fmt"
"time"
)
type CharFilter interface {
Filter([]byte) []byte
}
type TokenType int
const (
AlphaNumeric TokenType = iota
Ideographic
Numeric
DateTime
Shingle
Single
Double
Boolean
)
// Token represents one occurrence of a term at a particular location in a
// field.
type Token struct {
// Start specifies the byte offset of the beginning of the term in the
// field.
Start int `json:"start"`
// End specifies the byte offset of the end of the term in the field.
End int `json:"end"`
Term []byte `json:"term"`
// Position specifies the 1-based index of the token in the sequence of
// occurrences of its term in the field.
Position int `json:"position"`
Type TokenType `json:"type"`
KeyWord bool `json:"keyword"`
}
func (t *Token) String() string {
return fmt.Sprintf("Start: %d End: %d Position: %d Token: %s Type: %d", t.Start, t.End, t.Position, string(t.Term), t.Type)
}
type TokenStream []*Token
// A Tokenizer splits an input string into tokens, the usual behaviour being to
// map words to tokens.
type Tokenizer interface {
Tokenize([]byte) TokenStream
}
// A TokenFilter adds, transforms or removes tokens from a token stream.
type TokenFilter interface {
Filter(TokenStream) TokenStream
}
type Analyzer struct {
CharFilters []CharFilter
Tokenizer Tokenizer
TokenFilters []TokenFilter
}
func (a *Analyzer) Analyze(input []byte) TokenStream {
if a.CharFilters != nil {
for _, cf := range a.CharFilters {
input = cf.Filter(input)
}
}
tokens := a.Tokenizer.Tokenize(input)
if a.TokenFilters != nil {
for _, tf := range a.TokenFilters {
tokens = tf.Filter(tokens)
}
}
return tokens
}
var ErrInvalidDateTime = fmt.Errorf("unable to parse datetime with any of the layouts")
type DateTimeParser interface {
ParseDateTime(string) (time.Time, error)
}
type ByteArrayConverter interface {
Convert([]byte) (interface{}, error)
}

92
vendor/github.com/blevesearch/bleve/analysis/util.go generated vendored Normal file
View File

@ -0,0 +1,92 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package analysis
import (
"bytes"
"unicode/utf8"
)
func DeleteRune(in []rune, pos int) []rune {
if pos >= len(in) {
return in
}
copy(in[pos:], in[pos+1:])
return in[:len(in)-1]
}
func InsertRune(in []rune, pos int, r rune) []rune {
// create a new slice 1 rune larger
rv := make([]rune, len(in)+1)
// copy the characters before the insert pos
copy(rv[0:pos], in[0:pos])
// set the inserted rune
rv[pos] = r
// copy the characters after the insert pos
copy(rv[pos+1:], in[pos:])
return rv
}
// BuildTermFromRunesOptimistic will build a term from the provided runes
// AND optimistically attempt to encode into the provided buffer
// if at any point it appears the buffer is too small, a new buffer is
// allocated and that is used instead
// this should be used in cases where frequently the new term is the same
// length or shorter than the original term (in number of bytes)
func BuildTermFromRunesOptimistic(buf []byte, runes []rune) []byte {
rv := buf
used := 0
for _, r := range runes {
nextLen := utf8.RuneLen(r)
if used+nextLen > len(rv) {
// alloc new buf
buf = make([]byte, len(runes)*utf8.UTFMax)
// copy work we've already done
copy(buf, rv[:used])
rv = buf
}
written := utf8.EncodeRune(rv[used:], r)
used += written
}
return rv[:used]
}
func BuildTermFromRunes(runes []rune) []byte {
return BuildTermFromRunesOptimistic(make([]byte, len(runes)*utf8.UTFMax), runes)
}
func TruncateRunes(input []byte, num int) []byte {
runes := bytes.Runes(input)
runes = runes[:len(runes)-num]
out := BuildTermFromRunes(runes)
return out
}
func RunesEndsWith(input []rune, suffix string) bool {
inputLen := len(input)
suffixRunes := []rune(suffix)
suffixLen := len(suffixRunes)
if suffixLen > inputLen {
return false
}
for i := suffixLen - 1; i >= 0; i-- {
if input[inputLen-(suffixLen-i)] != suffixRunes[i] {
return false
}
}
return true
}

View File

@ -0,0 +1,101 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package document
import (
"fmt"
"reflect"
"github.com/blevesearch/bleve/size"
)
var reflectStaticSizeDocument int
func init() {
var d Document
reflectStaticSizeDocument = int(reflect.TypeOf(d).Size())
}
type Document struct {
ID string `json:"id"`
Fields []Field `json:"fields"`
CompositeFields []*CompositeField
}
func NewDocument(id string) *Document {
return &Document{
ID: id,
Fields: make([]Field, 0),
CompositeFields: make([]*CompositeField, 0),
}
}
func (d *Document) Size() int {
sizeInBytes := reflectStaticSizeDocument + size.SizeOfPtr +
len(d.ID)
for _, entry := range d.Fields {
sizeInBytes += entry.Size()
}
for _, entry := range d.CompositeFields {
sizeInBytes += entry.Size()
}
return sizeInBytes
}
func (d *Document) AddField(f Field) *Document {
switch f := f.(type) {
case *CompositeField:
d.CompositeFields = append(d.CompositeFields, f)
default:
d.Fields = append(d.Fields, f)
}
return d
}
func (d *Document) GoString() string {
fields := ""
for i, field := range d.Fields {
if i != 0 {
fields += ", "
}
fields += fmt.Sprintf("%#v", field)
}
compositeFields := ""
for i, field := range d.CompositeFields {
if i != 0 {
compositeFields += ", "
}
compositeFields += fmt.Sprintf("%#v", field)
}
return fmt.Sprintf("&document.Document{ID:%s, Fields: %s, CompositeFields: %s}", d.ID, fields, compositeFields)
}
func (d *Document) NumPlainTextBytes() uint64 {
rv := uint64(0)
for _, field := range d.Fields {
rv += field.NumPlainTextBytes()
}
for _, compositeField := range d.CompositeFields {
for _, field := range d.Fields {
if compositeField.includesField(field.Name()) {
rv += field.NumPlainTextBytes()
}
}
}
return rv
}

41
vendor/github.com/blevesearch/bleve/document/field.go generated vendored Normal file
View File

@ -0,0 +1,41 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package document
import (
"github.com/blevesearch/bleve/analysis"
)
type Field interface {
// Name returns the path of the field from the root DocumentMapping.
// A root field path is "field", a subdocument field is "parent.field".
Name() string
// ArrayPositions returns the intermediate document and field indices
// required to resolve the field value in the document. For example, if the
// field path is "doc1.doc2.field" where doc1 and doc2 are slices or
// arrays, ArrayPositions returns 2 indices used to resolve "doc2" value in
// "doc1", then "field" in "doc2".
ArrayPositions() []uint64
Options() IndexingOptions
Analyze() (int, analysis.TokenFrequencies)
Value() []byte
// NumPlainTextBytes should return the number of plain text bytes
// that this field represents - this is a common metric for tracking
// the rate of indexing
NumPlainTextBytes() uint64
Size() int
}

View File

@ -0,0 +1,123 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package document
import (
"fmt"
"reflect"
"github.com/blevesearch/bleve/analysis"
"github.com/blevesearch/bleve/size"
)
var reflectStaticSizeBooleanField int
func init() {
var f BooleanField
reflectStaticSizeBooleanField = int(reflect.TypeOf(f).Size())
}
const DefaultBooleanIndexingOptions = StoreField | IndexField | DocValues
type BooleanField struct {
name string
arrayPositions []uint64
options IndexingOptions
value []byte
numPlainTextBytes uint64
}
func (b *BooleanField) Size() int {
return reflectStaticSizeBooleanField + size.SizeOfPtr +
len(b.name) +
len(b.arrayPositions)*size.SizeOfUint64 +
len(b.value)
}
func (b *BooleanField) Name() string {
return b.name
}
func (b *BooleanField) ArrayPositions() []uint64 {
return b.arrayPositions
}
func (b *BooleanField) Options() IndexingOptions {
return b.options
}
func (b *BooleanField) Analyze() (int, analysis.TokenFrequencies) {
tokens := make(analysis.TokenStream, 0)
tokens = append(tokens, &analysis.Token{
Start: 0,
End: len(b.value),
Term: b.value,
Position: 1,
Type: analysis.Boolean,
})
fieldLength := len(tokens)
tokenFreqs := analysis.TokenFrequency(tokens, b.arrayPositions, b.options.IncludeTermVectors())
return fieldLength, tokenFreqs
}
func (b *BooleanField) Value() []byte {
return b.value
}
func (b *BooleanField) Boolean() (bool, error) {
if len(b.value) == 1 {
return b.value[0] == 'T', nil
}
return false, fmt.Errorf("boolean field has %d bytes", len(b.value))
}
func (b *BooleanField) GoString() string {
return fmt.Sprintf("&document.BooleanField{Name:%s, Options: %s, Value: %s}", b.name, b.options, b.value)
}
func (b *BooleanField) NumPlainTextBytes() uint64 {
return b.numPlainTextBytes
}
func NewBooleanFieldFromBytes(name string, arrayPositions []uint64, value []byte) *BooleanField {
return &BooleanField{
name: name,
arrayPositions: arrayPositions,
value: value,
options: DefaultNumericIndexingOptions,
numPlainTextBytes: uint64(len(value)),
}
}
func NewBooleanField(name string, arrayPositions []uint64, b bool) *BooleanField {
return NewBooleanFieldWithIndexingOptions(name, arrayPositions, b, DefaultNumericIndexingOptions)
}
func NewBooleanFieldWithIndexingOptions(name string, arrayPositions []uint64, b bool, options IndexingOptions) *BooleanField {
numPlainTextBytes := 5
v := []byte("F")
if b {
numPlainTextBytes = 4
v = []byte("T")
}
return &BooleanField{
name: name,
arrayPositions: arrayPositions,
value: v,
options: options,
numPlainTextBytes: uint64(numPlainTextBytes),
}
}

View File

@ -0,0 +1,124 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package document
import (
"reflect"
"github.com/blevesearch/bleve/analysis"
"github.com/blevesearch/bleve/size"
)
var reflectStaticSizeCompositeField int
func init() {
var cf CompositeField
reflectStaticSizeCompositeField = int(reflect.TypeOf(cf).Size())
}
const DefaultCompositeIndexingOptions = IndexField
type CompositeField struct {
name string
includedFields map[string]bool
excludedFields map[string]bool
defaultInclude bool
options IndexingOptions
totalLength int
compositeFrequencies analysis.TokenFrequencies
}
func NewCompositeField(name string, defaultInclude bool, include []string, exclude []string) *CompositeField {
return NewCompositeFieldWithIndexingOptions(name, defaultInclude, include, exclude, DefaultCompositeIndexingOptions)
}
func NewCompositeFieldWithIndexingOptions(name string, defaultInclude bool, include []string, exclude []string, options IndexingOptions) *CompositeField {
rv := &CompositeField{
name: name,
options: options,
defaultInclude: defaultInclude,
includedFields: make(map[string]bool, len(include)),
excludedFields: make(map[string]bool, len(exclude)),
compositeFrequencies: make(analysis.TokenFrequencies),
}
for _, i := range include {
rv.includedFields[i] = true
}
for _, e := range exclude {
rv.excludedFields[e] = true
}
return rv
}
func (c *CompositeField) Size() int {
sizeInBytes := reflectStaticSizeCompositeField + size.SizeOfPtr +
len(c.name)
	for k := range c.includedFields {
sizeInBytes += size.SizeOfString + len(k) + size.SizeOfBool
}
	for k := range c.excludedFields {
sizeInBytes += size.SizeOfString + len(k) + size.SizeOfBool
}
return sizeInBytes
}
func (c *CompositeField) Name() string {
return c.name
}
func (c *CompositeField) ArrayPositions() []uint64 {
return []uint64{}
}
func (c *CompositeField) Options() IndexingOptions {
return c.options
}
func (c *CompositeField) Analyze() (int, analysis.TokenFrequencies) {
return c.totalLength, c.compositeFrequencies
}
func (c *CompositeField) Value() []byte {
return []byte{}
}
func (c *CompositeField) NumPlainTextBytes() uint64 {
return 0
}
func (c *CompositeField) includesField(field string) bool {
shouldInclude := c.defaultInclude
_, fieldShouldBeIncluded := c.includedFields[field]
if fieldShouldBeIncluded {
shouldInclude = true
}
_, fieldShouldBeExcluded := c.excludedFields[field]
if fieldShouldBeExcluded {
shouldInclude = false
}
return shouldInclude
}
func (c *CompositeField) Compose(field string, length int, freq analysis.TokenFrequencies) {
if c.includesField(field) {
c.totalLength += length
c.compositeFrequencies.MergeAll(field, freq)
}
}
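
A minimal sketch of how a composite field accumulates the analyses of other fields; the `_all` name and the excluded `secret` field are illustrative choices, not part of this file:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/document"
)

func main() {
	// include every field by default, but never "secret"
	all := document.NewCompositeField("_all", true, nil, []string{"secret"})

	title := document.NewTextField("title", nil, []byte("zardoz speaks"))
	length, freqs := title.Analyze()

	all.Compose("title", length, freqs)  // merged: included by default
	all.Compose("secret", length, freqs) // ignored: explicitly excluded

	totalLength, merged := all.Analyze()
	fmt.Println(totalLength, len(merged)) // 1 1: one token, since title has no analyzer
}
```

Note that Analyze on a composite field does no analysis of its own; it simply returns whatever Compose has accumulated so far.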

View File

@ -0,0 +1,159 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package document
import (
"fmt"
"math"
"reflect"
"time"
"github.com/blevesearch/bleve/analysis"
"github.com/blevesearch/bleve/numeric"
"github.com/blevesearch/bleve/size"
)
var reflectStaticSizeDateTimeField int
func init() {
var f DateTimeField
reflectStaticSizeDateTimeField = int(reflect.TypeOf(f).Size())
}
const DefaultDateTimeIndexingOptions = StoreField | IndexField | DocValues
const DefaultDateTimePrecisionStep uint = 4
var MinTimeRepresentable = time.Unix(0, math.MinInt64)
var MaxTimeRepresentable = time.Unix(0, math.MaxInt64)
type DateTimeField struct {
name string
arrayPositions []uint64
options IndexingOptions
value numeric.PrefixCoded
numPlainTextBytes uint64
}
func (n *DateTimeField) Size() int {
return reflectStaticSizeDateTimeField + size.SizeOfPtr +
len(n.name) +
len(n.arrayPositions)*size.SizeOfUint64
}
func (n *DateTimeField) Name() string {
return n.name
}
func (n *DateTimeField) ArrayPositions() []uint64 {
return n.arrayPositions
}
func (n *DateTimeField) Options() IndexingOptions {
return n.options
}
func (n *DateTimeField) Analyze() (int, analysis.TokenFrequencies) {
tokens := make(analysis.TokenStream, 0)
tokens = append(tokens, &analysis.Token{
Start: 0,
End: len(n.value),
Term: n.value,
Position: 1,
Type: analysis.DateTime,
})
original, err := n.value.Int64()
if err == nil {
shift := DefaultDateTimePrecisionStep
for shift < 64 {
shiftEncoded, err := numeric.NewPrefixCodedInt64(original, shift)
if err != nil {
break
}
token := analysis.Token{
Start: 0,
End: len(shiftEncoded),
Term: shiftEncoded,
Position: 1,
Type: analysis.DateTime,
}
tokens = append(tokens, &token)
shift += DefaultDateTimePrecisionStep
}
}
fieldLength := len(tokens)
tokenFreqs := analysis.TokenFrequency(tokens, n.arrayPositions, n.options.IncludeTermVectors())
return fieldLength, tokenFreqs
}
func (n *DateTimeField) Value() []byte {
return n.value
}
func (n *DateTimeField) DateTime() (time.Time, error) {
i64, err := n.value.Int64()
if err != nil {
return time.Time{}, err
}
return time.Unix(0, i64).UTC(), nil
}
func (n *DateTimeField) GoString() string {
return fmt.Sprintf("&document.DateField{Name:%s, Options: %s, Value: %s}", n.name, n.options, n.value)
}
func (n *DateTimeField) NumPlainTextBytes() uint64 {
return n.numPlainTextBytes
}
func NewDateTimeFieldFromBytes(name string, arrayPositions []uint64, value []byte) *DateTimeField {
return &DateTimeField{
name: name,
arrayPositions: arrayPositions,
value: value,
options: DefaultDateTimeIndexingOptions,
numPlainTextBytes: uint64(len(value)),
}
}
func NewDateTimeField(name string, arrayPositions []uint64, dt time.Time) (*DateTimeField, error) {
return NewDateTimeFieldWithIndexingOptions(name, arrayPositions, dt, DefaultDateTimeIndexingOptions)
}
func NewDateTimeFieldWithIndexingOptions(name string, arrayPositions []uint64, dt time.Time, options IndexingOptions) (*DateTimeField, error) {
if canRepresent(dt) {
dtInt64 := dt.UnixNano()
prefixCoded := numeric.MustNewPrefixCodedInt64(dtInt64, 0)
return &DateTimeField{
name: name,
arrayPositions: arrayPositions,
value: prefixCoded,
options: options,
// not correct, just a place holder until we revisit how fields are
// represented and can fix this better
numPlainTextBytes: uint64(8),
}, nil
}
return nil, fmt.Errorf("cannot represent %s in this type", dt)
}
func canRepresent(dt time.Time) bool {
if dt.Before(MinTimeRepresentable) || dt.After(MaxTimeRepresentable) {
return false
}
return true
}
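
A short, hedged usage sketch: the constructor rejects dates that cannot fit in an int64 of nanoseconds (roughly the years 1678 to 2262, per MinTimeRepresentable and MaxTimeRepresentable above), and the value round-trips through the prefix-coded UnixNano:

```go
package main

import (
	"fmt"
	"time"

	"github.com/blevesearch/bleve/document"
)

func main() {
	f, err := document.NewDateTimeField("updated", nil, time.Now())
	if err != nil {
		panic(err) // out-of-range dates are rejected
	}

	dt, _ := f.DateTime()
	fmt.Println(dt) // the original instant, in UTC

	// one full-precision token plus coarser tokens at shifts 4, 8, ..., 60
	length, _ := f.Analyze()
	fmt.Println(length) // 16
}
```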

View File

@ -0,0 +1,152 @@
// Copyright (c) 2017 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package document
import (
"fmt"
"reflect"
"github.com/blevesearch/bleve/analysis"
"github.com/blevesearch/bleve/geo"
"github.com/blevesearch/bleve/numeric"
"github.com/blevesearch/bleve/size"
)
var reflectStaticSizeGeoPointField int
func init() {
var f GeoPointField
reflectStaticSizeGeoPointField = int(reflect.TypeOf(f).Size())
}
var GeoPrecisionStep uint = 9
type GeoPointField struct {
name string
arrayPositions []uint64
options IndexingOptions
value numeric.PrefixCoded
numPlainTextBytes uint64
}
func (n *GeoPointField) Size() int {
return reflectStaticSizeGeoPointField + size.SizeOfPtr +
len(n.name) +
len(n.arrayPositions)*size.SizeOfUint64
}
func (n *GeoPointField) Name() string {
return n.name
}
func (n *GeoPointField) ArrayPositions() []uint64 {
return n.arrayPositions
}
func (n *GeoPointField) Options() IndexingOptions {
return n.options
}
func (n *GeoPointField) Analyze() (int, analysis.TokenFrequencies) {
tokens := make(analysis.TokenStream, 0)
tokens = append(tokens, &analysis.Token{
Start: 0,
End: len(n.value),
Term: n.value,
Position: 1,
Type: analysis.Numeric,
})
original, err := n.value.Int64()
if err == nil {
shift := GeoPrecisionStep
for shift < 64 {
shiftEncoded, err := numeric.NewPrefixCodedInt64(original, shift)
if err != nil {
break
}
token := analysis.Token{
Start: 0,
End: len(shiftEncoded),
Term: shiftEncoded,
Position: 1,
Type: analysis.Numeric,
}
tokens = append(tokens, &token)
shift += GeoPrecisionStep
}
}
fieldLength := len(tokens)
tokenFreqs := analysis.TokenFrequency(tokens, n.arrayPositions, n.options.IncludeTermVectors())
return fieldLength, tokenFreqs
}
func (n *GeoPointField) Value() []byte {
return n.value
}
func (n *GeoPointField) Lon() (float64, error) {
i64, err := n.value.Int64()
if err != nil {
return 0.0, err
}
return geo.MortonUnhashLon(uint64(i64)), nil
}
func (n *GeoPointField) Lat() (float64, error) {
i64, err := n.value.Int64()
if err != nil {
return 0.0, err
}
return geo.MortonUnhashLat(uint64(i64)), nil
}
func (n *GeoPointField) GoString() string {
return fmt.Sprintf("&document.GeoPointField{Name:%s, Options: %s, Value: %s}", n.name, n.options, n.value)
}
func (n *GeoPointField) NumPlainTextBytes() uint64 {
return n.numPlainTextBytes
}
func NewGeoPointFieldFromBytes(name string, arrayPositions []uint64, value []byte) *GeoPointField {
return &GeoPointField{
name: name,
arrayPositions: arrayPositions,
value: value,
options: DefaultNumericIndexingOptions,
numPlainTextBytes: uint64(len(value)),
}
}
func NewGeoPointField(name string, arrayPositions []uint64, lon, lat float64) *GeoPointField {
return NewGeoPointFieldWithIndexingOptions(name, arrayPositions, lon, lat, DefaultNumericIndexingOptions)
}
func NewGeoPointFieldWithIndexingOptions(name string, arrayPositions []uint64, lon, lat float64, options IndexingOptions) *GeoPointField {
mhash := geo.MortonHash(lon, lat)
prefixCoded := numeric.MustNewPrefixCodedInt64(int64(mhash), 0)
return &GeoPointField{
name: name,
arrayPositions: arrayPositions,
value: prefixCoded,
options: options,
// not correct, just a place holder until we revisit how fields are
// represented and can fix this better
numPlainTextBytes: uint64(8),
}
}
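
A minimal sketch of the round trip: the point is interleaved into a single morton-coded value, so reading it back is exact only up to the 32-bits-per-axis scaling (the coordinates are arbitrary):

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/document"
	"github.com/blevesearch/bleve/geo"
)

func main() {
	lon, lat := 2.3522, 48.8566 // lon first, per the geo package convention

	hash := geo.MortonHash(lon, lat)
	fmt.Println(geo.MortonUnhashLon(hash), geo.MortonUnhashLat(hash))
	// ≈ 2.3522 48.8566, within the scaling error

	f := document.NewGeoPointField("location", nil, lon, lat)
	gotLon, _ := f.Lon()
	gotLat, _ := f.Lat()
	fmt.Println(gotLon, gotLat)
}
```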

View File

@ -0,0 +1,145 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package document
import (
"fmt"
"reflect"
"github.com/blevesearch/bleve/analysis"
"github.com/blevesearch/bleve/numeric"
"github.com/blevesearch/bleve/size"
)
var reflectStaticSizeNumericField int
func init() {
var f NumericField
reflectStaticSizeNumericField = int(reflect.TypeOf(f).Size())
}
const DefaultNumericIndexingOptions = StoreField | IndexField | DocValues
const DefaultPrecisionStep uint = 4
type NumericField struct {
name string
arrayPositions []uint64
options IndexingOptions
value numeric.PrefixCoded
numPlainTextBytes uint64
}
func (n *NumericField) Size() int {
return reflectStaticSizeNumericField + size.SizeOfPtr +
len(n.name) +
len(n.arrayPositions)*size.SizeOfPtr
}
func (n *NumericField) Name() string {
return n.name
}
func (n *NumericField) ArrayPositions() []uint64 {
return n.arrayPositions
}
func (n *NumericField) Options() IndexingOptions {
return n.options
}
func (n *NumericField) Analyze() (int, analysis.TokenFrequencies) {
tokens := make(analysis.TokenStream, 0)
tokens = append(tokens, &analysis.Token{
Start: 0,
End: len(n.value),
Term: n.value,
Position: 1,
Type: analysis.Numeric,
})
original, err := n.value.Int64()
if err == nil {
shift := DefaultPrecisionStep
for shift < 64 {
shiftEncoded, err := numeric.NewPrefixCodedInt64(original, shift)
if err != nil {
break
}
token := analysis.Token{
Start: 0,
End: len(shiftEncoded),
Term: shiftEncoded,
Position: 1,
Type: analysis.Numeric,
}
tokens = append(tokens, &token)
shift += DefaultPrecisionStep
}
}
fieldLength := len(tokens)
tokenFreqs := analysis.TokenFrequency(tokens, n.arrayPositions, n.options.IncludeTermVectors())
return fieldLength, tokenFreqs
}
func (n *NumericField) Value() []byte {
return n.value
}
func (n *NumericField) Number() (float64, error) {
i64, err := n.value.Int64()
if err != nil {
return 0.0, err
}
return numeric.Int64ToFloat64(i64), nil
}
func (n *NumericField) GoString() string {
return fmt.Sprintf("&document.NumericField{Name:%s, Options: %s, Value: %s}", n.name, n.options, n.value)
}
func (n *NumericField) NumPlainTextBytes() uint64 {
return n.numPlainTextBytes
}
func NewNumericFieldFromBytes(name string, arrayPositions []uint64, value []byte) *NumericField {
return &NumericField{
name: name,
arrayPositions: arrayPositions,
value: value,
options: DefaultNumericIndexingOptions,
numPlainTextBytes: uint64(len(value)),
}
}
func NewNumericField(name string, arrayPositions []uint64, number float64) *NumericField {
return NewNumericFieldWithIndexingOptions(name, arrayPositions, number, DefaultNumericIndexingOptions)
}
func NewNumericFieldWithIndexingOptions(name string, arrayPositions []uint64, number float64, options IndexingOptions) *NumericField {
numberInt64 := numeric.Float64ToInt64(number)
prefixCoded := numeric.MustNewPrefixCodedInt64(numberInt64, 0)
return &NumericField{
name: name,
arrayPositions: arrayPositions,
value: prefixCoded,
options: options,
// not correct, just a place holder until we revisit how fields are
// represented and can fix this better
numPlainTextBytes: uint64(8),
}
}
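
A minimal usage sketch; the field name and value are illustrative. The float64 is stored as a sortable prefix-coded int64, and Analyze emits one full-precision token plus one coarser token per 4-bit shift:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/document"
)

func main() {
	f := document.NewNumericField("price", nil, 19.99)

	n, _ := f.Number()
	fmt.Println(n) // 19.99, recovered via Int64ToFloat64

	length, _ := f.Analyze()
	fmt.Println(length) // 16: the full value plus shifts 4, 8, ..., 60
}
```

The extra shifted tokens are what make numeric range queries cheap: coarser prefixes match wide ranges without enumerating every distinct value.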

View File

@ -0,0 +1,139 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package document
import (
"fmt"
"reflect"
"github.com/blevesearch/bleve/analysis"
"github.com/blevesearch/bleve/size"
)
var reflectStaticSizeTextField int
func init() {
var f TextField
reflectStaticSizeTextField = int(reflect.TypeOf(f).Size())
}
const DefaultTextIndexingOptions = IndexField | DocValues
type TextField struct {
name string
arrayPositions []uint64
options IndexingOptions
analyzer *analysis.Analyzer
value []byte
numPlainTextBytes uint64
}
func (t *TextField) Size() int {
return reflectStaticSizeTextField + size.SizeOfPtr +
len(t.name) +
len(t.arrayPositions)*size.SizeOfUint64 +
len(t.value)
}
func (t *TextField) Name() string {
return t.name
}
func (t *TextField) ArrayPositions() []uint64 {
return t.arrayPositions
}
func (t *TextField) Options() IndexingOptions {
return t.options
}
func (t *TextField) Analyze() (int, analysis.TokenFrequencies) {
var tokens analysis.TokenStream
if t.analyzer != nil {
bytesToAnalyze := t.Value()
if t.options.IsStored() {
// need to copy
bytesCopied := make([]byte, len(bytesToAnalyze))
copy(bytesCopied, bytesToAnalyze)
bytesToAnalyze = bytesCopied
}
tokens = t.analyzer.Analyze(bytesToAnalyze)
} else {
tokens = analysis.TokenStream{
&analysis.Token{
Start: 0,
End: len(t.value),
Term: t.value,
Position: 1,
Type: analysis.AlphaNumeric,
},
}
}
fieldLength := len(tokens) // number of tokens in this doc field
tokenFreqs := analysis.TokenFrequency(tokens, t.arrayPositions, t.options.IncludeTermVectors())
return fieldLength, tokenFreqs
}
func (t *TextField) Analyzer() *analysis.Analyzer {
return t.analyzer
}
func (t *TextField) Value() []byte {
return t.value
}
func (t *TextField) GoString() string {
return fmt.Sprintf("&document.TextField{Name:%s, Options: %s, Analyzer: %v, Value: %s, ArrayPositions: %v}", t.name, t.options, t.analyzer, t.value, t.arrayPositions)
}
func (t *TextField) NumPlainTextBytes() uint64 {
return t.numPlainTextBytes
}
func NewTextField(name string, arrayPositions []uint64, value []byte) *TextField {
return NewTextFieldWithIndexingOptions(name, arrayPositions, value, DefaultTextIndexingOptions)
}
func NewTextFieldWithIndexingOptions(name string, arrayPositions []uint64, value []byte, options IndexingOptions) *TextField {
return &TextField{
name: name,
arrayPositions: arrayPositions,
options: options,
value: value,
numPlainTextBytes: uint64(len(value)),
}
}
func NewTextFieldWithAnalyzer(name string, arrayPositions []uint64, value []byte, analyzer *analysis.Analyzer) *TextField {
return &TextField{
name: name,
arrayPositions: arrayPositions,
options: DefaultTextIndexingOptions,
analyzer: analyzer,
value: value,
numPlainTextBytes: uint64(len(value)),
}
}
func NewTextFieldCustom(name string, arrayPositions []uint64, value []byte, options IndexingOptions, analyzer *analysis.Analyzer) *TextField {
return &TextField{
name: name,
arrayPositions: arrayPositions,
options: options,
analyzer: analyzer,
value: value,
numPlainTextBytes: uint64(len(value)),
}
}
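
A minimal sketch of the two analysis paths. With no analyzer set, the whole value becomes a single token; with an analyzer (obtained elsewhere, e.g. from a bleve mapping, and not constructed here), the value would be tokenized instead:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/document"
)

func main() {
	f := document.NewTextField("title", nil, []byte("hello world"))
	length, freqs := f.Analyze()
	fmt.Println(length, len(freqs)) // 1 1: one token spanning the whole value

	// with an analyzer, the same value would yield multiple tokens:
	// f2 := document.NewTextFieldWithAnalyzer("title", nil, []byte("hello world"), analyzer)
}
```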

View File

@ -0,0 +1,66 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package document
type IndexingOptions int
const (
IndexField IndexingOptions = 1 << iota
StoreField
IncludeTermVectors
DocValues
)
func (o IndexingOptions) IsIndexed() bool {
return o&IndexField != 0
}
func (o IndexingOptions) IsStored() bool {
return o&StoreField != 0
}
func (o IndexingOptions) IncludeTermVectors() bool {
return o&IncludeTermVectors != 0
}
func (o IndexingOptions) IncludeDocValues() bool {
return o&DocValues != 0
}
func (o IndexingOptions) String() string {
rv := ""
if o.IsIndexed() {
rv += "INDEXED"
}
if o.IsStored() {
if rv != "" {
rv += ", "
}
rv += "STORE"
}
if o.IncludeTermVectors() {
if rv != "" {
rv += ", "
}
rv += "TV"
}
if o.IncludeDocValues() {
if rv != "" {
rv += ", "
}
rv += "DV"
}
return rv
}
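
Since the options are bit flags, they compose with the usual bitwise OR; a small sketch:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/document"
)

func main() {
	opts := document.IndexField | document.StoreField | document.DocValues

	fmt.Println(opts.IsIndexed())          // true
	fmt.Println(opts.IncludeTermVectors()) // false
	fmt.Println(opts)                      // INDEXED, STORE, DV
}
```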

9 vendor/github.com/blevesearch/bleve/geo/README.md generated vendored Normal file
View File

@ -0,0 +1,9 @@
# geo support in bleve
First, all of this geo code is a Go adaptation of the [Lucene 5.3.2 sandbox geo support](https://lucene.apache.org/core/5_3_2/sandbox/org/apache/lucene/util/package-summary.html).
## Notes
- All of the APIs will use float64 for lon/lat values.
- When describing a point in function arguments or return values, we always use the order lon, lat.
- High level APIs will use TopLeft and BottomRight to describe bounding boxes. This may not map cleanly to min/max lon/lat when crossing the dateline. The lower level APIs will use min/max lon/lat and require the higher-level code to split boxes accordingly.
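
As a quick illustration of that ordering (a fragment, assuming the `geo` package is imported; the coordinates are arbitrary):

```go
// lon always precedes lat in arguments and return values
p := geo.Point{Lon: 2.3522, Lat: 48.8566}
hash := geo.MortonHash(p.Lon, p.Lat)
_ = hash
```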

208 vendor/github.com/blevesearch/bleve/geo/geo.go generated vendored Normal file
View File

@ -0,0 +1,208 @@
// Copyright (c) 2017 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package geo
import (
"fmt"
"math"
"github.com/blevesearch/bleve/numeric"
)
// GeoBits is the number of bits used for a single geo point
// Currently this is 32bits for lon and 32bits for lat
var GeoBits uint = 32
var minLon = -180.0
var minLat = -90.0
var maxLon = 180.0
var maxLat = 90.0
var minLonRad = minLon * degreesToRadian
var minLatRad = minLat * degreesToRadian
var maxLonRad = maxLon * degreesToRadian
var maxLatRad = maxLat * degreesToRadian
var geoTolerance = 1E-6
var lonScale = float64((uint64(0x1)<<GeoBits)-1) / 360.0
var latScale = float64((uint64(0x1)<<GeoBits)-1) / 180.0
// Point represents a geo point.
type Point struct {
Lon float64
Lat float64
}
// MortonHash computes the morton hash value for the provided geo point
// This point is ordered as lon, lat.
func MortonHash(lon, lat float64) uint64 {
return numeric.Interleave(scaleLon(lon), scaleLat(lat))
}
func scaleLon(lon float64) uint64 {
rv := uint64((lon - minLon) * lonScale)
return rv
}
func scaleLat(lat float64) uint64 {
rv := uint64((lat - minLat) * latScale)
return rv
}
// MortonUnhashLon extracts the longitude value from the provided morton hash.
func MortonUnhashLon(hash uint64) float64 {
return unscaleLon(numeric.Deinterleave(hash))
}
// MortonUnhashLat extracts the latitude value from the provided morton hash.
func MortonUnhashLat(hash uint64) float64 {
return unscaleLat(numeric.Deinterleave(hash >> 1))
}
func unscaleLon(lon uint64) float64 {
return (float64(lon) / lonScale) + minLon
}
func unscaleLat(lat uint64) float64 {
return (float64(lat) / latScale) + minLat
}
// compareGeo will compare two float values and see if they are the same
// taking into consideration a known geo tolerance.
func compareGeo(a, b float64) float64 {
compare := a - b
if math.Abs(compare) <= geoTolerance {
return 0
}
return compare
}
// RectIntersects checks whether rectangles a and b intersect
func RectIntersects(aMinX, aMinY, aMaxX, aMaxY, bMinX, bMinY, bMaxX, bMaxY float64) bool {
return !(aMaxX < bMinX || aMinX > bMaxX || aMaxY < bMinY || aMinY > bMaxY)
}
// RectWithin checks whether box a is within box b
func RectWithin(aMinX, aMinY, aMaxX, aMaxY, bMinX, bMinY, bMaxX, bMaxY float64) bool {
rv := !(aMinX < bMinX || aMinY < bMinY || aMaxX > bMaxX || aMaxY > bMaxY)
return rv
}
// BoundingBoxContains checks whether the lon/lat point is within the box
func BoundingBoxContains(lon, lat, minLon, minLat, maxLon, maxLat float64) bool {
return compareGeo(lon, minLon) >= 0 && compareGeo(lon, maxLon) <= 0 &&
compareGeo(lat, minLat) >= 0 && compareGeo(lat, maxLat) <= 0
}
const degreesToRadian = math.Pi / 180
const radiansToDegrees = 180 / math.Pi
// DegreesToRadians converts an angle in degrees to radians
func DegreesToRadians(d float64) float64 {
return d * degreesToRadian
}
// RadiansToDegrees converts an angle in radians to degrees
func RadiansToDegrees(r float64) float64 {
return r * radiansToDegrees
}
var earthMeanRadiusMeters = 6371008.7714
func RectFromPointDistance(lon, lat, dist float64) (float64, float64, float64, float64, error) {
err := checkLongitude(lon)
if err != nil {
return 0, 0, 0, 0, err
}
err = checkLatitude(lat)
if err != nil {
return 0, 0, 0, 0, err
}
radLon := DegreesToRadians(lon)
radLat := DegreesToRadians(lat)
radDistance := (dist + 7e-2) / earthMeanRadiusMeters
minLatL := radLat - radDistance
maxLatL := radLat + radDistance
var minLonL, maxLonL float64
if minLatL > minLatRad && maxLatL < maxLatRad {
deltaLon := asin(sin(radDistance) / cos(radLat))
minLonL = radLon - deltaLon
if minLonL < minLonRad {
minLonL += 2 * math.Pi
}
maxLonL = radLon + deltaLon
if maxLonL > maxLonRad {
maxLonL -= 2 * math.Pi
}
} else {
// pole is inside distance
minLatL = math.Max(minLatL, minLatRad)
maxLatL = math.Min(maxLatL, maxLatRad)
minLonL = minLonRad
maxLonL = maxLonRad
}
return RadiansToDegrees(minLonL),
RadiansToDegrees(maxLatL),
RadiansToDegrees(maxLonL),
RadiansToDegrees(minLatL),
nil
}
func checkLatitude(latitude float64) error {
if math.IsNaN(latitude) || latitude < minLat || latitude > maxLat {
return fmt.Errorf("invalid latitude %f; must be between %f and %f", latitude, minLat, maxLat)
}
return nil
}
func checkLongitude(longitude float64) error {
if math.IsNaN(longitude) || longitude < minLon || longitude > maxLon {
return fmt.Errorf("invalid longitude %f; must be between %f and %f", longitude, minLon, maxLon)
}
return nil
}
func BoundingRectangleForPolygon(polygon []Point) (
float64, float64, float64, float64, error) {
err := checkLongitude(polygon[0].Lon)
if err != nil {
return 0, 0, 0, 0, err
}
err = checkLatitude(polygon[0].Lat)
if err != nil {
return 0, 0, 0, 0, err
}
maxY, minY := polygon[0].Lat, polygon[0].Lat
maxX, minX := polygon[0].Lon, polygon[0].Lon
for i := 1; i < len(polygon); i++ {
err := checkLongitude(polygon[i].Lon)
if err != nil {
return 0, 0, 0, 0, err
}
err = checkLatitude(polygon[i].Lat)
if err != nil {
return 0, 0, 0, 0, err
}
maxY = math.Max(maxY, polygon[i].Lat)
minY = math.Min(minY, polygon[i].Lat)
maxX = math.Max(maxX, polygon[i].Lon)
minX = math.Min(minX, polygon[i].Lon)
}
return minX, maxY, maxX, minY, nil
}
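
A hedged usage sketch tying these pieces together; note that RectFromPointDistance returns the corners in TopLeft/BottomRight order (minLon, maxLat, maxLon, minLat), while BoundingBoxContains takes min values before max values:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/geo"
)

func main() {
	// a bounding rectangle roughly 1000 meters around a point
	minLon, maxLat, maxLon, minLat, err := geo.RectFromPointDistance(2.3522, 48.8566, 1000)
	if err != nil {
		panic(err)
	}

	inside := geo.BoundingBoxContains(2.3530, 48.8570,
		minLon, minLat, maxLon, maxLat)
	fmt.Println(inside) // true: the point is well within 1 km
}
```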

98 vendor/github.com/blevesearch/bleve/geo/geo_dist.go generated vendored Normal file
View File

@ -0,0 +1,98 @@
// Copyright (c) 2017 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package geo
import (
"fmt"
"math"
"strconv"
"strings"
)
type distanceUnit struct {
conv float64
suffixes []string
}
var inch = distanceUnit{0.0254, []string{"in", "inch"}}
var yard = distanceUnit{0.9144, []string{"yd", "yards"}}
var feet = distanceUnit{0.3048, []string{"ft", "feet"}}
var kilom = distanceUnit{1000, []string{"km", "kilometers"}}
var nauticalm = distanceUnit{1852.0, []string{"nm", "nauticalmiles"}}
var millim = distanceUnit{0.001, []string{"mm", "millimeters"}}
var centim = distanceUnit{0.01, []string{"cm", "centimeters"}}
var miles = distanceUnit{1609.344, []string{"mi", "miles"}}
var meters = distanceUnit{1, []string{"m", "meters"}}
var distanceUnits = []*distanceUnit{
&inch, &yard, &feet, &kilom, &nauticalm, &millim, &centim, &miles, &meters,
}
// ParseDistance attempts to parse a distance string and return distance in
// meters. Example formats supported:
// "5in" "5inch" "7yd" "7yards" "9ft" "9feet" "11km" "11kilometers"
// "3nm" "3nauticalmiles" "13mm" "13millimeters" "15cm" "15centimeters"
// "17mi" "17miles" "19m" "19meters"
// If the unit cannot be determined, the entire string is parsed and the
// unit of meters is assumed.
// If the number portion cannot be parsed, 0 and the parse error are returned.
func ParseDistance(d string) (float64, error) {
for _, unit := range distanceUnits {
for _, unitSuffix := range unit.suffixes {
if strings.HasSuffix(d, unitSuffix) {
parsedNum, err := strconv.ParseFloat(d[0:len(d)-len(unitSuffix)], 64)
if err != nil {
return 0, err
}
return parsedNum * unit.conv, nil
}
}
}
// no unit matched, try assuming meters?
parsedNum, err := strconv.ParseFloat(d, 64)
if err != nil {
return 0, err
}
return parsedNum, nil
}
// ParseDistanceUnit attempts to parse a distance unit and return the
// multiplier for converting this to meters. If the unit cannot be parsed
// then 0 and an error are returned.
func ParseDistanceUnit(u string) (float64, error) {
for _, unit := range distanceUnits {
for _, unitSuffix := range unit.suffixes {
if u == unitSuffix {
return unit.conv, nil
}
}
}
return 0, fmt.Errorf("unknown distance unit: %s", u)
}
// Haversin computes the distance between two points.
// This implementation uses the sloppy math implementations, which trade off
// accuracy for performance. The distance returned is in kilometers.
func Haversin(lon1, lat1, lon2, lat2 float64) float64 {
x1 := lat1 * degreesToRadian
x2 := lat2 * degreesToRadian
h1 := 1 - cos(x1-x2)
h2 := 1 - cos((lon1-lon2)*degreesToRadian)
h := (h1 + cos(x1)*cos(x2)*h2) / 2
avgLat := (x1 + x2) / 2
diameter := earthDiameter(avgLat)
return diameter * asin(math.Min(1, math.Sqrt(h)))
}
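
A short sketch of the parsing and distance helpers (the coordinates are arbitrary; the Paris-to-London figure is approximate):

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/geo"
)

func main() {
	m, err := geo.ParseDistance("11km")
	fmt.Println(m, err) // 11000 <nil>

	m, _ = geo.ParseDistance("3.5") // no recognized suffix: meters assumed
	fmt.Println(m)                  // 3.5

	// Haversin takes lon/lat pairs and returns kilometers
	km := geo.Haversin(2.3522, 48.8566, -0.1276, 51.5074)
	fmt.Printf("%.0f km\n", km) // roughly 340 km
}
```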

111 vendor/github.com/blevesearch/bleve/geo/geohash.go generated vendored Normal file
View File

@ -0,0 +1,111 @@
// Copyright (c) 2019 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// This implementation is inspired from the geohash-js
// ref: https://github.com/davetroy/geohash-js
package geo
// encoding encapsulates an encoding defined by a given base32 alphabet.
type encoding struct {
enc string
dec [256]byte
}
// newEncoding constructs a new encoding defined by the given alphabet,
// which must be a 32-byte string.
func newEncoding(encoder string) *encoding {
e := new(encoding)
e.enc = encoder
for i := 0; i < len(e.dec); i++ {
e.dec[i] = 0xff
}
for i := 0; i < len(encoder); i++ {
e.dec[encoder[i]] = byte(i)
}
return e
}
// base32encoding with the Geohash alphabet.
var base32encoding = newEncoding("0123456789bcdefghjkmnpqrstuvwxyz")
var masks = []uint64{16, 8, 4, 2, 1}
// DecodeGeoHash decodes the string geohash faster and with
// higher precision. This API is in an experimental phase.
func DecodeGeoHash(geoHash string) (float64, float64) {
even := true
lat := []float64{-90.0, 90.0}
lon := []float64{-180.0, 180.0}
for i := 0; i < len(geoHash); i++ {
cd := uint64(base32encoding.dec[geoHash[i]])
for j := 0; j < 5; j++ {
if even {
if cd&masks[j] > 0 {
lon[0] = (lon[0] + lon[1]) / 2
} else {
lon[1] = (lon[0] + lon[1]) / 2
}
} else {
if cd&masks[j] > 0 {
lat[0] = (lat[0] + lat[1]) / 2
} else {
lat[1] = (lat[0] + lat[1]) / 2
}
}
even = !even
}
}
return (lat[0] + lat[1]) / 2, (lon[0] + lon[1]) / 2
}
func EncodeGeoHash(lat, lon float64) string {
even := true
lats := []float64{-90.0, 90.0}
lons := []float64{-180.0, 180.0}
precision := 12
var ch, bit uint64
var geoHash string
for len(geoHash) < precision {
if even {
mid := (lons[0] + lons[1]) / 2
if lon > mid {
ch |= masks[bit]
lons[0] = mid
} else {
lons[1] = mid
}
} else {
mid := (lats[0] + lats[1]) / 2
if lat > mid {
ch |= masks[bit]
lats[0] = mid
} else {
lats[1] = mid
}
}
even = !even
if bit < 4 {
bit++
} else {
geoHash += string(base32encoding.enc[ch])
ch = 0
bit = 0
}
}
return geoHash
}
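
A minimal round-trip sketch. Note the argument order: unlike the rest of this package, the geohash APIs take lat before lon:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/geo"
)

func main() {
	h := geo.EncodeGeoHash(48.8566, 2.3522)
	fmt.Println(h) // a 12-character base32 geohash

	lat, lon := geo.DecodeGeoHash(h)
	fmt.Printf("%.4f %.4f\n", lat, lon) // ≈ 48.8566 2.3522
}
```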

179 vendor/github.com/blevesearch/bleve/geo/parse.go generated vendored Normal file
View File

@ -0,0 +1,179 @@
// Copyright (c) 2017 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package geo
import (
"reflect"
"strconv"
"strings"
)
// ExtractGeoPoint takes an arbitrary interface{} and tries its best to
// interpret it as a geo point. Supported formats:
// Container:
//  slice length 2 (GeoJSON)
//   first element lon, second element lat
//  string (coordinates separated by comma, or a geohash)
//   first element lat, second element lon
//  map[string]interface{}
//   exact keys lat and lon or lng
//  struct
//   w/exported fields case-insensitive match on lat and lon or lng
//  struct
//   satisfying Later and Loner or Lnger interfaces
//
// in all cases values must be some sort of numeric-like thing: int/uint/float
func ExtractGeoPoint(thing interface{}) (lon, lat float64, success bool) {
var foundLon, foundLat bool
thingVal := reflect.ValueOf(thing)
if !thingVal.IsValid() {
return lon, lat, false
}
thingTyp := thingVal.Type()
// is it a slice
if thingVal.Kind() == reflect.Slice {
// must be length 2
if thingVal.Len() == 2 {
first := thingVal.Index(0)
if first.CanInterface() {
firstVal := first.Interface()
lon, foundLon = extractNumericVal(firstVal)
}
second := thingVal.Index(1)
if second.CanInterface() {
secondVal := second.Interface()
lat, foundLat = extractNumericVal(secondVal)
}
}
}
// is it a string
if thingVal.Kind() == reflect.String {
geoStr := thingVal.Interface().(string)
if strings.Contains(geoStr, ",") {
// geo point with coordinates split by comma
points := strings.Split(geoStr, ",")
for i, point := range points {
// trim any leading or trailing white spaces
points[i] = strings.TrimSpace(point)
}
if len(points) == 2 {
var err error
lat, err = strconv.ParseFloat(points[0], 64)
if err == nil {
foundLat = true
}
lon, err = strconv.ParseFloat(points[1], 64)
if err == nil {
foundLon = true
}
}
} else {
// geohash
lat, lon = DecodeGeoHash(geoStr)
foundLat = true
foundLon = true
}
}
// is it a map
if l, ok := thing.(map[string]interface{}); ok {
if lval, ok := l["lon"]; ok {
lon, foundLon = extractNumericVal(lval)
} else if lval, ok := l["lng"]; ok {
lon, foundLon = extractNumericVal(lval)
}
if lval, ok := l["lat"]; ok {
lat, foundLat = extractNumericVal(lval)
}
}
// now try reflection on struct fields
if thingVal.Kind() == reflect.Struct {
for i := 0; i < thingVal.NumField(); i++ {
fieldName := thingTyp.Field(i).Name
if strings.HasPrefix(strings.ToLower(fieldName), "lon") {
if thingVal.Field(i).CanInterface() {
fieldVal := thingVal.Field(i).Interface()
lon, foundLon = extractNumericVal(fieldVal)
}
}
if strings.HasPrefix(strings.ToLower(fieldName), "lng") {
if thingVal.Field(i).CanInterface() {
fieldVal := thingVal.Field(i).Interface()
lon, foundLon = extractNumericVal(fieldVal)
}
}
if strings.HasPrefix(strings.ToLower(fieldName), "lat") {
if thingVal.Field(i).CanInterface() {
fieldVal := thingVal.Field(i).Interface()
lat, foundLat = extractNumericVal(fieldVal)
}
}
}
}
// last hope, some interfaces
// lon
if l, ok := thing.(loner); ok {
lon = l.Lon()
foundLon = true
} else if l, ok := thing.(lnger); ok {
lon = l.Lng()
foundLon = true
}
// lat
if l, ok := thing.(later); ok {
lat = l.Lat()
foundLat = true
}
return lon, lat, foundLon && foundLat
}
// extract numeric value (if possible) and returns a float64
func extractNumericVal(v interface{}) (float64, bool) {
val := reflect.ValueOf(v)
if !val.IsValid() {
return 0, false
}
typ := val.Type()
switch typ.Kind() {
case reflect.Float32, reflect.Float64:
return val.Float(), true
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
return float64(val.Int()), true
case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
return float64(val.Uint()), true
}
return 0, false
}
// various support interfaces which can be used to find lat/lon
type loner interface {
Lon() float64
}
type later interface {
Lat() float64
}
type lnger interface {
Lng() float64
}
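
A sketch of the three most common input shapes, with the differing coordinate orders called out (the values are arbitrary):

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/geo"
)

func main() {
	// map form: keys lat plus lon or lng
	lon, lat, ok := geo.ExtractGeoPoint(map[string]interface{}{
		"lat": 48.8566, "lng": 2.3522,
	})
	fmt.Println(lon, lat, ok) // 2.3522 48.8566 true

	// string form: "lat,lon" (whitespace is trimmed)
	lon, lat, ok = geo.ExtractGeoPoint("48.8566, 2.3522")
	fmt.Println(lon, lat, ok)

	// slice form (GeoJSON): lon first
	lon, lat, ok = geo.ExtractGeoPoint([]interface{}{2.3522, 48.8566})
	fmt.Println(lon, lat, ok)
}
```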

212 vendor/github.com/blevesearch/bleve/geo/sloppy.go generated vendored Normal file
View File

@ -0,0 +1,212 @@
// Copyright (c) 2017 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package geo
import (
"math"
)
var earthDiameterPerLatitude []float64
var sinTab []float64
var cosTab []float64
var asinTab []float64
var asinDer1DivF1Tab []float64
var asinDer2DivF2Tab []float64
var asinDer3DivF3Tab []float64
var asinDer4DivF4Tab []float64
const radiusTabsSize = (1 << 10) + 1
const radiusDelta = (math.Pi / 2) / (radiusTabsSize - 1)
const radiusIndexer = 1 / radiusDelta
const sinCosTabsSize = (1 << 11) + 1
const asinTabsSize = (1 << 13) + 1
const oneDivF2 = 1 / 2.0
const oneDivF3 = 1 / 6.0
const oneDivF4 = 1 / 24.0
// 1.57079632673412561417e+00 first 33 bits of pi/2
var pio2Hi = math.Float64frombits(0x3FF921FB54400000)
// 6.07710050650619224932e-11 pi/2 - PIO2_HI
var pio2Lo = math.Float64frombits(0x3DD0B4611A626331)
var asinPio2Hi = math.Float64frombits(0x3FF921FB54442D18) // 1.57079632679489655800e+00
var asinPio2Lo = math.Float64frombits(0x3C91A62633145C07) // 6.12323399573676603587e-17
var asinPs0 = math.Float64frombits(0x3fc5555555555555) // 1.66666666666666657415e-01
var asinPs1 = math.Float64frombits(0xbfd4d61203eb6f7d) // -3.25565818622400915405e-01
var asinPs2 = math.Float64frombits(0x3fc9c1550e884455) // 2.01212532134862925881e-01
var asinPs3 = math.Float64frombits(0xbfa48228b5688f3b) // -4.00555345006794114027e-02
var asinPs4 = math.Float64frombits(0x3f49efe07501b288) // 7.91534994289814532176e-04
var asinPs5 = math.Float64frombits(0x3f023de10dfdf709) // 3.47933107596021167570e-05
var asinQs1 = math.Float64frombits(0xc0033a271c8a2d4b) // -2.40339491173441421878e+00
var asinQs2 = math.Float64frombits(0x40002ae59c598ac8) // 2.02094576023350569471e+00
var asinQs3 = math.Float64frombits(0xbfe6066c1b8d0159) // -6.88283971605453293030e-01
var asinQs4 = math.Float64frombits(0x3fb3b8c5b12e9282) // 7.70381505559019352791e-02
var twoPiHi = 4 * pio2Hi
var twoPiLo = 4 * pio2Lo
var sinCosDeltaHi = twoPiHi / (sinCosTabsSize - 1)
var sinCosDeltaLo = twoPiLo / (sinCosTabsSize - 1)
var sinCosIndexer = 1 / (sinCosDeltaHi + sinCosDeltaLo)
var sinCosMaxValueForIntModulo = ((math.MaxInt64 >> 9) / sinCosIndexer) * 0.99
var asinMaxValueForTabs = math.Sin(73.0 * degreesToRadian)
var asinDelta = asinMaxValueForTabs / (asinTabsSize - 1)
var asinIndexer = 1 / asinDelta
func init() {
// initializes the tables used for the sloppy math functions
// sin and cos
sinTab = make([]float64, sinCosTabsSize)
cosTab = make([]float64, sinCosTabsSize)
sinCosPiIndex := (sinCosTabsSize - 1) / 2
sinCosPiMul2Index := 2 * sinCosPiIndex
sinCosPiMul05Index := sinCosPiIndex / 2
sinCosPiMul15Index := 3 * sinCosPiIndex / 2
for i := 0; i < sinCosTabsSize; i++ {
// angle: in [0,2*PI].
angle := float64(i)*sinCosDeltaHi + float64(i)*sinCosDeltaLo
sinAngle := math.Sin(angle)
cosAngle := math.Cos(angle)
// For indexes corresponding to null cosine or sine, we make sure the value is zero
// and not an epsilon. This allows for a much better accuracy for results close to zero.
if i == sinCosPiIndex {
sinAngle = 0.0
} else if i == sinCosPiMul2Index {
sinAngle = 0.0
} else if i == sinCosPiMul05Index {
sinAngle = 0.0
} else if i == sinCosPiMul15Index {
sinAngle = 0.0
}
sinTab[i] = sinAngle
cosTab[i] = cosAngle
}
// asin
asinTab = make([]float64, asinTabsSize)
asinDer1DivF1Tab = make([]float64, asinTabsSize)
asinDer2DivF2Tab = make([]float64, asinTabsSize)
asinDer3DivF3Tab = make([]float64, asinTabsSize)
asinDer4DivF4Tab = make([]float64, asinTabsSize)
for i := 0; i < asinTabsSize; i++ {
// x: in [0,ASIN_MAX_VALUE_FOR_TABS].
x := float64(i) * asinDelta
asinTab[i] = math.Asin(x)
oneMinusXSqInv := 1.0 / (1 - x*x)
oneMinusXSqInv05 := math.Sqrt(oneMinusXSqInv)
oneMinusXSqInv15 := oneMinusXSqInv05 * oneMinusXSqInv
oneMinusXSqInv25 := oneMinusXSqInv15 * oneMinusXSqInv
oneMinusXSqInv35 := oneMinusXSqInv25 * oneMinusXSqInv
asinDer1DivF1Tab[i] = oneMinusXSqInv05
asinDer2DivF2Tab[i] = (x * oneMinusXSqInv15) * oneDivF2
asinDer3DivF3Tab[i] = ((1 + 2*x*x) * oneMinusXSqInv25) * oneDivF3
asinDer4DivF4Tab[i] = ((5 + 2*x*(2+x*(5-2*x))) * oneMinusXSqInv35) * oneDivF4
}
// earth radius
a := 6378137.0
b := 6356752.31420
a2 := a * a
b2 := b * b
earthDiameterPerLatitude = make([]float64, radiusTabsSize)
earthDiameterPerLatitude[0] = 2.0 * a / 1000
earthDiameterPerLatitude[radiusTabsSize-1] = 2.0 * b / 1000
for i := 1; i < radiusTabsSize-1; i++ {
lat := math.Pi * float64(i) / (2*radiusTabsSize - 1)
one := math.Pow(a2*math.Cos(lat), 2)
two := math.Pow(b2*math.Sin(lat), 2)
three := math.Pow(float64(a)*math.Cos(lat), 2)
four := math.Pow(b*math.Sin(lat), 2)
radius := math.Sqrt((one + two) / (three + four))
earthDiameterPerLatitude[i] = 2 * radius / 1000
}
}
// earthDiameter returns an estimation of the earth's diameter at the specified
// latitude in kilometers
func earthDiameter(lat float64) float64 {
index := math.Mod(math.Abs(lat)*radiusIndexer+0.5, float64(len(earthDiameterPerLatitude)))
if math.IsNaN(index) {
return 0
}
return earthDiameterPerLatitude[int(index)]
}
var pio2 = math.Pi / 2
func sin(a float64) float64 {
return cos(a - pio2)
}
// cos is a sloppy math (faster) implementation of math.Cos
func cos(a float64) float64 {
if a < 0.0 {
a = -a
}
if a > sinCosMaxValueForIntModulo {
return math.Cos(a)
}
// index: possibly outside tables range.
index := int(a*sinCosIndexer + 0.5)
delta := (a - float64(index)*sinCosDeltaHi) - float64(index)*sinCosDeltaLo
// Making sure index is within tables range.
	// Last value of each table is the same as the first, so we ignore it (tabs size minus one) for modulo.
index &= (sinCosTabsSize - 2) // index % (SIN_COS_TABS_SIZE-1)
indexCos := cosTab[index]
indexSin := sinTab[index]
return indexCos + delta*(-indexSin+delta*(-indexCos*oneDivF2+delta*(indexSin*oneDivF3+delta*indexCos*oneDivF4)))
}
// asin is a sloppy math (faster) implementation of math.Asin
func asin(a float64) float64 {
var negateResult bool
if a < 0 {
a = -a
negateResult = true
}
if a <= asinMaxValueForTabs {
index := int(a*asinIndexer + 0.5)
delta := a - float64(index)*asinDelta
result := asinTab[index] + delta*(asinDer1DivF1Tab[index]+delta*(asinDer2DivF2Tab[index]+delta*(asinDer3DivF3Tab[index]+delta*asinDer4DivF4Tab[index])))
if negateResult {
return -result
}
return result
}
// value > ASIN_MAX_VALUE_FOR_TABS, or value is NaN
// This part is derived from fdlibm.
if a < 1 {
t := (1.0 - a) * 0.5
		p := t * (asinPs0 + t*(asinPs1+t*(asinPs2+t*(asinPs3+t*(asinPs4+t*asinPs5)))))
q := 1.0 + t*(asinQs1+t*(asinQs2+t*(asinQs3+t*asinQs4)))
s := math.Sqrt(t)
z := s + s*(p/q)
result := asinPio2Hi - ((z + z) - asinPio2Lo)
if negateResult {
return -result
}
return result
}
// value >= 1.0, or value is NaN
if a == 1.0 {
if negateResult {
return -math.Pi / 2
}
return math.Pi / 2
}
return math.NaN()
}
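
Because these helpers are unexported, an accuracy check has to live inside the geo package itself. A hedged sketch of such a test, with loosely chosen tolerances (the table-based paths should be far more accurate than this):

```go
package geo

import (
	"math"
	"testing"
)

// TestSloppyTrigAccuracy spot-checks the table-based cos and asin
// against the standard library over their fast-path ranges.
func TestSloppyTrigAccuracy(t *testing.T) {
	for a := 0.0; a < 2*math.Pi; a += 0.001 {
		if d := math.Abs(cos(a) - math.Cos(a)); d > 1e-9 {
			t.Fatalf("cos(%v) off by %g", a, d)
		}
	}
	for x := 0.0; x < 1.0; x += 0.001 {
		if d := math.Abs(asin(x) - math.Asin(x)); d > 1e-9 {
			t.Fatalf("asin(%v) off by %g", x, d)
		}
	}
}
```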

110 vendor/github.com/blevesearch/bleve/index/analysis.go generated vendored Normal file
View File

@ -0,0 +1,110 @@
// Copyright (c) 2015 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package index
import (
"reflect"
"github.com/blevesearch/bleve/analysis"
"github.com/blevesearch/bleve/document"
"github.com/blevesearch/bleve/size"
)
var reflectStaticSizeAnalysisResult int
func init() {
var ar AnalysisResult
reflectStaticSizeAnalysisResult = int(reflect.TypeOf(ar).Size())
}
type IndexRow interface {
KeySize() int
KeyTo([]byte) (int, error)
Key() []byte
ValueSize() int
ValueTo([]byte) (int, error)
Value() []byte
}
type AnalysisResult struct {
DocID string
Rows []IndexRow
// scorch
Document *document.Document
Analyzed []analysis.TokenFrequencies
Length []int
}
func (a *AnalysisResult) Size() int {
rv := reflectStaticSizeAnalysisResult
for _, analyzedI := range a.Analyzed {
rv += analyzedI.Size()
}
rv += len(a.Length) * size.SizeOfInt
return rv
}
type AnalysisWork struct {
i Index
d *document.Document
rc chan *AnalysisResult
}
func NewAnalysisWork(i Index, d *document.Document, rc chan *AnalysisResult) *AnalysisWork {
return &AnalysisWork{
i: i,
d: d,
rc: rc,
}
}
type AnalysisQueue struct {
queue chan *AnalysisWork
done chan struct{}
}
func (q *AnalysisQueue) Queue(work *AnalysisWork) {
q.queue <- work
}
func (q *AnalysisQueue) Close() {
close(q.done)
}
func NewAnalysisQueue(numWorkers int) *AnalysisQueue {
rv := AnalysisQueue{
queue: make(chan *AnalysisWork),
done: make(chan struct{}),
}
for i := 0; i < numWorkers; i++ {
go AnalysisWorker(rv)
}
return &rv
}
func AnalysisWorker(q AnalysisQueue) {
// read work off the queue
for {
select {
case <-q.done:
return
case w := <-q.queue:
r := w.i.Analyze(w.d)
w.rc <- r
}
}
}
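
A minimal sketch of driving the worker pool; `idx` and `doc` are placeholders assumed to come from elsewhere (any index.Index implementation and a populated document):

```go
package sketch

import (
	"github.com/blevesearch/bleve/document"
	"github.com/blevesearch/bleve/index"
)

// analyzeOne queues a single document and waits for its analysis result.
func analyzeOne(idx index.Index, doc *document.Document) *index.AnalysisResult {
	queue := index.NewAnalysisQueue(4) // four worker goroutines
	defer queue.Close()

	rc := make(chan *index.AnalysisResult)
	queue.Queue(index.NewAnalysisWork(idx, doc, rc))
	return <-rc
}
```

In practice a single long-lived queue is shared across many documents; creating one per document, as above, is only for illustration.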

View File

@ -0,0 +1,88 @@
// Copyright (c) 2015 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package index
import (
"sync"
)
type FieldCache struct {
fieldIndexes map[string]uint16
indexFields []string
lastFieldIndex int
mutex sync.RWMutex
}
func NewFieldCache() *FieldCache {
return &FieldCache{
fieldIndexes: make(map[string]uint16),
lastFieldIndex: -1,
}
}
func (f *FieldCache) AddExisting(field string, index uint16) {
f.mutex.Lock()
f.addLOCKED(field, index)
f.mutex.Unlock()
}
func (f *FieldCache) addLOCKED(field string, index uint16) uint16 {
f.fieldIndexes[field] = index
if len(f.indexFields) < int(index)+1 {
prevIndexFields := f.indexFields
f.indexFields = make([]string, int(index)+16)
copy(f.indexFields, prevIndexFields)
}
f.indexFields[int(index)] = field
if int(index) > f.lastFieldIndex {
f.lastFieldIndex = int(index)
}
return index
}
// FieldNamed returns the index of the field, and whether or not it existed
// before this call. If createIfMissing is true, a new field index is assigned,
// but the second return value will still be false.
func (f *FieldCache) FieldNamed(field string, createIfMissing bool) (uint16, bool) {
f.mutex.RLock()
if index, ok := f.fieldIndexes[field]; ok {
f.mutex.RUnlock()
return index, true
} else if !createIfMissing {
f.mutex.RUnlock()
return 0, false
}
// trade read lock for write lock
f.mutex.RUnlock()
f.mutex.Lock()
// need to check again with write lock
if index, ok := f.fieldIndexes[field]; ok {
f.mutex.Unlock()
return index, true
}
// assign next field id
index := f.addLOCKED(field, uint16(f.lastFieldIndex+1))
f.mutex.Unlock()
return index, false
}
func (f *FieldCache) FieldIndexed(index uint16) (field string) {
f.mutex.RLock()
if int(index) < len(f.indexFields) {
field = f.indexFields[int(index)]
}
f.mutex.RUnlock()
return field
}
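
A quick sketch of the cache's contract; the field name is illustrative:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/index"
)

func main() {
	cache := index.NewFieldCache()

	id, existed := cache.FieldNamed("title", true)
	fmt.Println(id, existed) // 0 false: a new index was assigned

	id, existed = cache.FieldNamed("title", true)
	fmt.Println(id, existed) // 0 true: served from the cache

	fmt.Println(cache.FieldIndexed(id)) // title
}
```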

369 vendor/github.com/blevesearch/bleve/index/index.go generated vendored Normal file
View File

@ -0,0 +1,369 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package index
import (
"bytes"
"encoding/json"
"fmt"
"reflect"
"github.com/blevesearch/bleve/document"
"github.com/blevesearch/bleve/index/store"
"github.com/blevesearch/bleve/size"
)
var reflectStaticSizeTermFieldDoc int
var reflectStaticSizeTermFieldVector int
func init() {
var tfd TermFieldDoc
reflectStaticSizeTermFieldDoc = int(reflect.TypeOf(tfd).Size())
var tfv TermFieldVector
reflectStaticSizeTermFieldVector = int(reflect.TypeOf(tfv).Size())
}
var ErrorUnknownStorageType = fmt.Errorf("unknown storage type")
type Index interface {
Open() error
Close() error
Update(doc *document.Document) error
Delete(id string) error
Batch(batch *Batch) error
SetInternal(key, val []byte) error
DeleteInternal(key []byte) error
// Reader returns a low-level accessor on the index data. Close it to
// release associated resources.
Reader() (IndexReader, error)
Stats() json.Marshaler
StatsMap() map[string]interface{}
Analyze(d *document.Document) *AnalysisResult
Advanced() (store.KVStore, error)
}
type DocumentFieldTermVisitor func(field string, term []byte)
type IndexReader interface {
TermFieldReader(term []byte, field string, includeFreq, includeNorm, includeTermVectors bool) (TermFieldReader, error)
// DocIDReader returns an iterator over all doc ids
	// The caller must close the returned instance to release associated resources.
DocIDReaderAll() (DocIDReader, error)
DocIDReaderOnly(ids []string) (DocIDReader, error)
FieldDict(field string) (FieldDict, error)
// FieldDictRange is currently defined to include the start and end terms
FieldDictRange(field string, startTerm []byte, endTerm []byte) (FieldDict, error)
FieldDictPrefix(field string, termPrefix []byte) (FieldDict, error)
Document(id string) (*document.Document, error)
DocumentVisitFieldTerms(id IndexInternalID, fields []string, visitor DocumentFieldTermVisitor) error
DocValueReader(fields []string) (DocValueReader, error)
Fields() ([]string, error)
GetInternal(key []byte) ([]byte, error)
DocCount() (uint64, error)
ExternalID(id IndexInternalID) (string, error)
InternalID(id string) (IndexInternalID, error)
DumpAll() chan interface{}
DumpDoc(id string) chan interface{}
DumpFields() chan interface{}
Close() error
}
// The Regexp interface defines the subset of the regexp.Regexp API
// methods that are used by bleve indexes, allowing callers to pass in
// alternate implementations.
type Regexp interface {
FindStringIndex(s string) (loc []int)
LiteralPrefix() (prefix string, complete bool)
String() string
}
type IndexReaderRegexp interface {
FieldDictRegexp(field string, regex string) (FieldDict, error)
}
type IndexReaderFuzzy interface {
FieldDictFuzzy(field string, term string, fuzziness int, prefix string) (FieldDict, error)
}
type IndexReaderOnly interface {
FieldDictOnly(field string, onlyTerms [][]byte, includeCount bool) (FieldDict, error)
}
type IndexReaderContains interface {
FieldDictContains(field string) (FieldDictContains, error)
}
// FieldTerms contains the terms used by a document, keyed by field
type FieldTerms map[string][]string
// FieldsNotYetCached returns a list of fields not yet cached out of a larger list of fields
func (f FieldTerms) FieldsNotYetCached(fields []string) []string {
rv := make([]string, 0, len(fields))
for _, field := range fields {
if _, ok := f[field]; !ok {
rv = append(rv, field)
}
}
return rv
}
// Merge combines two FieldTerms. It assumes the term lists are complete
// (and thus do not need to be merged); field terms from the other list
// always replace the ones in the receiver.
func (f FieldTerms) Merge(other FieldTerms) {
for field, terms := range other {
f[field] = terms
}
}
type TermFieldVector struct {
Field string
ArrayPositions []uint64
Pos uint64
Start uint64
End uint64
}
func (tfv *TermFieldVector) Size() int {
return reflectStaticSizeTermFieldVector + size.SizeOfPtr +
len(tfv.Field) + len(tfv.ArrayPositions)*size.SizeOfUint64
}
// IndexInternalID is an opaque document identifier internal to the index impl
type IndexInternalID []byte
func (id IndexInternalID) Equals(other IndexInternalID) bool {
return id.Compare(other) == 0
}
func (id IndexInternalID) Compare(other IndexInternalID) int {
return bytes.Compare(id, other)
}
type TermFieldDoc struct {
Term string
ID IndexInternalID
Freq uint64
Norm float64
Vectors []*TermFieldVector
}
func (tfd *TermFieldDoc) Size() int {
sizeInBytes := reflectStaticSizeTermFieldDoc + size.SizeOfPtr +
len(tfd.Term) + len(tfd.ID)
for _, entry := range tfd.Vectors {
sizeInBytes += entry.Size()
}
return sizeInBytes
}
// Reset allows an already allocated TermFieldDoc to be reused
func (tfd *TermFieldDoc) Reset() *TermFieldDoc {
// remember the []byte used for the ID
id := tfd.ID
vectors := tfd.Vectors
// idiom to copy over from empty TermFieldDoc (0 allocations)
*tfd = TermFieldDoc{}
// reuse the []byte already allocated (and reset len to 0)
tfd.ID = id[:0]
tfd.Vectors = vectors[:0]
return tfd
}
// TermFieldReader is the interface exposing the enumeration of documents
// containing a given term in a given field. Documents are returned in byte
// lexicographic order over their identifiers.
type TermFieldReader interface {
// Next returns the next document containing the term in this field, or nil
// when it reaches the end of the enumeration. The preAlloced TermFieldDoc
// is optional, and when non-nil, will be used instead of allocating memory.
Next(preAlloced *TermFieldDoc) (*TermFieldDoc, error)
	// Advance resets the enumeration at the specified document or its immediate
// follower.
Advance(ID IndexInternalID, preAlloced *TermFieldDoc) (*TermFieldDoc, error)
	// Count returns the number of documents that contain the term in this field.
Count() uint64
Close() error
Size() int
}
type DictEntry struct {
Term string
Count uint64
}
type FieldDict interface {
Next() (*DictEntry, error)
Close() error
}
type FieldDictContains interface {
Contains(key []byte) (bool, error)
}
// DocIDReader is the interface exposing enumeration of documents identifiers.
// Close the reader to release associated resources.
type DocIDReader interface {
// Next returns the next document internal identifier in the natural
// index order, nil when the end of the sequence is reached.
Next() (IndexInternalID, error)
// Advance resets the iteration to the first internal identifier greater than
// or equal to ID. If ID is smaller than the start of the range, the iteration
// will start there instead. If ID is greater than or equal to the end of
// the range, Next() call will return io.EOF.
Advance(ID IndexInternalID) (IndexInternalID, error)
Size() int
Close() error
}
type BatchCallback func(error)
type Batch struct {
IndexOps map[string]*document.Document
InternalOps map[string][]byte
persistedCallback BatchCallback
}
func NewBatch() *Batch {
return &Batch{
IndexOps: make(map[string]*document.Document),
InternalOps: make(map[string][]byte),
}
}
func (b *Batch) Update(doc *document.Document) {
b.IndexOps[doc.ID] = doc
}
func (b *Batch) Delete(id string) {
b.IndexOps[id] = nil
}
func (b *Batch) SetInternal(key, val []byte) {
b.InternalOps[string(key)] = val
}
func (b *Batch) DeleteInternal(key []byte) {
b.InternalOps[string(key)] = nil
}
func (b *Batch) SetPersistedCallback(f BatchCallback) {
b.persistedCallback = f
}
func (b *Batch) PersistedCallback() BatchCallback {
return b.persistedCallback
}
func (b *Batch) String() string {
rv := fmt.Sprintf("Batch (%d ops, %d internal ops)\n", len(b.IndexOps), len(b.InternalOps))
for k, v := range b.IndexOps {
if v != nil {
rv += fmt.Sprintf("\tINDEX - '%s'\n", k)
} else {
rv += fmt.Sprintf("\tDELETE - '%s'\n", k)
}
}
for k, v := range b.InternalOps {
if v != nil {
rv += fmt.Sprintf("\tSET INTERNAL - '%s'\n", k)
} else {
rv += fmt.Sprintf("\tDELETE INTERNAL - '%s'\n", k)
}
}
return rv
}
func (b *Batch) Reset() {
b.IndexOps = make(map[string]*document.Document)
b.InternalOps = make(map[string][]byte)
b.persistedCallback = nil
}
func (b *Batch) Merge(o *Batch) {
for k, v := range o.IndexOps {
b.IndexOps[k] = v
}
for k, v := range o.InternalOps {
b.InternalOps[k] = v
}
}
func (b *Batch) TotalDocSize() int {
var s int
for k, v := range b.IndexOps {
if v != nil {
s += v.Size() + size.SizeOfString
}
s += len(k)
}
return s
}
// Optimizable represents an optional interface that can be implemented by
// optimizable resources (e.g., TermFieldReaders, Searchers). These
// optimizable resources are provided the same OptimizableContext
// instance, so that they can coordinate via dynamic interface
// casting.
type Optimizable interface {
Optimize(kind string, octx OptimizableContext) (OptimizableContext, error)
}
// Represents a result of optimization -- see the Finish() method.
type Optimized interface{}
type OptimizableContext interface {
	// Once all the optimizable resources have been provided the same
// OptimizableContext instance, the optimization preparations are
// finished or completed via the Finish() method.
//
// Depending on the optimization being performed, the Finish()
// method might return a non-nil Optimized instance. For example,
// the Optimized instance might represent an optimized
// TermFieldReader instance.
Finish() (Optimized, error)
}
type DocValueReader interface {
VisitDocValues(id IndexInternalID, visitor DocumentFieldTermVisitor) error
}
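
A minimal sketch of building a batch; document.NewDocument is the document package's constructor (not shown in this diff), and the IDs are illustrative:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/document"
	"github.com/blevesearch/bleve/index"
)

func main() {
	batch := index.NewBatch()

	batch.Update(document.NewDocument("doc-1")) // index or re-index
	batch.Delete("doc-2")                       // recorded as a nil value
	batch.SetInternal([]byte("checkpoint"), []byte("42"))

	fmt.Print(batch.String())
	// Batch (2 ops, 1 internal ops)
	// followed by one line per op (map iteration order varies)
}
```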

View File

@ -0,0 +1,62 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package store
type op struct {
K []byte
V []byte
}
type EmulatedBatch struct {
Ops []*op
Merger *EmulatedMerge
}
func NewEmulatedBatch(mo MergeOperator) *EmulatedBatch {
return &EmulatedBatch{
Ops: make([]*op, 0, 1000),
Merger: NewEmulatedMerge(mo),
}
}
func (b *EmulatedBatch) Set(key, val []byte) {
ck := make([]byte, len(key))
copy(ck, key)
cv := make([]byte, len(val))
copy(cv, val)
b.Ops = append(b.Ops, &op{ck, cv})
}
func (b *EmulatedBatch) Delete(key []byte) {
ck := make([]byte, len(key))
copy(ck, key)
b.Ops = append(b.Ops, &op{ck, nil})
}
func (b *EmulatedBatch) Merge(key, val []byte) {
ck := make([]byte, len(key))
copy(ck, key)
cv := make([]byte, len(val))
copy(cv, val)
// pass the defensive copies, not the caller's slices, so the caller
// may reuse key and val as soon as this call returns
b.Merger.Merge(ck, cv)
}
func (b *EmulatedBatch) Reset() {
b.Ops = b.Ops[:0]
}
func (b *EmulatedBatch) Close() error {
return nil
}
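
A minimal usage sketch of EmulatedBatch collecting operations; a nil MergeOperator is acceptable as long as Merge is never called on the batch:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/index/store"
)

func main() {
	b := store.NewEmulatedBatch(nil)  // nil MergeOperator: fine if Merge is unused
	b.Set([]byte("k1"), []byte("v1")) // key and value are copied defensively
	b.Delete([]byte("k2"))            // a delete is an op with a nil value
	fmt.Println(len(b.Ops))           // 2
}
```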


@ -0,0 +1,174 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package store
import "encoding/json"
// KVStore is an abstraction for working with KV stores. Note that
// in order to be used with the bleve.registry, it must also implement
// a constructor function of the registry.KVStoreConstructor type.
type KVStore interface {
// Writer returns a KVWriter which can be used to
// make changes to the KVStore. If a writer cannot
// be obtained a non-nil error is returned.
Writer() (KVWriter, error)
// Reader returns a KVReader which can be used to
// read data from the KVStore. If a reader cannot
// be obtained a non-nil error is returned.
Reader() (KVReader, error)
// Close closes the KVStore
Close() error
}
// KVReader is an abstraction of an **ISOLATED** reader
// In this context isolated is defined to mean that
// writes/deletes made after the KVReader is opened
// are not observed.
// Because there is usually a cost associated with
// keeping isolated readers active, users should
// close them as soon as they are no longer needed.
type KVReader interface {
// Get returns the value associated with the key
// If the key does not exist, nil is returned.
// The caller owns the bytes returned.
Get(key []byte) ([]byte, error)
// MultiGet retrieves multiple values in one call.
MultiGet(keys [][]byte) ([][]byte, error)
// PrefixIterator returns a KVIterator that will
// visit all K/V pairs with the provided prefix
PrefixIterator(prefix []byte) KVIterator
// RangeIterator returns a KVIterator that will
// visit all K/V pairs >= start AND < end
RangeIterator(start, end []byte) KVIterator
// Close closes the iterator
Close() error
}
// KVIterator is an abstraction around key iteration
type KVIterator interface {
// Seek will advance the iterator to the specified key
Seek(key []byte)
// Next will advance the iterator to the next key
Next()
// Key returns the key pointed to by the iterator
// The bytes returned are **ONLY** valid until the next call to Seek/Next/Close
// Continued use after that requires that they be copied.
Key() []byte
// Value returns the value pointed to by the iterator
// The bytes returned are **ONLY** valid until the next call to Seek/Next/Close
// Continued use after that requires that they be copied.
Value() []byte
// Valid returns whether or not the iterator is in a valid state
Valid() bool
// Current returns Key(),Value(),Valid() in a single operation
Current() ([]byte, []byte, bool)
// Close closes the iterator
Close() error
}
// KVWriter is an abstraction for mutating the KVStore.
// KVWriter does **NOT** enforce the restriction of a single writer;
// if the underlying KVStore allows concurrent writes, the
// KVWriter interface should also do so, and it is up to the caller
// to do this in a way that is safe and makes sense
type KVWriter interface {
// NewBatch returns a KVBatch for performing batch operations on this kvstore
NewBatch() KVBatch
// NewBatchEx returns a KVBatch and an associated byte array
// that's pre-sized based on the KVBatchOptions. The caller can
// use the returned byte array for keys and values associated with
// the batch. Once the batch is either executed or closed, the
// associated byte array should no longer be accessed by the
// caller.
NewBatchEx(KVBatchOptions) ([]byte, KVBatch, error)
// ExecuteBatch will execute the KVBatch, the provided KVBatch **MUST** have
// been created by the same KVStore (though not necessarily the same KVWriter)
// Batch execution is atomic, either all the operations or none will be performed
ExecuteBatch(batch KVBatch) error
// Close closes the writer
Close() error
}
// KVBatchOptions provides the KVWriter.NewBatchEx() method with batch
// preparation and preallocation information.
type KVBatchOptions struct {
// TotalBytes is the sum of key and value bytes needed by the
// caller for the entire batch. It affects the size of the
// returned byte array of KVWriter.NewBatchEx().
TotalBytes int
// NumSets is the number of Set() calls the caller will invoke on
// the KVBatch.
NumSets int
// NumDeletes is the number of Delete() calls the caller will invoke
// on the KVBatch.
NumDeletes int
// NumMerges is the number of Merge() calls the caller will invoke
// on the KVBatch.
NumMerges int
}
// KVBatch is an abstraction for making multiple KV mutations at once
type KVBatch interface {
// Set updates the key with the specified value
// both key and value []byte may be reused as soon as this call returns
Set(key, val []byte)
// Delete removes the specified key
// the key []byte may be reused as soon as this call returns
Delete(key []byte)
// Merge merges old value with the new value at the specified key
// as prescribed by the KVStore's merge operator
// both key and value []byte may be reused as soon as this call returns
Merge(key, val []byte)
// Reset frees resources for this batch and allows reuse
Reset()
// Close frees resources
Close() error
}
// KVStoreStats is an optional interface that KVStores can implement
// if they're able to report any useful stats
type KVStoreStats interface {
// Stats returns a JSON serializable object representing stats for this KVStore
Stats() json.Marshaler
StatsMap() map[string]interface{}
}
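
Putting the KVStore, KVReader and KVIterator contracts together, a hedged sketch of collecting every key under a prefix (keysWithPrefix is an illustrative name; note the copy, since iterator bytes are only valid until the next Seek/Next/Close):

```go
package example

import "github.com/blevesearch/bleve/index/store"

func keysWithPrefix(s store.KVStore, prefix []byte) ([][]byte, error) {
	reader, err := s.Reader()
	if err != nil {
		return nil, err
	}
	defer reader.Close()

	it := reader.PrefixIterator(prefix)
	defer it.Close()

	var keys [][]byte
	for it.Valid() {
		k := it.Key()
		kc := make([]byte, len(k)) // copy: iterator bytes die on next Next/Close
		copy(kc, k)
		keys = append(keys, kc)
		it.Next()
	}
	return keys, nil
}
```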


@ -0,0 +1,64 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package store
// At the moment this happens to be the same interface as described by
// RocksDB, but this may not always be the case.
type MergeOperator interface {
// FullMerge the full sequence of operands on top of the existingValue
// if no value currently exists, existingValue is nil
// return the merged value, and success/failure
FullMerge(key, existingValue []byte, operands [][]byte) ([]byte, bool)
// Partially merge these two operands.
// If partial merge cannot be done, return nil,false, which will defer
// all processing until the FullMerge is done.
PartialMerge(key, leftOperand, rightOperand []byte) ([]byte, bool)
// Name returns an identifier for the operator
Name() string
}
type EmulatedMerge struct {
Merges map[string][][]byte
mo MergeOperator
}
func NewEmulatedMerge(mo MergeOperator) *EmulatedMerge {
return &EmulatedMerge{
Merges: make(map[string][][]byte),
mo: mo,
}
}
func (m *EmulatedMerge) Merge(key, val []byte) {
ops, ok := m.Merges[string(key)]
if ok && len(ops) > 0 {
last := ops[len(ops)-1]
mergedVal, partialMergeOk := m.mo.PartialMerge(key, last, val)
if partialMergeOk {
// replace last entry with the result of the merge
ops[len(ops)-1] = mergedVal
} else {
// could not partial merge, append this to the end
ops = append(ops, val)
}
} else {
ops = [][]byte{val}
}
m.Merges[string(key)] = ops
}
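
A toy MergeOperator satisfying the interface above — it simply concatenates operands onto the existing value. Purely illustrative, not a bleve-provided operator:

```go
package example

import (
	"bytes"

	"github.com/blevesearch/bleve/index/store"
)

// appendOperator concatenates operands onto the existing value.
type appendOperator struct{}

var _ store.MergeOperator = appendOperator{}

func (appendOperator) FullMerge(key, existingValue []byte, operands [][]byte) ([]byte, bool) {
	return bytes.Join(append([][]byte{existingValue}, operands...), nil), true
}

func (appendOperator) PartialMerge(key, leftOperand, rightOperand []byte) ([]byte, bool) {
	return append(append([]byte{}, leftOperand...), rightOperand...), true
}

func (appendOperator) Name() string { return "append" }
```

Handed to NewEmulatedMerge, two Merge calls on the same key then collapse into a single pending operand via PartialMerge.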


@ -0,0 +1,33 @@
// Copyright (c) 2016 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package store
// MultiGet is a helper function to retrieve multiple keys from a
// KVReader, and might be used by KVStore implementations that don't
// have a native multi-get facility.
func MultiGet(kvreader KVReader, keys [][]byte) ([][]byte, error) {
vals := make([][]byte, len(keys))
for i, key := range keys {
val, err := kvreader.Get(key)
if err != nil {
return nil, err
}
vals[i] = val
}
return vals, nil
}

43
vendor/github.com/blevesearch/bleve/numeric/bin.go generated vendored Normal file

@ -0,0 +1,43 @@
package numeric
var interleaveMagic = []uint64{
0x5555555555555555,
0x3333333333333333,
0x0F0F0F0F0F0F0F0F,
0x00FF00FF00FF00FF,
0x0000FFFF0000FFFF,
0x00000000FFFFFFFF,
0xAAAAAAAAAAAAAAAA,
}
var interleaveShift = []uint{1, 2, 4, 8, 16}
// Interleave the first 32 bits of each uint64
// adapted from org.apache.lucene.util.BitUtil
// which was adapted from:
// http://graphics.stanford.edu/~seander/bithacks.html#InterleaveBMN
func Interleave(v1, v2 uint64) uint64 {
v1 = (v1 | (v1 << interleaveShift[4])) & interleaveMagic[4]
v1 = (v1 | (v1 << interleaveShift[3])) & interleaveMagic[3]
v1 = (v1 | (v1 << interleaveShift[2])) & interleaveMagic[2]
v1 = (v1 | (v1 << interleaveShift[1])) & interleaveMagic[1]
v1 = (v1 | (v1 << interleaveShift[0])) & interleaveMagic[0]
v2 = (v2 | (v2 << interleaveShift[4])) & interleaveMagic[4]
v2 = (v2 | (v2 << interleaveShift[3])) & interleaveMagic[3]
v2 = (v2 | (v2 << interleaveShift[2])) & interleaveMagic[2]
v2 = (v2 | (v2 << interleaveShift[1])) & interleaveMagic[1]
v2 = (v2 | (v2 << interleaveShift[0])) & interleaveMagic[0]
return (v2 << 1) | v1
}
// Deinterleave returns the 32-bit value stored in the even bit positions;
// to recover the other (odd-position) value, shift the input right by 1 first
func Deinterleave(b uint64) uint64 {
b &= interleaveMagic[0]
b = (b ^ (b >> interleaveShift[0])) & interleaveMagic[1]
b = (b ^ (b >> interleaveShift[1])) & interleaveMagic[2]
b = (b ^ (b >> interleaveShift[2])) & interleaveMagic[3]
b = (b ^ (b >> interleaveShift[3])) & interleaveMagic[4]
b = (b ^ (b >> interleaveShift[4])) & interleaveMagic[5]
return b
}
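
A quick worked example of the interleaving: v1 occupies the even bit positions and v2 the odd ones, so Deinterleave recovers v1 directly and v2 after a right shift by 1:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/numeric"
)

func main() {
	// v1 = 11 (binary 1011) fills the even bit positions,
	// v2 = 1 (binary 0001) fills the odd bit positions
	x := numeric.Interleave(11, 1)
	fmt.Printf("%b\n", x)                     // 1000111
	fmt.Println(numeric.Deinterleave(x))      // 11 (even bits)
	fmt.Println(numeric.Deinterleave(x >> 1)) // 1 (odd bits)
}
```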

34
vendor/github.com/blevesearch/bleve/numeric/float.go generated vendored Normal file

@ -0,0 +1,34 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package numeric
import (
"math"
)
func Float64ToInt64(f float64) int64 {
fasint := int64(math.Float64bits(f))
if fasint < 0 {
fasint = fasint ^ 0x7fffffffffffffff
}
return fasint
}
func Int64ToFloat64(i int64) float64 {
if i < 0 {
i ^= 0x7fffffffffffffff
}
return math.Float64frombits(uint64(i))
}
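
The point of this transform is that int64 comparison agrees with float64 ordering, negatives included, and the round trip is exact. A small sanity check:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/numeric"
)

func main() {
	a := numeric.Float64ToInt64(-2.5)
	b := numeric.Float64ToInt64(-1.0)
	c := numeric.Float64ToInt64(3.75)
	fmt.Println(a < b, b < c)              // true true: order is preserved
	fmt.Println(numeric.Int64ToFloat64(c)) // 3.75: the round trip is exact
}
```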


@ -0,0 +1,111 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package numeric
import "fmt"
const ShiftStartInt64 byte = 0x20
// PrefixCoded is a byte array encoding of
// 64-bit numeric values shifted by 0-63 bits
type PrefixCoded []byte
func NewPrefixCodedInt64(in int64, shift uint) (PrefixCoded, error) {
rv, _, err := NewPrefixCodedInt64Prealloc(in, shift, nil)
return rv, err
}
func NewPrefixCodedInt64Prealloc(in int64, shift uint, prealloc []byte) (
rv PrefixCoded, preallocRest []byte, err error) {
if shift > 63 {
return nil, prealloc, fmt.Errorf("cannot shift %d, must be between 0 and 63", shift)
}
nChars := ((63 - shift) / 7) + 1
size := int(nChars + 1)
if len(prealloc) >= size {
rv = PrefixCoded(prealloc[0:size])
preallocRest = prealloc[size:]
} else {
rv = make(PrefixCoded, size)
}
rv[0] = ShiftStartInt64 + byte(shift)
sortableBits := int64(uint64(in) ^ 0x8000000000000000)
sortableBits = int64(uint64(sortableBits) >> shift)
for nChars > 0 {
// Store 7 bits per byte for compatibility
// with UTF-8 encoding of terms
rv[nChars] = byte(sortableBits & 0x7f)
nChars--
sortableBits = int64(uint64(sortableBits) >> 7)
}
return rv, preallocRest, nil
}
func MustNewPrefixCodedInt64(in int64, shift uint) PrefixCoded {
rv, err := NewPrefixCodedInt64(in, shift)
if err != nil {
panic(err)
}
return rv
}
// Shift returns the number of bits shifted
// returns 0 if in uninitialized state
func (p PrefixCoded) Shift() (uint, error) {
if len(p) > 0 {
shift := p[0] - ShiftStartInt64
// shift is a byte, so it can never be negative; anything outside 0..63
// (including underflow when p[0] < ShiftStartInt64) is invalid
if shift <= 63 {
return uint(shift), nil
}
}
return 0, fmt.Errorf("invalid prefix coded value")
}
func (p PrefixCoded) Int64() (int64, error) {
shift, err := p.Shift()
if err != nil {
return 0, err
}
var sortableBits int64
for _, inbyte := range p[1:] {
sortableBits <<= 7
sortableBits |= int64(inbyte)
}
return int64(uint64((sortableBits << shift)) ^ 0x8000000000000000), nil
}
func ValidPrefixCodedTerm(p string) (bool, int) {
return ValidPrefixCodedTermBytes([]byte(p))
}
func ValidPrefixCodedTermBytes(p []byte) (bool, int) {
if len(p) > 0 {
if p[0] < ShiftStartInt64 || p[0] > ShiftStartInt64+63 {
return false, 0
}
shift := p[0] - ShiftStartInt64
nChars := ((63 - int(shift)) / 7) + 1
if len(p) != nChars+1 {
return false, 0
}
return true, int(shift)
}
return false, 0
}
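
A minimal round trip through the prefix coding above: encode 42 with a shift of 0, then recover the shift and the value; the buffer also passes validation:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/numeric"
)

func main() {
	pc, err := numeric.NewPrefixCodedInt64(42, 0)
	if err != nil {
		panic(err)
	}
	shift, _ := pc.Shift()
	v, _ := pc.Int64()
	fmt.Println(shift, v)                              // 0 42
	fmt.Println(numeric.ValidPrefixCodedTermBytes(pc)) // true 0
}
```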


@ -0,0 +1,89 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package registry
import (
"fmt"
"github.com/blevesearch/bleve/analysis"
)
func RegisterAnalyzer(name string, constructor AnalyzerConstructor) {
_, exists := analyzers[name]
if exists {
panic(fmt.Errorf("attempted to register duplicate analyzer named '%s'", name))
}
analyzers[name] = constructor
}
type AnalyzerConstructor func(config map[string]interface{}, cache *Cache) (*analysis.Analyzer, error)
type AnalyzerRegistry map[string]AnalyzerConstructor
type AnalyzerCache struct {
*ConcurrentCache
}
func NewAnalyzerCache() *AnalyzerCache {
return &AnalyzerCache{
NewConcurrentCache(),
}
}
func AnalyzerBuild(name string, config map[string]interface{}, cache *Cache) (interface{}, error) {
cons, registered := analyzers[name]
if !registered {
return nil, fmt.Errorf("no analyzer with name or type '%s' registered", name)
}
analyzer, err := cons(config, cache)
if err != nil {
return nil, fmt.Errorf("error building analyzer: %v", err)
}
return analyzer, nil
}
func (c *AnalyzerCache) AnalyzerNamed(name string, cache *Cache) (*analysis.Analyzer, error) {
item, err := c.ItemNamed(name, cache, AnalyzerBuild)
if err != nil {
return nil, err
}
return item.(*analysis.Analyzer), nil
}
func (c *AnalyzerCache) DefineAnalyzer(name string, typ string, config map[string]interface{}, cache *Cache) (*analysis.Analyzer, error) {
item, err := c.DefineItem(name, typ, config, cache, AnalyzerBuild)
if err != nil {
if err == ErrAlreadyDefined {
return nil, fmt.Errorf("analyzer named '%s' already defined", name)
}
return nil, err
}
return item.(*analysis.Analyzer), nil
}
func AnalyzerTypesAndInstances() ([]string, []string) {
emptyConfig := map[string]interface{}{}
emptyCache := NewCache()
var types []string
var instances []string
for name, cons := range analyzers {
_, err := cons(emptyConfig, emptyCache)
if err == nil {
instances = append(instances, name)
} else {
types = append(types, name)
}
}
return types, instances
}
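
The register-then-build pattern seen here recurs for every component type below (char filters, date time parsers, fragment formatters, fragmenters, highlighters, token filters, token maps, tokenizers). A hedged sketch of registering a trivial analyzer constructor; "noop" and its empty Analyzer are made up for illustration:

```go
package example

import (
	"github.com/blevesearch/bleve/analysis"
	"github.com/blevesearch/bleve/registry"
)

func init() {
	// a real constructor would resolve a tokenizer and filters from config
	registry.RegisterAnalyzer("noop", func(config map[string]interface{},
		cache *registry.Cache) (*analysis.Analyzer, error) {
		return &analysis.Analyzer{}, nil
	})
}
```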

87
vendor/github.com/blevesearch/bleve/registry/cache.go generated vendored Normal file

@ -0,0 +1,87 @@
// Copyright (c) 2016 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package registry
import (
"fmt"
"sync"
)
var ErrAlreadyDefined = fmt.Errorf("item already defined")
type CacheBuild func(name string, config map[string]interface{}, cache *Cache) (interface{}, error)
type ConcurrentCache struct {
mutex sync.RWMutex
data map[string]interface{}
}
func NewConcurrentCache() *ConcurrentCache {
return &ConcurrentCache{
data: make(map[string]interface{}),
}
}
func (c *ConcurrentCache) ItemNamed(name string, cache *Cache, build CacheBuild) (interface{}, error) {
c.mutex.RLock()
item, cached := c.data[name]
if cached {
c.mutex.RUnlock()
return item, nil
}
// give up read lock
c.mutex.RUnlock()
// try to build it
newItem, err := build(name, nil, cache)
if err != nil {
return nil, err
}
// acquire write lock
c.mutex.Lock()
defer c.mutex.Unlock()
// check again because it could have been created while trading locks
item, cached = c.data[name]
if cached {
return item, nil
}
c.data[name] = newItem
return newItem, nil
}
func (c *ConcurrentCache) DefineItem(name string, typ string, config map[string]interface{}, cache *Cache, build CacheBuild) (interface{}, error) {
c.mutex.RLock()
_, cached := c.data[name]
if cached {
c.mutex.RUnlock()
return nil, ErrAlreadyDefined
}
// give up read lock so others lookups can proceed
c.mutex.RUnlock()
// really not there, try to build it
newItem, err := build(typ, config, cache)
if err != nil {
return nil, err
}
// now we've built it, acquire lock
c.mutex.Lock()
defer c.mutex.Unlock()
// check again because it could have been created while trading locks
_, cached = c.data[name]
if cached {
return nil, ErrAlreadyDefined
}
c.data[name] = newItem
return newItem, nil
}
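
The double-checked locking above means concurrent callers may race to build, but at most one instance is retained and returned to everyone. A small illustration; the build function here is a stand-in, not a real component constructor:

```go
package main

import (
	"fmt"
	"sync"

	"github.com/blevesearch/bleve/registry"
)

func main() {
	c := registry.NewConcurrentCache()
	build := func(name string, config map[string]interface{},
		cache *registry.Cache) (interface{}, error) {
		return "instance-of-" + name, nil
	}
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			item, _ := c.ItemNamed("x", nil, build)
			fmt.Println(item) // instance-of-x, from both goroutines
		}()
	}
	wg.Wait()
}
```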


@ -0,0 +1,89 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package registry
import (
"fmt"
"github.com/blevesearch/bleve/analysis"
)
func RegisterCharFilter(name string, constructor CharFilterConstructor) {
_, exists := charFilters[name]
if exists {
panic(fmt.Errorf("attempted to register duplicate char filter named '%s'", name))
}
charFilters[name] = constructor
}
type CharFilterConstructor func(config map[string]interface{}, cache *Cache) (analysis.CharFilter, error)
type CharFilterRegistry map[string]CharFilterConstructor
type CharFilterCache struct {
*ConcurrentCache
}
func NewCharFilterCache() *CharFilterCache {
return &CharFilterCache{
NewConcurrentCache(),
}
}
func CharFilterBuild(name string, config map[string]interface{}, cache *Cache) (interface{}, error) {
cons, registered := charFilters[name]
if !registered {
return nil, fmt.Errorf("no char filter with name or type '%s' registered", name)
}
charFilter, err := cons(config, cache)
if err != nil {
return nil, fmt.Errorf("error building char filter: %v", err)
}
return charFilter, nil
}
func (c *CharFilterCache) CharFilterNamed(name string, cache *Cache) (analysis.CharFilter, error) {
item, err := c.ItemNamed(name, cache, CharFilterBuild)
if err != nil {
return nil, err
}
return item.(analysis.CharFilter), nil
}
func (c *CharFilterCache) DefineCharFilter(name string, typ string, config map[string]interface{}, cache *Cache) (analysis.CharFilter, error) {
item, err := c.DefineItem(name, typ, config, cache, CharFilterBuild)
if err != nil {
if err == ErrAlreadyDefined {
return nil, fmt.Errorf("char filter named '%s' already defined", name)
}
return nil, err
}
return item.(analysis.CharFilter), nil
}
func CharFilterTypesAndInstances() ([]string, []string) {
emptyConfig := map[string]interface{}{}
emptyCache := NewCache()
var types []string
var instances []string
for name, cons := range charFilters {
_, err := cons(emptyConfig, emptyCache)
if err == nil {
instances = append(instances, name)
} else {
types = append(types, name)
}
}
return types, instances
}


@ -0,0 +1,89 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package registry
import (
"fmt"
"github.com/blevesearch/bleve/analysis"
)
func RegisterDateTimeParser(name string, constructor DateTimeParserConstructor) {
_, exists := dateTimeParsers[name]
if exists {
panic(fmt.Errorf("attempted to register duplicate date time parser named '%s'", name))
}
dateTimeParsers[name] = constructor
}
type DateTimeParserConstructor func(config map[string]interface{}, cache *Cache) (analysis.DateTimeParser, error)
type DateTimeParserRegistry map[string]DateTimeParserConstructor
type DateTimeParserCache struct {
*ConcurrentCache
}
func NewDateTimeParserCache() *DateTimeParserCache {
return &DateTimeParserCache{
NewConcurrentCache(),
}
}
func DateTimeParserBuild(name string, config map[string]interface{}, cache *Cache) (interface{}, error) {
cons, registered := dateTimeParsers[name]
if !registered {
return nil, fmt.Errorf("no date time parser with name or type '%s' registered", name)
}
dateTimeParser, err := cons(config, cache)
if err != nil {
return nil, fmt.Errorf("error building date time parser: %v", err)
}
return dateTimeParser, nil
}
func (c *DateTimeParserCache) DateTimeParserNamed(name string, cache *Cache) (analysis.DateTimeParser, error) {
item, err := c.ItemNamed(name, cache, DateTimeParserBuild)
if err != nil {
return nil, err
}
return item.(analysis.DateTimeParser), nil
}
func (c *DateTimeParserCache) DefineDateTimeParser(name string, typ string, config map[string]interface{}, cache *Cache) (analysis.DateTimeParser, error) {
item, err := c.DefineItem(name, typ, config, cache, DateTimeParserBuild)
if err != nil {
if err == ErrAlreadyDefined {
return nil, fmt.Errorf("date time parser named '%s' already defined", name)
}
return nil, err
}
return item.(analysis.DateTimeParser), nil
}
func DateTimeParserTypesAndInstances() ([]string, []string) {
emptyConfig := map[string]interface{}{}
emptyCache := NewCache()
var types []string
var instances []string
for name, cons := range dateTimeParsers {
_, err := cons(emptyConfig, emptyCache)
if err == nil {
instances = append(instances, name)
} else {
types = append(types, name)
}
}
return types, instances
}


@ -0,0 +1,89 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package registry
import (
"fmt"
"github.com/blevesearch/bleve/search/highlight"
)
func RegisterFragmentFormatter(name string, constructor FragmentFormatterConstructor) {
_, exists := fragmentFormatters[name]
if exists {
panic(fmt.Errorf("attempted to register duplicate fragment formatter named '%s'", name))
}
fragmentFormatters[name] = constructor
}
type FragmentFormatterConstructor func(config map[string]interface{}, cache *Cache) (highlight.FragmentFormatter, error)
type FragmentFormatterRegistry map[string]FragmentFormatterConstructor
type FragmentFormatterCache struct {
*ConcurrentCache
}
func NewFragmentFormatterCache() *FragmentFormatterCache {
return &FragmentFormatterCache{
NewConcurrentCache(),
}
}
func FragmentFormatterBuild(name string, config map[string]interface{}, cache *Cache) (interface{}, error) {
cons, registered := fragmentFormatters[name]
if !registered {
return nil, fmt.Errorf("no fragment formatter with name or type '%s' registered", name)
}
fragmentFormatter, err := cons(config, cache)
if err != nil {
return nil, fmt.Errorf("error building fragment formatter: %v", err)
}
return fragmentFormatter, nil
}
func (c *FragmentFormatterCache) FragmentFormatterNamed(name string, cache *Cache) (highlight.FragmentFormatter, error) {
item, err := c.ItemNamed(name, cache, FragmentFormatterBuild)
if err != nil {
return nil, err
}
return item.(highlight.FragmentFormatter), nil
}
func (c *FragmentFormatterCache) DefineFragmentFormatter(name string, typ string, config map[string]interface{}, cache *Cache) (highlight.FragmentFormatter, error) {
item, err := c.DefineItem(name, typ, config, cache, FragmentFormatterBuild)
if err != nil {
if err == ErrAlreadyDefined {
return nil, fmt.Errorf("fragment formatter named '%s' already defined", name)
}
return nil, err
}
return item.(highlight.FragmentFormatter), nil
}
func FragmentFormatterTypesAndInstances() ([]string, []string) {
emptyConfig := map[string]interface{}{}
emptyCache := NewCache()
var types []string
var instances []string
for name, cons := range fragmentFormatters {
_, err := cons(emptyConfig, emptyCache)
if err == nil {
instances = append(instances, name)
} else {
types = append(types, name)
}
}
return types, instances
}


@ -0,0 +1,89 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package registry
import (
"fmt"
"github.com/blevesearch/bleve/search/highlight"
)
func RegisterFragmenter(name string, constructor FragmenterConstructor) {
_, exists := fragmenters[name]
if exists {
panic(fmt.Errorf("attempted to register duplicate fragmenter named '%s'", name))
}
fragmenters[name] = constructor
}
type FragmenterConstructor func(config map[string]interface{}, cache *Cache) (highlight.Fragmenter, error)
type FragmenterRegistry map[string]FragmenterConstructor
type FragmenterCache struct {
*ConcurrentCache
}
func NewFragmenterCache() *FragmenterCache {
return &FragmenterCache{
NewConcurrentCache(),
}
}
func FragmenterBuild(name string, config map[string]interface{}, cache *Cache) (interface{}, error) {
cons, registered := fragmenters[name]
if !registered {
return nil, fmt.Errorf("no fragmenter with name or type '%s' registered", name)
}
fragmenter, err := cons(config, cache)
if err != nil {
return nil, fmt.Errorf("error building fragmenter: %v", err)
}
return fragmenter, nil
}
func (c *FragmenterCache) FragmenterNamed(name string, cache *Cache) (highlight.Fragmenter, error) {
item, err := c.ItemNamed(name, cache, FragmenterBuild)
if err != nil {
return nil, err
}
return item.(highlight.Fragmenter), nil
}
func (c *FragmenterCache) DefineFragmenter(name string, typ string, config map[string]interface{}, cache *Cache) (highlight.Fragmenter, error) {
item, err := c.DefineItem(name, typ, config, cache, FragmenterBuild)
if err != nil {
if err == ErrAlreadyDefined {
return nil, fmt.Errorf("fragmenter named '%s' already defined", name)
}
return nil, err
}
return item.(highlight.Fragmenter), nil
}
func FragmenterTypesAndInstances() ([]string, []string) {
emptyConfig := map[string]interface{}{}
emptyCache := NewCache()
var types []string
var instances []string
for name, cons := range fragmenters {
_, err := cons(emptyConfig, emptyCache)
if err == nil {
instances = append(instances, name)
} else {
types = append(types, name)
}
}
return types, instances
}


@ -0,0 +1,89 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package registry
import (
"fmt"
"github.com/blevesearch/bleve/search/highlight"
)
func RegisterHighlighter(name string, constructor HighlighterConstructor) {
_, exists := highlighters[name]
if exists {
panic(fmt.Errorf("attempted to register duplicate highlighter named '%s'", name))
}
highlighters[name] = constructor
}
type HighlighterConstructor func(config map[string]interface{}, cache *Cache) (highlight.Highlighter, error)
type HighlighterRegistry map[string]HighlighterConstructor
type HighlighterCache struct {
*ConcurrentCache
}
func NewHighlighterCache() *HighlighterCache {
return &HighlighterCache{
NewConcurrentCache(),
}
}
func HighlighterBuild(name string, config map[string]interface{}, cache *Cache) (interface{}, error) {
cons, registered := highlighters[name]
if !registered {
return nil, fmt.Errorf("no highlighter with name or type '%s' registered", name)
}
highlighter, err := cons(config, cache)
if err != nil {
return nil, fmt.Errorf("error building highlighter: %v", err)
}
return highlighter, nil
}
func (c *HighlighterCache) HighlighterNamed(name string, cache *Cache) (highlight.Highlighter, error) {
item, err := c.ItemNamed(name, cache, HighlighterBuild)
if err != nil {
return nil, err
}
return item.(highlight.Highlighter), nil
}
func (c *HighlighterCache) DefineHighlighter(name string, typ string, config map[string]interface{}, cache *Cache) (highlight.Highlighter, error) {
item, err := c.DefineItem(name, typ, config, cache, HighlighterBuild)
if err != nil {
if err == ErrAlreadyDefined {
return nil, fmt.Errorf("highlighter named '%s' already defined", name)
}
return nil, err
}
return item.(highlight.Highlighter), nil
}
func HighlighterTypesAndInstances() ([]string, []string) {
emptyConfig := map[string]interface{}{}
emptyCache := NewCache()
var types []string
var instances []string
for name, cons := range highlighters {
_, err := cons(emptyConfig, emptyCache)
if err == nil {
instances = append(instances, name)
} else {
types = append(types, name)
}
}
return types, instances
}


@ -0,0 +1,45 @@
// Copyright (c) 2015 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package registry
import (
"fmt"
"github.com/blevesearch/bleve/index"
)
func RegisterIndexType(name string, constructor IndexTypeConstructor) {
_, exists := indexTypes[name]
if exists {
panic(fmt.Errorf("attempted to register duplicate index encoding named '%s'", name))
}
indexTypes[name] = constructor
}
type IndexTypeConstructor func(storeName string, storeConfig map[string]interface{}, analysisQueue *index.AnalysisQueue) (index.Index, error)
type IndexTypeRegistry map[string]IndexTypeConstructor
func IndexTypeConstructorByName(name string) IndexTypeConstructor {
return indexTypes[name]
}
func IndexTypesAndInstances() ([]string, []string) {
var types []string
var instances []string
// enumerate the index type registry (not the KV store registry)
for name := range indexTypes {
types = append(types, name)
}
return types, instances
}


@ -0,0 +1,184 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package registry
import (
"fmt"
"github.com/blevesearch/bleve/analysis"
"github.com/blevesearch/bleve/search/highlight"
)
var stores = make(KVStoreRegistry, 0)
var indexTypes = make(IndexTypeRegistry, 0)
// highlight
var fragmentFormatters = make(FragmentFormatterRegistry, 0)
var fragmenters = make(FragmenterRegistry, 0)
var highlighters = make(HighlighterRegistry, 0)
// analysis
var charFilters = make(CharFilterRegistry, 0)
var tokenizers = make(TokenizerRegistry, 0)
var tokenMaps = make(TokenMapRegistry, 0)
var tokenFilters = make(TokenFilterRegistry, 0)
var analyzers = make(AnalyzerRegistry, 0)
var dateTimeParsers = make(DateTimeParserRegistry, 0)
type Cache struct {
CharFilters *CharFilterCache
Tokenizers *TokenizerCache
TokenMaps *TokenMapCache
TokenFilters *TokenFilterCache
Analyzers *AnalyzerCache
DateTimeParsers *DateTimeParserCache
FragmentFormatters *FragmentFormatterCache
Fragmenters *FragmenterCache
Highlighters *HighlighterCache
}
func NewCache() *Cache {
return &Cache{
CharFilters: NewCharFilterCache(),
Tokenizers: NewTokenizerCache(),
TokenMaps: NewTokenMapCache(),
TokenFilters: NewTokenFilterCache(),
Analyzers: NewAnalyzerCache(),
DateTimeParsers: NewDateTimeParserCache(),
FragmentFormatters: NewFragmentFormatterCache(),
Fragmenters: NewFragmenterCache(),
Highlighters: NewHighlighterCache(),
}
}
func typeFromConfig(config map[string]interface{}) (string, error) {
prop, ok := config["type"]
if !ok {
return "", fmt.Errorf("'type' property is not defined")
}
typ, ok := prop.(string)
if !ok {
return "", fmt.Errorf("'type' property must be a string, not %T", prop)
}
return typ, nil
}
func (c *Cache) CharFilterNamed(name string) (analysis.CharFilter, error) {
return c.CharFilters.CharFilterNamed(name, c)
}
func (c *Cache) DefineCharFilter(name string, config map[string]interface{}) (analysis.CharFilter, error) {
typ, err := typeFromConfig(config)
if err != nil {
return nil, err
}
return c.CharFilters.DefineCharFilter(name, typ, config, c)
}
func (c *Cache) TokenizerNamed(name string) (analysis.Tokenizer, error) {
return c.Tokenizers.TokenizerNamed(name, c)
}
func (c *Cache) DefineTokenizer(name string, config map[string]interface{}) (analysis.Tokenizer, error) {
typ, err := typeFromConfig(config)
if err != nil {
return nil, fmt.Errorf("cannot resolve '%s' tokenizer type: %s", name, err)
}
return c.Tokenizers.DefineTokenizer(name, typ, config, c)
}
func (c *Cache) TokenMapNamed(name string) (analysis.TokenMap, error) {
return c.TokenMaps.TokenMapNamed(name, c)
}
func (c *Cache) DefineTokenMap(name string, config map[string]interface{}) (analysis.TokenMap, error) {
typ, err := typeFromConfig(config)
if err != nil {
return nil, err
}
return c.TokenMaps.DefineTokenMap(name, typ, config, c)
}
func (c *Cache) TokenFilterNamed(name string) (analysis.TokenFilter, error) {
return c.TokenFilters.TokenFilterNamed(name, c)
}
func (c *Cache) DefineTokenFilter(name string, config map[string]interface{}) (analysis.TokenFilter, error) {
typ, err := typeFromConfig(config)
if err != nil {
return nil, err
}
return c.TokenFilters.DefineTokenFilter(name, typ, config, c)
}
func (c *Cache) AnalyzerNamed(name string) (*analysis.Analyzer, error) {
return c.Analyzers.AnalyzerNamed(name, c)
}
func (c *Cache) DefineAnalyzer(name string, config map[string]interface{}) (*analysis.Analyzer, error) {
typ, err := typeFromConfig(config)
if err != nil {
return nil, err
}
return c.Analyzers.DefineAnalyzer(name, typ, config, c)
}
func (c *Cache) DateTimeParserNamed(name string) (analysis.DateTimeParser, error) {
return c.DateTimeParsers.DateTimeParserNamed(name, c)
}
func (c *Cache) DefineDateTimeParser(name string, config map[string]interface{}) (analysis.DateTimeParser, error) {
typ, err := typeFromConfig(config)
if err != nil {
return nil, err
}
return c.DateTimeParsers.DefineDateTimeParser(name, typ, config, c)
}
func (c *Cache) FragmentFormatterNamed(name string) (highlight.FragmentFormatter, error) {
return c.FragmentFormatters.FragmentFormatterNamed(name, c)
}
func (c *Cache) DefineFragmentFormatter(name string, config map[string]interface{}) (highlight.FragmentFormatter, error) {
typ, err := typeFromConfig(config)
if err != nil {
return nil, err
}
return c.FragmentFormatters.DefineFragmentFormatter(name, typ, config, c)
}
func (c *Cache) FragmenterNamed(name string) (highlight.Fragmenter, error) {
return c.Fragmenters.FragmenterNamed(name, c)
}
func (c *Cache) DefineFragmenter(name string, config map[string]interface{}) (highlight.Fragmenter, error) {
typ, err := typeFromConfig(config)
if err != nil {
return nil, err
}
return c.Fragmenters.DefineFragmenter(name, typ, config, c)
}
func (c *Cache) HighlighterNamed(name string) (highlight.Highlighter, error) {
return c.Highlighters.HighlighterNamed(name, c)
}
func (c *Cache) DefineHighlighter(name string, config map[string]interface{}) (highlight.Highlighter, error) {
typ, err := typeFromConfig(config)
if err != nil {
return nil, err
}
return c.Highlighters.DefineHighlighter(name, typ, config, c)
}
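
All the Define* helpers above share one shape: typeFromConfig pulls "type" out of the config map, then the matching per-component cache builds and caches the item. A hedged sketch, assuming a tokenizer type named "regexp" (with a "regexp" config key) has been registered by importing its package:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/registry"
)

func main() {
	cache := registry.NewCache()
	_, err := cache.DefineTokenizer("commas", map[string]interface{}{
		"type":   "regexp", // typeFromConfig reads this key
		"regexp": `[^,]+`,
	})
	if err != nil {
		// reported when no "regexp" tokenizer type is registered
		fmt.Println(err)
	}
}
```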

51
vendor/github.com/blevesearch/bleve/registry/store.go generated vendored Normal file

@ -0,0 +1,51 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package registry
import (
"fmt"
"github.com/blevesearch/bleve/index/store"
)
func RegisterKVStore(name string, constructor KVStoreConstructor) {
_, exists := stores[name]
if exists {
panic(fmt.Errorf("attempted to register duplicate store named '%s'", name))
}
stores[name] = constructor
}
// KVStoreConstructor is used to build a KVStore of a specific type when
// specified by the index configuration. In addition to meeting the
// store.KVStore interface, KVStores must also support this constructor.
// Note that currently the values of config must
// be able to be marshaled and unmarshaled using the encoding/json library (used
// when reading/writing the index metadata file).
type KVStoreConstructor func(mo store.MergeOperator, config map[string]interface{}) (store.KVStore, error)
type KVStoreRegistry map[string]KVStoreConstructor
func KVStoreConstructorByName(name string) KVStoreConstructor {
return stores[name]
}
func KVStoreTypesAndInstances() ([]string, []string) {
var types []string
var instances []string
for name := range stores {
types = append(types, name)
}
return types, instances
}


@ -0,0 +1,89 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package registry
import (
"fmt"
"github.com/blevesearch/bleve/analysis"
)
func RegisterTokenFilter(name string, constructor TokenFilterConstructor) {
_, exists := tokenFilters[name]
if exists {
panic(fmt.Errorf("attempted to register duplicate token filter named '%s'", name))
}
tokenFilters[name] = constructor
}
type TokenFilterConstructor func(config map[string]interface{}, cache *Cache) (analysis.TokenFilter, error)
type TokenFilterRegistry map[string]TokenFilterConstructor
type TokenFilterCache struct {
*ConcurrentCache
}
func NewTokenFilterCache() *TokenFilterCache {
return &TokenFilterCache{
NewConcurrentCache(),
}
}
func TokenFilterBuild(name string, config map[string]interface{}, cache *Cache) (interface{}, error) {
cons, registered := tokenFilters[name]
if !registered {
return nil, fmt.Errorf("no token filter with name or type '%s' registered", name)
}
tokenFilter, err := cons(config, cache)
if err != nil {
return nil, fmt.Errorf("error building token filter: %v", err)
}
return tokenFilter, nil
}
func (c *TokenFilterCache) TokenFilterNamed(name string, cache *Cache) (analysis.TokenFilter, error) {
item, err := c.ItemNamed(name, cache, TokenFilterBuild)
if err != nil {
return nil, err
}
return item.(analysis.TokenFilter), nil
}
func (c *TokenFilterCache) DefineTokenFilter(name string, typ string, config map[string]interface{}, cache *Cache) (analysis.TokenFilter, error) {
item, err := c.DefineItem(name, typ, config, cache, TokenFilterBuild)
if err != nil {
if err == ErrAlreadyDefined {
return nil, fmt.Errorf("token filter named '%s' already defined", name)
}
return nil, err
}
return item.(analysis.TokenFilter), nil
}
func TokenFilterTypesAndInstances() ([]string, []string) {
emptyConfig := map[string]interface{}{}
emptyCache := NewCache()
var types []string
var instances []string
for name, cons := range tokenFilters {
_, err := cons(emptyConfig, emptyCache)
if err == nil {
instances = append(instances, name)
} else {
types = append(types, name)
}
}
return types, instances
}


@ -0,0 +1,89 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package registry
import (
"fmt"
"github.com/blevesearch/bleve/analysis"
)
func RegisterTokenMap(name string, constructor TokenMapConstructor) {
_, exists := tokenMaps[name]
if exists {
panic(fmt.Errorf("attempted to register duplicate token map named '%s'", name))
}
tokenMaps[name] = constructor
}
type TokenMapConstructor func(config map[string]interface{}, cache *Cache) (analysis.TokenMap, error)
type TokenMapRegistry map[string]TokenMapConstructor
type TokenMapCache struct {
*ConcurrentCache
}
func NewTokenMapCache() *TokenMapCache {
return &TokenMapCache{
NewConcurrentCache(),
}
}
func TokenMapBuild(name string, config map[string]interface{}, cache *Cache) (interface{}, error) {
cons, registered := tokenMaps[name]
if !registered {
return nil, fmt.Errorf("no token map with name or type '%s' registered", name)
}
tokenMap, err := cons(config, cache)
if err != nil {
return nil, fmt.Errorf("error building token map: %v", err)
}
return tokenMap, nil
}
func (c *TokenMapCache) TokenMapNamed(name string, cache *Cache) (analysis.TokenMap, error) {
item, err := c.ItemNamed(name, cache, TokenMapBuild)
if err != nil {
return nil, err
}
return item.(analysis.TokenMap), nil
}
func (c *TokenMapCache) DefineTokenMap(name string, typ string, config map[string]interface{}, cache *Cache) (analysis.TokenMap, error) {
item, err := c.DefineItem(name, typ, config, cache, TokenMapBuild)
if err != nil {
if err == ErrAlreadyDefined {
return nil, fmt.Errorf("token map named '%s' already defined", name)
}
return nil, err
}
return item.(analysis.TokenMap), nil
}
func TokenMapTypesAndInstances() ([]string, []string) {
emptyConfig := map[string]interface{}{}
emptyCache := NewCache()
var types []string
var instances []string
for name, cons := range tokenMaps {
_, err := cons(emptyConfig, emptyCache)
if err == nil {
instances = append(instances, name)
} else {
types = append(types, name)
}
}
return types, instances
}


@ -0,0 +1,89 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package registry
import (
"fmt"
"github.com/blevesearch/bleve/analysis"
)
func RegisterTokenizer(name string, constructor TokenizerConstructor) {
_, exists := tokenizers[name]
if exists {
panic(fmt.Errorf("attempted to register duplicate tokenizer named '%s'", name))
}
tokenizers[name] = constructor
}
type TokenizerConstructor func(config map[string]interface{}, cache *Cache) (analysis.Tokenizer, error)
type TokenizerRegistry map[string]TokenizerConstructor
type TokenizerCache struct {
*ConcurrentCache
}
func NewTokenizerCache() *TokenizerCache {
return &TokenizerCache{
NewConcurrentCache(),
}
}
func TokenizerBuild(name string, config map[string]interface{}, cache *Cache) (interface{}, error) {
cons, registered := tokenizers[name]
if !registered {
return nil, fmt.Errorf("no tokenizer with name or type '%s' registered", name)
}
tokenizer, err := cons(config, cache)
if err != nil {
return nil, fmt.Errorf("error building tokenizer: %v", err)
}
return tokenizer, nil
}
func (c *TokenizerCache) TokenizerNamed(name string, cache *Cache) (analysis.Tokenizer, error) {
item, err := c.ItemNamed(name, cache, TokenizerBuild)
if err != nil {
return nil, err
}
return item.(analysis.Tokenizer), nil
}
func (c *TokenizerCache) DefineTokenizer(name string, typ string, config map[string]interface{}, cache *Cache) (analysis.Tokenizer, error) {
item, err := c.DefineItem(name, typ, config, cache, TokenizerBuild)
if err != nil {
if err == ErrAlreadyDefined {
return nil, fmt.Errorf("tokenizer named '%s' already defined", name)
}
return nil, err
}
return item.(analysis.Tokenizer), nil
}
func TokenizerTypesAndInstances() ([]string, []string) {
emptyConfig := map[string]interface{}{}
emptyCache := NewCache()
var types []string
var instances []string
for name, cons := range tokenizers {
_, err := cons(emptyConfig, emptyCache)
if err == nil {
instances = append(instances, name)
} else {
types = append(types, name)
}
}
return types, instances
}


@ -0,0 +1,52 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package search
import (
"context"
"time"
"github.com/blevesearch/bleve/index"
)
type Collector interface {
Collect(ctx context.Context, searcher Searcher, reader index.IndexReader) error
Results() DocumentMatchCollection
Total() uint64
MaxScore() float64
Took() time.Duration
SetFacetsBuilder(facetsBuilder *FacetsBuilder)
FacetResults() FacetResults
}
// DocumentMatchHandler is the type of document match callback
// bleve will invoke during the search.
// Eventually, bleve will indicate the completion of an ongoing search,
// by passing a nil value for the document match callback.
// The application should take a copy of the hit/documentMatch
// if it wishes to own it or needs prolonged access to it.
type DocumentMatchHandler func(hit *DocumentMatch) error
type MakeDocumentMatchHandlerKeyType string
var MakeDocumentMatchHandlerKey = MakeDocumentMatchHandlerKeyType(
"MakeDocumentMatchHandlerKey")
// MakeDocumentMatchHandler is an optional DocumentMatchHandler
// builder function which the applications can pass to bleve.
// These builder functions give a DocumentMatchHandler
// to bleve, which bleve will invoke on every document match.
type MakeDocumentMatchHandler func(ctx *SearchContext) (
callback DocumentMatchHandler, loadID bool, err error)
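
A hedged sketch of the handler plumbing described above: a DocumentMatchHandler that counts hits and treats the final nil hit as completion. makeCountingHandler is a made-up name matching the MakeDocumentMatchHandler signature:

```go
package example

import (
	"fmt"

	"github.com/blevesearch/bleve/search"
)

var _ search.MakeDocumentMatchHandler = makeCountingHandler

func makeCountingHandler(ctx *search.SearchContext) (search.DocumentMatchHandler, bool, error) {
	n := 0
	handler := func(hit *search.DocumentMatch) error {
		if hit == nil { // a nil hit signals the end of the search
			fmt.Println("hits seen:", n)
			return nil
		}
		n++
		return nil
	}
	return handler, false, nil // loadID=false: internal IDs suffice here
}
```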


@ -0,0 +1,55 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package search
import (
"encoding/json"
"fmt"
"reflect"
"github.com/blevesearch/bleve/size"
)
var reflectStaticSizeExplanation int
func init() {
var e Explanation
reflectStaticSizeExplanation = int(reflect.TypeOf(e).Size())
}
type Explanation struct {
Value float64 `json:"value"`
Message string `json:"message"`
Children []*Explanation `json:"children,omitempty"`
}
func (expl *Explanation) String() string {
js, err := json.MarshalIndent(expl, "", " ")
if err != nil {
return fmt.Sprintf("error serializing explanation to json: %v", err)
}
return string(js)
}
func (expl *Explanation) Size() int {
sizeInBytes := reflectStaticSizeExplanation + size.SizeOfPtr +
len(expl.Message)
for _, entry := range expl.Children {
sizeInBytes += entry.Size()
}
return sizeInBytes
}


@ -0,0 +1,341 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package search
import (
"reflect"
"sort"
"github.com/blevesearch/bleve/index"
"github.com/blevesearch/bleve/size"
)
var reflectStaticSizeFacetsBuilder int
var reflectStaticSizeFacetResult int
var reflectStaticSizeTermFacet int
var reflectStaticSizeNumericRangeFacet int
var reflectStaticSizeDateRangeFacet int
func init() {
var fb FacetsBuilder
reflectStaticSizeFacetsBuilder = int(reflect.TypeOf(fb).Size())
var fr FacetResult
reflectStaticSizeFacetResult = int(reflect.TypeOf(fr).Size())
var tf TermFacet
reflectStaticSizeTermFacet = int(reflect.TypeOf(tf).Size())
var nrf NumericRangeFacet
reflectStaticSizeNumericRangeFacet = int(reflect.TypeOf(nrf).Size())
var drf DateRangeFacet
reflectStaticSizeDateRangeFacet = int(reflect.TypeOf(drf).Size())
}
type FacetBuilder interface {
StartDoc()
UpdateVisitor(field string, term []byte)
EndDoc()
Result() *FacetResult
Field() string
Size() int
}
type FacetsBuilder struct {
indexReader index.IndexReader
facetNames []string
facets []FacetBuilder
fields []string
}
func NewFacetsBuilder(indexReader index.IndexReader) *FacetsBuilder {
return &FacetsBuilder{
indexReader: indexReader,
}
}
func (fb *FacetsBuilder) Size() int {
sizeInBytes := reflectStaticSizeFacetsBuilder + size.SizeOfPtr
for k, v := range fb.facets {
sizeInBytes += size.SizeOfString + v.Size() + len(fb.facetNames[k])
}
for _, entry := range fb.fields {
sizeInBytes += size.SizeOfString + len(entry)
}
return sizeInBytes
}
func (fb *FacetsBuilder) Add(name string, facetBuilder FacetBuilder) {
fb.facetNames = append(fb.facetNames, name)
fb.facets = append(fb.facets, facetBuilder)
fb.fields = append(fb.fields, facetBuilder.Field())
}
func (fb *FacetsBuilder) RequiredFields() []string {
return fb.fields
}
func (fb *FacetsBuilder) StartDoc() {
for _, facetBuilder := range fb.facets {
facetBuilder.StartDoc()
}
}
func (fb *FacetsBuilder) EndDoc() {
for _, facetBuilder := range fb.facets {
facetBuilder.EndDoc()
}
}
func (fb *FacetsBuilder) UpdateVisitor(field string, term []byte) {
for _, facetBuilder := range fb.facets {
facetBuilder.UpdateVisitor(field, term)
}
}
type TermFacet struct {
Term string `json:"term"`
Count int `json:"count"`
}
type TermFacets []*TermFacet
func (tf TermFacets) Add(termFacet *TermFacet) TermFacets {
for _, existingTerm := range tf {
if termFacet.Term == existingTerm.Term {
existingTerm.Count += termFacet.Count
return tf
}
}
// if we got here it wasn't already in the existing terms
tf = append(tf, termFacet)
return tf
}
func (tf TermFacets) Len() int { return len(tf) }
func (tf TermFacets) Swap(i, j int) { tf[i], tf[j] = tf[j], tf[i] }
func (tf TermFacets) Less(i, j int) bool {
if tf[i].Count == tf[j].Count {
return tf[i].Term < tf[j].Term
}
return tf[i].Count > tf[j].Count
}
type NumericRangeFacet struct {
Name string `json:"name"`
Min *float64 `json:"min,omitempty"`
Max *float64 `json:"max,omitempty"`
Count int `json:"count"`
}
func (nrf *NumericRangeFacet) Same(other *NumericRangeFacet) bool {
if nrf.Min == nil && other.Min != nil {
return false
}
if nrf.Min != nil && other.Min == nil {
return false
}
if nrf.Min != nil && other.Min != nil && *nrf.Min != *other.Min {
return false
}
if nrf.Max == nil && other.Max != nil {
return false
}
if nrf.Max != nil && other.Max == nil {
return false
}
if nrf.Max != nil && other.Max != nil && *nrf.Max != *other.Max {
return false
}
return true
}
type NumericRangeFacets []*NumericRangeFacet
func (nrf NumericRangeFacets) Add(numericRangeFacet *NumericRangeFacet) NumericRangeFacets {
for _, existingNr := range nrf {
if numericRangeFacet.Same(existingNr) {
existingNr.Count += numericRangeFacet.Count
return nrf
}
}
// if we got here it wasn't already in the existing terms
nrf = append(nrf, numericRangeFacet)
return nrf
}
func (nrf NumericRangeFacets) Len() int { return len(nrf) }
func (nrf NumericRangeFacets) Swap(i, j int) { nrf[i], nrf[j] = nrf[j], nrf[i] }
func (nrf NumericRangeFacets) Less(i, j int) bool {
if nrf[i].Count == nrf[j].Count {
return nrf[i].Name < nrf[j].Name
}
return nrf[i].Count > nrf[j].Count
}
type DateRangeFacet struct {
Name string `json:"name"`
Start *string `json:"start,omitempty"`
End *string `json:"end,omitempty"`
Count int `json:"count"`
}
func (drf *DateRangeFacet) Same(other *DateRangeFacet) bool {
if drf.Start == nil && other.Start != nil {
return false
}
if drf.Start != nil && other.Start == nil {
return false
}
if drf.Start != nil && other.Start != nil && *drf.Start != *other.Start {
return false
}
if drf.End == nil && other.End != nil {
return false
}
if drf.End != nil && other.End == nil {
return false
}
if drf.End != nil && other.End != nil && *drf.End != *other.End {
return false
}
return true
}
type DateRangeFacets []*DateRangeFacet
func (drf DateRangeFacets) Add(dateRangeFacet *DateRangeFacet) DateRangeFacets {
for _, existingDr := range drf {
if dateRangeFacet.Same(existingDr) {
existingDr.Count += dateRangeFacet.Count
return drf
}
}
// if we got here it wasn't already in the existing terms
drf = append(drf, dateRangeFacet)
return drf
}
func (drf DateRangeFacets) Len() int { return len(drf) }
func (drf DateRangeFacets) Swap(i, j int) { drf[i], drf[j] = drf[j], drf[i] }
func (drf DateRangeFacets) Less(i, j int) bool {
if drf[i].Count == drf[j].Count {
return drf[i].Name < drf[j].Name
}
return drf[i].Count > drf[j].Count
}
type FacetResult struct {
Field string `json:"field"`
Total int `json:"total"`
Missing int `json:"missing"`
Other int `json:"other"`
Terms TermFacets `json:"terms,omitempty"`
NumericRanges NumericRangeFacets `json:"numeric_ranges,omitempty"`
DateRanges DateRangeFacets `json:"date_ranges,omitempty"`
}
func (fr *FacetResult) Size() int {
return reflectStaticSizeFacetResult + size.SizeOfPtr +
len(fr.Field) +
len(fr.Terms)*(reflectStaticSizeTermFacet+size.SizeOfPtr) +
len(fr.NumericRanges)*(reflectStaticSizeNumericRangeFacet+size.SizeOfPtr) +
len(fr.DateRanges)*(reflectStaticSizeDateRangeFacet+size.SizeOfPtr)
}
func (fr *FacetResult) Merge(other *FacetResult) {
fr.Total += other.Total
fr.Missing += other.Missing
fr.Other += other.Other
if fr.Terms != nil && other.Terms != nil {
for _, term := range other.Terms {
fr.Terms = fr.Terms.Add(term)
}
}
if fr.NumericRanges != nil && other.NumericRanges != nil {
for _, nr := range other.NumericRanges {
fr.NumericRanges = fr.NumericRanges.Add(nr)
}
}
if fr.DateRanges != nil && other.DateRanges != nil {
for _, dr := range other.DateRanges {
fr.DateRanges = fr.DateRanges.Add(dr)
}
}
}
func (fr *FacetResult) Fixup(size int) {
if fr.Terms != nil {
sort.Sort(fr.Terms)
if len(fr.Terms) > size {
moveToOther := fr.Terms[size:]
for _, mto := range moveToOther {
fr.Other += mto.Count
}
fr.Terms = fr.Terms[0:size]
}
} else if fr.NumericRanges != nil {
sort.Sort(fr.NumericRanges)
if len(fr.NumericRanges) > size {
moveToOther := fr.NumericRanges[size:]
for _, mto := range moveToOther {
fr.Other += mto.Count
}
fr.NumericRanges = fr.NumericRanges[0:size]
}
} else if fr.DateRanges != nil {
sort.Sort(fr.DateRanges)
if len(fr.DateRanges) > size {
moveToOther := fr.DateRanges[size:]
for _, mto := range moveToOther {
fr.Other += mto.Count
}
fr.DateRanges = fr.DateRanges[0:size]
}
}
}
type FacetResults map[string]*FacetResult
func (fr FacetResults) Merge(other FacetResults) {
for name, oFacetResult := range other {
facetResult, ok := fr[name]
if ok {
facetResult.Merge(oFacetResult)
} else {
fr[name] = oFacetResult
}
}
}
func (fr FacetResults) Fixup(name string, size int) {
facetResult, ok := fr[name]
if ok {
facetResult.Fixup(size)
}
}
func (fb *FacetsBuilder) Results() FacetResults {
fr := make(FacetResults)
for i, facetBuilder := range fb.facets {
facetResult := facetBuilder.Result()
fr[fb.facetNames[i]] = facetResult
}
return fr
}
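
As a quick orientation to the facet API above, a minimal sketch (not part of this commit); the field name, terms and counts are invented:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/search"
)

func main() {
	// TermFacets.Add merges counts for a term that was already seen.
	var terms search.TermFacets
	terms = terms.Add(&search.TermFacet{Term: "go", Count: 3})
	terms = terms.Add(&search.TermFacet{Term: "waf", Count: 1})
	terms = terms.Add(&search.TermFacet{Term: "go", Count: 2}) // merged: "go" is now 5

	fr := &search.FacetResult{Field: "tags", Total: 6, Terms: terms}
	fr.Fixup(1) // keep the top 1 term, roll the remainder into Other

	fmt.Println(fr.Terms[0].Term, fr.Terms[0].Count, fr.Other) // go 5 1
}
```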

View File

@ -0,0 +1,64 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package highlight
import (
"github.com/blevesearch/bleve/document"
"github.com/blevesearch/bleve/search"
)
type Fragment struct {
Orig []byte
ArrayPositions []uint64
Start int
End int
Score float64
Index int // used by heap
}
func (f *Fragment) Overlaps(other *Fragment) bool {
if other.Start >= f.Start && other.Start < f.End {
return true
} else if f.Start >= other.Start && f.Start < other.End {
return true
}
return false
}
type Fragmenter interface {
Fragment([]byte, TermLocations) []*Fragment
}
type FragmentFormatter interface {
Format(f *Fragment, orderedTermLocations TermLocations) string
}
type FragmentScorer interface {
Score(f *Fragment) float64
}
type Highlighter interface {
Fragmenter() Fragmenter
SetFragmenter(Fragmenter)
FragmentFormatter() FragmentFormatter
SetFragmentFormatter(FragmentFormatter)
Separator() string
SetSeparator(string)
BestFragmentInField(*search.DocumentMatch, *document.Document, string) string
BestFragmentsInField(*search.DocumentMatch, *document.Document, string, int) []string
}
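
A minimal sketch of the Overlaps semantics above (offsets invented); End is treated as exclusive when testing where the other fragment starts:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/search/highlight"
)

func main() {
	orig := []byte("zardoz is a lightweight WAF")
	a := &highlight.Fragment{Orig: orig, Start: 0, End: 10}
	b := &highlight.Fragment{Orig: orig, Start: 5, End: 15}
	c := &highlight.Fragment{Orig: orig, Start: 12, End: 20}

	fmt.Println(a.Overlaps(b)) // true: b starts inside a
	fmt.Println(a.Overlaps(c)) // false: c starts at or after a.End
}
```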

View File

@ -0,0 +1,105 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package highlight
import (
"reflect"
"sort"
"github.com/blevesearch/bleve/search"
)
type TermLocation struct {
Term string
ArrayPositions search.ArrayPositions
Pos int
Start int
End int
}
func (tl *TermLocation) Overlaps(other *TermLocation) bool {
if reflect.DeepEqual(tl.ArrayPositions, other.ArrayPositions) {
if other.Start >= tl.Start && other.Start < tl.End {
return true
} else if tl.Start >= other.Start && tl.Start < other.End {
return true
}
}
return false
}
type TermLocations []*TermLocation
func (t TermLocations) Len() int { return len(t) }
func (t TermLocations) Swap(i, j int) { t[i], t[j] = t[j], t[i] }
func (t TermLocations) Less(i, j int) bool {
shortestArrayPositions := len(t[i].ArrayPositions)
if len(t[j].ArrayPositions) < shortestArrayPositions {
shortestArrayPositions = len(t[j].ArrayPositions)
}
// compare all the common array positions
for api := 0; api < shortestArrayPositions; api++ {
if t[i].ArrayPositions[api] < t[j].ArrayPositions[api] {
return true
}
if t[i].ArrayPositions[api] > t[j].ArrayPositions[api] {
return false
}
}
// all the common array positions are the same
if len(t[i].ArrayPositions) < len(t[j].ArrayPositions) {
return true // j array positions, longer so greater
} else if len(t[i].ArrayPositions) > len(t[j].ArrayPositions) {
return false // j array positions, shorter so less
}
// array positions the same, compare starts
return t[i].Start < t[j].Start
}
func (t TermLocations) MergeOverlapping() {
var lastTl *TermLocation
for i, tl := range t {
if lastTl == nil && tl != nil {
lastTl = tl
} else if lastTl != nil && tl != nil {
if lastTl.Overlaps(tl) {
// ok merge this with previous
lastTl.End = tl.End
t[i] = nil
}
}
}
}
func OrderTermLocations(tlm search.TermLocationMap) TermLocations {
rv := make(TermLocations, 0)
for term, locations := range tlm {
for _, location := range locations {
tl := TermLocation{
Term: term,
ArrayPositions: location.ArrayPositions,
Pos: int(location.Pos),
Start: int(location.Start),
End: int(location.End),
}
rv = append(rv, &tl)
}
}
sort.Sort(rv)
return rv
}
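
A small sketch of OrderTermLocations (term and offsets invented), showing the flatten-and-sort behavior:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/search"
	"github.com/blevesearch/bleve/search/highlight"
)

func main() {
	tlm := search.TermLocationMap{
		"zardoz": search.Locations{
			{Pos: 2, Start: 10, End: 16},
			{Pos: 1, Start: 0, End: 6},
		},
	}
	// Sorted by array positions first, then by Start offset.
	ordered := highlight.OrderTermLocations(tlm)
	for _, tl := range ordered {
		fmt.Println(tl.Term, tl.Start, tl.End) // zardoz 0 6, then zardoz 10 16
	}
}
```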

View File

@ -0,0 +1,114 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package search
import (
"math"
)
func LevenshteinDistance(a, b string) int {
la := len(a)
lb := len(b)
d := make([]int, la+1)
var lastdiag, olddiag, temp int
for i := 1; i <= la; i++ {
d[i] = i
}
for i := 1; i <= lb; i++ {
d[0] = i
lastdiag = i - 1
for j := 1; j <= la; j++ {
olddiag = d[j]
min := d[j] + 1
if (d[j-1] + 1) < min {
min = d[j-1] + 1
}
if a[j-1] == b[i-1] {
temp = 0
} else {
temp = 1
}
if (lastdiag + temp) < min {
min = lastdiag + temp
}
d[j] = min
lastdiag = olddiag
}
}
return d[la]
}
// LevenshteinDistanceMax same as LevenshteinDistance but
// attempts to bail early once we know the distance
// will be greater than max
// in which case the first return val will be the max
// and the second will be true, indicating max was exceeded
func LevenshteinDistanceMax(a, b string, max int) (int, bool) {
v, wasMax, _ := LevenshteinDistanceMaxReuseSlice(a, b, max, nil)
return v, wasMax
}
func LevenshteinDistanceMaxReuseSlice(a, b string, max int, d []int) (int, bool, []int) {
la := len(a)
lb := len(b)
ld := int(math.Abs(float64(la - lb)))
if ld > max {
return max, true, d
}
if cap(d) < la+1 {
d = make([]int, la+1)
}
d = d[:la+1]
var lastdiag, olddiag, temp int
for i := 1; i <= la; i++ {
d[i] = i
}
for i := 1; i <= lb; i++ {
d[0] = i
lastdiag = i - 1
rowmin := max + 1
for j := 1; j <= la; j++ {
olddiag = d[j]
min := d[j] + 1
if (d[j-1] + 1) < min {
min = d[j-1] + 1
}
if a[j-1] == b[i-1] {
temp = 0
} else {
temp = 1
}
if (lastdiag + temp) < min {
min = lastdiag + temp
}
if min < rowmin {
rowmin = min
}
d[j] = min
lastdiag = olddiag
}
// after each row if rowmin isn't less than max stop
if rowmin > max {
return max, true, d
}
}
return d[la], false, d
}
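
A short sketch of the two entry points above, using the classic kitten/sitting pair with distance 3:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/search"
)

func main() {
	fmt.Println(search.LevenshteinDistance("kitten", "sitting")) // 3

	// With max set to 2 the row-minimum check bails out early: the first
	// return value is clamped to max and the second reports it was exceeded.
	d, exceeded := search.LevenshteinDistanceMax("kitten", "sitting", 2)
	fmt.Println(d, exceeded) // 2 true
}
```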

91
vendor/github.com/blevesearch/bleve/search/pool.go generated vendored Normal file
View File

@ -0,0 +1,91 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package search
import (
"reflect"
)
var reflectStaticSizeDocumentMatchPool int
func init() {
var dmp DocumentMatchPool
reflectStaticSizeDocumentMatchPool = int(reflect.TypeOf(dmp).Size())
}
// DocumentMatchPoolTooSmall is a callback function that can be executed
// when the DocumentMatchPool does not have sufficient capacity
// By default we just perform just-in-time allocation, but you could log
// a message, or panic, etc.
type DocumentMatchPoolTooSmall func(p *DocumentMatchPool) *DocumentMatch
// DocumentMatchPool manages use/re-use of DocumentMatch instances
// it pre-allocates space from a single large block with the expected
// number of instances. It is not thread-safe as currently all
// aspects of search take place in a single goroutine.
type DocumentMatchPool struct {
avail DocumentMatchCollection
TooSmall DocumentMatchPoolTooSmall
}
func defaultDocumentMatchPoolTooSmall(p *DocumentMatchPool) *DocumentMatch {
return &DocumentMatch{}
}
// NewDocumentMatchPool will build a DocumentMatchPool with memory
// pre-allocated to accommodate the requested number of DocumentMatch
// instances
func NewDocumentMatchPool(size, sortsize int) *DocumentMatchPool {
avail := make(DocumentMatchCollection, size)
// pre-allocate the expected number of instances
startBlock := make([]DocumentMatch, size)
startSorts := make([]string, size*sortsize)
// make these initial instances available
i, j := 0, 0
for i < size {
avail[i] = &startBlock[i]
avail[i].Sort = startSorts[j:j]
i += 1
j += sortsize
}
return &DocumentMatchPool{
avail: avail,
TooSmall: defaultDocumentMatchPoolTooSmall,
}
}
// Get returns an available DocumentMatch from the pool
// if the pool was not allocated with sufficient size, an allocation will
// occur to satisfy this request. As a side-effect this will grow the size
// of the pool.
func (p *DocumentMatchPool) Get() *DocumentMatch {
var rv *DocumentMatch
if len(p.avail) > 0 {
rv, p.avail = p.avail[len(p.avail)-1], p.avail[:len(p.avail)-1]
} else {
rv = p.TooSmall(p)
}
return rv
}
// Put returns a DocumentMatch to the pool
func (p *DocumentMatchPool) Put(d *DocumentMatch) {
if d == nil {
return
}
// reset DocumentMatch before returning it to available pool
d.Reset()
p.avail = append(p.avail, d)
}
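
A minimal usage sketch of the pool (sizes and document ID invented):

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/search"
)

func main() {
	// Pre-allocate 8 DocumentMatch instances, each with room for 2 sort keys.
	pool := search.NewDocumentMatchPool(8, 2)

	dm := pool.Get()
	dm.ID = "doc-1"
	dm.Score = 0.42
	fmt.Println(dm) // "[-0.420000]": String() shows the (empty) internal ID and score

	pool.Put(dm) // Reset() runs here; the backing buffers are kept for reuse
}
```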

378
vendor/github.com/blevesearch/bleve/search/search.go generated vendored Normal file
View File

@ -0,0 +1,378 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package search
import (
"fmt"
"reflect"
"sort"
"github.com/blevesearch/bleve/index"
"github.com/blevesearch/bleve/size"
)
var reflectStaticSizeDocumentMatch int
var reflectStaticSizeSearchContext int
var reflectStaticSizeLocation int
func init() {
var dm DocumentMatch
reflectStaticSizeDocumentMatch = int(reflect.TypeOf(dm).Size())
var sc SearchContext
reflectStaticSizeSearchContext = int(reflect.TypeOf(sc).Size())
var l Location
reflectStaticSizeLocation = int(reflect.TypeOf(l).Size())
}
type ArrayPositions []uint64
func (ap ArrayPositions) Equals(other ArrayPositions) bool {
if len(ap) != len(other) {
return false
}
for i := range ap {
if ap[i] != other[i] {
return false
}
}
return true
}
func (ap ArrayPositions) Compare(other ArrayPositions) int {
for i, p := range ap {
if i >= len(other) {
return 1
}
if p < other[i] {
return -1
}
if p > other[i] {
return 1
}
}
if len(ap) < len(other) {
return -1
}
return 0
}
type Location struct {
// Pos is the position of the term within the field, starting at 1
Pos uint64 `json:"pos"`
// Start and End are the byte offsets of the term in the field
Start uint64 `json:"start"`
End uint64 `json:"end"`
// ArrayPositions contains the positions of the term within any elements.
ArrayPositions ArrayPositions `json:"array_positions"`
}
func (l *Location) Size() int {
return reflectStaticSizeLocation + size.SizeOfPtr +
len(l.ArrayPositions)*size.SizeOfUint64
}
type Locations []*Location
func (p Locations) Len() int { return len(p) }
func (p Locations) Swap(i, j int) { p[i], p[j] = p[j], p[i] }
func (p Locations) Less(i, j int) bool {
c := p[i].ArrayPositions.Compare(p[j].ArrayPositions)
if c < 0 {
return true
}
if c > 0 {
return false
}
return p[i].Pos < p[j].Pos
}
func (p Locations) Dedupe() Locations { // destructive!
if len(p) <= 1 {
return p
}
sort.Sort(p)
slow := 0
for _, pfast := range p {
pslow := p[slow]
if pslow.Pos == pfast.Pos &&
pslow.Start == pfast.Start &&
pslow.End == pfast.End &&
pslow.ArrayPositions.Equals(pfast.ArrayPositions) {
continue // duplicate, so only move fast ahead
}
slow++
p[slow] = pfast
}
return p[:slow+1]
}
type TermLocationMap map[string]Locations
func (t TermLocationMap) AddLocation(term string, location *Location) {
t[term] = append(t[term], location)
}
type FieldTermLocationMap map[string]TermLocationMap
type FieldTermLocation struct {
Field string
Term string
Location Location
}
type FieldFragmentMap map[string][]string
type DocumentMatch struct {
Index string `json:"index,omitempty"`
ID string `json:"id"`
IndexInternalID index.IndexInternalID `json:"-"`
Score float64 `json:"score"`
Expl *Explanation `json:"explanation,omitempty"`
Locations FieldTermLocationMap `json:"locations,omitempty"`
Fragments FieldFragmentMap `json:"fragments,omitempty"`
Sort []string `json:"sort,omitempty"`
// Fields contains the values for document fields listed in
// SearchRequest.Fields. Text fields are returned as strings, numeric
// fields as float64s and date fields as time.RFC3339 formatted strings.
Fields map[string]interface{} `json:"fields,omitempty"`
// used to maintain natural index order
HitNumber uint64 `json:"-"`
// used to temporarily hold field term location information during
// search processing in an efficient, recycle-friendly manner, to
// be later incorporated into the Locations map when search
// results are completed
FieldTermLocations []FieldTermLocation `json:"-"`
}
func (dm *DocumentMatch) AddFieldValue(name string, value interface{}) {
if dm.Fields == nil {
dm.Fields = make(map[string]interface{})
}
existingVal, ok := dm.Fields[name]
if !ok {
dm.Fields[name] = value
return
}
valSlice, ok := existingVal.([]interface{})
if ok {
// already a slice, append to it
valSlice = append(valSlice, value)
} else {
// create a slice
valSlice = []interface{}{existingVal, value}
}
dm.Fields[name] = valSlice
}
// Reset allows an already allocated DocumentMatch to be reused
func (dm *DocumentMatch) Reset() *DocumentMatch {
// remember the []byte used for the IndexInternalID
indexInternalID := dm.IndexInternalID
// remember the []interface{} used for sort
sort := dm.Sort
// remember the FieldTermLocations backing array
ftls := dm.FieldTermLocations
for i := range ftls { // recycle the ArrayPositions of each location
ftls[i].Location.ArrayPositions = ftls[i].Location.ArrayPositions[:0]
}
// idiom to copy over from empty DocumentMatch (0 allocations)
*dm = DocumentMatch{}
// reuse the []byte already allocated (and reset len to 0)
dm.IndexInternalID = indexInternalID[:0]
// reuse the []interface{} already allocated (and reset len to 0)
dm.Sort = sort[:0]
// reuse the FieldTermLocations already allocated (and reset len to 0)
dm.FieldTermLocations = ftls[:0]
return dm
}
func (dm *DocumentMatch) Size() int {
sizeInBytes := reflectStaticSizeDocumentMatch + size.SizeOfPtr +
len(dm.Index) +
len(dm.ID) +
len(dm.IndexInternalID)
if dm.Expl != nil {
sizeInBytes += dm.Expl.Size()
}
for k, v := range dm.Locations {
sizeInBytes += size.SizeOfString + len(k)
for k1, v1 := range v {
sizeInBytes += size.SizeOfString + len(k1) +
size.SizeOfSlice
for _, entry := range v1 {
sizeInBytes += entry.Size()
}
}
}
for k, v := range dm.Fragments {
sizeInBytes += size.SizeOfString + len(k) +
size.SizeOfSlice
for _, entry := range v {
sizeInBytes += size.SizeOfString + len(entry)
}
}
for _, entry := range dm.Sort {
sizeInBytes += size.SizeOfString + len(entry)
}
for k := range dm.Fields {
sizeInBytes += size.SizeOfString + len(k) +
size.SizeOfPtr
}
return sizeInBytes
}
// Complete performs final preparation & transformation of the
// DocumentMatch at the end of search processing, also allowing the
// caller to provide an optional preallocated locations slice
func (dm *DocumentMatch) Complete(prealloc []Location) []Location {
// transform the FieldTermLocations slice into the Locations map
nlocs := len(dm.FieldTermLocations)
if nlocs > 0 {
if cap(prealloc) < nlocs {
prealloc = make([]Location, nlocs)
}
prealloc = prealloc[:nlocs]
var lastField string
var tlm TermLocationMap
var needsDedupe bool
for i, ftl := range dm.FieldTermLocations {
if lastField != ftl.Field {
lastField = ftl.Field
if dm.Locations == nil {
dm.Locations = make(FieldTermLocationMap)
}
tlm = dm.Locations[ftl.Field]
if tlm == nil {
tlm = make(TermLocationMap)
dm.Locations[ftl.Field] = tlm
}
}
loc := &prealloc[i]
*loc = ftl.Location
if len(loc.ArrayPositions) > 0 { // copy
loc.ArrayPositions = append(ArrayPositions(nil), loc.ArrayPositions...)
}
locs := tlm[ftl.Term]
// if the loc is before or at the last location, then there
// might be duplicates that need to be deduplicated
if !needsDedupe && len(locs) > 0 {
last := locs[len(locs)-1]
cmp := loc.ArrayPositions.Compare(last.ArrayPositions)
if cmp < 0 || (cmp == 0 && loc.Pos <= last.Pos) {
needsDedupe = true
}
}
tlm[ftl.Term] = append(locs, loc)
dm.FieldTermLocations[i] = FieldTermLocation{ // recycle
Location: Location{
ArrayPositions: ftl.Location.ArrayPositions[:0],
},
}
}
if needsDedupe {
for _, tlm := range dm.Locations {
for term, locs := range tlm {
tlm[term] = locs.Dedupe()
}
}
}
}
dm.FieldTermLocations = dm.FieldTermLocations[:0] // recycle
return prealloc
}
func (dm *DocumentMatch) String() string {
return fmt.Sprintf("[%s-%f]", string(dm.IndexInternalID), dm.Score)
}
type DocumentMatchCollection []*DocumentMatch
func (c DocumentMatchCollection) Len() int { return len(c) }
func (c DocumentMatchCollection) Swap(i, j int) { c[i], c[j] = c[j], c[i] }
func (c DocumentMatchCollection) Less(i, j int) bool { return c[i].Score > c[j].Score }
type Searcher interface {
Next(ctx *SearchContext) (*DocumentMatch, error)
Advance(ctx *SearchContext, ID index.IndexInternalID) (*DocumentMatch, error)
Close() error
Weight() float64
SetQueryNorm(float64)
Count() uint64
Min() int
Size() int
DocumentMatchPoolSize() int
}
type SearcherOptions struct {
Explain bool
IncludeTermVectors bool
Score string
}
// SearchContext represents the context around a single search
type SearchContext struct {
DocumentMatchPool *DocumentMatchPool
Collector Collector
IndexReader index.IndexReader
}
func (sc *SearchContext) Size() int {
sizeInBytes := reflectStaticSizeSearchContext + size.SizeOfPtr +
reflectStaticSizeDocumentMatchPool + size.SizeOfPtr
if sc.DocumentMatchPool != nil {
for _, entry := range sc.DocumentMatchPool.avail {
if entry != nil {
sizeInBytes += entry.Size()
}
}
}
return sizeInBytes
}
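
A small sketch of the AddFieldValue promotion logic above (field name and values invented):

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/search"
)

func main() {
	dm := &search.DocumentMatch{ID: "doc-1"}

	dm.AddFieldValue("tag", "waf")
	fmt.Printf("%T\n", dm.Fields["tag"]) // string

	// A second value for the same field promotes it to a []interface{}.
	dm.AddFieldValue("tag", "bayes")
	fmt.Printf("%v\n", dm.Fields["tag"]) // [waf bayes]
}
```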

741
vendor/github.com/blevesearch/bleve/search/sort.go generated vendored Normal file
View File

@ -0,0 +1,741 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package search
import (
"bytes"
"encoding/json"
"fmt"
"math"
"sort"
"strings"
"github.com/blevesearch/bleve/geo"
"github.com/blevesearch/bleve/numeric"
)
var HighTerm = strings.Repeat(string([]byte{0xff}), 10)
var LowTerm = string([]byte{0x00})
type SearchSort interface {
UpdateVisitor(field string, term []byte)
Value(a *DocumentMatch) string
Descending() bool
RequiresDocID() bool
RequiresScoring() bool
RequiresFields() []string
Reverse()
Copy() SearchSort
}
func ParseSearchSortObj(input map[string]interface{}) (SearchSort, error) {
descending, ok := input["desc"].(bool)
by, ok := input["by"].(string)
if !ok {
return nil, fmt.Errorf("search sort must specify by")
}
switch by {
case "id":
return &SortDocID{
Desc: descending,
}, nil
case "score":
return &SortScore{
Desc: descending,
}, nil
case "geo_distance":
field, ok := input["field"].(string)
if !ok {
return nil, fmt.Errorf("search sort mode geo_distance must specify field")
}
lon, lat, foundLocation := geo.ExtractGeoPoint(input["location"])
if !foundLocation {
return nil, fmt.Errorf("unable to parse geo_distance location")
}
rvd := &SortGeoDistance{
Field: field,
Desc: descending,
Lon: lon,
Lat: lat,
unitMult: 1.0,
}
if distUnit, ok := input["unit"].(string); ok {
var err error
rvd.unitMult, err = geo.ParseDistanceUnit(distUnit)
if err != nil {
return nil, err
}
rvd.Unit = distUnit
}
return rvd, nil
case "field":
field, ok := input["field"].(string)
if !ok {
return nil, fmt.Errorf("search sort mode field must specify field")
}
rv := &SortField{
Field: field,
Desc: descending,
}
typ, ok := input["type"].(string)
if ok {
switch typ {
case "auto":
rv.Type = SortFieldAuto
case "string":
rv.Type = SortFieldAsString
case "number":
rv.Type = SortFieldAsNumber
case "date":
rv.Type = SortFieldAsDate
default:
return nil, fmt.Errorf("unknown sort field type: %s", typ)
}
}
mode, ok := input["mode"].(string)
if ok {
switch mode {
case "default":
rv.Mode = SortFieldDefault
case "min":
rv.Mode = SortFieldMin
case "max":
rv.Mode = SortFieldMax
default:
return nil, fmt.Errorf("unknown sort field mode: %s", mode)
}
}
missing, ok := input["missing"].(string)
if ok {
switch missing {
case "first":
rv.Missing = SortFieldMissingFirst
case "last":
rv.Missing = SortFieldMissingLast
default:
return nil, fmt.Errorf("unknown sort field missing: %s", missing)
}
}
return rv, nil
}
return nil, fmt.Errorf("unknown search sort by: %s", by)
}
func ParseSearchSortString(input string) SearchSort {
descending := false
if strings.HasPrefix(input, "-") {
descending = true
input = input[1:]
} else if strings.HasPrefix(input, "+") {
input = input[1:]
}
if input == "_id" {
return &SortDocID{
Desc: descending,
}
} else if input == "_score" {
return &SortScore{
Desc: descending,
}
}
return &SortField{
Field: input,
Desc: descending,
}
}
func ParseSearchSortJSON(input json.RawMessage) (SearchSort, error) {
// first try to parse it as string
var sortString string
err := json.Unmarshal(input, &sortString)
if err != nil {
var sortObj map[string]interface{}
err = json.Unmarshal(input, &sortObj)
if err != nil {
return nil, err
}
return ParseSearchSortObj(sortObj)
}
return ParseSearchSortString(sortString), nil
}
func ParseSortOrderStrings(in []string) SortOrder {
rv := make(SortOrder, 0, len(in))
for _, i := range in {
ss := ParseSearchSortString(i)
rv = append(rv, ss)
}
return rv
}
func ParseSortOrderJSON(in []json.RawMessage) (SortOrder, error) {
rv := make(SortOrder, 0, len(in))
for _, i := range in {
ss, err := ParseSearchSortJSON(i)
if err != nil {
return nil, err
}
rv = append(rv, ss)
}
return rv, nil
}
type SortOrder []SearchSort
func (so SortOrder) Value(doc *DocumentMatch) {
for _, soi := range so {
doc.Sort = append(doc.Sort, soi.Value(doc))
}
}
func (so SortOrder) UpdateVisitor(field string, term []byte) {
for _, soi := range so {
soi.UpdateVisitor(field, term)
}
}
func (so SortOrder) Copy() SortOrder {
rv := make(SortOrder, len(so))
for i, soi := range so {
rv[i] = soi.Copy()
}
return rv
}
// Compare will compare two document matches using the specified sort order
// if both are numbers, we avoid converting back to term
func (so SortOrder) Compare(cachedScoring, cachedDesc []bool, i, j *DocumentMatch) int {
// compare the documents on all search sorts until a difference is found
for x := range so {
c := 0
if cachedScoring[x] {
if i.Score < j.Score {
c = -1
} else if i.Score > j.Score {
c = 1
}
} else {
iVal := i.Sort[x]
jVal := j.Sort[x]
c = strings.Compare(iVal, jVal)
}
if c == 0 {
continue
}
if cachedDesc[x] {
c = -c
}
return c
}
// if they are the same at this point, impose order based on index natural sort order
if i.HitNumber == j.HitNumber {
return 0
} else if i.HitNumber > j.HitNumber {
return 1
}
return -1
}
func (so SortOrder) RequiresScore() bool {
for _, soi := range so {
if soi.RequiresScoring() {
return true
}
}
return false
}
func (so SortOrder) RequiresDocID() bool {
for _, soi := range so {
if soi.RequiresDocID() {
return true
}
}
return false
}
func (so SortOrder) RequiredFields() []string {
var rv []string
for _, soi := range so {
rv = append(rv, soi.RequiresFields()...)
}
return rv
}
func (so SortOrder) CacheIsScore() []bool {
rv := make([]bool, 0, len(so))
for _, soi := range so {
rv = append(rv, soi.RequiresScoring())
}
return rv
}
func (so SortOrder) CacheDescending() []bool {
rv := make([]bool, 0, len(so))
for _, soi := range so {
rv = append(rv, soi.Descending())
}
return rv
}
func (so SortOrder) Reverse() {
for _, soi := range so {
soi.Reverse()
}
}
// SortFieldType lets you control some internal sort behavior
// normally leaving this to the zero-value of SortFieldAuto is fine
type SortFieldType int
const (
// SortFieldAuto applies heuristics to attempt to automatically sort correctly
SortFieldAuto SortFieldType = iota
// SortFieldAsString forces sort as string (no prefix coded terms removed)
SortFieldAsString
// SortFieldAsNumber forces sort as number (prefix coded terms with shift > 0 removed)
SortFieldAsNumber
// SortFieldAsDate forces sort as date (prefix coded terms with shift > 0 removed)
SortFieldAsDate
)
// SortFieldMode describes the behavior if the field has multiple values
type SortFieldMode int
const (
// SortFieldDefault uses the first (or only) value, this is the default zero-value
SortFieldDefault SortFieldMode = iota // FIXME name is confusing
// SortFieldMin uses the minimum value
SortFieldMin
// SortFieldMax uses the maximum value
SortFieldMax
)
// SortFieldMissing controls where documents missing a field value should be sorted
type SortFieldMissing int
const (
// SortFieldMissingLast sorts documents missing a field at the end
SortFieldMissingLast SortFieldMissing = iota
// SortFieldMissingFirst sorts documents missing a field at the beginning
SortFieldMissingFirst
)
// SortField will sort results by the value of a stored field
// Field is the name of the field
// Descending reverses the sort order (default false)
// Type allows forcing of string/number/date behavior (default auto)
// Mode controls behavior for multi-valued fields (default first)
// Missing controls behavior of missing values (default last)
type SortField struct {
Field string
Desc bool
Type SortFieldType
Mode SortFieldMode
Missing SortFieldMissing
values [][]byte
tmp [][]byte
}
// UpdateVisitor notifies this sort field that in this document
// this field has the specified term
func (s *SortField) UpdateVisitor(field string, term []byte) {
if field == s.Field {
s.values = append(s.values, term)
}
}
// Value returns the sort value of the DocumentMatch
// it also resets the state of this SortField for
// processing the next document
func (s *SortField) Value(i *DocumentMatch) string {
iTerms := s.filterTermsByType(s.values)
iTerm := s.filterTermsByMode(iTerms)
s.values = s.values[:0]
return iTerm
}
// Descending determines the order of the sort
func (s *SortField) Descending() bool {
return s.Desc
}
func (s *SortField) filterTermsByMode(terms [][]byte) string {
if len(terms) == 1 || (len(terms) > 1 && s.Mode == SortFieldDefault) {
return string(terms[0])
} else if len(terms) > 1 {
switch s.Mode {
case SortFieldMin:
sort.Sort(BytesSlice(terms))
return string(terms[0])
case SortFieldMax:
sort.Sort(BytesSlice(terms))
return string(terms[len(terms)-1])
}
}
// handle missing terms
if s.Missing == SortFieldMissingLast {
if s.Desc {
return LowTerm
}
return HighTerm
}
if s.Desc {
return HighTerm
}
return LowTerm
}
// filterTermsByType attempts to make one pass on the terms
// if we are in auto-mode AND all the terms look like prefix-coded numbers
// return only the terms which had shift of 0
// if we are in explicit number or date mode, return only valid
// prefix coded numbers with shift of 0
func (s *SortField) filterTermsByType(terms [][]byte) [][]byte {
stype := s.Type
if stype == SortFieldAuto {
allTermsPrefixCoded := true
termsWithShiftZero := s.tmp[:0]
for _, term := range terms {
valid, shift := numeric.ValidPrefixCodedTermBytes(term)
if valid && shift == 0 {
termsWithShiftZero = append(termsWithShiftZero, term)
} else if !valid {
allTermsPrefixCoded = false
}
}
if allTermsPrefixCoded {
terms = termsWithShiftZero
s.tmp = termsWithShiftZero[:0]
}
} else if stype == SortFieldAsNumber || stype == SortFieldAsDate {
termsWithShiftZero := s.tmp[:0]
for _, term := range terms {
valid, shift := numeric.ValidPrefixCodedTermBytes(term)
if valid && shift == 0 {
termsWithShiftZero = append(termsWithShiftZero, term)
}
}
terms = termsWithShiftZero
s.tmp = termsWithShiftZero[:0]
}
return terms
}
// RequiresDocID says this SearchSort does not require the DocID be loaded
func (s *SortField) RequiresDocID() bool { return false }
// RequiresScoring says this SearchSort does not require scoring
func (s *SortField) RequiresScoring() bool { return false }
// RequiresFields says this SearchSort requires the specified stored field
func (s *SortField) RequiresFields() []string { return []string{s.Field} }
func (s *SortField) MarshalJSON() ([]byte, error) {
// see if simple format can be used
if s.Missing == SortFieldMissingLast &&
s.Mode == SortFieldDefault &&
s.Type == SortFieldAuto {
if s.Desc {
return json.Marshal("-" + s.Field)
}
return json.Marshal(s.Field)
}
sfm := map[string]interface{}{
"by": "field",
"field": s.Field,
}
if s.Desc {
sfm["desc"] = true
}
if s.Missing > SortFieldMissingLast {
switch s.Missing {
case SortFieldMissingFirst:
sfm["missing"] = "first"
}
}
if s.Mode > SortFieldDefault {
switch s.Mode {
case SortFieldMin:
sfm["mode"] = "min"
case SortFieldMax:
sfm["mode"] = "max"
}
}
if s.Type > SortFieldAuto {
switch s.Type {
case SortFieldAsString:
sfm["type"] = "string"
case SortFieldAsNumber:
sfm["type"] = "number"
case SortFieldAsDate:
sfm["type"] = "date"
}
}
return json.Marshal(sfm)
}
func (s *SortField) Copy() SearchSort {
rv := *s
return &rv
}
func (s *SortField) Reverse() {
s.Desc = !s.Desc
if s.Missing == SortFieldMissingFirst {
s.Missing = SortFieldMissingLast
} else {
s.Missing = SortFieldMissingFirst
}
}
// SortDocID will sort results by the document identifier
type SortDocID struct {
Desc bool
}
// UpdateVisitor is a no-op for SortDocID, as its value
// is not dependent on any field terms
func (s *SortDocID) UpdateVisitor(field string, term []byte) {
}
// Value returns the sort value of the DocumentMatch
func (s *SortDocID) Value(i *DocumentMatch) string {
return i.ID
}
// Descending determines the order of the sort
func (s *SortDocID) Descending() bool {
return s.Desc
}
// RequiresDocID says this SearchSort does require the DocID be loaded
func (s *SortDocID) RequiresDocID() bool { return true }
// RequiresScoring says this SearchSort does not require scoring
func (s *SortDocID) RequiresScoring() bool { return false }
// RequiresFields says this SearchSort does not require any stored fields
func (s *SortDocID) RequiresFields() []string { return nil }
func (s *SortDocID) MarshalJSON() ([]byte, error) {
if s.Desc {
return json.Marshal("-_id")
}
return json.Marshal("_id")
}
func (s *SortDocID) Copy() SearchSort {
rv := *s
return &rv
}
func (s *SortDocID) Reverse() {
s.Desc = !s.Desc
}
// SortScore will sort results by the document match score
type SortScore struct {
Desc bool
}
// UpdateVisitor is a no-op for SortScore, as its value
// is not dependent on any field terms
func (s *SortScore) UpdateVisitor(field string, term []byte) {
}
// Value returns the sort value of the DocumentMatch
func (s *SortScore) Value(i *DocumentMatch) string {
return "_score"
}
// Descending determines the order of the sort
func (s *SortScore) Descending() bool {
return s.Desc
}
// RequiresDocID says this SearchSort does not require the DocID be loaded
func (s *SortScore) RequiresDocID() bool { return false }
// RequiresScoring says this SearchSort does require scoring
func (s *SortScore) RequiresScoring() bool { return true }
// RequiresFields says this SearchSort does not require any stored fields
func (s *SortScore) RequiresFields() []string { return nil }
func (s *SortScore) MarshalJSON() ([]byte, error) {
if s.Desc {
return json.Marshal("-_score")
}
return json.Marshal("_score")
}
func (s *SortScore) Copy() SearchSort {
rv := *s
return &rv
}
func (s *SortScore) Reverse() {
s.Desc = !s.Desc
}
var maxDistance = string(numeric.MustNewPrefixCodedInt64(math.MaxInt64, 0))
// NewSortGeoDistance creates a SearchSort instance for sorting documents by
// their distance from the specified point.
func NewSortGeoDistance(field, unit string, lon, lat float64, desc bool) (
*SortGeoDistance, error) {
rv := &SortGeoDistance{
Field: field,
Desc: desc,
Unit: unit,
Lon: lon,
Lat: lat,
}
var err error
rv.unitMult, err = geo.ParseDistanceUnit(unit)
if err != nil {
return nil, err
}
return rv, nil
}
// SortGeoDistance will sort results by the distance of an
// indexed geo point, from the provided location.
// Field is the name of the field
// Descending reverses the sort order (default false)
type SortGeoDistance struct {
Field string
Desc bool
Unit string
values []string
Lon float64
Lat float64
unitMult float64
}
// UpdateVisitor notifies this sort field that in this document
// this field has the specified term
func (s *SortGeoDistance) UpdateVisitor(field string, term []byte) {
if field == s.Field {
s.values = append(s.values, string(term))
}
}
// Value returns the sort value of the DocumentMatch
// it also resets the state of this SortField for
// processing the next document
func (s *SortGeoDistance) Value(i *DocumentMatch) string {
iTerms := s.filterTermsByType(s.values)
iTerm := s.filterTermsByMode(iTerms)
s.values = s.values[:0]
if iTerm == "" {
return maxDistance
}
i64, err := numeric.PrefixCoded(iTerm).Int64()
if err != nil {
return maxDistance
}
docLon := geo.MortonUnhashLon(uint64(i64))
docLat := geo.MortonUnhashLat(uint64(i64))
dist := geo.Haversin(s.Lon, s.Lat, docLon, docLat)
// dist is returned in km, so convert to m
dist *= 1000
if s.unitMult != 0 {
dist /= s.unitMult
}
distInt64 := numeric.Float64ToInt64(dist)
return string(numeric.MustNewPrefixCodedInt64(distInt64, 0))
}
// Descending determines the order of the sort
func (s *SortGeoDistance) Descending() bool {
return s.Desc
}
func (s *SortGeoDistance) filterTermsByMode(terms []string) string {
if len(terms) >= 1 {
return terms[0]
}
return ""
}
// filterTermsByType attempts to make one pass on the terms
// return only valid prefix coded numbers with shift of 0
func (s *SortGeoDistance) filterTermsByType(terms []string) []string {
var termsWithShiftZero []string
for _, term := range terms {
valid, shift := numeric.ValidPrefixCodedTerm(term)
if valid && shift == 0 {
termsWithShiftZero = append(termsWithShiftZero, term)
}
}
return termsWithShiftZero
}
// RequiresDocID says this SearchSort does not require the DocID be loaded
func (s *SortGeoDistance) RequiresDocID() bool { return false }
// RequiresScoring says this SearchSort does not require scoring
func (s *SortGeoDistance) RequiresScoring() bool { return false }
// RequiresFields says this SearchSort requires the specified stored field
func (s *SortGeoDistance) RequiresFields() []string { return []string{s.Field} }
func (s *SortGeoDistance) MarshalJSON() ([]byte, error) {
sfm := map[string]interface{}{
"by": "geo_distance",
"field": s.Field,
"location": map[string]interface{}{
"lon": s.Lon,
"lat": s.Lat,
},
}
if s.Unit != "" {
sfm["unit"] = s.Unit
}
if s.Desc {
sfm["desc"] = true
}
return json.Marshal(sfm)
}
func (s *SortGeoDistance) Copy() SearchSort {
rv := *s
return &rv
}
func (s *SortGeoDistance) Reverse() {
s.Desc = !s.Desc
}
type BytesSlice [][]byte
func (p BytesSlice) Len() int { return len(p) }
func (p BytesSlice) Less(i, j int) bool { return bytes.Compare(p[i], p[j]) < 0 }
func (p BytesSlice) Swap(i, j int) { p[i], p[j] = p[j], p[i] }
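
A sketch of the string sort syntax parsed above; the field name "author" is invented:

```go
package main

import (
	"encoding/json"
	"fmt"

	"github.com/blevesearch/bleve/search"
)

func main() {
	// A "-" prefix flips the direction; "_id" and "_score" select SortDocID
	// and SortScore, anything else becomes a SortField on a stored field.
	order := search.ParseSortOrderStrings([]string{"-_score", "author", "_id"})
	for _, s := range order {
		fmt.Printf("%T desc=%v\n", s, s.Descending())
	}

	// The simple forms round-trip through MarshalJSON.
	js, _ := json.Marshal(order)
	fmt.Println(string(js)) // ["-_score","author","_id"]
}
```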

69
vendor/github.com/blevesearch/bleve/search/util.go generated vendored Normal file
View File

@ -0,0 +1,69 @@
// Copyright (c) 2014 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package search
func MergeLocations(locations []FieldTermLocationMap) FieldTermLocationMap {
rv := locations[0]
for i := 1; i < len(locations); i++ {
nextLocations := locations[i]
for field, termLocationMap := range nextLocations {
rvTermLocationMap, rvHasField := rv[field]
if rvHasField {
rv[field] = MergeTermLocationMaps(rvTermLocationMap, termLocationMap)
} else {
rv[field] = termLocationMap
}
}
}
return rv
}
func MergeTermLocationMaps(rv, other TermLocationMap) TermLocationMap {
for term, locationMap := range other {
// for a given term/document there cannot be different locations
// if they came back from different clauses, overwrite is ok
rv[term] = locationMap
}
return rv
}
func MergeFieldTermLocations(dest []FieldTermLocation, matches []*DocumentMatch) []FieldTermLocation {
n := len(dest)
for _, dm := range matches {
n += len(dm.FieldTermLocations)
}
if cap(dest) < n {
dest = append(make([]FieldTermLocation, 0, n), dest...)
}
for _, dm := range matches {
for _, ftl := range dm.FieldTermLocations {
dest = append(dest, FieldTermLocation{
Field: ftl.Field,
Term: ftl.Term,
Location: Location{
Pos: ftl.Location.Pos,
Start: ftl.Location.Start,
End: ftl.Location.End,
ArrayPositions: append(ArrayPositions(nil), ftl.Location.ArrayPositions...),
},
})
}
}
return dest
}
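
A sketch of MergeLocations (fields, terms and positions invented); note that it merges into, and therefore mutates, the first map in the slice:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/search"
)

func main() {
	a := search.FieldTermLocationMap{
		"body": search.TermLocationMap{"waf": search.Locations{{Pos: 1}}},
	}
	b := search.FieldTermLocationMap{
		"body":  search.TermLocationMap{"bayes": search.Locations{{Pos: 3}}},
		"title": search.TermLocationMap{"zardoz": search.Locations{{Pos: 1}}},
	}

	merged := search.MergeLocations([]search.FieldTermLocationMap{a, b})
	fmt.Println(len(merged["body"]), len(merged["title"])) // 2 1
}
```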

59
vendor/github.com/blevesearch/bleve/size/sizes.go generated vendored Normal file
View File

@ -0,0 +1,59 @@
// Copyright (c) 2018 Couchbase, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package size
import (
"reflect"
)
func init() {
var b bool
SizeOfBool = int(reflect.TypeOf(b).Size())
var f32 float32
SizeOfFloat32 = int(reflect.TypeOf(f32).Size())
var f64 float64
SizeOfFloat64 = int(reflect.TypeOf(f64).Size())
var i int
SizeOfInt = int(reflect.TypeOf(i).Size())
var m map[int]int
SizeOfMap = int(reflect.TypeOf(m).Size())
var ptr *int
SizeOfPtr = int(reflect.TypeOf(ptr).Size())
var slice []int
SizeOfSlice = int(reflect.TypeOf(slice).Size())
var str string
SizeOfString = int(reflect.TypeOf(str).Size())
var u8 uint8
SizeOfUint8 = int(reflect.TypeOf(u8).Size())
var u16 uint16
SizeOfUint16 = int(reflect.TypeOf(u16).Size())
var u32 uint32
SizeOfUint32 = int(reflect.TypeOf(u32).Size())
var u64 uint64
SizeOfUint64 = int(reflect.TypeOf(u64).Size())
}
var SizeOfBool int
var SizeOfFloat32 int
var SizeOfFloat64 int
var SizeOfInt int
var SizeOfMap int
var SizeOfPtr int
var SizeOfSlice int
var SizeOfString int
var SizeOfUint8 int
var SizeOfUint16 int
var SizeOfUint32 int
var SizeOfUint64 int

View File

@ -0,0 +1,8 @@
#*
*.sublime-*
*~
.#*
.project
.settings
.DS_Store
/testdata

View File

@ -0,0 +1,16 @@
language: go
go:
- 1.4
script:
- go get golang.org/x/tools/cmd/vet
- go get golang.org/x/tools/cmd/cover
- go get github.com/mattn/goveralls
- go test -v -covermode=count -coverprofile=profile.out
- go vet
- goveralls -service drone.io -coverprofile=profile.out -repotoken $COVERALLS
notifications:
email:
- marty.schoch@gmail.com

19
vendor/github.com/blevesearch/go-porterstemmer/LICENSE generated vendored Normal file
View File

@ -0,0 +1,19 @@
Copyright (c) 2013 Charles Iliya Krempeaux <charles@reptile.ca> :: http://changelog.ca/
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

View File

@ -0,0 +1,118 @@
# This fork...
I'm maintaining this fork because the original author was not replying to issues or pull requests. For now I plan on maintaining this fork as necessary.
## Status
[![Build Status](https://travis-ci.org/blevesearch/go-porterstemmer.svg?branch=master)](https://travis-ci.org/blevesearch/go-porterstemmer)
[![Coverage Status](https://coveralls.io/repos/blevesearch/go-porterstemmer/badge.png?branch=HEAD)](https://coveralls.io/r/blevesearch/go-porterstemmer?branch=HEAD)
# Go Porter Stemmer
A native Go clean room implementation of the Porter Stemming Algorithm.
This algorithm is of interest to people doing Machine Learning or
Natural Language Processing (NLP).
This is NOT a port. This is a native Go implementation from the human-readable
description of the algorithm.
I've tried to make it (more) efficient by NOT using strings internally, but
instead using []rune slices, reusing the same (array) buffer (and sub-slices
of it) at all steps of the algorithm.
For the Porter Stemmer algorithm, see:
http://tartarus.org/martin/PorterStemmer/def.txt (URL #1)
http://tartarus.org/martin/PorterStemmer/ (URL #2)
# Departures
When I initially implemented it, it failed the tests at...
http://tartarus.org/martin/PorterStemmer/voc.txt (URL #3)
http://tartarus.org/martin/PorterStemmer/output.txt (URL #4)
... After reading the human-readable text over and over again to try to figure out
what error I had made (and doing all sorts of things to debug it), I came to the
conclusion that some of these tests were wrong according to the human-readable
description of the algorithm.
This led me to wonder if maybe other people's code that was passing these tests had
rules that were not in the human-readable description, which led me to look at the
source code here...
http://tartarus.org/martin/PorterStemmer/c.txt (URL #5)
... When I looked there I noticed that there are some items marked as a "DEPARTURE",
which differ from the original algorithm. (There are 2 of these.)
I implemented these departures, and the tests at URL #3 and URL #4 all passed.
## Usage
To use this Golang library, use with something like:
```go
package main

import (
	"fmt"

	"github.com/reiver/go-porterstemmer"
)

func main() {
	word := "Waxes"
	stem := porterstemmer.StemString(word)
	fmt.Printf("The word [%s] has the stem [%s].\n", word, stem)
}
```
Alternatively, if you want to be a bit more efficient, use []rune slices instead, with code like:
```go
package main

import (
	"fmt"

	"github.com/reiver/go-porterstemmer"
)

func main() {
	word := []rune("Waxes")
	stem := porterstemmer.Stem(word)
	fmt.Printf("The word [%s] has the stem [%s].\n", string(word), string(stem))
}
```
NOTE that the above code may modify the original slice (named "word" in the example) as a side
effect, for efficiency reasons, and that the slice named "stem" in the example above may be a
sub-slice of the slice named "word".
Alternatively, if you already know that your word is lowercase (and you don't need
this library to lowercase it for you), you can instead use code like:
```go
package main

import (
	"fmt"

	"github.com/reiver/go-porterstemmer"
)

func main() {
	word := []rune("waxes")
	stem := porterstemmer.StemWithoutLowerCasing(word)
	fmt.Printf("The word [%s] has the stem [%s].\n", string(word), string(stem))
}
```
Again NOTE (as with the previous example) that the above code may modify the original slice (named
"word" in the example) as a side effect, for efficiency reasons, and that the slice named "stem"
in the example above may be a sub-slice of the slice named "word".

View File

@ -0,0 +1,839 @@
package porterstemmer
import (
// "log"
"unicode"
)
func isConsonant(s []rune, i int) bool {
//DEBUG
//log.Printf("isConsonant: [%+v]", string(s[i]))
result := true
switch s[i] {
case 'a', 'e', 'i', 'o', 'u':
result = false
case 'y':
if 0 == i {
result = true
} else {
result = !isConsonant(s, i-1)
}
default:
result = true
}
return result
}
func measure(s []rune) uint {
// Initialize.
lenS := len(s)
result := uint(0)
i := 0
// Short Circuit.
if 0 == lenS {
/////////// RETURN
return result
}
// Ignore (potential) consonant sequence at the beginning of word.
for isConsonant(s, i) {
//DEBUG
//log.Printf("[measure([%s])] Eat Consonant [%d] -> [%s]", string(s), i, string(s[i]))
i++
if i >= lenS {
/////////////// RETURN
return result
}
}
// For each pair of a vowel sequence followed by a consonant sequence, increment result.
Outer:
for i < lenS {
for !isConsonant(s, i) {
//DEBUG
//log.Printf("[measure([%s])] VOWEL [%d] -> [%s]", string(s), i, string(s[i]))
i++
if i >= lenS {
/////////// BREAK
break Outer
}
}
for isConsonant(s, i) {
//DEBUG
//log.Printf("[measure([%s])] CONSONANT [%d] -> [%s]", string(s), i, string(s[i]))
i++
if i >= lenS {
result++
/////////// BREAK
break Outer
}
}
result++
}
// Return
return result
}
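// Illustration (not part of the original source): measure computes m in the
// Porter paper's [C](VC)^m[V] decomposition, so with the definitions above
// measure([]rune("tree")) == 0, measure([]rune("trouble")) == 1 and
// measure([]rune("oaten")) == 2.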
func hasSuffix(s, suffix []rune) bool {
lenSMinusOne := len(s) - 1
lenSuffixMinusOne := len(suffix) - 1
if lenSMinusOne <= lenSuffixMinusOne {
return false
} else if s[lenSMinusOne] != suffix[lenSuffixMinusOne] { // I suspect checking this first should speed this function up in practice.
/////// RETURN
return false
} else {
for i := 0; i < lenSuffixMinusOne; i++ {
if suffix[i] != s[lenSMinusOne-lenSuffixMinusOne+i] {
/////////////// RETURN
return false
}
}
}
return true
}
func containsVowel(s []rune) bool {
lenS := len(s)
for i := 0; i < lenS; i++ {
if !isConsonant(s, i) {
/////////// RETURN
return true
}
}
return false
}
func hasRepeatDoubleConsonantSuffix(s []rune) bool {
// Initialize.
lenS := len(s)
result := false
// Do it!
if 2 > lenS {
result = false
} else if s[lenS-1] == s[lenS-2] && isConsonant(s, lenS-1) { // Will using isConsonant() cause a problem with "YY"?
result = true
} else {
result = false
}
// Return,
return result
}
func hasConsonantVowelConsonantSuffix(s []rune) bool {
// Initialize.
lenS := len(s)
result := false
// Do it!
if 3 > lenS {
result = false
} else if isConsonant(s, lenS-3) && !isConsonant(s, lenS-2) && isConsonant(s, lenS-1) {
result = true
} else {
result = false
}
// Return
return result
}
func step1a(s []rune) []rune {
// Initialize.
var result []rune = s
lenS := len(s)
// Do it!
if suffix := []rune("sses"); hasSuffix(s, suffix) {
lenTrim := 2
subSlice := s[:lenS-lenTrim]
result = subSlice
} else if suffix := []rune("ies"); hasSuffix(s, suffix) {
lenTrim := 2
subSlice := s[:lenS-lenTrim]
result = subSlice
} else if suffix := []rune("ss"); hasSuffix(s, suffix) {
result = s
} else if suffix := []rune("s"); hasSuffix(s, suffix) {
lenSuffix := 1
subSlice := s[:lenS-lenSuffix]
result = subSlice
}
// Return.
return result
}
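// Illustration (not part of the original source), using the examples from the
// Porter paper: step1a([]rune("caresses")) yields "caress", "ponies" yields
// "poni", "caress" is unchanged, and "cats" yields "cat".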
func step1b(s []rune) []rune {
// Initialize.
var result []rune = s
lenS := len(s)
// Do it!
if suffix := []rune("eed"); hasSuffix(s, suffix) {
lenSuffix := len(suffix)
subSlice := s[:lenS-lenSuffix]
m := measure(subSlice)
if 0 < m {
lenTrim := 1
result = s[:lenS-lenTrim]
}
} else if suffix := []rune("ed"); hasSuffix(s, suffix) {
lenSuffix := len(suffix)
subSlice := s[:lenS-lenSuffix]
if containsVowel(subSlice) {
if suffix2 := []rune("at"); hasSuffix(subSlice, suffix2) {
lenTrim := -1
result = s[:lenS-lenSuffix-lenTrim]
} else if suffix2 := []rune("bl"); hasSuffix(subSlice, suffix2) {
lenTrim := -1
result = s[:lenS-lenSuffix-lenTrim]
} else if suffix2 := []rune("iz"); hasSuffix(subSlice, suffix2) {
lenTrim := -1
result = s[:lenS-lenSuffix-lenTrim]
} else if c := subSlice[len(subSlice)-1]; 'l' != c && 's' != c && 'z' != c && hasRepeatDoubleConsonantSuffix(subSlice) {
lenTrim := 1
lenSubSlice := len(subSlice)
result = subSlice[:lenSubSlice-lenTrim]
} else if c := subSlice[len(subSlice)-1]; 1 == measure(subSlice) && hasConsonantVowelConsonantSuffix(subSlice) && 'w' != c && 'x' != c && 'y' != c {
lenTrim := -1
result = s[:lenS-lenSuffix-lenTrim]
result[len(result)-1] = 'e'
} else {
result = subSlice
}
}
} else if suffix := []rune("ing"); hasSuffix(s, suffix) {
lenSuffix := len(suffix)
subSlice := s[:lenS-lenSuffix]
if containsVowel(subSlice) {
if suffix2 := []rune("at"); hasSuffix(subSlice, suffix2) {
lenTrim := -1
result = s[:lenS-lenSuffix-lenTrim]
result[len(result)-1] = 'e'
} else if suffix2 := []rune("bl"); hasSuffix(subSlice, suffix2) {
lenTrim := -1
result = s[:lenS-lenSuffix-lenTrim]
result[len(result)-1] = 'e'
} else if suffix2 := []rune("iz"); hasSuffix(subSlice, suffix2) {
lenTrim := -1
result = s[:lenS-lenSuffix-lenTrim]
result[len(result)-1] = 'e'
} else if c := subSlice[len(subSlice)-1]; 'l' != c && 's' != c && 'z' != c && hasRepeatDoubleConsonantSuffix(subSlice) {
lenTrim := 1
lenSubSlice := len(subSlice)
result = subSlice[:lenSubSlice-lenTrim]
} else if c := subSlice[len(subSlice)-1]; 1 == measure(subSlice) && hasConsonantVowelConsonantSuffix(subSlice) && 'w' != c && 'x' != c && 'y' != c {
lenTrim := -1
result = s[:lenS-lenSuffix-lenTrim]
result[len(result)-1] = 'e'
} else {
result = subSlice
}
}
}
// Return.
return result
}
func step1c(s []rune) []rune {
// Initialize.
lenS := len(s)
result := s
// Do it!
if 2 > lenS {
/////////// RETURN
return result
}
if 'y' == s[lenS-1] && containsVowel(s[:lenS-1]) {
result[lenS-1] = 'i'
} else if 'Y' == s[lenS-1] && containsVowel(s[:lenS-1]) {
result[lenS-1] = 'I'
}
// Return.
return result
}
func step2(s []rune) []rune {
// Initialize.
lenS := len(s)
result := s
// Do it!
if suffix := []rune("ational"); hasSuffix(s, suffix) {
if 0 < measure(s[:lenS-len(suffix)]) {
result[lenS-5] = 'e'
result = result[:lenS-4]
}
} else if suffix := []rune("tional"); hasSuffix(s, suffix) {
if 0 < measure(s[:lenS-len(suffix)]) {
result = result[:lenS-2]
}
} else if suffix := []rune("enci"); hasSuffix(s, suffix) {
if 0 < measure(s[:lenS-len(suffix)]) {
result[lenS-1] = 'e'
}
} else if suffix := []rune("anci"); hasSuffix(s, suffix) {
if 0 < measure(s[:lenS-len(suffix)]) {
result[lenS-1] = 'e'
}
} else if suffix := []rune("izer"); hasSuffix(s, suffix) {
if 0 < measure(s[:lenS-len(suffix)]) {
result = s[:lenS-1]
}
} else if suffix := []rune("bli"); hasSuffix(s, suffix) { // --DEPARTURE--
// } else if suffix := []rune("abli") ; hasSuffix(s, suffix) {
if 0 < measure(s[:lenS-len(suffix)]) {
result[lenS-1] = 'e'
}
} else if suffix := []rune("alli"); hasSuffix(s, suffix) {
if 0 < measure(s[:lenS-len(suffix)]) {
result = s[:lenS-2]
}
} else if suffix := []rune("entli"); hasSuffix(s, suffix) {
if 0 < measure(s[:lenS-len(suffix)]) {
result = s[:lenS-2]
}
} else if suffix := []rune("eli"); hasSuffix(s, suffix) {
if 0 < measure(s[:lenS-len(suffix)]) {
result = s[:lenS-2]
}
} else if suffix := []rune("ousli"); hasSuffix(s, suffix) {
if 0 < measure(s[:lenS-len(suffix)]) {
result = s[:lenS-2]
}
} else if suffix := []rune("ization"); hasSuffix(s, suffix) {
if 0 < measure(s[:lenS-len(suffix)]) {
result[lenS-5] = 'e'
result = s[:lenS-4]
}
} else if suffix := []rune("ation"); hasSuffix(s, suffix) {
if 0 < measure(s[:lenS-len(suffix)]) {
result[lenS-3] = 'e'
result = s[:lenS-2]
}
} else if suffix := []rune("ator"); hasSuffix(s, suffix) {
if 0 < measure(s[:lenS-len(suffix)]) {
result[lenS-2] = 'e'
result = s[:lenS-1]
}
} else if suffix := []rune("alism"); hasSuffix(s, suffix) {
if 0 < measure(s[:lenS-len(suffix)]) {
result = s[:lenS-3]
}
} else if suffix := []rune("iveness"); hasSuffix(s, suffix) {
if 0 < measure(s[:lenS-len(suffix)]) {
result = s[:lenS-4]
}
} else if suffix := []rune("fulness"); hasSuffix(s, suffix) {
if 0 < measure(s[:lenS-len(suffix)]) {
result = s[:lenS-4]
}
} else if suffix := []rune("ousness"); hasSuffix(s, suffix) {
if 0 < measure(s[:lenS-len(suffix)]) {
result = s[:lenS-4]
}
} else if suffix := []rune("aliti"); hasSuffix(s, suffix) {
if 0 < measure(s[:lenS-len(suffix)]) {
result = s[:lenS-3]
}
} else if suffix := []rune("iviti"); hasSuffix(s, suffix) {
if 0 < measure(s[:lenS-len(suffix)]) {
result[lenS-3] = 'e'
result = result[:lenS-2]
}
} else if suffix := []rune("biliti"); hasSuffix(s, suffix) {
if 0 < measure(s[:lenS-len(suffix)]) {
result[lenS-5] = 'l'
result[lenS-4] = 'e'
result = result[:lenS-3]
}
} else if suffix := []rune("logi"); hasSuffix(s, suffix) { // --DEPARTURE--
if 0 < measure(s[:lenS-len(suffix)]) {
lenTrim := 1
result = s[:lenS-lenTrim]
}
}
// Return.
return result
}
func step3(s []rune) []rune {
// Initialize.
lenS := len(s)
result := s
// Do it!
if suffix := []rune("icate"); hasSuffix(s, suffix) {
lenSuffix := len(suffix)
if 0 < measure(s[:lenS-lenSuffix]) {
result = result[:lenS-3]
}
} else if suffix := []rune("ative"); hasSuffix(s, suffix) {
lenSuffix := len(suffix)
subSlice := s[:lenS-lenSuffix]
m := measure(subSlice)
if 0 < m {
result = subSlice
}
} else if suffix := []rune("alize"); hasSuffix(s, suffix) {
lenSuffix := len(suffix)
if 0 < measure(s[:lenS-lenSuffix]) {
result = result[:lenS-3]
}
} else if suffix := []rune("iciti"); hasSuffix(s, suffix) {
lenSuffix := len(suffix)
if 0 < measure(s[:lenS-lenSuffix]) {
result = result[:lenS-3]
}
} else if suffix := []rune("ical"); hasSuffix(s, suffix) {
lenSuffix := len(suffix)
if 0 < measure(s[:lenS-lenSuffix]) {
result = result[:lenS-2]
}
} else if suffix := []rune("ful"); hasSuffix(s, suffix) {
lenSuffix := len(suffix)
subSlice := s[:lenS-lenSuffix]
m := measure(subSlice)
if 0 < m {
result = subSlice
}
} else if suffix := []rune("ness"); hasSuffix(s, suffix) {
lenSuffix := len(suffix)
subSlice := s[:lenS-lenSuffix]
m := measure(subSlice)
if 0 < m {
result = subSlice
}
}
// Return.
return result
}
func step4(s []rune) []rune {
// Initialize.
lenS := len(s)
result := s
// Do it!
if suffix := []rune("al"); hasSuffix(s, suffix) {
lenSuffix := len(suffix)
subSlice := s[:lenS-lenSuffix]
m := measure(subSlice)
if 1 < m {
result = result[:lenS-lenSuffix]
}
} else if suffix := []rune("ance"); hasSuffix(s, suffix) {
lenSuffix := len(suffix)
subSlice := s[:lenS-lenSuffix]
m := measure(subSlice)
if 1 < m {
result = result[:lenS-lenSuffix]
}
} else if suffix := []rune("ence"); hasSuffix(s, suffix) {
lenSuffix := len(suffix)
subSlice := s[:lenS-lenSuffix]
m := measure(subSlice)
if 1 < m {
result = result[:lenS-lenSuffix]
}
} else if suffix := []rune("er"); hasSuffix(s, suffix) {
lenSuffix := len(suffix)
subSlice := s[:lenS-lenSuffix]
m := measure(subSlice)
if 1 < m {
result = subSlice
}
} else if suffix := []rune("ic"); hasSuffix(s, suffix) {
lenSuffix := len(suffix)
subSlice := s[:lenS-lenSuffix]
m := measure(subSlice)
if 1 < m {
result = subSlice
}
} else if suffix := []rune("able"); hasSuffix(s, suffix) {
lenSuffix := len(suffix)
subSlice := s[:lenS-lenSuffix]
m := measure(subSlice)
if 1 < m {
result = subSlice
}
} else if suffix := []rune("ible"); hasSuffix(s, suffix) {
lenSuffix := len(suffix)
subSlice := s[:lenS-lenSuffix]
m := measure(subSlice)
if 1 < m {
result = subSlice
}
} else if suffix := []rune("ant"); hasSuffix(s, suffix) {
lenSuffix := len(suffix)
subSlice := s[:lenS-lenSuffix]
m := measure(subSlice)
if 1 < m {
result = subSlice
}
} else if suffix := []rune("ement"); hasSuffix(s, suffix) {
lenSuffix := len(suffix)
subSlice := s[:lenS-lenSuffix]
m := measure(subSlice)
if 1 < m {
result = subSlice
}
} else if suffix := []rune("ment"); hasSuffix(s, suffix) {
lenSuffix := len(suffix)
subSlice := s[:lenS-lenSuffix]
m := measure(subSlice)
if 1 < m {
result = subSlice
}
} else if suffix := []rune("ent"); hasSuffix(s, suffix) {
lenSuffix := len(suffix)
subSlice := s[:lenS-lenSuffix]
m := measure(subSlice)
if 1 < m {
result = subSlice
}
} else if suffix := []rune("ion"); hasSuffix(s, suffix) {
lenSuffix := len(suffix)
subSlice := s[:lenS-lenSuffix]
m := measure(subSlice)
c := subSlice[len(subSlice)-1]
if 1 < m && ('s' == c || 't' == c) {
result = subSlice
}
} else if suffix := []rune("ou"); hasSuffix(s, suffix) {
lenSuffix := len(suffix)
subSlice := s[:lenS-lenSuffix]
m := measure(subSlice)
if 1 < m {
result = subSlice
}
} else if suffix := []rune("ism"); hasSuffix(s, suffix) {
lenSuffix := len(suffix)
subSlice := s[:lenS-lenSuffix]
m := measure(subSlice)
if 1 < m {
result = subSlice
}
} else if suffix := []rune("ate"); hasSuffix(s, suffix) {
lenSuffix := len(suffix)
subSlice := s[:lenS-lenSuffix]
m := measure(subSlice)
if 1 < m {
result = subSlice
}
} else if suffix := []rune("iti"); hasSuffix(s, suffix) {
lenSuffix := len(suffix)
subSlice := s[:lenS-lenSuffix]
m := measure(subSlice)
if 1 < m {
result = subSlice
}
} else if suffix := []rune("ous"); hasSuffix(s, suffix) {
lenSuffix := len(suffix)
subSlice := s[:lenS-lenSuffix]
m := measure(subSlice)
if 1 < m {
result = subSlice
}
} else if suffix := []rune("ive"); hasSuffix(s, suffix) {
lenSuffix := len(suffix)
subSlice := s[:lenS-lenSuffix]
m := measure(subSlice)
if 1 < m {
result = subSlice
}
} else if suffix := []rune("ize"); hasSuffix(s, suffix) {
lenSuffix := len(suffix)
subSlice := s[:lenS-lenSuffix]
m := measure(subSlice)
if 1 < m {
result = subSlice
}
}
// Return.
return result
}
func step5a(s []rune) []rune {
// Initialize.
lenS := len(s)
result := s
// Do it!
if 'e' == s[lenS-1] {
lenSuffix := 1
subSlice := s[:lenS-lenSuffix]
m := measure(subSlice)
if 1 < m {
result = subSlice
} else if 1 == m {
if c := subSlice[len(subSlice)-1]; !(hasConsonantVowelConsonantSuffix(subSlice) && 'w' != c && 'x' != c && 'y' != c) {
result = subSlice
}
}
}
// Return.
return result
}
func step5b(s []rune) []rune {
// Initialize.
lenS := len(s)
result := s
// Do it!
if 2 < lenS && 'l' == s[lenS-2] && 'l' == s[lenS-1] {
lenSuffix := 1
subSlice := s[:lenS-lenSuffix]
m := measure(subSlice)
if 1 < m {
result = subSlice
}
}
// Return.
return result
}
func StemString(s string) string {
// Convert string to []rune
runeArr := []rune(s)
// Stem.
runeArr = Stem(runeArr)
// Convert []rune to string
str := string(runeArr)
// Return.
return str
}
func Stem(s []rune) []rune {
// Initialize.
lenS := len(s)
// Short circuit.
if 0 == lenS {
/////////// RETURN
return s
}
// Make all runes lowercase.
for i := 0; i < lenS; i++ {
s[i] = unicode.ToLower(s[i])
}
// Stem
result := StemWithoutLowerCasing(s)
// Return.
return result
}
func StemWithoutLowerCasing(s []rune) []rune {
// Initialize.
lenS := len(s)
// Words of length 2 or less are already stemmed.
// Don't do anything.
if 2 >= lenS {
/////////// RETURN
return s
}
// Stem
s = step1a(s)
s = step1b(s)
s = step1c(s)
s = step2(s)
s = step3(s)
s = step4(s)
s = step5a(s)
s = step5b(s)
// Return.
return s
}
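// Illustrative usage of the exported API above (example words chosen for
// illustration; outputs traced by hand through the steps, not taken from
// the library's tests):
//
//	StemString("running")    // "run":   step1b drops "ing" and undoubles "nn"
//	StemString("hoping")     // "hope":  step1b drops "ing", then restores an "e" after the CVC stem
//	StemString("relational") // "relat": step2 rewrites "ational" to "ate", step5a drops the "e"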

2
vendor/github.com/lytics/multibayes/.travis.yml generated vendored Normal file

@ -0,0 +1,2 @@
language: go
script: go test -race -cpu 1,2,4 -v ./...

63
vendor/github.com/lytics/multibayes/README.md generated vendored Normal file

@ -0,0 +1,63 @@
Multibayes
==========
[![Build Status](https://travis-ci.org/lytics/multibayes.svg?branch=master)](https://travis-ci.org/lytics/multibayes) [![GoDoc](https://godoc.org/github.com/lytics/multibayes?status.svg)](https://godoc.org/github.com/lytics/multibayes)
Multiclass naive Bayesian document classification.
Often in document classification, a document may have more than one relevant classification -- a question on [stackoverflow](http://stackoverflow.com) might have tags "go", "map", and "interface".
While multinomial Bayesian classification offers a one-of-many classification, multibayes offers tools for many-of-many classification. The multibayes library strives to offer efficient storage and calculation of multiple Bayesian posterior classification probabilities.
## Usage
A new classifier is created with the `NewClassifier` function, and can be trained by adding documents and classes by calling the `Add` method:
```go
classifier.Add("A new document", []string{"class1", "class2"})
```
Posterior probabilities for a new document are calculated by calling the `Posterior` method:
```go
classifier.Posterior("Another new document")
```
A posterior class probability is returned for each class observed in the training set. Because each class is scored independently, the probabilities need not sum to one, and the user can assign classes according to his or her own heuristics -- for example, by accepting every class whose posterior probability exceeds 0.8.
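A minimal thresholding sketch (the 0.8 cutoff and the `assigned` variable are illustrative, not part of the API):
```go
assigned := []string{}
for class, p := range classifier.Posterior("Another new document") {
	if p > 0.8 {
		assigned = append(assigned, class)
	}
}
```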
## Example
```go
documents := []struct {
Text string
Classes []string
}{
{
Text: "My dog has fleas.",
Classes: []string{"vet"},
},
{
Text: "My cat has ebola.",
Classes: []string{"vet", "cdc"},
},
{
Text: "Aaron has ebola.",
Classes: []string{"cdc"},
},
}
classifier := NewClassifier()
classifier.MinClassSize = 0
// train the classifier
for _, document := range documents {
classifier.Add(document.Text, document.Classes)
}
// predict new classes
probs := classifier.Posterior("Aaron's dog has fleas.")
fmt.Printf("Posterior Probabilities: %+v\n", probs)
// Posterior Probabilities: map[vet:0.8571 cdc:0.2727]
```

9
vendor/github.com/lytics/multibayes/doc.go generated vendored Normal file

@ -0,0 +1,9 @@
// Multiclass naive Bayesian document classification.
//
// While multinomial Bayesian classification offers
// one-of-many classification, multibayes offers tools
// for many-of-many classification. The multibayes
// library strives to offer efficient storage and
// calculation of multiple Bayesian posterior classification
// probabilities.
package multibayes

66
vendor/github.com/lytics/multibayes/encoding.go generated vendored Normal file

@ -0,0 +1,66 @@
package multibayes
import (
"encoding/json"
"io/ioutil"
)
type jsonableClassifier struct {
Matrix *sparseMatrix `json:"matrix"`
}
func (c *Classifier) MarshalJSON() ([]byte, error) {
return json.Marshal(&jsonableClassifier{c.Matrix})
}
func (c *Classifier) UnmarshalJSON(buf []byte) error {
j := jsonableClassifier{}
err := json.Unmarshal(buf, &j)
if err != nil {
return err
}
*c = *NewClassifier()
c.Matrix = j.Matrix
return nil
}
// Initialize a new classifier from a JSON byte slice.
func NewClassifierFromJSON(buf []byte) (*Classifier, error) {
classifier := &Classifier{}
err := classifier.UnmarshalJSON(buf)
if err != nil {
return nil, err
}
return classifier, nil
}
func LoadClassifierFromFile(filename string) (*Classifier, error) {
buf, err := ioutil.ReadFile(filename)
if err != nil {
return nil, err
}
return NewClassifierFromJSON(buf)
}
func (s *sparseColumn) MarshalJSON() ([]byte, error) {
return json.Marshal(s.Data)
}
func (s *sparseColumn) UnmarshalJSON(buf []byte) error {
var data []int
err := json.Unmarshal(buf, &data)
if err != nil {
return err
}
s.Data = data
return nil
}
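// Sketch of a persistence round trip using the functions above (error
// handling elided; "classifier" is assumed to be a trained *Classifier):
//
//	buf, _ := classifier.MarshalJSON()          // serialize the token/class matrix
//	restored, _ := NewClassifierFromJSON(buf)   // rebuild an equivalent classifier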

73
vendor/github.com/lytics/multibayes/sparse.go generated vendored Normal file

@ -0,0 +1,73 @@
package multibayes
type sparseMatrix struct {
Tokens map[string]*sparseColumn `json:"tokens"` // map[token]row indices of documents containing the token
Classes map[string]*sparseColumn `json:"classes"` // map[classname]row indices of documents assigned the class
N int `json:"n"` // number of rows currently in the matrix
}
type sparseColumn struct {
Data []int `json:"data"`
}
func newSparseColumn() *sparseColumn {
return &sparseColumn{
Data: make([]int, 0, 1000),
}
}
func (s *sparseColumn) Add(index int) {
s.Data = append(s.Data, index)
}
// return the number of rows that contain the column
func (s *sparseColumn) Count() int {
return len(s.Data)
}
// sparse to dense
func (s *sparseColumn) Expand(n int) []float64 {
expanded := make([]float64, n)
for _, index := range s.Data {
expanded[index] = 1.0
}
return expanded
}
func newSparseMatrix() *sparseMatrix {
return &sparseMatrix{
Tokens: make(map[string]*sparseColumn),
Classes: make(map[string]*sparseColumn),
N: 0,
}
}
func (s *sparseMatrix) Add(ngrams []ngram, classes []string) {
if len(ngrams) == 0 || len(classes) == 0 {
return
}
for _, class := range classes {
if _, ok := s.Classes[class]; !ok {
s.Classes[class] = newSparseColumn()
}
s.Classes[class].Add(s.N)
}
// add ngrams uniquely
added := make(map[string]int)
for _, ngram := range ngrams {
gramString := ngram.String()
if _, ok := s.Tokens[gramString]; !ok {
s.Tokens[gramString] = newSparseColumn()
}
// only add the document index once for the ngram
if _, ok := added[gramString]; !ok {
added[gramString] = 1
s.Tokens[gramString].Add(s.N)
}
}
// increment the row counter
s.N++
}
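// Illustration of a single Add call on an empty matrix, assuming the
// tokenizer already reduced the document to the lone unigram "dog"
// (the names "m" and "dogNgram" are hypothetical):
//
//	m.Add([]ngram{dogNgram}, []string{"vet"})
//	// now: m.Tokens["dog"].Data == []int{0}, m.Classes["vet"].Data == []int{0}, m.N == 1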

181
vendor/github.com/lytics/multibayes/stopbytes.go generated vendored Normal file

@ -0,0 +1,181 @@
package multibayes
var (
stopbytes = [][]byte{
[]byte(`i`),
[]byte(`me`),
[]byte(`my`),
[]byte(`myself`),
[]byte(`we`),
[]byte(`our`),
[]byte(`ours`),
[]byte(`ourselves`),
[]byte(`you`),
[]byte(`your`),
[]byte(`yours`),
[]byte(`yourself`),
[]byte(`yourselves`),
[]byte(`he`),
[]byte(`him`),
[]byte(`his`),
[]byte(`himself`),
[]byte(`she`),
[]byte(`her`),
[]byte(`hers`),
[]byte(`herself`),
[]byte(`it`),
[]byte(`its`),
[]byte(`itself`),
[]byte(`they`),
[]byte(`them`),
[]byte(`their`),
[]byte(`theirs`),
[]byte(`themselves`),
[]byte(`what`),
[]byte(`which`),
[]byte(`who`),
[]byte(`whom`),
[]byte(`this`),
[]byte(`that`),
[]byte(`these`),
[]byte(`those`),
[]byte(`am`),
[]byte(`is`),
[]byte(`are`),
[]byte(`was`),
[]byte(`were`),
[]byte(`be`),
[]byte(`been`),
[]byte(`being`),
[]byte(`have`),
[]byte(`has`),
[]byte(`had`),
[]byte(`having`),
[]byte(`do`),
[]byte(`does`),
[]byte(`did`),
[]byte(`doing`),
[]byte(`would`),
[]byte(`should`),
[]byte(`could`),
[]byte(`ought`),
[]byte(`i'm`),
[]byte(`you're`),
[]byte(`he's`),
[]byte(`she's`),
[]byte(`it's`),
[]byte(`we're`),
[]byte(`they're`),
[]byte(`i've`),
[]byte(`you've`),
[]byte(`we've`),
[]byte(`they've`),
[]byte(`i'd`),
[]byte(`you'd`),
[]byte(`he'd`),
[]byte(`she'd`),
[]byte(`we'd`),
[]byte(`they'd`),
[]byte(`i'll`),
[]byte(`you'll`),
[]byte(`he'll`),
[]byte(`she'll`),
[]byte(`we'll`),
[]byte(`they'll`),
[]byte(`isn't`),
[]byte(`aren't`),
[]byte(`wasn't`),
[]byte(`weren't`),
[]byte(`hasn't`),
[]byte(`haven't`),
[]byte(`hadn't`),
[]byte(`doesn't`),
[]byte(`don't`),
[]byte(`didn't`),
[]byte(`won't`),
[]byte(`wouldn't`),
[]byte(`shan't`),
[]byte(`shouldn't`),
[]byte(`can't`),
[]byte(`cannot`),
[]byte(`couldn't`),
[]byte(`mustn't`),
[]byte(`let's`),
[]byte(`that's`),
[]byte(`who's`),
[]byte(`what's`),
[]byte(`here's`),
[]byte(`there's`),
[]byte(`when's`),
[]byte(`where's`),
[]byte(`why's`),
[]byte(`how's`),
[]byte(`a`),
[]byte(`an`),
[]byte(`the`),
[]byte(`and`),
[]byte(`but`),
[]byte(`if`),
[]byte(`or`),
[]byte(`because`),
[]byte(`as`),
[]byte(`until`),
[]byte(`while`),
[]byte(`of`),
[]byte(`at`),
[]byte(`by`),
[]byte(`for`),
[]byte(`with`),
[]byte(`about`),
[]byte(`against`),
[]byte(`between`),
[]byte(`into`),
[]byte(`through`),
[]byte(`during`),
[]byte(`before`),
[]byte(`after`),
[]byte(`above`),
[]byte(`below`),
[]byte(`to`),
[]byte(`from`),
[]byte(`up`),
[]byte(`down`),
[]byte(`in`),
[]byte(`out`),
[]byte(`on`),
[]byte(`off`),
[]byte(`over`),
[]byte(`under`),
[]byte(`again`),
[]byte(`further`),
[]byte(`then`),
[]byte(`once`),
[]byte(`here`),
[]byte(`there`),
[]byte(`when`),
[]byte(`where`),
[]byte(`why`),
[]byte(`how`),
[]byte(`all`),
[]byte(`any`),
[]byte(`both`),
[]byte(`each`),
[]byte(`few`),
[]byte(`more`),
[]byte(`most`),
[]byte(`other`),
[]byte(`some`),
[]byte(`such`),
[]byte(`no`),
[]byte(`nor`),
[]byte(`not`),
[]byte(`only`),
[]byte(`own`),
[]byte(`same`),
[]byte(`so`),
[]byte(`than`),
[]byte(`too`),
[]byte(`very`),
[]byte(`-`),
}
)

33
vendor/github.com/lytics/multibayes/testutil.go generated vendored Normal file

@ -0,0 +1,33 @@
package multibayes
type document struct {
Text string
Classes []string
}
func getTestData() []document {
documents := []document{
{
Text: "My dog has fleas.",
Classes: []string{"vet"},
},
{
Text: "My cat has ebola.",
Classes: []string{"vet", "cdc"},
},
{
Text: "Aaron has ebola.",
Classes: []string{"cdc"},
},
}
return documents
}
func (c *Classifier) trainWithTestData() {
testdata := getTestData()
for _, document := range testdata {
c.Add(document.Text, document.Classes)
}
}

166
vendor/github.com/lytics/multibayes/tokenize.go generated vendored Normal file

@ -0,0 +1,166 @@
package multibayes
import (
"bytes"
"encoding/base64"
"regexp"
"strings"
"github.com/blevesearch/bleve/analysis"
regexp_tokenizer "github.com/blevesearch/bleve/analysis/tokenizer/regexp"
"github.com/blevesearch/go-porterstemmer"
)
const (
tokenSeparator = "_"
)
type ngram struct {
Tokens [][]byte
}
// joins the tokens with the separator for use as a map key; note that the
// base64 encoding assumed by decodeNGram is currently commented out below
func (ng *ngram) String() string {
encoded := make([]string, len(ng.Tokens))
for i, token := range ng.Tokens {
encoded[i] = string(token)
//encoded[i] = base64.StdEncoding.EncodeToString(token) // safer?
}
return strings.Join(encoded, tokenSeparator)
}
func decodeNGram(s string) (*ngram, error) {
encodedTokens := strings.Split(s, tokenSeparator)
tokens := make([][]byte, len(encodedTokens))
var err error
for i, encodedToken := range encodedTokens {
tokens[i], err = base64.StdEncoding.DecodeString(encodedToken)
if err != nil {
return nil, err
}
}
return &ngram{tokens}, nil
}
type tokenizerConf struct {
regexp *regexp.Regexp
NGramSize int64
}
type tokenizer struct {
regexp_tokenizer.RegexpTokenizer
Conf *tokenizerConf
}
func validateConf(tc *tokenizerConf) {
tc.regexp = regexp.MustCompile(`[0-9A-Za-z_'\-]+|\%|\$`)
// TODO: We force NGramSize = 1 so as to create disjoint ngrams,
// which is necessary for the naive assumption of conditional
// independence among tokens. It would be great to allow ngrams
// to be greater than 1 and select only disjoint ngrams from the
// tokenizer.
tc.NGramSize = 1
}
func newTokenizer(tc *tokenizerConf) (*tokenizer, error) {
validateConf(tc)
return &tokenizer{*regexp_tokenizer.NewRegexpTokenizer(tc.regexp), tc}, nil
}
// Tokenize and Gramify
func (t *tokenizer) Parse(doc string) []ngram {
// maybe use token types for datetimes or something instead of
// the actual byte slice
alltokens := t.Tokenize([]byte(strings.ToLower(doc)))
filtered := make(map[int][]byte)
for i, token := range alltokens {
exclude := false
for _, stop := range stopbytes {
if bytes.Equal(token.Term, stop) {
exclude = true
break
}
}
if exclude {
continue
}
tokenString := porterstemmer.StemString(string(token.Term))
//tokenBytes := porterstemmer.Stem(token.Term) // takes runes, not bytes
if token.Type == analysis.Numeric {
tokenString = "NUMBER"
} else if token.Type == analysis.DateTime {
tokenString = "DATE"
}
filtered[i] = []byte(tokenString)
}
// only consider sequential terms as candidates for ngrams
// terms separated by stopwords are ineligible
allNGrams := make([]ngram, 0, 100)
currentTokens := make([][]byte, 0, 100)
lastObserved := -1
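// NOTE: ranging over the filtered map visits keys in arbitrary order, so
// the sequential grouping below is only dependable because validateConf
// forces NGramSize to 1: every kept token then becomes its own unigram
// regardless of visit order.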
for i, token := range filtered {
if (i - 1) != lastObserved {
ngrams := t.tokensToNGrams(currentTokens)
allNGrams = append(allNGrams, ngrams...)
currentTokens = make([][]byte, 0, 100)
}
currentTokens = append(currentTokens, token)
lastObserved = i
}
// bring in the last one
if len(currentTokens) > 0 {
ngrams := t.tokensToNGrams(currentTokens)
allNGrams = append(allNGrams, ngrams...)
}
return allNGrams
}
func (t *tokenizer) tokensToNGrams(tokens [][]byte) []ngram {
nTokens := int64(len(tokens))
nNGrams := int64(0)
for i := int64(1); i <= t.Conf.NGramSize; i++ {
chosen := choose(nTokens, i)
nNGrams += chosen
}
ngrams := make([]ngram, 0, nNGrams)
for ngramSize := int64(1); ngramSize <= t.Conf.NGramSize; ngramSize++ {
nNGramsOfSize := choose(nTokens, ngramSize)
for i := int64(0); i < nNGramsOfSize; i++ {
ngrams = append(ngrams, ngram{tokens[i:(i + ngramSize)]})
}
}
return ngrams
}
// not a binomial coefficient -- combinations must be sequential
func choose(n, k int64) int64 {
return max(n-k+int64(1), 0)
}
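// e.g. choose(4, 2) == 3: the sequential bigrams of [a b c d] are
// [a b], [b c] and [c d], i.e. max(n-k+1, 0) of them, not C(n, k).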
func max(x, y int64) int64 {
if x > y {
return x
}
return y
}

16
vendor/modules.txt vendored Normal file

@ -0,0 +1,16 @@
# github.com/blevesearch/bleve v0.8.1
github.com/blevesearch/bleve/analysis
github.com/blevesearch/bleve/analysis/tokenizer/regexp
github.com/blevesearch/bleve/document
github.com/blevesearch/bleve/geo
github.com/blevesearch/bleve/index
github.com/blevesearch/bleve/index/store
github.com/blevesearch/bleve/numeric
github.com/blevesearch/bleve/registry
github.com/blevesearch/bleve/search
github.com/blevesearch/bleve/search/highlight
github.com/blevesearch/bleve/size
# github.com/blevesearch/go-porterstemmer v1.0.2
github.com/blevesearch/go-porterstemmer
# github.com/lytics/multibayes v0.0.0-20161108162840-3457a5582021
github.com/lytics/multibayes

4
whitelist.txt Normal file

@ -0,0 +1,4 @@
guns
cors
/favicon.ico
favicon

26
zgc.go Normal file

@ -0,0 +1,26 @@
package main
import (
"log"
"runtime"
"time"
)
func init() {
log.Println("Garbage Collector Thread Starting")
go memoryCleanerThread()
}
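// memoryCleanerThread forces a garbage-collection cycle every ten minutes,
// presumably to keep the steady-state footprint small on the modest hosts
// this WAF is meant to run on.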
func memoryCleanerThread() {
for {
time.Sleep(10 * time.Minute)
log.Println("Time to clean memory...")
runtime.GC()
log.Println("Garbage Collection done.")
}
}