module line(p1,p2,w) { hull() { translate(p1) circle(r=w); translate(p2) circle(r=w); } } module polyline(points, index, w) { if(index < len(points)) { line(points[index - 1], points[index],w); polyline(points, index + 1, w); } }

For creating a bezier curve I had to define a few functions. I’m wondering why OpenSCAD has such a weak function definition, where you basically only can define one statement and not a whole block like in almost every other language. However, I wrote a recursive implementation of sampling one point at the bezier curve and call this function several times to build an array of points.

function choose(n, k)= k == 0? 1 : (n * choose(n - 1, k - 1)) / k; function _point_on_bezier_rec(points,t,i,c)= len(points) == i ? c : _point_on_bezier_rec(points,t,i+1,c+choose(len(points)-1,i) * pow(t,i) * pow(1-t,len(points)-i-1) * points[i]); function _point_on_bezier(points,t)= _point_on_bezier_rec(points,t,0,[0,0]); //a bezier curve with any number of control points //parameters: //points - the control points of the bezier curve (number of points is variable) //resolution - the sampling resolution of the bezier curve (number of returned points) //returns: //resolution number of samples on the bezier curve function bezier(points,resolution)=[ for (t =[0:1.0/resolution:1+1.0/(resolution/2)]) _point_on_bezier(points,t) ];

The first parameter is just an array of 2D-points, defining the control points. Note, that you can define the resolution of the sampling via the second parameter.

In the following example, I created a simple vase with 6 control points. The whole resolution can be defined as always with the $fn global variable. I recommend using a small value while debugging because it is significantly faster. I just set the number of control points of the bezier curve to the same value. With the variables radius, height and strength, I defined the size of the vase. The control points p0 to p5 are relative to these variables to allow a variable size. I used a cylinder for the base plate because it is difficult to define a really flat bezier curve, which can be printed nicely. Or one could use boolean combination to cut the base flat. However, by calling the bezier function, create a polyline with the result points and extrude this with a rotate_extrude, one gets a pretty nice result.

resolution = 100; $fn = resolution; radius = 20; height = 90; strength = 1; p0 = [radius,0]; p1 = [radius*3,height*0.2]; p2 = [-radius,height*0.4]; p3 = [radius*3,height*0.7]; p4 = [0,height*0.8]; p5 = [radius*0.8,height*1]; translate([0,0,-strength]) cylinder(r=radius+strength,h=strength*2); rotate_extrude() polyline(bezier([p0,p1,p2,p3,p4,p5],resolution),1,strength);

This could be also combined into one function:

//create a 3D rotational model with a bezier curve of given points, resolution and thickness module bezier_model(points,resolution,thickness) { translate([0,0,thickness/2]) rotate_extrude() polyline(bezier(points,resolution),1,thickness/2); }

Result image:

Thanks for reading and also thanks for the useful comments, I updated the source code on this post!

]]>After a bit classifier tuning a technique called Smith Waterman local alignment, which is normally a bio-informatics algorithm for comparing DNA sequences, came into my mind. I couldn’t find a python implementation, so I wrote it myself a really simple version of the metric. For understanding please consider https://en.wikipedia.org/wiki/Smith%E2%80%93Waterman_algorithm which explains it quite good.

import numpy as np def smith_waterman_distance(seq1, seq2, match=3, mismatch=-1, insertion=-1, deletion=-1, normalize=1): '''simple and general smith waterman distance for NLP feature extraction''' # switch sequences, so that seq1 is the longer sequence to search for seq2 if len(seq2) > len(seq1): seq1, seq2 = seq2, seq1 # create the distance matrix mat = np.zeros((len(seq2) + 1, len(seq1) + 1)) # iterate over the matrix column wise for i in range(1, mat.shape[0]): # iterate over the matrix row wise for j in range(1, mat.shape[1]): # set the current matrix element with the maximum of 4 different cases mat[i, j] = max( # negative values are not allowed 0, # if previous character matches increase the score by match, else decrease it by mismatch mat[i - 1, j - 1] + (match if seq1[j - 1] == seq2[i - 1] else mismatch), # one character is missing in seq2, so decrease the score by deletion mat[i - 1, j] + deletion, # one additional character is in seq2, so decrease the scare by insertion mat[i, j - 1] + insertion ) # the maximum of mat is now the score, which is returned raw or normalized (with a range of 0-1) return np.max(mat) / (len(seq2) * match) if normalize else np.max(mat)

In the next code block you can see two outputs of the function which describe the metric in the context of feature extraction for NLP classification.

seq1 = 'You need a driver\'s-license for doin the job' seq2 = 'drivers license' print(smith_waterman_distance(seq1, seq2, 3, -2, -2, -2, 1)) #outputs 0.844444444444 seq1 = 'Some random nonesense Sentence of a job description' seq2 = 'drivers license' print(smith_waterman_distance(seq1, seq2, 3, -2, -2, -2, 1)) #outputs 0.266666666667

The advantage of the SWD is, that you can punish faults in the string differently. E.g. If you want accept strings, that can have other characters in between, you can just set the insertion parameter to 0 or a small negative value. The same is valid for the deletion parameter which punishes incompleteness.

After calculating the SWD for some handcrafted terms for all samples one have a neat and dense feature vector. In the competition, I stacked it on top of the BOW feature vector, which gives me a performance boost of approximately 8% in f1-score.

]]>