Hugging Face's transformers library is a great resource for natural language processing tasks, and it includes an implementation of OpenAI's CLIP model, complete with the pretrained checkpoint clip-vit-large-patch14. The CLIP model is a powerful image and text embedding model that can be used for a wide range of tasks, such as image captioning and similarity search.
The CLIPModel documentation provides examples of how to use the model to calculate the similarity of images and captions, but it is less clear on how to obtain the raw embeddings of the input data. While the documentation provides some guidance on how to use the model's embedding layer, it is not always clear how to extract the embeddings for further analysis or use in other tasks.
Furthermore, the documentation does not cover how to calculate similarity between text and image embeddings yourself. This can be useful for tasks such as image-text matching or precalculating image embeddings for later (or repeated) use.
In this post, we will show how to obtain the raw embeddings from the CLIPModel and how to calculate similarity between them using PyTorch. With this information, you will be able to use the CLIPModel in a more flexible way and adapt it to your specific needs.
Benchmark example: Logit similarity score between text and image embeddings
Here's the example from the CLIPModel documentation that we'd ideally like to split into separate text and image embeddings, so we can calculate the similarity score between them ourselves:
from PIL import Image
import requests
from transformers import AutoProcessor, CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = AutoProcessor.from_pretrained("openai/clip-vit-large-patch14")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(
    text=["a photo of a cat", "a photo of a dog"],
    images=image, return_tensors="pt", padding=True
)

outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # this is the image-text similarity score
probs = logits_per_image.softmax(dim=1)  # we can take the softmax to get the label probabilities
If you run the code and print(logits_per_image), you should see two raw image-text similarity scores, with the cat caption scoring clearly higher than the dog one. To get the text and image embeddings separately, we can call the model's get_text_features and get_image_features methods instead of the combined forward pass:
from PIL import Image
import requests
from transformers import AutoProcessor, AutoTokenizer, CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")

# Get the text features
tokenizer = AutoTokenizer.from_pretrained("openai/clip-vit-large-patch14")
inputs = tokenizer(["a photo of a cat", "a photo of a dog"], padding=True, return_tensors="pt")
text_features = model.get_text_features(**inputs)
print(text_features.shape)  # output shape of text features

# Get the image features
processor = AutoProcessor.from_pretrained("openai/clip-vit-large-patch14")
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")
image_features = model.get_image_features(**inputs)
print(image_features.shape)  # output shape of image features
Looks pretty good! Two 768-item tensors for the two labels, and one of the same size for the image! Now let's see if we can calculate the similarity between the two ourselves...
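Here's a minimal sketch of that calculation, assuming the text_features and image_features tensors from the snippet above are still in scope; the normalization and logit_scale steps mirror what CLIPModel's forward pass does internally:

import torch

# L2-normalize the embeddings so their dot product equals cosine similarity
text_embeds = text_features / text_features.norm(p=2, dim=-1, keepdim=True)
image_embeds = image_features / image_features.norm(p=2, dim=-1, keepdim=True)

# Scale by the model's learned temperature, as CLIPModel does internally
logit_scale = model.logit_scale.exp()
logits_per_image = torch.matmul(image_embeds, text_embeds.t()) * logit_scale

print(logits_per_image)                 # should match the benchmark scores
print(logits_per_image.softmax(dim=1))  # label probabilities

Since the normalized embeddings can be stored, you can precalculate image embeddings once and compare new captions against them cheaply later on.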
A friend recently started a project to remotely boot his router (which tends to hang randomly) with a Raspberry Pi. Unfortunately, the rpi-rf tool was not quite recognizing the signals. I pitched in to help, and since he did not have access to an oscilloscope but did have an Arduino Uno, I thought maybe I could figure the signals out with that.
Fast forward a few weeks, and I have been experimenting with four methods of analyzing my own Nexa 433 MHz remote control.
Having learned a lot, I thought I'd document the process for others to learn from, or maybe even hijack to analyze their own smart remotes. In this first part, I will cover the process with the Arduino Uno, and the following posts will go through the other three methods.
Starting Simple: Arduino and 433 MHz receiver
Having purchased a rather basic Hope Microelectronics (RFM210LCF-433D) 3.3V receiver for the 433 MHz band, it was easy to wire up to the Arduino:
Connect GND and 3.3V outputs from Arduino to GND and VCC
Connect Arduino PIN 8 to DATA on the receiver
Connect a fourth "enable" pin to GND as well to turn the receiver on
I wrote a simple Arduino script that measures the PIN 8 voltage every 50 microseconds (20 kHz), recording the lengths of HIGH/LOW pulses in an unsigned short array. Due to the Uno's 2 kB memory limit, there is only space for about 850 edges, and the maximum length of a single edge is about 65 000 samples, i.e. a bit more than three seconds.
Once the buffer is filled with edge data or the maximum "silence" is reached, the code prints out the data over serial, resets the buffer and starts again, blinking an LED for 5 seconds so you know when to start pressing those remote control buttons. Or perhaps "press a button", as at least my Nexa pretty much fills the buffer with a single key press: it sends the same data of about 130 edges a minimum of 5 times, taking up almost 700 edges! The sketch below illustrates the idea.
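Here's a minimal sketch of the sampling approach. This is a simplified reconstruction rather than the full script, and the LED pin, baud rate and serial output format are placeholder choices:

// Record 433 MHz receiver pulse lengths by polling the DATA pin at ~20 kHz
const int RX_PIN = 8;             // DATA from the receiver
const unsigned int MAX_EDGES = 850;
unsigned short edges[MAX_EDGES];  // pulse lengths, in 50 us samples

void setup() {
  pinMode(RX_PIN, INPUT);
  pinMode(LED_BUILTIN, OUTPUT);
  Serial.begin(115200);
}

void loop() {
  // Blink the LED for ~5 seconds so you know when to press the buttons
  for (int i = 0; i < 10; i++) {
    digitalWrite(LED_BUILTIN, i % 2);
    delay(500);
  }

  // Sample at ~20 kHz, counting how many samples each HIGH/LOW pulse lasts
  unsigned int count = 0;
  unsigned short len = 0;
  int last = digitalRead(RX_PIN);
  while (count < MAX_EDGES) {
    delayMicroseconds(50);
    int cur = digitalRead(RX_PIN);
    if (cur != last) {
      edges[count++] = len;    // edge detected: store the pulse length
      len = 0;
      last = cur;
    } else if (++len == 65535) {
      break;                   // maximum "silence" reached, stop recording
    }
  }

  // Dump the recorded pulse lengths over serial and start again
  for (unsigned int i = 0; i < count; i++) {
    Serial.println(edges[i]);
  }
}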
It also turned out that the "silence" limit is rarely reached, as the Hope receiver is pretty good at catching stray signals from other places when there is nothing transmitting nearby (it likely has automatic gain control that "turns up the volume" when it doesn't hear anything).
In recent years, the processing power offered by graphics processing units (GPUs) has made brute-force attacks on passwords dramatically faster, which has driven the adoption of methods like PBKDF2 (Password-Based Key Derivation Function 2) for secure password storage. PBKDF2 is a key derivation function that is designed to be computationally expensive in order to slow down dictionary attacks and other brute-force attacks on passwords.
As processing power continues to advance, it has become necessary to increase the number of iterations used in PBKDF2 in order to maintain a high level of security. With more iterations, it becomes even more difficult for an attacker to crack a password by brute force.
Recently, I had an idea. What if it were possible to run PBKDF2 arbitrarily long and print out points that match certain criteria? This could potentially provide an even higher level of security for password storage, as the number of iterations could be increased to levels that would make brute force attacks infeasible. It's an idea worth exploring and I'm excited to see what the future holds for PBKDF2 and other password security measures.
Bitcoin difficulty
One of the key features of the Bitcoin network is its use of difficulty to scale the hardness of block signing based on the number of computers that are currently mining. In other words, as more computers join the network and begin trying to solve the cryptographic puzzles required to add new blocks to the blockchain, the difficulty of these puzzles increases in order to maintain a consistent rate of block creation. This ensures that the network remains secure and resistant to attacks, even as the number of miners grows over time.
The basic idea behind this technique is fairly simple: by requiring the block hash to end in a certain number of zero bits, the complexity of the puzzle increases in powers of two. Every hash is essentially random, and modifying the hashed data by the tiniest bit results in a completely new hash. Every other hash ends in a zero bit, and every other in a one; with two zero bits, it's every 4th hash. To zero a full byte (8 bits) you already need about 256 (2^8) tries, and with three bytes it's already close to 17 million (2^24).
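To make that concrete, here's a small illustration (the input string and the one-zero-byte criterion are arbitrary choices): counting the tries needed to find a SHA-256 hash that ends in a zero byte, which should take about 256 attempts on average.

import hashlib, itertools

# Try nonces until the SHA-256 digest ends in one zero byte;
# on average this takes about 256 (2^8) attempts.
for nonce in itertools.count():
    digest = hashlib.sha256(b'some block data %d' % nonce).digest()
    if digest[-1] == 0:
        print(f'nonce {nonce} gives hash {digest.hex()}')
        break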
Printing out PBKDF2 steps at deterministic points
Combining the two ideas gives one way to deterministically create encryption keys of increasing difficulty: run PBKDF2 for as long as you like, and whenever an intermediate result matches a difficulty criterion (say, it ends in a given number of zero bits), print out the iteration count and the key derived so far.
Having just spent 4 hours trying to get a Python pseudocode version of PBKDF2 to match the hashlib.pbkdf2_hmac() output, I thought I'd post Yet Another Example of how to do it. I thought I could just use hashlib.sha256 to calculate the steps, but it turns out HMAC is not just a concatenation of password, salt and counter.
So, without further ado, here's a 256-bit key generation with password and salt:
import hashlib, hmac

def pbkdf2(pwd, salt, iter):
    h = hmac.new(pwd, digestmod=hashlib.sha256)  # create HMAC using SHA256
    m = h.copy()  # calculate PRF(Password, Salt+INT_32_BE(1))
    m.update(salt)
    m.update(b'\x00\x00\x00\x01')
    U = m.digest()
    T = bytes(U)  # copy
    for _ in range(1, iter):
        m = h.copy()  # new instance of hmac(key)
        m.update(U)  # PRF(Password, U-1)
        U = m.digest()
        T = bytes(a ^ b for a, b in zip(U, T))
    return T

pwd = b'password'
salt = b'salt'

# both should print 120fb6cffcf8b32c43e7225256c4f837a86548c92ccc35480805987cb70be17b
print(pbkdf2(pwd, salt, 1).hex())
print(hashlib.pbkdf2_hmac('sha256', pwd, salt, 1).hex())

# both should print c5e478d59288c841aa530db6845c4c8d962893a001ce4e11a4963873aa98134a
print(pbkdf2(pwd, salt, 4096).hex())
print(hashlib.pbkdf2_hmac('sha256', pwd, salt, 4096).hex())
Getting from pseudocode to an actual working example was surprisingly hard, especially since most implementations on the web are in lower-level languages, and Python results are mostly just using a library.
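With a working loop in hand, the deterministic checkpoint idea from earlier becomes a small modification. Here's a sketch; the function name, the criterion (U ending in a given number of zero bytes) and the iteration budget are all just example choices:

import hashlib, hmac

def pbkdf2_checkpoints(pwd, salt, max_iter, zero_bytes=2):
    # Same loop as above, but print the derived key whenever the
    # intermediate U value ends in the required number of zero bytes
    h = hmac.new(pwd, digestmod=hashlib.sha256)
    m = h.copy()
    m.update(salt)
    m.update(b'\x00\x00\x00\x01')
    U = m.digest()
    T = bytes(U)
    for i in range(1, max_iter):
        m = h.copy()
        m.update(U)
        U = m.digest()
        T = bytes(a ^ b for a, b in zip(U, T))
        if U.endswith(b'\x00' * zero_bytes):
            print(f'iteration {i + 1}: {T.hex()}')
    return T

pbkdf2_checkpoints(b'password', b'salt', 1000000)

Anyone who knows the password, salt and a printed iteration count can reproduce that exact key, but reaching each further checkpoint costs an attacker all the intervening iterations.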
Simplifying the pseudocode further
If you want to avoid the new...update...digest dance and skip the hmac library altogether, the code becomes even simpler, as HMAC is quite straightforward to implement in Python. Here's a gethmac function hard-coded to SHA256, and an even shorter pbkdf2:
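A sketch of what that can look like, hard-coding the 64-byte SHA256 block size, and verified against the same test vectors as above:

import hashlib

def gethmac(key, data):
    # HMAC-SHA256 from scratch: pad the key to the 64-byte block size,
    # then hash with the inner (0x36) and outer (0x5c) padded keys
    if len(key) > 64:
        key = hashlib.sha256(key).digest()
    key = key.ljust(64, b'\x00')
    inner = hashlib.sha256(bytes(b ^ 0x36 for b in key) + data).digest()
    return hashlib.sha256(bytes(b ^ 0x5c for b in key) + inner).digest()

def pbkdf2(pwd, salt, iter):
    U = gethmac(pwd, salt + b'\x00\x00\x00\x01')  # PRF(Password, Salt+INT_32_BE(1))
    T = bytes(U)
    for _ in range(1, iter):
        U = gethmac(pwd, U)                       # PRF(Password, U-1)
        T = bytes(a ^ b for a, b in zip(U, T))
    return T

# should print c5e478d59288c841aa530db6845c4c8d962893a001ce4e11a4963873aa98134a
print(pbkdf2(b'password', b'salt', 4096).hex())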
WebSocket is a protocol that allows for real-time, bidirectional communication between a client and a server. It is often used in web applications to enable features such as chat, live updates, and multiplayer games.
In this tutorial, I will show you how to create a minimalistic WebSocket server using Go and the nhooyr websocket library, and a JavaScript client to test it out. You will learn how to handle WebSocket connections, send and receive messages, and close the connection when necessary.
By the end of this tutorial, you will have a working WebSocket server and client that you can use as a starting point for your own WebSocket-based applications.
Setting up the project
You should first set up a simple "Hello world" Go project, something along the lines of this tutorial. After you have a project going, let's install the nhooyr.io/websocket WebSocket library (Go's own websocket package seems deprecated, and Gorilla development ceased some years ago):
$ go get nhooyr.io/websocket
The whole system will consist of a single main.go containing a simple net/http server that will:
Serve a simple WebSocket echo server at /echo
Serve static files from the static subfolder, so all other addresses, including /, will serve content from there. We'll put index.html in that subfolder.
Basic webserver stuff:
func main() {
    address := "localhost:1234"
    http.HandleFunc("/echo", echoHandler)
    log.Printf("Starting server, go to http://%s/ to try it out!", address)
    http.Handle("/", http.FileServer(http.Dir("static")))
    err := http.ListenAndServe(address, nil)
    log.Fatal(err)
}
Now the echoHandler needs to do a few essential things:
Upgrade the connection into a WebSocket one with websocket.Accept
Log errors and defer connection close in case of errors
Loop forever (or actually 10 minutes in this sample), reading messages from the socket and writing them back.
Note that I've used InsecureSkipVerify to accept connections from any origin; you might want to modify the code for a tighter policy:
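Here's a sketch of how the handler can look with those pieces in place (error messages and the exact close status are just my choices; it needs context, log, net/http, time and nhooyr.io/websocket imported):

func echoHandler(w http.ResponseWriter, r *http.Request) {
    // Upgrade the HTTP connection to a WebSocket one
    c, err := websocket.Accept(w, r, &websocket.AcceptOptions{
        InsecureSkipVerify: true, // accept any origin; tighten for production
    })
    if err != nil {
        log.Println("accept error:", err)
        return
    }
    defer c.Close(websocket.StatusInternalError, "unexpected close")

    // Give up after 10 minutes
    ctx, cancel := context.WithTimeout(r.Context(), 10*time.Minute)
    defer cancel()

    for {
        // Read a message and write it straight back
        typ, data, err := c.Read(ctx)
        if err != nil {
            log.Println("read error:", err)
            return
        }
        if err := c.Write(ctx, typ, data); err != nil {
            log.Println("write error:", err)
            return
        }
    }
}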
I have to confess I have a thing for small prototyping boards, especially ones with Bluetooth or WLAN connectivity. So when I was offered the opportunity to get a couple of Seeed Studio's tiny Bluetooth devboards with Nordic's nRF52840 in them to try out, I jumped at the chance. Full disclosure: I did not buy these myself, but neither did I get any compensation, so what follows will be rather unbiased first impressions! I will cover:
The basic specifications of the two units
How to (re)program the device with Arduino
Help with troubleshooting upload.tool.serial errors in Arduino
Tips and notes on using the USB mass storage mode
Initial summary
I'm interested in trying out the PDM microphone, accelerometer and BLE functionality later on, so check back for updates!
Basic specifications of the Seeed XIAO BLE nRF52840
The Seeed XIAO BLE units come in two varieties, both sharing quite beefy specs:
Bluetooth 5.0 with an onboard antenna
Nordic nRF52840, ARM Cortex-M4 32-bit processor with FPU, 64 MHz
Low power consumption and battery charging chip for untethered IoT use cases
Onboard 2 MB flash
Additionally, the Sense variant contains a PDM microphone and a 6-axis accelerometer. The units arrived from China quite quickly and came in sweet little Seeed plastic packages, pin headers included (not soldered in):
You can get both directly from Seeed, at very reasonable price points of $9.90 and $15.99. Nordic's chips are quite hard to source cheaply from AliExpress (yes, I have looked :), so I'd consider both pretty much a bargain.
Board quality seems very good, pads are shiny and components well placed. The USB port is of the modern USB-C variety, and the form factor is really small: just 20 x 17.5 mm, or about a nickel by a dime in footprint and the thickness of a half dollar or so (U.S. readers, you're welcome!). The PCB is single-sided, which makes it easy to embed in various configurations.
On the outside, the only difference between the basic model and the Sense variant is one additional chip containing the PDM microphone. I think the accelerometer is hidden inside the (seemingly FCC and CE compliant) shielding.
There is also an absurdly tiny reset button on the corner opposite the microphone pad (top left above) that is a bit tricky to press. I'd prefer a slightly larger one, but it beats shorting pins any day.
Classic blink test with Arduino
You can follow the instructions on the Seeed Studio wiki to install the necessary development tools to build firmware for the device. The short version: add Seeed's board package URL to the Arduino IDE's Additional Boards Manager URLs and install the nRF52 board support for these boards (the wiki has the exact URL and screenshots).
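Once that's done, the classic blink sketch is the quickest sanity check. A minimal version; note that as far as I can tell, the onboard LED on these boards is active-low, so LOW turns it on:

void setup() {
  pinMode(LED_BUILTIN, OUTPUT);
}

void loop() {
  digitalWrite(LED_BUILTIN, LOW);   // LOW turns the onboard LED on
  delay(500);
  digitalWrite(LED_BUILTIN, HIGH);  // HIGH turns it off
  delay(500);
}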