We are using TLDRead to target COVID research for now. There is a ton of research, and most of it is on the ground floor; this creates an exciting but frightening responsibility. Going forward, we firmly believe that TLDRead has the ability to change how the average person gains access to scientific research. The ability to understand source material is one of the most important parts of being an informed person. Most research is passed to media representatives who summarize and distribute the information to the public, but that can lead to disingenuous summaries; machine learning doesn't have this issue. We are also going to be adding an audio feature so you can listen while driving.
Built by: @yevbar @toe_ws
I love this idea! I feel that, instead of working on audio features, you should polish the summarization model and make it more and more accurate. Summarising research papers isn't a straightforward task, considering the number of nuances that have to be ignored.
Since we are processing such a massive amalgamation of data, some of the summaries are going to be much better than others. The ones that don't summarize well are in other languages or consist mainly of chemical compounds/math calculations, which don't summarize easily. We are currently working on a script that will filter those types of research papers out of the data set.
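The filtering script itself isn't shown in the thread. As a rough sketch of one possible heuristic (not the actual TLDRead script), assuming the paper text has already been extracted as a plain string, a filter could flag text whose characters are mostly non-ASCII (e.g. Chinese) or dense with digits and math/chemistry symbols. The function name `looks_summarizable` and both thresholds are hypothetical guesses that would need tuning on real papers.

```python
def looks_summarizable(text: str,
                       min_ascii_letter_ratio: float = 0.6,
                       max_symbol_ratio: float = 0.15) -> bool:
    """Hypothetical heuristic: True if text looks like plain English prose."""
    # Ignore whitespace so spacing style doesn't affect the ratios.
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return False  # nothing to summarize
    # Share of characters that are ASCII letters; scripts such as Chinese lower it.
    ascii_letters = sum(1 for c in chars if c.isascii() and c.isalpha())
    # Share of digits and common math/chemistry symbols; formula-heavy text raises it.
    math_symbols = set("=+-*/^()[]{}<>±×∑∫%")
    symbols = sum(1 for c in chars if c.isdigit() or c in math_symbols)
    return (ascii_letters / len(chars) >= min_ascii_letter_ratio
            and symbols / len(chars) <= max_symbol_ratio)
```

Latin-script languages such as Spanish or French would pass the letter check, so a dedicated language-identification step would still be needed to catch those.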
Thanks! We are currently running a script we built that scrapes all 5k+ research PDFs found here: https://www.reddit.com/r/DataHoarder/comments/exdka0/the_coronavirus_papers_unlocked_5352_scientific/ . It's going to be a good start toward maintaining data sets and will help train the program.
We liked audio features because they give people the opportunity to digest the information in a familiar way, but most of our time is going to be spent on summarizing.
For sure! Tho, as part of translating to the audio feature, we'd need to have better summarizing sentences for that to work ;)
Nice! Who does the summarisation for the papers? In terms of design, it would be nice to have a bit more on the card for each paper, because it's a bit hard to tell what they are about (specifically, rather than the general context) with such short titles. Once I got inside, I really saw the value prop, so maybe add the first bullet and "+3 more" or something like that! Excited to see this have more content.
We summarize using machine learning :).
The algorithm we went with is a “bag of words” approach:
1. Break down your document into sentences.
2. Remove stop words or any other funky text.
3. Create a scoreboard that maps every word to the number of times it occurs in the document.
4. Then, for every sentence, add up the “score” of each word in that sentence to come up with a “sentence score”.
5. Now the highest-scoring sentences should be the most “significant” ones, so to generate the summary of a research article, we simply grab the five highest-scoring sentences and slap them together!
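The five steps above can be sketched in a few lines of Python. This is a minimal, illustrative version, not the TLDRead code: the tiny stop-word list, the regex-based sentence splitter, and the choice to re-emit the picked sentences in their original document order are all assumptions.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; real systems use longer lists).
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "was",
    "were", "with", "we", "our",
}


def split_sentences(text: str) -> list[str]:
    # Step 1 (naive): split after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence: str) -> list[str]:
    # Step 2: lowercase alphabetic tokens only, minus stop words;
    # digits and symbols ("funky text") are dropped by the regex.
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text: str, n: int = 5) -> str:
    sentences = split_sentences(text)
    # Step 3: scoreboard mapping every word to its count in the whole document.
    scoreboard = Counter(w for s in sentences for w in tokenize(s))
    # Step 4: sentence score = sum of the scores of its words.
    scores = [sum(scoreboard[w] for w in tokenize(s)) for s in sentences]
    # Step 5: pick the n highest-scoring sentences (stable sort keeps ties in order).
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    # Join the picked sentences back in document order so the summary reads naturally.
    return " ".join(sentences[i] for i in sorted(top))


text = ("The virus spreads quickly. The virus binds to ACE2 receptors. "
        "Masks reduce spread of the virus. Weather was nice today.")
print(summarize(text, n=2))
```

Because sentence scores are plain sums, longer sentences tend to win; dividing each score by the sentence's word count is a common tweak if that becomes a problem.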
Thanks for the input! Agreed; we will be sure to add more to each card.
Cool idea!
Thanks!
this is great!
Thanks!
This seems great for people trying to learn more about a field or getting an introduction to a different field. I can see plenty of uses for this. Does your summary take into account the usual structure of a research paper?
Thank you! Yes, it does :)
Thanks for the feedback!
Very cool idea, is the process of summarizing the papers automated?
yes it is! I have a blog post written on it : https://yev.bar/posts/2020-04-11-summaries.html
Thank you! Yes, it is.
Cool idea!
Thanks!
I've always wanted a tldr for research papers. I think it would be a great service if research could be more accurately understood by more people.
From browsing your MVP so far, I'm not convinced it will be possible to summarize with ML/AI until more fundamental progress is made in those fields. Right now it seems you end up pulling some relevant sentences out of the abstract, but what you get is a shorter but less comprehensible version of the abstract.
However, I think a crowdsourced version of this COULD work. And your audio idea made me think of another variant. Get the authors on a short podcast episode to directly talk about their research to lay people. Another variant would be getting a handful of covid researchers to help read and summarize other covid papers. Just some ideas.
Very useful. Thx for building! Is this live-feeding from current news articles so if I come back in a week it'll have new articles?
Thank you! Actually, no: we are scraping the internet for the most recent scientific research and then summarizing those articles using ML. The point is to steer people away from the news and give them access to source material in a way that's easy to understand.
this is nice :) any thoughts on extending using better summarization models?
Thank you! Yes, definitely. However, a lot of the COVID research is filled with calculations. We are parsing all of the articles and trying to develop a system that identifies the papers whose summaries contain the fewest calculations, since those are hard for the average person to understand.
Great job. This is a really important topic. You might consider entering this Kaggle competition, because you're already doing a lot of this work: https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge/tasks
Hey, thanks! Good suggestion; we will do that.
This is a really cool idea! I'm curious about the possibility for user submissions or corrections.
Users can upload their own articles to tldread-admin.herokuapp.com! However, it is currently password-locked.
TLDRead is still in progress, but we'd welcome any feedback, particularly about design/styles, as that's something I'm currently working on for this project.
Great project, and very useful! Would love to input an article I have; I assume you have this functionality in mind already. The UI is simple and easy to use. One suggestion: when I click on TLDRead in the top left corner to go back to the main page, I'd like to land on the page with the articles, not the initial landing page. Great work so far!
Hey Edward, we do have that functionality; I just finished building it. We are currently gathering a bunch of COVID research articles, as that's the most important use case right now, and will make the admin panel public later.